Our new tool, Ghostbuster, is a state-of-the-art method for detecting AI-generated text. Detection is becoming increasingly necessary as language models like ChatGPT produce ever more fluent text, creating problems such as students using AI to ‘ghostwrite’ assignments. Moreover, because these models can introduce factual inaccuracies, readers have reason to be skeptical of text that may be AI-generated.
Existing tools for detecting AI-generated content often struggle on types of data they weren’t trained on. Worse, falsely classifying genuine human writing as AI-generated risks undermining students’ legitimate academic work.
We therefore present Ghostbuster, a method for detecting whether text was AI-generated that requires no knowledge of the specific model used to create it, making it effective even against black-box or unknown models. Ghostbuster computes the probability of generating each token in a document under a series of weaker language models, then feeds these probabilities as features into a final classifier.
A common problem with current AI text detection systems is inconsistent accuracy across different types of text. Simple models that rely solely on perplexity fail to capture more nuanced features and underperform on new writing domains. More complex models, such as fine-tuned RoBERTa, capture intricate features but tend to overfit the training data and generalize poorly. Zero-shot methods, which classify text without training on labeled data, also perform poorly when the text was generated by a different model than the one they assume.
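To make the perplexity-only approach concrete, here is a minimal sketch of such a detector, assuming a Hugging Face causal language model (GPT-2 as an illustrative stand-in) and a purely hypothetical threshold; it shows the shape of this kind of system, not any detector evaluated here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 stands in for whatever model a perplexity-based detector might use.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average per-token perplexity of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels == input_ids, the model returns mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

# Hypothetical cutoff: very predictable text is flagged as AI-generated.
# A single global threshold like this transfers poorly across writing
# domains, which is exactly the brittleness described above.
THRESHOLD = 20.0

def looks_ai_generated(text: str) -> bool:
    return perplexity(text) < THRESHOLD
```

Because typical perplexity varies widely between, say, news articles and creative fiction, any fixed threshold tuned on one domain tends to misfire on another.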
Ghostbuster is trained in three stages: probability computation, feature selection, and classifier training.
To compute probabilities, each document is converted into a series of vectors by computing the probability of generating each word under a set of weaker language models: a unigram model, a trigram model, and two GPT-3 models (ada and davinci). Feature selection then runs a structured search: a set of vector and scalar operations is defined, and useful combinations of these operations are found by searching over them. The final stage trains a linear classifier on the selected probability-based features, augmented with a handful of manually chosen features.
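The following sketch illustrates this pipeline end to end. The per-token probability vectors are fabricated placeholders (standing in for the unigram, trigram, and GPT-3 probabilities), and the operation sets and classifier setup are simplified illustrations, not the paper’s exact configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stage 1 (assumed done): each document yields per-token probability
# vectors under each weak model. We fabricate two vectors per document
# purely for illustration.
rng = np.random.default_rng(0)
docs = [(rng.uniform(0.01, 1.0, 50), rng.uniform(0.01, 1.0, 50))
        for _ in range(20)]
labels = np.array([0, 1] * 10)  # placeholder human (0) / AI (1) labels

# Stage 2: a structured search combines vector operations (acting on
# probability vectors) with scalar reductions to produce features.
VECTOR_OPS = {
    "ratio": lambda a, b: a / np.maximum(b, 1e-12),
    "diff":  lambda a, b: a - b,
}
SCALAR_OPS = {"mean": np.mean, "max": np.max, "var": np.var}

def candidate_features(probs_a, probs_b):
    """Enumerate one scalar feature per (vector op, scalar op) pair."""
    feats = {}
    for vname, vop in VECTOR_OPS.items():
        vec = vop(np.asarray(probs_a), np.asarray(probs_b))
        for sname, sop in SCALAR_OPS.items():
            feats[f"{sname}({vname})"] = sop(vec)
    return feats

def featurize(documents):
    rows = [candidate_features(a, b) for a, b in documents]
    names = sorted(rows[0])
    return np.array([[row[n] for n in names] for row in rows]), names

# Stage 3: train a linear classifier on the features. (A real run would
# first search for the most useful subset of candidate features.)
X, feature_names = featurize(docs)
clf = LogisticRegression().fit(X, labels)
print(dict(zip(feature_names, clf.coef_[0].round(3))))
```

The key design idea is that the heavy lifting happens in the probability computation; the downstream classifier stays small and linear, which is what lets it generalize across domains and target models.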
Ghostbuster achieved 99.0 F1 when trained and tested on the same domains, outperforming all other tested approaches by a wide margin. It also achieved the best out-of-domain performance, surpassing DetectGPT and GPTZero by 7.5 and 39.6 F1 respectively. Although RoBERTa produced excellent in-domain results, its out-of-domain performance was inconsistent.
We also tested Ghostbuster’s robustness to different kinds of edits: it remained reliable when sentences or paragraphs were substituted, words were replaced with synonyms, or characters were reordered, and it performs best on longer texts. Ghostbuster also handled text written by non-native English speakers well, performing as well on it as on other out-of-domain documents of comparable length.
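For illustration, here are simple stand-in perturbation helpers of the kind one might use to probe a detector’s robustness to such edits; these are our own sketches, not the evaluation code used in these experiments.

```python
import random

def reorder_characters(text: str, n_swaps: int = 5, seed: int = 0) -> str:
    """Perturb text by swapping randomly chosen adjacent characters."""
    rng = random.Random(seed)
    chars = list(text)
    for _ in range(n_swaps):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def substitute_sentence(text: str, replacement: str, seed: int = 0) -> str:
    """Replace one randomly chosen sentence (split naively on '. ')."""
    rng = random.Random(seed)
    sentences = text.split(". ")
    sentences[rng.randrange(len(sentences))] = replacement
    return ". ".join(sentences)

# A robust detector's score should barely move under such edits, e.g.:
# assert abs(score(doc) - score(reorder_characters(doc))) < tolerance
```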
Ghostbuster should be applied with careful attention to its limitations: short texts, domains far from its training data, text by non-native English speakers, and human-authored text that an AI model has edited are all harder cases. We advise against making firm judgments from Ghostbuster’s output alone and instead endorse cautious, human-supervised use in any situation where misclassification could cause harm.
Ghostbuster represents a significant step forward in AI text detection, with 99.0 F1 performance across tested domains. Still, there remains room for further work, including explanations for the model’s decisions and robustness to targeted attacks. In combination with complementary approaches such as watermarking, Ghostbuster could be applied in many ways, such as filtering AI-generated text out of language model training data or flagging AI-generated content online. Try Ghostbuster at ghostbuster.app.