Large language models such as ChatGPT now produce text so fluent and well-structured that they threaten the trustworthiness of written content in many settings, driving the need for accurate detection tools. Unfortunately, existing tools for detecting AI-generated text have proven insufficient, sometimes misclassifying human-written content as AI-generated and casting unfounded doubt on the legitimacy of students' work.
Addressing this issue, a recent study introduced Ghostbuster, an advanced model for detecting AI-generated text. Unlike other detectors, Ghostbuster computes the probability of each token in a document under a series of weaker language models. These probabilities are then combined into features that feed a final classifier, eliminating the need to know which model generated the document or to obtain token probabilities from that model. This makes Ghostbuster effective at identifying text produced by unknown or proprietary black-box models, such as ChatGPT and Claude.
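To make this concrete, the sketch below shows one way per-token probabilities could be collected into a document-level array. It is a minimal illustration, not the authors' code: the weak models here are toy stand-ins for the n-gram and small neural models the study queries, and the function names are assumptions.

```python
from typing import Callable, List
import numpy as np

def token_probability_vectors(
    tokens: List[str],
    weak_models: List[Callable[[List[str]], List[float]]],
) -> np.ndarray:
    """Stack per-token log-probabilities from each weak model into a
    (num_models, num_tokens) array: the raw material for the feature search."""
    rows = [np.asarray(model(tokens), dtype=float) for model in weak_models]
    return np.vstack(rows)

if __name__ == "__main__":
    tokens = "the essay was written quickly".split()
    uniform_lm = lambda toks: [np.log(1.0 / 50_000)] * len(toks)  # toy unigram stand-in
    length_lm = lambda toks: [-0.1 * len(t) for t in toks]        # toy heuristic stand-in
    print(token_probability_vectors(tokens, [uniform_lm, length_lm]).shape)  # (2, 5)
```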
Conventional models for detecting AI-generated text often fail to handle differences in text type, writing style, or generation prompt. Simple perplexity-only classifiers break down on new writing domains, while more complex classifiers built on pretrained models such as RoBERTa tend to overfit the training data and generalize poorly. Zero-shot methods, which classify text according to the probability that a specific model generated it, also underperform when the text comes from a model other than the one used for scoring.
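For context, a perplexity-only detector can be as simple as the sketch below; the log-probabilities it consumes and the threshold of 20.0 are illustrative assumptions, and a cutoff tuned on one domain is exactly the kind of choice that fails to carry over to another.

```python
import math
from typing import List

def perplexity(logprobs: List[float]) -> float:
    """Perplexity = exp(-mean log-probability) over the document's tokens."""
    return math.exp(-sum(logprobs) / len(logprobs))

def perplexity_only_classifier(logprobs: List[float], threshold: float = 20.0) -> str:
    # Low perplexity -> text the scoring model finds predictable -> flag as AI.
    # A threshold tuned on one domain (e.g. student essays) can easily
    # misfire on another (e.g. creative writing).
    return "ai-generated" if perplexity(logprobs) < threshold else "human"
```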
In contrast, Ghostbuster employs a three-stage training process. First, it converts each document into a series of vectors by computing the probability of generating each token under several weaker language models. Next, a structured search over combinations of these probabilities selects the most useful features. Finally, a linear classifier is trained on the best probability-based features together with a small set of manually chosen ones.
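The sketch below outlines the second and third stages under simplifying assumptions: the candidate features and the greedy selection criterion are stand-ins rather than the exact operations the paper searches over, and a standard logistic regression plays the role of the final linear classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

FEATURE_FNS = {
    # Each candidate feature maps a (num_models, num_tokens) log-prob
    # array (as built above) to a single scalar.
    "mean_model0": lambda v: v[0].mean(),
    "mean_model1": lambda v: v[1].mean(),
    "mean_abs_diff": lambda v: np.abs(v[0] - v[1]).mean(),
    "var_diff": lambda v: (v[0] - v[1]).var(),
}

def feature_matrix(docs, names):
    """docs: per-document probability arrays; names: chosen feature names."""
    return np.array([[FEATURE_FNS[n](v) for n in names] for v in docs])

def forward_select(docs, y, max_features=3):
    """Greedily add whichever feature most improves cross-validated accuracy."""
    selected, remaining = [], list(FEATURE_FNS)
    for _ in range(max_features):
        def cv_acc(names):
            X = feature_matrix(docs, names)
            clf = LogisticRegression(max_iter=1000)
            return cross_val_score(clf, X, y, cv=3).mean()
        best = max(remaining, key=lambda f: cv_acc(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Final stage: train the linear classifier on the selected features, e.g.
# clf = LogisticRegression(max_iter=1000).fit(feature_matrix(docs, selected), y)
```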
In evaluations, Ghostbuster substantially outperformed other detectors. Tested in-domain, it achieved 99.0 F1 averaged across all three datasets, beating GPTZero by 5.9 F1 points and DetectGPT by a striking 41.6 F1. Out of domain, it averaged 97.0 F1, surpassing DetectGPT by 39.6 F1 and GPTZero by 7.5 F1.
Ghostbuster also surpassed all other methods when tested on different prompt variants, with an F1 score of 99.5, demonstrating its ability to generalize across prompts. It likewise outdid other approaches on text generated by Claude, scoring 92.2 F1. Additionally, Ghostbuster proved highly resistant to minor alterations in the text, such as lightly editing sentences, swapping characters, or substituting synonyms.
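As a rough illustration of this kind of robustness check, the sketch below applies light perturbations to a document and measures how much a detector's score moves; `detector_score` is a hypothetical callable returning the probability that the text is AI-generated, not part of the released system.

```python
import random

def swap_adjacent_characters(text: str, rng: random.Random) -> str:
    # Transpose one randomly chosen pair of neighboring characters.
    chars = list(text)
    i = rng.randrange(len(chars) - 1)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def shuffle_sentences(text: str, rng: random.Random) -> str:
    # Reorder sentences, a coarse stand-in for sentence-level edits.
    sentences = [s for s in text.split(". ") if s]
    rng.shuffle(sentences)
    return ". ".join(sentences)

def robustness_gap(detector_score, text: str, seed: int = 0) -> float:
    """Largest change in the detector's score across a few light perturbations."""
    rng = random.Random(seed)
    base = detector_score(text)
    perturbed = [swap_adjacent_characters(text, rng), shuffle_sentences(text, rng)]
    return max(abs(detector_score(p) - base) for p in perturbed)
```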
Interestingly, Ghostbuster also performed well on text written by non-native English speakers, with accuracy exceeding 95% on two of the three tested datasets. Its robustness and adaptability make it usable even in settings where text generation may be misused. That said, mistakes are likelier on shorter texts, text written by non-native English speakers, or model output that has been heavily edited by humans. To prevent unfair outcomes, it is strongly recommended that Ghostbuster be used cautiously and under human oversight rather than as the sole basis for decisions.
Looking ahead, work building on Ghostbuster promises more accurate explanations of model decisions and better resilience to attacks. The approach could also help flag AI-generated material on the internet or filter language model training data. Ghostbuster is available to try at ghostbuster.app, with the code and paper available for further reading.