AI-generated exam answers go undetected in real-world assessments.

A study carried out by the University of Reading, UK, found that the majority of human educators are unable to detect content generated by artificial intelligence (AI). The research was prompted by the shift towards remote learning and the growing accessibility of sophisticated AI tools such as ChatGPT. The researchers wanted to determine whether university professors could identify when a student had used AI to answer online exam questions.

The researchers submitted fully AI-composed answers to the examination system of five undergraduate psychology modules. The examiners marking the assessments were unaware of the experiment, making it a de facto real-world Turing test.

The submissions were produced using GPT-4, with a total of 33 AI-generated exam entries spread across the modules. The examinations comprised both short-answer questions and longer essay-based questions. The researchers found that 94% of the AI submissions went unflagged by examiners, and that the average grade awarded to the AI submissions was half a grade boundary higher than the average achieved by real students.

The study’s authors cautioned that the ability to identify AI use in real-world scenarios may be even lower than their results indicate: students might use AI only as a starting point, refining and personalising the output and thus making detection even more difficult.

The study also highlighted the unreliability of AI detectors, such as those offered by the popular academic plagiarism platform Turnitin, which have repeatedly proven inaccurate. Current AI detectors also risk wrongfully accusing non-native English speakers, who are less likely to use the vocabulary and idioms that such tools treat as signals of human writing.

In response to the challenges posed by AI misuse, the Reading researchers recommended moving away from unsupervised, take-home exams toward more controlled environments. This could involve returning to traditional in-person exams or developing new AI-resistant assessment formats.

The researchers suggested that AI literacy among educators must also improve, as the study revealed a clear gap in their ability to recognise machine-written text. They proposed training educators to identify the ‘tropes’ or sentence patterns that AI tends to fall back on.

Previous studies testing AI’s capabilities in academic settings have shown mixed results, with performance varying greatly depending on the subject and type of test. However, the researchers concluded that current student assessment methods must change to uphold academic integrity in the face of increasingly undetectable AI-generated content.
