Authorship Verification (AV), the natural language processing (NLP) task of determining whether two texts were written by the same author, is key in forensics, literature, and digital security. Originally, AV relied primarily on stylometric analysis, using features such as word and sentence lengths and function-word frequencies to distinguish between authors. With the introduction of deep learning models such as BERT and RoBERTa, however, the approach to AV shifted significantly, leveraging intricate patterns in text to differentiate authors more effectively.
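As a rough illustration of the classical stylometric approach, the sketch below computes a few of the surface features mentioned above (average word length, average sentence length, and function-word frequencies). The feature set and helper names are illustrative choices, not taken from any particular AV system.

```python
import re
from collections import Counter

# A tiny, illustrative set of English function words; real stylometric
# systems typically use hundreds.
FUNCTION_WORDS = {"the", "of", "and", "to", "in", "a", "that", "is", "it", "for"}

def stylometric_features(text: str) -> dict:
    """Compute a handful of classic stylometric features for one text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    total = max(len(words), 1)
    features = {
        "avg_word_len": sum(len(w) for w in words) / total,
        "avg_sent_len": total / max(len(sentences), 1),
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        features[f"freq_{fw}"] = counts[fw] / total
    return features

# Two texts can then be compared, for example by measuring the distance
# between their feature vectors, to decide whether they share an author.
```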
A persistent issue in AV is providing clear explanations for classifications, in addition to determining authorship accurately. Current AV models must therefore be both precise and interpretable, helping users understand their decision-making processes. This is especially vital for revealing and addressing latent biases, which builds trust in these models and improves their reliability.
Despite notable advances in AV driven by deep learning models, providing clear explanations for classifications remains a challenge. This is particularly critical given the increasing demand for explainable artificial intelligence (AI). Efforts have been made to incorporate explainability into models, but ensuring that these explanations are consistent and relevant across different contexts remains difficult.
This is where InstructAV comes in. Developed by the Information Systems Technology and Design research team at the Singapore University of Technology and Design, InstructAV uses large language models (LLMs) enhanced with a Parameter-Efficient Fine-Tuning (PEFT) technique. Its primary goal is to provide transparent and highly comprehensible explanations: it combines classification accuracy with insight into the decision-making logic by integrating explainability directly into the classification process.
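The article does not reproduce InstructAV's prompt template, but the idea of folding the explanation into the classification output can be sketched as a single instruction-following example. The wording and field names below are illustrative assumptions, not the authors' actual template.

```python
# Illustrative instruction-tuning example; the wording is an assumption,
# not InstructAV's actual template.
example = {
    "instruction": (
        "Decide whether Text 1 and Text 2 were written by the same author. "
        "Answer 'Yes' or 'No', then justify the answer with linguistic evidence."
    ),
    "input": "Text 1: ...\nText 2: ...",
    "output": (
        "Yes. Both texts favour long subordinate clauses, the same rare "
        "function words, and identical punctuation habits."
    ),
}
```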
The development of InstructAV involves three steps: data collection, consistency verification, and fine-tuning with Low-Rank Adaptation (LoRA). It starts by gathering explanations for AV samples drawn from existing AV datasets with binary classification labels. Next, it verifies that these explanations align with their corresponding classification labels. Finally, it formulates instruction-tuning data that merges the classification labels with their corresponding explanations, and uses this data to fine-tune LLMs with LoRA.
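For the final step, a minimal LoRA setup with the Hugging Face `transformers` and `peft` libraries might look like the sketch below. The base model and hyperparameters are placeholder choices, not necessarily the settings used by the InstructAV authors.

```python
# Minimal LoRA fine-tuning setup; base model and hyperparameters are
# placeholders, not the authors' exact configuration.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # assumed base LLM for illustration
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA freezes the pretrained weights and learns small low-rank update
# matrices injected into the attention projections, so only a tiny
# fraction of the parameters is trained.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()

# The adapted model would then be trained on the instruction-tuning examples
# (classification label plus explanation) with a standard causal-LM loss.
```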
The effectiveness of InstructAV was tested on diverse AV datasets, including IMDB reviews, Twitter posts, and Yelp reviews. The experiments recorded remarkable improvements over the top-performing baseline models, including an impressive 91.4% accuracy on the IMDB dataset compared with BERT's 67.7%. InstructAV also achieved high ROUGE-1 and ROUGE-2 scores, indicating strong overlap with reference explanations at the word and phrase levels. The results showed that InstructAV generates coherent and well-substantiated explanations, validating its advantages over existing models.
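For context, ROUGE-1 and ROUGE-2 measure unigram and bigram overlap between a generated explanation and a reference one. A minimal computation with Google's `rouge-score` package is shown below; the sample strings are invented and not drawn from the InstructAV datasets.

```python
# Minimal ROUGE example; the sample strings are invented for illustration.
from rouge_score import rouge_scorer

reference = "Both texts use short sentences and the same informal tone."
generated = "The two texts share short sentences and a similar informal tone."

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2"], use_stemmer=True)
scores = scorer.score(reference, generated)
print(scores["rouge1"].fmeasure, scores["rouge2"].fmeasure)
```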
In summary, InstructAV addresses persistent challenges in AV tasks, marrying high classification accuracy with the capacity to generate detailed and trustworthy explanations. Alongside the InstructAV framework and its associated instruction-tuning datasets, the researchers ran both automated and human evaluations to confirm its effectiveness.
This highly accurate and explainable model marks a step forward for authorship verification and meets the growing demand for explainable AI solutions. It sets a new benchmark in the field, opening doors for potential applications in critical areas such as forensics, literature, and digital security.