Artificial Intelligence (AI) is increasingly being used in legal research and document drafting, with the aim of improving efficiency and accuracy. However, concerns about the reliability of these tools persist, particularly their tendency to produce false or misleading information, known as “hallucinations”. The concern is acute given the high-stakes nature of legal decisions and documentation. Recent research by teams at Stanford and Yale universities set out to understand and evaluate these challenges by examining AI-driven legal research tools offered by LexisNexis and Thomson Reuters.
Both tools employ retrieval-augmented generation (RAG) to reduce hallucinations: they retrieve relevant legal documents and ground the AI-generated responses in those authoritative sources. However, the researchers noted a lack of empirical evidence supporting the vendors’ claims of mitigating hallucinations. The research therefore set out to test those claims by identifying and analysing hallucinations along two dimensions: factual correctness and citation accuracy.
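To make the RAG pattern concrete, the sketch below shows the basic retrieve-then-generate loop in simplified form. It is illustrative only: the toy corpus, the keyword-overlap retriever, and the placeholder generate() function stand in for the proprietary search and language-model components of the LexisNexis and Thomson Reuters products, whose internals the study does not disclose.

```python
# Minimal sketch of the retrieval-augmented generation (RAG) pattern described above.
# The corpus, the keyword-overlap retriever, and the placeholder generate() call are
# illustrative stand-ins, not the vendors' actual implementations.

from dataclasses import dataclass


@dataclass
class Document:
    citation: str   # e.g. a case name or statute reference
    text: str       # the authoritative source text


# Toy corpus of "retrievable" legal documents (hypothetical examples).
CORPUS = [
    Document("Case A v. B (2020)", "A contract requires offer, acceptance, and consideration."),
    Document("Statute X s. 12", "A claim must be filed within three years of the injury."),
]


def retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Rank documents by naive keyword overlap with the query (placeholder retriever)."""
    query_terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(query_terms & set(doc.text.lower().split())),
        reverse=True,
    )
    return scored[:k]


def generate(query: str, sources: list[Document]) -> str:
    """Stand-in for the language-model call: a real system would prompt an LLM with
    the query plus the retrieved passages so that its answer is grounded in them."""
    context = "\n".join(f"[{d.citation}] {d.text}" for d in sources)
    return f"Answer to {query!r}, grounded in:\n{context}"


if __name__ == "__main__":
    question = "What is the filing deadline for an injury claim?"
    print(generate(question, retrieve(question, CORPUS)))
```

The key design point is that the generation step only ever sees retrieved, citable source text, which is what is meant above by grounding the output in authoritative sources.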
The researchers built this RAG framing into their empirical evaluation of the tools. They found that although the LexisNexis and Thomson Reuters tools hallucinated less often than general-purpose chatbots, they still produced significant error rates: LexisNexis’ tool had a hallucination rate of 17%, while Thomson Reuters’ tools ranged from 17% to 33%. These findings point to the need for continued improvement and evaluation of these AI tools before they can be reliably integrated into legal practice.
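How a “hallucination” is counted drives these numbers. The sketch below is a simplified rubric along the two dimensions named earlier, factual correctness and citation accuracy; the labels and the combining rule are a paraphrase for illustration, not the researchers’ exact scoring code.

```python
# Simplified rubric along the study's two evaluation dimensions:
# factual correctness of the answer and accuracy of the cited sources.
# The categories and the is_hallucination rule below are illustrative paraphrases.

from enum import Enum


class Correctness(Enum):
    CORRECT = "correct"
    INCORRECT = "incorrect"        # contains a false statement of law or fact


class Grounding(Enum):
    GROUNDED = "grounded"          # cited source exists and supports the claim
    MISGROUNDED = "misgrounded"    # cited source exists but does not support the claim
    UNGROUNDED = "ungrounded"      # no valid citation offered for the claim


def is_hallucination(correctness: Correctness, grounding: Grounding) -> bool:
    """Flag a response as hallucinated if it is factually wrong, or if it asserts
    that a source supports a proposition the source does not actually support."""
    return correctness is Correctness.INCORRECT or grounding is Grounding.MISGROUNDED


# Example: an answer that states the law correctly but cites a case that
# does not actually support it would still be flagged.
assert is_hallucination(Correctness.CORRECT, Grounding.MISGROUNDED)
```

Treating a correct statement with a bad citation as a hallucination matters in law, where practitioners rely on the cited authority as much as on the proposition itself.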
In addition, the research uncovered variations in responsiveness and accuracy among the AI tools. The LexisNexis tool was the highest-performing system, answering 65% of queries accurately. In contrast, Westlaw’s AI-assisted research was accurate only 42% of the time and hallucinated almost twice as often.
The researchers concluded that, despite advances in techniques such as RAG, AI tools for legal research are not foolproof and warrant careful supervision by legal professionals. The study also cautioned against placing indiscriminate trust in these tools without verifying their outputs. This advice underscores the study’s broader message: AI should be integrated into legal practice responsibly, so as to minimise the risk of hallucinations. The researchers acknowledged the vast potential of AI in legal research but stressed the need for vigilance and continued scrutiny of its application. Though still early in its application, their research offers a practical roadmap towards more reliable AI systems in the legal field.