The use of Large Language Models (LLMs) to automate and assist with coding holds promise for improving the efficiency of software development. The challenge, however, is ensuring these models produce code that is not only helpful but also secure, since generated code can contain exploitable flaws or be put to malicious use. This concern is not merely theoretical: real-world studies have revealed significant risks. For instance, research on GitHub’s Copilot showed that approximately 40% of the generated programs contained vulnerabilities.
To mitigate these risks, current practices involve fine-tuning LLMs on safety-focused datasets and deploying rule-based detectors to flag insecure code patterns. Both methods have limitations. Fine-tuning can fall short against sophisticated attack prompts, and creating high-quality safety data is costly and resource-intensive. Rule-based systems, on the other hand, may not cover all possible weaknesses, leaving exploitable gaps.
Salesforce Research has introduced a solution to these issues in the form of a framework called INDICT, which aims to increase both the safety and the helpfulness of code generated by LLMs. INDICT relies on an internal dialogue between two distinct critics, one focused on safety and the other on helpfulness, yielding comprehensive feedback that refines the model’s output iteratively. The critics are grounded in external resources such as retrieved code snippets, web searches, and code interpreters, enabling more informed critiques.
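To make the dual-critic idea concrete, here is a minimal, hypothetical sketch of how the two critics might be prompted and grounded in external evidence. The function names, prompt wording, and the generic `LLM` callable are illustrative assumptions, not the paper’s actual implementation.

```python
# Hypothetical sketch of INDICT-style critics (names and prompts are
# illustrative assumptions, not the paper's actual implementation).
from dataclasses import dataclass
from typing import Callable

LLM = Callable[[str], str]  # any text-in / text-out model endpoint


@dataclass
class Critique:
    role: str      # "safety" or "helpfulness"
    feedback: str  # natural-language critique returned by the critic


def safety_critic(llm: LLM, task: str, code: str, evidence: str) -> Critique:
    """Flag potential vulnerabilities, grounded in external evidence
    such as retrieved code snippets or web-search results."""
    prompt = (
        f"Task: {task}\n\nCandidate code:\n{code}\n\n"
        f"External evidence:\n{evidence}\n\n"
        "Identify security weaknesses in the candidate code and suggest fixes."
    )
    return Critique("safety", llm(prompt))


def helpfulness_critic(llm: LLM, task: str, code: str, evidence: str) -> Critique:
    """Judge whether the code actually satisfies the task requirements."""
    prompt = (
        f"Task: {task}\n\nCandidate code:\n{code}\n\n"
        f"External evidence:\n{evidence}\n\n"
        "Assess functional correctness and completeness, and suggest improvements."
    )
    return Critique("helpfulness", llm(prompt))
```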
The INDICT framework operates in two primary stages: preemptive and post-hoc feedback. In the preemptive stage, the safety critic assesses the potential risks of the proposed code, while the helpfulness critic checks that the code aligns with the task requirements. In the post-hoc stage, after the code is executed, the critics provide additional feedback based on the observed outcomes. This two-stage structure ensures that INDICT not only anticipates possible issues but also learns from execution results to improve subsequent outputs.
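The two-stage loop can be summarized in a short driver like the one below. This is a simplified sketch under stated assumptions: `run_code` stands in for a sandboxed interpreter, `revise` for the generator model’s revision step, and the critics are passed in as plain callables such as those sketched above; none of these names come from the paper.

```python
# Simplified two-stage refinement round; run_code, revise, and the critic
# callables are placeholders, not INDICT's actual interfaces.
from typing import Callable, List

Critic = Callable[[str, str], str]  # (code, evidence) -> critique text


def refinement_round(code: str,
                     safety: Critic,
                     helpful: Critic,
                     run_code: Callable[[str], str],
                     revise: Callable[[str, List[str]], str]) -> str:
    # Stage 1: preemptive feedback, produced before the code is ever run.
    preemptive = [safety(code, ""), helpful(code, "")]
    code = revise(code, preemptive)

    # Stage 2: post-hoc feedback, conditioned on the observed execution outcome.
    outcome = run_code(code)
    posthoc = [safety(code, outcome), helpful(code, outcome)]
    return revise(code, posthoc)
```

A round like this could be repeated, with each revision incorporating the accumulated critiques, until the critics raise no further objections or an iteration budget is reached.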
INDICT was tested on eight diverse tasks spanning eight programming languages, using LLMs with parameter counts ranging from 7 billion to 70 billion. The results showed considerable gains in both safety and helpfulness metrics; specifically, INDICT delivered an absolute improvement of roughly 10% in code quality across all tested models. On the CyberSecEval-1 benchmark, for example, INDICT improved the safety of generated code by up to 30%, with over 90% of outputs measured as secure. Helpfulness also improved substantially, with INDICT-enhanced models outperforming strong baselines by up to 70%.
The framework’s success lies in its ability to provide detailed, context-aware critiques that enable LLMs to produce higher-quality code. By integrating safety and helpfulness feedback, INDICT ensures that the generated code is both secure and functional, offering a more robust answer to the challenges posed by LLM-generated code.
In conclusion, Salesforce Research’s INDICT presents a compelling mechanism for boosting the safety and usefulness of LLM-produced code. By employing a dual-critic system and drawing on external knowledge sources, INDICT addresses the critical balance between functionality and security. The framework’s strong performance across diverse languages suggests it could set new benchmarks for responsible AI in coding.