Researchers at the University of Illinois Urbana-Champaign have found that AI agents built on GPT-4, a powerful large language model (LLM), can effectively exploit documented cybersecurity vulnerabilities. AI agents of this kind are playing a growing role in cybercrime.
In particular, the researchers studied how well such AI agents can exploit “one-day” vulnerabilities: security flaws in software that have been publicly disclosed but not yet patched. Until the flaw is fixed, anyone with the necessary skills can exploit it, which underlines the importance of rapid detection and remediation.
The standard way of cataloging these vulnerabilities is the Common Vulnerabilities and Exposures (CVE) system. While it is intended to speed up remediation by giving defenders precise information about each flaw, those same details can point cybercriminals toward potential avenues of attack.
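To give a concrete sense of what a CVE record looks like in practice, the short sketch below queries NIST’s public National Vulnerability Database for a published entry. The endpoint and JSON layout shown here are our assumptions about the NVD API, not something described in the study.

```python
# Illustrative sketch: fetching the description of a published CVE from
# NIST's National Vulnerability Database (endpoint and response layout
# assumed per the public NVD API v2.0).
import requests

def fetch_cve_description(cve_id: str) -> str:
    """Return the English-language description of a CVE entry, if one exists."""
    url = "https://services.nvd.nist.gov/rest/json/cves/2.0"
    resp = requests.get(url, params={"cveId": cve_id}, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    for vuln in data.get("vulnerabilities", []):
        for desc in vuln["cve"].get("descriptions", []):
            if desc.get("lang") == "en":
                return desc["value"]
    return "No description found."

print(fetch_cve_description("CVE-2021-44228"))  # e.g., the Log4Shell entry
```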
In their experiments, the researchers created AI agents using GPT-4, GPT-3.5, and eight open-source LLMs. The agents were built on the ReAct agent framework and given access to tools for interacting with other software and systems, as well as to the relevant CVE descriptions. Their objective was to autonomously exploit a benchmark set of 15 real-world “one-day” vulnerabilities.
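As a rough illustration of how such an agent can be wired together, the sketch below implements a minimal ReAct-style loop in Python: the model alternates between proposing an action, having a tool execute it, and reading back the observation. This is our own simplified reconstruction, not the researchers’ code; the model name, tool set, and prompts are placeholder assumptions.

```python
# Minimal sketch of a ReAct-style agent loop (not the researchers' code).
# The model, tools, and prompts below are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TOOLS = {
    "run_shell": lambda cmd: "<command output>",   # placeholder tool
    "fetch_page": lambda url: "<page contents>",   # placeholder tool
}

def react_step(history: list[dict]) -> str:
    """Ask the model for its next Thought/Action given the transcript so far."""
    resp = client.chat.completions.create(model="gpt-4", messages=history)
    return resp.choices[0].message.content

def run_agent(task: str, cve_description: str, max_steps: int = 10) -> None:
    history = [
        {"role": "system", "content": "Reason step by step. Reply with either "
         "'Action: <tool> <input>' or 'Final Answer: <result>'."},
        {"role": "user", "content": f"Task: {task}\nCVE details: {cve_description}"},
    ]
    for _ in range(max_steps):
        reply = react_step(history)
        history.append({"role": "assistant", "content": reply})
        if reply.startswith("Final Answer:"):
            break
        # Parse 'Action: <tool> <input>' and feed the observation back in.
        _, _, rest = reply.partition("Action:")
        tool_name, _, tool_input = rest.strip().partition(" ")
        observation = TOOLS.get(tool_name, lambda x: "unknown tool")(tool_input)
        history.append({"role": "user", "content": f"Observation: {observation}"})
```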
The results were striking: while GPT-3.5 and the open-source models all failed, GPT-4 managed to exploit 87% of the benchmark vulnerabilities. When the CVE description was withheld, however, its success rate plummeted from 87% to 7%. These results suggest that GPT-4 can exploit vulnerabilities effectively when given the CVE details but struggles to discover them on its own.
The implications of these findings are substantial. AI technology is lowering the skill threshold for cybercrime and hacking: the GPT-4 agent took only 91 lines of code to build. As AI models continue to improve, both the skill required and the cost of scaling such autonomous attacks are likely to fall. In their cost analysis, the researchers found using an LLM agent to be roughly 2.8 times cheaper than human labor for these exploits.
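For readers who want to see how such a comparison is typically made, here is a back-of-the-envelope sketch. The dollar figures are illustrative placeholders chosen so the result lands near the reported 2.8x; they are not taken from the paper.

```python
# Back-of-the-envelope cost comparison. The dollar figures below are
# illustrative placeholders, not the paper's numbers; only the ~2.8x
# ratio is reported in the study.
llm_cost_per_exploit = 9.00    # assumed API cost per successful exploit (USD)
human_hourly_rate = 50.00      # assumed rate for a security professional (USD)
human_hours_per_exploit = 0.5  # assumed time for a human to do the same task

human_cost_per_exploit = human_hourly_rate * human_hours_per_exploit
ratio = human_cost_per_exploit / llm_cost_per_exploit
print(f"Human: ${human_cost_per_exploit:.2f}, LLM agent: ${llm_cost_per_exploit:.2f}")
print(f"The LLM agent is roughly {ratio:.1f}x cheaper")  # ~2.8x
```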
The researchers concluded by highlighting the need for the cybersecurity community to reckon with the role of autonomous LLM agents. They emphasized the importance of incorporating such agents into cybersecurity defenses, and the study also urges LLM providers to analyze the possible implications of their models’ widespread use.