Researchers from the University of Illinois Urbana-Champaign (UIUC) have revealed that artificial intelligence (AI) agents powered by GPT-4 are capable of autonomously exploiting cybersecurity vulnerabilities. As AI models continue to advance, their dual-use nature makes them both useful and potentially dangerous. Google, for example, expects AI to play a major role in both committing and preventing future cybercrime.
The UIUC team studied whether AI agents could take advantage of ‘one-day’ vulnerabilities: security flaws that have been identified and publicly disclosed but not yet patched by the software vendor. Such flaws are catalogued in the Common Vulnerabilities and Exposures (CVE) system, where each entry documents the specifics of the vulnerability that needs fixing. Unfortunately, publishing this information also tells malicious actors exactly where the weak points in the software lie.
In their experiment, the researchers built AI agents on top of GPT-4, GPT-3.5, and eight open-source large language models (LLMs). The agents were given tools, CVE descriptions, and the ReAct agent framework, which lets an LLM interleave reasoning steps with actions against other software and systems. The team assembled a benchmark of 15 real-world one-day vulnerabilities for the agents to exploit. The open-source models and GPT-3.5 failed on every one, while GPT-4 successfully exploited 87% of them.
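To make that setup concrete, the sketch below shows a minimal ReAct-style agent loop, in which the model alternates ‘Thought/Action’ steps with ‘Observation’ feedback from tools. It is an illustrative approximation only: the `run_react_agent` function, the `llm` callable, and the toy tools are hypothetical placeholders, not the UIUC team's actual implementation or tooling.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the browsing/terminal tools described in the study.
ToolMap = Dict[str, Callable[[str], str]]

def run_react_agent(task: str, llm: Callable[[str], str],
                    tools: ToolMap, max_steps: int = 10) -> str:
    """Alternate LLM 'Thought/Action' steps with tool 'Observation' feedback."""
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        step = llm(transcript + "\nThought/Action:")
        transcript += f"\n{step}"
        if step.strip().startswith("Final Answer:"):
            return transcript
        # Loose parsing: expect a line like "Action: tool_name[tool input]".
        if "Action:" in step:
            name, _, arg = step.split("Action:", 1)[1].strip().partition("[")
            tool = tools.get(name.strip())
            observation = tool(arg.rstrip("]")) if tool else "unknown tool"
            transcript += f"\nObservation: {observation}"
    return transcript

# Usage with toy placeholders (a real agent would call GPT-4 and real tools).
if __name__ == "__main__":
    demo_tools: ToolMap = {
        "fetch_cve": lambda cve_id: f"(description of {cve_id} would appear here)",
    }
    demo_llm = lambda prompt: "Final Answer: demo only, no exploitation attempted."
    print(run_react_agent("Summarise a CVE entry", demo_llm, demo_tools))
```

The point of the loop is simply that the LLM decides what to do next based on the accumulated transcript, and the tools feed real-world output back into that transcript; in the study, access to tools plus the CVE text was what made autonomous exploitation possible.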
However, the study also found that GPT-4’s success rate plummeted from 87% to just 7% when the CVE descriptions were removed, suggesting that its ability to exploit vulnerabilities depends heavily on those published CVE details.
The researchers determined that using LLM agents is already significantly cheaper than employing human labor, and far easier to scale. By their calculations, the GPT-4 agent cost about $8.80 per exploit, whereas a cybersecurity expert billing $50 an hour would cost roughly $25 per exploit (about half an hour of work). They predict that as more capable LLMs, such as GPT-5, become available, both the capabilities and the cost gap will only grow. The researchers therefore recommend that the wider cybersecurity industry and LLM providers think carefully about how such agents can be incorporated into defensive measures.
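As a back-of-the-envelope check on those figures, the snippet below reproduces the comparison; the half hour of human effort per exploit is the assumption implied by the $50 hourly rate and the $25 estimate, not a number stated here.

```python
# Back-of-the-envelope cost comparison using the figures reported above.
agent_cost_per_exploit = 8.80      # reported GPT-4 agent cost (USD)
human_hourly_rate = 50.0           # cybersecurity expert rate (USD/hour)
human_hours_per_exploit = 0.5      # implied: 50 USD/h * 0.5 h = 25 USD/exploit
human_cost_per_exploit = human_hourly_rate * human_hours_per_exploit

print(f"Agent: ${agent_cost_per_exploit:.2f} per exploit")
print(f"Human: ${human_cost_per_exploit:.2f} per exploit")
print(f"Agent is ~{human_cost_per_exploit / agent_cost_per_exploit:.1f}x cheaper per exploit")
```

On these numbers the agent comes out roughly 2.8 times cheaper per exploit, before accounting for how much more easily it scales.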