Artificial intelligence (AI) has advanced dramatically in recent years, opening up numerous new possibilities. However, these developments also carry significant risks, notably in relation to cybersecurity, privacy, and human autonomy. These are not purely theoretical fears, but are becoming increasingly dependant on AI systems’ growing sophistication.
Assessing the risks associated with AI involves evaluating performance across different domains, from verbal reasoning to coding abilities. However, the complexity of these systems means that comprehending potential dangers is difficult. This presents a unique challenge: how to evaluate AI capabilities that could inadvertently or otherwise produce harmful outcomes?
Responding to this problem, a team from Google’s DeepMind research lab has suggested a comprehensive scheme for evaluating AI systems’ “dangerous capabilities”. Their evaluations focus on four areas: Persuasion and deception, Cyber-security, Self-proliferation, and Self-reasoning.
In the area of persuasion and deception, the research assesses AI systems’ ability to manipulate beliefs, form emotional connections, and even tell convincing lies. For cybersecurity, the team evaluates AI knowledge of computer systems, vulnerabilities, and exploits, and also tests their capability to infiltrate and manipulate systems, launch attacks, and exploit known vulnerabilities.
Assessment of self-proliferation involves examining AI systems’ abilities to independently set up and manage digital infrastructure and resources, and to self-replicate or improve. This focuses on their capacity to undertake tasks like cloud computing, email account management, and the development of resources through various means. As for self-reasoning, this evaluates AI systems’ capability to reason about themselves and adapt their surroundings or functionality when it is beneficial. This involves their ability to understand their current states, make decisions based on this understanding, and potentially modify their behavior or code.
The research used datasets from the Security Patch Identification (SPI), which contains vulnerable and non-vulnerable commits from the Qemu and FFmpeg projects. It showed that persuasion and deception capabilities are highly developed, suggesting that AI’s potential to influence human beliefs and behaviors is growing. More advanced models also showed at least basic skills across all the areas evaluated, suggesting that improvements in general capabilities could result in the emergence of dangerous capabilities.
In conclusion, the research indicates that understanding and mitigating the risks related to advanced AI systems will require a collaborative and unified approach from researchers, policymakers, and technologists. By combining, refining, and expanding existing evaluation methods, experts can better anticipate potential risks and develop strategies to ensure that AI is used for the benefit of humanity rather than posing unintentional threats.