Google has introduced the first version of its Frontier Safety Framework, intended to mitigate the severe risks that powerful future frontier AI models could pose. The framework defines Critical Capability Levels (CCLs): thresholds at which a model may present heightened risk unless additional mitigations are applied. Mitigations for models that reach a CCL fall into two categories: security mitigations, which aim to prevent the exfiltration of such a model's weights, and deployment mitigations, which aim to prevent misuse of such a model once deployed.
Alongside the framework, Google describes its analysis of four risk domains: autonomy, biosecurity, cybersecurity, and machine learning R&D. Its initial research suggests that powerful future models are most likely to pose severe risks in these areas, and the framework defines an initial CCL for each:

- Autonomy: a model that can acquire resources on its own and run copies of itself.
- Biosecurity: a model that substantially enables the development of biothreats.
- Cybersecurity: a model capable of orchestrating cyberattacks end to end, or of enabling amateurs to carry out sophisticated attacks.
- Machine Learning R&D: a model that could significantly accelerate or automate AI research at a cutting-edge lab.
The autonomy CCL is especially disconcerting, as it raises the prospect of AI systems acting against human interests. To manage this, Google plans to review its models periodically using a battery of “early warning evaluations” designed to flag models that are approaching a CCL, so that mitigations can be put in place before the threshold is actually crossed.
The framework also acknowledges that a model could reach hazardous capability levels before viable mitigations are available. In that case, development of the model would be put on hold, underscoring Google’s commitment to addressing AI risks before they materialize. Google aims to have this initial framework implemented by early 2025, ahead of these risks emerging.
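To make the review process concrete, here is a minimal, purely illustrative sketch of the decision logic described above. Everything in it is an assumption for the sake of illustration: the four domain names come from the framework itself, but the scores, thresholds, warning margins, and all class and function names (EvalResult, review_model, and so on) are hypothetical, as Google has not published implementation details.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Domain(Enum):
    AUTONOMY = auto()
    BIOSECURITY = auto()
    CYBERSECURITY = auto()
    ML_RND = auto()

@dataclass
class EvalResult:
    domain: Domain
    score: float          # hypothetical capability score from an early warning evaluation
    ccl_threshold: float  # hypothetical score at which the CCL is considered reached
    warning_margin: float # buffer below the CCL that triggers mitigation planning

    @property
    def near_ccl(self) -> bool:
        # "Early warning": flag the model while it is still below the CCL itself.
        return self.score >= self.ccl_threshold - self.warning_margin

def review_model(results: list[EvalResult], mitigations_ready: set[Domain]) -> str:
    """Hypothetical periodic review: if any domain nears its CCL, require
    mitigations; if a flagged domain has no viable mitigation, pause development."""
    flagged = [r.domain for r in results if r.near_ccl]
    if not flagged:
        return "continue: no domain is approaching a Critical Capability Level"
    unmitigated = [d.name for d in flagged if d not in mitigations_ready]
    if unmitigated:
        return f"pause development: no viable mitigation for {unmitigated}"
    return f"proceed with security and deployment mitigations for {[d.name for d in flagged]}"

# Example: an autonomy score inside the warning margin, with no mitigation ready.
results = [EvalResult(Domain.AUTONOMY, score=0.72, ccl_threshold=0.80, warning_margin=0.10)]
print(review_model(results, mitigations_ready=set()))
# -> pause development: no viable mitigation for ['AUTONOMY']
```

The warning_margin captures the “early warning” idea: evaluations are meant to fire while a model is still below a CCL, leaving time to prepare security and deployment mitigations, and development pauses only when a flagged domain has no viable mitigation in place.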
While the framework may heighten existing concerns about AI risks, it also reflects a comprehensive commitment to addressing them. Google says the framework will continue to evolve as understanding of frontier models’ risks and benefits improves, leaving room for better risk assessment and management across domains. The approach walks a thin line between precaution and alarmism, but it signals that Google is taking AI risk management seriously. The ultimate hope is that these risks never materialize, thanks in part to proactive measures like these.