The UK's AI Safety Institute (AISI) has conducted a study revealing that AI chatbots can be manipulated into producing harmful, illegal, or explicit responses. The AISI tested five large language models (LLMs), referred to by colour codes, using harmful prompts from a 2024 academic paper, along with a new set of harmful prompts unmodified…
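The report does not publish the Institute's test harness; purely as illustration, here is a minimal Python sketch of how this kind of jailbreak compliance testing is commonly scripted. The `query_model` function and the refusal markers are hypothetical stand-ins, not the AISI's actual tooling:

```python
# Minimal sketch of a prompt-based compliance test harness, in the spirit of
# the evaluation described above. `query_model` and REFUSAL_MARKERS are
# illustrative assumptions, not the Institute's actual tooling.

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")

def query_model(model: str, prompt: str) -> str:
    """Placeholder for a real model API call (e.g. an HTTP request)."""
    raise NotImplementedError

def is_refusal(response: str) -> bool:
    """Crude keyword check: does the reply look like a refusal?"""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def compliance_rate(model: str, harmful_prompts: list[str]) -> float:
    """Fraction of harmful prompts the model answers rather than refuses."""
    answered = sum(
        not is_refusal(query_model(model, p)) for p in harmful_prompts
    )
    return answered / len(harmful_prompts)
```

In practice, keyword-based refusal detection is noisy, and published evaluations typically rely on human raters or a grader model instead.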
Despite growing interest in AI safety, a recent study by Georgetown University’s Emerging Technology Observatory reveals that only a small fraction of the field’s research focuses on this area. After analyzing over 260 million scholarly publications, the researchers found that just 2% of AI-related papers published between 2017 and 2022 directly addressed AI safety, ethics,…
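As a rough illustration of how such a share can be estimated over a large corpus, here is a toy keyword-matching sketch; the term list and the use of abstracts are assumptions for the example, not the Observatory's actual methodology:

```python
# Toy sketch: estimate what share of an AI paper corpus touches on safety.
# The keyword list is illustrative; real bibliometric classifiers are far
# more sophisticated than simple substring matching.

SAFETY_TERMS = ("ai safety", "alignment", "ai ethics", "interpretability")

def is_safety_paper(abstract: str) -> bool:
    text = abstract.lower()
    return any(term in text for term in SAFETY_TERMS)

def safety_share(abstracts: list[str]) -> float:
    """Percentage of papers whose abstracts mention a safety-related term."""
    hits = sum(is_safety_paper(a) for a in abstracts)
    return 100 * hits / len(abstracts)

print(safety_share([
    "We study reward hacking and AI alignment in RLHF.",
    "A new convolutional architecture for image segmentation.",
]))  # -> 50.0
```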
Researchers, including experts from Scale AI, the Center for AI Safety, and leading academic institutions, have launched a benchmark to measure the threat posed by the dangerous knowledge large language models (LLMs) contain. Alongside the benchmark, they demonstrate a new technique that lets these models "unlearn" hazardous data, preventing bad actors from using AI…
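The summary does not detail the researchers' technique; as a generic illustration only, here is a sketch of gradient-ascent unlearning, a common baseline in the machine-unlearning literature, assuming a Hugging Face-style causal language model:

```python
# Generic sketch of gradient-ascent unlearning on a "forget" set of hazardous
# text. This is a common baseline shown for illustration, NOT the specific
# technique the researchers introduced. Assumes a causal LM whose forward
# pass returns a .loss when given labels, and batches containing "input_ids"
# and "attention_mask".

import torch
from torch.utils.data import DataLoader

def unlearn(model, forget_loader: DataLoader, epochs: int = 1) -> None:
    """Raise (rather than lower) the model's loss on hazardous examples."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    model.train()
    for _ in range(epochs):
        for batch in forget_loader:
            outputs = model(**batch, labels=batch["input_ids"])
            (-outputs.loss).backward()  # negate the loss: ascend, not descend
            optimizer.step()
            optimizer.zero_grad()
```

Gradient ascent alone tends to degrade general capabilities, which is why published unlearning methods typically add terms that preserve performance on a retained, benign dataset.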