
AI safety

UK government study finds LLM safeguards can be readily circumvented.

The UK's AI Safety Institute (AISI) has conducted a study revealing that AI chatbots can be manipulated into producing harmful, illegal, or explicit responses. The AISI tested five large language models (LLMs), identified only by colour codes, using harmful prompts from a 2024 academic paper, along with a new set of harmful prompts unmodified…


According to a Georgetown University study, only 2% of AI research focuses on safety.

Despite growing interest in AI safety, a recent study by Georgetown University's Emerging Technology Observatory reveals that only a small fraction of the field's research focuses on this area. After analyzing over 260 million scholarly publications, the researchers found that just 2% of AI-related papers published between 2017 and 2022 directly addressed AI safety, ethics,…


WMDP identifies and mitigates harmful uses of LLMs through unlearning.

Researchers, including experts from Scale AI, the Center for AI Safety, and leading academic institutions, have launched a benchmark to measure how much dangerous knowledge large language models (LLMs) contain and the threat it poses. Using a new technique, these models can now "unlearn" hazardous data, preventing bad actors from using AI…
