Researchers from the Improbable AI Lab at MIT and the MIT-IBM Watson AI Lab have developed a new technique to improve "red-teaming," a process of safeguarding large language models, such as AI chatbot, through the use of machine learning. The new approach focuses on the automatic generation of diverse prompts that result in undesirable responses…
