Researchers from the Meaning Alignment Institute have proposed a new method for aligning artificial intelligence (AI) systems with human values, termed Moral Graph Elicitation (MGE). The team highlighted the growing importance of ensuring that AI reflects people’s values as these systems become more embedded in everyday life, and warned against AI systems that blindly obey user instructions, especially in contentious situations.
One potential solution is to pre-program AI systems with a set of values to reference each time they undertake a task. However, defining these values and ensuring they represent all people equitably presents challenges. The researchers, therefore, suggested the MGE method as a means of addressing these issues.
MGE involves two main components. The first, the ‘value card,’ captures what is important to an individual in a specific situation; each card lists the things a person might pay attention to when making a meaningful decision. The second, the ‘moral graph,’ visually depicts the relationships between value cards, indicating which values participants consider wiser or more relevant in a given context. The graph is built by asking participants to compare pairs of value cards and judge which offers the better guidance for a particular situation.
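As a rough illustration of the data structures this implies, the Python sketch below models value cards and a moral graph assembled from pairwise “which is wiser” judgements. The class names, fields, and the simple vote-counting score are illustrative assumptions for this article, not the paper’s actual schema or aggregation method.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass(frozen=True)
class ValueCard:
    """What someone pays attention to when making a meaningful choice in a context."""
    title: str
    context: str               # the situation the card applies to
    attention_policies: tuple  # things a person attends to when living by this value

class MoralGraph:
    """Directed graph: an edge (a -> b) records a participant judging card b
    as wiser than card a for the same context."""

    def __init__(self):
        self.edges = defaultdict(int)  # (less_wise_card, wiser_card) -> vote count

    def add_judgement(self, less_wise: ValueCard, wiser: ValueCard):
        """Record one participant's pairwise comparison."""
        self.edges[(less_wise, wiser)] += 1

    def wisest(self, context: str):
        """Rank cards for a context by net 'wiser than' votes (a toy scoring rule)."""
        score = defaultdict(int)
        for (a, b), n in self.edges.items():
            if a.context == context and b.context == context:
                score[b] += n
                score[a] -= n
        return sorted(score, key=score.get, reverse=True)

# Hypothetical usage with two made-up cards for the same situation.
honesty = ValueCard("Honest reckoning", "user asks about a painful topic",
                    ("what the user already suspects", "facts they can verify"))
care = ValueCard("Compassionate honesty", "user asks about a painful topic",
                 ("the user's emotional state", "truths that help them move forward"))

graph = MoralGraph()
graph.add_judgement(less_wise=honesty, wiser=care)  # one participant's comparison
print([card.title for card in graph.wisest("user asks about a painful topic")])
```

In this sketch the ranking is just a net vote count; the actual study derives the graph from many participants’ comparisons and uses it to identify which values are treated as wiser in each context.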
A study involving 500 participants was conducted to test MGE. The participants used the method to explore three sensitive topics: abortion, parenting, and the weapons used in the January 6th Capitol riot. The results were promising: 89.1% of participants reported feeling well represented by the process, and 89% judged the final moral graph to be fair, even when their own values were not voted the wisest.
The researchers proposed that an alignment target must meet six criteria to influence model behavior based on human values: it must be detailed, generalizable, scalable, robust, legitimate, and auditable. The team argued that the moral graph produced by MGE performs well on these criteria.
Despite the study’s positive results, concerns remain around AI alignment methods, especially those that source values from the public. Historically, views held by minorities have often later been adopted by the majority, underscoring the importance of preserving dissenting viewpoints in democratic decision-making. Public input into AI systems, while democratic, could also fuel populism and drown out minority opinions. Cultural disparities pose another challenge: principles widely accepted in one region may be controversial in another. Any alignment strategy must therefore consider how to integrate diverse, global cultural values into AI systems, so that Western values do not dominate and peripheral views are not eroded.
In conclusion, despite its limitations and the room left for further development, the study offers a valuable strategy for aligning AI systems with human values. As AI becomes more central to our lives, efforts to align these technologies with diverse human values will be essential for fair and equitable AI governance.