Large Language Model (LLM) research has shifted its focus toward steerability and persona congruity, challenging earlier work that relied on one-dimensional personas or multiple-choice formats. It is now recognized that a persona's intricacy, and its potential to diverge from the views typical of its demographic, can amplify biases in LLM simulations.
Recent research from Carnegie Mellon University on persona-steered generation defines an incongruous persona as one in which one trait makes other traits statistically less likely, for example, a political liberal who supports increased military spending. The findings show that LLMs are harder to steer toward such incongruous personas and often revert to stereotypical views.
To assess the steerability of LLMs, the researchers constructed multifaceted personas by pairing a demographic with a stance, drawing on data from the Pew Research Center. The study revealed significant variance in LLM biases, highlighting the extensive challenges of simulating diverse personas, and underlined how model alignment affects reliability on such predictive tasks.
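As a rough illustration of this setup, the sketch below composes persona prompts by crossing a small set of demographics with stances. The specific demographics, stance wordings, and prompt template are hypothetical placeholders, not taken from the paper.

```python
# Illustrative sketch (not the authors' code): building multifaceted persona
# prompts by pairing a demographic with a stance. All labels and wordings
# below are hypothetical examples.
from itertools import product

demographics = ["a political liberal", "a political conservative", "a woman", "a man"]
stances = [
    ("military spending", "military spending should be increased"),
    ("gun control", "stricter gun control laws should be passed"),
]

def persona_prompt(demographic: str, stance_text: str) -> str:
    """Compose a persona-steered generation prompt for one (demographic, stance) pair."""
    return (
        f"You are {demographic} who believes that {stance_text}. "
        "Write a short statement expressing your view on this issue."
    )

# Enumerate demographic x stance combinations; in the study's framing, some
# pairs are congruent (statistically typical) and others incongruous.
prompts = [persona_prompt(d, text) for d, (topic, text) in product(demographics, stances)]
for p in prompts[:2]:
    print(p)
```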
LLMs were found to be biased toward generating the stances most common for a demographic, and they often struggled to steer accurately toward incongruous personas, producing less diverse and more stereotypical output. GPT-4's evaluations aligned strongly with human judgments, indicating a high correlation in steerability assessment. Models fine-tuned with RLHF (Reinforcement Learning from Human Feedback) were more steerable, but they displayed reduced diversity of views, particularly when portraying women and political liberals.
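Steerability here can be pictured as the fraction of generated statements that an evaluator judges to express the target stance. The sketch below is a hedged approximation using an LLM judge via the OpenAI chat completions API; the judging prompt, model name, and yes/no parsing are assumptions for illustration, not the paper's exact protocol.

```python
# Hypothetical sketch of an LLM-as-judge steerability check: an evaluator model
# is asked whether a generated statement actually expresses the target stance.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def judge_stance(statement: str, stance: str, judge_model: str = "gpt-4") -> bool:
    """Return True if the judge model says the statement supports the stance."""
    response = client.chat.completions.create(
        model=judge_model,
        messages=[{
            "role": "user",
            "content": (
                f"Statement: {statement}\n"
                f"Stance: {stance}\n"
                "Does the statement clearly express support for the stance? Answer yes or no."
            ),
        }],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")

def steerability(statements: list[str], stance: str) -> float:
    """Fraction of generated statements judged to express the target stance."""
    return sum(judge_stance(s, stance) for s in statements) / len(statements)
```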
The focus on persona congruity demonstrates how LLMs can propagate demographic stereotypes. The research therefore emphasizes the urgent need for improved steerability toward diverse personas and for the generation of nuanced human opinion. Reducing these biases and stereotypes would help avoid representational harm and social polarization, and would keep models from flattening complex social identities.
Moving forward, the authors suggest studying LLM behavior in more interactive settings and developing more complex, multifaceted persona representations to better understand and mitigate these biases. The work marks a significant shift in how LLM persona simulation is understood and calls for deeper exploration of these biases and for effective tactics to counter them.
In conclusion, the study examines how effectively LLMs can be guided to generate statements in a given persona. Models are more easily steered toward congruent personas across stances related to politics, race, and gender, yet they may still propagate demographic stereotypes. Future investigations should focus on the interactive behavior of LLMs and on building complex, multifaceted representations to better understand and mitigate potential biases.