Researchers have identified cultural accumulation as a crucial ingredient of human success: the capacity to build knowledge and skills over successive generations. Most artificial learning systems in use today, deep reinforcement learning among them, instead frame learning as something that happens within a single “lifetime,” and so fail to capture the generational, ongoing character of cultural accumulation observed in humans and other species.
The researchers propose a method for balancing social learning with independent discovery, the two ingredients widely regarded as central to effective cultural accumulation, in artificial agents. They construct two distinct models of the process: episodic generations for in-context learning (knowledge accumulation) and train-time generations for in-weights learning (skill accumulation). When social learning and independent discovery are properly balanced, agents under either model accumulate knowledge and skills over successive generations, outperforming agents trained over a single lifetime with the same cumulative experience.
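To make the generational structure concrete, the sketch below shows one way such a loop could be organised in Python. It is a minimal sketch, assuming a bare environment with `reset()`/`step()` methods; the names `p_social`, `run_generation`, and `accumulate` are illustrative and not taken from the paper.

```python
import random

def run_generation(env, policy, oracle=None, p_social=0.5, horizon=100):
    """Roll out one 'generation': the learner acts in the environment,
    copying the previous generation's policy (the oracle) with
    probability p_social and acting independently otherwise."""
    obs = env.reset()
    trajectory = []
    for _ in range(horizon):
        if oracle is not None and random.random() < p_social:
            action = oracle(obs)   # social learning: imitate the oracle
        else:
            action = policy(obs)   # independent discovery
        obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        if done:
            break
    return trajectory

def accumulate(env, make_policy, n_generations=5, p_social=0.5):
    """Chain generations: each learner's predecessor serves as its oracle,
    so knowledge can pass forward without any single agent living longer."""
    oracle = None
    for _ in range(n_generations):
        policy = make_policy()   # a fresh learner per generation
        run_generation(env, policy, oracle, p_social)
        oracle = policy          # this generation becomes the next oracle
    return oracle
```

Tuning `p_social` is exactly the social-learning/discovery balance described above: at 1.0 the learner only copies its predecessor, at 0.0 it ignores it entirely.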
The work is noteworthy as the first to present general models shown to achieve emergent cultural accumulation in reinforcement learning. This success could lead to more open-ended learning systems and offer new opportunities for modeling human cultural progression.
The researchers open their investigation with the two models introduced above: in-context accumulation and in-weights accumulation. Each defines a “generation” differently: in-context accumulation turns over generations within episodes, whereas in-weights accumulation turns over generations across entire training runs. Three environments are used to evaluate cultural accumulation: Goal Sequence, Travelling Salesperson Problem (TSP), and Memory Sequence, each of which captures aspects of knowledge transmission observed in human societies.
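For a flavour of these tasks, here is a toy rendering of the Memory Sequence idea: a hidden sequence stays fixed across generations, so information about it can be handed from one learner to the next. The interface and reward scheme below are illustrative assumptions, not the paper's implementation.

```python
import random

class MemorySequenceEnv:
    """Toy memory-sequence task: the agent must reproduce a hidden
    sequence of tokens, earning +1 per correct guess. Details here
    (observation, reward, episode length) are assumptions chosen for
    illustration."""

    def __init__(self, length=8, n_tokens=4, seed=0):
        rng = random.Random(seed)
        self.sequence = [rng.randrange(n_tokens) for _ in range(length)]
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # observation: current position in the sequence

    def step(self, action):
        reward = 1.0 if action == self.sequence[self.t] else 0.0
        self.t += 1
        done = self.t >= len(self.sequence)
        return self.t, reward, done

# Minimal usage:
env = MemorySequenceEnv()
obs = env.reset()
obs, reward, done = env.step(2)  # guess token 2 at position 0
```

Because the sequence never changes, an agent that learns it, or inherits it from an oracle, can score perfectly, which makes accumulated knowledge directly measurable.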
The researchers found that their proposed cultural accumulation models outperform reinforcement learning baselines trained over a single lifetime across a range of conditions. Even in environments such as TSP, cultural accumulation enabled sustained improvements beyond what single-lifetime reinforcement learning achieved.
The research pairs each of the two accumulation models, in-context and in-weights, with algorithms suited to it. According to the results, in-context accumulation is hampered by oracles that are either too reliable or too unreliable, so a balance between social learning and independent discovery is essential. In-weights accumulation, for its part, helps mitigate primacy bias, the tendency of a network to overfit its earliest experience at the expense of later learning.
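The oracle-reliability finding can be illustrated with a small wrapper. Sweeping `reliability`, together with how often the learner copies the oracle, exposes the trade-off the results describe: a near-perfect oracle crowds out independent discovery, while a poor one actively misleads. The wrapper and its names are assumptions for illustration, not the paper's code.

```python
import random

def noisy_oracle(base_policy, reliability, n_actions):
    """Return an oracle that gives base_policy's action with probability
    `reliability` and a uniformly random action otherwise (an assumed
    construction for probing the reliability trade-off)."""
    def oracle(obs):
        if random.random() < reliability:
            return base_policy(obs)
        return random.randrange(n_actions)
    return oracle

# Training against oracles with reliability swept from 0.0 to 1.0 would
# trace out the pattern the results suggest: accumulation works best at
# intermediate reliability, where copying is informative but does not
# replace exploration.
```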
This work adds to research on cultural accumulation in artificial reinforcement learning agents, suggesting that more general approaches are needed to enable knowledge transfer without the constraints of current methods. It could pave the way for more sophisticated, open-ended learning systems that better mimic human cultural accumulation across generations.