Researchers from the University of Maryland, Tsinghua University, University of California, Shanghai Qi Zhi Institute, and Shanghai AI Lab have developed a novel methodology named Make-An-Agent for generating policies using conditional diffusion models. The method seeks to improve upon traditional policy learning, which learns policies or trajectory models from trajectories sampled from a replay buffer or from behavior demonstrations. Such approaches typically model only a narrow behavior distribution, and guiding high-dimensional output generation with low-dimensional demonstrations remains a recognized challenge.
The researchers’ approach harnesses diffusion models, which have demonstrated strong performance in tasks such as text-to-image synthesis, to generate a policy network through a conditional denoising diffusion process. This process iteratively refines random noise into structured parameters, allowing the diffusion-based generator to produce diverse, high-performing policies within a robust policy parameter space.
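To make the idea concrete, the following is a minimal sketch of such a conditional denoising process, not the authors' released implementation: a DDPM-style sampler where a behavior embedding conditions each denoising step and the result is a latent that a separate decoder would map to policy weights. The names ParamDenoiser, latent_dim, and cond_dim are illustrative assumptions.

```python
# Minimal sketch: conditional denoising diffusion over latent policy parameters.
# All names and sizes are assumptions for illustration, not the paper's code.
import torch
import torch.nn as nn

class ParamDenoiser(nn.Module):
    """Predicts the noise added to a latent parameter vector, conditioned on
    a behavior embedding and the diffusion timestep."""
    def __init__(self, latent_dim=256, cond_dim=128, hidden=512, T=1000):
        super().__init__()
        self.t_embed = nn.Embedding(T, hidden)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, x_t, cond, t):
        return self.net(torch.cat([x_t, cond, self.t_embed(t)], dim=-1))

@torch.no_grad()
def sample_policy_latent(model, cond, latent_dim=256, T=1000):
    """DDPM ancestral sampling: start from Gaussian noise and iteratively
    refine it into a structured latent for the parameter decoder."""
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(cond.shape[0], latent_dim)
    for t in reversed(range(T)):
        t_batch = torch.full((cond.shape[0],), t, dtype=torch.long)
        eps = model(x, cond, t_batch)
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x  # to be decoded into policy weights by a pre-trained decoder
```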
Existing methods in this space include parameter generation and learning-to-learn approaches to policy learning. Parameter generation, which has evolved from the introduction of Hypernetworks, focuses on predicting neural network weights, as in the sketch below. The learning-to-learn alternative relies on meta-learning, aiming to train a policy that can adapt to any new task within a given task distribution.
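For background, a hypernetwork in its simplest form is a network that outputs the weights of another network. The sketch below is purely illustrative (all names and sizes are assumptions unrelated to Make-An-Agent): a small MLP maps a task embedding to the flattened weights of a one-hidden-layer policy.

```python
# Illustrative hypernetwork sketch: predict a policy's weights from a task embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperPolicy(nn.Module):
    def __init__(self, task_dim=16, obs_dim=8, act_dim=2, hidden=64):
        super().__init__()
        self.obs_dim, self.act_dim, self.hidden = obs_dim, act_dim, hidden
        # total parameters of a one-hidden-layer policy: W1, b1, W2, b2
        n_params = hidden * obs_dim + hidden + act_dim * hidden + act_dim
        self.hyper = nn.Sequential(
            nn.Linear(task_dim, 128), nn.ReLU(), nn.Linear(128, n_params)
        )

    def forward(self, task_emb, obs):
        p = self.hyper(task_emb)
        h, o, a = self.hidden, self.obs_dim, self.act_dim
        w1, p = p[: h * o].view(h, o), p[h * o:]   # slice generated weights
        b1, p = p[:h], p[h:]
        w2, p = p[: a * h].view(a, h), p[a * h:]
        b2 = p[:a]
        x = F.relu(F.linear(obs, w1, b1))
        return torch.tanh(F.linear(x, w2, b2))

policy = HyperPolicy()
action = policy(torch.randn(16), torch.randn(8))  # weights generated per task
```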
The proposed Make-An-Agent model uses an autoencoder to compress policy networks, layer by layer, into compact latent representations. Contrastive learning is used to relate long-horizon trajectories to their outcomes, i.e., the future states they lead to. A diffusion model conditioned on the learned behavior embeddings then generates policy parameters in this latent space, which are decoded into usable policies with the pre-trained decoder, as sketched below.
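Here is a rough sketch of the other two components, under stated assumptions: for simplicity it encodes the whole flattened parameter vector rather than per-layer as the paper describes, the contrastive term is a generic InfoNCE-style loss, and all names and sizes are illustrative rather than taken from the released code.

```python
# Sketch of (1) a parameter autoencoder and (2) a contrastive loss that ties
# trajectory embeddings to embeddings of their future states. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParamAutoencoder(nn.Module):
    """Compresses flattened policy weights to a small latent and reconstructs
    them; the frozen decoder later maps generated latents back to weights."""
    def __init__(self, param_dim=10_000, latent_dim=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(param_dim, 1024), nn.ReLU(),
                                 nn.Linear(1024, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 1024), nn.ReLU(),
                                 nn.Linear(1024, param_dim))

    def forward(self, params):
        z = self.enc(params)
        return self.dec(z), z

def contrastive_loss(traj_emb, outcome_emb, temperature=0.1):
    """InfoNCE-style loss: each trajectory embedding should match the
    embedding of its own future states, not those of other batch samples."""
    traj = F.normalize(traj_emb, dim=-1)
    out = F.normalize(outcome_emb, dim=-1)
    logits = traj @ out.t() / temperature
    labels = torch.arange(logits.shape[0])
    return F.cross_entropy(logits, labels)

# Shape check with random stand-ins for real policies and embeddings.
ae = ParamAutoencoder()
flat_weights = torch.randn(4, 10_000)        # 4 flattened policies
recon, z = ae(flat_weights)
recon_loss = F.mse_loss(recon, flat_weights)
nce = contrastive_loss(torch.randn(4, 128), torch.randn(4, 128))
```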
The Make-An-Agent model was evaluated in three continuous control domains and showed significant promise. Policies generated by Make-An-Agent outperformed those produced by competing methods on tabletop manipulation benchmarks and real-world locomotion tasks. The model also produced high-performing policies even when conditioned on noisy trajectories, demonstrating its robustness and adaptability to environmental randomness.
Moving forward, given the limitations imposed by the large number of parameters involved, the researchers aim to explore more diverse policy networks and investigate more flexible methods of generating parameters. They believe that Make-An-Agent is a meaningful step forward in behavior-to-policy generation and has the potential for significant real-world application.