Researchers from Pinterest have developed a reinforcement learning framework to enhance diffusion models, a family of generative models that add noise to training data and then learn to reverse that corruption. Diffusion models can achieve state-of-the-art image quality, but their performance depends heavily on the distribution of their training data, which can introduce problems such as aesthetic incongruence, biases, and stereotypes.
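To make the "add noise, then learn to recover it" idea concrete, here is a minimal sketch of the standard forward (noising) process in a diffusion model. The linear beta schedule and all names below are generic illustrations, not Pinterest's code:

```python
import numpy as np

def make_schedule(T=1000, beta_min=1e-4, beta_max=0.02):
    """Cumulative signal-retention factors abar_t for a linear noise schedule."""
    betas = np.linspace(beta_min, beta_max, T)
    alphas = 1.0 - betas
    return np.cumprod(alphas)  # abar_t for t = 1..T, decreasing toward 0

def noise_sample(x0, abar_t, rng):
    """Closed-form forward step: x_t = sqrt(abar_t)*x_0 + sqrt(1-abar_t)*eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps, eps

rng = np.random.default_rng(0)
abar = make_schedule()
x0 = rng.standard_normal((8, 8))            # toy stand-in for image data
xt, eps = noise_sample(x0, abar[500], rng)  # partially noised sample
# A denoising network is then trained to predict eps from (xt, t);
# generation runs this learned denoising in reverse from pure noise.
```

The key property is that `abar` shrinks over time, so early timesteps keep most of the signal while late timesteps are nearly pure noise.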
Previously, the standard remedies were to curate the training dataset or to intervene in the sampling process. These methods, however, act only at sampling time and do not improve the model's underlying capabilities. The Pinterest researchers' framework instead fine-tunes the model itself, training over a massive number of prompts across assorted tasks. To encourage diverse outputs, the team used a distribution-based reward function for reinforcement learning fine-tuning, and they performed multi-task joint training so that a single model can optimize several objectives simultaneously.
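The structure of such a multi-task RL fine-tuning loop can be sketched as follows. This is a hedged, schematic illustration: the tasks, reward functions, and toy "generation" below are stand-ins invented for clarity, not the paper's actual components, and the update is a generic REINFORCE-style reward-weighted step:

```python
import random

def toy_reward_pref(sample):
    """Stand-in for a human-preference reward (hypothetical)."""
    return 1.0 if sample > 0.5 else 0.0

def toy_reward_diversity(sample):
    """Stand-in for a distribution-based diversity reward (hypothetical)."""
    return max(0.0, 1.0 - abs(sample - 0.5))

# Each task contributes its own prompts and its own reward function.
TASKS = {
    "preference": (["a scenic photo"], toy_reward_pref),
    "diversity":  (["a person's portrait"], toy_reward_diversity),
}

def joint_training_step(theta, lr=0.1, rng=random):
    """One joint-training step: sample a task, generate, score, update."""
    prompts, reward_fn = TASKS[rng.choice(list(TASKS))]
    prompt = rng.choice(prompts)
    sample = rng.gauss(theta, 0.1)       # toy "generation" from the model
    r = reward_fn(sample)
    grad = (sample - theta) * r          # REINFORCE-style reward-weighted gradient
    return theta + lr * grad, r
```

Because tasks are interleaved within one loop, the same parameters are pushed toward all objectives at once rather than being fine-tuned for each task separately.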
In their evaluation, the researchers found their method to be more robust and to perform substantially better across the board. The evaluation used three separate reward functions: image composition, human preference, and diversity and fairness. The human-preference score was computed with the ImageReward model and served as the reward during training. The new method outperformed the other models, maintaining a balanced output distribution and demonstrating strong generality, robustness, and ability to generate diverse images.
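One way a distribution-based fairness reward can encourage a balanced output distribution is to penalize the gap between the empirical distribution of some generated attribute and a uniform target. The divergence measure and attribute classifier used in the actual paper may differ; this is a generic sketch using total variation distance:

```python
from collections import Counter

def fairness_reward(attributes, categories):
    """Negative total variation distance between the empirical attribute
    distribution and a uniform target; 0.0 is perfectly balanced."""
    counts = Counter(attributes)
    n = len(attributes)
    target = 1.0 / len(categories)
    tv = 0.5 * sum(abs(counts.get(c, 0) / n - target) for c in categories)
    return -tv  # higher (closer to 0) is fairer

fairness_reward(["a", "a", "b", "b"], ["a", "b"])  # balanced batch scores 0.0
```

Rewarding whole batches this way, rather than individual images, is what makes the objective distributional: no single sample is "fair" on its own, but the collection can be.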
In conclusion, the researchers have introduced a scalable reinforcement learning framework for fine-tuning diffusion models, alleviating shortcomings of existing approaches. The team intends the work to inspire further research that continues to improve diffusion models and mitigates the critical problems of bias and fairness. All credit for this research goes to the authors of the project.