In the age of artificial intelligence, computers can generate “art” using diffusion models. However, diffusion models typically produce an image through a complex, time-consuming process that refines it over many iterative steps. Researchers at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) have now introduced a technique that collapses this process into a single step using a teacher-student framework. The approach, called distribution matching distillation (DMD), retains image quality while enabling faster generation.
DMD combines two losses. A regression loss anchors the mapping from random noise to images, giving the space of images a coarse organization and keeping training stable. A distribution matching loss ensures that the probability of the new model generating a given image matches how often such an image occurs in the real world. The system achieves faster generation by training a new network that minimizes the divergence between the distribution of its generated images and that of the training dataset used by traditional diffusion models.
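To make the two losses concrete, here is a minimal Python sketch on a one-dimensional toy problem; it is an illustration, not the authors' code. It assumes the teacher has learned a Gaussian data distribution N(3, 2²), so both the "real" and "fake" scores can be written in closed form, and `teacher_sample`, `train_dmd_toy`, and all constants are hypothetical. The one-step generator is an affine map G(z) = a·z + b. The distribution matching gradient is the difference between the fake and real scores at a random noise level, pushed back through the generator with the scores treated as constants, and the regression term is a plain squared error to precomputed teacher outputs, a stand-in for the perceptual distance used in the DMD paper.

```python
import numpy as np

# Toy assumption: the data the teacher learned is 1-D Gaussian N(MU, SIGMA^2).
MU, SIGMA = 3.0, 2.0


def teacher_sample(z):
    """Hypothetical stand-in for the teacher's slow multi-step sampler (noise z -> sample)."""
    return MU + SIGMA * z


def gaussian_score(x, mean, var):
    """Score (gradient of the log-density) of N(mean, var) evaluated at x."""
    return -(x - mean) / var


def train_dmd_toy(steps=2000, batch=256, lr=0.05, reg_weight=0.25, seed=0):
    rng = np.random.default_rng(seed)
    a, b = 0.5, 0.0  # one-step generator G(z) = a*z + b

    # Regression data: fixed noise/output pairs from the teacher, precomputed once.
    z_pairs = rng.standard_normal(batch)
    y_pairs = teacher_sample(z_pairs)

    for _ in range(steps):
        # Distribution matching term (we only need its gradient).
        z = rng.standard_normal(batch)
        x = a * z + b
        t = rng.uniform(0.0, 1.0)                 # random noise level
        x_t = x + t * rng.standard_normal(batch)  # diffuse the generated samples
        # "Fake" score: Gaussian fitted to the generator's current outputs, at level t.
        s_fake = gaussian_score(x_t, x.mean(), x.var() + t**2)
        # "Real" score: the data distribution at the same noise level.
        s_real = gaussian_score(x_t, MU, SIGMA**2 + t**2)
        diff = s_fake - s_real
        # Chain rule through G: dx_t/da = z, dx_t/db = 1.
        grad_a = np.mean(diff * z)
        grad_b = np.mean(diff)

        # Regression term: mean squared error to the teacher's paired outputs.
        err = a * z_pairs + b - y_pairs
        grad_a += reg_weight * np.mean(2 * err * z_pairs)
        grad_b += reg_weight * np.mean(2 * err)

        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


a, b = train_dmd_toy()
print(f"G(z) = {a:.2f} * z + {b:.2f}")
```

Both terms pull the generator toward the teacher's mapping, so a and b should end up close to 2 and 3. In this toy, setting `reg_weight=0.0` also converges, because a one-dimensional Gaussian lacks the training instabilities the regression loss is meant to guard against in real image models.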
To do this, DMD relies on two diffusion models that act as guides: one models real images and the other models the generator's own outputs, and the difference between them tells the one-step generator how to improve, which makes training it fast. The new student model is built from pre-trained networks: copying and fine-tuning the original model's parameters simplifies the process and leads to fast training convergence. Because the student keeps the original architecture, the method can also be combined with other system optimizations built for that architecture to further accelerate image creation.
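The sketch below, again a toy rather than DMD's implementation, focuses on the two guiding diffusion models and the reuse of pre-trained parameters. `GaussianDenoiser` is a hypothetical one-dimensional "diffusion model" that predicts the clean sample from a noisy one; the frozen teacher plays the real model, and a copy of it plays the fake model, which follows the generator's outputs. Each model's score is recovered from its denoiser as s(x_t) = (D(x_t) - x_t)/t², so the generator's update depends only on the gap between the two denoisers' predictions. As assumptions for illustration, the generator starts as the teacher's single denoising step from pure noise, and the fake model is refit with a moving average of the generator's output statistics instead of being trained with a denoising loss.

```python
import copy
from dataclasses import dataclass

import numpy as np


@dataclass
class GaussianDenoiser:
    """Toy 1-D "diffusion model": the optimal denoiser for data drawn from N(mean, var)."""
    mean: float
    var: float

    def denoise(self, x_t, t):
        # Predict the clean sample x0 from x_t = x0 + t * noise (posterior mean for Gaussian data).
        return self.mean + self.var / (self.var + t**2) * (x_t - self.mean)


def dmd_finetune(teacher, steps=3000, batch=256, lr=0.05, t_max=10.0, fake_rate=0.5, seed=0):
    rng = np.random.default_rng(seed)
    real = teacher                 # frozen "real" model: never updated
    fake = copy.deepcopy(teacher)  # "fake" model: starts as a copy of the teacher

    # One-step generator G(z) = a*z + b, initialized to the teacher's single
    # denoising step from pure noise at level t_max: G(z) = D_teacher(t_max * z, t_max).
    c = teacher.var / (teacher.var + t_max**2)
    a, b = c * t_max, (1.0 - c) * teacher.mean

    for _ in range(steps):
        z = rng.standard_normal(batch)
        x = a * z + b

        # Track the generator's current outputs with the fake model
        # (a simplified stand-in for one denoising-loss training step).
        fake.mean += fake_rate * (x.mean() - fake.mean)
        fake.var += fake_rate * (x.var() - fake.var)

        # Diffuse the generated samples and compare the two models' scores,
        # using s(x_t) = (D(x_t) - x_t) / t^2, so the x_t terms cancel.
        t = rng.uniform(0.1, 1.0)
        x_t = x + t * rng.standard_normal(batch)
        diff = (fake.denoise(x_t, t) - real.denoise(x_t, t)) / t**2  # s_fake - s_real

        # Chain rule through G: dx_t/da = z, dx_t/db = 1.
        a -= lr * np.mean(diff * z)
        b -= lr * np.mean(diff)
    return a, b


teacher = GaussianDenoiser(mean=3.0, var=4.0)
a, b = dmd_finetune(teacher)
```

Starting from the teacher, the generator's initial outputs cluster near the data mean (a ≈ 0.38); fine-tuning then widens them until a ≈ 2 and b ≈ 3, matching the teacher's distribution, while the teacher's own parameters stay unchanged.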
Tested against conventional methods on a wide range of benchmarks, DMD performed consistently. It is the first one-step diffusion technique to generate images nearly on par with those from the original, more complex multi-step models. It also performs well on industrial-scale text-to-image generation, achieving state-of-the-art one-step generation performance. A slight quality gap remains on more complex text-to-image applications, which leaves room for further improvement.
The quality of DMD-generated images depends largely on the capabilities of the ‘teacher’ model used during distillation, so more advanced teacher models could further improve the results. The work, supported by the U.S. National Science Foundation and other organizations, will be presented at the Conference on Computer Vision and Pattern Recognition (CVPR) in June.