Generative models such as GANs often encode significant visual concepts linearly in their latent space. This property enables controlled image edits, such as altering facial attributes like age and gender. For multi-step generative models such as diffusion models, however, identifying a comparable linear latent space remains challenging. Meanwhile, personalization methods such as DreamBooth and Custom Diffusion suggest a path toward uncovering an interpretable latent space.
These methods personalize a diffusion model by fine-tuning it on a few images of a subject. This produces model weights specific to that identity, bypassing the need for a latent code in the noise space. Researchers from UC Berkeley, Snap Inc., and Stanford University explored the weight space of such customized diffusion models using a dataset of over 60,000 models, each fine-tuned to a different visual identity.
The researchers named this weight space “weights2weights” (w2w) and modeled it as a subspace. Examining this space showed its potential for sampling new identities, making semantic edits such as adding a beard, and inverting images, even out-of-distribution ones, into realistic identities.
Many families of image generative models, including VAEs, flow-based models, GANs, and diffusion models, are designed to create high-quality, photorealistic images. Of these, GANs and diffusion models have shown exceptional customization and controllability. To integrate user-defined concepts efficiently, prior work has reduced the dimensionality of the fine-tuned parameters, using methods such as low-rank updates, hypernetworks, and restricting updates to specific layers.
The researchers developed a method that creates a manifold of model weights, with each point reflecting an individual identity. They accomplished this by fine-tuning latent diffusion models with DreamBooth and using LoRA to reduce the dimensionality of the resulting weights.
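As a rough illustration of this idea (a sketch, not the authors' exact pipeline), one could flatten each fine-tuned model's LoRA weight deltas into a vector, stack them into a matrix, and take the top principal components as a low-dimensional basis for the weight subspace. The matrix sizes and random data below are stand-ins:

```python
import numpy as np

# Hypothetical setup: each fine-tuned model's LoRA deltas are flattened
# into one vector, giving a (num_models, dim) matrix of "weight points".
rng = np.random.default_rng(0)
num_models, dim = 1000, 4096          # stand-ins for the real dataset and LoRA dimensions
W = rng.normal(size=(num_models, dim))

# Center the points and take the top principal components as the w2w basis.
mean = W.mean(axis=0)
U, S, Vt = np.linalg.svd(W - mean, full_matrices=False)
k = 100                               # assumed subspace dimensionality
basis = Vt[:k]                        # (k, dim) orthonormal basis of the subspace

# Project one model into low-dimensional coordinates and reconstruct it.
coords = (W[0] - mean) @ basis.T      # (k,) representation of that identity
recon = mean + coords @ basis         # approximate weight vector
```

Each identity then corresponds to a small coordinate vector rather than a full set of model weights, which is what makes operations like editing and sampling tractable.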
Experimental results indicated that the w2w space is useful for manipulating human identities across several tasks: it supports identity editing and enables inverting a single image into identity weights by optimizing within the w2w space.
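One way to picture identity editing in such a space (a minimal sketch under assumed shapes, not the paper's implementation) is to find a linear direction separating models with and without an attribute, then step an identity's coordinates along that direction:

```python
import numpy as np

# Hypothetical sketch: an attribute edit (e.g. "add a beard") as a linear
# direction in the low-dimensional weight coordinates, estimated from
# labeled weight points. All data here is synthetic stand-in data.
rng = np.random.default_rng(1)
k = 100
with_attr = rng.normal(loc=0.5, size=(200, k))     # models labeled "has attribute"
without_attr = rng.normal(loc=-0.5, size=(200, k)) # models labeled "lacks attribute"

# Direction: difference of class means, normalized to unit length.
direction = with_attr.mean(axis=0) - without_attr.mean(axis=0)
direction /= np.linalg.norm(direction)

# Edit an identity by stepping along the direction; strength scales the edit.
identity = rng.normal(size=k)
strength = 2.0
edited = identity + strength * direction
```

Inversion would run in the opposite direction: optimize the coordinate vector so that the corresponding model reconstructs a given photo, which constrains the result to lie on the manifold of realistic identities.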
Using these fine-tuning methods, they created a synthetic dataset of around 65,000 identities encoded into model weights. With it, they sampled new identities, modified identity attributes, and inverted out-of-distribution inputs into realistic identities.
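Sampling a new identity from such a dataset can be sketched as drawing fresh coordinates from a distribution fit to the projected models and mapping them back to weight space. The diagonal-Gaussian fit, sizes, and basis below are illustrative assumptions, not the authors' procedure:

```python
import numpy as np

# Hypothetical sketch: sample a new identity by drawing low-dimensional
# coordinates from a Gaussian fit to the dataset's projected models.
rng = np.random.default_rng(2)
num_models, k, dim = 500, 50, 1024         # stand-in sizes
coords = rng.normal(size=(num_models, k))  # projected coordinates of real models

mu = coords.mean(axis=0)
sigma = coords.std(axis=0)

# Draw new coordinates per dimension (diagonal Gaussian assumption).
new_coords = mu + sigma * rng.normal(size=k)

# Map back to weight space: mean weights plus coordinates times the basis.
basis = np.linalg.qr(rng.normal(size=(dim, k)))[0].T  # stand-in (k, dim) basis
mean_weights = np.zeros(dim)
new_weights = mean_weights + new_coords @ basis       # a sampled identity's weights
```

Because the sample stays inside the region occupied by real fine-tuned models, the resulting weights plausibly correspond to a coherent, realistic identity rather than arbitrary noise.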
In conclusion, the researchers introduced the weights2weights (w2w) space, treating the weights of a customized diffusion model as a point in a space defined by other customized models. The w2w space acts as an interpretable latent space for identity manipulation. While the concept carries potential for misuse, such as malicious identity manipulation, the researchers hope it will motivate a focus on model safety as well as visual creativity. They also see potential to extend the w2w space beyond identities, a likely avenue for future research. Their findings and resources are available on GitHub.