
UT Austin and Meta Researchers Team Up to Create SteinDreamer: A Novel Text-to-3D Asset Synthesis Method Utilizing Stein Score Distillation for Improved Visual Quality and Faster Convergence

Recent breakthroughs in text-to-image generation powered by diffusion models have made text-guided 3D asset creation more accessible than ever. This technology enables automated 3D asset production for virtual reality, films, and video games. However, 3D synthesis remains challenging because high-quality 3D data is scarce and generative modeling with 3D representations is complex. Score distillation techniques were developed to sidestep this data scarcity by using a pretrained 2D diffusion model to supervise 3D optimization. Unfortunately, these methods suffer from noisy gradients and instability, stemming from denoising uncertainty and small batch sizes, which lead to slow convergence and suboptimal solutions.
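For context, Score Distillation Sampling (SDS, the approach introduced in DreamFusion) optimizes 3D parameters by backpropagating a residual from a frozen 2D diffusion model. A standard way to write its gradient (stated here in the commonly used form, which may differ cosmetically from the paper's notation) is:

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\big)\,
      \frac{\partial x}{\partial \theta}
    \right]
```

Here $x$ is a rendering of the 3D representation with parameters $\theta$, $x_t$ is its noised version at timestep $t$, $y$ is the text prompt, and $\hat{\epsilon}_\phi$ is the diffusion model’s noise prediction. In practice this expectation is estimated with very few samples per optimization step, which is precisely the source of the high-variance gradients described above.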

But now, researchers from The University of Texas at Austin and Meta Reality Labs have developed SteinDreamer, which integrates their proposed Stein Score Distillation (SSD) into a text-to-3D generation pipeline. SSD consistently reduces the variance of the score distillation process. Experiments on 3D object- and scene-level generation demonstrate that SteinDreamer surpasses DreamFusion and ProlificDreamer, producing detailed textures and precise geometry while minimizing Janus and ghostly artifacts. Furthermore, SteinDreamer’s reduced variance accelerates convergence, so 3D generation requires fewer iterations.

The SSD technique leverages Stein’s identity to reduce variance in score distillation for text-to-3D asset synthesis. Because the identity holds for a broad class of functions, SSD can incorporate flexible guidance priors and network architectures and optimize them explicitly for variance reduction; the overall pipeline is instantiated with a monocular depth estimator. The effectiveness of SSD in reducing distillation variance and improving visual quality is borne out by experiments on both object-level and scene-level text-to-3D generation.
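As a refresher on the key tool, stated in its standard textbook form rather than taken verbatim from the paper: for a smooth density $p$ and any sufficiently regular function $\phi$ with suitable decay at the boundary, Stein’s identity says

```latex
\mathbb{E}_{x \sim p}\!\left[ \nabla_x \log p(x)\,\phi(x)^{\top} + \nabla_x \phi(x) \right] = 0 .
```

Any term of this form has zero expectation, so it can be subtracted from the score distillation gradient as a control variate: the estimate stays unbiased, while a well-chosen $\phi$ (for instance, one built from a guidance prior such as a depth estimator) cancels much of the sampling noise.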

Results show that SteinDreamer generates views with fewer over-saturation and over-smoothing artifacts than SDS, and produces sharper, more detailed results than both SDS and VSD in challenging scene-generation scenarios. The experiments further demonstrate that SSD effectively reduces distillation variance, thereby improving visual quality in both object- and scene-level generation.

With the development of SteinDreamer and its SSD technique, we are one step closer to automated, accelerated 3D asset creation for virtual reality, films, and video games. SteinDreamer offers a more general solution for reducing variance in score distillation for text-to-3D asset synthesis: thanks to control variates constructed via Stein’s identity, flexible guidance priors and network architectures can be plugged in and optimized for variance reduction. In practice, SSD yields richer textures and lower gradient variance than SDS, and its more stable updates lead to faster convergence than existing methods.
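To make the control-variate idea concrete, here is a minimal, self-contained Monte Carlo sketch in Python. It is a generic illustration of variance reduction, not SteinDreamer’s actual code: the functions `target` and `control` are hypothetical stand-ins for the distillation gradient and a Stein-derived baseline.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Quantity whose expectation we want to estimate, E[target(X)] for X ~ N(0, 1).
    # Hypothetical stand-in for a noisy score-distillation gradient.
    return np.exp(0.1 * x) + x**2

def control(x):
    # Control variate with a known mean: E[X^2] = 1 for X ~ N(0, 1).
    # In SSD the analogous baseline has zero mean by Stein's identity.
    return x**2

control_mean = 1.0
x = rng.standard_normal(10_000)

naive = target(x)  # plain Monte Carlo samples
c = control(x)

# Optimal scaling for the control variate: beta = Cov(target, control) / Var(control).
beta = np.cov(naive, c)[0, 1] / np.var(c, ddof=1)

# Subtracting a zero-mean term keeps the estimator unbiased but shrinks its variance.
adjusted = naive - beta * (c - control_mean)

print("naive estimate:   ", naive.mean(), "  variance:", naive.var())
print("adjusted estimate:", adjusted.mean(), "  variance:", adjusted.var())
```

Running this prints nearly identical means but a markedly smaller variance for the adjusted estimator. SSD applies the same principle to the score distillation gradient, except that the baseline’s expectation is guaranteed to be zero by Stein’s identity rather than known in closed form.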

