Alibaba has taken a giant leap in text-to-3D modeling with RichDreamer and its Normal-Depth diffusion model! With this model, the team addresses the key challenge of lifting 2D diffusion to 3D generation by learning the joint distribution of normals and depth, which effectively describes scene geometry. The model is trained on the extensive LAION dataset, giving it remarkable generalization, and is then fine-tuned on a synthetic dataset so that the learned normal and depth distributions cover diverse real-world scenes.
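To make the "joint distribution of normal and depth" concrete, here is a minimal NumPy sketch of how such a joint input could be assembled as one multi-channel image for a diffusion denoiser. All names, shapes, and normalizations here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical illustration of a joint normal-depth map.
H, W = 64, 64

# Surface normals: one unit vector per pixel (3 channels in [-1, 1]).
normals = np.random.randn(H, W, 3)
normals /= np.linalg.norm(normals, axis=-1, keepdims=True)

# Depth: one channel, normalized to [0, 1] for the diffusion model.
depth = np.random.rand(H, W, 1)

# Modeling the joint distribution p(normal, depth) amounts to denoising
# this concatenated 4-channel map, so the network learns how surface
# orientation and distance cues co-occur in real scenes.
normal_depth = np.concatenate([normals, depth], axis=-1)
print(normal_depth.shape)  # (64, 64, 4)
```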
The team also tackles the difficulty of mixed illumination effects baked into generated materials by introducing an albedo diffusion model that imposes data-driven constraints on the albedo component. This enhances the disentanglement of reflectance and illumination effects, leading to more accurate and detailed results.
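A tiny NumPy sketch shows why this disentanglement needs a data-driven prior at all: under a simple Lambertian model the observed color is a product of reflectance and shading, so many different (albedo, illumination) pairs explain the same pixel. The variable names and the Lambertian simplification below are illustrative assumptions, not RichDreamer's actual formulation.

```python
import numpy as np

# Under a Lambertian model: observed = albedo * shading.
albedo = np.array([0.8, 0.4, 0.2])   # true surface reflectance (RGB)
shading = 0.5                        # scalar illumination term
observed = albedo * shading

# A different (albedo, shading) pair explains the very same observation,
# i.e. the decomposition is ambiguous from the image alone:
albedo_alt = albedo * 0.5
shading_alt = shading / 0.5
print(np.allclose(observed, albedo_alt * shading_alt))  # True

# A learned albedo prior (the albedo diffusion model) scores which
# candidate albedos are plausible, breaking this ambiguity.
```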
Geometry generation relies on score distillation sampling (SDS), with the proposed Normal-Depth diffusion model integrated into the Fantasia3D pipeline. The team also explores using the model to optimize Neural Radiance Fields (NeRF), demonstrating its effectiveness in enhancing geometric reconstructions. For appearance modeling, the approach adopts a Physically-Based Rendering (PBR) Disney material model, and it shows significant improvements in both geometry and textured model generation compared to state-of-the-art approaches.
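For readers unfamiliar with SDS, here is a minimal NumPy sketch of one score distillation step: the rendered view is forward-diffused with noise, a pretrained denoiser predicts that noise, and the residual becomes the gradient pushed back through the renderer to the 3D parameters. The toy denoiser, weighting, and all names below are stand-in assumptions; a real setup would use the pretrained (here, Normal-Depth) diffusion model conditioned on the text prompt, with the renderer in the backward path.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x_t, t):
    """Stand-in for a pretrained diffusion model's noise prediction;
    a real implementation conditions on the text prompt and timestep."""
    return 0.9 * x_t  # arbitrary toy predictor, for shape illustration only

def sds_gradient(rendered, t, alpha_bar_t, w_t=1.0):
    """One score distillation sampling (SDS) step on a rendered image.

    Returns w(t) * (eps_hat - eps), which would be backpropagated
    through the differentiable renderer to the 3D parameters
    (that backward pass is omitted here).
    """
    eps = rng.standard_normal(rendered.shape)
    # Forward-diffuse the rendering to noise level t.
    x_t = np.sqrt(alpha_bar_t) * rendered + np.sqrt(1.0 - alpha_bar_t) * eps
    eps_hat = toy_denoiser(x_t, t)
    return w_t * (eps_hat - eps)

rendered = rng.standard_normal((8, 8, 3))  # stand-in for a rendered view
grad = sds_gradient(rendered, t=500, alpha_bar_t=0.5)
print(grad.shape)  # (8, 8, 3)
```

In the paper's setting, the same update can drive either a NeRF or a DMTet-style mesh representation, since SDS only needs a differentiable rendering of the current 3D state.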
This pioneering approach to 3D generation sets a new standard in the field, and the team is excited to explore future directions such as additional aspects of appearance modeling and text-to-scene generation. We can't wait to see what the researchers from Alibaba come up with next!