In 3D generative AI, a new direction has emerged: reconstructing 3D objects from a limited number of views. Propelled by large-scale 3D datasets and advances in generative model architectures, researchers have begun using 2D diffusion models to create 3D objects from text prompts or photos, primarily to compensate for the scarcity of 3D training data.
Notably, DreamFusion pioneered Score Distillation Sampling (SDS), a technique that optimizes a 3D model using guidance from a pre-trained 2D diffusion model. While groundbreaking, this approach is computationally expensive, because every object requires a lengthy per-scene optimization, and its outputs are difficult to control. These limitations motivated more efficient feedforward 3D reconstruction models, which produce a 3D model in a single fast inference pass and offer greater control over the output.
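For reference, the SDS gradient introduced by DreamFusion can be written as below. Here $x = g(\theta)$ is an image rendered from the 3D model's parameters $\theta$, $z_t$ is that rendering noised to timestep $t$, $\hat{\epsilon}_\phi$ is the frozen diffusion model's noise prediction conditioned on the prompt $y$, and $w(t)$ is a timestep weighting. Each update needs a fresh rendering plus a diffusion-model evaluation, and many updates are needed per object, which is where the computational cost comes from.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\big(\phi, x = g(\theta)\big)
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(z_t; y, t) - \epsilon\big)\,
    \frac{\partial x}{\partial \theta} \right],
\qquad z_t = \alpha_t x + \sigma_t \epsilon
```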
TripoSR, developed by Stability AI and Tripo AI, is a feedforward model that reconstructs a 3D model from a single image in under half a second. It builds on the transformer-based architecture of LRM (Large Reconstruction Model): it takes a single RGB image of an object as input and produces a 3D representation of that object. On top of this design, it introduces numerous improvements in data curation, model design, and rendering.
TripoSR comprises three main components: an image encoder, an image-to-triplane decoder, and a triplane-based neural radiance field (NeRF). The image encoder, initialized from the pre-trained vision transformer DINOv1, converts the input RGB image into a sequence of latent vectors that carry the features needed for 3D reconstruction. The decoder then maps these vectors onto a triplane representation, which the NeRF queries to predict the density and color of points in 3D space.
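To make "a sequence of latent vectors" concrete, consider how a ViT-style encoder such as DINOv1 begins. It cuts the image into fixed-size patches, linearly projects each patch into a token, and prepends a class token before running its transformer layers. The NumPy sketch below covers only this tokenization front-end and uses random weights in place of the pre-trained ones. The 16-pixel patch, the 768-dimensional width, and the 512x512 input are assumptions that match a ViT-B/16; the text does not confirm them. The transformer layers themselves are omitted.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, 3) image into non-overlapping patches, row by row.

    Returns an array of shape (num_patches, patch_size * patch_size * 3).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (gh * gw, p * p * c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def embed_patches(patches, weight, bias, cls_token, pos_embed):
    """ViT-style embedding: linear projection, prepended class token, positional embeddings."""
    tokens = patches @ weight + bias                                # (N, D)
    tokens = np.concatenate([cls_token[None, :], tokens], axis=0)   # (N + 1, D)
    return tokens + pos_embed                                       # (N + 1, D)


# Usage with random stand-in weights (NOT DINOv1's trained parameters).
rng = np.random.default_rng(0)
image = rng.random((512, 512, 3), dtype=np.float32)   # hypothetical 512x512 RGB input
patch_size, dim = 16, 768                             # ViT-B/16-style sizes (assumed)
patches = patchify(image, patch_size)                 # (1024, 768)
weight = rng.standard_normal((patches.shape[1], dim)).astype(np.float32) * 0.02
bias = np.zeros(dim, dtype=np.float32)
cls_token = np.zeros(dim, dtype=np.float32)
pos_embed = rng.standard_normal((patches.shape[0] + 1, dim)).astype(np.float32) * 0.02
tokens = embed_patches(patches, weight, bias, cls_token, pos_embed)
print(tokens.shape)  # (1025, 768)
```

A 512x512 image split into 16-pixel patches yields a 32x32 grid, so the encoder sees 1,024 patch tokens plus one class token.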
To handle real-world images without relying on accurate camera information, TripoSR does not explicitly condition on camera parameters; instead, the model learns to infer them from the image itself. Key design choices include the triplane resolution, the number of transformer layers, the NeRF configuration, and the main training settings.
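The sketch below shows roughly how a triplane-based NeRF turns a triplane into a rendered pixel. Each 3D point is projected onto three axis-aligned feature planes (xy, xz, yz), and features are bilinearly sampled from each plane. A small MLP decodes the combined features into density and color. Samples along a camera ray are then alpha-composited using standard NeRF volume rendering. This is a minimal NumPy illustration under stated assumptions, not TripoSR's implementation. Concatenating the three plane features, the MLP shape, the softplus/sigmoid activations, and the plane sizes are all hypothetical choices; summing the plane features is another common option.

```python
import numpy as np


def bilinear_sample(plane: np.ndarray, uv: np.ndarray) -> np.ndarray:
    """Bilinearly sample a (C, R, R) feature plane at (N, 2) coords in [-1, 1]; returns (N, C)."""
    _, r, _ = plane.shape
    uv = np.clip(uv, -1.0, 1.0)
    # Map [-1, 1] onto pixel positions 0..R-1 (an "align corners" convention).
    xy = (uv + 1.0) * 0.5 * (r - 1)
    x0 = np.clip(np.floor(xy[:, 0]).astype(int), 0, r - 2)
    y0 = np.clip(np.floor(xy[:, 1]).astype(int), 0, r - 2)
    wx, wy = xy[:, 0] - x0, xy[:, 1] - y0
    out = (plane[:, y0, x0] * (1 - wx) * (1 - wy)
           + plane[:, y0, x0 + 1] * wx * (1 - wy)
           + plane[:, y0 + 1, x0] * (1 - wx) * wy
           + plane[:, y0 + 1, x0 + 1] * wx * wy)          # (C, N)
    return out.T


def query_triplane(planes: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Gather features for (N, 3) points from planes of shape (3, C, R, R); returns (N, 3C)."""
    f_xy = bilinear_sample(planes[0], points[:, [0, 1]])
    f_xz = bilinear_sample(planes[1], points[:, [0, 2]])
    f_yz = bilinear_sample(planes[2], points[:, [1, 2]])
    return np.concatenate([f_xy, f_xz, f_yz], axis=1)   # concatenation is an assumption


def decode(features, w1, b1, w2, b2):
    """Hypothetical two-layer MLP mapping features to density (N,) and RGB (N, 3)."""
    h = np.maximum(features @ w1 + b1, 0.0)             # ReLU
    out = h @ w2 + b2                                   # (N, 4)
    sigma = np.log1p(np.exp(out[:, 0]))                 # softplus keeps density positive
    rgb = 1.0 / (1.0 + np.exp(-out[:, 1:]))             # sigmoid keeps colors in [0, 1]
    return sigma, rgb


def render_ray(origin, direction, planes, mlp_params, near, far, num_samples=64):
    """Volume-render one ray (unit-length direction assumed); returns (rgb, opacity)."""
    step = (far - near) / num_samples
    t = near + (np.arange(num_samples) + 0.5) * step    # midpoints of equal intervals
    points = origin[None, :] + t[:, None] * direction[None, :]
    sigma, rgb = decode(query_triplane(planes, points), *mlp_params)
    alpha = 1.0 - np.exp(-sigma * step)                 # per-interval opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance T_i
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(axis=0), weights.sum()


# Usage with random, untrained parameters (sizes are hypothetical).
rng = np.random.default_rng(0)
C, R, H = 8, 32, 16                                     # channels, plane resolution, MLP width
planes = rng.standard_normal((3, C, R, R))
mlp = (rng.standard_normal((3 * C, H)) * 0.1, np.zeros(H),
       rng.standard_normal((H, 4)) * 0.1, np.zeros(4))
color, opacity = render_ray(np.array([0.0, 0.0, -1.0]), np.array([0.0, 0.0, 1.0]),
                            planes, mlp, near=0.0, far=2.0)
print(color, opacity)
```

In this picture, the triplane resolution and channel count (`R`, `C`) and the MLP size are exactly the kinds of knobs the design choices above refer to. Larger planes capture finer detail at a higher memory cost.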
For training data, the authors curated a subset of the Objaverse dataset to improve data quality. They also adopted diverse data-rendering techniques so that the training images more closely match the distribution of real-world images, which improves the model's performance on real inputs.
In evaluations, TripoSR outperforms competing methods both quantitatively and qualitatively. The release of a pre-trained model, an online interactive demo, and source code under the MIT license makes it a notable contribution to AI, computer graphics, and computer vision. Open tools like this are likely to give researchers, developers, and artists new ways to create and work with 3D content.