
Research on artificial intelligence by Stability AI and Tripo AI presents TripoSR, a model designed for fast feedforward 3D generation from a single image.

In the rapidly advancing field of 3D generative AI, a new wave of breakthroughs is blurring the boundary between 3D generation and 3D reconstruction from limited views. Propelled by advances in generative model architectures and publicly available 3D datasets, researchers have begun to explore the use of 2D diffusion models to generate 3D objects from input images or text prompts. Among these models is DreamFusion, which introduced score distillation sampling (SDS), a method that optimises a 3D model using the gradients of a 2D diffusion model.
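To make the SDS idea concrete, here is a minimal toy sketch of one SDS update. It assumes a trivial "renderer" (identity mapping from parameters to image) and an analytic stand-in denoiser that pulls towards a fixed target image; the real method uses a full pre-trained 2D diffusion model and a differentiable 3D renderer, so everything below is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sds_grad(theta, render, denoiser, t, weight):
    """One toy score distillation sampling (SDS) gradient:
    grad = w(t) * (eps_hat - eps) * d(render)/d(theta),
    where eps_hat is the denoiser's noise prediction on the noised render."""
    x = render(theta)                       # render current 3D params to an "image"
    eps = rng.standard_normal(x.shape)      # sample Gaussian noise
    alpha = np.cos(t * np.pi / 2) ** 2      # toy noise schedule
    x_t = np.sqrt(alpha) * x + np.sqrt(1 - alpha) * eps  # noised image
    eps_hat = denoiser(x_t, t)              # denoiser's noise estimate
    # With the identity renderer, d(render)/d(theta) is the identity,
    # so the SDS gradient reduces to w(t) * (eps_hat - eps).
    return weight(t) * (eps_hat - eps)

# Toy setup: an analytic denoiser whose noise prediction assumes the clean
# image is `target` (a hypothetical stand-in for a real diffusion model).
target = np.ones(4)

def denoiser(x_t, t):
    alpha = np.cos(t * np.pi / 2) ** 2
    return (x_t - np.sqrt(alpha) * target) / np.sqrt(1 - alpha)

theta = np.zeros(4)
for _ in range(100):
    theta -= 0.1 * sds_grad(theta, lambda p: p, denoiser, t=0.5, weight=lambda t: 1.0)
# Gradient descent on the SDS gradient drives theta towards the denoiser's target.
```

Note how, because the denoiser here is analytic, the sampled noise cancels exactly and the parameters converge to the target; in the real setting the expectation over noise and timesteps is what makes per-object SDS optimisation slow.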

While this method is revolutionary in the creation of detailed 3D objects, its applicability is limited by slow generation, a result of its heavy computational and per-object optimisation requirements, and by the difficulty of controlling the output models. In contrast, feedforward 3D reconstruction models have demonstrated greater scalability when trained on diverse 3D datasets. This improved efficiency paves the way for more practical and faster 3D model generation.

In a new collaboration between Stability AI and Tripo AI, the two companies have revealed a fresh approach: the TripoSR model. TripoSR can reconstruct a 3D model from a single image in under half a second in a feedforward pass, advancing beyond the limitations of previous techniques. The model relies on the coordination of three primary components: an image encoder, an image-to-triplane decoder, and a neural radiance field (NeRF) built on triplanes.

The image encoder is initialised from a pre-trained vision transformer, DINOv1, and plays a vital role in converting the RGB image into latent vectors that encode both global and local image features. These encoded features drive the reconstruction of the 3D object.
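The triplane representation at the heart of the pipeline stores 3D scene features on three axis-aligned 2D feature planes (XY, XZ, YZ). To obtain the feature for a 3D point, each plane is sampled at the point's projection and the results are combined before a small NeRF MLP decodes them into density and colour. A minimal sketch of the query step, with hypothetical sizes (the 64×64 resolution, 16 channels, and summation as the combine rule are illustrative assumptions, not the exact TripoSR configuration):

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a (H, W, C) feature plane at continuous coords (u, v) in [0, 1]."""
    H, W, _ = plane.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * plane[y0, x0] + fx * (1 - fy) * plane[y0, x1]
            + (1 - fx) * fy * plane[y1, x0] + fx * fy * plane[y1, x1])

def triplane_feature(planes, point):
    """Query a triplane at a 3D point in [0, 1]^3 by projecting onto the
    XY, XZ, and YZ planes and summing the sampled features."""
    xy, xz, yz = planes
    x, y, z = point
    return (bilinear_sample(xy, x, y)
            + bilinear_sample(xz, x, z)
            + bilinear_sample(yz, y, z))

# Hypothetical triplane: three 64x64 planes with 16 channels each
# (in the real model these are produced by the image-to-triplane decoder).
rng = np.random.default_rng(0)
planes = [rng.standard_normal((64, 64, 16)) for _ in range(3)]
feat = triplane_feature(planes, point=(0.3, 0.7, 0.5))
# `feat` (16-dim here) would then be decoded by the NeRF MLP into density and colour.
```

Compared with a dense 3D voxel grid, three 2D planes keep memory quadratic rather than cubic in resolution, which is part of what makes the feedforward reconstruction fast.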

Moreover, the new model avoids explicit camera-parameter conditioning, allowing it to handle varied real-world inputs without depending heavily on precise camera data. Two key enhancements have also been added in response to data's crucial role in model training: data curation and data rendering, which together improve training-data quality and model generalisability.

The results of the research indicate that the TripoSR model surpasses other available open-source options both quantitatively and qualitatively. Making the model publicly available encourages continued advancement in the AI, computer vision and computer graphics sectors, giving researchers, developers and artists access to new, innovative tools for 3D generative AI.

In short, this collaborative research by Stability AI and Tripo AI represents a significant boundary-pushing achievement in 3D generative AI. It introduces TripoSR, a model that generates 3D models from a single image in a feedforward pass. This advancement, aside from enhancing the quality and generalisability of 3D model production, also dramatically speeds up image-to-3D generation, a promising shift towards faster and more efficient 3D modelling in real-world scenarios.
