Researchers from the University of Surrey have developed a novel method for generating detailed 3D models of dogs from photographs, training the requisite artificial intelligence on computer-simulated canines from the popular video game Grand Theft Auto V (GTA V). Existing 3D modelling techniques have traditionally been difficult to apply to animals because of their unpredictable movement and behaviour.
By modifying the game’s code to replace human characters with dog avatars, the researchers built a synthetic dataset comprising 27,900 frames from 118 videos. The videos show a variety of dog avatars performing actions such as sitting, walking, barking, and running across different environments.
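As a rough illustration of how such a synthetic dataset might be consumed downstream, the sketch below pairs rendered frames with 3D joint annotations in a minimal PyTorch dataset. The directory layout, file names, and annotation format here are hypothetical, not the released DigiDogs format.

```python
import json
from pathlib import Path

import torch
from torch.utils.data import Dataset
from torchvision.io import read_image

class SyntheticDogDataset(Dataset):
    """Minimal sketch: one sample = (RGB frame, 3D joint annotations).

    Assumes a hypothetical layout of frames/<id>.png plus an
    annotations.json mapping frame ids to per-joint 3D coordinates;
    the real DigiDogs release may be organised quite differently.
    """

    def __init__(self, root: str):
        self.root = Path(root)
        with open(self.root / "annotations.json") as f:
            # {frame_id: [[x, y, z], ...], ...}
            self.annotations = json.load(f)
        self.frame_ids = sorted(self.annotations)

    def __len__(self) -> int:
        return len(self.frame_ids)

    def __getitem__(self, idx: int):
        frame_id = self.frame_ids[idx]
        image = read_image(str(self.root / "frames" / f"{frame_id}.png"))
        image = image.float() / 255.0                     # (3, H, W) in [0, 1]
        joints = torch.tensor(self.annotations[frame_id],
                              dtype=torch.float32)        # (num_joints, 3)
        return image, joints
```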
This rich dataset, labelled DigiDogs, allowed more accurate annotation than real-world capture and offered greater variation in dog appearance and movement. The research team took advantage of Meta’s DINOv2 AI model, known for its strong generalisation, fine-tuning it on the DigiDogs dataset so that it could accurately predict realistic 3D dog poses from single-view RGB images.
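A minimal sketch of that fine-tuning setup, assuming a PyTorch workflow: load a DINOv2 backbone via torch.hub and regress per-joint 3D coordinates from its image embedding. The head architecture, joint count, and learning rates below are illustrative guesses, not the paper’s exact configuration.

```python
import torch
import torch.nn as nn

NUM_JOINTS = 26  # illustrative; the actual DigiDogs skeleton may differ

class DogPoseRegressor(nn.Module):
    """DINOv2 backbone plus a linear head predicting (num_joints, 3) poses."""

    def __init__(self, num_joints: int = NUM_JOINTS):
        super().__init__()
        self.num_joints = num_joints
        # ViT-S/14 backbone; its forward() returns a (B, 384) image embedding.
        self.backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
        self.head = nn.Linear(384, num_joints * 3)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # Image sides must be divisible by the 14-pixel patch size, e.g. 224x224.
        features = self.backbone(images)
        return self.head(features).view(-1, self.num_joints, 3)

model = DogPoseRegressor()
# Fine-tune the whole network, with a smaller learning rate on the
# pretrained backbone than on the freshly initialised head.
optimizer = torch.optim.AdamW([
    {"params": model.backbone.parameters(), "lr": 1e-5},
    {"params": model.head.parameters(), "lr": 1e-4},
])
criterion = nn.MSELoss()

images = torch.randn(2, 3, 224, 224)    # dummy batch of frames
target = torch.randn(2, NUM_JOINTS, 3)  # dummy 3D joint targets
loss = criterion(model(images), target)
loss.backward()
optimizer.step()
```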
Models trained on the synthetic DigiDogs data produced more precise and lifelike 3D dog poses than models trained on real-world datasets, and the researchers verified the improvement through extensive qualitative and quantitative evaluation. The team acknowledges, however, that there is still room for improvement, particularly in predicting depth from a single image.
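The article does not spell out the paper’s evaluation protocol, but a standard quantitative check for 3D pose estimation is mean per-joint position error (MPJPE), sketched below along with a root-aligned variant that discounts global translation.

```python
import torch

def mpjpe(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Mean Euclidean error per joint over a (B, num_joints, 3) batch."""
    return torch.linalg.norm(pred - target, dim=-1).mean()

def root_aligned_mpjpe(pred: torch.Tensor, target: torch.Tensor,
                       root: int = 0) -> torch.Tensor:
    """MPJPE after translating both poses so a chosen root joint sits at the origin."""
    pred = pred - pred[:, root : root + 1]
    target = target - target[:, root : root + 1]
    return mpjpe(pred, target)
```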
The training procedure has wide-ranging potential implications. From ecology to animation, the method opens doors to improved wildlife conservation tools and 3D object rendering for virtual reality. The paper detailing the method won the Best Paper prize at the IEEE/CVF Winter Conference on Applications of Computer Vision.
The study delivers a significant step forward in 3D animal modelling and lays the groundwork for the refinements, such as better depth prediction, still to come.