
RTMW: A Range of Advanced AI Models for Whole-Body Pose Estimation in 2D/3D Format

Whole-body pose estimation is a core capability for AI systems built around human interaction. It supports applications such as human-computer interaction, avatar animation, and film production. Lightweight tools like MediaPipe deliver good real-time performance, but their accuracy still leaves room for improvement. To address this, researchers from Shanghai AI Laboratory introduced RTMW, a series of models for 2D/3D whole-body pose estimation.

RTMW builds on the RTMPose architecture and adds FPN and HEM (Hierarchical Encoding Module) to capture pose information better. The model is trained on a large collection of open-source human datasets whose annotations have been manually aligned, and it is then further refined with a two-stage distillation technique. RTMW delivers strong results on several whole-body pose estimation benchmarks while keeping inference efficient and deployment straightforward.
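The article does not describe what each distillation stage computes, so the following is only a generic sketch of logit-based knowledge distillation, not RTMW's actual loss. It assumes that teacher and student each emit per-keypoint classification logits of shape `(num_keypoints, num_bins)`; the function name `logit_distillation_loss`, the `temperature` parameter, and that shape are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def logit_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Mean KL(teacher || student) over keypoints.

    Both inputs are assumed to have shape (num_keypoints, num_bins).
    This is a generic distillation loss, not taken from the RTMW paper.
    """
    eps = 1e-12  # avoids log(0)
    p_teacher = softmax(np.asarray(teacher_logits, dtype=np.float64) / temperature)
    p_student = softmax(np.asarray(student_logits, dtype=np.float64) / temperature)
    kl = (p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps))).sum(axis=-1)
    return float(kl.mean())


# Toy usage with random logits for 133 keypoints and 64 bins (both arbitrary here).
rng = np.random.default_rng(0)
teacher = rng.normal(size=(133, 64))
student = rng.normal(size=(133, 64))
loss_same = logit_distillation_loss(teacher, teacher)
loss_diff = logit_distillation_loss(student, teacher)
```

In a setup like this, the student is trained to minimize the loss so that its predicted distributions move toward the teacher's; a matching student gives a loss of zero.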

RTMW combines 14 datasets to overcome the limitations of existing open-source whole-body pose estimation datasets: 3 whole-body, 6 human body, 4 face, and 1 hand dataset. In addition, 3 3D whole-body datasets are used for 3D training. All of them are manually aligned and uniformly mapped to the 133-point definition of COCO-WholeBody. Together they form a broad training set for whole-body pose estimation.
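Mapping a partial dataset into the 133-point layout can be sketched as follows. The COCO-WholeBody ordering (17 body, 6 foot, 68 face, 21 left-hand, and 21 right-hand keypoints) is standard, but the helper `to_wholebody`, the `index_map` format, and the toy hand dataset are hypothetical illustrations rather than the authors' alignment code.

```python
import numpy as np

# COCO-WholeBody layout: 17 body + 6 foot + 68 face + 21 + 21 hand = 133 keypoints.
WHOLEBODY_SLICES = {
    "body": slice(0, 17),
    "foot": slice(17, 23),
    "face": slice(23, 91),
    "left_hand": slice(91, 112),
    "right_hand": slice(112, 133),
}
NUM_WHOLEBODY_KPTS = 133


def to_wholebody(keypoints, index_map):
    """Place a dataset's keypoints into the 133-point layout.

    keypoints: array of shape (K, 2) with (x, y) coordinates.
    index_map: dict from source keypoint index to target index (hypothetical format).
    Returns the (133, 2) target array and a boolean mask marking annotated points.
    """
    keypoints = np.asarray(keypoints, dtype=np.float32)
    target = np.zeros((NUM_WHOLEBODY_KPTS, 2), dtype=np.float32)
    mask = np.zeros(NUM_WHOLEBODY_KPTS, dtype=bool)
    for src, dst in index_map.items():
        target[dst] = keypoints[src]
        mask[dst] = True
    return target, mask


# Hypothetical hand dataset: 21 right-hand keypoints, assumed to share the target ordering.
hand_map = {i: WHOLEBODY_SLICES["right_hand"].start + i for i in range(21)}
hand_kpts = np.arange(42, dtype=np.float32).reshape(21, 2)
target, mask = to_wholebody(hand_kpts, hand_map)
```

The mask matters during training: keypoints a dataset does not annotate should be excluded from the loss instead of being treated as points at the origin.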

RTMW was evaluated on the COCO-WholeBody dataset, where the results indicate that it balances accuracy against model complexity. Its 3D extension, RTMW3D, performed well on the COCO-WholeBody and H3WB datasets. Inference speed was also measured: RTMW is slower than RTMPose because of its extra module, but it is significantly more accurate.

In summary, researchers from Shanghai AI Laboratory have developed RTMW and RTMW3D, a series of models for 2D/3D whole-body pose estimation. The models achieve strong results on several datasets while balancing accuracy against complexity, and their monocular 3D pose estimation capability sets them apart from existing methods in this field. Because the models are open source, the hope is that they will meet a range of industry needs for robust pose estimation.
