A novel data-driven model has been developed to generate natural human motions for virtual avatars. Humans can effortlessly perform a wide range of movements, which is what makes them so effective at everyday tasks. Automatically replicating these movements in virtual avatars and 3D animated characters holds significant potential for applications ranging from metaverse environments to digital entertainment, AI interfaces, and robotics.
Researchers from the Max Planck Institute for Intelligent Systems and ETH Zurich have recently unveiled WANDR, a pioneering model capable of producing lifelike human motions for avatars. Presented in a paper scheduled for the Conference on Computer Vision and Pattern Recognition (CVPR 2024), this model integrates disparate data sources into a unified framework to achieve more authentic motions in 3D humanoid characters. The paper has also been shared on the arXiv preprint server.
Markos Diomataris, the lead author of the paper, explained that the overarching goal of their research is to understand what it takes to create virtual humans capable of emulating human behavior. This entails learning to perceive and navigate the world, setting goals, and endeavoring to accomplish them. By adopting a philosophy akin to “building what you want to understand,” the researchers aim to gain deeper insights into human behavior.
The primary focus of the recent study was to devise a model capable of generating realistic motions for 3D avatars, enabling them to interact effectively with their virtual environment, for example by reaching to grasp objects. Even the seemingly simple act of reaching for a coffee cup combines movements such as extending the arm, bending down, and walking, and requires continuous adjustments to maintain balance while closing in on the goal.
To impart these skills to virtual agents, the researchers explored two main approaches: reinforcement learning (RL) and dataset-driven training. While RL involves learning through trial and error, dataset-driven training leverages human motion demonstrations to train machine learning models. Given the scarcity of datasets containing goal-oriented reaching motions, the team developed a method capable of learning this skill from available data sources.
Their model, WANDR, represents a significant advancement in that it learns adaptive avatar behaviors solely from data, without the need for additional reinforcement learning steps. By conditioning the avatar's actions on time- and goal-dependent features, WANDR guides the avatar's wrist toward a specified goal, much as humans continuously adjust their actions to achieve an objective. Because these features are recomputed as the motion unfolds, avatars can approach and reach moving or sequential goals, even without explicit training for such scenarios.
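To make the idea of time- and goal-dependent conditioning more concrete, the sketch below rolls out a toy autoregressive pose predictor whose input at every frame includes a recomputed "intention" signal (wrist-to-goal offset plus remaining time). This is only a minimal illustration of the general technique described in the article, not the authors' architecture: the feature definitions, network, pose parameterization, and dimensions are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class IntentionConditionedStepper(nn.Module):
    """Toy autoregressive pose stepper conditioned on goal- and time-dependent
    features. Shapes and architecture are illustrative, not WANDR's."""

    def __init__(self, pose_dim=63, feat_dim=4, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pose_dim + feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, pose_dim),  # predicts a pose delta for the next frame
        )

    def forward(self, pose, intention):
        return pose + self.net(torch.cat([pose, intention], dim=-1))


def intention_features(wrist_pos, goal_pos, t, horizon):
    """Hypothetical feedback signal: offset from wrist to goal plus time remaining."""
    time_left = torch.tensor([(horizon - t) / horizon])
    return torch.cat([goal_pos - wrist_pos, time_left])


# Autoregressive rollout: the intention features are recomputed at every frame,
# so the motion keeps correcting itself even if the goal were to move.
model = IntentionConditionedStepper()
pose = torch.zeros(63)                 # current body pose (placeholder parameterization)
goal = torch.tensor([0.6, 0.2, 1.1])   # 3D target position for the wrist
horizon = 60
for t in range(horizon):
    wrist = pose[:3]                   # stand-in for the wrist joint extracted from the pose
    feats = intention_features(wrist, goal, t, horizon)
    pose = model(pose, feats)
```

In this setup the network never sees the goal directly; it only sees the relative offset and time budget, which is what lets the same learned policy generalize to goals placed anywhere in the scene.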
While existing datasets offer limited coverage of goal-oriented reaching motions, the proposed model combines data from various sources to produce more natural motions, enabling avatars to reach arbitrary goals in their environment. Moving forward, this model holds promise for enhancing the realism of virtual characters in applications such as video games, VR experiences, and animated films. As datasets of human motion continue to expand, WANDR's performance is expected to improve further, paving the way for more immersive virtual experiences. Future research will focus on enabling avatars to learn from large, uncurated video datasets and to explore their virtual environment autonomously, mirroring how humans acquire experience through action and observation.