Singing is a vital form of human emotional expression and social interaction, distinguished from speech by its richer emotional nuances and freer expressive style. Thus, investigating 3D facial animation driven by singing holds significant research value. Our work focuses on 3D singing facial animation driven by mixed singing audio, and to the best of our knowledge, no prior studies have explored this area. Additionally, the absence of existing 3D singing datasets poses a considerable challenge.
To address this, we collect a novel audiovisual dataset, ChorusHead, which features synchronized mixed vocal audio and pseudo-3D FLAME motions for chorus singing. In addition, we propose a partner-aware 3D chorus head generation framework driven by mixed audio inputs. The proposed framework extracts emotional features from the background music, captures dependencies between singers, and models head movement in a latent space learned by a Variational Autoencoder (VAE), enabling diverse, interactive head animation generation. Extensive experimental results demonstrate that our approach effectively generates 3D facial animations of interacting singers, achieving notable improvements in realism and strong robustness to background music interference.
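To make the high-level description above concrete, the following is a minimal PyTorch sketch of a partner-aware VAE motion prior: a recurrent encoder maps the target singer's motion together with their own audio features and the partner/mixture audio features to a latent distribution, and a recurrent decoder reconstructs head motion from the sampled latent and the same audio conditions. All module names, feature dimensions, and the fusion strategy are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a partner-aware VAE motion prior (not the released model).
import torch
import torch.nn as nn


class PartnerAwareMotionVAE(nn.Module):
    def __init__(self, audio_dim=768, motion_dim=15, latent_dim=64, hidden_dim=256):
        super().__init__()
        # Project the target singer's audio and the partner/mixture audio features.
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.partner_proj = nn.Linear(audio_dim, hidden_dim)
        # Encode motion jointly with both audio conditions into a latent distribution.
        self.encoder = nn.GRU(motion_dim + 2 * hidden_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decode head motion (e.g., FLAME pose/expression coefficients) from latent + conditions.
        self.decoder = nn.GRU(latent_dim + 2 * hidden_dim, hidden_dim, batch_first=True)
        self.to_motion = nn.Linear(hidden_dim, motion_dim)

    def forward(self, motion, own_audio, partner_audio):
        # motion:        (B, T, motion_dim)  pseudo-3D FLAME head motion
        # own_audio:     (B, T, audio_dim)   features of the target singer's voice
        # partner_audio: (B, T, audio_dim)   features of the mixture / other singers
        a = self.audio_proj(own_audio)
        p = self.partner_proj(partner_audio)
        h, _ = self.encoder(torch.cat([motion, a, p], dim=-1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        d, _ = self.decoder(torch.cat([z, a, p], dim=-1))
        return self.to_motion(d), mu, logvar
```

Sampling z from the learned latent distribution at inference time is what allows diverse head animations for the same audio input; conditioning both encoder and decoder on the partner audio is one simple way to inject inter-singer dependencies.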
We compare our method with the state-of-the-art methods FaceFormer, CodeTalker, SelfTalk, Imitator, and Diffspeaker on our ChorusHead dataset. Note that only PaChorus generates choral animations with dynamic head motions, while the others do not.
Additionally, we present more visualized results. Notably, we showcase inference results involving multiple singers. Because current multi-speaker audio separation models are not yet reliable enough to extract individual voices from group singing, we synthesized the choral audio by mixing known vocal tracks, as sketched below. By conditioning the motion prior module on the voices of the other participants during inference, we obtain plausible and coherent results.
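The following is a small NumPy sketch of this evaluation setup under stated assumptions: individually recorded vocal tracks are summed into a choral mixture, and the partner condition for a given singer is the mixture of all other voices. The function names, equal-gain mixing, and waveform format are illustrative choices, not the released pipeline.

```python
# Hypothetical sketch of assembling choral test audio from known vocal tracks.
import numpy as np


def mix_tracks(tracks, gains=None):
    """Sum individually recorded vocal waveforms into a single choral mixture."""
    tracks = [np.asarray(t, dtype=np.float32) for t in tracks]
    length = min(len(t) for t in tracks)          # trim to the shortest track
    gains = gains or [1.0 / len(tracks)] * len(tracks)
    mix = sum(g * t[:length] for g, t in zip(gains, tracks))
    return np.clip(mix, -1.0, 1.0)                # keep the mixture in [-1, 1]


def partner_condition(tracks, target_idx):
    """Mixture of every voice except the target singer's, used to condition the motion prior."""
    others = [t for i, t in enumerate(tracks) if i != target_idx]
    return mix_tracks(others)
```

Each singer's animation is then generated from the full choral mixture plus this per-singer partner condition, which sidesteps the unreliable separation step while still exposing the model to the other participants' voices.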