PC-Talk: Precise Facial Animation Control for Audio-Driven Talking Face Generation

Baiqin Wang1,2, Xiangyu Zhu1,2, Fan Shen3, Hao Xu3,4, Zhen Lei1,2,5
1MAIS, Institute of Automation, Chinese Academy of Sciences
2School of Artificial Intelligence, University of Chinese Academy of Sciences
3Psyche AI Inc.
4The Hong Kong University of Science and Technology
5CAIR, HKISI, Chinese Academy of Sciences

Abstract


Recent advancements in audio-driven talking face generation have made great progress in lip synchronization. However, current methods often lack sufficient control over facial animation, such as speaking style and emotional expression, resulting in uniform outputs. In this paper, we focus on improving two key factors: lip-audio alignment and emotion control, to enhance the diversity and user-friendliness of talking videos. Lip-audio alignment control covers elements like speaking style and the scale of lip movements, whereas emotion control is centered on generating realistic emotional expressions and allows modification of multiple attributes such as intensity. To achieve precise control of facial animation, we propose a novel framework, PC-Talk, which enables lip-audio alignment and emotion control through implicit keypoint deformations. First, our lip-audio alignment control module facilitates precise editing of speaking styles at the word level and adjusts lip movement scales to simulate varying vocal loudness levels while maintaining lip synchronization with the audio. Second, our emotion control module generates vivid emotional features with pure emotional deformation. This module also enables fine-grained modification of intensity and the combination of multiple emotions across different facial regions. Our method demonstrates outstanding control capabilities and achieves state-of-the-art performance on both the HDTF and MEAD datasets in extensive experiments.

Lip-audio Alignment Control

Singer

Scale of Lip Movement

PC-Talk can modify the scale of lip movement to simulate the effect of different vocal volumes. Use the slider to control the scale.
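As an illustrative sketch only (not the authors' released code), scaling lip movement can be viewed as multiplying the audio-driven keypoint deformation by a slider value before adding it to the canonical implicit keypoints; the function name, shapes, and scale range below are hypothetical:

```python
import numpy as np

def scale_lip_deformation(keypoints, lip_deformation, scale):
    """Apply an intensity-scaled, audio-driven lip deformation.

    keypoints:       (N, 3) canonical implicit keypoints
    lip_deformation: (N, 3) per-keypoint displacement predicted from audio
    scale:           slider value, e.g. 0.5 (weak) to 1.5 (strong)
    """
    return keypoints + scale * lip_deformation

# toy usage: the same audio deformation at two loudness levels
kp = np.zeros((3, 3))
delta = np.full((3, 3), 0.1)
weak = scale_lip_deformation(kp, delta, 0.5)    # displacements of 0.05
strong = scale_lip_deformation(kp, delta, 1.5)  # displacements of 0.15
```

Because the scaling acts on the deformation rather than the rendered frame, lip synchronization with the audio is preserved while the amplitude changes.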

Weak


Strong


Emotion Control

Emotion Intensity

PC-Talk can generate emotional expressions with controllable intensity. Use the slider to view different emotional intensities.


Complex Emotional Expression

PC-Talk can generate emotions in different facial regions independently and then combine them into a complex emotional expression, such as a smiling mouth with sorrowful eyebrows.
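A minimal sketch of this idea, assuming per-emotion deformations are restricted to facial regions by keypoint masks and weighted by intensity (all names, shapes, and the masking scheme here are hypothetical, not the paper's implementation):

```python
import numpy as np

def combine_emotions(keypoints, deformations, masks, intensities):
    """Blend several emotional deformations, each confined to a facial region.

    deformations: dict emotion -> (N, 3) pure emotional displacement
    masks:        dict emotion -> (N,) 0/1 region mask (e.g. mouth, eyebrows)
    intensities:  dict emotion -> scalar intensity weight
    """
    out = keypoints.copy()
    for emo, delta in deformations.items():
        out += intensities[emo] * masks[emo][:, None] * delta
    return out

# toy example: happy on mouth keypoints (0-1), sad on eyebrow keypoints (2-3)
kp = np.zeros((4, 3))
deformations = {"happy": np.full((4, 3), 0.2), "sad": np.full((4, 3), -0.1)}
masks = {"happy": np.array([1, 1, 0, 0]), "sad": np.array([0, 0, 1, 1])}
combined = combine_emotions(kp, deformations, masks,
                            {"happy": 1.0, "sad": 0.5})
```

Disjoint masks let each region follow its own emotion, so a full-strength smile on the mouth can coexist with a half-intensity sorrowful deformation on the eyebrows.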

Method


Overview of the proposed method.

Quality Comparison

We compare PC-Talk with other methods, including image-input, video-input, and emotional talking face generation methods.

Quantitative Results


Animal

BibTeX

@misc{wang2025pctalk,
    title={PC-Talk: Precise Facial Animation Control for Audio-Driven Talking Face Generation},
    author={Baiqin Wang and Xiangyu Zhu and Fan Shen and Hao Xu and Zhen Lei},
    year={2025},
    eprint={2503.14295},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}