Facial Animation


The face is the most important element of visual communication between humans. High-fidelity virtual humans must therefore synthesize convincing facial expressions and visual speech, and exhibit intelligent attention control. At the same time, a human-like face is a complex system that requires significant computational resources to simulate. We have been investigating efficient techniques for animating realistic faces in interactive applications.


Selected publications and demos:


  1. “Assembling an Expressive Facial Animation System”, Alice Wang, Michael Emmi, Petros Faloutsos, in ACM SIGGRAPH Video Game Symposium (Sandbox), 2007, pp. 21-26.

    Abstract:

    In this paper we investigate the development of an expressive facial animation system from publicly available components. There is a great body of work on face modeling, facial animation and conversational agents. However, most of the current research either targets a specific aspect of a conversational agent or is tailored to systems that are not publicly available. We propose a high quality facial animation system that can be easily built based on affordable off-the-shelf components. The proposed system is modular, extensible, efficient and suitable for a wide range of applications that require expressive speaking avatars. We demonstrate the effectiveness of the system with two applications: (a) a text-to-speech synthesizer with expression control and (b) a conversational agent that can react to simple phrases.
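
    To give a flavor of the modular design, the sketch below shows how timed phonemes, as an off-the-shelf text-to-speech front end might report them, could be combined with an expression channel before being handed to a face model. The component names, the toy viseme table, and the keyframe format are illustrative assumptions, not the components actually used in the paper.

    # A minimal, hypothetical composition of off-the-shelf pieces: timed
    # phonemes are mapped to visemes and tagged with an expression that a
    # blend-shape face can weight in. Nothing here is taken from the paper.

    from dataclasses import dataclass
    from typing import Dict, List


    @dataclass
    class TimedPhoneme:
        phoneme: str
        start: float       # seconds into the utterance
        duration: float


    # Toy phoneme-to-viseme table; a real system covers the full phoneme set.
    PHONEME_TO_VISEME: Dict[str, str] = {
        "HH": "breathy", "AH": "open", "L": "tongue_up", "OW": "round",
    }


    def speech_to_keyframes(phonemes: List[TimedPhoneme], expression: str) -> List[dict]:
        """Combine the lip-sync channel (visemes) with an expression channel."""
        return [{"time": p.start,
                 "viseme": PHONEME_TO_VISEME.get(p.phoneme, "neutral"),
                 "expression": expression}       # e.g. "happy", "surprised"
                for p in phonemes]


    # Example: the word "hello" spoken with a happy expression.
    hello = [TimedPhoneme("HH", 0.00, 0.08), TimedPhoneme("AH", 0.08, 0.10),
             TimedPhoneme("L", 0.18, 0.09), TimedPhoneme("OW", 0.27, 0.15)]
    print(speech_to_keyframes(hello, expression="happy"))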


  2. “Expressive Speech-Driven Facial Animation”, Yong Cao, Wen C. Tien, Petros Faloutsos, Fred Pighin, ACM Transactions on Graphics, Volume 24, Issue 4, pp. 1283-1302, October 2005.

    Abstract:

    Speech-driven facial motion synthesis is a well explored research topic. However, little has been done to model expressive visual behavior during speech. We address this issue using a machine learning approach that relies on a database of speech-related high-fidelity facial motions. From this training set, we derive a generative model of expressive facial motion that incorporates emotion control, while maintaining accurate lip-synching. The emotional content of the input speech can be manually specified by the user or automatically extracted from the audio signal using a Support Vector Machine classifier.
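
    The sketch below illustrates the audio-based emotion classification step in isolation: each utterance is summarized by simple spectral features and classified with a standard Support Vector Machine. The feature choice (MFCC statistics) and the libraries (librosa, scikit-learn) are assumptions made for illustration; the paper's actual features and pipeline may differ.

    # Assumed sketch: classify the emotional content of a speech recording
    # with an SVM over MFCC summary statistics.

    import numpy as np
    import librosa
    from sklearn.svm import SVC


    def audio_emotion_features(path: str) -> np.ndarray:
        """Summarize an utterance as the mean and std of its MFCCs (an assumed feature set)."""
        signal, sr = librosa.load(path, sr=None)
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)   # shape: (13, frames)
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


    def train_emotion_classifier(paths, labels):
        """Fit an SVM on labeled utterances (labels such as neutral, happy, angry)."""
        X = np.stack([audio_emotion_features(p) for p in paths])
        clf = SVC(kernel="rbf")
        clf.fit(X, labels)
        return clf


    def classify_emotion(clf, path):
        """Predict the emotion label of a new utterance."""
        return clf.predict(audio_emotion_features(path).reshape(1, -1))[0]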


  3. “Real-time Speech Motion Synthesis from Recorded Motions”, Yong Cao, Petros Faloutsos, Eddie Kohler, Fred Pighin, in Proceedings of the ACM SIGGRAPH / Eurographics Symposium on Computer Animation, 2004, pp. 347-355.

    Abstract:

    Data-driven approaches have been successfully used for realistic visual speech synthesis. However, little effort has been devoted to real-time lip-synching for interactive applications. In particular, algorithms that are based on a graph of motions are notorious for their exponential complexity. In this paper, we present a greedy graph search algorithm that yields vastly superior performance and allows real-time motion synthesis from a large database of motions. The time complexity of the algorithm is linear with respect to the size of an input utterance. In our experiments, the synthesis time for an input sentence of average length is under a second.
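
    The sketch below illustrates the greedy idea in isolation: one local decision per target phoneme, rather than an exhaustive search over all paths, which keeps the cost linear in the length of the utterance. The graph representation, phoneme labels, and cost terms are assumptions made for illustration, not the paper's actual algorithm or motion database format.

    # Assumed sketch of a greedy walk over a motion graph. Nodes are recorded
    # motion segments labeled with a phoneme; edges carry a transition cost.

    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple


    @dataclass
    class MotionNode:
        segment_id: int
        phoneme: str                                    # label of this motion segment
        edges: List[Tuple[int, float]] = field(default_factory=list)   # (neighbor id, transition cost)


    def greedy_synthesize(graph: Dict[int, MotionNode],
                          start: int,
                          target_phonemes: List[str]) -> List[int]:
        """At every step, follow the cheapest outgoing edge whose target matches
        the next required phoneme; one decision per phoneme, so the work grows
        linearly with the utterance instead of exponentially with path depth."""
        path, current = [start], start
        for phoneme in target_phonemes:
            candidates = [(cost, nbr) for nbr, cost in graph[current].edges
                          if graph[nbr].phoneme == phoneme]
            if not candidates:
                break                                   # a real system would fall back or re-plan
            _, current = min(candidates)                # greedy: lowest transition cost wins
            path.append(current)
        return path                                     # ids of segments to concatenate and blend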