
Mimicking Human Body Movement and Facial Expressions

Abstract

This project presents a vision-based interface that mimics a user's body movements and facial expressions and mirrors them onto a computer-generated (CG) avatar in real time. It utilizes the Microsoft Kinect, a 3D depth-sensing device, to accurately track the user's full-body joint movements, orientation and facial features. The approach is computationally inexpensive and therefore well suited to real-world tracking applications, as it realistically reconstructs human body and facial dynamics in real time without intrusive devices.
The project creates an interface that tracks, recognises and reconstructs human body movements and facial expressions in real time using the Microsoft Kinect sensor. The delivered application automatically tracks full-body joint actions and facial features, classifies the data extracted from the Kinect data streams, and displays the result through a computer-generated (CG) avatar.
 
Because it combines motion tracking, recognition and reconstruction, the system can be used for intelligent interaction, digital entertainment, gaming, telepresence, and virtual and augmented reality.
Real-Time Animation with Maya
 
Autodesk Maya is a powerful 3D modelling and animation package. It offers a wide range of tools used by architects, engineers, game designers, 3D artists, film makers and animators to create visual effects, simulations and realistic renders. It is part of most design studio pipelines thanks to high-quality animation and rendering capabilities that are relatively easy to use. Unlike most 3D packages, Maya also provides cross-platform support for scripting: with the Maya Embedded Language (MEL) or Python, repetitive and tedious tasks can be automated, reducing human error and producing an effective workflow. In this project, Maya is used as the environment for animating the 3D character in real time using data from the Kinect. A Python script creates a UDP (User Datagram Protocol) socket in Maya that receives data from the Kinect SDK and assigns it to the appropriate attributes inside Maya.
The communication between the Microsoft Kinect APIs and Autodesk Maya had to be platform independent, because the Kinect APIs are programmed in C# while Maya is scripted in Python. A client-server model of communication is therefore employed to simplify the transfer of data from the Kinect APIs to Maya. The data streams from the Kinect device are processed by the Skeletal and Face Tracking clients, and the tracked data is sent over UDP into Maya to animate the body and facial expressions of the 3D avatar. The Skeleton and Face Animator servers can thereby be considered listeners for the Skeletal and Face Tracking clients respectively.
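The project's tracking clients are written in C# against the Kinect SDK, so the sketch below only illustrates, in Python, the kind of plain-text datagram such a client could send to Maya over UDP. The host, port and field layout ("joint name, x, y, z", one joint per datagram) are assumptions made for illustration, not the project's actual message format.

```python
import socket

# Hypothetical message layout: "<joint name> <x> <y> <z>", one joint per datagram.
# Host, port and field order are assumptions for illustration only.
MAYA_HOST, MAYA_PORT = "127.0.0.1", 6000

def send_joint(sock, joint_name, x, y, z):
    """Pack one tracked joint into a plain-text datagram and send it to Maya."""
    message = "{} {:.4f} {:.4f} {:.4f}".format(joint_name, x, y, z)
    sock.sendto(message.encode("utf-8"), (MAYA_HOST, MAYA_PORT))

if __name__ == "__main__":
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # UDP socket
    # Example values standing in for one frame of skeletal data.
    send_joint(client, "Head", 0.02, 0.85, 2.10)
    send_joint(client, "HandRight", 0.35, 0.40, 1.95)
    client.close()
```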
Tracking Human Body
 
This project utilizes the Kinect for Windows software development kit (SDK) to track human body movements and facial expressions. The SDK contains application programming interfaces (APIs) and tools that provide direct access to the Kinect data streams. With Skeletal Tracking enabled, the system can recognise up to six people in the field of view of the infrared sensor, although only two people can be actively tracked for pose estimation. Each tracked skeleton provides the X, Y and Z positions of 20 joints of the person's body, so each person's skeleton can be tracked in 3D space to infer their body movements over time.
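For reference, the 20 joints exposed per skeleton by the first-generation Kinect SDK are listed below as a Python tuple; the naming follows the SDK's joint enumeration.

```python
# The 20 joints tracked per skeleton by the Kinect for Windows SDK (v1),
# following the SDK's JointType naming.
KINECT_JOINTS = (
    "HipCenter", "Spine", "ShoulderCenter", "Head",
    "ShoulderLeft", "ElbowLeft", "WristLeft", "HandLeft",
    "ShoulderRight", "ElbowRight", "WristRight", "HandRight",
    "HipLeft", "KneeLeft", "AnkleLeft", "FootLeft",
    "HipRight", "KneeRight", "AnkleRight", "FootRight",
)
```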
Drive and Control the Body of CG Avatar
 
The CG avatar is animated in real time inside Maya using Python. The 3D character was created with Autodesk Project Pinocchio (2013), a web-based laboratory for creating fully rigged custom 3D characters from pre-defined body types, facial features and clothing styles. It has an Inverse Kinematics (IK) based skeleton with constraining locators parented to each joint. These locators drive the Maya skeleton, producing realistic movement of an organic body that simulates muscle contraction and stretching as the bones move, using data from the Kinect device. To animate a realistic human body, a skeleton built from Maya bones with at least 20 joints was attached to the 3D model; it corresponds to the Kinect skeletal joint structure and is driven by the values acquired from Kinect Skeletal Tracking.
The 3D model of the avatar is animated inside Maya according to the parameters received over UDP. When a skeleton joint is successfully tracked, its information is sent from the SkeletonTrackingClient over a socket. A listener server, SkeletonAnimatorServer, written in Python inside Maya, receives this joint information in a world coordinate system. It splits the incoming UDP message and assigns its x, y and z values to the attributes of the appropriate locator. Inside Maya, each locator is attached to a corresponding bone of the skeleton; the locators control the movement of the skeleton in 3D space by constraining the joints according to the values obtained.
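As a sketch of how such a control rig can be wired up in Maya with Python, the snippet below creates one locator per joint and point-constrains the corresponding skeleton joint to it, so that moving a locator moves its joint. The "_jnt" and "_loc" naming convention and the sample joint list are assumptions and would need to match the actual rig.

```python
import maya.cmds as cmds

# Assumed naming convention: skeleton joints called "<name>_jnt" already exist
# in the scene; one locator "<name>_loc" is created to drive each of them.
JOINTS = ["Head", "ShoulderLeft", "ElbowLeft", "WristLeft", "HandLeft"]

def build_control_locators(joint_names):
    """Create a driving locator for every joint and constrain the joint to it."""
    for name in joint_names:
        joint = "{}_jnt".format(name)
        locator = cmds.spaceLocator(name="{}_loc".format(name))[0]
        # Snap the locator onto the joint's current world position.
        pos = cmds.xform(joint, query=True, worldSpace=True, translation=True)
        cmds.xform(locator, worldSpace=True, translation=pos)
        # The joint now follows the locator's translation.
        cmds.pointConstraint(locator, joint, maintainOffset=True)

build_control_locators(JOINTS)
```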
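Below is a minimal sketch of such a listener running inside Maya, assuming the one-joint-per-datagram text format and the "<name>_loc" locator naming used in the earlier sketches; the actual SkeletonAnimatorServer may differ in message layout and structure.

```python
import socket
import maya.cmds as cmds

# Minimal SkeletonAnimatorServer-style listener. Assumes each datagram holds
# "<joint name> <x> <y> <z>" and that a locator named "<joint name>_loc" exists.
HOST, PORT = "127.0.0.1", 6000

def serve(max_messages=1000):
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind((HOST, PORT))
    try:
        for _ in range(max_messages):
            data, _addr = server.recvfrom(1024)
            name, x, y, z = data.decode("utf-8").split()
            locator = "{}_loc".format(name)
            # Drive the locator; the point constraint moves the joint with it.
            cmds.setAttr(locator + ".translateX", float(x))
            cmds.setAttr(locator + ".translateY", float(y))
            cmds.setAttr(locator + ".translateZ", float(z))
    finally:
        server.close()
```

In practice a receive loop like this would normally run on a separate thread, or be scheduled through Maya's deferred-execution utilities, so that it does not block the Maya interface while waiting for datagrams.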
Tracking Facial Expressions
 
The Face Tracking SDK enables the Kinect to track a face using point-tracking methods based on locating corresponding points in successive frames. This approach is very effective for tracking small features such as the eyes or nose. It uses a frame-to-frame tracking process in which facial movement is followed by using the previous locations as starting approximations. It also tracks head pose and semantic facial expressions by computing both the colour image and the depth map; owing to various technical advantages, this system works chiefly in the colour camera space.
Kinect's face tracking is based on the CANDIDE-3 face model, a revision of the parameterised mask for coding human faces originally introduced by Mikael Rydfalk in 1987. The model is popular because its simple, low-polygon structure facilitates robust facial reconstruction. The mask models a human face with 113 vertices and 168 surfaces and is controlled by a set of parameters known as Action Units. The Action Units are based on the Facial Action Coding System (FACS), which has been used to automatically detect a face in a video stream, extract its geometric features and provide temporal profiles of each facial expression (Valstar et al. 2006). The Action Units represent the fundamental actions of individual muscles or groups of muscles; each one therefore acts as a single facial muscle in the CANDIDE-3 model, allowing facial expressions to be mimicked.
At present the Kinect Face Tracking SDK supports the extraction of six different Action Units derived from the depth data captured by the Kinect device. Each Action Unit provides a floating-point value ranging from -1 to 1, representing the extremes of that facial movement. In addition, the SDK provides three rotation angles describing the head pose.
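The sketch below shows how a single face-tracking datagram could be parsed on the Maya side: six Action Unit weights in [-1, 1] followed by three head rotation angles. The Action Unit names follow those documented for the Kinect Face Tracking SDK, but the plain-text field order and message layout are assumptions made for illustration.

```python
# Illustrative parser for one face-tracking datagram: six Action Unit values
# in [-1, 1] followed by three head rotation angles (pitch, yaw, roll).
# The plain-text layout and field order are assumptions, not the SDK's own format.
AU_NAMES = ("UpperLipRaiser", "JawLowerer", "LipStretcher",
            "BrowLowerer", "LipCornerDepressor", "OuterBrowRaiser")

def parse_face_message(message):
    values = [float(v) for v in message.split()]
    action_units = dict(zip(AU_NAMES, values[:6]))
    pitch, yaw, roll = values[6:9]
    return action_units, (pitch, yaw, roll)

# Example: raised brows, slightly lowered jaw, head turned a little to one side.
aus, rotation = parse_face_message("0.0 0.1 0.0 -0.3 0.0 0.6 5.0 -12.0 0.0")
```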
Animating Facial Expressions of a CG Avatar
 
Facial animation requires a 3D deformable face model with a set of blend shape deformers (key facial expressions). A blend shape deformer can morph the shape of one object into the shape of another, but only if both objects share the same polygonal topology. In the facial animation setup, a typical set of blend shape deformers was used to express human emotions and speech by creating the various poses required for facial animation (see the setup sketch below).
This project uses a similar method based on the blend shape model: a linear combination of multiple key facial expressions. This allows a wide range of facial expressions to be produced with very little computation by varying the weights of the linear combination, giving realistic yet robust facial expression reconstruction (Joshi 2003). The project uses 7 blend shape models that correspond to the 6 Action Unit values obtained from the Kinect Face Tracking SDK.
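As a small illustration of setting up such a deformer in Maya with Python, the snippet below adds two target shapes to a base face mesh. The mesh and target names ("faceBase", "smileTarget", "mouthOpenTarget", "faceBlendShapes") are placeholders, not the project's actual asset names.

```python
import maya.cmds as cmds

# Illustrative setup: "faceBase" is the avatar's face mesh; "smileTarget" and
# "mouthOpenTarget" are duplicates of it sculpted into key expressions.
# All names here are placeholders that must exist in the scene.
blend_node = cmds.blendShape("smileTarget", "mouthOpenTarget", "faceBase",
                             name="faceBlendShapes")[0]

# Morph the base face halfway towards the smile target.
cmds.setAttr(blend_node + ".smileTarget", 0.5)
```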
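A minimal sketch of driving the avatar's blend shape weights from the Action Unit values inside Maya is shown below, assuming the placeholder "faceBlendShapes" node from the previous sketch and a hypothetical mapping from Action Units to target names. Negative Action Unit values are simply clamped here, since blend shape weights normally stay in [0, 1]; the project's full mapping of 6 Action Units onto 7 blend shapes would be more elaborate.

```python
import maya.cmds as cmds

# Hypothetical mapping from Kinect Action Units to blend shape target names on a
# blendShape node called "faceBlendShapes"; the names and node are assumptions.
AU_TO_TARGET = {
    "JawLowerer": "mouthOpen",
    "LipStretcher": "smile",
    "BrowLowerer": "browDown",
    "OuterBrowRaiser": "browUp",
}

def apply_expression(action_units, blend_node="faceBlendShapes"):
    """Copy Action Unit weights onto the avatar's blend shape targets."""
    for au_name, target in AU_TO_TARGET.items():
        weight = action_units.get(au_name, 0.0)
        # Keep weights in [0, 1]; negative AU values are clamped for simplicity
        # (a fuller mapping could drive an opposing target instead).
        weight = max(0.0, min(1.0, weight))
        cmds.setAttr("{}.{}".format(blend_node, target), weight)
```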