The Speech Perception and Production Lab

Queen's University Department of Psychology
Humphrey Hall, 62 Arch St., Kingston, ON K7L 3N6
T: 613-533-6000 ext. 77595 • F: 613-533-2499
E: kevin.munhall@queensu.ca

 

Research

Facial Animation
Facial animation is a research tool in the Speech Perception and Production Laboratory. Animation gives us experimental control over the dynamics of facial motion, and motion cues are critical to the visual perception of speech and emotion. In recent years we have worked on three types of animation:
  1. A physical model of the face that includes the biomechanics of the skin and the physiological characteristics of the facial musculature.
  2. A kinematic facial animation system that is based on the principal components of facial deformation (illustrated in the sketch following this list).
  3. A kinematic facial animation system that is controlled by the action of facial regions that correspond to muscle lines of action.
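
As a rough illustration of the second approach, the sketch below shows how a low-dimensional deformation basis can be extracted from recorded facial marker data and used to synthesize new face shapes. It is a minimal Python sketch with assumed array shapes and parameter names, not our implementation.

    import numpy as np

    # Illustrative only: build a deformation basis from motion-capture markers.
    # marker_frames is assumed to be an (n_frames, n_markers * 3) array of
    # flattened 3-D marker positions recorded during speech.
    def build_face_basis(marker_frames, n_components=5):
        mean_face = marker_frames.mean(axis=0)
        # Rows of Vt are the principal components of facial deformation.
        _, _, Vt = np.linalg.svd(marker_frames - mean_face, full_matrices=False)
        return mean_face, Vt[:n_components]

    def synthesize_face(mean_face, basis, weights):
        # Animate by varying a small weight vector, one weight per component.
        return mean_face + weights @ basis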

Our goal in the area of facial animation is to build a facial model that can be used in speech perception and production research. By driving a realistic facial model, we learn about the neural control of speech production and about how neural signals interact with the biomechanical and physiological characteristics of the articulators and the vocal tract. The model also makes possible the systematic manipulation of physical parameters so that we can study their effect on speech perception.

Our facial model is an extension of previous work on muscle-based models of facial animation (Lee, Terzopoulos, and Waters 1993, 1995; Parke and Waters, 1996; Terzopoulos and Waters, 1993; Waters and Terzopoulos, 1991, 1992). The modeled face consists of a deformable multi-layered mesh with the following generic geometry: the nodes in the mesh are point masses and are connected by spring and damping elements (i.e., each segment connecting nodes in the mesh consists of a spring and a damper in a parallel configuration). The nodes are arranged in three layers representing the structure of facial tissues. The top layer represents the epidermis, the middle layer represents the fascia, and the bottom layer represents the skull surface. The elements between the top and middle layers represent the dermal-fatty tissues, and the elements between the middle and bottom layers represent the muscles. The skull nodes are fixed in three-dimensional space. The fascia nodes are connected to the skull layer except in the regions around the upper and lower lips and the cheeks. The mesh is driven by modeling the activation and motion of several facial muscles in various facial expressions.
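
The sketch below illustrates, in Python, the kind of mass-spring-damper element and explicit time step just described. The stiffness, damping, mass, and time-step values are placeholders rather than the tissue parameters used in our model.

    import numpy as np

    # Each mesh edge is modeled as a spring and damper in parallel joining two
    # point-mass nodes; stiffness k, damping c, and dt are placeholder values.
    def element_force(x_i, x_j, v_i, v_j, rest_length, k=200.0, c=5.0):
        d = x_j - x_i
        length = np.linalg.norm(d)
        unit = d / length
        stretch = length - rest_length              # spring deformation
        closing = np.dot(v_j - v_i, unit)           # relative velocity along the element
        return (k * stretch + c * closing) * unit   # force acting on node i

    def step(pos, vel, edges, rest_lengths, masses, fixed, dt=1e-3):
        # One explicit Euler step; nodes flagged as fixed (the skull layer) stay put.
        forces = np.zeros_like(pos)
        for (i, j), L0 in zip(edges, rest_lengths):
            f = element_force(pos[i], pos[j], vel[i], vel[j], L0)
            forces[i] += f
            forces[j] -= f
        vel = np.where(fixed[:, None], 0.0, vel + dt * forces / masses[:, None])
        return pos + dt * vel, vel

A practical implementation would use a stabler integrator and vectorized force assembly, but the structure of the computation is the same.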

Face layer diagram

The figure (below) shows the full face mesh. In this figure we have individualized the shape of the mesh by adapting it to a subject's morphology using data from a Cyberware scanner. The scanner is a 3-D laser rangefinder that provides a range map, used to reproduce the subject's morphology, and a texture map (shown below), used to simulate the subject's skin quality.
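
As a rough sketch of this adaptation step, the Python fragment below snaps generic mesh vertices onto a cylindrical range map of the kind produced by such a scanner. The map layout and axis conventions here are assumptions for illustration only.

    import numpy as np

    # Assumed layout: range_map[row, col] holds radial distance from the scanner
    # axis, rows spanning heights y_min..y_max, columns spanning azimuth -pi..pi.
    def fit_mesh_to_range_map(vertices, range_map, y_min, y_max):
        n_rows, n_cols = range_map.shape
        fitted = vertices.copy()
        for k, (x, y, z) in enumerate(vertices):
            theta = np.arctan2(x, z)                                  # azimuth of this vertex
            col = int(round((theta + np.pi) / (2 * np.pi) * (n_cols - 1)))
            row = int(round((y - y_min) / (y_max - y_min) * (n_rows - 1)))
            r = range_map[np.clip(row, 0, n_rows - 1), np.clip(col, 0, n_cols - 1)]
            fitted[k] = [r * np.sin(theta), y, r * np.cos(theta)]     # snap to measured surface
        return fitted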

Full face mesh and computer-rendered face drawings

The red lines on the face mesh represent the lines of action of the modeled facial muscles. The lines of action, origins, insertions, and physiological cross-sectional areas are based on the anatomy literature and our measures of muscle geometry in cadavers. Our muscle model is a variant of the standard Hill model and includes dependence of force on muscle length and velocity.
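
A toy Hill-type force law of the general form just described is sketched below. The force-length and force-velocity curves and their constants are generic textbook approximations, not the values used in our model.

    import numpy as np

    # Generic Hill-type law: force = activation x force-length x force-velocity.
    def hill_force(activation, length, velocity, F_max=10.0, L_opt=1.0, v_max=5.0):
        f_length = np.exp(-((length - L_opt) / (0.45 * L_opt)) ** 2)  # bell-shaped curve
        v = velocity / v_max                     # negative = shortening (by convention here)
        if v <= -1.0:
            f_velocity = 0.0                     # shortening faster than v_max gives no force
        elif v <= 0.0:
            f_velocity = (1.0 + v) / (1.0 - v / 0.25)   # classic Hill hyperbola
        else:
            f_velocity = 1.0 + 0.3 * v / (v + 0.3)      # eccentric enhancement, saturating
        return activation * F_max * f_length * f_velocity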

At present, we can drive the model in two ways:

  1. by simulating the activation of several facial muscles during various facial gestures or
  2. by using processed electromyographic (EMG) recordings from a subject's actual facial muscles.

In the animation below you can watch the face model being driven by EMG recordings from the muscles around the mouth. The speaker is repeating the nonsense utterance /upae/. This animation of the lower face was produced from the EMG recordings alone; several seconds of realistic animation were generated from previously recorded muscle activity.
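
A common way to turn raw EMG into the kind of activation signal that can drive the model is to band-pass filter, rectify, and low-pass filter the recording. The sketch below uses typical textbook cutoffs, which are not necessarily those of our processing pipeline.

    import numpy as np
    from scipy.signal import butter, filtfilt

    # Band-pass, rectify, then low-pass to obtain a normalized activation envelope.
    def emg_to_activation(emg, fs, band=(20.0, 450.0), envelope_cutoff=8.0):
        b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
        cleaned = filtfilt(b, a, emg)              # remove drift and motion artifact
        rectified = np.abs(cleaned)                # full-wave rectification
        b, a = butter(2, envelope_cutoff / (fs / 2), btype="low")
        envelope = filtfilt(b, a, rectified)       # linear envelope
        return np.clip(envelope / envelope.max(), 0.0, 1.0)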

Please note that the speed at which the movie plays will be determined by the computer on which it is viewed. It does not represent the real-time speed of the animation.

Computer rendered mesh face with skin

Audiovisual Speech Perception

Three human face images

Our work on audiovisual speech perception focuses on three aspects of face-to-face communication:

  1. Studies of the visual information for speech. In these studies we focus on the analysis of facial dynamics and the role they play in speech perception. This work involves detailed kinematic analysis of facial motion and the psychophysics of face perception.
    • Lucero, J., Maciel, S., Johns, D., & Munhall, K.G. (2005). Empirical modeling of human face kinematics during speech using motion clustering. Journal of the Acoustical Society of America, 118, 405-409.
    • Munhall, K.G., Jones, J.A., Callan, D., Kuratate, T., & Vatikiotis-Bateson, E. (2004). Visual prosody and speech intelligibility: Head movement improves auditory speech perception. Psychological Science, 15, 133-137.
    • Munhall, K.G., Kroos, C., Jozan, G. & Vatikiotis-Bateson, E. (2004). Spatial frequency requirements for audiovisual speech perception. Perception and Psychophysics, 66, 574-583.
    • Campbell, R., Zihl, J., Massaro, D., Munhall, K., & Cohen, M. (1997). Speechreading in a patient with severe impairment in visual motion perception (Akinetopsia). Brain, 120, 1793-1803.
    • Munhall, K.G., Gribble, P., Sacco, L., & Ward, M. (1996). Temporal constraints on the McGurk Effect. Perception and Psychophysics, 58, 351-362.
  2. Eye movement of perceivers during audiovisual speech perception. In these studies we have examined the patterns of eye movements when subjects watch and listen to another person speak.
    • Buchan, J.N., Paré, M., & Munhall, K.G. (in press). Spatial statistics of gaze fixations during dynamic face processing. Social Neuroscience.
    • Paré, M., Richler, R., ten Hove, M., & Munhall, K.G. (2003). Gaze behavior in audiovisual speech perception: The influence of ocular fixations on the McGurk Effect. Perception and Psychophysics, 65, 553-567.
    • Vatikiotis-Bateson, E., Eigsti, I.M., Yano, S., & Munhall, K. (1998). Eye movement of perceivers during audiovisual speech perception. Perception and Psychophysics, 60(6), 926-940.
  3. The mechanisms underlying cross-modal integration. To study the way the perceptual system uses information from different sensory modalities, we make use of an audiovisual illusion called the McGurk Effect. The McGurk Effect (McGurk and MacDonald, 1976) occurs when conflicting consonant information is presented simultaneously to the visual and auditory modalities; when this is done, a third, distinct consonant is perceived. In our studies, an audio /aba/ was dubbed onto a visual /aga/, with the resultant percept of /ada/. Our lab has manipulated timing and spatial variables within the McGurk paradigm (a simple illustration of the timing manipulation follows this list).
    • Munhall, K.G. & Vatikiotis-Bateson, E. (2004). Spatial and temporal constraints on audiovisual speech perception. In G. Calvert, J. Spence, B. Stein (eds.) Handbook of Multisensory Processing. Cambridge, MA: MIT Press.
    • Callan, D., Jones, J.A., Munhall, K.G., Kroos, C., Callan, A. & Vatikiotis-Bateson, E. (2004). Multisensory-integration sites identified by perception of spatial wavelet filtered visual speech gesture information. Journal of Cognitive Neuroscience, 16, 805-816.
    • Munhall, K.G., Gribble, P., Sacco, L., & Ward, M. (1996). Temporal constraints on the McGurk Effect. Perception and Psychophysics, 58, 351-362.
    • Jones, J. A. & Munhall, K. G. (1997). The effects of separating auditory and visual sources on audiovisual integration of speech. Canadian Acoustics, 25(4), 13-19.
    • Munhall, K.G. & Tohkura, Y. (1998). Audiovisual gating and the time course of speech perception. Journal of the Acoustical Society of America, 104, 530-539.
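
As a simple illustration of the timing manipulation mentioned in item 3, the sketch below shifts an audio track relative to its (unchanged) video timeline by a chosen offset. The function and its parameters are hypothetical and are not taken from our stimulus-preparation software.

    import numpy as np

    # Positive offsets delay the audio so that it lags the face; negative
    # offsets advance it. The returned track keeps its original length.
    def offset_audio(audio, sr, offset_ms):
        shift = int(round(sr * offset_ms / 1000.0))
        if shift >= 0:
            return np.concatenate([np.zeros(shift), audio])[: len(audio)]
        return np.concatenate([audio[-shift:], np.zeros(-shift)])
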
Speech Motor Control

The goal of our speech motor control work is to identify organizing principles underlying speech coordination. To this end we study the kinematics of lip, tongue, jaw, and vocal fold movements and the muscle activity involved in producing these movements.

Recently we have focused on how auditory feedback influences speech motor control. When you speak, the sound of your own voice influences your articulation, and our studies use custom signal-processing techniques to manipulate this auditory feedback in real time.

  • Purcell, D. & Munhall, K.G. (2006). Adaptive control of vowel formant frequency: Evidence from real-time formant manipulation. Journal of the Acoustical Society of America, 120, 966-977.
  • Purcell, D. & Munhall, K.G. (2006). Compensation following real-time manipulation of formants in isolated vowels. Journal of the Acoustical Society of America, 119, 2288-2297.
  • Jones, J.A., & Munhall, K.G. (2005). Remapping auditory-motor representations in voice production. Current Biology, 15, 1768-1772.
  • Jones, J.A. & Munhall, K.G. (2003). Learning to produce speech with an altered vocal tract: The role of auditory feedback. Journal of the Acoustical Society of America, 113, 532-543.
  • Jones, J. A. & Munhall, K. G. (2002). Adaptation of fundamental frequency production under conditions of altered auditory feedback. Journal of Phonetics, 30, 303-320.
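
As an illustration of the general idea behind formant manipulation (a sketch only, not the method used in the studies above), the Python fragment below shifts the lowest formant of a windowed vowel frame by rotating the corresponding LPC pole pair before re-synthesis. The LPC order, shift size, and frame handling are illustrative choices.

    import numpy as np
    import librosa
    from scipy.signal import lfilter

    def shift_first_formant(frame, sr, shift_hz=200.0, lpc_order=14):
        a = librosa.lpc(frame, order=lpc_order)        # vocal-tract filter A(z)
        residual = lfilter(a, [1.0], frame)            # inverse filter -> source estimate
        poles = np.roots(a)
        upper = [i for i, p in enumerate(poles) if p.imag > 1e-6]
        f1 = min(upper, key=lambda i: np.angle(poles[i]))   # lowest-frequency pole pair ~ F1
        dtheta = 2 * np.pi * shift_hz / sr
        new_poles = poles.copy()
        for i, p in enumerate(poles):
            if i == f1 or np.isclose(p, np.conj(poles[f1])):
                new_poles[i] = abs(p) * np.exp(1j * np.sign(p.imag) * (abs(np.angle(p)) + dtheta))
        a_new = np.real(np.poly(new_poles))            # rebuild the shifted filter
        return lfilter([1.0], a_new, residual)         # re-synthesize the frame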

fMRI image of profile of human head with coloured area depicting mouth and throat movement during speech