Editor: Gyanendra K. Verma

Multimodal Affective Computing: Affective Information Representation, Modelling, and Analysis

eBook: US $49 Special Offer (PDF + Printed Copy): US $79
Printed Copy: US $55
Library License: US $196
ISBN: 978-981-5124-46-0 (Print)
ISBN: 978-981-5124-45-3 (Online)
Year of Publication: 2023
DOI: 10.2174/97898151244531230101


Affective computing is an emerging field at the intersection of artificial intelligence and behavioral science, concerned with studying and developing systems that recognize, interpret, process, and simulate human emotions. The field has recently advanced from exploratory studies to real-world applications.

Multimodal Affective Computing offers readers a concise overview of the state-of-the-art and emerging themes in affective computing, including a comprehensive review of existing approaches in applied affective computing systems and social signal processing. It covers affective facial expression and recognition, affective body expression and recognition, affective speech processing, affective text and dialogue processing, affect recognition from physiological measures, computational models of emotion and their theoretical foundations, and affective sound and music processing.

This book identifies future directions for the field and summarizes a set of guidelines for developing next-generation affective computing systems that are effective, safe, and human-centered. The book is an informative resource for academicians, professionals, researchers, and students at engineering and medical institutions working in the areas of applied affective computing, sentiment analysis, and emotion recognition.


Affective computing is an emerging field whose prime focus is developing intelligent systems that can perceive, interpret, and process human emotions and act accordingly. It draws on interdisciplinary research areas such as computer science, psychology, cognitive science, and machine learning. For intelligent communication with human beings, machines must perceive and interpret emotions in real time and respond appropriately. Emotion plays a significant role in communication and can be expressed in many ways, such as facial or vocal expression, gesture, or sign language. Physiological signals, such as brain activity, heart rate, muscular activity, blood pressure, and skin temperature, play a crucial role in affect recognition compared with other emotion modalities. Humans perceive emotion primarily through facial expressions, yet complex emotions such as pride, love, mellowness, and sorrow cannot be identified from facial expressions alone. Physiological signals can therefore be employed to recognize such complex emotions.

The objective of this book is mainly three-fold: (1) to provide in-depth knowledge of affective computing, affect information representation, and models and theories of emotion; (2) to cover emotion recognition from different affective modalities, such as audio, facial expression, and physiological signals; and (3) to present a multimodal fusion framework for emotion recognition in the three-dimensional valence, arousal, and dominance space.

Human emotions can be captured from various modalities, such as speech, facial expressions, and physiological signals. These modalities provide critical information that may be utilized to infer a user's emotional state. Primary emotions can be captured easily from facial and vocal expressions; complex emotions, however, cannot be detected from facial or audio information alone. Therefore, an efficient emotion model is required to predict complex emotions, and the dimensional model of emotion can effectively model and recognize them.
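To make the dimensional model concrete, the sketch below shows one simple way a point in valence-arousal-dominance (VAD) space can be mapped to a discrete emotion label by nearest-prototype matching. This is a minimal illustration, not the book's implementation; the prototype coordinates are assumed values on a [-1, 1] scale, chosen only for demonstration.

```python
import math

# Hypothetical VAD prototypes (valence, arousal, dominance) on a [-1, 1]
# scale for a few emotion labels. These coordinates are illustrative
# assumptions, not values taken from the book.
PROTOTYPES = {
    "joy":     ( 0.8,  0.6,  0.5),
    "anger":   (-0.6,  0.7,  0.6),
    "sadness": (-0.7, -0.5, -0.4),
    "calm":    ( 0.5, -0.6,  0.2),
}

def nearest_emotion(valence, arousal, dominance):
    """Return the label whose prototype is closest in Euclidean distance."""
    point = (valence, arousal, dominance)
    return min(
        PROTOTYPES,
        key=lambda label: math.dist(point, PROTOTYPES[label]),
    )

# A high-valence, high-arousal state falls nearest the "joy" prototype.
print(nearest_emotion(0.7, 0.5, 0.4))   # → joy
```

In a continuous model like this, a classifier or regressor first predicts the VAD coordinates from signal features; the discrete label, if one is needed, is only a post-hoc reading of the continuous position.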

Most emotion recognition work is based on facial and vocal expressions, and the existing literature gives little attention to emotion modeling in a continuous space. This book contributes in this direction by proposing an emotion model to predict a large number (more than fifteen) of complex emotions in a three-dimensional continuous space. We have implemented systems to recognize emotion from speech, facial expressions, physiological signals, and a multimodal fusion of these modalities. Our emphasis is on emotion modeling in a continuous space, with particular attention to physiological signals, since complex emotions are better captured by physiological signals than by facial or vocal expressions. The main contributions of this book can be summarized as follows:

  1. This book presents a state-of-the-art review of Affective Computing and its application in various areas like gaming, medicine, virtual reality, etc.
  2. A detailed review of multimodal fusion techniques is presented, covering how multiple modalities are assimilated to accomplish fusion tasks. The fusion methods are discussed from the perspective of the need for multimodal fusion, the level at which information is fused, and their applications in various domains, as reported in the literature. Significant challenges in multimodal fusion are also highlighted, along with measures for evaluating fusion techniques.
  3. A significant contribution of this book is a three-dimensional emotion model based on valence, arousal, and dominance, together with emotion prediction in this three-dimensional space.
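The levels of information fusion mentioned above can be illustrated with a toy sketch of two common strategies from the multimodal fusion literature: feature-level (early) fusion, which concatenates per-modality feature vectors before classification, and decision-level (late) fusion, which combines per-modality prediction scores. This is a minimal sketch under assumed inputs; the feature values and weights below are made up for illustration and are not from the book.

```python
def feature_level_fusion(*feature_vectors):
    """Early fusion: concatenate per-modality features into one vector."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused

def decision_level_fusion(scores, weights=None):
    """Late fusion: weighted average of per-modality prediction scores."""
    if weights is None:
        weights = [1.0 / len(scores)] * len(scores)
    return sum(w * s for w, s in zip(weights, scores))

# Illustrative (made-up) per-modality features and scores.
speech_features = [0.2, 0.9]        # e.g. pitch and energy statistics
face_features = [0.7, 0.1, 0.4]     # e.g. facial landmark statistics

print(feature_level_fusion(speech_features, face_features))
# → [0.2, 0.9, 0.7, 0.1, 0.4]
print(decision_level_fusion([0.8, 0.6], weights=[0.5, 0.5]))  # ≈ 0.7
```

Early fusion lets a single model learn cross-modal correlations but requires synchronized features; late fusion tolerates missing or asynchronous modalities at the cost of ignoring feature-level interactions, which is why surveys of fusion methods treat the choice of fusion level as application-dependent.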


Not applicable.


The author declares no conflict of interest, financial or otherwise.

Dr. Gyanendra K. Verma
Department of Information Technology
National Institute of Technology Raipur