Multimodal Emotion Recognition

Signal Processing & Computer Vision

This project built a multimodal, multi-class classifier that identifies human emotions by combining three data sources: video, audio, and PPG (photoplethysmography) signals.

Personal Contribution

My primary focus was implementing remote-PPG (rPPG), a technique that extracts a pulse signal directly from video, without skin contact, by detecting subtle changes in skin color.
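To illustrate the idea, here is a minimal sketch of one common rPPG baseline: averaging the green channel over a skin region and picking the dominant frequency in a plausible heart-rate band. It is not necessarily the method used in this project. The function name `estimate_bpm`, the 0.7 to 4.0 Hz band, and the assumption that the frames are already cropped to a skin region are all illustrative choices.

```python
import numpy as np

def estimate_bpm(frames, fps, low_hz=0.7, high_hz=4.0):
    """Estimate pulse rate from RGB frames of a skin region.

    frames: array of shape (T, H, W, 3), assumed already cropped to skin.
    Returns (beats per minute, band-limited pulse signal of length T).
    The band limits are assumed values covering roughly 42 to 240 bpm.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Spatially average the green channel to get one sample per frame.
    green = frames[..., 1].mean(axis=(1, 2))
    green = green - green.mean()  # remove the constant (DC) component

    # Keep only frequencies inside the heart-rate band.
    spectrum = np.fft.rfft(green)
    freqs = np.fft.rfftfreq(green.size, d=1.0 / fps)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    spectrum[~in_band] = 0

    pulse = np.fft.irfft(spectrum, n=green.size)
    peak_hz = freqs[np.argmax(np.abs(spectrum))]
    return peak_hz * 60.0, pulse
```

On real footage, a face detector and motion compensation would normally come before this step, and the frequency resolution is limited by the clip length (a 10 second clip resolves about 6 bpm).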

These extracted physiological signals were then integrated with visual and auditory features to train the final emotion recognition architecture.
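The exact fusion strategy is not described here, so the following is only a sketch of one simple option, feature-level (early) fusion: each modality's feature matrix is standardized and the results are concatenated before a softmax classifier. The names `zscore`, `fuse_features`, and `predict_proba`, and the linear classifier itself, are hypothetical stand-ins for whatever the final architecture used.

```python
import numpy as np

def zscore(x, eps=1e-8):
    """Standardize each feature column to zero mean and unit variance."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

def fuse_features(video_feats, audio_feats, rppg_feats):
    """Concatenate per-modality features, each of shape (n_samples, n_features)."""
    return np.concatenate(
        [zscore(video_feats), zscore(audio_feats), zscore(rppg_feats)], axis=1
    )

def predict_proba(fused, weights, bias):
    """Softmax over emotion classes for a linear classifier (assumed model)."""
    logits = fused @ weights + bias
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)
```

Standardizing before concatenation keeps a modality with large raw values, such as pixel-derived features, from dominating one with small values, such as a pulse-rate estimate.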

Input Modalities

  • 01 Video: Spatial and temporal facial feature extraction.
  • 02 Audio: Spectral analysis of vocal patterns.
  • 03 Remote-PPG: Contactless pulse extraction for physiological monitoring.
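
For the audio modality, "spectral analysis" typically starts from a short-time magnitude spectrogram. The sketch below shows that first step only; the function name `magnitude_spectrogram` and the frame and hop sizes are assumed values, not details taken from the project.

```python
import numpy as np

def magnitude_spectrogram(signal, sample_rate, frame_len=512, hop=256):
    """Short-time magnitude spectrum of a mono signal.

    Returns (freqs, spec), where spec has shape
    (n_frames, frame_len // 2 + 1). Frame and hop sizes are assumptions.
    """
    signal = np.asarray(signal, dtype=np.float64)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Build an index matrix so each row selects one overlapping frame.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)  # taper to reduce leakage
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return freqs, spec
```

Features such as spectral centroid or mel-band energies can then be computed per frame from `spec`.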