Measuring aggregate trends in customer satisfaction over time, not surveilling individual interactions: this is how Uniphore, Cogito, and most contact-center platforms position their offerings today. For an implementation angle, see our post on video emotion analysis for customer service. The pipeline has three stages. First, the video and audio streams are tapped before encoding using WebRTC Insertable Streams (Chrome 94+; Safari support is still partial). Second, video frames are fed to an on-device vision model (MediaPipe Face Mesh is the common default) that extracts facial landmarks at 20–30 FPS. Third, audio is run through a prosody analyzer for pitch, energy, and rhythm, optionally alongside a transcription model (Whisper-small locally, or Deepgram / OpenAI Whisper in the cloud).
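Here is a minimal sketch of that three-stage tap in TypeScript, assuming Chrome's MediaStreamTrackProcessor (the Insertable Streams for MediaStreamTrack API). The landmark step is stubbed with a hypothetical `estimateLandmarks` wrapper standing in for MediaPipe Face Mesh, and the prosody stage computes only RMS energy; pitch and rhythm extraction would hang off the same reader.

```typescript
// Type shims for APIs not yet in TypeScript's bundled DOM typings;
// `estimateLandmarks` is a hypothetical wrapper around MediaPipe Face
// Mesh, not a real MediaPipe export.
declare const MediaStreamTrackProcessor: {
  new (init: { track: MediaStreamTrack }): { readable: ReadableStream };
};
declare function estimateLandmarks(frame: VideoFrame): Promise<Float32Array>;

async function tapStreams(stream: MediaStream): Promise<void> {
  const [videoTrack] = stream.getVideoTracks();
  const [audioTrack] = stream.getAudioTracks();

  // Stage 1: tap raw media before it reaches the encoder.
  const videoReader = new MediaStreamTrackProcessor({ track: videoTrack })
    .readable.getReader();
  const audioReader = new MediaStreamTrackProcessor({ track: audioTrack })
    .readable.getReader();

  // Stage 2: on-device landmark extraction, one VideoFrame at a time.
  void (async () => {
    for (;;) {
      const { value: frame, done } = await videoReader.read();
      if (done || !frame) break;
      const landmarks = await estimateLandmarks(frame as VideoFrame); // 468 face-mesh points
      frame.close(); // release the frame promptly to avoid stalling capture
      // ...aggregate landmark-derived features locally here...
    }
  })();

  // Stage 3: a crude prosody feature (RMS energy) per AudioData chunk;
  // assumes f32-planar mono audio, Chrome's usual capture format.
  void (async () => {
    for (;;) {
      const { value: audio, done } = await audioReader.read();
      if (done || !audio) break;
      const data = audio as AudioData;
      const pcm = new Float32Array(data.numberOfFrames);
      data.copyTo(pcm, { planeIndex: 0 });
      const rms = Math.sqrt(pcm.reduce((s, x) => s + x * x, 0) / pcm.length);
      data.close();
      // ...feed `rms` (plus pitch and rhythm) into the prosody analyzer...
    }
  })();
}
```

The point of tapping before the encoder is that raw frames never leave the device; only derived features do, which is what keeps the aggregate-trends positioning honest.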
Cultivating Emotional Balance
When you engage in video calls with people from different cultures, misunderstandings can easily arise from differing interpretations of gestures and eye contact. Being aware of these variations helps you navigate conversations more effectively. To navigate these differences, it's vital to understand your counterpart's cultural background; if you're unsure, start with moderate eye contact and adjust based on their response. You may find yourself adjusting your expectations and responses based on the cultural context of your conversation partner.
When video AI can read facial expressions and emotional cues in real time, it brings a new level of connection to the table, making digital conversations feel much more personal and responsive. When interpreting our findings, these processes have to be considered as additional factors potentially influencing the participants' interactions. We conducted preliminary analyses of the distribution of the self-reported and facially expressed emotion data.
- We further explored temporal co-occurrences of facially expressed joy through cross-recurrence quantification analysis.
- The autonomic nervous system controls our involuntary bodily responses and regulates our fight-or-flight response.
- Therefore, during virtual communication, it’s recommended to speak somewhat slower and more expressively than during face-to-face meetings.
These technologies have made it possible for video AI to interpret even the subtlest emotional signals, like a raised eyebrow or a fleeting smile. Zoom IQ / Revenue Accelerator provides transcript-based sentiment scoring and some meeting-level engagement analytics. It does not expose frame-level emotion data to third parties and does not sell raw emotion inference. For many small teams it's enough; for regulated industries or custom use cases, the sub-processor relationship and limited control make in-house development preferable.
To this end, a collaborative effort of different laboratories and researchers could help to sufficiently increase sample size and statistical power to investigate facial expressions of emotions and their role in social interaction and interpersonal functioning on a larger scale. To answer our second research question, we went beyond aggregated percentage scores and repeated-measures analyses and, instead, made use of the dyadic and highly dynamic moment-to-moment nature of the assessed facial expression data. To this end, we applied non-linear time series analyses (i.e., CRQA) to the dyadic time series data in combination with pairwise comparisons with so-called surrogate time series comprising the same data points, but in a randomly shuffled order. Regarding facially expressed anger and joy, the real cross-recurrence rates within the pre-defined time window of ±5 s were significantly larger than the surrogate cross-recurrence rates at each time lag in the Anger condition and in the Joy condition, respectively. This finding implies that, when listening and responding to a person talking about a recent personally relevant event that made the speaking person particularly joyful or angry, facial expressions corresponding with those of the speaker appeared to be imitated in systematic temporal alignment. In contrast, the real cross-recurrence rates of facially expressed sadness were significantly lower than the surrogate cross-recurrence rates at each time lag in the Sadness condition.
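To make the surrogate logic concrete, here is a simplified sketch under stated assumptions: each partner's expression is a binary per-frame series (1 = expression detected), lags are counted in frames, and surrogates are built by randomly shuffling one series, which destroys timing while preserving base rates. It illustrates the comparison, not the authors' exact CRQA implementation.

```typescript
function crossRecurrenceRate(a: number[], b: number[], lag: number): number {
  // Fraction of comparable time points where both partners show the
  // expression, with b shifted by `lag` frames relative to a.
  let hits = 0;
  let count = 0;
  for (let t = 0; t < a.length; t++) {
    const u = t + lag;
    if (u < 0 || u >= b.length) continue;
    count++;
    if (a[t] === 1 && b[u] === 1) hits++;
  }
  return count > 0 ? hits / count : 0;
}

function surrogateRate(a: number[], b: number[], lag: number): number {
  // Fisher-Yates shuffle of b: same data points, randomly reordered.
  const shuffled = [...b];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return crossRecurrenceRate(a, shuffled, lag);
}

// Toy data: per-frame expression detections for speaker and listener.
const speaker = [0, 1, 1, 0, 0, 1, 0, 1, 1, 0];
const listener = [0, 0, 1, 1, 0, 0, 1, 0, 1, 1];
for (let lag = -2; lag <= 2; lag++) {
  // Real > surrogate at a lag suggests systematic temporal alignment
  // (mimicry); real < surrogate, as in the Sadness condition, suggests
  // systematic non-alignment.
  console.log(lag, crossRecurrenceRate(speaker, listener, lag),
    surrogateRate(speaker, listener, lag));
}
```

The surrogate comparison is what makes the result interpretable: it controls for how often each expression occurs overall, so any difference from the real series must come from timing, not base rates.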
On the one hand, online video conference applications seem to be capable of transmitting emotional signals in social interaction. This is a practically useful finding for any interactional context in which emotions matter, be it private (e.g., family interaction or romantic relationships) or professional. For example, in an educational context, when holding online classes, teachers are likely to transmit subjectively experienced joy to their students and vice versa, similar to face-to-face settings (Frenzel et al., 2018). This dynamic and reciprocal process stresses the importance of authentically expressing positive emotions in class, even when teaching in an online video conference setting (Keller et al., 2018; Taxer and Frenzel, 2018; Schwab et al., 2022).
Recent research highlights the potential of video conferencing technologies in fostering connections between generations. According to a study published on SpringerLink, personalized notifications and user-friendly design can significantly enhance the accessibility of video conferencing tools for elderly users, promoting meaningful intergenerational interactions. This reinforces the importance of Emotion AI in adapting virtual communication to diverse user needs, making tools like MorphCast's Emotion AI Video Conference even more relevant for inclusive engagement. Emotional psychologist Paul Ekman identified six basic emotions that could be interpreted through facial expressions.
Those who have emotional intelligence open themselves to positive and negative emotional experiences, identify the emotions, and communicate those emotions appropriately. Emotionally intelligent people can use their understanding of their own emotions and the emotions of others to move toward personal and social growth. Those with low emotional intelligence may be unable to understand and control their emotions or those of others, which can leave others feeling bad when their emotions, feelings, or expressions go unrecognized. More recently, a 2014 study from the Institute of Neuroscience and Psychology at the University of Glasgow found that instead of six, there may be only four easily recognizable basic emotions.
- Public-space surveillance with emotion inference (data-protection authority scrutiny).
- Political-campaign emotion targeting (AI Act transparency requirements plus election-law exposure).

Washington's MHMDA (My Health My Data Act, in force since March 31, 2024) creates a private right of action for biometric data collected in health or wellness contexts. Collecting "facial geometry" without informed written consent hands individual plaintiffs a direct cause of action.
A study by Michael Kraus of the Yale School of Management found that our sense of hearing may be even stronger than our sight when it comes to accurately detecting emotion. Kraus found that we are more accurate when we only hear someone's voice than when we look at their facial expressions alone, or when we see their face and hear their voice together. His explanation: once you remove other inputs (like facial expressions), your attention naturally sharpens and hones in on vocal cues.
Moreover, the FACET classifier in particular has been repeatedly reported to outperform other facial expression analysis algorithms (Stöckli et al., 2018; Dupré et al., 2020) and to achieve performance similar to other measures of facial expression/muscle movement (Beringer et al., 2019). Hence, we conclude that facial expressions, independent of the underlying emotional state, represent highly relevant nonverbal cues and a channel for the interpersonal communication of subjective emotional experiences that can be assessed using automated facial expression analysis. When comparing the participants' self-reported emotions across the three emotion conditions, our analyses provided clear evidence that the listening/responding individuals reported greater levels of subjectively experienced anger, joy, and sadness in the respective emotion conditions. More specifically, they reported greater levels of subjectively experienced anger when listening/responding to their interaction partner's report of a recent experience that made them particularly angry (Anger condition) compared to a joyful (Joy condition) or sad experience (Sadness condition).
Technical Limitations And Model Robustness
This is a use case that survives AI Act scrutiny and lands squarely in the legitimate-interest zone under GDPR. We'll break your build into consent UX, ML pipeline, analytics, and compliance, with senior-dev hours per item, so your board sees real numbers, not hand-waving. By contrast, these use cases are high-risk or outright banned:

- Employee monitoring during work hours (EU ban, US labor-law friction, reputational risk).
- Student attention tracking in classrooms (EU ban, educational ethics).
- Emotion-aware hiring interviews (EU ban, EEOC discrimination risk).
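As a starting point for the consent-UX line item, here is a minimal consent-gate sketch. Every name in it (ConsentRecord, requestEmotionConsent, persistConsent, startEmotionPipeline) is a hypothetical placeholder for your own consent flow and pipeline code, not a real API.

```typescript
interface ConsentRecord {
  userId: string;
  grantedAt: Date; // explicit, informed consent timestamp (MHMDA: written)
  scope: "aggregate-analytics"; // aggregate trends only, per the positioning above
}

// Hypothetical hooks into your own consent UX and storage:
declare function requestEmotionConsent(
  userId: string
): Promise<ConsentRecord | null>;
declare function persistConsent(record: ConsentRecord): Promise<void>;
declare function startEmotionPipeline(scope: ConsentRecord["scope"]): void;

async function maybeStartPipeline(userId: string): Promise<void> {
  const consent = await requestEmotionConsent(userId);
  if (!consent) {
    return; // no consent: the call runs with emotion features fully off
  }
  await persistConsent(consent); // keep an auditable record for GDPR / MHMDA
  startEmotionPipeline(consent.scope);
}
```

The design choice that matters is failing closed: without a persisted, auditable consent record, the emotion pipeline never starts.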
