UCL Discovery

Evaluating the Impact of Voice Activity Detection on Speech Emotion Recognition for Autistic Children

Milling, M; Baird, A; Bartl-Pokorny, KD; Liu, S; Alcorn, AM; Shen, J; Tavassoli, T; ... Schuller, BW (2022) Evaluating the Impact of Voice Activity Detection on Speech Emotion Recognition for Autistic Children. Frontiers in Computer Science, 4, Article 837269. 10.3389/fcomp.2022.837269. Green open access

Pellicano_Evaluating the Impact of Voice Activity Detection on Speech Emotion Recognition for Autistic Children_VoR.pdf - Published Version

Abstract

Individuals with autism are known to face challenges with emotion regulation, and express their affective states in a variety of ways. With this in mind, an increasing amount of research on automatic affect recognition from speech and other modalities has recently been presented to assist and provide support, as well as to improve understanding of autistic individuals' behaviours. As well as the emotion expressed from the voice, for autistic children the dynamics of verbal speech can be inconsistent and vary greatly amongst individuals. The current contribution outlines a voice activity detection (VAD) system specifically adapted to autistic children's vocalisations. The presented VAD system is a recurrent neural network (RNN) with long short-term memory (LSTM) cells. It is trained on 130 acoustic Low-Level Descriptors (LLDs) extracted from more than 17 h of audio recordings, which were richly annotated by experts in terms of perceived emotion as well as occurrence and type of vocalisations. The data consist of 25 English-speaking autistic children undertaking a structured, partly robot-assisted emotion-training activity and were collected as part of the DE-ENIGMA project. The VAD system is further utilised as a preprocessing step for a continuous speech emotion recognition (SER) task aiming to minimise the effects of potential confounding information, such as noise, silence, or non-child vocalisation. Its impact on the SER performance is compared to the impact of other VAD systems, including a general VAD system trained from the same data set, an out-of-the-box Web Real-Time Communication (WebRTC) VAD system, as well as the expert annotations. Our experiments show that the child VAD system achieves a lower performance than our general VAD system, trained under identical conditions, as we obtain receiver operating characteristic area under the curve (ROC-AUC) metrics of 0.662 and 0.850, respectively.
The SER results show varying performances across valence and arousal depending on the utilised VAD system with a maximum concordance correlation coefficient (CCC) of 0.263 and a minimum root mean square error (RMSE) of 0.107. Although the performance of the SER models is generally low, the child VAD system can lead to slightly improved results compared to other VAD systems and in particular the VAD-less baseline, supporting the hypothesised importance of child VAD systems in the discussed context.
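
For readers unfamiliar with the three metrics quoted in the abstract (ROC-AUC for the VAD systems, CCC and RMSE for the SER models), the following is a minimal sketch of their standard textbook definitions in plain Python. It is not the authors' evaluation code: the function names and the toy inputs are assumptions chosen for illustration, and the paper's exact evaluation protocol (for example, how frames are pooled across children or sessions) is not specified here.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def ccc(y_true, y_pred):
    """Concordance correlation coefficient (population variances).

    CCC = 2 * cov / (var_true + var_pred + (mean_true - mean_pred)^2).
    It is 1 only when predictions match the targets exactly, so it
    penalises both poor correlation and a shift or scale mismatch.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        # Both sequences are constant and equal; CCC is undefined.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a randomly chosen positive frame
    receives a higher score than a randomly chosen negative frame
    (ties count as one half). Simple O(P * N) pairwise form."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values, purely illustrative (not data from the paper).
    target = [0.0, 1.0, 2.0]
    shifted = [1.0, 2.0, 3.0]  # perfectly correlated but offset by 1
    print(rmse(target, shifted))
    print(ccc(target, shifted))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The shifted toy prediction illustrates why CCC is a common choice for continuous valence and arousal prediction: a perfectly correlated but offset prediction still scores well below 1, whereas Pearson correlation alone would not register the offset.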

Type: Article
Title: Evaluating the Impact of Voice Activity Detection on Speech Emotion Recognition for Autistic Children
Open access status: An open access version is available from UCL Discovery
DOI: 10.3389/fcomp.2022.837269
Publisher version: https://doi.org/10.3389/fcomp.2022.837269
Language: English
Additional information: © 2022 Milling, Baird, Bartl-Pokorny, Liu, Alcorn, Shen, Tavassoli, Ainger, Pellicano, Pantic, Cummins and Schuller. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
UCL classification: UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences > Clinical, Edu and Hlth Psychology
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences
URI: https://discovery.ucl.ac.uk/id/eprint/10146484