Milling, M;
Baird, A;
Bartl-Pokorny, KD;
Liu, S;
Alcorn, AM;
Shen, J;
Tavassoli, T;
... Schuller, BW
(2022)
Evaluating the Impact of Voice Activity Detection on Speech Emotion Recognition for Autistic Children.
Frontiers in Computer Science, 4, Article 837269. 10.3389/fcomp.2022.837269.
PDF: Pellicano_Evaluating the Impact of Voice Activity Detection on Speech Emotion Recognition for Autistic Children_VoR.pdf - Published Version (897kB)
Abstract
Individuals with autism are known to face challenges with emotion regulation and express their affective states in a variety of ways. With this in mind, an increasing amount of research on automatic affect recognition from speech and other modalities has recently been presented to assist and provide support, as well as to improve understanding of autistic individuals' behaviours. Beyond the emotion expressed in the voice, the dynamics of verbal speech in autistic children can be inconsistent and vary greatly amongst individuals. The current contribution outlines a voice activity detection (VAD) system specifically adapted to autistic children's vocalisations. The presented VAD system is a recurrent neural network (RNN) with long short-term memory (LSTM) cells. It is trained on 130 acoustic Low-Level Descriptors (LLDs) extracted from more than 17 h of audio recordings, which were richly annotated by experts in terms of perceived emotion as well as occurrence and type of vocalisations. The data come from 25 English-speaking autistic children undertaking a structured, partly robot-assisted emotion-training activity and were collected as part of the DE-ENIGMA project. The VAD system is further utilised as a preprocessing step for a continuous speech emotion recognition (SER) task, aiming to minimise the effects of potential confounding information such as noise, silence, or non-child vocalisation. Its impact on the SER performance is compared to that of other VAD systems, including a general VAD system trained on the same data set, an out-of-the-box Web Real-Time Communication (WebRTC) VAD system, and the expert annotations. Our experiments show that the child VAD system achieves lower performance than our general VAD system trained under identical conditions, with receiver operating characteristic area under the curve (ROC-AUC) values of 0.662 and 0.850, respectively. The SER results show varying performance across valence and arousal depending on the VAD system used, with a maximum concordance correlation coefficient (CCC) of 0.263 and a minimum root mean square error (RMSE) of 0.107. Although the performance of the SER models is generally low, the child VAD system can lead to slightly improved results compared to other VAD systems and, in particular, the VAD-less baseline, supporting the hypothesised importance of child VAD systems in the discussed context.
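For readers unfamiliar with the two SER metrics reported in the abstract, the following is a minimal sketch of how CCC and RMSE are commonly computed for continuous valence or arousal predictions. It uses the standard textbook definitions with population (1/n) variance; the function names, the NaN fallback for a zero denominator, and the toy values are assumptions for illustration and are not taken from the paper or its code.

```python
import math


def ccc(y_true, y_pred):
    """Concordance correlation coefficient (standard definition, 1/n variance).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    # Assumption: return NaN when both series are the same constant,
    # since the coefficient is undefined in that case.
    return 2 * cov / denom if denom != 0 else float("nan")


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Toy example (made-up numbers): predictions follow the trend but are offset by 1.
gold = [0.0, 1.0, 2.0]
pred = [1.0, 2.0, 3.0]
print(ccc(gold, pred))   # penalised for the constant offset despite perfect correlation
print(rmse(gold, pred))
```

Unlike Pearson correlation, CCC penalises a systematic offset or scale mismatch between prediction and annotation, which is why it is often reported alongside RMSE for continuous emotion recognition.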
Type: | Article |
---|---|
Title: | Evaluating the Impact of Voice Activity Detection on Speech Emotion Recognition for Autistic Children |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.3389/fcomp.2022.837269 |
Publisher version: | https://doi.org/10.3389/fcomp.2022.837269 |
Language: | English |
Additional information: | © 2022 Milling, Baird, Bartl-Pokorny, Liu, Alcorn, Shen, Tavassoli, Ainger, Pellicano, Pantic, Cummins and Schuller. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences > Clinical, Edu and Hlth Psychology |
URI: | https://discovery.ucl.ac.uk/id/eprint/10146484 |