Meng, Q; Hu, X; Kang, J; Wu, Y; (2020) On the effectiveness of facial expression recognition for evaluation of urban sound perception. Science of The Total Environment, 710, Article 135484. 10.1016/j.scitotenv.2019.135484.
Abstract
Sound perception studies mostly depend on questionnaires with fixed indicators. Therefore, it is desirable to explore methods with dynamic outputs. The present study aims to explore the effects of sound perception in the urban environment on facial expressions using software named FaceReader, which is based on facial expression recognition (FER). The experiment involved three typical urban sound recordings, namely, traffic noise, natural sound, and community sound. A questionnaire on the evaluation of sound perception was also used for comparison. The results show that, first, FER is an effective tool for sound perception research, since it is capable of detecting differences in participants' reactions to different sounds and how their facial expressions change over time in response to those sounds, with mean differences in valence between recordings ranging from 0.019 to 0.059 (p < 0.05 or p < 0.01). In a natural sound environment, for example, facial expression increased by 0.04 in the first 15 s and then went down steadily at 0.004 every 20 s. Second, the expression indices, namely, happy, sad, and surprised, change significantly under the effect of sound perception. In the traffic sound environment, for example, happy decreased by 0.012, sad increased by 0.032, and surprised decreased by 0.018. Furthermore, social characteristics such as distance from living place to natural environment (r = 0.313), inclination to communicate (r = 0.253), and preference for crowd (r = 0.296) have effects on facial expression. Finally, the comparison of FER and questionnaire survey results showed that in the traffic noise recording, valence in the first 20 s best represents acoustic comfort and eventfulness; for natural sound, valence in the first 40 s best represents pleasantness; and for community sound, valence in the first 20 s of the recording best represents acoustic comfort, subjective loudness, and calmness.
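The comparison of windowed valence with questionnaire scores described in the abstract can be illustrated with a minimal sketch. This is not the authors' analysis: it assumes valence is available as (time in seconds, valence) pairs per participant, assumes Pearson's r as the measure of agreement, and uses hypothetical function names (`window_mean`, `pearson_r`, `best_window`). The abstract does not state how the best-representing window was selected, so picking the window with the largest absolute correlation is an assumption as well.

```python
from statistics import mean


def window_mean(samples, start_s, end_s):
    """Mean valence of (time_s, valence) samples with start_s <= time_s < end_s."""
    values = [v for t, v in samples if start_s <= t < end_s]
    if not values:
        raise ValueError("no samples fall inside the requested window")
    return mean(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def best_window(valence_by_participant, ratings, window_ends):
    """Return (window_end_s, r) for the window [0, window_end_s) whose mean
    valence correlates most strongly (largest |r|) with the ratings.

    Assumption: one valence series and one questionnaire rating per participant.
    """
    best = None
    for end_s in window_ends:
        means = [window_mean(s, 0, end_s) for s in valence_by_participant]
        r = pearson_r(means, ratings)
        if best is None or abs(r) > abs(best[1]):
            best = (end_s, r)
    return best
```

Calling `best_window(series, ratings, [20, 40, 60])` with one valence series and one rating per participant would return the window length whose mean valence tracks the ratings most closely under these assumptions.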
Type: | Article |
---|---|
Title: | On the effectiveness of facial expression recognition for evaluation of urban sound perception |
Location: | Netherlands |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1016/j.scitotenv.2019.135484 |
Publisher version: | https://doi.org/10.1016/j.scitotenv.2019.135484 |
Language: | English |
Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. |
Keywords: | FaceReader, Facial expression recognition, Sound perception, Urban soundscape |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of the Built Environment; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of the Built Environment > Bartlett School Env, Energy and Resources |
URI: | https://discovery.ucl.ac.uk/id/eprint/10090026 |