Learnability of English diphthongs: One dynamic target vs. two static targets

Xu, Anqi; van Niekerk, Daniel R; Gerazov, Branislav; Krug, Paul Konstantin; Prom-on, Santitham; Birkholz, Peter; Xu, Yi (2025) Learnability of English diphthongs: One dynamic target vs. two static targets. Speech Communication, 170, Article 103225. 10.1016/j.specom.2025.103225.

Text: A_Xu_etAl_SpCm_2025accepted.pdf - Accepted Version (7MB)
Access restricted to UCL open access staff until 14 September 2026.

Abstract

As vowels with intrinsic movements, diphthongs are among the most elusive sounds of speech. Previous research has characterized diphthongs as a combination of two vowels, a vowel followed by a formant transition, or a constant rate of formant change. These accounts are based on acoustic patterns, perceptual cues, and either acoustic or articulatory synthesis, but no consensus has been reached. In this study, we investigate the nature of diphthongs by examining how they can be acquired through vocal learning. The acquisition is simulated by a three-dimensional (3D) vocal tract model with built-in target approximation dynamics, which can learn articulatory targets of phonetic categories under the guidance of a speech recognizer. The simulation attempted to learn to articulate monosyllabic English words containing diphthongs with either a single dynamic target or two static targets, and the learned synthetic words were presented to native listeners for identification. The results showed that diphthongs learned with a dynamic target were consistently more intelligible across variable durations than those learned with two static targets, with the sole exception of /aɪ/. From the perspective of learnability, therefore, English diphthongs are likely unitary vowels with dynamic targets.

Type: Article
Title: Learnability of English diphthongs: One dynamic target vs. two static targets
DOI: 10.1016/j.specom.2025.103225
Publisher version: https://doi.org/10.1016/j.specom.2025.103225
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Diphthongs, Computational simulation, 3D vocal tract model, Vocal learning, American English
UCL classification: UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences > Speech, Hearing and Phonetic Sciences
URI: https://discovery.ucl.ac.uk/id/eprint/10206192