Prom-on, S; Birkholz, P; Xu, Y; (2014) Identifying underlying articulatory targets of Thai vowels from acoustic data based on an analysis-by-synthesis approach. EURASIP Journal on Audio, Speech, and Music Processing, 2014, Article 23. 10.1186/1687-4722-2014-23.
Abstract
This paper investigates the estimation of underlying articulatory targets of Thai vowels as invariant representations of vocal tract shapes by means of analysis-by-synthesis based on acoustic data. The basic idea is to simulate the process of learning speech production as a distal learning task, with acoustic signals of natural utterances in the form of Mel-frequency cepstral coefficients (MFCCs) as input, VocalTractLab, a 3D articulatory synthesizer controlled by target approximation models, as the learner, and stochastic gradient descent as the target training method. To test the effectiveness of this approach, a speech corpus was designed to contain contextual variations of Thai vowels by juxtaposing nine Thai long vowels in two-syllable sequences. The resulting corpus of 81 disyllabic utterances was recorded from a native Thai speaker. Nine vocal tract shapes, each corresponding to a vowel, were estimated by iteratively optimizing the vocal tract shape parameters of each vowel with stochastic gradient descent to minimize the sum of squared errors of MFCCs between original and synthesized speech. The optimized vocal tract shapes were then used to synthesize Thai vowels both in monosyllables and in disyllabic sequences. Both numerical and perceptual results indicate that this model-based analysis strategy allows us to effectively and economically estimate the vocal tract shapes needed to synthesize accurate Thai vowels as well as smooth formant transitions between adjacent vowels.
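The optimization loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `synthesize_features` is a hypothetical toy stand-in for VocalTractLab plus MFCC extraction, the weight matrix and all parameter values are invented for illustration, the target approximation dynamics are omitted, and the finite-difference gradient is an assumption, since the abstract does not say how gradients are obtained. The sketch only shows the general shape of the idea: adjust shape parameters one utterance at a time to reduce the sum of squared feature errors.

```python
import math
import random

# Hypothetical stand-in for "articulatory synthesizer + MFCC extraction".
# It maps a 3-parameter "vocal tract shape" to 4 pseudo-cepstral features.
WEIGHTS = [
    [0.9, -0.3, 0.2],
    [0.1, 0.8, -0.4],
    [-0.5, 0.2, 0.7],
    [0.3, 0.3, 0.3],
]


def synthesize_features(shape):
    """Toy forward model: shape parameters -> feature vector."""
    return [math.tanh(sum(w * s for w, s in zip(row, shape))) for row in WEIGHTS]


def squared_error(shape, observed):
    """Sum of squared errors between synthesized and observed features."""
    predicted = synthesize_features(shape)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def numerical_gradient(shape, observed, eps=1e-4):
    """Central finite-difference gradient of the error (an assumption here)."""
    grad = []
    for i in range(len(shape)):
        up = list(shape)
        down = list(shape)
        up[i] += eps
        down[i] -= eps
        diff = squared_error(up, observed) - squared_error(down, observed)
        grad.append(diff / (2 * eps))
    return grad


def fit_shape(observations, initial_shape, learning_rate=0.1, epochs=200, seed=0):
    """Stochastic gradient descent: one parameter update per observed utterance."""
    rng = random.Random(seed)
    shape = list(initial_shape)
    for _ in range(epochs):
        order = list(observations)
        rng.shuffle(order)  # visit utterances in random order each epoch
        for observed in order:
            grad = numerical_gradient(shape, observed)
            shape = [s - learning_rate * g for s, g in zip(shape, grad)]
    return shape


def total_error(shape, observations):
    """Sum of squared errors over all observed utterances."""
    return sum(squared_error(shape, obs) for obs in observations)


def make_toy_observations(true_shape, count=9, noise=0.02, seed=1):
    """Noisy feature vectors generated from a hidden 'true' shape."""
    rng = random.Random(seed)
    clean = synthesize_features(true_shape)
    return [[c + rng.gauss(0, noise) for c in clean] for _ in range(count)]
```

As a usage example, `fit_shape(make_toy_observations([0.6, -0.4, 0.3]), [0.0, 0.0, 0.0])` starts from a neutral shape and returns parameters whose synthesized features lie closer to the observations than the starting point. A real system would replace the toy forward model with an articulatory synthesizer and compare MFCC frames over time.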
Type: | Article |
---|---|
Title: | Identifying underlying articulatory targets of Thai vowels from acoustic data based on an analysis-by-synthesis approach |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1186/1687-4722-2014-23 |
Publisher version: | http://dx.doi.org/10.1186/1687-4722-2014-23 |
Language: | English |
Additional information: | © 2014 Prom-on et al.; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. |
Keywords: | Articulatory target; Articulatory synthesis; Target approximation; Acoustic-to-articulatory inversion; Thai vowels |
UCL classification: | UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences > Speech, Hearing and Phonetic Sciences |
URI: | https://discovery.ucl.ac.uk/id/eprint/1432133 |