Identifying underlying articulatory targets of Thai vowels from acoustic data based on an analysis-by-synthesis approach


Prom-on, S; Birkholz, P; Xu, Y; (2014) Identifying underlying articulatory targets of Thai vowels from acoustic data based on an analysis-by-synthesis approach. EURASIP Journal on Audio, Speech, and Music Processing, 2014, Article 23. 10.1186/1687-4722-2014-23. Green open access


Abstract

This paper investigates the estimation of underlying articulatory targets of Thai vowels, as invariant representations of vocal tract shapes, by means of analysis-by-synthesis based on acoustic data. The basic idea is to simulate the process of learning speech production as a distal learning task, with acoustic signals of natural utterances in the form of Mel-frequency cepstral coefficients (MFCCs) as input, VocalTractLab (a 3D articulatory synthesizer controlled by target approximation models) as the learner, and stochastic gradient descent as the target training method. To test the effectiveness of this approach, a speech corpus was designed to contain contextual variations of Thai vowels by juxtaposing nine Thai long vowels in two-syllable sequences. The resulting corpus of 81 disyllabic utterances was recorded from a native Thai speaker. Nine vocal tract shapes, each corresponding to a vowel, were estimated by iteratively optimizing the vocal tract shape parameters of each vowel with stochastic gradient descent to minimize the sum of squared errors of MFCCs between original and synthesized speech. The optimized vocal tract shapes were then used to synthesize Thai vowels both in monosyllables and in disyllabic sequences. The results, both numerical and perceptual, indicate that this model-based analysis strategy makes it possible to estimate vocal tract shapes effectively and economically, and to synthesize accurate Thai vowels as well as smooth formant transitions between adjacent vowels.
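The optimization loop described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation and does not call VocalTractLab or compute real MFCCs: the function `toy_features`, the matrices `W1` and `W2`, and all sizes are hypothetical stand-ins, chosen so that the example runs with the Python standard library alone. The stand-in map is linear, so its gradient is available in closed form; that is a convenience of the sketch, not a claim about how gradients were obtained in the paper. What the sketch keeps from the abstract is the structure: one shared parameter vector per vowel, a corpus of all ordered vowel pairs, and stochastic gradient descent on the squared feature error of one randomly drawn utterance at a time.

```python
import random

N_VOWELS, N_PARAMS, N_FEATS = 3, 2, 4  # toy sizes; the paper has nine vowels

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Hypothetical fixed linear maps standing in for synthesizer + MFCC extraction.
W1, W2 = rand_matrix(N_FEATS, N_PARAMS), rand_matrix(N_FEATS, N_PARAMS)

def toy_features(t1, t2):
    """Stand-in features for a two-vowel utterance with targets t1 then t2."""
    return [sum(a * x for a, x in zip(r1, t1)) + sum(b * y for b, y in zip(r2, t2))
            for r1, r2 in zip(W1, W2)]

# "Recorded" corpus: every ordered vowel pair, generated from hidden true targets.
true_targets = rand_matrix(N_VOWELS, N_PARAMS)
corpus = [(i, j, toy_features(true_targets[i], true_targets[j]))
          for i in range(N_VOWELS) for j in range(N_VOWELS)]

def total_sse(targets):
    """Sum of squared feature errors over the whole corpus."""
    return sum((s - o) ** 2
               for i, j, obs in corpus
               for s, o in zip(toy_features(targets[i], targets[j]), obs))

def sgd_fit(steps=5000, lr=0.02):
    """Fit shared per-vowel targets using one randomly drawn utterance per step."""
    targets = [[0.0] * N_PARAMS for _ in range(N_VOWELS)]
    for _ in range(steps):
        i, j, obs = rng.choice(corpus)
        resid = [s - o for s, o in zip(toy_features(targets[i], targets[j]), obs)]
        # Closed-form gradient of the squared error for the linear toy map.
        g1 = [2 * sum(r * W1[k][p] for k, r in enumerate(resid)) for p in range(N_PARAMS)]
        g2 = [2 * sum(r * W2[k][p] for k, r in enumerate(resid)) for p in range(N_PARAMS)]
        for p in range(N_PARAMS):
            targets[i][p] -= lr * g1[p]
            targets[j][p] -= lr * g2[p]
    return targets

initial_sse = total_sse([[0.0] * N_PARAMS for _ in range(N_VOWELS)])
fitted = sgd_fit()
final_sse = total_sse(fitted)
print(f"SSE before: {initial_sse:.4f}, after: {final_sse:.6f}")
```

Because each vowel's parameters are shared across every utterance in which it appears, a single target per vowel has to account for all of its contexts, which mirrors the invariant-target idea in the abstract. With nine vowels the same pairing scheme would give the 81 disyllabic utterances mentioned there.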

Type:	Article
Title:	Identifying underlying articulatory targets of Thai vowels from acoustic data based on an analysis-by-synthesis approach
Open access status:	An open access version is available from UCL Discovery
DOI:	10.1186/1687-4722-2014-23
Publisher version:	http://dx.doi.org/10.1186/1687-4722-2014-23
Language:	English
Additional information:	© 2014 Prom-on et al.; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
Keywords:	Articulatory target; Articulatory synthesis; Target approximation; Acoustic-to-articulatory inversion; Thai vowels
UCL classification:	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences > Speech, Hearing and Phonetic Sciences
URI:	https://discovery.ucl.ac.uk/id/eprint/1432133
