Chung, Hyunsong (2002)
Analysis of the timing of spoken Korean with application to speech synthesis.
Doctoral thesis (Ph.D), UCL (University College London).
Abstract
The thesis describes new analysis and modelling of Korean segmental duration. It takes into account contemporary approaches to duration modelling, as used in English and Japanese synthesis, to build predictive models of segment duration in context that could be used in Korean text-to-speech (TTS) systems. It also analyses those models to learn which factors and structures are most important in Korean prosody. The thesis concentrates on duration modelling of a news-reading speech style, using a corpus of 670 read sentences collected from one speaker of standard Korean. The duration of each segment and its phonological context were extracted from the corpus. Statistical modelling explored the relationship between the context features and the realised duration. Based on previous research on timing, Sums-of-Products models and Classification And Regression Tree (CART) models were applied to the data and evaluated. The objective quality of the modelling was measured by the root mean squared prediction error (RMSE) and by the correlation coefficient between actual and predicted durations in reserved test data. The best performance was obtained from a CART model, with an RMSE of 25.11 ms and a correlation of 0.77, a result comparable with other published results on Korean segment durations. Analysis showed that prosodic phrase features have the greatest influence on segment duration, the most important among them being the accentual-phrase-final position feature. In terms of segmental context, surrounding nasals were shown to have a consistent shortening effect, while vowels seemed to be affected by the degree of glottal opening of adjacent consonants. Other segmental effects were less consistent. Perceptual tests showed a slight listener preference for durations calculated from a CART model in this thesis over durations calculated from a commercial Korean TTS system.
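The two objective measures named in the abstract can be illustrated with a minimal Python sketch. This is not code from the thesis: the function names and the sample durations are hypothetical, and the correlation is assumed here to be the Pearson coefficient, which the abstract does not state explicitly.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences (e.g. in ms)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_p)


if __name__ == "__main__":
    # Hypothetical segment durations in milliseconds, for illustration only.
    actual_ms = [62.0, 85.0, 110.0, 47.0, 130.0]
    predicted_ms = [70.0, 80.0, 95.0, 55.0, 120.0]
    print(f"RMSE: {rmse(actual_ms, predicted_ms):.2f} ms")
    print(f"Correlation: {pearson_r(actual_ms, predicted_ms):.2f}")
```

In an evaluation of this kind, both functions would be applied to held-out test segments, so that the reported figures reflect prediction quality on data the model did not see during training.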
| Type: | Thesis (Doctoral) |
|---|---|
| Qualification: | Ph.D |
| Title: | Analysis of the timing of spoken Korean with application to speech synthesis |
| Open access status: | An open access version is available from UCL Discovery |
| Language: | English |
| Additional information: | Thesis digitised by ProQuest. |
| Keywords: | Language, literature and linguistics; Korean; Phonetics; Speech |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10101452 |