UCL Discovery

Integrating speech and visual text in multimodal interfaces

Shmueli, Yael; (2005) Integrating speech and visual text in multimodal interfaces. Doctoral thesis, UCL (University College London). Green open access



This work systematically investigates when and how combining speech output and visual text may facilitate the processing and comprehension of sentences. It is proposed that a redundant multimodal presentation of speech and text has the potential to improve sentence processing but also to severely disrupt it. The effectiveness of the presentation is assumed to depend on the linguistic complexity of the sentence, the memory demands incurred by the selected multimodal configuration, and the characteristics of the user. The thesis employs both theoretical and empirical methods to examine this claim. On the theoretical front, the research makes explicit the features of multimodal sentence presentation and the structures and processes involved in multimodal language processing. Two entities are presented: a multimodal design space (MMDS) and a multimodal user model (MMUM). The dimensions of the MMDS include aspects of (i) the sentence (linguistic complexity; cf. Gibson, 1991), (ii) the presentation (configurations of media), and (iii) user cost (a function of the first two dimensions). The second entity, the MMUM, is a cognitive model of the user. The MMUM attempts to characterise the cognitive structures and processes underlying multimodal language processing, including the supervisory attentional mechanisms that coordinate the processing of language in parallel modalities. The model includes an account of individual differences in verbal working memory (WM) capacity (cf. Just and Carpenter, 1992) and can predict the variation in the cognitive cost experienced by the user when presented with different contents in a variety of multimodal configurations. The work attempts to validate the central propositions of the MMUM through three controlled user studies. The experimental findings support the validity of some features of the MMUM but also indicate the need for further refinement.
Overall, the findings suggest that a durable text may reduce the processing cost of demanding sentences delivered by speech, whereas adding speech to such sentences when they are presented visually increases processing cost. Speech can be added to various visual forms of text only if the linguistic complexity of the sentence imposes a low to moderate load on the user. These conclusions are translated into a set of guidelines for the effective multimodal presentation of sentences. A final study then examines the validity of some of these guidelines in an applied setting. The results highlight the need for enhanced experimental control. However, they also demonstrate that the approach used in this research can validate specific assumptions regarding the relationship between cognitive cost, sentence complexity and aspects of the multimodal configuration, and can thereby inform the design of effective multimodal user interfaces.

Type: Thesis (Doctoral)
Title: Integrating speech and visual text in multimodal interfaces
Identifier: PQ ETD:602613
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Thesis digitised by ProQuest.
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/1446688