Internet delivery of time-synchronised multimedia: the SCOTS project

Anderson, W; Beavan, D (2005) Internet delivery of time-synchronised multimedia: the SCOTS project. In: Proceedings of the Corpus Linguistics Conference. University of Birmingham: Birmingham, UK. (Green open access)

Full text: Internet delivery of time-synchronised multimedia the SCOTS project - David Beavan.pdf (PDF, 757kB)
Available under licence: see the attached licence file.

Abstract

The Scottish Corpus of Texts and Speech (SCOTS) Project at Glasgow University aims to make available over the Internet a 4-million-word multimedia corpus of texts in the languages of Scotland. Twenty percent of this final total will comprise spoken language, in a combination of audio and video material. Versions of SCOTS have been accessible on the Internet since November 2004, and regular additions are made to the Corpus as texts are processed and functionality is improved. While the Corpus is a valuable resource for research, our target users also include the general public, and this has important implications for the nature of the Corpus and website. This paper will begin with a general introduction to the SCOTS Project, and in particular to the nature of our data. The main part of the paper will then present the approach taken to spoken texts. Transcriptions are made using Praat (Boersma and Weenink, University of Amsterdam), which produces a time-based transcription and allows for multiple speakers through independent tiers. This output is then processed to produce a turn-based transcription with overlap and non-linguistic noises indicated. As this transcription is synchronised with the source audio/video material, it allows users direct access to any particular passage of the recording, possibly based upon a word query. This process and the end result will be demonstrated and discussed. We shall end by considering the value which is added to an Internet-delivered Corpus by these means of treating spoken text. The advantages include the possibility of returning search results from both written texts and multimedia documents; the easy location of the relevant section of the audio file; and the production through Praat of a turn-based orthographic transcription, which is accessible to a general as well as an academic user. These techniques can also be extended to other research requirements, such as the mark-up of gesture in video texts.
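
The conversion from a tiered, time-based transcription to a turn-based one, as outlined in the abstract, might be sketched roughly as follows. This is an illustrative sketch only, not the SCOTS Project's actual processing code: the in-memory `Tiers` structure, the `Turn` class, the function names and the sample utterances are all assumptions made for the example, and reading a real Praat file is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical in-memory form of a tiered, time-based transcription:
# speaker label -> list of (start_seconds, end_seconds, text) intervals.
Tiers = Dict[str, List[Tuple[float, float, str]]]


@dataclass
class Turn:
    speaker: str
    start: float
    end: float
    text: str
    overlaps_previous: bool = False


def tiers_to_turns(tiers: Tiers) -> List[Turn]:
    """Flatten per-speaker tiers into one list of turns ordered by start time."""
    turns = [
        Turn(speaker, start, end, text.strip())
        for speaker, intervals in tiers.items()
        for start, end, text in intervals
        if text.strip()  # skip empty (silent) intervals
    ]
    turns.sort(key=lambda t: (t.start, t.end))
    latest_end = float("-inf")
    for turn in turns:
        # A turn overlaps if it starts before all earlier speech has finished.
        turn.overlaps_previous = turn.start < latest_end
        latest_end = max(latest_end, turn.end)
    return turns


def find_word(turns: List[Turn], word: str) -> List[Turn]:
    """Return the turns containing `word` (case-insensitive, whole word)."""
    target = word.lower()
    return [
        t for t in turns
        if target in (w.strip(".,?!;:").lower() for w in t.text.split())
    ]


# Made-up sample data, not taken from the Corpus.
example: Tiers = {
    "F1": [(0.0, 2.5, "Aye, it was a braw day."),
           (2.5, 4.0, ""),
           (4.0, 6.0, "We went doon the watter.")],
    "M1": [(2.0, 3.8, "Was it?"),
           (6.2, 7.0, "[laugh]")],
}

turns = tiers_to_turns(example)
hits = find_word(turns, "watter")
for hit in hits:
    # The start time is what a player would seek to in the recording.
    print(f"{hit.speaker} at {hit.start:.1f}s: {hit.text}")
```

Because each turn keeps its start time, a word query can return an offset into the audio or video file, which is the kind of direct access to a passage that the abstract describes. Bracketed tokens such as `[laugh]` stand in here for non-linguistic noises; the notation is an assumption of this sketch.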

Type: Proceedings paper
Title: Internet delivery of time-synchronised multimedia: the SCOTS project
Event: Corpus Linguistics 2005
Location: Birmingham, UK
Dates: 2005-07-14 - 2005-07-17
Open access status: An open access version is available from UCL Discovery
Publisher version: http://www.birmingham.ac.uk/research/activity/corp...
Language: English
Additional information: © The Author and University of Birmingham.
UCL classification: UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > UCL SLASH
URI: https://discovery.ucl.ac.uk/id/eprint/1404368