What might a corpus of parsed spoken data tell us about language?

Wallis, S. (2014) What might a corpus of parsed spoken data tell us about language? In: Veselovská, L. and Janebová, M. (eds.) Complex Visibles Out There: Proceedings of the Olomouc Linguistics Colloquium 2014: Language Use and Linguistic Structure (pp. 641-662). Olomouc, Czech Republic: Palacký University. Green open access.

Full text: corpus-language.pdf (921 kB)

Abstract

This paper summarises a methodological perspective towards corpus linguistics that is both unifying and critical. It emphasises that the processes involved in annotating corpora and carrying out research with corpora are fundamentally cyclic, i.e. they involve both bottom-up and top-down processes. Knowledge is necessarily partial and refutable. This perspective unifies ‘corpus-driven’ and ‘theory-driven’ research as two aspects of a single research cycle. We identify three distinct but linked cyclical processes: annotation, abstraction and analysis. These cycles exist at different levels and perform distinct tasks, but are linked together such that the output of one feeds the input of the next. This subdivision of research activity into integrated cycles is particularly important when working with spoken data. The act of transcription is itself an annotation, and decisions to structurally identify distinct sentences are best understood as integral to parsing. Spoken data should be preferred in linguistic research, but current corpora are dominated by large amounts of written text. We point out that this is not a necessary aspect of corpus linguistics and introduce two parsed corpora containing spoken transcriptions. We identify three types of evidence that can be obtained from a corpus: factual, frequency and interaction evidence, representing distinct logical statements about data. Each may exist at any level of the 3A (annotation, abstraction, analysis) hierarchy. Moreover, enriching the annotation of a corpus allows evidence to be drawn from those richer annotations. We demonstrate this by discussing the parsing of a corpus of spoken language data and two recent pieces of research that illustrate this perspective.

Type:	Book chapter
Title:	What might a corpus of parsed spoken data tell us about language?
ISBN:	8024443856
ISBN-13:	9788024443850
Open access status:	An open access version is available from UCL Discovery
Publisher version:	https://corplingstats.wordpress.com/2014/06/24/cor...
Language:	English
Additional information:	This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords:	Corpus linguistics, philosophy of science, epistemology, 3A cycle, parsing, speech.
UCL classification:	UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities > Dept of English Lang and Literature
URI:	https://discovery.ucl.ac.uk/id/eprint/1521977