Wallis, SA;
(2014)
What might a corpus of parsed spoken data tell us about language?
In: Veselovská, L and Janebová, M, (eds.)
Complex Visibles Out There: Proceedings of the Olomouc Linguistics Colloquium 2014: Language Use and Linguistic Structure.
(pp. 641-662).
Palacky University: Olomouc, Czech Republic.
Abstract
This paper summarises a methodological perspective towards corpus linguistics that is both unifying and critical. It emphasises that the processes involved in annotating corpora and carrying out research with corpora are fundamentally cyclic, i.e. involving both bottom-up and top-down processes. Knowledge is necessarily partial and refutable. This perspective unifies ‘corpus-driven’ and ‘theory-driven’ research as two aspects of a research cycle. We identify three distinct but linked cyclical processes: annotation, abstraction and analysis. These cycles exist at different levels and perform distinct tasks, but are linked together such that the output of one feeds the input of the next. This subdivision of research activity into integrated cycles is particularly important in the case of working with spoken data. The act of transcription is itself an annotation, and decisions to structurally identify distinct sentences are best understood as integral with parsing. Spoken data should be preferred in linguistic research, but current corpora are dominated by large amounts of written text. We point out that this is not a necessary aspect of corpus linguistics and introduce two parsed corpora containing spoken transcriptions. We identify three types of evidence that can be obtained from a corpus: factual, frequency and interaction evidence, representing distinct logical statements about data. Each may exist at any level of the 3A hierarchy. Moreover, enriching the annotation of a corpus allows evidence to be drawn based on those richer annotations. We demonstrate this by discussing the parsing of a corpus of spoken language data and two recent pieces of research that illustrate this perspective.
Type: | Book chapter |
---|---|
Title: | What might a corpus of parsed spoken data tell us about language? |
ISBN: | 8024443856 |
ISBN-13: | 9788024443850 |
Language: | English |
Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. |
Keywords: | Corpus linguistics, philosophy of science, epistemology, 3A cycle, parsing, speech. |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities > Dept of English Lang and Literature |
URI: | https://discovery.ucl.ac.uk/id/eprint/1521977 |