UCL Discovery

Explaining unintelligible words by means of their context

Pintér, B; Vörös, G; Szabo, Z; Lőrincz, A; (2013) Explaining unintelligible words by means of their context. In: (Proceedings) International Conference on Pattern Recognition Applications and Methods (ICPRAM). (pp. 382 - 387). Green open access

File: pinter13explaining.pdf (PDF, 92kB)
Available under License: See the attached licence file.

Abstract

Explaining unintelligible words is a practical problem for text obtained by optical character recognition or from the Web (e.g., text containing misspellings). Approaches to wikification, that is, enriching text by linking words to Wikipedia articles, could help solve this problem. However, existing wikification methods assume that the text is correct, so they cannot wikify erroneous text. Because of errors, the problem of disambiguation (identifying the appropriate article to link to) becomes large-scale: as the word to be disambiguated is unknown, the article to link to has to be selected from among hundreds, possibly thousands, of candidate articles. Existing approaches for the case where the word is known build upon the distributional hypothesis: words that occur in the same contexts tend to have similar meanings. The larger number of candidate articles aggravates the problem of spuriously similar contexts (two contexts that are similar but belong to different articles). We propose a method to overcome this difficulty by combining the distributional hypothesis with structured sparsity, a rapidly expanding area of research. Empirically, our approach based on structured sparsity compares favorably to various traditional classification methods.
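
The abstract does not spell out the formulation, so the following is only a minimal sketch of one way the idea could look, assuming a group-lasso objective: the columns of a dictionary `D` are context vectors of candidate articles, `groups` assigns each column to its article, and the context `x` of the unknown word is reconstructed from as few articles as possible. All names, the toy data, and the proximal-gradient solver are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def group_lasso(D, x, groups, lam=0.1, n_iter=500):
    """Minimise 0.5*||x - D a||^2 + lam * sum_g ||a_g||_2 by proximal gradient."""
    a = np.zeros(D.shape[1])
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)
    labels = np.unique(groups)
    for _ in range(n_iter):
        # Gradient step on the squared reconstruction error.
        z = a - step * (D.T @ (D @ a - x))
        # Block soft-thresholding: shrink each article's coefficients together,
        # so that whole articles are switched off rather than single contexts.
        for g in labels:
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            a[idx] = scale * z[idx]
    return a


def pick_article(D, x, groups, lam=0.1):
    """Return the candidate article whose coefficient group has the largest norm."""
    a = group_lasso(D, x, groups, lam)
    labels = np.unique(groups)
    norms = [np.linalg.norm(a[groups == g]) for g in labels]
    return labels[int(np.argmax(norms))]


# Hypothetical toy data: 6 context features, 3 candidate articles with
# 2 example contexts (columns) each.
D = np.array([
    [1.0, 0.8, 0.0, 0.0, 0.0, 0.1],
    [0.5, 1.0, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.1, 1.0, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.6, 1.0, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.1, 1.0, 0.6],
    [0.1, 0.0, 0.0, 0.0, 0.5, 1.0],
])
groups = np.array([0, 0, 1, 1, 2, 2])
# Context of the unintelligible word; it resembles the contexts of article 1.
x = np.array([0.0, 0.05, 0.9, 0.8, 0.05, 0.0])

best = pick_article(D, x, groups)
```

In this sketch the group penalty is what addresses spuriously similar contexts: a single context from the wrong article that happens to resemble `x` is penalised unless the rest of that article's contexts also help explain `x`.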

Type: Proceedings paper
Title: Explaining unintelligible words by means of their context
Event: International Conference on Pattern Recognition Applications and Methods (ICPRAM)
Location: Barcelona, Spain
Dates: 2013-02-15 - 2013-02-18
Open access status: An open access version is available from UCL Discovery
Publisher version: http://www.icpram.org/
Language: English
Additional information: Reproduced here with permission of the ICPRAM.
Keywords: link disambiguation, natural language processing, structured sparse coding, unintelligible words, wikification
UCL classification: UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Gatsby Computational Neurosci Unit
URI: https://discovery.ucl.ac.uk/id/eprint/1433137
Downloads since deposit: 147