Distantly Supervised Web Relation Extraction for Knowledge Base Population

Augenstein, I; Maynard, D; Ciravegna, F (2016) Distantly Supervised Web Relation Extraction for Knowledge Base Population. Semantic Web Journal, 7 (4), pp. 335-349. 10.3233/SW-150180. Green open access

Text: Augenstein_sw%252F2016%252F7-4%252Fsw-7-4-sw180%252Fsw-7-sw180.pdf - Published Version (180kB)

Abstract

Extracting information from Web pages for populating large, cross-domain knowledge bases requires methods which are suitable across domains, do not require manual effort to adapt to new domains, are able to deal with noise, and integrate information extracted from different Web pages. Recent approaches have used existing knowledge bases to learn to extract information with promising results, one of those approaches being distant supervision. Distant supervision is an unsupervised method which uses background information from the Linking Open Data cloud to automatically label sentences with relations to create training data for relation classifiers. In this paper we propose the use of distant supervision for relation extraction from the Web. Although the method is promising, existing approaches are still not suitable for Web extraction as they suffer from three main issues: data sparsity, noise and lexical ambiguity. Our approach reduces the impact of data sparsity by making entity recognition tools more robust across domains and extracting relations across sentence boundaries using unsupervised co-reference resolution methods. We reduce the noise caused by lexical ambiguity by employing statistical methods to strategically select training data. To combine information extracted from multiple sources for populating knowledge bases we present and evaluate several information integration strategies and show that those benefit immensely from additional relation mentions extracted using co-reference resolution, increasing precision by 8%. We further show that strategically selecting training data can increase precision by a further 3%.
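
The labelling step the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the toy knowledge base, the example sentences, and the function name `label_sentences` are all hypothetical, and plain substring matching stands in for the entity recognition and co-reference resolution the paper relies on.

```python
# Hypothetical toy knowledge base: (subject, object) -> relation name.
# A real system would draw these facts from a resource such as Freebase.
KB = {
    ("The Beatles", "Let It Be"): "album",
    ("The Beatles", "Liverpool"): "origin",
}


def label_sentences(sentences, kb):
    """Label every sentence that mentions both entities of a known fact.

    Returns a list of (sentence, subject, object, relation) tuples that
    could serve as automatically created training examples.
    """
    labelled = []
    for sentence in sentences:
        for (subj, obj), relation in kb.items():
            # Naive substring matching (an assumption of this sketch).
            if subj in sentence and obj in sentence:
                labelled.append((sentence, subj, obj, relation))
    return labelled


SENTENCES = [
    "Let It Be is the twelfth album by The Beatles.",
    "The Beatles were formed in Liverpool in 1960.",
    "Liverpool is a city in England.",
]

TRAINING_DATA = label_sentences(SENTENCES, KB)
```

Because a sentence is labelled whenever both entity names co-occur, an ambiguous name (for example, a title shared by an album and a song) would yield a wrongly labelled example. This is the kind of noise the abstract says is reduced by selecting training data strategically.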

Type: Article
Title: Distantly Supervised Web Relation Extraction for Knowledge Base Population
Open access status: An open access version is available from UCL Discovery
DOI: 10.3233/SW-150180
Publisher version: http://dx.doi.org/10.3233/SW-150180
Language: English
Additional information: © 2016 IOS Press and the authors. This article is published online with Open Access and distributed under the terms of the Creative Commons Attribution Non-Commercial License.
Keywords: Knowledge base population, distant supervision, relation extraction, Web-based methods, Linked Open Data, Freebase, unsupervised learning, natural language processing
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/1547797