Distantly Supervised Web Relation Extraction for Knowledge Base Population
Isabelle Augenstein and Diana Maynard and Fabio Ciravegna

swj1049.pdf 229.54kB
Type: Paper

Bibtex:
@article{,
title= {Distantly Supervised Web Relation Extraction for Knowledge Base Population},
journal= {Semantic Web},
author= {Isabelle Augenstein and Diana Maynard and Fabio Ciravegna},
year= {2015},
url= {http://www.semantic-web-journal.net/content/distantly-supervised-web-relation-extraction-knowledge-base-population-0},
license= {},
abstract= {Extracting information from Web pages for populating large, cross-domain knowledge bases requires methods which are suitable across domains, do not require manual effort to adapt to new domains, are able to deal with noise, and integrate information extracted from different Web pages. Recent approaches have used existing knowledge bases to learn to extract information with promising results, one of those approaches being distant supervision. Distant supervision is an unsupervised method which uses background information from the Linking Open Data cloud to automatically label sentences with relations to create training data for relation classifiers. In this paper we propose the use of distant supervision for relation extraction from the Web. Although the method is promising, existing approaches are still not suitable for Web extraction as they suffer from three main issues: data sparsity, noise and lexical ambiguity. Our approach reduces the impact of data sparsity by making entity recognition tools more robust across domains and extracting relations across sentence boundaries using unsupervised co-reference resolution methods. We reduce the noise caused by lexical ambiguity by employing statistical methods to strategically select training data. To combine information extracted from multiple sources for populating knowledge bases we present and evaluate several information integration strategies and show that those benefit immensely from additional relation mentions extracted using co-reference resolution, increasing precision by 8%. We further show that strategically selecting training data can increase precision by a further 3%.},
keywords= {},
terms= {}
}