LILLIE : information extraction and database integration using linguistics and learning-based algorithms

Smith, Ellery; Papadopoulos, Dimitris; Braschler, Martin; Stockinger, Kurt

doi:10.1016/j.is.2021.101938

Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-23604

Full metadata record

DC Field	Value	Language
dc.contributor.author	Smith, Ellery	-
dc.contributor.author	Papadopoulos, Dimitris	-
dc.contributor.author	Braschler, Martin	-
dc.contributor.author	Stockinger, Kurt	-
dc.date.accessioned	2021-11-29T14:12:37Z	-
dc.date.available	2021-11-29T14:12:37Z	-
dc.date.issued	2021	-
dc.identifier.issn	0306-4379	de_CH
dc.identifier.uri	https://digitalcollection.zhaw.ch/handle/11475/23604	-
dc.description.abstract	Querying both structured and unstructured data via a single common query interface such as SQL or natural language has been a long-standing research goal. Moreover, as methods for extracting information from unstructured data become ever more powerful, the desire to integrate the output of such extraction processes with "clean", structured data grows. We are convinced that for successful integration into databases, such extracted information in the form of "triples" needs both 1) to be of high quality and 2) to have the necessary generality to link up with varying forms of structured data. It is in the combination of these two aspects, which have heretofore usually been treated in isolation, that our approach breaks new ground. The cornerstone of our work is a novel, generic method for extracting open information triples from unstructured text, using a combination of linguistics and learning-based extraction methods, thus uniquely balancing both precision and recall. Our system, called LILLIE (LInked Linguistics and Learning-Based Information Extractor), uses dependency tree modification rules to refine triples from a high-recall learning-based engine, and combines them with syntactic triples from a high-precision engine to increase effectiveness. In addition, our system features several augmentations, which modify the generality and the degree of granularity of the output triples. Even though our focus is on addressing both quality and generality simultaneously, our new method substantially outperforms current state-of-the-art systems on the two widely used CaRB and Re-OIE16 benchmark sets for information extraction.	de_CH
dc.language.iso	en	de_CH
dc.publisher	Elsevier	de_CH
dc.relation.ispartof	Information Systems	de_CH
dc.rights	http://creativecommons.org/licenses/by/4.0/	de_CH
dc.subject	Information extraction	de_CH
dc.subject	Data integration	de_CH
dc.subject	Machine learning for database systems	de_CH
dc.subject.ddc	006: Special computer methods	de_CH
dc.title	LILLIE : information extraction and database integration using linguistics and learning-based algorithms	de_CH
dc.type	Article in scientific journal	de_CH
dcterms.type	Text	de_CH
zhaw.departement	School of Engineering	de_CH
zhaw.organisationalunit	Institut für Informatik (InIT)	de_CH
dc.identifier.doi	10.1016/j.is.2021.101938	de_CH
dc.identifier.doi	10.21256/zhaw-23604	-
zhaw.funding.eu	info:eu-repo/grantAgreement/EC/H2020/863410//INODE - Intelligent Open Data Exploration/INODE	de_CH
zhaw.originated.zhaw	Yes	de_CH
zhaw.publication.status	publishedVersion	de_CH
zhaw.volume	105	de_CH
zhaw.publication.review	Peer review (publication)	de_CH
zhaw.webfeed	Datalab	de_CH
zhaw.webfeed	Information Engineering	de_CH
zhaw.webfeed	ZHAW digital	de_CH
zhaw.funding.zhaw	INODE – Intelligent Open Data Exploration (EU Horizon 2020)	de_CH
zhaw.author.additional	No	de_CH
zhaw.display.portrait	Yes	de_CH
Appears in collections:	Publications School of Engineering

Files in This Item:

File	Description	Size	Format
2021_Smith_LILLIE_InformationSystems.pdf		2.36 MB	Adobe PDF

Smith, E., Papadopoulos, D., Braschler, M., & Stockinger, K. (2021). LILLIE : information extraction and database integration using linguistics and learning-based algorithms. Information Systems, 105. https://doi.org/10.1016/j.is.2021.101938

Smith, E. et al. (2021) ‘LILLIE : information extraction and database integration using linguistics and learning-based algorithms’, Information Systems, 105. Available at: https://doi.org/10.1016/j.is.2021.101938.

E. Smith, D. Papadopoulos, M. Braschler, and K. Stockinger, “LILLIE : information extraction and database integration using linguistics and learning-based algorithms,” Information Systems, vol. 105, 2021, doi: 10.1016/j.is.2021.101938.

SMITH, Ellery, Dimitris PAPADOPOULOS, Martin BRASCHLER and Kurt STOCKINGER, 2021. LILLIE : information extraction and database integration using linguistics and learning-based algorithms. Information Systems. 2021. Vol. 105. DOI 10.1016/j.is.2021.101938

Smith, Ellery, Dimitris Papadopoulos, Martin Braschler, and Kurt Stockinger. 2021. “LILLIE : Information Extraction and Database Integration Using Linguistics and Learning-Based Algorithms.” Information Systems 105. https://doi.org/10.1016/j.is.2021.101938.

Smith, Ellery, et al. “LILLIE : Information Extraction and Database Integration Using Linguistics and Learning-Based Algorithms.” Information Systems, vol. 105, 2021, https://doi.org/10.1016/j.is.2021.101938.