LILLIE : information extraction and database integration using linguistics and learning-based algorithms

Smith, Ellery; Papadopoulos, Dimitris; Braschler, Martin; Stockinger, Kurt

doi:10.1016/j.is.2021.101938

Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-23604

Publication type:	Article in scientific journal
Type of review:	Peer review (publication)
Title:	LILLIE : information extraction and database integration using linguistics and learning-based algorithms
Authors:	Smith, Ellery; Papadopoulos, Dimitris; Braschler, Martin; Stockinger, Kurt
et al.:	No
DOI:	10.1016/j.is.2021.101938; 10.21256/zhaw-23604
Published in:	Information Systems
Volume(Issue):	105
Issue Date:	2021
Publisher / Ed. Institution:	Elsevier
ISSN:	0306-4379
Language:	English
Subjects:	Information extraction; Data integration; Machine learning for database systems
Subject (DDC):	006: Special computer methods
Abstract:	Querying both structured and unstructured data via a single common query interface such as SQL or natural language has been a long-standing research goal. Moreover, as methods for extracting information from unstructured data become ever more powerful, the desire to integrate the output of such extraction processes with “clean”, structured data grows. We are convinced that for successful integration into databases, such extracted information in the form of “triples” needs both 1) to be of high quality and 2) to have the necessary generality to link up with varying forms of structured data. It is the combination of both these aspects, which heretofore have usually been treated in isolation, where our approach breaks new ground. The cornerstone of our work is a novel, generic method for extracting open information triples from unstructured text, using a combination of linguistics and learning-based extraction methods, thus uniquely balancing both precision and recall. Our system, called LILLIE (LInked Linguistics and Learning-Based Information Extractor), uses dependency tree modification rules to refine triples from a high-recall learning-based engine, and combines them with syntactic triples from a high-precision engine to increase effectiveness. In addition, our system features several augmentations, which modify the generality and the degree of granularity of the output triples. Even though our focus is on addressing both quality and generality simultaneously, our new method substantially outperforms current state-of-the-art systems on the two widely used CaRB and Re-OIE16 benchmark sets for information extraction.
URI:	https://digitalcollection.zhaw.ch/handle/11475/23604
Fulltext version:	Published version
License (according to publishing contract):	CC BY 4.0: Attribution 4.0 International
Department:	School of Engineering
Organisational Unit:	Institute of Computer Science (InIT)
Published as part of the ZHAW project:	INODE – Intelligent Open Data Exploration (EU Horizon 2020)
Appears in collections:	Publikationen School of Engineering

Files in This Item:

File	Description	Size	Format
2021_Smith_LILLIE_InformationSystems.pdf		2.36 MB	Adobe PDF

Smith, E., Papadopoulos, D., Braschler, M., & Stockinger, K. (2021). LILLIE : information extraction and database integration using linguistics and learning-based algorithms. Information Systems, 105. https://doi.org/10.1016/j.is.2021.101938

Smith, E. et al. (2021) ‘LILLIE : information extraction and database integration using linguistics and learning-based algorithms’, Information Systems, 105. Available at: https://doi.org/10.1016/j.is.2021.101938.

E. Smith, D. Papadopoulos, M. Braschler, and K. Stockinger, “LILLIE : information extraction and database integration using linguistics and learning-based algorithms,” Information Systems, vol. 105, 2021, doi: 10.1016/j.is.2021.101938.

SMITH, Ellery, Dimitris PAPADOPOULOS, Martin BRASCHLER and Kurt STOCKINGER, 2021. LILLIE : information extraction and database integration using linguistics and learning-based algorithms. Information Systems. 2021. Vol. 105. DOI 10.1016/j.is.2021.101938

Smith, Ellery, Dimitris Papadopoulos, Martin Braschler, and Kurt Stockinger. 2021. “LILLIE : Information Extraction and Database Integration Using Linguistics and Learning-Based Algorithms.” Information Systems 105. https://doi.org/10.1016/j.is.2021.101938.

Smith, Ellery, et al. “LILLIE : Information Extraction and Database Integration Using Linguistics and Learning-Based Algorithms.” Information Systems, vol. 105, 2021, https://doi.org/10.1016/j.is.2021.101938.