Please use this identifier to cite or link to this item:
https://doi.org/10.21256/zhaw-30173
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhang, Yi | - |
dc.contributor.author | Deriu, Jan Milan | - |
dc.contributor.author | Katsogiannis-Meimarakis, George | - |
dc.contributor.author | Kosten, Catherine | - |
dc.contributor.author | Koutrika, Georgia | - |
dc.contributor.author | Stockinger, Kurt | - |
dc.date.accessioned | 2024-03-09T19:25:17Z | - |
dc.date.available | 2024-03-09T19:25:17Z | - |
dc.date.issued | 2024-03 | - |
dc.identifier.issn | 2150-8097 | de_CH |
dc.identifier.uri | https://digitalcollection.zhaw.ch/handle/11475/30173 | - |
dc.description.abstract | Natural Language to SQL systems (NL-to-SQL) have recently shown improved accuracy (exceeding 80%) for natural language to SQL query translation due to the emergence of transformer-based language models, and the popularity of the Spider benchmark. However, Spider mainly contains simple databases with few tables, columns, and entries, which do not reflect a realistic setting. Moreover, complex real-world databases with domain-specific content have little to no training data available in the form of NL/SQL-pairs leading to poor performance of existing NL-to-SQL systems. In this paper, we introduce ScienceBenchmark, a new complex NL-to-SQL benchmark for three real-world, highly domain-specific databases. For this new benchmark, SQL experts and domain experts created high-quality NL/SQL-pairs for each domain. To garner more data, we extended the small amount of human-generated data with synthetic data generated using GPT-3. We show that our benchmark is highly challenging, as the top performing systems on Spider achieve a very low performance on our benchmark. Thus, the challenge is many-fold: creating NL-to-SQL systems for highly complex domains with a small amount of hand-made training data augmented with synthetic data. To our knowledge, ScienceBenchmark is the first NL-to-SQL benchmark designed with complex real-world scientific databases, containing challenging training and test data carefully validated by domain experts. | de_CH |
dc.language.iso | en | de_CH |
dc.publisher | Association for Computing Machinery | de_CH |
dc.relation.ispartof | Proceedings of the VLDB Endowment | de_CH |
dc.rights | https://creativecommons.org/licenses/by-nc-nd/4.0/ | de_CH |
dc.subject | Database system | de_CH |
dc.subject | Natural language processing | de_CH |
dc.subject | Machine learning | de_CH |
dc.subject | Large language model | de_CH |
dc.subject.ddc | 005: Computer programming, programs and data | de_CH |
dc.subject.ddc | 006: Special computer methods | de_CH |
dc.title | ScienceBenchmark : a complex real-world benchmark for evaluating natural language to SQL systems | de_CH |
dc.type | Article in scientific journal | de_CH |
dcterms.type | Text | de_CH |
zhaw.departement | School of Engineering | de_CH |
zhaw.organisationalunit | Centre for Artificial Intelligence (CAI) | de_CH |
zhaw.organisationalunit | Institut für Informatik (InIT) | de_CH |
dc.identifier.doi | 10.14778/3636218.3636225 | de_CH |
dc.identifier.doi | 10.21256/zhaw-30173 | - |
zhaw.funding.eu | info:eu-repo/grantAgreement/EC/H2020/863410//INODE - Intelligent Open Data Exploration/INODE | de_CH |
zhaw.issue | 4 | de_CH |
zhaw.originated.zhaw | Yes | de_CH |
zhaw.pages.end | 698 | de_CH |
zhaw.pages.start | 685 | de_CH |
zhaw.publication.status | publishedVersion | de_CH |
zhaw.volume | 17 | de_CH |
zhaw.publication.review | Peer review (publication) | de_CH |
zhaw.webfeed | Datalab | de_CH |
zhaw.webfeed | Intelligent Information Systems | de_CH |
zhaw.webfeed | Natural Language Processing | de_CH |
zhaw.funding.zhaw | INODE – Intelligent Open Data Exploration (EU Horizon 2020) | de_CH |
zhaw.author.additional | No | de_CH |
zhaw.display.portrait | Yes | de_CH |
Appears in collections: | Publikationen School of Engineering |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
2024_Zhang-etal_ScienceBenchmark-PVLDB2024.pdf | | 608.6 kB | Adobe PDF |
Zhang, Y., Deriu, J. M., Katsogiannis-Meimarakis, G., Kosten, C., Koutrika, G., & Stockinger, K. (2024). ScienceBenchmark : a complex real-world benchmark for evaluating natural language to SQL systems. Proceedings of the VLDB Endowment, 17(4), 685–698. https://doi.org/10.14778/3636218.3636225
Zhang, Y. et al. (2024) ‘ScienceBenchmark : a complex real-world benchmark for evaluating natural language to SQL systems’, Proceedings of the VLDB Endowment, 17(4), pp. 685–698. Available at: https://doi.org/10.14778/3636218.3636225.
Y. Zhang, J. M. Deriu, G. Katsogiannis-Meimarakis, C. Kosten, G. Koutrika, and K. Stockinger, “ScienceBenchmark : a complex real-world benchmark for evaluating natural language to SQL systems,” Proceedings of the VLDB Endowment, vol. 17, no. 4, pp. 685–698, Mar. 2024, doi: 10.14778/3636218.3636225.
ZHANG, Yi, Jan Milan DERIU, George KATSOGIANNIS-MEIMARAKIS, Catherine KOSTEN, Georgia KOUTRIKA und Kurt STOCKINGER, 2024. ScienceBenchmark : a complex real-world benchmark for evaluating natural language to SQL systems. Proceedings of the VLDB Endowment. März 2024. Bd. 17, Nr. 4, S. 685–698. DOI 10.14778/3636218.3636225
Zhang, Yi, Jan Milan Deriu, George Katsogiannis-Meimarakis, Catherine Kosten, Georgia Koutrika, and Kurt Stockinger. 2024. “ScienceBenchmark : A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems.” Proceedings of the VLDB Endowment 17 (4): 685–98. https://doi.org/10.14778/3636218.3636225.
Zhang, Yi, et al. “ScienceBenchmark : A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems.” Proceedings of the VLDB Endowment, vol. 17, no. 4, Mar. 2024, pp. 685–98, https://doi.org/10.14778/3636218.3636225.