Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-30993
Publication type: Conference paper
Type of review: Peer review (publication)
Title: StatBot.Swiss : bilingual open data exploration in natural language
Authors: Nooralahzadeh, Farhad
Zhang, Yi
Smith, Ellery
Maennel, Sabine
Matthey-Doret, Cyril
de Fondville, Raphaël
Stockinger, Kurt
et al.: No
DOI: 10.21256/zhaw-30993
Proceedings: Findings of the Association for Computational Linguistics: ACL 2024
Conference details: 62nd Annual Meeting of the Association for Computational Linguistics (ACL), Bangkok, Thailand, 11-16 August 2024
Issue Date: Aug-2024
Publisher / Ed. Institution: Association for Computational Linguistics
Language: English
Subjects: Natural language processing; Machine learning; Database; Generative AI
Subject (DDC): 005: Computer programming, programs and data
006: Special computer methods
Abstract: The potential for improvements brought by Large Language Models (LLMs) in Text-to-SQL systems is mostly assessed on monolingual English datasets. However, LLMs' performance for other languages remains largely unexplored. In this work, we release the StatBot.Swiss dataset, the first bilingual benchmark for evaluating Text-to-SQL systems based on real-world applications. The StatBot.Swiss dataset contains 455 natural language/SQL pairs over 35 large databases with varying levels of complexity for both English and German. We evaluate the performance of state-of-the-art LLMs such as GPT-3.5-Turbo and mixtral-8x7b-instruct for the Text-to-SQL translation task using an in-context learning approach. Our experimental analysis illustrates that current LLMs struggle to generalize well in generating SQL queries on our novel bilingual dataset.
URI: https://digitalcollection.zhaw.ch/handle/11475/30993
Related research data: https://github.com/dscc-admin-ch/statbot.swiss
Fulltext version: Accepted version
License (according to publishing contract): Licence according to publishing contract
Departement: School of Engineering
Organisational Unit: Institute of Computer Science (InIT)
Published as part of the ZHAW project: INODE4StatBot.swiss – Anwendung neuer Algorithmen zur automatischen Übersetzung natürlicher Sprache in die Datenbankabfragesprache SQL (NL-to-SQL)
Appears in collections: Publikationen School of Engineering

Files in This Item:
File: 2024_Nooralahzadeh-etal_StatBot-Swiss_ACL2024.pdf
Description: Accepted Version
Size: 882.9 kB
Format: Adobe PDF
Nooralahzadeh, F., Zhang, Y., Smith, E., Maennel, S., Matthey-Doret, C., de Fondville, R., & Stockinger, K. (2024, August). StatBot.Swiss : bilingual open data exploration in natural language. Findings of the Association for Computational Linguistics: ACL 2024. https://doi.org/10.21256/zhaw-30993
Nooralahzadeh, F. et al. (2024) ‘StatBot.Swiss : bilingual open data exploration in natural language’, in Findings of the Association for Computational Linguistics: ACL 2024. Association for Computational Linguistics. Available at: https://doi.org/10.21256/zhaw-30993.
F. Nooralahzadeh et al., “StatBot.Swiss : bilingual open data exploration in natural language,” in Findings of the Association for Computational Linguistics: ACL 2024, Aug. 2024. doi: 10.21256/zhaw-30993.
NOORALAHZADEH, Farhad, Yi ZHANG, Ellery SMITH, Sabine MAENNEL, Cyril MATTHEY-DORET, Raphaël DE FONDVILLE and Kurt STOCKINGER, 2024. StatBot.Swiss : bilingual open data exploration in natural language. In: Findings of the Association for Computational Linguistics: ACL 2024. Conference paper. Association for Computational Linguistics. August 2024.
Nooralahzadeh, Farhad, Yi Zhang, Ellery Smith, Sabine Maennel, Cyril Matthey-Doret, Raphaël de Fondville, and Kurt Stockinger. 2024. “StatBot.Swiss : Bilingual Open Data Exploration in Natural Language.” Conference paper. In Findings of the Association for Computational Linguistics: ACL 2024. Association for Computational Linguistics. https://doi.org/10.21256/zhaw-30993.
Nooralahzadeh, Farhad, et al. “StatBot.Swiss : Bilingual Open Data Exploration in Natural Language.” Findings of the Association for Computational Linguistics: ACL 2024, Association for Computational Linguistics, 2024, https://doi.org/10.21256/zhaw-30993.