Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-30279
Full metadata record
DC Field | Value | Language
dc.contributor.author | Tuggener, Lukas | -
dc.contributor.author | Sager, Pascal | -
dc.contributor.author | Taoudi-Benchekroun, Yassine | -
dc.contributor.author | Grewe, Benjamin F. | -
dc.contributor.author | Stadelmann, Thilo | -
dc.date.accessioned | 2024-03-16T09:51:58Z | -
dc.date.available | 2024-03-16T09:51:58Z | -
dc.date.issued | 2024-05-31 | -
dc.identifier.uri | https://digitalcollection.zhaw.ch/handle/11475/30279 | -
dc.description.abstract | At least since the introduction of ChatGPT, the abilities of generative large language models (LLMs), sometimes called GPTs, are at the center of the attention of AI researchers, entrepreneurs, and others. However, for many applications, it is not possible to call an existing LLM service via an API due to data protection concerns or when no task-appropriate LLM exists. On the other hand, deploying or training a private LLM is often prohibitively computationally expensive. In this paper, we give an overview of the most important recent methodologies that help reduce the computational footprint of LLMs. We further present extensive benchmarks for seven methods from two of the most important areas of recent progress: model quantization and low-rank adapters, showcasing how it is possible to leverage state-of-the-art LLMs with limited resources. Our benchmarks include resource consumption metrics (e.g. GPU memory usage), a state-of-the-art quantitative performance evaluation as well as a qualitative performance study conducted by eight individual human raters. Our evaluations show that quantization has a profound effect on GPU memory requirements. However, we also show that these quantization methods, contrary to how they are advertised, cause a noticeable loss in text quality. We further show that low-rank adapters allow effective model fine-tuning with moderate compute resources. For methods that require less than 16 GB of GPU memory, we provide easy-to-use Jupyter notebooks that allow anyone to deploy and fine-tune state-of-the-art LLMs on the Google Colab free tier within minutes without any prior experience or infrastructure. | de_CH
dc.language.iso | en | de_CH
dc.publisher | ZHAW Zürcher Hochschule für Angewandte Wissenschaften | de_CH
dc.rights | Licence according to publishing contract | de_CH
dc.subject | Large language model | de_CH
dc.subject | LlamaV2 | de_CH
dc.subject | Fine-tuning | de_CH
dc.subject | LLM quantization | de_CH
dc.subject | LLM deployment | de_CH
dc.subject.ddc | 006: Special computer methods | de_CH
dc.title | So you want your private LLM at home? : a survey and benchmark of methods for efficient GPTs | de_CH
dc.type | Conference: Paper | de_CH
dcterms.type | Text | de_CH
zhaw.departement | School of Engineering | de_CH
zhaw.organisationalunit | Centre for Artificial Intelligence (CAI) | de_CH
dc.identifier.doi | 10.21256/zhaw-30279 | -
zhaw.conference.details | 11th IEEE Swiss Conference on Data Science (SDS), Zurich, Switzerland, 30-31 May 2024 | de_CH
zhaw.funding.eu | No | de_CH
zhaw.originated.zhaw | Yes | de_CH
zhaw.publication.status | acceptedVersion | de_CH
zhaw.publication.review | Peer review (publication) | de_CH
zhaw.webfeed | Datalab | de_CH
zhaw.webfeed | DIZH Fellowship | de_CH
zhaw.webfeed | Machine Perception and Cognition | de_CH
zhaw.webfeed | ZHAW digital | de_CH
zhaw.funding.zhaw | Practical data-efficient deep learning through contrastive self-supervised learning | de_CH
zhaw.author.additional | No | de_CH
zhaw.display.portrait | Yes | de_CH
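The abstract above reports that quantization sharply reduces GPU memory at some cost in text quality, and that low-rank adapters enable fine-tuning with moderate compute. A minimal pure-Python sketch of both ideas, using toy values of my own (absmax integer quantization and a LoRA-style parameter count); this is an illustration of the general techniques, not the paper's benchmarked implementations:

```python
# Illustrative sketch: absmax integer quantization and the low-rank-adapter
# parameter saving. Toy numbers, not the paper's benchmarked methods.

def quantize_absmax(weights, bits=8):
    """Quantize floats to signed ints in [-(2**(bits-1)-1), 2**(bits-1)-1]."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax  # one scale per tensor
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.12, -0.51, 0.33, 0.97, -0.08]

q8, s8 = quantize_absmax(weights, bits=8)
err8 = max(abs(w - r) for w, r in zip(weights, dequantize(q8, s8)))

q4, s4 = quantize_absmax(weights, bits=4)
err4 = max(abs(w - r) for w, r in zip(weights, dequantize(q4, s4)))
# Fewer bits per weight mean less memory but a larger rounding error,
# mirroring the memory/quality trade-off the paper measures.

def lora_params(d_in, d_out, r):
    """Trainable parameters: full weight matrix vs. a rank-r adapter B @ A."""
    return d_in * d_out, r * (d_in + d_out)

full, adapter = lora_params(4096, 4096, r=8)  # hypothetical layer size
# The adapter trains only a small fraction of the full weight count.
```

With these toy numbers the 4-bit reconstruction error exceeds the 8-bit one, and the rank-8 adapter trains well under 1% of the parameters of the full 4096x4096 layer, which is why such adapters fit within modest GPU budgets.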
Appears in collections:Publikationen School of Engineering

Files in This Item:
File | Description | Size | Format
2024_Tuggener-etal_Survey-and-benchmark-of-methods-for-efficient-GPTs_SDS.pdf | | 1.99 MB | Adobe PDF
Tuggener, L., Sager, P., Taoudi-Benchekroun, Y., Grewe, B. F., & Stadelmann, T. (2024, May 31). So you want your private LLM at home? : a survey and benchmark of methods for efficient GPTs. 11th IEEE Swiss Conference on Data Science (SDS), Zurich, Switzerland, 30-31 May 2024. https://doi.org/10.21256/zhaw-30279


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.