Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-28571
Full metadata record
DC Field | Value | Language
dc.contributor.author | Mildenberger, Thoralf | -
dc.date.accessioned | 2023-09-01T13:16:30Z | -
dc.date.available | 2023-09-01T13:16:30Z | -
dc.date.issued | 2023-08 | -
dc.identifier.other | arXiv:2308.13383 | de_CH
dc.identifier.uri | https://digitalcollection.zhaw.ch/handle/11475/28571 | -
dc.description.abstract | We propose a resampling-based approach for assessing keyness in corpus linguistics based on suggestions by Gries (2006, 2022). Traditional approaches based on hypothesis tests (e.g. Likelihood Ratio) model the corpora as independent, identically distributed samples of tokens. This model does not account for the often observed uneven distribution of occurrences of a word across a corpus. When occurrences of a word are concentrated in few documents, large values of LLR and similar scores are in fact much more likely than accounted for by the token-by-token sampling model, leading to false positives. We replace the token-by-token sampling model by a model where corpora are samples of documents rather than tokens, which is much closer to the way corpora are actually assembled. We then use a permutation approach to approximate the distribution of a given keyness score under the null hypothesis of equal frequencies and obtain p-values for assessing significance. We do not need any assumption on how the tokens are organized within or across documents, and the approach works with basically *any* keyness score. Hence, apart from obtaining more accurate p-values for scores like LLR, we can also assess significance for e.g. the logratio, which has been proposed as a measure of effect size. An efficient implementation of the proposed approach is provided in the `R` package `keyperm`, available from GitHub. | de_CH
dc.format.extent | 15 | de_CH
dc.language.iso | en | de_CH
dc.publisher | arXiv | de_CH
dc.rights | Licence according to publishing contract | de_CH
dc.subject | Corpus linguistics | de_CH
dc.subject | Applied statistics | de_CH
dc.subject.ddc | 400: Language and linguistics | de_CH
dc.subject.ddc | 510: Mathematics | de_CH
dc.title | Assessing keyness using permutation tests | de_CH
dc.type | Working paper – expert report – study | de_CH
dcterms.type | Text | de_CH
zhaw.departement | School of Engineering | de_CH
zhaw.organisationalunit | Institut für Datenanalyse und Prozessdesign (IDP) | de_CH
dc.identifier.doi | 10.48550/arXiv.2308.13383 | de_CH
dc.identifier.doi | 10.21256/zhaw-28571 | -
zhaw.funding.eu | No | de_CH
zhaw.originated.zhaw | Yes | de_CH
zhaw.webfeed | Datalab | de_CH
zhaw.author.additional | No | de_CH
zhaw.display.portrait | Yes | de_CH
Appears in collections: Publikationen School of Engineering
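
The abstract contrasts a token-by-token sampling model with a model in which whole documents are the sampling units. The following is a minimal sketch of that idea, not the `keyperm` implementation and not its API. It assumes a four-cell log-likelihood ratio as the keyness score, a fixed number of documents per corpus, random reshuffling of document labels, and a p-value with the common "+1" correction. The function names `llr` and `permutation_pvalue`, their parameters, and the toy corpora are hypothetical and chosen only for illustration.

```python
import math
import random


def llr(a, b, n_a, n_b):
    """Four-cell log-likelihood ratio for a word occurring a times among
    n_a tokens in corpus A and b times among n_b tokens in corpus B."""
    def term(obs, exp):
        # By convention, a cell with zero observations contributes nothing.
        return obs * math.log(obs / exp) if obs > 0 else 0.0

    total, n = a + b, n_a + n_b
    e_a = n_a * total / n  # expected count in A under equal relative frequency
    e_b = n_b * total / n  # expected count in B under equal relative frequency
    return 2.0 * (term(a, e_a) + term(b, e_b)
                  + term(n_a - a, n_a - e_a) + term(n_b - b, n_b - e_b))


def permutation_pvalue(docs_a, docs_b, word, n_perm=999, seed=0):
    """Approximate the null distribution of the score by shuffling document
    labels (assumption: the number of documents per corpus stays fixed)."""
    # Each document is reduced to (count of the word, document length).
    summaries = [(doc.count(word), len(doc)) for doc in docs_a + docs_b]
    k = len(docs_a)

    def score(items):
        a = sum(c for c, _ in items[:k])
        n_a = sum(m for _, m in items[:k])
        b = sum(c for c, _ in items[k:])
        n_b = sum(m for _, m in items[k:])
        return llr(a, b, n_a, n_b)

    observed = score(summaries)
    rng = random.Random(seed)
    shuffled = summaries[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so that mathematically tied scores count as ties.
        if score(shuffled) >= observed - 1e-9:
            hits += 1
    # The "+1" counts the observed labelling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Toy corpora (hypothetical): all 20 occurrences of "x" sit in one document of A.
concentrated_a = [["x"] * 20] + [["y"] * 20 for _ in range(4)]
# Same total of 20 occurrences, but spread evenly over the five documents of A.
spread_a = [["x"] * 4 + ["y"] * 16 for _ in range(5)]
corpus_b = [["y"] * 20 for _ in range(5)]

score_conc, p_conc = permutation_pvalue(concentrated_a, corpus_b, "x")
score_spread, p_spread = permutation_pvalue(spread_a, corpus_b, "x")
```

Both toy corpora yield the same LLR score, because the token-level counts are identical. Under document-level permutation, the concentrated case is not significant, since moving the single document to the other corpus gives an equally extreme score, while the evenly spread case receives a small p-value.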

Files in This Item:
File | Description | Size | Format
2023_Mildenberger_Assessing-keyness-using-permutation-tests.pdf | | 595.35 kB | Adobe PDF
Mildenberger, T. (2023). Assessing keyness using permutation tests. arXiv. https://doi.org/10.48550/arXiv.2308.13383
Mildenberger, T. (2023) Assessing keyness using permutation tests. arXiv. Available at: https://doi.org/10.48550/arXiv.2308.13383.
T. Mildenberger, “Assessing keyness using permutation tests,” arXiv, Aug. 2023. doi: 10.48550/arXiv.2308.13383.
MILDENBERGER, Thoralf, 2023. Assessing keyness using permutation tests. arXiv
Mildenberger, Thoralf. 2023. “Assessing Keyness Using Permutation Tests.” arXiv. https://doi.org/10.48550/arXiv.2308.13383.
Mildenberger, Thoralf. Assessing Keyness Using Permutation Tests. arXiv, Aug. 2023, https://doi.org/10.48550/arXiv.2308.13383.

