DIACHRONIC CORPUS ALGORITHMS

Authors

  • Zilola Xusainova
  • Surayyo Yangibayeva

DOI:

https://doi.org/10.47390/SPR1342V5SI11Y2025N34

Keywords:

diachronic corpus, pipeline, sentence segmentation, preprocessing, metadata, NLP.

Abstract

This article describes the construction of a diachronic corpus of Uzbek fiction published between 1991 and 2021 and the algorithms used to process it. Following corpus-linguistic practice, the pipeline covers text collection, preprocessing, sentence segmentation, metadata creation, and verification. The result is a clean, standardized corpus of 116 works. The corpus algorithms make it possible to trace how linguistic units change over time, to run statistical analyses by genre and demographic characteristics, and to build n-gram models. The corpus is intended as a reliable resource for diachronic research on Uzbek and for applied NLP work.
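
The stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Work` record, its field names, the regex-based sentence splitter, the tokenizer, and the ten-year bucketing are all assumptions chosen for illustration, and a production pipeline for Uzbek would need a more careful treatment of abbreviations, quotations, and apostrophe variants.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Work:
    # Hypothetical metadata record; the field names are assumptions.
    title: str
    year: int
    genre: str
    text: str


def preprocess(text: str) -> str:
    # Collapse runs of whitespace (line breaks, tabs) into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> list[str]:
    # Naive rule: a sentence ends at . ! ? or … followed by whitespace.
    parts = re.split(r"(?<=[.!?…])\s+", text)
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    # Letter sequences, optionally joined by a word-internal apostrophe
    # (as in o‘zbek); word-final apostrophes are dropped in this sketch.
    pattern = r"[^\W\d_]+(?:[ʻʼ‘’'][^\W\d_]+)*"
    return re.findall(pattern, sentence.lower())


def ngram_counts_by_period(works, n=2, period=10):
    # Map each period start year (e.g. 1990, 2000) to a Counter of n-grams.
    # N-grams are counted within sentences and never span a boundary.
    counts = defaultdict(Counter)
    for work in works:
        bucket = (work.year // period) * period
        for sentence in split_sentences(preprocess(work.text)):
            tokens = tokenize(sentence)
            for i in range(len(tokens) - n + 1):
                counts[bucket][tuple(tokens[i:i + n])] += 1
    return dict(counts)


if __name__ == "__main__":
    # Toy input, not data from the corpus described in the article.
    sample = [
        Work("A", 1995, "novel", "Kitob yaxshi. Kitob yaxshi edi!"),
        Work("B", 2015, "story", "Yangi kitob keldi."),
    ]
    for start, counter in sorted(ngram_counts_by_period(sample).items()):
        print(start, counter.most_common(3))
```

Keeping the publication year in the metadata record is what allows the same n-gram counts to be compared across periods. Grouping by `genre` instead of by year would follow the same pattern.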

Submitted

2025-12-31

Published

2025-12-31

How to Cite

Xusainova, Z., & Yangibayeva, S. (2025). DIACHRONIC CORPUS ALGORITHMS. Ижтимоий-гуманитар фанларнинг долзарб муаммолари / Актуальные проблемы социально-гуманитарных наук / Actual Problems of Humanities and Social Sciences, 5(S/11), 223–228. https://doi.org/10.47390/SPR1342V5SI11Y2025N34