A Machine Learning Approach of Text Classification for High- and Low-Resource Languages

Raza, Muhammad Owais; Mahoto, Naeem Ahmed; Shaikh, Asadullah; Pathan, Nazia; Alshahrani, Hani Mohammed; Elmagzoub, Mohamed A.

doi:10.1111/coin.70114

dc.authorscopusid	57215599346
dc.authorscopusid	36628922900
dc.authorscopusid	35085432000
dc.authorscopusid	59060653400
dc.authorscopusid	57202577300
dc.authorscopusid	56028380900
dc.authorwosid	FWU-2100-2022
dc.authorwosid	FOF-9383-2022
dc.authorwosid	S-4815-2016
dc.authorwosid	OEB-8153-2025
dc.authorwosid	GQQ-1607-2022
dc.authorwosid	LLM-4686-2024
dc.contributor.author	Raza, Muhammad Owais
dc.contributor.author	Mahoto, Naeem Ahmed
dc.contributor.author	Shaikh, Asadullah
dc.contributor.author	Pathan, Nazia
dc.contributor.author	Alshahrani, Hani Mohammed
dc.contributor.author	Elmagzoub, Mohamed A.
dc.contributor.department-temp
dc.date.accessioned	2026-05-05T13:48:56Z
dc.date.issued	2025
dc.department	Faculty of Engineering and Natural Sciences
dc.description.abstract	Over the last decade, advances in information and communication technologies have led to a large amount of data being published online in textual form. Automatically organizing and classifying large amounts of textual data remains an open challenge, especially for languages with limited resources available online. In this study, two types of approaches are adopted for the experiments. The first is a traditional strategy that uses six classical state-of-the-art classification models (decision tree (DT), logistic regression (LR), support vector machine (SVM), k-nearest neighbour (k-NN), Naive Bayes (NB), and random forest (RF)) along with two ensemble methods (AdaBoost and gradient boosting (GB)). The second is our proposed voting-based ensembling scheme. Models are trained on a 75-25 split, where 75% of the data is used for training and 25% for testing. The classification models are evaluated on accuracy, precision, recall, and F1-score. The experimental outcomes show that, for the traditional approach, gradient boosting performed best for the limited-resource language with a 98.08% F1-score, while SVM performed best (97.34% F1-score) for the resource-rich language.
dc.identifier.citation	Raza, M. O., Mahoto, N. A., Shaikh, A., Pathan, N., Alshahrani, H., & Elmagzoub, M. A. (2025). A Machine Learning Approach of Text Classification for High‐ and Low‐Resource Languages. Computational Intelligence, 41(4). https://doi.org/10.1111/coin.70114
dc.identifier.doi	10.1111/coin.70114
dc.identifier.endpage	17
dc.identifier.issn	0824-7935
dc.identifier.issn	1467-8640
dc.identifier.issue	4
dc.identifier.scopus	2-s2.0-105013557512
dc.identifier.startpage	1
dc.identifier.uri	https://doi.org/10.1111/coin.70114
dc.identifier.uri	https://hdl.handle.net/20.500.12436/9492
dc.identifier.volume	41
dc.identifier.wos	001550494700001
dc.identifier.wosquality	Q3
dc.indekslendigikaynak	Web of Science
dc.language.iso	en
dc.publisher	Wiley
dc.relation.ispartof	Computational Intelligence
dc.relation.publicationcategory	Article - International Refereed Journal - Student
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	Deep learning
dc.subject	Implied threat detection
dc.subject	Machine learning
dc.subject	Natural language processing
dc.title	A Machine Learning Approach of Text Classification for High- and Low-Resource Languages
dc.type	Article
dspace.entity.type	Publication
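
The abstract above mentions a 75-25 train/test split, a voting-based ensemble, and evaluation by accuracy, precision, recall, and F1-score. The following is a minimal sketch of those three pieces in plain Python, not the authors' implementation. It assumes hard (majority) voting and macro-averaged metrics, neither of which the abstract specifies, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def split_75_25(items):
    """Return (train, test): the first 75% and the last 25% of a list.

    Assumption: no shuffling or stratification; the abstract only gives the ratio.
    """
    cut = int(len(items) * 0.75)
    return items[:cut], items[cut:]


def hard_vote(predictions_per_model):
    """Combine per-model label lists by majority vote, sample by sample.

    Assumption: hard voting; the abstract does not say hard or soft.
    """
    voted = []
    for sample_preds in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest count
        voted.append(Counter(sample_preds).most_common(1)[0][0])
    return voted


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy usage with made-up labels standing in for three trained classifiers.
y_true = ["sports", "politics", "sports", "politics"]
model_outputs = [
    ["sports", "politics", "politics", "politics"],
    ["sports", "sports", "sports", "politics"],
    ["politics", "politics", "sports", "politics"],
]
ensemble_pred = hard_vote(model_outputs)
ensemble_scores = macro_scores(y_true, ensemble_pred)
```

In this toy case each individual model makes one mistake, while the majority vote recovers the true label for every sample, which is the usual motivation for a voting ensemble.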

Files

Original bundle

Name: Raza-2025-A-machine-learning-approach-of-text.pdf
Size: 5.27 MB
Format: Adobe Portable Document Format
Description: Article file

Collections

Computer Engineering Department Collection
Scopus Indexed Publications Collection
WoS Indexed Publications Collection