A Machine Learning Approach of Text Classification for High- and Low-Resource Languages
Abstract
A large amount of textual data has been published online over the last decade owing to advances in information and communication technologies. Organizing and classifying large amounts of textual data automatically remains an open challenge, especially for a language with limited resources available online. In this study, two types of approaches are adopted for the experiments. The first is a traditional strategy that uses six classical state-of-the-art classification models (decision tree (DT), logistic regression (LR), support vector machine (SVM), k-nearest neighbour (k-NN), Naive Bayes (NB), and random forest (RF)) along with two ensemble methods (AdaBoost and gradient boosting (GB)). The second is our proposed voting-based ensembling scheme. Models are trained on a 75-25 split, where 75% of the data is used for training and 25% for testing. The classification models are evaluated on accuracy, precision, recall, and F1-score. The experimental results show that, for the traditional approach, gradient boosting performed best on the limited-resource language with a 98.08% F1-score, while SVM performed best (97.34% F1-score) on the resource-rich language.
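The evaluation setup described in the abstract (a 75-25 split, a voting-based ensemble, and accuracy, precision, recall, and F1-score) can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not say whether voting is hard or soft or how per-class scores are averaged, so the sketch assumes hard (majority) voting and scores a single positive class. The function names `split_75_25`, `majority_vote`, and `scores`, and the toy labels, are hypothetical.

```python
import random
from collections import Counter


def split_75_25(samples, seed=0):
    """Shuffle a copy of samples and return (train, test) in a 75/25 ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.75)
    return shuffled[:cut], shuffled[cut:]


def majority_vote(model_predictions):
    """Hard voting (an assumption): keep the label most models predicted.

    model_predictions holds one list of predicted labels per model,
    all aligned on the same test samples.
    """
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*model_predictions)]


def scores(y_true, y_pred, positive):
    """Accuracy plus precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy test labels and the outputs of three hypothetical base models.
y_true = ["sports", "sports", "politics", "politics"]
model_a = ["sports", "politics", "politics", "politics"]
model_b = ["sports", "sports", "politics", "sports"]
model_c = ["sports", "sports", "sports", "politics"]

ensemble = majority_vote([model_a, model_b, model_c])
ensemble_scores = scores(y_true, ensemble, positive="sports")
```

In this toy case each base model misclassifies a different sample, so the majority vote recovers every true label. This is the intuition behind combining classifiers, though real models rarely err so independently.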









