Comparative Study of Feature Selection Approaches for Urdu Text Categorization

Tehseen Zia; Muhammad Pervez Akhter; Qaiser Abbas

Published: Jun 1, 2015

Keywords:

Text Categorization; Feature Selection; Urdu; Performance Evaluation; Test Collection

Tehseen Zia

Department of Computer Science & IT, University of Sargodha

Muhammad Pervez Akhter

Department of Computer Science & IT, University of Sargodha

Qaiser Abbas

Department of Computer Science & IT, University of Sargodha

Abstract

This paper presents a comparative study of feature selection methods for Urdu text categorization. Five well-known feature selection methods were analyzed by means of six recognized classification algorithms: support vector machines (with linear, polynomial and radial basis kernels), naive Bayes, k-nearest neighbour (KNN), and decision tree (i.e. J48). Experiments were performed on two test collections: the standard EMILLE collection and a naive collection. We found that the information gain, Chi statistics, and symmetrical uncertainty feature selection methods performed uniformly in most cases. We also found that no single feature selection technique is best for every classifier. That is, naive Bayes and J48 perform better with gain ratio than with the other feature selection methods. Similarly, support vector machines (SVM) and KNN classifiers showed top performance with information gain. Generally, linear SVM with any of the feature selection methods outperformed the other classifiers on the moderate-size naive collection. Conversely, naive Bayes with any of the feature selection techniques has an advantage over the other classifiers on the small-size EMILLE corpus.
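
As an illustration of how one of the methods named in the abstract, information gain, can rank terms, the following is a minimal sketch and not the authors' implementation. It assumes documents are already tokenized into sets of terms and treats each term as a binary present/absent feature; the toy corpus, the English tokens, and the function names (`entropy`, `information_gain`, `select_top_k`) are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """Reduction in class entropy from knowing whether `term` occurs in a document."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Return the k terms with the highest information gain (ties keep alphabetical order)."""
    vocabulary = sorted({t for d in docs for t in d})
    ranked = sorted(
        vocabulary,
        key=lambda t: information_gain(docs, labels, t),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy corpus: each document is a set of tokens with one class label.
docs = [
    {"match", "team"},
    {"match", "goal"},
    {"vote", "party"},
    {"vote", "team"},
]
labels = ["sports", "sports", "politics", "politics"]

top_terms = select_top_k(docs, labels, 2)
```

In this toy setting, a term that appears in only one class separates the classes perfectly and receives the maximum gain, while a term spread evenly across both classes receives zero. The selected terms would then be passed as features to a classifier such as those compared in the study.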

How to Cite

Zia, T., Akhter, M. P., & Abbas, Q. (2015). Comparative Study of Feature Selection Approaches for Urdu Text Categorization. Malaysian Journal of Computer Science, 28(2), 93–109. Retrieved from https://ejournal.um.edu.my/index.php/MJCS/article/view/6857

Issue

Vol. 28 No. 2 (2015): Malaysian Journal of Computer Science

Section

Articles
