Linguistic Feature Classifying and Tracing

Mohammadreza Moohebat
Ram Gopal Raj
Dirk Thorleuchter
Sameem Binti Abdul Kareem


We investigate the identification and analysis of linguistic (lexico-grammatical) features that are characteristic of articles published in a specific year. Linguistic features differ from shallow features in that they represent authors’ lexico-grammatical writing styles and do not rely on the well-known bag-of-words model. The current literature focuses on shallow features rather than linguistic features, and existing methods for identifying linguistic features use knowledge-structure-based approaches. In contrast, we advance these methods by applying semantic clustering instead. For evaluation purposes, a prediction model based on linguistic features is built to automatically assign articles to their years of publication. In a case study, the proposed methodology is applied to articles of the Springer book series 'Communications in Computer and Information Science' published from 2009 to 2013. The case study results show the feasibility of the proposed approach compared with a frequently used baseline.
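
To make the idea of assigning articles to years from style rather than vocabulary more concrete, the following is a minimal sketch and not the authors' method. It assumes three hand-picked stand-in features (average sentence length, first-person pronoun rate, and a rough passive-construction rate) and swaps the semantic clustering and prediction model described above for a plain nearest-centroid rule. The names `style_features`, `train_centroids`, and `predict_year` are hypothetical.

```python
import math
import re
from collections import defaultdict

# Assumption: a tiny, illustrative pronoun list, not taken from the article.
FIRST_PERSON = {"i", "we", "our", "us", "my"}
# Assumption: a crude passive marker ("is applied", "were reported").
PASSIVE = re.compile(r"\b(?:is|are|was|were|be|been)\s+\w+ed\b")


def style_features(text):
    """Return [avg sentence length, first-person rate, passive rate]."""
    lowered = text.lower()
    sentences = [s for s in re.split(r"[.!?]+", lowered) if s.strip()]
    tokens = re.findall(r"[a-z']+", lowered)
    n_tokens = max(len(tokens), 1)
    avg_sentence_len = len(tokens) / max(len(sentences), 1)
    first_person_rate = sum(t in FIRST_PERSON for t in tokens) / n_tokens
    passive_rate = len(PASSIVE.findall(lowered)) / n_tokens
    return [avg_sentence_len, first_person_rate, passive_rate]


def train_centroids(labelled_texts):
    """Average the feature vectors of all texts sharing a publication year."""
    grouped = defaultdict(list)
    for year, text in labelled_texts:
        grouped[year].append(style_features(text))
    return {
        year: [sum(column) / len(vectors) for column in zip(*vectors)]
        for year, vectors in grouped.items()
    }


def predict_year(text, centroids):
    """Pick the year whose centroid is closest in Euclidean distance."""
    vector = style_features(text)
    return min(centroids, key=lambda year: math.dist(vector, centroids[year]))
```

A call such as `predict_year(article_text, train_centroids(labelled_articles))` returns one of the training years. The features are left unscaled here, so average sentence length dominates the distance; a real model would normalise the features and use a far richer feature set.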


How to Cite
Moohebat, M., Raj, R. G., Thorleuchter, D., & Binti Abdul Kareem, S. (2017). Linguistic Feature Classifying and Tracing. Malaysian Journal of Computer Science, 30(2), 77–90.
