IDENTIFICATION OF FEATURES IN PREDICTING PROMINENT MALAY WORDS USING DECISION TREE

Sabrina Tiun
Liew Siaw Hong

Abstract

Predicting word prominence is a major topic in speech synthesis, because prominent words must be identified to produce natural-sounding synthetic speech. In our previous work, prominent words in a speech corpus had to be marked so that the most suitable unit could be selected for synthesis; however, because the marking was done manually, building a large speech corpus is costly in both labor and time. Prominent words therefore need to be predicted automatically, and the choice of features is an important aspect of that prediction. This study presents experimental work on identifying features (including part-of-speech (POS) sequence, phrasal break, and word position) for predicting prominent Malay words using a decision tree and the WEKA correlation feature selection method. The results show that the decision tree predicts prominent words best (Precision = 85.0%, Recall = 84.2%, and F-measure = 83.5%) when phrasal break is omitted as a feature, and worst (Precision = 66.4%, Recall = 67.2%, and F-measure = 66.6%) when the POS sequence is excluded from the features. This study therefore concludes that phrasal break is a weak (noisy) feature, whereas POS sequence is an important feature for predicting prominent Malay words.
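
As a reading aid for the reported scores: the best F-measure (83.5%) lies below both the precision (85.0%) and the recall (84.2%), which a single harmonic mean of those two values cannot produce. One possible explanation, not stated in the abstract, is that the figures are class-weighted averages of per-class scores, as in the "Weighted Avg." row that WEKA prints. The sketch below computes such averages in plain Python on made-up labels ("P" for prominent, "N" for non-prominent); the data and the function names are illustrative assumptions, not taken from the study.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f_measure, support)} per class."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f_measure, tp + fn)
    return scores


def weighted_average(scores):
    """Average precision, recall, and F-measure, weighted by class support."""
    total = sum(s[3] for s in scores.values())
    return tuple(
        sum(s[i] * s[3] for s in scores.values()) / total for i in range(3)
    )


# Hypothetical toy labels: 3 prominent words, 7 non-prominent words.
y_true = ["P", "P", "P", "N", "N", "N", "N", "N", "N", "N"]
y_pred = ["P", "N", "N", "N", "N", "N", "N", "N", "N", "P"]

scores = per_class_scores(y_true, y_pred)
w_precision, w_recall, w_f = weighted_average(scores)

# Harmonic mean of the two weighted averages, for comparison only.
harmonic = 2 * w_precision * w_recall / (w_precision + w_recall)

print(f"weighted P={w_precision:.3f} R={w_recall:.3f} F={w_f:.3f}")
print(f"harmonic mean of weighted P and R = {harmonic:.3f}")
```

On this toy data the weighted F-measure comes out lower than the harmonic mean of the weighted precision and recall, so the two quantities should not be expected to agree.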

Article Details

How to Cite
Tiun, S., & Hong, L. S. (2020). IDENTIFICATION OF FEATURES IN PREDICTING PROMINENT MALAY WORDS USING DECISION TREE. Malaysian Journal of Computer Science, 33(4), 298–305. https://doi.org/10.22452/mjcs.vol33no4.4