DEVELOPMENT OF CYBERBULLYING DATASET (CYTED): FLAMING CLASSIFICATION

Nor Izna Mohd Isa; Madihah Mohd Saudi

Authors

Nor Izna Mohd Isa
Madihah Mohd Saudi, Universiti Sains Islam Malaysia (USIM)

Abstract

Cyberbullying is a widespread problem with significant psychological impacts, particularly in its flaming form, which often occurs on social media platforms such as Twitter. Detecting flaming in the Malaysian context is difficult because reliable datasets are scarce, especially in the Malay language. This paper addresses this gap by developing a small bilingual (Malay and English) dataset based on keywords related to flaming cyberbullying. The objectives of this paper are to extract relevant keywords from Twitter, to develop a flaming classification dataset, and to evaluate the dataset with several machine learning algorithms. A total of 3,600 samples (1,800 in Malay and 1,800 in English) were collected through keyword-based searches using the TweetHarvest tool. The samples were then preprocessed, features were extracted, and the samples were classified with Logistic Regression, Random Forest, and Support Vector Machine (SVM) under 10-fold cross-validation. In these experiments, Logistic Regression achieved the highest accuracy: 94% on the Malay samples and 95% on the English samples. The resulting flaming classification dataset can serve as a basis for building a cyberbullying detection model.
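
The abstract names the evaluation steps (preprocessing, feature extraction, and Logistic Regression, Random Forest, and SVM under 10-fold cross-validation) but not their settings. The following is a minimal sketch, in Python with scikit-learn, of how such an evaluation could be wired. It assumes TF-IDF features, a simple regex-based tweet cleaner, and a linear-kernel SVM; the function names `preprocess` and `evaluate`, all hyperparameters, and the toy tweets are hypothetical and are not taken from the paper or from the CyTeD data.

```python
import re

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def preprocess(text: str) -> str:
    """Hypothetical tweet cleaner (assumption, not the paper's exact steps)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop @mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep Latin letters only (Malay and English)
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


def evaluate(texts, labels, folds=10, seed=42):
    """Mean k-fold accuracy for the three classifiers named in the abstract.

    TF-IDF features and all hyperparameters are assumptions for illustration.
    """
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "SVM": SVC(kernel="linear"),  # kernel choice is an assumption
    }
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    cleaned = [preprocess(t) for t in texts]
    scores = {}
    for name, clf in models.items():
        # The vectorizer sits inside the pipeline, so it is refit on each
        # training fold and never sees the held-out fold.
        pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
        scores[name] = cross_val_score(
            pipe, cleaned, labels, cv=cv, scoring="accuracy"
        ).mean()
    return scores


if __name__ == "__main__":
    # Placeholder tweets for illustration only; NOT samples from CyTeD.
    flaming = [f"@user{i} you are a stupid idiot, shut up {i}" for i in range(20)]
    neutral = [f"@user{i} thanks for sharing the match result {i}" for i in range(20)]
    texts = flaming + neutral
    labels = [1] * len(flaming) + [0] * len(neutral)
    for name, acc in evaluate(texts, labels).items():
        print(f"{name}: {acc:.2f}")
```

Calling `evaluate` separately on the Malay and English subsets would mirror the per-language accuracies reported in the abstract. `StratifiedKFold` needs at least as many samples per class as there are folds, which is why the toy corpus has 20 tweets per label.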
