Journal of Statistical Modeling and Analytics (JOSMA)

Performance of Chili Price Forecasting Models in Johor: A Comparative Study

Fri, 26 Dec 2025 00:00:00 +0800

A substantial portion of household income in Malaysia is allocated to food expenditure, and chili is a staple ingredient in Malaysian cuisine. Fluctuations in chili prices directly affect the cost of living for individuals and families, impacting their purchasing power and overall well-being. Forecasting chili prices supports effective supply chain management: producers, distributors, and retailers can plan and adjust their operations based on anticipated price trends. This, in turn, contributes to the efficiency and stability of the country's chili supply chain. This study emphasises the importance of comparing various forecasting models to identify the most accurate predictors of chili prices. The goal is to develop a model that can contribute to more informed decision-making in crop production and market interventions, ultimately promoting stability in the chili industry and ensuring sustainable practices. Three models were tested and compared: multiple linear regression (MLR) as the statistical model, Auto Regressive Integrated Moving Average with exogenous inputs (ARIMAX) as the time series model, and Support Vector Regression (SVR) as the machine learning model. The models were fitted to ex-farm prices in Johor covering five years, from 2018 to 2022. The study finds that SVR performed best as a forecasting model, followed by ARIMAX and MLR. However, ARIMAX models, an extension of the ARIMA model, effectively capture and predict patterns by incorporating significant exogenous variables. Overall, the results show that fertiliser prices, the Movement Control Order (PKP) period, and chili production significantly affect chili prices.

Sample Size Requirements for the Central Limit Theorem for Skewed Distributions: A Simulation Study

Sinha Aziz, HM Nayem, BM Golam Kibria — Fri, 26 Dec 2025 00:00:00 +0800

The Central Limit Theorem (CLT) plays a foundational role in statistical inference, often serving as the rationale for assuming a normal approximation of the sample mean. Yet the pace at which this assumption becomes valid is influenced by the shape of the parent distribution, especially its skewness. This research quantifies the minimum number of observations required for the mean of samples drawn from skewed, non-normal distributions (specifically Gamma, Poisson, Binomial, and Beta) to achieve a satisfactory normal approximation. We implemented a Monte Carlo simulation and applied both the Shapiro-Wilk and Kolmogorov-Smirnov tests to assess the adequacy of the normal approximation. Results indicate a nonlinear association between the degree of skewness and the sample size required for an acceptable normal approximation. For distributions with mild asymmetry (|skewness| < 0.5), a sample size of 20 often suffices, whereas more heavily skewed distributions (|skewness| ≥ 2.5) may necessitate sample sizes beyond 100. These findings call into question the blanket use of the "n ≥ 30" heuristic and suggest that more tailored guidelines are necessary for accurate inference. A graphical overview summarizes these results across the examined distributional families, offering clear guidance for applied researchers working with non-normal data.
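The kind of simulation described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it covers only the Gamma family, uses only the Kolmogorov-Smirnov distance (Shapiro-Wilk is omitted), and the function names, replication count, and seed are assumptions chosen for the example. It relies on the standard result that the mean of n independent Gamma(shape) draws has skewness 2/sqrt(shape × n).

```python
import math
import random
from statistics import NormalDist


def skewness_of_mean(shape, n):
    """Theoretical skewness of the mean of n i.i.d. Gamma(shape) draws."""
    return 2.0 / math.sqrt(shape * n)


def ks_distance_of_sample_means(shape, n, reps=2000, seed=0):
    """KS distance between standardized Gamma sample means and N(0, 1).

    Hypothetical helper: reps and seed are illustrative defaults.
    """
    rng = random.Random(seed)
    # For Gamma(shape, scale=1): mean = shape and variance = shape.
    mu = shape
    sigma = math.sqrt(shape / n)  # standard deviation of the sample mean
    z = sorted(
        (sum(rng.gammavariate(shape, 1.0) for _ in range(n)) / n - mu) / sigma
        for _ in range(reps)
    )
    std_normal = NormalDist()
    d = 0.0
    for i, value in enumerate(z):
        cdf = std_normal.cdf(value)
        # Compare the normal CDF with the empirical CDF just below and at value.
        d = max(d, abs(cdf - i / reps), abs((i + 1) / reps - cdf))
    return d


if __name__ == "__main__":
    # Parent skewness 0.5 (shape 16) versus roughly 2.83 (shape 0.5).
    for shape in (16.0, 0.5):
        for n in (5, 30, 200):
            d = ks_distance_of_sample_means(shape, n)
            print(f"shape={shape:5.1f} n={n:3d} "
                  f"skew(mean)={skewness_of_mean(shape, n):.3f} KS={d:.4f}")
```

A smaller KS distance indicates a closer normal approximation. Turning the distance into a pass/fail decision would need a critical value or p-value, which this sketch leaves out.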

Comparative Analysis of Machine Learning Models for Vintage-Based Credit Scoring

Tan Yong Seng, Soo Huei Ching — Fri, 26 Dec 2025 00:00:00 +0800

Accurate credit risk assessment is crucial for financial institutions to minimise loan defaults. This study proposes a vintage-based credit scoring framework that integrates individual repayment behaviour with vintage analysis and evaluates five machine learning models (logistic regression, random forest, XGBoost, a stacking ensemble, and a multilayer perceptron (MLP)) for binary credit risk classification. Results show that ensemble methods, particularly random forest, achieve superior predictive performance with the highest F1-score (0.81), precision (0.87) and accuracy (0.96), while logistic regression exhibits high recall but low precision. The MLP shows good recall (0.79) and a competitive F1-score (0.77), making it suitable for prioritising high-risk borrower detection, although it lacks interpretability. Overall, the study highlights the trade-offs between predictive performance and interpretability, emphasising the potential of vintage-based approaches and ensemble learning for practical credit scoring applications.
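Because the F1-score is the harmonic mean of precision and recall, the metric left unstated for each model can be approximated from the two that are reported. The sketch below is an illustration under assumptions: the helper names are hypothetical, the reported figures are rounded, and the abstract does not say whether the metrics are per-class or averaged, so the implied values are rough.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_partner(f1, known):
    """Solve F1 = 2PR / (P + R) for the missing metric.

    `known` is whichever of precision or recall was reported.
    """
    if 2 * known <= f1:
        raise ValueError("no valid partner metric for these inputs")
    return f1 * known / (2 * known - f1)


if __name__ == "__main__":
    # Random forest: F1 = 0.81 and precision = 0.87 are reported.
    rf_recall = implied_partner(0.81, 0.87)    # about 0.76
    # MLP: F1 = 0.77 and recall = 0.79 are reported.
    mlp_precision = implied_partner(0.77, 0.79)  # about 0.75
    print(f"implied random forest recall: {rf_recall:.2f}")
    print(f"implied MLP precision:        {mlp_precision:.2f}")
```

Read this way, the MLP's recall of 0.79 sits slightly above the recall implied for random forest, which is consistent with the abstract's point that the MLP suits high-risk borrower detection.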

Autoregressive Modelling with Seasonal Variations for Malaysian Crude Palm Oil Price Forecasting

Aniza Akaram, Arifah Bahar — Fri, 26 Dec 2025 00:00:00 +0800

Crude palm oil (CPO) plays a vital role in Malaysia’s economy, yet its price dynamics remain highly volatile due to global market fluctuations, policy changes, and demand uncertainties. Reliable short-term forecasting is therefore essential for industry stakeholders and policymakers. This study employs Autoregressive Integrated Moving Average (ARIMA) and Seasonal ARIMA (SARIMA) modelling to analyze and forecast monthly Malaysian CPO prices from 2015 to 2024. Preliminary seasonality diagnostics using STL decomposition indicated weak and statistically insignificant seasonal patterns, with a seasonal strength of 0.2046. Consequently, seasonal differencing was unnecessary, and first-order non-seasonal differencing was sufficient to achieve stationarity. Model identification and estimation were performed using an in-sample dataset (2015–2022), while model validation was conducted using out-of-sample data (2023–2024). A manual grid search across ARIMA candidates identified ARIMA(3,1,3) as the optimal model based on the Corrected Akaike Information Criterion (AICc). Residual diagnostics confirmed that the model errors behaved as white noise with no remaining autocorrelation. Out-of-sample testing further demonstrated that ARIMA(3,1,3) produced satisfactory predictive accuracy across RMSE, MAE, and MAPE metrics. Overall, the findings indicate that non-seasonal ARIMA models are sufficient for CPO price forecasting and that ARIMA(3,1,3) provides a reliable framework for short-term prediction and decision-making.
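The AICc used to rank the ARIMA candidates adds a small-sample correction to the ordinary AIC: AICc = -2 log L + 2k + 2k(k + 1)/(n - k - 1). The sketch below shows how that penalty grows with model order. Several details are assumptions rather than facts from the study: the parameter count k = p + q + 1 (AR and MA coefficients plus the innovation variance, no constant), the effective sample size n = 95 (96 monthly observations for 2015–2022 less one lost to first differencing), and the grid bounds. Software packages differ on these conventions.

```python
def aicc(log_likelihood, k, n):
    """Corrected AIC for a model with k estimated parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + 2.0 * k * (k + 1) / (n - k - 1)


def aicc_penalty(k, n):
    """Total complexity penalty in AICc: the AIC term plus the correction."""
    return aicc(0.0, k, n)


if __name__ == "__main__":
    n_effective = 95  # assumption: 96 monthly points minus one for differencing
    for p in range(0, 4):
        for q in range(0, 4):
            k = p + q + 1  # assumption: ARMA coefficients plus innovation variance
            print(f"ARIMA({p},1,{q}): k={k}, "
                  f"penalty={aicc_penalty(k, n_effective):.3f}")
```

In a grid search, each candidate's fitted log-likelihood would be passed to `aicc` and the order with the smallest value retained. No log-likelihoods are shown here because the abstract does not report them.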

Modeling birth registration in the Savannah Region of Ghana

Fatawu Issah, Jakperik Dioggban — Fri, 06 Feb 2026 00:00:00 +0800

Accurate birth registration plays a vital role in national development planning and organizational decision-making. The Births and Deaths Registry is a key provider of demographic information, offering insights into the composition, magnitude, growth, and spatial distribution of a country's population across different administrative divisions. Inaccurate birth records can lead to misallocation of resources, such as immunization supplies, educational funding, and child protection initiatives. This research examines the factors influencing birth registration in the Savannah Region of Ghana, based on secondary data obtained from the regional Births and Deaths office for the years 2020 and 2021. The analysis was conducted using a Linear Mixed Effects model. The results indicate that parental education levels, employment status, the identity of the informant, and the location of residence significantly impact birth registration outcomes. The findings highlight an urgent need to improve public awareness, especially within rural areas and communities with low literacy levels, in order to increase birth registration rates.