International Journal of Advanced Multidisciplinary Research and Studies
Volume 5, Issue 5, 2025
Enhancing K-Means Clustering Algorithm for Predictive Analysis in Big Data
Author(s): Elhadi AA Suiam, Awad H Ali
Abstract:
The rapid expansion of big data has intensified the demand for clustering algorithms that are both accurate and scalable. K-Means remains one of the most widely adopted clustering methods because of its simplicity and efficiency; however, it has persistent limitations, including sensitivity to initial centroid placement, scalability challenges on massive datasets, and reduced effectiveness in high-dimensional spaces. This study introduces an enhanced K-Means clustering framework that integrates two key improvements: (1) centroid initialization with k-means++, which reduces the risk of converging to a poor local minimum, and (2) dimensionality reduction through Principal Component Analysis (PCA), which lowers computational cost while preserving most of the variance in the data. The framework is evaluated on both benchmark and real-world datasets, including customer segmentation and financial risk analysis. Experimental results demonstrate that the proposed method consistently outperforms standard K-Means and several alternative clustering approaches in terms of accuracy, cohesion, and scalability. By addressing these long-standing limitations of the algorithm, this work contributes a practical and robust solution for predictive analytics in the era of big data.
Keywords: K-Means Clustering, PCA, Big Data, Predictive Analytics, Dimensionality Reduction
Pages: 250-254
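The two components named in the abstract, PCA followed by K-Means with k-means++ seeding, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (pca_reduce, kmeans_pp_init, kmeans), the synthetic three-cluster data, and the choices of 2 principal components and k = 3 are all assumptions made for illustration only.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its top principal components (SVD of the centered data)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def squared_distances(X, centroids):
    """Squared Euclidean distance from every point to every centroid."""
    return ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: each new centroid is sampled with probability
    proportional to its squared distance from the nearest chosen centroid."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = squared_distances(X, np.array(centroids)).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's iterations starting from k-means++ seeds."""
    rng = np.random.default_rng(seed)
    centroids = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        labels = squared_distances(X, centroids).argmin(axis=1)
        # Keep the old centroid if a cluster happens to become empty.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return squared_distances(X, centroids).argmin(axis=1), centroids


# Synthetic data (assumed for this sketch): three well-separated
# Gaussian clusters of 100 points each in 20 dimensions.
data_rng = np.random.default_rng(0)
true_centers = data_rng.normal(scale=10.0, size=(3, 20))
X = np.vstack([c + data_rng.normal(size=(100, 20)) for c in true_centers])

X_reduced = pca_reduce(X, n_components=2)    # 20 dimensions -> 2
labels, centroids = kmeans(X_reduced, k=3)   # cluster in the reduced space
print(X_reduced.shape, np.bincount(labels, minlength=3))
```

Clustering in the reduced space keeps each distance computation proportional to the number of retained components rather than the original dimensionality, which is the cost reduction the abstract attributes to PCA. In practice the number of components would be chosen from the explained-variance profile of the dataset rather than fixed in advance.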

