E-ISSN: 2583-049X

International Journal of Advanced Multidisciplinary Research and Studies

Volume 5, Issue 5, 2025

Enhancing K-Means Clustering Algorithm for Predictive Analysis in Big Data



Author(s): Elhadi AA Suiam, Awad H Ali

Abstract:

The rapid expansion of big data has intensified the demand for clustering algorithms that are both accurate and scalable. K-Means remains one of the most widely adopted clustering methods because of its simplicity and efficiency; however, it has persistent limitations, including sensitivity to initial centroid placement, scalability challenges on massive datasets, and reduced effectiveness in high-dimensional spaces. This study introduces an enhanced K-Means clustering framework that integrates two key improvements: (1) centroid initialization using k-means++, which spreads the initial centroids apart and so reduces the risk of converging to a poor local minimum, and (2) dimensionality reduction through Principal Component Analysis (PCA), which lowers computational cost while preserving most of the variance in the data. The framework is evaluated on both benchmark and real-world datasets, including customer segmentation and financial risk analysis. Experimental results demonstrate that the proposed method consistently outperforms standard K-Means and several alternative clustering approaches in terms of accuracy, cohesion, and scalability. By addressing long-standing limitations of the algorithm, this work contributes a practical and robust solution for predictive analytics in the era of big data.
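
To make the first improvement concrete, the following is a minimal sketch of k-means++ seeding followed by standard Lloyd iterations, written in plain Python. It is not the authors' implementation. The function names (`kmeanspp_init`, `kmeans`, `make_blobs`), the synthetic data, and all parameter values are assumptions chosen for illustration, and the PCA step described in the abstract is omitted here.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """Pick k initial centroids by k-means++ (D^2-weighted sampling)."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        total = sum(d2)
        if total == 0:
            # Every point coincides with a centroid; fall back to uniform.
            centroids.append(list(rng.choice(points)))
            continue
        # Sample a point with probability proportional to d2.
        r = rng.random() * total
        cum = 0.0
        for p, w in zip(points, d2):
            cum += w
            if cum > r:
                centroids.append(list(p))
                break
        else:
            # Guard against floating-point round-off: take the farthest point.
            centroids.append(list(points[d2.index(max(d2))]))
    return centroids


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm with k-means++ seeding (assumes k <= len(points))."""
    rng = random.Random(seed)
    centroids = kmeanspp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def make_blobs(rng, centers, n_per=50, spread=0.5):
    """Synthetic demo data: Gaussian blobs around the given centers."""
    return [
        [c + rng.gauss(0, spread) for c in center]
        for center in centers
        for _ in range(n_per)
    ]


# Illustrative run on made-up, well-separated data.
demo_points = make_blobs(random.Random(0), [(0, 0), (100, 0), (0, 100)])
demo_centroids, demo_labels = kmeans(demo_points, k=3, seed=0)
print([[round(v, 2) for v in c] for c in demo_centroids])
```

In a pipeline like the one the abstract describes, the points would first be projected onto their leading principal components and the clustering would run on the reduced data. In scikit-learn, for example, this corresponds to applying `PCA` and then `KMeans(init="k-means++")`; the number of components and clusters to use is not stated in the abstract.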


Keywords: K-Means Clustering, PCA, Big Data, Predictive Analytics, Dimensionality Reduction

Pages: 250-254
