Synergistic Intelligence: A Stacking Ensemble Approach for Accurate and Scalable Diabetes Prediction
Keywords:
Diabetes prediction, Ensemble stacking model, Machine learning classifiers, Early diagnosis and screening
Abstract
Diabetes mellitus is a rapidly escalating global health crisis, currently affecting over 537 million adults and demanding scalable, automated diagnostic solutions. However, existing machine learning approaches often suffer from overfitting and poor generalization, driven in part by class imbalance in clinical datasets. To address these limitations, we develop a Stacking Ensemble framework and validate it on the Pima Indians Diabetes Dataset. Our preprocessing pipeline applies the Synthetic Minority Oversampling Technique (SMOTE) to balance the class distribution, reducing bias toward the majority class in the learned decision boundaries. By combining Logistic Regression, Support Vector Machines, K-Nearest Neighbors, and Naive Bayes through a meta-learner, the approach offsets the weaknesses of the individual classifiers. The proposed ensemble achieved an accuracy of 81.5% and a recall of 84.0%, reducing the risk of missed diagnoses relative to the baseline models. The system also maintains an inference latency of 27.43 ms, supporting its viability for real-time deployment in resource-constrained medical environments. This research bridges the gap between algorithmic complexity and practical utility, offering a scalable, interpretable solution for early diabetes detection.
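To make the oversampling step concrete, the following is a minimal sketch of the core SMOTE idea using only the Python standard library: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbors. This is an illustration under stated assumptions, not the implementation used in this study. The function names `smote` and `balance`, the default `k=5`, the seed, and the toy two-feature data are all hypothetical; a practical pipeline would more likely rely on a maintained implementation such as the `SMOTE` class in imbalanced-learn.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)          # seeded for reproducibility
    k = min(k, len(minority) - 1)      # cannot have more neighbors than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbors of `base` (Euclidean).
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbor = minority[rng.choice(neighbors)]
        gap = rng.random()             # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class of a binary problem up to the majority size."""
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)   # assumes exactly two classes
    n_new = max(0, n_majority - len(minority))
    if n_new == 0:
        return list(X), list(y)
    new_rows = smote(minority, n_new, k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


# Toy data (made-up values, not taken from the Pima dataset):
# two features per row, label 1 is the minority class.
X = [[80.0, 25.0], [90.0, 27.0], [100.0, 24.0], [85.0, 30.0], [95.0, 28.0],
     [105.0, 26.0], [170.0, 35.0], [150.0, 40.0], [160.0, 38.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_bal, y_bal = balance(X, y)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

Because every synthetic row is a convex combination of two real minority rows, it always lies within the range of the observed minority samples. Oversampling of this kind should be applied to the training split only; applying it before the train/test split lets synthetic points derived from test samples leak into training and inflates the reported metrics.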






