Iris Publishers: Iris Publishers-Open access Journal of Biology & Life Sciences| Enhancing Chronic Kidney Disease Diagnostics through a Novel Hybrid Multi-Model Fusion Approach in Machine Learning

Tuesday, October 29, 2024


Authored by Pinar Karadayi Ataş

Abstract

In recent years, intelligent classification techniques, especially those based on machine learning (ML), have gained significant traction and transformed many sectors, notably healthcare. ML has become a critical tool for the diagnosis and prediction of illness. This paper applies ML algorithms to the diagnosis of chronic kidney disease (CKD), a significant challenge in medical diagnostics. We evaluated Artificial Neural Networks (ANNs), Support Vector Machines (SVM), and Logistic Regression (LR), and introduced a novel Hybrid Multi-Model Fusion (HMMF) method, using a dataset of 153 cases described by 11 patient attributes. The objective was to compare the predictive capabilities of these classifiers for CKD diagnosis across several metrics. ANNs outperformed the LR model, achieving 84.44% accuracy, 84.21% sensitivity, 84.61% specificity, and an AUC-ROC of 84.41%, which highlights the robustness of ANNs in complex medical prediction tasks.

Furthermore, urea and creatinine levels were identified as crucial predictors for CKD prognosis, offering insight into key disease-influencing factors. The HMMF method integrates the strengths of ANNs, LR, and SVM to improve diagnostic accuracy. It achieved an accuracy of 86.67%, outperforming each individual model. By leveraging the predictive strengths of its constituent algorithms while offsetting their limitations, HMMF yielded a more accurate and reliable classifier for CKD diagnosis. This study contributes to medical diagnostics by underscoring the potential of advanced ML techniques, particularly the HMMF method, to improve diagnostic precision and support informed medical decisions. It supports the integration of such ML approaches into healthcare, with the promise of better patient outcomes and greater healthcare productivity.

Introduction

The prevalence of kidney diseases, including CKD, has increased significantly worldwide in recent years [1,2]. This increase is not merely a statistical anomaly but a growing public health emergency, and it highlights the critical need for early detection, precise diagnosis, and efficient treatment of kidney disorders. CKD in particular has become a silent epidemic, affecting millions of people and frequently going undiagnosed until it has reached an advanced stage. Because CKD is an insidious disease in which kidney function declines gradually over time, early diagnosis and treatment are vital [3,4]. Its effects extend well beyond the kidneys: CKD strongly accelerates other major health problems, including hypertension, anemia, cardiovascular disease, and impaired bone health [5–7]. The interconnections between CKD and these comorbid conditions create a complex clinical landscape that calls for a multifaceted approach to care. Furthermore, CKD can progress to end-stage renal disease (ESRD), a life-threatening condition that requires dialysis or kidney transplantation for survival [8,9]. The high morbidity and mortality associated with CKD and its complications underline the importance of closely monitoring and managing those at risk.

It is difficult to overstate the importance of CKD for global health [10]. Its capacity to reduce quality of life and the heavy financial burden it places on healthcare systems worldwide call for immediate, coordinated action. A timely and precise diagnosis of CKD can not only save lives but also substantially reduce the cost of treating its complications. This makes advanced diagnostic techniques and tools essential, especially those that exploit ML. By improving the accuracy and efficiency of CKD diagnosis, ML technologies offer a promising way to mitigate the consequences of CKD and improve the health and well-being of affected populations. Because the disease can precipitate a range of secondary health problems, early detection and treatment are paramount, and as the medical community continues to navigate the complexities of CKD, the integration of advanced ML techniques into the diagnostic process offers real hope to the many individuals affected by this debilitating condition.

Machine learning, a foundation of artificial intelligence (AI), has transformed many industries, including healthcare, by providing solutions to challenging problems, and its use in disease diagnosis has grown rapidly in recent years. ML algorithms have proven invaluable in medical diagnostics because they can analyze large datasets, identify patterns, and make highly accurate predictions. Because they process and interpret genetic data, medical images, and patient records far more efficiently than conventional techniques, they make early and accurate disease detection possible. This advance has significantly improved diagnostic procedures, patient care, and treatment outcomes, underscoring the potential of ML to transform healthcare practice [11–14]. More specifically, ML has shown considerable promise in the diagnosis of kidney disease. CKD is marked by a progressive loss of kidney function, and its complex nature and subtle early-stage symptoms make it difficult to diagnose. ML algorithms address these difficulties by enabling early detection and intervention: by analyzing clinical data, laboratory results, and patient histories, ML models can identify CKD while patients are still asymptomatic, allowing timely treatment that significantly improves prognosis. These algorithms can also forecast the course of the illness, which helps clinicians tailor treatment regimens.

Study [15] describes the EMPA-KIDNEY trial, which evaluated the ability of empagliflozin to slow the progression of CKD in a large patient population. This phase 3 trial enrolled 6609 CKD patients at 241 centers worldwide and randomly assigned them to receive 10 mg of empagliflozin or a placebo daily. Empagliflozin significantly slowed the progression of kidney disease regardless of its underlying cause, suggesting that SGLT2 inhibitors could become part of standard care to reduce the risk of CKD progression. Another study [16] addressed the difficulty of accurately diagnosing illness, particularly in predicting chronic kidney failure, with a hybrid approach that combined XGBoost, Random Forest, Logistic Regression, AdaBoost, and a Random Forest meta-classifier. By carefully selecting and combining these models, the authors obtained a hybrid model that reached the highest accuracy, 95%, on the UCI Chronic Kidney Failure dataset, significantly outperforming the individual models. This work not only advances AI-assisted medical diagnostics but also demonstrates the potential of hybrid models to improve accuracy and reduce overfitting in complex health data analysis.

Researchers have explored the potential of ANNs in two main areas: identifying common kidney diseases, such as polycystic kidney disease, kidney cysts, and kidney cancer, and approximating image recognition in healthy individuals. ANNs use complex, nonlinear computation loosely resembling that of the human brain to simulate perception and motor-control functions, enabling fast and accurate computation. By examining a variety of samples, that study applies machine learning algorithms to the diagnosis of kidney diseases and demonstrates the potential of ANNs to improve patient care and medical diagnostics [17]. Another study [18] highlights the decision tree classification algorithm (DTCA) as a key ML tool for challenging problems in many fields, including medical diagnostics; more precisely, computer-aided diagnosis systems use DTCA to identify chronic diseases such as CKD, diabetes, and cancer from healthcare data. Deep learning (DL), which extends machine learning, uses neural networks to learn from unstructured or unlabeled data, for example through deep stacked auto-encoders and softmax classifiers for CKD analysis. For early CKD diagnosis and detection, the study applies a variety of predictive models, including Random Forest, SVM, C5.0, C4.5, ANN, and neuro-fuzzy systems, using R Studio and Python Colab. A systematic review assessed how AI and ML are used in the diagnosis, treatment, and prognosis of CKD. It drew only on English-language research retrieved from PubMed; because of this single database source and language restriction, it qualified as a "rapid review". Despite these limitations, the critical nature of the subject reduced the likelihood of missing important research. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, the review extracted 16 variables from each study, including objectives, population demographics, data sources, sample sizes, and performance metrics [19].

Study [20] uses ML to address the high variability of prognosis in chronic disorders, specifically CKD, which has a significant impact on clinical support systems and contributes to global mortality. Traditional CKD diagnostic techniques, which rely mainly on various biological markers, frequently lack accuracy. To address this problem, the authors built an ML model that predicts the occurrence of CKD from public datasets. Its construction involved several preprocessing steps, including missing-data imputation, feature scaling, and data balancing with the SMOTE algorithm; a chi-squared test was also used to select a minimal yet highly correlated feature set, and a variety of supervised learning strategies were then applied to build a robust model. Many studies have examined the use of ML in the detection and treatment of kidney diseases, investigating techniques such as Decision Trees, SVMs, LR, and ANNs. Each approach has its own advantages and uses, but ANNs and LR are frequently cited for their strength in processing non-linear data and producing probabilistic outputs, respectively. This growing body of research highlights the viability and efficacy of ML in improving CKD diagnosis and treatment and points to a future in which AI-powered tools are essential to nephrology and to medicine more generally. The trend reflects a broader movement toward data-driven care, in which ML algorithms raise diagnostic precision, refine therapeutic approaches, and ultimately improve patient outcomes.

Materials and Methods

Support Vector Machines (SVM)

SVMs are a family of supervised learning techniques used for classification, regression, and outlier detection. The primary goal of an SVM is to find the hyperplane in feature space that best separates the classes. This hyperplane is chosen to maximize the margin between itself and the closest points of each class, which are referred to as support vectors. SVMs perform especially well in high-dimensional spaces, including cases where there are more dimensions than samples. Kernel functions, which implicitly map the input data into high-dimensional feature spaces, make SVMs adaptable to both linear and nonlinear data [21].
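A minimal sketch of an RBF-kernel SVM classifier, assuming scikit-learn is installed. The data are synthetic (generated with `make_classification` to match the study's shape of 153 cases and 11 attributes), and the kernel, `C`, and 30% test split are illustrative assumptions rather than the study's settings. `probability=True` is set because soft voting later needs class probabilities from the SVM.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with the study's shape (153 cases, 11 attributes);
# the real CKD dataset is not reproduced here.
X, y = make_classification(n_samples=153, n_features=11, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# The RBF kernel implicitly maps inputs into a high-dimensional feature space.
# Features are standardized first because the RBF kernel is distance-based.
# probability=True fits Platt scaling so that predict_proba is available.
svm = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", C=1.0, gamma="scale", probability=True, random_state=0),
)
svm.fit(X_train, y_train)

test_acc = svm.score(X_test, y_test)
n_support = svm.named_steps["svc"].n_support_  # number of support vectors per class
proba = svm.predict_proba(X_test)  # class probabilities for each test case
print(f"test accuracy = {test_acc:.3f}, support vectors per class = {n_support}")
```

Only the support vectors determine the fitted decision boundary; `n_support_` shows how many training points ended up in that role.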

Logistic Regression (LR)

Logistic regression (LR) is a statistical technique for analyzing a dataset in which one or more independent variables influence an outcome measured by a dichotomous variable (one with only two possible values). It is widely used in machine learning, the social sciences, and medicine to predict the probability of a binary outcome. Unlike linear regression, which predicts a continuous outcome, logistic regression models the probability that an observation falls into one of two categories by passing a linear combination of the predictors through the logistic function [22].
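The model described above is P(y = 1 | x) = 1 / (1 + e^-(b0 + b1·x1 + ... + bk·xk)). The sketch below evaluates that formula directly in plain Python. The two predictors and all coefficient values are hypothetical and are not fitted to the study's data. The sigmoid is written in a numerically stable form so that large negative inputs do not overflow `math.exp`.

```python
import math

def logistic(z: float) -> float:
    """Logistic (sigmoid) function mapping any real z into (0, 1), overflow-safe."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)

def predict_proba(x: list[float], intercept: float, coefs: list[float]) -> float:
    """P(y = 1 | x) = logistic(b0 + b1*x1 + ... + bk*xk)."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return logistic(z)

# Hypothetical coefficients for two standardized predictors
# (e.g. urea and creatinine); illustrative values only.
p = predict_proba([1.2, 0.8], intercept=-0.5, coefs=[1.1, 1.6])
label = int(p >= 0.5)  # threshold the probability to obtain a binary class
print(f"P(CKD) = {p:.4f}, predicted class = {label}")  # P(CKD) = 0.8909, predicted class = 1
```

Here the linear score is -0.5 + 1.1·1.2 + 1.6·0.8 = 2.1, and the logistic function turns it into a probability of about 0.89.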

Artificial Neural Networks (ANNs)

Artificial Neural Networks (ANNs) are computing systems loosely modeled on the biological neural networks of animal brains. An ANN is built from interconnected units, or nodes, called artificial neurons, which loosely resemble the neurons of a biological brain [23]. Each connection can transmit a signal from one artificial neuron to another; the receiving neuron processes the signal and then passes a signal on to the neurons downstream of it. Through a process called training, in which the weights of the connections are adjusted, ANNs learn intricate patterns and relationships in data. They are widely used in finance, engineering, medicine, and other domains for tasks such as feature learning, regression, clustering, and classification.
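A minimal sketch of a feed-forward ANN classifier, assuming scikit-learn's `MLPClassifier`. The data are synthetic, and the single hidden layer of 16 ReLU neurons is a hypothetical architecture, since the study does not specify one here. The learned connection weights live in `coefs_`, one matrix per layer-to-layer connection.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (153 cases, 11 attributes); not the study's dataset.
X, y = make_classification(n_samples=153, n_features=11, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# One hidden layer of 16 ReLU neurons (hypothetical architecture).
# Training adjusts the connection weights by backpropagation to minimize log-loss.
ann = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), activation="relu", max_iter=2000, random_state=0),
)
ann.fit(X_train, y_train)

mlp = ann.named_steps["mlpclassifier"]
print("weight matrix shapes:", [w.shape for w in mlp.coefs_])  # input->hidden, hidden->output
train_acc = ann.score(X_train, y_train)
test_acc = ann.score(X_test, y_test)
print(f"train accuracy = {train_acc:.3f}, test accuracy = {test_acc:.3f}")
```

Reporting training and test accuracy side by side, as the Results section does, is a simple way to see how much of the training fit carries over to unseen data.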

Stacking

Stacking, also known as stacked generalization, is a machine learning ensemble technique that combines several prediction models into a new model with the goal of increasing accuracy [24]. Unlike conventional ensemble techniques such as bagging or boosting, stacking trains a second-level model, the meta-model, to combine the predictions of multiple base models optimally. The base models are trained on the same dataset and are usually diverse (e.g., different algorithms), and the meta-model learns how best to integrate their predictions into a final prediction. The underlying premise is that the meta-model can outperform any individual base model by exploiting the base models' strengths and compensating for their weaknesses.
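A minimal sketch of stacking with scikit-learn's `StackingClassifier`, on synthetic data. The choice of an SVM and a small neural network as base models and logistic regression as the meta-model is an assumption made for illustration. Note that this is generic stacking (a trained meta-model), which differs from the fixed probability averaging used by soft voting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (153 cases, 11 attributes); not the study's dataset.
X, y = make_classification(n_samples=153, n_features=11, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Diverse base models (level 0): a kernel SVM and a small neural network.
base_models = [
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ("ann", make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    )),
]

# Meta-model (level 1): logistic regression fitted on the base models'
# out-of-fold probabilities (5-fold CV), so it learns how to weight them
# without seeing predictions the base models made on their own training folds.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train, y_train)
stack_acc = stack.score(X_test, y_test)
print(f"stacked test accuracy = {stack_acc:.3f}")
```

The cross-validated predictions are the key design choice. A meta-model trained on in-sample base predictions would learn to trust whichever base model overfits most.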

Hybrid Multi-Model Fusion (HMMF)

We combine the predictive power of Support Vector Machines (SVM), Logistic Regression (LR), and Artificial Neural Networks (ANNs) into a novel ensemble model that draws on each model's advantages in a cohesive way, with the aim of achieving higher prediction accuracy. ANNs are well known for their capacity to model intricate, nonlinear relationships in data. LR provides a probabilistic view, which makes it valuable for estimating the likelihood of categorical outcomes. SVM is skilled at finding the hyperplane that maximizes the margin between classes and therefore works especially well when class boundaries are distinct. The methodology consists of training each model separately on the same dataset, combining their predictions, and reaching a final decision through soft voting. After training, soft voting aggregates the predicted class probabilities of all models; because it takes into account how confident each model is, it allows a more nuanced decision than a majority vote on hard labels. The final prediction is obtained by averaging, for each class, the probabilities produced by all models and choosing the class with the highest average probability (Figure 1). The procedure is itemized below.


Training Phase

a) Train the ANN on the dataset.

b) Independently, train the LR model on the same dataset.

c) Similarly, train the SVM model on the dataset.

Prediction Aggregation

a) For a given input, generate predictions (class probabilities) from each of the trained models (ANN, LR, SVM).

b) Aggregate these predictions by calculating the average probability for each class across all three models.

Decision Making

a) Apply the soft voting mechanism: the final class prediction is the class with the highest average probability across the predictions of the ANN, LR, and SVM models.
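The aggregation and decision steps above reduce to a per-class average followed by an argmax. The sketch below shows them in plain Python for a single input. The probability values are hypothetical outputs of already-trained ANN, LR, and SVM models, and the helper name `soft_vote` is illustrative, not taken from the study.

```python
def soft_vote(model_probas):
    """Soft voting: average each class's probability over models, pick the argmax.

    model_probas[m][c] = probability that model m assigns to class c for one input.
    """
    n_models = len(model_probas)
    n_classes = len(model_probas[0])
    avg = [sum(p[c] for p in model_probas) / n_models for c in range(n_classes)]
    label = max(range(n_classes), key=avg.__getitem__)  # class with highest mean probability
    return label, avg

# Hypothetical outputs [P(no CKD), P(CKD)] of the three trained models for one patient.
ann_proba = [0.30, 0.70]
lr_proba = [0.55, 0.45]
svm_proba = [0.40, 0.60]

label, avg = soft_vote([ann_proba, lr_proba, svm_proba])
print(label, [round(a, 4) for a in avg])  # 1 [0.4167, 0.5833]
```

LR alone would predict class 0 here, but the more confident ANN and SVM outputs pull the average towards CKD. scikit-learn's `VotingClassifier` with `voting="soft"` performs the same unweighted averaging over fitted estimators' `predict_proba` outputs (it also accepts optional per-model weights).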

Model Evaluation

Machine learning models such as ANNs and LR can be evaluated with a variety of metrics, each offering a different perspective on performance. The metrics used in this study are defined below, together with their mathematical formulations:

A confusion matrix summarizes the performance of a classification model. It is a two-dimensional table ("Actual" vs. "Predicted") that directly gives the numbers of true positives, false positives, true negatives, and false negatives. Its entries are:

a) True Positives (TP): Instances correctly predicted as positive.

b) True Negatives (TN): Instances correctly predicted as negative.

c) False Positives (FP): Instances incorrectly predicted as positive (Type I error).

d) False Negatives (FN): Instances incorrectly predicted as negative (Type II error).
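The four counts can be tallied directly from paired actual and predicted labels. This is a minimal plain-Python sketch in which the labels are hypothetical and class 1 is assumed to denote CKD.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for binary labels; `positive` marks the CKD class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # correctly flagged as CKD
        elif predicted != positive and actual != positive:
            tn += 1  # correctly cleared
        elif predicted == positive:
            fp += 1  # Type I error: false alarm
        else:
            fn += 1  # Type II error: missed case
    return tp, tn, fp, fn

# Hypothetical labels for eight patients (1 = CKD, 0 = no CKD).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))  # (3, 3, 1, 1)
```

With scikit-learn, `confusion_matrix(y_true, y_pred).ravel()` returns the same four counts for 0/1 labels, in the order TN, FP, FN, TP.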

Accuracy, a measure of the model's overall correctness, is computed as:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

It represents the ratio of correctly predicted observations (both positive and negative) to the total observations.

Sensitivity (also called recall or the true positive rate) measures the proportion of actual positive cases that the model correctly identifies:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

It indicates how good the model is at detecting positive instances.

Specificity (the true negative rate) measures the proportion of actual negative cases that the model correctly identifies:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

It indicates how good the model is at avoiding false alarms.
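The formulas translate directly into code. The sketch below also computes disease prevalence, the share of actual positives, because the Results report it; that formula is the standard definition and is not given in this section. The counts are hypothetical: for a 45-case test set with 19 CKD and 26 non-CKD patients, they reproduce the ANN test-set percentages reported in Results to within rounding. The paper does not publish its confusion matrix, however, so they should not be read as the study's actual counts.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Accuracy, sensitivity, specificity and prevalence from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,    # overall correctness
        "sensitivity": tp / (tp + fn),    # true positive rate (recall)
        "specificity": tn / (tn + fp),    # true negative rate
        "prevalence": (tp + fn) / total,  # share of actual positives
    }

# Hypothetical counts for a 45-case test set (19 CKD, 26 non-CKD).
m = classification_metrics(tp=16, tn=22, fp=4, fn=3)
for name, value in m.items():
    print(f"{name}: {value:.2%}")
# accuracy: 84.44%
# sensitivity: 84.21%
# specificity: 84.62%
# prevalence: 42.22%
```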

The Area Under the Receiver Operating Characteristic (ROC) Curve (AUC-ROC) evaluates a classifier across all decision-threshold settings. The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at different thresholds, and the AUC summarizes the curve in a single number: 1 indicates perfect separation of the classes, and 0.5 indicates performance no better than chance.
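A minimal plain-Python sketch of how the ROC curve and its area can be computed from predicted probabilities. The labels and scores are hypothetical. The curve is traced by sweeping the threshold over every distinct score, and the area is computed with the trapezoidal rule. As a cross-check, the sketch also uses the equivalent rank interpretation: the AUC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting half.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by sweeping the threshold from high to low."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, y_true) if s >= thr and t == 1)
        fp = sum(1 for s, t in zip(scores, y_true) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_area(points):
    """Area under a piecewise-linear curve given as x-sorted (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

def rank_auc(y_true, scores):
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted CKD probabilities for six patients.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
auc_curve = trapezoid_area(roc_points(y_true, scores))
auc_rank = rank_auc(y_true, scores)
print(round(auc_curve, 4), round(auc_rank, 4))  # 0.8889 0.8889
```

If only hard 0/1 predictions are scored, the curve has a single interior point, (1 - specificity, sensitivity). The area then reduces to (sensitivity + specificity) / 2, so threshold-free probability scores are needed to obtain a genuine ROC analysis.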

Results and Discussion

Early diagnosis is essential for slowing the progression of CKD and reducing its related health complications, because kidney function declines gradually as the disease advances. Early detection allows interventions such as medication and lifestyle changes that can substantially slow progression, improve quality of life, and lower the risk of serious outcomes such as ESRD, cardiovascular disease, and increased mortality (Figure 2). Because they make preventive strategies and better patient outcomes possible, accurate diagnostic tools are crucial throughout the healthcare continuum. Before comparing the classification models, we thoroughly preprocessed the dataset, which was collected from 153 patients at Erbil Teaching Hospital. Recognizing the importance of high-quality data, we carried out a comprehensive feature-importance evaluation and carefully imputed any missing values (Figure 3).

Ensuring that the dataset was free of missing values reduced the potential for bias and increased the validity of our conclusions. To inform feature selection for the subsequent modeling phase, we also examined the distribution and significance of each feature with respect to the presence or absence of CKD. The age distribution of the dataset is shown in Figure 4. On the training dataset, the LR model achieved 89.81% accuracy, 81.40% sensitivity, and 95.38% specificity, with a disease prevalence of 39.81%. This performance demonstrates the LR model's ability to classify the presence or absence of CKD accurately, with particularly strong identification of negative cases (high specificity). The ANN model, in contrast, achieved 100% accuracy on the training dataset, classifying every training case correctly. This result demonstrates the promise of the ANN model for CKD detection and other medical diagnostics.

On the testing dataset, the LR model achieved 82.22% accuracy, 84.21% sensitivity, and 80.77% specificity, while the ANN model achieved 84.44% accuracy, 84.21% sensitivity, and 84.61% specificity, a marginal improvement. Although both models perform well, these findings suggest that ANNs generalize slightly better to previously unseen data. To improve performance further, we introduced a novel method, Hybrid Multi-Model Fusion (HMMF), which uses a deliberate fusion mechanism to combine the predictive capabilities of the LR and ANN models and make the most of the distinct benefits of each. The HMMF approach was evaluated on the same dataset with the goal of improving classification accuracy, sensitivity, and specificity as well as the detection of disease prevalence.

The HMMF approach outperformed both the LR and the ANN models. It achieved an accuracy of 87.67%, a sensitivity of 86.71%, and a specificity of 88.50%, with a disease prevalence of 42.22%. These findings show how effectively the HMMF approach improves diagnostic accuracy and consistency.

Our results show that, although individual models such as ANNs and LR have good predictive power, combining them with the HMMF technique improves the performance metrics substantially. By combining the benefits of LR and ANNs while mitigating their individual shortcomings, HMMF produces a more reliable and accurate classification model for CKD diagnosis. We also evaluated the SVM model to explore its effectiveness on high-dimensional data; it achieved 83.33% accuracy on the testing dataset, showing that SVM can classify CKD presence accurately. The results are summarized in Table 1.

Table 1: Comparative Performance Table.

The results of this study support the use of hybrid models in medical diagnostics and provide a promising avenue for further research into improving predictive analytics in the healthcare industry.

Conclusion

This investigation highlights the crucial role of sophisticated ML algorithms in improving the diagnostic precision of CKD, a condition with significant implications for global health. By comparing the HMMF approach with ANNs, SVM, LR, and other methods, we have shown its superior predictive capability. Our results show that although SVM and ANNs provide strong frameworks for diagnosing CKD, integrating these models through HMMF considerably improves diagnostic performance, yielding 87.67% accuracy, 86.71% sensitivity, and 88.50% specificity. The study also highlights urea and creatinine levels as critical CKD predictors, providing insight into the course and prognosis of the condition. These findings not only advance medicine by supporting prompt and precise diagnosis of CKD but also illustrate the promise of machine learning to transform healthcare diagnostics. Because it exploits the advantages of individual ML algorithms while minimizing their drawbacks, the HMMF approach exemplifies the creative advances being made in medical diagnostics.

Discussion

Using ML in medical diagnostics offers a viable way to handle the complexity of conditions such as CKD. The HMMF method, which combines the predictive abilities of ANNs, SVM, and LR, is a notable development in this area, providing a more sophisticated and precise diagnostic tool. Its efficacy suggests that ML algorithms can improve patient outcomes by increasing diagnostic precision and enabling customized treatment strategies. Furthermore, the identification of urea and creatinine levels as significant CKD predictors supports a more focused strategy for early detection and treatment: by concentrating on these indicators and acting promptly, healthcare providers may be able to halt the progression of the disease and improve the quality of life of CKD patients. The study's implications also extend beyond CKD, since the HMMF approach and related ML techniques could be applied to a variety of medical conditions to improve the precision and efficacy of diagnosis and treatment. As the healthcare sector develops, incorporating such diagnostic techniques will be essential to tackling the problems facing contemporary medicine and will open the way to a time when AI and ML are integral parts of patient care.

In conclusion, the successful application of the HMMF method to CKD diagnosis represents a major advance in the use of ML in healthcare. This work shows not only how HMMF can increase diagnostic accuracy but also, more generally, how ML algorithms can be used to improve medical diagnosis and treatment planning. As we continue to investigate and refine these technologies, more precise, effective, and individualized healthcare becomes increasingly attainable.

To read more about this article in the Open access Journal of Biology & Life Sciences, please follow the URL below:

https://irispublishers.com/sjbls/fulltext/enhancing-chronic-kidney-disease-diagnostics.ID.000563.php
