From Data to Decision: Machine Learning and Explainable AI in Student Dropout Prediction

Shahad ALBUGAMI, Hana ALMAGRABI and Arwa WALI

 Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia

Academic Editor: Amine Nehari Talet

Cite this Article as:

Shahad ALBUGAMI, Hana ALMAGRABI and Arwa WALI (2024), "From Data to Decision: Machine Learning and Explainable AI in Student Dropout Prediction", Journal of eLearning and Higher Education, Vol. 2024 (2024), Article ID 246301, https://doi.org/10.5171/2024.246301

Copyright © 2024. Shahad ALBUGAMI, Hana ALMAGRABI and Arwa WALI. Distributed under Creative Commons Attribution 4.0 International CC-BY 4.0

Abstract

Student dropout is a critical issue with diverse consequences for the success of students, universities, and society. By delving into the factors behind dropout rates and leveraging predictive methodologies, universities and policymakers can develop targeted interventions to reduce dropout rates. Despite the progress in this area, limitations remain, including narrow research scopes, a reliance on traditional academic factors, and limited integration of qualitative perspectives. This paper highlights these gaps and provides actionable recommendations for future research, such as incorporating explainable AI, expanding sample populations, and integrating diverse factors like psychological health and cultural influences.

Keywords: Student Dropout, Machine Learning, Deep Learning, Explainable AI

Introduction

Student dropout is the process of students leaving their educational institution before obtaining their degree (Tinto, 1994). High dropout rates are a critical concern for educational institutions in terms of planning, reputation management, and resource allocation. According to Hanson (2024), up to 39% of undergraduate students in the United States do not complete their degree program, which illustrates the significant challenges and obstacles many students face in their education. The decision to drop out of university is influenced by several factors, including academic performance, socioeconomic background, health issues, and lack of support. Understanding these factors is crucial for developing effective interventions and support initiatives to reduce dropout rates and enhance student retention.

In recent years, machine learning and deep learning techniques have made significant advances in predicting student dropout and identifying at-risk students (Agrusti et al., 2020; Kemper et al., 2020). Using these techniques, researchers and educational institutions can intervene proactively and reduce student dropout rates. Moreover, the emergence of eXplainable Artificial Intelligence (XAI) provides a transparent framework for analyzing predictive models and understanding the key factors behind dropout predictions (Alwarthan et al., 2022; Delen et al., 2023). XAI also enables educational institutions to make informed decisions and implement tailored strategies to reduce student dropout.

The paper aims to shed light on key factors influencing student dropout and offer valuable insights to enhance prediction methods, ultimately informing more effective strategies for improving student retention. The paper is organized as follows: the first section discusses factors contributing to student dropout; the second section explores the application of machine learning and deep learning in student dropout; the third section examines the role of eXplainable Artificial Intelligence (XAI) in student dropout; the fourth section addresses the limitations and future directions; and the final section provides the conclusion.

Factors Contributing to Student Dropout

Researchers have identified various factors contributing to student dropout. According to Gonzalez-Nucamendi et al. (2023) and Mnyawami et al. (2022), age is among the important factors in predicting student dropout. When comparing dropout by gender, Fauszt et al. (2023) observed a significant difference, with 489 females and 1362 males dropping out from a total of 1851 students. Similarly, Cocoradă et al. (2021) concluded that males have a higher intention to drop out than females. Furthermore, according to Ashour (2019), males are often seen as the primary breadwinners, which can place pressure on young males to drop out of university to support their families financially. Some male students attempted to balance work and university but struggled with the conflict between attending classes and working, which eventually led them to drop out. This issue is also emphasized by both Kim et al. (2023) and Kocsis and Pusztai (2020), who stated that student employment increases the dropout rate. Moreover, Kumar et al. (2023) revealed that dropout rates were notably high among married female students, especially those from lower socioeconomic backgrounds. In Ashour’s study (2019), female participants reported that their decision to drop out was influenced by marriage and childcare responsibilities. On the other hand, Zahra (2020) found that female students who postponed marriage until the later stages of their education had a lower likelihood of dropping out.

Furthermore, Singh and Alhulail (2022) identified academic performance as the strongest contributing factor to student dropout, noting that early academic difficulties often lead to dropout. Similarly, Nurmalitasari et al. (2023) and Kim et al. (2023) highlighted the significance of academic satisfaction and academic performance in predicting student dropout. Tanvir and Chounta (2021) also indicated that performance-based characteristics, such as grades, are stable and reliable predictors of student dropout. Moreover, Tamada et al. (2022), Costa et al. (2020), and Shynarbek et al. (2021) suggested that academic performance data can be used to predict student dropout with high accuracy. Likewise, Demeter et al. (2022) emphasized the importance of GPA and completed credit hours in predicting student graduation.

For a comprehensive understanding of the factors contributing to student dropout, Table 1 summarizes the factors examined in each study and the factors each study found to contribute most.

Table 1: Summary of the Most Contributing Factors to Student Dropout

Machine Learning and Deep Learning in Student Dropout

Machine Learning (ML) and Deep Learning (DL) are branches of artificial intelligence that offer powerful models for analyzing student data and predicting dropout. A recent comparative study by Villar and De Andrade (2024) applied machine learning models including Decision Tree, Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting, XGBoost, CatBoost, and LightGBM to assess their performance in predicting student dropout. The study found that boosting algorithms, including LightGBM and CatBoost, outperformed traditional classification techniques, with AUC values exceeding 0.90. Similarly, in the study by Fernández-García et al. (2021), boosting algorithms demonstrated their superiority. The researchers utilized Gradient Boosting, RF, SVM, and Ensemble models at different stages of prediction. Their findings showed that Gradient Boosting is the best model to use at the time of enrollment, while SVM achieved a recall of 91.5% in detecting students likely to drop out by the end of the 4th semester. Additionally, Naseem et al. (2019) used an ensemble of Random Forest with different cross-validation techniques to predict dropout for first-year undergraduate CS students in the South Pacific. Their results show that the accuracy of both models was higher than 80%, but the Random Forest model with 5-fold cross-validation had better Sensitivity and Kappa scores. Furthermore, Revathy et al. (2022) utilized several machine learning models, including SVM, LDA, LR, and KNN. Among these, KNN achieved the highest accuracy, 97.6%.
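The evaluation measures reported in these studies (accuracy, recall or sensitivity, Cohen's kappa, and AUC) can all be computed from a model's predictions. The following is a minimal Python sketch under stated assumptions: the labels and scores are made-up toy values, not data from any cited study, and the function names are illustrative only.

```python
"""Toy sketch: evaluation metrics commonly reported for dropout classifiers."""


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 means dropout."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Share of all students classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def recall(y_true, y_pred):
    """Share of actual dropouts that were detected (also called sensitivity)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def cohen_kappa(y_true, y_pred):
    """Agreement between predictions and labels, corrected for chance."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected)


def roc_auc(y_true, scores):
    """Probability that a random dropout gets a higher risk score than a
    random non-dropout (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 dropouts and 6 non-dropouts with model risk scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

print("accuracy:", accuracy(y_true, y_pred))
print("recall:  ", recall(y_true, y_pred))
print("kappa:   ", cohen_kappa(y_true, y_pred))
print("AUC:     ", roc_auc(y_true, scores))
```

On imbalanced dropout data, accuracy alone can look high even when few dropouts are detected, which is one reason to read recall, kappa, and AUC alongside it when comparing the figures reported across studies.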

Moreover, many researchers believe that students decide whether to drop out or remain at university in their first year. Therefore, several studies have focused on predicting dropout among first-year students (Delen et al., 2023; Niyogisubizo et al., 2022; Singh and Alhulail, 2022). For example, Del Bonifro et al. (2020) applied Linear Discriminant Analysis (LDA), SVM, and RF to predict dropout among first-year undergraduate students. The average accuracies were 74% for LDA, 76% for SVM, and 68% for RF. However, the results of this prediction cannot be generalized across all university levels. In a study conducted by Delen et al. (2023), a multilayer perceptron (MLP), a type of deep learning model, was employed to predict freshman student dropout, achieving an accuracy of 88.4%. Similarly, Melo et al. (2022) utilized an MLP to predict student dropout, reaching a higher accuracy of 97%. In addition, Mnyawami et al. (2022) applied different machine learning models, including MLP, to predict student dropout in schools across Tanzania, Uganda, and Kenya. In this study, the MLP achieved an accuracy of 96%. Moreover, Alwarthan et al. (2022) employed several models (ANN, SVM, and RF) to predict at-risk students, with performance varying across different datasets. The highest accuracies achieved were 98.816% for ANN, 98.731% for SVM, and 99.662% for RF. Table 2 provides a comparative analysis of the different models applied in student dropout prediction.

Table 2: Summary of ML and DL in Student Dropout

Explainable Artificial Intelligence (XAI) in Student Dropout

Explainable AI (XAI) was developed to address the black-box nature of machine learning models and make them more understandable and transparent (Hassija et al., 2024). According to Delen et al. (2023), while Local Interpretable Model-Agnostic Explanations (LIME) efficiently provides local importance scores for each instance, it does not ensure the same level of accuracy and consistency as SHapley Additive exPlanations (SHAP). Consequently, they employed SHAP to provide “features of overall importance SHAP scores” and “individual-specific features of SHAP scores”. A recent study by Villar and De Andrade (2024) applied SHAP to better understand the effects of various factors in predicting student dropout. The method was applied specifically to the CatBoost and LightGBM algorithms, and the study provides a visual representation of the top ten factors and their importance scores. Baranyi et al. (2020) utilized SHAP and permutation importance techniques to determine the factors influencing dropout probability and identified the 12 most significant contributing factors based on the models’ analysis. Additionally, Alwarthan et al. (2022) applied LIME, SHAP, and the Global Surrogate model, and provided a written and visual explanation for each technique. Melo et al. (2022) defined 14 metrics of an XAI framework for the school dropout problem and then calculated an XAI explainability index for each explainer to compare them. The results indicate that SHAP has the highest XAI explainability index at 78%, whereas LIME has 57% and Shapley Values has 35%. Table 3 summarizes the models used and the XAI techniques applied to interpret the models’ results.
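To make the idea behind SHAP-style attributions concrete, the sketch below computes exact Shapley values for a single student under a small risk model. It is an illustration under stated assumptions, not the method of any cited study and not the SHAP library itself, which estimates such values more efficiently. The model `toy_risk`, its coefficients, the three features, and the choice of replacing "absent" features with values from a baseline student are all hypothetical.

```python
"""Toy sketch: exact Shapley values for one prediction (not the SHAP library)."""
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Attribute predict(instance) - predict(baseline) to each feature.

    Features outside a coalition take their baseline value; this is one
    simple (assumed) way to represent a feature as "absent".
    """
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Classical Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(x):
    """Hypothetical dropout-risk score; all coefficients are made up."""
    gpa, work_hours, credits = x
    score = 0.8 - 0.15 * gpa + 0.01 * work_hours - 0.005 * credits
    if gpa < 2.0 and work_hours > 20:  # assumed interaction effect
        score += 0.1
    return score


feature_names = ["gpa", "weekly_work_hours", "credits_completed"]
student = [1.8, 25, 12]    # hypothetical at-risk student
reference = [3.0, 10, 30]  # hypothetical "typical" student used as baseline

phi = shapley_values(toy_risk, student, reference)
for name, contribution in zip(feature_names, phi):
    print(f"{name}: {contribution:+.3f}")

# Efficiency property: contributions sum to the gap between the two scores.
print("sum of contributions:", round(sum(phi), 6))
print("score gap:           ", round(toy_risk(student) - toy_risk(reference), 6))
```

The per-feature values correspond to the individual-level explanations described above; averaging their absolute values over many students is one common way to obtain an overall importance ranking. Exact enumeration grows exponentially with the number of features, so it is only practical for very small feature sets.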

Table 3: Summary of Explainable AI in Student Dropout


Limitations and Future Directions

Existing research on student dropout prediction has notable limitations. First, many studies focus on specific universities, which limits the generalizability of their results to broader and more diverse student populations. Additionally, most research has focused on traditional academic factors, which alone cannot fully explain the complexity of student dropout. Moreover, methodological constraints in several studies, such as small sample sizes, biased data collection methods, outdated datasets, or limited model evaluation techniques, can affect the reliability and generalizability of the findings. Furthermore, there is a lack of qualitative research that directly incorporates the perspectives and experiences of students who have dropped out, which could provide deeper insights into the reasons behind their decisions.

To address these limitations and guide future research, researchers should consider incorporating advanced technologies such as Large Language Models (LLMs). Additionally, more studies should integrate explainable AI (XAI) techniques to enhance the transparency and interpretability of predictive model results. Future research should also expand the scope of sample populations and develop predictive models that target student dropout at a national level. This is crucial for obtaining more comprehensive and generalizable results within specific countries. Furthermore, future studies should consider including factors that have been highlighted in statistical research, such as psychological health, language barriers, and cultural differences. Incorporating these factors into dropout prediction models would help assess their significance and impact. Moreover, translating research findings into actionable policies is necessary to bridge the gap between theoretical insights and practical implementation. By overcoming these limitations and following these future directions, the quality of research in the field of student dropout prediction can be significantly enhanced.

Conclusion

In conclusion, this paper has explored different factors contributing to student dropout, highlighting the importance of academic performance, demographic characteristics, and social factors in dropout prediction. In addition, the paper discussed several studies that used machine learning, deep learning, and explainable AI to predict and understand student dropout, which showed promising results in helping universities implement effective interventions to reduce dropout rates. However, several limitations of the existing studies were identified, including the narrow scope of the populations studied, the need to consider a broader range of factors, the lack of research considering the perspectives of students who have dropped out, and the limited use of explainable AI. Future research should expand the scope to a national level, apply advanced models, incorporate new factors, and focus on translating research findings into actionable insights and strategies.

References

  • Agrusti, F., Mezzini, M., Bonavolontà, G., 2020. Deep learning approach for predicting university dropout: a case study at Roma Tre University. J. E-Learn. Knowl. Soc., 44–54. https://doi.org/10.20368/1971-8829/1135192
  • Alwarthan, S., Aslam, N., Khan, I.U., 2022. An Explainable Model for Identifying At-Risk Student at Higher Education 10.
  • Ashour, S., 2019. Analysis of the attrition phenomenon through the lens of university dropouts in the United Arab Emirates. J. Appl. Res. High. Educ. 12, 357–374. https://doi.org/10.1108/JARHE-05-2019-0110
  • Baranyi, M., Nagy, M., Molontay, R., 2020. Interpretable Deep Learning for University Dropout Prediction, in: Proceedings of the 21st Annual Conference on Information Technology Education. Presented at the SIGITE ’20: The 21st Annual Conference on Information Technology Education, ACM, Virtual Event USA, pp. 13–19. https://doi.org/10.1145/3368308.3415382
  • Cocoradă, E., Curtu, A.L., Năstasă, L.E., Vorovencii, I., 2021. Dropout Intention, Motivation, and Socio-Demographics of Forestry Students in Romania. Forests 12, 618. https://doi.org/10.3390/f12050618
  • Costa, A.G., Queiroga, E., Primo, T.T., Mattos, J.C.B., Cechinel, C., 2020. Prediction analysis of student dropout in a Computer Science course using Educational Data Mining, in: 2020 XV Conferencia Latinoamericana de Tecnologias de Aprendizaje (LACLO). Presented at the 2020 XV Conferencia Latinoamericana de Tecnologias de Aprendizaje (LACLO), IEEE, Loja, Ecuador, pp. 1–6. https://doi.org/10.1109/LACLO50806.2020.9381166
  • Del Bonifro, F., Gabbrielli, M., Lisanti, G., Zingaro, S.P., 2020. Student Dropout Prediction, in: Bittencourt, I.I., Cukurova, M., Muldner, K., Luckin, R., Millán, E. (Eds.), Artificial Intelligence in Education, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp. 129–140. https://doi.org/10.1007/978-3-030-52237-7_11
  • Delen, D., Davazdahemami, B., Rasouli Dezfouli, E., 2023. Predicting and Mitigating Freshmen Student Attrition: A Local-Explainable Machine Learning Framework. Inf. Syst. Front. https://doi.org/10.1007/s10796-023-10397-3
  • Demeter, E., Dorodchi, M., Al-Hossami, E., Benedict, A., Slattery Walker, L., Smail, J., 2022. Predicting first-time-in-college students’ degree completion outcomes. High. Educ. 84, 589–609. https://doi.org/10.1007/s10734-021-00790-9
  • Fauszt, T., Erdélyi, K., Dobák, D., Bognár, L., Kovács, E., 2023. Design of a Machine Learning Model to Predict Student Attrition. Int. J. Emerg. Technol. Learn. IJET 18, 184–195. https://doi.org/10.3991/ijet.v18i17.41449
  • Fernandez-Garcia, A.J., Preciado, J.C., Melchor, F., Rodriguez-Echeverria, R., Conejero, J.M., Sanchez-Figueroa, F., 2021. A Real-Life Machine Learning Experience for Predicting University Dropout at Different Stages Using Academic Data. IEEE Access 9, 133076–133090. https://doi.org/10.1109/ACCESS.2021.3115851
  • Gonzalez-Nucamendi, A., Noguez, J., Neri, L., Robledo-Rella, V., García-Castelán, R.M.G., 2023. Predictive analytics study to determine undergraduate students at risk of dropout. Front. Educ. 8, 1244686. https://doi.org/10.3389/feduc.2023.1244686
  • Hanson, M., 2024. College Dropout Rates. EducationData.org.
  • Hassija, V., Chamola, V., Mahapatra, A., Singal, A., Goel, D., Huang, K., Scardapane, S., Spinelli, I., Mahmud, M., Hussain, A., 2024. Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence. Cogn. Comput. 16, 45–74. https://doi.org/10.1007/s12559-023-10179-8
  • Kemper, L., Vorhoff, G., Wigger, B.U., 2020. Predicting student dropout: A machine learning approach. Eur. J. High. Educ. 10, 28–47. https://doi.org/10.1080/21568235.2020.1718520
  • Kim, Sangyun, Choi, E., Jun, Y.-K., Lee, S., 2023. Student Dropout Prediction for University with High Precision and Recall. Appl. Sci. 13, 6275. https://doi.org/10.3390/app13106275
  • Kim, Sean, Yoo, E., Kim, Samuel, 2023. Why Do Students Drop Out? University Dropout Prediction and Associated Factor Analysis Using Machine Learning Techniques. https://doi.org/10.48550/ARXIV.2310.10987
  • Kocsis, Z., Pusztai, G., 2020. Student Employment as a Possible Factor of Dropout. Acta Polytech. Hung. 17, 183–199. https://doi.org/10.12700/APH.17.4.2020.4.10
  • Kumar, P., Patel, S.K., Debbarma, S., Saggurti, N., 2023. Determinants of School dropouts among adolescents: Evidence from a longitudinal study in India. PLOS ONE 18, e0282468. https://doi.org/10.1371/journal.pone.0282468
  • Melo, E., Silva, I., Costa, D., Viegas, C., Barros, T., 2022. On the Use of eXplainable Artificial Intelligence to Evaluate School Dropout. Educ. Sci. 12, 845. https://doi.org/10.3390/educsci12120845
  • Mnyawami, Y.N., Maziku, H.H., Mushi, J.C., 2022. Enhanced Model for Predicting Student Dropouts in Developing Countries Using Automated Machine Learning Approach: A Case of Tanzanian’s Secondary Schools. Appl. Artif. Intell. 36, 2071406. https://doi.org/10.1080/08839514.2022.2071406
  • Naseem, M., Chaudhary, K., Sharma, B., Lal, A.G., 2019. Using Ensemble Decision Tree Model to Predict Student Dropout in Computing Science, in: 2019 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE). Presented at the 2019 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), IEEE, Melbourne, Australia, pp. 1–8. https://doi.org/10.1109/CSDE48274.2019.9162389
  • Niyogisubizo, J., Liao, L., Nziyumva, E., Murwanashyaka, E., Nshimyumukiza, P.C., 2022. Predicting student’s dropout in university classes using two-layer ensemble machine learning approach: A novel stacked generalization. Comput. Educ. Artif. Intell. 3, 100066. https://doi.org/10.1016/j.caeai.2022.100066
  • Nurmalitasari, Awang Long, Z., Faizuddin Mohd Noor, M., 2023. Factors Influencing Dropout Students in Higher Education. Educ. Res. Int. 2023, 1–13. https://doi.org/10.1155/2023/7704142
  • Revathy, M., Kamalakkannan, S., Kavitha, P., 2022. Machine Learning based Prediction of Dropout Students from the Education University using SMOTE, in: 2022 4th International Conference on Smart Systems and Inventive Technology (ICSSIT). Presented at the 2022 4th International Conference on Smart Systems and Inventive Technology (ICSSIT), IEEE, Tirunelveli, India, pp. 1750–1758. https://doi.org/10.1109/ICSSIT53264.2022.9716450
  • Shynarbek, N., Orynbassar, A., Sapazhanov, Y., Kadyrov, S., 2021. Prediction of Student’s Dropout from a University Program, in: 2021 16th International Conference on Electronics Computer and Computation (ICECCO). Presented at the 2021 16th International Conference on Electronics Computer and Computation (ICECCO), IEEE, Kaskelen, Kazakhstan, pp. 1–4. https://doi.org/10.1109/ICECCO53203.2021.9663763
  • Singh, H.P., Alhulail, H.N., 2022. Predicting Student-Teachers Dropout Risk and Early Identification: A Four-Step Logistic Regression Approach. IEEE Access 10, 6470–6482. https://doi.org/10.1109/ACCESS.2022.3141992
  • Tamada, M.M., Giusti, R., Netto, J.F.D.M., 2022. Predicting Students at Risk of Dropout in Technical Course Using LMS Logs. Electronics 11, 468. https://doi.org/10.3390/electronics11030468
  • Tanvir, H., Chounta, I.-A., 2021. Exploring the Importance of Factors Contributing to Dropouts in Higher Education Over Time.
  • Tinto, V., 1994. Leaving College: Rethinking the Causes and Cures of Student Attrition. The University of Chicago Press.
  • Villar, A., De Andrade, C.R.V., 2024. Supervised machine learning algorithms for predicting student dropout and academic success: a comparative study. Discov. Artif. Intell. 4, 2. https://doi.org/10.1007/s44163-023-00079-z
  • Zahra, F., 2020. High Hopes, Low Dropout: Gender Differences in Aspirations for Education and Marriage, and Educational Outcomes in Rural Malawi. Comp. Educ. Rev. 64, 670–702. https://doi.org/10.1086/710778

 
