Predicting Dropout Student: An Application of Data Mining Methods in an Online Education Program
This study examined the prediction of dropouts through data mining approaches in an online program. The subject of the study was selected from a total of 189 students who registered to the online Information Technologies Certificate Program in 2007-2009. The data was collected through online questionnaires (Demographic Survey, Online Technologies Self-Efficacy Scale, Readiness for Online Learning Questionnaire, Locus of Control Scale, and Prior Knowledge Questionnaire). The collected data included 10 variables, which were gender, age, educational level, previous online experience, occupation, self efficacy, readiness, prior knowledge, locus of control, and the dropout status as the class label (dropout/not). In order to classify dropout students, four data mining approaches were applied based on k-Nearest Neighbour (k-NN), Decision Tree (DT), Naive Bayes (NB) and Neural Network (NN). These methods were trained and tested using 10-fold cross validation. The detection sensitivities of 3-NN, DT, NN and NB classifiers were 87%, 79.7%, 76.8% and 73.9% respectively. Also, using Genetic Algorithm (GA) based feature selection method, online technologies self-efficacy, online learning readiness, and previous online experience were found as the most important factors in predicting the dropouts.
The European Journal of Open, Distance and E-Learning