A Comparison of Educational Statistics and Data Mining Approaches to Identify Characteristics that Impact Online Learning
Learning objects (LOs) are important online resources for both learners and instructors and usage for LOs is growing. Automatic LO tracking collects large amounts of metadata about individual students as well as data aggregated across courses, learning objects, and other demographic characteristics (e.g. gender). The challenge becomes identifying which of the many variables derived from tracked data are useful for predicting student learning. This challenge has prompted considerable research in the field of educational data mining and learning analytics. This work advances such research in four ways. First, we bring together two approaches for finding salient variables from separate research areas: hierarchical linear modeling (HLM) from education and Lasso feature selection from computer science. Second, we show that these two approaches have complimentary and synergistic results with some variables considers salient by both and others salient by only one. Third, and most importantly, we demonstrate the benefits of a combined approach that considers a variable salient when either HLM or Lasso consider that variable salient. This combined approach both improves model predictive accuracy and finds additional variables considered salient in previous datasets on student learning. Lastly, we use the results to provide insights into the salient variables to the learning outcome in undergraduate CS education. Overall, this work suggests a combined approach that improves the identification of salient variables in big data and also improves the design of LO tracking systems for learning management systems.