Core Questions
What is educational data mining?
What is the difference between educational data mining on the one hand, and traditional psychometrics and statistics on the other hand?
Given that difference, what is the implication of the increasing use of psychometric frameworks in EDM, reported in all of the articles and continuing to this day? (See, for instance, the special issue on ECD & EDM in JEDM.)
In past classes, I have claimed that EDM is associated with reductionism and with McKeon's entitative perspective. Is this claim reflected in these three articles?
Across Romero and Ventura, Baker and Yacef, and Scheuer and McLaren, there are three different perspectives on what the categories of methods are in educational data mining. What are the relative merits of these different perspectives?
Baker and Yacef, in 2009, refer to four major areas of application of EDM models: Improving student models, discovering domain structure, determining what types of pedagogical support are most effective, and enhancing educational theories. What does this tell us about EDM? What currently important uses of EDM/learning analytics are missing?
Scheuer and McLaren refer to six major areas of application of EDM models: Scientific inquiry and system evaluation, determining student model parameters, informing domain models, creating diagnostic models, creating reports and alerts, and recommending resources and activities. How does this perspective differ from that of Baker and Yacef? What are the pluses and minuses of these differences?
Baker and Yacef argue that EDM opened research on gaming the system to "concrete, quantitative, and fine-grained analysis." Has EDM reached its potential to do this?
One early theme in EDM is the development of research tools, a theme that has continued throughout EDM. However, most EDM research is still conducted using general-purpose tools (BKT is one of the few exceptions). Why might this be?
What are your thoughts about the cycle of applying data mining from Romero & Ventura?
Secondary Questions
Romero & Ventura refer to EDM as a field; Scheuer & McLaren refer to it as a discipline (and later as a field); Baker & Yacef refer to it as a research community. What is the difference between these three perspectives? Which one is most accurate? Which one is most useful?
The definition of EDM in Baker & Yacef is focused on the specific properties of educational data, whereas the definition of EDM in Scheuer & McLaren is focused on the fact that there's a lot of data. Why might Scheuer & McLaren have made this shift? What are its implications?
What is the implication of the use of discovery with models in EDM research? What is positive about this trend? Are there any negatives?
Romero and Ventura argue that one key difference between data mining for e-commerce and e-learning is that the purpose in e-commerce is to guide clients in purchasing or increasing profit, while the purpose in e-learning is to guide students in learning. With the recent use of EDM by for-profit corporations (such as the Apollo Group) to predict program dropout, is this distinction as clear as it was?
The PSLC DataShop played a major role in the field around 2008 and 2009. That role is diminishing (in terms of proportion if not absolute numbers); why? Is it simply due to more proprietary data, the emergence of new types of data not explicitly considered during the design of DataShop, the rapid growth of the field, or other factors?
Baker and Yacef argue that the use of public data sets will support external validation of analyses, and help researchers build on each other's efforts. As researchers increasingly work with non-public data, are these benefits lost, or are they being achieved in other ways?
Scheuer and McLaren treat parameter estimation as a separate area of EDM. Do you agree?
Scheuer and McLaren call for replication studies to test for model reliability and generalization. Do you agree? Are there any other ways to test model reliability and generalization?