Development of an Educational Data Mining Workbench

The EDM Workbench was conceptualized by Ryan Baker of Worcester Polytechnic Institute and Bruce McLaren of the Pittsburgh Science of Learning Center.

The Workbench will allow learning scientists to:
1) define and modify behavior categories of interest (e.g., gaming, unresponsiveness, off-task conversation, help avoidance). The Workbench will also support researchers in automatically re-labeling data when labeling schemes change.
2) label previously collected educational log data with the categories of interest through a “Customized Log Action Viewer” (CLAV) (see Figure 1), considerably faster than is possible through live observation or existing data labeling methods.
3) collaborate with others in labeling data by providing tools to communicate and document labeling guidelines and standards.
4) validate inter-rater reliability between multiple labelers of the same educational log data corpus.
5) analyze textual data (e.g., chat) in collaborative learning situations by integrating a text categorization tool such as TagHelper (e.g., Rosé et al., 2008).
6) automatically distill additional information from log files for use in machine learning, such as estimates of student knowledge and context about student response time (i.e., how much faster or slower the student’s action was than the average for that problem step).
7) provide support for directly and immediately running the labeled data through a machine-learning tool, such as WEKA or RapidMiner.
8) produce code that can be used to immediately transfer the detectors generated by the EDM Workbench and machine-learning tool into educational software that can use the detectors to react to student metacognitive and motivational behavior in real time.
9) export resultant models of student behavior to tools which enable sophisticated secondary analyses, such as the sequential pattern analysis offered by Jeong’s (2003) DAT.
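Two of the capabilities above, inter-rater reliability (item 4) and response-time context (item 6), reduce to small computations that can be sketched briefly. The following is only an illustration, not the Workbench's implementation: it assumes Cohen's kappa as the agreement statistic (the text does not name one), and the function names and the `step` and `seconds` log fields are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same clips (assumed statistic)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Proportion of clips on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


def relative_response_times(actions):
    """Annotate each action with its time minus the mean time for its problem step."""
    by_step = defaultdict(list)
    for action in actions:
        by_step[action["step"]].append(action["seconds"])
    step_mean = {step: mean(times) for step, times in by_step.items()}
    return [
        {**action, "seconds_vs_step_mean": action["seconds"] - step_mean[action["step"]]}
        for action in actions
    ]


# Hypothetical labels from two raters for the same four clips.
rater_1 = ["gaming", "on-task", "on-task", "gaming"]
rater_2 = ["gaming", "on-task", "gaming", "gaming"]
kappa = cohen_kappa(rater_1, rater_2)

# Hypothetical log rows; the field names are assumptions for this sketch.
log = [
    {"student": "s1", "step": "eq1", "seconds": 4.0},
    {"student": "s2", "step": "eq1", "seconds": 8.0},
    {"student": "s1", "step": "eq2", "seconds": 10.0},
]
features = relative_response_times(log)
```

A negative `seconds_vs_step_mean` marks an action that was faster than the average for that step, and a positive value marks a slower one. Features of this kind could then be passed, along with the labels, to a machine-learning tool.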

Through the use of a tool such as the one proposed here, the process of developing a detector of relevant metacognitive and motivational behaviors can be sped up by as much as a factor of 100; that is, a detector can be developed in about 1% of the time previously required. The use of “text replays” (cf. Baker, Corbett, & Wagner, 2006; Baker & de Carvalho, 2008) alone has been shown to speed detector development by about 40 times, with no reduction in detector accuracy. Text replays are a visualization technique, applied to previously collected log data, that is similar to but much less flexible than the visualizations given by CLAV (point 2 above). However, text replays do not provide any of the other capabilities listed above.

We will test the EDM Workbench in at least five separate and quite distinct STEM domains in which student action data has been and will continue to be logged. Possible domains include but are not limited to: algebra (Koedinger, Cunningham, Skogsholm, & Leber, under review), SQL database queries (Mitrovic, Martin, & Mayo, 2002), collaborative discussion of ethical issues in science (de Groot et al., 2007; McLaren, Scheuer, et al., 2007), early mathematics (in an action game) (Habgood, 2007), stoichiometry (McLaren et al., 2006; McLaren, Lim, Yaron, & Koedinger, 2007), and ecology (Rebolledo-Mendez et al., 2006). In each case, we will conduct parallel studies in which we will do the same labeling task with the EDM Workbench and with either live observation (Rodrigo et al., 2007; Rodrigo et al., 2008; Rodrigo et al., 2009) or an existing general-purpose tool, such as Excel (cf. McLaren, Scheuer, et al., 2007). This will enable us to test whether the EDM Workbench speeds up the process of detector development, and by how much, and whether the resultant detector is as accurate as detectors developed through existing methods.

Funded by the Department of Science and Technology’s Engineering Research and Development for Technology program (DOST-ERDT).

People:
Ma. Mercedes T. Rodrigo, Ph.D.
Jessica O. Sugay
Jeffrey Jongko
John Paul Contillo
