Educational data mining overview - PowerPoint PPT Presentation


Date added: 30 May 2020
Slides: 46
Provided by: scie290
Learn more at: http://www.learnlab.org

Title: Educational data mining overview


1
Educational data mining overview: Introduction
to Exploratory Data Analysis with DataShop
  • Ken Koedinger, CMU Director of PSLC
  • Professor of Human-Computer Interaction &
    Psychology
  • Carnegie Mellon University

2
Overview
  • DataShop Overview
  • Logging model
  • DataShop Features
  • Quantitative models of learning curves
  • Power law, logistic regression
  • Contrasting KC models
  • Exploratory Data Analysis Exercise (start)
  • Knowledge Component Model Editing

3
Logging & Storage Models
  • Education technologies are instrumented to
    produce log data
  • We encourage a standard log format
  • XML format generalized from Ritter & Koedinger
    (1995)
  • Also convert log data from other formats

4
Relational Database -- complex!
5
Example activity generating click stream data
  • Geometry Cognitive Tutor: Making Cans problem
  • Find the area of scrap metal left over after
    removing a circular area (the end of a can) from
    a metal square.
  • Student enters values in worksheet
  • Tutor provides feedback & instruction
  • Records student actions & tutor responses
  • Logs stored in files on school server or database
    at Carnegie Learning
  • Later imported into DataShop

6
DataShop logging model
  • Main constructs
  • Context message: the student, problem, and
    session with the tutor
  • Tool message: represents an action in the tool
    performed by a student or tutor
  • Tutor message: represents a tutor's response to a
    student action

7
DataShop XML format: Context message
  • <context_message context_message_id="C2badca9c5c-7fe5" name="START_PROBLEM">
      <dataset>
        <name>Geometry Hampton 2005-2006</name>
        <level type="Lesson">
          <name>PACT-AREA</name>
          <level type="Section">
            <name>PACT-AREA-6</name>
            <problem>
              <name>MAKING-CANS</name>
            </problem>
          </level>
        </level>
      </dataset>
    </context_message>

Callouts: dataset name (dataset name element), course unit (level type="Lesson"), course section (level type="Section"), problem (problem name element)
8
DataShop XML format: Tool & Tutor Messages
  • <tool_message context_message_id="C2badca9c5c-7fe5">
      <semantic_event transaction_id="T2a9c5c-7fe7" name="ATTEMPT" />
      <event_descriptor>
        <selection>(POG-AREA QUESTION2)</selection>
        <action>INPUT-CELL-VALUE</action>
        <input>200.96</input>
      </event_descriptor>
    </tool_message>
    <tutor_message context_message_id="C2badca9c5c-7fe5">
      <semantic_event transaction_id="T2a9c5c-7fe7" name="RESULT" />
      <event_descriptor> [as above] </event_descriptor>
      <action_evaluation>CORRECT</action_evaluation>
    </tutor_message>
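A minimal sketch of how a tool message and its tutor response might be joined into one transaction row, matching on transaction_id. The values are those shown on the slide. The wrapping `<log>` root element and the function name `pair_messages` are assumptions made for illustration, not part of the DataShop format, and the tutor message's event descriptor (shown as "as above" on the slide) is omitted.

```python
import xml.etree.ElementTree as ET

# Slide values wrapped in a hypothetical <log> root so the fragment
# parses as a single XML document.
LOG = """<log>
<tool_message context_message_id="C2badca9c5c-7fe5">
  <semantic_event transaction_id="T2a9c5c-7fe7" name="ATTEMPT" />
  <event_descriptor>
    <selection>(POG-AREA QUESTION2)</selection>
    <action>INPUT-CELL-VALUE</action>
    <input>200.96</input>
  </event_descriptor>
</tool_message>
<tutor_message context_message_id="C2badca9c5c-7fe5">
  <semantic_event transaction_id="T2a9c5c-7fe7" name="RESULT" />
  <action_evaluation>CORRECT</action_evaluation>
</tutor_message>
</log>"""


def pair_messages(xml_text):
    """Join each tool message to the tutor message with the same transaction_id."""
    root = ET.fromstring(xml_text)

    # Index tutor evaluations by transaction id.
    evaluations = {}
    for tutor in root.iter("tutor_message"):
        tx = tutor.find("semantic_event").get("transaction_id")
        evaluations[tx] = tutor.findtext("action_evaluation")

    rows = []
    for tool in root.iter("tool_message"):
        tx = tool.find("semantic_event").get("transaction_id")
        desc = tool.find("event_descriptor")
        rows.append({
            "transaction_id": tx,
            "selection": desc.findtext("selection"),
            "action": desc.findtext("action"),
            "input": desc.findtext("input"),
            # None when no tutor response was logged (e.g. an untutored tool).
            "evaluation": evaluations.get(tx),
        })
    return rows


transactions = pair_messages(LOG)
```

Using `evaluations.get(tx)` rather than a direct lookup keeps the sketch usable for tools that log actions without any tutor response.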

9
Example Stored Transactions
  • Student interactions (or transactions) are stored
    in a relational database and can be exported as
    a table
  • Example: Student S01 on Making-Cans problem

10
Transactions
  • Info for each transaction
  • student(s), session, time, problem, problem step,
    attempt number, student action
  • tutor response, number of hints, knowledge
    component code
  • Logging of on-line tools (e.g., a virtual lab)
    does not include tutor response

11
Step & Transaction Definitions
  • A problem-solving activity typically involves
    many tool & tutor messages.
  • Steps represent completion of possible subgoals
    or pieces of a problem solution
  • Transactions are attempts at a step or requests
    for instructional help
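The step/transaction distinction can be sketched as a roll-up from transactions to one row per student-step. The sample transactions, step names, outcome labels, and the function name `roll_up` below are hypothetical; this is an illustration of the idea, not DataShop's own aggregation code.

```python
# Hypothetical transactions in time order: (student, step, outcome).
TRANSACTIONS = [
    ("S01", "step-1", "INCORRECT"),
    ("S01", "step-1", "HINT"),
    ("S01", "step-1", "CORRECT"),
    ("S01", "step-2", "CORRECT"),
]

COUNT_KEYS = {"INCORRECT": "incorrects", "HINT": "hints", "CORRECT": "corrects"}


def roll_up(transactions):
    """Aggregate transactions into one summary row per (student, step)."""
    rows = {}
    for student, step, outcome in transactions:
        # The first transaction seen for a step fixes its first-attempt outcome.
        row = rows.setdefault(
            (student, step),
            {"first_attempt": outcome, "incorrects": 0, "hints": 0, "corrects": 0},
        )
        row[COUNT_KEYS[outcome]] += 1
    return rows


student_steps = roll_up(TRANSACTIONS)
```

Here the three transactions on the first step collapse into a single row whose first attempt is an error, followed by one hint request and one correct entry.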

12
Example data aggregated by student-step
13
Overview
  • DataShop Overview
  • Logging model
  • DataShop Features
  • Quantitative models of learning curves
  • Power law, logistic regression
  • Contrasting KC models
  • Exploratory Data Analysis Exercise (start)
  • Knowledge Component Model Editing

14
DataShop Analysis Tools
  • Dataset Info
  • Performance Profiler
  • Learning Curve
  • Error Report
  • Export
  • Sample Selector

15
Dataset Info
  • Metadata for a given dataset
  • PIs get edit privileges, others must request it

Papers and Files storage
16
Performance Profiler
Multipurpose tool to help identify areas that are
too hard or easy
  • View measures of: Error Rate, Assistance Score,
    Avg Hints, Avg Incorrect, Residual Error Rate
  • Aggregate by: Step, Problem, KC, Dataset Level

17
Learning Curve
Visualizes changes in student performance over
time
View by KC or Student, Assistance Score or Error
Rate
Time is represented on the x-axis as
"opportunity", or the number of times a student (or
students) had an opportunity to demonstrate a KC
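As a rough illustration of the "opportunity" axis, the sketch below numbers each student's encounters with a KC and averages first-attempt errors at each opportunity. The rows, the KC name, and the function name `learning_curve` are hypothetical, and the code assumes the input has already been filtered to a single KC.

```python
from collections import defaultdict

# Hypothetical student-step rows in time order:
# (student, kc, first_attempt_correct)
ROWS = [
    ("S01", "circle-area", False),
    ("S02", "circle-area", False),
    ("S01", "circle-area", True),
    ("S02", "circle-area", False),
    ("S01", "circle-area", True),
    ("S02", "circle-area", True),
]


def learning_curve(rows):
    """Return {opportunity number: error rate across students}."""
    seen = defaultdict(int)    # (student, kc) -> opportunities so far
    errors = defaultdict(int)  # opportunity -> number of first-attempt errors
    totals = defaultdict(int)  # opportunity -> number of observations
    for student, kc, correct in rows:
        seen[(student, kc)] += 1
        opportunity = seen[(student, kc)]
        totals[opportunity] += 1
        if not correct:
            errors[opportunity] += 1
    return {opp: errors[opp] / totals[opp] for opp in sorted(totals)}


curve = learning_curve(ROWS)
```

With these made-up rows, both students miss on their first opportunity, one misses on the second, and neither misses on the third, so the curve falls from 1.0 to 0.5 to 0.0.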
18
Error Report
  • Provides a breakdown of problem information (by
    step) for fine-grained analysis of
    problem-solving behavior
  • Attempts are categorized by student

View by Problem or KC
19
Sample Selector
Easily create a sample/filter to view a smaller
subset of data
  • Filter by: Condition, Dataset Level, Problem,
    School, Student, Tutor Transaction

Shared (only owner can edit) and private samples
20
Export
You can also export the Problem Breakdown table
and LFA values!
  • Two types of export available: By Transaction,
    By Step
  • Anonymous, tab-delimited file
  • Easy to import into Excel!
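Besides Excel, a tab-delimited export can be read with a few lines of Python. This is a sketch only: the column names and the sample row below are assumptions for illustration, and a real export should be checked for its actual header.

```python
import csv
import io

# Hypothetical two-line export; real files have more columns.
SAMPLE = (
    "Anon Student Id\tProblem Name\tStep Name\tOutcome\n"
    "S01\tMAKING-CANS\t(POG-AREA QUESTION2)\tCORRECT\n"
)


def read_export(handle):
    """Read a tab-delimited export into a list of dicts keyed by column header."""
    return list(csv.DictReader(handle, delimiter="\t"))


# For a file on disk, pass open(path, newline="") instead of StringIO.
rows = read_export(io.StringIO(SAMPLE))
```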

21
Help/Documentation
  • Extensive documentation with examples
  • Contextual by tool/report
  • http://learnlab.web.cmu.edu/datashop/help

Glossary of common terms, tied in with PSLC
Theory wiki
22
New Features
  • Manage Knowledge Component models
  • Create, Modify & Delete KC models within DataShop
  • Addition of Latency Curves to Learning Curve
    Reporting
  • Time to Correct
  • Assistance Time
  • Problem Rollup Export
  • Enhanced Contextual Help

23
Overview
  • DataShop Overview
  • Logging model
  • DataShop Features
  • Quantitative models of learning curves
  • Power law, logistic regression
  • Contrasting KC models
  • Exploratory Data Analysis Exercise (start)
  • Knowledge Component Model Editing

24
Recall learning curve story
Without decomposition, using just a single
"Geometry" KC, there is no smooth learning curve.
But with decomposition, 12 KCs for area concepts,
there is a smooth learning curve.
Upshot: A decomposed KC model fits learning &
transfer data better than a "faculty theory of
mind"
25
Learning curve analysis
  • The Power Law of Learning (Newell & Rosenbloom,
    1993)
  • $Y = a X^b$
  • Y: error rate
  • X: opportunities to practice a skill
  • a: error rate on 1st opportunity
  • b: learning rate
  • After the log transformation,
    $\ln Y = \ln a + b \ln X$
  • $\ln a$ is the intercept or starting point of the
    learning curve
  • b is the slope or steepness of the learning
    curve
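One way to estimate a and b is ordinary least squares on the log-transformed data, since taking logs of the power law gives a straight line with intercept ln a and slope b. The sketch below assumes that approach. The function name `fit_power_law` and the synthetic data are illustrative, and every error rate must be strictly positive for the logarithm to be defined.

```python
import math


def fit_power_law(opportunities, error_rates):
    """Fit Y = a * X**b by least squares on (ln X, ln Y); returns (a, b)."""
    xs = [math.log(x) for x in opportunities]
    ys = [math.log(y) for y in error_rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                         # slope of the log-log line
    a = math.exp(mean_y - b * mean_x)     # exp(intercept) = error rate at X = 1
    return a, b


# Synthetic, noise-free data generated from a = 0.5, b = -0.4.
data_x = [1, 2, 3, 4, 5, 6]
data_y = [0.5 * x ** -0.4 for x in data_x]
a, b = fit_power_law(data_x, data_y)
```

A negative b corresponds to an error rate that falls with practice. Real learning-curve data are noisy, so the recovered parameters will only approximate the underlying trend.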

26
More sophisticated learning curve model
  • Generalized Power Law to fit learning curves
  • Logistic regression (Draney, Wilson, & Pirolli,
    1995)
  • Assumptions
  • Different students may initially know more or
    less
  • => use an intercept parameter for each student
  • Students learn at the same rate
  • => no slope parameters for each student
  • Some productions may be better known than others
  • => use an intercept parameter for each
    production
  • Some productions are easier to learn than others
  • => use a slope parameter for each production
  • These assumptions are reflected in detailed math
    model

27
More sophisticated learning curve model
$\ln\frac{p}{1-p} = \sum_i \alpha_i X_i + \sum_j \beta_j Y_j + \sum_j \gamma_j Y_j T_j$
  • The log odds of getting a step correct
    (probability p) is the sum of:
  • if student i performed this step ($X_i$), add
    overall "smarts" of that student ($\alpha_i$)
  • if skill j is needed for this step ($Y_j$), add
    easiness of that skill ($\beta_j$), and add the
    product of the number of opportunities to learn
    ($T_j$) and the amount gained for each
    opportunity ($\gamma_j$)

Use logistic regression because the response is
discrete (correct or not). Probability (p) is
transformed by "log odds", "stretched out" with an
"s curve" so it does not bump up against 0 or 1.
(Related to Item Response Theory, behind
standardized tests.)
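The verbal description above can be read as a small prediction function: start from the student's intercept, add each required skill's easiness plus its per-opportunity gain times the number of prior opportunities, then map the total through the logistic function. The sketch below follows that reading. The parameter names, the KC name, and all numeric values are hypothetical, not fitted estimates.

```python
import math


def afm_probability(student_intercept, kcs, easiness, gain, opportunities):
    """Predicted P(correct) for one student on one step.

    kcs: the skills the step requires.
    easiness, gain, opportunities: dicts keyed by skill name.
    """
    log_odds = student_intercept
    for kc in kcs:
        log_odds += easiness[kc] + gain[kc] * opportunities[kc]
    # The logistic ("s curve") transform keeps the result strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical parameter values for a single skill.
easiness = {"circle-area": -1.0}
gain = {"circle-area": 0.5}

p_first = afm_probability(0.0, ["circle-area"], easiness, gain, {"circle-area": 0})
p_later = afm_probability(0.0, ["circle-area"], easiness, gain, {"circle-area": 2})
```

With these made-up numbers the log odds rise from -1 on the first encounter to 0 after two prior opportunities, so the predicted probability rises to exactly 0.5.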
28
Different representation, same model
  • Predicts whether student is correct depending on
    knowledge & practice
  • Additive Factor Model (Draney et al., 1995; Cen,
    Koedinger, & Junker, 2006)

29
The Q Matrix
  • How to represent relationship between knowledge
    components and student tasks?
  • Tasks also called items, questions, problems, or
    steps (in problems)
  • Q-Matrix (Tatsuoka, 1983)
  • 2*8 is a single-KC item
  • 2*8 - 3 is a conjunctive-KC item, involves two
    KCs

Item \ KC    Add  Sub  Mul  Div
2*8           0    0    1    0
2*8 - 3       0    1    1    0
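A Q-matrix is straightforward to hold in code as one 0/1 row per item. The sketch below uses the two items and four KCs from the table, writing the items as 2*8 and 2*8 - 3. The helper names `kcs_for` and `is_conjunctive` are illustrative.

```python
KCS = ["Add", "Sub", "Mul", "Div"]

# One row per item; a 1 marks a KC the item requires.
Q = {
    "2*8":     [0, 0, 1, 0],
    "2*8 - 3": [0, 1, 1, 0],
}


def kcs_for(item):
    """Names of the KCs an item requires."""
    return [kc for kc, flag in zip(KCS, Q[item]) if flag]


def is_conjunctive(item):
    """True when an item requires more than one KC."""
    return sum(Q[item]) > 1


single = kcs_for("2*8")
double = kcs_for("2*8 - 3")
```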
30
Model Evaluation
  • How to compare cognitive models?
  • A good model minimizes prediction risk by
    balancing fit with data & complexity (Wasserman
    2005)
  • Compare BIC for the cognitive models
  • BIC is the Bayesian Information Criterion
  • BIC = -2 * log-likelihood + numPar * log(numOb)
  • Better (lower) BIC => better prediction of data
    that haven't been seen
  • Mimics cross validation, but is faster to compute

31
  • Data: the Geometry Area Unit
  • 24 students, 230 items, 15 KCs

Model     LL      BIC    numPar
G         -2,175  4,566  26
Original  -1,911  4,271  54
Item      -1,720  5,554  254
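The comparison in the table can be sketched in code as minus twice the log-likelihood plus a penalty of numPar times the log of the number of observations. The log-likelihoods and parameter counts below come from the table. The observation count is not stated on the slide, so `NUM_OBS` is an assumed placeholder and the computed scores will not match the table's BIC column exactly.

```python
import math


def bic(log_likelihood, num_par, num_obs):
    """BIC = -2 * log-likelihood + numPar * log(numObs); lower is better."""
    return -2.0 * log_likelihood + num_par * math.log(num_obs)


# (log-likelihood, numPar) per model, taken from the table.
MODELS = {"G": (-2175, 26), "Original": (-1911, 54), "Item": (-1720, 254)}

NUM_OBS = 4000  # ASSUMPTION: placeholder, not given in the slides

scores = {name: bic(ll, k, NUM_OBS) for name, (ll, k) in MODELS.items()}
best = min(scores, key=scores.get)
```

The Item model has the highest likelihood, but its 254 parameters are penalized heavily enough that a smaller model ends up with the lowest score.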
32
Learning curve contrast in Physics dataset
33
Not a smooth learning curve -> this knowledge
component model is wrong. Does not capture
genuine student difficulties.
34
More detailed cognitive model yields smoother
learning curve. Better tracks nature of student
difficulties & transfer. (Few observations after
10 opportunities yields noisy data.)
35
Best BIC (parsimonious fit) for Default
(original) KC model
36
Overview
  • DataShop Overview
  • Logging model
  • DataShop Features
  • Quantitative models of learning curves
  • Power law, logistic regression
  • Contrasting KC models
  • Exploratory Data Analysis Exercise (start)
  • Knowledge Component Model Editing

37
Exploratory Data Analysis Exercise
  • Goals: 1) Get familiar with data; 2)
    Learn/practice Excel skills
  • Tasks: 1) create a step table; 2) graph
    learning curves

38
TWO_CIRCLES_IN_SQUARE problem: Initial screen
39
TWO_CIRCLES_IN_SQUARE problem: An error a few
steps later
40
TWO_CIRCLES_IN_SQUARE problem: Student follows
hint & completes problem
41
Exported File Loaded into Excel
42
See handout of exercise. Do some of it in next
session
43
Overview
  • DataShop Overview
  • Logging model
  • DataShop Features
  • Quantitative models of learning curves
  • Power law, logistic regression
  • Contrasting KC models
  • Exploratory Data Analysis Exercise (start)
  • Knowledge Component Model Editing

44
DataShop Demo
  • Examples of exercise
  • KC model editing

45
END