1
The Statistics Concept Inventory
Presenters:
  • Kirk Allen, School of Industrial Engineering
  • Robert Terry, Department of Psychology
Other team members:
  • Teri Reed Rhoads, Director of Engineering Education
  • Teri Jo Murphy, Associate Professor of Mathematics
  • Andrea Stone, Ph.D. student in Mathematics
  • Maria Cohenour, Ph.D. student in Psychology
2
Overview
  • Background
  • Big picture analysis
  • reliability, validity
  • Item analysis

3
Background
  • Statistics Concept Inventory (SCI) project began
    in Fall 2002
  • Based on the format of the Force Concept
    Inventory (FCI)
  • Shifts focus away from problem solving, which is
    the typical classroom format
  • Focus on conceptual understanding
  • Multiple choice, around 30 items

4
Force Concept Inventory
  • Focuses on Newton's three laws and related concepts
  • Scores and gains on initial testing much lower
    than expected
  • Led to evaluating teaching styles
  • Interactive engagement found to be most effective
    at increasing student understanding

5
(No Transcript)
6
Other Concept Inventories
  • Many engineering disciplines are developing
    concept inventories
  • e.g., thermodynamics, circuits, materials, dynamics, statics, systems & signals
  • Foundation Coalition
  • http://www.foundationcoalition.org/home/keycomponents/concept/index.html

7
Statistics Concept Inventory
  • Currently 4 dimensions, 38 items total
  • Descriptive statistics - 11 items
  • Inferential statistics - 11 items
  • Probability - 9 items
  • Graphical - 7 items

8
SCI Pilot Study (2003 FIE)
  • Pilot version of the instrument tested in 6
    classes during Fall 2002, near the end of the
    semester
  • Intro classes: Engineering, Communications, Mathematics (2)
  • Advanced classes: Regression, Design of Experiments
  • 32 questions
  • Results
  • Males significantly outperformed females on the
    SCI
  • Mathematics majors outperformed social science
    majors, but no other pairs of majors differed
    significantly
  • Most of the social science majors were in a class
    with poor testing conditions, which may be the
    reason for their low scores
  • SCI scores positively correlated with statistics
    experience and a statistics attitudinal measure

9
Further Work (2004 ASEE)
  • Based on results from the revised SCI from Summer
    2003 and Fall 2003
  • Focus on assessing and improving the validity,
    reliability, and discriminatory power of the
    instrument
  • Utilized focus groups, psychometric analyses, and
    expert opinions

10
Results: Spring 2005

Course     Level           Mean, pre   Mean, post   SD, post
Quality    Junior IE       44.9        --           13.4 (pre)
Engr       Intro, calc     40.7        44.9         14.6
Math 1     Intro, calc     46.8        44.0         14.4
Math 2     Intro, calc     48.5        45.6         14.4
External   Intro, calc     --          49.8         13.8
Psych      Intro, algebra  38.6        43.9         10.9
11
Reliability
  • A reliable instrument is one in which measurement
    error is small, which can also be stated as the
    extent that the instrument is repeatable
  • Most commonly measured using internal consistency
  • Cronbach's alpha, which is a generalization of Kuder-Richardson equation 20 (KR-20)
  • Alpha above 0.80 is reliable by virtually any
    standard
  • Alpha 0.60 to 0.80 is considered reliable for
    classroom tests according to some references
    (e.g., Oosterhof)
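As a concrete reference for the statistic discussed above, Cronbach's alpha can be computed directly from an item-score matrix. This is a minimal sketch; the function name and the toy data are illustrative, not taken from the SCI dataset:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_students, n_items) matrix of item scores
    (0/1 for wrong/right on a multiple-choice test like the SCI)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy data: 4 students (rows) by 3 items (columns)
demo = [[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]]
print(round(cronbach_alpha(demo), 4))   # → 0.75
```

KR-20 is the special case of this formula for dichotomous (0/1) items, which is why alpha is the natural choice here.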

12
Reliability: Big picture
Semester Pre-Test Alpha Post-Test Alpha
Fall 2002 -- 0.6494
Summer 2003 0.7434 0.6965
Fall 2003 0.6915 0.7031
Spring 2004 0.6979 0.7203
Fall 2004 0.6943 0.6692
Spring 2005 0.6852 0.7600
13
Reliability: More detailed (Spring 2005)
Course Pre-Test Alpha Post-Test Alpha
Quality 0.7084 --
Engr 0.6619 0.7744
Math 1 0.6071 0.7676
Math 2 0.7640 0.7079
Psych 0.4284 0.5918
14
Reliability: Observations
  • Reliability generally increases from pre-test to
    post-test
  • Guessing (pre) tends to lower alpha
  • Reliability varies by class, especially the lower
    value for the Psych Gen Ed course
  • Similar results for other algebra-based courses
  • Different math background
  • Questions are context-dependent?
  • Reliability at other universities has been lower
    at times but not always
  • Reliability generalization?

15
Reliability: Other measures
  • For multi-dimensional data, alpha underestimates the true reliability
  • Theta: based on largest eigenvalue from principal components
    • Value: 0.77
  • Omega: based on communalities from common factor model
    • Value: 0.86
  • α ≤ θ ≤ Ω: 0.70 < 0.77 < 0.86
  • Force Concept Inventory is more uni-dimensional (α = 0.89)
  • SCI designed to measure four concepts
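The theta coefficient mentioned above can be sketched in a few lines from the largest eigenvalue of the item correlation matrix. A hypothetical illustration under toy data, not the SCI computation itself:

```python
import numpy as np

def theta_reliability(scores):
    """Armor's theta: reliability based on the largest eigenvalue of the
    item correlation matrix (first principal component)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    corr = np.corrcoef(scores, rowvar=False)   # k x k item correlations
    lam1 = np.linalg.eigvalsh(corr).max()      # largest eigenvalue
    return (k / (k - 1)) * (1 - 1.0 / lam1)

# Perfectly correlated items give theta = 1
print(round(theta_reliability([[1, 1, 1], [0, 0, 0],
                               [1, 1, 1], [0, 0, 0]]), 6))   # → 1.0
```

Omega requires fitting a common factor model for the communalities, so it is not shown here.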

16
Validity
  • Many types of validity (e.g., face, concurrent,
    predictive, incremental, construct)
  • Focused on content, concurrent, and predictive
    because they are broad validity concepts and are
    commonly used in the literature

17
Content Validity
  • Content validity refers to the extent to which
    items are (1) representative of the knowledge
    base being tested and (2) constructed in a
    sensible manner (Nunnally)
  • Focus groups ensure that the question is being
    properly interpreted and help develop useful
    distracters

18
Content Validity
  • Faculty survey: statistics topics were rated for their importance to the faculty; this helps provide a list of which topics to include on the SCI
  • Planning to conduct a new survey (online) using faculty from outside OU as well as non-engineering faculty
  • AP Statistics course outline also consulted for topic coverage
  • Gibbs' criteria identify poorly written questions

19
Concurrent Validity
  • Concurrent validity is assessed by correlating
    the test with other tests (Klein)
  • For the other test, we used the overall course
    grade.

20
Concurrent Validity
  • For Fall 2003: added 2 external universities (3 classes total, intro level, Engineering depts.), along with Engr and Math
  • Valid as a post-test for all four engineering stats courses, but again not for Math

Course               SCI Pre                 SCI Post                SCI Gain                SCI Norm. Gain
Engr (n = 47)        r = 0.360 (p = 0.012)   r = 0.406 (p = 0.005)   r = 0.114 (p = 0.444)   r = 0.139 (p = 0.352)
Math 2 (n = 14)      r = -0.066 (p = 0.823)  r = -0.054 (p = 0.854)  r = 0.011 (p = 0.970)   r = 0.200 (p = 0.492)
External 1 (n = 43)  --                      r = 0.343 (p = 0.024)   --                      --
External 2a (n = 51) r = 0.224 (p = 0.113)   r = 0.296 (p = 0.035)   r = 0.094 (p = 0.514)   r = 0.052 (p = 0.716)
External 2b (n = 48) r = 0.400 (p = 0.005)   r = 0.425 (p = 0.003)   r = -0.034 (p = 0.818)  r = -0.041 (p = 0.780)
21
Concurrent Validity
  • For Spring 2004: three courses (1 Engr, 2 Math)
  • Different results: Math course now valid but not Engr
  • Engr had a different professor and textbook

Course           SCI Pre                SCI Post               SCI Gain               SCI Norm. Gain
Engr (n = 29)    r = 0.060 (p = 0.758)  r = 0.133 (p = 0.493)  r = 0.080 (p = 0.679)  r = 0.108 (p = 0.578)
Math 1 (n = 30)  r = 0.323 (p = 0.081)  r = 0.502 (p = 0.005)  r = 0.316 (p = 0.089)  r = 0.353 (p = 0.056)
Math 2 (n = 26)  r = 0.219 (p = 0.282)  r = 0.384 (p = 0.053)  r = 0.303 (p = 0.133)  r = 0.336 (p = 0.094)
22
More observations
  • In general, pre-test performance on the SCI is
    less predictive of end-of-course performance than
    the post-test SCI performance, as expected
  • This analysis could serve as a diagnostic to
    determine which instructors focus on concepts vs.
    calculations

23
Construct Validity
  • Three-factor and four-factor FIML with general factor
  • Descriptive, inferential, probability, and graphical sub-tests
  • Graphical a priori grouped with Descriptive in 3-factor confirmatory model
  • Overall results: item uniqueness 70.1% and 70.4%
  • Most items share about 30% factor variance
  • Preference is for four-factor model because graphical items are a separate sub-test
  • Verify analysis with more recent data and more graphical items added to the SCI
  • These results are based on Fall 2003 (largest dataset thus far)

24
(No Transcript)
25
Test Information Curve for Probability Subtest
26
Test Information Curve for Descriptive Subtest
27
Test Information Curve for Inferential Subscale
28
Test Information Curve for Graphical Subtest (2
items)
29
Item Discrimination Index
  • Compares top quartile to bottom quartile on each
    item
  • Generally around 1/3 of the items fall into each of the ranges: poor (< 0.20), moderate (0.20 to 0.40), and high (> 0.40)
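The quartile-based discrimination index described above can be computed as follows. This is a minimal illustration with made-up scores; it is not claimed to be the exact procedure the SCI team used:

```python
import numpy as np

def discrimination_index(item_correct, total_scores):
    """D = (fraction correct in top quartile) - (fraction correct in bottom
    quartile), with quartiles defined by total test score."""
    item = np.asarray(item_correct, dtype=float)
    order = np.argsort(total_scores)          # students sorted by total score
    q = max(1, len(order) // 4)               # quartile size
    bottom, top = order[:q], order[-q:]
    return item[top].mean() - item[bottom].mean()

# 8 students: the item is answered correctly only by the 4 highest scorers
item   = [0, 0, 0, 0, 1, 1, 1, 1]
totals = [10, 12, 14, 16, 20, 22, 24, 26]
print(discrimination_index(item, totals))    # → 1.0
```

An item that high scorers and low scorers answer correctly at the same rate gets D = 0, which is why values below 0.20 flag a poorly discriminating item.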

30
Percentage of items falling into each range

Semester     Poor (< 0.20)   Moderate (0.20 to 0.40)   Good (> 0.40)
Fall 2002    28              53                        19
Summer 2003  35              24                        41
Fall 2003    35              35                        30
Spring 2004  31              39                        30
Fall 2004    42              25                        33
Spring 2005  22              51                        27
31

Course          Poor (< 0.20)   Moderate (0.20 to 0.40)   Good (> 0.40)   Avg. Disc. Index
Fall 2004:
Engr            18              4                         15              0.22
Math 1          15              9                         13              0.30
Math 2          13              10                        14              0.33
External Fa04   12              15                        10              0.26
External Sp05   20              8                         9               0.22
Spring 2005:
Engr            9               8                         22              0.35
Math 1          13              10                        16              0.35
Math 2          8               16                        15              0.32
Psych           8               19                        10              0.27
32
Item Analysis
  • Discrimination index
  • Alpha-if-deleted
    • Reported by SPSS or SAS
    • Shows how overall alpha would change if that one item were deleted
  • Answer distribution
    • Try to eliminate or improve choices which are consistently not chosen
  • Focus group comments
  • IRT curves
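Alpha-if-deleted, as reported by SPSS or SAS, is just the overall alpha recomputed with each item removed in turn. A minimal sketch with toy data (the data and names are illustrative only):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_students, n_items) 0/1 score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def alpha_if_deleted(scores):
    """Alpha recomputed with each item dropped in turn; a value above the
    full-test alpha flags an item that hurts internal consistency."""
    scores = np.asarray(scores, dtype=float)
    return [cronbach_alpha(np.delete(scores, j, axis=1))
            for j in range(scores.shape[1])]

# Items 1 and 2 agree with each other; item 3 is noise relative to them
demo = np.array([[1, 1, 1], [0, 0, 1], [1, 1, 0], [0, 0, 0]], dtype=float)
print(round(cronbach_alpha(demo), 3),
      [round(a, 3) for a in alpha_if_deleted(demo)])   # → 0.6 [0.0, 0.0, 1.0]
```

Here deleting the third item raises alpha from 0.6 to 1.0, exactly the signal used to flag the conditional-probability item discussed on the next slide.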

33
Item Analysis
  • Question which was totally changed
  • Fall 2002:
    • Discrimination index: poor (0.16)
    • Alpha-if-deleted above the overall alpha (deleting the item would increase alpha)
    • Too symbol-oriented, not focused on the concept
  • Topic of conditional probability too important to delete (faculty survey)

If P(A|B) = 0.70, what is P(B|A)?
a) 0.70
b) 0.30
c) 1.00
d) 0
e) Not enough information (correct)
f) Other __________________________
34
Item Analysis
  • Replacement item

In a manufacturing process, the error rate is 1 in 1000. However, errors often occur in bursts. Given that the previous output contained an error, what is the probability that the next unit will also contain an error?
a) Less than 1 in 1000
b) Greater than 1 in 1000 (correct)
c) Equal to 1 in 1000
d) Insufficient information
35
Item Analysis
  • Summer 2003
  • Three of four classes have discrimination indices above 0.30 (max 0.55)
  • Same three also have a positive effect on alpha
  • Focus group comments on the non-memoryless property: "bursts would throw off the odds"
  • Possible problem: some students chose D because they were unsure how a "burst" is defined

36
Upon Further Revision
  • In a manufacturing process, the error rate is 1
    in 1000. However, errors often occur in groups,
    that is, they are not independent. Given that the
    previous output contained an error, what is the
    probability that the next unit will also contain
    an error?
  • Less than 1 in 1000
  • Greater than 1 in 1000
  • Equal to 1 in 1000
  • Insufficient information

37
New data
  • Item discrimination improved compared to Summer
    2003
  • Fall 2004, average 0.63
  • Spring 2005, average 0.56
  • Around 40% to 50% correct
  • Other options are chosen in approximately equal proportions, with A only slightly less

38
IRT curves: First Version
39
IRT curves: Second Version
40
Law of Large Numbers
Which would be more likely to have 70% boys born on a given day: a small rural hospital or a large urban hospital?
a) Rural
b) Urban
c) Equally likely
d) Both are extremely unlikely
41

          Pre 1   Post 1     Pre 2   Post 2      Pre 3   Post 3
Choice a  36      17 (-19)   32      32 (none)   26      23 (-3)
Choice b  6       7          5       3           9       10
Choice c  43      69         45      45          51      63
Choice d  15      7          18      19          14      3

Results from 3 classes, percent of students choosing each letter (Spring 2004). Misconception: students do not realize the importance of sample size on the variability of means. The discrimination index on the post-test is 0.44, 0.27, and 0.50, so the question could be considered basically good psychometrically, as well as demonstrating the lack of knowledge gain.
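The sample-size effect behind this item can be checked with a direct binomial calculation. The hospital sizes here (15 vs. 100 births per day) are hypothetical numbers chosen for illustration, not figures from the question:

```python
from math import comb

def prob_at_least_70pct_boys(n_births, p_boy=0.5):
    """P(X >= ceil(0.7 * n)) for X ~ Binomial(n, p_boy)."""
    k_min = (7 * n_births + 9) // 10   # ceil(0.7 * n) in exact integer arithmetic
    return sum(comb(n_births, k) * p_boy**k * (1 - p_boy)**(n_births - k)
               for k in range(k_min, n_births + 1))

small = prob_at_least_70pct_boys(15)    # hypothetical small rural hospital
large = prob_at_least_70pct_boys(100)   # hypothetical large urban hospital
# The small hospital is orders of magnitude more likely to see a 70%-boy day
print(f"small hospital: {small:.4f}, large hospital: {large:.2e}")
```

Smaller samples produce more variable proportions, which is why "Rural" is the intended answer and "Equally likely" reflects the misconception.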
42
  • A fair coin is flipped four times in a row, each
    time landing with heads up. What is the most
    likely outcome if the coin is flipped a fifth
    time?
  • Tails, because even though for each flip heads
    and tails are equally likely, since there have
    been four heads, tails is slightly more likely
  • Heads, because this coin has a pattern of landing
    heads up
  • Tails, because in any sequence of tosses, there
    should be about the same number of heads and
    tails
  • Heads and tails are equally likely

43
Results
  • Almost everyone gets this question correct
  • Fall 2003, six courses
  • 83% to 98% correct
  • Discrimination -0.06, 0.02, 0.08, 0.14, 0.19, 0.31

44
New Version (same choices)
  • A coin of unknown origin is flipped twelve times
    in a row, each time landing with heads up. What
    is the most likely outcome if the coin is flipped
    a thirteenth time?
  • Tails, because even though for each flip heads
    and tails are equally likely, since there have
    been twelve heads, tails is slightly more likely
  • Heads, because this coin has a pattern of landing
    heads up
  • Tails, because in any sequence of tosses, there
    should be about the same number of heads and
    tails
  • Heads and tails are equally likely

45
Results
  • Fall 2004 and Spring 2005
  • Engineering class does OK (38% correct), disc. 0.67
  • Other classes poor (10% to 15% correct), low discrimination
  • Nearly all students still choose D
  • Students seem to be "trained" to answer that coins are fair, but they cannot adapt to a situation where the coin (or the flipper) is most likely unfair.

46
Understanding p-values
  • A researcher performs a t-test to test the following hypotheses [hypotheses not shown in transcript]. He rejects the null hypothesis and reports a p-value of 0.10. Which of the following must be correct?
  • The test statistic fell within the rejection region at the significance level
  • The power of the test statistic used was 90%
  • Assuming Ho is true, there is a 10% possibility that the observed value is due to chance
  • The probability that the null hypothesis is not true is 0.10
  • The probability that the null hypothesis is actually true is 0.9




47
Results for 4 classes
          Pre 1   Post 1    Pre 2   Post 2     Pre 3   Post 3    Pre 4   Post 4
Choice a  15      41        32      52         5       67        17      18
Choice b  16      18        14      20         14      0         6       9
Choice c  41      35 (-6)   41      15 (-24)   62      27 (-35)  47      42 (-5)
Choice d  18      6         14      12         19      7         19      24
Choice e  2       0         0       0          0       0         11      6
48
Analysis
  • Discrimination
    • Pre: 0.25, -0.17, 0.52, 0.15
    • Post: 0.00, -0.14, 0.25, 0.33

49
P-value question
  • Problems?
    • Too definitional
    • p-value is taught from an interpretive standpoint: when to reject or not reject the null hypothesis
  • Therefore...

50
New question (not a replacement)
  • An engineer performs a hypothesis test and
    reports a p-value of 0.03. Based on a
    significance level of 0.05, what is the correct
    conclusion?
  • The null hypothesis is true.
  • The alternate hypothesis is true.
  • Do not reject the null hypothesis.
  • Reject the null hypothesis. (correct)

51
Results of New Question
  • Discrimination better (post-test)
    • 0.20, 0.29, 0.75, 0.12
    • still not great overall (0.19)
  • Percent correct and gains low
  • Only one negative gain
  • Post-test percent correct (gain +/-):
    • 6% (-17)
    • 20% (-3)
    • 33% (+4)
    • 19% (+10)

52
Issues
  • Misconceptions vs. misunderstandings
  • Breadth of statistics field compared to other
    concept inventories
  • Is this useful to any statistics course? Or are
    we still using too much Engineering context?

53
Future Development
  • Expand base item pool in the 4 current areas
  • Add advanced topic subscales
  • Regression
  • Design of experiments
  • Allow instructors to choose only subscales or
    specific items from subscales based on teaching
    goals and coverage of concepts

54
Lessons
  • Hey, engineers care about this stuff too!
  • Students appear to have little understanding of
    most basic statistics concepts
  • Using a mixture of analysis techniques has
    improved the test
  • No silver bullet

55
Contact Information
  • Website: http://coecs.ou.edu/sci/
    • Info on scores from various classes
    • Other papers relating to SCI
  • Email: Kirk Allen, kcallen@ou.edu
  • Teri Reed Rhoads, PI, teri.rhoads@ou.edu
  • Teri Jo Murphy, Co-PI, tjmurphy@math.ou.edu