1 Randy Bennett, Frank Jenkins, Hilary Persky, Andy Weiss
rbennett@ets.org
- Scoring Simulation Assessments
Funded by the National Center for Education
Statistics, US Department of Education
2 What is NAEP?
- National Assessment of Educational Progress
- The only nationally representative and continuing assessment of what US students know and can do in various subject areas
- Paper testing program
- Administered to samples in grades 4, 8, and 12
- Scores reported for groups but not individuals
3 TRE Study Purpose
- Demonstrate an approach to assessing problem solving with technology at the 8th-grade level that
  - Fits the NAEP context
  - Uses extended performance tasks
  - Models student proficiency in an evidence-centered way
4 Conceptualizing Problem Solving with Technology
5 What do the Example Modules Attempt to Measure?
- By scientific-inquiry skill, we mean being able to find information about a given topic, judge what information is relevant, plan and conduct experiments, monitor one's efforts, organize and interpret results, and communicate a coherent interpretation.
- By computer skill, we mean being able to carry out the largely mechanical operations of using a computer to find information, run simulated experiments, get information from dynamic visual displays, construct a table or graph, sort data, and enter text.
8 Scoring the TRE Modules
- Develop initial scoring specifications during assessment design
- Represent what is being measured as a graphical model
  - A proposal for how the components of proficiency are organized in the domain of problem solving in technology-rich environments
9 TRE Student Model
10 Connecting Observations to the Student Model
- Three-step process
- Feature extraction
- Feature evaluation
- Evidence accumulation
11 Feature Extraction
- All student actions are logged in a transaction record
- Feature extraction involves pulling out particular observations from the student transaction record
- Example: the specific experiments the student chose to run for each of the Simulation problems
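As a minimal sketch of this step, the code below pulls a student's chosen experiments out of a transaction log. The log format here (a list of action dictionaries with hypothetical `action`, `problem`, and `payload_mass` keys) is an illustrative stand-in, not the actual TRE record format.

```python
# Sketch of feature extraction from a transaction record.
# The record format below is assumed for illustration only.
def extract_experiments(transactions, problem_id):
    """Pull out the payload masses the student chose to test
    for one Simulation problem."""
    return [
        t["payload_mass"]
        for t in transactions
        if t["action"] == "run_experiment" and t["problem"] == problem_id
    ]

log = [
    {"action": "open_help", "problem": 1},
    {"action": "run_experiment", "problem": 1, "payload_mass": 10},
    {"action": "run_experiment", "problem": 1, "payload_mass": 50},
    {"action": "run_experiment", "problem": 1, "payload_mass": 90},
]
print(extract_experiments(log, 1))  # [10, 50, 90]
```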
12 A Portion of the Student Transaction Record
13 Feature Evaluation
- Each extraction needs to be judged as to its correctness
- Feature evaluation involves assigning scores to these observations
14 A Provisional Feature-Evaluation Rule
- Quality of experiments used to solve Problem 1
  - IF the list of payload masses includes the low extreme (10), the middle value (50), and the high extreme (90), with or without additional values, THEN the best experiments were run.
  - IF the list omits one or more of the above required values but includes at least 3 experiments having a range of 50 or more, THEN very good experiments were run.
  - IF the list has only two experiments but the range is at least 50, OR the list has more than two experiments with a range equal to 40, THEN good experiments were run.
  - IF the list has two or fewer experiments with a range less than 50, OR has more than two experiments with a range less than 40, THEN insufficient experiments were run.
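The provisional rule above translates directly into code. This sketch assumes the input is the list of payload masses extracted for Problem 1; the function name is invented for illustration.

```python
def evaluate_experiment_quality(masses):
    """Score the choice of experiments for Problem 1 per the
    provisional feature-evaluation rule."""
    rng = max(masses) - min(masses) if masses else 0
    # Best: low extreme (10), middle (50), and high extreme (90) all present
    if {10, 50, 90} <= set(masses):
        return "best"
    # Very good: a required value is missing, but >= 3 experiments span >= 50
    if len(masses) >= 3 and rng >= 50:
        return "very good"
    # Good: exactly two experiments spanning >= 50, or > 2 spanning exactly 40
    if (len(masses) == 2 and rng >= 50) or (len(masses) > 2 and rng == 40):
        return "good"
    # Insufficient: everything else (too few experiments or too small a range)
    return "insufficient"

print(evaluate_experiment_quality([10, 50, 90]))  # best
print(evaluate_experiment_quality([10, 30]))      # insufficient
```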
15 An Example of a Best Solution
16 An Example of an Insufficient Solution
17 Evidence Accumulation
- Feature evaluations (like item responses) need to be combined into summary scores that support the inferences we want to make from performance
- Evidence accumulation entails combining the feature scores in some principled manner
- Bayesian inference networks
  - Offer a very general, formal, statistical framework for reasoning about interdependent variables in the presence of uncertainty
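To give a feel for the accumulation step, the toy two-node fragment below (Exploration skill → observed experiment-quality score) applies Bayes' rule to update a proficiency estimate from scored features. The prior and conditional probabilities are made up for illustration and are not TRE's actual parameters.

```python
# Toy Bayes-net fragment: one skill node, one observable score node.
prior = {"low": 0.5, "high": 0.5}  # P(skill), assumed uniform
# P(observed score category | skill) -- illustrative values only
likelihood = {
    "low":  {"insufficient": 0.50, "good": 0.30, "very good": 0.15, "best": 0.05},
    "high": {"insufficient": 0.05, "good": 0.15, "very good": 0.30, "best": 0.50},
}

def update(belief, observation):
    """Posterior over skill after one scored feature, by Bayes' rule."""
    unnorm = {s: belief[s] * likelihood[s][observation] for s in belief}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

posterior = update(prior, "best")
# Accumulate a second observation by reusing the posterior as the new prior
posterior = update(posterior, "very good")
print(posterior)  # belief shifts strongly toward "high"
```

Chaining `update` calls this way is exact only when the observations are conditionally independent given the skill; a full network like TRE's student model represents the dependencies among variables explicitly.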
18 An Evidence Model Fragment for Exploration Skill in Simulation 1
19 Using Evidence to Update the Student Model
20 Using Evidence to Update the Student Model
21 TRE Student Model
22 Conclusion
- TRE illustrates
  - Measuring problem solving with technology, with emphasis on the integration of the two skill sets
  - Using extended tasks like those encountered in advanced academic and work environments
  - Modeling student performance in a way that explicitly accounts for multidimensionality and for uncertainty
23 Conclusion
- Important remaining issues
  - Measurement
    - Tools to evaluate model fit are not well developed
    - Extended performance tasks have limited generalizability
  - Logistical
    - Adequate school technology is not yet universal
  - Cost
    - Task production and scoring are labor-intensive
24 Randy Bennett, Frank Jenkins, Hilary Persky, Andy Weiss
rbennett@ets.org
- Scoring Simulation Assessments
Funded by the National Center for Education Statistics, US Department of Education