Prospectus for the PADI design framework in
language testing
Robert J. Mislevy, Professor of Measurement and Statistics, University of Maryland
Geneva D. Haertel, Assessment Research Area Director, SRI International
  • ECOLT 2006, October 13, 2006, Washington, D.C.
  • PADI is supported by the National Science
    Foundation under grant REC-0129331. Any opinions,
    findings, and conclusions or recommendations
    expressed in this material are those of the
    authors and do not necessarily reflect the views
    of the National Science Foundation.

Some Challenges in Language Testing
  • Sorting out evidence about interacting aspects of
    knowledge and proficiency in complex performances
  • Understanding the impact of complexity factors
    and difficulty factors on inference
  • Scaling up efficiently to high-volume tests: task
    creation, scoring, delivery
  • Creating valid, cost-effective low-volume tests

Evidence-Centered Design
  • Evidence-centered assessment design (ECD)
    provides language, concepts, knowledge
    representations, data structures, and supporting
    tools to help design and deliver educational
    assessments,
  • all organized around the evidentiary argument an
    assessment is meant to embody.

The Assessment Argument
  • What kinds of claims do we want to make about
  • What behaviors or performances can provide us
    with evidence for those claims?
  • What tasks or situations should elicit those
    behaviors?
  • Generalizing from Messick (1994)

Evidence-Centered Design
  • With Linda Steinberg and Russell Almond at ETS
  • The Portal project / TOEFL
  • NetPASS with Cisco (computer network design
  • Principled Assessment Design for Inquiry (PADI)
  • Supported by NSF (co-PI Geneva Haertel, SRI)
  • Focus on science inquiry, e.g., investigations
  • Models, tools, examples

Some allied work
  • Cognitive design for generating tasks (Embretson)
  • Model-based assessment (Baker)
  • Analyses of task characteristics: test and TLU
    (Bachman and Palmer)
  • Test specifications (Davidson and Lynch)
  • Constructing measures (Wilson)
  • Understanding by design (Wiggins)
  • Integrated Test Design, Development, and Delivery

Key ideas
  • Explicit relationships
  • Explicit structures
  • Generativity
  • Re-usability
  • Recombinability
  • Interoperability

Layers in the assessment enterprise
  • From Mislevy and Riconscente, in press

Expertise research, task analysis, curriculum,
target use, critical incident analysis,
ethnographic studies, etc.
  • In language assessment, importance of
      • Psycholinguistics
      • Sociolinguistics
      • Target language use

Tangible stuff
e.g., what gets made and how it operates in
the testing situation

How do you get from here to here?

We will focus today on two hidden layers
Domain modeling, which concerns the Assessment
Argument

And the Conceptual Assessment Framework, which
concerns generative, re-combinable design schemas

More on the Assessment Argument

PADI Design Patterns
  • Organized around elements of assessment argument
  • Narrative structures for assessing pervasive
    kinds of knowledge / skill / capabilities
  • Based on research and experience, e.g.,
      • PADI: design under constraint, inquiry cycles,
      • Compliance with Grice's maxims; cause/effect
        reasoning; giving spoken directions
  • Suggest design choices that apply to different
    contexts, levels, purposes, formats
  • Capture experience in structured form
  • Organized in terms of assessment argument

A Design Pattern Motivated by Grice's Relation Maxim
Attribute: Value(s)

Name: Grice's Relation Maxim: Responding to a Request

Summary: In this design pattern, an examinee demonstrates following Grice's Relation Maxim in a given language by producing or selecting a response in a situation that presents a request for information (e.g., a conversation).

Central claims: In contexts/situations with xxx characteristics, can formulate and respond to representations of implicature from referents:
  • semantic implication
  • pragmatic implication

Additional knowledge that may be at issue:
  • Substantive knowledge in domain
  • Familiarity with cultural models
  • Knowledge of language

Characteristic features: The stimulus situation needs to present a request for relevant information to the examinee, either explicitly or implicitly.

Variable task features:
  • Production or choice as response?
  • If production, oral or written production required?
  • If oral, single response to a preconfigured situation or part of an evolving conversation?
  • If evolving conversation, open or structured interview?
  • Formality of prepackaged products (multiple choice, videotaped conversations, written questions or conversations, one-to-one or larger conversations that are prepared by interviewers)
  • Formality of information and task (concrete or abstract, immediate or remote, information requiring retrieval or transformation, familiar or unfamiliar setting and topic, written or spoken)
  • If prepackaged speech stimulus: length, content, difficulty of language, explicitness of request, degree of cultural dependence
  • Content of situation (familiar or unfamiliar, degree of difficulty)
  • Time pressure (e.g., time for planning and response)
  • Opportunity to control the conversation

Potential performances and work products:
  • Constructed oral response
  • Constructed written or typed-in response
  • Answer to a multiple-choice question where alternatives vary

Potential features of performance to evaluate:
  • Whether a student can formulate representations of implicature, as required in the given situation.
  • Whether a student can make a conversational contribution or express an idea in the accepted direction of the exchange.
  • Whether a student provides the relevant information as required.
  • Whether the quality of choice among alternatives offered for a production in a given situation satisfies the Relation Maxim.

Potential rubrics: (later slide)

Examples: (in paper)
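
As a rough illustration of what "capturing experience in structured form" could look like, the sketch below stores the design-pattern attributes listed above in a small Python data structure. The class name, field names, and the `missing_attributes` helper are hypothetical and chosen for this example only; they are not the PADI object model, and the attribute values are abbreviated.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DesignPattern:
    """Hypothetical container for the design-pattern attributes on the slide."""

    name: str
    summary: str
    central_claims: list[str] = field(default_factory=list)
    additional_knowledge: list[str] = field(default_factory=list)
    characteristic_features: list[str] = field(default_factory=list)
    variable_task_features: list[str] = field(default_factory=list)
    potential_work_products: list[str] = field(default_factory=list)
    potential_observations: list[str] = field(default_factory=list)

    def missing_attributes(self) -> list[str]:
        """Return the names of list-valued attributes that are still empty."""
        return [k for k, v in vars(self).items() if isinstance(v, list) and not v]


# Abbreviated values taken from the slide; the remaining attributes are left empty.
relation_maxim = DesignPattern(
    name="Grice's Relation Maxim: Responding to a Request",
    summary="Examinee responds relevantly to a request for information.",
    characteristic_features=["Stimulus presents an explicit or implicit request"],
    variable_task_features=["Production or choice as response?", "Time pressure"],
    potential_work_products=["Constructed oral response", "Multiple-choice answer"],
)

# Attributes a designer would still need to fill in before using the pattern.
print(relation_maxim.missing_attributes())
```

A structure like this makes gaps in the argument visible: the example instance still lacks central claims, additional knowledge, and features of performance to evaluate.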
Some Relationships between Design Patterns and Other TD Tools
  • Conceptual models for proficiency
  • Task characteristic frameworks
  • Grist for design choices about KSAs task
  • DPs present integrated design space
  • Test specifications
  • DPs for generating argument, design choices
  • Test specs for documenting, specifying choices

More on the Conceptual Assessment Framework

Evidence-centered assessment design
Technical specs that embody the elements
suggested in the design pattern
  • The three basic models

Evidence-centered assessment design
Conceptual Representation
  • The three basic models

Screen shot of user interface
User-Interface Representation

High-level UML Representation of the PADI Object
UML Representation (sharable data structures,
behind the screen)

Evidence-centered assessment design
  • What complex of knowledge, skills, or other
    attributes should be assessed?

The NetPASS Student Model
Multidimensional measurement model with selected
aspects of proficiency
Can use the same student model with different tasks.

Evidence-centered assessment design
  • What behaviors or performances should reveal
    those constructs?

From unique student work product to evaluations
of observable variables, i.e., task-level scoring

Skeletal Rubric for Satisfaction of Quality
  • 4: Responses and explanations are relevant as
    required for current purposes of the exchange and
    neither more elaborated than appropriate nor
    insufficient for the context. They fulfill the
    demands of the task with at most minor lapses in
    completeness. They are appropriate for the task
    and exhibit coherent discourse.
  • 3: Responses and explanations address the task
    appropriately and are relevant as required for
    current purposes of the exchange, but they may
    be either more elaborated than required or fall
    short of being fully developed.
  • 2: The responses and explanations are connected
    to the task, but are either markedly excessive in
    information supplied or not very relevant to the
    current purpose of the exchange. Some relevant
    information might be missing or inaccurately
  • 1: The responses and explanations are either
    grossly irrelevant or very limited in content
    or coherence. In either case they may be only
    minimally connected to the task.
  • 0: Speaker makes no attempt to respond or
    response is unrelated to the topic. A written
    response at this level merely copies sentences
    from the topic, rejects the topic, or is otherwise
    not connected to the topic. A spoken response is
    not connected to the direct or implied request
    for information.

Notes re Observable Variables
  • Re-usable (tailorable) to different tasks
  • Can be multiple aspects of performance being
  • May be in a 1-1 relationship with student-model
    variables, but need not be.
  • That is, multiple aspects of proficiency can be
    involved in the probability of a high /
    satisfactory / certain style of response

Evidence-centered assessment design
Values of observable variables are used to update
probability distributions for student-model
variables via a psychometric model, i.e., test-level
scoring
  • What behaviors or performances should reveal
    those constructs?
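
The updating step described on this slide can be sketched, under strong simplifying assumptions, as a single application of Bayes' rule to one discrete student-model variable. The function name, the two proficiency levels, and all probabilities below are hypothetical illustration values; they are not taken from NetPASS or PADI, whose models involve several student-model variables.

```python
def update_student_model(prior, likelihood, observed):
    """Posterior over proficiency levels after one observable value (Bayes' rule).

    prior:      {level: P(level)}
    likelihood: {level: {observable_value: P(value | level)}}
    observed:   the value the rubric assigned to the work product
    """
    unnormalized = {lvl: p * likelihood[lvl][observed] for lvl, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observed value has zero probability under every level")
    return {lvl: weight / total for lvl, weight in unnormalized.items()}


# Hypothetical numbers, for illustration only.
prior = {"low": 0.5, "high": 0.5}
likelihood = {
    "low": {"unsatisfactory": 0.7, "satisfactory": 0.3},
    "high": {"unsatisfactory": 0.2, "satisfactory": 0.8},
}

# A "satisfactory" rating shifts belief toward the "high" level.
posterior = update_student_model(prior, likelihood, "satisfactory")
print(posterior)
```

The conditional-probability table plays the role of a re-usable evidence-model fragment: the same table can be attached to any task whose observable has the same evidentiary relationship to the student-model variable.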

A NetPASS Evidence-Model Fragment for Design
Measurement models indicate which SMVs, in which
combinations, affect which observables. Task
features influence which ones and how much, in
structured measurement models.
Re-usable conditional-probability fragments and
variable names for different tasks with the same
evidentiary structure.

Evidence-centered assessment design
  • What tasks or situations should elicit those
    behaviors?

Representations to the student, and sources of

Task Specification Template - Determining Key
Features (Wizards)
  • Setting: Corporation, Conference Center, University
  • Building Length: Less than 100 m, More than 100 m
  • Ethernet Standard: 10BaseT, 100BaseT
  • Subgroup Name: Teacher, Student, Customer
  • Bandwidth for a Subgroup Drop: 10 Mbps, 100 Mbps
  • Growth Requirements: Given, NA
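
One way to picture how a task specification template supports generativity is to enumerate every combination of the feature options listed above. This is only a sketch: the grouping of options under each feature is inferred from the slide, the function name `task_variants` is hypothetical, and a real wizard would presumably rule out combinations that make no sense together.

```python
from itertools import product

# Feature names and options as they appear on the slide (grouping inferred).
TASK_FEATURES = {
    "Setting": ["Corporation", "Conference Center", "University"],
    "Building Length": ["Less than 100m", "More than 100m"],
    "Ethernet Standard": ["10BaseT", "100BaseT"],
    "Subgroup Name": ["Teacher", "Student", "Customer"],
    "Bandwidth for a Subgroup Drop": ["10Mbps", "100Mbps"],
    "Growth Requirements": ["Given", "NA"],
}


def task_variants(features):
    """Yield one dict per combination of feature options."""
    names = list(features)
    for combo in product(*(features[name] for name in names)):
        yield dict(zip(names, combo))


variants = list(task_variants(TASK_FEATURES))
print(len(variants))  # 3 * 2 * 2 * 3 * 2 * 2 = 144
print(variants[0])
```

Six features with two or three options each already yield 144 candidate task variants from one template.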

Structured Measurement Models
  • Examples of models
      • Multivariate Random Coefficients Multinomial
        Logit Model (MRCMLM; Adams, Wilson, and Wang, 1997)
      • Bayes nets (Mislevy, 1996)
      • General Diagnostic Model (von Davier and Yamamoto)
  • By relating task characteristics to difficulty
    with respect to different aspects of proficiency,
    create tasks with known properties.
  • Can create families of tasks around the same
    evidentiary frameworks, e.g., for read and write
    tasks, can vary characteristics of texts,
    directives, audience, purpose.

Structured Measurement Models
  • Articulated connection between task
    characteristics and models of proficiency
  • Moves beyond modeling difficulty
  • Traditional test theory a bottleneck in
    multivariate environment
  • Dealing with complexity factors and difficulty
    factors (Robinson)
      • Model complexity factors as covariates for
        difficulty parameters with respect to those
        aspects of proficiency they impact
      • Model difficulty factors as either SMVs, if
        target of inference, or as noise, if nuisance.
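
To make "task features as covariates for difficulty parameters" concrete, here is a minimal sketch that uses a much simpler model than those named on the slides: a one-dimensional, Rasch-type model in which item difficulty is a weighted sum of task features (in the spirit of a linear logistic test model). The feature names and weights are hypothetical illustration values, and the multidimensional structure of models such as the MRCMLM is deliberately left out.

```python
import math


def item_difficulty(task_features, feature_weights):
    """Difficulty as a weighted sum of the task features that are present."""
    return sum(feature_weights[name] * value for name, value in task_features.items())


def p_success(theta, difficulty):
    """Rasch-type probability of a satisfactory response at proficiency theta."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Hypothetical weights: how much each feature adds to difficulty (in logits).
weights = {"abstract_topic": 0.8, "time_pressure": 0.5}

easy_task = {"abstract_topic": 0, "time_pressure": 0}
hard_task = {"abstract_topic": 1, "time_pressure": 1}

theta = 0.0  # a student of average proficiency on this scale
p_easy = p_success(theta, item_difficulty(easy_task, weights))
p_hard = p_success(theta, item_difficulty(hard_task, weights))
print(p_easy, p_hard)
```

Once such weights have been estimated, a task designer can predict the difficulty of a new task from its features before it is ever administered, which is the sense in which tasks can be created "with known properties".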

Advantages: A framework that
  • Guides task and test construction (Wizards)
  • Provides high efficiency and scalability
  • By relating task characteristics to difficulty,
    allows creating tasks with targeted properties
  • Promotes re-use of conceptual structures (DPs,
    arguments) in different projects
  • Promotes re-use of machinery in different projects

Evidence of effectiveness
  • Cisco
  • Certification and training assessment
  • Simulation-based assessment tasks
  • Conceptual model for standards for data
    structures for computer-based testing
  • ETS

  • Isn't this just a bunch of new words for
    describing what we already do?

An answer (Part 1)
  • No.

An answer (Part 2)
  • An explicit, general framework makes
    similarities and implicit principles explicit
  • To better understand current assessments
  • To design for new kinds of assessment
  • Tasks that tap multiple aspects of proficiency
  • Technology-based tasks (e.g., simulations)
  • Complex observations, student models, evaluation
  • To foster re-use, sharing, modularity
  • Concepts and arguments
  • Pieces of machinery and processes (QTI)

For more information
  • Has links to PADI, Cisco, articles, etc.
  • (e.g., CRESST report on Task-Based Language