Title: Dimensions Affecting the Assessment of Reading Comprehension
1. Dimensions Affecting the Assessment of Reading Comprehension
- David J. Francis University of Houston
- Jack M. Fletcher University of Texas-Houston
- Hugh Catts University of Kansas
- Bruce Tomblin University of Iowa
- Presented to the PREL Focus on Comprehension Forum, New York, Sept. 29, 2004
- This research was supported in part by funding from NICHD under P01 HD31952, HD30995, HD28172, and P01 HD21888, and from NIDCD under P50 DC002746
2. Factors Affecting the Assessment of Reading Comprehension in Adults
Although they limited themselves to one drink at
lunch, Jack and David nevertheless scored more
poorly on reading assessments in the afternoon.
3. Overview
- Reading is multi-dimensional
- Implications for assessment
- Factors affecting performance on comprehension assessments
- Mitigating the effects of decoding on state assessments for children with reading disabilities (RD)
- Conclusions
4. Reading is Multi-dimensional
- Reading is the process of extracting meaning from printed language
- Numerous characteristics of the reader and of the printed language (i.e., the text) affect this process of constructing meaning
- Not surprisingly, there are numerous approaches to the assessment of reading comprehension
- These approaches differ not only in their inputs and outputs but also in the purposes of assessment
5. The 2005 NAEP Reading Framework
- Reading is an active and complex process that involves
  - understanding written text,
  - developing and interpreting meaning, and
  - using meaning as appropriate to type of text, purpose, and situation.
- Taken from the 2005 NAEP Reading Framework
6. Meaningful Variations in Reading Assessments
- Common variations in inputs
  - Type of text presented (e.g., expository, narrative)
  - Length of text presented
  - Linguistic and orthographic complexity of the text
  - Semantic complexity of the text
  - Demands on background knowledge
7. The 2005 NAEP Reading Framework
Taken from the 2005 NAEP Reading Framework
8. The 2005 NAEP Reading Framework
Taken from the 2005 NAEP Reading Framework
9. The 2005 NAEP Reading Framework
Taken from the 2005 NAEP Reading Framework
10. Meaningful Variations in Reading Outcomes
- The RAND Reading Study Group cited three outcomes of reading:
  - Knowledge (critical evaluation and integration of new content with stored information)
  - Application (use of new content to solve problems)
  - Engagement (involvement with ideas, experience, and styles of texts)
- These relate fairly closely to the NAEP aspects of reading
11. Taken from the 2005 NAEP Reading Framework
12. Meaningful Variations in Reading Assessments
- General variations in response formats
  - Type of response (multiple choice, cloze, constructed response, extended response, retell)
  - Length of response
  - Speed of response
- NAEP varies the type and length of response
  - Constructed response (brief and extended)
  - Multiple choice
- State assessments often use similar options
13. NAEP Grade 4: Blue Crabs
- By George W. Frame

Nearly every day last summer my nephew Keith and I went crabbing in a creek on the New Jersey coast. We used a wire trap baited with scraps of fish and meat. Each time a crab entered the trap to eat, we pulled the doors closed. We cooked and ate the crabs we caught.

Blue crabs are very strong. Their big claws can make a painful pinch. When cornered, the crabs boldly defend themselves. They wave their outstretched claws and are fast and ready to fight. Keith and I had to be very careful to avoid having our fingers pinched.

Crabs are arthropods, a very large group of animals that have an external skeleton and jointed legs. Other kinds of arthropods are insects, spiders, and centipedes. Blue crabs belong to a particular arthropod group called crustaceans. Crustaceans are abundant in the ocean, just as insects are on land.

The blue crab's hard shell is a strong armor. But the armor must be cast off from time to time so the crab can grow bigger. Getting rid of its shell is called molting.

Each blue crab molts about twenty times during its life. Just before molting, a new soft shell forms under the hard outer shell. Then the outer shell splits apart, and the crab backs out. This leaves the crab with a soft, wrinkled, outer covering. The body increases in size by absorbing water, stretching the soft shell to a much larger size. The crab hides for a few hours until its new shell has hardened.

Keith and I sometimes found these soft-shell crabs clinging to pilings and hiding beneath seaweed.
14. Sample NAEP Questions: Blue Crabs
- 1. Do you think it would be fun to catch blue crabs? Using information from the passage, explain why or why not.
- 2. According to the passage, what do blue crabs have in common with all other arthropods?
  - A) They have a skeleton on the outside of their bodies.
  - B) They hatch out of a shell-like pod.
  - C) They live in the shallow waters of North America.
  - D) They are delicious to eat.
15. Sample NAEP Questions: Blue Crabs
- 3. The growth of a blue crab larva into a full-grown blue crab is most like the development of
  - A) a human baby into a teen-ager
  - B) an egg into a chicken
  - C) a tadpole into a frog
  - D) a seed into a tree
- 4. Write a paragraph telling the major things you learned about blue crabs.
16. Meaningful Variations in the Purposes of Assessment
- Variations in the purposes of assessment are also relevant
  - Student evaluation (formal or informal, high stakes or low stakes, diagnostic and prescriptive)
  - School evaluation
  - Reporting (e.g., NAEP)
  - Research (where reading might be an outcome, a predictor, or both)
- These variations in purpose also affect student motivation, which can affect performance
17. Reading Comprehension and its Assessment are Multidimensional
- The foregoing makes clear that reading is multidimensional in its presentation to the reader, in what is expected of the reader, in the contexts in which it occurs, and in the purposes it serves
- For assessment to be successful, we must be clear about its purposes and mindful of its consequences
18. Reading Comprehension and its Assessment are Multidimensional
- For some purposes, the choice of assessment may have minimal impact on the decisions we reach
- In evaluating students, the choice of assessment appears to have only minimal bearing on final decisions about the relative positions of students (Feinberg, 1990; Campbell, 2002)
- That is not to say that the choice of assessment is inconsequential
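Agreement about students' relative positions can be quantified as rank-order agreement between two tests. The sketch below is only an illustration, not the analysis in the cited studies: it computes Spearman's rank correlation for two hypothetical score lists, assuming no tied scores (ties would require average ranks). The function names are hypothetical.

```python
def ranks(scores):
    """Rank scores from 1 (lowest) to n; assumes there are no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation, 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid when there are no ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length score lists with at least two students")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical scores for the same five students on two comprehension tests
test_a = [12, 18, 25, 31, 40]
test_b = [55, 48, 70, 66, 90]
print(spearman_rho(test_a, test_b))
```

Here the two tests swap the order of two adjacent pairs of students, which gives a rank correlation of 0.8 even though the tests use different score scales.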
19. Reading Comprehension and its Assessment are Multidimensional
- One consequence stems from the link between assessment and instruction
- Given that different response types tend to engage different thought processes (Campbell, 2002), reliance on a single response format in state assessments may adversely narrow instruction
- But this link between response type and cognitive processes may also bear on research findings in reading comprehension
20. Reading Comprehension and its Assessment are Multidimensional
- If an assessment engages certain cognitive processes, then research that relies on that assessment may be biased in favor of factors related to those processes
- For example, if the assessment fails to engage students in the evaluation and integration of information, then research will find negligible effects for the higher-order linguistic and cognitive abilities that subserve these processes, or for the instructional practices that develop those abilities and processes
21. Reading Comprehension and its Assessment are Multidimensional
- Psychometrically motivated research can help shed light on the extent to which such factors may be operating
- To see how this might work, let's consider the role of decoding in comprehension
- Let's do this in light of several studies with samples from different populations
22. Connecticut Longitudinal Study (Shaywitz et al.)
- The following slide shows correlations over time between Woodcock Reading Mastery Test (WRMT) Passage Comprehension scores and WRMT Decoding composite (Letter Word and Word Attack) scores
- The CLS sample is an epidemiologic sample from Connecticut of largely white, middle- to upper-income children (Shaywitz et al., 1990), with very low attrition (over 90% retention through Grade 9)
23. (Figure not transcribed: correlations over time for the CLS sample, described on slide 22)
24. EARS Sample
- The next slide shows correlations between two reading measures, Woodcock-Johnson Passage Comprehension (WJ PC) and the Formal Reading Inventory (FRI), at Grades 1 and 2 in a large normative sample from three schools in Houston
- The sample is a multi-cohort, longitudinal sample that is balanced for gender and roughly balanced for race/ethnicity
25. EARS Sample Demographics
- Total N = 945 across 5 cohorts
- 3 schools
- All children in Grades K, 1, and 2 were invited (random sample of those consenting, 80%)
- Free lunch participation ranged from 13% to 30%
- Boys and girls were equally represented
- Caucasian (54%), African American (18%), Hispanic (15%), Asian (12%)
- SES: LC (9%), WC (43%), MC (48%)
26. Correlations among WJ Passage Comprehension and FRI Silent Reading with Decoding and Vocabulary
27. Early Interventions Sample (Foorman et al.)
- The following slide shows correlations for two measures of comprehension, WJ PC and the CRAB (Fuchs & Fuchs), with three measures of decoding over four years in a freshened longitudinal sample recruited from 17 high-poverty schools in two cities
- The sample was over 95% African American
- Children were randomly sampled from Kindergarten and Grade 1 classrooms and followed longitudinally through Grade 4
28. Correlations for WJ PC and CRAB with Three Decoding Measures from Grades 1 and 4 for Ethnic-Minority Children from 17 High-Poverty Schools
29. CLRC Sample of Children with and without Specific Language Impairment (Tomblin and Catts)
- This final sample comes from an epidemiologic study of specific language impairment (SLI) directed by Bruce Tomblin and Hugh Catts
- The children were recruited in Kindergarten and followed longitudinally in Grades 2, 4, and 8; the Grade 10 assessment is beginning this year
- There are four groups of children (n = 570): controls (n = 268), SLI (n = 117), NLI (n = 91), and low cognition (n = 94)
30. Correlations for Three Comprehension Measures with Language, Decoding, and Fluency at Grades 2 and 4 for the CLRC Sample (N = 570)
DABS = Diagnostic Assessment Battery Comprehension Score; GORT = Gray Oral Reading Test
31. Latent Variable Perspective
- The presence of multiple measures of comprehension across multiple time points allows examination of more precisely formulated hypotheses about the relations among the measures
- The WRMT-PC, DABS, and GORT are all purported to measure reading comprehension
- They correlate reasonably highly with one another and with factors known to be associated with reading comprehension
32. Latent Variable Perspective
- Do these three measures reflect an underlying ability, which no one test measures perfectly, but which all measure somewhat imperfectly?
- A strong version of this idea would say that the three tests share one thing in common, and it is this commonality that reflects the underlying process of reading comprehension
33. Latent Variable Perspective
- Such psychometric hypotheses carry very specific assertions about
  - the relations among the variables,
  - the relations of each variable to other variables that are related to the proposed construct, and
  - the relations to variables not related to the proposed construct
- These assertions are falsifiable, which is what makes psychometric models useful for studying the properties of tests
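As a concrete instance of such an assertion, a one-factor model with standardized indicators implies that the correlation between any two tests equals the product of their factor loadings. The sketch below uses hypothetical loadings and correlations (not CLRC results) for the three tests named earlier, and hypothetical function names; a large residual for any pair is the kind of evidence that can falsify the one-factor hypothesis. It covers only the algebra of a single-group model, not the multiple-group estimation the authors report.

```python
from itertools import combinations


def implied_correlations(loadings):
    """Correlations implied by a one-factor model with standardized indicators:
    r_ij = lambda_i * lambda_j for every pair of distinct tests."""
    names = list(loadings)
    return {(a, b): loadings[a] * loadings[b] for a, b in combinations(names, 2)}


def residuals(observed, loadings):
    """Observed minus implied correlation for each pair of tests."""
    implied = implied_correlations(loadings)
    return {pair: observed[pair] - implied[pair] for pair in implied}


# Hypothetical standardized loadings and observed correlations (not CLRC data)
loadings = {"WRMT-PC": 0.90, "DABS": 0.80, "GORT": 0.70}
observed = {
    ("WRMT-PC", "DABS"): 0.74,
    ("WRMT-PC", "GORT"): 0.61,
    ("DABS", "GORT"): 0.70,
}

for (a, b), resid in residuals(observed, loadings).items():
    print(f"{a} vs {b}: residual {resid:+.2f}")
```

In this made-up case the WRMT-PC pairs fit within .02, but DABS and GORT correlate .14 more than a single factor allows, which would suggest the two tests share something beyond the common factor.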
34. One-Factor Model for CLRC Sample: Multiple-Group Analysis (factor loadings constrained equal)
35. Multiple-Group Single-Factor Model for Comprehension with Language and Decoding at Grade 2 as Predictors
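36. Correlations among Factors in the Multiple-Group Model
            RC2    RC4    lang2  decode2
  RC2       1.00
  RC4       0.88   1.00
  lang2     0.59   0.63   1.00
  decode2   0.92   0.78   0.45   1.00
RC2, RC4 = reading comprehension factors at Grades 2 and 4; lang2, decode2 = language and decoding factors at Grade 2
Residual correlation between RC2 and RC4 = 0.61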
37. Problems with the One-Factor Model
- Overall, the model fit is not particularly strong, especially in light of the strong support for the one-factor model in Grades 2 and 4 without predictors
- Introducing the predictors into the model increases our power to discriminate among the different measures of comprehension, and thus to falsify the unidimensionality hypothesis
- Lack of fit in the model tends to come from the somewhat stronger relationship of decoding with WJ PC than with the other comprehension measures, and from the somewhat greater relation of those other measures with language
38. Points of Clarification
- It should be noted that all of the models allow for test-specific relations over time for any repeated measure
- The correlation among the comprehension factors is substantial, ranging from .88 to .98
- The correlations over time for all factors are quite high, indicating a high degree of stability in all factors
39. Conclusions
- Reliance on a single measure of comprehension may diminish our understanding of the importance of different skills to comprehension
- Inclusion of multiple measures mitigates that bias somewhat, but the comprehension measures in this study do not function as a single factor
- By formulating and testing an explicit model for the set of observed relations among measures, we obtained considerably more information about how the tests actually function than we could have by eyeballing the correlations among individual tests
40. Conclusions
- It is worth considering that comprehension might be better conceptualized in a production indicator framework, akin to the relation of SES to parents' education and family income
- This alternative measurement framework is not without challenges, but it may better reflect the complementary roles of decoding, language, background knowledge, long-term working memory (Kintsch), and other cognitive processes in the formation of meaning from text
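One reading of the SES analogy is a formative (composite) indicator model, in which the indicators define the construct instead of reflecting a common cause. The sketch below assumes a simple linear composite; the function name, component names, standardized inputs, and weights are all hypothetical, and the slides do not say how such a composite would be weighted.

```python
def formative_composite(indicators, weights):
    """Composite score defined as a weighted sum of its indicators.

    In a formative model the indicators define the construct, so (unlike a
    one-factor model) they are not required to correlate with one another.
    """
    if set(indicators) != set(weights):
        raise ValueError("indicators and weights must name the same variables")
    return sum(weights[name] * indicators[name] for name in weights)


# Hypothetical weights and one student's standardized (z-score) inputs
weights = {"decoding": 0.4, "language": 0.4, "background_knowledge": 0.2}
student = {"decoding": -1.5, "language": 0.5, "background_knowledge": 1.0}
print(formative_composite(student, weights))
```

Under this framing a student with weak decoding but good language and background knowledge lands near the middle of the composite (about -0.2 here), just as a family with high education but modest income can land near the middle of an SES index.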
41. Accommodations for Children with RD
- The effects of accommodations on the performance of students with disabilities on accountability and other high-stakes tests have been the topic of several recent reviews (Chiu & Pearson, 1999; Fuchs, Fuchs, & Capizzi, in press; Sireci, Li, & Scarpati, 2003; Thompson, Blount, & Thurlow, 2000; Tindal & Fuchs, 2000)
- These reviews uniformly lamented the relative dearth of empirical studies of the effects of accommodations, noting that the research base was inconsistent and generally not adequate to support firm conclusions about the effects of specific accommodations
42. Accommodations for Children with RD
- The lack of consistency across studies reflected
the wide range of accommodations evaluated in
research, differences in implementation, and the
heterogeneity of the students identified as
disabled (Sireci et al., 2003).
43. Accommodations for Children with RD
- For accommodations to be fair, they must not alter the validity of the test
- In practice, appropriate accommodations will improve the performance of students with disabilities but have negligible impact on the performance of students without disabilities
44. Accommodations for Children with RD
- One way to think about this notion of differential impact is to think of the accommodation as removing some construct-irrelevant variance from the test
- That is, for children with a disability, there are factors that contribute to performance on the test, that are not essential elements of the construct of interest, and that do not affect the performance of children without disabilities
45. Hypothetical Example
- For example, suppose that for students with RD, reading ability is a source of variance in performance on a math test
- In contrast, for children without RD, reading is not a significant factor in math performance
- Then, reading the directions and word problems on the math test aloud to students who are poor readers would remove this irrelevant source of variance from the math test for students with RD
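The hypothetical example can be made concrete with a simple variance decomposition. The sketch assumes that math ability and reading ability are uncorrelated, each with unit variance, and that an observed math score is math + w × reading; the function name and the weights w are invented for illustration. Reading the test aloud is modeled as setting w to 0.

```python
def reading_share(weight_on_reading, var_math=1.0, var_reading=1.0):
    """Share of math-test score variance attributable to reading, assuming
    score = math + weight * reading with math and reading uncorrelated."""
    reading_var = weight_on_reading ** 2 * var_reading
    return reading_var / (var_math + reading_var)


# Hypothetical weights of reading ability on the observed math score
conditions = {
    ("RD", "standard"): 0.8,        # poor reading depresses math scores
    ("RD", "read-aloud"): 0.0,      # accommodation removes the reading demand
    ("non-RD", "standard"): 0.0,    # reading is not a significant factor
    ("non-RD", "read-aloud"): 0.0,  # so the accommodation changes nothing
}
for (group, admin), w in conditions.items():
    print(f"{group:7s} {admin:10s} reading share = {reading_share(w):.2f}")
```

With w = 0.8 for students with RD under standard administration, about 39% of score variance (0.64 / 1.64) is construct-irrelevant; the accommodation removes it, while students without RD are unaffected because their w is already 0.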
46. Possible Accommodations on Reading Assessments for Children with RD
- Children with RD struggle with comprehension because of poor decoding skills
- A number of possible accommodations have been proposed and examined
  - Increased time to read the test
  - Allowing children to read material out loud
  - Increasing print size
  - Reading passages to children (NOTE: THIS INVALIDATES THE TEST)
47. Suite of Accommodations in Study
- Extended time (students allowed to complete the assessment over two days)
- Examiner read aloud
  - Instructions
  - Proper nouns
  - Item stems
- These were chosen because they could be implemented in practice and because they preserved the validity of the state outcome assessment
48. Study Design
- 182 Grade 3 children were recruited from 6 districts, 48 schools, and 113 classrooms
  - N = 91 Grade 3 children with RD
  - N = 91 Grade 3 children who were average readers, from the same classrooms
- Children in each group were randomly assigned to take the TAKS reading assessment either under standard administration conditions (n = 47 in each group) or with the accommodations (n = 44 in each group)
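The assignment step can be sketched as random assignment within each reader group. The slides do not describe the exact randomization procedure (for example, whether it was carried out within classrooms), so the function, child ids, and seeds below are hypothetical; the sketch only reproduces the reported counts of 47 standard and 44 accommodated children per group.

```python
import random


def assign_within_group(ids, n_standard, seed=None):
    """Randomly split one group's children into standard and accommodated
    administration; the first n_standard shuffled ids get the standard condition."""
    if not 0 <= n_standard <= len(ids):
        raise ValueError("n_standard must be between 0 and the group size")
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return {"standard": shuffled[:n_standard], "accommodated": shuffled[n_standard:]}


# Hypothetical child ids: 91 children with RD and 91 average readers (182 in total)
rd_ids = [f"RD{i:03d}" for i in range(1, 92)]
avg_ids = [f"AVG{i:03d}" for i in range(1, 92)]
design = {
    "RD": assign_within_group(rd_ids, n_standard=47, seed=1),
    "average": assign_within_group(avg_ids, n_standard=47, seed=2),
}
for group, arms in design.items():
    print(group, {arm: len(kids) for arm, kids in arms.items()})
```

Randomizing separately within each reader group keeps RD status balanced across conditions, which is what allows the RD × accommodation interaction to be estimated.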
49. Study Design
- All children were tested on a practice version of the Grade 3 Texas Assessment of Knowledge and Skills (TAKS) that was built by the test developer during field testing
- In addition, children were given the Letter-Word Identification and Word Attack subtests of the WJ III and the Picture Vocabulary subtest of the Woodcock Language Proficiency Battery (WLPB; Woodcock, 1991)
50. TAKS Reading
- No modifications were made to the TAKS booklets; the only modifications were in the instructions provided by the examiners
- The Grade 3 reading assessment of the TAKS involves a practice story and three stories of increasing difficulty
- Questions are designed to assess the literal meaning of the passage, vocabulary, and different aspects of critical reasoning about the material in the passage
51. TAKS Reading
- Both expository and narrative materials are included
- Under standard administration guidelines, the TAKS is untimed, and students are typically allowed as much time as they need to complete the assessment
- Like all TAKS tests, the Grade 3 reading comprehension assessment is a criterion-referenced assessment aligned to state standards
52. Study Results
- There was a significant interaction between RD status and accommodations
- Specifically, access to accommodations significantly improved performance, but only for children in the RD group
- Accommodations had a negligible effect for children without RD
53. Study Results
Test of interaction: F(1, 155) = 12.04, p = .0007
Note: Model included random effects of school within district.
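With one numerator degree of freedom, an F statistic is the square of a t statistic, so the reported p value can be recovered from F and the error degrees of freedom alone. The stdlib-only sketch below integrates the t density numerically with Simpson's rule; the function names are hypothetical, and the 155 error df is taken as reported rather than derived.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def p_from_f1(f_value, df_error, steps=10_000):
    """Upper-tail p value for F(1, df_error).

    With one numerator df, F = t^2, so p = 2 * P(T > sqrt(F))
    = 1 - 2 * (integral of the t density over [0, sqrt(F)]).
    The integral uses composite Simpson's rule (steps must be even).
    """
    t = math.sqrt(f_value)
    h = t / steps
    total = t_pdf(0.0, df_error) + t_pdf(t, df_error)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df_error)
    return 1 - 2 * (total * h / 3)


print(f"{p_from_f1(12.04, 155):.4f}")
```

The result rounds to the reported .0007. This only converts the reported F and degrees of freedom into a p value; it does not refit the authors' model with random effects of school within district.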
54. Study Results
- In addition to significantly improving average performance levels, accommodations significantly affected student passing rates
- Again, improvements were seen only for children with RD
- For children with RD, accommodations raised passing rates from 9% to 41% (p < .0005)
- Pass rates for children without RD went down slightly, from 83% to 77% (n.s.)
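The pass-rate pattern can be summarized as a descriptive interaction contrast, sometimes called a differential boost: the change in pass rate for children with RD minus the change for children without RD. The sketch uses only the rounded percentages reported on this slide, and the function name is hypothetical; it is a descriptive difference, not the significance test the authors ran.

```python
def change(before, after):
    """Change in pass rate (percentage points) from standard to accommodated administration."""
    return after - before


# Pass rates (%) reported on the slide: standard vs. accommodated administration
pass_rates = {
    "RD": {"standard": 9, "accommodated": 41},
    "non-RD": {"standard": 83, "accommodated": 77},
}
gains = {group: change(r["standard"], r["accommodated"]) for group, r in pass_rates.items()}
differential_boost = gains["RD"] - gains["non-RD"]
print(gains, differential_boost)  # prints {'RD': 32, 'non-RD': -6} 38
```

A 32-point gain for children with RD alongside a slight 6-point drop for other children is the differential pattern that slide 43 describes as the signature of an appropriate accommodation.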
55. Conclusions
- The study showed that an appropriate suite of accommodations could substantially and significantly improve performance for children with RD on the Grade 3 TAKS
- Effects were seen both in the level of performance and in the percentage of children meeting standards
- These same accommodations had virtually no effect on the performance of children without RD
56. Conclusions
- Given that the goal of the TAKS is to assess students' ability to understand text, the suite of accommodations used here is appropriate
- Whether similar accommodations would be successful for older students remains to be determined