EDN 523 Educational Research - PowerPoint PPT Presentation


PPT – EDN 523 Educational Research PowerPoint presentation | free to download - id: 246809-MWRlN


The Adobe Flash plugin is needed to view this content

Get the plugin now

View by Category
About This Presentation

EDN 523 Educational Research


Next, we could compare scores on Billy Bob's Test at the beginning of the year ... NAEP does not provide scores for individual students or schools. ... – PowerPoint PPT presentation

Number of Views:50
Avg rating:3.0/5.0
Slides: 103
Provided by: ITSD
Learn more at: http://people.uncw.edu


Write a Comment
User Comments (0)
Transcript and Presenter's Notes

Title: EDN 523 Educational Research

EDN 523Educational Research
  • Validity and Educational Reform

Accountability Models
  • In the current age of educational reform,
    high-stakes decisions based on large-scale
    testing performance are becoming increasingly
    common. The decisions associated with test
    performance carry significant consequences (e.g.,
    rewards and sanctions). The degree of confidence
    in, and the defensibility of test score
    interpretations depends upon valid, consistent,
    and reliable measurements. Stated differently, as
    large-scale assessment becomes more visible to
    the public, the roles of validity (and
    reliability) become more important.

The Validity of Test Scores
  • Validity is a tests most important
  • Why?
  • It is the degree to which a test measures what
    it is supposed to measure and as a consequence
    allows and supports appropriate interpretation of
    the scores.

Types of Test Validity
  • Construct Validity the degree to which a test
    measures an intended hypothetical construct.
  • Content Validity the degree to which a test
    measures an intended content area.

Construct Validity
  • The most important question to ask
  • What is this test really measuring?
  • As Professor Kozloff shared with you last week,
    we must first define what we want to measure
  • Constructs or concepts can be non-observable
    traits, such as intelligence, which are
    invented terms to explain educational outcomes.

  • Constructs underlie the variables we want to
  • You cannot see a construct you can only observe
    its effect.
  • You can however, use constructs to explain
    differences among individuals.

For Example
  • It has been observed that some students learn
    faster than others
  • Some students learn more than others
  • And some students retain what they learn longer
    than other students
  • Using this information, we created a construct
    called intelligence that is related to learning
    and that everyone possesses to a greater or
    lesser degree.

A Theory of Intelligence
  • From the observed and collected data, the
    construct of intelligence led to the development
    of a theory of intelligence.
  • Tests were developed to measure the amount or
    how much intelligence a person has.
  • Sometimes (at least occasionally) students whose
    achievement and/or test scores indicate that they
    have a lot of it tend to do better in school
    and other learning environments than those who
    have less of it.

Construct Validity
  • Research studies involving a construct are valid
    only to the extent that the test or instrument
    used actually measures the intended construct and
    not some unanticipated or intervening variable.
  • Determining test construct validity is not easy!

For Example Billy Bobs IQ Test
Lets say Billy Bob owns a testing company in
Texas. Billy Bob has designed an IQ test that he
is marketing to public schools across the
country. If we wanted to determine if Billy
Bobs IQ Test was construct valid, then we would
need to carry out several validation studies.
Validating Billy Bobs Test
  • First, we could see if students who scored high
    on Billy Bobs Test learned faster, learned more,
    and retained more of what they learned than
    students who scored low on his test.
  • Next, we could compare scores on Billy Bobs Test
    at the beginning of the year with students
    grades at the end of the school year.
  • We could also compare students scores on Billy
    Bobs Test with scores on other, well-established
    IQ tests to see if they were highly related.

Professor Kozloffs Findings
  • Using Billy Bobs Test and relating it to Dr.
    Kozloffs presentation last weekif the findings
    say that a student has an IQ of 140 and this is
    confirmed when compared with the students scores
    on other, well-established IQ tests and other
    measurements, then these matches strengthen the
    construct validity or validness of Billy Bobs
    Test as an intelligence measuring instrument.

What about Test Content Validity?
  • Content Validity is based on professional
    judgments about the relevance of the test content
    to the particular domain of interest, such as the
    NC Standard Course Of Study, and about the
    representativeness with which items and/or task
    content on the instrument cover the domain.

NC ABCs of Accountability
  • The Content Validity of the NC ABC
    Accountability Tests is directly related to
    whether or not the items, tasks, and concepts are
    aligned with the domains, constructs, and/or
    variables we are attempting to assess and
  • The establishment of evidence of test relevance
    and representativeness of the target domains is
    a critical first step in being able to say that
    the test score interpretations are based on
    strong content validity.

Content ?Alignment ? Construct
  • Alignment is a key issue in as much as it
    provides one avenue for establishing evidence for
    score interpretation. Validity is not a static
    quality it is an evolving property and
    validation is a continuing process.
  • Evaluating test alignment should occur regularly,
    taking its place in the recurring process of
    assessment development and revision of the
    testing instruments.
  • An objective analysis of alignment as tests are
    adopted, built, or revised ought to be conducted
    on an ongoing basis.
  • This is a critical step in establishing evidence
    of the validity of test score or performance

Test Evaluation
  • Professor Kozloff discussed program decision
    making last week relative to evaluating whether
    or not a new reading program would be needed if
    students have low reading test scores.
  • He recommended that we ask the following
  • How was the reading evaluated?
  • Did the reading test DIRECTLY measure reading

For Example
  • Did students get points if they guessed at what
    words said?
  • Did students get points off if they guessed
    rather than sounded out the words?
  • Are we measuring reading or are we measuring
  • Did the test use objective/quantitative data or
    subjective/qualitative data?

What are we really measuring?
  • Math Example If we simply look at data on math
    ability among individuals, it initially appears
    that shoe size is directly related to a persons
    math skills. Each time we test or measure a
    persons math ability and compare it to their
    shoe size, there is a 11 positive correlation
    the larger a persons shoe size, the better they
    are in solving higher level math problems.
  • How is that possible?

A Valid Measurement Age
  • Age
  • Shoe Size ? Math Ability

A Valid Measurement
  • Education Level ? Earning Power
  • College Graduates Earning Power
  • High School Graduates Earning
  • Are they related? You bet your derriere
    they are!
  • College graduates average lifetime earnings
    2.2 million
  • HS graduates average lifetime earnings
    1.2 million
  • Difference 1
  • 30 year work life/ 1 million 33, 333 more per

Cost of a College Education
  • The average yearly cost at a public 4-year
    college is 8,655 for in-state tuition, room and
  • Over a four year period, the total cost for a
    college degree, considering increases and
    miscellaneous expenses, is approximately 40,000.
  • This data supports the contention that, though
    the cost of higher education is significant,
    given the earnings disparity that exists between
    those who earn a bachelor's degree and those who
    do not, the individual rate of return on an
    investment in higher education is sufficiently
    high enough to warrant the cost.

Individual and Community Benefits
  • Rate of return for college educated individual is
    almost twice as much income during his or her
    work life than a high school graduate.
  • A college educated person contributes taxes into
    the national coffers and buys goods, services,
    and products to keep the countrys economy
    humming (an additional 425 billion by 2015)
  • A person with a college degree typically makes
    better lifestyle choices which decreases health
    costs and burdens on medical support services.

Cost of An Uneducated Person
  • It is estimated that there are 3.8 million youth
    between the ages of 18 and 24 who are neither
    employed nor in schoolroughly 15 percent of all
    young adults. Since 2000 alone, the ranks of
    these non-engaged young adults grew by 700,000, a
    19 percent increase over 3 years (Annie E. Casey
    Foundation, 2004).

High School Dropouts A National Crisis
  • Almost 70 of prisoners did not complete high
  • There appears to be some kind of relationship
    between dropping out of high school and a life of
  • Limited alternatives, easily turned to the dark
    side, and being reinforced for criminal behavior.

The Cost of Crime
  • What is the impact of poorly educated citizens on
    our economy and quality of life?
  • What are the costs?

Estimated Cost of Crime in the US
  • Two Data Sources
  • U.S. Department of Justice Office of Justice
  • Programs Bureau of Justice Statistics Data,
  • D. A. Anderson, (1999) "The Aggregate Burden
  • of Crime," Journal of Law and Economics,

1. Cost of Crime Police, Courts, and Prisons
             Source Justice Expenditure and
Employment Extracts
70 Billion
60 Billion
40 Billion
Total 170 Billion
2. Crime-Related Production Costs?
  • Cost of goods and services that would be
    unnecessary in a country with low crime rates.
  • Money spent on locks and safes 4 billion
  • Surveillance cameras 1.4 billion
  • Computer virus screening and security 8
  • Drug trafficking 160 billion
  • Federal agencies to fight crime 23 billion
  • Medical care to treat the victims of crime 8.9
  • Newborns exposed to cocaine heroine 28
  • Total 233.3 billion

Opportunity Costs of Crime
  • In addition to the direct cost of resources
    devoted to crime, there is a sizable loss of time
    by people who are potential victims of crime and
    by those who have committed crime. And as the
    saying goes - time is money.

Time Lost to Crime
  • If there were no crime, there would be no
    criminals. Instead of sitting idle in jails,
    these people - over 1 million of them - could be
    productive members of the economy, so chalk up
    35 billion. Also, there is much time lost in
    planning and executing crimes - 4.1 billion
  • Then add to that the value of work time lost by
    victims (876 million) and the time spent on
    neighborhood watches (655 million).

Total for Lost Time Due to Crime
  • 39.1 Billion Prisoners not working
  • 1.5 Billion Victims lost work time
  • 4.1 Billion Time planning committing crimes
  • Total 44.7 Billion

Life and Health Crime Costs
  • We accounted for the cost of medical expenses,
    but we should also consider the cost of a life
    and injury in addition to the direct costs
    incurred by the medical system.
  • There are approximately 72,000 crime related
    deaths a year and 2.5 million crime related
    injuries every year.
  • Lets put a dollar value on a life and injuries.
    What is a life worth? Conservative estimate
    80,000. An injury 52,637.
  • With these numbers, the total cost of crime for
    lost life and injuries comes to 137.8 billion

Crime Transfer Costs - Fraud
  • We also need to add the cost of property and
  • that is stolen or obtained through fraud.
  • Fraud at work 203 Billion
  • Unpaid taxes 123 Billion
  • Health insurance fraud 108 Billion
  • Auto theft 8.9 billion
  • Coupon fraud racks up 912 million annually.
  • Total comes to 603 billion.

Grand Total (Conservative Estimate)
  • Police, Courts, and Prisons 170.0
  • Crime-Related Production Costs 234.0 B
  • Lost Time Due to Crime 44.7
  • Life and Health Crime Costs 137.8 B
  • Crime Transfer Costs (Fraud) 603.0 B
  • Grand Total 1.12 Trillion Dollars Per Year

An Uneducated Person Impact Effects
  • Adults without a high school diploma are twice as
    likely to be unemployed.
  • They will earn 260,000 less over a lifetime than
    a high school graduate and 1 million less than a
    college graduate
  • Dropouts make up nearly 70 percent of inmates
    crowding state prisons and at least half those on
  • Crime costs our country 1.12 Trillion Dollars A

The High School Diploma
  • Gateway or Barrier to a prosperous, successful
  • If you have it, you have opportunities
  • If you dont have it, you have a cross to bear

Failure to Graduate High School
  • What is the primary reason students drop out of
  • school and do not graduate with a diploma?

So, What is the Big Deal About Reading?
  • Lets seeeducated citizenry, economic
    well-being, secure and prosperous nation,
    successful individuals capable of providing for
    themselves and their familieswhat do you think?

NC Reading Legislation Senate Bill 16
Passed in 1995 sponsored by then state senator
Beverly Perdue, required the State Board of
Education to reorganize the NCDPI and to develop
an accountability plan for the state. The
result was the ABCs of Public Education, which
included a plan to revise the Standard Course of
Study. The General Assembly accepted the
accountability plan and subsequently enacted
Senate Bill 1139, the School-Based Management and
Accountability Program.
Senate Bill 1139
  • Sponsored by former Senator Leslie Winner (D-40th
    Dist.), established the requirement that the
    State Board of Education develop a comprehensive
    reading plan for North Carolina as part of the
    revised Standard Course of Study.
  • According to the bill, the General Assembly
    believes that the first, essential step in the
    complex process of learning to read is the
    accurate pronunciation of written words and that
    phonics is the most reliable approach to arriving
    at the accurate pronunciation of a printed word.

Outcomes of Senate Bill 1139
  • The bill mandated that the reading plan include
    early and systematic phonics.
  • A member of a prominent reading interest group,
    who shall remain nameless, (but their initials
    are Whole Language) called the bill the phonics
  • A representative from the NCDPI spoke to the
    legislatures focus on phonics and suggested
    that, some members of the legislature had the
    opinion that teachers werent teaching phonics,
    that the whole language movement had gotten the
    upper hand in the state and that reading skills
    were not being taught.

Reading Legislation
  • In 1997, Congress asked the Director of the
    National Institute of Child Health and Human
    Development (NICHD) at the National Institutes of
    Health, in consultation with the Secretary of
    Education, to convene a national panel to assess
    the effectiveness of different approaches used to
    teach children to read.
  • For over two years, the National Reading Panel
    (NRP) reviewed research-based findings on reading
    instruction and held open panel meetings in
    Washington, DC, and regional meetings across the
    United States.

NRP Research Report
  • The National Reading Panel only looked at studies
    that met the most rigorous standards of
    scientifically based research.
  • On April 13, 2000, the NRP concluded its work and
    submitted "The Report of the National Reading
    Panel Teaching Children to Read," at a hearing
    before the U.S. Senate Appropriations Committee's
    Subcommittee on Labor, Health and Human Services,
    and Education.

Impact of NRP Report
  • Part of the No Child Left Behind Act of 2001,
    signed into law on January 8, 2002, was President
    Bush's unequivocal commitment to ensuring that
    every child can read by the end of third grade.
  • To accomplish this goal, the new Reading First
    Initiative significantly increased the federal
    investment in scientifically based reading
    instruction programs in the early grades.

Reading First
  • NCLB established Reading First as a new
    high-quality, evidence-based program for students
  • The new Reading First State Grant program makes
    six-year grants to states, which in turn make
    competitive subgrants to local communities.
  • Building on a solid foundation of research, the
    program is designed to select, implement, and
    provide professional development for teachers
    using scientifically based reading programs, and
    to ensure accountability through ongoing, valid
    and reliable screening, diagnostic, and
    classroom-based assessment.

Reading First SBR Research
  • The Goal of North Carolina's Reading First
    (NCRF) initiative is to ensure that all children
    learn to read well by the end of the third grade.
    This goal will be accomplished by applying
    scientifically based reading research to reading
    instruction in all North Carolina schools. The
    initiative requires phonics instruction.

Phonics Model of Reading Instruction
  • Phonemic Awareness Attentiveness to the sounds
    of spoken language.
  • Phonics Decoding unfamiliar words using knowledge
    of the alphabetic principle.
  • Fluency Grade-appropriate oral reading with
    appropriate speed, accuracy, and expression.
  • Vocabulary Development Knowledge of word meanings
    to facilitate effective spoken and written
    language communication.
  • Text Comprehension Use of a variety of
    comprehension strategies to monitor comprehension
    to construct meaning from print.

(No Transcript)
SBR and Standardized Tests
  • Test Construct Validity - Research studies
    involving a construct are valid only to the
    extent that the test or instrument used has been
    determined by SBR to actually measure the
    intended construct and not some unanticipated or
    intervening variable.
  • Test Content Validity - based on the relevance of
    test items and tasks to the domain of interest
    (in our case the NC Standard Course Of Study),
    and the representativeness with which items and
    tasks on the instrument cover the domain.

Content ?Alignment ? Construct
  • Alignment is a key issue in standardized tests,
    as much as it provides a means for establishing
    evidence for score interpretation.
  • Test validity is not a static quality it is an
    evolving property and a continuing process.

NC ABCs Model
  • Test content validity is based on professional
  • judgments about the relevance of the test
  • content to the content of a particular behavioral
  • domain of interest about the representativeness
  • with which items and tasks cover that domain.

Reading Achievement Test
  • If a test is designed to measure reading
    achievement and a test score is judged relative
    to a set proficiency standard (i.e., a cut
    score), the interpretation of reading proficiency
    will be heavily dependent on a match (or
    alignment) between test content and content area

Alignment is a Key Issue
  • Alignment provides an avenue for
  • establishing evidence for score interpretation.
  • Evaluating test alignment should occur
  • regularly, taking its place in the recurring
  • process of assessment development and revision
  • of the testing instruments. In addition, tests
  • should be regularly assessed and evaluated
  • following established standards.

Standards for Test Validity
  • The "Standards for Educational and Psychological
    Testing" established by the American Educational
    Research Association, the American Psychological
    Association, and the National Council on
    Measurement in Education, are intended to provide
    a comprehensive approach for evaluating tests
    based on key standards applicable to most test
    evaluation situations (Rudner, 1994).

Part I Test Construction, Evaluation, and
  • Validity
  • Reliability and Errors of Measurement
  • Test Development and Revision
  • Scales, Norms, and Score Comparability
  • Test Administration, Scoring, and Reporting
  • Supporting Documentation for Tests

Part II Fairness in Testing
  • Fairness in Testing and Test Use
  • The Rights and Responsibilities of Test Takers
  • Testing Individuals of Diverse Linguistic
  • Testing Individuals with Disabilities

Part III Testing Applications
  • The Responsibilities of Test Users
  • Psychological Testing and Assessment
  • Educational Testing and Assessment
  • Testing in Employment and Credentialing
  • Testing in Program Evaluation and Public Policy

If we consider the NC ABCs Testing Model, then
we should ask the following questions
  • Who is the test designed for?
  • What are the intended uses of the test?
  • What interpretations are appropriate?
  • Does the test represent a valid assessment of the
    content and constructs identified in the NC
    Standard Course of Study?
  • Is the test appropriate for students?

NC ABCs Intended Uses
  • If we consider these questions relative to the
    NC ABCs Testing Model , then there must be a
    clear statement of recommended uses, a meaningful
    description of the population for which the test
    is intended, and a valid representation of the NC
    Standard Course of Study (NCSCOS).

What Do We Mean By Test Validity?
  • The individuals in the norming and validation
    samples should represent the group for which the
    test is intended in terms of age, experience and

Questions To Ask
  • How were the samples used in pilot testing,
    validation and norming chosen?
  • How is this sample related to your student
  • Were participation rates appropriate?
  • Was the sample size large enough to develop
    stable estimates with minimal fluctuation due to
    sampling errors?
  • Where statements are made concerning subgroups,
    are there enough test-takers in each subgroup?
  • Do the difficulty levels of the test and
    criterion measures (if any) provide an adequate
    basis for validating and norming the instrument?
  • Are there sufficient variations in test scores?

Evaluating the Validity of NC ABCs Model
  • In an effort to evaluate the validity of the NC
    ABCs Accountability System, the State Board of
    Education established a plan in March, 2005 that
    will utilize the Standards for Educational
    Accountability Systems model developed by the
    National Center for Research on Evaluation,
    Standards, and Student Testing.

Testing Standards ABCs System Components
  • 1. Accountability expectations should be made
    public for all participants
  • in the system.
  • 2. Accountability systems should employ many
    different types of data
  • from multiple sources.
  • 3. Accountability systems should include data
    elements that allow for
  • interpretations of student, institution, and
    administrative performance.
  • 4. Accountability systems should include the
    performance of all students,
  • including subgroups that historically have
    been difficult to assess.
  • 5. The weighting of elements in the system,
    including different types of
  • test content, and different information
    sources, should be made
  • explicit.
  • 6. Rules for determining adequate progress of
    schools and individuals
  • should be developed to avoid erroneous
    judgments attributable to
  • fluctuations of the student population or
    errors in measurement.

Standards Components (continued)
  • 7. Decisions about individual students should
    not be made on the basis of a
  • single test.
  • 8. Multiple test forms should be used when there
    are repeated administrations
  • of an assessment.
  • 9. The validity of measures that have been
    administered should be documented for the various
  • 10. If tests are to help improve system
    performance, there should be
  • information provided to document that test
    results are modifiable by quality
  • instruction and student effort.
  • 11. If test data are used as a basis of rewards
    or sanctions, evidence of
  • technical quality of the measures and error
    rates associated with
  • misclassification of individuals or
    institutions should be published.
  • 12. Evidence of test validity for students with
    different language backgrounds
  • should be made public.
  • 13. Evidence of test validity for children with
    disabilities should be made
  • publicly available.
  • 14. If tests are claimed to measure content and
    performance standards,
  • analysis should document the relationship
    between the items and specific standards or
    sets of standards.

Test Validity Legislation
  • NC G.S. 115C-105.35, Section 7.12 (a)
  • During the 2004-2005 school year and at least
    every five
  • years thereafter, the State Board shall evaluate
  • accountability system and, if necessary, modify
    the testing
  • standards to assure the testing standards
    continue to
  • reasonably reflect the level of performance
    necessary to be
  • successful at the next grade level or for more
  • study in the content area.

ABCs Test Evaluation
  • As part of this evaluation, the State Board
    shall, where available, review the historical
    trend data on student academic performance on
    State tests.
  • To the extent that the historical trend data
    suggests that the current standards for student
    performance may not be appropriate, the State
    Board shall adjust the standards to assure that
    they continue to reflect the State's high
    expectations for student performance.

Test Alignment Adjustments
  • SECTION 7.12.(b) The State Board shall complete
    its initial evaluation and any necessary
    modifications to the testing standards required
    under G.S. 115C-105.35, as rewritten by
    subsection (a) of this section, so that the
    modified standards are in effect no later than
    the 2005-2006 school year.

Results of the 2004/2005 Evaluation
  • A review of the original growth formulas found
  • Statewide ABCs growth over time, by grade level,
    forms a
  • saw-toothed pattern of gains and dips in the
    percent of
  • schools meeting and exceeding growth targets in
    reading or
  • mathematics as a cohort of students moves from
    grade to
  • grade.
  • The percent of schools meeting or exceeding
  • expectations in reading or mathematics does not
    appear to
  • be highly correlated to curricular implementation
    (i.e., an
  • historically high percent of schools met and
  • expectations in the first year of testing a new

SBE Determined Test Adjustments Were Needed
  • Base expected student performance on past student
  • Separate reading scores from mathematics scores
    i.e., a students reading score can be included
    in the ABCs calculations, even if that student
    does not have a score in mathematics,
  • Do not produce the saw-toothed pattern in
    percentages of schools meeting expectations at
    consecutive grade levels over time as a cohort
    moves through a school,
  • Adjust for the difference in relative difficulty
    of the curriculum among adjacent grade levels as
    the curriculum is revised,
  • Produces valid and reliable results.

Quasi-Longitudinal Model
  • North Carolinas ABCs Accountability Model uses a
  • approach wherein at least a years worth of
    growth for a year of
  • schooling is expected.
  • The average rate of growth observed across the
    state as a whole from one
  • grade in the spring of one year to the next grade
    in the spring of the next
  • year serves as a benchmark improvement for
    students in a given grade.
  • Comparisons to expected growth are used to
    classify schools into one of
  • Four categories exemplary schools, schools
    meeting expected growth,
  • schools having adequate performance, and
    low-performing schools

NC ABCs Model
A Recommended Model
  • The quasi-longitudinal model is recommended
  • by most research and evaluation groups and
  • organizations as a more accurate means for
  • assessing public school performance.
  • If that is true, how does NCs Accountability
  • Model compare to other states?

NC ABCs Model and Other Models
  • The National Assessment of Educational Progress
    (NAEP), also known as "the Nation's Report Card,"
    is the only nationally representative and
    continuing assessment of what America's students
    know and can do in various subject areas. Since
    1969, assessments have been conducted
    periodically in reading, mathematics, science,
    writing, US history, civics, geography, and the

NAEP The Nations Report Card
  • The National Assessment of Educational Progress
    (NAEP) is a congressionally mandated project of
    the National Center for
  • Education Statistics (NCES), within the
    Institute of Education Sciences at the U.S.
    Department of Education

National Comparisons
  • NAEP produces a national set of scores based on
    the performance of students across the country
    and state-level results for participating states
    and jurisdictions.
  • While all states conduct annual standardized
    tests to report on the performance of students on
    their specific curriculum objectives, the state
    assessments vary substantially from state to
    state, so results among the states cannot be
  • NAEP helps understand how a given states
    performance compares to the national average and
    to that of other states.

Goals of NAEP
  • NAEP has two major goals
  • 1) to discover what American students know and
    can do in key subject areas, and
  • 2) to measure educational progress over long
    periods of time.

Two Types of NAEP Assessments
  • Main NAEP - Every 2 years, reading and
    mathematics are assessed at the national level at
    grades 4, 8, and 12, and at the state level at
    grades 4 and 8. Every 4 years, science and
    writing are assessed at the national level at
    grades 4, 8, and 12, and at the state level at
    grades 4 and 8. Other subjects are assessed
  • Long-term Trend NAEP - measures student
    performance at the national level in reading and
    mathematics over time, using questions and
    question formats that have remained relatively
    fixed from 1969 to the present. Recent scores can
    be compared with those from earlier decades. The
    Long-term Trend assessment is administered to 9,
    13, and 17-year-olds. Beginning with the 2004
    Long-term Trend assessment, results will be
    reported for the nation every 4 years.

NAEP Sample Selection
  • NAEP assessments are administered to samples of
    U.S. students in grades 4, 8, and 12. Schools are
    selected to be representative of states, the
    nation, or other jurisdictions as appropriate.
    Students are then randomly selected from those
    schools to participate. NAEP does not provide
    scores for individual students or schools.
  • Any one student takes only a small portion of the
    whole assessment. The responses are combined and
    the results are reported for groups of students
    by characteristics such as gender and
    racial/ethnic membership.
  • Participation in NAEP is voluntary for students
    who are selected and has no effect on a students

Deciding What is Assessed
  • The National Assessment Governing Board (NAGB)
    works with teachers, curriculum specialists,
    administrators, parents, and members of the
    public to select subjects and define content for
    each Main NAEP assessment, as well as to develop
    assessment objectives and test specifications.
  • NAGB members are appointed by the U.S. Secretary
    of Education. The Board consists of 26 members,
    including teachers, state governors, state
    legislators, testing experts, school principals,
    and parents.

NAEP Assessment Frameworks
  • Each Main NAEP assessment is built from an
    organizing content framework which specifies what
    students should know and be able to do at a given
    grade level.
  • These frameworks guide the design of the
    questions that are ultimately given to students.

NAEP Participation Rates
  • Over the years, 40 to 45 states usually elected
    to participate in the state NAEP assessments.
  • Beginning in 2003, the No Child Left Behind Act
    of 2001 required all states and school districts
    receiving federal Title I funds to participate in
    biennial NAEP reading and mathematics assessments
    at the fourth and eighth grades.

NAEP Test Item Development
  • The NAEP item developers go to great lengths to
    make certain that the questions reflect
    educators best thinking about what students know
    and should be able to do. (CONSTRUCT VALIDITY)
  • Based on NAEP frameworks, testing specialists
    develop questions with the help of teachers,
    curriculum specialists, and measurement experts.
  • Numerous and on-going reviews of the questions
    and responses take place to assure
    appropriateness and fairness. (CONTENT VALIDITY)

Pilots and Field Tests
  • After questions undergo initial reviews by test
    development staff, subject-area specialists, and
    NAGB, they are pilot tested with small numbers of
  • Based on the results of the pilot tests and
    additional reviews by content and assessment
    experts, questions are selected and refined for
    use in a field test.
  • The field tests, which are administered to
    thousands of students, are then scored and
    analyzed. Suitable questions for the actual
    assessment are chosen based on field-test results
    and framework specifications.

NAEP Test Booklets
  • The NAEP development process results in an
    assessment that may have hundreds of questions.
    The 2003 fourth-grade mathematics assessment, for
    example, had more than 140 individual items.
    However, no student who participates in NAEP
    takes the entire NAEP assessment.
  • The questions are separated into groups and
    packaged into booklets. Each student receives one
    booklet, which contains a portion of the
    questions anywhere from 10 to 20 percent of the
    whole assessment.
  • The booklets are distributed so that only a small
    percentage of students in a given school get the
    same assessment booklet.

NAEP Test Items
  • NAEP assessments include both multiple-choice and
    open-ended questions. The open-ended questions
    call for student-composed responses.
  • The students write an answer that can be anywhere
    from a few words to several paragraphs in
    lengtheven longer on writing assessments.

Sample Selection Process
  • NAEP selects a sample large enough to assure
    valid and reliable results but generally does not
    include every student in the grades being
  • For example, in 2003, about 190,000 grade 4
    students in 7,500 schools and about 153,000 grade
    8 students in 6,100 schools participated in the
    mathematics assessment. (Similar numbers of
    students participated in the reading assessments
    at grades 4 and 8.)
  • Schools located in states or districts that
    receive Title I funds are required to participate
    in biennial NAEP reading and mathematics
    assessments at grades 4 and 8 if they are
    selected. School participation is voluntary in
    all other assessments.

NAEP National Sample Selection
  • For the national assessments, in a year without
    state assessments, NAEP selects a random sample
    of public and private school students to
    represent the diverse student population in the
    United States.
  • When conducting only a national assessment
    without state assessments, typically 6,000 to
    10,000 students per grade for each subject are
  • The national samples are chosen using a
    stratified two-stage design schools are
    classified first by geographic location and then
    by level of minority enrollment. Within each
    location and enrollment based category, a
    predetermined proportion of schools is randomly
    selected, providing accurate results for all
    studentsand all subgroups.

State Sample Selection Process
  • In a typical state, NAEP selects about 3,000
    students in approximately 100 schools for each
    grade and each subject assessed. The samples are
    selected using a two-stage design that first
    selects schools within each state or other
    jurisdiction and then selects students within
  • First, schools are classified and sorted based on
    characteristics they have in common. Then they
    are randomly selected. (To ensure that the
    student sample represents students from large and
    small schools in the appropriate proportions,
    large schools are more likely to be selected than
    small ones.)
  • Once NAEP has identified the schools for the
    assessment, students within each participating
    school are selected randomly. In state assessment
    years, the state samples are combined to produce
    a national sample.

NAEP Scale Scores
  • A score, derived from student responses to NAEP
    assessment items, that summarizes the overall
    level of performance attained by a group of
  • NAEP subject area scales typically range from 0
    to 500 (reading, mathematics, history, and
    geography) or from 0 to 300 (science, writing,
    and civics).
  • When used in conjunction with interpretive aids,
    such as item maps, they provide information about
    what a particular aggregate of students in the
    population knows and can do.

NAEP Assessments
  • The 2003 assessments in reading and mathematics
    had a state components at grades 4 and 8.
  • Overall, 53 states and jurisdictions participated
    in the two assessments.
  • In 2002, 51 states and jurisdictions participated
    in the writing assessment at grades 4 and 8.
  • The most recent state assessments were held in
    2005 in reading and mathematics.

NAEP Achievement Level Scores
  • In addition to reporting student scale scores,
    NAEP reports the percentage of students in the
    nation, state, or group of students who have
    reached certain NCLB achievement levels of
  • The achievement level scores are specified as
    Basic, Proficient, or Advancedfor each subject.

NAEP Contextual Results
  • As part of the NAEP assessment, teachers,
    administrators, and students complete background
    questionnaires that are analyzed along with the
    assessment results.
  • Using the information obtained from the
    background questionnaires, student performance
    can be compared across NAEPs reporting variables
    (ex., boys vs. girls)
  • NAEP also looks at various factors that may
    relate to academic achievement including courses
    taken, homework requirements, how often textbooks
    are used, and amount of computer use.

Reliability of NAEP Test Results
  • Comparisons in NAEP are not based just on
    numerical differences.
  • NAEP uses traditional statistical methods to test
    whether the differences might have occurred just
    by chance or whether they are reliablewhether
    repeating the assessment with other students in
    the same groups, with other schools, or on
    different days would give the same result.

NAEP Reports
  • NAEP reports information for the nation and
    specific geographic regions of the country. It
    includes students drawn from both public and
    nonpublic schools and reports results for student
    achievement at grades 4, 8, and 12.
  • It also provides state comparison information.

NAEP State Results
  • Since 1990, NAEP assessments have been conducted
    to give results for participating states. Those
    that choose to participate receive assessment
    results that report on the performance of
    students in that state. In its content, the state
    assessment is identical to the assessment
    conducted nationally.
  • However, because the national NAEP samples were
    not, and are not currently designed to support
    the reporting of accurate and representative
    state-level results, separate representative
    sample of students are selected for each
    participating jurisdiction/state.

NAEP Assessments
  • These assessments follow the frameworks developed
    by the National Assessment Governing Board
    (NAGB), and use the latest advances in assessment
  • For example, NAEP assessments include a large
    percentage of constructed-response questions and
    questions that require the use of calculators and
    other materials.

Student Samples
  • Beginning with the 2002 assessments, a combined
    sample of public schools was selected for both
    state and national NAEP.
  • Therefore, the national sample is a subset of the
    combined sample of students assessed in each
    participating state, plus an additional sample
    from the states that did not participate in the
    state assessment.
  • This additional sample ensures that the national
    sample is representative of the total national
    student population. The full data set is analyzed
    together, allowing all data to contribute to the
    final results and setting a single scale for the
  • All results are then reported in the scale score
    metric used for the specific assessment.
  • State Proficiency Scores vs. NAEP Proficiency

lt Back to the article                           
lt Back to the article
NC Ranks 37th Out 40 States 4th Grd Reading
D 4th Grd Math D- 8th
Grd Reading F 8th Grd Math
F Overall Grade D-
NAEP on the Web
  • Public access to all NAEP information, data, and
  • reports is available on the following web sites
  • NAEP Data Explorer http//nces.ed.gov/nationsrepo
  • NAEP Questions Tool http//nces.ed.gov/nationsrep
  • NAEP State Profiles http//nces.ed.gov/nationsrep

In Summary
  • The validity (and reliability) of high-stakes
    testing has a long way to go. NCLB has
  • the beginning of an accountability system that
    it is attempting to ensure high quality public
    schools across the country. Regardless of how we
    feel about the mandate and subsequent
    requirements we should be compelled to continue
    our quest for measuring student performance and
    how we are doing for the following reasons

North Carolina ranks 34th in college attendance
rates among the 50 states. 85 of all jobs by
2010 will require 14 or more years of education.
Less than 80 of adult North Carolinians have
completed high school. Only 3 states have lower
high school completion rates than North
Carolina. Less than a third of NC's fourth and
eighth graders scored in the proficient range on
standardized tests. NC places 47th in a state
comparison of SAT scores. The placement improves
to 32nd when the scores are adjusted for the
number of students being tested. Twelve percent
of teens 16 to 19 in NC are not enrolled in
school are not high school graduates. When the
number of adults holding college degrees in NC is
compared with other states, we're dropping
further behind, from 37th to 39th.
There is A Lot of Work to Do!
  • Questions Comments
About PowerShow.com