1
Lesson Six
  • Reliability

2
Contents
  • Definition of reliability
  • Factors contributing to unreliability
  • Types of reliability
  • Indication of reliability: the reliability
    coefficient
  • Ways of obtaining a reliability coefficient
  • Alternate/parallel forms
  • Test-retest
  • Split-half
  • KR-21/KR-20
  • Two ways of testing reliability
  • How to make tests more reliable

3
Definition of Reliability (1)
  • The consistency of measures across different
    times, test forms, raters, and other
    characteristics of the measurement context
    (Bachman, 1990, p. 24).
  • If you give the same test to the same testees on
    two different occasions, the test should yield
    similar results.

4
Definition of Reliability (2)
  • A reliable test is consistent and dependable.
  • Scores are consistent and reproducible.
  • The accuracy or precision with which a test
    measures something; that is, the consistency,
    dependability, or stability of test results.

5
Factors Contributing to Unreliability
  • X = T + E (observed score = true score + error
    score)
  • Concerned with freedom from nonsystematic
    fluctuation.
  • Fluctuations in
  • the student
  • scoring
  • test administration
  • the test itself

6
Types of Reliability
  • Student- (or Person-) related reliability
  • Rater- (or Scorer-) related reliability
  • Intra-rater reliability
  • Inter-rater reliability
  • Test administration reliability
  • Test (or instrument-related) reliability

7
Student-Related Reliability (1)
  • The source of the error score comes from the test
    takers.
  • Temporary illness
  • Fatigue
  • Anxiety
  • Other physical or psychological factors
  • Test-wiseness (i.e., strategies for efficient
    test taking)

8
Student-Related Reliability (2)
  • Principles
  • Assess on several occasions
  • Assess when the person is prepared and best able
    to perform well
  • Ensure that the person understands what is
    expected (e.g., instructions are clear)

9
Rater (or Scorer) Reliability (1)
  • Fluctuations including human error,
    subjectivity, and bias
  • Principles
  • Use experienced trained raters.
  • Use more than one rater.
  • Raters should carry out their assessments
    independently.

10
Rater Reliability (2)
  • Two kinds of rater reliability
  • Intra-rater reliability
  • Inter-rater reliability

11
Intra-Rater Reliability
  • Fluctuations including
  • Unclear scoring criteria
  • Fatigue
  • Bias toward particular "good" and "bad" students
  • Simple carelessness

12
Inter-Rater Reliability (1)
  • Fluctuations including
  • Lack of attention to scoring criteria
  • Inexperience
  • Inattention
  • Preconceived biases

13
Inter-Rater Reliability (2)
  • Used with subjective tests when two or more
    independent raters are involved in scoring
  • Train the raters before scoring (e.g., TWE, dept.
    oral and composition tests for recommended
    students).

14
Inter-Rater Reliability (3)
  • Compare the scores of the same testee given by
    different raters. If r is high, there is
    inter-rater reliability.

15
Test Administration Reliability
  • Street noise (e.g., during a listening
    comprehension test)
  • Photocopying variations
  • Lighting
  • Variations in temperature
  • Condition of desks and chairs
  • Monitors

16
Test Reliability
  • Measurement errors come from the test itself
  • Test is too long
  • Test with a time limit
  • Test format allows for guessing
  • Ambiguous test items
  • Test with more than one correct answer

17
Reliability Coefficient (r)
  • To quantify the reliability of a test, which
    allows us to compare the reliability of different
    tests.
  • 0 ≤ r ≤ 1 (ideal: r = 1, which means the test
    gives precisely the same results for particular
    testees regardless of when it happens to be
    administered).
  • If r = 1, the test is 100% reliable.
  • A good achievement test: r > .90
  • If r < .70, we shouldn't use the test.

18
How to Get Reliability Coefficient
  • Two forms, two administrations
    alternate/parallel forms
  • One form, two administrations test-retest
  • One form, one administration (internal
    consistency)
  • split-half (Spearman-Brown procedure)
  • KR-21
  • KR-20

19
Alternate/Parallel Forms
  • Two forms, two administrations
  • Equivalent forms (i.e., different items testing
    the same topic) taken by the same test taker on
    different days
  • If r is high, this test is said to have good
    reliability.
  • The most stringent method of estimating
    reliability

20
Test-Retest
  • One form, two administrations
  • The same test is administered to the same testees
    after a short time lag, and r is then calculated
    between the two sets of scores (a computational
    sketch follows).
  • Appropriate for highly speeded tests
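A minimal sketch of obtaining r here (the Pearson correlation between the two administrations), assuming Python 3.10+; the score lists are made-up illustration data, and the same calculation applies to alternate/parallel forms and to two raters scoring the same testees:

  # Pearson correlation between two administrations of the same test.
  # The score lists are hypothetical illustration data.
  from statistics import correlation  # available in Python 3.10+

  first_administration = [68, 75, 82, 90, 55, 71]
  second_administration = [70, 73, 85, 88, 58, 69]

  r = correlation(first_administration, second_administration)
  print(f"test-retest r = {r:.2f}")  # a high r indicates consistent results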

21
Split-half (Spearman-Brown Procedure)
  • One test, one administration
  • Split the test into halves (i.e., odd questions
    vs even questions) to form two sets of scores.
  • Also called internal consistency

First half: Q1, Q3, Q5    Second half: Q2, Q4, Q6
22
Split-half (2)
  • Note that this r isn't the reliability of the
    full test.
  • There is a mathematical relationship between test
    length and reliability: the longer the test, the
    more reliable it is.
  • rel.total = nr / [1 + (n - 1)r], the
    Spearman-Brown prophecy formula, where n is the
    factor by which the test is lengthened (see the
    sketch below)
  • E.g., if the correlation between the 2 halves of
    a test is r = .6, the rel. of the full test
    (n = 2) = .75
  • If the test is lengthened to 3 times the half
    length (n = 3): rel. = .82
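A minimal sketch of the split-half procedure with the Spearman-Brown correction, assuming Python 3.10+; the 1/0 item-response matrix is hypothetical (rows = testees, columns = items):

  # Split-half reliability: correlate odd-item vs. even-item half scores,
  # then step the half-test r up to full length with Spearman-Brown.
  from statistics import correlation  # Python 3.10+

  items = [
      [1, 1, 0, 1, 1, 0],  # testee 1: Q1..Q6 (hypothetical data)
      [1, 0, 0, 1, 0, 0],
      [1, 1, 1, 1, 1, 1],
      [0, 1, 0, 0, 1, 0],
      [1, 1, 1, 0, 1, 1],
  ]
  odd_half = [sum(row[0::2]) for row in items]   # Q1, Q3, Q5
  even_half = [sum(row[1::2]) for row in items]  # Q2, Q4, Q6

  r_half = correlation(odd_half, even_half)
  n = 2  # the full test is twice the length of each half
  rel_full = n * r_half / (1 + (n - 1) * r_half)
  print(f"half-test r = {r_half:.2f}, full-test rel. = {rel_full:.2f}")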

23
Kuder-Richardson Formula 21
  • KR-21 = [k/(k - 1)][1 - x̄(1 - x̄/k)/s²]
  • k = number of items; x̄ = mean score
  • s = standard deviation (for the formula, see
    Bailey, p. 100)
  • s describes the spread in a set of scores (i.e.,
    score deviations from the mean)
  • 0 ≤ s; the larger s is, the more spread out the
    scores are
  • E.g., given 2 sets of scores, (5, 4, 3) and
    (7, 4, 1), which group in general behaves more
    similarly? (Both have a mean of 4, but the first
    set has the smaller s; see the sketch below.)
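A minimal KR-21 sketch in Python with made-up total scores on a hypothetical 10-item test; note that whether s uses n or n - 1 in the denominator varies by textbook (population SD is assumed here):

  # KR-21 needs only k, the mean, and the SD of total scores.
  from statistics import mean, pstdev

  k = 10                              # number of items (hypothetical test)
  scores = [9, 8, 3, 10, 5, 7, 2, 8]  # made-up total scores
  x_bar = mean(scores)
  s = pstdev(scores)                  # population SD; some texts use stdev

  kr21 = (k / (k - 1)) * (1 - x_bar * (1 - x_bar / k) / s**2)
  print(f"mean = {x_bar:.2f}, s = {s:.2f}, KR-21 = {kr21:.2f}")

  # The slide's spread example: both sets have mean 4, but the second
  # has the larger s, so its scores are more spread out.
  print(pstdev([5, 4, 3]), pstdev([7, 4, 1]))  # ~0.82 vs. ~2.45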

24
Kuder-Richardson Formula 20
  • KR-20 = [k/(k - 1)][1 - (Σpq/s²)] (a
    computational sketch follows)
  • p = item difficulty (the proportion of people who
    got an item right)
  • q = 1 - p (i.e., the proportion of people who got
    an item wrong)
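A minimal KR-20 sketch in Python with a hypothetical item-response matrix; s² is taken as the population variance of total scores, per the caveat above:

  # KR-20 from a 1/0 item-response matrix (rows = testees, cols = items).
  from statistics import pvariance

  items = [
      [1, 1, 0, 1],  # hypothetical responses, 1 = right, 0 = wrong
      [1, 0, 0, 1],
      [1, 1, 1, 1],
      [0, 1, 0, 0],
      [1, 1, 1, 0],
  ]
  k = len(items[0])
  n = len(items)

  totals = [sum(row) for row in items]
  s2 = pvariance(totals)  # variance of total scores

  sum_pq = 0.0
  for j in range(k):
      p = sum(row[j] for row in items) / n  # proportion who got item j right
      sum_pq += p * (1 - p)                 # q = 1 - p

  kr20 = (k / (k - 1)) * (1 - sum_pq / s2)
  print(f"KR-20 = {kr20:.2f}")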

25
Ways of Testing Reliability
  • Examine the amount of variation
  • Standard Error of Measurement (SEM)
  • The smaller the better
  • Calculate reliability coefficient
  • r
  • The bigger the better

26
Standard Error of Measurement (1)
  • The average SD of an individual's scores over a
    large number of testings
  • An estimate of the variability of an individual's
    scores
  • Indicates how large the error component is likely
    to be
  • Particularly useful in the interpretation of test
    scores
  • SEM = s√(1 - rel.)

27
Standard Error of Measurement (2)
  • The average of a set of observed scores = the
    true score of the individual:
  • X1 = T1 + E1
  • X2 = T2 + E2
  • Xn = Tn + En
  • X̄ = T + 0 (the errors average out to zero)

28
Standard Error of Measurement (3)
  • E.g., GRE: SD = 100, rel. = .91
  • SEM = 100 × √(1 - .91) = 30
  • How do we apply the SEM in the interpretation of
    the score? (A sketch follows.)
  • For a given spread of scores, the greater the
    reliability coefficient, the smaller the SEM.
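A minimal sketch of the GRE calculation above, plus one common way of applying the SEM (a standard psychometric convention rather than something stated on the slide): the observed score plus or minus one SEM brackets the true score about 68% of the time. The observed score of 500 is made up:

  # SEM = s * sqrt(1 - reliability), using the slide's GRE figures.
  import math

  s = 100     # standard deviation of GRE scores (from the slide)
  rel = 0.91  # reliability coefficient (from the slide)

  sem = s * math.sqrt(1 - rel)
  print(f"SEM = {sem:.0f}")  # 30

  # Interpretation: the true score lies within one SEM of the observed
  # score about 68% of the time (hypothetical observed score of 500).
  observed = 500
  print(f"likely true-score band: {observed - sem:.0f} to {observed + sem:.0f}")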

29
Ways of Enhancing Reliability
  • General strategies
  • Consider possible sources of unreliability
  • Reduce or average out nonsystematic fluctuations
    in
  • raters
  • persons
  • test administration
  • instruments

30
How to Make Tests More Reliable? (1)
  • Take enough samples of behavior
  • Try to avoid ambiguous items
  • Provide clear and explicit instructions
  • Ensure tests are well laid out and perfectly
    legible
  • Provide uniform and non-distracting conditions of
    administration
  • Try to use objective tests

31
How to Make Tests More Reliable? (2)
  • Try to use direct tests
  • Have independent, trained raters
  • Provide a detailed scoring key
  • Try to identify the test takers by number, not by
    name
  • Use multiple independent scorings of subjective
    tests
  • (Hughes, 1989, pp. 36-42).