Reliability Coefficient for Criterion Referenced Tests - PowerPoint PPT Presentation


Slides: 78
Provided by: sheh2


1
CHAPTER 9
• Reliability Coefficient for Criterion-Referenced Tests

2
Reliability Coefficients for Criterion-Referenced Tests
• Criterion: what we intend to measure (the DV).
• Norm-referenced: as in intelligence tests, for example. We compare the examinee's score with the scores of a norm group (normative IQ, or deviation IQ).
• Criterion-referenced: as in achievement tests, where we want to know whether the examinee has mastered a particular domain (math, psychology, or a particular behavior).

3
Reliability Coefficients for Criterion-Referenced Tests
• Reliability coefficients for criterion-referenced tests are used for 2 different purposes:
• 1. Domain score estimation, or
• 2. Mastery allocation

4
1. Domain Score Estimation
• We use the same type of calculation to determine the reliability coefficient as we did before. The reliability coefficient for domain score estimation for the data in Table 9.1 is computed the same way as for Table 7.1.
• Ex. First we run an ANOVA to find the mean squares (MS within or MS person, and MS residual). Then we use Hoyt's method to calculate the reliability coefficient. Next slides

5
Reliability Coefficients for Criterion-Referenced Tests
• MS person = MS within
• MS items = MS between
6
Hoyt's (1941) Method
• MS person = MS within
• MS items = MS between
• MS residual has its own calculation; it is not MS total.
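Below is a minimal Python sketch of Hoyt's method. It assumes a complete persons x items score matrix (for example, 0/1 item scores) analyzed as a two-way ANOVA without replication. The function name `hoyt_reliability` and the sample data are hypothetical. The coefficient is computed as (MS person - MS residual) / MS person, which is the usual statement of Hoyt's coefficient.

```python
from statistics import mean


def hoyt_reliability(scores):
    """Hoyt's ANOVA reliability for a persons x items matrix (one row per examinee)."""
    n_p = len(scores)     # number of persons (examinees)
    n_i = len(scores[0])  # number of items
    grand = mean(x for row in scores for x in row)
    person_means = [mean(row) for row in scores]
    item_means = [mean(row[j] for row in scores) for j in range(n_i)]

    # Sums of squares for a two-way ANOVA without replication
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_person = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_item = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_residual = ss_total - ss_person - ss_item  # person x item interaction

    ms_person = ss_person / (n_p - 1)
    ms_residual = ss_residual / ((n_p - 1) * (n_i - 1))
    return (ms_person - ms_residual) / ms_person


# Hypothetical 0/1 scores: 5 examinees x 4 items
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(hoyt_reliability(scores), 3))  # 0.8
```

For these data the result equals Cronbach's alpha, which is algebraically equivalent to Hoyt's coefficient.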
7
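1. Domain Score Estimation
• 1. Domain Score Estimation
• The domain score for an examinee is the proportion of the items in a specific domain that the examinee can answer correctly. It plays the role of the true score in classical theory and is estimated from the observed score (X) as the proportion of items answered correctly.
• Ex. A score of 85% on the Test Construction exam gives an estimated domain score (D.S.) of 85% (.85).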

8
Reliability Coefficients for Criterion-Referenced Tests
• Decision Consistency
Decision consistency is concerned with the extent to which the same decisions are made from different sets of measurements. Consistency of decisions is based either on two different forms of a test (parallel forms) or on two administrations of the same test (test-retest).
• A high reliability coefficient (p) indicates that examinees are classified consistently (e.g., master vs. nonmaster) across the two measurements.
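The slide does not say which index its p refers to. As an illustration, here is a minimal Python sketch of two common decision-consistency indices for a master/nonmaster split. The first is the proportion of examinees classified the same way both times (p0). The second is Cohen's kappa, which corrects p0 for chance agreement. The function name, cut score, and scores are hypothetical.

```python
def decision_consistency(form_a, form_b, cut):
    """Return (p0, kappa) for master/nonmaster decisions from two forms or two administrations."""
    n = len(form_a)
    masters_a = [score >= cut for score in form_a]  # True = classified as master on form A
    masters_b = [score >= cut for score in form_b]  # True = classified as master on form B

    # p0: proportion of examinees given the same decision both times
    p0 = sum(a == b for a, b in zip(masters_a, masters_b)) / n

    # pc: agreement expected by chance from the marginal proportions of masters
    pa = sum(masters_a) / n
    pb = sum(masters_b) / n
    pc = pa * pb + (1 - pa) * (1 - pb)

    # kappa is undefined when chance agreement is 1 (everyone classified the same way)
    kappa = (p0 - pc) / (1 - pc) if pc < 1 else float("nan")
    return p0, kappa


# Hypothetical percent-correct scores for 6 examinees on two parallel forms, cut score 70
form_a = [85, 72, 64, 90, 55, 78]
form_b = [80, 68, 70, 88, 50, 74]
p0, kappa = decision_consistency(form_a, form_b, cut=70)
print(round(p0, 2), round(kappa, 2))  # 0.67 0.25
```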

9
Reliability Coefficients for Criterion-Referenced Tests
• Factors Affecting Decision Consistency
• 1. Test length
• 2. Location of the cut score in the score distributions
• 3. Test score generalizability
• 4. Similarity of the score distributions for the two forms

10
Mastery Allocation
• 2. Mastery Allocation
• Involves comparing the percent-correct score to an arbitrarily established cut score. If the percent-correct score is equal to or greater than the cut score, the examinee is classified as having mastered that domain.
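Below is a minimal Python sketch of the mastery-allocation rule described above. It assumes a hypothetical 50-item test and a 70% cut score, and the function name is illustrative only. Dividing the returned percent by 100 gives the observed estimate of the domain score.

```python
def allocate_mastery(n_correct, n_items, cut_percent):
    """Classify one examinee by comparing the percent-correct score with a cut score."""
    percent = 100 * n_correct / n_items     # percent-correct score (percent / 100 estimates the domain score)
    is_master = percent >= cut_percent      # "equal or greater" counts as mastery
    return percent, is_master


# Hypothetical 50-item test with a 70% cut score
print(allocate_mastery(35, 50, 70))  # (70.0, True)
print(allocate_mastery(34, 50, 70))  # (68.0, False)
```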

11
Mastery Allocation
• Mastery Allocation
• Classifying an examinee as having (or not having) mastered a domain is called mastery allocation.
• Ex. The EPPP exam cut score in Florida is 70. If you score 70 or greater on this exam, you are classified as having mastered the psychology domain: you get your psychologist license and can call yourself a psychologist.

12
(No Transcript)
13
UNIT III VALIDITY
• CHAP 10 INTRODUCTION TO VALIDITY
• CHAP 11 STATISTICAL PROCEDURES FOR PREDICTION
AND CLASSIFICATION
• CHAP 12 BIAS IN SELECTION
• CHAP 13 FACTOR ANALYSIS

14
(No Transcript)
15
CHAPTER 10 INTRODUCTION TO VALIDITY
• Validity: Validity refers to the degree to which a test measures what it is intended to measure. It is about the quality (accuracy/trueness) of a test.
• Characteristics of Validity
• 1. Result
• 2. Context
• 3. Coefficient

16
Characteristics of Validity
• 1. Result
• Validity refers to the results of a test, not to the test itself.
• Ex. If you are taking a statistics test, you want to know that the resulting score is a valid measure of your statistics knowledge.

17
INTRODUCTION TO VALIDITY
• 2. Context
• The resulting score (statistics) must be interpreted within the context in which the test occurs (statistics).

18
INTRODUCTION TO VALIDITY
• 3. Coefficient
• Just like the reliability coefficient, the validity coefficient varies in degree from low to high.
• ρ typically ranges from 0 to 1.
• Ex. The validity of last year's Test Construction exam: ρ = 0.90.

19
Validity
• Validity has been described as 'the agreement between a test score and the quality it is believed to measure' (Kaplan and Saccuzzo, 2001). In other words, it reflects the gap between what a test actually measures and what it is intended to measure. Next slide

20
Validity
• This gap can be caused by two particular circumstances:
• (a) the design of the test is insufficient for the intended purpose (ex. using essays for older examinees), and (b) the test is used in a context or fashion that was not intended in the design (ex. changing math questions to multiple choice).

21
External and Internal Validity
• External Validity
• External validity addresses the ability to generalize your study to other people and other situations. Ex. Correlational studies: the association between stress and depression.

22
External and Internal Validity
• Internal Validity
• Internal validity addresses the "true" causes of the outcomes that you observed in your study. Strong internal validity means that you not only have reliable measures of your independent and dependent variables but also a strong justification that causally links your independent variables to your dependent variables. Ex. Experimental studies: the effect of stress on heart attacks.

23
Major Types of Validity: the 3 Cs
• Content validity: items
• Criterion-related validity: stats on how well a test estimates/predicts a performance (ex. the teacher's math test and the researcher's test (FCAT); EPPP; GRE test)
• Construct validity: a non-observable construct or trait (ex. your depression test or a clinical interview, with the underlying construct, i.e., sleeping, eating, hopelessness; BDI-2 score)

24
(No Transcript)
25
Face validity
• Face validity means that the test appears to be valid. This is judged using common-sense rules; for example, a mathematical test should include some numerical elements.

26
Face validity
• 1. 35
• 2. 12-10
• 3. 8-5
• 4. 25-16
• 5. 133-8
• 6. Judy had 10 pennies. She lost 2. How many
pennies does she have left?
• A. 2
• B. 8
• C. 10
• D. 12

27
Face validity
28
Face Validity
• A test can appear to be invalid but actually be perfectly valid, for example when correlations have been found between seemingly unrelated items and the desired outcome.
• Ex. Successful WW2 pilots were very often found to have had an active childhood interest in flying model planes (the association between flying model planes and success as a WW2 pilot).

29
Face Validity
• A test that does not have face validity may be
rejected by test-takers (if they have that
option) and people who are choosing the test to
use from amongst a set of options.

30
(No Transcript)
31
(No Transcript)
32
(No Transcript)
33
Types of Validity
• 1. Content Validity
• Measures knowledge of the content domain that the test was designed to measure.
• Ex. If the content domain is statistics, the test should measure statistical knowledge, not English, math, psychology, etc.

34
1. Content Validity
• Instruction Multiple Choice Please select the
• 6. Judy had 10 pennies. She lost 2. How many
pennies does she have left?
• A. 2
• B. 8
• C. 10
• D. 12
• The red part is called the performance domain or domain characteristic, which deals with your knowledge of the domain.
• The yellow part is called the matching item.

35
1.Content Validity
• Content Validity
• A test has content validity if it sufficiently covers the area that it is intended to cover. This is particularly important in ability or attainment/achievement tests that assess skills or knowledge in a particular domain.
• Content Under-Representation occurs when
important areas are missed. Construct-Irrelevant
Variation occurs when irrelevant factors
contaminate the test.

36
1. Content Validity
• Content validity has 4 steps:
• 1. Defining the performance domain of interest
• 2. Selecting a panel of qualified experts in the content domain
• 3. Providing a structured framework (instructions) for the process of matching items (questions) to the performance domain
• 4. Collecting and summarizing the data from the matching process

37
1. Content Validity
• Content Validity has 4 Steps
• 1. Defining the performance domain of interest
• Ex. Ask yourself: what am I trying to measure? Psychology, statistics, English?

38
1. Content Validity
• 2. Selecting a panel of qualified experts in the
content domain.
• Ex. Select expert statisticians to review your
stats questions. Another ex. Qualifying exam
questions.

39
1. Content Validity
• 3. Providing a structured framework (instructions) for the process of matching items (questions) to the performance domain
• Ex. Go back 4 slides and see Question 3

40
1. Content Validity
• 4. Collecting and summarizing the data from the
matching process.
• Select and collect a sample of these relevant
questions (items).

41
1. Content Validity
• Practical Considerations in Content Validity
• Content validity requires the following 4 decisions (questions):
• 1. Should objectives be weighted to reflect their importance? Ex. Next slide

42
1. Content Validity
• 2. How should the item-matching task be structured? Ex. Next slide
• 3. What aspects of the items should be examined? Ex. Next slide
• 4. How should the results be summarized? Ex. Next slide

43
1. Content Validity
• 1. Should objectives be weighted to reflect their importance?
• In content validity we should rate the importance of the objectives. The test designer should provide a scale, such as a rubric, for rating the objectives in a test. This also helps you measure the inter-rater reliability of a test more accurately.

44
1. Content Validity
• 2. How should the item-matching task be structured?
• Katz (1958) suggested that the expert reviewers should read each item and identify the correct/best response.
• Hambleton's (1980) idea was that the experts should rate, on a 5-point scale, the degree to which each item matches a specific objective:
• poor fit 1____2____3____4____5 excellent fit

45
1. Content Validity
• 3. What aspects of the items should be examined?
• We should have a clear description of each item and of the domain, so that we can judge how well the item(s) match the performance domain or domain characteristics.
• Ex. Go back to Question 6

46
1. Content Validity
• 4. How should the results be summarized?
• There are 5 ways (read p. 221):
• 1. Percentage of items matched to objectives
• 2. Percentage of items matched to objectives with high importance ratings
• 3. Correlation between the importance weighting of objectives and the number of items measuring those objectives
• 4. Index of item-objective congruence
• 5. Percentage of objectives not assessed by any of the items on the test
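Below is a minimal Python sketch of three of the five summaries (1, 4, and 5), under stated assumptions. The panel's item-to-objective matches are stored in a dictionary. For the congruence index, each judge rates an item +1, 0, or -1 against every objective. The index is computed as N/(2N - 2) x (mean rating on the intended objective - mean rating across all objectives); this is the commonly cited form, so check it against the formula on p. 221 before relying on it. All names and data are hypothetical.

```python
from statistics import mean


def percent_items_matched(matches):
    """Summary 1: percentage of items the panel matched to at least one objective."""
    return 100 * sum(1 for objs in matches.values() if objs) / len(matches)


def percent_objectives_not_assessed(matches, objectives):
    """Summary 5: percentage of objectives that no item on the test assesses."""
    assessed = set().union(*matches.values())
    return 100 * sum(1 for obj in objectives if obj not in assessed) / len(objectives)


def item_objective_congruence(ratings, target):
    """Summary 4: index of item-objective congruence for one item.

    ratings maps each objective to the judges' ratings of this item:
    +1 = clearly measures it, 0 = unsure, -1 = clearly does not.
    Assumes every objective was rated by the same number of judges.
    """
    n_obj = len(ratings)                              # N objectives
    mu_target = mean(ratings[target])                 # mean rating on the intended objective
    mu_all = mean(mean(r) for r in ratings.values())  # mean rating over all objectives
    return n_obj / (2 * n_obj - 2) * (mu_target - mu_all)


# Hypothetical panel results for a 4-item test covering 3 objectives
matches = {"Q1": {"O1"}, "Q2": {"O1"}, "Q3": set(), "Q4": {"O2"}}
objectives = ["O1", "O2", "O3"]
print(percent_items_matched(matches))                                  # 75.0
print(round(percent_objectives_not_assessed(matches, objectives), 1))  # 33.3

ratings_q1 = {"O1": [1, 1, 1], "O2": [-1, -1, -1], "O3": [-1, -1, -1]}
print(round(item_objective_congruence(ratings_q1, "O1"), 3))           # 1.0 (perfect match)
```

The index runs from -1 to +1, and +1 means every judge saw the item as measuring only its intended objective.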

47
2. Criterion Related Validity
• Criterion-related validity is a measure of the extent to which a test is related to some criterion, or how well a test estimates/predicts a performance.
• Ex. SAT would be a predictor of college
Test, basic traffic signs and signals and/or
driving performance.

48
2. Criterion Related Validity
• Criterion-related validity is concerned with how well a test either estimates current performance (concurrent validity) or predicts future performance (predictive validity). Ex. EPPP exam

49
Ex. of Concurrent and Predictive Validity
• Researchers want to know whether 6th-grade students' math scores are valid. They give students a test designed to measure mathematical aptitude for 6th graders.
• They then compare and correlate these scores with the test scores already held by the teachers (midterm scores), using r.

50
Ex. of Concurrent and Predictive Validity
• They evaluate the accuracy of their test and decide whether it measures what it is supposed to. The key element is that the two methods were compared at about the same time (or only a few days apart), which makes this concurrent validity.

51
Ex. of Concurrent and Predictive Validity
• However, if the researchers had measured the
mathematical aptitude, implemented a new
educational program, and then retested the
students after six months, this would be
predictive validity.

52
2. Criterion Related Validity
• Concurrent validity is measured by comparing two
tests done at the same time, for example a
written test and a hands-on exercise that seek to
assess the same criterion. This can be used to
limit criterion errors. Ex. For the diagnosis of depression: a clinical interview and the BDI-II.

53
2. Criterion Related Validity
• Predictive validity, by contrast, compares
success in the test with actual success in the
future job. The test is then adjusted over time
to improve its validity.
• Ex. EPPP exam and psychologist performance

54
2. Criterion Related Validity
• Criterion-related validity
• Criterion-related validity is like construct
validity, but relates the test to some external
criterion, such as particular aspects of the job.
• There are dangers with the external criterion
being selected based on its convenience rather
than being a full representation of the job. Ex.
An air traffic control test may use a limited set
of scenarios.

55
2. Criterion Related Validity
• A criterion-related validity study generally follows these 5 steps (p. 224):
• 1. Identify a suitable criterion behavior (depression) and a method for measuring it.
• 2. Identify an appropriate sample of examinees (depressed patients) representative of those for whom the test will ultimately be used.

56
2. Criterion Related Validity
• 3. Administer the test and keep a record of each examinee's score.
• 4. When the criterion data are available, obtain a measure of performance on the criterion for each examinee (1. mild, 2. moderate, 3. severe).
• 5. Determine the strength of the relationship between test scores and criterion performance (r). Ex. The relationship between the teacher's math scores and the researcher's math scores (the researcher determines the criterion performance).
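Step 5 can be sketched in Python with the deviation-score form of the Pearson correlation, r = SP / sqrt(SS_X * SS_Y). Here SP is the sum of cross-products, and SS_X and SS_Y are the sums of squares. The scores below are hypothetical researcher-test and teacher-midterm scores, not data from the slides.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation r = SP / sqrt(SS_X * SS_Y) for paired scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # sum of cross-products
    ss_x = sum((a - mean_x) ** 2 for a in x)                      # sum of squares for X
    ss_y = sum((b - mean_y) ** 2 for b in y)                      # sum of squares for Y
    return sp / sqrt(ss_x * ss_y)


# Hypothetical scores: researcher's math test (X) and teacher's midterm (Y)
researcher = [62, 75, 81, 58, 90, 70]
teacher = [65, 72, 85, 60, 88, 74]
print(round(pearson_r(researcher, teacher), 2))
```

A high positive r here would support the criterion-related validity of the researcher's test.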

57
3. Construct Validity
• 3. Construct Validity
• A test has construct validity if it accurately measures a theoretical, non-observable construct or trait (i.e., intelligence, motivation, depression, anxiety, stats, biology, etc.). Ex. The relationship between the clinical interview (the symptoms/characteristics of depression, which is the underlying construct) and scores on the BDI-II (mild, moderate, severe).

58
3. Construct Validity
• 3. Construct Validity
• Construct-Irrelevant Variation occurs when
irrelevant factors contaminate the test.

59
3. Construct Validity
• Construct validity
• Underlying many tests is a construct or theory
that is being assessed.
• Ex. There are a number of tests/constructs for
describing intelligence (spatial ability, verbal
reasoning, processing speed, etc.) which the test
will individually assess.

60
3. Construct Validity
the cause-effect relationship.
• If the construct is not valid then the test on
which it is based will not be valid.
• Ex. There have been historical constructs that
intelligence is based on the size and shape of
the skull (Phrenology).

61
(No Transcript)
62
(No Transcript)
63
3. Construct Validity Measurements
• Multitrait-Multimethod Matrix
• Campbell and Fiske (1959) described this
approach as concerned with the adequacy of tests
as measures of a construct/trait. With this
technique the researcher must think of two or
more ways (methods) to measure the
construct/trait of interest. Next slide

64
3. Construct Validity Measurements
• Ex. The methods are 1. true-false, 2. forced choice, and 3. incomplete sentences; the traits (constructs) are A. sex-guilt, B. hostility-guilt, and C. morality-conscience. Using one sample of subjects, measurements are obtained by the same or different methods. Compare the correlation between each pair of measurements and identify which of the 3 types it is. Next slide

65
3. Construct Validity
• 1. Reliability Coefficient: the same measurement method for the same trait. It's like test-retest reliability (you use the same trait and method twice). Ideally r should be high. See Table 10.2 on next slide
• 2. Convergent Validity Coefficient: different measurement methods but the same trait. It's like parallel-forms reliability (i.e., form A and form B). Ideally r should be high. The 2 measurement methods, or the 2 variables, converge (come together), so it is called the convergent validity coefficient. See Table 10.2 on next slide

66
3. Construct Validity
67
3. Construct Validity (Construct/Trait)
• 3. Divergent or Discriminant Validity Coefficient
• (2 different kinds) A. Correlations between measures of different constructs (traits) using the same measurement method (heterotrait-monomethod coefficient).
• Or, B. correlations using different measurement methods for different constructs (traits) (heterotrait-heteromethod coefficient).
• Ideally there is low or no relationship between the variables. They diverge (come apart), so it is called the divergent validity coefficient.
• See Table 10.2 on next slide
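The three coefficient types can be sketched as a small Python lookup. It labels a correlation by whether the two measures share a trait and/or a method. The function names are hypothetical. The final check is only a simplified reading of the Campbell and Fiske (1959) logic, namely that convergent coefficients should be higher than the discriminant ones; it is not their full set of criteria.

```python
def mtmm_coefficient_type(trait_1, method_1, trait_2, method_2):
    """Label one correlation in a multitrait-multimethod matrix."""
    same_trait = trait_1 == trait_2
    same_method = method_1 == method_2
    if same_trait and same_method:
        return "reliability (monotrait-monomethod)"
    if same_trait:
        return "convergent (monotrait-heteromethod)"
    if same_method:
        return "discriminant (heterotrait-monomethod)"
    return "discriminant (heterotrait-heteromethod)"


def convergent_exceeds_discriminant(convergent_r, discriminant_rs):
    """Simplified check: the convergent r should exceed the absolute value of every discriminant r."""
    return all(convergent_r > abs(r) for r in discriminant_rs)


print(mtmm_coefficient_type("sex-guilt", "true-false", "sex-guilt", "forced choice"))
# convergent (monotrait-heteromethod)
print(mtmm_coefficient_type("sex-guilt", "true-false", "hostility-guilt", "true-false"))
# discriminant (heterotrait-monomethod)
print(convergent_exceeds_discriminant(0.65, [0.20, 0.35, -0.10]))  # True (hypothetical values)
```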

68
3. Construct Validity
• Factor Analysis
• Exploratory and Confirmatory
• Factor analysis is another way to assess the validity of a test. It is about data reduction.
• Raymond Cattell, in his 16PF, reduced 4,500 personality-related questions to 187 questions and 16 related variables or factors. Next slide

69
Descriptors of Low Range | Primary Factor | Descriptors of High Range
Impersonal, distant, cool, reserved, detached, formal, aloof | Warmth (A) | Warm, outgoing, attentive to others, kindly, easy-going, participating, likes people
Concrete thinking, lower general mental capacity, less intelligent, unable to handle abstract problems | Reasoning (B) | Abstract-thinking, more intelligent, bright, higher general mental capacity, fast learner
Reactive emotionally, changeable, affected by feelings, emotionally less stable, easily upset | Emotional Stability (C) | Emotionally stable, adaptive, mature, faces reality calmly
Deferential, cooperative, avoids conflict, submissive, humble, obedient, easily led, docile, accommodating | Dominance (E) | Dominant, forceful, assertive, aggressive, competitive, stubborn, bossy
Serious, restrained, prudent, taciturn, introspective, silent | Liveliness (F) | Lively, animated, spontaneous, enthusiastic, happy-go-lucky, cheerful, expressive, impulsive
Expedient, nonconforming, disregards rules, self-indulgent | Rule-Consciousness (G) | Rule-conscious, dutiful, conscientious, conforming, moralistic, staid, rule bound
Shy, threat-sensitive, timid, hesitant, intimidated | Social Boldness (H) | Socially bold, venturesome, thick-skinned, uninhibited
Utilitarian, objective, unsentimental, tough minded, self-reliant, no-nonsense, rough | Sensitivity (I) | Sensitive, aesthetic, sentimental, tender-minded, intuitive, refined
Trusting, unsuspecting, accepting, unconditional, easy | Vigilance (L) | Vigilant, suspicious, skeptical, distrustful, oppositional
Grounded, practical, prosaic, solution oriented, steady, conventional | Abstractedness (M) | Abstract, imaginative, absent minded, impractical, absorbed in ideas
Forthright, genuine, artless, open, guileless, naive, unpretentious, involved | Privateness (N) | Private, discreet, nondisclosing, shrewd, polished, worldly, astute, diplomatic
Self-assured, unworried, complacent, secure, free of guilt, confident, self-satisfied | Apprehension (O) | Apprehensive, self-doubting, worried, guilt prone, insecure, worrying, self-blaming
Traditional, attached to familiar, conservative, respecting traditional ideas | Openness to Change (Q1) | Open to change, experimental, liberal, analytical, critical, free-thinking, flexibility
Group-oriented, affiliative, a joiner and follower, dependent | Self-Reliance (Q2) | Self-reliant, solitary, resourceful, individualistic, self-sufficient
Tolerates disorder, unexacting, flexible, undisciplined, lax, self-conflict, impulsive, careless of social rules, uncontrolled | Perfectionism (Q3) | Perfectionistic, organized, compulsive, self-disciplined, socially precise, exacting will power, control, self-sentimental
Relaxed, placid, tranquil, torpid, patient, composed, low drive | Tension (Q4) | Tense, high energy, impatient, driven, frustrated, overwrought, time driven
Primary Factors and Descriptors in Cattell's 16 Personality Factor Model (Adapted from Conn & Rieke, 1994)
Raymond Cattell's 16 Personality Factors
70
• 3. Construct Validity
• Construct validity has the following 4 steps (the same as in research hypothesis testing):
• Formulate one or more hypotheses (state your hypothesis). Ex. Stress and depression.
• Select (or develop) a measurement instrument.
• Gather empirical data to test your hypotheses (collect your data and calculate the statistics).
• Determine whether the data are consistent with the hypotheses (do your stats and make a decision).

71
(No Transcript)
72
Validity Coefficient
• The validity coefficient
• The validity coefficient is calculated as a correlation between the two variables being compared; very typically this is success on the test compared with success on the job.
• A validity coefficient of 0.6 or above is considered high, which suggests that very few tests give strong indications of job performance.

73
Validity Coefficients for True Scores
• The validity coefficient is like the reliability and generalizability coefficients.
• $r_{XY} = \dfrac{SP}{\sqrt{SS_X \cdot SS_Y}}$ (Pearson correlation coefficient)
• $\rho_{X_tY_t} = \dfrac{\rho_{XY}}{\sqrt{\rho_{XX'}\,\rho_{YY'}}}$ This formula is sometimes called the correction for attenuation, because it gives a validity coefficient that is corrected for errors of measurement in the predictor (X) and the criterion (Y).
• $\rho_{X_tY_t}$ = validity coefficient for true scores
• $\rho_{XY}$ = validity coefficient between observed X and Y
• $\rho_{XX'}$ = reliability coefficient of X
• $\rho_{YY'}$ = reliability coefficient of Y
• Example values on the slide: SP = 0.5, SS_X = 0.6, SS_Y = 0.5
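Below is a minimal Python sketch of the correction for attenuation above. The input values are hypothetical and are not the numbers on the slide. Because of sampling error the corrected value can come out above 1, so treat it as an estimate rather than an exact true-score correlation.

```python
from math import sqrt


def corrected_validity(r_xy, rel_x, rel_y):
    """Estimate the true-score validity: rho_XtYt = rho_XY / sqrt(rho_XX' * rho_YY')."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliability coefficients must be in (0, 1]")
    return r_xy / sqrt(rel_x * rel_y)


# Hypothetical values: observed validity .45, predictor reliability .81, criterion reliability .64
print(round(corrected_validity(0.45, 0.81, 0.64), 3))  # 0.625
```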

74
The Relationship between Reliability and Validity
• If a test is unreliable ($\rho_R = 0$), it cannot be valid. If a test is reliable (e.g., $\rho_R = .6$), that doesn't mean it is valid. However, if data are valid, they must be reliable (for ex., in psychology); therefore, if a psychological test is valid (e.g., $\rho_V = .90$), it is also reliable.

75
The Relationship between Reliability and Validity
• Mathematically, $\rho_V \le \sqrt{\rho_R}$
• This means the criterion-related validity coefficient cannot exceed the square root of the predictor's reliability coefficient.
• Ex. If the reliability coefficient is $\rho_R = .81$,
• then the validity coefficient is at most $\rho_V = \sqrt{.81} = .90$.
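The bound can be checked with a short Python sketch. The helper names are hypothetical, and the bound used is sqrt(rho_R), as stated on the slide.

```python
from math import sqrt


def max_validity(reliability):
    """Largest criterion-related validity allowed by the predictor's reliability: sqrt(rho_R)."""
    return sqrt(reliability)


def validity_is_possible(validity, reliability):
    """True if a reported validity coefficient does not exceed sqrt(rho_R) in absolute value."""
    return abs(validity) <= max_validity(reliability)


print(round(max_validity(0.81), 2))      # 0.9
print(validity_is_possible(0.95, 0.81))  # False: .95 exceeds the .90 ceiling
```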

76
The Relationship between Reliability and Validity
• If someone who is 200 pounds steps on a
scale 10 times and gets different readings of 15,
250, 95, 140, etc., the scale is not reliable. If
the scale consistently reads "150", then it is
reliable, but not valid. If it reads "200" each
time, then the measurement is both reliable and
valid. This is what is meant by the statement,
"Reliability is necessary but not sufficient for
validity." A test cannot be valid and not
reliable.

77
Relationship between reliability and validity
• If data are valid, they must be reliable. If
people receive very different scores on a test
every time they take it, the test is not likely
to predict anything. However, if a test is
reliable, that does not mean that it is valid.
For example, we can measure strength of grip very
reliably, but that does not make it a valid
measure of intelligence or even of mechanical
ability. Reliability is a necessary, but not
sufficient, condition for validity.