Transcript and Presenter's Notes

Title: Reliability Coefficient for Criterion Referenced Tests


1
CHAPTER 9
  • Reliability Coefficient for Criterion Referenced
    Tests

2
Reliability Coefficients for Criterion Referenced
Tests
  • Criterion: what we intend to measure (the DV).
  • Norm-Referenced: as in intelligence tests, for example, we compare the examinee's score with their norm group (normative IQ or deviation IQ).
  • Criterion-Referenced: as in achievement tests, we want to know whether the examinee has mastered a particular domain (math, psych, or a particular behavior).

3
Reliability Coefficients for Criterion Referenced
Tests
  • Reliability coefficients for criterion-referenced tests are used for 2 different purposes:
  • 1. Domain Score Estimation, or
  • 2. Mastery Allocation

4
1. Domain Score Estimation
  • We use the same type of calculation to determine the reliability coefficient as we did before. The reliability coefficient for domain score estimation of the data in Table 9.1 is the same as for Table 7.1.
  • Ex. First we do an ANOVA to find the mean squares (MS persons, i.e., MS within, and MS residual), then use Hoyt's method to calculate the reliability coefficient. → Next slides

5
Reliability Coefficients for Criterion Referenced Tests
  • MS persons = MS within; MS items = MS between
6
Hoyt's (1941) Method
  • Reliability = (MS persons − MS residual) / MS persons
  • MS persons = MS within; MS items = MS between. MS residual has its own calculation; it is not MS total.
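To make the computation concrete, here is a minimal Python sketch of Hoyt's ANOVA-based method on a hypothetical persons × items score matrix; the data are invented and only illustrate the mean-square bookkeeping above.

```python
import numpy as np

# Hypothetical 0/1 score matrix: rows = persons, columns = items.
scores = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
], dtype=float)

n_persons, n_items = scores.shape
grand_mean = scores.mean()

# Sums of squares for the two-way persons x items layout (one score per cell).
ss_total = ((scores - grand_mean) ** 2).sum()
ss_persons = n_items * ((scores.mean(axis=1) - grand_mean) ** 2).sum()
ss_items = n_persons * ((scores.mean(axis=0) - grand_mean) ** 2).sum()
ss_residual = ss_total - ss_persons - ss_items  # its own calculation, not SS total

ms_persons = ss_persons / (n_persons - 1)
ms_residual = ss_residual / ((n_persons - 1) * (n_items - 1))

# Hoyt (1941): reliability = (MS persons - MS residual) / MS persons
hoyt = (ms_persons - ms_residual) / ms_persons
print(f"Hoyt reliability = {hoyt:.3f}")
```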
7
1. Domain Score Estimation
  • 1. Domain Score Estimation
  • The Domain Score for an examinee is the same as the Observed Score (X) in classical theory. It is the proportion of the items in a specific domain that the examinee can answer correctly.
  • Ex. A score of 85% on Test Construction corresponds to a domain score (D.S.) of .85.

8
Reliability Coefficients for Criterion Referenced
Tests
  • Decision Consistency
  • It is about the consistency of your decisions: Decision Consistency concerns the extent to which the same decisions are made from different sets of measurements. Consistency of decisions is based on two different forms of a test (parallel forms), or on two administrations of the same test (test-retest).
  • A high reliability coefficient (ρ) indicates that there is consistency in examinees' scores.

9
Reliability Coefficients for Criterion Referenced
Tests
  • Factors Affecting Decision Consistency
  • 1. Test length
  • 2. Location of the cut-score in the score
    distributions
  • 3. Test score generalizability
  • 4. Similarity of the score distributions for the
    two forms

10
Mastery Allocation
  • 2. Mastery Allocation
  • Involves comparing the percent-correct score to an arbitrarily established cut score. If the percent-correct score is equal to or greater than the cut score, the examinee has mastered that domain.
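Below is a minimal sketch of mastery allocation, and of decision consistency across two parallel forms; the cut score of 70 echoes the EPPP example on the next slide, and all scores are invented.

```python
import numpy as np

cut_score = 70  # hypothetical percent-correct cut score

# Hypothetical percent-correct scores for the same examinees on two parallel forms.
form_a = np.array([65, 72, 88, 70, 59, 91, 74, 68])
form_b = np.array([63, 75, 85, 69, 62, 90, 71, 72])

# Mastery allocation: at or above the cut score = master.
master_a = form_a >= cut_score
master_b = form_b >= cut_score

# Decision consistency: proportion of examinees given the same
# master/non-master decision by both forms.
p0 = np.mean(master_a == master_b)
print(f"Decision consistency = {p0:.2f}")
```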

11
Mastery Allocation
  • Mastering a domain is called Mastery Allocation.
  • Ex. The EPPP exam cut score in Florida is 70. If you score 70 or greater on this exam, then you have mastered the psychology domain: you get your psychologist license and can call yourself a psychologist.

12
(No Transcript)
13
UNIT III VALIDITY
  • CHAP 10 INTRODUCTION TO VALIDITY
  • CHAP 11 STATISTICAL PROCEDURES FOR PREDICTION
    AND CLASSIFICATION
  • CHAP 12 BIAS IN SELECTION
  • CHAP 13 FACTOR ANALYSIS

14
(No Transcript)
15
CHAPTER 10: INTRODUCTION TO VALIDITY
  • Validity refers to the degree to which a test measures what it is intended to measure. It is about the quality (accuracy/trueness) of a test.
  • Characteristics of Validity
  • 1. Result
  • 2. Context
  • 3. Coefficient

16
Characteristics of Validity
  • 1. Result
  • Validity refers to the results of a test, not to
    the test itself.
  • Ex. If you are taking a statistics test, you want to know that the resulting score is valid for measuring your knowledge of statistics.

17
INTRODUCTION TO VALIDITY
  • 2. Context
  • The validity of the resulting score must be interpreted within the context in which the test occurs (ex. a statistics score within a statistics course).

18
INTRODUCTION TO VALIDITY
  • 3. Coefficient
  • Just like the reliability coefficient, the validity coefficient has degrees of variability from low to high:
  • ρ ranges from 0 to 1.
  • Ex. The validity of last year's Test Construction exam: ρ = 0.90.

19
Validity
  • Validity has been described as "the agreement between a test score and the quality it is believed to measure" (Kaplan and Saccuzzo, 2001). In other words, it measures the gap between what a test actually measures and what it is intended to measure. → Next slide

20
Validity
  • This gap can be caused by two particular circumstances:
  • (a) the design of the test is insufficient for the intended purpose (ex. using essays for older examinees), and (b) the test is used in a context or fashion that was not intended in the design (ex. changing math questions to multiple choice).

21
External & Internal Validity
  • External Validity
  • External validity addresses the ability to generalize your study to other people and other situations. Ex. correlational studies: the association between stress and depression.

22
External & Internal Validity
  • Internal Validity
  • Internal validity addresses the "true" causes of the outcomes that you observed in your study. Strong internal validity means that you not only have reliable measures of your independent and dependent variables, but also a strong justification that causally links your independent variables to your dependent variables (ex. experimental studies: the effect of stress on heart attacks).

23
Major Types of Validity (3 Cs)
  • Content: the items of the test.
  • Criterion: stats; how well a test estimates/predicts a performance. Ex. the teacher's math test and the researcher's test (FCAT); the EPPP; the GRE.
  • Construct: a non-observable construct or trait. Ex. your depression test or clinical interview (underlying constructs, i.e., sleeping, eating, hopelessness); the BDI-II score.

24
(No Transcript)
25
Face validity
  • Face validity means that the test appears to be valid. This is validated using common-sense rules, for example:
  • a mathematics test should include some numerical elements.

26
Face validity
  • 1. 3 + 5
  • 2. 12 - 10
  • 3. 8 - 5
  • 4. 25 - 16
  • 5. 13 + 3 - 8
  • Multiple Choice Please select the best answer.
  • 6. Judy had 10 pennies. She lost 2. How many
    pennies does she have left?
  • A. 2
  • B. 8
  • C. 10
  • D. 12

27
Face validity
28
Face Validity
  • A test can appear to be invalid but actually be perfectly valid, for example where correlations between unrelated items and the desired items have been found.
  • Ex. Successful pilots in WW2 were very often found to have had an active childhood interest in flying model planes (an association between flying model planes and successful WW2 pilots).

29
Face Validity
  • A test that does not have face validity may be rejected by test-takers (if they have that option) and by people who are choosing which test to use from amongst a set of options.

30
(No Transcript)
31
(No Transcript)
32
(No Transcript)
33
Types of Validity
  • 1. Content Validity
  • A test has content validity if it measures knowledge of the content domain it was designed to measure.
  • Ex. If the content domain is statistics, the test should measure statistical knowledge, not English, math, or psychology, etc.

34
1. Content Validity
  • Instruction: Multiple Choice. Please select the best answer. (structured framework)
  • 6. Judy had 10 pennies. She lost 2. How many
    pennies does she have left?
  • A. 2
  • B. 8
  • C. 10
  • D. 12
  • The red part is called the Performance Domain or Domain Characteristic, which deals with your knowledge of the domain.
  • The yellow part is called the Matching Item.

35
1. Content Validity
  • Content Validity
  • A test has content validity if it sufficiently
    covers the area that it is intended to cover.
    This is particularly important in ability or
    attainment/achievement tests that validate skills
    or knowledge in a particular domain.
  • Content Under-Representation occurs when
    important areas are missed. Construct-Irrelevant
    Variation occurs when irrelevant factors
    contaminate the test.

36
1. Content Validity
  • Content Validity has 4 steps:
  • 1. Defining the performance domain of interest.
  • 2. Selecting a panel of qualified experts in the content domain.
  • 3. Providing a structured framework (instruction) for the process of matching items (questions) to the performance domain (answers).
  • 4. Collecting and summarizing the data from the matching process.

37
1. Content Validity
  • Content Validity has 4 steps:
  • 1. Defining the performance domain of interest.
  • Ex. Ask yourself: what am I trying to measure? Psych, stats, English?

38
1. Content Validity
  • 2. Selecting a panel of qualified experts in the
    content domain.
  • Ex. Select expert statisticians to review your stats questions. Another ex.: qualifying exam questions.

39
1. Content Validity
  • 3. Providing a structured framework (instruction) for the process of matching items (questions) to the performance domain (answers).
  • Ex. Go back 4 slides and see Question 3.

40
1. Content Validity
  • 4. Collecting and summarizing the data from the
    matching process.
  • Select and collect a sample of these relevant
    questions (items).

41
1. Content Validity
  • Practical Considerations in Content Validity
  • Content validity requires the following 4 decisions (questions):
  • 1. Should objectives be weighted to reflect their importance? Ex. next slide.

42
1. Content Validity
  • 2. How should the item-matching task be structured? Ex. next slide.
  • 3. What aspects of items should be examined? Ex. next slide.
  • 4. How should results be summarized? Ex. next slide.

43
1. Content Validity
  • 1. Should objectives be weighted to reflect their importance?
  • In content validity we should rate the importance of objectives. The designer of the test should provide a scale, such as a rubric, for measuring the objectives in a test. This also helps you measure the inter-rater reliability of a test more accurately.

44
1. Content Validity
  • 2. How should the item-matching task be structured?
  • Katz (1958) suggested that the expert reviewers should read the item and identify the correct/best response.
  • Hambleton's (1980) idea was that the experts should rate the degree of match to a specific objective using a 5-point scale:
  • poor fit 1____2____3____4____5 excellent fit

45
1. Content Validity
  • 3. What aspects of items should be examined?
  • We should have a clear description of the item and the domain in order to consider matching item(s) to a performance domain or domain characteristics.
  • Ex. Go back to Question 6.

46
1. Content Validity
  • 4. How should results be summarized?
  • There are 5 ways (read p. 221):
  • 1. Percentage of items matched to objectives
  • 2. Percentage of items matched to objectives
    with high importance rating
  • 3. Correlation between the importance
    weighting of objectives and the number of items
    measuring those objectives
  • 4. Index of item-objective congruence
  • 5. Percentage of objectives not assessed by
    any of the items on the test
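As a rough illustration of summary 1 (using Hambleton-style 5-point fit ratings from slide 44), the sketch below tallies the percentage of items matched to objectives; the ratings and the "mean rating of 4 or higher counts as matched" rule are assumptions for illustration.

```python
import numpy as np

# Hypothetical ratings: rows = expert reviewers, columns = items,
# each rating on Hambleton's 1-5 "fit to objective" scale (5 = excellent fit).
ratings = np.array([
    [5, 4, 2, 5, 3],
    [4, 5, 1, 4, 3],
    [5, 4, 2, 5, 4],
])

mean_fit = ratings.mean(axis=0)  # mean expert rating per item
matched = mean_fit >= 4.0        # assumed rule for calling an item "matched"

print("mean fit per item:", np.round(mean_fit, 2))
print(f"percentage of items matched to objectives: {100 * matched.mean():.0f}%")
```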

47
2. Criterion Related Validity
  • Criterion-Related Validity is a measure of the extent to which a test is related to some criterion, or how well a test estimates/predicts a performance.
  • Ex. The SAT would be a predictor of college performance; the GRE, of graduate performance; the EPPP, of psychologist performance; and the Driver License Test, of knowledge of basic traffic signs and signals and/or driving performance.

48
2. Criterion Related Validity
  • Criterion-Related Validity is concerned with how well a test either estimates current performance (Concurrent Validity) or predicts future performance (Predictive Validity). Ex. the EPPP exam.

49
Ex. of Concurrent and Predictive Validity
  • Researchers want to know if 6th-grade students' math scores are valid. They give students a test designed to measure mathematical aptitude for 6th graders.
  • They then compare and correlate these scores with the test scores already held by the teachers (midterm scores). → r

50
Ex. of Concurrent and Predictive Validity
  • They evaluate the accuracy of their test and decide whether it measures what it is supposed to. The key element is that the two measures were compared at about the same time (concurrent, or only a few days apart).
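That comparison can be sketched directly in Python, correlating the researchers' aptitude test with the teachers' midterm scores; all numbers are invented.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores for the same 6th-grade students.
new_test = np.array([78, 85, 62, 90, 71, 88, 66, 74])  # researchers' math test
midterm = np.array([75, 88, 60, 93, 70, 85, 70, 72])   # teachers' midterm scores

r, p = pearsonr(new_test, midterm)
print(f"concurrent validity coefficient r = {r:.2f} (p = {p:.4f})")
```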

51
Ex. of Concurrent and Predictive Validity
  • However, if the researchers had measured the mathematical aptitude, implemented a new educational program, and then retested the students after six months, this would be predictive validity.

52
2. Criterion Related Validity
  • Concurrent validity is measured by comparing two tests done at the same time, for example a written test and a hands-on exercise that seek to assess the same criterion. This can be used to limit criterion errors. Ex. for a diagnosis of depression: a clinical interview and the BDI-II.

53
2. Criterion Related Validity
  • Predictive validity, by contrast, compares
    success in the test with actual success in the
    future job. The test is then adjusted over time
    to improve its validity.
  • Ex. EPPP exam and psychologist performance

54
2. Criterion Related Validity
  • Criterion-related validity
  • Criterion-related validity is like construct
    validity, but relates the test to some external
    criterion, such as particular aspects of the job.
  • There are dangers with the external criterion
    being selected based on its convenience rather
    than being a full representation of the job. Ex.
    An air traffic control test may use a limited set
    of scenarios.

55
2. Criterion Related Validity
  • The general design of a criterion-related validity study has the following 5 steps (p. 224):
  • 1. Identify a suitable criterion behavior (depression) and a method for measuring it (your depression test).
  • 2. Identify an appropriate sample of examinees (depressed patients) representative of those for whom the test will ultimately be used.

56
2. Criterion Related Validity
  • 3. Administer the test and keep a record of each examinee's score.
  • 4. When the criterion data are available, obtain a measure of performance on the criterion for each examinee (1. mild, 2. moderate, 3. severe).
  • 5. Determine the strength of the relationship between test scores and criterion performance. Ex. the relationship between the teacher's math scores and the researcher's math scores (the researcher determines the criterion performance). r = ?
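Here is a sketch of step 5 with the ordinal criterion above (1 = mild, 2 = moderate, 3 = severe). Because the criterion is ordinal, a Spearman rank correlation is one reasonable choice; all scores are invented.

```python
import numpy as np
from scipy.stats import spearmanr

test_scores = np.array([12, 25, 31, 18, 40, 22, 35, 15])  # depression test scores
criterion = np.array([1, 2, 3, 1, 3, 2, 3, 1])            # 1=mild, 2=moderate, 3=severe

rho, p = spearmanr(test_scores, criterion)
print(f"criterion-related validity (Spearman rho) = {rho:.2f}")
```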

57
3. Construct Validity
  • 3. Construct Validity
  • A test has construct validity if it accurately measures a theoretical, non-observable construct or trait (i.e., intelligence, motivation, depression, anxiety, stats, biology, etc.). Ex. the relationship between the clinical interview's symptoms/characteristics of depression (which is the underlying construct) and the scores on the BDI-II (mild, moderate, severe).

58
3. Construct Validity
  • 3. Construct Validity
  • Construct-Irrelevant Variation occurs when
    irrelevant factors contaminate the test.

59
3. Construct Validity
  • Construct validity
  • Underlying many tests is a construct or theory that is being assessed.
  • Ex. There are a number of constructs for describing intelligence (spatial ability, verbal reasoning, processing speed, etc.), which the test will individually assess.

60
3. Construct Validity
  • Constructs can be about causes, about effects, and about the cause-effect relationship.
  • If the construct is not valid then the test on
    which it is based will not be valid.
  • Ex. There have been historical constructs that
    intelligence is based on the size and shape of
    the skull (Phrenology).

61
(No Transcript)
62
(No Transcript)
63
3. Construct Validity Measurements
  • Multitrait-Multimethod Matrix
  • Campbell and Fiske (1959) described this approach as concerned with the adequacy of tests as measures of a construct/trait. With this technique the researcher must think of two or more ways (methods) to measure the construct/trait of interest. → Next slide

64
3. Construct Validity Measurements
  • (1. True-False, 2. Forced Choice, and 3. Incomplete Sentences are methods) and (A. sex-guilt, B. hostility-guilt, and C. morality-conscience are traits or constructs). Using one sample of subjects, measurements are obtained by the same or different methods. Compare the correlation between the two measurements and identify one of the 3 types. → Next slide

65
3. Construct Validity
  • 1. Reliability Coefficients → using the same measurement method for the same trait; it's like test-retest reliability (you use the same trait and method twice). Ideally this should be a high r. See Table 10.2 on next slide.
  • 2. Convergent Validity Coefficient → using a different measurement method but the same trait (it's like parallel-forms reliability, i.e., Form A and Form B; ideally a high r). The 2 measurement methods or the 2 variables converge (come together), so it is called the Convergent Validity Coefficient. See Table 10.2 on next slide.

66
3. Construct Validity
67
3. Construct Validity (Construct = Trait)
  • 3. Divergent or Discriminant Validity Coefficient (2 different kinds):
  • A. Correlations between measures of different constructs (traits) using the same measurement method (Heterotrait-Monomethod Coefficient).
  • Or B. correlations using different measurement methods for different constructs (traits) (Heterotrait-Heteromethod Coefficient).
  • Ideally there is low or no relationship between the variables; they diverge (come apart), so it is called the Divergent Validity Coefficient.
  • See Table 10.2 on next slide.
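A small simulation can make the three kinds of entries visible: two traits, each measured by two methods, with correlations read out of the resulting matrix. The traits, noise level, and sample size are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulate two latent traits (e.g., hostility-guilt and morality-conscience).
trait_a = rng.normal(size=n)
trait_b = rng.normal(size=n)

# Each trait measured by two methods (e.g., true-false and forced choice),
# each measurement adding its own noise.
a_true_false = trait_a + 0.5 * rng.normal(size=n)
a_forced = trait_a + 0.5 * rng.normal(size=n)
b_true_false = trait_b + 0.5 * rng.normal(size=n)
b_forced = trait_b + 0.5 * rng.normal(size=n)

mtmm = np.corrcoef(np.vstack([a_true_false, a_forced, b_true_false, b_forced]))

# Convergent: same trait, different methods -> ideally high.
print(f"convergent (trait A, two methods):   {mtmm[0, 1]:.2f}")
# Divergent: different traits -> ideally low or zero.
print(f"heterotrait-monomethod (true-false): {mtmm[0, 2]:.2f}")
print(f"heterotrait-heteromethod:            {mtmm[0, 3]:.2f}")
```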

68
3. Construct Validity
  • Factor Analysis
  • Exploratory and Confirmatory
  • Factor Analysis is another way to measure the validity of a test. It is about data reduction.
  • Raymond Cattell, in his 16 PF, reduced 4,500 personality-related questions to 187 questions and 16 related variables or factors. → Next slide

69
Descriptors of Low Range | Primary Factor | Descriptors of High Range
Impersonal, distant, cool, reserved, detached, formal, aloof | Warmth (A) | Warm, outgoing, attentive to others, kindly, easy-going, participating, likes people
Concrete thinking, lower general mental capacity, less intelligent, unable to handle abstract problems | Reasoning (B) | Abstract-thinking, more intelligent, bright, higher general mental capacity, fast learner
Reactive emotionally, changeable, affected by feelings, emotionally less stable, easily upset | Emotional Stability (C) | Emotionally stable, adaptive, mature, faces reality calmly
Deferential, cooperative, avoids conflict, submissive, humble, obedient, easily led, docile, accommodating | Dominance (E) | Dominant, forceful, assertive, aggressive, competitive, stubborn, bossy
Serious, restrained, prudent, taciturn, introspective, silent | Liveliness (F) | Lively, animated, spontaneous, enthusiastic, happy-go-lucky, cheerful, expressive, impulsive
Expedient, nonconforming, disregards rules, self-indulgent | Rule-Consciousness (G) | Rule-conscious, dutiful, conscientious, conforming, moralistic, staid, rule-bound
Shy, threat-sensitive, timid, hesitant, intimidated | Social Boldness (H) | Socially bold, venturesome, thick-skinned, uninhibited
Utilitarian, objective, unsentimental, tough-minded, self-reliant, no-nonsense, rough | Sensitivity (I) | Sensitive, aesthetic, sentimental, tender-minded, intuitive, refined
Trusting, unsuspecting, accepting, unconditional, easy | Vigilance (L) | Vigilant, suspicious, skeptical, distrustful, oppositional
Grounded, practical, prosaic, solution-oriented, steady, conventional | Abstractedness (M) | Abstract, imaginative, absent-minded, impractical, absorbed in ideas
Forthright, genuine, artless, open, guileless, naive, unpretentious, involved | Privateness (N) | Private, discreet, nondisclosing, shrewd, polished, worldly, astute, diplomatic
Self-assured, unworried, complacent, secure, free of guilt, confident, self-satisfied | Apprehension (O) | Apprehensive, self-doubting, worried, guilt-prone, insecure, worrying, self-blaming
Traditional, attached to familiar, conservative, respecting traditional ideas | Openness to Change (Q1) | Open to change, experimental, liberal, analytical, critical, free-thinking, flexibility
Group-oriented, affiliative, a joiner and follower, dependent | Self-Reliance (Q2) | Self-reliant, solitary, resourceful, individualistic, self-sufficient
Tolerates disorder, unexacting, flexible, undisciplined, lax, self-conflict, impulsive, careless of social rules, uncontrolled | Perfectionism (Q3) | Perfectionistic, organized, compulsive, self-disciplined, socially precise, exacting will power, control, self-sentimental
Relaxed, placid, tranquil, torpid, patient, composed, low drive | Tension (Q4) | Tense, high energy, impatient, driven, frustrated, overwrought, time-driven
Primary Factors and Descriptors in Cattell's 16 Personality Factor Model (adapted from Conn & Rieke, 1994)
Raymond Cattell's 16 Personality Factors
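Factor analysis as data reduction can be sketched at toy scale (nothing like the 16 PF's 4,500 items) with scikit-learn's FactorAnalysis; the 12 simulated items and 2 latent factors below are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n_subjects, n_items = 300, 12

# Simulate 12 questionnaire items driven by 2 latent factors plus noise.
latent = rng.normal(size=(n_subjects, 2))
loadings = rng.normal(size=(2, n_items))
responses = latent @ loadings + 0.5 * rng.normal(size=(n_subjects, n_items))

# Exploratory factor analysis: reduce 12 items to 2 factors.
fa = FactorAnalysis(n_components=2)
factor_scores = fa.fit_transform(responses)

print("estimated loadings shape:", fa.components_.shape)  # (2 factors, 12 items)
print("factor scores shape:", factor_scores.shape)        # (300 subjects, 2 factors)
```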
70
  • 3. Construct Validity
  • Construct validity has the following 4 steps (same as research hypotheses and testing):
  • 1. Formulate one or more hypotheses (state your hypothesis). Ex. stress and depression.
  • 2. Select (or develop) a measurement instrument.
  • 3. Gather empirical data to test your hypotheses (collect your data and calculate the statistics).
  • 4. Determine if the data are consistent with the hypotheses (do your stats and make a decision).
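The four steps read as an ordinary hypothesis test. Below is a sketch using the stress-and-depression example; the data are simulated, and the .05 decision rule is an assumption.

```python
import numpy as np
from scipy.stats import pearsonr

# Step 1: hypothesis -- stress scores correlate positively with depression scores.
rng = np.random.default_rng(2)
stress = rng.normal(50, 10, size=100)                   # step 2: (hypothetical) instrument scores
depression = 0.6 * stress + rng.normal(0, 8, size=100)  # step 3: gathered (simulated) data

# Step 4: are the data consistent with the hypothesis?
r, p = pearsonr(stress, depression)
decision = "consistent" if (p < 0.05 and r > 0) else "not consistent"
print(f"r = {r:.2f}, p = {p:.4f} -> {decision} with the hypothesis")
```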

71
(No Transcript)
72
Validity Coefficient
  • The validity coefficient
  • The validity coefficient is calculated as a correlation between the two items (variables) being compared, very typically success in the test as compared with success on the job.
  • A validity of 0.6 and above is considered high, which suggests that very few tests give strong indications of job performance.

73
Validity Coefficients for True Scores
  • The validity coefficient is like the reliability and generalizability coefficients.
  • r_XY = SP / √(SS_X · SS_Y) → the Pearson correlation coefficient
  • ρ_XtYt = ρ_XY / √(ρ_XX' · ρ_YY') → this formula is sometimes called the correction for attenuation because it is a validity coefficient corrected for errors of measurement in the predictor (X) and the criterion (Y).
  • ρ_XtYt = validity coefficient for true scores
  • ρ_XY = correlation between X and Y. Ex. SP = 0.5
  • ρ_XX' = reliability coefficient of X. Ex. SS_X = 0.6
  • ρ_YY' = reliability coefficient of Y. Ex. SS_Y = 0.5
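Both formulas are easy to evaluate numerically. In the sketch below, the slide's SP = 0.5, SS_X = 0.6, and SS_Y = 0.5 feed the Pearson formula; the reliabilities used in the attenuation correction are hypothetical stand-ins.

```python
import math

# Pearson correlation from sums: r_XY = SP / sqrt(SS_X * SS_Y)
SP, SS_X, SS_Y = 0.5, 0.6, 0.5  # example values from the slide
r_xy = SP / math.sqrt(SS_X * SS_Y)
print(f"r_XY = {r_xy:.3f}")

# Correction for attenuation: rho_XtYt = rho_XY / sqrt(rho_XX' * rho_YY')
rho_xy = 0.45                 # hypothetical observed validity coefficient
rho_xx, rho_yy = 0.81, 0.75   # hypothetical reliabilities of X and Y
rho_true = rho_xy / math.sqrt(rho_xx * rho_yy)
print(f"rho_XtYt = {rho_true:.3f}")
```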

74
The Relationship between Reliability and Validity
  • If a test is unreliable (ρ_R = 0), it cannot be valid. If a test is reliable (ρ_R = .6), that doesn't mean it is valid. However, if data are valid, they must be reliable; therefore, if a psychological test is valid (ρ_V = .90), it is also reliable.

75
The Relationship between Reliability and Validity
  • Mathematically: ρ_V ≤ √ρ_R
  • This means the criterion-related validity coefficient cannot exceed the square root of the predictor's reliability coefficient.
  • Ex. If the reliability coefficient ρ_R = .81, then the validity coefficient ρ_V ≤ √.81 = .90.

76
The Relationship between Reliability and Validity
  • If someone who weighs 200 pounds steps on a scale 10 times and gets different readings of 15, 250, 95, 140, etc., the scale is not reliable. If the scale consistently reads "150", then it is reliable, but not valid. If it reads "200" each time, then the measurement is both reliable and valid. This is what is meant by the statement, "Reliability is necessary but not sufficient for validity." A test cannot be valid without being reliable.
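The scale story can be simulated directly: a small spread across readings means reliable, and a mean near the true weight means valid. The specific readings below are invented.

```python
import numpy as np

true_weight = 200
unreliable = np.array([15, 250, 95, 140, 180, 60, 220, 110, 30, 175])
reliable_not_valid = np.full(10, 150.0)  # always reads "150"
reliable_and_valid = np.full(10, 200.0)  # always reads "200"

for name, readings in [("unreliable", unreliable),
                       ("reliable but not valid", reliable_not_valid),
                       ("reliable and valid", reliable_and_valid)]:
    # Low SD = reliable; mean close to true_weight = valid.
    print(f"{name}: mean = {readings.mean():.0f}, SD = {readings.std():.0f}")
```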

77
Relationship between reliability and validity
  • If data are valid, they must be reliable. If
    people receive very different scores on a test
    every time they take it, the test is not likely
    to predict anything. However, if a test is
    reliable, that does not mean that it is valid.
    For example, we can measure strength of grip very
    reliably, but that does not make it a valid
    measure of intelligence or even of mechanical
    ability. Reliability is a necessary, but not
    sufficient, condition for validity.