Reliability Coefficient for Criterion-Referenced Tests

CHAPTER 9

- Reliability Coefficients for Criterion-Referenced Tests

Reliability Coefficients for Criterion-Referenced Tests

- Criterion: what we intend to measure (the DV).
- Norm-Referenced: as in intelligence tests, for example. We compare the examinee's score with the norm (Normative IQ or Deviation IQ).
- Criterion-Referenced: as in achievement tests, we want to know whether the examinee has mastered a particular domain (math, psychology, or a particular behavior).

Reliability Coefficients for Criterion-Referenced Tests

- Reliability coefficients for criterion-referenced tests are used for 2 different purposes:
- 1. Domain Score Estimation, or
- 2. Mastery Allocation

1. Domain Score Estimation

- We use the same type of calculation to determine the reliability coefficient as we did before. The reliability coefficient for Domain Score Estimation of the data in Table 9.1 is the same as for Table 7.1.
- Ex. First we do an ANOVA to find the mean squares (MS within or MS person, and MS residual), then use Hoyt's method to calculate the reliability coefficient. Next slides

Reliability Coefficients for Criterion-Referenced Tests

Hoyt's (1941) Method

- MS person = MS within
- MS items = MS between
- MS residual has its own calculation; it is not MS total.
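As a minimal sketch of the ANOVA-then-Hoyt calculation described above, the Python below computes the person, item, and residual mean squares from a persons-by-items score matrix and then applies Hoyt's formula, reliability = (MS person - MS residual) / MS person. The function name `hoyt_reliability` and the small 0/1 data set are illustrative assumptions; they are not the data of Table 9.1.

```python
def hoyt_reliability(scores):
    """Hoyt (1941) ANOVA reliability for a persons-by-items score matrix.

    scores: list of rows, one row per person, one column per item.
    """
    n_persons = len(scores)
    n_items = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n_persons * n_items)

    person_means = [sum(row) / n_items for row in scores]
    item_means = [sum(row[j] for row in scores) / n_persons
                  for j in range(n_items)]

    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_persons = n_items * sum((m - grand_mean) ** 2 for m in person_means)
    ss_items = n_persons * sum((m - grand_mean) ** 2 for m in item_means)
    # The residual is what remains after removing person and item effects;
    # it is computed separately and is not the same as the total.
    ss_residual = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n_persons - 1)
    ms_residual = ss_residual / ((n_persons - 1) * (n_items - 1))
    return (ms_persons - ms_residual) / ms_persons


# Hypothetical data: 4 examinees (rows) answering 3 items (1 = correct).
example_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(hoyt_reliability(example_scores), 2))
```

For this toy matrix the result is 0.75, the same value that coefficient alpha (KR-20) gives for these data.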

1. Domain Score Estimation

- 1. Domain Score Estimation
- The domain score for an examinee is the same as the observed score (X) in classical theory. It is the proportion of the items in a specific domain that the examinee can answer correctly.
- Ex. Your score of 85 on Test Construction has a D.S. of 85.

Reliability Coefficients for Criterion-Referenced Tests

- Decision Consistency
- It is about the consistency of your decision. Decision consistency concerns the extent to which the same decisions are made from different sets of measurements. Consistency of decisions is based on two different forms of a test (parallel forms) or on two administrations of the same test (test-retest).
- A high reliability coefficient (ρ) indicates that there is consistency in examinees' scores.

Reliability Coefficients for Criterion-Referenced Tests

- Factors Affecting Decision Consistency
- 1. Test length
- 2. Location of the cut score in the score distribution
- 3. Test score generalizability
- 4. Similarity of the score distributions for the two forms
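Decision consistency can be illustrated with a short Python sketch. It classifies each examinee as master or non-master on two forms (score at or above a cut score) and reports the proportion of examinees who receive the same decision on both forms. Cohen's kappa is also computed as a chance-corrected index; kappa is not named on the slides and is included here as a commonly used companion statistic. The function name, the scores, and the cut score of 70 are illustrative assumptions.

```python
def decision_consistency(form_a, form_b, cut):
    """Return (p0, kappa) for master / non-master decisions on two forms."""
    a = [score >= cut for score in form_a]
    b = [score >= cut for score in form_b]
    n = len(a)

    # p0: proportion of examinees given the same decision on both forms.
    p0 = sum(x == y for x, y in zip(a, b)) / n

    # pc: agreement expected by chance from the marginal mastery rates.
    rate_a = sum(a) / n
    rate_b = sum(b) / n
    pc = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)

    # kappa is undefined when chance agreement is already perfect.
    kappa = None if pc == 1 else (p0 - pc) / (1 - pc)
    return p0, kappa


# Hypothetical percent-correct scores for 8 examinees on two parallel forms.
form_a = [80, 65, 72, 90, 55, 70, 68, 85]
form_b = [78, 71, 69, 88, 60, 74, 66, 90]
p0, kappa = decision_consistency(form_a, form_b, cut=70)
print(p0, round(kappa, 3))
```

Moving the cut score toward the middle of the score distribution in this sketch tends to lower p0, which is one way to see why the location of the cut score matters.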

Mastery Allocation

- 2. Mastery Allocation
- Involves comparing the percent-correct score to an arbitrarily established cut score. If the percent-correct score is equal to or greater than the cut score, the examinee has mastered that domain.

Mastery Allocation

- Mastery Allocation
- Deciding whether an examinee has mastered a domain is called Mastery Allocation.
- Ex. The EPPP exam cut score in Florida is 70. If you score 70 or greater on this exam, you have mastered the psychology domain. You get your psychologist license and can call yourself a psychologist.

UNIT III VALIDITY

- CHAP 10 INTRODUCTION TO VALIDITY
- CHAP 11 STATISTICAL PROCEDURES FOR PREDICTION AND CLASSIFICATION
- CHAP 12 BIAS IN SELECTION
- CHAP 13 FACTOR ANALYSIS

CHAPTER 10 INTRODUCTION TO VALIDITY

- Validity: Validity refers to the degree to which a test measures what it is intended to measure. It is about the quality (accuracy/trueness) of a test.

- Characteristics of Validity
- 1. Result
- 2. Context
- 3. Coefficient

Characteristics of Validity

- 1. Result
- Validity refers to the results of a test, not to the test itself.
- Ex. If you are taking a statistics test, you want to know that the resulting score is a valid measure of your knowledge of statistics.

INTRODUCTION TO VALIDITY

- 2. Context
- The validity of the resulting score (statistics) must be interpreted within the context in which the test occurs (statistics).

INTRODUCTION TO VALIDITY

- 3. Coefficient
- Just like the reliability coefficient, the validity coefficient varies in degree from low to high.
- ρ ranges from 0 to 1.
- Ex. The validity of last year's Test Construction exam: ρ = 0.90

Validity

- Validity has been described as 'the agreement between a test score and the quality it is believed to measure' (Kaplan and Saccuzzo, 2001). In other words, it measures the gap between what a test actually measures and what it is intended to measure. Next Slide

Validity

- This gap can be caused by two particular circumstances:
- (a) the design of the test is insufficient for the intended purpose (ex. using essays for older examinees), and
- (b) the test is used in a context or fashion which was not intended in the design (ex. changing questions to multiple choice for math).

External & Internal Validity

- External Validity
- External validity addresses the ability to generalize your study to other people and other situations. Ex. Correlational studies, such as the association between stress and depression.

External & Internal Validity

- Internal Validity
- Internal validity addresses the "true" causes of the outcomes that you observed in your study. Strong internal validity means that you have not only reliable measures of your independent and dependent variables but also a strong justification that causally links your independent variables to your dependent variables. Ex. Experimental studies, such as the effect of stress on heart attack.

Major Types of Validity (the 3 Cs)

- Content: items
- Criterion: stats; how well a test estimates/predicts a performance. Ex. the teacher's math test and the researcher's test (FCAT), EPPP, GRE test
- Construct: a non-observable construct or trait. Ex. your depression test or clinical interview (underlying construct, i.e., sleeping, eating, hopelessness), BDI-2 score

Face validity

- Face validity means that the test appears to be valid. This is judged using common-sense rules; for example, a mathematical test should include some numerical elements.

Face validity

- 1. 35
- 2. 12-10
- 3. 8-5
- 4. 25-16
- 5. 133-8
- Multiple Choice Please select the best answer.
- 6. Judy had 10 pennies. She lost 2. How many pennies does she have left?
- A. 2
- B. 8
- C. 10
- D. 12

Face Validity

- A test can appear to be invalid but actually be perfectly valid, for example where correlations between seemingly unrelated items and the desired items have been found.
- Ex. Successful pilots in WW2 were found very often to have had an active childhood interest in flying model planes (an association between flying model planes and being a successful WW2 pilot).

Face Validity

- A test that does not have face validity may be rejected by test-takers (if they have that option) and by people who are choosing a test from among a set of options.

Types of Validity

- 1. Content Validity
- Measures knowledge of the content domain that the test was designed to measure.
- Ex. If the content domain is statistics, the test should measure statistical knowledge, not English, math, psychology, etc.

1. Content Validity

- Instruction: Multiple Choice. Please select the best answer. (structured framework)
- 6. Judy had 10 pennies. She lost 2. How many pennies does she have left?
- A. 2
- B. 8
- C. 10
- D. 12
- The red part is called the Performance Domain or Domain Characteristic, which deals with your knowledge of the domain.
- The yellow part is called the Matching Item.

1. Content Validity

- Content Validity
- A test has content validity if it sufficiently covers the area that it is intended to cover. This is particularly important in ability or attainment/achievement tests that validate skills or knowledge in a particular domain.
- Content Under-Representation occurs when important areas are missed. Construct-Irrelevant Variation occurs when irrelevant factors contaminate the test.

1. Content Validity

- Content Validity has 4 Steps
- 1. Defining the performance domain of interest
- 2. Selecting a panel of qualified experts in the content domain.
- 3. Providing a structured framework (instruction) for the process of matching items (questions) to the performance domain (answers).
- 4. Collecting and summarizing the data from the matching process.

1. Content Validity

- Content Validity has 4 Steps
- 1. Defining the performance domain of interest
- Ex. Ask yourself: what am I trying to measure? Psych, stats, English?

1. Content Validity

- 2. Selecting a panel of qualified experts in the content domain.
- Ex. Select expert statisticians to review your stats questions. Another ex. Qualifying exam questions.

1. Content Validity

- 3. Providing a structured framework (instruction) for the process of matching items (questions) to the performance domain (answers).
- Ex. Go back 4 slides and see Question 3

1. Content Validity

- 4. Collecting and summarizing the data from the matching process.
- Select and collect a sample of these relevant questions (items).

1. Content Validity

- Practical Considerations in Content Validity
- Content validity requires the following 4 decisions (questions).
- 1. Should objectives be weighted to reflect their importance? Ex. Next slide

1. Content Validity

- 2. How should the item-matching task be structured? Ex. Next slide
- 3. What aspects of the item should be examined? Ex. Next slide
- 4. How should results be summarized? Ex. Next slide

1. Content Validity

- 1. Should objectives be weighted to reflect their importance?
- In content validity we should rate the importance of the objectives. The designer of the test should provide a scale, such as a rubric, for measuring the objectives in a test. This also helps you to measure the inter-rater reliability of a test more accurately.

1. Content Validity

- 2. How should the item-matching task be structured?
- Katz (1958) suggested that the expert reviewers should read the item and identify the correct/best response.
- Hambleton's (1980) idea was that the experts should rate the degree of matching to a specific objective by using a 5-point scale:
- poor fit 1____2____3____4____5 excellent fit

1. Content Validity

- 3. What aspects of the item should be examined?
- We should have a clear description of the item and the domain in order to match the item(s) to a performance domain or domain characteristics.
- Ex. Go back to Question 6

1. Content Validity

- 4. How should results be summarized?
- There are 5 ways (read p. 221):
- 1. Percentage of items matched to objectives
- 2. Percentage of items matched to objectives with high importance ratings
- 3. Correlation between the importance weighting of objectives and the number of items measuring those objectives
- 4. Index of item-objective congruence
- 5. Percentage of objectives not assessed by any of the items on the test
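Three of the summaries listed above (ways 1, 2, and 5) reduce to simple percentages, sketched below in Python. The data layout (a mapping from each item to the objective the experts matched it to, plus an importance rating per objective), the threshold of 4 for "high importance", and all names are illustrative assumptions, not taken from the textbook.

```python
def summarize_matching(item_to_objective, importance, high=4):
    """Summarize expert item-objective matching as percentages.

    item_to_objective: dict mapping item id -> matched objective id, or None.
    importance: dict mapping objective id -> importance rating (e.g. 1 to 5).
    high: assumed threshold at or above which an objective is 'high importance'.
    """
    items = list(item_to_objective)
    matched = [i for i in items if item_to_objective[i] is not None]
    matched_high = [i for i in matched
                    if importance[item_to_objective[i]] >= high]
    assessed = {item_to_objective[i] for i in matched}
    unassessed = [obj for obj in importance if obj not in assessed]

    return {
        "pct_items_matched": 100 * len(matched) / len(items),
        "pct_items_matched_high": 100 * len(matched_high) / len(items),
        "pct_objectives_unassessed": 100 * len(unassessed) / len(importance),
    }


# Hypothetical panel result: 5 items, 4 objectives.
importance = {"O1": 5, "O2": 3, "O3": 4, "O4": 2}
matches = {"q1": "O1", "q2": "O1", "q3": "O2", "q4": None, "q5": "O3"}
summary = summarize_matching(matches, importance)
print(summary)
```

Here 80% of the items are matched to some objective, 60% are matched to a high-importance objective, and 25% of the objectives (O4) are not assessed by any item.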

2. Criterion Related Validity

- Criterion-Related Validity is a measure of the extent to which a test is related to some criterion, or how well a test estimates/predicts a performance.
- Ex. The SAT would be a predictor of college performance; the GRE, of graduate performance; the EPPP, of psychologist performance; and the driver's license test, of knowledge of basic traffic signs and signals and/or driving performance.

2. Criterion Related Validity

- Criterion-Related Validity is concerned with how well a test either estimates current performance (Concurrent Validity) or predicts future performance (Predictive Validity). Ex. EPPP exam

Ex. of Concurrent and Predictive Validity

- Researchers want to know if 6th-grade students' math scores are valid. They give students a test designed to measure mathematical aptitude for 6th graders.
- They then compare and correlate these scores with the test scores already held by the teachers (midterm scores) to obtain r.

Ex. of Concurrent and Predictive Validity

- They evaluate the accuracy of their test and decide whether it measures what it is supposed to. The key element is that the two methods were compared at about the same time (concurrent), or only a few days apart.

Ex. of Concurrent and Predictive Validity

- However, if the researchers had measured the mathematical aptitude, implemented a new educational program, and then retested the students after six months, this would be predictive validity.

2. Criterion Related Validity

- Concurrent validity is measured by comparing two tests done at the same time, for example a written test and a hands-on exercise that seek to assess the same criterion. This can be used to limit criterion errors. Ex. For the diagnosis of depression: clinical interview and BDI-II.

2. Criterion Related Validity

- Predictive validity, by contrast, compares success in the test with actual success in the future job. The test is then adjusted over time to improve its validity.
- Ex. EPPP exam and psychologist performance

2. Criterion Related Validity

- Criterion-related validity
- Criterion-related validity is like construct validity, but relates the test to some external criterion, such as particular aspects of the job.
- There is a danger that the external criterion is selected based on its convenience rather than being a full representation of the job. Ex. An air traffic control test may use a limited set of scenarios.

2. Criterion Related Validity

- The general design of a criterion-related validity study has the following 5 steps (p. 224):
- 1. Identify a suitable criterion behavior (depression) and a method for measuring it (your depression test).
- 2. Identify an appropriate sample of examinees (depressed patients) representative of those for whom the test will ultimately be used.

2. Criterion Related Validity

- 3. Administer the test and keep a record of each examinee's score.
- 4. When the criterion data are available, obtain a measure of performance on the criterion for each examinee (1. mild, 2. moderate, 3. severe).
- 5. Determine the strength of the relationship between test scores and criterion performance. Ex. The relationship between the teacher's math scores and the researchers' math scores (the researchers determine the criterion performance): r = ?
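Step 5 is usually carried out with a Pearson correlation between the test scores and the criterion measure. The following Python is a minimal sketch using the sum-of-products form, r = SP / sqrt(SS_X * SS_Y); the function name and the five pairs of scores are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed as SP / sqrt(SS_x * SS_y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # sum of products
    ss_x = sum((a - mean_x) ** 2 for a in x)  # sum of squares for X
    ss_y = sum((b - mean_y) ** 2 for b in y)  # sum of squares for Y
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical data: depression test scores and clinician-rated severity
# on the criterion (1 = mild, 2 = moderate, 3 = severe).
test_scores = [10, 12, 15, 18, 20]
criterion_levels = [1, 1, 2, 3, 3]
validity_coefficient = pearson_r(test_scores, criterion_levels)
print(round(validity_coefficient, 3))
```

With a coarse three-level criterion like this one, a rank-based coefficient could also be reasonable; the Pearson form is used here only because it is the one the slides refer to.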

3. Construct Validity

- 3. Construct Validity
- A test has construct validity if it accurately measures a theoretical, non-observable construct or trait (i.e., intelligence, motivation, depression, anxiety, stats, biology, etc.). Ex. The relationship between the clinical interview's symptoms/characteristics of depression (which is the underlying construct) and the scores on the BDI-II (mild, moderate, severe).

3. Construct Validity

- 3. Construct Validity
- Construct-Irrelevant Variation occurs when irrelevant factors contaminate the test.

3. Construct Validity

- Construct validity
- Underlying many tests is a construct or theory that is being assessed.
- Ex. There are a number of tests/constructs for describing intelligence (spatial ability, verbal reasoning, processing speed, etc.), which the test will individually assess.

3. Construct Validity

- Constructs can be about causes, about effects, and about the cause-effect relationship.
- If the construct is not valid, then the test that is based on it will not be valid.
- Ex. Historically, there was a construct that intelligence is based on the size and shape of the skull (phrenology).

3. Construct Validity Measurements

- Multitrait-Multimethod Matrix
- Campbell and Fiske (1959) described this approach as concerned with the adequacy of tests as measures of a construct/trait. With this technique the researcher must think of two or more ways (methods) to measure the construct/trait of interest. Next slide

3. Construct Validity Measurements

- For example, 1. True-False, 2. Forced Choice, and 3. Incomplete Sentences are methods, and A. sex-guilt, B. hostility-guilt, and C. morality-conscience are traits or constructs. Using one sample of subjects, measurements are obtained by the same or different methods. Compare the correlation between the two measurements and identify one of the 3 types. Next slide

3. Construct Validity

- 1. Reliability Coefficient: using the same measurement method for the same trait. It is like test-retest reliability (you use the same trait and method twice). Ideally r should be high. See Table 10.2 on next slide
- 2. Convergent Validity Coefficient: using different measurement methods but the same trait. It is like parallel-forms reliability, i.e., form A and form B. Ideally r should be high. The 2 measurement methods or the 2 variables converge (come together), so it is called the Convergent Validity Coefficient. See Table 10.2 on next slide

3. Construct Validity (Construct = Trait)

- 3. Divergent or Discriminant Validity Coefficient (2 different kinds)
- A. Correlations between measures of different constructs (traits) using the same measurement method (Heterotrait-Monomethod Coefficient).
- Or, B. using different measurement methods for different constructs (traits) (Heterotrait-Heteromethod Coefficient).
- Ideally there is low or no relationship between the variables. They diverge (come apart), so it is called the Divergent Validity Coefficient.
- See Table 10.2 on next slide
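The three kinds of entries in a multitrait-multimethod matrix can be told apart mechanically from which trait and which method each of the two measures uses. The Python below is a small sketch of that bookkeeping; the function name and the (trait, method) tuple representation are illustrative assumptions, and the labels follow the wording of the slides.

```python
def mtmm_coefficient_type(measure_a, measure_b):
    """Name the kind of correlation between two (trait, method) measures."""
    trait_a, method_a = measure_a
    trait_b, method_b = measure_b
    same_trait = trait_a == trait_b
    same_method = method_a == method_b

    if same_trait and same_method:
        return "reliability coefficient (ideally high)"
    if same_trait:
        return "convergent validity coefficient (ideally high)"
    if same_method:
        return "divergent validity: heterotrait-monomethod (ideally low)"
    return "divergent validity: heterotrait-heteromethod (ideally low)"


# Traits and methods named in the example from the slides.
sex_guilt_tf = ("sex-guilt", "true-false")
sex_guilt_fc = ("sex-guilt", "forced choice")
hostility_tf = ("hostility-guilt", "true-false")
morality_is = ("morality-conscience", "incomplete sentences")

print(mtmm_coefficient_type(sex_guilt_tf, sex_guilt_fc))
print(mtmm_coefficient_type(sex_guilt_tf, hostility_tf))
print(mtmm_coefficient_type(sex_guilt_fc, morality_is))
```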

3. Construct Validity

- Factor Analysis
- Exploratory and Confirmatory
- Factor analysis is another way to measure the validity of a test. It is about data reduction.
- Raymond Cattell, in his 16 PF, reduced 4500 personality-related questions to 187 questions and 16 related variables or factors. Next slide

| Descriptors of Low Range | Primary Factor | Descriptors of High Range |
|---|---|---|
| Impersonal, distant, cool, reserved, detached, formal, aloof | Warmth (A) | Warm, outgoing, attentive to others, kindly, easy-going, participating, likes people |
| Concrete thinking, lower general mental capacity, less intelligent, unable to handle abstract problems | Reasoning (B) | Abstract-thinking, more intelligent, bright, higher general mental capacity, fast learner |
| Reactive emotionally, changeable, affected by feelings, emotionally less stable, easily upset | Emotional Stability (C) | Emotionally stable, adaptive, mature, faces reality calmly |
| Deferential, cooperative, avoids conflict, submissive, humble, obedient, easily led, docile, accommodating | Dominance (E) | Dominant, forceful, assertive, aggressive, competitive, stubborn, bossy |
| Serious, restrained, prudent, taciturn, introspective, silent | Liveliness (F) | Lively, animated, spontaneous, enthusiastic, happy-go-lucky, cheerful, expressive, impulsive |
| Expedient, nonconforming, disregards rules, self-indulgent | Rule-Consciousness (G) | Rule-conscious, dutiful, conscientious, conforming, moralistic, staid, rule bound |
| Shy, threat-sensitive, timid, hesitant, intimidated | Social Boldness (H) | Socially bold, venturesome, thick-skinned, uninhibited |
| Utilitarian, objective, unsentimental, tough minded, self-reliant, no-nonsense, rough | Sensitivity (I) | Sensitive, aesthetic, sentimental, tender-minded, intuitive, refined |
| Trusting, unsuspecting, accepting, unconditional, easy | Vigilance (L) | Vigilant, suspicious, skeptical, distrustful, oppositional |
| Grounded, practical, prosaic, solution oriented, steady, conventional | Abstractedness (M) | Abstract, imaginative, absent minded, impractical, absorbed in ideas |
| Forthright, genuine, artless, open, guileless, naive, unpretentious, involved | Privateness (N) | Private, discreet, nondisclosing, shrewd, polished, worldly, astute, diplomatic |
| Self-assured, unworried, complacent, secure, free of guilt, confident, self-satisfied | Apprehension (O) | Apprehensive, self-doubting, worried, guilt prone, insecure, worrying, self blaming |
| Traditional, attached to familiar, conservative, respecting traditional ideas | Openness to Change (Q1) | Open to change, experimental, liberal, analytical, critical, free-thinking, flexibility |
| Group-oriented, affiliative, a joiner and follower, dependent | Self-Reliance (Q2) | Self-reliant, solitary, resourceful, individualistic, self-sufficient |
| Tolerates disorder, unexacting, flexible, undisciplined, lax, self-conflict, impulsive, careless of social rules, uncontrolled | Perfectionism (Q3) | Perfectionistic, organized, compulsive, self-disciplined, socially precise, exacting will power, control, self-sentimental |
| Relaxed, placid, tranquil, torpid, patient, composed, low drive | Tension (Q4) | Tense, high energy, impatient, driven, frustrated, overwrought, time driven |

Primary Factors and Descriptors in Cattell's 16 Personality Factor Model (Adapted from Conn & Rieke, 1994)

Raymond Cattell's 16 Personality Factors

- 3. Construct Validity
- Construct validation has the following 4 steps (the same as research hypothesis testing):
- Formulate one or more hypotheses (state your hypothesis). Ex. Stress and depression.
- Select (or develop) a measurement instrument.
- Gather empirical data to test your hypotheses (collect your data and calculate the statistics).
- Determine if the data are consistent with the hypotheses (do your stats and make a decision).

Validity Coefficient

- The validity coefficient
- The validity coefficient is calculated as a correlation between the two items (variables) being compared, very typically success in the test as compared with success in the job.
- A validity of 0.6 and above is considered high, which suggests that very few tests give strong indications of job performance.

Validity Coefficients for True Scores

- The validity coefficient is like the reliability and generalizability coefficients.
- $r_{XY} = \dfrac{SP}{\sqrt{SS_X \cdot SS_Y}}$ (Pearson correlation coefficient)
- $\rho_{X_t Y_t} = \dfrac{\rho_{XY}}{\sqrt{\rho_{XX} \cdot \rho_{YY}}}$ This formula is sometimes called the correction for attenuation because it gives a validity coefficient that is corrected for errors of measurement in the predictor (X) and criterion (Y).
- $\rho_{X_t Y_t}$ = validity coefficient for true scores
- $\rho_{XY}$ = ρ value of X and Y (plays the role of $SP$) = 0.5
- $\rho_{XX}$ = ρ value for the reliability of X (plays the role of $SS_X$) = 0.6
- $\rho_{YY}$ = ρ value for the reliability of Y (plays the role of $SS_Y$) = 0.5
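Plugging the three values from this slide into the correction for attenuation gives a concrete number. The Python below is a minimal sketch of that arithmetic, assuming the second and third values (0.6 and 0.5) are the reliabilities of X and Y as the standard formula requires; the function name is hypothetical.

```python
import math


def correct_for_attenuation(rho_xy, rho_xx, rho_yy):
    """Estimated true-score validity: rho_xy / sqrt(rho_xx * rho_yy).

    rho_xy: observed validity coefficient between predictor X and criterion Y.
    rho_xx, rho_yy: reliability coefficients of X and Y (assumed > 0).
    """
    return rho_xy / math.sqrt(rho_xx * rho_yy)


# Values from the slide: observed validity 0.5, reliabilities 0.6 and 0.5.
true_score_validity = correct_for_attenuation(0.5, 0.6, 0.5)
print(round(true_score_validity, 3))
```

The corrected coefficient (about 0.913) is larger than the observed 0.5 because unreliability in either measure pulls the observed correlation down.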

The Relationship between Reliability and Validity

- If a test is unreliable ($\rho_R = 0$), it cannot be valid. If a test is reliable ($\rho_R = .6$), that doesn't mean it is valid. However, if data are valid, they must be reliable; therefore, for example in psychology, if a psychological test is valid ($\rho_V = .90$), it is also reliable.

The Relationship between Reliability and Validity

- Mathematically, $\rho_V \le \sqrt{\rho_R}$
- This means the criterion-related validity coefficient cannot exceed the square root of the predictor's reliability coefficient.
- Ex. If the reliability coefficient is $\rho_R = .81$,
- then the validity coefficient is $\rho_V \le \sqrt{.81}$, which is .90
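One way to see where this bound comes from is the correction for attenuation. The short derivation below is a sketch under the usual classical-test-theory assumptions; the symbols $\rho_{XX}$ and $\rho_{YY}$ stand for the reliabilities of the predictor and the criterion, and $\rho_{XY}$ for the observed validity coefficient.

```latex
\begin{align*}
\rho_{X_t Y_t} = \frac{\rho_{XY}}{\sqrt{\rho_{XX}\,\rho_{YY}}} \le 1
  \quad &\Rightarrow \quad \rho_{XY} \le \sqrt{\rho_{XX}\,\rho_{YY}} \\
\rho_{YY} \le 1
  \quad &\Rightarrow \quad \rho_{XY} \le \sqrt{\rho_{XX}} \\
\rho_{XX} = .81
  \quad &\Rightarrow \quad \rho_{XY} \le \sqrt{.81} = .90
\end{align*}
```

The first line uses the fact that a correlation between true scores cannot exceed 1; the second drops the criterion reliability, which is itself at most 1.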

The Relationship between Reliability and Validity

- If someone who weighs 200 pounds steps on a scale 10 times and gets different readings of 15, 250, 95, 140, etc., the scale is not reliable. If the scale consistently reads "150", then it is reliable but not valid. If it reads "200" each time, then the measurement is both reliable and valid. This is what is meant by the statement, "Reliability is necessary but not sufficient for validity." A test cannot be valid and not reliable.
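The scale example can be turned into a toy check in Python. In this sketch "reliable" means the readings stay within a small spread of each other, and "valid" means the scale is reliable and its average reading is close to the true weight. The 2-pound tolerance and the function name are arbitrary assumptions chosen only to make the example concrete.

```python
def assess_scale(readings, true_weight, tolerance=2.0):
    """Return (reliable, valid) for repeated readings of a known weight."""
    spread = max(readings) - min(readings)
    mean_reading = sum(readings) / len(readings)

    reliable = spread <= tolerance
    # Validity requires reliability first, then closeness to the true value.
    valid = reliable and abs(mean_reading - true_weight) <= tolerance
    return reliable, valid


print(assess_scale([15, 250, 95, 140], true_weight=200))  # inconsistent readings
print(assess_scale([150] * 10, true_weight=200))          # consistent but wrong
print(assess_scale([200] * 10, true_weight=200))          # consistent and right
```

Note that the function can never return "valid but not reliable", which mirrors the statement that reliability is necessary but not sufficient for validity.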

Relationship between reliability and validity

- If data are valid, they must be reliable. If people receive very different scores on a test every time they take it, the test is not likely to predict anything. However, if a test is reliable, that does not mean that it is valid. For example, we can measure strength of grip very reliably, but that does not make it a valid measure of intelligence or even of mechanical ability. Reliability is a necessary, but not sufficient, condition for validity.