Physical Ability Testing and Practical Examinations: They Fought the Law and the Law Won

1
Physical Ability Testing and Practical
Examinations: They Fought the Law and the Law
Won
Expect the Unexpected: Are We Clearly Prepared?

Nikki Shepherd Eatchel, M.A.
Vice President, Test Development
Thomson Prometric

Robin Rome, Esq.
Vice President, Legal and Contracts
Thomson Prometric

Council on Licensure, Enforcement and Regulation
2006 Annual Conference
Alexandria, Virginia
2
Physical Ability Testing and Practical Exams

Goals for today's presentation:
- Outline the major risk factors for physical
  ability and practical examinations
- Recommend specific developmental activities and
  other measures that will help an exam withstand
  a legal challenge
- Provide recommendations for evaluating exams
  developed by you or for you

3
Physical Ability and Practical Exams: Challenges
to Validity

Although all employment, certification, and
licensure testing is open to challenge, exams
designed to physically assess a candidate's
performance on specific job skills and tasks are
often more vulnerable to challenge than objective
written exams.

4
Physical Ability and Practical Exams: Challenges
to Validity

Examples of physical ability and practical exams:
- Firefighter certification
- Police officer pre-employment
- Nursing practical for licensure
- Corporate product certification
- Food safety practical for licensure

5
Physical Ability and Practical Exams: Challenges
to Validity

Why are physical ability and practical exams more
vulnerable to challenge?
- Reliance on exam rater judgments regarding how a
  task was performed introduces the possibility of
  error in the assessment of the skill or task
  (human error)
- Often, when only one rater is used to assess a
  candidate, there is an increased likelihood of
  disagreement between the rater and the candidate
- Physical ability exams typically have greater
  adverse impact upon protected groups than the
  written exams involved in an employment,
  certification, or licensure process (though
  practical exams do not tend to show the same
  pattern)
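
Adverse impact of the kind noted above is commonly screened with the four-fifths rule of thumb in the Uniform Guidelines (29 CFR 1607.4(D)): a group whose selection rate is below 80% of the highest group's rate is generally regarded as showing evidence of adverse impact. The following is a minimal sketch of that arithmetic only. The group names, counts, and function names are hypothetical, and a real analysis would also weigh sample size and statistical significance.

```python
def selection_rate(passed, tested):
    """Fraction of tested candidates who passed."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return passed / tested


def impact_ratios(groups):
    """Map each group to its selection rate divided by the highest rate.

    groups: dict of group name -> (number passed, number tested).
    """
    rates = {name: selection_rate(p, t) for name, (p, t) in groups.items()}
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group has any passing candidates")
    return {name: rate / highest for name, rate in rates.items()}


def flag_adverse_impact(groups, threshold=0.8):
    """Return the groups whose impact ratio falls below the threshold."""
    ratios = impact_ratios(groups)
    return sorted(name for name, ratio in ratios.items() if ratio < threshold)


# Hypothetical pass counts for two candidate groups.
example = {"group_a": (60, 100), "group_b": (20, 50)}
print(impact_ratios(example))
print(flag_adverse_impact(example))
```

A flagged group does not by itself make an exam unlawful. It shifts attention to whether the exam can be shown to be job related, which is the subject of the slides that follow.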

6
Standards Used For Exam Evaluation

Two sets of standards are often used to guide
the development and evaluation of exams:
- Standards for Educational and Psychological
  Testing, 1999
  Developed jointly by the American Educational
  Research Association (AERA), the American
  Psychological Association (APA), and the
  National Council on Measurement in Education
  (NCME)
- Uniform Guidelines on Employee Selection
  Procedures, 1978
  Developed by the Equal Employment Opportunity
  Commission (EEOC)

7
Standards Used For Exam Evaluation

Although both sets of standards contain valuable
information regarding the development process
(and both should be considered when developing a
testing program), courts more frequently refer to
the Uniform Guidelines as the resource for
evaluating exams.
The Uniform Guidelines are entitled to great
deference by courts deciding whether selection
devices such as physical ability or practical
tests comply with Title VII.
Griggs v. Duke Power Co., 401 U.S. 424, 434 (1971)

8
Physical Ability and Practical Exams: Challenges
to Validity

What are the aspects of an examination that are
most likely to be scrutinized if the validity of a
physical ability or practical exam is challenged?

9
Physical Ability and Practical Exams: Challenges
to Validity

- Job Analysis
- Criterion-Related Validity
- Cut Score
- Rater Training
- Candidate Appeal Process

10
Physical Ability and Practical Exams: Job Analysis

A job analysis is crucial in establishing that
the content of the physical ability or practical
exam is valid. Key components of the job
analysis include:
- Content Validity
- Validity Generalization
- Adequate and Diverse Sample Sizes

11
Job Analysis - Content Validity

Although there are multiple validity methods that
can be used during the test development process,
the foundation for acceptable development practice
continues to reside with traditional content
validity methods.
Supplemental validity methods are typically seen
as beneficial, yet not sufficient, when courts
evaluate testing processes.

12
Job Analysis - Content Validity

When evidence of validity based on test content
is presented, the rationale for defining and
describing a specific job content domain in a
particular way (e.g., in terms of tasks to be
performed or knowledge, skills, abilities, or
other personal characteristics) should be stated
clearly.
Standard 14.9

A job analysis is necessary to identify the
knowledge, skills and abilities necessary for
successful job performance.
A selection procedure can be supported by a
content validity strategy to the extent that it
is a representative sample of the content of the
job.
Guidelines, 29 CFR 1607.14(C)(1)

13
Job Analysis - Content Validity
Case Study: Williams v. Ford

Facts
- Class action claiming that a pre-employment test
  for unskilled hourly production workers, the
  Hourly Selection System Test Battery (HSSTB),
  discriminated against African Americans.
- Physical/practical parts of the HSSTB measured
  parts assembly, visual speed and accuracy, and
  precision/manual dexterity.

14
Job Analysis - Content Validity
Case Study: Williams v. Ford (cont'd)

Plaintiffs' Position
- Disparate impact discrimination, i.e., African
  Americans failed or scored lower on the test in
  disproportionately high numbers when compared to
  whites.
- The HSSTB was not content valid because the job
  analysis failed to demonstrate a clear linkage of
  specific requirements.

Ford's Position
- The HSSTB was content valid as supported by a job
  analysis.
- The job analysis consisted of:
  - Supervisor identification of job inventories
  - Supervisor rating of the importance of job
    requirements and job abilities identified in
    the inventories
  - Analysis of reliability ratings and data to
    identify key job requirements
  - Development of a test to measure the skills
    needed to perform the job requirements rated
    as important

15
Job Analysis - Content Validity
Case Study: Williams v. Ford (cont'd)

Holding
- Ford demonstrated that the HSSTB was content
  valid.
Reasoning
- Ford had the burden of showing that the HSSTB
  was job related.
- Ford had to show, by professionally acceptable
  methods, that the test is predictive of or
  significantly correlated with important elements
  of work behavior that comprise or are relevant to
  the job or jobs for which the candidates are
  being evaluated. Williams v. Ford, 187 F.3d 533,
  539 (6th Cir. 1999).
- Ford met this burden by showing that the HSSTB
  was content valid: it used a professional test
  developer to conduct a job analysis that complied
  with the EEOC Guidelines.

16
Job Analysis - Validity Generalization

An issue often referred to in test development is
validity generalization. Validity generalization is
defined as:
Applying validity evidence obtained in one or
more situations to other similar situations on the
basis of simultaneous estimation, meta-analysis,
or synthetic validation arguments.
Standards, 1999, p. 184

17
Job Analysis - Validity Generalization

Transfer of validity work from one demographic
and/or geographic area to another, while certainly
possible when based on good initial validity work
and a clear delineation of the original and
secondary populations, has not been well received
by courts as a defensible practice.
This has typically been due to a lack of
appropriate documentation regarding the similarity
of both the populations involved with the
generalization and the interpretations resulting
from the instrument.

18
Job Analysis - Validity Generalization
Case Study: Legault v. aRusso

Facts
- Challenge to physical abilities tests used to
  select fire department recruits.
- Selection process included a four-part pass/fail
  physical abilities test involving climbing a
  ladder, moving a ladder from a fire engine,
  running 1.5 miles in 12 minutes, and carrying and
  pulling a fire hose. It also included a separate
  physical abilities test focusing on a balance
  beam, second hose pull and obstacle course.

19
Job Analysis - Validity Generalization
Case Study: Legault v. aRusso (cont'd)

Holding
- Fire department failed to show the physical
  abilities tests were job related.
Reasoning
- The job analysis relied on by the fire department
  was not current or specific
  - Validity was not supported by a
    several-year-old job specification that
    described the firefighter's general duties.
    Legault v. aRusso, 842 F. Supp. 1479, 1488
    (D.N.H. 1994)
  - Validity was not supported by a specification
    identifying only general tasks (e.g.,
    strenuous physical exertion, operating
    equipment and appurtenances of heavy apparatus,
    etc.). The specification also failed to break
    these tasks into component skills, assess their
    relative importance or indicate the level of
    proficiency required.
- The physical abilities tests were not valid
  simply because they were similar to those used by
  other cities
  - There was no evidence that these similar tests
    had been validated, and "follow the leader" is
    not an acceptable means of test validation.
    Legault, 842 F. Supp. at 1488.

20
Job Analysis - Adequate and Diverse Sample Sizes

Adequate and diverse sample sizes are a necessity
for ensuring validity and increasing the
defensibility of an exam.

A description of how the research sample compares
with the relevant labor market or work force,
. . ., and a discussion of the likely effects on
validity of differences between the sample and the
relevant labor market or work force, are also
desirable. Descriptions of educational levels,
length of service, and age are also desirable.

Whether the study is predictive or concurrent,
the sample subjects should insofar as feasible be
representative of the candidates normally
available in the relevant labor market for the job
or group of jobs in question . . .
Uniform Guidelines

21
Job Analysis - Adequate and Diverse Sample Sizes
Case Study: Blake v. City of Los Angeles

Facts
- Female applicants challenged the police
  department's height requirement and physical
  abilities test.
- Applicants were required to be 5'6" and to pass
  a physical abilities test including scaling a
  wall, hanging, weight dragging and endurance
  within specific parameters.

22
Job Analysis - Adequate and Diverse Sample Sizes
Case Study: Blake v. City of Los Angeles (cont'd)

Plaintiffs' Position
- Challenged the methodology and findings of
  validation studies presented by the City.
- The validation studies relating to the height
  requirement did not include the individuals whom
  the police department was seeking to reject,
  i.e., those under 5'6".
- The validation studies relating to the physical
  abilities test did not include those who failed
  the test and tested only success during academy
  training, not success on the job.

The City's Position
- The height requirement was job related. The City
  offered validation studies correlating height to
  performance:
  - Questionnaire showing that taller officers tend
    to use more force and experience less suspect
    resistance
  - Simulations demonstrating that taller officers
    performed bar-arm control better than shorter
    officers
- The physical abilities test was job related. The
  City offered validation studies correlating the
  skills tested to measures of success during
  academy training and to on-the-job requirements
  (e.g., foot pursuit, field shooting and emergency
  rescue).

23
Job Analysis - Adequate and Diverse Sample Sizes
Case Study: Blake v. City of Los Angeles (cont'd)

Holding
- The validation studies did not demonstrate that
  the height requirement and physical abilities
  tests were job related.
Reasoning
- The validation studies did not reflect an
  adequate and diverse sampling.
- The City failed to demonstrate the height
  requirement was job related because persons
  shorter than 5'6" were not included in the
  validation study (the study included individuals
  from 5'8" to 6'2").
- The City failed to demonstrate the physical
  abilities test was job related because the
  validation study relied on measures of training
  success without showing that those measures were
  significantly related to job performance.

24
Criterion-Related Validity

When possible, the collection of
criterion-related validity evidence is extremely
helpful in the defense of a physical ability test
or practical exam.
A criterion-related study should consist of
empirical data demonstrating that the selection
procedure is predictive of or significantly
correlated with important elements of
performance.
Guidelines, 29 CFR 1607.5(B)

25
Criterion-Related Validity

The goal of criterion-related validity is to show
a significant relationship between how candidates
perform on an exam and how they subsequently
perform on the job (with higher scores resulting
in better performance).
This can be accomplished through the use of
concurrent or predictive criterion-related
validity.
Job Ratings
Promotional Exams
Etc.
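
The relationship described above is usually summarized as a correlation between exam scores and a criterion measure such as job ratings. The following is a minimal sketch that computes a Pearson correlation coefficient by hand. The data and the function name are hypothetical, and an actual validation study would also test statistical significance and check that the sample is adequate and representative.

```python
from math import sqrt


def pearson_r(exam_scores, criterion_scores):
    """Pearson correlation between exam scores and a criterion measure."""
    if len(exam_scores) != len(criterion_scores):
        raise ValueError("both lists must have the same length")
    n = len(exam_scores)
    if n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(exam_scores) / n
    mean_y = sum(criterion_scores) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(exam_scores, criterion_scores))
    sxx = sum((x - mean_x) ** 2 for x in exam_scores)
    syy = sum((y - mean_y) ** 2 for y in criterion_scores)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: practical exam scores and later supervisor job ratings
# for the same six employees.
exam = [62, 70, 75, 81, 88, 93]
ratings = [2.5, 3.0, 3.1, 3.8, 4.0, 4.6]
print(round(pearson_r(exam, ratings), 3))
```

In a concurrent study the criterion data come from current employees at about the same time as the exam. In a predictive study the criterion is collected after candidates have been hired and have worked for some period.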

26
Job Analysis - Criterion-Related Validity
Case Study: Zamlen v. City of Cleveland

Facts
- Female plaintiffs challenged the rank-order and
  physical abilities selection examination for
  firefighters.
- The physical abilities test required three
  events: an overhead lift using barbells, a fire
  scene set-up and tower climb, and a dummy drag.

27
Job Analysis - Criterion-Related Validity
Case Study: Zamlen v. City of Cleveland (cont'd)

Plaintiffs' Position
- The physical abilities test did not test for
  attributes identified in the City's job analysis
  as important to an effective firefighter.
- The test measured attributes in which men
  traditionally excel, such as speed and strength
  (anaerobic traits), and ignored attributes in
  which women traditionally excel, such as stamina
  and endurance (aerobic traits).

The City's Position
- The test was created by a psychologist with
  significant experience developing tests for
  municipalities.
- The physical abilities test measured attributes
  related to specific job skills.

28
Job Analysis - Criterion-Related Validity
Case Study: Zamlen v. City of Cleveland (cont'd)

Holding
- The physical abilities test was valid since it
  was based on a criterion-related study.
Reasoning
- Referred to an earlier case, Berkman v. City of
  New York, 812 F.2d 52 (2d Cir. 1987), in which
  the court held that although aerobic attributes
  are an important component of firefighting, the
  City's failure to include physical ability events
  that tested for such attributes did not
  invalidate the examination.
- Given the extensive job analysis performed,
  although a simulated firefighting examination
  that does not test for stamina in addition to
  anaerobic capacity may be a less effective
  barometer of firefighting abilities than one that
  does include an aerobic component, the
  deficiencies of this examination are not of the
  magnitude to render it defective, and vulnerable
  to a Title VII challenge. Zamlen, 906 F.2d 209,
  219 (6th Cir. 1990).

29
Physical Ability and Practical Exams: Cut Score

The setting of an examination cut score is
perhaps the most controversial step within the
test development process, as it is this step that
has the most obvious impact on the candidate
population.
The Uniform Guidelines state the following in
regard to the determination of the cut score:
Where cutoff scores are used, they should
normally be set so as to be reasonable and
consistent with normal expectations of acceptable
proficiency within the work force.

30
Cut Score
Case Study: Lanning v. SEPTA

Facts
- Title VII class action challenging SEPTA's
  requirement that applicants for the job of
  transit police officer be able to run 1.5 miles
  in 12 minutes.
- In prior related cases, it was established that
  the running requirement was job related. The
  sole issue before the court was whether the
  cutoff was valid.

31
Cut Score
Case Study: Lanning v. SEPTA (cont'd)

Holding
- The cutoff established by SEPTA was valid.
Reasoning
- The court looked at whether the cutoff
  measured the minimum qualifications necessary
  for the successful performance of a transit
  police officer.
- Studies introduced by SEPTA showed a
  statistical link between success on the run
  test and the performance of identified job
  standards. Individuals who passed the run test
  had a success rate of 70% to 90%, and individuals
  who failed the run test had a success rate of 5%
  to 20%.
- The court emphasized that the cutoff does not
  need to reflect a 100% rate of success, but
  there should be a showing of why the cutoff is
  an objective measure of the minimum
  qualifications for successful performance.

32
A Good Defense

Many organizations spend a considerable amount of
time and money on the valid and defensible
development of a practical exam or a physical
ability test.
Surprisingly, after this substantial investment
in the development of these exams, some
organizations fail to establish appropriate
training for the raters involved in the
administration of the exam.

33
A Good Defense

When using practical exams or physical ability
tests, there are two aspects of the testing
program that, when well established, can reduce
the likelihood of a challenge:
1. Rater Training
2. Candidate Appeal Process

34
Rater Training

Proper rater training is key to minimizing
challenges to a practical exam or physical ability
test.
- Standardized training materials and sessions
- Inter-Rater and Intra-Rater Reliability Studies
- Follow-up training

35
Rater Training - Standardized Materials

Practical exams and physical ability tests rely
on examination raters to identify whether or not
a candidate performed the activity or event
appropriately. One way to reduce challenges to
this type of exam is to have a robust training
program that is required of all raters on a
regular basis.

36
Rater Training - Standardized Materials

Standardized materials can include the following
components:
1. Train the Trainer Manual/Materials
2. Examination Rater Manual
3. Examination Rater Video
4. Rater Checklist

37
Rater Training - Rater Reliability

When subjective judgment enters into test
scoring, evidence should be provided on both
inter-rater consistency in scoring and
within-examinee consistency over repeated
measurements.
Standard 2.13

- Does an individual rater apply the testing
  standards consistently across multiple
  candidates?
- Do groups of raters rate the same candidate
  consistently?
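
One common way to quantify the second question is Cohen's kappa, which measures how much two raters agree beyond what chance alone would produce. The following is a minimal sketch for two raters scoring the same candidates. The ratings and the function name are hypothetical, and kappa is only one of several agreement statistics a program might choose; the Standards do not prescribe a particular one.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same candidates."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty candidate list")
    n = len(rater_a)
    # Proportion of candidates on which the two raters gave the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own rating frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical pass/fail ratings of the same four candidates by two raters.
rater_1 = ["pass", "pass", "fail", "fail"]
rater_2 = ["pass", "fail", "fail", "fail"]
print(cohens_kappa(rater_1, rater_2))
```

A value near 1 indicates strong agreement and a value near 0 indicates agreement no better than chance. Low values during training are a signal to revisit the rating standards before raters score live candidates.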

38
Rater Training - Rater Reliability

Rater Reliability during the training process
Part of the rater training process should involve
groups of raters rating the same performance, to
evaluate whether or not a consistent testing
standard is being applied.
This process should include an opportunity for
all raters to discuss outliers and reach
consensus about the appropriate standards.

39
Rater Training - Rater Reliability

Rater Reliability after the training process
Trends of individual raters should be evaluated
to monitor the consistency of individual raters
over time. Although some variation between raters
is to be expected, each rater's data can be
reviewed to determine whether that rater's
standards are consistently shifting over time.
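
A simple way to review such trends is to compare each rater's pass rate across review periods and flag large changes for follow-up. The following is a minimal sketch. The rater names, the data, the function names, and the 0.15 tolerance are all hypothetical, and a flagged change may reflect a different candidate pool rather than a shift in the rater's standards.

```python
def pass_rate(results):
    """Fraction of True (pass) results in one review period."""
    return sum(results) / len(results)


def flag_drifting_raters(history, tolerance=0.15):
    """Return raters whose pass rate changed by more than the tolerance.

    history: dict of rater name -> list of periods (oldest first), where each
    period is a list of booleans (True means the candidate passed).
    """
    flagged = {}
    for rater, periods in history.items():
        rates = [pass_rate(p) for p in periods if p]
        if len(rates) < 2:
            continue  # not enough periods to compare
        change = rates[-1] - rates[0]
        if abs(change) > tolerance:
            flagged[rater] = change
    return flagged


# Hypothetical rating history for two raters over two review periods.
history = {
    "rater_1": [[True, True, False, True], [True, False, True, True]],
    "rater_2": [[True, True, True, True], [True, False, False, False]],
}
print(flag_drifting_raters(history))
```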

40
Rater Training - Follow-Up Training

There are instances in which individuals have
developed a valid exam and appropriately trained
their raters, yet still experience problems due
to a lack of consistent follow-up training
sessions for examination raters.
Like any other aspect of a testing program,
raters should be evaluated on a regular basis. In
addition, raters should be required to undergo
re-training on a periodic basis.

41
Rater Training Appeal Process

One aspect of a testing program that should
always be considered during inception is the
avenue for candidate feedback and (if necessary)
appeals. Often, allowing an avenue for
candidates to request feedback or investigation
into an exam administration will reduce the
likelihood that a challenge will progress to a
legal one.

42
Rater Training Appeal Process

Important aspects of a candidate feedback and
appeal process:
- Public documentation of the feedback and appeal
  process
- Clear candidate instructions on the information
  that should be included in feedback and/or an
  appeal
- Specific timeframes for responses to feedback or
  appeals
- Designated group of resources to address feedback
  and appeal issues

43
Rater Training Appeal Process

An avenue for candidate feedback developed at the
inception of a program is viewed much more
positively by courts than one that is set up
after a challenge to the exam. Processes
developed post-challenge tend to be viewed with
an air of suspicion.

44
Recommendations
Case Study: Firefighters United for Fairness v.
City of Memphis

Facts
- Class action challenging the practical portion
  of a fire department promotional test.
- Practical portion consisted of a videotaped
  response to a factual situation presenting
  problems commonly encountered by fire department
  lieutenants and battalion chiefs.
- Plaintiffs claimed the practical test violated
  their due process and equal protection rights
  under the Fourteenth Amendment.
Holding
- The practical test did not violate Plaintiffs'
  rights under the Fourteenth Amendment.

45
Recommendations
Case Study: Firefighters United for Fairness v.
City of Memphis (cont'd)

Reasoning

Fairness in review
- City established a multi-level review process
- Candidates were permitted to review the practical
  video, a transcript of the practical video, and
  the raters' answer key, and to submit redlines
  citing specific concerns with their tests
- Subject matter experts reviewed the redlines and
  changed scores to reflect problems inherent in
  the form, content or grading of the test, where
  appropriate

Fairness in grading
- Court upheld the use of two raters to grade
  transcripts of the practical video components of
  the test using an answer key developed by subject
  matter experts.
- According to the court, this system ensured that
  the capricious whim of individual assessors would
  not contribute to any alleged incorrect scores.
  Firefighters United for Fairness v. City of
  Memphis, 362 F. Supp. 2d 963, 972 (W.D. Tenn.
  2005).

46
Physical Ability and Practical Exams
Recommendations and Evaluation Checklists

Job Analysis
Does the job analysis define the knowledge,
skills, and abilities that compose the important
and/or critical aspects of the job in question?
Was the job analysis conducted specifically for
the job in question?
Is the job analysis current and based on a
relevant candidate population?

47
Physical Ability and Practical Exams
Recommendations and Evaluation Checklists

Criterion-Related Validity
If possible, were criterion-related validity
studies conducted?
Concurrent Study?
Predictive Study?

48
Physical Ability and Practical Exams
Recommendations and Evaluation Checklists

Cut Score
Was a cut score study conducted with a
representative sample of subject-matter experts
(e.g., Modified Angoff Technique)?
Has the cut score process been documented?
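
For reference, the core arithmetic of an Angoff-style study can be sketched as follows: each subject-matter expert estimates, for every item or task, the probability that a minimally competent candidate would perform it successfully, and the recommended cut score is the sum of the per-item average estimates. This is a minimal sketch only. The ratings and the function name are hypothetical, and the discussion rounds and data review that characterize a modified Angoff study are not shown.

```python
def angoff_cut_score(ratings):
    """Recommended raw cut score from Angoff-style probability estimates.

    ratings: list of per-expert lists; each inner list holds one probability
    (0.0 to 1.0) per exam item, in the same item order for every expert.
    """
    if not ratings or not ratings[0]:
        raise ValueError("need at least one expert and one item")
    n_items = len(ratings[0])
    for expert in ratings:
        if len(expert) != n_items:
            raise ValueError("every expert must rate every item")
        if any(not 0.0 <= p <= 1.0 for p in expert):
            raise ValueError("estimates must be probabilities between 0 and 1")
    # Average the experts' estimates item by item, then sum across items.
    item_means = [sum(expert[i] for expert in ratings) / len(ratings)
                  for i in range(n_items)]
    return sum(item_means)


# Hypothetical estimates from two experts on a three-item practical exam.
estimates = [
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
print(angoff_cut_score(estimates))
```

Keeping the individual estimates, the panel's composition, and the resulting computation on file is one way to satisfy the documentation question in the checklist above.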

49
Physical Ability and Practical Exams
Recommendations and Evaluation Checklists

Rater Training
Has a standardized rater training program been
established?
Does the rater training include opportunities to
ensure rater reliability?
Is follow-up training provided on a regular
basis?
Is rater data reviewed on a regular basis to
identify changes in rating trends?

50
Physical Ability and Practical Exams
Recommendations and Evaluation Checklists

Candidate Appeal Process
Is there an avenue for candidates to provide
feedback or submit an appeal regarding an
examination administration?
Is that avenue well documented and publicly
available?
Are there designated resources available for
addressing feedback and appeals?

51
Physical Ability and Practical Exams

Questions?

52
Speaker Contact Information

Nikki Shepherd Eatchel, M.A.
Vice President, Test Development
Thomson Prometric
1260 Energy Lane
St. Paul, MN 55108
651-603-3396
nikki.eatchel_at_thomson.com

Robin Rome, Esq.
Vice President, Legal and Contracts
Thomson Prometric
2000 Lenox Drive
Lawrenceville, NJ 08648
609-895-5160
robin.rome_at_thomson.com
