Title: Physical Ability Testing and Practical Examinations: They Fought the Law and the Law Won
1. Physical Ability Testing and Practical Examinations: They Fought the Law and the Law Won
Expect the Unexpected: Are We Clearly Prepared?
- Nikki Shepherd Eatchel, M.A., Vice President, Test Development, Thomson Prometric
- Robin Rome, Esq., Vice President, Legal and Contracts, Thomson Prometric
Council on Licensure, Enforcement and Regulation
2006 Annual Conference
Alexandria, Virginia
2. Physical Ability Testing and Practical Exams
- Goals for today's presentation:
- Outline the major risk factors for physical ability and practical examinations
- Recommend specific developmental activities and other measures that will help withstand a legal challenge
- Provide recommendations for evaluating exams developed by you or for you
3. Physical Ability and Practical Exams: Challenges to Validity
- Although all employment, certification, and licensure testing is certainly open to challenge, exams designed to physically assess a candidate's performance on specific job skills and tasks are often more vulnerable to challenge than objective written exams.
4. Physical Ability and Practical Exams: Challenges to Validity
- Examples of physical ability and practical exams:
- Firefighter certification
- Police officer pre-employment
- Nursing practical for licensure
- Corporate product certification
- Food safety practical for licensure
5. Physical Ability and Practical Exams: Challenges to Validity
- Why are physical ability and practical exams more vulnerable to challenge?
- Reliance on exam rater judgments regarding how a task was performed introduces the possibility of error in the assessment of the skill or task (human error)
- Often, when only one rater is used to assess a candidate, there is increased likelihood of disagreement between the rater and the candidate
- Physical ability exams typically have greater adverse impact upon protected groups than the written exams involved in an employment, certification, or licensure process (though practical exams do not tend to show the same pattern)
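One common way to screen for adverse impact is the Uniform Guidelines' four-fifths rule of thumb (29 CFR 1607.4(D)): a group whose selection rate is less than 80% of the highest group's rate is generally treated as showing evidence of adverse impact. The sketch below is a minimal illustration of that arithmetic only; the function names, group labels, and pass counts are hypothetical and do not come from any case discussed in this presentation.

```python
def selection_rate(passed: int, tested: int) -> float:
    """Fraction of a group's candidates who passed the exam."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return passed / tested


def impact_ratios(results: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Ratio of each group's selection rate to the highest group's rate.

    results maps a group label to (number passed, number tested).
    """
    rates = {g: selection_rate(p, t) for g, (p, t) in results.items()}
    top = max(rates.values())
    if top == 0:
        raise ValueError("no group has any passing candidates")
    return {g: r / top for g, r in rates.items()}


# Hypothetical pass counts, for illustration only.
example = {"group_a": (48, 60), "group_b": (12, 30)}
ratios = impact_ratios(example)

# Groups falling below the four-fifths (0.8) threshold.
flagged = [g for g, r in ratios.items() if r < 0.8]
print(ratios, flagged)
```

A ratio below 0.8 is a screening signal, not a legal conclusion; small samples in particular can make the ratio unstable.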
6. Standards Used for Exam Evaluation
- Two sets of standards are often used to guide the development and evaluation of exams:
- Standards for Educational and Psychological Testing, 1999
- Developed jointly by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME)
- Uniform Guidelines on Employee Selection Procedures, 1978
- Developed by the Equal Employment Opportunity Commission (EEOC)
7. Standards Used for Exam Evaluation
- Although both sets of standards contain valuable information regarding the development process (and both should be considered when developing a testing program), courts more frequently refer to the Uniform Guidelines as the resource for evaluating exams.
- The Uniform Guidelines are entitled to great deference by courts deciding whether selection devices such as physical ability or practical tests comply with Title VII. Griggs v. Duke Power Co., 401 U.S. 424, 434
8. Physical Ability and Practical Exams: Challenges to Validity
- What are the aspects of an examination that are most likely to be scrutinized if the validity of a physical ability or practical exam is challenged?
9. Physical Ability and Practical Exams: Challenges to Validity
- Job Analysis
- Criterion-Related Validity
- Cutscore
- Rater Training
- Candidate Appeal Process
10. Physical Ability and Practical Exams: Job Analysis
- A job analysis is crucial in establishing that the content of the physical ability or practical exam is valid. Key components of the job analysis include:
- Content Validity
- Validity Generalization
- Adequate and Diverse Sample Sizes
11. Job Analysis - Content Validity
- Although there are multiple validity methods that can be used during the test development process, the foundation for acceptable development practice continues to reside with traditional content validity methods.
- Supplemental validity methods are typically seen as beneficial, yet not sufficient, when courts evaluate testing processes.
12. Job Analysis - Content Validity
- When evidence of validity based on test content is presented, the rationale for defining and describing a specific job content domain in a particular way (e.g., in terms of tasks to be performed or knowledge, skills, abilities, or other personal characteristics) should be stated clearly. (Standard 14.9)
- A job analysis is necessary to identify the knowledge, skills and abilities necessary for successful job performance.
- A selection procedure can be supported by a content validity strategy to the extent that it is a representative sample of the content of the job. (Guidelines, 29 CFR 1607.14(C)(1))
13. Job Analysis - Content Validity Case Study: Williams v. Ford
- Facts
- Class action claiming that the pre-employment test for unskilled hourly production workers, the Hourly Selection System Test Battery (HSSTB), discriminated against African Americans.
- Physical/practical parts of the HSSTB measured parts assembly, visual speed and accuracy, and precision/manual dexterity.
14. Job Analysis - Content Validity Case Study: Williams v. Ford (cont'd)
- Plaintiffs' Position
- Disparate impact discrimination, i.e., African Americans failed or scored lower on the test in disproportionately high numbers when compared to whites.
- HSSTB was not content valid because the job analysis failed to demonstrate a clear linkage of specific requirements.
- Ford's Position
- HSSTB was content valid as supported by a job analysis.
- Job analysis consisted of:
- Supervisor identification of job inventories
- Supervisor rating of importance of job requirements and job abilities identified in the inventories
- Analysis of reliability ratings and data to identify key job requirements
- Development of test to measure skills needed to perform the job requirements rated as important
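The rating step in a job analysis of this general kind can be pictured as a simple aggregation: supervisors rate each job requirement for importance, and requirements whose average rating clears a threshold are kept as key requirements. The sketch below is illustrative only; the 1-5 scale, the 3.5 threshold, and the requirement names are assumptions, not details of the analysis Ford actually performed.

```python
from statistics import mean


def key_requirements(ratings: dict[str, list[int]], threshold: float = 3.5) -> list[str]:
    """Return requirements whose mean supervisor importance rating meets the threshold.

    ratings maps a requirement name to the importance ratings it received
    (hypothetical 1-5 scale). Requirements with no ratings are skipped.
    """
    averages = {req: mean(vals) for req, vals in ratings.items() if vals}
    return sorted(req for req, avg in averages.items() if avg >= threshold)


# Hypothetical supervisor ratings, for illustration only.
supervisor_ratings = {
    "parts assembly": [5, 4, 4],
    "manual dexterity": [4, 4, 3],
    "filing": [1, 2, 2],
}
print(key_requirements(supervisor_ratings))
```

Documenting the scale, the threshold, and who supplied the ratings is what lets the linkage between job requirements and test content be shown later.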
15. Job Analysis - Content Validity Case Study: Williams v. Ford (cont'd)
- Holding
- Ford demonstrated that the HSSTB was content valid.
- Reasoning
- Ford had the burden of showing that the HSSTB was job related: it must show, by professionally acceptable methods, that the test is predictive or significantly correlated with important elements of work behavior that comprise or are relevant to the job or jobs for which the candidates are being evaluated. Williams v. Ford, 187 F.3d 533, 539 (6th Cir. 1999).
- Ford met this burden by showing that the HSSTB was content valid: it used a professional test developer to conduct a job analysis that complied with the EEOC Guidelines.
16. Job Analysis - Validity Generalization
- An issue often referred to in test development is validity generalization. Validity generalization is defined as:
- Applying validity evidence obtained in one or more situations to other similar situations on the basis of simultaneous estimation, meta-analysis, or synthetic validation arguments. (Standards, 1999, p. 184)
17. Job Analysis - Validity Generalization
- Transfer of validity work from one demographic and/or geographic area to another, while certainly possible when based on good initial validity work and a clear delineation of the original and secondary populations, has not been well received by courts as a defensible practice.
- This has typically been due to lack of appropriate documentation regarding the similarity of both the populations involved with the generalization and the interpretations resulting from the instrument.
18. Job Analysis - Validity Generalization Case Study: Legault v. aRusso
- Facts
- Challenge to physical abilities tests used to select fire department recruits.
- Selection process included a four-part pass/fail physical abilities test involving climbing a ladder, moving a ladder from a fire engine, running 1.5 miles in 12 minutes, and carrying and pulling a fire hose. It also included a separate physical abilities test focusing on a balance beam, a second hose pull, and an obstacle course.
19. Job Analysis - Validity Generalization Case Study: Legault v. aRusso (cont'd)
- Holding
- Fire department failed to show the physical abilities tests were job related.
- Reasoning
- The job analysis relied on by the fire department was neither current nor specific:
- Validity was not supported by a several-year-old job specification that described the firefighter's general duties. Legault v. aRusso, 842 F. Supp. 1479, 1488 (D.N.H. 1994)
- Validity was not supported by a specification identifying only general tasks (e.g., strenuous physical exertion, operating equipment and appurtenances of heavy apparatus, etc.). The specification also failed to break these tasks into component skills, assess their relative importance, or indicate the level of proficiency required.
- The physical abilities tests were not valid simply because they were similar to those used by other cities: there was no evidence these similar tests were validated, and "follow the leader" is not an acceptable means of test validation. Legault, 842 F. Supp. at 1488.
20. Job Analysis - Adequate and Diverse Sample Sizes
- Adequate and diverse sample sizes are a necessity for ensuring validity and increasing the defensibility of an exam.
- A description of how the research sample compares with the relevant labor market or work force, . . ., and a discussion of the likely effects on validity of differences between the sample and the relevant labor market or work force, are also desirable.
- Descriptions of educational levels, length of service, and age are also desirable.
- Whether the study is predictive or concurrent, the sample subjects should insofar as feasible be representative of the candidates normally available in the relevant labor market for the job or group of jobs in question . . . (Uniform Guidelines)
21. Job Analysis - Adequate and Diverse Sample Sizes Case Study: Blake v. City of Los Angeles
- Facts
- Female applicants challenged the police department's height requirement and physical abilities test.
- Applicants were required to be 5'6" and to pass a physical abilities test including scaling a wall, hanging, weight dragging, and endurance within specific parameters.
22. Job Analysis - Adequate and Diverse Sample Sizes Case Study: Blake v. City of Los Angeles (cont'd)
- Plaintiffs' Position
- Challenged the methodology and findings of validation studies presented by the City.
- The validation studies relating to the height requirement did not include the individuals whom the police department was seeking to reject, i.e., those under 5'6".
- The validation studies relating to the physical abilities test did not include those who failed the test and tested only success during academy training, not success on the job.
- The City's Position
- The height requirement was job related. Offered validation studies correlating height to performance:
- Questionnaire showing that taller officers tend to use more force and experience less suspect resistance
- Simulations demonstrating that taller officers performed bar-arm control better than shorter officers
- The physical abilities test was job related. Offered validation studies correlating skills tested to measures of success during academy training and on-the-job requirements (e.g., foot pursuit, field shooting, and emergency rescue).
23. Job Analysis - Adequate and Diverse Sample Sizes Case Study: Blake v. City of Los Angeles (cont'd)
- Holding
- The validation studies did not demonstrate that the height requirement and physical abilities tests were job related.
- Reasoning
- The validation studies did not reflect an adequate and diverse sampling.
- The City failed to demonstrate the height requirement was job related because persons shorter than 5'6" were not included in the validation study (the study included individuals from 5'8" to 6'2").
- The City failed to demonstrate the physical abilities test was job related because the validation study relied on measures of training success without showing that those measures were significantly related to job performance.
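The sampling problem identified in Blake can be checked mechanically before a study is relied on: does the validation sample include anyone on the excluded side of the requirement? The sketch below is a minimal illustration under stated assumptions; the function name and the sample heights (in inches, chosen to match the 5'8" to 6'2" range described above) are hypothetical.

```python
def sample_covers_cutoff(sample_values: list[float], cutoff: float) -> dict:
    """Count sample members below and at/above a cutoff.

    A validation sample with nobody below the cutoff cannot show how
    the excluded group would have performed.
    """
    below = sum(v < cutoff for v in sample_values)
    at_or_above = len(sample_values) - below
    return {
        "below": below,
        "at_or_above": at_or_above,
        "covers_both_sides": below > 0 and at_or_above > 0,
    }


# Hypothetical heights in inches; 66 inches corresponds to 5'6".
study_heights = [68, 70, 72, 74]
coverage = sample_covers_cutoff(study_heights, cutoff=66)
print(coverage)
```

The same check applies to any pass/fail requirement, for example confirming that a physical abilities validation study includes candidates who failed the test.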
24. Criterion-Related Validity
- When possible, the collection of criterion-related validity evidence is extremely helpful in the defense of a physical ability test or practical exam.
- A criterion-related study should consist of empirical data demonstrating that the selection procedure is predictive of or significantly correlated with important elements of performance. (Guidelines, 29 CFR 1607.5(B))
25. Criterion-Related Validity
- The goal of criterion-related validity is to show a significant relationship between how candidates perform on an exam and how they subsequently perform on the job (with higher scores corresponding to better performance).
- This can be accomplished through the use of concurrent or predictive criterion-related validity.
- Job Ratings
- Promotional Exams
- Etc.
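The relationship between exam performance and a job criterion is often summarized with a Pearson correlation coefficient. The sketch below shows that calculation only; the function name and the scores and ratings are hypothetical, and a real study would also need a significance test and an adequate, representative sample, which are not shown here.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between paired exam scores and criterion measures."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: exam scores and later supervisor job ratings.
exam_scores = [62, 70, 75, 81, 88, 93]
job_ratings = [2.5, 3.0, 3.1, 3.8, 4.0, 4.6]
print(round(pearson_r(exam_scores, job_ratings), 3))
```

In a predictive design the job ratings are collected after hiring; in a concurrent design both measures come from current incumbents at about the same time.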
26. Job Analysis - Criterion-Related Validity Case Study: Zamlen v. City of Cleveland
- Facts
- Female plaintiffs challenged the rank-order and physical abilities selection examination for firefighters.
- The physical abilities test required three skills: overhead lift using barbells, fire scene set up and tower climb, and dummy drag.
27. Job Analysis - Criterion-Related Validity Case Study: Zamlen v. City of Cleveland (cont'd)
- Plaintiffs' Position
- The physical abilities test did not test for attributes identified in the City's job analysis as important to an effective firefighter.
- The test measured attributes in which men traditionally excel, such as speed and strength (anaerobic traits), and ignored attributes in which women traditionally excel, such as stamina and endurance (aerobic traits).
- The City's Position
- The test was created by a psychologist with significant experience developing tests for municipalities.
- The physical abilities test measured attributes related to specific job skills.
28. Job Analysis - Criterion-Related Validity Case Study: Zamlen v. City of Cleveland (cont'd)
- Holding
- The physical abilities test was valid since it was based on a criterion-related study.
- Reasoning
- Referred to an earlier case, Berkman v. City of New York, 812 F.2d 52 (2d Cir. 1987), in which the court held that although aerobic attributes are an important component of firefighting, the City's failure to include physical ability events that tested for such attributes did not invalidate the examination.
- Given the extensive job analysis performed, although a simulated firefighting examination that does not test for stamina in addition to anaerobic capacity may be a less effective barometer of firefighting abilities than one that does include an aerobic component, the deficiencies of this examination are not of the magnitude to render it defective, and vulnerable to a Title VII challenge. Zamlen, 906 F.2d 209, 219 (6th Cir. 1990).
29. Physical Ability and Practical Exams: Cut Score
- The setting of an examination cut score is perhaps the most controversial step within the test development process, as it is this step that has the most obvious impact on the candidate population.
- The Uniform Guidelines state the following in regard to the determination of the cut score:
- Where cutoff scores are used, they should normally be set so as to be reasonable and consistent with normal expectations of acceptable proficiency within the work force.
30. Cut Score Case Study: Lanning v. SEPTA
- Facts
- Title VII class action challenging SEPTA's requirement that applicants for the job of transit police officer be able to run 1.5 miles in 12 minutes.
- In prior related cases, it was established that the running requirement was job related. The sole issue before the court was whether the cutoff was valid.
31. Cut Score Case Study: Lanning v. SEPTA (cont'd)
- Holding
- The cutoff established by SEPTA was valid.
- Reasoning
- The court looked at whether the cutoff measured the minimum qualifications necessary for the successful performance of a transit police officer.
- Studies introduced by SEPTA showed a statistical link between success on the run test and the performance of identified job standards: individuals who passed the run test had a success rate of 70% to 90%, and individuals who failed the run test had a success rate of 5% to 20%.
- The court emphasized that the cutoff does not need to reflect a 100% rate of success, but there should be a showing of why the cutoff is an objective measure of the minimum qualifications for successful performance.
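The kind of comparison described above, success on job standards among those who pass versus those who fail a cutoff, can be tabulated directly from candidate records. The sketch below is a minimal illustration; the function name and the records are hypothetical and are not data from the SEPTA studies. The 720-second cutoff simply expresses 12 minutes in seconds.

```python
def success_rate_by_outcome(records, cutoff_seconds):
    """Job-standard success rate for candidates who passed vs. failed the run.

    records is an iterable of (run_time_seconds, met_job_standard) pairs.
    A candidate passes when run_time_seconds <= cutoff_seconds.
    Returns None for a group with no candidates.
    """
    counts = {"passed": [0, 0], "failed": [0, 0]}  # [successes, total]
    for run_time, met_standard in records:
        key = "passed" if run_time <= cutoff_seconds else "failed"
        counts[key][1] += 1
        if met_standard:
            counts[key][0] += 1
    return {k: (s / n if n else None) for k, (s, n) in counts.items()}


# Hypothetical records, for illustration only.
records = [
    (700, True), (710, True), (715, False),
    (730, False), (800, False), (760, True),
]
rates = success_rate_by_outcome(records, cutoff_seconds=720)
print(rates)
```

A wide gap between the two rates is the sort of evidence that helps show a cutoff tracks minimum qualifications rather than an arbitrary line.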
32. A Good Defense
- Many organizations spend a considerable amount of time and money on the valid and defensible development of a practical exam or a physical ability test.
- Surprisingly, after the lofty investment in the development of these exams, some organizations fail to establish appropriate training for raters involved in the administration of the exam.
33. A Good Defense
- When using practical exams or physical ability tests, there are two aspects of the testing program that, when well established, can reduce the likelihood of a challenge:
- 1. Rater Training
- 2. Candidate Appeal Process
34. Rater Training
- Proper rater training is key in minimizing challenges to a practical exam/physical ability test.
- Standardized training materials and sessions
- Inter-Rater and Intra-Rater Reliability Studies
- Follow up training
35. Rater Training - Standardized Materials
Practical exams and physical ability tests rely
on examination raters to identify whether or not
a candidate performed the activity or event
appropriately. One way to reduce challenges to
this type of exam is to have a robust training
program that is required of all raters on a
regular basis.
36. Rater Training - Standardized Materials
- Standardized materials can include the following components:
- Train the Trainer Manual/Materials
- Examination Rater Manual
- Examination Rater Video
- Rater Checklist
37. Rater Training - Rater Reliability
- When subjective judgment enters into test scoring, evidence should be provided on both inter-rater consistency in scoring and within-examinee consistency over repeated measurements. (Standard 2.13)
- Does an individual rater apply the testing standards consistently across multiple candidates?
- Do groups of raters rate the same candidate consistently?
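One widely used statistic for the second question, when two raters score the same set of performances, is Cohen's kappa, which compares observed agreement with the agreement expected by chance. This is one possible index, not one prescribed by the Standards; the function name and the example ratings below are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters scoring the same performances."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of performances")
    n = len(rater_a)
    # Proportion of performances on which the raters gave the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    # Agreement expected by chance from each rater's category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical pass (P) / fail (F) ratings of eight videotaped performances.
rater_1 = ["P", "P", "F", "P", "F", "P", "P", "F"]
rater_2 = ["P", "P", "F", "P", "P", "P", "F", "F"]
print(round(cohens_kappa(rater_1, rater_2), 3))
```

A kappa near 1 indicates strong agreement and a kappa near 0 indicates agreement no better than chance; the level an organization treats as acceptable is a policy decision to document in advance.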
38. Rater Training - Rater Reliability
- Rater Reliability during the training process
- Part of the rater training process should involve groups of raters rating the same performance, to evaluate whether or not a consistent testing standard is being applied.
- This process should include an opportunity for all raters to discuss outliers and reach consensus about the appropriate standards.
39. Rater Training - Rater Reliability
- Rater Reliability after the training process
- Trends of individual raters should be evaluated to monitor the consistency of individual raters over time. Although it should be expected that raters will evaluate candidates differently, it is possible to review whether raters are consistently shifting over time.
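Monitoring of this kind can start from something as simple as each rater's pass rate per reporting period. The sketch below flags raters whose pass rate moved by more than a chosen amount between their first and last period. The function names, the 0.20 threshold, and the period labels are assumptions for illustration; a flagged rater is a prompt for review, since the candidate mix may also have changed.

```python
from collections import defaultdict


def pass_rates_by_period(ratings):
    """Pass rate per rater per period.

    ratings is an iterable of (rater_id, period, passed) tuples.
    Returns {rater_id: {period: pass_rate}}.
    """
    tallies = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [passes, total]
    for rater, period, passed in ratings:
        cell = tallies[rater][period]
        cell[0] += int(passed)
        cell[1] += 1
    return {
        rater: {period: p / n for period, (p, n) in periods.items()}
        for rater, periods in tallies.items()
    }


def drifting_raters(rates, max_shift=0.20):
    """Raters whose pass rate changed by more than max_shift from first to last period."""
    flagged = []
    for rater, by_period in rates.items():
        ordered = [by_period[p] for p in sorted(by_period)]
        if len(ordered) >= 2 and abs(ordered[-1] - ordered[0]) > max_shift:
            flagged.append(rater)
    return sorted(flagged)


# Hypothetical rating log, for illustration only.
log = [
    ("r1", "2006-Q1", True), ("r1", "2006-Q1", True),
    ("r1", "2006-Q2", False), ("r1", "2006-Q2", True),
    ("r2", "2006-Q1", True), ("r2", "2006-Q2", True),
]
print(drifting_raters(pass_rates_by_period(log)))
```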
40. Rater Training - Follow Up Training
- There are instances when individuals have developed a valid exam, appropriately trained their raters, and then experience problems due to a lack of consistent, follow up training sessions for examination raters.
- Like any other aspect of a testing program, raters should be evaluated on a regular basis. In addition, raters should be required to undergo re-training on a periodic basis.
41. Rater Training - Appeal Process
One aspect of a testing program that should
always be considered during inception is the
avenue for candidate feedback and (if necessary)
appeals. Often, allowing an avenue for
candidates to request feedback or investigation
into an exam administration will reduce the
likelihood that the challenge will progress to a
legal one.
42. Rater Training - Appeal Process
- Important aspects of a candidate feedback and appeal process:
- Public documentation of the feedback and appeal process
- Clear candidate instructions on the information that should be included in feedback and/or appeal
- Specific timeframes for responses to feedback or appeals
- Designated group of resources to address feedback and appeal issues
43. Rater Training - Appeal Process
An avenue for candidate feedback developed at the
inception of a program is viewed much more
positively by courts than one that is set up
after a challenge to the exam. Processes
developed post-challenge tend to be viewed with
an air of suspicion.
44. Recommendations Case Study: Firefighters United for Fairness v. City of Memphis
- Facts
- Class action challenging the practical portion of fire department promotional test.
- Practical portion consisted of a videotaped response to a factual situation presenting problems commonly encountered by fire department lieutenants and battalion chiefs.
- Plaintiffs claimed the practical test violated their due process and equal protection rights under the Fourteenth Amendment.
- Holding
- The practical test did not violate Plaintiffs' rights under the Fourteenth Amendment.
45. Recommendations Case Study: Firefighters United for Fairness v. City of Memphis (cont'd)
Reasoning
- Fairness in review
- City established a multi-level review process
- Candidates were permitted to review the practical video, the transcript of the practical video, and the raters' answer key, and to submit redlines citing specific concerns with their tests
- Subject matter experts reviewed the redlines and changed scores to reflect problems inherent in the form, content, or grading of the test, where appropriate
- Fairness in grading
- Court upheld the use of two raters to grade transcripts of practical video components of the test using an answer key developed by subject matter experts.
- According to the court, this system ensured that the capricious whim of individual assessors would not contribute to any alleged incorrect scores. Firefighters United for Fairness v. City of Memphis, 362 F. Supp. 2d 963, 972 (W.D. Tenn. 2005).
46. Physical Ability and Practical Exams: Recommendations and Evaluation Checklists
- Job Analysis
- Does the job analysis define the knowledge, skills, and abilities that compose the important and/or critical aspects of the job in question?
- Was the job analysis conducted specifically for the job in question?
- Is the job analysis current and based on a relevant candidate population?
47. Physical Ability and Practical Exams: Recommendations and Evaluation Checklists
- Criterion-Related Validity
- If possible, were criterion-related validity studies conducted?
- Concurrent Study?
- Predictive Study?
48. Physical Ability and Practical Exams: Recommendations and Evaluation Checklists
- Cut Score
- Was a cut score study conducted with a representative sample of subject-matter experts (e.g., Modified Angoff Technique)?
- Has the cut score process been documented?
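In an Angoff-style study, each subject-matter expert estimates, for every item or task, the probability that a minimally competent candidate would succeed; a judge's estimates are summed, and the cut score is the average of those sums across judges. The sketch below shows only that final aggregation. The discussion rounds and data review that characterize modified Angoff procedures are not modeled, and the function name and ratings are hypothetical.

```python
def angoff_cut_score(ratings: list[list[float]]) -> float:
    """Average, across judges, of each judge's summed item probabilities.

    ratings[j][i] is judge j's estimated probability (0 to 1) that a
    minimally competent candidate succeeds on item i.
    """
    if not ratings or not ratings[0]:
        raise ValueError("need at least one judge and one item")
    n_items = len(ratings[0])
    for row in ratings:
        if len(row) != n_items:
            raise ValueError("every judge must rate every item")
        if any(not 0 <= p <= 1 for p in row):
            raise ValueError("ratings must be probabilities between 0 and 1")
    judge_totals = [sum(row) for row in ratings]
    return sum(judge_totals) / len(judge_totals)


# Hypothetical ratings from two judges on a three-task practical exam.
panel = [
    [0.5, 0.75, 1.0],
    [0.5, 0.25, 1.0],
]
print(angoff_cut_score(panel))
```

Keeping the individual judge ratings, along with the panel's composition, is part of documenting the cut score process.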
49. Physical Ability and Practical Exams: Recommendations and Evaluation Checklists
- Rater Training
- Has a standardized rater training program been established?
- Does the rater training include opportunities to ensure rater reliability?
- Is follow up training provided on a regular basis?
- Is rater data reviewed on a regular basis to identify changes in rating trends?
50. Physical Ability and Practical Exams: Recommendations and Evaluation Checklists
- Candidate Appeal Process
- Is there an avenue for candidates to provide feedback or submit an appeal regarding an examination administration?
- Is that avenue well documented and publicly available?
- Are there designated resources available for addressing feedback and appeals?
52. Speaker Contact Information
- Nikki Shepherd Eatchel, M.A.
- Vice President, Test Development
- Thomson Prometric
- 1260 Energy Lane
- St. Paul, MN 55108
- 651-603-3396
- nikki.eatchel_at_thomson.com
- Robin Rome, Esq.
- Vice President, Legal and Contracts
- Thomson Prometric
- 2000 Lenox Drive
- Lawrenceville, NJ 08648
- 609-895-5160
- robin.rome_at_thomson.com