Physical Ability Testing and Practical Examinations: They Fought the Law and the Law Won

1
Physical Ability Testing and Practical
Examinations: They Fought the Law and the Law
Won
Expect the Unexpected: Are We Clearly Prepared?
  • Nikki Shepherd Eatchel, M.A., Vice President,
    Test Development, Thomson Prometric
  • Robin Rome, Esq., Vice President, Legal and
    Contracts, Thomson Prometric

Council on Licensure, Enforcement and Regulation
2006 Annual Conference
Alexandria, Virginia
2
Physical Ability Testing and Practical Exams
  • Goals for today's presentation:
  • Outline the major risk factors for physical
    ability and practical examinations
  • Recommend specific developmental activities and
    other measures that will help withstand a legal
    challenge
  • Provide recommendations for evaluating exams
    developed by you or for you

3
Physical Ability and Practical Exams: Challenges
to Validity
  • Although all employment, certification, and
    licensure testing is open to challenge, exams
    designed to physically assess a candidate's
    performance on specific job skills and tasks are
    often more vulnerable to challenge than objective
    written exams.

4
Physical Ability and Practical Exams: Challenges
to Validity
  • Examples of physical ability and practical exams:
  • Firefighter certification
  • Police officer pre-employment
  • Nursing practical for licensure
  • Corporate product certification
  • Food safety practical for licensure

5
Physical Ability and Practical Exams: Challenges
to Validity
  • Why are physical ability and practical exams more
    vulnerable to challenge?
  • Reliance on exam rater judgments regarding how a
    task was performed introduces the possibility of
    error in the assessment of the skill or task
    (human error)
  • Often, when only one rater is used to assess a
    candidate, there is increased likelihood of
    disagreement between the rater and the candidate
  • Physical ability exams typically have greater
    adverse impact upon protected groups than the
    written exams involved in an employment,
    certification, or licensure process (though
    practical exams do not tend to show the same
    pattern)
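
One common screen for the adverse impact mentioned above is the "four-fifths rule" described in the Uniform Guidelines (29 CFR 1607.4(D)): a selection rate for any group that is less than 80% of the rate for the group with the highest rate is generally regarded as evidence of adverse impact. The sketch below is only an illustration of that arithmetic. The group labels and pass counts are hypothetical, and a real analysis would also consider sample size and statistical significance.

```python
def selection_rate(passed, tested):
    """Proportion of tested candidates who passed."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return passed / tested


def impact_ratios(results):
    """Map each group to its selection rate divided by the highest group rate.

    results: dict of group name -> (number passed, number tested).
    """
    rates = {group: selection_rate(p, t) for group, (p, t) in results.items()}
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group has any passing candidates")
    return {group: rate / highest for group, rate in rates.items()}


def flagged_groups(results, threshold=0.8):
    """Groups whose impact ratio falls below the four-fifths threshold."""
    ratios = impact_ratios(results)
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical counts: (passed, tested) per group.
example = {"group_a": (48, 60), "group_b": (12, 30)}

if __name__ == "__main__":
    print(impact_ratios(example))
    print(flagged_groups(example))
```

A flagged group is a prompt for closer review of the exam's validity evidence, not a legal conclusion.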

6
Standards Used For Exam Evaluation
  • Two sets of standards are often used to guide
    the development and evaluation of exams:
  • Standards for Educational and Psychological
    Testing, 1999
  • Developed jointly by the American Educational
    Research Association (AERA), the American
    Psychological Association (APA), and the
    National Council on Measurement in Education
    (NCME)
  • Uniform Guidelines on Employee Selection
    Procedures, 1978
  • Adopted by the Equal Employment Opportunity
    Commission (EEOC) jointly with the Civil Service
    Commission, the Department of Labor, and the
    Department of Justice

7
Standards Used For Exam Evaluation
  • Although both sets of standards contain valuable
    information regarding the development process
    (and both should be considered when developing a
    testing program), courts more frequently refer to
    the Uniform Guidelines as the resource for
    evaluating exams.
  • The Uniform Guidelines are entitled to great
    deference by courts deciding whether selection
    devices such as physical ability or practical
    tests comply with Title VII.
  • Griggs v. Duke Power Co., 401 U.S. 424, 434 (1971)

8
Physical Ability and Practical Exams: Challenges
to Validity
  • What are the aspects of an examination that are
    most likely to be scrutinized if the validity of
    a physical ability or practical exam is
    challenged?

9
Physical Ability and Practical Exams: Challenges
to Validity
  • Job Analysis
  • Criterion-Related Validity
  • Cutscore
  • Rater Training
  • Candidate Appeal Process

10
Physical Ability and Practical Exams: Job Analysis
  • A job analysis is crucial in establishing that
    the content of the physical ability or practical
    exam is valid. Key components of the job
    analysis include:
  • Content Validity
  • Validity Generalization
  • Adequate and Diverse Sample Sizes

11
Job Analysis - Content Validity
  • Although there are multiple validity methods
    that can be used during the test development
    process, the foundation for acceptable
    development practice continues to reside with
    traditional content validity methods.
  • Supplemental validity methods are typically seen
    as beneficial, yet not sufficient, when courts
    evaluate testing processes.

12
Job Analysis - Content Validity
  • "When evidence of validity based on test content
    is presented, the rationale for defining and
    describing a specific job content domain in a
    particular way (e.g., in terms of tasks to be
    performed or knowledge, skills, abilities, or
    other personal characteristics) should be stated
    clearly."
  • Standard 14.9
  • A job analysis is necessary to identify the
    knowledge, skills and abilities necessary for
    successful job performance.
  • "A selection procedure can be supported by a
    content validity strategy to the extent that it
    is a representative sample of the content of the
    job."
  • Guidelines, 29 CFR 1607.14(C)(1)

13
Job Analysis - Content Validity
Case Study: Williams v. Ford
  • Facts
  • Class action claiming that pre-employment test
    for unskilled hourly production workers, Hourly
    Selection System Test Battery (HSSTB),
    discriminated against African Americans.
  • Physical/practical parts of HSSTB measured
    parts assembly, visual speed and accuracy, and
    precision/manual dexterity.

14
Job Analysis - Content Validity
Case Study: Williams v. Ford (cont'd)
  • Plaintiffs' Position
  • Disparate impact discrimination, i.e., African
    Americans failed or scored lower on the test in
    disproportionately high numbers when compared to
    whites.
  • HSSTB was not content valid because the job
    analysis failed to demonstrate a clear linkage of
    specific requirements.
  • Ford's Position
  • HSSTB was content valid as supported by a job
    analysis.
  • Job analysis consisted of:
  • Supervisor identification of job inventories
  • Supervisor rating of importance of job
    requirements and job abilities identified in the
    inventories
  • Analysis of reliability ratings and data to
    identify key job requirements
  • Development of test to measure skills needed to
    perform the job requirements rated as important

15
Job Analysis - Content Validity
Case Study: Williams v. Ford (cont'd)
  • Holding
  • Ford demonstrated that the HSSTB was content
    valid.
  • Reasoning
  • Ford had the burden of showing that the HSSTB
    was job related
  • Must show by professionally acceptable
    methods, that the test is predictive or
    significantly correlated with important elements
    of work behavior that comprise or are relevant to
    the job or jobs for which the candidates are
    being evaluated. Williams v. Ford, 187 F.3d 533,
    539 (6th Cir. 1999).
  • Ford met this burden by showing that the HSSTB
    was content valid: it used a professional test
    developer to conduct a job analysis that complied
    with the EEOC Guidelines.

16
Job Analysis - Validity Generalization
  • An issue often referred to in test development is
    validity generalization. Validity generalization
    is defined as:
  • "Applying validity evidence obtained in one or
    more situations to other similar situations on
    the basis of simultaneous estimation,
    meta-analysis, or synthetic validation
    arguments."
  • Standards, 1999, p. 184

17
Job Analysis - Validity Generalization
  • Transfer of validity work from one demographic
    and/or geographic area to another, while
    certainly possible when based on good initial
    validity work and a clear delineation of the
    original and secondary populations, has not been
    well received by courts as a defensible practice.
  • This has typically been due to lack of
    appropriate documentation regarding the
    similarity of both the populations involved with
    the generalization and the interpretations
    resulting from the instrument.

18
Job Analysis - Validity Generalization
Case Study: Legault v. aRusso
  • Facts
  • Challenge to physical abilities tests used to
    select fire department recruits.
  • Selection process included a four-part pass/fail
    physical abilities test involving climbing a
    ladder, moving a ladder from a fire engine,
    running 1.5 miles in 12 minutes, and carrying and
    pulling a fire hose. It also included a separate
    physical abilities test focusing on a balance
    beam, second hose pull and obstacle course.

19
Job Analysis - Validity Generalization
Case Study: Legault v. aRusso (cont'd)
  • Holding
  • Fire department failed to show the physical
    abilities tests were job related.
  • Reasoning
  • The job analysis relied on by the fire department
    was neither current nor specific:
  • - Validity was not supported by a
    several-year-old job specification that
    described the firefighter's general duties.
    Legault v. aRusso, 842 F. Supp. 1479, 1488
    (D.N.H. 1994)
  • - Validity was not supported by a specification
    identifying only general tasks (e.g.,
    strenuous physical exertion, operating
    equipment and appurtenances of heavy apparatus,
    etc.). The specification also failed to break
    these tasks into component skills, assess their
    relative importance or indicate the level of
    proficiency required.
  • The physical abilities tests were not valid
    simply because they were similar to those used by
    other cities. There was no evidence these
    similar tests were validated, and "follow the
    leader" is not an acceptable means of test
    validation. Legault, 842 F. Supp. at 1488.

20
Job Analysis - Adequate and Diverse Sample Sizes
  • Adequate and diverse sample sizes are a necessity
    for ensuring validity and increasing the
    defensibility of an exam.
  • "A description of how the research sample
    compares with the relevant labor market or work
    force, . . ., and a discussion of the likely
    effects on validity of differences between the
    sample and the relevant labor market or work
    force, are also desirable. Descriptions of
    educational levels, length of service, and age
    are also desirable."
  • "Whether the study is predictive or concurrent,
    the sample subjects should insofar as feasible be
    representative of the candidates normally
    available in the relevant labor market for the
    job or group of jobs in question . . ."
  • Uniform Guidelines

21
Job Analysis - Adequate and Diverse Sample Sizes
Case Study: Blake v. City of Los Angeles
  • Facts
  • Female applicants challenged the police
    department's height requirement and physical
    abilities test.
  • Applicants were required to be 5'6" and to pass
    a physical abilities test including scaling a
    wall, hanging, weight dragging and endurance
    within specific parameters.

22
Job Analysis - Adequate and Diverse Sample Sizes
Case Study: Blake v. City of Los Angeles
(cont'd)
  • Plaintiffs' Position
  • Challenged the methodology and findings of
    validation studies presented by the City.
  • The validation studies relating to the height
    requirement did not include the individuals whom
    the police department was seeking to reject,
    i.e., those under 5'6".
  • The validation studies relating to the physical
    abilities test did not include those who failed
    the test and tested only success during academy
    training, not success on the job.
  • The City's Position
  • The height requirement was job related: offered
    validation studies correlating height to
    performance
  • Questionnaire showing that taller officers tend
    to use more force and experience less suspect
    resistance
  • Simulations demonstrating that taller officers
    performed bar-arm control better than shorter
    officers
  • The physical abilities test was job related:
    offered validation studies correlating skills
    tested to measures of success during academy
    training and on-the-job requirements (e.g., foot
    pursuit, field shooting and emergency rescue).

23
Job Analysis - Adequate and Diverse Sample Sizes
Case Study: Blake v. City of Los Angeles
(cont'd)
  • Holding
  • The validation studies did not demonstrate that
    the height requirement and physical abilities
    tests were job related.
  • Reasoning
  • The validation studies did not reflect an
    adequate and diverse sampling.
  • The City failed to demonstrate the height
    requirement was job related because persons
    shorter than 5'6" were not included in the
    validation study (the study included individuals
    from 5'8" to 6'2").
  • The City failed to demonstrate the physical
    abilities test was job related because the
    validation study relied on measures of training
    success without showing that those measures were
    significantly related to job performance.

24
Criterion-Related Validity
  • When possible, the collection of
    criterion-related validity evidence is extremely
    helpful in the defense of a physical ability test
    or practical exam.
  • A criterion-related study should consist of
    empirical data demonstrating that the selection
    procedure is predictive of or significantly
    correlated with important elements of
    performance.
  • Guidelines, 29 CFR 1607.5(B)

25
Criterion-Related Validity
  • The goal of criterion-related validity is to show
    a significant relationship between how candidates
    perform on an exam and how they subsequently
    perform on the job (with higher exam scores
    corresponding to better job performance).
  • This can be accomplished through concurrent or
    predictive criterion-related validity studies,
    using criterion measures such as:
  • Job Ratings
  • Promotional Exams
  • Etc.
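
As a rough illustration of the relationship described above, the sketch below computes a Pearson correlation between exam scores and a later job performance criterion. The score and rating values are hypothetical, and the choice of Pearson's r is an assumption made for illustration. An actual criterion-related study would also test statistical significance and document the sample.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between paired exam scores and criterion values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: practical exam scores and later supervisor job ratings
# for the same six candidates.
exam_scores = [62, 70, 75, 81, 88, 93]
job_ratings = [2.1, 2.8, 3.0, 3.4, 4.1, 4.3]

if __name__ == "__main__":
    print(round(pearson_r(exam_scores, job_ratings), 3))
```

In a predictive design the job ratings would be collected after hiring. In a concurrent design both measures would come from current job incumbents at about the same time.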

26
Job Analysis - Criterion-Related Validity
Case Study: Zamlen v. City of Cleveland
  • Facts
  • Female plaintiffs challenged the rank-order and
    physical abilities selection examination for
    firefighters.
  • The physical abilities test required three
    skills: overhead lift using barbells, fire scene
    set up and tower climb, and dummy drag.

27
Job Analysis - Criterion-Related Validity
Case Study: Zamlen v. City of Cleveland (cont'd)
  • Plaintiffs' Position
  • The physical abilities test did not test for
    attributes identified in the City's job analysis
    as important to an effective firefighter.
  • The test measured attributes in which men
    traditionally excel, such as speed and strength
    (anaerobic traits), and ignored attributes in
    which women traditionally excel, such as stamina
    and endurance (aerobic traits).
  • The City's Position
  • The test was created by a psychologist with
    significant experience developing tests for
    municipalities.
  • The physical abilities test measured attributes
    related to specific job skills.

28
Job Analysis - Criterion-Related Validity
Case Study: Zamlen v. City of Cleveland (cont'd)
  • Holding
  • The physical abilities test was valid since it
    was based on a criterion-related study.
  • Reasoning
  • Referred to an earlier case, Berkman v. City of
    New York, 812 F.2d 52 (2d Cir. 1987), in which
    the court held that although aerobic attributes
    are an important component of firefighting, the
    City's failure to include physical ability events
    that tested for such attributes did not
    invalidate the examination.
  • Given the extensive job analysis performed,
    although a simulated firefighting examination
    that does not test for stamina in addition to
    anaerobic capacity may be a less effective
    barometer of firefighting abilities than one that
    does include an aerobic component, the
    deficiencies of this examination are not of the
    magnitude to render it defective, and vulnerable
    to a Title VII challenge. Zamlen, 906 F.2d 209,
    219 (6th Cir. 1990).

29
Physical Ability and Practical Exams: Cut Score
  • The setting of an examination cut score is
    perhaps the most controversial step within the
    test development process, as it is this step that
    has the most obvious impact on the candidate
    population.
  • The Uniform Guidelines state the following in
    regard to the determination of the cut score:
  • "Where cutoff scores are used, they should
    normally be set so as to be reasonable and
    consistent with normal expectations of acceptable
    proficiency within the work force."

30
Cut Score Case Study: Lanning v. SEPTA
  • Facts
  • Title VII class action challenging SEPTA's
    requirement that applicants for the job of
    transit police officer be able to run 1.5 miles
    in 12 minutes.
  • In prior related cases, it was established that
    the running requirement was job related. The
    sole issue before the court was whether the cut
    off was valid.

31
Cut Score Case Study: Lanning v. SEPTA (cont'd)
  • Holding
  • The cut off established by SEPTA was valid.
  • Reasoning
  • The court looked at whether the cut off
    measured the minimum qualifications necessary
    for the successful performance of a transit
    police officer.
  • Studies introduced by SEPTA showed a
    statistical link between success on the run
    test and the performance of identified job
    standards: individuals who passed the run test
    had a success rate of 70% to 90%, and individuals
    who failed the run test had a success rate of 5%
    to 20%.
  • The court emphasized that the cut off does not
    need to reflect a 100% rate of success, but
    there should be a showing of why the cut off is
    an objective measure of the minimum
    qualifications for successful performance.

32
A Good Defense
  • Many organizations spend a considerable amount of
    time and money on the valid and defensible
    development of a practical exam or a physical
    ability test.
  • Surprisingly, after the substantial investment in
    the development of these exams, some
    organizations fail to establish appropriate
    training for raters involved in the
    administration of the exam.

33
A Good Defense
  • When using practical exams or physical ability
    tests, there are two aspects of the testing
    program that, when well established, can reduce
    the likelihood of a challenge:
  • 1. Rater Training
  • 2. Candidate Appeal Process

34
Rater Training
  • Proper rater training is key in minimizing
    challenges to a practical exam/physical ability
    test.
  • Standardized training materials and sessions
  • Inter-Rater and Intra-Rater Reliability Studies
  • Follow-up training

35
Rater Training: Standardized Materials
Practical exams and physical ability tests rely
on examination raters to identify whether or not
a candidate performed the activity or event
appropriately. One way to reduce challenges to
this type of exam is to have a robust training
program that is required of all raters on a
regular basis.
36
Rater Training: Standardized Materials
  • Standardized materials can include the following
    components:
  • 1. Train the Trainer Manual/Materials
  • 2. Examination Rater Manual
  • 3. Examination Rater Video
  • 4. Rater Checklist

37
Rater Training: Rater Reliability
  • "When subjective judgment enters into test
    scoring, evidence should be provided on both
    inter-rater consistency in scoring and
    within-examinee consistency over repeated
    measurements."
  • Standard 2.13
  • Does an individual rater apply the testing
    standards consistently across multiple
    candidates?
  • Do groups of raters rate the same candidate
    consistently?
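
One way to put a number on the second question above is a chance-corrected agreement statistic. The sketch below computes Cohen's kappa for two raters scoring the same candidates as pass or fail. The choice of kappa and the ratings shown are assumptions made for illustration, since the presentation does not prescribe a specific statistic.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same candidates.

    rater_a, rater_b: equal-length lists of category labels, one per candidate.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length rating lists")
    n = len(rater_a)
    # Observed proportion of candidates on whom the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own label frequencies.
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical pass/fail ratings of the same ten candidate performances.
rater_1 = ["pass", "pass", "fail", "pass", "fail",
           "pass", "pass", "fail", "pass", "pass"]
rater_2 = ["pass", "pass", "fail", "fail", "fail",
           "pass", "pass", "pass", "pass", "pass"]

if __name__ == "__main__":
    print(round(cohens_kappa(rater_1, rater_2), 3))
```

Here the raters agree on 8 of 10 candidates, but kappa is lower than 0.8 because some of that agreement would be expected by chance alone.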

38
Rater Training: Rater Reliability
  • Rater Reliability during the training process
  • Part of the rater training process should involve
    groups of raters rating the same performance, to
    evaluate whether or not a consistent testing
    standard is being applied.
  • This process should include an opportunity for
    all raters to discuss outliers and reach
    consensus about the appropriate standards.

39
Rater Training: Rater Reliability
  • Rater Reliability after the training process
  • Trends of individual raters should be evaluated
    to monitor the consistency of individual raters
    over time. Although it should be expected that
    raters will evaluate candidates differently, it
    is possible to review whether a rater's ratings
    are consistently shifting over time.
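
A minimal sketch of such a trend review follows, assuming each rater's pass/fail decisions are logged by review period. The data layout, the 0.15 tolerance, and the first-versus-last comparison are all hypothetical choices. A shift in pass rate can also reflect a change in the candidate pool, so a flag is a prompt for review rather than proof of rater drift.

```python
def pass_rate(outcomes):
    """Share of candidates passed in one period (1 = pass, 0 = fail)."""
    return sum(outcomes) / len(outcomes)


def drift_by_rater(history, tolerance=0.15):
    """Flag raters whose pass rate moved by more than `tolerance`.

    history: dict of rater name -> list of periods, where each period is a
    list of 1/0 outcomes in chronological order. The change is measured from
    the first non-empty period to the last.
    """
    flagged = {}
    for rater, periods in history.items():
        rates = [pass_rate(period) for period in periods if period]
        if len(rates) < 2:
            continue  # not enough data to describe a trend
        change = rates[-1] - rates[0]
        if abs(change) > tolerance:
            flagged[rater] = round(change, 3)
    return flagged


# Hypothetical rating logs for two raters across three review periods.
history = {
    "rater_1": [[1, 1, 0, 1], [1, 0, 1, 1], [1, 1, 1, 0]],
    "rater_2": [[1, 1, 1, 1], [1, 0, 1, 0], [0, 0, 1, 0]],
}

if __name__ == "__main__":
    print(drift_by_rater(history))
```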

40
Rater Training: Follow-Up Training
  • There are instances when individuals have
    developed a valid exam, appropriately trained
    their raters, and then experience problems due to
    a lack of consistent follow-up training sessions
    for examination raters.
  • Like any other aspect of a testing program,
    raters should be evaluated on a regular basis. In
    addition, raters should be required to undergo
    re-training on a periodic basis.

41
Rater Training: Appeal Process
One aspect of a testing program that should
always be considered during inception is the
avenue for candidate feedback and (if necessary)
appeals. Often, allowing an avenue for
candidates to request feedback or investigation
into an exam administration will reduce the
likelihood that the challenge will progress to a
legal one.
42
Rater Training: Appeal Process
  • Important aspects of a candidate feedback and
    appeal process:
  • Public documentation of the feedback and appeal
    process
  • Clear candidate instructions on the information
    that should be included in feedback and/or appeal
  • Specific timeframes for responses to feedback or
    appeals
  • Designated group of resources to address feedback
    and appeal issues

43
Rater Training: Appeal Process
Developing an avenue for candidate feedback at the
inception of a program is viewed much more
positively by courts than one that is set up
after a challenge to the exam. Processes
developed post-challenge tend to be viewed with
an air of suspicion.
44
Recommendations
Case Study: Firefighters United for Fairness v.
City of Memphis
  • Facts
  • Class action challenging the practical portion
    of fire department promotional test.
  • Practical portion consisted of a videotaped
    response to a factual situation presenting
    problems commonly encountered by fire department
    lieutenants and battalion chiefs.
  • Plaintiffs claimed the practical test violated
    their due process and equal protection rights
    under the Fourteenth Amendment.
  • Holding
  • The practical test did not violate Plaintiffs'
    rights under the Fourteenth Amendment.

45
Recommendations
Case Study: Firefighters United for Fairness v.
City of Memphis (cont'd)
  • Reasoning
  • Fairness in review
  • City established a multi-level review process
  • Candidates were permitted to review practical
    video, transcript of practical video, and answer
    key of raters, and to submit redlines citing
    specific concerns with their tests
  • Subject matter experts reviewed the redlines and
    changed scores to reflect problems inherent in
    the form, content or grading of the test, where
    appropriate
  • Fairness in grading
  • Court upheld the use of two raters to grade
    transcripts of practical video components of test
    using answer key developed by subject matter
    experts.
  • According to the court, this system ensured that
    the capricious whim of individual assessors would
    not contribute to any alleged incorrect scores.
    Firefighters United for Fairness v. City of
    Memphis, 362 F. Supp. 2d 963, 972 (W.D. Tenn.
    2005).

46
Physical Ability and Practical Exams
Recommendations and Evaluation Checklists
  • Job Analysis
  • Does the job analysis define the knowledge,
    skills, and abilities that compose the important
    and/or critical aspects of the job in question?
  • Was the job analysis conducted specifically for
    the job in question?
  • Is the job analysis current and based on a
    relevant candidate population?

47
Physical Ability and Practical Exams
Recommendations and Evaluation Checklists
  • Criterion-Related Validity
  • If possible, were criterion-related validity
    studies conducted?
  • Concurrent Study?
  • Predictive Study?

48
Physical Ability and Practical Exams
Recommendations and Evaluation Checklists
  • Cut Score
  • Was a cut score study conducted with a
    representative sample of subject-matter experts
    (e.g., Modified Angoff Technique)?
  • Has the cut score process been documented?
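
For readers unfamiliar with the Angoff approach named above, the core arithmetic can be sketched as follows. Each subject-matter expert estimates, for every scored item or task step, the probability that a minimally competent candidate would perform it correctly, and the cut score is the sum of the per-item averages. The ratings below are hypothetical. "Modified" variants typically add steps such as discussion rounds between rating passes, which this sketch does not model.

```python
def angoff_cut_score(ratings):
    """Raw cut score from a panel's Angoff ratings.

    ratings[j][i] is judge j's estimated probability (0 to 1) that a
    minimally competent candidate performs item i correctly.
    """
    if not ratings or not ratings[0]:
        raise ValueError("need at least one judge and one item")
    n_items = len(ratings[0])
    for judge in ratings:
        if len(judge) != n_items:
            raise ValueError("every judge must rate every item")
        if any(not 0.0 <= value <= 1.0 for value in judge):
            raise ValueError("ratings must be probabilities between 0 and 1")
    # Average the judges' estimates for each item, then sum across items.
    item_means = [
        sum(judge[i] for judge in ratings) / len(ratings)
        for i in range(n_items)
    ]
    return sum(item_means)


# Hypothetical panel: three judges rating a four-step practical task.
panel_ratings = [
    [0.8, 0.6, 0.9, 0.7],
    [0.7, 0.5, 0.9, 0.6],
    [0.9, 0.7, 0.9, 0.8],
]

if __name__ == "__main__":
    # Expected number of steps a minimally competent candidate completes.
    print(round(angoff_cut_score(panel_ratings), 2))
```

Keeping the individual judge ratings, not just the final number, supports the documentation question on this checklist.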

49
Physical Ability and Practical Exams
Recommendations and Evaluation Checklists
  • Rater Training
  • Has a standardized rater training program been
    established?
  • Does the rater training include opportunities to
    ensure rater reliability?
  • Is follow up training provided on a regular
    basis?
  • Is rater data reviewed on a regular basis to
    identify changes in rating trends?

50
Physical Ability and Practical Exams
Recommendations and Evaluation Checklists
  • Candidate Appeal Process
  • Is there an avenue for candidates to provide
    feedback or submit an appeal regarding an
    examination administration?
  • Is that avenue well documented and publicly
    available?
  • Are there designated resources available for
    addressing feedback and appeals?

51
Physical Ability and Practical Exams
  • Questions?

52
Speaker Contact Information
  • Nikki Shepherd Eatchel, M.A.
  • Vice President, Test Development
  • Thomson Prometric
  • 1260 Energy Lane
  • St. Paul, MN 55108
  • 651-603-3396
  • nikki.eatchel_at_thomson.com

  • Robin Rome, Esq.
  • Vice President, Legal and Contracts
  • Thomson Prometric
  • 2000 Lenox Drive
  • Lawrenceville, NJ 08648
  • 609-895-5160
  • robin.rome_at_thomson.com