Slide 1: Performance-Based Testing to Measure Ophthalmic Skills Using Computer Simulation
- Authors
- John T. LiVecchi, MD, Assistant Clinical Professor, Drexel University College of Medicine and University of Central Florida College of Medicine; Director of Oculoplastic Surgery, St. Luke's Cataract & Laser Institute
- William Ehlers, MD, Associate Professor, University of Connecticut Health Center and University of Connecticut
- Lynn Anderson, PhD, Chief Executive Officer, Joint Commission on Allied Health Personnel in Ophthalmology (JCAHPO)
- Overview
- JCAHPO is a non-profit, non-governmental organization that provides certification of ophthalmic medical assistants and performs other educational and credentialing services. JCAHPO is governed by a Board of Directors composed of representatives from participating ophthalmic organizations and a public member. (April 2011)
- The authors have no financial interest in the subject matter of this poster.
Slide 2: Abstract
- Purpose
- To investigate the validity and reliability of an interactive computer-based simulation, and to test a computer-automated scoring algorithm as a replacement for hands-on clinical skill testing with live observers, by assessing the knowledge and performance of ophthalmic technicians on clinical skills.
- Design
- Validity and reliability study of videotaped ophthalmic technicians' performance of computer simulations on 12 clinical skills.
- Participants
- 50 JCAHPO candidates for Certified Ophthalmic Technician (COT) or Certified Ophthalmic Medical Technologist (COMT) certification.
- Methods
- Tests were conducted in July 2003 and again in August 2010 to evaluate ophthalmic technicians' knowledge of, and ability to perform, 12 ophthalmic skills using high-fidelity computer simulations. Performance checklists on technique and task results were developed based on best practices. A scoring rationale was established to evaluate performance using weighted scores and computer-adapted algorithms. Candidate performance was evaluated by a computer-automated scoring system and by expert evaluations of video-computer recordings of the skills tests. Inter-rater reliability of the instruments was investigated by comparing the computer scoring with the ratings of two ophthalmic professional raters, examining agreement on the scoring of each process step and its results between the computer and the raters. Computer and rater agreement for a particular step had to be statistically significant by Chi-square analysis or reach a percentage of agreement of 90% or higher.
- Results
- Of 80 process steps evaluated in seven COT skills, 71% of the process steps were found to be in agreement (statistically significant by Chi-square or the 90% agreement criterion) and 29% of the process steps were found to be suspect. Similarly, of 86 process steps evaluated in five COMT skills, 75% were in agreement and 25% of the process steps were suspect. Given the high degree of agreement between the raters and the computer scoring, the inter-rater reliability was judged to be high.
- Conclusions
- Our results suggest that computer performance scoring is a valid and reliable scoring system. This research found a high level of correspondence between human scoring and computer-automated scoring systems.
Slide 3: Tasks Performed
- Keratometry
- Lensometry
- Tonometry
- Ocular Motility
- Visual Fields
- Retinoscopy
- Refinement
- Versions and Ductions
- Pupil Assessment
- Manual Lensometry with Prism
- Ocular Motility with Prism
- Photography with Fluorescein Angiography
Slide 4: Simulation Design
- Standardized skill checklists were created based on best practices.
- Multiple scenarios were created for each skill and were randomly administered.
- Interactive arrows allow candidates to manipulate simulated equipment.
- Fidelity (realism and reliability) analysis assessed the degree to which the test simulation required the same behaviors as those required by the task. Necessary fidelity allows a person to:
  - Manipulate the simulation
  - Clearly understand where they are in the performance
  - Demonstrate capability on evaluative criteria
Slide 5: Simulation Test Design Challenges
- Important considerations in the development of the simulation scoring included:
  - Accurate presentation of the skill through simulation
  - Presentation of correct alternative procedures
  - Presentation of incorrect alternative procedures
  - Not performing a step correctly
  - Performing the steps out of order
  - Arriving at the wrong answer even if the correct process is used
- Scoring must differentiate exploration from intentional performance.
- Validation of all aspects of the simulation to ensure successful candidate navigation, usability, and fidelity.
- Candidate tutorial training to ensure confident interaction with simulated equipment and tasks on the performance test.
Slide 6: Test Design, Simulation Scoring, and Rating
- Candidate performance was evaluated on technique and results for each of the 12 ophthalmic tasks.
- Procedural checklists were developed for all tasks based on best practices. Subject matter experts, including ophthalmologists and certified ophthalmic technician job incumbents, determined criteria for judging correct completion of each procedural step and whether steps were completed in an acceptable process order. (In some cases, a procedural step could be completed in any order and still yield a satisfactory process.)
- Each step on the performance checklists was analyzed to determine its importance, and a weighted point value was assigned for scoring. These weighted checklists were then used by raters and by the computer for scoring.
- The values ranged from 6 points, for a step considered important but with little impact on satisfactory performance, to 21 points, for a step considered critical to satisfactorily completing the skill. A cut score was established for passing the skill performance.
- Using the computer, candidates were tested on all skills. Candidate performance was scored by the computer, and a video-computer recording was created for evaluation by live rater observation.
- Computer-automated scoring has a high correlation to live rater observation scoring.1, 2
- The results were compared to determine the agreement between the computer scoring and the scoring of professional raters using the same checklists.
- The accuracy of the skills test results was also evaluated. Each task's results were compared to professional standards for performing the skill for each scenario presented within the simulation.
Slide 7: Validity Analysis
- Computer simulation validity measures included content, user, and scoring validity.
- Measurement of the candidate's ability to accurately complete a task was based on the performance checklists.
- To ensure that computer scoring and rater scoring were applied to the same candidate performance, each candidate's performance of a computer simulation skill was recorded on video for viewing by the observers.
- The scoring of the simulations was validated by comparing the candidates' scores on each skill with job-incumbent professionals' assessments of the candidates' performance.
- The raters were asked to evaluate whether the candidate performed each step correctly and whether the order of performing the steps was acceptable given the criteria presented in the checklist.
- The computer scoring, based on the criteria specified in the scoring checklists, was compared to ophthalmic professionals' judgments using the same checklists.
Slide 8: Data Analysis
- Test validity was high, with candidate pass rates over 80% on the various individual tasks.
- Candidates were surveyed on their perceptions of the simulation's accurate portrayal of the clinical skills they perform in daily job performance.
- The inter-rater reliability of the instruments was analyzed by comparing the computer scoring of the candidates to the ratings of the two ophthalmic professionals using the same checklist, at a 95% confidence level.
- Scores generated by the computer and scores generated by each rater were entered into a database, as exhibited in Table 1 (Slide 9). A representative sample task (keratometry) is displayed.
- The scores for a test's overall process steps and the accuracy of results were compared.
- The decision rule used to determine the rater score that was compared with the computer score was as follows:
  - Scores of both raters had to agree with each other on a process step for a given candidate to be included in the analysis.
  - If the two raters did not agree, a third rater evaluated the process for the final analysis.
- Table 2 (Slide 10) indicates representative results for inter-rater reliability for three tasks, with agreement between the computer scoring and the rater scoring.
- Chi-square and percentage-of-agreement analyses were used to determine statistical significance.
Slide 9: Data Comparison of Computer Scoring and Rater Scoring
Table 1

Test / Process                Computer     Rater 1      Rater 2      Computer     Rater 1      Rater 2
(EXAMPLE)                     Candidate 1  Candidate 1  Candidate 1  Candidate 2  Candidate 2  Candidate 2
Keratometry
  Focus the eyepiece          13           13           13           13           13           13
  Instruct patient            13           13           13           13           13           13
  Total Process Score         74           80           80           80           80           80
  Total Results               Pass         Pass         Pass         Pass         Pass         Pass
  Vertical Power Results      Fail         Fail         Fail         Pass         Pass         Pass
  Vertical Axis Results       Pass         Pass         Pass         Pass         Pass         Pass
  Horizontal Power Results    Pass         Pass         Pass         Pass         Pass         Pass
  Horizontal Axis Results     Pass         Pass         Pass         Pass         Pass         Pass
Slide 10: Agreement Between the Computer Scoring and the Rater Scoring
Table 2
Test             Process                             Decision    Reason    Rater Agree  All Events  Percent of Agreement (po)
Keratometry      Focus eyepiece                      Acceptable  chi2 sig  10           11          1.000
                 Position keratometer                Acceptable  po = 1    11           11          1.000
                 Position patient                    Not rated             0            11          .000
                 Record the horizontal drum reading  Suspect     po < .9   7            11          0.857
Lensometry       Focus eyepiece                      Suspect     po < .9   10           12          0.800
Ocular Motility  Instruct patient                    Acceptable  chi2 sig  24           24          0.958
                 Cover-Uncover test                  Acceptable  chi2 sig  17           24          0.941
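The two acceptance checks behind Table 2's decisions can be sketched as follows: a step is acceptable when the percent of agreement (po) between computer and rater scoring reaches .90, or when a Chi-square test on the computer-versus-rater table is significant. The 2x2 counts and event totals below are illustrative assumptions, not the study's data.

```python
# Sketch of the percent-of-agreement and Chi-square acceptance checks.
# Counts are hypothetical; thresholds follow the 90% / alpha = .05 rules.

def percent_agreement(agree_events, all_events):
    """po: fraction of candidate events where computer and raters agreed."""
    return agree_events / all_events if all_events else 0.0

def chi_square_2x2(a, b, c, d):
    """Pearson Chi-square statistic for a 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / den if den else 0.0

def decision(po, chi2, po_cut=0.90, chi2_crit=3.841):
    """Acceptable if either criterion is met (df = 1, alpha = .05)."""
    return "Acceptable" if po >= po_cut or chi2 > chi2_crit else "Suspect"

# Hypothetical step: 10 of 12 candidate events agree, and the 2x2 table of
# computer vs. rater correct/incorrect judgments is strongly associated.
po = percent_agreement(10, 12)
chi2 = chi_square_2x2(20, 1, 2, 17)
print(round(po, 3), round(chi2, 2), decision(po, chi2))
```

Note that either criterion alone suffices, which is why Table 2 can rate a step acceptable by "chi2 sig" even when po falls below 1.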
Slide 11: Results
- Validity
- 90% of the candidates reported that the COT simulation accurately portrayed the clinical skills they perform in daily job performance.
- 89% of the candidates reported that the COMT simulation accurately portrayed the clinical skills they perform in daily job performance.
- The same scoring checklist was used by both the computer and the raters to judge candidate performance, assuring consistent and objective measurement rather than subjective judgment regarding candidate skills.
- Reliability
- Of 80 process steps evaluated in seven COT skills, 71% of the process steps were found to be in agreement (statistically significant by Chi-square or the 90% agreement criterion) and 29% of the process steps were found to be suspect.
- Of 86 process steps evaluated in five COMT skills, 75% were in agreement and 25% of the process steps were suspect.
- Given the high degree of agreement between the raters and the computer scoring, the inter-rater reliability was judged to be high.
Slide 12: Discussion and Conclusions
- Discussion
- Computer simulations are now commonly used for education and entertainment. The key to incorporating new technologies to improve skills assessment is to formally incorporate automated scoring of the individual performance steps identified in a checklist developed by subject matter experts, weighted by the importance of each step and, when necessary, by performance of the steps in the correct order. High-fidelity computer simulations, with objective analysis of the correct completion of checklist steps and the determination of accurate test results, can provide accurate assessment of ophthalmic technicians' clinical skills.
- Conclusion
- This comparative analysis demonstrates a high level of correspondence between human scoring and computer-automated scoring systems. Our results suggest that computer performance scoring is a valid and reliable system for assessing the clinical skills of ophthalmic technicians. This research further supports that computer simulation testing improves performance-based assessment by standardizing the examination and reducing observer bias. These findings are useful in evaluating and improving the training and certification of ophthalmic technicians.
- References
- 1. Williamson, D. M., Mislevy, R. J., & Bejar, I. I. Automated Scoring of Complex Tasks in Computer-Based Testing: An Introduction. Mahwah, NJ: Lawrence Erlbaum Associates, Inc., 2006.
- 2. Yang, Y., Buckendahl, C. W., Juszkiewicz, P. J., & Bhola, D. S. (2002). A review of strategies for validating computer-automated scoring. Applied Measurement in Education, 15(4), 391.