1. Computational Methods for Finding Patterns of Human and System Failure in Mishap Reports
Chris Johnson, University of Glasgow, Scotland. http://www.dcs.gla.ac.uk/johnson
UCD, 12th December 2003
2. Johnson, Le Galo and Blaize, European Incident Reporting Requirements in Air Traffic Management, EUROCONTROL, 2000.
3. (Figure; labels: bad, good)
4. Centers and contractors used Problem Reporting
and Corrective Action database differently,
preventing comparisons across the database.
NASA safety managers complain that the Web
Program Compliance Assurance and Status System is
too cumbersome. Personnel use Lessons Learned
Information System only on an ad hoc basis.
Hazard reports rarely communicated effectively,
nor are databases used by engineers and managers
capable of translating operational experiences
into effective risk management practices.
(CAIB, p.189)
7.
- Probabilistic information retrieval
  - Avoids problem of codification
  - But issues of precision and recall.
- Conversational case based reasoning
  - Extended form of US Navy's NACODAE system
  - Flexible precision and recall.
  - Word sense disambiguation etc.
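The precision and recall trade-off mentioned above can be made concrete with a small calculation. This is a minimal sketch: the report IDs and the two result sets are invented for illustration and do not come from any of the databases discussed in the talk.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = fraction of retrieved reports that are relevant
    recall    = fraction of relevant reports that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical report IDs: four reports are truly relevant to the query.
relevant = {"r1", "r2", "r3", "r4"}

# A narrow query returns few reports, all of them relevant.
narrow = ["r1", "r2"]
# A broad query finds more of the relevant reports but adds noise.
broad = ["r1", "r2", "r3", "r5", "r6", "r7"]

narrow_pr = precision_recall(narrow, relevant)
broad_pr = precision_recall(broad, relevant)
print("narrow:", narrow_pr)
print("broad: ", broad_pr)
```

The narrow query misses half of the relevant reports, while the broad query forces the analyst to read through irrelevant ones. Neither is obviously better for incident analysis, which is why the slide lists it as an open issue.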
8. FAA GAIN lacks computational support. Someone must address this opportunity.
Meta-Level Concerns for Aerospace
9. Linda, JavaSpaces and Middleware for Incident Reporting
- Concurrency and distribution
<B777, 1/12/2003, On final approach>
<B737, Maintenance failure on >
<A320, 12/12/2003, ATC came through>
<A320, No clearance>
Australia
UK
< Weather poor but >
US
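The idea of several regional tuple spaces holding incident reports can be sketched as follows. This is a toy, in-memory illustration and not the JavaSpaces API: the class `TupleSpace`, the method name `take` (standing in for Linda's `in`, which is a reserved word in Python), the helper `rd_all` and the assignment of reports to regions are all assumptions made for the example. Real Linda `rd` and `in` block until a match arrives; this sketch returns immediately.

```python
import threading


class TupleSpace:
    """Toy in-memory tuple space (illustrative only, not JavaSpaces)."""

    def __init__(self):
        self._tuples = []
        self._lock = threading.Lock()  # lets several threads share one space

    def out(self, tup):
        """Write a tuple into the space."""
        with self._lock:
            self._tuples.append(tuple(tup))

    @staticmethod
    def _matches(template, tup):
        # None in a template field acts as a wildcard.
        return len(template) == len(tup) and all(
            want is None or want == have for want, have in zip(template, tup)
        )

    def rd(self, template):
        """Non-destructive read: return copies of all matching tuples."""
        with self._lock:
            return [t for t in self._tuples if self._matches(template, t)]

    def take(self, template):
        """Destructive read: remove and return the first match, or None."""
        with self._lock:
            for i, t in enumerate(self._tuples):
                if self._matches(template, t):
                    return self._tuples.pop(i)
        return None


# One space per reporting region; which report lives where is arbitrary here.
spaces = {"Australia": TupleSpace(), "UK": TupleSpace(), "US": TupleSpace()}
spaces["UK"].out(("B777", "1/12/2003", "On final approach"))
spaces["US"].out(("A320", "12/12/2003", "ATC came through"))


def rd_all(template):
    """Query every regional space and tag each result with its region."""
    return [(region, t) for region, s in spaces.items() for t in s.rd(template)]


a320_reports = rd_all(("A320", None, None))
print(a320_reports)
```

Because a query only names the fields it cares about, each region can keep its own store and its own report layout, which is the attraction of the approach for incident reporting.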
10. Linda, JavaSpaces and Middleware for Incident Reporting
- Overloading of matching operators
<A320, ?, ?>
<?, ?, match(CRM)>
<B777, 1/12/2003, On final approach>
<B737, Maintenance failure on >
<A320, 12/12/2003, ATC came through>
<A320, No clearance>
Australia
UK
< Weather poor but >
US
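The two templates on this slide, `<A320, ?, ?>` and `<?, ?, match(CRM)>`, suggest that a template field may be a literal, a wildcard, or a predicate over free text. A minimal sketch of that overloaded matching is given below. The substring test inside `match` is an assumption (the talk points towards information retrieval techniques rather than plain substring search), and the third report is invented so that the CRM template has something to find.

```python
ANY = "?"  # wildcard field, written as ? on the slide


def match(term):
    """Build a predicate field: true when the free text mentions `term`.

    A case-insensitive substring test is used here purely for illustration.
    """
    return lambda text: term.lower() in str(text).lower()


def field_matches(want, have):
    """Overloaded comparison for one template field."""
    if callable(want):   # predicate field such as match("CRM")
        return want(have)
    if want == ANY:      # wildcard field
        return True
    return want == have  # literal field


def tuple_matches(template, tup):
    """A template matches a tuple of the same length, field by field."""
    return len(template) == len(tup) and all(
        field_matches(w, h) for w, h in zip(template, tup)
    )


reports = [
    ("B777", "1/12/2003", "On final approach"),
    ("A320", "12/12/2003", "ATC came through"),
    # Invented report, added only so the CRM query below returns a result.
    ("A320", "3/1/2004", "Poor CRM during descent briefing"),
]

a320 = [r for r in reports if tuple_matches(("A320", ANY, ANY), r)]
crm = [r for r in reports if tuple_matches((ANY, ANY, match("CRM")), r)]
print(len(a320), "A320 reports;", len(crm), "CRM report(s)")
```

Swapping the body of `match` for a ranked retrieval function would leave the template syntax unchanged, which is one way to read the phrase "overloading of matching operators".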
11. Linda, JavaSpaces and Middleware for Incident Reporting
<A320, ?, ?>
<?, ?, match(CRM)>
<B777, 1/12/2003, On final approach>
<B737, Maintenance failure on >
<A320, 12/12/2003, ATC came through>
<A320, No clearance>
Australia
UK
< Weather poor but >
US
12. So does the software say something new and useful?
13. Case Study 1: FDA Telemedicine
- Medical errors lead to
  - 45,000-100,000 deaths (US).
  - RTA: 43,000, AIDS: 16,000.
  - Additional care: 15 billion
  - 45% have some mishap.
  - 17% prolonged hospital stay.
"Look, I'm not blaming you, I'm just suing you"
14.
- SE Virginia medical centres
- 1 nurse monitors system
- 49 remote patients
- 5 ICUs at 3 centres.
- Staff: 50-80% of ICU budget.
Courtesy NASA Telemedicine Instrumentation Pack project
Courtesy Univ. of Virginia, Office of Telemedicine
17. Findings from MAUDE: Safety Culture and Telemedical Mishaps
- Introduction of telemedicine implies
  - fewer clinical staff, more technical staff
  - technical staff don't understand devices/procedures?
- Increasing reliance on vendors' guidance
  - vendors in turn rely on manufacturers
  - communication often breaks down or is too slow.
- No common safety culture
  - many incidents stem from poor communication
- Strong parallels with NASA (CAIB Chapter 7).
18. Cluster 1: Configuration
- EASI(TM) software provides 12-lead ECG data using 5 leads to the patient.
- TECH NOTED EASI 12-LEAD DISPLAY ON CENTRAL STATION FROM TRANSMITTER THAT WASN'T EASI CAPABLE.
- CUSTOMER REPLACED TRANSMITTER, RELOADED CENTRAL STATION SOFTWARE, CONFIRMED ALL SIGNALS WERE CORRECTLY TRANSMITTED AND LABELED.
- CUSTOMER DID NOT UNDERSTAND DIFFERENCE BETWEEN STANDARD ECG AND EASI.
- CUSTOMER WAS RETRAINED TO FURTHER THEIR UNDERSTANDING OF DIFFERENCE.
- (MDR TEXT KEY 1379795)
- Fewer electrodes reduce work for nurses and improve patient comfort.
19. Cluster 1: Configuration
- Social implications: clinicians and support staff rely on suppliers' explanations.
- Symptomatic of system safety problems
  - manufacturers gain insights that should be caught earlier in development.
- Retraining is proposed, no idea of systemic causes of human error?
- DURING INVESTIGATION, ENGINEERS CONFIGURED A SYSTEM IN SAME SETUP AS CUSTOMER.
- FOUND MAINFRAME RECEIVERS CAN RECEIVE INCORRECT BIT TO MISIDENTIFY TRANSMITTER AS EASI CAPABLE
- Report doesn't state how to prevent mis-configuration.
20. Cluster 2: Sub-contractors
- End-user frustrated by device unreliability and manufacturer's response
- SEVERAL UNITS RETURNED FOR REPAIR HAD FAN UPGRADES TO ALLEVIATE TEMP PROBLEMS. HOWEVER, THEY FAILED IN USE AGAIN AND WERE RETURNED FOR REPAIR
- AGAIN SALESMAN STATED IT'S NOT A THERMAL PROBLEM, IT'S A PROBLEM WITH X's Circuit Board.
- X ENGINEER STATED Device HAS ALWAYS BEEN HOT INSIDE, RUNNING AT 68C AND THEIR product ONLY RATED AT 70C.
- ANOTHER TRANSPONDER STARTED TO BURN, SENT FOR REPAIR. SHORTLY AFTER MONITOR BEGAN RESETTING FOR NO REASON (MDR TEXT KEY 1370547)
- Manufacturers felt reports not safety-related
  - reports relate to end-user frustration regarding product reliability (not safety).
21. Cluster 2: Subcontractors
- Telemedicine applications developed by groups of suppliers
  - flexibility and cost savings during development, manufacture, marketing
  - problems if incidents stem from sub-components not manufactured by suppliers
  - incident reports must be propagated back along the supply chain.
- Manufacturer states problems stem from subcontractor's circuit board
  - more problems after faulty board replaced, customer returns unit again
  - connectors to PCB not properly seated but still passes acceptance test?
  - connector not seated completely during initial repair and gradually loosens over time?
22. Cluster 2: Subcontractors
- Fly-fix-fly approach undermines attempts to improve patient safety.
- Confused dialogue between clinician, vendor, manufacturer
- End-user may see technical issues as form of excuse (e.g. PCB connectors)
- Device repairs not only rectify problems, they introduce new ones
  - compounds end-user uncertainty and distrust of device reliability
  - communication fails and shared safety culture erodes over time.
23. Cluster 3: Modification Induced Bugs
- IN SOFTWARE RELEASE VF2, IF PATIENT IN "AUTOADMIT" MODE, PARAMETER DATA AUTOMATICALLY COLLECTED AND STORED IN THE SYSTEM'S DATABASE,
- IF THE PATIENT LATER REMOVED (BUT NOT DISCHARGED) FROM ORIGINAL BED/NETWORK LOCATION, DATA COLLECTION TEMPORARILY DEACTIVATED (EG DURING MOVE FOR TREATMENT).
- PROBLEM OCCURS WHEN NEW PATIENT ADMITTED TO SAME BED/NETWORK LOCATION BUT ORIGINAL PATIENT NOT DISCHARGED WHILE CONNECTED TO THAT LOCATION.
- NEW PATIENT ADMISSION STORES DATA IN DATABASE CORRECTLY. HOWEVER, IN PARALLEL, INCORRECTLY APPENDS NEW PATIENT DATA ON TOP OF OLD PATIENT'S RECORD
- (MDR TEXT KEY 1340560)
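The failure sequence in this report can be reproduced with a toy model. This is a hypothetical reconstruction from the report text alone, not the vendor's code: the class `BedLocation`, its method names, the sample strings and the `require_discharge` guard are all assumptions introduced to show how an undischarged record bound to a bed/network location could receive a later patient's data.

```python
class BedLocation:
    """Toy model of one bed/network location (hypothetical, not vendor code)."""

    def __init__(self, require_discharge=False):
        self.require_discharge = require_discharge  # illustrative safeguard
        self.bound = []         # patients still bound to this location
        self.records = {}       # patient -> list of stored samples
        self.collecting = False

    def admit(self, patient):
        if self.require_discharge and self.bound:
            raise RuntimeError(
                f"discharge {self.bound[-1]} before admitting {patient}"
            )
        self.records[patient] = []
        self.bound.append(patient)
        self.collecting = True

    def remove(self):
        """Patient moved (e.g. for treatment) but not discharged."""
        self.collecting = False

    def store(self, sample):
        if not self.collecting:
            return
        # Modelled defect: every record still bound to the location is updated.
        for patient in self.bound:
            self.records[patient].append(sample)


# Sequence described in the report, using made-up sample values.
faulty = BedLocation()
faulty.admit("old patient")
faulty.store("HR 72")
faulty.remove()               # collection paused, record stays bound
faulty.admit("new patient")   # old patient never discharged
faulty.store("HR 95")
print(faulty.records)

# With the guard enabled, the second admission is refused instead.
guarded = BedLocation(require_discharge=True)
guarded.admit("old patient")
guarded.remove()
try:
    guarded.admit("new patient")
    blocked = False
except RuntimeError:
    blocked = True
print("second admission blocked:", blocked)
```

The guard is only one possible mitigation. Whether refusing an admission is acceptable on a ward is exactly the kind of question that needs clinical as well as technical input.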
24. Safety Culture and Telemedical Mishaps
- Software identifies 40-50 more US telemedical mishaps in 6 months.
- Analysis of reports suggests no quick fixes, but
  - Regulators need to focus on dialogue between manufacturers and users
  - Consider detailed training requirements for telemedicine before approval
  - Especially look at end-user maintenance and configuration issues
  - Introduce training in safety and risk management for support staff?
- Joint US/UK AHRQ presentation in Washington.
- Things are only going to get worse
25. Da Vinci, 1st robotic aid approved by the FDA. New York Presbyterian Hospital uses it on atrial septal defects.
26. Case Study 2: Inter-Industry Comparisons
27. Cluster 1: Programming Errors
- Pilot didn't check 1st Officer programming FMC.
- ATC informed us we were off course ... it took minutes to figure out what happened. ATC vectored us back onto departure and gave us a climb clearance. ATC also pointed out traffic, but we never saw it. We aren't sure if our error caused a conflict.
- First Officer programmed FMC. I checked the Route Page to see if it matched our clearance. It showed correct departure and transition. I did not check Legs Pages to see if all fixes were there. I will next time!
- We made an error programming the FMC, then became complacent. I should have done a more complete check of the First Officer's programming.
28. Cluster 1: Programming Errors
- Computer flight plan was route ABC.
- ATC clearance was via route D-E-F.
- Original flight plan should have been destroyed, so as not to accidentally revert to old route.
- First Officer very experienced and I had complete trust that he was capable of loading correct waypoints, but both he and I failed to use a visible method of marking the computer flight plan.
- 99% of time, cleared route is same as computer flight plan, but not always, as I found out the hard way. ATC caught my error.
29. Cluster 1: Programming Errors
- Container ship grounds, same route every week.
- 4 deck officers, good visibility, 2 radars and GPS.
- Charts had courses in black ink, couldn't be erased.
- At 0243 altered course to 237, position plotted.
- 45 minutes later, ship grounds at full speed.
- Watch officer set auto steering to wrong course.
- 237 next to reciprocal 157 for return voyage.
30. Cluster 2: Warnings as Safety Nets
- During the descent, we were doing some HF radio checks, and forgot to arm the altitude select mode on the flight director. As a result, we descended through our altitude....
- We promptly returned to FL280. As a crew, we are very diligent and disciplined about altitude assignments.
- But in this case, because our attention was diverted from the task at hand, we flew through our assigned altitude. It was that classic trap: both crew members distracted by something and nobody flying the airplane.
31. Cluster 2: Warnings as Safety Nets
- 3 on fishing vessel, 2 cook, pump bilges, maintain watch.
- Skipper asleep on the deck of the wheelhouse.
- Vessel's planned track 0.35 miles from a rig.
- Automated radar alarm system set to 0.3 miles.
- VHF off: skipper said too much distracting traffic.
- Rig asks stand-by safety vessel for help, alongside boat.
- Nobody on bridge or deck even after sounding horns.
- Abandon platform stations as precautionary measure.
- Skipper protests on being wakened, under control.
- Radar warning system is a safety net or final safeguard.
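The two distances on this slide are worth comparing directly: a planned track 0.35 miles from the rig and a radar alarm set to 0.3 miles. The sketch below assumes a straight track, a point-like rig and a circular guard zone, none of which is stated in the report, and the function name `alarm_fires` is invented. It says nothing about what the alarm actually did during the incident.

```python
import math


def alarm_fires(closest_approach_miles, guard_miles, along_track_miles):
    """True if any sampled position on a straight track lies inside the guard zone.

    The rig sits at the origin and the track runs parallel to the x-axis at a
    perpendicular distance of `closest_approach_miles`.
    """
    return any(
        math.hypot(x, closest_approach_miles) <= guard_miles
        for x in along_track_miles
    )


# Sample the track every 0.01 miles from 2 miles before to 2 miles after the rig.
track = [i / 100 for i in range(-200, 201)]

on_planned_track = alarm_fires(0.35, 0.3, track)   # figures from the slide
off_track = alarm_fires(0.25, 0.3, track)          # hypothetical drift towards rig
print("alarm on planned track:", on_planned_track)
print("alarm after drifting 0.1 miles closer:", off_track)
```

Under these assumptions a vessel holding its planned track never triggers the alarm, and one that drifts has to close to within 0.3 miles before anyone is warned. That margin is what makes the alarm a final safeguard rather than a substitute for a watch.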
32. Conclusions
- Must make better use of lessons learned systems.
- Use Tuple Space and IR to search for key issues
  - distributed and persistent architectures for retrieval
  - avoids need for standardised formats
  - can be used within and between industries.
- Caveats
  - does it tell us anything new?
  - how valid are inter-industry comparisons?
  - how do we get from clusters to recommendations?
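Searching free-text reports across industries without a shared coding scheme can be illustrated with a very small ranking example. This sketch swaps in plain bag-of-words cosine similarity, which is simpler than the probabilistic retrieval mentioned earlier in the talk. The report texts are shortened from excerpts quoted on earlier slides, and the report keys and the query are invented for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Excerpts from earlier slides; the keys are labels made up for this example.
reports = {
    "aviation-1": "First Officer programmed FMC. I did not check Legs Pages "
                  "to see if all fixes were there.",
    "aviation-2": "forgot to arm the altitude select mode on the flight director",
    "marine-1": "Watch officer set auto steering to wrong course.",
}

query = bag_of_words("officer programmed wrong course")
ranked = sorted(
    ((name, cosine(query, bag_of_words(text))) for name, text in reports.items()),
    key=lambda pair: -pair[1],
)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

A single query pulls in both an aviation and a marine report about mis-set automation, with no common report format required. The caveats above still apply: a shared vocabulary does not by itself show that the two incidents have comparable causes.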
33. Questions?