Modeling, Analyzing and Engineering NASA's Safety Culture

1
Modeling, Analyzing and Engineering NASA's Safety Culture
Betty Barrett, John Carroll, Joel Cutcher-Gershenfeld, Nicolas Dulac,
Nancy Leveson, Karen Marais, David Zipkin
Massachusetts Institute of Technology
  • Presentation to Universities Space Research
    Association
  • January 2005

2
Motivation
  • "The foam debris hit was not the single cause of
    the Columbia accident, just as the failure of the
    joint seal that permitted O-ring erosion was not
    the single cause of Challenger. Both Columbia
    and Challenger were lost also because of the
    failure of NASA's organizational system."
  • -- Columbia Accident Investigation Board (CAIB)
    report, August 2003, p. 195

3
Core Hypothesis
  • Safety decision making and dynamics can be
    modeled, analyzed and engineered just like
    physical systems. The models will be useful in
    designing and validating improvements to the risk
    management and safety culture, in evaluating the
    potential impact of changes and policy decisions,
    in assessing risks, in detecting when risk is
    increasing to unacceptable levels, and in
    performing root cause analysis.
  • Defining Organizational Culture (three levels;
    Schein, 1985)
  • Level 1: Visible Artifacts
  • Level 2: Stated Policies and Principles
  • Level 3: Underlying Values and Assumptions

4
Assumptions: NASA Safety Culture
  • Gap between vision and reality
  • No one single culture
  • Mitigation of risk, not elimination of risk

Visual Image for the Project
  • An electronic equivalent of the canary in a coal
    mine

5
Case Example
  • Incremental loss of independence
  • Shuttle SSRP (originally called the Senior Safety
    Review Board and now known as the System Safety
    Review Panel) established in 1981
  • Over two decades with twists and turns
  • Safety, Reliability, and Quality Assurance
    (SR&QA) established with membership and chair
    from the safety organizations
  • First, advisory input from Space Shuttle Program
  • Then, representation from the Program
  • Then, leadership from the Program
  • Ultimately, full shift in responsibility of SR&QA
    to the Space Shuttle Program in 2000
  • Project manager now decides how much safety
    services to purchase!

6
Introduction to System Safety
  • Safety as an emergent, system property
  • The Problem
  • Component level focus on reliability and
    redundancy was incomplete
  • Fly-fix-fly became unacceptable
  • System Safety
  • Emerged after WWII; Jerome Lederer's Flight
    Safety Foundation
  • Focus on interfaces of particular components or
    operations and system-level hazards
  • Still a challenge relative to the
    component-focused mindset

7
Chain-of-Events Accident Causality Models
  • Explain accidents in terms of multiple events,
    sequenced as a forward chain over time.
  • Events linked together by direct relationships
    (ignore indirect, non-linear relationships).
  • Events almost always involve component failure,
    human error, or energy-related events.

8
Limitations of Event-Chain Causality Models
  • Social and organizational factors
  • System accidents
  • Software Error
  • Human Error
  • Cannot effectively model human behavior by
    decomposing it into individual decisions and
    actions and studying it in isolation from the
    physical and social context, the value system in
    which it takes place, and the dynamic work process
  • Adaptation
  • Major accidents involve systematic migration of
    organizational behavior to higher levels of risk.

9
A Systems Theory Model of Accidents
  • Return to a core principle: safety as an emergent
    property
  • Accidents arise from interactions among people,
    societal and organizational structures,
    engineering activities, and physical system
    components that violate the constraints on safe
    component behavior and interactions
  • Need to include entire socio-technical system

10
A Systems Theory Model of Accidents
  • Systems should not be treated as a static design
  • A socio-technical system is a dynamic process
    continually adapting to achieve its ends and to
    react to changes in itself and its environment
  • Preventing accidents requires designing a control
    structure to enforce constraints on system
    behavior and adaptation

12
A Systems Theory Model of Accidents
  • Views accidents as a control problem
  • O-ring did not control propellant gas release by
    sealing gap in field joint
  • Software did not adequately control descent speed
    of Mars Polar Lander
  • Events are the result of inadequate control
  • They result from lack of enforcement of safety
    constraints
  • To understand accidents, we need to examine the
    control structure itself to determine why it was
    inadequate to maintain safety constraints

Not a blame model: the aim is to understand why
13
Modeling Accidents Using STAMP
  • Three types of models are needed
  • Static safety control structure: safety
    requirements and constraints, flawed control
    actions, context (social, political, etc.),
    mental model flaws, coordination flaws
  • Structural dynamics: how the static safety
    control structure changed over time
  • Behavioral dynamics: dynamic processes behind the
    changes (i.e., why the system changes)

Possible to model, analyze, and engineer the
safety culture
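
As a rough illustration of the first model type, a static safety control structure can be pictured as a set of controllers, each responsible for enforcing some safety constraints on the level below it. The sketch below is not the authors' model: the `Controller` class, the `unenforced_constraints` helper, and every controller and constraint name are hypothetical, chosen only to show how a gap in responsibility could be surfaced.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """One level in a hypothetical safety control structure."""
    name: str
    enforces: set = field(default_factory=set)    # safety constraints owned
    controls: list = field(default_factory=list)  # names of levels below


def unenforced_constraints(controllers, required):
    """Return required constraints that no controller is responsible for."""
    covered = set()
    for controller in controllers:
        covered |= controller.enforces
    return sorted(set(required) - covered)


# Hypothetical three-level structure; every name here is illustrative.
structure = [
    Controller("Program management",
               {"launch only with hazards reviewed"},
               ["Safety review panel"]),
    Controller("Safety review panel",
               {"hazard reports independently assessed"},
               ["Engineering"]),
    Controller("Engineering", set(), []),
]
required = [
    "launch only with hazards reviewed",
    "hazard reports independently assessed",
    "anomalies trended across flights",
]
gaps = unenforced_constraints(structure, required)
print(gaps)
```

Comparing such snapshots taken at different dates would be one simple way to look at structural dynamics, that is, how responsibilities moved over time.
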
14
Introduction to System Safety Modeling
  • Orientation to System Dynamics modeling
  • Overall model structure
  • Unpacking one element of the model
  • Three sample scenarios

15
Orientation to System Dynamics Modeling
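
As a general orientation, a system dynamics model tracks stocks (quantities that accumulate) and flows (rates that fill or drain them), and steps them forward in time. The following is a minimal sketch of that idea using simple Euler integration. It is not taken from the presentation: the `simulate_stock` function, the "backlog of unresolved safety issues" stock, and the rates of 5 new issues per month and 10% resolved per month are all assumptions made for illustration.

```python
def simulate_stock(initial, inflow, outflow, months, dt=1.0):
    """Euler-integrate one stock: d(stock)/dt = inflow(stock) - outflow(stock)."""
    stock = initial
    history = [stock]
    steps = int(round(months / dt))
    for _ in range(steps):
        stock += dt * (inflow(stock) - outflow(stock))
        history.append(stock)
    return history


# Hypothetical stock: backlog of unresolved safety issues.
# Assumed flows: 5 new issues per month, 10% of the backlog resolved per month.
backlog = simulate_stock(
    initial=0.0,
    inflow=lambda s: 5.0,
    outflow=lambda s: 0.10 * s,
    months=120,
)
print(round(backlog[-1], 2))
```

With these assumed rates the backlog levels off where inflow equals outflow (5 = 0.10 × 50). Feedback loops of this kind, chained across many stocks, are what give such models their characteristic behavior over time.
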
16
Overall Model Structure
[Figure: overall model structure linking nine sub-models: Launch Rate;
System Safety Resource Allocation; System Safety Status; Perceived
Success by Administration; Shuttle Aging and Maintenance; System Safety
Efforts and Efficacy; Incident Learning and Corrective Action; Risk;
System Safety Knowledge, Skills and Staffing]
17
Overall Model Structure
[Figure: overall model structure linking nine sub-models: Launch Rate;
System Safety Resource Allocation; System Safety Status; Perceived
Success by Administration; Shuttle Aging and Maintenance; System Safety
Efforts and Efficacy; Incident Learning and Corrective Action; Risk;
System Safety Knowledge, Skills and Staffing]
18
Complete Learning Model
19
Unpacking One Element: Learning and Corrective
Actions
20
How can this model help us learn about NASA?
  • It allows us to
  • Understand how and why accidents have occurred
  • Test and validate changes and new policies
  • Learn which levers have a significant and
    sustainable effect
  • Facilitate the identification and tracking of
    metrics to detect increasing risk
  • In order to do the above, we need to be
    comfortable with the model
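
To make the idea of tracking a metric to detect increasing risk concrete, here is a minimal sketch that flags a sustained upward trend in a monthly metric using a least-squares slope over a sliding window. It is an assumption-laden illustration, not the project's method: the function names, the window length, the threshold, and the sample values are all hypothetical.

```python
def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def rising_risk_alerts(metric, window=4, threshold=0.5):
    """Return the index of the last month in each window whose trend is too steep."""
    alerts = []
    for end in range(window, len(metric) + 1):
        if slope(metric[end - window:end]) > threshold:
            alerts.append(end - 1)
    return alerts


# Hypothetical metric, e.g. open anomaly reports per month (made-up values).
open_reports = [5, 5, 5, 5, 5, 5, 6, 7, 8, 9]
alerts = rising_risk_alerts(open_reports)
print(alerts)
```

A trend test like this reacts to the direction of change rather than to an absolute level, which matches the goal of noticing migration toward higher risk before a fixed limit is crossed.
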

21
Model Results
[Figure: "Attempts to address systemic factors" vs. Time (months, 0 to
900); vertical scales in dmnl (0 to 1) and months (0 to 400)]
Changes made after an accident were ineffective
over the long run in solving the systemic problems
22
Model Results
[Figure: "Risk" vs. Time (months, 0 to 900); vertical scales in Risk
Units (0 to 0.2) and months (0 to 400)]
Response to accidents had very little impact on
actual risk
23
Model Results
[Figure: "Safety and performance compete for resources"; perceived
priority of safety and perceived priority of performance (0 to 4) vs.
Time (months, 0 to 1000)]
Accidents lead to a reevaluation of NASA safety
and performance priorities
24
Scenario A: Impact of fixing systemic factors
vs. symptoms
[Figure: "Risk" (0 to 1) vs. Time (months, 0 to 1000)]
The system risk quickly escalates if only
symptoms are fixed and systemic factors are not
addressed
25
Scenario B: Independence of Safety Decision
Makers
[Figure: "Risk" (0 to 0.2) vs. Time (months, 0 to 1000)]
  • Assumes an Independent Safety Organization that
    ensures
  • the assignment of high-ranking and highly
    regarded personnel to the safety organization
  • more power and authority for the safety
    organization
  • that staff can make reports without fear of blame
  • an increase in the percentage of incidents that
    are reported
  • higher employee participation in the
    investigation
  • an unbiased evaluation of proposed corrective
    actions emphasizing solutions that address
    systemic factors

26
Scenario C: Increased Contracting
[Figure: "Risk" (0 to 1) vs. Time (months, 0 to 1000)]
There is a tipping point at which NASA is not
able to perform the integration and safety
oversight that is its responsibility. After
this point, the risk escalates substantially.
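
The threshold behavior described here can be pictured with a deliberately simple toy, sketched below. It is not the authors' model and does not reproduce their results: the `simulate_contracting` function, the oversight capacity of 0.6, the baseline risk of 0.1, and the growth and recovery rates are all assumptions chosen only to show risk staying flat below a capacity limit and climbing once contracted work exceeds it.

```python
def simulate_contracting(contract_fraction, months=120,
                         oversight_capacity=0.6, baseline=0.1,
                         growth=0.5, recovery=0.05):
    """Toy risk stock: flat while oversight keeps up, rising once it cannot."""
    risk = baseline
    # Share of contracted work that exceeds what oversight can cover.
    gap = max(0.0, contract_fraction - oversight_capacity)
    for _ in range(months):
        added = growth * gap * (1.0 - risk)      # unmonitored work adds risk
        relaxed = recovery * (risk - baseline)   # oversight pulls risk back
        risk += added - relaxed
    return risk


# Hypothetical contracting levels, expressed as a fraction of total work.
for fraction in (0.5, 0.7, 0.9):
    print(fraction, round(simulate_contracting(fraction), 3))
```

In this toy the change in behavior comes entirely from the `max(0.0, ...)` term: below the assumed capacity the gap is zero and risk never moves, while above it the long-run risk level depends on how far capacity is exceeded.
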
27
Lessons Learned
  • Without addressing systemic factors, accidents
    persist and risk increases
  • Increasing a safety organization's independence
    has a positive effect on system risk
  • There are certain tipping points beyond which
    the behavior of the system is significantly
    different

Many other lessons will be extracted from the
model after further analysis!!
28
Phase I Accomplishments
  • A working model!
  • Interrelationships among models for
  • Launch Rate
  • Perceived Program Success
  • Shuttle Aging and Maintenance
  • Incident Learning and Corrective Action
  • System Safety Efforts and Efficacy
  • System Safety Resource Allocation
  • System Safety Knowledge, Skills and Staffing
  • System Safety Status
  • Risk

29
Next Step Implications
  • Further validation of the model
  • Further analysis using the model
  • Development of a robust system safety "what-if"
    flight simulator tool for field use
  • Development of a user interface and support
    process for the flight simulator
  • Pilot deployment in the field and ongoing PDCA
    improvement

30
Conclusions
  • Organizational and Institutional aspects of
    safety systems can be modeled with rigor and
    utility comparable to technical systems models
  • The Promise
  • Detecting in advance indications of migration
    toward heightened risk
  • Assessing risks and benefits associated with
    potential changes in organizational structure
    and systems
  • Building a systems safety approach into
    organizational strategy, structure and process

31
Appendix
32
Project Work Plan
  • Phase I (six months)
  • Models of NASA Shuttle Program Safety Culture and
    Safety Control Structure
  • Focus on hazards at the interfaces of components
    and operations, as well as dynamics over time
  • Incorporate insights from Challenger and Columbia
    accident reports: a rare window into
    relationships and interactions
  • Build on efforts of weekly study group: faculty,
    research staff and doctoral students
  • Phase II (two years)
  • Validation of model, incorporation of toolkit for
    "what-if" analysis, and integration of metrics
  • Partnership with NASA around operationalization
    and implementation

33
Elements of Relevant Social Systems
  • Formal organizational safety structure
  • Headquarters Office of Safety and Mission
    Assurance; SMA offices at NASA centers and
    facilities; NASA Engineering and Safety Center
    (NESC); and safety roles of managers, engineers,
    civil servants, contractors and others
  • Organizational sub-systems
  • Communications systems, safety information
    support systems, analysis and decision making
    systems, reward and reinforcement systems,
    selection and retention systems, skills and
    training systems, organizational learning
    systems, incident investigation systems
    (including in-flight anomalies (IFAs)), and
    conflict resolution systems, etc.
  • Safety rules and procedures
  • Specific rules and procedures; underlying
    assumptions and principles; dynamics over time
  • Individual behavior: motivation and capability
  • Commitment to safety values; knowledge, skills,
    and ability with respect to safety tools and
    methods; group dynamics; fear of surfacing safety
    issues; learning from mistakes; etc.

34
Full Social Systems Framework
  • Structure and Sub-Systems
  • Structure
  • Groups: ongoing and ad hoc (formal and informal)
  • Organizations: hierarchies, networks, layers
    (formal and informal)
  • Institutions
  • Industries
  • Markets
  • Sub-Systems
  • Communications systems
  • Information systems
  • Reward and reinforcement systems
  • Selection and retention systems
  • Learning and feedback systems
  • Complaint and conflict resolution systems
  • Social Interaction Processes
  • Leadership
  • Negotiations
  • Problem-solving
  • Decision-making
  • Teamwork
  • Partnership
  • Capability and Motivation
  • Bias and human judgment
  • Individual knowledge, skills and ability
  • Group stages of development
  • Fear, satisfaction and commitment
  • Culture, Vision and Strategy
  • Culture
  • Artifacts, attributes, assumptions
  • Gender and diversity
  • Cross-cultural dynamics
  • Dominant cultures and sub-cultures
  • Vision and Strategy

Importance of decompositional attention to
details and integration across elements