Title: Distributed Dynamic Event Tree Generation for Reliability and Risk Assessment
1. Distributed Dynamic Event Tree Generation for Reliability and Risk Assessment
- Benjamin Rutt, Umit Catalyurek
- Dept. of Biomedical Informatics, The Ohio State University
- Aram Hakobyan, Kyle Metzroth, Tunc Aldemir, Richard Denning
- Nuclear Engineering Program, The Ohio State University
- Sean Dunagan, David Kunsman
- Sandia National Laboratories
- CLADE06, Paris, France
- June 19, 2006
2. Outline
- Introduction
- Objectives
- System Overview
- Distributed Execution
- Distributed Database Support
- Experimental Results
- Conclusion and Future Work
3. Introduction
- Probabilistic Risk Assessment (PRA)
- Quantification of the risk and reliability associated with system operation
- Integral part of US Nuclear Regulatory Commission licensing, also used by NASA
- Level 2 PRA
- Analysis of radionuclide release from containment
- Little progress in improving the methods used in the Level 2 element of PRA since the release of NUREG-1150 in 1991
- Current PRA methodology uses static event trees and fault trees
- This makes it necessary either to make assumptions in advance about the relative timing of events or to consider the occurrence of events (such as hydrogen combustion) at multiple stages of the scenario
4. Objectives
- Real plant Level 2 PRA consists of hundreds of manual simulation runs
- Currently organized and analyzed manually
- Approximate processing time: one man-year
- Exact timing and magnitude of system variables are critical in determining the risk
- Augment the static methods with Dynamic PRA
- Create a mechanized driver to generate dynamic accident progression event trees
- Use distributed computing to perform dynamic analyses that characterize containment failure or bypass and source terms in a mechanistically consistent manner
- Treatment of uncertainties (epistemic and aleatory)
- Create software that easily organizes and stores event tree data
- Develop a user-friendly interface for display and control of the above
5. Overview of Reliability and Risk Assessment Framework
6. RRAF Architecture Overview
7. Distributed Execution System
- Execution of a stand-alone or parallel simulator in a distributed environment
- Staging of input files and output files
- Branching and task migration
- Dynamic workflow
- Application-specific vs. generalized tools for computation steering and checkpointing
- Simulator-agnostic driver
- Requirements
- SIM reads its input from the command line and/or a text file
- SIM has a checkpointing feature
- SIM allows user-defined control functions (e.g. stopping if a certain condition is true)
- SIM output can be used to detect the stopping condition
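The four simulator requirements above can be read as a small interface that a driver would program against. The following is only a sketch under that reading: `Simulator`, `RunResult`, and `ToySimulator` are hypothetical names, and the toy model (a temperature that rises linearly in time) stands in for a real simulation code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    stop_reason: str   # which control function fired, or "end_time"
    stop_time: float   # simulated time at which the run stopped
    checkpoint: dict   # opaque restart state (hypothetical format)


class Simulator(ABC):
    """Hypothetical adapter: input in, checkpoint and stop reason out."""

    @abstractmethod
    def run(self, inputs: dict, restart_from: Optional[dict] = None) -> RunResult:
        ...


class ToySimulator(Simulator):
    """Stand-in for a real code: temperature rises linearly in time."""

    def run(self, inputs, restart_from=None):
        # Resume from a checkpoint if one is given, otherwise start fresh.
        if restart_from:
            state = dict(restart_from)
        else:
            state = {"time": 0.0, "temp": inputs["temp0"]}
        while state["time"] < inputs["end_time"]:
            state["time"] += inputs["dt"]
            state["temp"] += inputs["heat_rate"] * inputs["dt"]
            # User-defined control function: stop when a threshold is reached.
            if state["temp"] >= inputs["stop_temp"]:
                return RunResult("stop_temp", state["time"], dict(state))
        return RunResult("end_time", state["time"], dict(state))


sim = ToySimulator()
inputs = {"temp0": 500.0, "heat_rate": 2.0, "dt": 1.0,
          "end_time": 100.0, "stop_temp": 520.0}
first = sim.run(inputs)
# Restart from the checkpoint with a different stopping condition.
resumed = sim.run({**inputs, "stop_temp": 600.0}, restart_from=first.checkpoint)
```

The point of the adapter is that the driver only sees inputs, a stop reason, and a checkpoint, so a different simulator could be swapped in without changing the driver logic.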
8. Driver
- Simulator agnostic
- Determines when branching is to occur
- Initiates multiple restarts of system code analyses
- Determines the probabilities of scenarios and when to terminate
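The driver responsibilities listed above can be illustrated with a toy branching loop. This is a simplified sketch, not the actual driver: it assumes a fixed, ordered list of branching points (a real driver would detect them from simulator output), and the names `Branch` and `generate_event_tree`, the failure probabilities, and the truncation threshold are all made up for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Branch:
    branch_id: int
    parent_id: Optional[int]
    probability: float          # product of outcome probabilities so far
    history: Tuple[str, ...]    # outcomes taken to reach this branch


def generate_event_tree(branch_points, truncation):
    """Split every open branch in two at each branching point.

    Branches whose probability drops below `truncation` are recorded
    but not continued (the "when to terminate" decision).
    """
    root = Branch(0, None, 1.0, ())
    tree = [root]
    queue = deque([(root, 0)])  # (branch, index of its next branching point)
    while queue:
        parent, i = queue.popleft()
        if i == len(branch_points):
            continue  # scenario ran to completion
        name, p_fail = branch_points[i]
        outcomes = ((name + " fails", p_fail),
                    (name + " does not fail", 1.0 - p_fail))
        for label, p in outcomes:
            child = Branch(len(tree), parent.branch_id,
                           parent.probability * p, parent.history + (label,))
            tree.append(child)
            if child.probability >= truncation:
                # In a real system this would restart the simulator
                # from the parent's checkpoint.
                queue.append((child, i + 1))
    return tree


# Illustrative values only.
tree = generate_event_tree(
    [("surge line", 0.05), ("hot leg", 0.05), ("SG tubes", 0.05)],
    truncation=0.01,
)
parents = {b.parent_id for b in tree}
leaves = [b for b in tree if b.branch_id not in parents]
```

Because truncated branches are kept as leaves rather than discarded, the leaf probabilities still sum to one, which gives a simple consistency check on the generated tree.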
9. Distributed Database Support
- Metadata management
- Store, access, and visualize the event tree
- Store and access metadata regarding each individual run (branch)
- Access to simulation data
- A single scenario's data is distributed across multiple machines due to branching
- Data is distributed in flat files (binary and text)
- Current prototype uses
- MySQL for metadata
- STORM for access to distributed simulation data
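To make the metadata role concrete, here is a minimal sketch of how per-branch metadata could locate the pieces of one scenario that ended up on different machines. It uses Python's built-in `sqlite3` as a stand-in for MySQL; the table layout, host names, and file paths are hypothetical and are not taken from the prototype.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE branch (
        id          INTEGER PRIMARY KEY,
        parent_id   INTEGER REFERENCES branch(id),  -- NULL for the root run
        probability REAL NOT NULL,
        host        TEXT NOT NULL,   -- machine that executed this branch
        output_path TEXT NOT NULL    -- flat file holding its simulation data
    )""")

# Illustrative rows: each branch ran on whichever node was free.
conn.executemany(
    "INSERT INTO branch VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, 1.0,    "node01", "/scratch/run1/out.dat"),
        (2, 1,    0.05,   "node07", "/scratch/run2/out.dat"),
        (3, 1,    0.95,   "node03", "/scratch/run3/out.dat"),
        (4, 3,    0.0475, "node07", "/scratch/run4/out.dat"),
    ],
)


def scenario_segments(conn, leaf_id):
    """Walk parent links from a leaf to the root.

    Returns (host, output_path) pairs in time order, i.e. the files
    that together hold one complete scenario.
    """
    segments = []
    current = leaf_id
    while current is not None:
        parent_id, host, path = conn.execute(
            "SELECT parent_id, host, output_path FROM branch WHERE id = ?",
            (current,),
        ).fetchone()
        segments.append((host, path))
        current = parent_id
    return segments[::-1]


segments = scenario_segments(conn, 4)
```

Storing only a parent link per branch is enough to recover both the event tree and the list of files a data-access layer would need to read for any single scenario.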
10. Data Virtualization with STORM
- Application developers generally prefer storing data in files
- Support high-level queries on multi-dimensional distributed datasets
- Many possible data abstractions and query interfaces
- Grid-virtualized object-relational database or XML database
- Grid-virtualized objects with user-defined methods invoked to access and process data
[Figure: layered diagram, top to bottom: Virtual Tables, Data Virtualization, Data Service, Scientific Datasets]
11. STORM
- Support efficient selection of the data of interest from distributed scientific datasets and transfer of data from storage clusters to compute clusters
- Front-end
- Support a basic SQL SELECT query with a virtual relational table view or a virtual XML database view
- A lightweight layer on top of datasets
- STORM runtime middleware carries out query planning and query execution
SELECT <DataElements>
FROM Dataset-1, Dataset-2, ..., Dataset-n
WHERE <Expression> AND <Filter(<DataElement>)>
GROUP-BY-PROCESSOR ComputeAttribute(<DataElement>)
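The shape of the query template above can be mimicked on in-memory rows. This is a toy illustration of the semantics only, not STORM's API: `run_query`, the row fields, and the sample values are all hypothetical, and mapping the computed attribute to a processor with a modulo is an assumption made for the sketch.

```python
from collections import defaultdict


def run_query(datasets, select, where, user_filter, compute_attribute, n_processors):
    """Toy evaluation of SELECT ... WHERE ... GROUP-BY-PROCESSOR.

    Rows from all datasets are scanned, kept if both the WHERE expression
    and the user-defined filter accept them, projected onto the selected
    fields, and grouped by the processor that should receive them.
    """
    partitions = defaultdict(list)
    for rows in datasets:
        for row in rows:
            if where(row) and user_filter(row):
                processor = compute_attribute(row) % n_processors
                partitions[processor].append({key: row[key] for key in select})
    return dict(partitions)


# Illustrative data: two "datasets" holding output from different branches.
dataset_1 = [
    {"branch": 1, "time": 0.0,  "pressure": 15.2},
    {"branch": 1, "time": 10.0, "pressure": 16.9},
]
dataset_2 = [
    {"branch": 2, "time": 10.0, "pressure": 17.4},
    {"branch": 3, "time": 10.0, "pressure": 12.1},
]

result = run_query(
    [dataset_1, dataset_2],
    select=("branch", "pressure"),
    where=lambda r: r["time"] >= 10.0,            # <Expression>
    user_filter=lambda r: r["pressure"] > 15.0,   # <Filter(<DataElement>)>
    compute_attribute=lambda r: r["branch"],      # ComputeAttribute(<DataElement>)
    n_processors=2,
)
```

Here `result` maps each processor index to the rows it would receive, which is the role the GROUP-BY-PROCESSOR clause plays when data moves from storage clusters to compute clusters.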
12. STORM Services
- Query
- Meta-data
- Indexing
- Data Source
- Filtering
- Partition Generation
- Data Mover
13. Case Study
- Zion station blackout accident with failure of the Auxiliary Feedwater system
- Includes models of creep rupture of major RCS components (surge line, hot leg, and SG tubes)
- No pump seal leakage allowed
- MELCOR severe accident simulation code
14. Creep Rupture of RCS Components
- Currently
- Larson-Miller correlation is used in MELCOR creep rupture modeling with [equation not recoverable]
- Proposed
- Cumulative distribution function developed for R in the form of a lognormal distribution with a mean value of µ = 1 and standard deviation of σ = 0.4
15. Branching Points for Creep Rupture Model
- CDF of creep rupture parameter R represented as a failure probability, the so-called fragility curve, for surge line, hot leg, and SG tubes (see next slide)
- Discretization of cumulative failure probability at 5%, 25%, 50%, 75%, and 95% (as an example)
- Corresponding R values of 0.518, 0.764, 1.00, 1.31, and 1.931 chosen as branching points (see next slide)
- Two outcomes at each branching point: fails and does not fail
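The branching values listed above can be reproduced from the lognormal fragility curve. The sketch below assumes one particular reading of the parameters: that 1 is the median of R and 0.4 is the standard deviation of ln R. Under that assumption the quantiles match the listed values to the digits shown. The function and constant names are hypothetical.

```python
from math import exp, log
from statistics import NormalDist

MEDIAN_R = 1.0     # assumed: median of R (ln R has mean 0)
SIGMA_LN_R = 0.4   # assumed: standard deviation of ln R

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def failure_probability(r):
    """Fragility curve: lognormal CDF of the creep rupture parameter R."""
    return STANDARD_NORMAL.cdf(log(r / MEDIAN_R) / SIGMA_LN_R)


def branching_value(cumulative_probability):
    """Inverse of the fragility curve: R at a given cumulative probability."""
    z = STANDARD_NORMAL.inv_cdf(cumulative_probability)
    return MEDIAN_R * exp(SIGMA_LN_R * z)


levels = [0.05, 0.25, 0.50, 0.75, 0.95]
points = [branching_value(p) for p in levels]
for p, r in zip(levels, points):
    print(f"{p:.0%} cumulative failure probability -> R = {r:.3f}")
```

Each value in `points` marks a threshold at which the driver would split a scenario into the two outcomes, fails and does not fail.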
16. Fragility Curve for Creep Rupture Model Branching Points
17. Creep Rupture Modes
[Figure annotations: "No uncertainty, R = 1"; "Range where SG tubes may fail if uncertainty introduced"]
18. Experimental Results
- Experiments performed on a Linux compute cluster
- 40 nodes, each with dual 2.4 GHz Opteron 250 processors, 8 GB memory, and 2 x 250 GB SATA disks in RAID 0
- Nodes connected with a gigabit switched network
- 3 configurations
- 20 processors, 10 nodes
- 40 processors, 20 nodes
- 80 processors, 40 nodes
- 1 initiating event, 4 dynamic workflows
- Executed concurrently, 1316 total branches, i.e. roughly 300 branches per experiment
19. Queue Wait and Execution Time
20. Processor Utilization
21. Processor Utilization (contd.)
22. Processor Utilization (contd.)
23. Average Execution Time per Branch
24. Scheduling Approaches
25. Conclusion and Future Work
- First generic dynamic event-tree generation infrastructure
- Effectively one run of the driver, with much shorter combined simulation time compared with existing PRA results
- Scenarios can be identified that are not accounted for in the conventional Level 2 PRA analysis
- Much more descriptive graphical illustration of event tree results
- Was this a new workflow problem? Yes/No
- Future Work
- Integrate with other plant simulators: RELAP work starts this summer
- More generic metadata management system to accommodate different simulators
- ? Mobius
- Support for other distributed execution frameworks and/or queuing systems such as Condor, PBS, etc.
- Clustering and classification of scenarios
- Off-line (post-processing) for visualization
- Online for both visualization and branch elimination
- UI for entering branching and truncation rules
26. Thanks
- Questions/Comments?
- Contact
- umit_at_bmi.osu.edu
- For more information
- http://bmi.osu.edu and http://msc.osu.edu