Grid Computing in Distributed High-End Computing Applications
Transcript and Presenter's Notes
1
Grid Computing in Distributed High-End Computing
Applications: Coupled Climate Models and
Geodynamics Ensemble Simulations
Shujia Zhou (Northrop Grumman IT/TASC),
W. Kuang (2), W. Jiang (3), P. Gary (2),
J. Palencia (4), G. Gardner (5)
(2) NASA Goddard Space Flight Center, (3) JCET,
UMBC, (4) Raytheon ITSS, (5) INDUSCORP
2
Outline
  • Background
  • One potential killer application (coupling
    distributed climate models)
  • One near-reality application (managing
    distributed ensemble simulation)
  • One framework supporting Grid computing
    applications: Common Component Architecture
    (CCA/XCAT3, CCA/XCAT-C)
  • High-speed network at NASA GSFC
  • An ensemble-dispatch prototype based on XCAT3
  • ESMF vs. Grid
  • ESMF-CCA Prototype 2.0 Grid computing
  • Summary

3
Earth-Sun System Models
  • Typical Earth-Sun system models (e.g., climate,
    weather, data assimilation) consist of several
    complex components coupled together through
    exchange of a sizable amount of data.
  • There is a growing need to couple model
    components from different institutions to:
  • Discover new science
  • Validate predictions

An M-PEs-to-N-PEs data-transfer problem!
4
Coupled Atmosphere-Ocean Models
[Diagram: atmosphere and ocean model grids, with
different grid types and resolutions]
5
Flow Diagram of Coupling Atmosphere and Ocean
(a typical ESMF application)
[Flow diagram: create the Atm, Ocn, CplAtmXOcn, and
CplOcnXAtm components and register them; then,
starting at t = t0, each coupling cycle proceeds as
follows:]
  1. The Atmosphere fills ESMF_State exportAtm.
  2. CplAtmXOcn regrids it (interpolate()) into
     ESMF_State importOcn.
  3. The Ocean runs n Δt_Ocn time steps (run()) and
     fills ESMF_State exportOcn.
  4. CplOcnXAtm regrids it (extract()) into
     ESMF_State importAtm.
  5. The Atmosphere runs m Δt_Atm time steps (run()).
The cycle repeats until t = t0 + ncycle × Δt_global,
then Finalize.
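The coupling sequence above can be sketched in Python-style pseudocode. This is only an illustration of the control flow; the class and method names (Component, run, n_steps, and so on) are stand-ins, not the ESMF Fortran 90 interfaces used in the actual application.

```python
# Illustrative sketch of the sequential atmosphere-ocean coupling cycle.
# All names here are stand-ins; the real application uses ESMF components,
# states, and couplers written in Fortran 90.

class Component:
    """Stand-in for an ESMF gridded or coupler component."""
    def init(self):
        pass
    def run(self, import_state=None, n_steps=1):
        """Advance the model/coupler and return its export state (a dict of fields)."""
        return {}
    def finalize(self):
        pass

def couple(atm, ocn, cpl_a2o, cpl_o2a, n_cycles, n_ocn_steps, m_atm_steps):
    for comp in (atm, ocn, cpl_a2o, cpl_o2a):
        comp.init()                               # create/register components
    export_atm = atm.run(n_steps=m_atm_steps)     # initial atmosphere state
    for _ in range(n_cycles):                     # until t = t0 + ncycle * dt_global
        import_ocn = cpl_a2o.run(export_atm)      # regrid: interpolate()
        export_ocn = ocn.run(import_ocn, n_steps=n_ocn_steps)   # n dt_Ocn steps
        import_atm = cpl_o2a.run(export_ocn)      # regrid: extract()
        export_atm = atm.run(import_atm, n_steps=m_atm_steps)   # m dt_Atm steps
    for comp in (atm, ocn, cpl_a2o, cpl_o2a):
        comp.finalize()

couple(Component(), Component(), Component(), Component(),
       n_cycles=4, n_ocn_steps=6, m_atm_steps=12)
```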
6
(No Transcript)
7
(No Transcript)
8
Coupling Earth System Model Components from
Different Institutions
  • Current approach: physically port the source codes
    and their supporting environment, such as libraries
    and data files, to one computer
  • Problems
  • Considerable effort and time are needed for
    porting, validating, and optimizing the codes
  • Some code owners may not want to release their
    source codes.
  • Owners continue to update the model codes.

A killer application: couple models at their home
institutions via Grid computing!
9
How Much Data Exchange in a Climate Model? (e.g.,
NOAA/GFDL MOM4 Version Beta2)
  • Import: 12 2-D arrays
  • u_flux, v_flux, q_flux, salt_flux, sw_flux,
    fprec, runoff, calving, p, t_flux, lprec, lw_flux
  • At 0.25-degree resolution without a mask: ~99
    MB of data
  • Export: 6 2-D arrays
  • t_surf, s_surf, u_surf, v_surf, sea_level, frazil
  • At 0.25-degree resolution without a mask: ~49
    MB of data

For a coupling interval of 6 hours between
atmosphere and ocean models at 0.25-degree
resolution, data exchange is typically needed no
more often than once per minute of wall-clock
time, i.e., <1 MB per second.
Observation: ~100 KB/s when using scp to move
data from NCCS to UCLA!
A Gbps network is more than sufficient for this
kind of data exchange!
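As a rough cross-check of the figures above: a 0.25-degree global grid has 1440 × 720 points, and assuming 8-byte (double-precision) values the array sizes come out as quoted. The snippet below is a back-of-the-envelope calculation, not code from MOM4.

```python
# Back-of-the-envelope estimate of MOM4 coupling data volumes at 0.25 degrees.
nx, ny = 1440, 720           # 0.25-degree global grid (360/0.25 x 180/0.25)
bytes_per_value = 8          # double precision (assumed)

array_mb = nx * ny * bytes_per_value / 1e6
import_mb = 12 * array_mb    # 12 imported 2-D arrays
export_mb = 6 * array_mb     # 6 exported 2-D arrays

print(f"one 2-D array : {array_mb:5.1f} MB")   # ~8.3 MB
print(f"import (12)   : {import_mb:5.1f} MB")  # ~99.5 MB, matching the slide
print(f"export (6)    : {export_mb:5.1f} MB")  # ~49.8 MB, matching the slide
```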
10
Distributed Ensemble Simulations
  • Typical Earth-Sun system models (e.g., climate,
    weather, data assimilation, solid Earth) are also
    highly computationally demanding
  • One geodynamo model, MoSST, requires 700 GB of RAM
    and 10^16 flops at the (200, 200, 200) truncation
    level
  • Ensemble simulation is needed to obtain the best
    estimate for optimal forecasting
  • For a successful assimilation with MoSST, a
    minimum of 30 ensemble runs and 50 PB of storage
    are expected.

Using a single supercomputer is not practical!
11
Characteristics of Ensemble Simulation
  • Little or no interaction among ensemble members
  • The initial state for the next ensemble run may
    depend on the previous ensemble run---loosely
    coupled.
  • High failure tolerance
  • Small network usage reduces the probability of
    failure
  • The forecast depends on the collection of all
    the ensemble members, not on any particular
    ensemble member

12
Technology: Grid Computing Middleware (CCA/XCAT)
  • Merges OGSI and DOE's high-performance component
    framework, the Common Component Architecture (CCA)
  • Component model
  • Compliant with the CCA specification
  • Grid services
  • Each XCAT component is also a collection of Grid
    services
  • XCAT Provides Ports are implemented as OGSI web
    services
  • XCAT Uses Ports can also accept any OGSI-compliant
    web service
  • XCAT provides a component-based Grid-services
    model for Grid computing
  • Component assembly: composition in space
  • The Provide-Use pattern facilitates composition
    (see the sketch after this list)
  • Standard ports are defined to streamline the
    connection process
  • More applicable to cases where users and
    providers know each other
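As an illustration of the Provide-Use pattern only, here is a minimal Python sketch; the class and method names (ProvidesPort, UsesPort, connect, invoke) are hypothetical and are not the CCA/XCAT API.

```python
# Minimal sketch of the Provides/Uses port pattern; names are illustrative.

class ProvidesPort:
    """A service a component offers to others."""
    def invoke(self, command):
        raise NotImplementedError

class UsesPort:
    """A dependency slot wired to another component's ProvidesPort."""
    def __init__(self):
        self.peer = None
    def connect(self, provides_port):
        self.peer = provides_port        # composition "in space"
    def call(self, command):
        return self.peer.invoke(command)

class GeoProvides(ProvidesPort):
    def invoke(self, command):
        return f"geo component executed: {command}"

# Wiring: a driver's uses port is connected to a geo component's provides port.
driver_use = UsesPort()
driver_use.connect(GeoProvides())
print(driver_use.call("start_run"))
```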

13
Technology: Grid Computing Middleware (Proteus
Multi-Protocol Library)
  • Proteus provides a single-protocol abstraction to
    components
  • Allows users to dynamically switch between
    traditional and high-speed protocols
  • Facilitates use of specialized implementations
    of serialization and deserialization

Proteus allows a user to have a choice of networks
14
Technology: Dedicated High-Speed Networks (Lambda
Networks)
  • Dedicated high-speed links (1 Gbps, 10 Gbps, etc.)
  • Being demonstrated in large-scale distributed
    visualization and data mining
  • National LambdaRail is currently under
    development
  • NASA GSFC is prototyping it and is in the process
    of connecting to it.

15
Distributed Ensemble Simulation via Grid Computing
(System Architecture)
[Diagram: a driver and a dispatch component run on
the host; geo1, geo2, and geo3 Grid-computing
components run on remote1, remote2, and remote3,
each driving a MoSST application code on its PEs
(PE0, PE1, ...)]
Note: the Grid computing code (driver, dispatch, geo
components) is separated from the application code
(MoSST) for flexibility!
16
Prototype Ensemble Simulation via Grid Computing
(Components and Ports)
[Diagram: the driver component's dispatchUse port
connects to the dispatch component's dispatchProvide
port; its geo1Use and geo2Use ports connect to the
geo1Provide and geo2Provide ports of the geo1 and
geo2 components; a go port starts the run]
Simpler than a workflow!
17
Prototype Ensemble Simulation via Grid Computing
(Flow Chart of Invoking a Simulation)
The dispatcher invokes the remote applications:
[Flow chart: the driver calls, through its useCMD
port, the provideCMD port of the dispatch component,
which in turn calls, through its useCMD ports, the
provideCMD ports of geo1 and geo2 to start their
simulations (steps 1-4 in the original figure)]
Run on three computer nodes connected by 10
Gigabit Ethernet
18
Prototype Ensemble Simulation via Grid Computing
(Flow Chart of Feedback During a Simulation)
Simulations report failure or completion:
[Flow chart: geo1 and geo2 report back, through their
useCMD ports, to the provideCMD port of the dispatch
component, which in turn reports to the provideCMD
port of the driver (steps 1-4 in the original figure)]
A monitoring functionality was developed for the geo
components
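A minimal Python sketch of the dispatch-and-report pattern shown in the two flow charts above; the thread-based dispatch, the report queue, and the function names are illustrative assumptions, not the XCAT3 implementation.

```python
# Illustrative sketch of the ensemble dispatch/feedback pattern (not XCAT3 code):
# the dispatcher invokes each remote geo component and collects completion or
# failure reports through a queue.
import queue
import threading

def geo_member(member_id, report_queue):
    """Stand-in for a remote MoSST ensemble member."""
    try:
        # ... run the simulation here ...
        report_queue.put((member_id, "completed"))
    except Exception as err:
        report_queue.put((member_id, f"failed: {err}"))

def dispatch(n_members):
    reports = queue.Queue()
    for i in range(n_members):
        threading.Thread(target=geo_member, args=(i, reports)).start()
    # Monitoring: the forecast needs the collection of members, so a failed
    # member is simply recorded rather than aborting the whole ensemble.
    for _ in range(n_members):
        member_id, status = reports.get()
        print(f"geo{member_id + 1}: {status}")

dispatch(3)   # e.g., geo1, geo2, geo3 on three remote nodes
```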
19
Adaptive User Interface
  • Network programming is complex and its concepts
    are unfamiliar to scientists
  • A user-friendly interface is even more important
    when applying Grid computing to scientific
    applications
  • A Model Running Environment (MRE) tool has been
    developed to reduce the complexity of run
    scripts by adaptively hiding details.

20
[Screenshots: an original run script, the marked
script with placeholders for user-specific settings,
and the filled script generated by the tool]
MRE 2.0 is used in GMI production!
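The marked-script/filled-script idea can be illustrated with a small template-substitution sketch; the placeholder syntax and parameter values below are hypothetical, not MRE's actual format.

```python
# Hypothetical illustration of the "marked script -> filled script" step:
# user-specific details are marked with placeholders and filled in by the tool.
from string import Template

marked_script = Template("""\
#!/bin/sh
# marked run script (placeholders are filled by the tool)
cd $run_dir
mpirun -np $n_pes ./$model_exe < $namelist
""")

filled_script = marked_script.substitute(
    run_dir="/scratch/ensemble/member01",   # hypothetical values
    n_pes=64,
    model_exe="mosst",
    namelist="input.nml",
)
print(filled_script)
```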
21
Where is ESMF in Grid Computing?
  • Make components known to the Grid
  • Needs a global component ID
  • Make component services available to the Grid
  • ESMF_Component (F90 user type + C function
    pointer)
  • C interfaces for the three fundamental data types
    are not complete
  • ESMF_Field, ESMF_Bundle, ESMF_State
  • The function pointers need to be replaced with
    remote ones
  • Make the data-exchange type transferable via the
    Grid
  • ESMF_State (F90 data pointer + C array)
  • Serialization/deserialization is available
  • The data represented by a pointer needs to be
    replaced with a data copy

22
Grid-Enabled ESMF: Link Functions in Remote Model
Components
[Diagram: the driver holds an assembled component
whose init, grid, layout, run, and final functions
are linked, across the network, to the init/run/final
entry points of the remote Atmosphere, Coupler, and
Ocean components registered via
setEntryPoint/setService]
23
Grid-Enabled ESMF: Transfer Data Across the Network
[Diagram: an Ocean proxy forwards the component, its
import/export states (ESMF_State importOcn and
exportOcn), and the clock across the network to the
remote Ocean component via RMI]
24
ESMF-CCA Prototype 2.0
[Diagram: the Atmosphere component and an Ocean proxy,
each with Init()/Run()/Final() and CCA Provide/Use
Ports, are registered with a global component ID via
CCA component registration; the proxy reaches the
remote Ocean component across the network, using RMI
for the remote pointer and XSOAP for ESMF_State data
transfer. The CCA tool supplies the Grid computing
layer; ESMF supplies the component concept.]
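To make the proxy pattern concrete, here is an illustrative Python sketch; the class names, the pickle-based serialization, and the in-process transport are assumptions standing in for the prototype's RMI/XSOAP machinery.

```python
# Illustrative sketch of the Ocean proxy pattern (not the actual prototype code):
# the proxy exposes the same Init/Run/Final interface as a local component, but
# serializes the import state, ships it to the remote side, and deserializes
# the returned export state.
import pickle

class RemoteOcean:
    """Stand-in for the real ocean model running on a remote machine."""
    def run(self, import_state):
        # ... the real model would advance the ocean here ...
        return {"t_surf": [], "sea_level": []}     # export state (fields omitted)

class OceanProxy:
    """Local stand-in exposing Init/Run/Final; forwards Run across the network."""
    def __init__(self, transport):
        self.transport = transport                 # RMI/XSOAP in the prototype
    def init(self):
        pass
    def run(self, import_state):
        payload = pickle.dumps(import_state)       # pointer -> serialized data copy
        reply = self.transport(payload)            # network round trip
        return pickle.loads(reply)                 # deserialized export state
    def final(self):
        pass

# A trivial in-process "transport" that closes the loop for illustration:
remote = RemoteOcean()
transport = lambda payload: pickle.dumps(remote.run(pickle.loads(payload)))
ocean = OceanProxy(transport)
print(list(ocean.run({"sw_flux": [], "lw_flux": []})))
```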
25
A sequential coupling between an atmosphere and a
remote ocean model component implemented in the
ESMF-CCA Prototype 2.0
[Flow diagram: as on slide 5, the Atm, Ocn,
CplAtmXOcn, and CplOcnXAtm components are created and
registered; starting at t = t0 the Atmosphere evolves
and fills ESMF_State exportAtm, CplAtmXOcn regrids it
into ESMF_State importOcn, which is passed via RMI
through the OceanProxy to the remote Ocean; the
Ocean's ESMF_State exportOcn returns the same way,
CplOcnXAtm regrids it into ESMF_State importAtm, and
the Atmosphere evolves again. The cycle repeats until
t = t0 + ncycle × Δt_global, then Finalize.]
26
Composing Components with XCAT3
[Diagram: a Jython script (1) launches the
Atmosphere, Ocean1, CplAtmXOcn, CplOcnXAtm, Climate,
and Go components and (2) connects their Uses and
Provides Ports (Atm, Ocn, A2O, O2A, Go)]
Run on two remote computer nodes connected by 10
Gigabit Ethernet
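A hedged sketch of what such a composition script might look like; launch_component() and connect() are hypothetical helpers written here in Python rather than the XCAT3 Jython API, and the port wiring shown is schematic.

```python
# Illustrative composition script in the spirit of the Jython script above;
# launch_component() and connect() are hypothetical helpers, not XCAT3 calls.

def launch_component(name, host):
    print(f"launching {name} on {host}")
    return {"name": name, "host": host, "connections": []}

def connect(user, uses_port, provider, provides_port):
    user["connections"].append((uses_port, provider["name"], provides_port))
    print(f"connect {user['name']}.{uses_port} -> {provider['name']}.{provides_port}")

# 1. Launch components on the two remote nodes.
atm     = launch_component("Atmosphere", "node1")
ocn     = launch_component("Ocean1", "node2")
a2o     = launch_component("CplAtmXOcn", "node1")
o2a     = launch_component("CplOcnXAtm", "node2")
climate = launch_component("Climate", "node1")
go      = launch_component("Go", "node1")

# 2. Connect Uses and Provides ports (schematic wiring).
connect(climate, "Atm", atm, "Atm")
connect(climate, "Ocn", ocn, "Ocn")
connect(climate, "A2O", a2o, "A2O")
connect(climate, "O2A", o2a, "O2A")
connect(go, "Go", climate, "Go")
```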
27
Summary
  • Grid computing technology and high-speed networks,
    such as Lambda networks, make distributed high-end
    computing applications promising.
  • Our prototype, based on the XCAT3 framework, shows
    that distributed ensemble simulation can be
    performed on networks of up to 10 Gbps in a
    user-friendly way.
  • ESMF components can be Grid-enabled with the help
    of CCA/XCAT.

28
Backup slides
29
Prototype Ensemble Simulation via Grid Computing
(Flow Chart of Intelligently Dispatching Ensemble
Members)
[Flow chart: the driver sends requests to the
dispatch component, which dispatches ensemble members
to geo1, geo2, and geo3 in turn and receives their
replies (steps 1-6 in the original figure)]
The type geoCMD is used to exchange data among the
components
30
Scientific objective: develop a geomagnetic data
assimilation framework with the MoSST core dynamics
model and surface geomagnetic observations to
predict changes in Earth's magnetic environment.
Algorithm: X_a = X_f + K (Z - H X_f)
  X_a: assimilation solution
  X_f: forecast solution
  Z: observation data
  K: Kalman gain matrix
  H: observation operator
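A tiny numpy illustration of this update step; the dimensions and values are arbitrary, and the gain matrix K is assumed to have been computed elsewhere.

```python
# Minimal illustration of the Kalman analysis update X_a = X_f + K (Z - H X_f);
# dimensions and values here are arbitrary, for illustration only.
import numpy as np

n, m = 6, 3                      # state size, observation size (illustrative)
rng = np.random.default_rng(0)

X_f = rng.standard_normal(n)         # forecast solution
Z   = rng.standard_normal(m)         # observation data
H   = rng.standard_normal((m, n))    # observation operator
K   = rng.standard_normal((n, m))    # Kalman gain matrix (computed elsewhere)

X_a = X_f + K @ (Z - H @ X_f)        # assimilation solution
print(X_a)
```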
31
New Transport Layer Protocols
  • Why needed
  • TCP's original design targeted slow backbone
    networks
  • Standard out-of-the-box kernel TCP tunings are
    inadequate for large-bandwidth, long-delay
    application performance
  • TCP requires a knowledgeable wizard to optimize
    the host for high-performance networks

32
Current throughput findings from GSFC's 10-Gbps
networking efforts
  • From UDP-based tests between GSFC hosts with
    10-GE NICs, enabled by nuttcp -u -w1m

    From       To         Throughput   Tx CPU (%)  Rx CPU (%)  Packet loss (%)
    San Diego  Chicago    5.213 Gbps   99          63          0
    Chicago    San Diego  5.174 Gbps   99          65          0.0005
    Chicago    McLean     5.187 Gbps   100         58          0
    McLean     Chicago    5.557 Gbps   98          71          0
    San Diego  McLean     5.128 Gbps   99          57          0
    McLean     San Diego  5.544 Gbps   100         64          0.0006

33
Current throughput findings from GSFC's 10-Gbps
networking efforts
  • From TCP-based tests between GSFC hosts with
    10-GE NICs, enabled by nuttcp -w10m

    From       To         Throughput   Tx CPU (%)  Rx CPU (%)
    San Diego  Chicago    0.006 Gbps   0           0
    Chicago    San Diego  0.006 Gbps   0           0
    Chicago    McLean     0.030 Gbps   0           0
    McLean     Chicago    4.794 Gbps   95          44
    San Diego  McLean     0.005 Gbps   0           0
    McLean     San Diego  0.445 Gbps   8           3

34
Current throughput findings from GSFC's 10-Gbps
networking efforts
  • From UDT-based tests between GSFC hosts with
    10-GE NICs, enabled by iperf

    From       To         Throughput
    San Diego  Chicago    2.789 Gbps
    Chicago    San Diego  3.284 Gbps
    Chicago    McLean     3.435 Gbps
    McLean     Chicago    2.895 Gbps
    San Diego  McLean     3.832 Gbps
    McLean     San Diego  1.352 Gbps

  • UDT was developed by Robert Grossman (UIC):
    http://udt.sourceforge.net/

35
The non-experts are falling behind
Year   Experts    Non-experts   Ratio
1988   1 Mb/s     300 kb/s      3:1
1991   10 Mb/s
1995   100 Mb/s
1999   1 Gb/s
2003   10 Gb/s    3 Mb/s        3000:1
36
New Transport Layer Protocols
  • Major types
  • UDP and TCP Reno: standard (default with the OS)
  • Other versions of TCP (Vegas, BIC) are included
    in the Linux 2.6 kernel series
  • Other OSs may not have the stack code included
  • Alternative transport protocols are non-standard
    and require kernels to be patched or operate in
    user space

37
(No Transcript)
38
Next Step: Transform a Model into a Set of Grid
Services
[Diagram: an ESMF component (import/export state;
Init(), Run(), Final()) is wrapped into an XCAT
component whose providePort exposes a Grid service
and whose usePort connects to others, supporting
both standalone (local) and coupled (distributed)
systems; the model runs on a supercomputer with
attached data storage]