Jay Boisseau boisseau@tacc.utexas.edu - PowerPoint PPT Presentation

1
TeraGrid: A Terascale Distributed Discovery
Environment
  • Jay Boisseau
  • TeraGrid Executive Steering Committee (ESC)
    Member
  • and
  • Director, Texas Advanced Computing Center at The
    University of Texas at Austin

2
Outline
  • What is TeraGrid?
  • User Requirements
  • TeraGrid Software
  • TeraGrid Resources and Support
  • Science Gateways
  • Summary

3
What is TeraGrid?
4
The TeraGrid Vision
  • Integrating the Nation's Most Powerful Resources
  • Provide a unified, general-purpose, reliable set
    of services and resources.
  • Strategy: An extensible virtual organization of
    people and resources across TeraGrid partner
    sites.
  • Enabling the Nation's Terascale Science
  • Make science more productive through a unified
    set of very-high-capability resources.
  • Strategy: Leverage TeraGrid's unique resources to
    create new capabilities driven and prioritized by
    science partners.
  • Empowering communities to leverage TeraGrid
    capabilities
  • Bring TG capabilities to the broad science
    community (no longer just big science).
  • Strategy: Science Gateways connecting
    communities; integrated roadmap with peer Grids
    and software efforts.

5
The TeraGrid Strategy
  • Building a distributed system of unprecedented
    scale
  • 40 teraflops compute
  • 1 petabyte storage
  • 10-40 Gb/s networking
  • Creating a unified user environment across
    heterogeneous resources
  • User software environment, User support
    resources.
  • Created an initial community of over 500 users,
    80 PIs.
  • Integrating new partners to introduce new
    capabilities
  • Additional computing, visualization capabilities
  • New types of resources: data collections,
    instruments

Make it extensible!
6
The TeraGrid Team
  • TeraGrid Team has two major components
  • 9 Resource Providers (RPs) who provide resources
    and expertise
  • Seven universities
  • Two government laboratories
  • Expected to grow
  • The Grid Integration Group (GIG) who provides
    leadership in grid integration among the RPs
  • Led by Director, who is assisted by Executive
    Steering Committee, Area Directors, Project
    Manager
  • Includes participation by staff at each RP
  • Funding now provided for people, not just
    networks and hardware!

7
Integration: Converging NSF Initiatives
  • High-End Capabilities: U.S. Core Centers,
    TeraGrid
  • Integrating high-end, production-quality
    supercomputer centers
  • Building tightly coupled, unique large-scale
    resources
  • STRENGTH: Time-critical and/or unique high-end
    capabilities
  • Communities: GriPhyN, iVDGL, LEAD, GEON,
    NEESGrid
  • ITR and MRI projects integrate science
    communities
  • Building community-specific capabilities and
    tools
  • STRENGTH: Community integration and tailored
    capabilities, high-capacity loosely coupled
    capabilities
  • Common Software Base: NSF/NMI, DOE, NASA programs
  • Projects integrating, packaging, distributing
    software and tools from the Grid community
  • Building common middleware components and
    integrated distributions
  • STRENGTH: Large-scale deployment, common software
    base, assured-quality software components and
    component sets

8
User Requirements
9
Coherence: Unified User Environment
  • Do I have to learn how to use 9 systems?
  • Coordinated TeraGrid Software and Services (CTSS)
  • Transition toward services and service oriented
    architecture
  • From software stack to software and services
  • Do I have to submit proposals for 9 allocations?
  • Unified NRAC for Core and TeraGrid Resources;
    Roaming allocations
  • Can I use TeraGrid the way I use other Grids?
  • Partnership with Globus Alliance, NMI GRIDS
    Center, Other Grids
  • History of collaboration and successful
    interoperation with other Grids

10
TeraGrid User Survey
  • TeraGrid capabilities must be user-driven
  • Undertook needs analysis in Summer 2004
  • 16 Science Partner Teams
  • Realize these may not be widely representative,
    so will repeat this analysis every year with
    increasing number of groups
  • 62 items considered, top 10 needs reflected in
    the TeraGrid roadmap

11
TeraGrid User Input
[Chart: user needs, with columns Overall Score and Partners in Need;
categories: Data, Grid Computing, Science Gateways]
  • Remote File Read/Write
  • High-Performance File Transfer
  • Coupled Applications, Co-scheduling
  • Grid Portal Toolkits
  • Grid Workflow Tools
  • Batch Metascheduling
  • Global File System
  • Client-Side Computing Tools
  • Batch Scheduled Parameter Sweep Tools
  • Advanced Reservations
12
Some Common Grid Computing Use Cases
  • Submitting a large number of individual jobs
  • Requires grid scheduling to multiple systems
  • Requires automated data movement or common file
    system
  • Running on-demand jobs for time-critical
    applications (e.g. weather forecasts, medical
    treatments)
  • Requires preemptive scheduling
  • Requires fault tolerance (checkpoint/recovery)
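
The first use case above pairs many independent jobs with grid scheduling across several systems. A minimal sketch of how a metascheduler might choose a site for each job follows. It is illustrative only and does not describe TeraGrid's actual software: the `Site` fields, the site names, and the fixed `minutes_per_job` charge are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Site:
    """Hypothetical description of one compute site."""
    name: str
    total_cpus: int          # largest job the site can ever run
    est_wait_minutes: float  # current queue-wait estimate (assumed given)


def assign_jobs(job_cpu_requests: List[int],
                sites: List[Site],
                minutes_per_job: float = 10.0) -> List[Optional[str]]:
    """Greedily send each job to the eligible site with the shortest wait.

    Returns one site name per job, or None when no site is large enough.
    Each placement adds `minutes_per_job` to that site's wait estimate so
    later jobs spread across sites (a crude stand-in for a real estimator).
    """
    waits = {s.name: s.est_wait_minutes for s in sites}
    capacity = {s.name: s.total_cpus for s in sites}
    plan: List[Optional[str]] = []
    for cpus in job_cpu_requests:
        # Resource matching: keep only sites big enough for this job.
        eligible = [name for name in waits if capacity[name] >= cpus]
        if not eligible:
            plan.append(None)
            continue
        # Wait-time estimation: pick the shortest estimated queue.
        best = min(eligible, key=lambda name: waits[name])
        plan.append(best)
        waits[best] += minutes_per_job
    return plan


if __name__ == "__main__":
    demo_sites = [Site("site_a", 64, 5.0), Site("site_b", 512, 20.0)]
    print(assign_jobs([32, 32, 256, 1024, 32], demo_sites))
```

On-demand, time-critical jobs would additionally need preemption and checkpoint/recovery, which this sketch leaves out.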

13
Highest Priority Items
  • Common to many projects that are quite different
    in their specific usage scenarios
  • Efficient cross-site data management
  • Efficient cross-site computing
  • Capabilities to customize Science Gateways to the
    needs of specific user communities
  • Simplified management of accounts, allocations,
    and security credentials across sites

14
Bringing TeraGrid Capabilities to Communities
15
Bringing TeraGrid Capabilities to Communities
A new generation of users that access TeraGrid
via Science Gateways, scaling well beyond the
traditional user with a shell login
account. Projected user community size by each
science gateway project. Impact on society from
gateways enabling decision support is much larger!
16
Exploiting TeraGrid's Unique Capabilities
Aquaporin mechanism
Water moves through aquaporin channels in single
file. Oxygen leads the way in. At the most
constricted point of the channel, the water molecule
flips. Protons can't do this. Animation pointed
to by 2003 Nobel chemistry prize announcement.
(Klaus Schulten, UIUC)
ENZO (Astrophysics)
Enzo is an adaptive mesh refinement grid-based
hybrid code designed to do simulations of
cosmological structure formation (Mike Norman,
UCSD).
Reservoir Modeling
  • Given: An (unproduced) oil field; permeability
    and other material properties (based on
    geostatistical models); locations of a few
    producer/injector wells
  • Question: Where is the best place for a third
    injector?
  • Goal: To have fully automatic methods of injector
    well placement optimization. (J. Saltz, OSU)

GAFEM (Ground-water modeling)
GAFEM is a parallel code, developed at North
Carolina State Univ., for solution of large
scale groundwater inverse problems.
17
Exploiting TeraGrid's Unique Capabilities: Flood
Modeling
Merry Maisel (TACC), Gordon Wells (UT)
18
Exploiting TeraGrid's Unique Capabilities: Flood
Modeling
  • Flood Modeling needs more than traditional
    batch-scheduled HPC systems!
  • Precipitation data, groundwater data, terrain
    data
  • Rapid large-scale data visualization
  • On-demand scheduling
  • Ensemble scheduling
  • Real-time visualization of simulations
  • Computational steering of possible remedies
  • Simplified access to results via web portals for
    field agents, decision makers, etc.
  • TeraGrid adds the data and visualization systems,
    portals, and grid services necessary

19
Harnessing TeraGrid for Education
Example: nanoHUB is used to complete coursework
by undergraduate and graduate students in dozens
of courses at 10 universities.
20
User Inputs Determine TeraGrid Roadmap
  • Top priorities reflected in Grid Capabilities and
    Software Integration roadmap. First targets:
  • User-defined reservations
  • Resource matching and wait time estimation
  • Grid interfaces for on-demand and reserved access
  • Parallel/striped data movers
  • Co-scheduling service defined for
    high-performance data transfers
  • Dedicated GridFTP transfer nodes available to
    production users.
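
A parallel or striped data mover, as listed above, has to decide which part of a file each stream carries. The sketch below shows one simple possibility: contiguous, near-equal byte ranges. It is not GridFTP's protocol, and `stripe_ranges` is a hypothetical helper named only for this example.

```python
from typing import List, Tuple


def stripe_ranges(file_size: int, streams: int) -> List[Tuple[int, int]]:
    """Split [0, file_size) into contiguous (offset, length) pairs.

    Lengths differ by at most one byte. If the file has fewer bytes than
    there are streams, only the non-empty ranges are returned.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if streams < 1:
        raise ValueError("streams must be at least 1")
    base, extra = divmod(file_size, streams)
    ranges: List[Tuple[int, int]] = []
    offset = 0
    for i in range(streams):
        # The first `extra` streams each take one leftover byte.
        length = base + (1 if i < extra else 0)
        if length == 0:
            break
        ranges.append((offset, length))
        offset += length
    return ranges


if __name__ == "__main__":
    for offset, length in stripe_ranges(10, 3):
        print(f"stream reads {length} bytes starting at byte {offset}")
```

Each stream can then read and send its own range independently, and the receiver writes every range at its offset to reassemble the file.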

21
TeraGrid Roadmap Defined 5 Years Out
23
Working Groups, Requirements Analysis Teams
  • Working Groups
  • Applications
  • Data
  • External Relations
  • Grid
  • Interoperability
  • Networks
  • Operations
  • Performance Evaluation
  • Portals
  • Security
  • Software
  • Test Harness and Information Services (THIS)
  • User Services
  • Visualization
  • Requirements Analysis Teams (RATs)
  • Science Gateways
  • Security
  • Advanced Application Support
  • User Portal
  • CTSS Evolution
  • Data Transport Tools
  • Job Scheduling Tools
  • TeraGrid Network

24
TeraGrid Software
25
Software Strategy
  • Identify existing solutions; develop solutions
    only as needed
  • Some solutions are frameworks
  • We need to tailor software to our goals
  • Information services/site interfaces
  • Some solutions do not exist
  • Software function verification
  • INCA project: scripted implementation of the docs
  • Global account / accounting management
  • AMIE
  • Data Movers
  • Etc.
  • Deploy, Integrate, Harden, and Support!

26
TeraGrid Software Stack Offerings
  • Core Software
  • Grid service servers and clients
  • Data management and access tools
  • Authentication services
  • Environment commonality and management
  • Applications springboard for workflow and
    service oriented work
  • Platform-specific software
  • Compilers
  • Binary compatibility opportunities
  • Performance tools
  • Visualization software
  • Services
  • Databases
  • Data archives
  • Instruments

27
TeraGrid Software Development
  • Consortium of leading project members
  • Define primary goals and targets
  • Mine helpdesk data
  • Review pending software request candidates
  • Transition test environments to production
  • Eliminate software workarounds
  • Implement solutions derived from user surveys
  • Deployment testbeds
  • Separate environments as well as alternate access
    points
  • Independent testbeds in place
  • Internal staff testing from applications teams
  • Initial Beta users

29
Software Roadmap
  • Near-term work (in progress)
  • Co-scheduled file transfers
  • Production-level GridFTP resources
  • Metascheduling (grid scheduling)
  • Simple workflow tools
  • Future directions
  • On-demand integration with Open Science Grid
  • Grid checkpoint/restart

30
Grid Roadmap
  • Near term
  • User-defined reservations
  • Web services testbeds
  • Resource wait time estimation
  • To be used by workflow tools
  • Striped data movers
  • WAN file system prototypes
  • Longer term
  • Integrated tools for workflow scheduling
  • Commercial grid middleware opportunities

31
TeraGrid Resources and Support
32
TeraGrid Resource Partners
33
TeraGrid Resources
Sites: ANL/UC, Caltech, IU, NCSA, ORNL, PSC, Purdue, SDSC, TACC
Compute Resources: Itanium2 (0.5 TF), IA-32 (0.5 TF), Itanium2 (0.8 TF), Itanium2 (0.2 TF), IA-32 (2.0 TF), Itanium2 (10 TF), SGI SMP (6.5 TF), IA-32 (0.3 TF), XT3 (10 TF), TCS (6 TF), Marvel (0.3 TF), Hetero (1.7 TF), Itanium2 (4.4 TF), Power4 (1.1 TF), IA-32 (6.3 TF), Sun (Vis)
Online Storage: 20 TB, 155 TB, 32 TB, 600 TB, 1 TB, 150 TB, 540 TB, 50 TB
Mass Storage: 1.2 PB, 3 PB, 2.4 PB, 6 PB, 2 PB
Data Collections: Yes (5 sites)
Visualization: Yes (5 sites)
Instruments: Yes (3 sites)
Network (Gb/s, Hub): ANL/UC 30 CHI; Caltech 30 LA; IU 10 CHI; NCSA 30 CHI; ORNL 10 ATL; PSC 30 CHI; Purdue 10 CHI; SDSC 30 LA; TACC 10 CHI
Partners will add resources and TeraGrid will add
partners!
34
TeraGrid Usage by NSF Division
[Chart: usage by NSF division. Divisions shown: CDA, IBN, CCR, ECS,
DMS, BCS, ASC, DMR, MCB, AST, CHE, PHY, CTS]
Includes DTF/ETF clusters only
35
TeraGrid User Support Strategy
  • Proactive and Rapid Response for General User
    Needs
  • Sustained Assistance for Groundbreaking
    Applications
  • GIG Coordination with staffing from all RP sites
  • Area Director (AD): Sergiu Sanielevici (PSC)
  • Peering with Core Centers' User Support teams

36
User Support Team (UST): Trouble Tickets
  • Filter TeraGrid Operations Center (TOC) trouble
    tickets: system issue or possible user issue?
  • For each Ticket, designate a Point of Contact
    (POC) to contact User within 24 hours
  • Communicate status if known
  • Begin dialog to consult on solution or workaround
  • Designate a Problem Response Squad (PRS) to
    assist POC
  • Experts who respond to POC's postings to UST list,
    and/or requested by AD
  • All UST members monitor progress reports and
    contribute their expertise
  • PRS membership may evolve with our understanding
    of the problem, including support from hardware
    and software teams
  • Ensure all GIG/RP/Core Helps and Learns
  • Weekly review of user issues selected by AD;
    decide on escalation
  • Inform TG development plans

37
User Support Team (UST): Advanced Support
  • For applications/groups judged by TG management
    to be groundbreaking in exploiting DEEP/WIDE TG
    infrastructure
  • Embedded Point Of Contact (labor intensive)
  • Becomes de facto member of the application group
  • Prior working relationship with the application
    group a plus
  • Can write and test code, redesign algorithms,
    optimize, etc.
  • But no throwing over the fence
  • Represents needs of the application group to
    systems people, if required
  • Alerts AD to success stories

38
Science Gateways
39
The Gateway Concept
  • The Goal and Approach
  • To engage advanced scientific communities that
    are not traditional users of the supercomputing
    centers.
  • We will build science gateways providing
    community-tailored access to TeraGrid services
    and capabilities
  • Science Gateways take two forms
  • Web-based Portals that front-end Grid Services
    that provide TeraGrid-deployed applications used
    by a community.
  • Coordinated access points enabling users to move
    seamlessly between TeraGrid and other grids.

40
Grid Portal Gateways
  • The Portal: accessed through a browser or desktop
    tools
  • Provides Grid authentication and access to
    services
  • Provides direct access to TeraGrid-hosted
    applications as services
  • The Required Support Services
  • Searchable Metadata catalogs
  • Information Space Management.
  • Workflow managers
  • Resource brokers
  • Application deployment services
  • Authorization services.
  • Builds on NSF and DOE software
  • Use NMI Portal Framework, GridPort
  • NMI Grid Tools: Condor, Globus, etc.
  • OSG, HEP tools: Clarens, MonALISA

41
Gateways that Bridge to Community Grids
  • Many Community Grids already exist or are being
    built
  • NEESGrid, LIGO, Earth Systems Grid, NVO, Open
    Science Grid, etc.
  • TeraGrid will provide a service framework to
    enable access in ways that are transparent to
    their users.
  • The community maintains and controls the Gateway
  • Different Communities have different
    requirements.
  • NEES and LEAD will use TeraGrid to provision
    compute services
  • LIGO and NVO have substantial data distribution
    problems.
  • All of them require remote execution of complex
    workflows.

[Figure labels: Storms Forming, Forecast Model, Streaming Observations,
Data Mining, On-Demand Grid Computing]
42
The Architecture of Gateway Services
  • The User's Desktop

[Architecture diagram labels: Grid Portal Server; TeraGrid Gateway
Services; Proxy Certificate Server / vault; User Metadata Catalog;
Application Workflow; Application Deployment; Application Events;
Resource Broker; Replica Mgmt; App. Resource catalogs; Core Grid
Services; Security; Notification Service; Data Management Service;
Grid Orchestration; Resource Allocation; Accounting Service; Policy;
Administration Monitoring; Reservations And Scheduling; Web Services
Resource Framework; Web Services Notification; Physical Resource Layer]
43
Flood Modeling Gateway
  • University of Texas at Austin
  • TACC
  • Center for Research in Water Resources
  • Center for Space Research
  • Oak Ridge National Lab
  • Purdue University

Large-scale flooding along Brays Bayou in central
Houston triggered by heavy rainfall during
Tropical Storm Allison (June 9, 2001) caused more
than $2 billion of damage.
Gordon Wells, UT; David Maidment, UT; Budhu
Bhaduri, ORNL; Gilbert Rochon, Purdue
44
Biomedical and Biology
  • Building Biomedical Communities (Dan Reed, UNC)
  • National Evolutionary Synthesis Center
  • Carolina Center for Exploratory Genetic Analysis
  • Portals and federated databases for the Biomed
    research community

45
Neutron Science Gateway
  • Matching Instrument science with TeraGrid
  • Focusing on application use cases that can be
    uniquely served by TeraGrid. For example, a
    proposed scenario from the March 2003 SETENS proposal

Neutron Science TeraGrid Gateway (NSTG): John
Cobb, ORNL
46
Summary
47
SURA Opportunities with TeraGrid
  • Identify applications in SURA universities
  • Leverage TeraGrid technologies in SURA grid
    activities
  • Provide tech transfer back to TeraGrid
  • Deploy grids in SURA region that interoperate
    with TeraGrid, allow users to scale up to
    TeraGrid

48
Summary
  • TeraGrid is a national cyberinfrastructure
    partnership for world-class computational
    research, with many types of resources for
    knowledge discovery
  • TeraGrid aims to integrate with other grids, and
    other researchers around the world
  • All Hands Meeting in April will yield new details
    on roadmaps, software, capabilities, and
    opportunities.

49
For More Information
  • TeraGrid: http://www.teragrid.org
  • TACC: http://www.tacc.utexas.edu
  • Feel free to contact me directly
  • Jay Boisseau: boisseau@tacc.utexas.edu
  • Note: TACC is about to announce the
    new International Partnerships for Advanced
    Computing (IPAC) program, with initial members
    from Latin America and Spain, which can serve as
    a gateway into TeraGrid.