SC03 TeraGrid Tutorial: Applications in the TeraGrid Environment (Presentation Transcript)
1
SC03 TeraGrid Tutorial: Applications in the TeraGrid Environment
  • John Towns, NCSA <jtowns@ncsa.edu>
  • Nancy Wilkins-Diehr, SDSC <wilkinsn@sdsc.edu>
  • Sharon Brunett, CACR <sharon@cacr.caltech.edu>
  • Sandra Bittner, ANL <bittner@mcs.anl.gov>
  • Derek Simmel, PSC <dsimmel@psc.edu>
  • and many others participating in the TeraGrid Project

2
Tutorial Outline - Morning
  • TeraGrid Overview
  • John Towns, Slide 4, 20 mins
  • Introduction to TeraGrid Resources and Services
  • John Towns, Slide 15, 60 mins
  • TeraGrid Computing Paradigms
  • Sharon Brunett, Slide 55, 20 mins
  • BREAK
  • TeraGrid User Environment and Job Execution
  • Sandra Bittner, Slide 69, 60 mins
  • TeraGrid Support Services and Resources
  • Nancy Wilkins-Diehr, Slide 129, 20 mins

3
Tutorial Outline - Afternoon
  • LUNCH
  • Getting Started with User Certificates on the TeraGrid
  • Derek Simmel, Slide 138
  • Fractals with MPI and MPICH-G2 Exercise
  • Sandra Bittner, Slide 160
  • Pipelined Application Exercise with MCell
  • Nancy Wilkins-Diehr, Slide 166
  • We will take a break when it is time

4
Brief Overview of the TeraGrid
  • John Towns
  • NCSA / Univ of Illinois
  • Co-Chair, TG User Services WG
  • jtowns@ncsa.edu

5
The TeraGrid Vision: Distributing the resources is better than putting them at one site
  • Build new, extensible, grid-based infrastructure
    to support grid-enabled scientific applications
  • New hardware, new networks, new software, new
    practices, new policies
  • Expand centers to support cyberinfrastructure
  • Distributed, coordinated operations center
  • Exploit unique partner expertise and resources to
    make whole greater than the sum of its parts
  • Leverage homogeneity to make the distributed
    computing easier and simplify initial development
    and standardization
  • Run single job across entire TeraGrid
  • Move executables between sites

6
TeraGrid Objectives
  • Create unprecedented capability
  • integrated with extant PACI capabilities
  • supporting a new class of scientific research
  • Deploy a balanced, distributed system
  • not a distributed computer but rather
  • a distributed system using Grid technologies
  • computing and data management
  • visualization and scientific application analysis
  • Define an open and extensible infrastructure
  • an enabling cyberinfrastructure for scientific
    research
  • extensible beyond the original sites
  • NCSA, SDSC, ANL, Caltech, PSC (under ETF)
  • ETF2 awards to TACC, Indiana/Purdue, ORNL

7
Measuring Success
  • Breakthrough science via new capabilities
  • integrated capabilities more powerful than
    existing PACI resources
  • current PACI users and new communities requiring
    Grids
  • An extensible Grid
  • design principles assume heterogeneity and more
    than four sites
  • Grid hierarchy, scalable, replicable, and
    interoperable
  • formally documented design, standard protocols
    and specifications
  • encourage, support, and leverage open source
    software
  • A pathway for current users
  • evolutionary paths from current practice
  • provide examples, tools, and training to exploit
    Grid capabilities
  • user support, user support, and user support

8
TeraGrid Application Targets
  • Multiple classes of user support
  • each with differing implementation complexity
  • minimal change from current practice
  • new models, software, and applications
  • Usage exemplars
  • traditional supercomputing made simpler
  • remote access to data archives and computers
  • distributed data archive access and correlation
  • remote rendering and visualization
  • remote sensor and instrument coupling

9
TeraGrid Components
  • Compute hardware
  • Intel/Linux Clusters, Alpha SMP clusters, POWER4
    cluster,
  • Large-scale storage systems
  • hundreds of terabytes for secondary storage
  • Very high-speed network backbone
  • bandwidth for rich interaction and tight
    coupling
  • Grid middleware
  • Globus, data management,
  • Next-generation applications

10
Wide Variety of Usage Scenarios
  • Tightly coupled jobs storing vast amounts of
    data, performing visualization remotely as well
    as making data available through online
    collections (ENZO)
  • Thousands of independent jobs using data from a
    distributed data collection (NVO)
  • Applications employing novel latency-hiding
    algorithms adapting to a changing number of
    processors (PPM)
  • High-throughput applications of loosely coupled
    jobs (MCell)

11
Prioritization to ensure success
  • Diagnostic apps to test functionality (ENZO, PPM)
  • Flagship apps provide early requirements for
    software and hardware functionality
  • Cactus, ENZO, EOL, Gadu, LSMS, MCell, MM5,
    Montage, NAMD, NekTar, PPM, Quake, Real time
    brain mapping
  • Plans to approach existing grid communities
  • GriPhyN, NEES, BIRN, etc.

12
TeraGrid Roaming
Attend TeraGrid training class or access
web-based TG training materials
Receive Account info, pointers to training, POC
for user services Ops, pointers to login
resources, atlas of TG resources
Apply for TeraGrid Account
Develop and optimize code at Caltech
Run large job at NCSA, move data from SRB to
local scratch and store results in SRB
Run large job at SDSC, store data using SRB.
Run larger job using both SDSC and PSC systems
together, move data from SRB to local scratch
storing results in SRB
Move small output set from SRB to ANL cluster, do
visualization experiments, render small sample,
store results in SRB
Move large output data set from SRB to
remote-access storage cache at SDSC, render using
ANL hardware, store results in SRB
(Recompile may be necessary in some cases)
13
Strategy: Define and Build Standard Services
  • Finite Number of TeraGrid Services
  • defined as specifications, protocols, APIs
  • separate from implementation
  • Extending TeraGrid
  • adoption of TeraGrid specifications, protocols,
    APIs
  • protocols, data formats, behavior specifications,
    SLAs
  • Engineering and Verification
  • shared software repository
  • build sources, scripts
  • service must be accompanied by test module

14
TeraGrid Extensibility
  • You must be this high to ride the TeraGrid
  • fast network
  • non-trivial resources
  • meet SLA (testing and QA requirements)
  • become a member of the virtual organization
  • capable of TG hosting (peering arrangements)
  • TG Software Environment
  • user (download, configure, install, and run TG
    1.0)
  • developer (join distributed engineering team)
  • TG Virtual Organization
  • Operations, User-services
  • Add new capability
  • make the whole greater than the sum of its parts

repo.teragrid.org
15
Introduction to TeraGrid Resources and Services
  • John Towns
  • NCSA / Univ of Illinois
  • Co-Chair, TG User Services WG
  • jtowns@ncsa.edu

16
TeraGrid Components
  • Compute hardware
  • Phase I
  • Intel Linux clusters
  • open source software and community
  • Madison processors for commodity leverage
  • Alpha SMP clusters
  • Phase II
  • more Linux cluster hardware
  • POWER4 cluster
  • ETF2
  • Resources from additional sites: TACC, Indiana/Purdue, ORNL
  • Large-scale storage systems
  • hundreds of terabytes for secondary storage

17
TeraGrid Components
  • Very high-speed network backbone
  • bandwidth for rich interaction and tight
    coupling
  • Grid middleware
  • Globus, data management,
  • Next-generation applications
  • breakthrough versions of today's applications
  • but also, reaching beyond traditional
    supercomputing

18
Introduction to TeraGrid Resources and Services
  • Compute Resources
  • Data Resources and Data Management Services
  • Visualization Resources
  • Network Resources
  • Grid Services
  • Grid Scheduling
  • Allocations and Proposals

19
Compute Resources Overview
[Diagram: 4 lambdas between the LA and Chicago (CHI) hubs. ANL: 96 GeForce4 graphics pipes, 96 Pentium4 and 64 2p Madison nodes, Myrinet, 20 TB. Caltech: 32 Pentium4, 52 2p Madison and 20 2p Madison nodes, Myrinet, 100 TB DataWulf. NCSA: 256 2p Madison and 667 2p Madison nodes, Myrinet, 230 TB FCS SAN. SDSC: 128 2p Madison and 256 2p Madison nodes, Myrinet, 1.1 TF POWER4 Federation, 500 TB FCS SAN. PSC.]
Charlie Catlett <catlett@mcs.anl.gov>, Pete Beckman <beckman@mcs.anl.gov>
20
Compute Resources: NCSA, 2.6 TF → 10.6 TF w/ 230 TB
[Diagram: 30 Gbps to the TeraGrid network; GbE fabric; 8 TF Madison (667 nodes) plus 2.6 TF Madison (256 nodes), each node 2p Madison with 4 GB memory and 2x73 GB disk, alongside existing 2p 1.3 GHz nodes with 4 or 12 GB memory and 73 GB scratch; 250 MB/s per node into the Myrinet fabric; storage I/O over Myrinet and/or GbE; Brocade 12000 switches with 256 and 92 2x FC links to 230 TB; interactive/spare nodes; 8 4p Madison nodes for login and FTP]
21
Compute Resources: SDSC, 1.3 TF → 4.3 TF + 1.1 TF w/ 500 TB
[Diagram: 30 Gbps to the TeraGrid network; GbE fabric; 3 TF Madison (256 nodes) plus 1.3 TF Madison (128 nodes), each node 2p Madison with 4 GB memory and 2x73 GB disk, alongside existing 2p 1.3 GHz nodes with 4 GB memory and 73 GB scratch; 128-node groups at 250 MB/s into the Myrinet fabric; Brocade 12000 switches with 128 and 256 2x FC links to 500 TB; interactive/spare nodes; 6 4p Madison nodes for login and FTP]
22
Compute Resources: ANL, 1.4 TF w/ 20 TB, Viz
[Diagram: 30 Gbps to the TeraGrid network; GbE fabric; visualization partition of 0.9 TF Pentium IV (96 nodes, 2p 2.4 GHz, 4 GB RAM, 73 GB disk, Radeon 9000) providing 96 visualization streams; compute partition of 0.5 TF Madison (64 nodes, 2p Madison, 4 GB memory, 2x73 GB disk); 250 MB/s per node into the Myrinet fabric; storage and viz I/O over Myrinet and/or GbE to viz devices, network viz, and the TG network; interactive nodes (4 2p PIV and 4 4p Madison, login and FTP); storage nodes with 8 2x FC links to 20 TB]
23
Compute Resources: Caltech, 100 GF w/ 100 TB
[Diagram: 30 Gbps to the TeraGrid network; GbE fabric; 72 GF Madison (36 IBM/Intel nodes) plus 34 GF Madison (17 HP/Intel nodes), each 2p Madison with 6 GB memory and 73 GB scratch; 33 IA-32 storage nodes (2p, 6 GB memory) serving a 100 TB /pvfs Datawulf; 6 Opteron nodes (4p, 8 GB memory) with 66 TB RAID5 for HPSS; 250 MB/s per node into the Myrinet fabric; 13 2x FC links to 13 tape drives in a 1.2 PB (raw) silo; interactive node (2p IBM Madison) for login and FTP]
24
Compute Resources: PSC, 6.4 TF w/ 150 TB
[Diagram: 30 Gbps to the TeraGrid network; GbE fabric; Linux Cache Nodes (LCNs) with 150 TB of RAID disk; application gateways; hierarchical storage (DMF); Quadrics interconnect]
25
PSC Integration Strategy
  • TCS (lemieux.psc.edu), Marvel (rachel.psc.edu), and visualization nodes
  • OpenPBS, SIMON Scheduler
  • OpenSSH/SSL
  • Compaq C/C++, Fortran, gcc
  • Quadrics MPI (lemieux)
  • Marvel native MPI (rachel)
  • Python with XML libraries
  • GridFTP via Linux Cache Nodes / HSM
  • Adding
  • Globus 2.x.y GRAM, GRIS
  • softenv
  • gsi-openssh, gsi-ncftp
  • Condor-G
  • INCA test harness
  • more as Common TeraGrid Software Stack develops

26
SDSC POWER4 Integration Strategy
  • Software Stack Test Suite
  • Porting Core and Basic Services to AIX
  • POWER4 cluster with common TeraGrid software
    stack as close as practical to IA-64 and TCS
    Alpha versions to support TeraGrid Roaming
  • Network Attachment Architecture
  • Federation Switch on every node
  • Fibre Channel on every node
  • GbE to TeraGrid via Force10 switch

27
Data Resources and Data Management Services
  • Approach
  • Deploy core services
  • Drive the system with data intensive flagship
    applications
  • TG Data Services Plan
  • Integrate mass storage systems at sites into TG
  • GridFTP-based access to mass storage systems
  • SRB-based access to data
  • HDF5 libraries

28
Common Data Services
  • Database systems: five systems (5x32 IBM Regatta) acquired at SDSC for DB2 and other related DB apps; Oracle and DB2 clients planned at NCSA

29
The TeraGrid Visualization Strategy
  • Combine existing resources and current
    technology
  • Commodity clustering and commodity graphics
  • Grid technology
  • Access Grid collaborative tools
  • Efforts, expertise, and tools from each of the
    ETF sites
  • Volume Rendering (SDSC)
  • Coupled Visualization (PSC)
  • Volume Rendering (Caltech)
  • VisBench (NCSA)
  • Grid and Visualization Services (ANL)
  • to enable new and novel ways of visually
    interacting with simulations and data

30
Two Types of Loosely Coupled Visualization
[Diagram: Interactive visualization: the user computationally steers through pre-computed data from a TeraGrid simulation over the TeraGrid network. Batch visualization: batch jobs such as movie generation are processed against short-term and long-term storage.]
31
On-Demand and Collaborative Visualization
[Diagram: On-demand visualization: coupling a TeraGrid simulation with interaction. Collaborative visualization: preprocessing, filtering, and feature detection feed multi-party viewing and collaboration over the Access Grid (AG), with Voyager recording.]
32
Visualization Sample Use Cases
33
The TeraGrid Networking Strategy
  • TeraGrid Backplane
  • Provides sufficient connectivity (bandwidth,
    latency) to support virtual machine room
  • Core backbone 40 Gbps
  • Connectivity to each site 30 Gbps
  • Local networking
  • Provide support for all nodes at each site to
    have adequate access to backplane
  • Support for distributed simulations

34
TeraGrid Wide Area Network
35
TeraGrid Optical Network
[Diagram: Ciena CoreStream long-haul DWDM (operated by Qwest) spans roughly 2200 mi between Los Angeles (818 W. 7th St., CENIC hub) and Chicago (455 N. Cityfront Plaza, Qwest fiber collocation facility); Ciena metro DWDM is operated by each site and Cisco long-haul DWDM by CENIC; DTF backbone core routers at each hub connect to Starlight and to additional sites and networks; site border routers and cluster aggregation switches tie Caltech, SDSC, ANL, NCSA, and PSC systems (local spans of roughly 25 to 140 mi) into the backbone and to external network connections]
36
NCSA TeraGrid Network
[Diagram: Juniper T640 site border router with 3x10GbE (30 Gbps) to the TeraGrid network; Force10 cluster aggregation switch (160 Gbps) feeding four additional Force10 switches over 4x10GbE each]
37
SDSC TeraGrid Network
[Diagram: Juniper T640 site border router with 30 Gbps to the TeraGrid network and 40 Gbps toward the site over 2x10GbE + 2x10GbE links to a Force10 cluster switch]
38
Caltech TeraGrid Network
[Diagram: Juniper T640 site border router with 30 Gbps to the TeraGrid network and 30 Gbps (3x10GbE) to a Force10 cluster switch]
39
Argonne TeraGrid Network
[Diagram: Juniper T640 site border router with 30 Gbps to the TeraGrid network and 30 Gbps (3x10GbE) to a Force10 cluster switch]
40
PSC TeraGrid Network
[Diagram: Cisco site border router with 3x10GbE (30 Gbps) to the TeraGrid network; 30x1GbE (30 Gbps) to the Linux Cache Nodes (LCNs, 150 TB RAID disk) and application gateways; GbE and Fibre Channel to SGI DMF hierarchical storage; Quadrics interconnect to 4x32p EV7 SMPs and 20 viz nodes]
41
Grid Services: A Layered Grid Architecture
[Diagram (excerpt): Connectivity layer: talking to things (communication via Internet protocols, security). Fabric layer: controlling things locally (access to, and control of, resources).]
42
TeraGrid Runtime Environment
[Diagram: single sign-on via grid-id and assignment of credentials to user proxies; the Globus credential (certificate) provides mutual user-resource authentication, authenticated interprocess communication, and mapping to local ids at each site]
43
Homogeneity Strategies
  • Common Grid Middleware Layer
  • TeraGrid Software Stack
  • Collectively Designed
  • prerequisite for adding services (components) to
    common stack is associated INCA test and build
    module
  • Multiple Layers Coordinated
  • environment variables
  • pathnames
  • versions for system software, libraries, tools
  • Minimum requirements plus Site Value-Added
  • multiple environments possible
  • special services and tools on top of common
    TeraGrid software stack

44
Common Authentication Service
  • Standardized GSI authentication across all
    TeraGrid systems allows use of the same
    certificate
  • Developed coordinated cert acceptance policy
  • today accept
  • NCSA/Alliance
  • SDSC
  • PSC
  • DOE Science Grid
  • Developing procedures and tools to simplify the
    management of certificates
  • Grid mapfile distribution
  • simplified certificate request/retrieval
  • Sandra and Derek will cover these in more detail
    later

45
Grid Information Services
  • Currently Leveraging Globus Grid Information
    Service
  • each service/resource is an information source
  • index servers at each of the TG sites
  • full mesh between index servers for fault
    tolerance
  • access control as needed
  • Resource Information Management Strategies
  • TG GIS for systems level information
  • generic non-sensitive information
  • access control on sensitive info such as job
    level information
  • Applications specific GIS services
  • access controls applied as needed
  • user control

46
TeraGrid Software Stack V1.0
  • A social contract with the user
  • LORA: Learn Once, Run Anywhere
  • Precise definitions
  • services (done; in CVS)
  • software (done; in CVS)
  • user environment (done; in CVS)
  • Reproducibility
  • standard configure, build, and install
  • single CVS repository for software
  • initial releases for IA-64, IA-32, Power4, Alpha

47
Current TG Software Stack
  • SuSE SLES
  • X-cat
  • OpenPBS
  • Maui scheduler
  • MPICH, MPICH-G2, MPICH-VMI
  • gm drivers
  • VMI/CRM
  • Globus
  • Condor-G
  • gsi-ssh
  • GPT Wizard and GPT
  • GPT
  • SoftEnv
  • MyProxy
  • Intel compilers
  • GNU compilers
  • HDF4/5
  • SRB client

48
Grid Scheduling and Job Management: Condor-G, the User Interface
  • Condor-G is the preferred job management interface
  • job scheduling, submission, tracking, etc.
  • allows for complex job relationships and data staging issues
  • interfaces to Globus layers transparently
  • allows you to use your workstation as your interface to the grid
  • The ability to determine current system loads and queue status will come in the form of a web interface
  • allows for user-driven load balancing across resources
  • might look a lot like the PACI HotPage https://hotpage.paci.org/

49
Pipelined Jobs Execution
  • Scheduling of such jobs can be done now
  • Condor-G helps significantly with job
    dependencies
  • Can be coordinated with non-TG resources
  • Nancy will go through an exercise on this in the afternoon

50
Multi-Site, Single Execution
  • Support for execution via MPICH-G2 and MPICH-VMI2
  • MPI libraries optimized for WAN execution
  • Scheduling is still very much a CS research area
  • investigating product options
  • Maui, Catalina, PBSPro
  • tracking Globus developments
  • tracking GGF standards

51
Advanced Reservations
  • Allow scheduled execution time for jobs
  • provides support for co-scheduling of resources
    for multi-site execution
  • Still need to manually schedule across sites
  • provides support for co-scheduling with non-TG
    resources (instruments, detectors, etc.)
  • Send a note to help@teragrid.org if you want to do co-scheduling

52
Allocations Policies
  • TG resources allocated via the PACI allocations
    and review process
  • modeled after NSF process
  • TG considered as single resource for grid
    allocations
  • Different levels of review for different size
    allocation requests
  • DAC: up to 10,000 SUs/year
  • PRAC/AAB: less than 200,000 SUs/year
  • NRAC: 200,000 SUs/year and above
  • Policies/procedures posted at
  • http://www.paci.org/Allocations.html
  • Proposal submission through the PACI On-Line Proposal System (POPS)
  • https://pops-submit.paci.org/

53
Accounts and Account Management
  • TG accounts created on ALL TG systems for every
    user
  • information regarding accounts on all resources
    delivered
  • working toward single US mail packet arriving for
    user
  • accounts synched through centralized database
  • certificates provide uniform access for users
  • jobs can be submitted to/run on any TG resource
  • NMI Account Management Information Exchange
    (AMIE) used to manage account and transport usage
    records in GGF format

54
And now
  • on to the interesting details

55
TeraGrid Computing Paradigm
  • Sharon Brunett
  • CACR / Caltech
  • Co-Chair, TG Performance Eval WG
  • sharon@cacr.caltech.edu

56
TeraGrid Computing Paradigm
  • Traditional parallel processing
  • Distributed parallel processing
  • Pipelined/dataflow processing

57
Traditional Parallel Processing
  • Tightly coupled multicomputers are meeting
    traditional needs of large scale scientific
    applications
  • compute bound codes
  • faster and more CPUs
  • memory hungry codes
  • deeper cache, more local memory
  • tightly coupled, communications intensive codes
  • high bandwidth, low latency interconnect message
    passing between tasks
  • I/O bound codes
  • large capacity, high performance disk subsystems

58
Traditional Parallel Processing - When Have
We Hit the Wall?
  • Applications can outgrow or be limited by a
    single parallel computer
  • heterogeneity desirable due to application
    components
  • storage, memory and/or computing demands exceed
    resources of a single system
  • more robustness desired
  • integrate remote instruments

59
Traditional Parallel Processing
  • Single executable to be run on a single remote machine
  • big assumptions
  • runtime necessities (e.g. executables, input files, shared objects) available on remote system!
  • login to a head node, choose a submission mechanism
  • Direct, interactive execution
  • mpirun -np 16 ./a.out
  • Through a batch job manager
  • qsub my_script
  • where my_script describes executable location, runtime duration, redirection of stdout/err, mpirun specification (see the sketch below)
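For reference, a minimal sketch of such a my_script for OpenPBS; the resource request, walltime, and file names are illustrative assumptions:

    #!/bin/sh
    # request 8 dual-processor nodes (16 CPUs) for one hour
    #PBS -l nodes=8:ppn=2
    #PBS -l walltime=01:00:00
    # redirect stdout and stderr
    #PBS -o my_job.out
    #PBS -e my_job.err
    # run from the directory qsub was invoked in and start the MPI job
    cd $PBS_O_WORKDIR
    mpirun -np 16 ./a.out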

60
Traditional Parallel Processing II
  • Through Globus
  • globusrun -r some-teragrid-head-node.teragrid.org/jobmanager -f my_rsl_script
  • where my_rsl_script describes the same details as in the qsub my_script!
  • Through Condor-G
  • condor_submit my_condor_script
  • where my_condor_script describes the same details as the globus my_rsl_script! (see the RSL sketch below; a Condor-G submit file appears with slide 89)
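For reference, a hedged sketch of what my_rsl_script might contain; the executable path, process count, and output file names are illustrative assumptions:

    &(executable=/home/train00/a.out)
     (count=16)
     (jobtype=mpi)
     (maxWallTime=60)
     (stdout=/home/train00/run.out)
     (stderr=/home/train00/run.err)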

61
Distributed Parallel Processing
  • Decompose application over geographically
    distributed resources
  • functional or domain decomposition fits well
  • take advantage of load balancing opportunities
  • think about latency impact
  • Improved utilization of many resources
  • Flexible job management

62
Overview of Distributed TeraGrid Resources
[Diagram: site resources at Caltech, Argonne, NCSA/PACI (10.3 TF, 240 TB), and SDSC (4.1 TF, 225 TB), each with external network connections; archival storage via HPSS at Caltech, Argonne, and SDSC, and UniTree at NCSA]
63
Distributed Parallel Processing II
  • Multiple executables to run on multiple remote
    systems
  • tools for pushing runtime necessities to remote
    sites
  • Storage Resource Broker, gsiscp, ftp, globus-url-copy: copy files between sites
  • globus-job-submit my_script
  • returns https address for monitoring and post
    processing control

64
Distributed Parallel Processing III
  • Multi-site runs need co-allocated resources
  • VMI-mpich jobs can run multi-site
  • vmirun -np local_cpus -grid_vmi -gnp total_cpus -crm crm_name -key key_value ./a.out
  • server/client socket based data exchanges between
    sites
  • Globus and Condor-G based multi-site job
    submission
  • create appropriate RSL script

65
Pipelined/dataflow processing
  • Suited for problems which can be divided into a
    series of sequential tasks where
  • multiple instances of problem need executing
  • series of data needs processing with multiple
    operations on each series
  • information from one processing phase can be
    passed to next phase before current phase is
    complete

66
Pipelined/dataflow processing
  • Key requirement for efficiency
  • fast communication between adjacent processes in
    a pipeline
  • interconnect on TeraGrid resources meets this
    need
  • Common examples
  • frequency filters
  • Monte Carlo
  • MCELL example this afternoon!

67
Pipeline/Dataflow Example CMS (Compact Muon
Solenoid) Application
  • Schedule and run 100s of Monte Carlo detector
    response simulations on TG compute cluster(s)
  • Transfer each job's 1 GB of output to a mass storage system at a selected TG site
  • Schedule and run 100s of jobs on a TG cluster to
    reconstruct physics from the simulated data
  • Transfer results to mass storage system

68
Pipelined CMS Job Flow
[Diagram: a master Condor job running on a Caltech workstation drives a secondary Condor job on a remote pool]
2) Launch secondary job on remote pool of nodes; get input files via Globus tools (GASS)
3a) 75 Monte Carlo jobs on remote Condor pool
3b) 25 Monte Carlo jobs on remote nodes via Condor (TG or other Linux cluster)
4) 100 data files transferred via gsiftp (1 GB each) to a TeraGrid Globus-enabled FTP server
5) Secondary reports complete to master
6) Master starts reconstruction jobs via Globus jobmanager on cluster
7) gsiftp fetches data from mass storage
8) Processed database stored to mass storage
9) Reconstruction job reports complete to master
Vladimir Litvin, Caltech; Scott Koranda, NCSA/Univ of Wisc-Milwaukee
69
The TeraGrid User Environment and Job Execution
  • Sandra Bittner
  • Argonne National Laboratory
  • bittner@mcs.anl.gov

70
The TG User Environment and Job Execution
  • Development Environment
  • Grid Mechanisms
  • Data Handling
  • Job Submission and Monitoring

71
Development Environment

72
SoftEnv System
  • Software package management system instituting
    symbolic keys for user environments
  • Replaces traditional UNIX dot files
  • Supports community keys
  • Programmable similar to other dot files
  • Integrated user environment transfer
  • Well suited to software lifecycles
  • Offers unified view of heterogeneous platforms

73
Manipulating the Environment
  • /home/<username>/.soft
  • @teragrid
  • softenv
  • displays symbolic software key names
  • soft add <package-name>
  • temporary addition of package to environment
  • soft delete <package-name>
  • temporary package removal from environment
  • resoft
  • modify the dotfile and apply it to the present environment (see the example below)
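For example, a minimal ~/.soft might look like the sketch below; the two added keys are illustrative, and should be chosen from the softenv listing shown on the next two slides; run resoft afterwards to apply the change:

    @teragrid
    +intel-compilers
    +mpich-g2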

74
softenv output (part 1 of 2)
  • softenv
  • SoftEnv version 1.4.2
  • The SoftEnv system is used to set up environment variables. For details, see 'man softenv-intro'.
  • This is a list of keys and macros that the SoftEnv system understands.
  • In this list, the following symbols indicate:
  • * This keyword is part of the default environment, which you get by putting "@default" in your .soft
  • U This keyword is considered generally "useful".
  • P This keyword is for "power users", people who want to build their own path from scratch. Not recommended unless you know what you are doing.

75
softenv output (part 2 of 2)
  • These are the keywords explicitly available
  • P atlas ATLAS
  • P globus Globus -- The
    Meta Scheduler
  • P gm Myricom GM
    networking software
  • P goto goto BLAS
    libraries
  • P gsi-openssh GSI OpenSSH
  • P hdf4 HDF4
  • P hdf5 HDF5
  • P intel-compilers Intel C
    Fortran Compilers
  • java Java
    Environment flags power
  • P maui Maui Scheduler
  • P mpich-g2 MPICH for G2
  • P mpich-vmi MPICH for VMI
  • P myricom GM Binaries
  • P openpbs-2.3.16 Open Portable
    Batch System 2.3.16
  • P pbs Portable Batch
    System
  • P petsc PETSc 2.1.5
  • P srb-client SRB Client

76
SoftEnv Documentation
  • Overview
  • man softenv
  • User Guide
  • man softenv-intro
  • Administrator's Guide
  • man softenv-admin
  • The Msys Toolkit
  • http://www.mcs.anl.gov/systems/software

77
Communities
  • Creating and organizing communities
  • Registering keys
  • Adding software
  • Software versions and life cycle

78
Software Layers
  • Breaking down a directory name
  • /soft/globus-2.4.3_intel-c-7.1.025-f-7.1.028_ssh-3.5p1_gm-2.0.6_mpich-m_1.2.5..10_mpicc64dbg_vendorcc64dbg
  • Or one softkey: globus-2.4.3-intel

79
Compilers Scripting Languages
  • Intel C, Intel Fortran
  • may differ across platforms/architecture
  • pre-production v7.1 v8.0
  • GNU Compiler Collection, GCC
  • may differ across platforms/architecture
  • pre-production v3.2-30 v3.2.2-5
  • Scripting languages
  • PERL
  • Python

80
Grid Mechanisms
  • Certificates
  • Software

81
Certificates: Your TeraGrid Passport
  • Reciprocal agreements
  • NCSA, SDSC, PSC, DOEGrids
  • what happens to DOE Science Grid certs
  • what about Globus certificates
  • Apply from command line
  • ncsa-cert-req
  • sdsc-cert-req
  • Register and distribute certificate on TG
  • gx-map
  • $GLOBUS_LOCATION/grid-proxy-init (see the sketch below)
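For reference, a hedged sketch of the typical sequence once the account exists; these are the commands listed above, and exact prompts and output vary by site:

    ncsa-cert-req       # request an NCSA user certificate (authenticate when prompted)
    gx-map              # register/distribute your certificate DN into TG grid-mapfiles
    grid-proxy-init     # create a short-lived proxy credential (enter your pass-phrase)
    grid-proxy-info     # optional: check the proxy subject and remaining lifetime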

82
Globus Foundations
83
GSI in Action: Create Processes at A and B that Communicate, and Access Files at C
[Diagram: single sign-on via grid-id generates a proxy credential (or the proxy credential is retrieved from an online repository); the user proxy sends remote process creation requests to GSI-enabled GRAM servers at Site A (Kerberos) and Site B (Unix), each of which authorizes the request, maps it to a local id, creates the process, and generates credentials; the two processes communicate over authenticated interprocess communication; a remote file access request carrying a restricted proxy goes to a GSI-enabled FTP server at Site C (Kerberos), which authorizes, maps to a local id, and accesses the file on the storage system; all interactions use mutual authentication]
84
globus-job-run
  • For running interactive jobs
  • Additional functionality beyond rsh
  • Ex: run a 2-process job w/ executable staging
  • globus-job-run host -np 2 -s myprog arg1 arg2
  • Ex: run 5 processes across 2 hosts
  • globus-job-run \
  •   -: host1 -np 2 -s myprog.linux arg1 \
  •   -: host2 -np 3 -s myprog.aix arg2
  • For a list of arguments, run
  • globus-job-run -help

85
globus-job-submit
  • For running batch/offline jobs
  • globus-job-submit: submit a job
  • same interface as globus-job-run
  • returns immediately
  • globus-job-status: check job status
  • globus-job-cancel: cancel a job
  • globus-job-get-output: get job stdout/err
  • globus-job-clean: clean up after a job
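A hedged sketch of a typical batch sequence; the jobmanager contact and program name are illustrative assumptions, and the job contact URL printed by globus-job-submit is what the other commands take as their argument:

    globus-job-submit tg-login1.ncsa.teragrid.org/jobmanager-pbs -np 2 -s ./myprog
    # ...prints a job contact URL; pass it to the commands below
    globus-job-status     <job-contact-URL>   # poll the state (PENDING, ACTIVE, DONE, ...)
    globus-job-get-output <job-contact-URL>   # retrieve buffered stdout/stderr
    globus-job-clean      <job-contact-URL>   # remove cached output and job state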

86
globusrun
  • Flexible job submission for scripting
  • uses an RSL string to specify the job request
  • contains an embedded globus-gass-server
  • defines a GASS URL prefix in an RSL substitution variable
  • (stdout=$(GLOBUSRUN_GASS_URL)/stdout)
  • supports both interactive and offline jobs
  • Complex to use
  • must write RSL by hand
  • must understand its esoteric features
  • generally you should use the globus-job-* commands instead (a sketch follows below)
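For reference, a hedged sketch of an interactive globusrun call that streams output back through the embedded GASS server; the contact string and executable are illustrative assumptions:

    globusrun -s -r tg-login1.ncsa.teragrid.org/jobmanager-pbs \
      '&(executable=/bin/hostname)
        (count=4)
        (stdout=$(GLOBUSRUN_GASS_URL)/dev/stdout)
        (stderr=$(GLOBUSRUN_GASS_URL)/dev/stderr)'

The -s option starts the embedded GASS server that the $(GLOBUSRUN_GASS_URL) substitution refers to; the single quotes keep the shell from expanding it.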

87
GridFTP
  • Moving a test file
  • globus-url-copy -s "`grid-cert-info -subject`" \
      gsiftp://localhost:5678/tmp/file1 \
      file:///tmp/file2
  • Examples during hands-on session
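For reference, a hedged sketch of a transfer between two GridFTP servers; the hostnames and paths are illustrative assumptions, not actual TG endpoints:

    grid-proxy-init    # a valid proxy credential is needed first
    globus-url-copy \
      gsiftp://tg-gridftp.ncsa.teragrid.org/scratch/train00/results.tar \
      gsiftp://tg-gridftp.sdsc.teragrid.org/scratch/train00/results.tar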

88
Condor-G
  • Combines the strengths of Condor
  • and the Globus Toolkit
  • Advantages when managing grid jobs
  • full featured queuing service
  • credential management
  • fault-tolerance

89
Standard Condor-G
  • Examples during hands-on demonstration
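For reference, a minimal sketch of a Condor-G submit description file; the jobmanager contact and file names are illustrative assumptions:

    universe        = globus
    globusscheduler = tg-login1.ncsa.teragrid.org/jobmanager-pbs
    executable      = my_app
    arguments       = arg1 arg2
    output          = my_app.out
    error           = my_app.err
    log             = my_app.log
    queue

Submit it with condor_submit, watch it with condor_q, and remove it with condor_rm.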

90
How It Works (animation frames, slides 90-94)
[Diagram sequence: a job submitted to the Condor-G Schedd causes it to start a GridManager; the GridManager contacts the Grid Resource and starts a Globus JobManager there; the JobManager submits the job to the local PBS scheduler, which runs the User Job]
95
Condor-G with Glide In
  • Examples during hands-on session

96
How It Works (animation frames, slides 96-107: Condor-G with glide-in)
[Diagram sequence: as before, the Condor-G Schedd starts a GridManager, which starts a JobManager on the Grid Resource; this time PBS launches Condor glide-in daemons rather than the user job directly; the glided-in Startd reports to the Condor Collector, adding the resource to the user's Condor pool; the Schedd then schedules the User Job onto the glided-in Startd]
108
MPI: Message Passing Interface
  • MPICH-G2
  • Grid-enabled implementation of the MPI v1
    standard
  • harnesses services from the Globus Toolkit to run
    MPI jobs across heterogeneous platforms
  • MPICH-GM
  • used to exploit the lower latency and higher data
    rates of Myrinet networks
  • may be used alone or layered with other MPI
    implementations
  • MPI-VMI2
  • exploits network layer and integrates profiling
    behaviors for optimization

109
MPI
  • TG default is MPI-v1, for MPI-v2 use softkey
  • ROMIO
  • high performance, portable MPI-IO
  • optimized for noncontiguous data access patterns
    common in parallel applications
  • optimized I/O collectives
  • C, Fortran, Profiling interfaces provided
  • not included: file interoperability and user-defined error handlers for files

110
MPICH-G2
  • Excels at cross-site or inter-cluster jobs
  • Offers multiple MPI receive behaviors to enhance
    job performance under known conditions
  • Not recommended for intra-cluster jobs
  • Examples during hands-on session

111
MPICH-G2
  • Three different receive behaviors
  • offers topology-aware communicators using GLOBUS_LAN_ID
  • enhanced non-vendor MPI through point-to-point messaging
  • data exchange through UDP-enabled GridFTP
  • SC 2003 demo in the ANL booth
  • Offers MPI_Comm_connect/accept from the MPI-2 standard
  • Uses standard MPI directives such as mpirun, mpicc, mpif77, etc. (see the sketch below)
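A hedged sketch of a two-site MPICH-G2 launch via a multi-request RSL file; the contact strings, counts, and paths are illustrative assumptions:

    ring.rsl (multi-request RSL, one subjob per site):
    + ( &(resourceManagerContact="tg-login1.ncsa.teragrid.org/jobmanager-pbs")
        (count=4) (label="subjob 0")
        (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0))
        (executable=/home/train00/ring) )
      ( &(resourceManagerContact="tg-login1.sdsc.teragrid.org/jobmanager-pbs")
        (count=4) (label="subjob 1")
        (environment=(GLOBUS_DUROC_SUBJOB_INDEX 1))
        (executable=/home/train00/ring) )

    mpicc -o ring ring.c        # build with the MPICH-G2 compiler wrapper
    mpirun -globusrsl ring.rsl  # launch the co-allocated cross-site job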

112
MPICH-VMI2
  • Operates across varied network bandwidth
    protocols, such as TCP, Infiniband, Myrinet
  • Utilizes standard MPI directives such as mpirun, mpicc, and mpif77
  • Harnesses profiling routines to provide execution
    optimization when application characteristics are
    not previously known.
  • Examples during hands-on session

113
Data Handling

114
Where's the disk?
  • Local node disk
  • Shared writeable global areas
  • Scratch space: $TG_SCRATCH
  • Parallel filesystems: GPFS, PVFS
  • Home directories: /home/<username>
  • Project/Database space
  • LORA: learn once, run anywhere

115
Data Responsibilities and Expectations
  • Storage lifetimes
  • check local policy and TG documentation
  • Data transfer
  • SRB, GridFTP, scp (see the sketches below)
  • Data restoration services/back-ups
  • varies by site
  • Job check-pointing
  • responsibility rests with the user
  • Email: relay only, no local delivery
  • forwarded to address of registration
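For reference, hedged sketches of the three transfer paths; hostnames, paths, and file names are illustrative assumptions:

    # SRB: put a file into, and later retrieve it from, your SRB collection
    Sinit                 # start an SRB session
    Sput results.tar      # store into the current collection
    Sget results.tar      # retrieve it later
    Sexit                 # end the session

    # GridFTP
    globus-url-copy file:///scratch/train00/results.tar \
      gsiftp://tg-login1.sdsc.teragrid.org/scratch/train00/results.tar

    # GSI-enabled scp
    gsiscp results.tar tg-login1.sdsc.teragrid.org:/scratch/train00/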

116
File Systems
  • GPFS
  • available for IA32 based clusters
  • underdevelopment for IA64 based clusters
  • fast file system - initial tests promising
  • PVFS
  • parallel file system
  • used for high performance scratch

117
PVFS
  • Parallel file system providing shared access to a
    high-performance scratch space
  • software is quite stable but does not try to
    handle single-point hardware (node) failures
  • Excellent place to store a replica of input data,
    or output data prior to archiving
  • Quirks
  • is/can be very slow
  • executing off PVFS has traditionally been buggy
    (not suggested)
  • no client caching means poor small read/write
    performance
  • For more information visit the PVFS BOF on Wed.
    at 5pm

118
Integrating complex resources
  • SRB
  • Visualization Resources
  • ANL booth demos
  • fractal demo during hands-on session
  • Real-time equipment
  • shake tables
  • microscopy
  • haptic devices
  • Integration work in progress
  • A research topic

119
Job Submission and Monitoring

120
Scheduling
  • Metascheduling
  • user-settable reservations
  • pre-allocated advanced reservations
  • Local scheduling
  • PBS/Maui
  • Condor-G
  • Peer scheduling
  • may be considered in the future

121
Job Submission Methods
  • TG Wide submissions
  • Condor-G
  • MPICH-G2
  • MPICH-VMI 2
  • TG Local cluster submissions
  • Globus
  • Condor-G
  • PBS (batch and interactive); see the sketch below
  • Examples during hands-on session
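For reference, a hedged sketch of direct PBS use on a single TG cluster; the resource requests are illustrative assumptions:

    qsub my_script                               # batch: submit a job script
    qsub -I -l nodes=2:ppn=2,walltime=00:30:00   # interactive: 4 CPUs for 30 minutes
    qstat -u $USER                               # check your jobs in the queue
    qdel <job_id>                                # cancel a job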

122
The TeraGrid Pulse
  • Inca System
  • test harness
  • unit reporters
  • version reporters
  • Operation monitor
  • system resources
  • job submissions

123
What is the Inca Test Harness?
  • Software built to support the Grid Hosting
    Environment
  • Is SRB working at all sites?
  • Should we upgrade Globus to version 2.4?
  • Is TG_SCRATCH available on the compute nodes?
  • Framework for automated testing, verification,
    and monitoring
  • Find problems before users do!

124
Architecture Overview
  • Reporter - a script or executable
  • version, unit test, and integrated test
  • assembled into suites
  • Harness - perl daemons
  • Planning and execution of reporter suites
  • Archiving
  • Publishing
  • Client - user-friendly web interface,
    application, etc.

125
How will this help you?
  • Example pre-production screenshots

126
Network Characteristics
  • Each site is connected at 30 Gb/s
  • Cluster nodes are connected at 1 Gb/s
  • Real world performance node to node is 990 Mb/s
  • TCP tuning is essential within your application
    to attain good throughput
  • TCP has issues along high speed, high latency
    paths which are current research topics

127
Ongoing Iperf tests (single day example)
  • Iperf tests run once an hour between dedicated
    test platforms at each site (IA32 based)
  • Will report to INCA soon
  • Deployed on a variety of representative machines
    soon
  • Code to be made available under an open source
    license soon
  • http://network.teragrid.org/tgperf/

128
Detailed Iperf Graph (single day close up)
129
For More Information
  • TeraGrid: http://www.teragrid.org/userinfo
  • Condor: http://www.cs.wisc.edu/condor
  • Globus: http://www.globus.org
  • PBS: http://www.openpbs.org
  • MPI: http://www.mcs.anl.gov/mpi
  • MPICH-G2: http://www.niu.edu/mpi
  • MPICH-VMI: http://vmi.ncsa.uiuc.edu
  • SoftEnv: http://www.mcs.anl.gov/systems/software

130
TeraGrid Support Services and Resources
  • Nancy Wilkins-Diehr
  • San Diego Supercomputer Center
  • Co-Chair, TG User Services WG
  • wilkinsn@sdsc.edu

131
Production in January!
  • First phase of the TeraGrid will be available for
    allocated users in January, 2004
  • Variety of disciplines represented by first users
  • groundwater and oil reservoir modeling
  • Large Hadron Collider support
  • Southern California Earthquake Center (SCEC)
  • Apply by Jan 6 for April access

132
Complete User Support
  • Documentation
  • Applications
  • Consulting
  • Training

133
Documentation
  • TeraGrid-wide documentation
  • simple
  • high-level
  • what works on all resources
  • Site-specific documentation
  • full details on unique capabilities of each
    resource
  • http://www.teragrid.org/docs

134
Common Installation of Applications
  • ls $TG_APPS_PREFIX
  • ATLAS            globus-2.4.2-2003-07-30-test2   netcdf-3.5.0
  • HPSS             goto                            papi
  • LAPACK           gx-map                          pbs
  • PBSPro_5_2_2_2d  gx-map-0.3                      perfmon
  • bin              hdf4                            petsc
  • crm              hdf5

135
24/7 Consulting Support
  • help@teragrid.org
  • advanced ticketing system for cross-site support
  • staffed 24/7
  • 866-336-2357, 9-5 Pacific Time
  • http://news.teragrid.org/
  • Extensive experience solving problems for early
    access users
  • Networking, compute resources, extensible
    TeraGrid resources

136
Training that Meets User Needs
  • Asynchronous training
  • this tutorial and materials will be available
    online
  • Synchronous training
  • TeraGrid training incorporated into ongoing
    training activities at all sites
  • Training at your site
  • with sufficient participants

137
Questions?
  • help@teragrid.org
  • Come visit us in any TeraGrid site booth
  • bittner@mcs.anl.gov
  • dsimmel@psc.edu
  • jtowns@ncsa.edu
  • sharon@cacr.caltech.edu
  • wilkinsn@sdsc.edu

138
Lunch! Then hands-on lab
  • Certificate creation and management, SRB
    initialization
  • Fractals
  • Single-site MPI
  • Cross-site MPICH-G2
  • Cross-site VMI2
  • Visualization
  • MCell
  • PBS
  • Globus
  • Condor-G
  • Condor DAGman
  • SRB

139
Getting Started with User Certificates on the
TeraGrid
  • Derek Simmel
  • Pittsburgh Supercomputing Center
  • dsimmel@psc.edu

140
Requesting a TeraGrid Allocation
  • http://www.paci.org

141
TeraGrid Accounts
  • Principal Investigators (PIs) identify who should
    have TeraGrid accounts that can charge against
    their project's allocation
  • PIs initiate the account creation process for
    authorized users via an administrative web page
  • Units of a project's allocation are charged at
    rates corresponding to the resources used

142
Approaches to TeraGrid Use
  • Log in interactively to a login node at a
    TeraGrid site and work from there
  • no client software to install/maintain yourself
  • execute tasks from your interactive session
  • Work from your local workstation and authenticate
    remotely to TeraGrid resources
  • comfort and convenience of working "at home"
  • may have to install/maintain add'l TG software

143
User Certificates for TeraGrid
  • Why use certificates for authentication?
  • Facilitates Single Sign-On
  • enter your pass-phrase only once per session,
    regardless of how many systems and services that
    you access on the Grid during that session
  • one pass-phrase to remember (to protect your
    private key), instead of one for each system
  • Widespread Use and Acceptance
  • certificate-based authentication is standard for
    modern Web commerce and secure services

144
Certificate-Based Authentication
[Diagram: Client Z obtains a certificate via a Registration Authority (RA) and a Certificate Authority (CA)]
145
TeraGrid Authentication -> Tasks
[Diagram: a credential issued through the RA/CA, together with the GIIS information service, is used to reach HPC, Data, and Viz resources]
146
TeraGrid-Accepted CAs
  • NCSA CA
  • SDSC/NPACI CA
  • PSC CA
  • DOEGrids CA
  • NCSA CA and SDSC/NPACI CA will generate new TeraGrid User Certificates

147
New TeraGrid Account TODO List
  • Use Secure Shell (SSH) to log into a TeraGrid site
  • Change your password (WE'RE SKIPPING THIS STEP TODAY)
  • Obtain a TeraGrid-acceptable User Certificate and install it in your home directory, assuming you do not already have one
  • Register your User Certificate in the Globus grid-mapfile on TeraGrid systems
  • Securely copy your TeraGrid User Certificate and Private Key to your home workstation
  • Test your User Certificate for Remote Authentication
  • Initialize your TeraGrid user SRB collection, if applicable (WE'RE SKIPPING THIS STEP TODAY)

148
0. Logging into your Classroom Laptop Computer
  • You have each been assigned a temporary TeraGrid user account trainNN, where NN is the number assigned to you for the duration of this course
  • Log in to the laptop by entering your user account name (sc03) and the password provided (sc2003)
  • Once logged in, open a Terminal (xterm)
  • STOP and await further instructions...

149
1. SSH to a TeraGrid Site
  • ssh trainNN@tg-login1.ncsa.teragrid.org (enter the password provided when prompted to do so)
  • STOP and await further instructions...

150
2a. Change your Account Password
WE'RE SKIPPING THIS STEP TODAY
  • Good Password Selection Rules Apply
  • Do not use words that could be in any dictionary,
    including common or trendy misspellings of words
  • Pick something easy for you to remember, but
    impossible for others to guess
  • Pick something that you can learn to type quickly, using many different fingers
  • Combine letters, digits, punctuation symbols and
    capitalization
  • Never use the same password for two different
    systems, nor for two different accounts
  • If you must write your password down, do so away
    from prying eyes and lock it securely away!

151
2b. Change your Account Password
WE'RE SKIPPING THIS STEP TODAY
  • Means for changing local passwords vary among
    systems
  • local password on Linux and similar operating
    systems
  • passwd
  • Kerberos environments
  • kpasswd
  • Systems managed using NIS
  • yppasswd
  • See site documentation for correct method
  • http://www.teragrid.org/docs/

152
2c. Change your Account Password
WE'RE SKIPPING THIS STEP TODAY
  • kpasswd (follow the prompts to enter your current user account password and then to enter, twice, your newly selected password)
  • exit to log out from tg-master2.ncsa.teragrid.org
  • STOP and await further instructions...

153
3a. User Certificate Request
  • For this exercise, we will execute a command-line
    program to request a new TeraGrid User
    Certificate from the NCSA CA
  • NCSA CA User Cert instructions are available at
  • http://www.ncsa.uiuc.edu/UserInfo/Grid/Security/GetUserCert.html
  • For SDSC/NPACI CA User Certificates, a similar program may be used, or the web interface at
  • https://hotpage.npaci.edu/accounts/cgi-bin/create_certificate.cgi

154
3b. User Certificate Request
WE'RE SKIPPING THIS STEP TODAY
  • Log into a TeraGrid Login node at NCSA
  • > ssh trainNN@tg-login1.ncsa.teragrid.org (use your new password to log in)
  • STOP and await further instructions...

155
A1 New step for today...
  • Execute ls -a in your home directory on
    tg-login1.ncsa.teragrid.org
  • If you see a directory named .globus, AND no
    directory named .globus-sdsc, then STOP and await
    instructions to correct this
  • We want to make sure you have the right .globus
    in place later for the exercises...

156
3c. User Certificate Request
  • Execute the NCSA CA User Certificate request
    script
  • > ncsa-cert-request (use your new password again to authenticate)
  • STOP and await further instructions...

NCSA Kerberos
157
3d. User Certificate Request
  • When prompted, enter a Pass-phrase for your new
    certificate (and a second time to verify)
  • A Pass-phrase may be a sentence with spaces
  • Make it as long as you care to type "in the dark"
  • Good password selection rules apply
  • Write your pass-phrase down but store it
    securely!
  • Never allow your passphrase to be discovered by
    others - especially since this gets you in to
    multiple systems...
  • If you lose your pass-phrase, it cannot be
    recovered - you must get a new certificate

158
3e. User Certificate Request
  • The Certificate request script will place your
    new user certificate and private key into a
    .globus directory in your home directory
  • > ls -la .globus
    total 24
    drwxr-xr-x   3 train00 train00 4096 Nov 17 13:45 .
    drwx------  33 train00 train00 4096 Oct 17 20:17 ..
    -r--r--r--   1 train00 train00 2703 Nov 17 13:55 usercert.pem
    -r--r--r--   1 train00 train00 1420 Nov 17 13:50 usercert_request.pem
    -r--------   1 train00 train00  963 Nov 17 13:50 userkey.pem
  • Your Pass-phrase protects your private key

159
3f. User Certificate Request
  • Examine your new certificate
  • > grid-cert-info -issuer -subject -startdate -enddate
    /C=US/O=National Center for Supercomputing Applications/CN=Certification Authority
    /C=US/O=National Center for Supercomputing Applications/CN=Training User00
    Jul 11 21:16:05 2003 GMT
    Jul 10 21:16:05 2004 GMT
  • Your Certificate's Subject is your Certificate DN
  • DN = Distinguished Name

160
3g. User Certificate Request
  • Test Globus certificate proxy generation
  • > grid-proxy-init -verify -debug
    User Cert File: /home/train00/.globus/usercert.pem
    User Key File: /home/train00/.globus/userkey.pem
    Trusted CA Cert Dir: /etc/grid-security/certificates
    Output File: /tmp/x509up_u500
    Your identity: /C=U