HPGC 2006 Workshop on High-Performance Grid Computing - PowerPoint PPT Presentation

Description: HPGC 2006 Workshop on High-Performance Grid Computing at IPDPS 2006, Rhodes Island, Greece, April 25-29, 2006. Major HPC Grid Projects: From Grid Testbeds to Sustainable High-Performance Grid Infrastructures.
Slides: 53
Learn more at: http://www.cs.unb.ca

Transcript and Presenter's Notes
1
HPGC 2006 Workshop on High-Performance Grid
Computing, at IPDPS 2006, Rhodes Island, Greece,
April 25-29, 2006
Major HPC Grid Projects: From Grid Testbeds to
Sustainable High-Performance Grid Infrastructures
Wolfgang Gentzsch, D-Grid, RENCI, GGF GFSG, e-IRG
wgentzsch@d-grid.de
Thanks to Eric Aubanel, Virendra Bhavsar, Michael
Frumkin, Rob F. Van der Wijngaart
2
HPGC 2006 Workshop on High-Performance Grid
Computing at IPDPC 2006 Rhodes Island, Greece,
April 25 29, 2006 Major HPC Grid
Projects From Grid Testbeds to Sustainable
High-Performance Grid Infrastructures
Wolfgang Gentzsch, D-Grid, RENCI, GGF GFSG,
e-IRG wgentzsch_at_d-grid.de Thanks to Eric
Aubanel, Virendra Bhavsar, Michael Frumkin, Rob
F. Van der Wijngaart and INTEL
3
Focus
  • on HPC capabilities of grids
  • on sustainable grid infrastructures
  • selected six major HPC grid projects:
  • UK e-Science, US TeraGrid, NAREGI (Japan),
  • EGEE and DEISA (Europe), D-Grid (Germany)
  • ... and my apologies for not mentioning
  • your favorite grid project, but ...

4
Too Many Major Grids to mention them all
5
UK e-Science Grid: started in early 2001, 400 million
funding, application independent
6
NGS Overview: User view
  • Resources
  • 4 Core clusters
  • UK's National HPC services
  • A range of partner contributions
  • Access
  • Support UK academic researchers
  • Lightweight peer review for limited free
    resources
  • Central help desk
  • www.grid-support.ac.uk

7
NGS Overview: Organisational view
  • Management
  • GOSC Board
  • Strategic direction
  • Technical Board
  • Technical coordination and policy
  • Grid Operations Support Centre
  • Manages the NGS
  • Operates the UK CA, over 30 RAs
  • Operates central helpdesk
  • Policies and procedures
  • Manage and monitor partners

8
NGS Use
(Usage charts: files stored, CPU time by user, users
by institution; over 320 users)
9
NGS Development
  • Core Node refresh
  • Expand partnership
  • HPC
  • Campus Grids
  • Data Centres
  • Digital Repositories
  • Experimental Facilities
  • Baseline services
  • Aim to map user requirements onto standard
    solutions
  • Support convergence/interoperability
  • Move further towards project (VO) support
  • Support collaborative projects
  • Mixed economy
  • Core resources
  • Shared resources
  • Project/contract-specific resources

10
The Architecture of Gateway
Services
  • The User's Desktop

Grid Portal Server
TeraGrid Gateway Services
Proxy Certificate Server / vault
User Metadata Catalog
Application Workflow
Application Deployment
Application Events
Resource Broker
Replica Mgmt
App. Resource catalogs
Core Grid Services
Security
Notification Service
Data Management Service
Grid Orchestration
Resource Allocation
Accounting Service
Policy
Administration Monitoring
Reservations And Scheduling
Courtesy Jay Boisseau
Web Services Resource Framework Web Services
Notification
Physical Resource Layer
11
TeraGrid Use
(Usage charts: 1600 users; 600 users)
12
Delivering User Priorities in 2005
Results of in-depth discussions with 16 TeraGrid
user teams during the first annual user survey
(August 2004). Capabilities ranked by overall score
(depth of need) and partners in need (breadth of
need), grouped by capability type (Data, Grid
Computing, Science Gateways):
  • Remote File Read/Write
  • High-Performance File Transfer
  • Coupled Applications, Co-scheduling
  • Grid Portal Toolkits
  • Grid Workflow Tools
  • Batch Metascheduling
  • Global File System
  • Client-Side Computing Tools
  • Batch Scheduled Parameter Sweep Tools
  • Advanced Reservations
13
National Research Grid Infrastructure (NAREGI)
2003-2007
  • Petascale Grid Infrastructure RD for Future
    Deployment
  • 45 mil (US) 16 mil x 5 (2003-2007) 125 mil
    total
  • PL: Ken Miura (Fujitsu -> NII)
  • Sekiguchi(AIST), Matsuoka(Titech),
    Shimojo(Osaka-U), Aoyagi (Kyushu-U)
  • Participation by multiple (> 3) vendors:
    Fujitsu, NEC, Hitachi, NTT, etc.
  • NOT AN ACADEMIC PROJECT: ~100 FTEs
  • Follow and contribute to GGF Standardization,
    esp. OGSA

(Diagram: Focused Grand Challenge Grid Apps Areas,
with participants NEC, Osaka-U, Titech, AIST,
Fujitsu, IMS, Hitachi, U-Kyushu)
14
NAREGI Software Stack (Beta Ver. 2006)
Grid-Enabled Nano-Applications (WP6)
Grid PSE
Grid Visualization
Grid Programming (WP2) -Grid RPC -Grid MPI
WP3
Grid Workflow (WFML (Unicore WF))
Distributed Information Service (CIM)
Super Scheduler
Data (WP4)
WP1
Packaging
(WSRF (GT4Fujitsu WP1) GT4 and other services)
Grid VM (WP1)
Grid Security and High-Performance Grid
Networking (WP5)
SuperSINET
NII
IMS
Research Organizations
Major University Computing Centers
Computing Resources and Virtual Organizations
15
GridMPI
  • MPI applications run on the Grid environment
  • Metropolitan-area, high-bandwidth environment:
    >= 10 Gbps, <= 500 miles (less than 10 ms one-way
    latency)
  • Parallel Computation
  • Larger than metropolitan area
  • MPI-IO

(Diagram: a single (monolithic) MPI application
spanning computing resource sites A and B over a
wide-area network)
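The quoted envelope (>= 10 Gbps, <= 500 miles, under 10 ms one-way) can be sanity-checked with a simple propagation-delay estimate. A sketch; the 2/3-of-c signal speed in fiber is my assumption, not from the slides:

```python
# Sanity check of the GridMPI operating envelope: does a <= 500 mile
# metropolitan-area link stay under the ~10 ms one-way latency bound?
SPEED_OF_LIGHT = 299_792_458          # m/s, in vacuum
FIBER_FACTOR = 2 / 3                  # assumption: signals in fiber travel at ~2/3 c
METERS_PER_MILE = 1609.344

def one_way_latency_ms(distance_miles: float) -> float:
    """Propagation-only one-way latency in ms (ignores routing and queuing)."""
    distance_m = distance_miles * METERS_PER_MILE
    return distance_m / (SPEED_OF_LIGHT * FIBER_FACTOR) * 1e3

print(f"500 miles -> {one_way_latency_ms(500):.2f} ms one-way")  # ~4 ms, within budget
```

At 500 miles the propagation delay alone is about 4 ms, which leaves headroom for switching and routing within the 10 ms bound.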
16
EGEE Infrastructure
Country participating in EGEE
  • Scale
  • > 180 sites in 39 countries
  • ~20,000 CPUs
  • > 5 PB storage
  • > 10,000 concurrent jobs per day
  • > 60 Virtual Organisations

17
The EGEE project
  • Objectives
  • Large-scale, production-quality infrastructure
    for e-Science
  • leveraging national and regional grid activities
    worldwide
  • consistent, robust and secure
  • improving and maintaining the middleware
  • attracting new resources and users from industry
    as well as science
  • EGEE
  • 1 April 2004 - 31 March 2006
  • 71 leading institutions in 27 countries,
    federated in regional Grids
  • EGEE-II
  • Proposed start 1 April 2006 (for 2 years)
  • Expanded consortium
  • > 90 partners in 32 countries (also non-European
    partners)
  • Related projects, incl.
  • BalticGrid
  • SEE-GRID
  • EUMedGrid

18
Applications on EGEE
  • More than 20 applications from 7 domains
  • High Energy Physics
  • 4 LHC experiments (ALICE, ATLAS, CMS, LHCb)
  • BaBar, CDF, DØ, ZEUS
  • Biomedicine
  • Bioinformatics (Drug Discovery, GPS@,
    Xmipp_MLrefine, etc.)
  • Medical imaging (GATE, CDSS, gPTM3D, SiMRI 3D,
    etc.)
  • Earth Sciences
  • Earth Observation, Solid Earth Physics,
    Hydrology, Climate
  • Computational Chemistry
  • Astronomy
  • MAGIC
  • Planck
  • Geo-Physics
  • EGEODE
  • Financial Simulation
  • E-GRID

Another 8 applications from 4 domains are in the
evaluation stage
19
Steps for Grid-enabling applications II
  • Tools to easily access Grid resources through
    high level Grid middleware (gLite)
  • VO management (VOMS etc.)
  • Workload management
  • Data management
  • Information and monitoring
  • Application can
  • interface directly to gLite
  • or
  • use higher level services such as portals,
    application specific workflow systems etc.
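For the "interface directly to gLite" path, a job is typically described in JDL and handed to the workload management system. A minimal sketch; the attribute names follow classic gLite/EDG JDL, while the executable, file names, and VO are made up for illustration:

```
[
  // What to run on the worker node and which files to bring back
  Executable    = "/bin/hostname";
  StdOutput     = "std.out";
  StdError      = "std.err";
  OutputSandbox = { "std.out", "std.err" };

  // Hypothetical VO, plus a static requirement used in matchmaking
  VirtualOrganisation = "myvo";
  Requirements  = other.GlueCEPolicyMaxCPUTime > 60;
]
```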

20
EGEE Performance Measurements
  • Information about resources (static + dynamic)
  • Computing: machine properties (CPUs, memory
    architecture, ...), platform properties (OS,
    compiler, other software, ...), load
  • Data: storage location, access properties, load
  • Network: bandwidth, load
  • Information about applications
  • Static computing and data requirements to reduce
    search space
  • Dynamic changes in computing and data
    requirements (might need re-scheduling)
  • Plus
  • Information about Grid services (static +
    dynamic)
  • Which services are available
  • Status
  • Capabilities
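The point about static requirements reducing the search space can be sketched as a simple matchmaking filter. The resource attributes below are invented for illustration and do not follow the gLite information-system schema:

```python
# Toy matchmaker: prune resources that cannot satisfy a job's static
# requirements before any dynamic (load-based) ranking is done.
resources = [
    {"name": "ce01", "cpus": 64,  "mem_gb": 128, "os": "SL3"},
    {"name": "ce02", "cpus": 8,   "mem_gb": 16,  "os": "SL3"},
    {"name": "ce03", "cpus": 256, "mem_gb": 512, "os": "SL4"},
]

def match(job, pool):
    """Return names of resources meeting the job's static requirements."""
    return [r["name"] for r in pool
            if r["cpus"] >= job["min_cpus"]
            and r["mem_gb"] >= job["min_mem_gb"]
            and r["os"] == job["os"]]

job = {"min_cpus": 32, "min_mem_gb": 64, "os": "SL3"}
print(match(job, resources))   # ['ce01']
```

Dynamic information (load, queue length) would then only be consulted for the survivors of this static filter.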

21
Sustainability Beyond EGEE-II
  • Need to prepare for permanent Grid infrastructure
  • Maintain Europe's leading position in global
    science Grids
  • Ensure a reliable and adaptive support for all
    sciences
  • Independent of project funding cycles
  • Modelled on success of GÉANT
  • Infrastructure managed centrally in collaboration
    with national bodies

22
  • e-Infrastructures Reflection Group
  • e-IRG Mission:
  • to support, on a political, advisory and
    monitoring level, the creation of a policy and
    administrative framework for the easy and
    cost-effective shared use of electronic
    resources in Europe (focusing on Grid computing,
    data storage, and networking resources) across
    technological, administrative and national
    domains.

23
DEISA Perspectives Towards cooperative extreme
computing in Europe
  • Victor Alessandrini
  • IDRIS - CNRS
  • va@idris.fr

24
The DEISA Supercomputing Environment (21,900
processors and 145 Tf in 2006, more than 190 Tf
in 2007)
  • IBM AIX Super-cluster:
  • FZJ Jülich, 1312 processors, 8.9 teraflops peak
  • RZG Garching, 748 processors, 3.8 teraflops peak
  • IDRIS, 1024 processors, 6.7 teraflops peak
  • CINECA, 512 processors, 2.6 teraflops peak
  • CSC, 512 processors, 2.6 teraflops peak
  • ECMWF, 2 systems of 2276 processors each, 33
    teraflops peak
  • HPCx, 1600 processors, 12 teraflops peak
  • BSC, IBM PowerPC Linux system (MareNostrum), 4864
    processors, 40 teraflops peak
  • SARA, SGI ALTIX Linux system, 1024 processors, 7
    teraflops peak
  • LRZ, Linux cluster (2.7 teraflops) moving to SGI
    ALTIX system (5120 processors and 33 teraflops
    peak in 2006, 70 teraflops peak in 2007)
  • HLRS, NEC SX8 vector system, 646 processors, 12.7
    teraflops peak

25
DEISA objectives
  • To enable Europe's terascale science by the
    integration of Europe's most powerful
    supercomputing systems.
  • Enabling scientific discovery across a broad
    spectrum of science and technology is the only
    criterion for success
  • DEISA is a European Supercomputing Service built
    on top of existing national services.
  • Integration of national facilities and services,
    together with innovative operational models
  • Main focus is HPC and Extreme Computing
    applications that cannot be supported by the
    isolated national services
  • Service providing model is the transnational
    extension of national HPC centers
  • Operations,
  • User Support and Applications Enabling,
  • Network Deployment and Operation,
  • Middleware services.

26
About HPC
  • Dealing with large complex systems requires
    exceptional computational resources. For
    algorithmic reasons, resources grow much faster
    than the system's size and complexity.
  • Dealing with huge datasets, involving large
    files. Typical datasets are several PBytes.
  • Little usage of commercial or public domain
    packages. Most applications are corporate codes
    incorporating specialized know how. Specialized
    user support is important.
  • Codes are fine-tuned and targeted for a
    relatively small number of well-identified
    computing platforms. They are extremely
    sensitive to the production environment.
  • Main requirement for high performance is
    bandwidth (processor to memory, processor to
    processor, node to node, system to system).
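Since bandwidth is named as the main requirement, a quick way to probe one link in that chain (processor to memory) is a copy microbenchmark. A rough sketch; absolute numbers depend entirely on the machine and on Python's copy overhead, so treat the result as an order-of-magnitude figure:

```python
import time

def memcopy_bandwidth_gbs(n_bytes: int = 256 * 1024 * 1024) -> float:
    """Estimate processor-to-memory copy bandwidth in GB/s via one full buffer copy."""
    src = bytearray(n_bytes)          # source buffer, touched on allocation
    t0 = time.perf_counter()
    dst = bytes(src)                  # one complete copy through memory
    t1 = time.perf_counter()
    assert len(dst) == n_bytes
    return n_bytes / (t1 - t0) / 1e9

print(f"copy bandwidth ~ {memcopy_bandwidth_gbs():.1f} GB/s")
```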

27
HPC and Grid Computing
  • Problem: the speed of light is not fast enough
  • Finite signal propagation speed boosts message
    passing latencies in a WAN from a few
    microseconds to tens of milliseconds (if A is
    in Paris and B in Helsinki)
  • If A and B are two halves of a tightly coupled
    complex system, communications are frequent and
    the enhanced latencies will kill performance.
  • Grid computing works best for embarrassingly
    parallel applications, or coupled software
    modules with limited communications.
  • Example A is an ocean code, and B an atmospheric
    code. There is no bulk interaction.
  • Large, tightly coupled parallel applications
    should be run in a single platform. This is why
    we still need high end supercomputers.
  • DEISA implements this requirement by rerouting
    jobs and balancing the computational workload at
    a European scale.
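The latency penalty described above dominates whenever a tightly coupled code exchanges many small messages. A toy cost model (my own illustration, with assumed latency and bandwidth figures) makes the effect concrete:

```python
def exchange_time_s(n_msgs: int, msg_bytes: int,
                    latency_s: float, bw_bytes_s: float) -> float:
    """Total time for n_msgs point-to-point messages, latency + bandwidth model."""
    return n_msgs * (latency_s + msg_bytes / bw_bytes_s)

N, SIZE = 10_000, 8 * 1024                            # 10k messages of 8 KiB each
cluster = exchange_time_s(N, SIZE, 5e-6,  10e9 / 8)   # ~5 us cluster interconnect
wan     = exchange_time_s(N, SIZE, 20e-3, 10e9 / 8)   # ~20 ms one-way WAN latency

print(f"cluster: {cluster:.2f} s, WAN: {wan:.1f} s")  # the WAN case is ~1700x slower
```

With identical 10 Gbps bandwidth, the WAN run is dominated entirely by the latency term, which is why coupled-but-chatty applications belong on a single platform.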

28
Applications for Grids
  • Single-CPU Jobs: job mix, many users, many serial
    applications; suitable for grid (e.g. in
    universities and research centers)
  • Array Jobs: 100s/1000s of jobs, one user, one
    serial application, varying input parameters;
    suitable for grid (e.g. parameter studies in
    optimization, CAE, genomics, finance)
  • Massively Parallel Jobs, loosely coupled: one
    job, one user, one parallel application, no/low
    communication, scalable; fine-tune for grid
    (time-explicit algorithms, film rendering,
    pattern recognition)
  • Parallel Jobs, tightly coupled: one job, one
    user, one parallel application, high interprocess
    communication; not suitable for distribution over
    the grid, but for a parallel system in the grid
    (time-implicit algorithms, direct solvers, large
    linear-algebra equation systems)
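The "array job" pattern above (one serial application, many input parameters) can be sketched locally, with a worker pool standing in for the grid scheduler. This is an illustration of the pattern only, not a grid middleware API, and the computation inside is a placeholder:

```python
from multiprocessing import Pool

def simulate(param: float) -> tuple:
    """Stand-in serial application: one independent run per input parameter."""
    result = param ** 2 - 3 * param + 1     # placeholder computation
    return param, result

if __name__ == "__main__":
    params = [0.5 * i for i in range(8)]    # the varying inputs of a parameter study
    with Pool(4) as pool:                   # 4 workers stand in for grid nodes
        results = pool.map(simulate, params)
    for p, r in results:
        print(f"param={p:.1f} -> {r:.3f}")
```

Because the runs share no state, this is exactly the "suitable for grid" case: the scheduler can scatter them across any available nodes.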

29
Objectives of e-Science Initiative
German D-Grid Project, part of the 100 million Euro
e-Science initiative in Germany
  • Building one Grid Infrastructure in Germany
  • Combine existing German grid activities
  • Development of e-science services for the
    research community
  • Science Service Grid: Services for Scientists
  • Important: Sustainability
  • Production grid infrastructure after the
    funding period
  • Integration of new grid communities (2nd
    generation)
  • Evaluation of new business models for grid
    services

30
e-Science Projects
D-Grid
Knowledge Management
Astro-Grid
C3-Grid
HEP-Grid
IN-Grid
MediGrid
ONTOVERSE
WIKINGER
WIN-EM
Textgrid
Im Wissensnetz
. . .
Generic Grid Middleware and Grid Services
eSciDoc
VIOLA
  • Integration Project

31
DGI D-Grid Middleware Infrastructure
User
Application Development and User Access
GAT API
Plug-In
GridSphere
UNICORE
User
High-level Grid Services
Scheduling Workflow Management
Monitoring
LCG/gLite
Data management
Basic Grid Services
Accounting Billing User/VO-Mngt
Globus 4.0.1
Security
Resources in D-Grid
Distributed Compute Resources
Network Infrastructure
Distributed Data Archive
Data/ Software
32
  • Key Characteristics of D-Grid
  • Generic Grid infrastructure for German research
    communities
  • Focus on Sciences and Scientists, not industry
  • Strong influence of international projects:
    EGEE, DEISA, CrossGrid, CoreGrid, GridLab,
    GridCoord, UniGrids, NextGrid, ...
  • Application-driven (80% of funding), not
    infrastructure-driven
  • Focus on implementation, not research
  • Phase 1 + 2: 50 MEuro, 100 research
    organizations

33
Conclusion: moving towards Sustainable Grid
Infrastructures, or: Why Grids are here to stay!
34
Reason 1: Benefits
  • Resource Utilization: increase from 20% to 80%
  • Productivity: more work done in shorter time
  • Agility: flexible actions and re-actions
  • On Demand: get resources when you need them
  • Easy Access: transparent, remote, secure
  • Sharing: enable collaboration over the network
  • Failover: migrate/restart applications
    automatically
  • Resource Virtualization: access compute services,
    not servers
  • Heterogeneity: platforms, OSs, devices,
    software, ...
  • Virtual Organizations: build and dismantle on
    the fly

35
Reason 2: Standards, The Global Grid Forum
  • Community-driven set of working groups that are
    developing standards and best practices for
    distributed computing efforts
  • Three primary functions: community, standards,
    and operations
  • Standards Areas Infrastructure, Data, Compute,
    Architecture, Applications, Management, Security,
    and Liaison
  • Community Areas Research Applications, Industry
    Applications, Grid Operations, Technology
    Innovations, and Major Grid Projects
  • Community Advisory Board represents the different
    communities and provides input and feedback to
    GGF

36
Reason 3: Industry, the EGA (Enterprise Grid
Alliance)
  • Industry-driven consortium to implement standards
    in industry products and make them interoperable
  • Founding members EMC, Fujitsu Siemens Computers,
    HP, NEC, Network Appliance, Oracle and Sun, plus
    20 Associate Members
  • May 11, 2005 Enterprise Grid Reference Model
    v1.0

37
Reason 3: Industry, the EGA (Enterprise Grid
Alliance)
  • Industry-driven consortium to implement standards
    in industry products and make them interoperable
  • Founding members EMC, Fujitsu Siemens Computers,
    HP, NEC, Network Appliance, Oracle and Sun, plus
    20 Associate Members
  • May 11, 2005 Enterprise Grid Reference Model
    v1.0

Feb '06: GGF and EGA signed a letter of intent to
merge. A joint team is planning the transition,
expected to be complete this summer.
38
Reason 4: OGSA, ONE Open Grid Services
Architecture
OGSA
Web Services
Grid Technologies
OGSA, Open Grid Services Architecture: integrates
grid technologies with Web Services (OGSA -> WS-RF)
and defines the key components of the grid.
OGSA enables the integration of services and
resources across distributed, heterogeneous,
dynamic, virtual organizations whether within a
single enterprise or extending to external
resource-sharing and service-provider
relationships.
39
Reason 5: Quasi-Standard Tools. Example: The
Globus Toolkit
  • Globus Toolkit provides four major functions for
    building grids

Courtesy Gridwise Technologies
40
. . . and UNICORE
  • Seamless, secure, intuitive access to distributed
    resources data
  • Available as Open Source
  • Features intuitive GUI with single sign-on,
    X.509 certificates for AA, workflow engine for
    multi-site, multi-step workflows, job monitoring,
    application support, secure data transfer,
    resource management, and more
  • In production

Courtesy Achim Streit, FZJ
41
Globus 2.4 -> UNICORE
WS-Resource based Resource Management Framework
for dynamic resource information and resource
negotiation
Client
Portal
Command Line
WS-RF
WS-RF
WS-RF
WS-RF
Gateway Service Registry
Gateway
WS-RF
WS-RF
WS-RF
Workflow Engine
File Transfer
User Management (AAA)
Network Job Supervisor
Monitoring
Resource Management
Application Support
WS-RF
WS-RF
WS-RF
Courtesy Achim Streit, FZJ
42
Reason 6: Global Grid Community
43
Reason 7: Projects/Initiatives, Testbeds, Companies
  • Altair
  • Avaki
  • Axceleon
  • Cassatt
  • Datasynapse
  • Egenera
  • Entropia
  • eXludus
  • GridFrastructure
  • GridIron
  • GridSystems
  • Gridwise
  • GridXpert
  • HP Utility Data Center
  • IBM Grid Toolbox
  • Kontiki
  • Metalogic
  • Noemix
  • Oracle 10g
  • CO Grid
  • Compute-against-Cancer
  • D-Grid
  • DeskGrid
  • DOE Science Grid
  • EGEE
  • EuroGrid
  • European DataGrid
  • FightAIDS@home
  • Folding@home
  • GRIP
  • NASA IPG
  • NC BioGrid
  • NC Startup Grid
  • NC Statewide Grid
  • NEESgrid
  • NextGrid
  • Nimrod
  • Ninf
  • ActiveGrid
  • BIRN
  • Condor-G
  • Deisa
  • Dame
  • EGA
  • EnterTheGrid
  • GGF
  • Globus
  • Globus Alliance
  • GridBus
  • GridLab
  • GridPortal
  • GRIDtoday
  • GriPhyN
  • I-WAY
  • Knowledge Grid
  • Legion
  • MyGrid

44
Reason 8: FP6 Grid Technologies Projects
Call 5, start Summer 2006
EU Funding: 124 M€
supporting the NESSI ETP Grid community
Grid services, business models
trust, security
platforms, user environments
data, knowledge, semantics, mining
Specific support action
Integrated project
Network of excellence
Specific targeted research project
45
Reason 9: Enterprise Grids
(Diagram: Sun enterprise grid with SunRay, browser
(GEP) and workstation access; servers, blades,
visualization nodes and Linux racks on Myrinet and
Sun Fire Link; a Grid Manager; optional Gbit-E
control network and a Gbit-E data network; NAS/NFS
data tier with simple NFS, HA NFS, and scalable
QFS/NFS)
46
Enterprise Grid Reference Architecture
(Diagram: the same Sun grid as the previous slide,
annotated with its three layers: Access, Compute,
and Data)
47
1000s of Enterprise Grids in Industry
  • Life Sciences
  • Startup and cost efficient
  • Custom research or limited use applications
  • Multi-day application runs (BLAST)
  • Exponential Combinations
  • Limited administrative staff
  • Complementary techniques
  • Electronic Design
  • Time to Market
  • Fastest platforms, largest Grids
  • License Management
  • Well established application suite
  • Large legacy investment
  • Platform Ownership issues
  • Financial Services
  • Market simulations
  • Time IS Money
  • Proprietary applications
  • Multiple Platforms
  • Multiple scenario execution
  • Need instant results analysis tools
  • High Performance Computing
  • Parallel Reservoir Simulations
  • Geophysical Ray Tracing
  • Custom in-house codes
  • Large scale, multi-platform execution

48
Reason 10: Grid Service Providers. Example: BT
Pre-GRID IT asset usage: 10-15%
  • Inside data center, within Firewall
  • Virtual use of own IT assets
  • The GRID virtualiser engine inside Firewall
  • Opens up under-used ICT assets
  • improves TCO, ROI and Apps performance
  • BUT
  • Intra-enterprise GRID is self limiting
  • Pool of virtualised assets is restricted by
    firewall
  • Does not support Inter-Enterprise usage
  • BT is focusing on managed Grid solutions

(Diagram: enterprise WANs/LANs with virtualised
assets and a GRID engine inside the firewall)
Post-GRID IT asset usage: 70-75%
Courtesy Piet Bel, BT
49
BT's Virtual Private Grid (VPG)
(Diagram: enterprise WANs/LANs with virtualised IT
assets and a GRID engine, connected to a GRID
engine in the BT network)
Courtesy Piet Bel, BT
50
Reason 11: There will be a Market for Grids
51
General Observations on Grid Performance
  • Today, there are 100s of important grid projects
    around the world
  • GGF identifies about 15 research projects which
    have major impact
  • Most research grids focus on HPC and
    collaboration, most industry grids focus on
    utilization and automation
  • Many grids are driven by user / application
    needs, few grid projects are driven by
    infrastructure research
  • Few projects focus on performance / benchmarks
    where performance is mostly seen at the job /
    computation / application level
  • Need for metrics and measurements that help us
    understand grids
  • In a grid, application performance has 3 major
    areas of concern: system capabilities, network,
    and software infrastructure
  • Evaluating performance in a grid is different
    from classic benchmarking, because grids are
    dynamically changing systems incorporating new
    components.

52
The Grid Engine
Thank You!
wgentzsch@d-grid.de