Title: HPGC 2006 Workshop on High-Performance Grid Computing
1 HPGC 2006 Workshop on High-Performance Grid Computing at IPDPS 2006, Rhodes Island, Greece, April 25-29, 2006
Major HPC Grid Projects: From Grid Testbeds to Sustainable High-Performance Grid Infrastructures
Wolfgang Gentzsch, D-Grid, RENCI, GGF GFSG, e-IRG, wgentzsch@d-grid.de
Thanks to Eric Aubanel, Virendra Bhavsar, Michael Frumkin, Rob F. Van der Wijngaart, and Intel
3 Focus
- on HPC capabilities of grids
- on sustainable grid infrastructures
- on six selected major HPC grid projects:
- UK e-Science, US TeraGrid, NAREGI (Japan),
- EGEE and DEISA (Europe), D-Grid (Germany)
- and apologies for not mentioning your favorite grid project
4 Too Many Major Grids to Mention Them All
5 UK e-Science Grid
- Started in early 2001
- 400 million funding
- Application independent
6 NGS Overview: User View
- Resources
- 4 core clusters
- UK's national HPC services
- A range of partner contributions
- Access
- Support UK academic researchers
- Lightweight peer review for limited free resources
- Central help desk
- www.grid-support.ac.uk
7 NGS Overview: Organisational View
- Management
- GOSC Board: strategic direction
- Technical Board: technical coordination and policy
- Grid Operations Support Centre
- Manages the NGS
- Operates the UK CA and over 30 RAs
- Operates the central helpdesk
- Policies and procedures
- Manages and monitors partners
8 NGS Use
(Charts: files stored, CPU time by user, users by institution; over 320 users)
9 NGS Development
- Core node refresh
- Expand partnership
- HPC
- Campus grids
- Data centres
- Digital repositories
- Experimental facilities
- Baseline services
- Aim to map user requirements onto standard solutions
- Support convergence/interoperability
- Move further towards project (VO) support
- Support collaborative projects
- Mixed economy
- Core resources
- Shared resources
- Project/contract-specific resources
10 The Architecture of Gateway Services
Grid Portal Server
TeraGrid Gateway Services
Proxy Certificate Server / vault
User Metadata Catalog
Application Workflow
Application Deployment
Application Events
Resource Broker
Replica Mgmt
App. Resource catalogs
Core Grid Services
Security
Notification Service
Data Management Service
Grid Orchestration
Resource Allocation
Accounting Service
Policy
Administration Monitoring
Reservations And Scheduling
Courtesy Jay Boisseau
Web Services Resource Framework Web Services
Notification
Physical Resource Layer
11 TeraGrid Use
1600 users
600 users
12 Delivering User Priorities in 2005
Results of in-depth discussions with 16 TeraGrid user teams during the first annual user survey (August 2004). Capabilities are plotted by overall score (depth of need) against partners in need (breadth of need), grouped by capability type: data, grid computing, and science gateways.
Capabilities surveyed: remote file read/write, high-performance file transfer, coupled applications and co-scheduling, grid portal toolkits, grid workflow tools, batch metascheduling, global file system, client-side computing tools, batch-scheduled parameter sweep tools, advanced reservations.
13 National Research Grid Infrastructure (NAREGI), 2003-2007
- Petascale grid infrastructure R&D for future deployment
- Funding: 45 mil (US); 16 mil x 5 (2003-2007); 125 mil total
- PL: Ken Miura (Fujitsu, NII)
- Sekiguchi (AIST), Matsuoka (Titech), Shimojo (Osaka-U), Aoyagi (Kyushu-U)
- Participation by multiple (> 3) vendors: Fujitsu, NEC, Hitachi, NTT, etc.
- Not an academic project: 100 FTEs
- Follow and contribute to GGF standardization, esp. OGSA
Participants in the focused grand-challenge grid application areas: NEC, Osaka-U, Titech, AIST, Fujitsu, IMS, Hitachi, U-Kyushu
14 NAREGI Software Stack (Beta Ver. 2006)
- Grid-Enabled Nano-Applications (WP6)
- Grid PSE, Grid Visualization, Grid Workflow (WFML (Unicore WF)) (WP3)
- Grid Programming (WP2): Grid RPC, Grid MPI
- Super Scheduler, Distributed Information Service (CIM), Data (WP4)
- Packaging (WP1): WSRF (GT4, Fujitsu, WP1), GT4 and other services
- Grid VM (WP1)
- Grid Security and High-Performance Grid Networking (WP5)
Runs over SuperSINET, connecting NII, IMS, research organizations, and major university computing centers (computing resources and virtual organizations).
15 GridMPI
- MPI applications run on the Grid environment
- Metropolitan-area, high-bandwidth environment: >= 10 Gbps, <= 500 miles (less than 10 ms one-way latency)
- Parallel computation larger than the metropolitan area
- MPI-IO
A single (monolithic) MPI application runs over the Grid environment, spanning computing resource sites A and B across a wide-area network.
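The 500-mile / 10-ms rule of thumb above can be sanity-checked with a small propagation-latency estimate (my illustration, assuming signals in optical fiber travel at roughly two-thirds of the vacuum speed of light):

```python
# Rough one-way propagation latency in optical fiber.
# Assumption (not from the slides): light in fiber travels at
# about 2/3 of its vacuum speed; real paths add routing overhead.

C_VACUUM_KM_PER_S = 299_792.458
FIBER_SPEED_KM_PER_S = C_VACUUM_KM_PER_S * 2 / 3

def one_way_latency_ms(distance_km: float) -> float:
    """Propagation-only latency; queueing and switching come on top."""
    return distance_km / FIBER_SPEED_KM_PER_S * 1000.0

distance_km = 500 * 1.609  # 500 miles in km
print(f"{one_way_latency_ms(distance_km):.1f} ms")  # about 4 ms, under the 10 ms bound
```

So 500 miles of fiber costs about 4 ms one way just in propagation, which is why the slide's metropolitan-area bound leaves headroom for switching and protocol overhead.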
16 EGEE Infrastructure
(Map legend: country participating in EGEE)
- Scale
- > 180 sites in 39 countries
- 20 000 CPUs
- > 5 PB storage
- > 10 000 concurrent jobs per day
- > 60 Virtual Organisations
17 The EGEE Project
- Objectives
- Large-scale, production-quality infrastructure for e-Science
- Leveraging national and regional grid activities worldwide
- Consistent, robust and secure
- Improving and maintaining the middleware
- Attracting new resources and users from industry as well as science
- EGEE
- 1 April 2004 - 31 March 2006
- 71 leading institutions in 27 countries, federated in regional Grids
- EGEE-II
- Proposed start 1 April 2006 (for 2 years)
- Expanded consortium: > 90 partners in 32 countries (also non-European partners)
- Related projects, incl. BalticGrid, SEE-GRID, EUMedGrid
18 Applications on EGEE
- More than 20 applications from 7 domains
- High Energy Physics: 4 LHC experiments (ALICE, ATLAS, CMS, LHCb); BaBar, CDF, DØ, ZEUS
- Biomedicine: bioinformatics (Drug Discovery, GPS@, Xmipp_MLrefine, etc.); medical imaging (GATE, CDSS, gPTM3D, SiMRI 3D, etc.)
- Earth Sciences: Earth observation, solid Earth physics, hydrology, climate
- Computational Chemistry
- Astronomy: MAGIC, Planck
- Geophysics: EGEODE
- Financial Simulation: E-GRID
Another 8 applications from 4 domains are in the evaluation stage.
19 Steps for Grid-Enabling Applications II
- Tools to easily access Grid resources through high-level Grid middleware (gLite)
- VO management (VOMS etc.)
- Workload management
- Data management
- Information and monitoring
- Applications can either interface directly to gLite, or use higher-level services such as portals, application-specific workflow systems, etc.
20 EGEE Performance Measurements
- Information about resources (static and dynamic)
- Computing: machine properties (CPUs, memory architecture, ...), platform properties (OS, compiler, other software, ...), load
- Data: storage location, access properties, load
- Network: bandwidth, load
- Information about applications
- Static computing and data requirements, to reduce the search space
- Dynamic changes in computing and data requirements (might need re-scheduling)
- Plus information about Grid services (static and dynamic)
- Which services are available
- Status
- Capabilities
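As a sketch of how static requirements reduce the search space before any dynamic load information is consulted, the filter below drops sites that cannot satisfy a job at all (field names are illustrative, not gLite's actual information schema):

```python
# Hypothetical resource records; a real grid information service
# (e.g. one publishing CIM or GLUE data) would supply these.
resources = [
    {"name": "siteA", "cpus": 64,  "os": "SL3", "free_gb": 500},
    {"name": "siteB", "cpus": 512, "os": "SL3", "free_gb": 2000},
    {"name": "siteC", "cpus": 256, "os": "AIX", "free_gb": 800},
]

def match_static(resources, min_cpus, os_name, min_free_gb):
    """Keep only sites meeting the job's static requirements."""
    return [r for r in resources
            if r["cpus"] >= min_cpus
            and r["os"] == os_name
            and r["free_gb"] >= min_free_gb]

candidates = match_static(resources, min_cpus=128, os_name="SL3", min_free_gb=1000)
print([r["name"] for r in candidates])  # ['siteB']
```

Only the surviving candidates then need to be ranked by dynamic properties such as current load.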
21 Sustainability Beyond EGEE-II
- Need to prepare for a permanent Grid infrastructure
- Maintain Europe's leading position in global science Grids
- Ensure reliable and adaptive support for all sciences
- Independent of project funding cycles
- Modelled on the success of GÉANT
- Infrastructure managed centrally in collaboration with national bodies
22 e-Infrastructures Reflection Group
- e-IRG Mission:
- to support, on the political, advisory and monitoring level,
- the creation of a policy and administrative framework
- for the easy and cost-effective shared use of electronic resources in Europe
- (focusing on Grid computing, data storage, and networking resources)
- across technological, administrative and national domains.
23 DEISA Perspectives: Towards Cooperative Extreme Computing in Europe
Victor Alessandrini, IDRIS - CNRS, va@idris.fr
24 The DEISA Supercomputing Environment (21,900 processors and 145 Tf in 2006, more than 190 Tf in 2007)
- IBM AIX super-cluster
- FZJ Jülich: 1312 processors, 8.9 teraflops peak
- RZG Garching: 748 processors, 3.8 teraflops peak
- IDRIS: 1024 processors, 6.7 teraflops peak
- CINECA: 512 processors, 2.6 teraflops peak
- CSC: 512 processors, 2.6 teraflops peak
- ECMWF: 2 systems of 2276 processors each, 33 teraflops peak
- HPCx: 1600 processors, 12 teraflops peak
- BSC: IBM PowerPC Linux system (MareNostrum), 4864 processors, 40 teraflops peak
- SARA: SGI Altix Linux system, 1024 processors, 7 teraflops peak
- LRZ: Linux cluster (2.7 teraflops) moving to an SGI Altix system (5120 processors and 33 teraflops peak in 2006, 70 teraflops peak in 2007)
- HLRS: NEC SX8 vector system, 646 processors, 12.7 teraflops peak
25 DEISA Objectives
- To enable Europe's terascale science by the integration of Europe's most powerful supercomputing systems
- Enabling scientific discovery across a broad spectrum of science and technology is the only criterion for success
- DEISA is a European supercomputing service built on top of existing national services
- Integration of national facilities and services, together with innovative operational models
- Main focus is HPC and Extreme Computing applications that cannot be supported by the isolated national services
- The service-providing model is the transnational extension of national HPC centers:
- Operations
- User support and applications enabling
- Network deployment and operation
- Middleware services
26 About HPC
- Dealing with large complex systems requires exceptional computational resources. For algorithmic reasons, resources grow much faster than the systems' size and complexity.
- Dealing with huge datasets, involving large files. Typical datasets are several PBytes.
- Little usage of commercial or public-domain packages. Most applications are corporate codes incorporating specialized know-how. Specialized user support is important.
- Codes are fine-tuned and targeted for a relatively small number of well-identified computing platforms. They are extremely sensitive to the production environment.
- The main requirement for high performance is bandwidth (processor to memory, processor to processor, node to node, system to system).
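The first point, that resources grow faster than system size for algorithmic reasons, can be illustrated with a dense linear solve, where memory scales as O(N^2) and arithmetic as O(N^3) (a textbook example, not one from the talk):

```python
# Dense NxN linear solve: O(N^2) memory, O(N^3) arithmetic (LU).
# Doubling the problem size quadruples memory and multiplies
# compute by eight.

def dense_solve_cost(n):
    mem_words = n * n   # matrix entries to store
    flops = n ** 3      # proportional to the LU flop count (2/3)*N^3
    return mem_words, flops

m1, f1 = dense_solve_cost(10_000)
m2, f2 = dense_solve_cost(20_000)
print(m2 / m1, f2 / f1)  # 4.0 8.0
```

This superlinear growth is why resolving a modestly larger or more complex system can exhaust a machine that handled the smaller one comfortably.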
27 HPC and Grid Computing
- Problem: the speed of light is not fast enough
- Finite signal propagation speed boosts message-passing latencies in a WAN from a few microseconds to tens of milliseconds (if A is in Paris and B in Helsinki)
- If A and B are two halves of a tightly coupled complex system, communications are frequent and the enhanced latencies will kill performance
- Grid computing works best for embarrassingly parallel applications, or coupled software modules with limited communications
- Example: A is an ocean code, and B an atmospheric code. There is no bulk interaction.
- Large, tightly coupled parallel applications should be run on a single platform. This is why we still need high-end supercomputers.
- DEISA implements this requirement by rerouting jobs and balancing the computational workload at a European scale
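A toy per-iteration cost model (my illustration, not from the talk) makes the latency argument concrete: if each step does some computation and exchanges many small messages, the WAN latency term quickly dominates.

```python
# Per-iteration time of a message-passing code, modeled as
# compute time plus (number of messages) x (one-way latency).
# Numbers below are illustrative assumptions, not measurements.

def iter_time_ms(compute_ms, n_messages, latency_ms):
    return compute_ms + n_messages * latency_ms

# Tightly coupled step: 50 ms of compute, 100 small exchanges.
cluster = iter_time_ms(50, 100, 0.005)  # ~5 us LAN latency
wan = iter_time_ms(50, 100, 20)         # ~20 ms Paris-Helsinki
print(cluster, wan)  # 50.5 2050.0
```

The same step that costs 50.5 ms on a cluster costs over 2 seconds across the WAN, a roughly 40x slowdown from latency alone, which is exactly why tightly coupled jobs stay on one platform.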
28 Applications for Grids
- Single-CPU jobs: job mix, many users, many serial applications; suitable for grids (e.g. in universities and research centers)
- Array jobs: 100s/1000s of jobs, one user, one serial application, varying input parameters; suitable for grids (e.g. parameter studies in optimization, CAE, genomics, finance)
- Massively parallel jobs, loosely coupled: one job, one user, one parallel application, no/low communication, scalable; fine-tune for grids (time-explicit algorithms, film rendering, pattern recognition)
- Parallel jobs, tightly coupled: one job, one user, one parallel application, high interprocess communication; not suitable for distribution over the grid, but for a parallel system in the grid (time-implicit algorithms, direct solvers, large linear-algebra equation systems)
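The array-job pattern above can be sketched locally with a thread pool standing in for grid worker nodes (illustrative only; a real sweep would submit independent batch jobs to the grid):

```python
# Parameter sweep: one serial function, many independent inputs.
# ThreadPoolExecutor stands in for grid nodes here; the key property
# is that runs share no state, so they distribute trivially.

from concurrent.futures import ThreadPoolExecutor

def simulate(param: float) -> float:
    """Stand-in for one serial application run with one input value."""
    return param ** 2

params = [0.1 * i for i in range(100)]  # 100 independent runs
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(simulate, params))

print(len(results))  # 100
```

Because no run depends on another, the sweep scales with the number of available workers, which is what makes array jobs such a natural grid workload.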
29 Objectives of the e-Science Initiative
German D-Grid project, part of the 100 million Euro e-Science program in Germany
- Building one Grid infrastructure in Germany
- Combine existing German grid activities
- Development of e-science services for the research community
- Science Service Grid: services for scientists
- Important: sustainability
- Production grid infrastructure after the funding period
- Integration of new grid communities (2nd generation)
- Evaluation of new business models for grid services
30 e-Science Projects
D-Grid
Knowledge Management
Astro-Grid
C3-Grid
HEP-Grid
IN-Grid
MediGrid
ONTOVERSE
WIKINGER
WIN-EM
Textgrid
Im Wissensnetz
. . .
Generic Grid Middleware and Grid Services
eSciDoc
VIOLA
31 DGI: D-Grid Middleware Infrastructure
- User: application development and user access (GAT API, plug-ins, GridSphere, UNICORE)
- High-level Grid services: scheduling / workflow management, monitoring, LCG/gLite, data management
- Basic Grid services: accounting / billing, user/VO management, Globus 4.0.1, security
- Resources in D-Grid: distributed compute resources, network infrastructure, distributed data archives, data/software
32 Key Characteristics of D-Grid
- Generic Grid infrastructure for German research communities
- Focus on sciences and scientists, not industry
- Strong influence of international projects: EGEE, DEISA, CrossGrid, CoreGrid, GridLab, GridCoord, UniGrids, NextGrid, ...
- Application-driven (80% of funding), not infrastructure-driven
- Focus on implementation, not research
- Phase 1 + 2: 50 MEuro, 100 research organizations
33 Conclusion: Moving towards Sustainable Grid Infrastructures, OR: Why Grids Are Here to Stay!
34 Reason 1: Benefits
- Resource utilization: increase from 20% to 80%
- Productivity: more work done in shorter time
- Agility: flexible actions and reactions
- On demand: get resources when you need them
- Easy access: transparent, remote, secure
- Sharing: enable collaboration over the network
- Failover: migrate/restart applications automatically
- Resource virtualization: access compute services, not servers
- Heterogeneity: platforms, OSs, devices, software
- Virtual organizations: build and dismantle on the fly
35 Reason 2: Standards. The Global Grid Forum
- Community-driven set of working groups that are developing standards and best practices for distributed computing efforts
- Three primary functions: community, standards, and operations
- Standards areas: Infrastructure, Data, Compute, Architecture, Applications, Management, Security, and Liaison
- Community areas: Research Applications, Industry Applications, Grid Operations, Technology Innovations, and Major Grid Projects
- A Community Advisory Board represents the different communities and provides input and feedback to GGF
36 Reason 3: Industry. EGA, Enterprise Grid Alliance
- Industry-driven consortium to implement standards in industry products and make them interoperable
- Founding members: EMC, Fujitsu Siemens Computers, HP, NEC, Network Appliance, Oracle and Sun, plus 20 associate members
- May 11, 2005: Enterprise Grid Reference Model v1.0
37 Reason 3: Industry. EGA, Enterprise Grid Alliance (cont.)
Feb 2006: GGF and EGA signed a letter of intent to merge. A joint team is planning the transition, expected to be complete this summer.
38 Reason 4: OGSA, ONE Open Grid Services Architecture
OGSA = Web Services + Grid Technologies
OGSA (Open Grid Services Architecture) integrates grid technologies with Web Services (OGSA builds on WS-RF) and defines the key components of the grid.
OGSA enables the integration of services and resources across distributed, heterogeneous, dynamic, virtual organizations, whether within a single enterprise or extending to external resource-sharing and service-provider relationships.
39 Reason 5: Quasi-Standard Tools. Example: The Globus Toolkit
- The Globus Toolkit provides four major functions for building grids
Courtesy Gridwise Technologies
40 . . . and UNICORE
- Seamless, secure, intuitive access to distributed resources and data
- Available as open source
- Features: intuitive GUI with single sign-on, X.509 certificates for AA, workflow engine for multi-site, multi-step workflows, job monitoring, application support, secure data transfer, resource management, and more
- In production
Courtesy Achim Streit, FZJ
41 Globus 2.4 and UNICORE
WS-Resource-based resource management framework for dynamic resource information and resource negotiation.
(Diagram: client, portal and command-line front ends connect via WS-RF to a gateway / service registry; behind the gateway sit the workflow engine, file transfer, user management (AAA), network job supervisor, monitoring, resource management, and application support, all exposed through WS-RF interfaces.)
Courtesy Achim Streit, FZJ
42 Reason 6: Global Grid Community
43 Projects/Initiatives, Testbeds, Companies
- Altair
- Avaki
- Axceleon
- Cassatt
- Datasynapse
- Egenera
- Entropia
- eXludus
- GridFrastructure
- GridIron
- GridSystems
- Gridwise
- GridXpert
- HP Utility Data Center
- IBM Grid Toolbox
- Kontiki
- Metalogic
- Noemix
- Oracle 10g
- CO Grid
- Compute-against-Cancer
- D-Grid
- DeskGrid
- DOE Science Grid
- EGEE
- EuroGrid
- European DataGrid
- FightAIDS_at_home
- Folding_at_home
- GRIP
- NASA IPG
- NC BioGrid
- NC Startup Grid
- NC Statewide Grid
- NEESgrid
- NextGrid
- Nimrod
- Ninf
- ActiveGrid
- BIRN
- Condor-G
- Deisa
- Dame
- EGA
- EnterTheGrid
- GGF
- Globus
- Globus Alliance
- GridBus
- GridLab
- GridPortal
- GRIDtoday
- GriPhyN
- I-WAY
- Knowledge Grid
- Legion
- MyGrid
44 FP6 Grid Technologies Projects
Call 5 start: Summer 2006. EU funding: 124 M, supporting the NESSI ETP Grid community.
Topics: Grid services and business models; trust and security; platforms and user environments; data, knowledge, semantics, mining.
Instrument types: specific support action, integrated project, network of excellence, specific targeted research project.
45 Reason 9: Enterprise Grids
(Diagram: SunRay access, browser access via GEP, and workstation access; optional control network (Gbit-E); Myrinet-connected servers, blades, VIZ, and Linux racks; workstations with Sun Fire Link; Grid Manager; data network (Gbit-E) with NAS/NFS: simple NFS, HA NFS, scalable QFS/NFS.)
46 Enterprise Grid Reference Architecture
(Same diagram, with the layers labeled: Access (SunRay, browser via GEP, workstation), Compute (Myrinet-connected servers, blades, VIZ, Linux racks, workstations; Grid Manager), and Data (data network (Gbit-E); NAS/NFS: simple NFS, HA NFS, scalable QFS/NFS).)
47 1000s of Enterprise Grids in Industry
- Life Sciences
- Startup and cost efficient
- Custom research or limited use applications
- Multi-day application runs (BLAST)
- Exponential Combinations
- Limited administrative staff
- Complementary techniques
- Electronic Design
- Time to Market
- Fastest platforms, largest Grids
- License Management
- Well established application suite
- Large legacy investment
- Platform Ownership issues
- Financial Services
- Market simulations
- Time IS Money
- Proprietary applications
- Multiple Platforms
- Multiple scenario execution
- Need instant results analysis tools
- High Performance Computing
- Parallel Reservoir Simulations
- Geophysical Ray Tracing
- Custom in-house codes
- Large scale, multi-platform execution
48 Reason 10: Grid Service Providers. Example: BT
Pre-Grid IT asset usage: 10-15%
- Inside the data center, within the firewall
- Virtual use of own IT assets
- The Grid virtualiser engine inside the firewall
- Opens up under-used ICT assets
- Improves TCO, ROI and application performance
- BUT:
- An intra-enterprise Grid is self-limiting
- The pool of virtualised assets is restricted by the firewall
- Does not support inter-enterprise usage
- BT is focussing on managed Grid solutions
(Diagram: enterprise WANs/LANs, virtualised assets, Grid engine)
Post-Grid IT asset usage: 70-75%
Courtesy Piet Bel, BT
49 BT's Virtual Private Grid (VPG)
(Diagram: enterprise WANs/LANs with virtualised IT assets and a Grid engine, connected to the BT network's Grid engine)
Courtesy Piet Bel, BT
50 Reason 11: There Will Be a Market for Grids
51 General Observations on Grid Performance
- Today, there are 100s of important grid projects around the world
- GGF identifies about 15 research projects with major impact
- Most research grids focus on HPC and collaboration; most industry grids focus on utilization and automation
- Many grids are driven by user/application needs; few grid projects are driven by infrastructure research
- Few projects focus on performance/benchmarks, where performance is mostly seen at the job/computation/application level
- Need for metrics and measurements that help us understand grids
- In a grid, application performance has 3 major areas of concern: system capabilities, network, and software infrastructure
- Evaluating performance in a grid is different from classic benchmarking, because grids are dynamically changing systems incorporating new components
52 The Grid Engine
Thank You!
wgentzsch@d-grid.de