Title: Platform Computing: Strategy for Production Grid Computing
1. Platform Computing: Strategy for Production Grid Computing
Dr. Fubo Zhang zfb_at_platform.com
2. Agenda
- Grid evolution and technologies
- Platform's Grid Strategy and Solutions
- Production user requirements and experience
- Summary
3. The Fifth Wave
- 70s: Mainframe (Operational Productivity)
- 80s: PC (Personal Productivity)
- 90s: Client/Server (Departmental Productivity)
- Late 90s: Internet (Channel Productivity)
- Distributed computing aggregates resources to provide always-on, unlimited compute power: virtual, adaptable, open, on-demand
4. Distributed Computing
[Figure: evolution of distributed computing; time axis 1985, 1992, 1996, 2000, 2005]
- DC Research: Batch; Distributed Batch Queuing (NQS, DQS, Condor, LSF); JobScheduler, Parallel, Analyzer, MultiCluster; Grid Research; Internet Grid Computing; Computing Ubiquity
- System Arch Trend: 2 Vaxen and Ethernet; UNIX workstations and supercomputers; SMPs and UNIX workstations; Linux and Windows farms with commodity chips
5. Grid: Its Three Stages
Grid: transparent, secure and coordinated computing resource sharing across sites (cluster of clusters)
Scope of Sharing (time axis: 1998, 2001, 2004, 2007)
- intra-Grid: inter-departmental sharing within organizations (TI, Toshiba, GM, Monsanto, ...)
- extra-Grid: sharing across multiple organizations (DoD HPC, NASA IPG, Data Grid, ...)
- Inter-Grid: supported by xSPs
6. LSF MultiCluster: intra- and extra-Grid
- Cluster-to-cluster sharing and management
- Integratable with external Grid services, e.g., Kerberos authentication
- Reliable file transfer and staging
- User account mapping, SSL, Firewalls
[Diagram: clusters of workstations, compute servers and file servers, each labeled with I.S.C.]
I.S.C.: Inter-cluster Sharing Conditions (e.g., time windows, types of jobs, job volume)
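The three example conditions on the slide (time windows, types of jobs, job volume) can be pictured as a simple admission check that an execution cluster applies to incoming remote jobs. The sketch below is only an illustration: the `SharingCondition` class, its field names and the sample values are hypothetical and are not taken from LSF MultiCluster.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class SharingCondition:
    """Hypothetical inter-cluster sharing condition (not an LSF data structure)."""
    window_start: time             # start of the daily sharing window
    window_end: time               # end of the daily sharing window (exclusive)
    allowed_job_types: frozenset   # job types accepted from the remote cluster
    max_remote_jobs: int           # cap on concurrently hosted remote jobs

    def admits(self, job_type: str, now: time, running_remote_jobs: int) -> bool:
        """Return True only if the remote job satisfies all three conditions."""
        if self.window_start <= self.window_end:
            in_window = self.window_start <= now < self.window_end
        else:
            # The window wraps past midnight, e.g. 20:00 to 06:00.
            in_window = now >= self.window_start or now < self.window_end
        return (
            in_window
            and job_type in self.allowed_job_types
            and running_remote_jobs < self.max_remote_jobs
        )


# Assumed policy: accept up to 50 remote batch jobs between 20:00 and 06:00.
night_batch = SharingCondition(time(20, 0), time(6, 0), frozenset({"batch"}), 50)
```

With this assumed policy, a batch job arriving at 23:30 is admitted while the cap has not been reached, and the same job at noon is refused.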
7. Extra- and Inter-Grid
[Diagram components: Clients, GridPortal, Admin, GridManager, Resource Directory, Clusters, SubCluster, Desktops, Internet and Enterprise Data Centers]
8. Internet Computing Grid
Internet Data Centers operated by xSPs
Interconnected IDCs
9. 3 Levels of Grid Perspectives
- User Perspective
  - Totally transparent single system environment
  - Service capacity on demand
- Application Perspective
  - Transparent to existing apps: no change needed
  - Some new apps can be built using Grid APIs
- System Perspective
  - Dynamic grid of autonomous clusters with sharing agreements and common protocols
  - Combination of cluster-to-cluster and global management
  - Open with levels of coherency and cooperation
10. Grid Software Environment
Layers: Applications; Application Services; Core Services; DRM; Servers, Networks and Node OSes
- Core Services: Grid environmental services
- Distributed Resource Management (DRM): management of work and computing resources
- Application Services: support those applications programmed for Grid
11. Grid Software Functions
12. Evolution of Grid
- From Technology to Solution
  - Integrate pieces into packages
- From Special-purpose to General-purpose
  - Projects to gain experience over time; general-purpose tech always replaces special-purpose
- From Toolkit to End-user Products
  - Install and configure, but do not program, each grid
- From HPC/Research to Industrial Applications
  - Transparent collaboration and resource sharing
  - Individual vs. organization trust
- From Research Products to Industry Standards
  - Positive experience, then broad adoption
13. Platform's Grid Strategy
- Focus on a complete DRM solution: be the aggregator and solution provider for Grid
- Focus on production users and evolve from extra-Grid to Inter-Grid
- Support all app types: interactive, batch, parallel, sessions, transactions, multi-site distributed, ...
- Transparently expand Cluster functions to Grid
- Hybrid Global and Cluster-to-Cluster architecture
- Provide strong and dynamic system for thin clients
- Partner to go to market for total Grid systems offerings
- Stay open: interoperate with other Grid software like PBS and Globus-based systems
- Drive Grid standards through NPI to open up market
14. Management of Distributed Computing
Visibility
Performance Management
Workload Management
Resource Management
User Demand
Resource Supply
15. Resource Management
- Config, admin and monitoring
- Ensure supply of critical apps and services
- Event automation and self-healing
- Automate routine operations
- Security management
16. Performance Management
17. Grid Resource Directory
[Diagram: Site A and Site B exchange messages with the Resource Directory]
- Registration: "I join this grid. I export 128 Linux boxes and 48 Synopsys licenses."
- Query: "Who has Linux boxes and Synopsys licenses?"
- Answer: "Site B."
Supports the Grid Manager's resource selection and matching based on resource requirements
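The exchange on this slide (a site exports resources, another party asks who has them) amounts to a small register-and-match service. The following is a minimal sketch under stated assumptions: the `ResourceDirectory` class, its method names and the resource keys are hypothetical, and the Site A entry is made-up sample data added only to show a non-matching site. The Site B counts are the ones quoted on the slide.

```python
from collections import defaultdict


class ResourceDirectory:
    """Hypothetical in-memory directory of resources exported by sites."""

    def __init__(self):
        # site name -> {resource name -> exported count}
        self._exports = defaultdict(dict)

    def export(self, site, resource, count):
        """Record that a site exports `count` units of a resource to the grid."""
        self._exports[site][resource] = count

    def find_sites(self, requirements):
        """Return the sorted names of sites whose exports meet every requirement."""
        return sorted(
            site
            for site, offered in self._exports.items()
            if all(offered.get(res, 0) >= need for res, need in requirements.items())
        )


directory = ResourceDirectory()
# Site B joins the grid with the counts quoted on the slide.
directory.export("Site B", "linux_boxes", 128)
directory.export("Site B", "synopsys_licenses", 48)
# Hypothetical entry for Site A, used only to show a site that does not match.
directory.export("Site A", "linux_boxes", 16)

matches = directory.find_sites({"linux_boxes": 1, "synopsys_licenses": 1})
```

Here `matches` contains only Site B, which mirrors the answer shown on the slide. A production directory would also have to handle sites leaving the grid and counts changing over time, which this sketch ignores.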
18. Ensuring Site Autonomy
- Resources exported by the local cluster management
- Grid Manager enforces cross-cluster sharing policies and flow restrictions upon sites' (voluntary) participation
- Submission cluster forwards job to execution cluster, directed by Grid Manager
- Execution cluster accepts remote jobs just like local jobs, and it has full control of remote jobs
19. Advance Reservation
- Advance Reservations allow resources such as job slots and special devices to be booked in advance, guaranteeing access to those resources at the specified time
- Other jobs are backfilled around the reservation
- A reservation may be on behalf of a user, a user group or a project
- Advance Reservations may be possible on all reusable resources such as software licenses, allowing guaranteed access to booked software licenses
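To make "backfilled around the reservation" concrete, the sketch below checks whether a job can run in the slots left free by existing reservations without delaying any of them. It is a simplified model, not Platform's scheduler: the `Reservation` type, the `can_backfill` function, the minute-based clock and the slot counts are all assumptions, and slots held by other running jobs are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    """Hypothetical advance reservation on a pool of job slots."""
    owner: str   # a user, a user group or a project
    slots: int   # number of slots booked
    start: int   # start time in minutes (inclusive)
    end: int     # end time in minutes (exclusive)


def can_backfill(total_slots, reservations, job_slots, job_start, job_runtime):
    """Return True if the job fits around every reservation it overlaps.

    Reserved usage only rises at reservation start times, so it is enough
    to check the job's own start and each reservation start inside the
    job's run interval.
    """
    job_end = job_start + job_runtime
    check_points = [job_start] + [
        r.start for r in reservations if job_start < r.start < job_end
    ]
    for t in check_points:
        reserved = sum(r.slots for r in reservations if r.start <= t < r.end)
        if reserved + job_slots > total_slots:
            return False
    return True


# Assumed example: 8 slots in total, 6 booked for a project from minute 60 to 120.
booked = [Reservation("project-x", 6, 60, 120)]
```

Under these assumptions a 4-slot job that finishes by minute 60 can be backfilled, whereas the same job running until minute 90 would collide with the reservation and is refused.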
20. Grid Resource Leasing
Licenses
Licenses
Site A
Site B
21. Grid Fairshare Scheduling
- Unique and Grid-wide resources can be dynamically fairshared across sites (e.g., software licenses, devices)
- Resources scattered across sites can be aggregated and fairshared
- Owner-guest bias can be supported
- Flexible fairshare policies can be specified and enforced across Grid
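One way to picture fairsharing a Grid-wide license pool with an owner-guest bias is to hand out licenses one at a time to whichever site currently holds the least relative to its share. The sketch below illustrates that idea only; it is not Platform's fairshare algorithm, and the function name, the share weights (3 for the owner, 1 for the guest) and the demand figures are assumptions.

```python
from fractions import Fraction


def fairshare_allocate(total, shares, demand):
    """Distribute `total` units across sites in proportion to their shares.

    Each unit goes to the site with unmet demand whose current usage is
    lowest relative to its share; ties are broken by site name.
    """
    in_use = {site: 0 for site in shares}
    pending = {site: demand.get(site, 0) for site in shares}
    for _ in range(total):
        candidates = [site for site in shares if pending[site] > 0]
        if not candidates:
            break  # all demand is satisfied; leftover units stay idle
        site = min(candidates, key=lambda s: (Fraction(in_use[s], shares[s]), s))
        in_use[site] += 1
        pending[site] -= 1
    return in_use


# Assumed owner-guest bias: the owning site gets three shares, the guest one.
shares = {"owner": 3, "guest": 1}
busy = fairshare_allocate(48, shares, {"owner": 100, "guest": 100})
quiet_owner = fairshare_allocate(48, shares, {"owner": 10, "guest": 100})
```

When both sites have more demand than supply, the 48 licenses split 36 to 12 in favor of the owner. When the owner needs only 10, the guest picks up the remaining 38, so nothing sits idle.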
22. Grid Remote Batch
Licenses
Receive Queue
Send Queue
Licenses
Division A
Division B
Clients
Clients
23. Production User Requirements
- Full products, not middleware or toolkit
- Configure, but not program Grid: no developers
- Grid among organizations, not individuals
- Resource management and sharing policies key
- Not user accounts everywhere
- Existing apps more important than new ones
- Share resources to get more done, not just capability
- Transparent across Grid: single system environment
- At most specify resource requirements
- Thin client support
- All services by the collective Grid resources
- Separation of apps from Grid infrastructure
- Open, standards, choices
24. Grid at DoD HPCMO
- Initiative to share HPCMP's resources easily and transparently: SMDC, TACOM, NRL, NAVO and WSMR, ...
- Build a meta-queuing system to integrate the centers
- Primary benefit: the capability to submit a job to a single, common queue, from which it is sent to the first available computer in the queuing pool
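The "single, common queue" idea can be sketched as a first-in, first-out queue whose jobs are forwarded to the first center that has a free slot. This is an illustration only: the `MetaQueue` class and the slot counts are hypothetical, the center names are borrowed from the slide, and a real meta-queuing system would also weigh job requirements, site policies and accounting.

```python
from collections import deque


class MetaQueue:
    """Hypothetical common queue that feeds several computing centers."""

    def __init__(self, free_slots):
        # center name -> free job slots, kept in the order the centers are listed
        self.free_slots = dict(free_slots)
        self.pending = deque()

    def submit(self, job):
        """Add a job to the single, common queue."""
        self.pending.append(job)

    def dispatch(self):
        """Send queued jobs, in arrival order, to the first center with a free slot."""
        placed = []
        while self.pending:
            center = next(
                (name for name, free in self.free_slots.items() if free > 0), None
            )
            if center is None:
                break  # every center is full; remaining jobs keep waiting
            job = self.pending.popleft()
            self.free_slots[center] -= 1
            placed.append((job, center))
        return placed


# Assumed slot counts; only the center names come from the slide.
queue = MetaQueue({"SMDC": 0, "TACOM": 1, "NRL": 2})
for name in ("job-1", "job-2", "job-3", "job-4"):
    queue.submit(name)
placements = queue.dispatch()
```

With these assumed slot counts, the first job goes to TACOM, the next two go to NRL, and the fourth stays queued until a slot frees up somewhere in the pool.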
25. DoD Requirements for the Grid
Requirements and Solutions
- Fully Operational Grid Computing
  - Transparent sharing of jobs
  - Resource reservation protocol
  - Transparent job control
  - Accounting
- Full Kerberos 5 Support
  - All client-server, server-server interactions Kerberized
  - Ticket forwarding/renewal
  - Multi-realm support
  - Account mapping
- Reliable, Secure File Transfer
  - Platform FTA: Kerberized, fault tolerant, fire and forget
26. DoD HPCMP: Building a Production Grid
27. Grid at General Motors
Sites:
- NAENG: Warren, MI
- TPC: Pontiac, MI
- MLCGF: Flint, MI
- NA HPC: Warren, MI
Setup:
- 3 independent product divisions, 8 clusters
- Submit work to central Data Center using MultiCluster
- Provide transparent access to HPC Center for jobs that cannot / should not run locally (e.g., structure, crash, some CFD)
- Share resources between local cluster and HPC Center
- Control sharing of HPC Center by 8 clusters
28. Grid at Pharmacia
- Situation: requirement to share resources across newly acquired centers
- Solution: two clusters totaling 600 servers and workstations
- Workload balancing by moving work to an underutilized cluster
- Extra capacity is transparently available
29. Compaq and Platform
- Partners since 1993: co-engineering, joint marketing, OEM, reselling, NPI
- LSF Batch and Parallel OEMed with SC systems
- Planning and partnering on Linux systems and products: opportunity for end-to-end solutions from PCs to supercomputers
- Joint customers start to share across sites: partnership opportunity to build on existing successes to deliver Grid solutions
- Both companies committed to whole solution: easier adoption and strong market position
- Partnership between the best experts for the best solution
30. Grid Value Chain
- Technology/toolkit developers
  - How to do it: develop key pieces
- Product vendors
  - Integrate pieces into packages and support
- Solution developers
  - Install and configure, maybe program apps but not Grid
- Service providers
  - Support Grid operations
- End users
  - Just want the solution
Partnership is key to the success of Grid Computing
31. About Platform
- Founded 1992 by Berkeley PhD engineers
- 400 employees and more than 100 developers
- 1500 customers globally
- Key partnerships: IBM, Compaq, Sun, SGI, etc.
- Worldwide company, local support and consultants
- 30 developers and support engineers in Beijing
32. Summary
- DC is mainstream; Grid is emerging
- R&D so far helps define the standards and architecture, shows feasibility
- Need to address both research and enterprise requirements
- Need both open source and commercial products
- Partnerships and standards are keys
- Platform takes a strong interest in Grid
33. Platform Computing
www.Platform.Com
zfb_at_platform.com