Platform Computing: Strategy for Production Grid Computing presentation

Transcript and Presenter's Notes

1
Platform Computing: Strategy for Production
Grid Computing
Dr. Fubo Zhang, zfb_at_platform.com
2
Agenda

Grid evolution and technologies
Platform's Grid strategy and solutions
Production user requirements and experience
Summary

3
The Fifth Wave
70s: Mainframe, Operational Productivity
80s: PC, Personal Productivity
90s: Client/Server, Departmental Productivity
Late 90s: Internet, Channel Productivity
Distributed computing aggregates resources to
provide always-on, unlimited compute power:
virtual, adaptable, open, on-demand
4
Distributed Computing
[Timeline figure; axis marks: 1985, 1992, 1996, 2000, 2005]
Labels: DC Research; Batch; Distributed Batch Queuing (NQS, DQS, Condor, LSF); JobScheduler, Parallel, Analyzer, MultiCluster; Grid Research; Internet Grid Computing; Computing Ubiquity
System Arch Trend: 2 Vaxen and Ethernet; UNIX workstations and supercomputers; SMPs and UNIX workstations; Linux and Windows farms with commodity chips
5
Grid: Its Three Stages
Grid: transparent, secure and coordinated
computing resource sharing across sites
(cluster of clusters)
Scope of Sharing
Inter-Grid: supported by xSPs
extra-Grid: across multiple organizations
(DoD HPC, NASA IPG, Data Grid, ...)
intra-Grid: inter-departmental sharing within
organizations (TI, Toshiba, GM, Monsanto, ...)
[Timeline axis marks: 1998, 2001, 2004, 2007]
6
LSF MultiCluster: intra- and extra-Grid

Cluster-to-cluster sharing and management
Integratable with external Grid services, e.g.,
Kerberos authentication
Reliable file transfer and staging
User account mapping, SSL, Firewalls

[Diagram: clusters of workstations, compute servers and file servers, linked to one another through I.S.C.]
I.S.C.: Inter-cluster Sharing Conditions (e.g.,
time windows, types of jobs, job volume)
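
The three kinds of Inter-cluster Sharing Conditions named on this slide (time windows, types of jobs, job volume) can be illustrated with a minimal sketch. This is not LSF MultiCluster configuration or API; the class `SharingCondition`, its fields and the example values are hypothetical and chosen only to show how such a condition might be evaluated.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class SharingCondition:
    """Hypothetical model of one I.S.C. between two clusters."""
    window_start: time        # start of the daily sharing window
    window_end: time          # end of the daily sharing window
    allowed_job_types: set    # e.g. {"batch", "parallel"}
    max_remote_jobs: int      # cap on concurrently accepted remote jobs

    def in_window(self, now: time) -> bool:
        # Handle windows that wrap past midnight (e.g. 20:00 to 06:00).
        if self.window_start <= self.window_end:
            return self.window_start <= now < self.window_end
        return now >= self.window_start or now < self.window_end

    def accepts(self, job_type: str, now: time, running_remote_jobs: int) -> bool:
        # A remote job is accepted only if all three conditions hold.
        return (
            self.in_window(now)
            and job_type in self.allowed_job_types
            and running_remote_jobs < self.max_remote_jobs
        )


# Assumed example: share only batch work overnight, at most 50 remote jobs.
night_sharing = SharingCondition(time(20, 0), time(6, 0), {"batch"}, 50)
print(night_sharing.accepts("batch", time(23, 30), 10))        # True
print(night_sharing.accepts("interactive", time(23, 30), 10))  # False
print(night_sharing.accepts("batch", time(12, 0), 10))         # False
```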
7
Extra- and Inter-Grid
[Diagram labels: Clients, GridPortal, Admin, GridManager, Resource Directory, Clusters, SubCluster, Desktops, Internet Enterprise Data Centers]
8
Internet Computing Grid
Internet Data Centers operated by xSPs
Interconnected IDCs
9
3 Levels of Grid Perspectives

User Perspective
Totally transparent single system environment
Service capacity on demand
Application Perspective
Transparent to existing apps: no change needed
Some new apps can be built using Grid APIs
System Perspective
Dynamic grid of autonomous clusters with sharing
agreements and common protocols
Combination of cluster-to-cluster and global
management
Open with levels of coherency and cooperation

10
Grid Software Environment
Applications
Application Services
Core Services
DRM
Servers, Networks and Node OSes

Core Services: Grid environmental services
Distributed Resource Management (DRM): management
of work and computing resources
Application Services: support for applications
programmed for the Grid

11
Grid Software Functions
12
Evolution of Grid

From Technology to Solution
Integrate pieces into packages
From Special-purpose to General-purpose
Projects to gain experience; over time,
general-purpose tech always replaces
special-purpose
From Toolkit to End-user Products
Install and configure, but not program, each grid
From HPC/Research to Industrial Applications
Transparent collaboration and resource sharing
Individual vs. organization trust
From Research and Products to Industry Standards
Positive experience leads to broad adoption

13
Platform's Grid Strategy

Focus on complete DRM solution: be the aggregator
and solution provider for Grid
Focus on production users and evolve from
extra-Grid to Inter-Grid
Support all app types: interactive, batch,
parallel, sessions, transactions, multi-site
distributed, ...
Transparently expand Cluster functions to Grid
Hybrid Global and Cluster-to-Cluster architecture
Provide strong and dynamic system for thin
clients
Partner to go to market for total Grid systems
offerings
Stay open: interoperate with other Grid software
like PBS and Globus-based systems
Drive Grid standards through NPI to open up
market

14
Management of Distributed Computing
Visibility
Performance Management
Workload Management
Resource Management
User Demand
Resource Supply
15
Resource Management

Config, admin and monitoring
Ensure supply of critical apps and services
Event automation and self-healing
Automate routine operations
Security management

16
Performance Management
17
Grid Resource Directory
Resource Directory
Who has Linux boxes and Synopsys licenses?
Site B.
I join this grid. I export 128 Linux boxes
and 48 Synopsys licenses.
[Diagram labels: Site A, Site B]
Supports the Grid Manager's resource selection and
matching based on resource requirements
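
The export-and-query exchange on this slide can be sketched as a small in-memory directory. The class `ResourceDirectory`, its method names and the resource keys are hypothetical, not a Platform API; Site B's figures come from the slide, while Site A's 16 Linux boxes are an invented value for illustration.

```python
from collections import defaultdict


class ResourceDirectory:
    """Hypothetical in-memory directory of resources exported by sites."""

    def __init__(self):
        # site name -> {resource name -> exported count}
        self._exports = defaultdict(dict)

    def export(self, site: str, resource: str, count: int) -> None:
        # A site joining the grid announces what it is willing to share.
        self._exports[site][resource] = count

    def match(self, requirements: dict) -> list:
        """Return sites whose exports cover every requested resource count."""
        return sorted(
            site
            for site, offered in self._exports.items()
            if all(offered.get(name, 0) >= need for name, need in requirements.items())
        )


directory = ResourceDirectory()
directory.export("Site B", "linux_box", 128)        # from the slide
directory.export("Site B", "synopsys_license", 48)  # from the slide
directory.export("Site A", "linux_box", 16)         # assumed value

# "Who has Linux boxes and Synopsys licenses?"
print(directory.match({"linux_box": 1, "synopsys_license": 1}))  # ['Site B']
```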
18
Ensuring Site Autonomy

Resources exported by the local cluster
management
Grid Manager enforces cross-cluster sharing
policies and flow restrictions upon sites'
(voluntary) participation
Submission cluster forwards job to execution
cluster, directed by Grid Manager
Execution cluster accepts remote jobs just like
local jobs, and it has full control of remote jobs
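
A minimal sketch of the forwarding flow described above, under stated assumptions: the `Cluster` and `GridManager` classes, the `allowed_flows` set and the slot counts are all hypothetical. The point it illustrates is that the Grid Manager only directs a job along permitted flows, while the execution cluster applies its own local policy and can still refuse it.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    free_slots: int
    accept_remote: bool = True   # local policy, owned by the site itself
    queue: list = field(default_factory=list)

    def submit(self, job: str, remote: bool = False) -> bool:
        # The execution cluster keeps full control: it may refuse remote work.
        if remote and not self.accept_remote:
            return False
        if self.free_slots == 0:
            return False
        self.free_slots -= 1
        self.queue.append(job)   # remote jobs are queued just like local ones
        return True


class GridManager:
    def __init__(self, clusters, allowed_flows):
        self.clusters = {c.name: c for c in clusters}
        self.allowed_flows = allowed_flows   # set of (submission, execution) pairs

    def forward(self, job: str, submission_cluster: str):
        """Return the name of the cluster that accepted the job, or None."""
        for target in self.clusters.values():
            if target.name == submission_cluster:
                continue
            if (submission_cluster, target.name) not in self.allowed_flows:
                continue   # flow restriction enforced by the Grid Manager
            if target.submit(job, remote=True):
                return target.name
        return None


grid = GridManager(
    [Cluster("A", 0), Cluster("B", 4, accept_remote=False), Cluster("C", 2)],
    allowed_flows={("A", "B"), ("A", "C")},
)
print(grid.forward("job-1", "A"))  # C  (B is allowed by the grid but refuses locally)
```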

19
Advance Reservation

Advance Reservations allow resources such as job
slots and special devices to be booked in
advance, guaranteeing access to those resources
at the specified time
Other jobs are backfilled around the reservation
A reservation may be on behalf of a user, a user
group or a project
Advance Reservations may be possible on all
reusable resources such as software licenses,
allowing guaranteed access to booked software
licenses
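
How jobs might be backfilled around a reservation can be sketched as follows. This is a deliberately conservative check, not the scheduler's actual algorithm: the `Reservation` class, the `can_backfill` function and the minute-based time unit are assumptions, and every reservation overlapping the job's run is counted against it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    owner: str   # a user, a user group or a project
    slots: int   # number of job slots booked
    start: int   # minutes from now (assumed time unit)
    end: int     # minutes from now


def can_backfill(total_slots, busy_slots, reservations, job_slots, job_runtime, now=0):
    """Return True if a job can start now without touching reserved slots."""
    job_end = now + job_runtime
    # Slots promised to reservations whose window overlaps the job's run.
    reserved = sum(r.slots for r in reservations if r.start < job_end and r.end > now)
    # Conservative: currently busy slots are assumed to stay busy.
    return job_slots <= total_slots - busy_slots - reserved


booked = [Reservation("project-x", slots=8, start=60, end=180)]

# A 30-minute job finishes before the reservation starts, so it fits.
print(can_backfill(16, 4, booked, job_slots=6, job_runtime=30))  # True
# A 90-minute job would run into the reserved window.
print(can_backfill(16, 4, booked, job_slots=6, job_runtime=90))  # False
```

The same check applies to any countable reusable resource, such as software licenses, by replacing slot counts with license counts.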

20
Grid Resource Leasing
Licenses
Licenses
Site A
Site B
21
Grid Fairshare Scheduling

Unique and Grid-wide resources can be dynamically
fairshared across sites (e.g., software licenses,
devices)
Resources scattered across sites can be
aggregated and fairshared
Owner-guest bias can be supported
Flexible fairshare policies can be specified and
enforced across Grid
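
One way to picture fairshare with an owner-guest bias is a priority that grows with a site's assigned shares, shrinks with its recent usage, and is multiplied for the resource owner. The formula, the function names and the numbers below are illustrative assumptions, not the formula used by any Platform product.

```python
def fairshare_priority(shares, used, owns_resource, owner_bias=2.0):
    """Higher value means served sooner. The formula is an illustrative assumption."""
    bias = owner_bias if owns_resource else 1.0
    return bias * shares / (1.0 + used)


def next_site(sites, resource_owner):
    """Pick the site that should receive the next free license."""
    return max(
        sites,
        key=lambda s: fairshare_priority(s["shares"], s["used"], s["name"] == resource_owner),
    )["name"]


# Assumed state: equal shares, Site A currently holds more licenses than Site B.
sites = [
    {"name": "Site A", "shares": 10, "used": 4},
    {"name": "Site B", "shares": 10, "used": 2},
]
print(next_site(sites, resource_owner="Site A"))  # Site A (owner bias outweighs usage)
print(next_site(sites, resource_owner=None))      # Site B (lower usage wins)
```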

22
Grid Remote batch
Licenses
Receive Queue
Send Queue
Licenses
Division A
Division B
Clients
Clients
23
Production User Requirements

Full products, not middleware or toolkit
Configure, but not program, the Grid: no developers
Grid among organizations, not individuals
Resource management and sharing policies are key
Not user accounts everywhere
Existing apps more important than new ones
Share resources to get more done, not just
capability
Transparent across Grid: single system
environment
At most specify resource requirements
Thin client support
All services by the collective Grid resources
Separation of apps from Grid infrastructure
Open, standards, choices

24
Grid at DoD HPCMO

Initiative to share HPCMP's resources easily and
transparently: SMDC, TACOM, NRL, NAVO and WSMR, ...
Build a meta-queuing system to integrate the
centers
Primary benefit: the capability to submit a job
to a single, common queue, from which it is sent to
the first available computer in the queuing pool
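
The single common queue described above can be sketched as a FIFO dispatcher that sends each job to the first center with enough free capacity. The `dispatch` function, the job sizes and the free CPU counts are hypothetical; only the center names NRL and NAVO come from the slide.

```python
from collections import deque


def dispatch(common_queue, free_cpus):
    """Send queued jobs, in FIFO order, to the first center with enough free CPUs."""
    placements = []
    while common_queue:
        job, cpus = common_queue[0]
        center = next((c for c, free in free_cpus.items() if free >= cpus), None)
        if center is None:
            break   # head job must wait; later jobs do not overtake it
        common_queue.popleft()
        free_cpus[center] -= cpus
        placements.append((job, center))
    return placements


queue = deque([("job-1", 64), ("job-2", 32), ("job-3", 128)])
free = {"NRL": 32, "NAVO": 96}   # assumed free CPU counts
print(dispatch(queue, free))     # [('job-1', 'NAVO'), ('job-2', 'NRL')]
print(list(queue))               # [('job-3', 128)] is still waiting
```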

25
DoD Requirements for the Grid
Requirements and Solutions

Requirement: Fire and forget
Solutions: Transparent sharing of jobs; Resource reservation protocol; Transparent job control; Accounting

Requirement: Full Kerberos 5 Support
Solutions: All client-server and server-server interactions Kerberized; Ticket forwarding/renewal; Multi-realm support; Account mapping

Requirement: Reliable, Secure File Transfer
Solutions: Platform FTA; Kerberized; Fault Tolerant

Fully Operational Grid Computing
26
DoD HPCMP
Building a Production Grid
27
Grid at General Motors
NAENG: Warren, MI
TPC: Pontiac, MI
MLCGF: Flint, MI
NA HPC: Warren, MI
3 independent product divisions, 8 clusters
Submit work to central Data Center using MultiCluster
Provide transparent access to HPC center for jobs
that cannot / should not run locally (e.g., structure,
crash, some CFD)
Share resources between local cluster and HPC Center
Control sharing of HPC Center by 8 clusters
28
Grid at Pharmacia
Situation: requirement to share resources across
newly acquired centers
Solution: two clusters totaling 600 servers and
workstations
Workload balancing by moving work to an
underutilized cluster
Extra capacity is transparently available
29
Compaq and Platform

Partners since 1993: co-engineering, joint
marketing, OEM, reselling, NPI
LSF Batch and Parallel OEMed with SC systems
Planning and partnering on Linux systems and products:
opportunity for end-to-end solutions from PCs to
supercomputers
Joint customers start to share across sites:
partnership opportunity to build on existing
successes to deliver Grid solutions
Both companies committed to whole solution:
easier adoption and strong market position
Partnership between the best experts for best
solution

30
Grid Value Chain

Technology/toolkit developers
How to do it: develop key pieces
Product vendors
Integrate pieces into packages and support them
Solution developers
Install and configure, maybe program apps but not the Grid
Service providers
Support Grid operations
End users
Just want the solution

Partnership is key to the success of Grid
Computing
31
About Platform

Founded in 1992 by Berkeley PhD engineers
400 employees and more than 100 developers
1500 customers globally
Key partnerships: IBM, Compaq, Sun, SGI, etc.
Worldwide company with local support and consultants
30 developers and support engineers in Beijing

32
Summary

DC is mainstream; Grid is emerging
R&D so far helps define the standards and
architecture, and shows feasibility
Need to address both research and enterprise
requirements
Need both open source and commercial products
Partnerships and standards are keys
Platform takes a strong interest in Grid
