Grid Middleware

Provided by: chanhy
Learn more at: http://chep.knu.ac.kr


1
Grid Middleware Service
Nov. 9, 2002
Chan-Hyun Youn
Information and Communications University

2
Contents
  • Grid and Middleware Services
  • Architectural Model for Resource Management
  • Hierarchical Resource Management
  • Abstract Owner
  • Market Model
  • Scheduling Algorithms in Economy Grid
  • Example of Application level Scheduler
  • Concluding Remarks

3
Architecture of a Grid
Discipline-Specific Portals and Scientific Workflow Management Systems
Toolkits: Visualization, data publish/subscribe, etc.
Applications: Simulations, Data Analysis, etc.
Grid Common Services: Standardized Services and Resources Interfaces
Collaboration and Remote Instrument Services
Grid Information Service
Uniform Resource Access
Co-Scheduling
Network Cache
Authentication Authorization
Security Services
Communication Services
Global Queuing
Global Event Services
Data Cataloguing
Uniform Data Access
Fault Management
Monitoring
Brokering
Auditing
Globus services
clusters
Resources
national supercomputer facility
tertiary storage
national user facilities
Condor pools
network caches
Source: IPG (Johnston)
high-speed networks and communications services
4
Heterogeneous Computing: IPG Milestone Completed 10/2000
- Two problem solving environments use IPG
services for uniform access to heterogeneous
resources.
  • 1) Condor Workstation Pool mgr.
  • Molecular design application for nanotechnology
    devices and materials
  • Uses 0.5 million otherwise idle CPU hours/year
    scavenged from 60-100 Sun and SGI workstations,
    a subset of the NAS Condor pool
  • The Condor system is an IPG middleware service

IPG Grid Common Services: Standardized services
and uniform resource access
2) Parameter Study Manager
  • ILab aerospace design parameter study manager
    uses IPG to access distributed computing and data
    resources

study object
study concept
IPG managed compute and data management resources
5
Online Instrumentation: Real-time Experiment Interaction
Unitary Plan Wind Tunnel
multi-source data analysis, desktop VR clients
with shared controls
real-time collection
real-time experiment control
computer simulations
archival storage
6
Grid from Services View
Applications

E.g.,

7
Middleware
  • Layered collection of middleware services that
    provide to applications uniform views of
    distributed resource components and the
    mechanisms for assembling them into systems
  • Grid Workload Management, Data Management,
    Monitoring services
  • Management of the Local Computing Fabric
  • Mass Storage
  • Services extend both up and down through the
    various layers of the computing and
    communications infrastructure

8
Functions in Middleware
  • Workload management
  • The workload is chaotic: unpredictable job
    arrival rates and data access patterns
  • The goal is maximising the global system
    throughput (events processed per second)
  • Data management
  • Management of petabyte-scale data volumes, in an
    environment with limited network bandwidth and
    heavy use of mass storage (tape)
  • Caching, replication, synchronisation, object
    database model
  • Application monitoring
  • Tens of thousands of components, thousands of
    jobs and individual users
  • End-user: tracking the progress of jobs and
    aggregates of jobs
  • Understanding application and grid level
    performance
  • Administrator: understanding which global-level
    applications were affected by failures, and
    whether and how to recover

9
Middleware (in Local Fabric)
  • Effective local site management of giant
    computing fabrics
  • Automated installation, configuration management,
    system maintenance
  • Automated monitoring and error recovery:
    resilience, self-healing
  • Performance monitoring
  • Characterisation, mapping, management of local
    Grid resources
  • Mass storage management
  • multi-PetaByte data storage
  • real-time data recording requirement
  • active tape layer 1,000s of users
  • uniform mass storage interface
  • exchange of data and meta-data between mass
    storage systems

10
Technical Approach in Layered Network
Applications
Applications need uniform views of resources, and
middleware must deal with the fact that most
real resources are locally owned
Applications
QoS Broker
Network Cache
Access Control
Applications
Resource Scheduling
Monitoring Management
Wind Tunnel
Global Middleware Services
Supercomputer
Cluster
Local Services
Ames
LBNL
ANL
NCAR
Tertiary storage
Cache
Tertiary (mass) storage
Source: Grid98 Workshop (Johnston)
11
Operation Model (1)
Applications
Middleware must actually reach well !
Some services are provided in the middleware
QoS Broker
Network Cache
Access Control
Most services drill down to institutional
resources
Resource Scheduling
Monitoring Management
Resource Characteristics
Data Catalogues
Wind Tunnel
Global Middleware Services
Supercomputer
Cluster
Local Services
Ames
LBNL
ANL
NCAR
Some services drill down to the various network
layers
Tertiary storage
Cache
Tertiary (mass) storage
Source: Grid98 Workshop (Johnston)
12
Operation Model (2)
Applications
The middleware layer and infrastructure provide
transparent access for applications
Some services are provided in the middleware
Cache
QoS Broker
Network Cache
Access Control
Re-configure
Resource Scheduling
Monitoring Management
Resource Characteristics
Data Catalogues
Wind Tunnel
Global Middleware Services
Supercomputer
Cluster
Configure
Proxy management for multi-site resources
Ames
Analyzer
ANL
LBNL
NCAR
Re-configure
Tertiary storage
Local Services
Cache
Re-configure
Tertiary (mass) storage
Monitor
Monitor
Source: Grid98 Workshop (Johnston)
13
Middleware Approach
  • Toolkit and services addressing key technical
    problems
  • Modular bag of services model
  • Not a vertically integrated solution
  • can be applied to many application domains
  • Inter-domain issues, rather than clustering
  • Integration of intra-domain solutions

14
GRID Workload Management
  • Architecture and services for scheduling and
    resource management
  • Challenging issues
  • Optimal co-allocation of data, CPU and network
    for specific jobs
  • Distributed scheduling (data and/or code
    migration) of unscheduled/scheduled jobs
  • Uniform interface to various local resource
    managers
  • Usage policies on resources (CPU, data, network)

15
GRID Data Management
  • Services and tools for data management
  • Challenging issues
  • Petabyte-scale information volumes
  • High-speed data movement and replication
  • Replica synchronization
  • Data caching
  • Uniform interface to mass storage management
    systems

16
GRID Monitoring Services
  • Tools and infrastructure for status and error
    monitoring
  • Tasks and challenges
  • Develop instrumentation APIs
  • Integration with information services
  • Real time and long term monitoring
  • Analysis of multivariable data
  • job performance optimisation
  • problem tracing

17
Fabric Management
  • Tools for new automated system management
    techniques for large computing fabrics
  • Tasks and challenges
  • Management of very large computing fabrics
  • Reduced costs of administration and operations
  • Dynamic management of new resources
  • Scalability to thousands of processors
  • An innovative approach: self-healing
  • algorithms for fault detection and localization
  • automatic reconfiguration of the fabric
  • automatic task re-running
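
The self-healing bullets above (fault detection, localization, automatic task re-running) can be illustrated with a minimal Python sketch. The function name `run_with_rerun`, the node names, and the use of `RuntimeError` as the fault signal are assumptions made for illustration; the slides do not prescribe an implementation.

```python
def run_with_rerun(task, nodes):
    """Try the task on each node in turn; re-run elsewhere when a node fails.

    Returns (result, failed_nodes). The failed_nodes list is the
    'localization' part: it records which nodes raised a fault.
    """
    failed = []
    for node in nodes:
        try:
            return task(node), failed
        except RuntimeError:
            # Fault detected on this node: remember it and re-run elsewhere.
            failed.append(node)
    raise RuntimeError(f"task failed on all nodes: {failed}")


def flaky(node):
    # Hypothetical task that fails on one faulty node.
    if node == "node1":
        raise RuntimeError("disk error")
    return f"done on {node}"


result, bad_nodes = run_with_rerun(flaky, ["node1", "node2", "node3"])
print(result, bad_nodes)
```

A real fabric manager would also reconfigure the fabric so that later tasks avoid the failed node; here the caller would simply drop `bad_nodes` from the node list.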

18
Mass Storage Management
  • Integration of local mass storage management
    systems within the DataGrid
  • Tasks and challenges
  • Develop interface APIs
  • Develop data import/export interfaces
  • Publication of Information and metadata

19
Globus
20
Globus Approach
  • A software toolkit addressing key technical
    problems
  • Offer a modular bag of technologies
  • Enable incremental development of grid-enabled
    tools and applications
  • Define and standardize grid protocols and APIs
  • Focus is on inter-domain issues, not clustering
  • Supports collaborative resource use spanning
    multiple organizations
  • Integrates cleanly with intra-domain services
  • Creates a collective service layer

21
Globus Approach
  • Focus on architecture issues
  • Provide implementations of grid protocols and
    APIs as basic infrastructure
  • Use to construct high-level, domain-specific
    solutions
  • Design principles
  • Keep participation cost low
  • Enable local control
  • Support for adaptation

22
Four Key Protocols
  • The Globus Toolkit centers around four key
    protocols
  • Connectivity layer
  • Security: Grid Security Infrastructure (GSI)
  • Resource layer
  • Resource Management: Grid Resource Allocation
    Management (GRAM)
  • Information Services: Grid Resource Information
    Protocol (GRIP)
  • Data Transfer: Grid File Transfer Protocol
    (GridFTP)

23
Grid Security Infrastructure in Action
Single sign-on via grid-id and generation of
proxy credential
User Proxy
User
Proxy credential
Or retrieval of proxy cred. from online
repository
Remote process creation requests
Site A (Kerberos)
GSI-enabled GRAM server
GSI-enabled GRAM server
Authorize, map to local id, create process,
generate credentials
Site B (Unix)
Computer
Computer
Process
Process
Local id
Local id
Kerberos ticket
Restricted proxy
Restricted proxy
Site C (Kerberos)
With mutual authentication
Storage system
24
Resource Management
  • The Grid Resource Allocation Management (GRAM)
    protocol and client API allow programs to be
    started on remote resources, despite local
    heterogeneity
  • Resource Specification Language (RSL) is used to
    communicate requirements
  • A layered architecture allows application-specific
    resource brokers and co-allocators to be defined
    in terms of GRAM services
  • Integrated with Condor, MPICH-G2,
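
To make the role of RSL concrete, here is a minimal Python sketch that renders a set of requirements as a conjunctive RSL request of the form `&(name=value)(name=value)`. The helper `to_rsl` is hypothetical and the quoting rule is a simplification assumed for illustration; it is not the Globus client API.

```python
def to_rsl(attributes):
    """Render a dict as a flat conjunctive RSL string.

    Assumption: values containing whitespace are wrapped in double quotes;
    values that already contain a double quote are rejected to keep the
    sketch simple.
    """
    parts = []
    for name, value in attributes.items():
        text = str(value)
        if '"' in text:
            raise ValueError("embedded quotes are not handled in this sketch")
        if any(ch.isspace() for ch in text):
            text = f'"{text}"'
        parts.append(f"({name}={text})")
    return "&" + "".join(parts)


request = to_rsl({"executable": "/bin/hostname", "count": 4, "maxTime": 10})
print(request)  # &(executable=/bin/hostname)(count=4)(maxTime=10)
```

A broker or co-allocator would build such strings from application requirements and hand them to GRAM for each target resource.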

25
Resource Management Issues for Grid Computing
  • Site autonomy
  • Resources owned by different organizations, in
    different administrative domains
  • Local policies for use, scheduling, security
  • Heterogeneous substrate
  • Different local resource management systems
  • Policy extensibility
  • Local sites need ability to customize their
    resource management policies
  • Co-allocation
  • May need resources at several sites
  • Mechanism for allocating multiple resources,
    initiating computation, monitoring and managing
  • On-line control
  • Adapt application requirements to resource
    availability
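
The co-allocation issue above (resources needed at several sites at once) can be sketched as an all-or-nothing reservation. The `Site` class and `co_allocate` function are hypothetical names; this is a toy illustration under the assumption that each site exposes simple reserve and release operations, not the Globus co-allocator.

```python
class Site:
    """A site that owns a pool of CPUs and applies its own local policy."""

    def __init__(self, name, free_cpus):
        self.name = name
        self.free_cpus = free_cpus

    def reserve(self, cpus):
        # Local policy in this sketch: grant only if enough CPUs are free.
        if cpus > self.free_cpus:
            return False
        self.free_cpus -= cpus
        return True

    def release(self, cpus):
        self.free_cpus += cpus


def co_allocate(requests):
    """Reserve every (site, cpus) pair, or roll back and reserve nothing."""
    granted = []
    for site, cpus in requests:
        if site.reserve(cpus):
            granted.append((site, cpus))
        else:
            # One site refused: undo the reservations made so far.
            for held_site, held_cpus in granted:
                held_site.release(held_cpus)
            return False
    return True


site_a, site_b = Site("A", 8), Site("B", 2)
first = co_allocate([(site_a, 4), (site_b, 2)])   # both sites can satisfy it
second = co_allocate([(site_a, 4), (site_b, 1)])  # B is now full, so A is rolled back
print(first, second, site_a.free_cpus, site_b.free_cpus)
```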

26
Resource Management Architecture
RSL specialization
RSL
Application
Information Service
Queries
Info
Ground RSL
Simple ground RSL
Local resource managers
GRAM
GRAM
GRAM
LSF
EASY-LL
NQE
27
Local Resource Managers
  • Implemented with Globus Resource Allocation
    Manager (GRAM)
  • Processing RSL specifications representing
    resource requests
  • Deny request
  • Create one or more processes (jobs) that satisfy
    request
  • Enable remote monitoring and management of jobs
  • Periodically update MDS information service with
    current availability and capabilities of
    resources
  • GRAM is responsible for
  • Parsing and processing RSL
  • Job monitoring
  • MDS update
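
The responsibilities listed above (parse RSL, deny or create jobs, monitor, publish availability) can be sketched as a toy local manager. `parse_rsl`, `LocalManager`, and the published fields are hypothetical; the parser handles only flat `&(name=value)` requests and is not the Globus RSL library.

```python
import re


def parse_rsl(rsl):
    """Parse a flat conjunctive request such as &(executable=/bin/date)(count=2)."""
    if not rsl.startswith("&"):
        raise ValueError("only conjunctive requests are handled in this sketch")
    return dict(re.findall(r"\(\s*(\w+)\s*=\s*([^()]*?)\s*\)", rsl))


class LocalManager:
    def __init__(self, total_cpus):
        self.free_cpus = total_cpus
        self.jobs = {}  # job id -> job record

    def submit(self, rsl):
        """Return a job id, or None when the request is denied."""
        request = parse_rsl(rsl)
        count = int(request.get("count", 1))
        if count > self.free_cpus:
            return None  # deny: not enough free CPUs
        self.free_cpus -= count
        job_id = len(self.jobs) + 1
        self.jobs[job_id] = {"executable": request["executable"],
                             "count": count, "state": "ACTIVE"}
        return job_id

    def status(self, job_id):
        """Remote monitoring hook: report the state of one job."""
        return self.jobs[job_id]["state"]

    def advertise(self):
        """Snapshot that would be pushed periodically to the information service."""
        return {"free_cpus": self.free_cpus, "jobs": len(self.jobs)}


manager = LocalManager(total_cpus=4)
job = manager.submit("&(executable=/bin/hostname)(count=4)")
denied = manager.submit("&(executable=/bin/date)(count=1)")
print(job, denied, manager.status(job), manager.advertise())
```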

28
Grid Information Services
  • System information is critical to operation of
    the grid and construction of applications
  • What resources are available?
  • Resource discovery
  • What is the state of the grid?
  • Resource selection
  • How to optimize resource use
  • Application configuration and adaptation?
  • We need a general information infrastructure to
    answer these questions
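
The questions above map onto two operations: resources register their descriptions, and users query them for discovery and selection. The following Python sketch assumes a single in-memory directory; the class `Directory`, the resource names, and the attributes are invented for illustration.

```python
class Directory:
    """Toy information service: resources register, users enquire."""

    def __init__(self):
        self.entries = {}  # resource name -> attribute dict

    def register(self, name, **attributes):
        self.entries[name] = attributes

    def query(self, predicate):
        """Resource discovery: names of all resources whose attributes match."""
        return sorted(name for name, attrs in self.entries.items()
                      if predicate(attrs))


directory = Directory()
directory.register("cluster-a", cpus=64, os="Linux", load=0.2)
directory.register("sp2-b", cpus=128, os="AIX", load=0.9)
directory.register("pool-c", cpus=16, os="Linux", load=0.1)

# Discovery: what resources are available?
linux_hosts = directory.query(lambda attrs: attrs["os"] == "Linux")

# Selection: among resources with at least 32 CPUs, pick the least loaded.
candidates = directory.query(lambda attrs: attrs["cpus"] >= 32)
best = min(candidates, key=lambda name: directory.entries[name]["load"])
print(linux_hosts, best)
```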

29
GIS Architecture
Users
Customized Aggregate Directories (A)
Enquiry Protocol
Registration Protocol
Standard Resource Description Services (R)
30
A Model Architecture for Data Grids
Attribute Specification
Replica Catalog
Metadata Catalog
Application
Multiple Locations
Logical Collection and Logical File Name
MDS
Selected Replica
Replica Selection
Performance Information Predictions
NWS
GridFTP Control Channel
Disk Cache
GridFTP Data Channel
Tape Library
Disk Array
Disk Cache
Replica Location 1
Replica Location 2
Replica Location 3
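
The replica selection step in the model above can be sketched as follows: a replica catalog maps a logical file name to its physical locations, and a performance predictor (the role NWS plays in the figure) ranks them. The catalog contents, host names, and bandwidth figures are invented for illustration.

```python
# Hypothetical replica catalog: logical file name -> physical replica URLs.
replica_catalog = {
    "lfn://run42/events.dat": [
        "gsiftp://site1.example.org/data/events.dat",
        "gsiftp://site2.example.org/cache/events.dat",
        "gsiftp://site3.example.org/tape/events.dat",
    ],
}

# Stand-in for performance predictions: expected bandwidth in MB/s.
predicted_bandwidth = {
    "gsiftp://site1.example.org/data/events.dat": 12.0,
    "gsiftp://site2.example.org/cache/events.dat": 40.0,
    "gsiftp://site3.example.org/tape/events.dat": 3.5,
}


def select_replica(logical_name, catalog, bandwidth):
    """Pick the replica with the highest predicted bandwidth."""
    locations = catalog[logical_name]
    return max(locations, key=lambda url: bandwidth.get(url, 0.0))


chosen = select_replica("lfn://run42/events.dat",
                        replica_catalog, predicted_bandwidth)
size_mb = 2000
estimated_seconds = size_mb / predicted_bandwidth[chosen]
print(chosen, estimated_seconds)
```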
31
GridFTP Basic Approach
  • FTP protocol is defined by several IETF RFCs
  • Start with most commonly used subset
  • Standard FTP get/put etc., 3rd-party transfer
  • Implement standard but often unused features
  • GSS binding, extended directory listing, simple
    restart
  • Extend in various ways, while preserving
    interoperability with existing servers
  • Striped/parallel data channels, partial file
    transfer, automatic and manual TCP buffer
    setting, progress monitoring, extended restart
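
The idea behind parallel data channels and partial file transfer is that each channel moves a distinct byte range. The Python sketch below shows only that partitioning, using threads and an in-memory buffer as a stand-in for network channels; it does not speak the GridFTP protocol, and `split_ranges` and `parallel_copy` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(file_size, channels):
    """Return (offset, length) pairs that cover the file exactly once."""
    base, extra = divmod(file_size, channels)
    ranges = []
    offset = 0
    for index in range(channels):
        length = base + (1 if index < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def parallel_copy(data, channels):
    """Copy a byte string using one worker per byte range."""
    out = bytearray(len(data))

    def move(byte_range):
        offset, length = byte_range
        # Each worker writes only its own, non-overlapping slice.
        out[offset:offset + length] = data[offset:offset + length]

    with ThreadPoolExecutor(max_workers=channels) as pool:
        list(pool.map(move, split_ranges(len(data), channels)))
    return bytes(out)


print(split_ranges(10, 3))  # [(0, 4), (4, 3), (7, 3)]
payload = bytes(range(256)) * 4
print(parallel_copy(payload, channels=4) == payload)
```

The same range list also supports restart: only ranges that were not acknowledged need to be sent again.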

32
Striped GridFTP Server
GridFTP client
To Client or Another Striped GridFTP Server
GridFTP Control Channel
GridFTP Data Channels
mpirun
GridFTP Server Parallel Backend
GridFTP server master
MPI (Comm_World)

Control socket
MPI (Sub-Comm)
MPI-IO
Parallel File System (e.g. PVFS, PFS, etc.)

33
Condor
34
What is Condor?
  • Condor converts collections of distributively
    owned workstations and dedicated clusters into a
    distributed high-throughput computing facility.
  • Resource finder
  • Batch queue manager
  • Scheduler
  • Checkpoint/Restart
  • Process migration
  • Remote system calls

35
Layered Design
Resource
Access Control
Resource Owner
Match-Making
System Administrator
Condor
Request Agent
Customer/User
Application RM
Application
36
Unique Mechanisms
  • Checkpointing
  • Enables preemptive-resume resource allocation
    (essential in an opportunistic environment)
  • Remote I/O
  • Enables computation across administrative domains
    (essential for HTC)
  • ClassAds
  • Enables flexible resource matchmaking (essential
    in a distributively owned environment)
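
Matchmaking with ClassAds is two-sided: the job states requirements on the machine, and the machine owner states requirements on the job. The Python sketch below mimics that idea with dictionaries and callables; it is not ClassAd syntax, and the attribute names and the `match` function are assumptions made for illustration.

```python
def machine_ad(name, opsys, memory, idle):
    """Build a machine advertisement with the owner's policy attached."""
    ad = {"Name": name, "OpSys": opsys, "Memory": memory, "Idle": idle}
    # Owner policy in this sketch: accept jobs only while the workstation is idle.
    ad["Requirements"] = lambda job: ad["Idle"]
    return ad


machines = [
    machine_ad("ws1", "SOLARIS", 256, idle=True),
    machine_ad("ws2", "IRIX", 512, idle=True),
    machine_ad("ws3", "SOLARIS", 1024, idle=False),
]

job_ad = {
    "Owner": "alice",
    # The job needs a Solaris machine with at least 128 MB of memory.
    "Requirements": lambda m: m["OpSys"] == "SOLARIS" and m["Memory"] >= 128,
    # Among acceptable machines, prefer the one with the most memory.
    "Rank": lambda m: m["Memory"],
}


def match(job, machine_ads):
    """Return the best machine whose requirements and the job's both hold."""
    acceptable = [m for m in machine_ads
                  if job["Requirements"](m) and m["Requirements"](job)]
    if not acceptable:
        return None
    return max(acceptable, key=job["Rank"])


matched = match(job_ad, machines)
print(matched["Name"])
```

Here `ws3` has more memory but its owner is using it, so the match falls to `ws1`.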

37
Condor System Structure
Central Manager: Collector (C), Negotiator (N)
Submit Machine: Customer Agent (CA)
Execution Machine: Resource Agent (RA)
39
TENT
40
TENT
  • A distributed workflow management and integration
    system for engineering applications, developed by
  • German Aerospace Center (DLR), Simulation and
    Software Technology (SISTEC), http://www.sistec.dlr.de
  • German National Research Center for Information
    Technology (GMD), Institute for Algorithms and
    Scientific Computing (SCAI), http://www.gmd.de/scai

41
TENT - The Integration Framework
visualization
42
TENT Packages
43
TENT - Software architecture
44
Architectural Models for Resource Management in
the Grid
45
Typical Grid Computing Environment
Grid Information Service
Grid Resource Broker
Application
R2
R3
R4
R5
RN
Grid Resource Broker
R6
R1
Resource Broker
Grid Information Service
46
Sources of Complexity in Grid Resource Management
  • No single administrative control.
  • No single ownership policy
  • Each resource owner has their own policies or
    scheduling mechanisms
  • Users must honour them (particularly external
    Grid users)
  • Heterogeneity
  • resources: PCs, workstations, clusters,
    supercomputers, instruments, databases, software
  • fabric management systems and management policies
  • application requirements
  • Dynamic availability: resources may appear and
    disappear

47
Sources of Complexity in Grid Resource Management
  • Unreliable: resources disappear from view
  • No uniform cost model: cost varies from one
    user's resource to another and with time of day.
  • No single access mechanism: Web, custom
    interfaces, command line

48
Grid Resource Management Issues
  • Authentication (once).
  • Specify (code, resources, etc.).
  • Discover resources.
  • Negotiate authorization, acceptable use, Cost,
    etc.
  • Acquire resources.
  • Schedule Jobs.
  • Initiate computation.
  • Steer computation.
  • Access remote data-sets.
  • Collaborate with results.
  • Account for usage.
  • Discover resources.
  • Negotiate authorization, acceptable use, Cost,
    etc.
  • Acquire resources.
  • Schedule jobs.
  • Initiate computation.
  • Steer computation.

Domain 1
Domain 2
Source: Rajkumar Buyya (Monash Univ.)
49
Data Access for Resource Management
50
Architectural Models for RM
MODEL | REMARKS | SYSTEMS
Hierarchical | Captures the model followed in most contemporary systems. | Globus, Legion, CCS, AppLeS, NetSolve, Ninf
Abstract Owner (AO) | Order-and-delivery model that focuses on long-term goals. | Expected to emerge; most peer-to-peer computing systems are likely to be based on it.
Market Model | Follows an economic model for resource discovery, sharing, and scheduling. | GRACE, Nimrod/G, JavaMarket, Mariposa
51
Hierarchical RM
52
Resource Management in Globus
  • The Grid Resource Allocation Management (GRAM)
    protocol and client API allow programs to be
    started on remote resources, despite local
    heterogeneity
  • Resource Specification Language (RSL) is used to
    communicate requirements
  • A layered architecture allows application-specific
    resource brokers and co-allocators to be defined
    in terms of GRAM services
  • Integrated with Condor, MPICH-G2,

53
Resource Management Architecture in Globus
RSL specialization
RSL
Application
Information Service
Queries
Info
Ground RSL
Simple ground RSL
Local resource managers
GRAM
GRAM
GRAM
LSF
EASY-LL
NQE
54
Local Resource Managers
  • Implemented with Globus Resource Allocation
    Manager (GRAM)
  • Processing RSL specifications representing
    resource requests
  • Deny request
  • Create one or more processes (jobs) that satisfy
    request
  • Enable remote monitoring and management of jobs
  • Periodically update MDS information service with
    current availability and capabilities of
    resources
  • GRAM is responsible for
  • Parsing and processing RSL
  • Job monitoring
  • MDS update

55
Globus/MPICH-G2 components
MDS client API calls to locate resources
MPI Apps
MDS Grid Index Info Server
Process MPI messages
Local site boundary
MDS client API calls to get resource info
MPICH-G2
MDS Grid Resource Info Server
Client API calls to request resource
allocation and process creation.
Query current status of resource
Provide state change callbacks to client
Globus Resource Manager
Globus Security Infrastructure
Allocate and create processes
Request
Globus-job-manager
Launch
Process
Parse
Monitor and control
Globus Gatekeeper
Process
RSL Library
56
High throughput workload management system
architecture (simplified design)
Resource Discovery
Master
GIS
Submit jobs (using Class-Ads)
condor_submit (Globus Universe)
Information on characteristics and status of
local resources
Condor-G
globusrun
GRAM
GRAM
GRAM
CONDOR
LSF
PBS
Site1
Site2
Site3
57
Condor Globus Universe
58
AO General Model
59
AO is owner or broker
User
  • User negotiates with AO through order window
  • That AO may own some resources, and/or it may
    broker with other AOs for those resources
  • After negotiation, resources are delivered
    through pickup window

Requests
Resources
Order Window
Pickup Window
AO
Order
Pickup
Order
Pickup
Manager
Resource Manager
Delivery
Sales
AO3
Physical Resource
AO2
AO1
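
The order and pickup windows can be sketched as two methods on an owner object that either serves a request from its own resources or brokers it to a partner AO. The class below is a toy illustration: the names, the ticket format, and the assumption that partner links form no cycles are all invented here.

```python
class AbstractOwner:
    def __init__(self, name, owned=None, partners=()):
        self.name = name
        self.owned = dict(owned or {})   # resource kind -> units available
        self.partners = list(partners)   # other AOs this AO can broker with
        self.ready = {}                  # ticket -> (kind, units, supplier name)
        self._next_ticket = 1

    def _find_supplier(self, kind, units):
        """Use own resources first, then ask partner AOs (assumed acyclic)."""
        if self.owned.get(kind, 0) >= units:
            return self
        for partner in self.partners:
            supplier = partner._find_supplier(kind, units)
            if supplier is not None:
                return supplier
        return None

    def order(self, kind, units):
        """Order window: returns a ticket, or None if nobody can supply."""
        supplier = self._find_supplier(kind, units)
        if supplier is None:
            return None
        supplier.owned[kind] -= units
        ticket = f"{self.name}-{self._next_ticket}"
        self._next_ticket += 1
        self.ready[ticket] = (kind, units, supplier.name)
        return ticket

    def pickup(self, ticket):
        """Pickup window: hand over the resources behind a ticket."""
        return self.ready.pop(ticket)


ao2 = AbstractOwner("AO2", owned={"cpu": 16})
ao1 = AbstractOwner("AO1", owned={"cpu": 4}, partners=[ao2])

ticket = ao1.order("cpu", 8)   # AO1 owns too few CPUs, so it brokers with AO2
delivery = ao1.pickup(ticket)
print(ticket, delivery, ao2.owned["cpu"])
```

The user only ever talks to AO1; whether AO1 owned the CPUs or obtained them from AO2 is hidden behind the two windows.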
60
AO Resources
  • Resources are objects
  • Classes are
  • Instrument
  • Data source, sink, transform, e.g. programs,
    people, files, data collection devices
  • Channel
  • Moves data among instruments
  • Complexes of above
  • Attributes define sizes, times, connections, etc.

Instrument (File)
Instrument (Program)
Channels
Instrument (File)
Instrument (Program)
Instrument (Telescope)
Instrument (Person)
61
Negotiating with an AO
Make dummy resource (with attributes set to
constants, variables, or don't care), plus bid,
delivery plan, and variable constraints
Pick one, try again, or give up
Assign tasks to resource, use, relinquish
USER
Perhaps later...
Delivery Window
Order Window
Resource candidates (values for
variables/attributes asking price for each)
AO
Resource
62
Economic Models for Trading
  • Commodity Market Model
  • Posted Prices Models
  • Bargaining Model
  • Tendering (Contract Net) Model
  • Auction Model
  • Proportional Resource Sharing Model
  • Shareholder Model
  • Partnership Model
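
As one concrete instance from the list above, the proportional resource sharing model gives each user a share of a resource in proportion to that user's bid. A minimal Python sketch, with invented user names and credit values:

```python
def proportional_shares(bids, capacity):
    """Split capacity among users in proportion to their bids."""
    total = sum(bids.values())
    if total <= 0:
        raise ValueError("at least one positive bid is required")
    return {user: capacity * bid / total for user, bid in bids.items()}


shares = proportional_shares({"alice": 30, "bob": 10}, capacity=100)
print(shares)  # {'alice': 75.0, 'bob': 25.0}
```

With bids of 30 and 10 credits, the shares are 30/40 and 10/40 of the capacity. A new bidder shrinks every existing share without evicting anyone, which distinguishes this model from an auction with a single winner.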

63
Economy Grid = Globus + GRACE
Applications
Grid Apps.

Science
Engineering
Commerce
Portals
ActiveSheet
High-level Services and Tools
GlobusView
Grid Status
Grid Tools
DUROC
globusrun
MPI-G
Nimrod/G
MPI-IO
CC
Core Services
Heartbeat Monitor
Nexus
GRACE-TS
GRAM
Grid Middleware
Globus Security Interface
GASS
DUROC
MDS
GARA
GBank
GMD
Grid Fabric
Local Services
GRD
QBank
JVM
Condor
TCP
UDP
eCash
LSF
PBS
Solaris
Irix
Linux
Source: Rajkumar Buyya (Monash Univ.)
64
Grid Architecture for Computational Economy
Information Server(s)
Grid Market Services
Sign-on
Health Monitor
Info ?
Grid Node N

Grid Explorer

Application
Secure
Job Control Agent
Grid Node1
Schedule Advisor
QoS
Pricing Algorithms
Trade Server
Trading
Trade Manager
Accounting
Resource Reservation
Misc. services

Deployment Agent
JobExec
Resource Allocation
Storage
Grid User
Grid Resource Broker

R1
R2
Rm
Grid Middleware Services
Grid Service Providers
Source: Rajkumar Buyya (Monash Univ.)
65
GRACE components
  • A resource broker (e.g., Nimrod/G)
  • Various resource trading protocols for different
    economic models
  • A mediator for negotiating between users and grid
    service providers (Grid Market Directory)
  • A deal template for specifying resource
    requirements and services offers
  • Grid Trading Server
  • Pricing policy specification
  • Accounting (e.g., QBank) and payment management
    (GridBank, not yet implemented)

66
Flow Diagram for Pricing, Accounting, Allocations
and Job Scheduling
Components: Pricing Policy, GRID Bank (digital transactions),
DB@Each Site, QBank, Trade Server, Resource Manager
(IBM-LL/PBS/.), Compute Resources (clusters/SGI/SP/...)
Steps:
0. Make deposits, transfers, refunds, queries/reports.
1. Client negotiates for access cost.
2. Negotiation is performed per owner-defined policies.
3. If the client is happy, TS informs QB about the access deal.
4. Job is submitted.
5. Check with QB for go-ahead.
6. Job starts.
7. Job completes.
8. Inform QB about resource utilization.
Source: Rajkumar Buyya (Monash Univ.)
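
The numbered flow on this slide (deposit, negotiate, record the deal, check for go-ahead, run, charge) can be traced with a small Python sketch. The classes below are toys with hypothetical method names; they are not the real QBank or GRACE trade server interfaces, and the pricing policy is reduced to a single asking rate.

```python
class QBank:
    """Toy accounting service keyed by user."""

    def __init__(self):
        self.balances = {}  # user -> credits
        self.rates = {}     # user -> agreed price per CPU-second

    def deposit(self, user, amount):                   # step 0
        self.balances[user] = self.balances.get(user, 0) + amount

    def record_deal(self, user, rate):                 # step 3
        self.rates[user] = rate

    def go_ahead(self, user, estimated_cpu_seconds):   # step 5
        needed = self.rates[user] * estimated_cpu_seconds
        return self.balances.get(user, 0) >= needed

    def charge(self, user, used_cpu_seconds):          # step 8
        cost = self.rates[user] * used_cpu_seconds
        self.balances[user] -= cost
        return cost


class TradeServer:
    def __init__(self, qbank, asking_rate):
        self.qbank = qbank
        self.asking_rate = asking_rate  # owner-defined pricing policy (step 2)

    def negotiate(self, user, max_rate):               # steps 1 to 3
        if max_rate < self.asking_rate:
            return None  # no deal
        self.qbank.record_deal(user, self.asking_rate)
        return self.asking_rate


qbank = QBank()
qbank.deposit("alice", 100)
server = TradeServer(qbank, asking_rate=2)

rate = server.negotiate("alice", max_rate=3)   # deal at the asking rate
allowed = qbank.go_ahead("alice", 30)          # estimate: 30 CPU-seconds
cost = qbank.charge("alice", 25)               # job actually used 25 CPU-seconds
print(rate, allowed, cost, qbank.balances["alice"])
```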
67
Nimrod/G: A Grid Resource Broker
  • A resource broker for managing, steering, and
    executing task farming (parametric sweep/SPMD
    model) applications on the Grid based on deadline
    and computational economy.
  • Based on users' QoS requirements, the broker
    dynamically leases services at runtime depending
    on their quality, cost, and availability.
  • Key Features
  • A single window to manage and control experiments
  • Persistent and Programmable Task Farming Engine
  • Resource Discovery
  • Resource Trading
  • Scheduling Predictions
  • Generic Dispatcher and Grid Agents
  • Transportation of data and results
  • Steering and data management
  • Accounting

Source: Rajkumar Buyya (Monash Univ.)
68
A Glance at Nimrod-G Broker
Nimrod/G Client
Nimrod/G Client
Nimrod/G Client
Nimrod/G Engine
Schedule Advisor
Trading Manager
Grid Store
Grid Dispatcher
Grid Explorer
Grid Middleware
Grid Middleware: Globus, Legion, Condor, etc.
Interactions: TM with TS, GE with GIS
Grid Information Server(s)
Grid nodes, each with RM and TS
Legend: G = Globus enabled node, L = Legion enabled node,
C = Condor enabled node, RM = Local Resource Manager,
TS = Trade Server
Source: Rajkumar Buyya (Monash Univ.)
69
Nimrod/G Interactions
Grid Node
Compute Node
User Node
Source: Rajkumar Buyya (Monash Univ.)
70
Adaptive Scheduling Steps
Discover More Resources
Discover Resources
Establish Rates
Evaluate and Reschedule
Compose Schedule
Meet requirements? (Remaining Jobs, Deadline,
Budget)
Distribute Jobs
Source: Rajkumar Buyya (Monash Univ.)
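
One pass of the "compose schedule" and "meet requirements?" steps can be sketched as a greedy, cost-minimizing allocation under a deadline and a budget. This is an illustrative heuristic assumed for the example, not the actual Nimrod/G scheduling algorithm; resource names, rates, and prices are invented.

```python
def compose_schedule(jobs, hours_left, budget, resources):
    """Assign jobs to the cheapest resources that can finish before the deadline.

    resources: list of (name, jobs_per_hour, price_per_job).
    Returns {name: job_count}, or None if the deadline or budget cannot be met.
    """
    allocation = {}
    remaining = jobs
    cost = 0.0
    for name, jobs_per_hour, price in sorted(resources, key=lambda r: r[2]):
        if remaining == 0:
            break
        capacity = int(jobs_per_hour * hours_left)  # jobs it can finish in time
        share = min(capacity, remaining)
        if share:
            allocation[name] = share
            remaining -= share
            cost += share * price
    if remaining > 0 or cost > budget:
        return None  # requirements not met: discover more resources or renegotiate
    return allocation


resources = [("cheap", 4, 1.0), ("mid", 5, 2.0), ("fast", 20, 5.0)]

plan = compose_schedule(jobs=100, hours_left=10, budget=200, resources=resources)
too_poor = compose_schedule(jobs=100, hours_left=10, budget=150, resources=resources)
print(plan, too_poor)
```

In an adaptive loop, the broker would call this again after each evaluation interval with the remaining jobs, the remaining time and budget, and freshly measured completion rates.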
71
Concluding Remarks
  • Restrictions in Grid Middleware
  • Difficulties in distributed computing and
    resource management policy
  • Difficulties of middleware implementation
    required for heterogeneous systems in
    meta-computing infrastructure
  • Globus, Condor, TENT, PARIS, Cactus, ...
  • Difficulties of Resource Management in Grid
    Computing
  • Models for Grid resource management architecture
  • Hierarchical, AO, and Market model