CSS434: Parallel - PowerPoint PPT Presentation

About This Presentation
Title:

CSS434: Parallel

Slides: 15
Provided by: munehir

Transcript and Presenter's Notes



1
CSS434 Grid Computing
Textbook: No Corresponding Chapters
Professor Munehiro Fukuda
A portion of these slides was compiled from The Grid: Blueprint for
a New Computing Infrastructure.
2
Network Infrastructure
  • Users first log in to their own organization's systems,
    locally or remotely.
  • If they are affiliated with other organizations, they can log
    in from their primary system to those other systems. (This
    gives them an opportunity to use those resources in parallel.)
  • Problems:
      • They must orchestrate job execution among the
        resources they use.
      • Should those resources be limited to such a
        handful of researchers?

3
Purposes of Computational Grid
  • Use computing resources connected to the high-speed
    information highway as if we were using the electric power
    grid.
  • Only 30% utilization in academic/commercial
    environments.
  • Many applications have only episodic
    requirements. So, why don't we share computational
    resources?
  • Computational results and data should also be
    made available to all users.
  • Users:
      • Computational scientists and engineers
      • Experimental scientists
      • Associations and corporations
      • Training and education
      • Consumers (E-commerce)

4
Grid Applications
Category | Examples | Characteristics
Distributed supercomputing | DIS and stellar dynamics | Very large problems needing lots of computing resources at a time
High throughput | Chip design and parameter studies | Harnessing many idle resources to increase aggregate throughput
On demand | Medical instrumentation | Allocating special resources dynamically
Data intensive | Sky survey | Using distributed data and needing high-volume data flows
Collaborative | Collaborative design, education | Supporting communication or collaborative work
5
Grid Services Architecture (from www.globus.org slide)
[Figure: layered architecture, listed from top to bottom]
  • Applications: High-energy physics data analysis, collaborative
    engineering, on-line instrumentation, regional climate studies,
    parameter studies
  • Application Toolkit Layer: Distributed computing, collab. design,
    remote control, data-intensive, remote viz
  • Grid Services Layer: Information, resource mgmt, security,
    data access, fault detection, . . .
  • Grid Fabric Layer: Transport, multicast, instrumentation,
    control interfaces, QoS mechanisms, . . .
6
Programming Model: Uniform Access
  • Paradigm:
      • Bag of tasks or master-worker (Condor-MW)
      • Client-server (NetSolve)
      • Object-oriented (Legion)
      • Synchronous applications (not suited for
        massively parallel computation)
  • Language support:
      • MPI-G: message passing (Globus)
      • OpenMP: shared memory
      • Math library: remote procedure calls (NetSolve)
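
To make the bag-of-tasks (master-worker) paradigm concrete, here is a minimal single-machine sketch using Python threads. It is not Condor-MW's API: the names `run_bag_of_tasks` and `square` are hypothetical, and a real grid system would ship tasks to remote workers rather than to local threads.

```python
import queue
import threading


def run_bag_of_tasks(tasks, worker_fn, num_workers=4):
    """Master fills a shared bag; workers pull tasks until the bag is empty."""
    bag = queue.Queue()
    for task_id, payload in enumerate(tasks):
        bag.put((task_id, payload))

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task_id, payload = bag.get_nowait()
            except queue.Empty:
                return  # the bag is empty, so this worker is done
            value = worker_fn(payload)
            with lock:
                results[task_id] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The master reassembles the results in the original task order.
    return [results[i] for i in range(len(tasks))]


def square(x):
    return x * x


squares = run_bag_of_tasks(list(range(10)), square)
```

Because the tasks are independent, a faster worker simply pulls more of them from the bag, which is why this paradigm suits high-throughput use of idle resources.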

7
Resource Management: Discovery, Allocation, and
Scheduling
  • Centralized resource manager:
      • Easy to manage
      • A bottleneck
  • Decentralized resource manager:
      • A collection of centralized managers (Condor's
        gate flocking)
      • A combination of meta and local schedulers

Systems | Resource descriptions | Front-end process | Resource manager | Job launcher
Globus | RSL (resource spec. language) | - | Broker and MDS | GRAM
Condor | ClassAd and DAGMan | Schedd (Agent) | Matchmaker and startd | Sandbox (Starter)
Legion | IDL (interface def. language) | Scheduler | Collection | Enactor
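
A centralized resource manager can be pictured as a matchmaker that pairs job descriptions with resource descriptions. The sketch below is only an illustration of that idea, loosely inspired by the Matchmaker entry in the table. It does not use Condor's ClassAd language, and every field name (`requires`, `banned_owners`, `memory_mb`, `cpus`) is a hypothetical choice.

```python
def matches(job, machine):
    """A job and a machine match when both sides' constraints are satisfied."""
    # Job side: the machine must offer at least each requested quantity.
    job_ok = all(machine.get(key, 0) >= amount
                 for key, amount in job["requires"].items())
    # Machine side: the local policy may refuse some job owners.
    machine_ok = job["owner"] not in machine.get("banned_owners", ())
    return job_ok and machine_ok


def matchmake(jobs, machines):
    """Central manager: give each job the first free machine that matches."""
    free = list(machines)
    assignments = {}
    for job in jobs:
        for machine in free:
            if matches(job, machine):
                assignments[job["name"]] = machine["name"]
                free.remove(machine)
                break  # stop scanning once this job is placed
    return assignments


machines = [
    {"name": "node1", "memory_mb": 512, "cpus": 1},
    {"name": "node2", "memory_mb": 4096, "cpus": 4,
     "banned_owners": ("guest",)},
]
jobs = [
    {"name": "sim", "owner": "alice",
     "requires": {"memory_mb": 2048, "cpus": 2}},
    {"name": "small", "owner": "guest", "requires": {"memory_mb": 256}},
    {"name": "big", "owner": "bob", "requires": {"memory_mb": 8192}},
]
assignments = matchmake(jobs, machines)
```

Here `sim` lands on `node2`, `small` on `node1`, and `big` stays unassigned because no machine offers enough memory. All decisions pass through one function, which shows both why a centralized manager is easy to manage and why it can become a bottleneck.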
8
Fault Tolerance
  • Check-pointing:
  • At the master (Condor)
  • At each node but collected at the master
    (Catalina)
  • Use a whiteboard (Optimal Grid)
  • Re-execution of faulty worker jobs from the
    beginning (Bayanihan, Optimal Grid)
  • Error code (NetSolve):
  • The user is responsible for handling errors.
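
The difference between check-pointing and re-execution from the beginning can be seen in a small sketch. This is a generic illustration, not the mechanism of Condor or any other system named above: the function `run_with_checkpoint`, the JSON checkpoint format, and the `fail_at` parameter (used only to simulate a crash) are all assumptions.

```python
import json
import os
import tempfile


def run_with_checkpoint(items, path, fail_at=None):
    """Sum items, saving progress after each step so a restart can resume."""
    state = {"next_index": 0, "total": 0}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)  # resume from the last saved checkpoint
    for i in range(state["next_index"], len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated worker crash")
        state["total"] += items[i]
        state["next_index"] = i + 1
        # Write to a temporary file and rename it, so a crash during the
        # write cannot leave a half-written checkpoint behind.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, path)
    return state["total"]


checkpoint_path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
data = [1, 2, 3, 4, 5]
try:
    run_with_checkpoint(data, checkpoint_path, fail_at=3)  # crashes at index 3
except RuntimeError:
    pass
total = run_with_checkpoint(data, checkpoint_path)  # resumes at index 3
```

Deleting the checkpoint file before the second call would turn this into re-execution from the beginning: simpler, but all work done before the crash is repeated.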

9
Security
  • Resources covered with security layers:
      • Legion (Message/MayI layers)
      • Entropia (intercepting all system calls)
  • Use of commodity tools:
      • SSL
      • Public key
      • Security certificates
      • Java sandbox
      • Kerberos

10
NetSolve: http://icl.cs.utk.edu/netsolve/
  • RPC-based approach
  • Clients:
      • Include a set of APIs invoked as (asynchronous)
        RPCs
  • Agents:
      • Match clients' requests for services with servers
  • Servers:
      • Encapsulate remotely accessed numerical libraries

[Figure: NetSolve architecture. Labels: Client, Agent, Network of
servers (Scalar server, MPP servers); arrows: request, choice, reply.]
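
The client/agent/server roles can be sketched in a few lines. This is not the NetSolve API: the classes `Server`, `Agent`, and `Client`, the least-loaded selection rule, and the use of a local thread pool in place of real network RPCs are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class Server:
    """Encapsulates a set of numerical routines behind a call interface."""

    def __init__(self, name, services):
        self.name = name
        self.services = services  # service name -> callable
        self.load = 0             # number of requests assigned so far

    def call(self, service, *args):
        return self.services[service](*args)


class Agent:
    """Matches a client's request for a service with a suitable server."""

    def __init__(self, servers):
        self.servers = servers

    def choose(self, service):
        candidates = [s for s in self.servers if service in s.services]
        if not candidates:
            raise LookupError(f"no server offers {service!r}")
        chosen = min(candidates, key=lambda s: s.load)
        chosen.load += 1
        return chosen


class Client:
    """Issues asynchronous calls; each call returns a future immediately."""

    def __init__(self, agent):
        self.agent = agent
        self.pool = ThreadPoolExecutor(max_workers=4)

    def call_async(self, service, *args):
        server = self.agent.choose(service)
        return self.pool.submit(server.call, service, *args)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


scalar = Server("scalar", {"dot": dot})
mpp = Server("mpp", {"dot": dot, "total": sum})
client = Client(Agent([scalar, mpp]))

dot_future = client.call_async("dot", [1, 2, 3], [4, 5, 6])
total_future = client.call_async("total", [1, 2, 3])
dot_result = dot_future.result()      # blocks until the reply arrives
total_result = total_future.result()
client.pool.shutdown()
```

The client never names a server. It asks the agent, receives a choice, and then waits on the reply only when it needs the value, which is the point of the asynchronous calls.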
11
Legion: http://legion.virginia.edu/
  • Legion classes:
      • Act as managers and make policy
  • Core objects:
      • Provide mechanisms that classes use to implement
        policies: hosts (processors), vaults (memory),
        contexts, binding agents, etc.
  • Per-program scheduling:
      • Participating sites can enforce their local
        policies.
      • Users can choose a scheduling policy.

[Figure: Legion scheduling. Labels: Prog, Scheduler, Enactor,
Resource database, Collection, Resources (Class, Host, tty);
arrows: request, search, reserve. Names are converted to Legion
object IDs by context objects and to Legion object addresses by
binding agents.]
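
The two conversions named in the figure (name to Legion object ID via context objects, then object ID to object address via binding agents) amount to a two-step lookup. The sketch below illustrates only that idea. The class names, the ID and address formats, and the method names are hypothetical and are not taken from Legion.

```python
class ContextObject:
    """Maps human-readable names to object IDs (ID format is made up)."""

    def __init__(self):
        self._ids = {}

    def bind(self, name, object_id):
        self._ids[name] = object_id

    def lookup(self, name):
        return self._ids[name]


class BindingAgent:
    """Maps object IDs to the addresses where the objects currently live."""

    def __init__(self):
        self._addresses = {}

    def register(self, object_id, address):
        self._addresses[object_id] = address

    def resolve(self, object_id):
        return self._addresses[object_id]


def locate(name, context, binder):
    """Resolve a name in two steps: name -> object ID -> address."""
    object_id = context.lookup(name)
    return binder.resolve(object_id)


context = ContextObject()
binder = BindingAgent()
context.bind("/hosts/alpha", "loid-0001")
binder.register("loid-0001", "10.0.0.5:7000")

address = locate("/hosts/alpha", context, binder)

# If the object migrates, only the binding changes; the name and ID stay.
binder.register("loid-0001", "10.0.0.9:7000")
new_address = locate("/hosts/alpha", context, binder)
```

Separating the two steps means an object can move without invalidating the names or IDs that programs already hold.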
12
Condor: http://www.cs.wisc.edu/condor/
[Figure legend: A = User's local agent, R = Each computer
resource, M = Central manager]
I/O is forwarded to the user's home.
13
AgentTeamwork at UWB: Architecture
14
Paper Review by Students
  • Globus
  • Legion
  • Condor
  • NetSolve
  • Discussions:
      • What programming or execution model is each
        system based on?
      • What resource allocation and scheduling algorithm
        does each system use?
      • Are they fault-tolerant?
      • Did they implement any special security features
        of their own?