P1252108901utFca - PowerPoint PPT Presentation

About This Presentation
Title:

P1252108901utFca

Description:

Hi-throughput computing and Condor. Resource Management in distributed ... Condor ... Work outside the Condor kernel- New challenges. Mulitlateral Matchmaking ... – PowerPoint PPT presentation

Slides: 26
Provided by: MironL

Transcript and Presenter's Notes

1
Scheduling and Resource Management in Distributed Systems
Rajesh Rajamani, raj_at_cs.wisc.edu
http://www.cs.wisc.edu/condor
May 2001
2
Outline
  • High-throughput computing and Condor
  • Resource Management in distributed systems
  • Matchmaking
  • Current research/Misc.

3
Power of Computing environments
  • Power = Work / Time
  • High Performance Computing (HPC)
  • Fixed amount of work: how much time?
  • Traditional performance metrics: FLOPS, MIPS
  • Response time/latency oriented
  • High Throughput Computing (HTC)
  • Fixed amount of time: how much work?
  • Application-specific performance metrics
  • Throughput oriented

4
In other words
  • HPC - Enormous amounts of computing power over
    relatively short periods of time
  • Good for applications under sharp time
    constraints
  • HTC - Large amounts of computing power for
    lengthy periods
  • Example: what if you want to simulate 1000
    applications on your latest DSP chip design over
    the next 3 months?

5
The Condor Project
  • Goal - To develop, implement, deploy, and
    evaluate mechanisms and policies that support
    High Throughput Computing (HTC) on large
    collections of distributively owned computing
    resources

6
More about Condor
  • Started in the late 1980s
  • Principal Investigator - Prof. Miron Livny
  • Latest released version: 6.3.0
  • Supports 14 different platforms (OS and
    architecture combinations), including Linux,
    Solaris and WinNT
  • Currently employs over 20 students and 5 staff
  • We write code, debug, port, publish papers and,
    yes, we also provide support!

7
Distributed ownership of resources
  • Underutilized - 70% of CPU cycles in a cluster go
    to waste
  • Fragmented - Resources are owned by different people
  • Use these resources to provide HTC, but without
    impacting the QoS available to the owner
  • Achieved by allowing the resource owner to set an
    access policy using control expressions

8
Access policy
  • Current state of the resource (e.g., keyboard idle
    for 15 minutes or load average less than 0.2)
  • Characteristics of the request (e.g., run only jobs
    of research associates)
  • Time of day/night that jobs can be run
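
The three kinds of conditions above can be combined into a single predicate that the owner's machine evaluates for each request. The following is a minimal sketch, not Condor's actual policy syntax: the class names, field names, and the 18:00 to 08:00 night window are assumptions made for illustration, while the 15-minute and 0.2 thresholds come from the slide.

```python
from dataclasses import dataclass


@dataclass
class MachineState:
    keyboard_idle_s: int  # seconds since the last keyboard activity
    load_avg: float       # current load average
    hour: int             # local hour of day, 0-23


@dataclass
class Request:
    owner: str
    is_research_associate: bool


def owner_allows(state: MachineState, req: Request) -> bool:
    """Hypothetical owner policy combining the three kinds of conditions."""
    # Current state of the resource (thresholds taken from the slide).
    resource_idle = state.keyboard_idle_s >= 15 * 60 or state.load_avg < 0.2
    # Characteristics of the request.
    requester_ok = req.is_research_associate
    # Time of day (the 18:00-08:00 window is an assumption).
    night_time = state.hour >= 18 or state.hour < 8
    return resource_idle and requester_ok and night_time


night_idle = MachineState(keyboard_idle_s=20 * 60, load_avg=0.5, hour=22)
print(owner_allows(night_idle, Request("raman", True)))
```

An owner who wants a looser policy would replace the final `and` chain with whatever combination suits them; the point is only that the policy is an expression over machine state and request attributes.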

9
What happens when you submit a job
[Diagram: Submitting machine, Central Manager, Available resource]
  • Resources announce their properties periodically
  • 1. User submits a job
  • 2. Submitting machine sends the Classad of the job
  • 3. Matchmaker notifies the parties of a match
  • 4. Parties negotiate
10
Important Mechanisms
Mechanism        For
Matchmaking      Resource management
Checkpointing    Saving the state of a job
Bypass           Remote system calls
DAGMan           Automatic job submission based on a dependency graph
Master-Worker    Exploiting task-level parallelism
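
Submission driven by a dependency graph can be illustrated with a topological sort: a job is submitted only after every job it depends on. This is a minimal sketch using Python's standard `graphlib` module, not DAGMan's own input format or implementation; the job names A to D and their dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each job maps to the set of jobs
# that must finish before it can be submitted.
deps = {
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}


def submission_order(graph):
    """Return one valid order in which the jobs could be submitted."""
    return list(TopologicalSorter(graph).static_order())


order = submission_order(deps)
print(order)  # A comes first and D comes last; B and C may be in either order
```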
11
Condor Architecture
  • Manager
  • Collector: database of resources
  • Negotiator: matchmaker
  • Accountant: priority maintenance
  • Startds (represent owners of resources)
  • Implement the owner's access control policy
  • Schedds (represent customers of the system)
  • Maintain persistent queues of resource requests

12
Condor Architecture, cont.
13
Power of Condor
  • Solved the NUG30 quadratic assignment problem,
    posed in 1968, over a period of 6.9 days,
    delivering over 96,000 CPU hours by commandeering
    an average of 650 machines!
  • Compare this with the RSA-155 problem, posed in
    1977 and solved using 300 computers (over a
    period of 7 months) in the late 90s. If you were
    to use the same amount of resources as that used
    to solve NUG30, this could have been done in 2
    weeks!
  • "It (Chorus production) was done in parallel on
    machines in the computer center running XXX, and
    on the office machines under Condor. The latter
    did about 90% of the work!"
  • - Helge MEINHARD (EP division, CERN)

14
Resource management using Matchmaking
  • Opportunistic Resource Exploitation
  • Resource availability is unpredictable
  • Exploit resources as soon as they are available
  • Matchmaking performed continuously
  • In contrast to a centralized scheduler, which
    would have to deal with:
  • Heterogeneity of resources
  • Distributed Ownership - widely varying allocation
    policies
  • Dynamic nature of the cluster

15
Classified Advertisements
  • A simple language used by resource providers and
    customers to express their properties/requirements
    to the Collector
  • Uses a semi-structured data model, so no specific
    schema is required by the matchmaker, allowing it
    to work naturally in a heterogeneous environment
  • The language folds the query language into the data
    model: constraints may be expressed as attributes
    of the classad
  • Classads should conform to the advertising protocol

16
Matchmaking with Classads
  • Four steps to managing resources:
  • 1. Parties requiring matchmaking advertise their
    characteristics, preferences, constraints, etc.
  • 2. Advertisements are matched by a Matchmaker
  • 3. Matched entities are notified
  • 4. Matched entities establish an allocation through
    a claiming process, which could include
    authentication, constraint verification,
    negotiation of terms, etc.
  • The method is symmetric

17
Classad example
  • Sample classad of a Job
  • Type = "Job"
  • Owner = "run_sim"
  • Constraint =
    other.Type == "Machine" &&
    Arch == "INTEL" &&
    OpSys == "Solaris251" &&
    other.Memory >= Memory
  • Sample classad of a workstation
  • Type = "Machine"
  • OpSys = "Linux"
  • Arch = "INTEL"
  • Memory = 256M
  • Constraint = true

18
Example Classad (workstation)
  • Type = "Machine"
  • Activity = "Idle"
  • Name = "crow.cs.wisc.edu"
  • Arch = "INTEL"
  • OpSys = "Solaris251"
  • Kflops = 21893
  • Memory = 64
  • Disk = 323496 // KB
  • DayTime = 36107

19
Example Classad (contd.)
  • ResearchGrp = { "miron", "thain", "john" }
  • Untrusted = { "bgates", "lalooyadav", "thief" }
  • Rank = member(other.Owner, ResearchGrp) * 10
  • Constraint = !member(other.Owner, Untrusted) &&
    Rank >= 10 ? true : false // To prevent
    malicious users
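
To see what this workstation's policy admits, the Rank and Constraint expressions can be modeled as plain functions. This is a minimal sketch, assuming Rank multiplies the membership test by 10 and Constraint requires an owner who is not in Untrusted and whose Rank is at least 10; the Python function names are illustrative and are not part of the classad language.

```python
RESEARCH_GRP = {"miron", "thain", "john"}
UNTRUSTED = {"bgates", "lalooyadav", "thief"}


def member(value, group):
    """Return 1 if value is in group, else 0, so it can be used in arithmetic."""
    return 1 if value in group else 0


def rank(job):
    # Assumed reading: Rank = member(other.Owner, ResearchGrp) * 10
    return member(job["Owner"], RESEARCH_GRP) * 10


def constraint(job):
    # Assumed reading: !member(other.Owner, Untrusted) && Rank >= 10
    return (not member(job["Owner"], UNTRUSTED)) and rank(job) >= 10


for owner in ("miron", "bgates", "raman"):
    print(owner, constraint({"Owner": owner}))
```

Under this reading, an owner outside both lists is rejected as well, because their Rank is 0.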

20
Example Classad (Submitted job)
  • Type = "Job"
  • QDate = 886799469
  • Owner = "raman"
  • Cmd = "run_sim"
  • Iwd = "/usr/raman/sim2"
  • Memory = 31
  • Rank = Kflops/1e3 + other.Memory/32
  • Constraint = other.Type == "Machine" &&
    OpSys == "Solaris251" && Disk >= 10000 &&
    other.Memory >= self.Memory

21
Matchmaking
  • Evaluates expressions in an environment that
    allows each classad to access attributes of the
    other
  • other.Memory >= self.Memory
  • A reference to a non-existent attribute evaluates to
    undefined
  • Considers pairs of ads incompatible unless their
    Constraint expressions both evaluate to true
  • Rank is then used to choose among compatible
    matches
  • Both parties are notified about the match - could
    generate and hand-off session key for
    authentication and security
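
The evaluation rules above can be sketched in a few lines. This is an illustrative model rather than the Condor matchmaker: ads are Python dictionaries, constraints are functions of (self, other), and `UNDEFINED`, `attr`, `conj`, `compatible`, and `best_match` are hypothetical names. The job's constraint and rank follow the submitted-job example, reading its comparisons as "at least" and its rank as a sum; the second and third machines are invented for the example, and every machine's own constraint is assumed to be `true`.

```python
UNDEFINED = object()  # stand-in for the classad "undefined" value


def attr(ad, name):
    """Look up an attribute; a missing attribute evaluates to UNDEFINED."""
    return ad.get(name, UNDEFINED)


def eq(a, b):
    return UNDEFINED if a is UNDEFINED or b is UNDEFINED else a == b


def ge(a, b):
    return UNDEFINED if a is UNDEFINED or b is UNDEFINED else a >= b


def conj(*terms):
    """Logical AND that stays UNDEFINED unless some term is definitely False."""
    if any(t is False for t in terms):
        return False
    if any(t is UNDEFINED for t in terms):
        return UNDEFINED
    return True


def compatible(a, b):
    # Both Constraint expressions must evaluate to exactly True.
    return a["Constraint"](a, b) is True and b["Constraint"](b, a) is True


def best_match(job, machines):
    """Pick the compatible machine that the job ranks highest, or None."""
    candidates = [m for m in machines if compatible(job, m)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: job["Rank"](job, m))


def job_constraint(me, other):
    return conj(
        eq(attr(other, "Type"), "Machine"),
        eq(attr(other, "OpSys"), "Solaris251"),
        ge(attr(other, "Disk"), 10000),
        ge(attr(other, "Memory"), attr(me, "Memory")),
    )


def job_rank(me, other):
    # Only called on compatible machines, which define Memory here.
    return other["Kflops"] / 1e3 + other["Memory"] / 32


def accept_all(me, other):
    return True


job = {"Type": "Job", "Owner": "raman", "Memory": 31,
       "Constraint": job_constraint, "Rank": job_rank}
crow = {"Type": "Machine", "Name": "crow.cs.wisc.edu", "OpSys": "Solaris251",
        "Kflops": 21893, "Memory": 64, "Disk": 323496, "Constraint": accept_all}
slow = {"Type": "Machine", "Name": "slow.example", "OpSys": "Solaris251",
        "Kflops": 9000, "Memory": 128, "Disk": 500000, "Constraint": accept_all}
no_memory = {"Type": "Machine", "Name": "nomem.example", "OpSys": "Solaris251",
             "Kflops": 50000, "Disk": 500000, "Constraint": accept_all}

match = best_match(job, [no_memory, slow, crow])
print(match["Name"])
```

The machine without a Memory attribute is never matched, even though it has the highest Kflops, because the job's constraint evaluates to undefined rather than true.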

22
Separation of Matching and Claiming
  • Weak consistency requirements - Claiming allows
    provider and customer to verify their constraints
    with respect to their current state
  • Claiming protocol could use cryptographic
    techniques (authentication)
  • Principals involved in a match are themselves
    responsible for establishing, maintaining and
    servicing a match
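
Because the matchmaker works from advertisements that may be stale, the claiming step re-evaluates the constraints against the provider's current state. The sketch below illustrates that idea only; it is not the Condor claiming protocol, it omits authentication, and the `claim` function, the `LoadAvg` attribute name, and the sample values are assumptions.

```python
def claim(job_ad, machine_ad, current_state):
    """Re-check both constraints using the machine's current state.

    machine_ad is what the matchmaker saw; current_state holds attribute
    values that have changed since the ad was sent.
    """
    fresh = {**machine_ad, **current_state}
    return bool(fresh["Constraint"](fresh, job_ad)
                and job_ad["Constraint"](job_ad, fresh))


machine = {
    "Type": "Machine",
    "Memory": 64,
    "LoadAvg": 0.05,
    # Owner policy: only accept jobs while the load average is below 0.2.
    "Constraint": lambda me, other: me["LoadAvg"] < 0.2,
}
job = {
    "Type": "Job",
    "Memory": 31,
    "Constraint": lambda me, other: other["Memory"] >= me["Memory"],
}

print(claim(job, machine, {}))                # state unchanged since the ad
print(claim(job, machine, {"LoadAvg": 1.5}))  # machine became busy meanwhile
```

Keeping this check with the two principals is what lets the matchmaker get by with weak consistency: a match made on outdated information is simply refused at claim time.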

23
Work outside the Condor kernel - New challenges
  • Multilateral Matchmaking - Gangmatching
  • I/O regulation and Disk allocation - Kangaroo
  • User interfaces - ClassadView
  • Grid applications - Globus
  • Security

24
Summary
  • Matchmaking provides a scalable and robust
    resource management solution for HTC environments
  • Classads are used by workstations and jobs
  • Matchmaker forms the match and informs the
    parties, who in turn invoke the claiming protocol
  • The parties are responsible for establishing,
    maintaining and servicing a match
  • Questions?

25
Gangmatch request
  • Type = "Job"
  • Owner = "raj"
  • Cmd = "run_sim"
  • Ports = {
      [ Label = cpu;
        ImageSize = 28M;
        // Rank and constraints
      ],
      [ Label = License;
        Host = cpu.Name;
        // Rank and constraints
      ]
    }
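
A gangmatch request needs one ad per port, and a later port may refer to an ad already bound to an earlier one (here, the license's host must be the chosen cpu). One way to picture this is a small backtracking search. The sketch below is an assumption-laden illustration, not the Gangmatching algorithm itself: the `gangmatch` function, the port constraints, and the sample ads are hypothetical, since the slide elides the actual rank and constraint expressions.

```python
def gangmatch(ports, ads, bound=None):
    """Bind each port to a distinct ad so that every port constraint holds.

    Returns a dict mapping port label to ad, or None if no binding exists.
    """
    bound = bound or {}
    if len(bound) == len(ports):
        return bound
    port = ports[len(bound)]
    for ad in ads:
        if any(ad is used for used in bound.values()):
            continue  # an ad can fill only one port
        if port["Constraint"](ad, bound):
            result = gangmatch(ports, ads, {**bound, port["Label"]: ad})
            if result is not None:
                return result
    return None  # no ad fits this port; the caller backtracks


ports = [
    # Assumed constraint: a machine with enough memory for the 28M image.
    {"Label": "cpu",
     "Constraint": lambda ad, bound: ad.get("Type") == "Machine"
     and ad.get("Memory", 0) >= 28},
    # Assumed constraint: a license valid on the machine bound to "cpu".
    {"Label": "License",
     "Constraint": lambda ad, bound: ad.get("Type") == "License"
     and ad.get("Host") == bound["cpu"]["Name"]},
]

ads = [
    {"Type": "Machine", "Name": "a.example", "Memory": 64},
    {"Type": "Machine", "Name": "b.example", "Memory": 64},
    {"Type": "License", "Host": "b.example"},
]

result = gangmatch(ports, ads)
print(result["cpu"]["Name"])
```

The search first tries `a.example` for the cpu port, finds no license for it, and backtracks to `b.example`, which is the extra work that a purely bilateral matchmaker does not have to do.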