SProj 3 - PowerPoint PPT Presentation

About This Presentation
Title:

SProj 3

Description:

Setting up the PBS Cluster. Installation of Linux with Windows. Installation of SGE as well as PBS. Setting up a Network File ... Default FIFO Scheduler in PBS ... – PowerPoint PPT presentation

Slides: 52
Provided by: jahan5
Tags: pbs | sproj

Transcript and Presenter's Notes

Title: SProj 3


1
SProj 3
  • Libra: An Economy-Driven Cluster Scheduler
  • Jahanzeb Sherwani
  • Nosheen Ali
  • Nausheen Lotia
  • Zahra Hayat
  • Project Advisor/Client: Rajkumar Buyya
  • Faculty Advisor: Dr. Arif Zaman

2
Problem Statement
  • Implementing a computational-economy based
    user-centric scheduler for clusters

3
What is a cluster?
  • A collection of workstations interconnected via a
    network technology, in order to take advantage of
    combined computational power and resources
  • An integrated collection of resources that can
    provide a single system image spanning all its
    nodes: a virtual supercomputer
  • Used for computation-intensive applications such
    as AI expert systems, nuclear simulations, and
    scientific calculations

4
Why clusters?
  • Cost-effectiveness: low cost-performance ratio
    compared to a specialized supercomputer
  • Increase in workstation performance
  • Increase in network bandwidth
  • Decrease in network latency
  • Scalability: higher than that of a specialized
    supercomputer
  • Easier to integrate into an existing network than
    specialized supercomputers

5
Computational Economy
  • Traditional system-centric performance metrics:
  • CPU Throughput
  • Mean Response Time
  • Shortest Job First
  • Computational economy is the inclusion of
    user-specified quality of service parameters with
    jobs so that resource management is user-centric
    rather than system-centric

6
Computational Economy (contd)
  • Project focus: to implement a scheduler that aims
    to maximize user utility
  • Job parameters most relevant to user-centric
    scheduling:
  • Budget allocated to job by user
  • Deadline specified by user

7
Computational Economy for Grids
  • What is a grid?
  • An infrastructure that couples resources such as
    computers (workstations or clusters), software
    (for special purpose applications) and devices
    (printers, scanners) across the Internet and
    presents them as a unified integrated single
    resource that can be widely used
  • How a grid differs from a cluster:
  • Wide geographical area
  • Non-dedicated resources
  • No centralized resource management

8
Computational Economy for Grids
  • Managing resources and scheduling computations
    in a grid environment is complex because the
    resources are:
  • geographically distributed
  • heterogeneous in nature
  • owned by different individuals or organizations
  • subject to different access and cost models
  • Resource discovery is required
  • Security issues must be addressed
  • Computational economy has been implemented for
    grids: the Nimrod/G resource broker is a global
    resource management and scheduling system that
    supports deadline and economy-based computations
    in grid-computing environments

9
Computational Economy for Clusters
  • "Market-based Proportional Resource Sharing for
    Clusters", Brent Chun and David E. Culler,
    University of California at Berkeley, Computer
    Science Division
  • Presents a market-based approach, built on the
    notion of a computational economy, that optimizes
    for user value. It describes an architecture for
    market-based cluster resource management based on
    the idea of proportional resource sharing of
    basic computing resources. Cluster nodes act as
    independent sellers of computing resources, while
    user applications act as buyers who purchase
    resources. Users are allocated credits/tickets:
    the more tickets they have, the greater their CPU
    share. Tickets are allocated on the basis of the
    amount the user is willing to pay, i.e. the
    user's valuation of the job
  • Deadline not incorporated
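
The ticket mechanism summarized on this slide can be illustrated with a short Python sketch. It is not taken from the Chun and Culler system: the function name and the sample ticket counts are hypothetical, and the sketch assumes that a user's CPU share on a node is simply that user's tickets divided by all tickets held on the node.

```python
def cpu_shares(tickets):
    """Map each user to the fraction of one node's CPU they receive.

    Assumption (not from the slides): share = user's tickets / total tickets.
    """
    total = sum(tickets.values())
    if total <= 0:
        raise ValueError("at least one user must hold tickets")
    return {user: count / total for user, count in tickets.items()}


# Hypothetical ticket holdings: users who value their jobs more buy more tickets.
shares = cpu_shares({"alice": 60, "bob": 30, "carol": 10})
print(shares)
```

Note that nothing in this allocation refers to a deadline, which is the gap the slide points out.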

10
Cluster Architecture
11
Cluster Management Software
  • Cluster Management Software is designed to
    administer and manage application jobs submitted
    to workstation clusters.
  • Creates a Single System Image
  • When a collection of interconnected computers
    appears to be a unified resource, we say it
    possesses a Single System Image
  • The benefit of a Single System Image is that the
    exact location of the execution of a process is
    entirely concealed from the user. The user is
    offered the illusion of a single powerful
    computer
  • Maintains centralized information about cluster
    status and resources

12
Cluster Management Software
  • Commercial and Open-source Cluster Management
    Software
  • Open-source Cluster Management Software:
  • DQS (Distributed Queuing System)
  • CONDOR
  • GNQS (Generalized Network Queuing System)
  • MOSIX
  • REXEC (Remote Execution)
  • SGE (Sun Grid Engine)
  • PBS (Portable Batch System)

13
Cluster Management Software
  • Why SGE was rejected:
  • lack of online support
  • lack of stability
  • Final choice of CMS: PBS (Portable Batch System)

14
Pricing the Cluster Resources
  • Cost = a × (Job Execution Time) + b × (Job
    Execution Time / Deadline)
  • Cost of using the cluster depends on job length
    and job deadline: the longer the user is prepared
    to wait for the results, the lower his cost
  • Cost formula forces user to reveal his true
    deadline
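
To make the pricing rule concrete, here is a minimal Python sketch. It assumes the two terms of the cost formula are added together, and the coefficient values a = 1.0 and b = 1.0 are placeholders chosen for illustration, not values from the project.

```python
def job_cost(execution_time, deadline, a=1.0, b=1.0):
    """Cost = a * E + b * (E / D).

    E is the job execution time and D the deadline, in the same time unit.
    The default coefficients a and b are hypothetical.
    """
    if execution_time <= 0 or deadline <= 0:
        raise ValueError("execution_time and deadline must be positive")
    return a * execution_time + b * (execution_time / deadline)


# Same job length, two different deadlines.
relaxed = job_cost(execution_time=100, deadline=400)  # user can wait longer
urgent = job_cost(execution_time=100, deadline=100)   # user needs it sooner
print(relaxed, urgent)
```

With these placeholder coefficients the relaxed job costs less than the urgent one, which matches the slide's point that a longer wait lowers the cost.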

15
Scheduling Algorithm
  • How to meet budget and deadline constraints?
  • Ensuring low run-time for the algorithm
  • Greedy Algorithm
  • Complex solutions infeasible
  • Test run of algorithm
  • 5 jobs, arriving at times t = 0, 5, 7, 9, 9, on a
    3-node cluster

16
LIBRA with PBS
  • Portable Batch System (PBS) as the Cluster
    Management Software (CMS)
  • Robust, portable, effective, extensible batch job
    queuing and resource management system
  • Supports different schedulers
  • Job accounting
  • Technical Support

17
Setting up the PBS Cluster
  • Installation of Linux with Windows
  • Installation of SGE as well as PBS
  • Setting up a Network File System
  • Configuring GridSim in Java
  • Configuring PBSWeb
  • Setting up the Apache WebServer
  • PHP scripting for Apache
  • Setting up PostgreSQL
  • Setting up SSH

18
PBS Overview
  • Main components of PBS:
  • Job Server: pbs_server
  • Job Scheduler: pbs_sched
  • Job Executor and Resource Monitor: pbs_mom
  • The server accepts commands and communicates with
    the daemons
  • qsub - submit a job
  • qstat - view queue and job status
  • qalter - change a job's attributes
  • qdel - delete a job
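
The four commands above could be driven from a script roughly as follows. This is only a sketch: the helper names, the script name `job.sh`, and the job id `42.server` are hypothetical, and the only options used are `-N` (job name) and `-l walltime=...` (resource request). The `run` helper needs a host with the PBS client tools installed, so the example only builds and prints the command lines.

```python
import subprocess


def qsub_cmd(script, name=None, walltime=None):
    """Build the argument list for submitting a job script."""
    cmd = ["qsub"]
    if name:
        cmd += ["-N", name]
    if walltime:
        cmd += ["-l", f"walltime={walltime}"]
    cmd.append(script)
    return cmd


def qstat_cmd(job_id=None):
    """Build the argument list for viewing queue or job status."""
    return ["qstat"] + ([job_id] if job_id else [])


def qalter_cmd(job_id, walltime):
    """Build the argument list for changing a queued job's walltime."""
    return ["qalter", "-l", f"walltime={walltime}", job_id]


def qdel_cmd(job_id):
    """Build the argument list for deleting a job."""
    return ["qdel", job_id]


def run(cmd):
    """Execute a command and return its output (requires PBS on this host)."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


submit = qsub_cmd("job.sh", name="demo", walltime="00:10:00")
print(" ".join(submit))
print(" ".join(qalter_cmd("42.server", "00:20:00")))
```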

19
Xpbs --- GUI for PBS
20
Xpbs --- GUI for PBS
21
Job Scheduling in PBS
22
The Libra Scheduler
  • Default FIFO Scheduler in PBS
  • FIFO - sort jobs by queuing time, running the
    earliest job first
  • Fair share - schedule jobs based on past
    usage of the machine by the job owners
  • Round-robin - pick a job from each queue
  • By key - sort jobs by a set of keys, e.g.
    shortest_job_first, smallest_memory_first
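
The orderings listed on this slide can be sketched in a few lines of Python. This is an illustration of the ideas only, not PBS code: the `Job` fields and the sample jobs are hypothetical, and fair share is left out because it needs usage history.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass
class Job:
    name: str
    queue: str
    queued_at: int    # time the job entered its queue
    est_runtime: int  # requested run time
    memory_mb: int    # requested memory


def fifo(jobs):
    """Earliest-queued job first."""
    return sorted(jobs, key=lambda j: j.queued_at)


def by_key(jobs, *keys):
    """Sort by one or more attributes, smallest value first."""
    return sorted(jobs, key=lambda j: tuple(getattr(j, k) for k in keys))


def round_robin(jobs):
    """Take one job from each queue in turn, FIFO within a queue."""
    queues = {}
    for job in fifo(jobs):
        queues.setdefault(job.queue, []).append(job)
    order = []
    for layer in zip_longest(*queues.values()):
        order.extend(job for job in layer if job is not None)
    return order


jobs = [
    Job("a", "short", 3, 50, 512),
    Job("b", "long", 1, 500, 2048),
    Job("c", "short", 2, 20, 256),
    Job("d", "long", 4, 300, 1024),
]
print([j.name for j in fifo(jobs)])
print([j.name for j in by_key(jobs, "est_runtime")])  # shortest job first
print([j.name for j in by_key(jobs, "memory_mb")])    # smallest memory first
print([j.name for j in round_robin(jobs)])
```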

23
The Libra Scheduler
  • Job Input Controller
  • Adding parameters at job submission time
  • deadline
  • budget
  • executionTime
  • Defining new attributes of job
  • Job Acceptance and Assignment Controller
  • Budget checked through cost function
  • Admission control through deadline scheduling
  • Execution host with the minimum load and ability
    to finish job on time selected
  • Equal Share instead of Minimum Share
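
The acceptance and assignment steps above might look like the following Python sketch. Several things here are assumptions rather than details from the slides: a job is taken to need a CPU share of executionTime / deadline to finish on time, a node is taken to be able to accept a job while its committed shares stay at or below 1.0, and the cost function uses placeholder coefficients. The sketch does not model the equal-share allocation mentioned in the last bullet.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    shares: list = field(default_factory=list)  # CPU fractions already committed

    @property
    def load(self):
        return sum(self.shares)


def cost(execution_time, deadline, a=1.0, b=1.0):
    """Placeholder cost function; a and b are hypothetical."""
    return a * execution_time + b * execution_time / deadline


def admit(nodes, execution_time, deadline, budget):
    """Return (accepted, node_name, share) for one submitted job."""
    # Budget check through the cost function.
    if cost(execution_time, deadline) > budget:
        return (False, None, 0.0)
    # Share of one CPU needed to finish by the deadline (assumption).
    share = execution_time / deadline
    # Admission control: keep only nodes that can still meet the deadline.
    candidates = [n for n in nodes if n.load + share <= 1.0]
    if not candidates:
        return (False, None, 0.0)
    # Pick the least-loaded node that can finish the job on time.
    best = min(candidates, key=lambda n: n.load)
    best.shares.append(share)
    return (True, best.name, share)


cluster = [Node("n1"), Node("n2")]
r1 = admit(cluster, execution_time=50, deadline=100, budget=1000)
r2 = admit(cluster, execution_time=75, deadline=100, budget=1000)
r3 = admit(cluster, execution_time=50, deadline=100, budget=1000)
r4 = admit(cluster, execution_time=50, deadline=100, budget=1000)
r5 = admit(cluster, execution_time=10, deadline=100, budget=5)
print(r1, r2, r3, r4, r5)
```

In this run the fourth job is rejected because no node has enough spare share left, and the fifth is rejected because its cost exceeds its budget.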

24
The Libra Scheduler
  • Job Execution Controller
  • Job run on the best node according to algorithm
  • Cluster and node status updated
  • runTime
  • cpuLoad
  • Job Querying Controller
  • Server, Scheduler, Exec Host, and Accounting Logs

25
PBS-Libra Web --- Front-end for the Libra Engine
26
PBS-Libra Web
27
PBS-Libra Web
28
PBS-Libra Web
29
PBS-Libra Web
30
PBS-Libra Web
31
PBS-Libra Web
32
PBS-Libra Web
33
PBS-Libra Web
34
(No Transcript)
35
Simulations
  • Goal
  • Measure the performance of Libra Scheduler
  • Performance?
  • Maximize user satisfaction

36
Simulations
  • Simulation Software
  • Alter GridSim (grid resource management
    simulation)

37
GridSim Class Diagram
38
Simulations
  • Methodology
  • Workload
  • 120 jobs with deadlines and budgets
  • Job lengths: 1000 to 10000
  • Resources
  • 10-node, single-processor (MIPS rating 100)
    homogeneous cluster
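
A workload with this shape could be generated as in the Python sketch below. It is not the project's GridSim code. It assumes job lengths are in millions of instructions, so that run time on a dedicated node is length / MIPS rating, and the ranges used for deadlines and budgets are invented purely for illustration.

```python
import random

NUM_JOBS = 120
NUM_NODES = 10
MIPS_RATING = 100  # per node, single processor


def make_workload(seed=0):
    """Generate jobs with a length, a deadline, and a budget."""
    rng = random.Random(seed)
    jobs = []
    for job_id in range(NUM_JOBS):
        length = rng.randint(1000, 10000)  # assumed unit: million instructions
        runtime = length / MIPS_RATING     # run time on a dedicated node
        jobs.append({
            "id": job_id,
            "length": length,
            "runtime": runtime,
            # Hypothetical ranges: deadline is 1.5x to 4x the dedicated run time.
            "deadline": runtime * rng.uniform(1.5, 4.0),
            # Hypothetical ranges: budget is 1x to 3x the dedicated run time.
            "budget": runtime * rng.uniform(1.0, 3.0),
        })
    return jobs


workload = make_workload()
print(len(workload), workload[0])
```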

39
Simulations
  • Assumptions
  • Strict deadlines
  • Ignores processing overhead due to scheduler and
    clock interrupt
  • Scheduler simulated as a function
  • Input: job size, deadline, budget
  • Output: accept/reject, node, share allocated

40
Simulations
  • Compared
  • Proportional Share
  • FIFO
  • Experiments
  • 120 jobs, 10 nodes
  • Increasing workload to 150 and 200
  • Increasing cluster size to 20

41
Simulation Results
  • 120 jobs, 20 did not meet budget

42
100 Jobs, 10 Nodes
FIFO: 23 rejected - Proportional Share: 14 rejected
43
Simulation Results
  • Increase workload to 200 jobs on the same 10 node
    cluster

44
200 Jobs, 10 Nodes
FIFO: 105 rejected - Proportional Share: 93 rejected
45
Simulation Results
  • Scale the cluster up to 20 nodes

46
200 Jobs, 20 Nodes
FIFO: 35 rejected - Proportional Share: 23 rejected
47
Simulation Results
48
Simulation Results
49
Simulation Results
50
Simulation Results
51
Conclusion and Future Work
  • Successfully implemented a Linux-based cluster
    that schedules jobs using PBS with our
    economy-driven Libra scheduler, and PBS-Libra Web
    as the front end.
  • Successfully tested our scheduling policy
  • Proportional Share delivers more value to users
  • Exploring other pricing mechanisms
  • Expanding the cluster with more nodes and with
    support for parallel jobs