1
Scheduling in Hadoop MapReduce
Vinod K V
2
Abstract
  • What is scheduling in the context of MapReduce?
  • Previous scheduling system
  • The present in-framework schedulers

3
Scheduling in MapReduce
  • MapReduce cluster: a JobTracker and a bunch of
    TaskTrackers
  • TaskTrackers run tasks
  • JobTracker manages TaskTrackers and gives tasks
    to them
  • The MapReduce scheduling sub-system decides how
    to run MapReduce jobs
  • Which job's task should be run next?

4
Previous scheduling system
  • Private MapReduce clusters, single DFS
  • Job-scheduling using HadoopOnDemand
  • Users use HOD to reserve a number of nodes on a
    cluster. HOD internally uses Torque/Maui to
    obtain nodes.
  • Once the nodes are reserved, users use HOD to
    submit a job to Hadoop. HOD starts up Hadoop on
    reserved nodes and submits job.
  • Users may use their reserved nodes to run more
    jobs if needed.

5
Benefits
  • There are benefits to this approach
  • Torque/Maui have good features for managing
    cluster resources.
  • Queues
  • Accounting
  • Security
  • Cleanup
  • UI for jobs/queues
  • Isolation: users' jobs run on their own clusters.

6
Limitations
  • MapReduce jobs are elastic, but reservations are
    static
  • Users mostly pick arbitrary node counts.
  • When reserving nodes, Torque does not look at
    data locality. This adversely affects Hadoop's
    performance.
  • Nodes are marked against queues!
  • Reserving nodes separately before submitting a
    Hadoop job causes a 30% increase in a job's run
    time (the 'HOD tax').
  • Nodes are reserved for the whole job. Most jobs
    have many more maps than reducers. When maps
    finish, nodes can sit idle waiting for the
    reducers to finish. This adversely affects
    utilization.
  • Even after jobs finish, users do not deallocate
    clusters!!

7
Why not modify Torque/Maui?
  • Tweaking Torque / Maui was hard
  • Poor documentation
  • Difficult to modify code
  • Different requirements: our niche requirements
    weren't supported well in the community
  • Hard for us to control development, quality,
    patches, roadmaps

8
Task-scheduling
  • Scheduling tasks instead of jobs.

9
Design: Pluggable Schedulers in Hadoop
  • Schedulers became a pluggable component in Hadoop
    0.19
  • Three schedulers
  • Default scheduler
  • Single priority based queue of jobs
  • Scheduling tries to balance map and reduce load
    on all tasktrackers in the cluster
  • Fair Scheduler
  • Built by Facebook
  • Multiple queues (pools) of jobs sorted in FIFO
    or by fairness limits
  • Each pool is guaranteed a minimum capacity and
    excess is shared by all jobs using a fairness
    algorithm
  • Scheduler tries to ensure that, over time, all
    jobs receive an equal share of resources
  • Capacity Scheduler
  • Yahoo!'s scheduler
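
A rough sketch of this plugin point, assuming the 0.19-era API (the class name MyScheduler is hypothetical, and the exact signatures should be checked against the release sources):

```java
// Minimal pluggable-scheduler skeleton (a sketch, not the real thing).
// TaskScheduler and its collaborators are package-private in this era of
// Hadoop, which is why contrib schedulers live in org.apache.hadoop.mapred.
package org.apache.hadoop.mapred;

import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class MyScheduler extends TaskScheduler {

  @Override
  public List<Task> assignTasks(TaskTrackerStatus tracker) throws IOException {
    // Called on every TaskTracker heartbeat: decide which tasks, if any,
    // this tracker should run next; an empty list assigns nothing.
    return Collections.emptyList();
  }

  @Override
  public Collection<JobInProgress> getJobs(String queueName) {
    // Used by the queue CLI and web UI to list a queue's jobs.
    return Collections.emptyList();
  }
}
```

The JobTracker is then pointed at the implementation by setting mapred.jobtracker.taskScheduler to the class's fully qualified name in the site configuration.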

10
Design: Interaction between M/R framework and
schedulers
[Diagram: the framework drives the scheduler through
two entry points, submitJob and the TaskTracker
Heartbeat; the scheduler maintains queues of jobs.]
Per-queue state:
  • Waiting queue and running queue of jobs
  • Sorted by priority and submission time
  • Guaranteed capacity
  • Current capacity
  • ReclamationInfo
  • UserInfo
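
A plain-Java illustration of this per-queue state (all field names are hypothetical, not the scheduler's actual members):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative per-queue bookkeeping: two job lists plus the capacity,
// reclamation, and per-user accounting consulted on each heartbeat.
class QueueState {
  final String name;
  final float guaranteedCapacityPercent;   // share of cluster slots promised
  int currentCapacity;                     // slots the queue holds right now
  long reclaimDeadlineMs;                  // when lent-out slots must return
  final Deque<String> waitingJobs = new ArrayDeque<>(); // by priority, then submit time
  final Deque<String> runningJobs = new ArrayDeque<>();
  final Map<String, Integer> runningTasksPerUser = new HashMap<>(); // for user limits

  QueueState(String name, float guaranteedCapacityPercent) {
    this.name = name;
    this.guaranteedCapacityPercent = guaranteedCapacityPercent;
  }
}
```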

11
CapacityTaskScheduler

12
Agenda
  • Motivation
  • Features
  • Design Details
  • Roadmap
  • Questions

13
Motivation
  • Stack outside Yahoo! (pre-Hadoop 0.19)
  • Single static M/R cluster on a physical cluster,
    and shared DFS
  • Default scheduling: a single priority-based queue
    of jobs, focused on load balancing
  • Limitations of this model
  • Sharing a cluster is complex: guarantees for
    jobs, fairness expectations, high-utilization
    needs
  • Minimal administrative support for shared
    environments
  • Broadly
  • Need better features for usage in a shared
    environment
  • Without compromising on core M/R benefits (data
    locality, elasticity, etc.)

14
Agenda
  • Motivation
  • Features
  • Design Details
  • Roadmap
  • Questions

15
Support for multiple Organizations and Queues
  • Organizations (orgs) are distinct entities for
    administration, billing, etc.
  • Users belong to orgs
  • A user can belong to more than one org
  • Orgs represented by a queue of jobs
  • Conceptually there may be more: a queue for
    regular jobs, a queue for short-running jobs, etc.
  • Users submit jobs to queues
  • Single-step process: submit a Hadoop job to a
    static M/R cluster
  • No need to specify number of nodes

16
Capacity
  • Queues have a capacity
  • Count of the number of slots guaranteed for a
    queue
  • Not a pre-defined set of nodes
  • Expressed as a percentage of the cluster capacity
  • Jobs in a queue get this capacity when there is
    demand
  • Sum of capacities is equal to Grid capacity
  • For better utilization, capacities can be
    distributed
  • Unused capacity of a queue can be used by other
    queues
  • But it is reclaimed, as tasks complete, when the
    owning queues need their capacity back
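
A toy example of the capacity arithmetic (invented numbers): because capacity is a percentage, the slot guarantee tracks the current cluster size rather than a fixed node set.

```java
// Guaranteed slots per queue, derived from the live cluster slot count.
public class CapacityMath {
  public static void main(String[] args) {
    int clusterMapSlots = 1000;              // grows/shrinks with the cluster
    float[] queuePercent = {50f, 30f, 20f};  // must sum to 100
    for (int i = 0; i < queuePercent.length; i++) {
      int guaranteed = (int) (clusterMapSlots * queuePercent[i] / 100);
      System.out.println("queue" + i + " guaranteed " + guaranteed + " map slots");
    }
  }
}
```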

17
Priorities
  • Queues support priorities for jobs
  • Same priority levels as with default scheduler
  • Can turn off priority support per queue
  • Users can specify / change priority of their jobs
  • Priorities dictate order in which scheduler looks
    at jobs
  • Behavior similar to standard Hadoop
  • No task level preemption
  • But higher-priority jobs jump ahead of
    lower-priority ones
  • Job priorities are not compared across queues
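
An illustrative comparator for this per-queue ordering (a sketch; the field names and the priority encoding are assumptions, not the scheduler's types):

```java
import java.util.Comparator;

// Higher priority first; within a priority, earlier submission first.
class Job {
  int priority;     // larger means higher priority (assumption for this sketch)
  long submitTime;  // epoch millis at submission
}

class JobOrder implements Comparator<Job> {
  @Override
  public int compare(Job a, Job b) {
    if (a.priority != b.priority) {
      return Integer.compare(b.priority, a.priority); // higher priority first
    }
    return Long.compare(a.submitTime, b.submitTime);  // then FIFO by arrival
  }
}
```

Because each queue sorts its own job list with an ordering like this, priorities never need to be compared across queues.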

18
Fairness (user limits)
  • Queues have a minimum user limit
  • Number of slots that a single user can use per
    queue
  • Expressed as a percentage
  • Users are restricted to this amount only when
    there is competing demand
  • No preemption to enforce limits
  • Can also think of it as the max number of
    concurrent users whose jobs a queue can run
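
One plausible reading of the limit, as a sketch (this exact formula is an assumption, not taken from the scheduler): under contention each user's share shrinks toward an even split, but never below the configured minimum.

```java
// Per-user slot cap for a queue under contention (illustrative formula).
public class UserLimit {
  static int perUserCap(int queueSlots, int activeUsers, int minUserLimitPercent) {
    int evenSplit = (int) Math.ceil((double) queueSlots / activeUsers);
    int floor = queueSlots * minUserLimitPercent / 100; // guaranteed minimum
    return Math.max(evenSplit, floor);
  }

  public static void main(String[] args) {
    // 100 slots, 25% minimum: 2 users get 50 each; 10 users hit the 25 floor,
    // i.e. at most 4 users' jobs run concurrently in this queue.
    System.out.println(perUserCap(100, 2, 25));  // 50
    System.out.println(perUserCap(100, 10, 25)); // 25
  }
}
```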

19
Resource based scheduling
  • Two components: monitoring and scheduling
  • Monitoring - Memory limits
  • Administrators can specify a virtual memory limit
    for Hadoop tasks on a tasktracker
  • TaskTrackers monitor and kill tasks if the limit
    is exceeded.
  • Scheduling - Memory requirements for a job
  • Users can specify the virtual / physical memory
    requirements for a job.
  • Scheduler aware of memory requirements and uses
    that in scheduling decisions
  • Disk requirements are handled as in standard
    Hadoop
  • Estimated per job from current usage; the
    estimate is applied to the remaining tasks
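
A sketch of that estimation idea (hypothetical names; the real bookkeeping lives inside the framework): average the disk used by completed tasks and project it onto the tasks that have not run yet.

```java
// Project per-task disk usage from what completed tasks actually consumed.
public class DiskEstimate {
  static long estimateRemainingDisk(long[] completedTaskDiskBytes, int remainingTasks) {
    if (completedTaskDiskBytes.length == 0) return 0; // nothing observed yet
    long total = 0;
    for (long b : completedTaskDiskBytes) total += b;
    long perTask = total / completedTaskDiskBytes.length;
    return perTask * remainingTasks;
  }

  public static void main(String[] args) {
    long[] seen = {2_000_000_000L, 3_000_000_000L}; // two finished tasks
    System.out.println(estimateRemainingDisk(seen, 8)); // 20 GB projected
  }
}
```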

20
User Interface
  • Can submit jobs to a queue
  • hadoop jar foo.jar -Dmapred.queue.name=research
  • Submitted to a queue named 'default', by default
  • See queue details via CLI
  • hadoop queue -list
  • hadoop queue -info queue-name -showJobs
  • Queued and running jobs are shown in the order
    maintained by the scheduler
  • Also available on the web UI

21
Supporting Features
  • Features that were out-of-the-box when using
    Torque/Maui
  • Security
  • ACLs for queues
  • to control who can submit jobs to a queue
  • and who (other than the owner) can modify jobs in
    a queue, including killing them and changing
    priorities
  • Run tasks as users securely on the TaskTracker
    (In progress)
  • Accounting
  • Accounting logs for actions taken by the capacity
    scheduler; these can be combined with Chukwa
  • All information about a job is also logged

22
Agenda
  • Motivation
  • Features
  • Design Details
  • Roadmap
  • Questions

23
Scheduling Algorithm
  • One of many possible answers
  • Pick a slot type
  • Map or Reduce, depending on which type has more
    free slots on the TT
  • Map, if the number of free slots is equal
  • Pick a queue
  • First, queues which need to reclaim capacity
  • ordered by time left to reclaim
  • Then, queues furthest from their capacity
  • determined by (running tasks / guaranteed
    capacity)
  • Note: no limit based on capacity
  • Unused resources are used for any eligible job of
    any queue
  • Pick a job
  • Sort order for jobs is by priority and arrival
    time
  • Jobs constrained by user limits and memory
    requirements
  • Pick a task
  • Same as in default scheduler
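
A compact sketch of steps 1 and 2 (all names are hypothetical; the real CapacityTaskScheduler is considerably more involved):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SchedulerSketch {
  enum SlotType { MAP, REDUCE }

  static class QueueInfo {
    final String name;
    final boolean reclaiming;        // queue is owed capacity lent to others
    final long timeLeftToReclaimMs;  // deadline pressure for reclamation
    final int runningTasks;
    final int guaranteedCapacity;    // in slots
    QueueInfo(String name, boolean reclaiming, long timeLeftToReclaimMs,
              int runningTasks, int guaranteedCapacity) {
      this.name = name;
      this.reclaiming = reclaiming;
      this.timeLeftToReclaimMs = timeLeftToReclaimMs;
      this.runningTasks = runningTasks;
      this.guaranteedCapacity = guaranteedCapacity;
    }
  }

  // Step 1: the slot type with more free slots wins; MAP on ties.
  static SlotType pickSlotType(int freeMapSlots, int freeReduceSlots) {
    return freeMapSlots >= freeReduceSlots ? SlotType.MAP : SlotType.REDUCE;
  }

  // Step 2: reclaiming queues first (least time left first), then queues
  // furthest below guaranteed capacity (lowest running/guaranteed ratio).
  static List<QueueInfo> queueOrder(List<QueueInfo> queues) {
    List<QueueInfo> sorted = new ArrayList<>(queues);
    sorted.sort(Comparator
        .comparing((QueueInfo q) -> !q.reclaiming)
        .thenComparingLong(q -> q.reclaiming ? q.timeLeftToReclaimMs : Long.MAX_VALUE)
        .thenComparingDouble(q -> (double) q.runningTasks / q.guaranteedCapacity));
    return sorted;
  }

  public static void main(String[] args) {
    System.out.println(pickSlotType(4, 4)); // MAP: ties go to maps
    List<QueueInfo> qs = List.of(
        new QueueInfo("a", false, 0, 90, 100),
        new QueueInfo("b", true, 60_000, 10, 100),
        new QueueInfo("c", false, 0, 20, 100));
    for (QueueInfo q : queueOrder(qs)) {
      System.out.println(q.name); // b (reclaiming), then c, then a
    }
  }
}
```

Steps 3 and 4 are omitted here: within the chosen queue, jobs are scanned in priority/arrival order subject to user limits and memory fit, and the task itself is chosen with the default scheduler's locality-aware logic.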

24
Memory-based monitoring and scheduling
  • Prevent rogue jobs from bringing down nodes, but
    support high-memory requirements
  • Monitoring
  • Kill tasks if they consume more vmem than their
    task limit
  • Kill tasks if the sum of vmem consumed is more
    than the node limit
  • Scheduling - Memory requirements for a job
  • Users can specify the virtual / physical memory
    requirements for a job
  • Scheduler aware of memory requirements and uses
    that in scheduling decisions

25
Memory based monitoring
  • Monitoring
  • Administrators configure amount of vmem reserved
    for system usage on a tasktracker
  • Also configure a default cluster wide amount of
    vmem reserved for a task
  • typically the amount of vmem available for Hadoop
    tasks divided by the number of slots
  • can be overridden per job by users
  • A thread in the tasktracker periodically monitors
    the total vmem used by each task
  • includes the task and all its subprocesses
  • uses the proc file system
  • Kills those tasks whose vmem usage exceeds the
    limit reserved for the task
  • If all tasks are within limits individually, but
    the sum exceeds the total limit, enough tasks are
    killed to curtail the total usage
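
A Linux-only sketch of the monitoring idea (hypothetical names; the real TaskTracker also sums usage over each task's whole process tree via the proc file system):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Read a process's virtual memory size from /proc/<pid>/status and
// flag it if it exceeds its reserved limit.
public class VmemCheck {
  static long vmemBytes(int pid) throws IOException {
    for (String line : Files.readAllLines(Path.of("/proc/" + pid + "/status"))) {
      if (line.startsWith("VmSize:")) {          // e.g. "VmSize:  123456 kB"
        String kb = line.replaceAll("\\D+", ""); // keep the digits only
        return Long.parseLong(kb) * 1024;
      }
    }
    return -1; // process exited or field missing
  }

  static boolean overLimit(int pid, long limitBytes) throws IOException {
    long used = vmemBytes(pid);
    return used >= 0 && used > limitBytes;       // candidate for killing
  }
}
```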

26
Memory based scheduling
  • Users can specify maximum amount of vmem, and
    optionally physical memory, required per job
  • But not beyond an admin configured limit
  • In each heartbeat, the tasktracker reports the
    following
  • amount of virtual and physical memory available
    for Hadoop tasks
  • and the currently running tasks
  • Scheduler computes available memory and checks
    whether the next job's requirements are met
  • If memory requirements are not met, no task is
    given to the tasktracker
  • Avoids starvation of jobs with high memory
    requirements
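
The fit check itself reduces to simple arithmetic; a sketch with hypothetical names:

```java
// Hand out a task only if the next job's per-task vmem requirement fits
// in what the tracker has left. If not, the slot is deliberately left
// empty, which is what prevents starvation of high-memory jobs.
public class MemoryFit {
  static boolean fits(long ttTotalVmem, long ttUsedVmem, long jobPerTaskVmem) {
    return ttTotalVmem - ttUsedVmem >= jobPerTaskVmem;
  }

  public static void main(String[] args) {
    long gb = 1L << 30;
    System.out.println(fits(8 * gb, 6 * gb, 3 * gb)); // false: only 2 GB free
    System.out.println(fits(8 * gb, 4 * gb, 3 * gb)); // true: 4 GB free
  }
}
```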

27
Multiple tasks per Heartbeat
  • Helps jobs with very short tasks
  • Improves utilization
  • Avoids starving one task type
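
A sketch of such a loop (hypothetical names): fill all of a tracker's free slots in one heartbeat instead of handing out a single task, always drawing from whichever slot type has more room, the same tie-break as the slot-type rule above, so neither task type starves.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiAssign {
  // Assign until both map and reduce slots on the tracker are exhausted.
  static List<String> assign(int freeMapSlots, int freeReduceSlots) {
    List<String> assigned = new ArrayList<>();
    while (freeMapSlots > 0 || freeReduceSlots > 0) {
      if (freeMapSlots >= freeReduceSlots && freeMapSlots > 0) {
        assigned.add("MAP");
        freeMapSlots--;
      } else {
        assigned.add("REDUCE");
        freeReduceSlots--;
      }
    }
    return assigned;
  }

  public static void main(String[] args) {
    System.out.println(assign(2, 1)); // [MAP, MAP, REDUCE]
  }
}
```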

28
Agenda
  • Motivation
  • Features
  • Design Details
  • Roadmap
  • Questions

29
Roadmap
  • Next steps
  • Scheduling enhancements
  • global view of scheduling
  • Enhanced Resource Management
  • disk, CPU, etc.; accounting and charging for
    heavy usage

30
Questions?
31
Thank You!