1
Scheduling in Hadoop MapReduce
Vinod K V
2
Abstract
  • What is scheduling in the context of MapReduce?
  • Previous scheduling system
  • The present in-framework schedulers

3
Scheduling in MapReduce
  • MapReduce cluster: a JobTracker and a bunch of
    TaskTrackers
  • TaskTrackers run tasks
  • JobTracker manages TaskTrackers and gives tasks
    to them
  • The MapReduce scheduling sub-system decides how
    to run MapReduce jobs
  • Which job's task should be run next?

4
Previous scheduling system
  • Private MapReduce clusters, single DFS
  • Job-scheduling using HadoopOnDemand
  • Users use HOD to reserve a number of nodes on a
    cluster. HOD internally uses Torque/Maui to
    obtain nodes.
  • Once the nodes are reserved, users use HOD to
    submit a job to Hadoop. HOD starts up Hadoop on
    reserved nodes and submits job.
  • Users may use their reserved nodes to run more
    jobs if needed.

5
Benefits
  • There are benefits to this approach
  • Torque/Maui have good features for managing
    cluster resources.
  • Queues
  • Accounting
  • Security
  • Cleanup
  • UI for jobs/queues
  • Isolation: users' jobs run on their own clusters.

6
Limitations
  • MapReduce jobs are elastic, but reservations are
    static
  • Users mostly pick arbitrary node counts.
  • When reserving nodes, Torque does not look at
    data locality. This adversely affects Hadoop's
    performance.
  • Nodes are marked against queues!
  • Reserving nodes separately before submitting a
    Hadoop job causes a 30% increase in a job's run
    time (the 'HOD tax').
  • Nodes are reserved for the whole job. Most jobs
    have many more maps than reducers. When maps
    finish, nodes can sit idle waiting for the
    reducers to finish. This adversely affects
    utilization.
  • Even after jobs finish, users do not deallocate
    clusters!!

7
Why not modify Torque/Maui?
  • Tweaking Torque / Maui was hard
  • Poor documentation
  • Difficult to modify code
  • Different requirements: our niche requirements
    weren't supported well in the community
  • Hard for us to control development, quality,
    patches, roadmaps

8
Task-scheduling
  • Scheduling tasks instead of jobs.

9
Design: Pluggable Schedulers in Hadoop
  • Schedulers became a pluggable component in Hadoop
    0.19
  • Three schedulers
  • Default scheduler
  • Single priority based queue of jobs
  • Scheduling tries to balance map and reduce load
    on all tasktrackers in the cluster
  • Fair Scheduler
  • Built by Facebook
  • Multiple queues (pools) of jobs sorted in FIFO
    or by fairness limits
  • Each pool is guaranteed a minimum capacity and
    excess is shared by all jobs using a fairness
    algorithm
  • Scheduler tries to ensure that, over time, all
    jobs receive an equal share of resources
  • Capacity Scheduler
  • Yahoo!'s scheduler
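
A rough sketch of this plugin point, assuming the 0.19-era API (the class name MyScheduler is hypothetical, and the exact signatures should be checked against the release sources):

```java
// Minimal pluggable-scheduler skeleton (a sketch, not the real thing).
// TaskScheduler and its collaborators are package-private in this era of
// Hadoop, which is why contrib schedulers live in org.apache.hadoop.mapred.
package org.apache.hadoop.mapred;

import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class MyScheduler extends TaskScheduler {

  @Override
  public List<Task> assignTasks(TaskTrackerStatus tracker) throws IOException {
    // Called on every TaskTracker heartbeat: decide which tasks, if any,
    // this tracker should run next; an empty list assigns nothing.
    return Collections.emptyList();
  }

  @Override
  public Collection<JobInProgress> getJobs(String queueName) {
    // Used by the queue CLI and web UI to list a queue's jobs.
    return Collections.emptyList();
  }
}
```

The JobTracker is then pointed at the implementation by setting mapred.jobtracker.taskScheduler to the class's fully qualified name in the site configuration.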

10
Design: Interaction between M/R framework and
schedulers
[Diagram: the framework drives the scheduler through
two entry points, submitJob and the TaskTracker
Heartbeat; the scheduler maintains queues of jobs.]
Per-queue state:
  • Waiting queue and running queue of jobs
  • Sorted by priority and submission time
  • Guaranteed capacity
  • Current capacity
  • ReclamationInfo
  • UserInfo
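
A plain-Java illustration of this per-queue state (all field names are hypothetical, not the scheduler's actual members):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative per-queue bookkeeping: two job lists plus the capacity,
// reclamation, and per-user accounting consulted on each heartbeat.
class QueueState {
  final String name;
  final float guaranteedCapacityPercent;   // share of cluster slots promised
  int currentCapacity;                     // slots the queue holds right now
  long reclaimDeadlineMs;                  // when lent-out slots must return
  final Deque<String> waitingJobs = new ArrayDeque<>(); // by priority, then submit time
  final Deque<String> runningJobs = new ArrayDeque<>();
  final Map<String, Integer> runningTasksPerUser = new HashMap<>(); // for user limits

  QueueState(String name, float guaranteedCapacityPercent) {
    this.name = name;
    this.guaranteedCapacityPercent = guaranteedCapacityPercent;
  }
}
```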

11
CapacityTaskScheduler

12
Agenda
  • Motivation
  • Features
  • Design Details
  • Roadmap
  • Questions

13
Motivation
  • Stack outside Yahoo! (pre-Hadoop 0.19)
  • Single static M/R cluster on a physical cluster,
    and shared DFS
  • Default scheduling: a single priority-based queue
    of jobs, focused on load balancing
  • Limitations of this model
  • Sharing a cluster is complex: guarantees for
    jobs, fairness expectations, high-utilization
    needs
  • Minimal administrative support for shared
    environments
  • Broadly
  • Need better features for usage in a shared
    environment
  • Without compromising on core M/R benefits (data
    locality, elasticity, etc.)

14
Agenda
  • Motivation
  • Features
  • Design Details
  • Roadmap
  • Questions

15
Support for multiple Organizations and Queues
  • Organizations (orgs) are distinct entities for
    administration, billing, etc.
  • Users belong to orgs
  • A user can belong to more than one org
  • Orgs represented by a queue of jobs
  • Conceptually there may be more: a queue for
    regular jobs, a queue for short-running jobs, etc.
  • Users submit jobs to queues
  • Single-step process: submit a Hadoop job to a
    static M/R cluster
  • No need to specify number of nodes

16
Capacity
  • Queues have a capacity
  • Count of the number of slots guaranteed for a
    queue
  • Not a pre-defined set of nodes
  • Expressed as a percentage of the cluster capacity
  • Jobs in a queue get this capacity when there is
    demand
  • Sum of capacities is equal to Grid capacity
  • For better utilization, capacities can be
    distributed
  • Unused capacity of a queue can be used by other
    queues
  • But it is reclaimed, as tasks complete, when the
    owning queues need their capacity back
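
A toy example of the capacity arithmetic (invented numbers): because capacity is a percentage, the slot guarantee tracks the current cluster size rather than a fixed node set.

```java
// Guaranteed slots per queue, derived from the live cluster slot count.
public class CapacityMath {
  public static void main(String[] args) {
    int clusterMapSlots = 1000;              // grows/shrinks with the cluster
    float[] queuePercent = {50f, 30f, 20f};  // must sum to 100
    for (int i = 0; i < queuePercent.length; i++) {
      int guaranteed = (int) (clusterMapSlots * queuePercent[i] / 100);
      System.out.println("queue" + i + " guaranteed " + guaranteed + " map slots");
    }
  }
}
```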

17
Priorities
  • Queues support priorities for jobs
  • Same priority levels as with default scheduler
  • Can turn off priority support per queue
  • Users can specify / change priority of their jobs
  • Priorities dictate order in which scheduler looks
    at jobs
  • Behavior similar to standard Hadoop
  • No task level preemption
  • But higher-priority jobs jump ahead of
    lower-priority ones
  • Job priorities are not compared across queues
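
An illustrative comparator for this per-queue ordering (a sketch; the field names and the priority encoding are assumptions, not the scheduler's types):

```java
import java.util.Comparator;

// Higher priority first; within a priority, earlier submission first.
class Job {
  int priority;     // larger means higher priority (assumption for this sketch)
  long submitTime;  // epoch millis at submission
}

class JobOrder implements Comparator<Job> {
  @Override
  public int compare(Job a, Job b) {
    if (a.priority != b.priority) {
      return Integer.compare(b.priority, a.priority); // higher priority first
    }
    return Long.compare(a.submitTime, b.submitTime);  // then FIFO by arrival
  }
}
```

Because each queue sorts its own job list with an ordering like this, priorities never need to be compared across queues.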

18
Fairness (user limits)
  • Queues have a minimum user limit
  • Number of slots that a single user can use per
    queue
  • Expressed as a percentage
  • Users are restricted to this amount only when
    there is competing demand
  • No preemption to enforce limits
  • Can also think of it as the max number of
    concurrent users whose jobs a queue can run
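
One plausible reading of the limit, as a sketch (this exact formula is an assumption, not taken from the scheduler): under contention each user's share shrinks toward an even split, but never below the configured minimum.

```java
// Per-user slot cap for a queue under contention (illustrative formula).
public class UserLimit {
  static int perUserCap(int queueSlots, int activeUsers, int minUserLimitPercent) {
    int evenSplit = (int) Math.ceil((double) queueSlots / activeUsers);
    int floor = queueSlots * minUserLimitPercent / 100; // guaranteed minimum
    return Math.max(evenSplit, floor);
  }

  public static void main(String[] args) {
    // 100 slots, 25% minimum: 2 users get 50 each; 10 users hit the 25 floor,
    // i.e. at most 4 users' jobs run concurrently in this queue.
    System.out.println(perUserCap(100, 2, 25));  // 50
    System.out.println(perUserCap(100, 10, 25)); // 25
  }
}
```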

19
Resource based scheduling
  • Two components: monitoring and scheduling
  • Monitoring - Memory limits
  • Administrators can specify a virtual memory limit
    for Hadoop tasks on a tasktracker
  • TaskTrackers monitor and kill tasks if the limit
    is exceeded.
  • Scheduling - Memory requirements for a job
  • Users can specify the virtual / physical memory
    requirements for a job.
  • Scheduler aware of memory requirements and uses
    that in scheduling decisions
  • Disk requirements are handled as in standard
    Hadoop
  • Estimated per job from current usage; the
    estimate is applied to the remaining tasks
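
A sketch of that estimation idea (hypothetical names; the real bookkeeping lives inside the framework): average the disk used by completed tasks and project it onto the tasks that have not run yet.

```java
// Project per-task disk usage from what completed tasks actually consumed.
public class DiskEstimate {
  static long estimateRemainingDisk(long[] completedTaskDiskBytes, int remainingTasks) {
    if (completedTaskDiskBytes.length == 0) return 0; // nothing observed yet
    long total = 0;
    for (long b : completedTaskDiskBytes) total += b;
    long perTask = total / completedTaskDiskBytes.length;
    return perTask * remainingTasks;
  }

  public static void main(String[] args) {
    long[] seen = {2_000_000_000L, 3_000_000_000L}; // two finished tasks
    System.out.println(estimateRemainingDisk(seen, 8)); // 20 GB projected
  }
}
```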

20
User Interface
  • Can submit jobs to a queue
  • hadoop jar foo.jar -Dmapred.queue.name=research
  • Submitted to a queue named 'default', by default
  • See queue details via CLI
  • hadoop queue -list
  • hadoop queue -info queue-name -showJobs
  • Queued and running jobs are shown in the order
    maintained by the scheduler
  • Also available on the web UI

21
Supporting Features
  • Features that were out-of-the-box when using
    Torque/Maui
  • Security
  • ACLs for queues
  • to control who can submit jobs to a queue
  • and who (other than the owner) can modify jobs in
    a queue, including killing them and changing
    priorities
  • Run tasks as users securely on the TaskTracker
    (In progress)
  • Accounting
  • Accounting logs for actions taken by the capacity
    scheduler; these can be combined with Chukwa
  • All information about a job is also logged

22
Agenda
  • Motivation
  • Features
  • Design Details
  • Roadmap
  • Questions

23
Scheduling Algorithm
  • One of many possible answers
  • Pick a slot type
  • Map or Reduce, depending on which type has more
    free slots on the TT
  • Map, if the number of free slots is equal
  • Pick a queue
  • First, queues which need to reclaim capacity
  • ordered by time left to reclaim
  • Then, queues furthest from their capacity
  • determined by (running tasks / guaranteed
    capacity)
  • Note: no limit based on capacity
  • Unused resources are used for any eligible job of
    any queue
  • Pick a job
  • Sort order for jobs is by priority and arrival
    time
  • Jobs constrained by user limits and memory
    requirements
  • Pick a task
  • Same as in default scheduler
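
A compact sketch of steps 1 and 2 (all names are hypothetical; the real CapacityTaskScheduler is considerably more involved):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SchedulerSketch {
  enum SlotType { MAP, REDUCE }

  static class QueueInfo {
    final String name;
    final boolean reclaiming;        // queue is owed capacity lent to others
    final long timeLeftToReclaimMs;  // deadline pressure for reclamation
    final int runningTasks;
    final int guaranteedCapacity;    // in slots
    QueueInfo(String name, boolean reclaiming, long timeLeftToReclaimMs,
              int runningTasks, int guaranteedCapacity) {
      this.name = name;
      this.reclaiming = reclaiming;
      this.timeLeftToReclaimMs = timeLeftToReclaimMs;
      this.runningTasks = runningTasks;
      this.guaranteedCapacity = guaranteedCapacity;
    }
  }

  // Step 1: the slot type with more free slots wins; MAP on ties.
  static SlotType pickSlotType(int freeMapSlots, int freeReduceSlots) {
    return freeMapSlots >= freeReduceSlots ? SlotType.MAP : SlotType.REDUCE;
  }

  // Step 2: reclaiming queues first (least time left first), then queues
  // furthest below guaranteed capacity (lowest running/guaranteed ratio).
  static List<QueueInfo> queueOrder(List<QueueInfo> queues) {
    List<QueueInfo> sorted = new ArrayList<>(queues);
    sorted.sort(Comparator
        .comparing((QueueInfo q) -> !q.reclaiming)
        .thenComparingLong(q -> q.reclaiming ? q.timeLeftToReclaimMs : Long.MAX_VALUE)
        .thenComparingDouble(q -> (double) q.runningTasks / q.guaranteedCapacity));
    return sorted;
  }

  public static void main(String[] args) {
    System.out.println(pickSlotType(4, 4)); // MAP: ties go to maps
    List<QueueInfo> qs = List.of(
        new QueueInfo("a", false, 0, 90, 100),
        new QueueInfo("b", true, 60_000, 10, 100),
        new QueueInfo("c", false, 0, 20, 100));
    for (QueueInfo q : queueOrder(qs)) {
      System.out.println(q.name); // b (reclaiming), then c, then a
    }
  }
}
```

Steps 3 and 4 are omitted here: within the chosen queue, jobs are scanned in priority/arrival order subject to user limits and memory fit, and the task itself is chosen with the default scheduler's locality-aware logic.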

24
Memory-based monitoring and scheduling
  • Prevent rogue jobs from bringing down nodes, but
    support high-memory requirements
  • Monitoring
  • Kill tasks if they consume more vmem than their
    task limit
  • Kill tasks if the sum of vmem consumed is more
    than the node limit
  • Scheduling - Memory requirements for a job
  • Users can specify the virtual / physical memory
    requirements for a job
  • Scheduler aware of memory requirements and uses
    that in scheduling decisions

25
Memory based monitoring
  • Monitoring
  • Administrators configure amount of vmem reserved
    for system usage on a tasktracker
  • Also configure a default cluster wide amount of
    vmem reserved for a task
  • typically the amount of vmem available for Hadoop
    tasks divided by the number of slots
  • can be overridden per job by users
  • A thread in the tasktracker periodically monitors
    the total vmem used by each task
  • includes the task and all its subprocesses
  • uses the proc file system
  • Kills those tasks whose vmem usage exceeds the
    limit reserved for the task
  • If all tasks are within limits individually, but
    the sum exceeds the total limit, enough tasks are
    killed to curtail the total usage
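
A Linux-only sketch of the monitoring idea (hypothetical names; the real TaskTracker also sums usage over each task's whole process tree via the proc file system):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Read a process's virtual memory size from /proc/<pid>/status and
// flag it if it exceeds its reserved limit.
public class VmemCheck {
  static long vmemBytes(int pid) throws IOException {
    for (String line : Files.readAllLines(Path.of("/proc/" + pid + "/status"))) {
      if (line.startsWith("VmSize:")) {          // e.g. "VmSize:  123456 kB"
        String kb = line.replaceAll("\\D+", ""); // keep the digits only
        return Long.parseLong(kb) * 1024;
      }
    }
    return -1; // process exited or field missing
  }

  static boolean overLimit(int pid, long limitBytes) throws IOException {
    long used = vmemBytes(pid);
    return used >= 0 && used > limitBytes;       // candidate for killing
  }
}
```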

26
Memory based scheduling
  • Users can specify maximum amount of vmem, and
    optionally physical memory, required per job
  • But not beyond an admin configured limit
  • In each heartbeat, the tasktracker reports the
    following
  • amount of virtual and physical memory available
    for Hadoop tasks
  • and the currently running tasks
  • Scheduler computes available memory and checks
    whether the next job's requirements are met
  • If memory requirements are not met, no task is
    given to the tasktracker
  • Avoids starvation of jobs with high memory
    requirements
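
The fit check itself reduces to simple arithmetic; a sketch with hypothetical names:

```java
// Hand out a task only if the next job's per-task vmem requirement fits
// in what the tracker has left. If not, the slot is deliberately left
// empty, which is what prevents starvation of high-memory jobs.
public class MemoryFit {
  static boolean fits(long ttTotalVmem, long ttUsedVmem, long jobPerTaskVmem) {
    return ttTotalVmem - ttUsedVmem >= jobPerTaskVmem;
  }

  public static void main(String[] args) {
    long gb = 1L << 30;
    System.out.println(fits(8 * gb, 6 * gb, 3 * gb)); // false: only 2 GB free
    System.out.println(fits(8 * gb, 4 * gb, 3 * gb)); // true: 4 GB free
  }
}
```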

27
Multiple tasks per Heartbeat
  • Helps jobs with very short tasks
  • Improves utilization
  • Avoids starving one task type
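
A sketch of such a loop (hypothetical names): fill all of a tracker's free slots in one heartbeat instead of handing out a single task, always drawing from whichever slot type has more room, the same tie-break as the slot-type rule above, so neither task type starves.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiAssign {
  // Assign until both map and reduce slots on the tracker are exhausted.
  static List<String> assign(int freeMapSlots, int freeReduceSlots) {
    List<String> assigned = new ArrayList<>();
    while (freeMapSlots > 0 || freeReduceSlots > 0) {
      if (freeMapSlots >= freeReduceSlots && freeMapSlots > 0) {
        assigned.add("MAP");
        freeMapSlots--;
      } else {
        assigned.add("REDUCE");
        freeReduceSlots--;
      }
    }
    return assigned;
  }

  public static void main(String[] args) {
    System.out.println(assign(2, 1)); // [MAP, MAP, REDUCE]
  }
}
```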

28
Agenda
  • Motivation
  • Features
  • Design Details
  • Roadmap
  • Questions

29
Roadmap
  • Next steps
  • Scheduling enhancements
  • global view of scheduling
  • Enhanced Resource Management
  • disk, CPU, etc.; accounting and charging for
    heavy usage

30
Questions?
31
Thank You!