1
Multiprocessor Scheduling
  • Module 3.1
  • For a good summary of multiprocessor and
    real-time scheduling, visit
    http://www.cs.uah.edu/weisskop/osnotes_html/M8.html

2
Classifications of Multiprocessor Systems
  • Loosely coupled multiprocessor, or clusters
  • Each processor has its own memory and I/O
    channels.
  • Functionally specialized processors
  • Such as an I/O processor or an NVIDIA GPGPU
  • Controlled by a master processor
  • Tightly coupled multiprocessing
  • MCMP
  • Processors share main memory
  • Controlled by operating system
  • More economical than clusters

3
Types of parallelism
  • Bit-level parallelism
  • Instruction-level parallelism
  • Data parallelism
  • Task parallelism (our focus)

4
Synchronization Granularity
  • Refers to the frequency of synchronization
    among parallel processes in the system
  • Five classes exist
  • Independent (SI is not applicable)
  • Very coarse (2000 < SI < 1M)
  • Coarse (200 < SI < 2000)
  • Medium (20 < SI < 200)
  • Fine (SI < 20)

SI is the Synchronization Interval, measured in
instructions executed between synchronization
events.
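The five classes above can be sketched as a small classifier. This is an illustrative helper (the function name and `None`-for-independent convention are my assumptions, not from the slides):

```python
def granularity(si):
    """Classify an application by its synchronization interval (SI),
    the average number of instructions executed between
    synchronization events. None means no synchronization at all."""
    if si is None:
        return "independent"   # SI is not applicable
    if si < 20:
        return "fine"
    if si < 200:
        return "medium"
    if si < 2000:
        return "coarse"
    return "very coarse"       # 2000 < SI < 1M
```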
5
Wikipedia on Fine-grained, coarse-grained, and
embarrassing parallelism
  • Applications are often classified according to
    how often their subtasks need to synchronize or
    communicate with each other.
  • An application exhibits fine-grained parallelism
    if its subtasks must communicate many times per
    second
  • it exhibits coarse-grained parallelism if they do
    not communicate many times per second,
  • and it is embarrassingly parallel if they rarely
    or never have to communicate. Embarrassingly
    parallel applications are considered the easiest
    to parallelize.

6
Independent Parallelism
  • Multiple unrelated processes
  • Separate application or job, e.g. spreadsheet,
    word processor, etc.
  • No synchronization
  • With more than one processor available, the
    average response time to users decreases

7
Coarse and Very Coarse-Grained Parallelism
  • Very coarse: distributed processing across
    network nodes to form a single computing
    environment
  • Coarse: synchronization among processes at a
    very gross level
  • Good for concurrent processes running on a
    multiprogrammed uniprocessor
  • Can be supported on a multiprocessor with
    little change

8
Medium-Grained Parallelism
  • Parallel processing or multitasking within a
    single application
  • Single application is a collection of threads
  • Threads usually interact frequently, leading to
    medium-grained synchronization

9
Fine-Grained Parallelism
  • Highly parallel applications
  • Synchronization every few instructions (on very
    short events)
  • Fills the gap between ILP (instruction-level
    parallelism) and medium-grained parallelism
  • Can be found in small inner loops
  • Use of parallel programming APIs such as MPI
    and OpenMP
  • The OS should not intervene; synchronization at
    this level is usually handled in hardware
  • In practice, this is a very specialized and
    fragmented area

10
Scheduling
  • Scheduling on a multiprocessor involves three
    interrelated design issues
  • Assignment of processes to processors
  • Use of multiprogramming on individual processors
  • Makes sense for processes (coarse-grained)
  • May not be good for threads (medium-grained)
  • Actual dispatching of a process
  • Which scheduling policy should we use: FCFS, RR,
    etc.? Sometimes a very sophisticated policy
    becomes counterproductive.

11
Assignment of Processes to Processors
  • Treat processors as a pooled resource and assign
    processes to processors on demand
  • Permanently assign each process to a processor
  • Dedicate a short-term queue to each processor
  • Less overhead: each processor does its own
    scheduling from its own queue
  • Disadvantage: a processor could be idle (has an
    empty queue) while another processor has a
    backlog

12
Assignment of Processes to Processors
  • Global queue
  • Schedule onto any available processor
  • During its lifetime, a process may run on
    different processors at different times
  • In an SMP architecture, context switching can be
    done at small cost
  • Master/slave architecture
  • Key kernel functions always run on a particular
    processor
  • The master is responsible for scheduling
  • A slave sends service requests to the master
  • Synchronization is simplified
  • Disadvantages
  • Failure of the master brings down the whole
    system
  • The master can become a performance bottleneck

13
Assignment of Processes to Processors
  • Peer architecture
  • Operating system can execute on any processor
  • Each processor does self-scheduling from a pool
    of available processes
  • Complicates the operating system
  • Must ensure that two processors do not choose
    the same process
  • Needs lots of synchronization

14
Process Scheduling in Today's SMPs
  • M/M/M/K queueing system
  • Single queue for all processes
  • Multiple queues are used for priorities
  • All queues feed the common pool of processors
  • The specific scheduling discipline is less
    important with more than one processor
  • A simple FCFS discipline with a static priority
    may suffice for a multiprocessor system
  • Illustrated by the graph on p. 460 of the text
  • In conclusion, the specific scheduling discipline
    is much less important with SMP than with a
    uniprocessor (UP)
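The point that discipline matters less with multiple servers can be illustrated with a tiny deterministic FCFS simulation. This is a sketch of my own (the function and job format are assumptions, not from the slides): jobs arrive in FCFS order and each takes the earliest-free processor.

```python
import heapq

def fcfs_completion_times(jobs, m):
    """jobs: list of (arrival, service) tuples in FCFS arrival order.
    Returns each job's completion time on m identical processors."""
    free_at = [0.0] * m          # when each processor next becomes free
    heapq.heapify(free_at)
    done = []
    for arrival, service in jobs:
        # FCFS: take the earliest-available processor, but not
        # before the job actually arrives
        start = max(arrival, heapq.heappop(free_at))
        finish = start + service
        done.append(finish)
        heapq.heappush(free_at, finish)
    return done
```

With one processor the short third job waits behind both long ones; with two processors it starts almost immediately, so the queueing penalty (and hence the choice of discipline) matters far less.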

15
Threads
  • A thread executes separately from the rest of
    its process
  • An application can be a set of threads that
    cooperate and execute concurrently in the same
    address space
  • Running threads on separate processors can yield
    a dramatic gain in performance

16
Multiprocessor Thread Scheduling --Four General
Approaches (1/2)
  • Load sharing
  • Processes are not assigned to a particular
    processor
  • Gang scheduling
  • A set of related threads is scheduled to run on a
    set of processors at the same time

17
Multiprocessor Thread Scheduling --Four General
Approaches (2/2)
  • Dedicated processor assignment
  • Threads are assigned to specific processors
    (each thread gets its own processor)
  • When the program terminates, the processors are
    returned to the pool of available processors
  • Dynamic scheduling
  • Number of threads can be altered during course of
    execution

18
Load Sharing
  • Load is distributed evenly across the processors
  • No centralized scheduler required
  • The OS runs on every processor to select the
    next thread
  • Uses a global queue of ready threads
  • Usually an FCFS policy
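The load-sharing mechanism can be sketched with worker threads standing in for processors and a lock-protected global FCFS queue (the names and structure are my assumptions, not from the slides):

```python
import threading
from collections import deque

ready_queue = deque(["t1", "t2", "t3", "t4"])  # global ready queue
queue_lock = threading.Lock()  # the mutual exclusion the next slide warns about
executed = []                  # (worker_id, thread_name) log
log_lock = threading.Lock()

def processor(worker_id):
    """Each 'processor' repeatedly selects the next ready thread, FCFS."""
    while True:
        with queue_lock:       # potential bottleneck with many processors
            if not ready_queue:
                return
            thread_name = ready_queue.popleft()
        with log_lock:
            executed.append((worker_id, thread_name))

workers = [threading.Thread(target=processor, args=(i,)) for i in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Every ready thread runs exactly once, but on whichever processor happened to grab it, which is also why a preempted thread rarely resumes on the same processor.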

19
Disadvantages of Load Sharing
  • The central queue needs mutual exclusion
  • May become a bottleneck when more than one
    processor looks for work at the same time
  • A noticeable problem when there are many
    processors
  • Preempted threads are unlikely to resume
    execution on the same processor
  • Cache use is less efficient
  • If all threads are in the global queue, all
    threads of a program will not gain access to
    the processors at the same time
  • Performance is compromised if coordination among
    the threads is high

20
Gang Scheduling
  • Simultaneous scheduling of threads that make up a
    single process
  • Useful for applications where performance
    severely degrades when any part of the
    application is not running
  • Threads often need to synchronize with each other

21
Advantages
  • Closely related threads execute in parallel
  • Synchronization blocking is reduced
  • Less context switching
  • Scheduling overhead is reduced, as a single
    synchronization action (e.g., a signal()) may
    affect many threads

22
Scheduling Groups: two time-slicing divisions
(Figure from the text; not reproduced in this
transcript.)
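The idea behind gang scheduling can be sketched as follows: in each time slice one whole gang of related threads occupies the processors simultaneously, so no thread waits on a descheduled peer. This is my own illustrative sketch (names and the one-slice-per-gang rotation are assumptions, not from the slides):

```python
def gang_schedule(gangs, n_processors):
    """gangs: dict mapping gang name -> thread count.
    Returns one time slice per gang, each mapping processor index ->
    (gang, thread index) or None when the processor sits idle."""
    slices = []
    for name, n_threads in gangs.items():
        slice_map = {}
        for cpu in range(n_processors):
            # assign one thread of the gang per processor; extra
            # processors go idle for this slice
            slice_map[cpu] = (name, cpu) if cpu < n_threads else None
        slices.append(slice_map)
    return slices

# gang A has 4 threads, gang B has 1; four processors available
schedule = gang_schedule({"A": 4, "B": 1}, n_processors=4)
```

The idle slots in gang B's slice show the allocation question the figure addresses: dividing time uniformly among gangs can waste processors when gangs have unequal sizes.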
23
Dedicated Processor Assignment (Affinitization)
(1/2)
  • When an application is scheduled, each of its
    threads is assigned a processor that remains
    dedicated (affinitized) to that thread until the
    application runs to completion
  • No multiprogramming of processors, i.e. one
    processor per specific thread, and no other
    thread
  • Some processors may be idle, as threads may
    block
  • Eliminates context switching; with certain types
    of applications this can save enough time to
    compensate for the possible idle-time penalty
  • However, in a highly parallel environment with
    100s of processors, processor utilization is not
    a major issue, but performance is. The total
    avoidance of switching results in substantial
    speedup.
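A minimal sketch of the assignment itself (my own illustration, not from the slides): each thread is pinned one-to-one to a processor, and the assignment is refused when threads outnumber processors, the situation where the next slide notes efficiency drops.

```python
def dedicate(threads, processors):
    """Pin each thread to exactly one dedicated processor, one-to-one.
    Raises ValueError when there are more threads than processors."""
    if len(threads) > len(processors):
        raise ValueError("more threads than processors: "
                         "preemption and cache pollution would follow")
    # zip pairs each thread with its own processor; spare
    # processors simply stay idle
    return dict(zip(threads, processors))

pinning = dedicate(["t0", "t1", "t2"], [0, 1, 2, 3])
```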

24
Dedicated Processor Assignment (Affinitization)
(2/2)
  • In Dedicated Assignment, when the number of
    threads exceeds the number of available
    processors, efficiency drops.
  • Due to thread preemption, context switching,
    suspending of other threads, and cache pollution
  • Notice that both gang scheduling and dedicated
    assignment are more concerned with allocation
    issues than scheduling issues.
  • The important question becomes "How many
    processors should a process be assigned?" rather
    than "How shall I choose the next process?"

25
Dynamic Scheduling
  • The number of threads in a process can vary
    during execution
  • Scheduling work is shared between the OS and
    the application
  • When a job originates, it requests a certain
    number of processors. The OS grants some or all
    of the request, based on the number of processors
    currently available.
  • Then the application itself can decide which
    threads run when on which processors. This
    requires language support as would be provided
    with thread libraries.
  • When processors become free due to the
    termination of threads or processes, the OS can
    allocate them as needed to satisfy pending
    requests.
  • Simulations have shown that this approach is
    superior to both gang scheduling and dedicated
    processor assignment
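The request/grant/release protocol described above can be sketched as follows (class and method names are my assumptions, not from the slides):

```python
class DynamicScheduler:
    """Sketch of the OS side of dynamic scheduling: jobs request
    processors, the OS grants some or all based on availability,
    and freed processors return to the pool."""

    def __init__(self, n_processors):
        self.available = n_processors

    def request(self, wanted):
        """Grant some or all of the request, based on the number of
        processors currently available."""
        granted = min(wanted, self.available)
        self.available -= granted
        return granted

    def release(self, count):
        """Return processors freed by terminating threads or processes."""
        self.available += count

sched = DynamicScheduler(8)
g1 = sched.request(5)   # first job gets everything it asked for
g2 = sched.request(5)   # only 3 processors remain, so only 3 granted
sched.release(g1)       # first job terminates; pool refills
```

The application then decides for itself which of its threads run on the granted processors, which is where the thread-library support mentioned above comes in.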