Title: Strategies for Implementing Dynamic Load Sharing
1. Strategies for Implementing Dynamic Load Sharing
2. Problems with Static Load Sharing
- Work must be partitioned and assigned at compile time.
- For many algorithms, the processing time of the program is unknown.
- The load on each processor is unknown and can change at any time.
- A processor that finishes early remains idle.
3. Advantages of Dynamic Load Sharing
- Partitions of work are modified at run time.
- The hope is that load shifts during run time so that no processor sits idle until all processors run out of work.
- Sender and receiver based: depending on the algorithm, either the overloaded or the underloaded processor can initiate the transfer of work.
4. Hybrid Load Sharing
- Load is initially distributed statically.
- During run time, the work distribution is modified dynamically.
- Avoids the complexity of having to break up the entire workload at the start.
5. Conditions for Dynamic Load Sharing to Be Worthwhile
- Work at each processor must be partitionable into independent pieces whenever the processor holds more than a minimal amount.
- The cost of splitting work and sending it to another processor must be less than the cost of not doing so.
- A method of splitting the data must exist.
6. Receiver Initiated Dynamic Load Sharing Algorithms
7. Asynchronous Round Robin
- Each processor keeps its own target variable.
- When a processor runs out of work, it sends a request to the processor named in target, then advances target.
- Not desirable because multiple processors may send requests to the same processor at nearly the same time, and, depending on the network topology, reaching all processors from one node can incur high overhead.
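The target-variable bookkeeping described above can be sketched in a few lines. This is a minimal illustration, not a full message-passing implementation; the class and method names are my own.

```python
class ARRProcessor:
    """Asynchronous Round Robin: each processor keeps its own
    independent target variable for work requests (a sketch)."""

    def __init__(self, rank, num_procs):
        self.rank = rank
        self.num_procs = num_procs
        # Start by targeting the next processor in rank order.
        self.target = (rank + 1) % num_procs

    def next_target(self):
        """Return the processor to ask for work, then advance the target."""
        t = self.target
        self.target = (self.target + 1) % self.num_procs
        if self.target == self.rank:  # never request work from ourselves
            self.target = (self.target + 1) % self.num_procs
        return t
```

Because every processor advances its target independently, two idle processors can still end up polling the same donor at nearly the same time, which is exactly the drawback noted above.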
8. Nearest Neighbor
- An idle processor sends requests to its nearest neighbors in a round-robin scheme.
- If the network has an identical distance between all processors, this is the same as the Asynchronous Round Robin method.
- The major problem with this method is that a localized concentration of work takes a long time to be distributed across the system.
9. Global Round Robin
- Solves the problems of distributing work evenly and of one processor receiving requests from multiple others.
- A single global target variable ensures the workload gets distributed relatively evenly.
- The problem is that processors contend for access to the target variable.
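The difference from the asynchronous variant is that the target is now one shared counter. A minimal sketch, using a lock to stand in for whatever synchronization the machine provides (the class name is illustrative):

```python
import threading

class GlobalTarget:
    """Global Round Robin: one shared target variable that every idle
    processor reads and advances, so requests cycle over all processors."""

    def __init__(self, num_procs):
        self.num_procs = num_procs
        self.value = 0
        # This lock is the serialization point; under many simultaneous
        # requests it becomes the contention bottleneck noted above.
        self.lock = threading.Lock()

    def fetch_and_advance(self):
        with self.lock:
            t = self.value
            self.value = (self.value + 1) % self.num_procs
            return t
```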
10. Global Round Robin with Message Combining
- Avoids the contention problem when accessing the target variable.
- All requests to read the target value are combined at intermediate nodes.
- Has only been used in research.
11. Random Polling
- The simplest load balancing method.
- An idle processor requests work from a processor chosen at random.
- Each processor is equally likely to be polled, so the distribution of work is about equal.
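Random polling reduces to picking a uniform victim other than oneself. A small sketch (the function name is my own):

```python
import random

def pick_victim(rank, num_procs, rng=random):
    """Choose a processor other than ourselves, uniformly at random."""
    victim = rng.randrange(num_procs - 1)
    if victim >= rank:
        victim += 1  # shift past our own rank so we never poll ourselves
    return victim
```

The shift trick keeps the draw uniform over the other `num_procs - 1` processors without looping until a non-self value appears.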
12. Scheduler Based
- One processor is designated as the scheduler.
- The scheduler keeps a FIFO queue of processors that can donate work, and routes work requests to them.
- A work request is never sent to a node that has no work to give.
- The disadvantage is routing time, because every request must go through the scheduler node.
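The FIFO-queue bookkeeping at the scheduler can be sketched as below. This is an illustration of the routing idea only; the class and method names are assumptions.

```python
from collections import deque

class Scheduler:
    """Central scheduler holding a FIFO queue of processors
    that have announced they can donate work."""

    def __init__(self):
        self.donors = deque()

    def register_donor(self, rank):
        """A processor announces it has surplus work to give away."""
        self.donors.append(rank)

    def handle_request(self, requester):
        """Route a work request to the oldest donor that is not the
        requester itself; return None if no suitable donor exists."""
        for _ in range(len(self.donors)):
            donor = self.donors.popleft()
            if donor != requester:
                return donor
            self.donors.append(donor)  # requeue, keep looking
        return None
```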
13. Gradient Model
- Based on two threshold parameters, the High-Water-Mark (HWM) and the Low-Water-Mark (LWM).
- These determine whether a processor's load is light, moderate, or heavy.
- Proximity is defined as the shortest distance to a lightly loaded node.
14. Gradient Model (cont.)
- Tasks are routed through the system toward the nearest underloaded processor.
- The gradient map may change while work is in transit through the network.
- The time to propagate updates to the gradient map can vary greatly.
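The proximity values that make up the gradient map can be computed with a multi-source breadth-first search: start from every lightly loaded node at distance 0 and expand outward. This is a sketch of that computation under the assumption of unit-cost links; the function name is illustrative.

```python
from collections import deque

def gradient_map(adjacency, lightly_loaded):
    """Each node's gradient value is its shortest hop-distance to the
    nearest lightly loaded node (multi-source BFS over the network)."""
    dist = {n: float('inf') for n in adjacency}
    queue = deque()
    for n in lightly_loaded:
        dist[n] = 0          # lightly loaded nodes are distance 0
        queue.append(n)
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if dist[v] == float('inf'):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```

Tasks then flow "downhill" along decreasing gradient values toward the nearest underloaded processor.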
15. Receiver Initiated Diffusion
- Each processor only looks at the load information in its own domain.
- An implementation of Nearest Neighbor, but with a threshold value below which a processor requests work, so a processor never becomes idle.
- Eventually causes each processor to hold an even share of the work.
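One round of the receiver's decision can be sketched as follows: if the local load drops below the threshold, ask the heaviest neighbor for half the difference. The function name and the half-the-difference transfer size are my own illustrative choices.

```python
def rid_step(my_load, neighbor_loads, threshold):
    """Receiver side of one diffusion round: request work from the most
    heavily loaded neighbor before we actually run out (a sketch)."""
    if my_load >= threshold or not neighbor_loads:
        return None  # still enough work locally, no request needed
    heaviest, load = max(neighbor_loads.items(), key=lambda kv: kv[1])
    if load <= my_load:
        return None  # no neighbor is better off than we are
    # Request enough work to roughly even out the difference.
    return heaviest, (load - my_load) // 2
```

Because the request fires at the threshold rather than at zero, the requested work can arrive before the processor goes idle.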
16. Sender Initiated Dynamic Load Balancing Algorithms
17. Single Level Load Balancing
- Breaks tasks down into smaller subtasks.
- Each processor is responsible for more than one subtask.
- This ensures that each processor does roughly the same amount of work.
- A manager processor controls the creation and distribution of subtasks.
- Not scalable, because the manager must distribute all of the work.
18. Multilevel Load Balancing
- Processors are arranged in trees.
- The root processor of each tree is responsible for distributing super-subtasks to its successor processors.
- With only one tree, this behaves the same as single level load balancing.
19. Sender Initiated Diffusion
- Each processor sends a load update to its neighbors. If an update shows that a processor does not have much work, one of its neighbors sends it work.
- Similar to Receiver Initiated Diffusion.
- If one area of the network holds most of the work, it takes a long time for that work to spread.
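The sender's side of one diffusion round can be sketched as pushing half of each positive load difference toward lighter neighbors. A simplified single-round illustration; the function name and transfer rule are my own.

```python
def sid_step(my_load, neighbor_loads):
    """Sender side of one diffusion round: push surplus work toward
    each neighbor that reported a lighter load (a sketch)."""
    transfers = {}
    for n, load in neighbor_loads.items():
        if load < my_load:
            give = (my_load - load) // 2  # split the difference
            if give > 0:
                transfers[n] = give
                my_load -= give
    return transfers, my_load
```

Since each round only moves work one hop, a concentration of work in one corner of the network still needs many rounds to spread, which is the weakness noted above.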
20. Hierarchical Balancing Method
- Organizes the processors into balancing domains.
- Specific processors are given responsibility for balancing different levels of the hierarchy.
- A tree structure works well: if one branch is overloaded, it sends work to another branch of the tree, and each node has a corresponding node in the opposite branch.
21. Dimensional Exchange Method
- Balances small domains first, then progressively larger domains.
- A domain is defined as one dimension of a hypercube.
- Can be extended to a mesh by folding the mesh into sections.
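On a hypercube, the method pairs each processor with the partner whose rank differs in exactly one bit and averages their loads, one dimension at a time. A minimal sketch of that sweep (the function name is my own):

```python
def dimensional_exchange(loads):
    """Dimensional exchange on a hypercube: in dimension d, every
    processor averages its load with the partner whose rank differs
    only in bit d; after all dimensions, loads are fully balanced."""
    p = len(loads)
    dims = p.bit_length() - 1
    assert p == 1 << dims, "number of processors must be a power of two"
    loads = list(loads)
    for d in range(dims):
        for i in range(p):
            j = i ^ (1 << d)   # partner across dimension d
            if i < j:          # handle each pair once
                avg = (loads[i] + loads[j]) / 2
                loads[i] = loads[j] = avg
    return loads
```

After the sweep over all log2(p) dimensions, every processor holds the global average load, which is why balancing small domains first and then larger ones suffices.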