1
EECS 583 Lecture 24
Group 2 Dataflow analysis & optimization
Group 3 Scheduling, Regalloc, Code gen
  • University of Michigan
  • April 10, 2002

2
Today
  • Dataflow analysis and optimization
  • BDD-based predicate analysis
  • More intelligent predicate relation analysis
    using binary decision diagrams
  • Beth, Laura, Bill
  • Scheduling, register allocation, code generation
  • Retargeting Elcor to TI C6x
  • Handling multiple clusters
  • Jeff, Dave
  • Power-sensitive scheduling
  • Dealing with power in a modulo scheduler
  • Dynamic voltage scaling
  • Hai, Amit

3
Next time (Monday, 4/15)
  • G2 Dataflow analysis and optimization
  • Partial inlining Chunhui, Dukhyun, Jeremy
  • G3 Scheduling, register allocation, code
    generation
  • While loop software pipelining Arnar, Tomas,
    Misha
  • G4 Memory optimization
  • Data layout Tony, Marius
  • Exams returned on Monday if it kills me !!!!!!!!
  • On Wednesday (4/17, last class)
  • Last 2 memory optimization groups go, plus any
    spillover

4
Course evaluations
  • Written portion of the evaluation is important
    because this is the first time the class in this
    form was offered
  • So, I am interested in what you guys think needs
    to be improved
  • Put some thought into your answers !
  • Note: saying the test was too long is not that
    useful
  • Question A
  • What did you NOT like about the class or do you
    think needs the most improvement? What would you
    have done differently?
  • Question B
  • Thurs/Fri group meetings: Did you like these?
    Were they useful? How could they be more useful?
    Are they worth the time?

5
Group 2 Predicate Analysis using Binary Decision
Diagrams
  • University of Michigan
  • April 10, 2002

6
Background
  • Our goal
  • Provide a system similar to Elcor's PQS that
    uses BDDs rather than partition graphs to answer
    questions about relationships between predicates
  • From last time - Predicates and BDDs
  • Represent predicated control flow as Boolean
    equations (with BDDs)
  • Supports general predicated code
  • Efficient and accurate analysis of condition
    relations
  • Building BDDs
  • start with the single terminal node 1
  • add variables to the BDD; each new variable is a
    single ITE node with a then-arc and an invert-arc
    to 1
  • The BDD is built and queried using the ITE(f,g,h)
    function (a rough sketch follows below)
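To make the construction above concrete, here is a minimal, illustrative Python sketch of an ITE-based reduced BDD. This is not Elcor/PQS code; it uses explicit 0/1 terminals rather than the invert-arcs mentioned above, and it omits the computed-table cache a real package would have.

    class BDD:
        def __init__(self):
            self.ONE, self.ZERO = "1", "0"      # terminal nodes
            self.unique = {}                    # (var, hi, lo) -> node id (hash-consing)
            self.node = {}                      # node id -> (var, hi, lo)
            self.count = 0

        def var(self, v):
            # Add a variable: a single test node with then-arc to 1, else-arc to 0.
            return self.mk(v, self.ONE, self.ZERO)

        def mk(self, v, hi, lo):
            if hi == lo:                        # redundant test: reduce it away
                return hi
            key = (v, hi, lo)
            if key not in self.unique:          # share structurally identical nodes
                self.count += 1
                self.unique[key] = "n%d" % self.count
                self.node[self.unique[key]] = key
            return self.unique[key]

        def top(self, n):
            return self.node[n][0]

        def cofactor(self, n, v, val):
            # Restrict n with variable v set to val; v is at or above n's top variable.
            if n in (self.ONE, self.ZERO) or self.node[n][0] != v:
                return n
            _, hi, lo = self.node[n]
            return hi if val else lo

        def ite(self, f, g, h):
            # ITE(f, g, h) = (f AND g) OR (NOT f AND h)
            if f == self.ONE:
                return g
            if f == self.ZERO:
                return h
            if g == h:
                return g
            if g == self.ONE and h == self.ZERO:
                return f
            v = min(self.top(x) for x in (f, g, h)
                    if x not in (self.ONE, self.ZERO))
            t = self.ite(self.cofactor(f, v, True), self.cofactor(g, v, True),
                         self.cofactor(h, v, True))
            e = self.ite(self.cofactor(f, v, False), self.cofactor(g, v, False),
                         self.cofactor(h, v, False))
            return self.mk(v, t, e)

For example, with bdd = BDD() and v0, v1 = bdd.var(0), bdd.var(1), the call bdd.ite(v0, v1, bdd.ZERO) builds v0 AND v1, and bdd.ite(v0, bdd.ZERO, bdd.ONE) builds NOT v0.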

7
Our Project
  • Initialize the BDD
  • parse through hyperblock looking for cmpps
  • the tree is built from these operations
  • examine the comparison of each cmpp
  • the comparisons used in the cmpps will create
    intervals from register-to-literal compares and
    conditions from register-to-register compares
  • these will be represented as Boolean functions in
    the BDD
  • create boolean functions to represent each
    predicate based on the functions which represent
    the comparisons in each of its cmpps
  • Use the ITE function to manipulate the BDD
  • functions give answers to queries used by data
    flow analysis such as is_disjoint, is_subset, ...
  • Must be similar to current queries of PQS

8
Example cmpps
  • p1 = cmpp.un(r1 < 3)
  • p2 = cmpp.un(r1 > r2)
  • p1 = cmpp.on(r3 < 5)
  • p3 = cmpp.un(r1 >= 4)
  • p3 = cmpp.an(r4 > 2)

9
Step 1
  • Look at Register to Constant comparisons
  • For each register, create a number line
  • Split the number line into segments based on the
    literals in the comparisons
  • Create BDD nodes representing the finite domains
    (intervals) on the number line

10
Step 1 - r1's number line
[Figure: number line for r1 from -∞ to ∞, marked at 3 and 4]
  • Conditions: r1 < 3, r1 >= 4
  • Intervals (Ik1 = interval k of r1)
  • I01 = (-∞, 3), I11 = (3, 4), I21 = (4, 4), I31 = (4, ∞)
  • Need 2 BDD nodes to represent 4 intervals (v0, v1)
  • I01 = 00
  • I11 = 01
  • I21 = 10
  • I31 = 11
  • Insert BDD nodes and intervals into BDD currently
    consisting of the single node 1
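As a small illustrative aside (hypothetical names, not project code), the encoding step amounts to numbering the intervals and writing each number with ceil(log2(number of intervals)) bits, one bit per BDD variable:

    import math

    intervals_r1 = ["(-inf,3)", "(3,4)", "(4,4)", "(4,inf)"]
    nbits = math.ceil(math.log2(len(intervals_r1)))   # 2 variables: v0, v1
    codes = {iv: format(k, "0%db" % nbits)
             for k, iv in enumerate(intervals_r1)}
    # codes == {'(-inf,3)': '00', '(3,4)': '01', '(4,4)': '10', '(4,inf)': '11'}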

11
Step 1 - BDD
[BDD diagram: node v0 over the terminal node 1]
12
Step 1 - BDD (cont)
[BDD diagram: intervals I01, I11, I21, I31 expressed over variables v0 and
v1, with terminals 1 and 0]
13
Step 1 - Reduced BDD
14
Step 1 - r3's number line
[Figure: number line for r3 from -∞ to ∞, marked at 5]
  • Conditions: r3 < 5
  • Intervals
  • I03 = (-∞, 5), I13 = (5, ∞)
  • Need 1 BDD node to represent 2 intervals (v2)
  • I03 = 1
  • I13 = 0
  • Insert BDD nodes and intervals into BDD

15
Step 1 - BDD
[BDD diagram: v2 distinguishing I03 and I13]
16
Step 1 - r4's number line
[Figure: number line for r4 from -∞ to ∞, marked at 2]
  • Conditions: r4 > 2
  • Intervals
  • I04 = (-∞, 2), I14 = (2, ∞)
  • Need 1 BDD node to represent 2 intervals (v3)
  • I04 = 1
  • I14 = 0
  • Insert BDD nodes and intervals into BDD

17
Step 1 - BDD
[BDD diagram: v2 and v3 distinguishing I03/I13 and I04/I14]
18
Step 2
  • Look at Register to Register comparisons
  • 5 Basic Types
  • >, >=, =, <=, <
  • Disjoint Outcomes
  • (1) R1 > R2
  • (2) R1 = R2
  • (3) R1 < R2
  • Map the disjoint outcome space to 2 Boolean
    variables
  • (R1 < R2) = (0,0), (R1 = R2) = (-,1), (R1 > R2)
    = (1,0)
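A tiny illustrative table of the encoding above (None marks the don't-care bit; names are mine, not project code):

    # Two Boolean variables (v4, v5) encode the three disjoint outcomes of r1 vs. r2.
    outcome_code = {
        "r1 < r2":  (0, 0),
        "r1 == r2": (None, 1),   # v4 is a don't-care when v5 == 1
        "r1 > r2":  (1, 0),
    }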

19
Step 2 - r1 > r2
  • 2 variables to represent 3 disjoint outcomes (v4,
    v5)

20
Step 2 - r1 > r2 BDD
[BDD diagram: v4 and v5 encoding the outcomes r1 < r2, r1 = r2, r1 > r2]
21
Final comparison BDD
[BDD diagram: all comparison families - the r1 intervals (v0, v1), the r3
interval (v2), the r4 interval (v3), and the r1 vs. r2 outcomes (v4, v5) -
over the terminal node 1]
22
Step 3 Predicate Node Creation
  • Traverse code, creating new Predicate Nodes using
    the ITE function
  • The structure of the BDD is determined by
  • Predicate Condition
  • Condition Type
  • Guard

23
Step 3 Predicate Node Creation
Px = cmpp.XX(C) if Pg
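As a hedged sketch of how the three destination types used in this example (UN, ON, AN) could be built with ITE, assuming a bdd object like the earlier sketch: C is the BDD node for the compare condition, guard is the guard predicate, and prev is the predicate's previous BDD node. The ITE nesting mirrors the worked examples on the following slides.

    def predicate_node(bdd, kind, C, guard, prev):
        if kind == "UN":
            # Unconditional: p = guard AND C (p becomes 0 when the guard is false).
            return bdd.ite(guard, bdd.ite(C, bdd.ONE, bdd.ZERO), bdd.ZERO)
        if kind == "ON":
            # Or-type: p is set to 1 when guard AND C; otherwise left unchanged.
            return bdd.ite(C, bdd.ite(guard, bdd.ONE, prev), prev)
        if kind == "AN":
            # And-type: p is cleared when guard is true and C is false; else unchanged.
            return bdd.ite(guard, bdd.ite(C, prev, bdd.ZERO), prev)
        raise ValueError("unsupported cmpp type: " + kind)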
24
Step 3 Predicate Layer
  • P1 = cmpp.UN(r1 < 3)
  • First, examine the condition, r1 < 3
  • In the Register to Integer family, the intervals
    for r1 are
  • (-∞, 3), (3, 4), (4, 4), (4, ∞)
  • R1 < 3 corresponds to interval I0, node I01 in
    the BDD
  • Type UN predicate
  • Guarded under true
  • Predicate node for p1 created by
  • P1 = ITE(I01, 1, 0)

25
Step 3: p1 = cmpp.UN(r1 < 3)
  • P1 = ITE(I01, 1, 0)

26
Step 3: p2 = cmpp.UN(r1 > r2)
  • The condition is a Register to Register type
  • There is a family in the BDD corresponding to the
    comparisons of r1 and r2
  • R1 > R2 is specifically the node needed
  • Type UN predicate
  • Guarded under True
  • Predicate node for p2 created by
  • P2 = ITE(r1>r2, 1, 0)

27
Step 3: p2 = cmpp.UN(r1 > r2)
  • P2 = ITE(r1>r2, 1, 0)

28
Step 3: p1 = cmpp.ON(r3 < 5)
  • Condition is a Register to Integer type
  • The Register to Integer family has the following
    intervals for r3
  • I03 = (-∞, 5), I13 = (5, ∞)
  • R3 < 5 corresponds to I03; this is the condition
  • Type ON predicate
  • Guarded under True
  • Predicate node for p1 created by
  • P1 = ITE(I03, ITE(1, 1, p1), p1)
  • The p1 in the ITE function is the previous
    predicate node for p1

29
Step 3: p1 = cmpp.ON(r3 < 5)
  • Condition corresponds to I03
  • P1 = ITE(I03, ITE(1, 1, p1), p1) = ITE(I03, 1, p1)
    = I03 + p1

30
Step 3: p3 = cmpp.UN(r1 >= 4)
  • Condition is a Register to Integer type
  • The Register to Integer family has the following
    intervals for r1
  • I01 = (-∞, 3), I11 = (3, 4), I21 = (4, 4), I31 = (4, ∞)
  • R1 >= 4 corresponds to both I21 and I31
  • Type UN predicate
  • Guarded under True
  • P3 = ITE(ITE(I21, 1, I31), 1, 0)

31
Step 3: p3 = cmpp.UN(r1 >= 4)
P3 = ITE(ITE(I21, 1, I31), 1, 0)
[BDD diagram: predicate nodes p1, p2, p3 built on top of the comparison BDD]
32
Step 3: p3 = cmpp.AN(r4 > 2)
  • Condition corresponds to I14
  • P3 = ITE(1, ITE(I14, p3, 0), p3)

[BDD diagram: updated predicate node p3 over the comparison BDD]
33
Step 3 Final Predicate BDD
[BDD diagram: final predicate BDD with p1, p2, and p3 over variables v0-v5
and the terminal node 1]
34
Queries to PQS-BDD
  • Are p2 and p3 disjoint?
  • Tmp = ITE(ITE(p2, ITE(p3, 0, 1), ITE(p3, 1, 0)),
    0, 1)
  • If Tmp = null, then p2 and p3 are disjoint
  • Is p1 a subset of p3?
  • Tmp = ITE(p1, ITE(p3, 0, 1), 0)
  • If Tmp = null, then p1 is a subset of p3
  • All queries currently answered using PQS can be
    answered with the PQS-BDD system using ITE
    functions
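One common way to phrase these two checks with ITE (a sketch against the earlier bdd object; not necessarily the exact formulas PQS or the project uses):

    def is_disjoint(bdd, p, q):
        # p and q are disjoint iff (p AND q) reduces to the 0 terminal.
        return bdd.ite(p, q, bdd.ZERO) == bdd.ZERO

    def is_subset(bdd, p, q):
        # p is a subset of q iff (p AND NOT q) reduces to the 0 terminal.
        not_q = bdd.ite(q, bdd.ZERO, bdd.ONE)
        return bdd.ite(p, not_q, bdd.ZERO) == bdd.ZERO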

35
Cluster Scheduling
  • Jeff Ringenberg
  • David Oehmke

36
Motivations
  • Register File
  • Size increases linearly with the number of
    registers
  • Size increases quadratically with the number of
    ports
  • Access time increases logarithmically with the
    number of read ports and number of registers
  • Wide machines require large numbers of registers
    and ports
  • 8 wide ideal, fully-orthogonal VLIW machine
    requires approximately 16 read ports and 8 write
    ports
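Written as rough first-order scaling trends (these formulas are my restatement of the bullets above, with illustrative symbols, not numbers from the slides):

    \mathrm{Area}_{RF} \;\propto\; N_{reg} \cdot N_{port}^{2}, \qquad
    T_{access} \;\propto\; \log\!\left(N_{reg} \cdot N_{read\,port}\right)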

37
Clustering
  • Functional units and register files are broken
    into sets (generally uniform)
  • Each functional unit in a cluster is fully
    connected to the local register file for that
    cluster
  • Limited connectivity between clusters
  • Register files for an 8-wide, 2-cluster machine
    are approximately one quarter the area of the
    register file for a single-cluster machine
  • Connectivity
  • Most papers assume an explicit move operation to
    move data between clusters
  • Some actual architectures allow operands to be
    directly read from other clusters via a limited
    bandwidth cross path

38
Clustering in the TI C6000
39
Compiling for Clusters
  • Compilation for a clustered machine is more
    complex than for a single cluster
  • Assign operations to a cluster's functional units
  • Assign data to a cluster's register file
  • Move data between clusters when necessary
  • Complications
  • Spread operations and data over clusters to
    achieve parallelism (partitioning)
  • Hide/limit inter-cluster communication penalty
  • NP-complete problem

40
Cluster Scheduling Algorithms
  • BUG, Bottom-up greedy
  • Original algorithm from Bulldog compiler
  • Pre-scheduling cluster assignment
  • Limited Connectivity VLIW
  • Schedule assuming fully connected, then partition
    and insert necessary copy operations
  • Partial Component Clustering
  • Pre-scheduling DAG decomposition and cluster
    assignment with iterative improvement phase
  • Effective Cluster Assignment for Modulo
    Scheduling
  • Pre-modulo scheduling cluster assignment

41
Cluster Scheduling Algorithms
  • Instruction Scheduling for Clustered VLIW DSPs
    (targets TI C6201 architecture)
  • Partitioning using simulated annealing with list
    scheduler as cost function
  • Unified Assign and Schedule
  • Assign operations to clusters while scheduling
  • Simple modification to list scheduler
  • CARS
  • Single phase cluster assignment, register
    allocation, and instruction scheduling

42
Bottom-up Greedy (BUG)
    Assign(node, destinations)
      // A root that has already been assigned needs no further work
      if (!node.parent && node.fu != unassigned)
        return
      // Recursively assign operands, passing this node's likely FUs
      // as the operands' destinations
      for each operand of node
        fus, cycles = LikelyFUs(node, destinations)
        Assign(operand, fus)
      // Pick the best FU and cycle for this node and reserve the slot
      fus, cycles = LikelyFUs(node, destinations)
      node.fu = fus.front
      node.cycle = cycles.front
      available[node.fu][node.cycle] = false
      // Bind DEF operands that must live in a single location
      for each operand of node
        if (operand.type == DEF &&
            operand.location == unassigned &&
            MustHaveSingleLocation(operand))
          AssignLocation(operand, node)

43
    LikelyFUs(node, destinations)
      min = MAX_INT
      for each fu in FeasibleLocations(node)
        t = CompletionCycle(node, fu, destinations)
        if (t < min)
          // New best completion time: restart the candidate lists
          min = t
          fus = fu
          cycles = StartCycle(node, fu)
        else if (t == min)
          // Tie with the current best: add this fu as another candidate
          fus += fu
          cycles += StartCycle(node, fu)
      return fus, cycles

44
  • FeasibleLocations (node)
  • Returns the list of functional units that can
    perform that operation
  • StartCycle (node, fu)
  • Returns an estimate of the earliest cycle that a
    functional unit can be used to compute the node
    operation
  • Takes into account availability of functional
    units and operand locations (delay and distance)
    if available
  • CompletionCycle (node, fu, destinations)
  • Returns StartCycle(node,fu) + Delay(node,fu) +
    Distance(fu,destinations)
  • Delay (node,fu)
  • Returns number of cycles to compute the operation
    on the functional unit
  • Distance (fu,destinations)
  • Returns minimum number of cycles to move the
    result of the functional unit to one of the
    destinations (0 if destinations is empty)

45
Notes on BUG
  • Top level routine calls Assign(root,NULL) for
    each root node
  • Assign called in decreasing depth order of the
    roots
  • The loop through the operands in Assign is also
    done in decreasing depth order
  • Assign for data node
  • DEF nodes do nothing
  • USE nodes pass any final locations to their
    parent nodes as the destinations list
  • Separate phase assigns locations to DEF and USE
    nodes that are still unassigned
  • Successors and predecessors are taken into
    account for this assignment

46
Shortcomings of BUG
  • Interconnect resource constraints cannot be
    checked
  • Assignment can oversaturate available buses
  • Assignment of values to registers occurs
    on-the-fly after FUs are assigned to operations
  • Subsequent copies of non-local data are scheduled
    later
  • Prior knowledge of these copies would benefit the
    FU assignment and scheduling
  • BUG is greedy
  • Future knowledge is not used in decisions
  • Decisions cannot be changed

47
BUG Example
[Figure: worked BUG trace on an example DAG of nodes 1-6, showing the
sequence of Assign and LikelyFUs calls and the resulting FU/cycle
assignments. Problem: inter-cluster moves are required (move A2 -> M1,
time 1; move L2 -> A1, time 1).]
Machine model: Cluster 1 = Multiply (M1), ALU (A1); Cluster 2 = Load (L2),
ALU (A2). All delays = 1. Distance = 0 within a cluster, 1 between
clusters.
48
Unified Assign and Schedule
49
Cluster Priority Heuristics
  • None
  • Cluster list is not ordered
  • Random
  • Priority is a random number
  • Magnitude-weighted Predecessor (MWP)
  • Number of flow-dependent predecessors assigned to
    the cluster
  • Completion-weighted Predecessor (CWP)
  • Latest ready time for any flow-dependent
    predecessor assigned to the cluster
  • Critical-Path in Single Cluster using CWP (CPSC)
  • Priority calculation like CWP, but all nodes on
    critical path assigned to a single cluster
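A hedged sketch of the MWP and CWP calculations above (illustrative data structures, not the paper's code):

    from types import SimpleNamespace as NS

    def mwp(op, cluster):
        # Magnitude-weighted predecessor: number of flow-dependent
        # predecessors already assigned to this cluster.
        return sum(1 for p in op.flow_preds if p.cluster == cluster)

    def cwp(op, cluster):
        # Completion-weighted predecessor: latest ready time of any
        # flow-dependent predecessor assigned to this cluster.
        return max((p.ready_time for p in op.flow_preds
                    if p.cluster == cluster), default=0)

    # Example: two predecessors already placed on clusters 0 and 1.
    op = NS(flow_preds=[NS(cluster=0, ready_time=3), NS(cluster=1, ready_time=5)])
    assert mwp(op, 0) == 1 and cwp(op, 1) == 5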

50
Advantages of UAS
  • Simple modification to list scheduler
  • Most common instruction scheduling technique
  • Cluster assignment is done with full knowledge of
    resource and interconnect availability
  • Better cluster utilization than BUG
  • Generates more compact schedules than BUG

51
Speedup compared to optimal for most frequently
executed basic block
52
Code size increase due to copy operations for
most frequently executed basic block
53
Speedup compared to 1-cluster 8-issue machine
(same number of resources) for full benchmark.
54
Instruction Scheduling for Clustered VLIW DSPs
  • Partition (simulated annealing)

    T = 10
    RandomPartition(P)
    mincost = ListSchedule(graph, P)
    while (T > 0.01)
      for i = 1 to 50
        r = Random(1, n)
        SwitchToOtherCluster(P[r])          // tentatively move one node
        cost = ListSchedule(graph, P)
        delta = cost - mincost
        if (delta < 0 or Random(0,1) < exp(-delta/T))
          mincost = cost                    // accept the move
        else
          SwitchToOtherCluster(P[r])        // reject: undo the move
      T = T * 0.9                           // cool down
    return P

55
Getting Non-Local Operands
  • Check for an already existing copy with either
    the destination or source in the required cluster
    (CSE)
  • Use the crosspath if the crosspath is available
    this cycle and the operand supports it (take into
    account commutativity)
  • Insert a copy operation in a previous cycle

56
TI optimizing assembler versus Optimal
57
TI compiler versus algorithm
58
Effective Cluster Assignment for Modulo Scheduling
  • Problem
  • Acyclic scheduling is concerned with minimizing
    the schedule length
  • Cyclic scheduling is concerned with maximizing
    throughput
  • Algorithm
  • Greedy cluster assignment
  • Insert any necessary copy operations
  • Schedule using any standard non-cluster aware
    modulo scheduler

59
Cluster Assignment
  • Give higher priority to nodes in recurrence
    cycles
  • The more critical the recurrence (higher RecII),
    the higher the priority
  • Speculatively reserve space for future copy
    operations to minimize resource contention
  • Aggressive cluster assignment could fill a
    cluster and prevent scheduling of a required copy
  • Iterative approach
  • Correct early sub-optimal assignments

60
Standard Bottom-Up Greedy Approach
61
Modified Priority and Speculative Copy Approach
62
Cluster Selection
63
Power-Aware Modulo Scheduling
Amit Marathe, Hai Huang
EECS 583 Class Presentation II 10th April, 2002
64
Reference Paper and Motivation
  • Power-Aware Modulo Scheduling for
    High-Performance VLIW Processors, by Yun and Kim,
    Seoul National University
  • Published in ISLPED 2001 (ACM conference)
  • Motivation
  • Reduce Step Power and Peak Power from the
    software perspective
  • step/peak powers are more important than average
    power as far as reliability is concerned (not
    necessarily optimum power consumption)

65
Step Power
  • Step power is the difference in the average power
    consumed in two consecutive clock cycles
  • Reflected by surge in current for
    charging/discharging
  • Due to Aggressive/wider datapath design,
    increasing clock frequency, growing transistors
  • Reduces reliability and causes timing and logic
    errors (circuit switches at wrong time, latches
    wrong value)
  • At Microarchitectural level
  • Represents inductive noise (L di/dt)
  • Large surge in current -> more noise -> more
    faults
  • Aggressive turning off of FUs to reduce average
    power consumption can have conflicting goals with
    reducing step power

66
Peak Power
  • Peak Power is the maximum power dissipation
    during the execution of a given program
  • Chip reliability is exponentially related to
    peak power
  • High Peak power leads to device degradation,
    reducing the chip lifetime
  • Complex cooling systems needed to avoid
    overheating and ensure system reliability

67
Power-Aware Modulo Scheduling Algo
  • Aims at generating a balanced schedule that would
    reduce both step power and the peak power
  • Ideology: compilers are smarter than
    hardware-assisted solutions
  • Because compilers can fully control the usage of
    the functional units
  • Machine models tested
  • 8-issue VLIW
  • 1 IALU, 2 MEM, 1 IMPY, 2 FALU, 2 FMPY
  • 16-issue VLIW
  • 2 x (8-issue)
  • Benchmarks Tested
  • SPEC95 FP

68
Power-Aware Modulo Scheduling (contd)
  • (Too) Simple Power Estimation Method
  • P(op,i) is the power consumed by operation op
    in pipeline stage i
  • The total power consumed in 1 clock cycle is
    given by the sum of the total power consumed in
    each pipeline stage of that clock cycle
  • Total power consumed in one pipeline stage in a
    given clock cycle is the total power consumed by
    all ops in that pipeline stage
  • Problems: what about inter-stage effects or
    inter-operation effects on power consumption?
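Writing the slide's model out in symbols (notation mine): if P(op, s) is the power consumed by operation op in pipeline stage s, the power consumed in clock cycle t is

    P_{cycle}(t) \;=\; \sum_{s \,\in\, \mathrm{stages}} \;\; \sum_{op \,\in\, \mathrm{stage}\ s\ \mathrm{at\ cycle}\ t} P(op, s)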

69
Base Algo (Iterative Modulo Scheduling)
70
Balanced IMS (power-aware algo)
Cost Function
[The cost function itself appears only as a figure on the original slide.]
Aim: minimize the cost function. This is NOT a complicated function; it
just says to pick a schedule in which the cost function is minimized.
P(Lsp, i) is the power consumed in time-slot i of the software-pipelined
loop. The ideal P(Lsp, i) is when all the ops are no-ops. Peak power is
the maximum P(Lsp, i), and step power is P(Lsp, i) - P(Lsp, i-1).
Minimizing the cost function minimizes both peak power and step power.
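In symbols (following the definitions in the text above), with P(L_{sp}, i) the power in time slot i of the software-pipelined loop:

    P_{peak} \;=\; \max_{i} P(L_{sp}, i), \qquad
    \mathrm{step}(i) \;=\; P(L_{sp}, i) - P(L_{sp}, i-1)

with the indices read cyclically around the steady-state kernel (my assumption).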
71
Summary
  • IMS selects the earliest time slot (within the
    computed slack; slack is the range of time in
    which the op can be scheduled without violating
    dependence constraints) in which there is no
    resource conflict, and schedules the op there
  • BIMS uses the cost function F(Lsp, i) to place an
    instruction in one of the time slots within its
    slack (basically the time slot that incurs the
    least increase in the cost function)

72
An Example for a better picture
IMS: if power(noop) = 0 and power(other ops) = 1, then peak power = 4
and step power = 3.
Balanced IMS: if power(noop) = 0 and power(other ops) = 1, then peak
power = 2 and step power = 0.
73
Results
74
Conclusions
  • They don't make a strong case as to why average
    power is not important (they don't even analyze
    the average power)
  • Power model too simplistic
  • Seems to be a novel idea (so far most of the
    papers have focused on reducing step/peak power
    in hardware)
  • Promising results (almost 37.1% reduction in
    step-power consumption)
  • Idea worth exploring for large systems

75
Dynamic Voltage Scaling: An Overview
  • Hai Huang
  • Amit Marathe

76
Issue of Operating Voltage
  • Predominant device technology is CMOS
  • Energy proportional to the square of the
    operating voltage
  • Maximum gate delays inversely related to voltage
  • Can reduce unit computation energy by reducing
    frequency and voltage
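For reference, the standard first-order CMOS relations behind these bullets (a generic textbook model, not taken from the slides):

    E_{dyn} \;\propto\; C \cdot V_{dd}^{2} \ \text{per switching event}, \qquad
    t_{delay} \;\propto\; \frac{V_{dd}}{(V_{dd} - V_{t})^{\alpha}}, \quad \alpha \approx 1\text{-}2

so lowering V_dd forces a lower clock frequency but cuts the energy per operation roughly quadratically.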

77
Dynamic Voltage Scaling (DVS)
  • Weiser94
  • busy system --> increase frequency
  • idle system --> reduce frequency
  • Needs processors supporting software adjustable
    PLL, voltage regulator
  • e.g., Xscale, SpeedStep, PowerNow!, Crusoe

78
RT System vs. NRT System
  • All systems can be classified as either
  • 1. Real-Time System
  • 2. Non Real-Time System (or Soft Real-Time
    System)
  • NRTS works well with DVS: no deadlines
  • Some challenges of using DVS with RTS

79
Real-Time Systems
  • A task is characterized as (P, D, C)
  • P - period
  • D - deadline
  • C - worst case execution time (WCET)
  • All that matters is that the task meets its deadline!

80
RTS with DVS
  • Static DVS
  • Worst-case utilization, U
  • Task1 = (10, 3), Task2 = (5, 1), Task3 = (10, 4)
  • U = 3/10 + 1/5 + 4/10 = 0.9

[Figure: schedules of tasks T1, T2, T3 over time 0-20, before (speed 1.0)
and after static DVS (reduced speed); y-axis = processor speed (0.5, 1.0),
x-axis = time (5, 10, 15, 20)]
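Spelling the arithmetic out (assuming deadlines equal to periods and an EDF-style schedulability test, which the slide's figure suggests):

    U \;=\; \sum_i \frac{C_i}{P_i} \;=\; \frac{3}{10} + \frac{1}{5} + \frac{4}{10} \;=\; 0.9

so the processor can be slowed to 0.9 of full speed: every C_i stretches by 1/0.9, the effective utilization becomes 1.0, and all deadlines are still met in the worst case.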
81
RTS with DVS (cont.)
  • Dynamic DVS
  • Observation: WCET is much greater than ACET
  • Using actual execution time instead of WCET
    yields even higher energy savings

82
NRTS with DVS
  • Use the past to predict the future
  • Potentially have longer delay
  • No problem, no strict deadlines
  • Opportunity for more aggressive DVS algorithms
  • Need to strike a balance between energy saving
    and performance

83
DVS in Compiler?
  • Mosse00
  • Power Management Points (PMPs) are inserted into
    the generated code
  • The application monitors its own progress and
    adjusts the clock speed if appropriate
  • Targeted to single-threaded embedded systems

84
Power Management Points
  • A task is divided into n sections
  • Each section has a WCET
  • PMPs are inserted at the section boundaries
  • Obtains actual run-time of the section
  • Compare actual time to WCET, and adjust processor
    frequency accordingly
  • Natural places are loop boundaries and procedure
    call sites
  • Use profiling information to eliminate
    unnecessary PMPs to reduce overheads

85
Voltage Adjustment Schemes
  • NPM - No Power Management
  • Every section runs at the highest speed
  • SPM - Static Power Management
  • Same as the static DVS approach: use worst-case
    utilization
  • DPM-P - Dynamic Power Management, Proportional
  • Task is divided into n sections, with task
    deadline d
  • j sections finished at [time shown as a figure]
  • Speed is set to [formula shown as a figure]

86
Voltage Adjustment Schemes
  • DPM-G - Dynamic Power Management, Greedy
  • Task is divided into n sections, with task
    deadline d
  • j sections finished at [time shown as a figure]
  • Speed is set to [formula shown as a figure]

87
Voltage Adjustment Schemes
  • DPM-S - Dynamic Power Management, Statistical
  • Task is divided into n sections, with task
    deadline d
  • j sections finished at [time shown as a figure]
  • Speed is set to [formula shown as a figure]
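The per-scheme speed formulas on the last three slides survive only as figures. Purely as a loosely-recalled illustration of the proportional/greedy/statistical idea (my reconstruction, not necessarily the exact expressions from Mosse00), with d the deadline, t the time at which the first j sections have finished, and wc_i the WCET of section i:

    \text{DPM-P:}\quad f_{j+1} \;\propto\; \frac{\sum_{i>j} wc_i}{d - t}
    \qquad
    \text{DPM-G:}\quad f_{j+1} \;\propto\; \frac{wc_{j+1}}{(d - t) - \sum_{i>j+1} wc_i}

and DPM-S replaces the worst-case estimates of the not-yet-started sections with profiled average-case estimates, falling back to WCET so the deadline is still guaranteed.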

88
Performance
89
Performance
90
Conclusion
  • DVS is a powerful way to save energy
  • If deadlines are not an issue, there is an
    opportunity to be more aggressive in saving energy
  • If meeting deadlines is important, be more
    conservative
  • Applying DVS in the compiler is still an open
    research area; systems are mostly multitasking