Predicate-Aware Scheduling: A Technique for Reducing Resource Constraints

1
Predicate-Aware Scheduling: A Technique for
Reducing Resource Constraints
  • Mikhail Smelyanskiy, Scott Mahlke, Edward
    Davidson
  • Department of EECS
  • University of Michigan

  • Hsien-Hsin (Sean) Lee
  • School of ECE
  • Georgia Institute of Technology
2
Motivation
  • Predication eliminates branch instructions,
    but increases resource requirements
  • Predicate-aware scheduling oversubscribes
    resources, which reduces both resource
    requirements and schedule length

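Control flow: A ends in "br cond"; its T and F edges lead to B and C, which both continue to D.

Predicated schedule, B and C in separate cycles:
  0  A
  1  p1,p2 = pred_def(cond)
  2  B if p1
  3  C if p2
  4  D

Predicate-aware schedule, B and C sharing a cycle:
  0  A
  1  p1,p2 = pred_def(cond)
  2  B if p1, C if p2
  3  D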
3
Potential for Disjoint Operations
  • Combining reduces dynamic operation count by 13%

4
Outline
  • Motivation
  • Resource Pressure Problem in Predicated Code
  • PRAVO: PRedicate-Aware VLIW Processor
  • Predicate-aware Scheduling
  • Performance Results
  • Conclusion and Future Work

5
Modulo Scheduling Example
Source Code
  for (i = 0; i < im_size; i++)
    if (q_im[i] … 1)        res[i] = q_im[i] * bin_size - correction;
    else if (q_im[i] … -1)  res[i] = q_im[i] * bin_size + correction;
    else                    res[i] = bin_size + correction;

Predicated Code
  op1   t1 = load(i1, q_im)            if T
  op2   p1,p2 = pred_def(t1 … 1)       if T
  op3   t2 = multsub(t1, tbs, tcor)    if p1
  op4   store(i1, res, t2)             if p1
  op5   p3,p4 = pred_def(t1 … -1)      if p2
  op6   t2 = multadd(t1, tbs, tcor)    if p3
  op7   store(i1, res, t2)             if p3
  op8   t2 = add(tbs, tcor)            if p4
  op9   store(i1, res, t2)             if p4
  op10  if (i < im_size) goto op1      if T
  • Three control paths: PT, PFT, PFF
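The relationship between the branching loop and its predicated form can be checked with a small Python emulation. This is only a sketch under stated assumptions: the comparison operators did not survive in the transcript, so `>= 1` and `<= -1` are assumed here, and `pred_def` is modeled as producing a predicate and its complement, both false when the guarding predicate is false. The function names are hypothetical.

```python
def pred_def(cond, guard=True):
    """Return (p_true, p_false); both are False when the guard is False.
    Assumed semantics, chosen to match the slide's if-conversion."""
    return (guard and cond, guard and not cond)


def branching(q_im, bin_size, correction):
    """Reference loop with ordinary control flow (comparisons are assumed)."""
    res = []
    for q in q_im:
        if q >= 1:
            res.append(q * bin_size - correction)
        elif q <= -1:
            res.append(q * bin_size + correction)
        else:
            res.append(bin_size + correction)
    return res


def predicated(q_im, bin_size, correction):
    """Same loop after if-conversion: no branches in the body, only guards."""
    res = [None] * len(q_im)
    for i, t1 in enumerate(q_im):            # op1, guard T
        p1, p2 = pred_def(t1 >= 1)           # op2, guard T
        p3, p4 = pred_def(t1 <= -1, p2)      # op5, guard p2
        # Exactly one of p1, p3, p4 holds: the paths PT, PFT, PFF.
        assert [p1, p3, p4].count(True) == 1
        if p1:
            res[i] = t1 * bin_size - correction   # op3, op4
        if p3:
            res[i] = t1 * bin_size + correction   # op6, op7
        if p4:
            res[i] = bin_size + correction        # op8, op9
    return res


print(predicated([2, -3, 0], 4, 1))  # [7, -11, 5]
```

Because p1, p3, and p4 are mutually exclusive, the three store operations can never execute in the same iteration, which is what makes them candidates for sharing a resource.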

6
Traditional Modulo Schedule (Rau 94)
Modulo schedule: II = 5
7
Two Predicate-Aware Modulo Schedules
  • Resource oversubscription can produce more
    efficient schedules (if the colored operations
    can share an entry)
  • A larger Fetch Width (FW) allows more
    oversubscription and a faster schedule

8
Baseline Architecture Model
Figure: baseline pipeline with stages FETCH, DECODE, DISPATCH, REGISTER READ, EXECUTE (including PRED READ from the Predicate Register File), and WRITE BACK, each marked as a must-use or may-use resource.
  • Predicate Register File is only accessed in
    EXECUTE stage
  • Resources from FETCH to EXECUTE are
    unconditionally reserved

9
Predicate-aware Architecture (PRAVO)
Figure: PRAVO pipeline with stages FETCH, DISPATCH (including PRED READ from the Predicate Register File, PRF), DECODE, REGISTER READ, EXECUTE, and WRITE BACK, each marked as a must-use or may-use resource.
  • PRF is accessed early, in the DISPATCH stage
  • This increases the latency of predicate-defining operations

10
Predicate-aware Architecture (PRAVO)
  • DECODE and DISPATCH are reversed

11
Three Main Changes to Conventional Scheduler
Figure: scheduler flow diagram with five numbered steps, including the reservation tables.
  • Predicate-defining operation edge latency
    adjustment
  • ResMII computation
  • Predicate-Aware Reservation Table

12
Data Dependence Graph Latency Adjustment
Figure: the same data dependence graph shown three ways (Original, Brute force, Selective). Nodes: p1,p2 = pred_def; 1 if p1; ld if p2; 2 if p1; 3 if p2; 4 if p2. Edge latencies are 1 or 2.
13
Computation of Resource-Constrained Lower Bound
Figure: resource usage for the operations p1,p2 = pred_def; 1 if p1; ld if p2; 2 if p1; 3 if p2; 4 if p2. Resource columns: M, A, FW, Mmay, FWmust, Amay.
Original: ResMII = 5
Predicate-Aware: ResMII = 3
  • Predicate-aware ResMII computation
  • first-fit combining
  • Fetch Width (FW) resource constraint
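The predicate-aware ResMII computation above can be sketched in Python. This is an illustrative reading of the slide, not the authors' implementation. Operations on the same resource are packed first-fit into groups of pairwise-disjoint guards, each group counts as one reservation, and the fetch width adds a separate bound because every operation must still be fetched. The machine (one M unit, one A unit, FW = 2), the explicit `DISJOINT` set standing in for a real disjointness analysis, and all function names are assumptions.

```python
from math import ceil


def disjoint(g1, g2, disjoint_pairs):
    """True if guards g1 and g2 are known never to be true together."""
    return frozenset((g1, g2)) in disjoint_pairs


def first_fit_groups(ops, disjoint_pairs):
    """Pack (name, guard) ops into groups of pairwise-disjoint guards."""
    groups = []
    for op in ops:
        for group in groups:
            if all(disjoint(op[1], other[1], disjoint_pairs) for other in group):
                group.append(op)
                break
        else:
            groups.append([op])  # no existing group fits: open a new one
    return groups


def res_mii(ops_by_resource, units, disjoint_pairs=frozenset(), fetch_width=None):
    """Resource-constrained lower bound on II.
    With no disjoint pairs, every op needs its own entry (conventional bound)."""
    bound, total_ops = 0, 0
    for resource, ops in ops_by_resource.items():
        total_ops += len(ops)
        entries = len(first_fit_groups(ops, disjoint_pairs))
        bound = max(bound, ceil(entries / units[resource]))
    if fetch_width is not None:
        # Fetch is a must-use resource: combined ops are still fetched individually.
        bound = max(bound, ceil(total_ops / fetch_width))
    return bound


# Operations named on the slide; resource assignment and machine are assumed.
OPS = {
    "A": [("pred_def", "T"), ("1", "p1"), ("2", "p1"), ("3", "p2"), ("4", "p2")],
    "M": [("ld", "p2")],
}
UNITS = {"A": 1, "M": 1}
DISJOINT = {frozenset({"p1", "p2"})}

print(res_mii(OPS, UNITS))                           # 5
print(res_mii(OPS, UNITS, DISJOINT, fetch_width=2))  # 3
```

Under these assumed parameters the sketch gives 5 for the conventional bound and 3 for the predicate-aware bound. With `fetch_width=1` the fetch constraint dominates and the bound rises to 6, which is why a larger FW permits more oversubscription.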

14
Reservation Table (similar to Warter 92)
  • Conventional: one operation per RT entry
  • Predicate-aware: multiple disjoint operations per RT entry
  • Check disjointness using PQS, the Predicate Query System (Johnson 96)
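A predicate-aware modulo reservation table along these lines can be sketched as follows. The class and method names are hypothetical, and the explicit set of disjoint guard pairs is a stand-in for a real PQS query. The sketch also makes one conservative assumption of its own: two operations may share an entry only when they come from the same stage (`cycle // II`), since guards computed in different loop iterations are not known to be disjoint.

```python
class PredicateAwareRT:
    """Modulo reservation table whose entries can hold several disjoint ops."""

    def __init__(self, ii, units, disjoint_pairs):
        self.ii = ii
        self.disjoint_pairs = disjoint_pairs
        # table[resource][slot] is a list of entries, one per unit;
        # each entry is a list of (op, guard, stage) tuples.
        self.table = {
            r: [[[] for _ in range(n)] for _ in range(ii)]
            for r, n in units.items()
        }

    def _disjoint(self, g1, g2):
        # Stand-in for a PQS disjointness query (assumption).
        return frozenset((g1, g2)) in self.disjoint_pairs

    def try_place(self, op, guard, resource, cycle):
        """Reserve `resource` at `cycle` for `op`; return False on a conflict."""
        slot, stage = cycle % self.ii, cycle // self.ii
        for entry in self.table[resource][slot]:
            # An empty entry always fits; otherwise every occupant must be
            # from the same stage and have a guard disjoint from the new one.
            if all(s == stage and self._disjoint(guard, g) for _, g, s in entry):
                entry.append((op, guard, stage))
                return True
        return False


rt = PredicateAwareRT(ii=3, units={"A": 1}, disjoint_pairs={frozenset({"p1", "p2"})})
print(rt.try_place("1", "p1", "A", 1))  # True: entry is empty
print(rt.try_place("3", "p2", "A", 1))  # True: p2 is disjoint from p1
print(rt.try_place("2", "p1", "A", 1))  # False: p1 already occupies the entry
```

With an empty set of disjoint pairs the table degenerates to the conventional case of one operation per entry.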

15
Performance Results
  • Compare the performance of baseline and
    predicate-aware scheduling
  • Compiler support: Trimaran and ELCOR (Trimaran 99)
  • Benchmarks: the MediaBench (Lee 97) suite
  • Processor models: BA (base) and PA (predicate-aware)

16
Predicate-aware Speedup over Baseline (PA42 vs. BA42)
  • Speedup is only due to improvable PA regions
  • Speedup decreases with higher latency and wider
    machines

17
Average Speedup Breakdown
  • Only 68% of regions are PA scheduled
  • PA is more effective in modulo scheduled loops

18
Summary and Future Work
  • Summary
  • Predicate-aware scheduling
  • reduces resource constraints in predicated code
  • is supported by the PRAVO architecture
  • is effective in cyclic regions (16% speedup on
    4-wide PRAVO)
  • Future work
  • More resource sharing can be achieved by
    combining probabilistically disjoint operations

19
Q&A and Suggestions
20
  • Backup Foils

21
Modulo Scheduling Using PART
22
Speedup Analysis
Figure: two panels, Predicate-Aware Acyclic Region and Predicate-Aware Cyclic Region. Each panel covers three machine configurations (6-wide cmpplat=2, 4-wide cmpplat=2, 4-wide cmpplat=3), labeled Case 1 to Case 6 across the two panels. Legend: PA Potential, Base Sched. Length, PA Sched. Length, PA Critical Path Length, PA Resource Bound.