Instruction-Level Parallelism for Low-Power Embedded Processors


1
Instruction-Level Parallelism for Low-Power
Embedded Processors
Ph.D. Thesis, Jean-Michel Puiatti, EPFL
  • January 23, 2001
  • Presented by Anup Gangwar

2
Introduction
  • Need for high-performance, low-power processors
  • Synergistic hardware-compiler design for EPIC or
    VLIW-like architectures
  • A new variable instruction length scheme
  • Full predication support in hardware

3
Outline
  • Instruction-Level Parallelism
  • Power Consumption in VLSI Circuits
  • A Look at Available Mobile and DSP Processors
  • High-Level Evaluation of A Low-Power VLIW
    Processor
  • The DEVIL Low-Power Processor
  • A Step Towards Predicated Execution
  • Conclusion

4
ILP Concepts and Limitations
  • Data Dependences
    • Flow Dependence (RAW, read after write)
    • Anti Dependence (WAR, write after read)
    • Output Dependence (WAW, write after write)
  • Reduction of critical path
  • Control Dependences
  • Resource Conflicts
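
As an illustration of the three data dependence types listed above, here is a minimal sketch that classifies the dependences between two instructions in program order. The `Instr` type and the register names are hypothetical and only model a simple three-address form; they are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-address instruction: one destination, some sources."""
    dest: str
    srcs: tuple


def dependences(first: Instr, second: Instr) -> set:
    """Return the data dependences of `second` on the earlier `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")  # flow: second reads what first writes
    if second.dest in first.srcs:
        found.add("WAR")  # anti: second overwrites what first reads
    if first.dest == second.dest:
        found.add("WAW")  # output: both write the same register
    return found


i1 = Instr(dest="r1", srcs=("r2", "r3"))  # r1 = r2 + r3
i2 = Instr(dest="r4", srcs=("r1", "r5"))  # r4 = r1 * r5
i3 = Instr(dest="r2", srcs=("r6",))       # r2 = r6

print(dependences(i1, i2))  # {'RAW'}
print(dependences(i1, i3))  # {'WAR'}
```

Only RAW is a true dependence; WAR and WAW arise from register reuse and can in principle be removed by renaming.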

6
Achieving ILP: Pipelining
  • Control dependences affect pipelined execution
  • Data dependences affect pipelined execution
  • Resource conflicts affect pipelined execution

7
Achieving ILP: Superscalar Architectures
  • In-order issue with in-order completion
  • In-order issue with out-of-order completion
  • Out-of-order issue with out-of-order completion

11
Achieving ILP: VLIW Processors
  • Lower circuit overhead than superscalar processors
  • Limited number of resources
  • Explicit insertion of NOPs increases code size
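
To make the code-size cost of explicit NOPs concrete, the sketch below pads each cycle's operations to a fixed bundle width and reports the fraction of slots wasted. The bundle width, the operation names, and the helper functions are illustrative assumptions, not details from the thesis.

```python
NOP = "nop"


def pad_bundles(schedule, width):
    """Pad each cycle's operations with NOPs up to a fixed bundle width."""
    bundles = []
    for ops in schedule:
        if len(ops) > width:
            raise ValueError("more operations than issue slots")
        bundles.append(list(ops) + [NOP] * (width - len(ops)))
    return bundles


def nop_fraction(bundles):
    """Fraction of all instruction slots that hold a NOP."""
    slots = [op for bundle in bundles for op in bundle]
    return slots.count(NOP) / len(slots)


# Three cycles with 2, 1 and 3 useful operations on a 4-wide machine.
schedule = [["add", "mul"], ["load"], ["sub", "store", "branch"]]
bundles = pad_bundles(schedule, width=4)
print(nop_fraction(bundles))  # 0.5
```

In this made-up schedule half of the fetched slots carry no useful work, which is the overhead a NOP-elimination scheme targets.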

13
Extracting ILP: Basic-Block Scheduling
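
The slide body is not in the transcript, so the following is only a generic sketch of scheduling inside one basic block: a greedy list scheduler that assumes unit latency and identical functional units. The dependence graph and the alphabetical tie-break are assumptions made for the example; production schedulers usually prioritize by critical-path length.

```python
def list_schedule(deps, width):
    """Greedy list scheduling with unit latency.

    deps maps each operation to the set of operations it depends on.
    Returns a list of cycles, each a list of operations issued together.
    """
    remaining = {op: set(d) for op, d in deps.items()}
    done = set()
    cycles = []
    while remaining:
        # An operation is ready once all of its predecessors have completed.
        ready = sorted(op for op, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("cyclic dependences")
        issued = ready[:width]  # at most `width` operations per cycle
        cycles.append(issued)
        done.update(issued)
        for op in issued:
            del remaining[op]
    return cycles


deps = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}, "e": set()}
print(list_schedule(deps, width=2))  # [['a', 'b'], ['c', 'e'], ['d']]
```

The chain a/b, c, d forms the critical path of three cycles, so adding more issue slots would not shorten this schedule further.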
14
Extracting ILP: Superblock Scheduling
15
Extracting ILP: Predicated Execution
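
As a rough illustration of the idea, the sketch below models if-conversion: both arms of an if/else become straight-line instructions guarded by predicate registers, and an instruction whose guard is false is nullified. The tuple encoding and register names are hypothetical; the Python `if` in the interpreter stands in for hardware nullification, not for a branch in the program.

```python
def run_predicated(program, regs):
    """Execute every instruction in order; commit only if its guard is true.

    Each instruction is (guard, dest, compute); a guard of None means
    the instruction is unconditional.
    """
    regs = dict(regs)
    for guard, dest, compute in program:
        if guard is None or regs[guard]:
            regs[dest] = compute(regs)
    return regs


# Source: if (a > b) x = a - b; else x = b - a;
program = [
    (None, "p", lambda r: r["a"] > r["b"]),  # p = (a > b)
    (None, "q", lambda r: not r["p"]),       # q = !p
    ("p", "x", lambda r: r["a"] - r["b"]),   # (p) x = a - b
    ("q", "x", lambda r: r["b"] - r["a"]),   # (q) x = b - a
]

print(run_predicated(program, {"a": 7, "b": 3})["x"])  # 4
print(run_predicated(program, {"a": 2, "b": 9})["x"])  # 7
```

The control dependence on the branch has become a data dependence on `p` and `q`, so the two guarded operations can be scheduled in the same cycle.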
16
Power Consumption in CMOS Circuits: Parallelism
for Energy Efficiency
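
The slide body is not in the transcript. As background, the standard first-order model of dynamic power in CMOS shows why parallelism can save energy. The symbols below (switching activity α, switched capacitance C, supply voltage V_dd, clock frequency f, parallelism degree N) are the usual textbook ones and are assumed here, as is the simplification that duplicated hardware adds no overhead.

```latex
\begin{align}
P_{\text{dyn}} &= \alpha \, C \, V_{dd}^{2} \, f \\
% N parallel units at f/N keep the throughput; the lower clock rate
% permits a reduced supply voltage V'_{dd} < V_{dd}.
P'_{\text{dyn}} &= \alpha \, (N C) \, {V'_{dd}}^{2} \, \frac{f}{N}
                = \alpha \, C \, {V'_{dd}}^{2} \, f \\
\frac{P'_{\text{dyn}}}{P_{\text{dyn}}} &= \left( \frac{V'_{dd}}{V_{dd}} \right)^{2}
\end{align}
```

Under these assumptions the saving comes entirely from the quadratic dependence on supply voltage; in practice the extra routing and control logic reduce it.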
18
Available Mobile and VLIW Processors
  • The ARM Family
    • The ARM7 Generation
    • The StrongARM
    • The ARM Thumb Option
    • The ARM Piccolo Option
    • The ARM9 and ARM10

19
Available Mobile and VLIW Processors
  • The Motorola M-Core
  • The LSI TinyRisc
  • The Hitachi SuperH Family
  • VLIW Processors
    • The Motorola-Lucent StarCore
    • The Philips TriMedia
    • The HP/Intel IA-64

20
High-Level Evaluation of a Low-Power VLIW
Processor
  • Energy consumption distribution

21
High-Level Evaluation of a Low-Power VLIW
Processor
  • NOP Elimination in VLIW Processor
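
The transcript does not describe the scheme used in the thesis. The sketch below shows one generic way to eliminate NOPs, stated here purely as an assumption: store only the real operations, each tagged with its slot index and an end-of-bundle flag, and let the fetch stage re-expand them into fixed-width bundles.

```python
NOP = "nop"


def compress(bundles):
    """Drop NOPs; keep (slot, op, last_in_bundle) for each real operation."""
    stream = []
    for bundle in bundles:
        real = [(slot, op) for slot, op in enumerate(bundle) if op != NOP]
        if not real:
            # An all-NOP bundle still needs one entry to mark the empty cycle.
            stream.append((0, NOP, True))
            continue
        for i, (slot, op) in enumerate(real):
            stream.append((slot, op, i == len(real) - 1))
    return stream


def expand(stream, width):
    """Rebuild fixed-width bundles from the compressed stream."""
    bundles, current = [], [NOP] * width
    for slot, op, last in stream:
        current[slot] = op
        if last:
            bundles.append(current)
            current = [NOP] * width
    return bundles


bundles = [
    ["add", "mul", NOP, NOP],
    [NOP, "load", NOP, NOP],
    [NOP, NOP, NOP, NOP],
]
stream = compress(bundles)
print(len(stream))  # 4 stored operations instead of 12 slots
```

The round trip `expand(compress(bundles), 4)` reproduces the original bundles, so the decoder sees the same schedule while code memory holds a third of the slots in this example.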

22
High-Level Evaluation of a Low-Power VLIW
Processor
  • Speed-up Comparison

23
High-Level Evaluation of a Low-Power VLIW
Processor
  • Energy Comparison

24
High-Level Evaluation of a Low-Power VLIW
Processor
  • Energy-Delay Product Comparison
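
For reference, the energy-delay product is the energy consumed by a task multiplied by its execution time, so it rewards designs that are both frugal and fast. The numbers in the sketch below are invented for illustration and are not results from the thesis.

```python
def energy_delay_product(energy_mj, delay_ms):
    """Energy-delay product in mJ*ms: lower is better."""
    return energy_mj * delay_ms


# Hypothetical designs: B uses more energy than A but finishes sooner.
design_a = {"energy_mj": 10, "delay_ms": 4}
design_b = {"energy_mj": 12, "delay_ms": 2}

edp_a = energy_delay_product(**design_a)
edp_b = energy_delay_product(**design_b)
print(edp_a, edp_b)  # 40 24
```

With these made-up figures design B loses on energy alone but wins on energy-delay product, which is why the two metrics are compared separately.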

25
The DEVIL Low-Power Processor
  • Complexity in VLIW Architectures
  • Hardware Duplication
  • Functional units (FUs), number of registers, and number of ports
  • Number of FUs versus type of FU
  • Number of FUs versus available ILP

26
The DEVIL Low-Power Processor
  • Code Memory

27
The DEVIL Low-Power Processor
28
The DEVIL Low-Power Processor
  • Instruction Fetch Mechanism

29
The DEVIL Low-Power Processor
  • Branch Prediction Mechanism

30
The DEVIL Low-Power Processor
  • Performance with and without superscalar
    optimizations

31
The DEVIL Low-Power Processor
  • Effect of superscalar optimizations on code size

32
The DEVIL Low-Power Processor
  • Effect of NOP elimination on code size

33
The DEVIL Low-Power Processor
  • Effect of NOP elimination on the number of
    accesses to code memory

34
The DEVIL Low-Power Processor
  • Effect of instruction fetch mechanism on code size

35
The DEVIL Low-Power Processor
  • Code size comparison with existing mobile
    processors

36
A Step Towards Predicated Execution
  • Compiler techniques for reducing predicate code
    size
  • Reduction of number of Control Instructions
  • Predicate promotion and Instruction merging
  • Instruction reduction for advanced code generation

37
A Step Towards Predicated Execution: Reduction of
Number of Control Instructions
38
A Step Towards Predicated Execution: Predicate
Promotion and Instruction Merging
39
A Step Towards Predicated Execution
  • Introducing predication support into processor
  • Effect on code size of full predication
  • Predication code size and execution
    characteristics
  • Prefix-based predication

40
A Step Towards Predicated Execution
  • Relative number of predicated instructions

41
A Step Towards Predicated Execution
  • Code expansion considering predication

42
A Step Towards Predicated Execution
  • Code reductions due to predicated execution

43
Conclusions
  • A synergistic hardware-compiler approach for
    low-power processors
  • A new VLIW architecture that limits the increase in
    code size
  • A prefix-based predicated execution architecture
    framework