Safe RTL Annotations for Low Power Microprocessor Design

1
Safe RTL Annotations for Low Power Microprocessor Design
Vinod Viswanath
Department of Electrical and Computer Engineering, University of Texas at Austin
Talk at Tata Institute of Fundamental Research, Mumbai, India.
2
Outline

Power Dissipation in Hardware Circuits
Instruction-driven Slicing to attain lower power
dissipation
Automatically annotates microprocessor
description at the Register Transfer Level and
Architectural level
Correctness of the introduced annotations
Case studies

3
Power Dissipation
P = 1/2 C V_DD^2 f N + Q_SC V_DD f N + I_leak V_DD

Switching activity power dissipation
To charge and discharge nodes
Short Circuit power dissipation
High only for output drivers, clock buffers
Static power dissipation
Due to leakage current
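
The three terms of the power equation can be evaluated separately. The sketch below assumes the usual reading of the symbols (C is switched capacitance, f is clock frequency, N is switching activity per cycle, Q_SC is short-circuit charge per transition, I_leak is leakage current); the slide does not define them, and the numeric values are made up for illustration only.

```python
def total_power(c, vdd, f, n, q_sc, i_leak):
    """Return (switching, short_circuit, static) power components."""
    switching = 0.5 * c * vdd ** 2 * f * n      # 1/2 C V_DD^2 f N
    short_circuit = q_sc * vdd * f * n          # Q_SC V_DD f N
    static = i_leak * vdd                       # I_leak V_DD
    return switching, short_circuit, static


# Illustrative (made-up) values, not measurements from the talk.
sw, sc, st = total_power(c=1e-9, vdd=1.0, f=1e9, n=0.1, q_sc=1e-11, i_leak=1e-2)

# Halving V_DD quarters the switching term because of the squared dependence.
sw_half, _, _ = total_power(c=1e-9, vdd=0.5, f=1e9, n=0.1, q_sc=1e-11, i_leak=1e-2)
print(sw, sc, st, sw_half)
```

Only the switching term depends on V_DD quadratically, which is why the next slide singles it out.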

4
Switching Activity Power Dissipation

Reduce the squared term V_DD
Leads to exponential increase in I_leak
Host of techniques to reduce switching power at
the gate level
Clock gating
Relatively few techniques exist at the RTL
Use program structure and dataflow information
available at that level of abstraction

5
Instruction-driven Slice

An instruction-driven slice of a microprocessor
design is
all the relevant circuitry of the design required
to completely execute a specific instruction
Parts of the decode, execute, writeback etc.
blocks
Cone of influence of the semantics of the
instruction
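
A cone of influence can be pictured as backward reachability over a signal dependency graph. The sketch below is a purely structural approximation on a hypothetical toy netlist (all signal names are invented); the slice described in the talk is additionally conditioned on the instruction's semantics, which this sketch does not model.

```python
from collections import deque


def cone_of_influence(fanin, roots):
    """Return every signal reachable backwards from `roots` through `fanin`."""
    seen = set(roots)
    work = deque(roots)
    while work:
        sig = work.popleft()
        for src in fanin.get(sig, ()):
            if src not in seen:
                seen.add(src)
                work.append(src)
    return seen


# Hypothetical netlist: each signal maps to the signals it reads.
fanin = {
    "wb_result": ["alu_out"],
    "alu_out": ["operand_a", "operand_b", "alu_op"],
    "alu_op": ["insn"],
    "operand_a": ["insn"],
    "operand_b": ["insn"],
    "fpu_out": ["fp_a", "fp_b"],
    "fp_a": ["insn"],
    "fp_b": ["insn"],
}

slice_ = cone_of_influence(fanin, ["wb_result"])
gated = set(fanin) - slice_          # candidates for being shut off
print(sorted(gated))                 # ['fp_a', 'fp_b', 'fpu_out']
```

Everything outside the computed set is a candidate for gating while the instruction executes.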

6
Instruction-driven Slicing

Given a microprocessor design and an instruction
Identify the instruction-driven slice
Shut off the rest of the circuitry
This might include
Gating out parts of different blocks
Gating out floating point units during integer
ALU execution
Turning off certain FSMs in different control
blocks since exact constraints on their inputs
are available due to instruction-driven slicing

7
Algorithm (High Level)

Algorithm instruction-driven-slicing
Begin
  Inputs: vRTL (Verilog RTL), insts (instructions)
  Output: aRTL (annotated RTL)
  Parse vRTL to obtain the Abstract Syntax Program Graph (ASPG)
  For each instruction I in insts, repeat:
    Slice the ASPG for instruction I
    Traverse the ASPG
    Add annotation variables if a block that can be gated is found
    If a particular flop is already gated, then add the current
    annotation in an optimal fashion
  Return the annotated ASPG
  Generate Verilog code (aRTL) for the annotated ASPG
End.
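
The annotation loop can be sketched on a toy model in which each instruction's slice is already known as a set of flops. This is an assumption-laden illustration, not the talk's tool: the flop names, the `i<INST>_active` signal names, and the choice to merge annotations on an already-gated flop by OR-ing the gating conditions are all hypothetical.

```python
def annotate(flops, slices):
    """Map each flop to the instructions during which it may be gated off.

    `slices` maps an instruction name to the set of flops in its slice.
    A flop outside an instruction's slice is gated while that instruction
    is active; repeated annotations on one flop are merged into one list.
    """
    gate_off = {flop: [] for flop in flops}
    for inst in sorted(slices):
        for flop in flops:
            if flop not in slices[inst]:
                gate_off[flop].append(inst)
    return gate_off


def enable_expr(tags):
    """Render a Verilog-style clock-enable expression (hypothetical naming)."""
    if not tags:
        return "1'b1"                      # never gated by these instructions
    return "~(" + " | ".join(f"i{t}_active" for t in tags) + ")"


flops = ["alu_reg", "lsu_reg", "fpu_reg"]
slices = {"ADDC": {"alu_reg"}, "LOAD": {"alu_reg", "lsu_reg"}}

gate_off = annotate(flops, slices)
for flop in flops:
    print(flop, "enable =", enable_expr(gate_off[flop]))
```

In a pipelined design the activity condition would be tracked per stage rather than per instruction, so a real annotation would be finer-grained than this.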

8
Instructions as LTL Properties

Let I = i_1 ∧ X i_2 ∧ XX i_3 ∧ ... ∧ X^(n-1) i_n be an
instruction written as an LTL property, such that
i_r represents the conditions for the instruction
I on clock cycle r.
i_1 represents the instruction word.

9
RISC Pipeline (OR1200)

5 stage RISC pipeline implementation
Condition for slicing on ADDC instruction
i_1 = ((icpu_dat_i[31:26] == 6'b111000) ∧
(!rst) ∧ (!flushpipe) ∧ (!if_freeze))
i_2 = (!id_freeze)
i_3 = (!ex_freeze)
i_4 = (!mem_freeze)
i_5 = (!wb_freeze)
I = i_1 ∧ X i_2 ∧ X^2 i_3 ∧ X^3 i_4 ∧ X^4 i_5
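
To make the LTL reading concrete, the property can be evaluated on a finite simulation trace: the k-th condition is checked k cycles after the starting cycle. The sketch below assumes each cycle is a dictionary of signal values; the helper names `make_state` and `holds_at` are hypothetical, and a property that runs past the end of the trace is simply treated as not holding.

```python
def make_state(**overrides):
    """One clock cycle of signal values; every signal defaults to 0."""
    state = {"icpu_dat_i": 0, "rst": 0, "flushpipe": 0, "if_freeze": 0,
             "id_freeze": 0, "ex_freeze": 0, "mem_freeze": 0, "wb_freeze": 0}
    state.update(overrides)
    return state


# Per-cycle conditions for the ADDC slicing condition from the slide.
ADDC_CONDS = [
    lambda s: (((s["icpu_dat_i"] >> 26) & 0x3F) == 0b111000
               and not s["rst"] and not s["flushpipe"] and not s["if_freeze"]),
    lambda s: not s["id_freeze"],
    lambda s: not s["ex_freeze"],
    lambda s: not s["mem_freeze"],
    lambda s: not s["wb_freeze"],
]


def holds_at(trace, t, conds):
    """Check conds[0] at cycle t, conds[1] at t+1, and so on."""
    if t + len(conds) > len(trace):
        return False                      # trace too short to decide
    return all(cond(trace[t + k]) for k, cond in enumerate(conds))


trace = [make_state(icpu_dat_i=0b111000 << 26)] + [make_state() for _ in range(4)]
stalled = list(trace)
stalled[2] = make_state(ex_freeze=1)      # freeze the execute stage in cycle 2

print(holds_at(trace, 0, ADDC_CONDS))     # True
print(holds_at(stalled, 0, ADDC_CONDS))   # False
```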

10
OR1200 ADDC Instruction

Introduces five variables
iADDC_if = i_1
iADDC_id ← iADDC_if ∧ i_2
iADDC_ex ← iADDC_id ∧ i_3
iADDC_mem ← iADDC_ex ∧ i_4
iADDC_wb ← iADDC_mem ∧ i_5
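
One plausible reading of the five variables is a chain of one-bit registers: each stage flag is the previous stage's flag from the preceding cycle, ANDed with the current stage's condition. The sketch below simulates that reading on a toy trace where each cycle is just a tuple of the five condition values; the registered-update interpretation and the function name `track_stages` are assumptions, and the sketch does not model how a real pipeline holds an instruction during a stall.

```python
def track_stages(trace):
    """Return per-cycle (if, id, ex, mem, wb) flags for a trace of condition tuples."""
    n = len(trace[0])
    prev = [False] * n
    history = []
    for conds in trace:
        cur = [conds[0]]                          # iADDC_if = i_1
        for k in range(1, n):
            cur.append(prev[k - 1] and conds[k])  # stage k follows stage k-1
        history.append(tuple(cur))
        prev = cur
    return history


# Cycle 0 fetches the instruction (i_1 true); no stage is frozen afterwards.
trace = [(True, True, True, True, True)] + [(False, True, True, True, True)] * 4
for cycle, flags in enumerate(track_stages(trace)):
    print(cycle, flags)

# Same trace, but the execute-stage condition i_3 fails in cycle 2.
stalled = list(trace)
stalled[2] = (False, True, False, True, True)
```

With no freezes, the single true flag moves one stage per cycle and the write-back flag is set in cycle 4, matching the cycle at which the X^4 term of the LTL property is checked.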

11
or1200_ctrl.lsu_op
12
or1200_ctrl.pre_branch_op
13
Correct Annotations

Notion of correctness
Original RTL and the annotated RTL should be
functionally equivalent under all conditions
Correctness theorem
(defthm or1200_slicing_correct
(equal (or1200_cpu n)
(or1200_cpu_sliced n)))
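
The shape of the correctness claim (original and sliced designs produce the same observable behaviour) can be illustrated by bounded exhaustive simulation of a toy design. This is only a sketch: the two-register machine below is invented for illustration, it is not the OR1200, and checking all input sequences up to a fixed length is much weaker than the ACL2 theorem, which covers every n.

```python
from itertools import product

INT, FP = 0, 1


def cpu(inputs, gated):
    """Toy machine: returns the observable output after each cycle.

    Ungated, both result registers are clocked every cycle. Gated, each
    register is clocked only for the instruction class that uses it.
    """
    alu_reg = fpu_reg = 0
    outs = []
    for op, a, b in inputs:
        if not gated or op == INT:
            alu_reg = (a + b) & 0x3
        if not gated or op == FP:
            fpu_reg = (a * b) & 0x3
        outs.append(alu_reg if op == INT else fpu_reg)
    return outs


def equivalent(depth):
    """Compare gated and ungated outputs on every input sequence of `depth` cycles."""
    stimuli = list(product((INT, FP), range(2), range(2)))
    return all(cpu(seq, False) == cpu(seq, True)
               for seq in product(stimuli, repeat=depth))


print(equivalent(3))   # True
```

The gated register contents differ from the ungated ones internally, but the output only ever reads the register that was just clocked, so the observable behaviour is unchanged.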

14
ACL2 Theorem Prover

General-purpose first-order logic theorem prover
Breaks the theorem down into sub-goals
Many engines work on the sub-goals and either
prove them or break them down further and
add them to the central pool of goals to be proved
Success story in hardware
Verified FDIV in the AMD processors

15
Proof Methodology

The RTL is a shallow embedding in ACL2
Convert Verilog RTL into ACL2RTL
We have created a large RTL library to recognize
as well as analyze ACL2RTL
Slicing is done on the Verilog code
Both original and annotated Verilog are converted
into ACL2 and we construct the functional
equivalence proof in ACL2

16
Verilog to ACL2
17
Methodology

In order to demonstrate our technique
We have incorporated instruction-driven slicing
as part of the traditional design flow
The vRTL model is annotated to obtain the aRTL
model
The Synopsys design environment has been modified
to accept the aRTL, SPEC2000 benchmarks,
and power process parameters, and to estimate the
power dissipation due to switching activity
The annotated Architectural model is fed to the
SimpleScalar simulator with the Wattch power
estimator to estimate the power dissipation

18
Methodology
19
Experiment OR1200

We have used our tool-chain to test our
methodology on OR1200
OR1200 is a pipelined microprocessor implementing
the OpenRISC ISA.
5-stage integer pipeline with single instruction
issue per cycle
We have annotated both the RTL and the
architectural models of OR1200

20
OR1200 single instruction issue pipelined
microprocessor
21
OR1200 Power Gain Results

Results are shown after annotating the
RTL (left) and architectural (right) models
For un-sliced designs and designs sliced on 1, 4, and 10 instructions
On SPECINT2000 benchmarks
Power dissipation decreases consistently

22
OR1200 Results (contd.)
[Figures: Fig. 1, Fig. 2a, Fig. 2b, Fig. 2c]

Power gains are consistently good (Fig. 1)
Power gains far outperform area losses (Fig. 1)
Flop distribution shown before slicing (Fig. 2a),
after slicing on add (Fig. 2b), and after slicing
on load (Fig. 2c)

23
Experiment PUMA

We have used our tool-chain to test our
methodology on PUMA
PUMA is a dual-issue, out-of-order super-scalar,
fixed-point PowerPC core
We have annotated both the RTL and the
architectural models of PUMA

24
PUMA a fixed point PowerPC core
25
PUMA Power Gain Results

Results are shown after annotating the
RTL (left) and architectural (right) models
For un-sliced designs and designs sliced on 1, 4, and 10 instructions
On SPECINT2000 benchmarks
Power dissipation decreases consistently

26
PUMA Results (contd.)
[Figures: Fig. 1, Fig. 2, Fig. 3a, Fig. 3b, Fig. 3c]

Power gains are good upon slicing for a few
instructions (7) before delay losses start
to dominate (Fig. 1)
Power gains far outperform area losses (Fig. 2)
Flop distribution shown before slicing (Fig. 3a),
after slicing on add (Fig. 3b), and after slicing
on load (Fig. 3c)

27
Comparing OR1200 and PUMA
28
Conclusions

Proposed Instruction-driven Slicing as a new
technique to automatically reduce power
dissipation
Implemented the methodology of incorporating
instruction-driven slicing into the design flow
tool-chain
Inserting these annotations preserves the
functionality of the circuit

29
Conclusions (continued)

This technique seems most applicable to
single-issue, multi-stage pipelined machines.
When there are multiple instructions in flight in
the same pipeline stage, the gains of a
single-instruction abstraction are lost.
Graphics processors and various embedded
applications are often better suited to
this technique than general-purpose out-of-order
superscalars.
