Lecture 4: Instruction Set Design/Pipelining

1
Lecture 4: Instruction Set Design/Pipelining

Instruction set design (Sections 2.9-2.12)
control instructions
instruction encoding
Basic pipelining implementation (Section A.1)

2
Control Transfer Instructions

Conditional branches (75% - Int) (82% - FP)
Jumps (6% - Int) (10% - FP)
Procedure calls/returns (19% - Int) (8% - FP)
Design issues
How do you specify the target address?
How do you specify the condition?
What happens on a procedure call/return?

3
Specifying the Target Address

PC-relative: needs fewer bits to encode; independent of how/where the compiled code is linked; used for branches and jumps; typically, the displacement needs 4-8 bits
Register-indirect jumps: the address is not known at compile time and has to be computed at run time (note: can use any other addressing mode too). Used for:
procedure returns
case statements
virtual functions
function pointers
dynamically shared libraries
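
The two ways of specifying a target can be sketched as follows. This is only an illustration: the function names, the 8-bit displacement field, the 4-byte instruction size, and the convention that the displacement is counted in instructions from the next instruction are all assumptions, not something the slides specify.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def pc_relative_target(pc, displacement_field, bits=8, instr_bytes=4):
    """Assumed convention: target = address of next instruction + displacement,
    with the displacement counted in instructions."""
    offset = sign_extend(displacement_field, bits)
    return pc + instr_bytes + offset * instr_bytes


def register_indirect_target(registers, reg):
    """The target is whatever value the register holds at run time."""
    return registers[reg]


# The same encoded displacement works wherever the code is placed.
forward = pc_relative_target(0x1000, 3)       # 0x1010
backward = pc_relative_target(0x1000, 0xFE)   # 0xFE is -2 in 8 bits: 0x0FFC
relocated = pc_relative_target(0x5000, 3)     # 0x5010

# A procedure return: the address is only known at run time.
target = register_indirect_target({"R31": 0x2468}, "R31")
```

Because only the distance to the target is encoded, the branch needs no change when the code is linked at a different base address.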

4
Specifying the Condition

5
Procedure Call/Returns

Need to maintain a stack of return addresses (in memory or in hardware)
Can copy and save all registers together, or this can be done selectively
Who is responsible for saving registers?
Caller saving: correctness issues (a global register has to be made available to other procedures); it only saves values that it cares about
Callee saving: it saves only as many registers as it needs (provided it doesn't call other procedures)
A combination of both is typically employed

6
Instruction Set Encoding

Operations are easy to encode efficiently; the key issues are the number of operands and their addressing modes
Few addressing modes → low complexity in decoding and pipelining, but greater code size
Fixed instruction lengths → low complexity in decoding, but greater code size
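
A rough sketch of the fixed-length trade-off, using a made-up encoding: in the variable-length case, the first byte of each instruction is assumed to hold its total length in bytes. This is purely hypothetical and does not correspond to any real ISA. With fixed-length instructions every boundary is known in advance; with variable-length ones, each instruction must be examined before the next can be located.

```python
def fixed_length_boundaries(code, width=4):
    """Instruction i starts at i * width; no decoding needed to find it."""
    return list(range(0, len(code), width))


def variable_length_boundaries(code):
    """Hypothetical encoding: byte 0 of each instruction is its length in bytes.
    Boundaries can only be found by walking the code sequentially."""
    offsets = []
    pos = 0
    while pos < len(code):
        offsets.append(pos)
        length = code[pos]
        if length == 0:
            raise ValueError("malformed instruction at offset %d" % pos)
        pos += length
    return offsets


# Three instructions of 2, 3 and 1 bytes: 6 bytes in total.
variable_code = bytes([2, 0xAA, 3, 0xBB, 0xCC, 1])
# The same three instructions padded to 4 bytes each: 12 bytes in total.
fixed_code = bytes(12)

variable_offsets = variable_length_boundaries(variable_code)  # [0, 2, 5]
fixed_offsets = fixed_length_boundaries(fixed_code)           # [0, 4, 8]
```

The variable-length program is half the size here, but finding its third instruction requires decoding the first two.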

7
Instruction Lengths

8
Dealing with Code Size in RISC

Some hybrid versions allow for 16-bit and 32-bit instructions (40% reduction in code size); useful for embedded apps
IBM PowerPC stores 32-bit instructions in compressed form in memory: more hardware complexity on an I-cache miss (need to translate from uncompressed to compressed addresses, in addition to virtual to physical)
Reducing the register file size can also reduce the instruction length

9
Compiler Optimizations

The phase-ordering problem: early phases have to assume that register allocation will find a register; otherwise, optimizations such as common subexpression elimination may increase memory traffic

10
Register Allocation Issues

Graph coloring: determine when variables are live and avoid allocating the same register to variables that are simultaneously live
Stack variables (typically local to a procedure): easy to allocate registers for
Global data: can be accessed from multiple places (aliasing), difficult to allocate to registers
Heap data: dynamically created objects, accessed with pointers, difficult to allocate to registers because of aliasing
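
The graph-coloring idea can be sketched as below. This is a simple greedy heuristic written for illustration, not the allocator of any particular compiler; the live ranges, the half-open interval representation, and the function names are all assumptions.

```python
def build_interference(live_ranges):
    """live_ranges maps a variable to a half-open interval (start, end).
    Two variables interfere if their intervals overlap (simultaneously live)."""
    names = list(live_ranges)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            a_start, a_end = live_ranges[a]
            b_start, b_end = live_ranges[b]
            if a_start < b_end and b_start < a_end:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def greedy_color(graph, num_registers):
    """Assign each variable a register not used by an interfering neighbor.
    Returns None for a variable when no register is free (it would be spilled)."""
    assignment = {}
    # Visit the most constrained variables (highest degree) first.
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {assignment[m] for m in graph[node]
                if assignment.get(m) is not None}
        free = [r for r in range(num_registers) if r not in used]
        assignment[node] = free[0] if free else None
    return assignment


# a and c are never live at the same time, so they can share a register.
live = {"a": (0, 4), "b": (2, 6), "c": (5, 8)}
graph = build_interference(live)
two_regs = greedy_color(graph, 2)   # {'b': 0, 'a': 1, 'c': 1}
one_reg = greedy_color(graph, 1)    # a and c get None (spilled)
```

With two registers nothing is spilled; with one register, only b fits and the other two variables would have to live in memory.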

11
Case Study: The MIPS ISA

Load-store architecture
Focus on pipelining, decoding, and compiler efficiency
In other words, RISC

12
Registers

32 GPRs (general-purpose/integer registers) and 32 FPRs
64-bit registers: two single-precision FP values can fit in one register
Register R0 is hardwired to zero: with displacement addressing mode, we can also accomplish absolute addressing. Other uses for R0?
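
A minimal sketch of how a hardwired-zero register turns displacement addressing into absolute addressing. The `RegisterFile` class and `effective_address` function are hypothetical names used only for this illustration.

```python
class RegisterFile:
    """32 registers; register 0 always reads as zero and ignores writes."""

    def __init__(self, count=32):
        self._regs = [0] * count

    def read(self, index):
        return 0 if index == 0 else self._regs[index]

    def write(self, index, value):
        if index != 0:          # writes to R0 are silently discarded
            self._regs[index] = value


def effective_address(regs, base, displacement):
    """Displacement addressing: address = contents of base register + displacement."""
    return regs.read(base) + displacement


regs = RegisterFile()
regs.write(0, 99)        # has no effect
regs.write(5, 0x2000)

displaced = effective_address(regs, 5, 16)     # 0x2010
absolute = effective_address(regs, 0, 0x400)   # 0x400: base R0 contributes nothing
```

Using R0 as the base makes the displacement itself the address, so no separate absolute addressing mode is needed.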

13
Instruction Format

14
Control Instructions

Comparisons with zero can happen as part of the branch
The results of compares between registers are placed in other registers that are tested by branches
Jump-and-link places the return address in register R31

15
Instruction Frequencies

16
Summary

In the 1960s, stack architectures were considered a good match for high-level languages
In the 1970s, software costs were a concern: ISAs were enriched to make the compiler's job easier (CISC)
In the 1980s, there was a push for simpler architectures: high clock speed and high parallelism (RISC)
ISAs designed in 1980 are still around!

17
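The Assembly Line

Unpipelined: start and finish a job before moving to the next
Pipelined: break the job into smaller stages
[Figure: jobs A, B, C plotted against time. Unpipelined, each job finishes before the next starts; pipelined, the jobs overlap, each in a different stage.]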
18
Performance Improvements?

Does it take longer to finish each individual job?
Does it take less time to finish a series of jobs?
What assumptions were made while answering these questions?
Is a 10-stage pipeline better than a 5-stage pipeline?
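
One way to explore these questions is a small timing model. It is a sketch under idealized assumptions that the slides do not state: stages are perfectly balanced, there are no stalls, and each stage adds a fixed latch overhead. The function names and the example numbers are invented for illustration.

```python
def unpipelined_time(num_jobs, job_latency):
    """Each job starts only after the previous one finishes."""
    return num_jobs * job_latency


def pipelined_time(num_jobs, job_latency, num_stages, latch_overhead=0.0):
    """Assumes balanced stages and no stalls. The first job takes num_stages
    cycles; every later job completes one cycle after the previous one."""
    cycle = job_latency / num_stages + latch_overhead
    return (num_stages + num_jobs - 1) * cycle


JOB_LATENCY = 10.0   # time units for one unpipelined job (made-up value)
OVERHEAD = 0.5       # per-stage latch overhead (made-up value)

single_job_5 = pipelined_time(1, JOB_LATENCY, 5, OVERHEAD)      # 12.5
single_job_10 = pipelined_time(1, JOB_LATENCY, 10, OVERHEAD)    # 15.0
series_unpiped = unpipelined_time(100, JOB_LATENCY)             # 1000.0
series_5 = pipelined_time(100, JOB_LATENCY, 5, OVERHEAD)        # 260.0
series_10 = pipelined_time(100, JOB_LATENCY, 10, OVERHEAD)      # 163.5
```

Under this model, an individual job takes longer in the pipeline because of the added overhead, while a long series finishes much sooner. The deeper pipeline wins on the series only as long as the assumptions hold; unbalanced stages or stalls would change the comparison.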
