Introduction to VLSI Programming Lecture 7: Introduction to the DLX

1
Introduction to VLSI Programming Lecture 7
Introduction to the DLX

(course 2IN30)
Prof. dr. ir. Kees van Berkel

2
VLSI programming for

Low costs: introduce resource sharing.
Low delay (high throughput): introduce parallelism.
Low energy (low power): reduce activity.

3
VLSI programming for low costs

Keep it simple!!
Introduce resource sharing: commands, auxiliary
variables, expressions, operators.
Enable resource sharing by
  reducing parallelism
  making similar commands equal

4
Procedure definition vs declaration

Procedure definition: P = proc (). S
  provides a textual shorthand (expansion)
  each call generates a copy of the resource, i.e. no sharing
Procedure declaration: P : proc (). S
  defines a sharable resource
  each call generates access to this resource

5
Hints and Tips: optimization

When asked to optimize for area (low cost), it is
allowed to invest time (execution time, extra
iterations, ...).
When asked to optimize for speed, it is allowed
to invest area (pipeline stages, parallelism, ...).

6
Hints and Tips: a known bug

Statement of the form: if ¬x then S0 else S1 fi
During simulation the wrong alternative is selected
(e.g. S0 when x is true).
Workaround: remove the negation: if x then S1 else S0 fi

7
Instruction Set Architecture

The ISA is the interface between hardware and software.
Hence, a good ISA
  allows easy programming (compilers, OS, ..)
  allows efficient implementations (hardware)
  has a long lifetime (survives many HW generations)
  is general purpose.

8
ISA classification

Code sequence for C = A + B

9
Reduced Instruction Set Computer

1980: Patterson and Ditzel, "The Case for RISC"
  fixed 32-bit instruction set, with few formats
  load-store architecture
  large register bank (32 registers), all general purpose
On processor organization:
  hard-wired decode logic
  pipelined execution
  single clock-cycle execution

10
RISC processors

Advantages
smaller die size (single chip processor)
shorter development time (simplicity)
higher performance

Disadvantages
poor code density
cannot execute X86 code

11
A Typical RISC

32-bit instructions, 3 fixed formats
32 general purpose registers, 32-bit
3-address arithmetic instructions, reg-reg
single address mode for load/store: address + displacement
simple branch conditions; delayed branch

12
DLX (Deluxe)

(AMD 29K + DECstation 3100 + HP 850 + IBM 801 +
Intel i860 + MIPS M/120A + MIPS M/1000 + Motorola
88K + RISC I + SGI 4D/60 + SPARCstation-1 + Sun-4/110
+ Sun-4/260) / 13 = DLX
Other RISC examples include Cray-1,2,3, AMD2900, DEC
Alpha, ARM.

13
DLX instruction formats

Bit fields: 31..26, 25..21, 20..16, 15..11, 10..0
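
The bit boundaries above can be turned into a small field extractor. The following is a minimal Python sketch, not the course's own code. The field names (opcode, rs1, rs2, rd, func) are assumptions chosen for illustration; the transcript preserves only the bit ranges.

```python
# Bit ranges as listed on the slide: (high bit, low bit), bit 0 = least significant.
# The field names are assumed for illustration; only the ranges come from the slide.
FIELDS = {
    "opcode": (31, 26),
    "rs1": (25, 21),
    "rs2": (20, 16),
    "rd": (15, 11),
    "func": (10, 0),
}


def extract(word, high, low):
    """Return bits high..low (inclusive) of a 32-bit word."""
    width = high - low + 1
    return (word >> low) & ((1 << width) - 1)


def decode(word):
    """Split a 32-bit instruction word into the fields above."""
    return {name: extract(word, hi, lo) for name, (hi, lo) in FIELDS.items()}


def encode(fields):
    """Inverse of decode: pack field values back into a 32-bit word."""
    word = 0
    for name, (hi, lo) in FIELDS.items():
        width = hi - lo + 1
        word |= (fields[name] & ((1 << width) - 1)) << lo
    return word
```

A round trip such as decode(encode(f)) returns f as long as every value fits in its field. Formats that use a 16-bit or 26-bit immediate would need a different FIELDS table.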

14
Example instructions

15
GCD in GCL

x,y := X,Y
; do x ≠ y → if x > y → x := x-y
             [] x < y → y := y-x
             fi
  od
{ R: x = gcd(X,Y) }
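
For readers unfamiliar with guarded commands, the same subtractive algorithm can be sketched in Python. The function name and the input check are illustrative additions; the loop follows the GCL text and assumes X and Y are positive integers.

```python
def gcd_by_subtraction(X, Y):
    """Subtractive GCD for positive integers, mirroring the guarded-command loop."""
    if X <= 0 or Y <= 0:
        # Added guard: the loop would not terminate for non-positive inputs.
        raise ValueError("X and Y must be positive")
    x, y = X, Y
    while x != y:      # do x != y ->
        if x > y:      #   if x > y -> x := x - y
            x = x - y
        else:          #   [] x < y -> y := y - x  (x != y, so x < y here)
            y = y - x
    return x           # postcondition R: x = gcd(X, Y)
```

For example, gcd_by_subtraction(12, 18) passes through (12, 6) and (6, 6) and returns 6.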

16
GCD in DLX assembler

pre:   LW   R1,4(R0)     ; R1 := Mem[4+0]
       LW   R2,8(R0)     ; R2 := Mem[8+0]
loop:  SUB  R3,R1,R2     ; R3 := R1-R2
       BEQZ R3,exit      ; if (R3=0) then PC := exit
       SLT  R4,R1,R2     ; R4 := (R1<R2)
       BEQZ R4,pos2      ; if (R4=0) then PC := pos2
pos1:  SUB  R2,R2,R1     ; R2 := R2-R1
       J    loop         ; PC := loop
pos2:  SUB  R1,R1,R2     ; R1 := R1-R2
       J    loop         ; PC := loop
exit:  SW   20(R0),R1    ; Mem[20+0] := R1
       HLT
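
To check the control flow of this program, it can be run on a toy interpreter. The Python sketch below is an illustration only: the tuple encoding of instructions, the run function and the step limit are assumptions. The program counter is an index into the instruction list rather than a byte address, and delayed branches are not modelled.

```python
# Program from the slide, as (label, opcode, operands) tuples.
PROGRAM = [
    ("pre",  "LW",   ("R1", 4, "R0")),
    (None,   "LW",   ("R2", 8, "R0")),
    ("loop", "SUB",  ("R3", "R1", "R2")),
    (None,   "BEQZ", ("R3", "exit")),
    (None,   "SLT",  ("R4", "R1", "R2")),
    (None,   "BEQZ", ("R4", "pos2")),
    ("pos1", "SUB",  ("R2", "R2", "R1")),
    (None,   "J",    ("loop",)),
    ("pos2", "SUB",  ("R1", "R1", "R2")),
    (None,   "J",    ("loop",)),
    ("exit", "SW",   (20, "R0", "R1")),
    (None,   "HLT",  ()),
]


def run(program, mem, max_steps=100_000):
    """Interpret the program; mem maps addresses to integers and is updated in place."""
    labels = {lab: i for i, (lab, _, _) in enumerate(program) if lab}
    reg = {f"R{i}": 0 for i in range(32)}
    pc = 0  # index into program, not a byte address (simplification)
    for _ in range(max_steps):
        _, op, args = program[pc]
        pc += 1
        if op == "LW":
            rd, disp, base = args
            reg[rd] = mem[reg[base] + disp]
        elif op == "SW":
            disp, base, rs = args
            mem[reg[base] + disp] = reg[rs]
        elif op == "SUB":
            rd, ra, rb = args
            reg[rd] = reg[ra] - reg[rb]
        elif op == "SLT":
            rd, ra, rb = args
            reg[rd] = 1 if reg[ra] < reg[rb] else 0
        elif op == "BEQZ":
            rs, target = args
            if reg[rs] == 0:
                pc = labels[target]
        elif op == "J":
            pc = labels[args[0]]
        elif op == "HLT":
            return mem
        else:
            raise ValueError(f"unknown opcode {op}")
        reg["R0"] = 0  # R0 always reads as zero
    raise RuntimeError("step limit reached")
```

Calling run(PROGRAM, {4: 12, 8: 18}) leaves the result 6 at address 20. The step limit is there because the subtractive loop does not terminate for zero or negative inputs.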

17
DLX instruction mixes

from H&P, Figs 2.26, 2.27

18
DLX interface, state

[Figure: DLX CPU with state pc and Reg (r0, r1, r2, ..., r31); interfaces to the instruction memory (address, instruction) and to Mem, the data memory (address, data, r/w); inputs clock and interrupt]

19
DLX Moore machine (ignoring interrupts)

⟨Reg[0], pc⟩ := ⟨0, 0⟩
; do ⟨Mem[Reg[rs1] + immediate], pc, Reg[rd]⟩
     := ⟨ if SW → Reg[rd] fi
        , if J    → pc + 4 + offset
          [] BEQZ → if Reg[rs] = 0 → pc + 4 + immediate
                    [] Reg[rs] ≠ 0 → pc + 4
                    fi
          [] else → pc + 4
          fi
        , if LW  → Mem[Reg[rs1] + immediate]
          [] ADD → ALU(add, Reg[rs1], Reg[rs2])
          fi
        ⟩
  od
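
The concurrent assignment above can be read as a next-state function: all three results are computed from the old state. A minimal Python sketch follows, assuming a hypothetical dictionary representation of an already-decoded instruction (keys op, rd, rs, rs1, rs2, immediate, offset). ALU(add, ...) is modelled as plain integer addition without 32-bit wrap-around.

```python
def step(mem, pc, reg, instr):
    """One state transition; returns (new_mem, new_pc, new_reg).

    instr is a decoded instruction such as
    {"op": "ADD", "rd": 3, "rs1": 1, "rs2": 2} (hypothetical representation).
    The inputs mem and reg are not modified.
    """
    op = instr["op"]
    new_mem, new_reg = dict(mem), list(reg)

    # Data memory: only SW writes.
    if op == "SW":
        new_mem[reg[instr["rs1"]] + instr["immediate"]] = reg[instr["rd"]]

    # Program counter.
    if op == "J":
        new_pc = pc + 4 + instr["offset"]
    elif op == "BEQZ":
        new_pc = pc + 4 + instr["immediate"] if reg[instr["rs"]] == 0 else pc + 4
    else:
        new_pc = pc + 4

    # Register file: LW and ADD write Reg[rd].
    if op == "LW":
        value = mem[reg[instr["rs1"]] + instr["immediate"]]
    elif op == "ADD":
        value = reg[instr["rs1"]] + reg[instr["rs2"]]
    else:
        value = None
    # Assumption: writes to register 0 are ignored so that Reg[0] stays 0.
    if value is not None and instr["rd"] != 0:
        new_reg[instr["rd"]] = value

    return new_mem, new_pc, new_reg
```

Returning fresh copies instead of updating in place keeps the "everything is read from the old state" property visible; a hardware implementation would of course not copy memories.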

20
DLX 5-step sequential execution

21
DLX 5-step sequential execution

IF, ID, EX, MM, WB

22
Bibliography

Computer Architecture: A Quantitative Approach
(3rd Ed.), John L. Hennessy and David A. Patterson,
Morgan Kaufmann Publishers Inc., 1996.
ARM System Architecture, Steve Furber, Addison
Wesley, 1996.
DSP Processor Fundamentals: Architectures and
Features, Phil Lapsley et al. (Berkeley Design
Technology Inc.), IEEE, 1996.
www.handshakesolutions.com
www.arm.com/news/6936.html
www.research.philips.com/newscenter/archive/2004/handshake.html

24
Pipelining in Tangram (cont'd)

Output sequence b is identical for P0, P1, and P2.
P0 and P1 have the same communication behavior; P1
is larger, slower, and warmer.
P2 vs P1: similar in size, energy, and latency,
but up to 3 times higher throughput, depending
on the (relative) complexity of f0, f1, f2.
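
The "up to 3 times" figure can be checked with a rough model. The Python sketch below assumes that the unpipelined version needs the sum of the three function delays per item, while the pipelined version is limited by its slowest stage; handshake and register overhead are ignored. The function name and the delay values are hypothetical.

```python
def pipeline_speedup(stage_delays):
    """Throughput ratio of a pipeline over the unpipelined composition.

    Assumes the unpipelined version needs sum(stage_delays) per item and the
    pipelined version is limited by its slowest stage; register and
    handshake overhead are ignored.
    """
    return sum(stage_delays) / max(stage_delays)
```

With three equally complex functions, pipeline_speedup([1, 1, 1]) gives 3.0, the upper bound mentioned above. If one function dominates, as in pipeline_speedup([1, 1, 4]), the gain drops to 1.5.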

25
DLX 5-step sequential execution

IF, ID, EX, MM, WB

26
DLX pipelined execution

[Figure: pipeline diagram; horizontal axis: time in clock cycles (1, 2, 3, 4, 5, 6, 7, 8, ...); vertical axis: program execution in instructions]
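
The effect shown in such a diagram can be expressed as a cycle count. This is a hedged Python sketch for an ideal pipeline: it assumes one step per clock cycle and no stalls or hazards, and the function names are illustrative.

```python
def cycles_sequential(n, steps=5):
    """Clock cycles for n instructions when each one takes all steps in turn."""
    return n * steps


def cycles_pipelined(n, steps=5):
    """Clock cycles for n instructions in an ideal pipeline without stalls."""
    return steps + n - 1 if n > 0 else 0
```

For 8 instructions this gives 40 cycles sequentially and 12 cycles pipelined; for large n the ratio approaches the number of steps.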

27
DLX pipelined execution

[Figure: pipelined datapath with stages Instruction Fetch, Inst. Decode, EXecute, Memory, Write Back; labelled elements: pc, 4, 0?, Instr. mem, Reg, Mem]

28
DLX system organization

[Figure: system_dlx() with components dlx(), rom() and ram() and the system boundary; channels ROMaddr, ROMdata, RAMaddr, datatoRAM, datafromRAM; files gcd.bin, RAMout, RAMin]

29
dlx0.ht

include "types.ht"
dlx0 export proc ( ROMaddr!chan adtype
                   ROMdata?chan word
                   RAMaddr!chan rwadtype
                   datatoRAM!chan S30
                   datafromRAM?chan S30
                 ) .
begin
  RF ram array U5 of S30
end

30
system_dlx0.ht

include "dlx0.ht"
dlx0 proc ( ROMaddr!chan adtype
            ROMdata?chan word
            RAMaddr!chan rwadtype
            datatoRAM!chan S30
            datafromRAM?chan S30
          ) . import
env_dlx4 main proc (
  ROMfile? chan word
  RAMinfile? chan S30
  RAMfile! chan S30 / <<address,data>> /
) .
begin
  next slide
end

31
system_dlx0.ht main body

begin
  ROMaddr chan adtype
  ROMdata chan word
  RAMaddr chan rwadtype
  datatoRAM chan S30
  datafromRAM chan S30
  ROMinterface proc() . begin .. end
  RAMinterface proc() . begin .. end
  initialise()
  ROMinterface()
  RAMinterface()
  dlx0( ROMaddr, ROMdata, RAMaddr, datatoRAM, datafromRAM )
end

32
script