Title: Introduction to VLSI Programming Lecture 7: Introduction to the DLX
1 Introduction to VLSI Programming, Lecture 7: Introduction to the DLX
- (course 2IN30)
- Prof. dr. ir. Kees van Berkel
2 VLSI programming for
- Low costs: introduce resource sharing.
- Low delay (high throughput): introduce parallelism.
- Low energy (low power): reduce activity.
3 VLSI programming for low costs
- Keep it simple!
- Introduce resource sharing: commands, auxiliary variables, expressions, operators.
- Enable resource sharing by
  - reducing parallelism
  - making similar commands equal
4 Procedure definition vs declaration
- Procedure definition: P = proc (). S
  - provides a textual shorthand (expansion)
  - each call generates a copy of the resource, i.e. no sharing
- Procedure declaration: P : proc (). S
  - defines a sharable resource
  - each call generates access to this resource
5 Hints and Tips: optimization
- When asked to optimize for area (low cost), it is allowed to invest time (execution time, extra iterations, ...).
- When asked to optimize for speed, it is allowed to invest area (pipeline stages, parallelism, ...).
6 Hints and Tips: a known bug
- Statement of the form: if ¬x then S0 else S1 fi
- During simulation the wrong alternative is selected (e.g. S0 when x is true).
- Workaround: remove the negation and swap the alternatives: if x then S1 else S0 fi
7 Instruction Set Architecture
- The ISA is the interface between hardware and software.
- Hence, a good ISA
  - allows easy programming (compilers, OS, ...)
  - allows efficient implementations (hardware)
  - has a long lifetime (survives many HW generations)
  - is general purpose.
8 ISA classification
Code sequence for C = A + B
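The figure for this slide did not survive extraction. As a hedged illustration only, the sketch below assumes the usual textbook comparison of instruction-set classes and shows how two of them, a stack machine and a load-store machine, could compute C = A + B. The mnemonics (PUSH, POP, LOAD, STORE, ADD) and both mini-interpreters are hypothetical and are not taken from the slide.

```python
def run_stack(program, mem):
    """Stack machine: ADD takes its operands implicitly from the stack."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(mem[args[0]])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "POP":
            mem[args[0]] = stack.pop()
    return mem


def run_load_store(program, mem):
    """Load-store machine: only LOAD/STORE access memory, ADD is reg-reg."""
    reg = {}
    for op, *args in program:
        if op == "LOAD":
            reg[args[0]] = mem[args[1]]
        elif op == "ADD":
            reg[args[0]] = reg[args[1]] + reg[args[2]]
        elif op == "STORE":
            mem[args[1]] = reg[args[0]]
    return mem


# Hypothetical code sequences for C = A + B.
STACK_CODE = [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")]
LOAD_STORE_CODE = [
    ("LOAD", "R1", "A"),
    ("LOAD", "R2", "B"),
    ("ADD", "R3", "R1", "R2"),
    ("STORE", "R3", "C"),
]


def demo_isa_classes(a=2, b=3):
    """Run both sequences on the same inputs and return both results."""
    m1 = run_stack(STACK_CODE, {"A": a, "B": b})
    m2 = run_load_store(LOAD_STORE_CODE, {"A": a, "B": b})
    return m1["C"], m2["C"]
```

The load-store sequence is the style used by RISC machines such as the DLX: explicit loads and stores, with arithmetic on registers only.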
9 Reduced Instruction Set Computer
- 1980: Patterson and Ditzel, "The Case for RISC"
  - fixed 32-bit instruction set, with few formats
  - load-store architecture
  - large register bank (32 registers), all general purpose
- On processor organization:
  - hard-wired decode logic
  - pipelined execution
  - single clock-cycle execution
10 RISC processors
- Advantages
  - smaller die size (single chip processor)
  - shorter development time (simplicity)
  - higher performance
- Disadvantages
  - poor code density
  - cannot execute x86 code
11 A Typical RISC
- 32-bit instructions, 3 fixed formats
- 32 general-purpose registers, 32-bit
- 3-address arithmetic instructions, reg-reg
- single address mode for load/store: address + displacement
- simple branch conditions
- delayed branch
12 DLX (Deluxe)
- (AMD 29K, DECstation 3100, HP850, IBM801, Intel i860, MIPS M/120A, MIPS M/1000, Motorola 88K, RISC I, SGI 4D/60, SPARCstation-1, Sun-4/110, Sun-4/260) / 13 = DLX (560 in Roman numerals)
- Other RISC examples include Cray-1,2,3, AMD2900, DEC Alpha, ARM.
13 DLX instruction formats
[Figure: DLX instruction formats; field boundaries at bits 31-26, 25-21, 20-16, 15-11, 10-0.]
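The bit positions listed for this slide can be turned into a small field extractor. The sketch below is an illustration, not the course's decoder. It assumes that bit 31 is the most significant bit, and that the field names follow the commonly described DLX layout: opcode in 31-26, rs1 in 25-21, rs2 in 20-16, rd in 15-11 and func in 10-0 for register-register instructions, a 16-bit immediate in 15-0, and a 26-bit jump offset in 25-0. Those names are assumptions, since the slide's figure is lost.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive, bit 0 = least significant) of word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret a width-bit unsigned value as two's complement."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def decode(word):
    """Split a 32-bit instruction word into all candidate fields.

    Which fields are meaningful depends on the instruction format;
    the field names are assumptions (see the text above).
    """
    return {
        "opcode": bits(word, 31, 26),
        "rs1": bits(word, 25, 21),
        "rs2": bits(word, 20, 16),
        "rd": bits(word, 15, 11),
        "func": bits(word, 10, 0),
        "immediate": sign_extend(bits(word, 15, 0), 16),
        "offset": sign_extend(bits(word, 25, 0), 26),
    }
```

Decoding every field unconditionally mirrors hard-wired decode logic: fixed field positions let the hardware extract all fields in parallel before the opcode selects which ones are used.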
14 Example instructions
15 GCD in GCL
- x,y := X,Y
- ; do x≠y → if x>y → x := x-y
- [] x<y → y := y-x
- fi
- od
- {R: x = gcd(X,Y)}
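For readers less familiar with guarded commands, the same subtraction-based algorithm can be sketched in Python. The function name is hypothetical, and the precondition X > 0 and Y > 0 is an assumption that the slide leaves implicit.

```python
def gcd_by_subtraction(X, Y):
    """Greatest common divisor by repeated subtraction.

    Assumes X > 0 and Y > 0; otherwise the loop need not terminate.
    """
    x, y = X, Y
    while x != y:          # do x != y ->
        if x > y:          #   if x > y -> x := x - y
            x = x - y
        else:              #   [] x < y -> y := y - x
            y = y - x
    return x               # postcondition: x = gcd(X, Y)
```

For example, gcd_by_subtraction(12, 18) steps through (12, 6) and (6, 6) and returns 6.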
16 GCD in DLX assembler
- pre: LW R1,4(R0) ; R1 := Mem[4+0]
- LW R2,8(R0) ; R2 := Mem[8+0]
- loop: SUB R3,R1,R2 ; R3 := R1-R2
- BEQZ R3,exit ; if (R3=0) then PC := exit
- SLT R4,R1,R2 ; R4 := (R1<R2)
- BEQZ R4,pos2 ; if (R4=0) then PC := pos2
- pos1: SUB R2,R2,R1 ; R2 := R2-R1
- J loop ; PC := loop
- pos2: SUB R1,R1,R2 ; R1 := R1-R2
- J loop ; PC := loop
- exit: SW 20(R0),R1 ; Mem[20+0] := R1
- HLT
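To check the assembler program by hand, it can help to run it on a toy interpreter. The sketch below is an illustration only: it covers just the seven instructions used above, represents the program as Python tuples with a hypothetical label table, indexes instructions by position rather than by byte address, and ignores delayed branches. None of this is the course's simulator.

```python
def run_dlx(program, labels, mem, max_steps=10_000):
    """Interpret a tiny DLX subset; returns the data memory after HLT."""
    reg = [0] * 32
    pc = 0  # index into program (one slot per instruction, an assumption)
    for _ in range(max_steps):
        op, *a = program[pc]
        pc += 1
        if op == "LW":        # LW rd, disp(rs1)
            rd, disp, rs1 = a
            reg[rd] = mem[disp + reg[rs1]]
        elif op == "SW":      # SW disp(rs1), rd
            disp, rs1, rd = a
            mem[disp + reg[rs1]] = reg[rd]
        elif op == "SUB":
            rd, rs1, rs2 = a
            reg[rd] = reg[rs1] - reg[rs2]
        elif op == "SLT":
            rd, rs1, rs2 = a
            reg[rd] = int(reg[rs1] < reg[rs2])
        elif op == "BEQZ":
            rs, target = a
            if reg[rs] == 0:
                pc = labels[target]
        elif op == "J":
            pc = labels[a[0]]
        elif op == "HLT":
            return mem
        reg[0] = 0            # R0 always reads as zero
    raise RuntimeError("step limit reached")


GCD_PROGRAM = [
    ("LW", 1, 4, 0),        # pre:  R1 := Mem[4+R0]
    ("LW", 2, 8, 0),        #       R2 := Mem[8+R0]
    ("SUB", 3, 1, 2),       # loop: R3 := R1-R2
    ("BEQZ", 3, "exit"),
    ("SLT", 4, 1, 2),       #       R4 := (R1<R2)
    ("BEQZ", 4, "pos2"),
    ("SUB", 2, 2, 1),       # pos1: R2 := R2-R1
    ("J", "loop"),
    ("SUB", 1, 1, 2),       # pos2: R1 := R1-R2
    ("J", "loop"),
    ("SW", 20, 0, 1),       # exit: Mem[20+R0] := R1
    ("HLT",),
]
GCD_LABELS = {"pre": 0, "loop": 2, "pos1": 6, "pos2": 8, "exit": 10}


def gcd_on_dlx(x, y):
    """Place x and y at addresses 4 and 8, run, and read address 20."""
    mem = {4: x, 8: y}
    return run_dlx(GCD_PROGRAM, GCD_LABELS, mem)[20]
```

As with the GCL version, the inputs are assumed to be positive; with a zero input the program loops until the step limit is reached.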
17 DLX instruction mixes
from H&P, Figs 2.26, 2.27
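18 DLX interface, state
[Figure: the DLX CPU with its state (pc and the register file Reg, r0 to r31), connected to the instruction memory (address, instruction) and to the data memory Mem (address, data, r/w), with clock and interrupt inputs.]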
19 DLX Moore machine (ignoring interrupts)
- ⟨Reg[0], pc⟩ := ⟨0, 0⟩
- ; do ⟨Mem[Reg[rs1]+immediate], pc, Reg[rd]⟩ :=
- ⟨ if SW → Reg[rd] fi
- , if J → pc+4+offset
- [] BEQZ → if Reg[rs]=0 → pc+4+immediate [] Reg[rs]≠0 → pc+4 fi
- [] else → pc+4
- fi
- , if LW → Mem[Reg[rs1]+immediate]
- [] ADD → ALU(add, Reg[rs1], Reg[rs2])
- fi ⟩
- od
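The multiple assignment above reads as one state transition per instruction: new values for memory, pc, and the destination register are all computed from the old state. A minimal Python sketch of such a transition function follows. The dictionary-based instruction encoding and the function name are hypothetical, and only the five instruction kinds named on the slide are handled.

```python
def step(instr, reg, mem, pc):
    """One state transition: returns (reg', mem', pc').

    All right-hand sides read the old state (reg, mem, pc), which mirrors
    the simultaneous assignment; the inputs are not mutated.
    """
    op = instr["op"]
    new_reg, new_mem = list(reg), dict(mem)
    new_pc = pc + 4                                   # default: next instruction
    if op == "SW":
        new_mem[reg[instr["rs1"]] + instr["immediate"]] = reg[instr["rd"]]
    elif op == "LW":
        new_reg[instr["rd"]] = mem[reg[instr["rs1"]] + instr["immediate"]]
    elif op == "ADD":
        new_reg[instr["rd"]] = reg[instr["rs1"]] + reg[instr["rs2"]]
    elif op == "J":
        new_pc = pc + 4 + instr["offset"]
    elif op == "BEQZ":
        if reg[instr["rs1"]] == 0:
            new_pc = pc + 4 + instr["immediate"]
    new_reg[0] = 0                                    # Reg[0] stays zero
    return new_reg, new_mem, new_pc
```

Keeping the function free of side effects makes the Moore-machine view explicit: the next state depends only on the current state and the fetched instruction.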
20 DLX 5-step sequential execution
21 DLX 5-step sequential execution
[Figure: the five execution steps IF, ID, EX, MM, WB.]
22 Bibliography
- Computer Architecture: A Quantitative Approach (3rd Ed.), John L. Hennessy and David A. Patterson, Morgan Kaufmann Publishers Inc., 1996.
- ARM System Architecture, Steve Furber, Addison Wesley, 1996.
- DSP Processor Fundamentals: Architectures and Features, Phil Lapsley et al. (Berkeley Design Technology Inc.), IEEE, 1996.
- www.handshakesolutions.com
- www.arm.com/news/6936.html
- www.research.philips.com/newscenter/archive/2004/handshake.html
24 Pipelining in Tangram (cntd)
- Output sequence b is identical for P0, P1, and P2.
- P0 and P1 have the same communication behavior; P1 is larger, slower, and warmer.
- P2 vs P1: similar in size, energy, and latency, but up to 3 times higher throughput, depending on the (relative) complexity of f0, f1, f2.
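The "up to 3 times" bound can be made concrete with a first-order model. The sketch below assumes that P1 evaluates f0, f1, f2 one after another for each item, that P2 overlaps them as three pipeline stages, and that handshake and storage overhead is negligible. These are simplifying assumptions, and the function name is hypothetical.

```python
def pipeline_speedup(stage_delays):
    """Throughput gain of a pipeline over sequential evaluation.

    Sequential: one result every sum(stage_delays) time units.
    Pipelined:  one result every max(stage_delays) time units.
    """
    return sum(stage_delays) / max(stage_delays)
```

With equally complex functions, pipeline_speedup([1, 1, 1]) gives 3.0, the maximum for three stages. If f0 dominates, as in pipeline_speedup([4, 1, 1]), the gain drops to 1.5, because the slowest stage sets the pace.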
25 DLX 5-step sequential execution
[Figure: the five execution steps IF, ID, EX, MM, WB.]
26 DLX pipelined execution
[Figure: pipeline diagram; horizontal axis: time in clock cycles (1 to 8, ...); vertical axis: program execution in instructions.]
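The lost diagram can be approximated by a small generator. The sketch below assumes an ideal five-stage pipeline without stalls or hazards, in which instruction i enters IF in clock cycle i+1. It also assumes, for the comparison, that sequential execution spends one clock cycle per step. The function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MM", "WB"]


def pipeline_diagram(n_instructions):
    """Row i lists, per clock cycle, the stage occupied by instruction i."""
    n_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = ["  "] * n_cycles          # two blanks = instruction not active
        for s, name in enumerate(STAGES):
            row[i + s] = name            # one stage later in each cycle
        rows.append(row)
    return rows


def cycles_sequential(n_instructions):
    """Each instruction completes all five steps before the next starts."""
    return len(STAGES) * n_instructions


def cycles_pipelined(n_instructions):
    """Fill the pipeline once, then complete one instruction per cycle."""
    return n_instructions + len(STAGES) - 1


if __name__ == "__main__":
    for row in pipeline_diagram(4):
        print(" ".join(row))
```

Under these assumptions four instructions take 8 clock cycles pipelined versus 20 sequentially; for long instruction streams the ratio approaches 5.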
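27 DLX pipelined execution
[Figure: pipelined datapath with stages Instruction Fetch, Instruction Decode, EXecute, Memory, Write Back; labelled elements: pc, the constant 4, a zero test (0?), Instr. mem, Reg, Mem.]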
28 DLX system organization
[Figure: system_dlx() and its system boundary: dlx() connected to rom() via ROMaddr and ROMdata, and to ram() via RAMaddr, datatoRAM and datafromRAM; files: gcd.bin, RAMin, RAMout.]
29 dlx0.ht
- include types.ht
- dlx0 export proc ( ROMaddr!chan adtype
- ROMdata?chan word
- RAMaddr!chan rwadtype
- datatoRAM!chan S30
- datafromRAM?chan S30
- ) .
- begin
- RF ram array U5 of S30
- end
30 system_dlx0.ht
- include "dlx0.ht"
-
- dlx0 proc ( ROMaddr!chan adtype
- ROMdata?chan word
- RAMaddr!chan rwadtype
- datatoRAM!chan S30
- datafromRAM?chan S30
- ) . import
- env_dlx4 main proc (
- ROMfile? chan word
- RAMinfile? chan S30
- RAMfile! chan S30 / <<address,data>> /
- ) .
- begin
- ... (main body: see next slide)
- end
31 system_dlx0.ht main body
- begin
- ROMaddr chan adtype
- ROMdata chan word
- RAMaddr chan rwadtype
- datatoRAM chan S30
- datafromRAM chan S30
-
- ROMinterface proc() . begin .. end
- RAMinterface proc() . begin .. end
-
- initialise() ROMinterface() RAMinterface()
- dlx0( ROMaddr, ROMdata, RAMaddr, datatoRAM, datafromRAM )
- end
32 script
- htcomp -B system_dlx0
- htsim -limit 1000 system_dlx0 gcd.bin RAMin RAMout
- htview system_dlx0