Introduction to VLSI Programming: High Performance DLX


1
Introduction to VLSI Programming: High Performance DLX
  • (course 2IN30)
  • Prof. dr. ir. Kees van Berkel

2
Recap Pipelining in Tangram
  • Compare three programs
  • P0: a?x0 ; b!f2(f1(f0(x0)))
  • P1: a?x0 ; x1 := f0(x0) ; x2 := f1(x1)
    ; b!f2(x2)
  • P2: (a?x0 ; a1!f0(x0)) || (a1?x1 ; a2!f1(x1))
    || (a2?x2 ; b!f2(x2))
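The three programs can be mimicked in a small Python sketch. Assumptions: f0, f1, f2 are arbitrary placeholder functions, and the Tangram channels a, a1, a2, b are approximated by one-place `queue.Queue` buffers, with each stage of P2 running in its own thread. The helper names (`p0`, `p1`, `p2`, `_stage`) and the end-of-stream marker are illustrative additions, not part of the slides. The sketch only checks that all three produce the same output sequence on b; it models neither area, energy, nor timing.

```python
import queue
import threading

_END = object()  # end-of-stream marker (illustrative, not in the Tangram programs)


def p0(inputs, f0, f1, f2):
    # P0: a?x0 ; b!f2(f1(f0(x0))), once per input value
    return [f2(f1(f0(x0))) for x0 in inputs]


def p1(inputs, f0, f1, f2):
    # P1: same behavior, intermediate results held in variables x1 and x2
    out = []
    for x0 in inputs:
        x1 = f0(x0)
        x2 = f1(x1)
        out.append(f2(x2))
    return out


def _stage(fn, inp, out):
    # One pipeline stage: receive x on inp, send fn(x) on out, until _END arrives
    while True:
        x = inp.get()
        if x is _END:
            out.put(_END)
            return
        out.put(fn(x))


def p2(inputs, f0, f1, f2):
    # P2: three concurrent stages connected by channels a1 and a2
    a, a1, a2, b = (queue.Queue(maxsize=1) for _ in range(4))
    workers = [
        threading.Thread(target=_stage, args=(f0, a, a1)),
        threading.Thread(target=_stage, args=(f1, a1, a2)),
        threading.Thread(target=_stage, args=(f2, a2, b)),
    ]

    def feed():
        # Environment driving channel a
        for x0 in inputs:
            a.put(x0)
        a.put(_END)

    workers.append(threading.Thread(target=feed))
    for w in workers:
        w.start()
    out = []
    while (y := b.get()) is not _END:  # environment reading channel b
        out.append(y)
    for w in workers:
        w.join()
    return out
```

For example, with f0(x) = x + 1, f1(x) = 2x and f2(x) = x - 3, all three functions return [1, 3, 5] for the inputs [1, 2, 3].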

3
Pipelining in Tangram (cntd)
  • The output sequence on b is identical for P0, P1,
    and P2.
  • P0 and P1 have the same communication behavior;
    P1 is larger, slower, and warmer.
  • P2 vs. P1: similar in size, energy, and latency,
    but up to 3 times higher throughput, depending
    on the (relative) complexity of f0, f1, f2.
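The "up to 3 times" figure can be made concrete with a simple timing model, which is an assumption of this sketch rather than part of the slides: let d0, d1, d2 be the delays of f0, f1, f2 and ignore handshake overhead. P1 then needs d0 + d1 + d2 per output, whereas P2 is limited by its slowest stage, max(d0, d1, d2). The function names are hypothetical.

```python
def cycle_time_p1(d):
    # P1 evaluates f0, f1, f2 one after another for every input
    return sum(d)


def cycle_time_p2(d):
    # P2's stages run concurrently; the slowest stage sets the rate
    return max(d)


def throughput_gain(d):
    # Throughput of P2 relative to P1 (handshake overhead ignored)
    return cycle_time_p1(d) / cycle_time_p2(d)
```

Balanced delays such as (2, 2, 2) give a gain of 3.0, while an unbalanced (1, 1, 4) gives only 1.5, which is why the gain depends on the relative complexity of f0, f1, and f2.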

4
Recap DLX Moore machine(ignoring interrupts)
  • ⟨Reg[0], pc⟩ := ⟨0, 0⟩
  • do ⟨Mem[Reg[rs1] + immediate], pc, Reg[rd]⟩ :=
  • ⟨ if SW → Reg[rd] fi
  • , if J → pc + 4 + offset
  • [] BEQZ → if Reg[rs] = 0 → pc + 4 + immediate
    [] Reg[rs] ≠ 0 → pc + 4 fi
  • [] else → pc + 4
  • fi
  • , if LW → Mem[Reg[rs1] + immediate]
  • [] ADD → ALU(add, Reg[rs1], Reg[rs2])
  • fi ⟩
  • od
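A Python sketch of this Moore machine, under assumptions not stated in the slide: instructions are already decoded into a hypothetical `Instr` record (op, rs1, rs2, rd, imm), the program sits in a separate dictionary `rom` keyed by byte address, a tuple component whose guards do not apply keeps its old value, Reg[rs] in the BEQZ guard is taken to be the rs1 field, and execution stops when pc leaves the program. All three components are computed from the old state, mirroring the simultaneous assignment.

```python
from dataclasses import dataclass, field
from typing import Dict, NamedTuple, Tuple


class Instr(NamedTuple):
    # Hypothetical pre-decoded instruction (not the DLX bit-level encoding)
    op: str
    rs1: int = 0
    rs2: int = 0
    rd: int = 0
    imm: int = 0  # immediate / offset


@dataclass
class State:
    pc: int
    regs: Tuple[int, ...]
    mem: Dict[int, int] = field(default_factory=dict)


def step(s: State, rom: Dict[int, Instr]) -> State:
    """One Moore-machine step: every component is computed from the old state s."""
    i = rom[s.pc]
    ea = s.regs[i.rs1] + i.imm  # Reg[rs1] + immediate

    # Component 1: Mem[Reg[rs1] + immediate] := Reg[rd], only for SW
    mem = dict(s.mem)
    if i.op == "SW":
        mem[ea] = s.regs[i.rd]

    # Component 2: next pc
    if i.op == "J":
        pc = s.pc + 4 + i.imm
    elif i.op == "BEQZ":
        pc = s.pc + 4 + i.imm if s.regs[i.rs1] == 0 else s.pc + 4
    else:
        pc = s.pc + 4

    # Component 3: Reg[rd] for LW and ADD (unwritten memory reads as 0 here)
    regs = list(s.regs)
    if i.op == "LW":
        regs[i.rd] = s.mem.get(ea, 0)
    elif i.op == "ADD":
        regs[i.rd] = s.regs[i.rs1] + s.regs[i.rs2]
    regs[0] = 0  # keep Reg[0] at zero
    return State(pc, tuple(regs), mem)


def run_moore(rom: Dict[int, Instr], regs, max_steps: int = 1000) -> State:
    init = list(regs)
    init[0] = 0                  # <Reg[0], pc> := <0, 0>
    s = State(0, tuple(init))
    for _ in range(max_steps):
        if s.pc not in rom:      # stop when pc leaves the program (assumption)
            break
        s = step(s, rom)
    return s
```

A short program that adds, stores, loads back, takes a BEQZ on Reg[0] and then jumps ends with pc = 28 and the sum 12 in both Reg[1] and Reg[4].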

5
DLX0 instruction loop
  • do -halted then
  • ROMaddr!PC
  • ; ROMdata?ir
  • ; PC := PC+4
    (alternative: aux := PC+4 ; PC := aux)
  • ; case (ir cast Itype.0)
  • is <<t,f,f,f,f,f>> then LW()
  • or <<t,f,f,f,f,t>> then SW()
  • or <<f,f,f,f,f,f>> then if (ir cast
    Rtype.4 = 1) then SLT() fi
  • or <<f,t,f,f,f,f>> then BEQZ()
  • or <<f,t,f,f,f,t>> then J()
  • or <<f,f,t,f,f,f>> then halted := true
  • si
  • od
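The fetch/decode structure of this loop can be mirrored in a short Python sketch. Assumptions: each ROM word is represented as a hypothetical pair (opcode as a 6-tuple of booleans, value of R-type field 4); the opcode tuples are copied from the case statement; SLT is selected when field 4 equals 1 (our reading of the garbled condition); the instruction bodies LW(), SW(), etc. are not modeled, so only the name of the dispatched instruction is recorded; an opcode matching no alternative is treated as a no-op here.

```python
T, F = True, False

# Opcode tuples copied from the case statement of the DLX0 loop
OPCODES = {
    (T, F, F, F, F, F): "LW",
    (T, F, F, F, F, T): "SW",
    (F, F, F, F, F, F): "RTYPE",
    (F, T, F, F, F, F): "BEQZ",
    (F, T, F, F, F, T): "J",
    (F, F, T, F, F, F): "HALT",
}


def decode(opcode, field4):
    # Mirrors the case ... si dispatch on the opcode field
    name = OPCODES.get(opcode)
    if name == "RTYPE":
        # SLT only when R-type field 4 equals 1; no else branch, so skip otherwise
        return "SLT" if field4 == 1 else None
    return name


def run_dlx0(rom):
    pc, halted, trace = 0, False, []
    while not halted:
        opcode, field4 = rom[pc // 4]  # ROMaddr!PC ; ROMdata?ir
        pc += 4                        # PC := PC+4
        name = decode(opcode, field4)
        if name is None:
            continue                   # nothing to execute
        trace.append(name)             # instruction bodies are not modeled
        halted = name == "HALT"        # halted := true
    return trace, pc
```

Note that this sketch never updates pc from a BEQZ or J body; it only shows the dispatch skeleton.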

6
DLX0 instruction loop
  • Each instruction cycle
  • 4 sequential commands common to every
    instruction type
  • 1-3 sequential commands specific to the
    instruction
  • hence 5-7 sequential commands per cycle
  • Pipelining
  • split these 5-7 commands over 2 stages,
  • in a (more or less) balanced way.
  • This is simple when the instruction does not
    affect the PC, but more difficult for jump and
    branch instructions.
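What "a (more or less) balanced way" means can be sketched in Python: keep the commands in their sequential order, try every cut point between stage 1 and stage 2, and pick the cut that minimizes the slower stage. The per-command delays are hypothetical inputs, and the sketch deliberately ignores the PC dependencies that complicate jumps and branches.

```python
def best_two_stage_split(delays):
    """Return (k, stage_time): commands [0, k) form stage 1, [k, n) stage 2.

    stage_time is the delay of the slower stage; returns None when there
    are fewer than two commands to split.
    """
    best = None
    for k in range(1, len(delays)):
        stage_time = max(sum(delays[:k]), sum(delays[k:]))
        if best is None or stage_time < best[1]:
            best = (k, stage_time)
    return best
```

For hypothetical delays (1, 1, 1, 2, 3), the best cut lies after the third command, giving a stage time of 5 instead of 8 for the unpipelined loop.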

7
2-stage DLX example template
8
DLX 3-stage pipelined execution
[Figure: pipelined execution diagram; horizontal axis: time → (instruction cycles 1, 2, ..., 7, ...); vertical axis: program execution → (instructions)]
Stage EX includes memory access and writeback
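The occupancy pattern of such a diagram can be generated with a small Python sketch. The stage names IF and ID are assumptions (the slide only names EX), and the sketch assumes no stalls: instruction i (0-based) occupies stage s during cycle i + s + 1, so n instructions need n + 2 cycles.

```python
STAGES = ("IF", "ID", "EX")  # IF/ID are assumed names; EX includes memory access and writeback


def pipeline_table(n_instr):
    """Stage occupied by each instruction in each cycle ("--" = idle), no stalls."""
    n_cycles = n_instr + len(STAGES) - 1
    rows = []
    for i in range(n_instr):
        row = ["--"] * n_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i is in stage s during cycle i + s + 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    table = pipeline_table(5)
    print("     " + " ".join(f"{c:>2}" for c in range(1, len(table[0]) + 1)))
    for i, row in enumerate(table):
        print(f"I{i + 1}:  " + " ".join(row))
```

Five instructions take 5 + 3 - 1 = 7 cycles; once the pipeline is full, one instruction completes per cycle.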
9
3-stage DLX example template
10
Reducing pipeline branch penalties
  • Problem
  • which instruction to fetch after a branch
    instruction?
  • Strategies
  • wait until the branch address is computed (DLX0)
  • predict branch not taken
  • predict branch taken
  • introduce branch-delay slots (next assignment)
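The strategies can be compared with the usual effective-CPI estimate: a base CPI of 1 plus branch frequency times the average stall. This Python sketch uses hypothetical workload numbers (20% branches, 60% of them taken, 1-cycle penalty) that are not taken from the course. It also assumes that "predict taken" has the target address available in time, which a real pipeline must arrange.

```python
def effective_cpi(branch_freq, taken_frac, stall_taken, stall_not_taken):
    # Base CPI of 1 plus the average stall cycles contributed by branches
    avg_stall = taken_frac * stall_taken + (1 - taken_frac) * stall_not_taken
    return 1 + branch_freq * avg_stall


if __name__ == "__main__":
    f, p = 0.2, 0.6  # hypothetical branch frequency and taken fraction
    strategies = {
        "wait until address computed": effective_cpi(f, p, 1, 1),
        "predict not taken": effective_cpi(f, p, 1, 0),
        "predict taken": effective_cpi(f, p, 0, 1),
    }
    for name, cpi in strategies.items():
        print(f"{name:30s} CPI = {cpi:.2f}")
```

With these numbers the estimates are about 1.20, 1.12 and 1.08; a branch-delay slot filled with useful work would remove the penalty altogether.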

11
Branch delay slots
  • Single branch delay slot
  • branch instruction
  • branch-delay instruction
  • branch target (if taken), or the sequential
    successor (if not taken)
  • Branch-delay instruction, various possibilities
  • e.g. an instruction preceding the branch
    instruction (if the branch condition does not
    depend on its result)
  • ... or an instruction succeeding the branch, if ...
  • a NOP instruction if no productive alternative is
    available.
  • This constitutes a change in the ISA!
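The ISA change can be illustrated with a tiny Python interpreter for a hypothetical instruction set (ADDI, BEQZ with an absolute target index, HALT; none of these encodings are from the course). With `delayed=True`, a taken branch only redirects the PC after the next instruction, the branch-delay instruction, has executed.

```python
def run_with_delay_slot(program, delayed):
    """program: list of ("ADDI", rd, rs, imm), ("BEQZ", rs, target) or ("HALT",).

    Returns the final registers and the list of executed instruction indices.
    """
    regs = [0] * 8
    pc, pending, executed = 0, None, []
    while True:
        instr = program[pc]
        executed.append(pc)
        # A branch taken in the previous instruction takes effect now
        next_pc = pc + 1 if pending is None else pending
        pending = None
        op = instr[0]
        if op == "HALT":
            break
        if op == "ADDI":
            _, rd, rs, imm = instr
            regs[rd] = regs[rs] + imm
        elif op == "BEQZ":
            _, rs, target = instr
            if regs[rs] == 0:
                if delayed:
                    pending = target  # redirect after the branch-delay instruction
                else:
                    next_pc = target  # redirect immediately
        pc = next_pc
    return regs, executed
```

For the program ADDI r1; BEQZ r0 (always taken); ADDI r2; ADDI r3; HALT, the delayed version executes instructions 0, 1, 2, 4 (the delay-slot ADDI runs), whereas the non-delayed version executes only 0, 1, 4. The same assembler text therefore behaves differently, which is why the slot changes the ISA.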

12
Final Project
  • 3-stage DLX, with an instruction rate exceeding
    80 MIPS when executing GCD (measured over several
    GCD cycles).
  • NB1: exploit branch-delay slots. This requires a
    different version of the assembler text!
  • NB2: the target can be achieved using command-level
    parallelism and pipelining. (Expression-level
    parallelism may yield a bonus.)
  • NB3: speed up the environment (RAM, ROM) when
    necessary.
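The 80 MIPS requirement translates into an average time budget per instruction: 1000 / 80 = 12.5 ns. A Python sketch, assuming the simulator reports the instruction count and the elapsed time in nanoseconds over several GCD iterations; the function names are hypothetical.

```python
def mips(instructions, elapsed_ns):
    # Millions of instructions per second from a count and an elapsed time in ns
    return instructions / elapsed_ns * 1e3


def max_avg_instr_time_ns(target_mips):
    # Largest average time per instruction that still meets target_mips
    return 1e3 / target_mips
```

For example, 1000 instructions in 12 000 ns gives about 83 MIPS, which meets the target; 1000 instructions in 12 500 ns gives exactly 80 MIPS, which is the boundary the requirement says must be exceeded.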

13
VLSI programming of asynchronous circuits
[Figure: design flow: Tangram program → compiler → handshake circuit → expander → asynchronous circuit (netlist of gates); a simulator provides feedback on behavior, area, time, energy, and test coverage]
14
Demonstrator ICs
15
Added value
  • 1985: modularity, ease of design (no value
    added to product!)
  • 1990: low power (ESPRIT project ?????)
  • 1992: low noise, low EME
    (Electro-Magnetic Emission)
  • 2000: ...

16
Added value: low power (DCC Error Corrector)
17
A sync-async arms race
18
Added value: low power
Synchronous 80C51 vs. asynchronous 80C51
19
Added value: low EM emission
20
Roadblock: circuit size; the 80C51 learning curve (1995/6, 1999/4)
21
Just in time processing
22
ADPCM
23
ADPCM
24
ADPCM
25
Industrialization of the Technology
  • Philips Semiconductors Zürich (Dec 1994): "We
    want to set a world record in low power, .. by
    using asynchronous technology."
  • Their choice of vehicle: the 80C51
    micro-controller (used in many consumer
    products).
  • Result: 4× less power, minimal EME.
  • Follow-up: pager baseband ICs.
  • In parallel: transfer and upgrade of tools and
    design flow.

26
Pager Baseband Controller ICs
  • Myna pager
  • FLEX protocol
  • 32 alphanumeric messages
  • a single AAA battery (1 V)
  • up to 25 weeks battery life
  • Pager baseband controller ICs
  • PCA5007, PCA5010
  • http://www.semiconductors.philips.com/pip/PCA5007
  • http://www.win.tue.nl/pa/wsinap/async.html

27
Sep 1998: the PCA5007
28
A new generation of pagers: a common platform for
all standards
29
EMI (Electro-Magnetic Interference): a critical
design factor
  • The antenna signal may be as small as 25 µV.
  • Clock harmonics of synchronous micro-controllers
    interfere with the RF signal (X00 MHz).
  • With the asynchronous 80C51, signal decoding can
    be done by means of (standard-specific) software.
    (This also enables upgrading/downloading!)
  • Furthermore, no shielding is required between the
    controller and the RF receiver.

30
PCA5007 block diagram

31
Contactless smartcard IC (ESPRIT project DESCALE)
[Figure: block diagram with a power regulator, 80C51 micro-controller, DES engine, UART, RAM, ROM, and EEPROM; a radio link supplies a 13.56 MHz clock and power (a few mW) and carries bi-directional communication (106 kbit/s)]
32
Contactless smartcard IC
  • Properties
  • a) low average power
  • b) lower peak power
  • c) speed adaptation
  • Merits
  • Maximum speed for received power (a,c)
  • Robust operation against voltage drops (c)
  • Smaller buffer capacitor (b,c)

33
Conclusion
  • First asynchronous VLSI circuits on the market
    (high volume sales).
  • Prospects for more async products look good.
  • Added value: low power, EME performance.
  • Added costs: test, IC area, being different.
  • Asynchronous VLSI technology
  • there is room for it in market niches,
  • but it may also contribute to mainstream VLSI.

35
Lab-work and report
  • You are allowed to team up with a colleague (not
    mandatory).
  • Report: more than a listing of functional Tangram
    programs
  • analyze the specifications and requirements
  • present design options, alternatives, trade-offs
  • motivate your design choices
  • explain the functional correctness of your
    Tangram programs
  • analyze and explain the area, time, and energy of
    your programs.