New Opportunities with Platform Based Design

About This Presentation
Title:

New Opportunities with Platform Based Design

Description:

New Opportunities with Platform Based Design. Frank Vahid, Associate Professor, Dept. of Computer Science and Engineering, University of California, Riverside

Slides: 45
Provided by: FrankV154
Learn more at: http://www.cs.ucr.edu
Transcript and Presenter's Notes

Title: New Opportunities with Platform Based Design


1
New Opportunities with Platform Based Design
  • Frank Vahid
  • Associate Professor
  • Dept. of Computer Science and Engineering
  • University of California, Riverside
  • Also with the Center for Embedded Computer
    Systems at UC Irvine
  • http://www.cs.ucr.edu/vahid
  • This research has been supported by the National
    Science Foundation, NEC, Trimedia, and Triscend

2
How Much is Enough?
3
How Much is Enough?
Perhaps a bit small
4
How Much is Enough?
Reasonably sized
5
How Much is Enough?
Probably plenty big
6
How Much is Enough?
More than typically necessary
7
How Much is Enough?
Very few people could use this
8
How Much Custom Logic is Enough?
1993: 1 million logic transistors
Perhaps a bit small
9
How Much Custom Logic is Enough?
1996: 5-8 million logic transistors
Reasonably sized
10
How Much Custom Logic is Enough?
1999: 10-50 million logic transistors
Probably plenty big
11
How Much Custom Logic is Enough?
2002: 100-200 million logic transistors
More than typically necessary
12
How Much Custom Logic is Enough?
  • Point of diminishing returns
  • 32-bit ARM: 30K
  • MPEG decoder: 1M
  • Other examples
  • Fast cars (> 100 mph)
  • High res digital cameras (> 4M)
  • Disk space
  • Even IC performance

1993: 1 M
2008: >1 BILLION logic transistors
Perhaps very few people could design this
13
Very Few Companies Can Design High-End ICs
Design productivity gap
Source: ITRS'99
  • Designer productivity growing at slower rate
  • 1981: 100 designer months → $1M
  • 2002: 30,000 designer months → $300M

14
Meanwhile, ICs Themselves are Costlier
Tech (micron)   0.8       0.35      0.18      0.13
NRE             $40k      $100k     $350k     $1,000k
Turnaround      42 days   49 days   56 days   76 days
Market          $3.5B     $6B       $12B      $18B
Source: DAC'01 panel on embedded programmable logic
  • And take longer to fabricate
  • While market windows are shrinking
  • Less than 1,000 out of 10,000 ASIC designs have
    volumes to justify fabrication in 0.13 micron
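
The volume argument above can be made concrete by amortizing NRE over the number of units built. The following is a minimal Python sketch, not part of the original slides: it assumes the NRE row of the table is in dollars, and the production volumes are hypothetical values chosen only for illustration.

```python
# NRE (non-recurring engineering) cost per technology node, from the table above.
# Keys are feature sizes in microns; values are NRE in dollars (assumed unit).
NRE_DOLLARS = {0.8: 40_000, 0.35: 100_000, 0.18: 350_000, 0.13: 1_000_000}


def nre_per_unit(tech_micron: float, volume: int) -> float:
    """Return the NRE cost amortized over `volume` manufactured units."""
    return NRE_DOLLARS[tech_micron] / volume


if __name__ == "__main__":
    for volume in (1_000, 100_000):  # hypothetical production volumes
        for tech in NRE_DOLLARS:
            cost = nre_per_unit(tech, volume)
            print(f"{tech} um, {volume:>7} units: ${cost:,.2f} NRE per unit")
```

At low volume the per-unit NRE share at 0.13 micron dominates any plausible unit price, which is one way to read the claim that few ASIC designs justify fabrication in that technology.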

15
Summarizing So Far...
Designers
16
Trend Towards Pre-Fabricated Platforms: ASSPs
  • ASSP: application specific standard product
  • Domain-specific pre-fabricated IC
  • e.g., digital camera IC
  • ASIC: application specific IC
  • ASSP revenue > ASIC
  • ASSP design starts > ASIC
  • Unique IC design
  • Ignores quantity of same IC
  • ASIC design starts decreasing
  • Due to strong benefits of using pre-fabricated
    devices

Source: Gartner/Dataquest, September '01
17
Will High End ICs Still be Made?
  • YES
  • The point is that mainstream designers likely
    won't be making them
  • Very high volume or very high cost products
  • Platforms are one such product: high volume
  • Need to be highly configurable to adapt to
    different applications and constraints

18
Configurable Platform Design: Cache
[Figure: pre-fabricated platform IC (a pre-designed system-level architecture) containing uP, DSP, L1 caches, JPEG decoder, peripherals, and FPGA]
  • ARM920T: caches consume half of total power
    (Segars '01)
  • MCORE: unified cache consumes half of total
    power (Lee/Moyer/Arends '99)
19
Best Cache Architecture for Embedded Systems
  • Not clear
  • Huge variety among popular embedded processors
  • What's the best
  • Associativity, Line size, Total size?

20
Cache Associativity
[Figure: direct-mapped cache holding lines A, B, C, D; example addresses 00 0 000, 01 0 000, 10 0 000, 11 0 000]
  • Direct mapped cache
  • Certain bits index into cache
  • Remaining tag bits compared
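
The index/tag split described on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not taken from the slides: the function name and the cache geometry (8 Kbyte, 32-byte lines) are assumptions for the example.

```python
def split_address(addr: int, line_size: int, num_lines: int):
    """Split an address into (tag, index, offset) for a direct-mapped cache.

    line_size and num_lines must be powers of two.
    """
    offset_bits = line_size.bit_length() - 1
    index_bits = num_lines.bit_length() - 1
    offset = addr & (line_size - 1)                  # byte within the line
    index = (addr >> offset_bits) & (num_lines - 1)  # which cache line
    tag = addr >> (offset_bits + index_bits)         # remaining bits, compared on lookup
    return tag, index, offset


if __name__ == "__main__":
    # Assumed geometry: 8 Kbyte cache with 32-byte lines, so 256 lines.
    a = 0x1234
    b = a + 8 * 1024  # same index, different tag: the two addresses conflict
    print(split_address(a, 32, 256))
    print(split_address(b, 32, 256))
```

Two addresses that share an index but differ in tag evict each other in a direct-mapped cache, which is the situation associativity is meant to relieve.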
21
Cache Associativity
  • Reduces miss rate thus improving performance
  • Impact on power and energy?
  • (Energy = Power × Time)

22
Associativity is Costly
  • Associativity improves hit rate, but at the cost
    of more power per access
  • Are the power savings from reduced misses
    outweighed by the increased power per hit?

Energy access breakdown for 8 Kbyte, 4-way set
associative cache (considering dynamic power only)
Energy per access for 8 Kbyte cache
23
Associativity and Energy
  • Best performing cache is not always lowest energy

24
Associativity Dilemma
  • Direct mapped cache
  • Good hit rate on most examples
  • Low power per access
  • But poor hit rate on some examples
  • High power due to many misses
  • Four-way set-associative cache
  • Good hit rate on nearly all examples
  • But high power per access
  • Overkill for most examples, thus wasting energy
  • Dilemma: design for the average or worst case?
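
The dilemma can be illustrated with a simple energy model: every access pays a per-access energy, and every miss pays an additional penalty. The sketch below is not from the slides; all energy values and miss counts are hypothetical numbers chosen to show how the winner flips between workloads.

```python
def cache_energy(accesses, misses, e_access, e_miss):
    """Total dynamic energy: every access pays e_access, each miss adds e_miss."""
    return accesses * e_access + misses * e_miss


# Hypothetical energies in arbitrary units (not measured values).
E_ACCESS = {"direct-mapped": 1, "4-way": 2}
E_MISS = 50

# Hypothetical miss counts per 1,000,000 accesses.
WORKLOADS = {
    "typical": {"direct-mapped": 20_000, "4-way": 15_000},
    "conflict-heavy": {"direct-mapped": 100_000, "4-way": 20_000},
}


def best_cache(workload):
    """Return (lowest-energy cache name, energy per cache) for a workload."""
    misses = WORKLOADS[workload]
    energy = {
        name: cache_energy(1_000_000, misses[name], E_ACCESS[name], E_MISS)
        for name in E_ACCESS
    }
    return min(energy, key=energy.get), energy


if __name__ == "__main__":
    for name in WORKLOADS:
        print(name, best_cache(name))
```

Under these assumed numbers the direct-mapped cache wins on the typical workload and the 4-way cache wins on the conflict-heavy one, so neither fixed choice is best everywhere.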

25
Associativity Dilemma
  • Obviously not a clear choice

26
Our Solution: Configurable Cache
  • Can be configured as 4, 2, or 1 way
  • Ways can be concatenated
  • Size can also be configured
  • By shutting down ways
  • Saves static power (leakage)

[Figure: configurable cache example showing lines C and D and address 11 0 000; annotation: "This bit selects the way"]
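
A functional model helps show what configuring associativity at fixed capacity does to miss behavior. The Python sketch below is not the circuit from the slides: it is a software model in which the class name, the 8 Kbyte / 32-byte geometry, the LRU replacement policy, and the address trace are all assumptions made for illustration.

```python
from collections import OrderedDict


class ConfigurableCache:
    """Fixed-capacity cache whose associativity is chosen at configuration time.

    Models way concatenation functionally: capacity stays the same, and
    fewer ways means more sets (more address bits used as index).
    """

    def __init__(self, ways, total_size=8192, line_size=32):
        if ways not in (1, 2, 4):
            raise ValueError("ways must be 1, 2 or 4")
        self.ways = ways
        self.line_size = line_size
        self.num_sets = total_size // (line_size * ways)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, addr):
        """Access one address; return True on a hit, False on a miss."""
        line = addr // self.line_size
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)     # mark as most recently used
            return True
        self.misses += 1
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # evict least recently used (assumed policy)
        entries[tag] = True
        return False


def count_misses(ways, trace):
    """Run a trace through a freshly configured cache and return the miss count."""
    cache = ConfigurableCache(ways)
    for addr in trace:
        cache.access(addr)
    return cache.misses


if __name__ == "__main__":
    # Hypothetical trace: two addresses 8 Kbytes apart, accessed alternately.
    trace = [0x0000, 0x2000] * 100
    for ways in (1, 2, 4):
        print(f"{ways}-way: {count_misses(ways, trace)} misses")
```

On this trace the 1-way configuration misses on every access, while the 2-way and 4-way configurations miss only on the first touch of each line. A program without such conflicts would see no benefit from the extra ways.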
27
Configurable Cache Design: Way Concatenation (4, 2 or 1 way)
[Figure: cache circuit. Address fields: tag address (a31 to a13), a12 and a11 feeding the configuration circuit (reg0, reg1, outputs c0 to c3), index (a10 to a5), line offset (a4 to a0). Blocks: tag part, 6x64 blocks, data array, bitlines, column mux, sense amps, mux driver, data output; critical path marked]
Small area and performance overhead
28
Configurable Cache Experiments
  • Motorola PowerStone benchmark: g3fax
  • Way concatenation outperforms 4-way and direct mapped.

29
Configurable Cache Experiments
100% = 4-way conventional cache
  • Configurable cache with both way concatenation
    and way shutdown was best on average
  • Considered programs from Powerstone, MediaBench,
    and Spec2000
  • And, it was superior on every benchmark

30
Configurable Cache Experiments: Line Size Too
100% = 4-way conventional cache
csb = concatenate plus shutdown cache
  • Best line size also differs per example
  • Our cache can be configured for line of 16, 32 or
    64 bytes
  • 64 is usually best, but 16 is much better in a
    couple of cases
  • A configurable cache with way concatenation, way
    shutdown, and variable line size, can save a lot
    of energy
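
The three knobs (way concatenation, way shutdown, line size) define a small configuration space that a tool could search. The Python sketch below enumerates it under stated assumptions that are not in the slides: the cache is taken to be four 2 Kbyte ways, shutdown is assumed to leave 4, 2, or 1 ways powered, and the remaining ways can be concatenated down to lower associativity.

```python
BANK_BYTES = 2 * 1024      # assumed: four 2 Kbyte ways in an 8 Kbyte cache
LINE_SIZES = (16, 32, 64)  # line sizes in bytes, as listed on the slide


def configurations():
    """Enumerate (total_bytes, ways, line_bytes) under the assumptions above."""
    configs = []
    for active_banks in (4, 2, 1):  # way shutdown: how many banks stay powered
        ways = active_banks
        while ways >= 1:            # way concatenation: halve the associativity
            for line in LINE_SIZES:
                configs.append((active_banks * BANK_BYTES, ways, line))
            ways //= 2
    return configs


if __name__ == "__main__":
    for total, ways, line in configurations():
        print(f"{total // 1024} Kbyte, {ways}-way, {line}-byte lines")
```

With these assumptions there are 18 configurations, few enough that each could be evaluated per application and the lowest-energy one selected.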

31
Configurable Platform Use
  • Platforms increasingly come with on-chip FPGA
  • Can we use that FPGA to improve software
    performance and energy?

[Figure: pre-fabricated platform IC containing uP, DSP, L1 cache, JPEG decoder, peripherals, and FPGA]
32
Commercial Single-Chip Microprocessor/FPGA
Platforms
  • Triscend E5: based on 8-bit 8051 CISC core
  • 10 Dhrystone MIPS at 40MHz
  • 60 kbytes on-chip RAM
  • up to 40K logic gates
  • Cost only about $4 (in volume)

33
Single-Chip Microprocessor/FPGA Platforms
  • Atmel FPSLIC
  • Field-Programmable System-Level IC
  • Based on AVR 8-bit RISC core
  • 20 Dhrystone MIPS
  • 5k-40k configurable logic gates
  • On-chip RAM (20-36Kb) and EEPROM
  • $5-$10

Courtesy of Atmel
34
Single-Chip Microprocessor/FPGA Platforms
  • Triscend A7 chip
  • Based on ARM7 32-bit RISC processor
  • 54 Dhrystone MIPS at 60 MHz
  • Up to 40k logic gates
  • On-chip cache and RAM
  • $10-$20 in volume

Courtesy of Triscend
35
Single-Chip Microprocessor/FPGA Platforms
  • Altera's Excalibur EPXA 10
  • ARM (922T) hard core
  • 200 Dhrystone MIPS at 200 MHz
  • Devices range from 200k to 2 million
    programmable logic gates

Source: www.altera.com
36
Single-Chip Microprocessor/FPGA Platforms
  • Xilinx Virtex II Pro
  • PowerPC based
  • 420 Dhrystone MIPS at 300 MHz
  • 1 to 4 PowerPCs
  • 4 to 16 gigabit transceivers
  • 12 to 216 multipliers
  • 3,000 to 50,000 logic cells
  • 200k to 4M bits RAM
  • 204 to 852 I/O
  • $100-$500 (>25,000 units)
  • Up to 16 serial transceivers
  • 622 Mbps to 3.125 Gbps

PowerPCs
Config. logic
Courtesy of Xilinx
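
The platforms above can be compared on Dhrystone MIPS per MHz, which factors out clock rate. The Python sketch below uses only the figures quoted on the preceding slides; the Atmel FPSLIC is omitted because no clock rate is given for it, and the dictionary labels are just shorthand for this example.

```python
# (Dhrystone MIPS, clock in MHz) as quoted on the slides above.
PLATFORMS = {
    "Triscend E5 (8051)": (10, 40),
    "Triscend A7 (ARM7)": (54, 60),
    "Altera Excalibur (ARM922T)": (200, 200),
    "Xilinx Virtex II Pro (PowerPC)": (420, 300),
}


def dmips_per_mhz(dmips, mhz):
    """Dhrystone MIPS normalized by clock frequency."""
    return dmips / mhz


if __name__ == "__main__":
    for name, (dmips, mhz) in PLATFORMS.items():
        print(f"{name}: {dmips_per_mhz(dmips, mhz):.2f} DMIPS/MHz")
```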
37
Single-Chip Microprocessor/FPGA Platforms
  • Why wouldn't future microprocessor chips include
    some amount of on-chip FPGA?

38
Single-Chip Microprocessor/FPGA Platforms
  • Lots of silicon area taken up by configurable
    logic
  • As discussed earlier, less of an issue every year
  • Smaller area doesn't necessarily mean higher
    yield (lower costs) any more
  • Previously could pack more die onto a wafer
  • But die are becoming pad (pin) limited in
    nanoscale technologies
  • Configurable logic typically used for
    peripherals, glue logic, etc.
  • We have investigated another use...

39
Software Improvements using On-Chip Configurable
Logic
  • Partitioned software critical loops onto on-chip
    FPGA for several benchmarks
  • Most time spent in one or two loops
  • Extensive simulated results for 8051 and MIPS
  • For Powerstone (PS), MediaBench (MB) and Netbench
    (NB)
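
Because most time is spent in one or two loops, the overall benefit of moving those loops to the FPGA can be estimated with Amdahl's law. The Python sketch below is an illustration, not the method used in the study: the loop fraction and the loop speedup are hypothetical, and communication overhead between the processor and the FPGA is ignored.

```python
def overall_speedup(loop_fraction, loop_speedup):
    """Amdahl's law: only the time spent in the moved loops is accelerated.

    loop_fraction: fraction of software execution time spent in the loops (0..1)
    loop_speedup:  how much faster the loops run in configurable logic
    """
    return 1.0 / ((1.0 - loop_fraction) + loop_fraction / loop_speedup)


if __name__ == "__main__":
    # Hypothetical values: loops take 80% of the time and run 10x faster in hardware.
    print(f"{overall_speedup(0.8, 10):.2f}x overall")
```

With these assumed values the overall speedup is about 3.6x, and no loop speedup can push it past 5x, since the remaining 20% of the time stays in software.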

40
Software Improvements using On-Chip Configurable
Logic
41
Speedup Gained with Relatively Few Gates
  • Created several partitioned versions of each
    benchmark
  • Most speedup gained with first 20,000 gates
  • Surprisingly few gates
  • Stitt, Grattan and Vahid, Field-programmable
    Custom Computing Machines (FCCM) 2002
  • Stitt and Vahid, IEEE Design and Test, Dec. 2002
  • J. Villarreal, D. Suresh, G. Stitt, F. Vahid and
    W. Najjar, Design Automation of Embedded Systems,
    2002 (to appear).

42
Software Improvements using On-Chip Configurable
Logic: Verified through Physical Measurement
A7 IC
  • Performed physical measurements on Triscend A7
    and E5 devices
  • Similar results (even a bit better)

Triscend A7 development board
43
Other Types of Configurability
  • Microprocessor (other researchers)
  • VLIW configurations
  • Voltage scaling
  • Peripherals
  • e.g., JPEG decoder with different precisions
  • Bus topology
  • Etc.

[Figure: platform IC containing uP, DSP, L1 cache, JPEG decoder, peripherals, and FPGA]
44
Conclusions
  • Trend is away from semi-custom IC fabrication
  • Pressures encourage buying pre-fabricated
    platforms
  • Platforms must be highly configurable
  • To be useful for a variety of applications, and
    hence mass produced
  • We have discussed
  • Software speedup/energy benefits of on-chip
    configurable logic: 3x speedups and 34% energy
    savings with only 10,000 gates
  • Creating a highly-configurable cache
    architecture: 40% energy savings compared to
    conventional cache
  • Designing highly-configurable platforms, and
    facilitating their use with good exploration
    tools, can help enable platform-based design
  • See http://www.cs.ucr.edu/vahid for more
    information