Joe Gebis - PowerPoint PPT Presentation

Hercules DRC and LVS. Avant! StarRC parasitic extraction ... Hercules DRC. Synopsys PrimeTime static timing analysis. Joe Gebis, IRAM Retreat, Summer 2000 ...

1
IRAM Chip Status
  • Joe Gebis
  • Computer Science Division
  • University of California, Berkeley
  • gebis@cs.berkeley.edu
  • http://iram.cs.berkeley.edu

2
Outline
  • Overview of VIRAM-1 organization
  • Hardware status
  • CAD plan

3
VIRAM1 Block Diagram
[Block diagram labels: Flag Unit 0, Flag Unit 1, Flag Register File (512B); Arithmetic Unit 0, Arithmetic Unit 1, Vector Register File (8KB); Memory Unit, TLB, DMA, SysAD IF, JTAG IF; Memory Crossbar; DRAM0 (2MB), DRAM1 (2MB), ..., DRAM7 (2MB). Bus widths shown: 32B and 8B.]

4
VIRAM1 Vector Units
[Figure labels: Datapath, Lane, Vector Registers, Functional Unit, Memory Unit]
  • 4 partitionable 64-bit lanes
  • 2 arithmetic functional units (one FP), 2 flag
    processing
  • Lane provides basic unit of design, replication
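
As an illustration of what "partitionable" means for a 64-bit lane, the following minimal sketch treats one lane word as several narrower elements and adds them independently, so that a carry never crosses an element boundary. The helper names and the element widths used here (16, 32, and 64 bits) are assumptions for illustration; the slides do not list the supported widths.

```python
LANE_BITS = 64  # lane width from the slide


def split_lane(word, elem_bits):
    """Split one 64-bit lane word into elements, lowest element first."""
    if LANE_BITS % elem_bits != 0:
        raise ValueError("element width must divide the lane width")
    mask = (1 << elem_bits) - 1
    count = LANE_BITS // elem_bits
    return [(word >> (i * elem_bits)) & mask for i in range(count)]


def join_lane(elems, elem_bits):
    """Pack elements (lowest first) back into one 64-bit lane word."""
    mask = (1 << elem_bits) - 1
    word = 0
    for i, elem in enumerate(elems):
        word |= (elem & mask) << (i * elem_bits)
    return word


def lane_add(a, b, elem_bits):
    """Element-wise wrapping add; carries stay inside each element."""
    mask = (1 << elem_bits) - 1
    sums = [(x + y) & mask
            for x, y in zip(split_lane(a, elem_bits), split_lane(b, elem_bits))]
    return join_lane(sums, elem_bits)


def elements_per_cycle(lanes, elem_bits):
    """Elements one functional unit can process per cycle across all lanes."""
    return lanes * (LANE_BITS // elem_bits)
```

With 4 lanes, `elements_per_cycle(4, 16)` gives 16 elements per cycle and `elements_per_cycle(4, 64)` gives 4, which is why narrower data types benefit most from partitioning.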

5
VIRAM1 Layout
  • IBM SA-27E process
  • 0.18 µm, 6 copper layers
  • 290 mm² area
  • 150M transistors
  • 1.2 V logic, 1.8 V DRAM
  • 2 W power consumption

6
Scalar Core Status
  • Have synthesizable MIPS64 5Kc core
  • Will run at 200 MHz
  • Has 8KB instruction and data caches
  • Caches will be compiled by IBM

7
Vector Integer Unit Status
  • Complete
      • Design of all blocks
      • Layout of subblocks
  • Partially done
      • Assembling block components
  • Remaining
      • Final assembly

[Figure labels: Logical Unit, Multiplier, Shifter, Rounder, shamt, Adder, zero detect, Saturate. Figure annotations: "Design complete, basic subblocks layout done" and "Design complete, components ready for assembly".]
8
Vector Register File
  • Have a register file from Transmeta which was
    successfully fabbed in the same process
  • Use the complete Transmeta register file?
      • Contains shadow registers we couldn't use
      • Has more ports than we need
      • Would require combining 8 duplicates
  • Use the Transmeta bit cell?
      • It is larger than it needs to be for our purposes
  • Build our own bit cell?
      • Possibly a significant amount of work

9
Control
  • Small changes to work with new MIPS core
  • Working model of the vector unit complete
  • Some small glue logic remains to be able to do
    complete simulation with core and on-chip DRAM

10
Floating Point Vector Unit
  • Synthesizable Verilog received from MIT RAW
    architecture group
  • FPU as received not fully IEEE compliant
  • Required some changes to work with core

11
Crossbar Design
[Figure labels: DRAM 0 Port, DRAM 1 Port, DRAM 2 Port, DRAM 3 Port; 256-bit load crossbar; 256-bit store crossbar; Scalar / DMA Port; VL0 Port, VL1 Port, VL2 Port, VL3 Port]
12
Memory and Crossbar
  • Model for DRAM controllers complete
  • Crossbar design is complete, layout progressing
  • Crossbar issues
      • Switches only 64-bit words
      • Operates at 1.2 V, contains level shifters to
        interface to 1.8 V DRAM
      • Segmented with repeaters at approximately 2 mm
        intervals
      • 5 ns cycle time, interfaces to DRAM without
        additional subclocks
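
The crossbar figures quoted on these slides (256-bit crossbars, 5 ns cycle time) imply a peak transfer rate that can be worked out directly. The sketch below does that arithmetic, assuming one full-width transfer per cycle on a single crossbar; that assumption and the function names are illustrative rather than taken from the slides.

```python
def crossbar_clock_mhz(cycle_ns=5.0):
    """Clock frequency in MHz implied by the cycle time."""
    return 1000.0 / cycle_ns


def crossbar_bandwidth_gb_per_s(width_bits=256, cycle_ns=5.0):
    """Peak bandwidth of one crossbar, assuming one full-width transfer per cycle.

    Bytes per nanosecond is numerically equal to GB/s (10^9 bytes per second).
    """
    bytes_per_cycle = width_bits / 8
    return bytes_per_cycle / cycle_ns
```

Under these assumptions a 5 ns cycle corresponds to 200 MHz, and each 256-bit crossbar moves 32 bytes per cycle, or 6.4 GB/s in its direction (load or store).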

13
CAD Plan - Synthesized Blocks
  • Synopsys VCS Verilog compiler and environment
  • Synopsys Design Compiler
  • Synopsys Module Compiler
  • Avant! Apollo place and route
  • Synopsys PrimeTime static timing analysis

14
CAD Plan - Custom Blocks
  • Cadence layout editor
  • Cadence schematic editor
  • Avant! Hercules DRC and LVS
  • Avant! StarRC parasitic extraction
  • Avant! Hspice
  • Synopsys TimeMill dynamic timing
  • Synopsys PowerMill power consumption simulation

15
CAD Plan - Integrated Blocks
  • Avant! Apollo place and route
  • Avant! Hercules DRC
  • Synopsys PrimeTime static timing analysis

16
CAD Plan - Other Blocks
  • Cache blocks
      • IBM SRAM compilers
  • Functional verification
      • Synopsys VCS

17
Remaining Work
  • Some design, layout work remains
  • Synthesizing blocks
  • Verification
  • Tapeout planned for late fall

19
Vector Execution Model
Scalar Execution
Vector Execution
r1
vector length
add r3, r1, r2
add.vv v3, v1, v2
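
The contrast between the two instructions on this slide can be sketched in a few lines: the scalar `add` produces one result, while `add.vv` applies the same operation to as many elements as the vector length specifies. This is a behavioural sketch only; register widths, overflow, and masking are ignored, and the Python function names are hypothetical.

```python
def add_scalar(r1, r2):
    """add r3, r1, r2: one operation on one pair of operands."""
    return r1 + r2


def add_vv(v1, v2, vector_length):
    """add.vv v3, v1, v2: the same operation on the first vector_length elements."""
    if vector_length > min(len(v1), len(v2)):
        raise ValueError("vector length exceeds register length")
    return [v1[i] + v2[i] for i in range(vector_length)]
```

For example, `add_vv([1, 2, 3, 4], [10, 20, 30, 40], 3)` processes only the first three element pairs, because the vector length is 3.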
20
Vector Architectural State
21
VIRAM ISA Extensions
Scalar
MIPS64 scalar instruction set
Vector ALU
All ALU / mem operations under mask
Vector Memory
Vector Register
Plus flag, convert, fixed-point, and transfer operations
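
To make "operations under mask" concrete, here is a minimal behavioural sketch of a masked vector add. It assumes the mask is one boolean per element (as would come from a flag register) and that masked-off elements keep the destination's previous value; neither detail is stated on the slide, and the function name is hypothetical.

```python
def add_vv_masked(v_dest, v1, v2, mask, vector_length):
    """Masked add: write v1[i] + v2[i] only where mask[i] is set.

    Assumption: elements whose mask bit is clear, and elements at or beyond
    vector_length, keep the old destination value.
    """
    result = list(v_dest)
    for i in range(vector_length):
        if mask[i]:
            result[i] = v1[i] + v2[i]
    return result
```

With a destination of `[0, 0, 0, 0]`, mask `[True, False, True, True]`, and vector length 3, only elements 0 and 2 are written.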
22
Fixed-point Arithmetic
  • Multiply upper or lower halves, shift and round
  • Add/Sub and saturate
  • Shift right and round, shift left and saturate
  • All combinations of multiply and add/sub
    instructions
  • Saturate to narrower width
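
The fixed-point operations listed above can be sketched on signed integers as follows. This is an illustration under stated assumptions: the element width defaults to 16 bits, and rounding is "add half, then shift" (ties round toward positive infinity). The slide does not specify the rounding modes or widths, and the function names are hypothetical.

```python
def saturate(x, bits):
    """Clamp x to the signed range representable in the given number of bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, x))


def add_sat(a, b, bits=16):
    """Add and saturate instead of wrapping around on overflow."""
    return saturate(a + b, bits)


def sub_sat(a, b, bits=16):
    """Subtract and saturate."""
    return saturate(a - b, bits)


def shift_right_round(x, shamt):
    """Arithmetic shift right with rounding (assumed: add half, then shift)."""
    if shamt == 0:
        return x
    return (x + (1 << (shamt - 1))) >> shamt


def shift_left_sat(x, shamt, bits=16):
    """Shift left and saturate to the element width."""
    return saturate(x << shamt, bits)
```

"Saturate to narrower width" corresponds to calling `saturate` with a smaller `bits` value than the source element, for example `saturate(x, 8)` on a 16-bit value.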

23
Multiplier Partitioning
16-bit Multiplier Block
result[15:0]
result[31:16]
16-bit adder
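
The general idea of building wider multiplies out of 16-bit multiplier blocks can be illustrated with the standard partial-product decomposition. The sketch below is that textbook technique for unsigned operands, not a reconstruction of the VIRAM-1 circuit, whose actual partitioning is not recoverable from the figure labels; all names are hypothetical.

```python
MASK16 = 0xFFFF


def mul16(a, b):
    """One 16x16 -> 32-bit unsigned multiplier block."""
    if not (0 <= a <= MASK16 and 0 <= b <= MASK16):
        raise ValueError("operands must fit in 16 bits")
    return a * b


def mul32_from_16(a, b):
    """32x32 -> 64-bit unsigned multiply from four 16-bit blocks."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    ll = mul16(a_lo, b_lo)  # weight 2^0
    lh = mul16(a_lo, b_hi)  # weight 2^16
    hl = mul16(a_hi, b_lo)  # weight 2^16
    hh = mul16(a_hi, b_hi)  # weight 2^32
    return ll + ((lh + hl) << 16) + (hh << 32)
```

The same four blocks, used without the cross terms, could instead produce independent 16-bit products, which is the appeal of partitioning the multiplier this way.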
24
Scaling
  • Scaled-down version from the original
  • Vector unit with same control
  • Or scale up for future versions

25
Scalar Core
  • Synthesizable core from MIPS
  • 64 bit (MIPS64 ISA)
  • 6 stage
  • Single instruction issue
  • 8 kB direct-mapped D/I cache
  • Has coprocessor interface used for vector unit
    access and FPU

26
Floating-point datapath
  • Single precision
  • Contains add, sub, mul, div, compare, convert,
    truncate
  • Does not contain mul-add, sqrt
  • Only supports round to nearest even mode
  • Fully pipelined
  • 3 cycle latency for add/sub/mul/compare/convert
  • 10 cycle latency for divide, 8 cycle repeat rate
  • Fast execution mode
      • Exceptions for each element noted in flag
        register; exception raised at the end of
        instruction execution
  • Precise execution mode
      • Following FP instructions are stalled early in
        the pipeline until execution of previous
        instruction is complete and any exceptions are
        raised
      • Operates at half performance (0.8 GFLOPS)
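
The 0.8 GFLOPS figure is consistent with numbers given elsewhere in the deck, as the following arithmetic sketch shows. It assumes the vector unit runs at the 200 MHz quoted for the scalar core, that each of the 4 64-bit lanes processes two 32-bit single-precision elements per cycle, and that only one of the two arithmetic units handles floating point. These are inferences from the slides rather than stated facts, and the function name is hypothetical.

```python
def peak_gflops(lanes=4, lane_bits=64, elem_bits=32, fp_units=1, clock_mhz=200):
    """Peak FP operations per second, in GFLOPS, under the stated assumptions."""
    elems_per_cycle = lanes * (lane_bits // elem_bits) * fp_units
    return elems_per_cycle * clock_mhz / 1000.0


fast_mode = peak_gflops()      # 8 operations per cycle at 200 MHz
precise_mode = fast_mode / 2   # half performance in precise mode
```

Under these assumptions fast mode peaks at 1.6 GFLOPS, and halving it gives the 0.8 GFLOPS quoted for precise execution mode.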