Provided by: RobertD172
Learn more at: https://cs.nyu.edu

1
Emotion Engine
  • AKA the PlayStation 2 Architecture
  • Or: the progeny of a MIPS and a DSP
  • By Idan Gazit, June 2002

2
Overview
  • Based around a modified and extended MIPS core
    (the R5900, which implements the MIPS III
    instruction set).
  • Designed from the ground up to run media
    applications (read: games) VERY fast, but can
    function as a general-purpose CPU.
  • Bears much resemblance to DSPs (Digital Signal
    Processors); more on this later.

3
Basic Layout: Parallelism is Key!
  • MIPS R5900 CPU core
  • 1 FPU (floating-point) coprocessor
  • 2 VUs (Vector Units); more on these later
  • Graphics Interface unit (GIF): passes
    rendered data on to the Graphics Synthesizer,
    which does the work of actually drawing it to
    the screen.
  • 128b-wide main bus
  • 10-channel DMA controller

4
Basic Layout (diagram)

5
The Nitty-Gritty
  • The main job of the EE is to render entire
    frames. The product is a display list,
    i.e. a list of geometry (points, polygons,
    textures) and where it needs to be placed on the
    screen.
  • All of this needs to be done very fast, so note
    the very wide data paths (128b main bus, plus
    additional private links between certain
    units).
  • Also note the 10-channel DMA controller: the CPU
    shouldn't waste time on I/O. Multiple connections
    between different units allow more than one I/O
    transaction at once, so long as they're on
    different buses.

6
The CPU
  • Honest, it's just a plain MIPS with some minor
    extensions.
  • 32 x 128b general-purpose registers
  • 2 x 64b ALUs (Arithmetic Logic Units)
  • 1 x 128b Load/Store unit (parallelism again:
    load/store 4 words at once)
  • 1 Branch execution unit
  • 2 coprocessors, the FPU and VU0: proper MIPS
    coprocessors controlled by COP instructions!

7
The CPU
  • Able to do two 64b integer ops per cycle, or one
    64b integer op and one 128b load/store.
  • The ALUs are interesting: they are pipelined, and
    can be used in two ways:
    • Separately, as in normal CPUs (2 x 64b ops)
    • Locked together, to perform one 128b instruction:
      • 16 x 8b ops in one cycle
      • 8 x 16b ops in one cycle
      • 4 x 32b ops in one cycle
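
The lane splitting above can be illustrated in software. The following is a minimal Python sketch, not the EE's actual instruction semantics: the function names (`pack`, `unpack`, `packed_add`) are hypothetical, and wrap-around on overflow is an assumption made here for simplicity. It shows the key idea that carries never cross a lane boundary.

```python
def pack(lanes, lane_bits):
    """Pack small integers (lane 0 first) into one 128-bit value."""
    lane_mask = (1 << lane_bits) - 1
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & lane_mask) << (i * lane_bits)
    return value


def unpack(value, lane_bits):
    """Split a 128-bit value back into its lanes (lane 0 first)."""
    lane_mask = (1 << lane_bits) - 1
    return [(value >> shift) & lane_mask for shift in range(0, 128, lane_bits)]


def packed_add(a, b, lane_bits):
    """Add two 128-bit values lane by lane, wrapping inside each lane."""
    if 128 % lane_bits != 0:
        raise ValueError("lane width must divide 128")
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 128, lane_bits):
        lane_a = (a >> shift) & lane_mask
        lane_b = (b >> shift) & lane_mask
        # Each lane wraps on its own; no carry leaks into the next lane.
        result |= ((lane_a + lane_b) & lane_mask) << shift
    return result


a = pack([1, 2, 3, 0xFFFFFFFF], 32)
b = pack([10, 20, 30, 1], 32)
print(unpack(packed_add(a, b, 32), 32))
```

With `lane_bits` set to 8, 16, or 32, the same 128 bits are treated as 16, 8, or 4 independent values, matching the three cases on the slide.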

8
The CPU
  • Examples of supported instructions:
  • MUL/DIV instructions
  • 3-operand MUL/MADD instructions
  • Arithmetic ADD/SUB instructions
  • Pack and extend instructions
  • Min/Max instructions
  • Absolute value instructions
  • Shift instructions
  • Logical instructions
  • Compare instructions
  • Quadword Load/Store (remember, 128b L/S unit)

9
The CPU
  • 8k data / 16k instruction cache, 2-way set
    associative
  • 6-stage pipeline (shallow, compared to modern PC
    architectures)
  • Speculative execution is possible, and the penalty
    for a branch miss isn't bad because it's a short
    pipeline.
  • Pipeline stages:
    1. PC select
    2. Instruction fetch
    3. Instruction decode and register read
    4. Execute
    5. Cache access
    6. Writeback
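
To see why a short pipeline keeps the branch-miss penalty small, here is a rough Python sketch of an idealized cycle count. It makes several assumptions that are not in the slides: one instruction issued per cycle, no stalls other than branch flushes, and a branch that resolves in the Execute stage. The function names and the 20-stage comparison pipeline are purely illustrative.

```python
STAGES = [
    "PC select",
    "Instruction fetch",
    "Instruction decode and register read",
    "Execute",
    "Cache access",
    "Writeback",
]


def ideal_cycles(n_instructions, depth=len(STAGES)):
    """Cycles to finish n instructions, one issued per cycle, no stalls."""
    if n_instructions == 0:
        return 0
    # The first instruction needs `depth` cycles; each later one adds a cycle.
    return depth + n_instructions - 1


def cycles_with_mispredicts(n_instructions, mispredicts, depth=len(STAGES),
                            resolve_stage=4):
    """Add a flush penalty for every mispredicted branch.

    Assumption: the branch outcome is known in stage `resolve_stage`, so the
    instructions in the earlier stages are thrown away.
    """
    penalty = resolve_stage - 1
    return ideal_cycles(n_instructions, depth) + mispredicts * penalty


# 6-stage pipeline from the slide versus a hypothetical 20-stage pipeline
# whose branches resolve in stage 17.
print(cycles_with_mispredicts(100, 5))
print(cycles_with_mispredicts(100, 5, depth=20, resolve_stage=17))
```

Under these assumptions each mispredicted branch costs 3 cycles on the short pipeline and 16 on the deep one, which is the effect the slide describes.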

10
The CPU
  • 16k of SPRAM (Scratch Pad RAM): VERY VERY
    FAST.
  • In the CPU core.
  • What is this stuff? It is effectively a very fast
    data cache shared by the CPU and VU0.
  • The 128b private link between the CPU and VU0
    allows VU0 to use the SPRAM and the CPU to
    directly reference the VU's registers.
  • Which leads us nicely to the fact that the really
    difficult work is performed by...

11
Vector Units: The Heart of the EE
  • FMAC: Floating-Point Multiply-Accumulate
  • As it turns out, this operation is critical to 3D
    rendering, and is performed many times in tight
    loops.
  • An obvious candidate for parallelism and
    pipelining!
  • Between both VUs and the FPU, there are a total of
    10 FMAC units, each able to do 1 FMAC per cycle as
    well as other useful instructions.
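
To make the role of FMAC in 3D rendering concrete, here is a small Python sketch of a 4x4 matrix times vertex transform built entirely from multiply-accumulate steps. It is an illustration of the arithmetic only, not VU code; the function names and the column-major matrix layout are assumptions chosen for this example.

```python
def fmac(acc, a, b):
    """One multiply-accumulate: acc + a * b."""
    return acc + a * b


def vector_fmac(acc, vec, scalar):
    """Four FMACs side by side, one per lane (x, y, z, w)."""
    return [fmac(acc_i, v_i, scalar) for acc_i, v_i in zip(acc, vec)]


def transform(matrix_columns, vertex):
    """Multiply a 4x4 matrix (given as four columns) by a 4-component vertex."""
    acc = [0.0, 0.0, 0.0, 0.0]
    for column, component in zip(matrix_columns, vertex):
        # Each step is one 4-wide FMAC: acc += column * component.
        acc = vector_fmac(acc, column, component)
    return acc


# A translation by (5, 6, 7), stored column by column.
translate = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [5.0, 6.0, 7.0, 1.0],
]
print(transform(translate, [1.0, 2.0, 3.0, 1.0]))
```

Transforming one vertex takes four 4-wide multiply-accumulate steps, and a frame contains many vertices, which is why this inner loop is such a natural target for parallel FMAC hardware.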

12
Examples of Useful VU Instructions
  • FMAC: 1 cycle
  • Min/Max: 1 cycle
  • FDIV: another logical unit, 1 per VU
    • Floating-point divide: 7 cycles
    • Square root: 7 cycles
    • Inverse square root: 13 cycles
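
As a rough worked example of what these cycle counts mean, the Python sketch below tallies a naive, strictly serial cost for normalizing a 3-component vector: three multiply-accumulates for the squared length, one inverse square root, and one more multiply to scale. The operation sequence and the names are assumptions for illustration; real code would overlap work across units, so this is an upper-bound style estimate, not a measurement.

```python
# Cycle counts as listed on the slide.
LATENCY_CYCLES = {
    "fmac": 1,
    "min_max": 1,
    "fdiv": 7,
    "sqrt": 7,
    "rsqrt": 13,
}


def serial_cost(ops):
    """Sum the cycle counts of a sequence of operations run back to back."""
    return sum(LATENCY_CYCLES[op] for op in ops)


# Hypothetical sequence: x*x + y*y + z*z, then 1/sqrt(len2), then one scale.
NORMALIZE = ["fmac", "fmac", "fmac", "rsqrt", "fmac"]

print(serial_cost(NORMALIZE))
```

Even in this pessimistic tally the inverse square root dominates, which suggests why the divide unit is kept separate from the FMAC units.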

13
Vector Units
  • However, there are differences between the two VUs
    and in how they are utilized.
  • Both are VLIW: they take long instructions with
    multiple pieces of data.
  • Processing units are split into two working
    groups:
    • Group 1: CPU, FPU, VU0 ("Emotion Synthesis" on
      the diagram)
    • Group 2: VU1, GIF ("Geometry Processing" on
      the diagram)

14
Group 1
  • Here, the FPU and VU0 act as proper MIPS
    coprocessors, and are linked to the CPU by a
    private 128b wide bus to avoid crowding the main
    bus.
  • The FPU is nothing special, just another FPU
    coprocessor: 1 FMAC unit and 1 FDIV unit, each
    identical to the VU FMAC/FDIV units.
  • VU0 does the real heavy lifting when it comes to
    the math; the CPU acts as more of a traffic
    director, feeding data as fast as it can to the
    VU for processing.

15
Group 1
  • Although group 1 does geometry processing, it is
    also responsible for more general-purpose
    calculations, such as enemy AI, game physics,
    etc.
  • Therefore group 1 has the (more generalized) CPU,
    whereas group 2 focuses only on geometry (and has
    only VU1 and the GIF)
  • Definite hierarchy of control in group 1: the CPU
    controls the FPU and VU0.

16
Group 1: Vector Unit 0 (diagram)

17
Group 1: Vector Unit 0
  • 32 x 128b FP registers, each holds 4 x 32b
    single-precision FP numbers.
  • 16 x 16b integer registers for integer math
  • Instructions are just standard 32b COP
    (coprocessor) instructions.
  • Data is passed from the CPU in 128b bundles, which
    the VIF (VU Interface) unpacks into 4 x 32b data
    words.
  • 8k each of data cache and instruction cache
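
The unpacking of a 128b bundle into four 32b words can be sketched in Python with the standard `struct` module. This models only the simplest case described on the slide (four single-precision floats per bundle); the little-endian byte order and the function name `unpack_quadword` are assumptions for this example, and the real VIF is not modeled here.

```python
import struct


def unpack_quadword(quadword):
    """Split a 16-byte (128b) bundle into four 32b single-precision floats.

    Assumption: little-endian byte order, first word in the lowest bytes.
    """
    if len(quadword) != 16:
        raise ValueError("a quadword is exactly 16 bytes")
    return struct.unpack("<4f", quadword)


# Build one bundle holding a vertex (x, y, z, w) and unpack it again.
bundle = struct.pack("<4f", 1.0, 2.0, 3.0, 1.0)
print(unpack_quadword(bundle))
```

Each unpacked bundle fills exactly one of the 128b floating-point registers listed above, which is why the 128b width recurs throughout the design.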

18
Group 2
  • Consists of VU1 and the GIF (Graphics Interface).
  • VU1 acts like a standalone VLIW processor, and is
    not directly controlled by the CPU.
  • Perhaps a proper name for VU1 is the "Geometry
    Processor" for the GIF: this is pure data
    processing, and it has to happen quickly to keep
    the GIF saturated with graphics to draw out to
    your TV.

19
Group 2: Vector Unit 1 (diagram)

20
Group 2: Vector Unit 1
  • Same general features as VU0, but with some
    differences according to VU1's role:
  • Addition of an EFU (elementary functional unit),
    basically one FMAC and one FDIV unit doing the
    more rudimentary geometry calculations. Note the
    striking resemblance to the FPU from group 1.
  • 16k each of data and instruction cache, up from
    8k: since VU1 must handle geometry independently
    of the CPU, it ends up handling much more data
    than VU0.

21
Group 2 Vector Unit 1
  • Special direct connection between data cache and
    the GIF.
  • Why is this special? VU1 can work on a display
    list in cache and have it sent over to the GIF by
    DMA. This is quicker than using the main bus to
    shuttle data around, is less dependent on the CPU,
    and leaves the main bus free for load instructions.

22
Vector Unit Comparison
  • The designers opted for flexibility in design, and
    thus the architecture is slightly confusing:
  • VU0 is a coprocessor; VU1 is a VLIW
    mini-processor.
  • BUT VU0 can be switched into VLIW mode, where
    the CPU then communicates with it as it does with
    VU1 (e.g. VU0 receives 64b instruction bundles,
    which are parsed by the VIF).

23
Vector Unit Instructions
  • We really should treat the VUs as limited
    processors.
  • Each 64b VLIW instruction breaks down into two 32b
    COP instructions: an upper instruction and a
    lower instruction.
  • The upper/lower distinction is important: the
    types of work they do are different.
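
Splitting a 64b bundle into its two 32b halves is simple bit arithmetic, sketched below in Python. The assumption that the upper instruction sits in the high 32 bits is made here for illustration and is not stated in the slides; `split_bundle` and the sample value are hypothetical.

```python
def split_bundle(bundle):
    """Split a 64b VLIW bundle into (upper, lower) 32b instruction words.

    Assumption: the upper instruction occupies bits 63..32 and the lower
    instruction occupies bits 31..0.
    """
    if not 0 <= bundle < (1 << 64):
        raise ValueError("bundle must fit in 64 bits")
    upper = (bundle >> 32) & 0xFFFFFFFF
    lower = bundle & 0xFFFFFFFF
    return upper, lower


upper, lower = split_bundle(0x11223344AABBCCDD)
print(hex(upper), hex(lower))
```

Because both halves are issued together, one bundle can pair a SIMD math operation with a utility operation such as a load or branch in the same cycle.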

24
Vector Unit Instructions
  • Upper Instructions: SIMD (Single Instruction,
    Multiple Data) instructions
  • Aptly named: these are the fast multimedia
    instructions that do the same operation on lots
    and lots of data.
  • Logically, these types of instructions are
    handled by the special VU units: FMAC, FDIV, etc.
  • Note that these instructions ONLY use the
    special units in each VU.

25
Vector Unit Instructions
  • Lower Instructions: non-SIMD type
  • More utility than processing:
    • Load/store instructions
    • Jump/branch instructions
    • Random number generation
    • EFU instructions (only in VU1; remember, 1 FMAC
      and 1 FDIV)
  • Note that these instructions use units in the
    VUs that I didn't mention (RNG unit, load/store
    unit, etc.); they're the more mundane units for
    the more mundane tasks.

26
Flow of Execution
  • So with all of this confusing flexibility, what
    do we get?
  • Two ways of doing work:
    • Group 1 and Group 2 both render in parallel,
      both passing display lists on to the GIF.
    • Group 1 (CPU, VU0, FPU) prepares instructions
      for VU1 (load/store, branching, etc.), which VU1
      renders and passes on to the GIF.

27
Flow of Execution
  • Method 1 (parallel)
  • Method 2 (serial)

28
DSPs, PS2s and PCs, oh my!
  • Essentially, the PS2 (like a DSP) performs
    a small number of instructions on a large amount
    of uniform data.
  • Exactly the opposite of PCs, which perform a large
    number of instructions on varying data.
  • Side-effect bonus: good locality of reference.
    Instructions on the PS2 don't jump around as much
    as on PCs, so there is less chance of cache misses
    or branch mispredictions.
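
The locality argument can be checked with a toy cache model. The Python sketch below simulates a 2-way set-associative cache of 8k, as on the CPU slide, and compares streaming through uniform data against scattered accesses. The 64-byte line size, LRU replacement, 16 MB address span, and all names are assumptions for this illustration, not details from the slides.

```python
import random


class TwoWayCache:
    """Toy 2-way set-associative cache with LRU replacement."""

    def __init__(self, size_bytes=8 * 1024, line_bytes=64, ways=2):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # Each set is a list of tags, most recently used last.
        self.sets = [[] for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        tags = self.sets[index]
        if tag in tags:
            self.hits += 1
            tags.remove(tag)
        else:
            self.misses += 1
            if len(tags) == self.ways:
                tags.pop(0)  # evict the least recently used line
        tags.append(tag)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def streaming_hit_rate(num_words=65536):
    """Walk through 4-byte words in address order (uniform data)."""
    cache = TwoWayCache()
    for i in range(num_words):
        cache.access(i * 4)
    return cache.hit_rate()


def scattered_hit_rate(num_words=65536, span=16 * 1024 * 1024, seed=0):
    """Touch 4-byte words at random addresses across a large span."""
    rng = random.Random(seed)
    cache = TwoWayCache()
    for _ in range(num_words):
        cache.access(rng.randrange(0, span, 4))
    return cache.hit_rate()


print(streaming_hit_rate(), scattered_hit_rate())
```

In this model the streaming walk misses only once per cache line, while the scattered pattern misses almost every time, which is the contrast the slide draws between DSP-style and PC-style workloads.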

29
DSPs, PS2s and PCs, oh my!
  • Note the design decisions that promote
    data-intensive computing:
  • Wide buses, and private connections between units
    that move a lot of data.
  • VLIW instructions come packaged with lots and
    lots of data.
  • Large registers and load/store units.
    Instructions are geared towards SIMD-style work
    (e.g. a 128-bit load fetches 4 words of data at
    once).
  • MASSIVE ability to calculate inner-loop
    instructions (FMAC) in ONE CYCLE: there are 10
    FMAC units, so 10 of these can be done in 1 cycle.
    Even FDIVs are fast (7 cycles).
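
The "10 FMACs in 1 cycle" figure translates into a peak floating-point rate once a clock speed is chosen. The short Python sketch below does that arithmetic. The clock value of 294.912 MHz is not given in the slides and is supplied here as an assumption; counting each FMAC as two floating-point operations (one multiply, one add) is also a convention chosen for this example.

```python
def peak_flops(fmac_units, clock_hz, flops_per_fmac=2):
    """Peak rate if every FMAC unit completes one FMAC every cycle."""
    return fmac_units * flops_per_fmac * clock_hz


# Assumed clock: 294.912 MHz (not stated in the slides).
ASSUMED_CLOCK_HZ = 294.912e6

print(peak_flops(10, ASSUMED_CLOCK_HZ) / 1e9, "GFLOPS")
```

This is a theoretical ceiling only: it assumes all 10 units are busy on every cycle, which real code will rarely achieve.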

30
Conclusion
  • The entire EE design is centered around one
    specialized purpose: games! It can run
    generalized apps, but with a penalty.
  • How much of a penalty? Interesting question.
    Perhaps not much, because there is a
    general-purpose MIPS at the core.
  • More similar in design to a DSP: a fixed, small
    number of instructions performed on large
    amounts of uniform data.

31
The End: References
  • http://www.arstechnica.com/reviews/1q00/playstation2/ee-1.html
  • http://www.arstechnica.com/cpu/2q00/ps2/ps2vspc-1.html
  • http://www.scea.com/news/press_example.asp?ps2ps2ReleaseID9
  • http://users.ece.gatech.edu/scotty/7102/pres/5
  • http://www.eecg.toronto.edu/stoodla/processors/Sony/EmotionEngine.html
  • http://ntsrv2000.educ.ualberta.ca/nethowto/examples/m_ho/ps2eengine.html
  • http://www.geocities.com/SiliconValley/Bay/6114/cpu2.html