Vector Unit Assembly - PowerPoint PPT Presentation


Provided by: benjamin8


Transcript and Presenter's Notes

Title: Vector Unit Assembly

1
Vector Unit Assembly

bquintero_at_fullsail.com

2
Overview

Architecture Review
VU0 Macro Mode Instruction Set
Building a Vector Library

3
Review

PlayStation 2 has two vector units that are
similar but not identical
VU0 is the CPU's alternate processing unit
VU1 is the GS's alternate processing unit
Each unit has a direct pipeline to its
respective processor
Vector units are designed for 4D vectors of 32-bit floats

4
Review

VU0 and VU1 each have access to 32 float registers and
16 integer registers
Float registers are not like PC registers: they
are 128 bits in size (a PC register is 32 bits)
128 bits can fit 4 float values at once (a 4D
vector)
Integer registers are typically used as loop
counters and address calculators
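
To make the register size concrete: four 32-bit floats fill one 128-bit register exactly. The sketch below reuses the Vec4T name that appears later in these slides; the field layout and the C11 alignment specifier are assumptions, not taken from the course library. The alignment is there because quadword loads and stores work on 16-byte-aligned quadwords.

```c
/* Hypothetical layout for the Vec4T type named in the slides:
   four 32-bit floats = 128 bits, placed on a 16-byte boundary. */
typedef struct {
    _Alignas(16) float x;
    float y;
    float z;
    float w;
} Vec4T;

/* Compile-time checks that one Vec4T matches one 128-bit register. */
_Static_assert(sizeof(float) == 4, "float is expected to be 32 bits");
_Static_assert(sizeof(Vec4T) == 16, "Vec4T should be exactly 128 bits");
_Static_assert(_Alignof(Vec4T) == 16, "Vec4T should be 16-byte aligned");
```

If any of these assumptions fails on a given compiler, the build stops instead of producing a vector type that a quadword load could not read in one go.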

5
Review

VU0 has two bus lines
One bus is dedicated to the CPU
The other bus is used to communicate with all
other devices
VU0 has 4KB of

6
Vector Unit Processing Speed

The graph shows some vector-math intensive
function calls
200K calls were made to each function

7
Macro and Micro Modes

Vector Unit Zero (VU0) has two modes
Micro mode is a mode that allows your vector
processor to act as an independent CPU
A mini program is uploaded and executed in
parallel to the main CPU
Macro mode allows your CPU to directly offload
heavy vector computation with low overhead
Most popular method, hands down.

8
Micro Mode

Once uploaded, the micro program executes
independently of the CPU
This means that we must time our execution so
that the result is fetched by the CPU after the
program is completed by the Vector Unit
Micro mode can cause serious stalls and timing
issues, since execution time is nearly impossible
to predict

9
Macro Mode

Macro mode is a much easier method of executing
fast math functionality
Assembly can be used as inline instructions,
telling the compiler to offload the math to VU0
Notes
Just because it's in assembly does not mean it
will be faster
Switching CPU focus has its overheads

10
Assembly Structure

There is typically a specific method to writing
assembly routines
Load the variable data/addresses to registers
Apply vector computations to those registers
Store the result back into a variable address
Overhead of using assembly is in the load and
store
Make sure that the computation stage will improve
performance enough to offset the load/store
overhead
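
The load/store trade-off above can be sketched in portable C. This is only a model: the Load and Store helpers, the counters, and the function names are hypothetical and simply count memory transfers so the overhead is visible. It shows that doing more computation between one load and one store (a fused scale-and-add) needs fewer transfers than chaining two single-operation routines.

```c
typedef struct { float x, y, z, w; } Vec4T;

/* Hypothetical counters standing in for the cost of memory traffic. */
static int g_loads = 0;
static int g_stores = 0;

static Vec4T Load(const Vec4T *p)      { g_loads++;  return *p; }
static void  Store(Vec4T *p, Vec4T r)  { g_stores++; *p = r; }

/* One operation per routine: load, compute, store. */
void VectorScale(const Vec4T *v, float s, Vec4T *out)
{
    Vec4T r = Load(v);
    r.x *= s; r.y *= s; r.z *= s; r.w *= s;
    Store(out, r);
}

void VectorAdd(const Vec4T *a, const Vec4T *b, Vec4T *out)
{
    Vec4T ra = Load(a);
    Vec4T rb = Load(b);
    ra.x += rb.x; ra.y += rb.y; ra.z += rb.z; ra.w += rb.w;
    Store(out, ra);
}

/* Fused routine: out = a * s + b with a single load/compute/store pass. */
void VectorScaleAdd(const Vec4T *a, float s, const Vec4T *b, Vec4T *out)
{
    Vec4T ra = Load(a);
    Vec4T rb = Load(b);
    ra.x = ra.x * s + rb.x;
    ra.y = ra.y * s + rb.y;
    ra.z = ra.z * s + rb.z;
    ra.w = ra.w * s + rb.w;
    Store(out, ra);
}
```

Calling VectorScale followed by VectorAdd costs three loads and two stores in this model, while VectorScaleAdd produces the same result with two loads and one store.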

11
Vector Unit MIPS Instructions

Coprocessor Transfer Instructions
Store / Load
Coprocessor Branch Instructions
Macro (primitive) calculation instructions
Add / Subtract / Multiply / Divide / etc.
Micro subroutine execution instructions
(VU Macro Instructions)

12
EEVectorAdd

Adding two vectors using the EE Core (CPU)
// (Vec4T *v0, Vec4T *v1, Vec4T *v2)
v2->x = v0->x + v1->x;
v2->y = v0->y + v1->y;
v2->z = v0->z + v1->z;
v2->w = v0->w + v1->w;

13
VectorAdd

Adding two vectors using the VU0
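// (Vec4T *v0, Vec4T *v1, Vec4T *v2)
asm __volatile__ (
    "lqc2      vf05, 0x0(%0)      \n"   // vf05 = *v0
    "lqc2      vf06, 0x0(%1)      \n"   // vf06 = *v1
    "vadd.xyzw vf07, vf05, vf06   \n"   // vf07 = vf05 + vf06 (all four fields)
    "sqc2      vf07, 0x0(%2)      \n"   // *v2 = vf07
    : // No Output
    : "r" (v0), "r" (v1), "r" (v2)
    : "memory"                          // sqc2 writes through v2
);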
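
When building a vector library, it can help to keep a plain C fallback next to the VU0 routine so the same function can be checked on a desktop compiler. The sketch below is one possible arrangement, not the course's library: the USE_VU0_MACRO_MODE macro is a hypothetical build switch, and the Vec4T layout is an assumption.

```c
/* Assumed layout: four floats on a 16-byte boundary for quadword access. */
typedef struct {
    _Alignas(16) float x;
    float y;
    float z;
    float w;
} Vec4T;

/* v2 = v0 + v1 */
void VectorAdd(const Vec4T *v0, const Vec4T *v1, Vec4T *v2)
{
#ifdef USE_VU0_MACRO_MODE   /* hypothetical switch, set only in PS2 builds */
    __asm__ __volatile__ (
        "lqc2      vf05, 0x0(%0)      \n"   /* vf05 = *v0 */
        "lqc2      vf06, 0x0(%1)      \n"   /* vf06 = *v1 */
        "vadd.xyzw vf07, vf05, vf06   \n"   /* vf07 = vf05 + vf06 */
        "sqc2      vf07, 0x0(%2)      \n"   /* *v2 = vf07 */
        : /* no outputs */
        : "r" (v0), "r" (v1), "r" (v2)
        : "memory");
#else
    /* Portable reference path, equivalent to the EE Core version. */
    v2->x = v0->x + v1->x;
    v2->y = v0->y + v1->y;
    v2->z = v0->z + v1->z;
    v2->w = v0->w + v1->w;
#endif
}
```

Only the portable path can be exercised off the console; the assembly path mirrors the slide and would still need to be verified on real hardware.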

14
EECrossProduct

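Notice how we must use a temp because of the
cross terms: each output component reads the other
components of both inputs, so writing straight into
cross would corrupt the result if cross is also v1 or v2
// (Vec4T *v1, Vec4T *v2, Vec4T *cross)
Vec4T temp;
temp.x = v1->y * v2->z - v1->z * v2->y;
temp.y = v1->z * v2->x - v1->x * v2->z;
temp.z = v1->x * v2->y - v1->y * v2->x;
VectorCopy(temp, cross);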
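
A small worked example of why the temp matters. The function names CrossNaive and CrossTemp are hypothetical, and setting w to 0 is an assumption made to match the VU0 version; the arithmetic is the same as on the slide.

```c
typedef struct { float x, y, z, w; } Vec4T;

/* Writes each component straight into the output.
   Gives a wrong result when cross points at v1 or v2. */
void CrossNaive(const Vec4T *v1, const Vec4T *v2, Vec4T *cross)
{
    cross->x = v1->y * v2->z - v1->z * v2->y;
    cross->y = v1->z * v2->x - v1->x * v2->z;
    cross->z = v1->x * v2->y - v1->y * v2->x;
}

/* Computes into a temporary first, so the inputs stay intact
   until every component has been read. */
void CrossTemp(const Vec4T *v1, const Vec4T *v2, Vec4T *cross)
{
    Vec4T temp;
    temp.x = v1->y * v2->z - v1->z * v2->y;
    temp.y = v1->z * v2->x - v1->x * v2->z;
    temp.z = v1->x * v2->y - v1->y * v2->x;
    temp.w = 0.0f;   /* assumption: w is cleared, as in the VU0 routine */
    *cross = temp;
}
```

With v1 = (1, 2, 3) and v2 = (4, 5, 6), and the output pointing at v1 itself, CrossTemp leaves (-3, 6, -3) in v1, while CrossNaive leaves (-3, 30, -135) because its second and third lines read components that were already overwritten.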

15
CrossProduct

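// (Vec4T *v1, Vec4T *v2, Vec4T *cross)
asm __volatile__ (
    "lqc2        vf05, 0x0(%0)      \n"   // vf05 = *v1
    "lqc2        vf06, 0x0(%1)      \n"   // vf06 = *v2
    "vopmula.xyz ACC,  vf05, vf06   \n"   // first half:  ACC = (v1.y*v2.z, v1.z*v2.x, v1.x*v2.y)
    "vopmsub.xyz vf06, vf06, vf05   \n"   // - second:    vf06 = ACC - (v2.y*v1.z, v2.z*v1.x, v2.x*v1.y)
    "vsub.w      vf06, vf00, vf00   \n"   // w = 0
    "sqc2        vf06, 0x0(%2)      \n"   // *cross = vf06
    : // No Output
    : "r" (v1), "r" (v2), "r" (cross)
    : "memory"                            // sqc2 writes through cross
);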

16
Vector Outer Product

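The vopmula instruction performs the first half of
an outer (cross) product
The result is stored into the special-purpose ACC
register
Each ACC field is the product of the other two
fields of the source registers:

ACC.x = VF05.y * VF06.z
ACC.y = VF05.z * VF06.x
ACC.z = VF05.x * VF06.y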
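
The vopmula/vopmsub pair can be modeled in portable C to check that it really yields a cross product. This sketch assumes vopmula computes ACC = (fs.y*ft.z, fs.z*ft.x, fs.x*ft.y) and that vopmsub subtracts the same product pattern of its own sources from ACC; the helper names and the global ACC variable are hypothetical stand-ins for the hardware.

```c
typedef struct { float x, y, z, w; } Vec4T;

/* Stand-in for the VU0 accumulator register. */
static Vec4T ACC;

/* Model of vopmula.xyz ACC, fs, ft */
static void vopmula_xyz(Vec4T fs, Vec4T ft)
{
    ACC.x = fs.y * ft.z;
    ACC.y = fs.z * ft.x;
    ACC.z = fs.x * ft.y;
}

/* Model of vopmsub.xyz fd, fs, ft (w of fd is left untouched) */
static Vec4T vopmsub_xyz(Vec4T fd, Vec4T fs, Vec4T ft)
{
    fd.x = ACC.x - fs.y * ft.z;
    fd.y = ACC.y - fs.z * ft.x;
    fd.z = ACC.z - fs.x * ft.y;
    return fd;
}

/* Same instruction order as the CrossProduct slide. */
void CrossProductModel(const Vec4T *v1, const Vec4T *v2, Vec4T *cross)
{
    Vec4T vf05 = *v1;                       /* lqc2 vf05 */
    Vec4T vf06 = *v2;                       /* lqc2 vf06 */
    vopmula_xyz(vf05, vf06);                /* first half into ACC */
    vf06 = vopmsub_xyz(vf06, vf06, vf05);   /* ACC minus second half */
    vf06.w = 0.0f;                          /* vsub.w vf06, vf00, vf00 */
    *cross = vf06;                          /* sqc2 vf06 */
}
```

Because both inputs are copied into local "registers" before anything is written, this version needs no separate temp even when cross points at v1 or v2, which is the same reason the VU0 routine does not need one.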

17
For Next Time

Read Chapters 7.3.2 7.4.2
Read Chapters 9.3
