C++ on Next-Gen Consoles: Effective Code for New Architectures

GDC 2005. Author: Pete Isensee. Created: 8/4/2004.

1
C++ on Next-Gen Consoles: Effective Code for New
Architectures
  • Pete Isensee
  • Development Manager
  • Microsoft Game Technology Group

2
Last Year at GDC
  • Chris Hecker ranted
  • What did he say?
  • Programmers: danger ahead
  • Out-of-order execution good
  • In-order execution bad
  • Microsoft and Sony are going to screw you
  • You are so hosed. Game over, man.
  • There's absolutely nothing you can do about this

3
Console Hardware Architectures
  • Optimized to do floating-point math
  • Optimized for multithreaded tasks
  • Optimized to run games
  • Not optimized to run general purpose code
  • Not optimized to do branch prediction, code
    reordering, instruction pipelining or other
    out-of-order magic
  • Large L2 caches
  • Large latencies

4
We're Game Programmers. We Love Challenges.
  • We will make games on these consoles
  • The solution is not assembly language
  • The solution is to tailor our C/C++ engines,
    inner loops and bottleneck functions to the
    realities of the hardware
  • Remember: C++ code can make or break your game's
    performance

5
Not Covering
  • Profiling (do it)
  • Multithreading (do it)
  • Memory allocation (avoid in game loop)
  • Compiler settings (experiment)
  • Exception handling (avoid it)

6
Topics for Today
  • Thinking about L2
  • Optimize memory access
  • Use CPU caches effectively
  • Thinking about in-order processing
  • Avoid function call overhead
  • Tips for efficient math
  • Avoid hidden C++ inefficiencies

7
Optimize Memory Access
  • Proverb: thou shalt treat memory as if it were
    thy hard drive
  • You will be memory-bound on new consoles
  • Recommendations:
  • Never read from the same place twice in a frame
  • Read data sequentially
  • Write data sequentially
  • Use everything you read

8
Minimize Data Passes
  • Game frame loops often access data twice
  • Or three times
  • Or more
  • Optimize for a single pass
  • Consider less frequent operations:
  • AI
  • Physics, collision
  • Networking
  • Particle systems
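
As a rough illustration of "optimize for a single pass", the sketch below fuses two loops over the same data into one. The Particle type and both function names are hypothetical and exist only for this example; the point is that the fused version touches each element once per frame instead of twice.

```cpp
#include <limits>
#include <vector>

// Hypothetical particle type, used only for this illustration.
struct Particle {
    float x;   // position
    float vx;  // velocity
};

// Two passes: every particle travels through the cache twice per frame.
float updateThenFindMax(std::vector<Particle>& ps, float dt) {
    for (Particle& p : ps) {
        p.x += p.vx * dt;
    }
    float maxX = std::numeric_limits<float>::lowest();
    for (const Particle& p : ps) {
        if (p.x > maxX) maxX = p.x;
    }
    return maxX;
}

// One pass: the same work, but each particle is read and written once.
float updateAndFindMax(std::vector<Particle>& ps, float dt) {
    float maxX = std::numeric_limits<float>::lowest();
    for (Particle& p : ps) {
        p.x += p.vx * dt;
        if (p.x > maxX) maxX = p.x;
    }
    return maxX;
}
```

Both functions return the same result; whether the fused loop is measurably faster depends on data size and hardware, so profile before and after.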

[Figure: Multiple Pass Architecture]
9
Pointer Aliasing Explained
  • void init( float* a, const float* b )
  • {
  •   a[0] = 1.0f - *b;
  •   a[1] = 1.0f - *b;
  • }
  • Nominal case
  • Worst case:
  • float a[2] = { 0.0f };
  • init( a, &a[0] );

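[Figure: memory contents before and after init(). Nominal case: b = 0.0 and a becomes { 1.0, 1.0 }. Worst case (b aliases a[0]): a becomes { 1.0, 0.0 }.]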
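
The aliasing case can be reproduced on any machine. The sketch below repeats the slide's init() and adds a hand-hoisted variant, initHoisted, which is a hypothetical name introduced here. It shows why the compiler cannot hoist the load by itself: the two functions give different answers when b points into a.

```cpp
// Same shape as the slide's init(): *b is read again after the store to
// a[0], because that store may have changed the value b points at.
void init(float* a, const float* b) {
    a[0] = 1.0f - *b;
    a[1] = 1.0f - *b;
}

// Hand-hoisted variant: *b is read exactly once. This is only equivalent
// to init() when b does not point into a.
void initHoisted(float* a, const float* b) {
    const float value = 1.0f - *b;
    a[0] = value;
    a[1] = value;
}
```

With float a[2] = { 0.0f }, calling init( a, &a[0] ) leaves a as { 1.0, 0.0 }, while initHoisted( a, &a[0] ) leaves it as { 1.0, 1.0 }.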
10
A Solution: Restrict
  • The restrict keyword tells the compiler there's no
    aliasing
  • Restrict permits the compiler to generate much
    more efficient code
  • void init( float* __restrict a,
  •            const float* __restrict b )
  • {
  •   a[0] = 1.0f - *b; // compiler can do
  •   a[1] = 1.0f - *b; // the right thing
  • }
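
A slightly larger sketch of the same idea, applied to a loop. The function scaleInto and its parameters are hypothetical. __restrict is a compiler extension accepted by MSVC, GCC and Clang, not standard C++, and it is a promise from the programmer: calling this with overlapping buffers is undefined behavior.

```cpp
#include <cstddef>

// With __restrict on every pointer, the compiler may assume that stores to
// dst never change src or *scale, so it is free to load *scale once and
// keep it in a register for the whole loop.
void scaleInto(float* __restrict dst,
               const float* __restrict src,
               const float* __restrict scale,
               std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * *scale;
    }
}
```

Whether the generated code actually improves is compiler-dependent, so it is worth checking the disassembly of hot loops.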

11
What to Restrict
  • Use restrict widely:
  • Pointer parameters of functions
  • Local pointers
  • Pointers in structs/classes
  • But not:
  • Function return types
  • Casts
  • Global pointers (maybe)
  • References (maybe)

12
Use the CPU Caches Effectively
  • The L2 cache is your best friend
  • Using the cache well is an art
  • Ensure you have a good profiler by your side

13
Keep the Working Set Small
  • Pack commonly used data together
  • Frequently used data might deserve its own
    struct/class
  • Keep rarely used data separate
  • Example: texture file names
  • Consider bitfields
  • Bitfields are extremely efficient on PowerPC
  • Consider other forms of lossless compression

14
Inefficient Structs Are Bad Mojo
  • struct InefficientCar
  • {
  •   bool manual;       // padding here
  •   wheel wheels[8];   // 8 wheels?
  •   bool convertible;  // more pad
  •   char engine;       // 4 bits used
  •   char file[32];     // rarely used
  •   double maxAccel;   // double?
  • };
  • sizeof(InefficientCar) == 80

15
Carefully Design Structures
  • struct EfficientCar
  • {
  •   wheel wheels[4];         // 4 wheels
  •   wheel* moreWheels;
  •   char* file;              // stored elsewhere
  •   float maxAccel;          // float
  •   unsigned engine:4;       // bitfields
  •   unsigned manual:1;
  •   unsigned convertible:1;
  • };
  • sizeof(EfficientCar) == 32
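
The two layouts can be compared directly with sizeof. This sketch assumes wheel is a 4-byte handle, since the slides do not define it. The slide's figures of 80 and 32 bytes are consistent with 4-byte pointers; on a 64-bit host the second struct will be larger than 32, so the check below only asserts the relative ordering.

```cpp
// Assumption: wheel is a 4-byte handle. The slides do not define it.
typedef unsigned int wheel;

struct InefficientCar {
    bool   manual;        // followed by padding before wheels
    wheel  wheels[8];
    bool   convertible;
    char   engine;        // only 4 bits are needed
    char   file[32];      // rarely used, yet stored inline
    double maxAccel;
};

struct EfficientCar {
    wheel    wheels[4];
    wheel*   moreWheels;  // extra wheels live elsewhere
    char*    file;        // file name stored elsewhere
    float    maxAccel;
    unsigned engine : 4;  // flags packed into one word
    unsigned manual : 1;
    unsigned convertible : 1;
};

static_assert(sizeof(EfficientCar) < sizeof(InefficientCar),
              "the packed layout should be the smaller one");
```

Exact sizes depend on the target's pointer width and alignment rules, so print sizeof on the real platform instead of trusting numbers from another one.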

16
Choose the Right Container
  • Prefer contiguous containers
  • Or at least mostly contiguous
  • Examples: array, vector, deque
  • Avoid node-based containers
  • List, set/map, binary trees, hash tables
  • If you must use a tree, consider a custom
    allocator for memory locality
  • vector + std::sort is often faster (and smaller)
    than set or map or hash tables, by an order of
    magnitude
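
One way to read the "vector + std::sort" advice is as a sorted vector queried with binary search, which gives set-like lookups over contiguous memory. The SortedIds class below is a hypothetical minimal sketch. It fits data that is built once and queried often, since inserting into the middle of a vector is expensive.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Set-like container backed by one contiguous, sorted array.
class SortedIds {
public:
    explicit SortedIds(std::vector<int> ids) : ids_(std::move(ids)) {
        std::sort(ids_.begin(), ids_.end());
        // Drop duplicates so the contents match what a std::set would hold.
        ids_.erase(std::unique(ids_.begin(), ids_.end()), ids_.end());
    }

    // O(log n) lookup that walks a single block of memory.
    bool contains(int id) const {
        return std::binary_search(ids_.begin(), ids_.end(), id);
    }

    std::size_t size() const { return ids_.size(); }

private:
    std::vector<int> ids_;
};
```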

17
Avoid Function Call Overhead
  • Function call overhead was a surprising cause of
    performance issues on Xbox
  • The same is true on Xbox 360 and PS3
  • Fortunately, there are lots of solutions
  • Research compiler settings. On Xbox 360:
  • Inline any suitable
  • Enable link-time code generation
  • Spend time ensuring the compiler is inlining the
    right things

18
Avoid Virtual Functions
  • Weigh the limitations of virtual functions:
  • Adds a branch instruction
  • Branch is always mispredicted
  • Compiler is limited in how it can optimize
  • Consider replacing
  • virtual void Draw() = 0;
  • With
  • Xbox360.cpp: void Draw() { ... }
  • Windows.cpp: void Draw() { ... }
  • PS3.cpp: void Draw() { ... }
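
A single-file sketch of the per-platform replacement. In a real project each definition would sit in its own source file and the build would compile exactly one of them. The GAME_PLATFORM_* macros are hypothetical stand-ins for that build-time choice, and Draw returns a string here only so the selected version is observable.

```cpp
// Shared header: one plain declaration, no base class and no vtable.
const char* Draw();

// Hypothetical macros stand in for "which .cpp file the build compiles".
#if defined(GAME_PLATFORM_XBOX360)
const char* Draw() { return "Xbox 360"; }   // would live in Xbox360.cpp
#elif defined(GAME_PLATFORM_PS3)
const char* Draw() { return "PS3"; }        // would live in PS3.cpp
#else
const char* Draw() { return "Windows"; }    // would live in Windows.cpp
#endif
```

Because the call target is known when the program is built, the compiler can call it directly and may inline it, which a virtual call through a base pointer generally prevents.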

19
Maximize Leaf Functions
  • Leaf functions don't call other functions, ever
  • If a potential leaf function calls another
    function, the high-level function:
  • Is much less likely to be inlined
  • Must set up a stack frame
  • Must set up registers
  • Potential solutions
  • Remove the inner function completely
  • Inline the inner function
  • Provide two versions of the outer function

20
Unroll Inner Loops
  • Compiler can't unroll loops where n is variable
  • Even unrolling from ++i to i += 4 can be a
    significant gain
  • Eliminates three of every four branch instructions
  • Increases opportunity for code scheduling
  • Don't forget to hoist invariants out, too

21
Example Unrolling
  • // original
  • for( i = a.beg(); i != a.end(); ++i )
  •   process(i);
  • // unrolled
  • e = a.end();
  • for( i = a.beg(); i != e; i += 4 )
  • {
  •   process(i);   process(i+1);
  •   process(i+2); process(i+3);
  • }
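
The slide's unrolled loop only works when the element count is a multiple of four. The sketch below adds a remainder loop for the general case. process() and both function names are hypothetical; the four separate accumulators are one possible way to give the scheduler independent work, and they change the floating-point summation order.

```cpp
#include <cstddef>

// Hypothetical per-element work; stands in for the slide's process().
inline float process(float x) { return x * 2.0f; }

// Straightforward loop: one branch per element.
float sumProcessed(const float* data, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += process(data[i]);
    }
    return sum;
}

// Unrolled by four: one branch per four elements, plus a short tail loop.
float sumProcessedUnrolled(const float* data, std::size_t n) {
    const std::size_t unrolledEnd = n - (n % 4);  // hoisted loop bound
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i < unrolledEnd; i += 4) {
        s0 += process(data[i]);
        s1 += process(data[i + 1]);
        s2 += process(data[i + 2]);
        s3 += process(data[i + 3]);
    }
    float sum = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i) {                          // 0 to 3 leftover elements
        sum += process(data[i]);
    }
    return sum;
}
```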

22
Pass Native Types by Value
  • Tradition says that large types are passed by
    pointer or reference, but be careful
  • New consoles have really large registers
  • Native types include:
  • 64-bit int (__int64)
  • VMX vector (__vector4): 128 bits!
  • Pass structs by pointer or reference
  • One exception: pass structs consisting of
    bitfields < 64 bits by value

23
Know Data Type Performance
  • int32 and int64 have equivalent perf
  • float and double have equivalent perf
  • int8 and int16 are slower than int
  • They generate extra instructions
  • High bits cleared or sign-extended
  • Example: int32 adds are 2X faster than int16 adds
  • Recommendations:
  • Store as smallest type required
  • Load into int32, int64 or double for calculations

24
Use Native Vector Types
  • In CS 101, you learned to create abstract data
    types, such as matrices
  • typedef std::vector<float,4> vec;
  • typedef std::vector<vec,4> matrix;
  • This code is an abomination
  • At least on Xbox 360 and PS3
  • Xbox 360 and PS3 have dedicated vector math units
    called VMX units
  • Use them!

25
Your Math Buddies
  • __vector4 (four 32-bit floats in a 128-bit register)
  • XMVECTOR (typedef for __vector4)
  • XMMATRIX (array of 4 __vector4s)
  • XMVECTOR operators (+, -, *, /)
  • Hundreds of XMVECTOR and XMMATRIX functions
  • Xbox 360-specific, but similar constructs in PS3
    compilers

26
Avoid Floating-Point Branches
  • FP branches are slow
  • Cache has to be flushed
  • 10X slower than int branches
  • Avoid loops with float test expressions
  • Eliminate altogether if possible
  • Can be faster to calculate values you won't use!
  • Compare integers instead
  • Replace with fsel when possible
  • 10-20X performance gain

27
The fsel Option in Detail
  • Definition of hardware implementation:
  • float fsel(float a, float b, float c)
  • { return ( a >= 0.0f ) ? b : c; }
  • You can replace expressions like
  • v = ( w >= x ) ? y : z; // slow
  • With faster expressions like
  • v = fsel( w - x, y, z ); // turbo
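
For experimenting away from console hardware, fsel can be emulated in portable C++. The sketch below follows the PowerPC instruction's convention, which selects the second argument when the first is greater than or equal to zero. On other CPUs this is only a stand-in: the compiler may or may not emit a branch-free select for it. The helpers maxOf and clampToZero are hypothetical examples of rewriting float comparisons in this form.

```cpp
// Portable stand-in for fsel: returns b when a >= 0, otherwise c.
// A NaN in a fails the comparison and therefore selects c.
inline float fsel(float a, float b, float c) {
    return (a >= 0.0f) ? b : c;
}

// (w >= x) ? w : x, written as a select on the sign of the difference.
inline float maxOf(float w, float x) {
    return fsel(w - x, w, x);
}

// (v >= 0) ? v : 0, with no explicit comparison in the caller.
inline float clampToZero(float v) {
    return fsel(v, v, 0.0f);
}
```

Note that both candidate values are computed before the select, which matches the earlier point that it can be faster to calculate values you end up not using.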

28
Prefer Platform-Specific Funcs
  • The C runtime (CRT) is not usually the best
    option when performance matters
  • Xbox 360 examples:
  • Prefer CreateFile to fopen or C streams
  • Options for asynchronous reads and other goodness
  • Prefer XMemCpy to memcpy
  • 2-6X faster
  • Prefer XMemSet to memset
  • 8-14X faster

29
Avoid Hidden C++ Inefficiencies
  • C++ rocks the house!
  • C++ can bring your game to its knees!
  • Consider these innocuous snippets:
  • Quaternion q;
  • s.push_back( k );
  • if( (float)i > f )
  • obj->Draw();
  • GameObject arr[1000];
  • a = b + c;
  • i++;

30
C++ is Dangerous
  • With power comes responsibility
  • Beware constructors
  • Is initialization the right thing to do?
  • Beware hidden allocations
  • Conversion casts may have significant cost
  • Use virtual functions with care
  • Beware overloaded operators
  • Stick to known idioms
  • Operator++ should be a constant-time operation.
  • Really.
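
Two of these hidden costs are easy to make visible. In the sketch below, GameObject is a hypothetical type whose constructor counts its own calls, and both function names are invented for the example. The first function shows that declaring an array runs one constructor per element; the second shows that reserving capacity up front keeps push_back from reallocating inside the loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical game type that counts how often its constructor runs.
struct GameObject {
    static int constructed;
    int health;
    GameObject() : health(100) { ++constructed; }
};
int GameObject::constructed = 0;

// Declaring the array runs the default constructor once per element.
int constructorsRunForArray() {
    const int before = GameObject::constructed;
    GameObject arr[1000];
    (void)arr;  // the array is never used; the constructors still ran
    return GameObject::constructed - before;
}

// After reserve(count), the first count push_back calls never reallocate,
// so the vector's storage stays at the same address.
bool pushBackStaysInPlace(std::size_t count) {
    std::vector<int> s;
    s.reserve(count);
    const int* before = s.data();
    for (std::size_t k = 0; k < count; ++k) {
        s.push_back(static_cast<int>(k));
    }
    return s.data() == before;
}
```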

31
Summary
  • There absolutely are many things you can do to
    efficiently program next-gen consoles
  • Two key issues: L2/memory and in-order processing
  • Treat memory as you would a hard disk
  • Watch out for those branches; use tricks like
    fsel
  • Prefer a light C++ touch

32
What's Next
  • Our games are only as good as the weakest member
    of the team
  • Share what you've learned
  • "The sharing of ideas allows us to stand on one
    another's shoulders instead of on one another's
    feet" (Jim Warren)

33
Questions
  • pkisensee_at_msn.com
  • Fill out your feedback forms