Title: C++ on Next-Gen Consoles: Effective Code for New Architectures
1. C++ on Next-Gen Consoles: Effective Code for New Architectures
- Pete Isensee
- Development Manager
- Microsoft Game Technology Group
2. Last Year at GDC
- Chris Hecker ranted
- What did he say?
- Programmers: danger ahead
- Out-of-order execution good
- In-order execution bad
- Microsoft and Sony are going to screw you
- You are so hosed. Game over, man.
- There's absolutely nothing you can do about this
3. Console Hardware Architectures
- Optimized to do floating-point math
- Optimized for multithreaded tasks
- Optimized to run games
- Not optimized to run general purpose code
- Not optimized to do branch prediction, code reordering, instruction pipelining or other out-of-order magic
- Large L2 caches
- Large latencies
4. We're Game Programmers. We Love Challenges.
- We will make games on these consoles
- The solution is not assembly language
- The solution is to tailor our C/C++ engines, inner loops and bottleneck functions to the realities of the hardware
- Remember: C++ code can make or break your game's performance
5. Not Covering
- Profiling (do it)
- Multithreading (do it)
- Memory allocation (avoid in game loop)
- Compiler settings (experiment)
- Exception handling (avoid it)
6. Topics for Today
- Thinking about L2
- Optimize memory access
- Use CPU caches effectively
- Thinking about in-order processing
- Avoid function call overhead
- Tips for efficient math
- Avoid hidden C++ inefficiencies
7. Optimize Memory Access
- Proverb: thou shalt treat memory as if it were thy hard drive
- You will be memory-bound on new consoles
- Recommendations
- Never read from the same place twice in a frame
- Read data sequentially
- Write data sequentially
- Use everything you read
8. Minimize Data Passes
- Game frame loops often access data twice
- Or three times
- Or more
- Optimize for a single pass
- Consider less frequent operations
- AI
- Physics, collision
- Networking
- Particle systems
- [Diagram: multiple-pass architecture]
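The single-pass recommendation can be sketched as follows. This is only an illustration: the `Particle` struct, its fields and the update steps are hypothetical and do not come from the talk. Both functions compute the same result, but the second touches each element once instead of three times.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical particle record; the fields are illustrative only.
struct Particle {
    float pos;
    float vel;
    float age;
};

// Multiple passes: the whole array streams through the cache three times.
inline void updateMultiPass(std::vector<Particle>& ps, float dt) {
    for (Particle& p : ps) p.vel -= 9.8f * dt;   // pass 1: integrate velocity
    for (Particle& p : ps) p.pos += p.vel * dt;  // pass 2: integrate position
    for (Particle& p : ps) p.age += dt;          // pass 3: age the particle
}

// Single pass: each particle is loaded once, fully updated, written once.
inline void updateSinglePass(std::vector<Particle>& ps, float dt) {
    for (Particle& p : ps) {
        p.vel -= 9.8f * dt;
        p.pos += p.vel * dt;
        p.age += dt;
    }
}
```

For arrays that fit in cache the difference is small; the gain shows up once the data set is larger than L2, which is the situation the slide is warning about.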
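9. Pointer Aliasing Explained
    void init( float* a, const float* b )
    {
        a[0] = 1.0f - *b;
        a[1] = 1.0f - *b;
    }
- Nominal case: b points outside a, e.g. *b is 0.0 and a goes from { 0.0, 0.0 } to { 1.0, 1.0 }
- Worst case: b aliases a[0]
    float a[2] = { 0.0f };
    init( a, &a[0] );
- [Diagram: in the worst case a goes from { 0.0, 0.0 } to { 1.0, 0.0 }, because the store to a[0] changes *b before a[1] is computed]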
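The two cases on this slide can be reproduced with a few lines of standard C++. The function below has the same shape as the slide's `init()`; only the comments are added here.

```cpp
// Without any aliasing promise, the compiler must assume that the store to
// a[0] may have modified *b, so *b is loaded again for the second line.
void init(float* a, const float* b) {
    a[0] = 1.0f - *b;
    a[1] = 1.0f - *b;  // reload of *b required
}
```

Calling `init(a, &zero)` with a separate `float zero = 0.0f` leaves `a` as `{ 1.0, 1.0 }`. Calling `init(a, &a[0])` on a zeroed array leaves `{ 1.0, 0.0 }`, because the second line reads the value the first line just wrote.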
10. A Solution: Restrict
- The restrict keyword tells the compiler there's no aliasing
- Restrict permits the compiler to generate much more efficient code
    void init( float* __restrict a,
               const float* __restrict b )
    {
        a[0] = 1.0f - *b; // compiler can do
        a[1] = 1.0f - *b; // the right thing
    }
11. What to Restrict
- Use restrict widely
- Pointer parameters of functions
- Local pointers
- Pointers in structs/classes
- But not
- Function return types
- Casts
- Global pointers (maybe)
- References (maybe)
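A minimal sketch of restrict on function parameters and on local pointers. The function names and the `Samples` struct are hypothetical. `__restrict` is a compiler extension rather than standard C++; MSVC, GCC and Clang accept this spelling, but other compilers may differ.

```cpp
#include <cstddef>

// The qualifier is a promise from the caller that these pointers never refer
// to overlapping memory. Breaking the promise is undefined behavior.
void scaleInto(float* __restrict dst, const float* __restrict src,
               const float* __restrict scale, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * *scale;  // *scale may be loaded once and reused
    }
}

// Hypothetical struct that owns a buffer.
struct Samples {
    float*      data;
    std::size_t count;
};

// Restrict-qualified local pointers make the same promise, scoped to this
// function body only.
void addBias(Samples& s, const float* bias) {
    float* __restrict out     = s.data;
    const float* __restrict b = bias;
    for (std::size_t i = 0; i < s.count; ++i) {
        out[i] += *b;
    }
}
```

The promise is not checked by the compiler, so restrict is best applied where the non-overlap is obvious from the call sites.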
12. Use the CPU Caches Effectively
- The L2 cache is your best friend
- Using the cache well is an art
- Ensure you have a good profiler by your side
13. Keep the Working Set Small
- Pack commonly used data together
- Frequently used data might deserve its own struct/class
- Keep rarely used data separate
- Example: texture file names
- Consider bitfields
- Bitfields are extremely efficient on PowerPC
- Consider other forms of lossless compression
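One way to read the advice above is a hot/cold split: frequently used fields go in a small packed struct, and rarely used fields such as file names move to a separate array. The `EnemyHot` and `EnemyCold` types, their fields and the bit widths are hypothetical, chosen only to illustrate the idea.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Rarely used data lives in its own array, away from the per-frame loop.
struct EnemyCold {
    std::string textureFile;  // typically needed only at load time
};

// Frequently used data, packed so that more enemies fit in each cache line.
struct EnemyHot {
    float x, y, z;
    std::uint32_t health    : 10;  // 0..1023
    std::uint32_t alive     : 1;
    std::uint32_t coldIndex : 21;  // index into the cold array
};

// The per-frame loop reads only the hot array.
inline std::size_t countAlive(const std::vector<EnemyHot>& hot) {
    std::size_t n = 0;
    for (const EnemyHot& e : hot) {
        n += e.alive;
    }
    return n;
}
```

On common compilers `EnemyHot` occupies 16 bytes, whereas a single struct that also embedded the file name would be several times larger and would drag that cold data through the cache on every pass.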
14. Inefficient Structs Are Bad Mojo
    struct InefficientCar
    {
        bool   manual;       // padding here
        wheel  wheels[8];    // 8 wheels?
        bool   convertible;  // more pad
        char   engine;       // 4 bits used
        char   file[32];     // rarely used
        double maxAccel;     // double?
    };
- sizeof(InefficientCar) == 80
15. Carefully Design Structures
    struct EfficientCar
    {
        wheel    wheels[4];        // 4 wheels
        wheel*   moreWheels;
        char*    file;             // stored elsewhere
        float    maxAccel;         // float
        unsigned engine : 4;       // bitfields
        unsigned manual : 1;
        unsigned convertible : 1;
    };
- sizeof(EfficientCar) == 32
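The size difference can be checked at compile time. The sketch below assumes a hypothetical 4-byte `wheel` type, since the talk does not define it. The slide's figures of 80 and 32 bytes are consistent with a 4-byte `wheel` and 32-bit pointers; on a 64-bit host the pointer members are larger, so the sketch asserts only the relative ordering.

```cpp
// Hypothetical 4-byte wheel type; the talk does not show its definition.
struct wheel { float radius; };

struct InefficientCar {
    bool   manual;        // 1 byte, then padding so wheels is aligned
    wheel  wheels[8];
    bool   convertible;
    char   engine;        // only 4 bits used
    char   file[32];      // rarely used, but carried by every car
    double maxAccel;      // alignment of double adds more padding
};

struct EfficientCar {
    wheel    wheels[4];
    wheel*   moreWheels;  // extra wheels stored elsewhere
    char*    file;        // file name stored elsewhere
    float    maxAccel;
    unsigned engine      : 4;
    unsigned manual      : 1;
    unsigned convertible : 1;
};

// Fails the build if a later change makes the "efficient" layout larger.
static_assert(sizeof(EfficientCar) < sizeof(InefficientCar),
              "EfficientCar should be the smaller layout");
```

A `static_assert` like this is a cheap guard against a struct quietly growing when someone adds a field.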
16. Choose the Right Container
- Prefer contiguous containers
- Or at least mostly contiguous
- Examples: array, vector, deque
- Avoid node-based containers
- List, set/map, binary trees, hash tables
- If you must use a tree, consider a custom allocator for memory locality
- Vector + std::sort is often faster (and smaller) than set or map or hash tables, by an order of magnitude
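A minimal sketch of the vector plus `std::sort` pattern used as a set replacement. The helper names are hypothetical. It fits data that is built once (or rarely) and queried often, since inserting into the middle of a sorted vector is linear.

```cpp
#include <algorithm>
#include <vector>

// Build step: sort and drop duplicates so the vector behaves like a set.
inline void makeSortedUnique(std::vector<int>& v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
}

// Lookup step: binary search over contiguous memory. Same O(log n) bound as
// a tree lookup, but without chasing a pointer per node.
inline bool contains(const std::vector<int>& sorted, int key) {
    return std::binary_search(sorted.begin(), sorted.end(), key);
}
```

Any speedup over `std::set` depends on the data size and access pattern, so it is worth profiling both on the target hardware.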
17. Avoid Function Call Overhead
- Function call overhead was a surprising cause of performance issues on Xbox
- The same is true on Xbox 360 and PS3
- Fortunately, there are lots of solutions
- Research compiler settings. On Xbox 360:
- Inline any suitable
- Enable link-time code generation
- Spend time ensuring the compiler is inlining the right things
18. Avoid Virtual Functions
- Weigh the limitations of virtual functions
- Adds a branch instruction
- Branch is always mispredicted
- Compiler is limited in how it can optimize
- Consider replacing
    virtual void Draw() = 0;
- With
    Xbox360.cpp: void Draw() ...
    Windows.cpp: void Draw() ...
    PS3.cpp: void Draw() ...
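The replacement suggested above amounts to choosing the implementation at build time instead of at run time. In a real project each definition would live in its own source file and the build system would compile exactly one of them. The sketch below squeezes that into one block using hypothetical `SKETCH_PLATFORM_*` macros, and the `Renderer` class and returned strings are illustrative only.

```cpp
// Shared header: one non-virtual declaration for every platform.
class Renderer {
public:
    const char* Draw();  // no vtable lookup, so the call can be direct
};

// Stand-in for "one .cpp per platform": exactly one definition is compiled.
#if defined(SKETCH_PLATFORM_XBOX360)
const char* Renderer::Draw() { return "Xbox 360 draw path"; }
#elif defined(SKETCH_PLATFORM_PS3)
const char* Renderer::Draw() { return "PS3 draw path"; }
#else
const char* Renderer::Draw() { return "Windows draw path"; }
#endif
```

This only works when a given build needs a single implementation. Where several implementations must coexist at run time, virtual dispatch or another mechanism is still required.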
19. Maximize Leaf Functions
- Leaf functions don't call other functions, ever
- If a potential leaf function calls another function, the high-level function:
- Is much less likely to be inlined
- Must set up a stack frame
- Must set up registers
- Potential solutions
- Remove the inner function completely
- Inline the inner function
- Provide two versions of the outer function
20. Unroll Inner Loops
- Compiler can't unroll loops where n is variable
- Even unrolling from i++ to i+=4 can be a significant gain
- Eliminates three branch instructions
- Increases opportunity for code scheduling
- Don't forget to hoist invariants out, too
21. Example: Unrolling
    // original
    for( i=a.beg(); i!=a.end(); ++i )
        process(i);
    // unrolled (assumes the element count is a multiple of 4)
    e = a.end();
    for( i=a.beg(); i!=e; i+=4 )
    {
        process(i);   process(i+1);
        process(i+2); process(i+3);
    }
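A runnable variant of the unrolled loop, with a remainder loop for counts that are not a multiple of four. The `process()` body here is a hypothetical stand-in for whatever per-element work the slide has in mind.

```cpp
#include <cstddef>

// Hypothetical per-element work standing in for the slide's process().
inline void process(float& x) { x = x * 2.0f + 1.0f; }

// Straightforward loop: one loop branch per element.
void processAll(float* a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        process(a[i]);
    }
}

// Unrolled by four: one loop branch per four elements.
void processAllUnrolled(float* a, std::size_t n) {
    const std::size_t n4 = n - (n % 4);  // invariant hoisted out of the loop
    std::size_t i = 0;
    for (; i < n4; i += 4) {
        process(a[i]);
        process(a[i + 1]);
        process(a[i + 2]);
        process(a[i + 3]);
    }
    for (; i < n; ++i) {  // remainder: at most three elements
        process(a[i]);
    }
}
```

Modern optimizers may unroll simple loops on their own, so it is worth checking the generated code before and after a manual unroll.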
22. Pass Native Types by Value
- Tradition says that large types are passed by pointer or reference, but be careful
- New consoles have really large registers
- Native types include
- 64-bit int (__int64)
- VMX vector (__vector4): 128 bits!
- Pass structs by pointer or reference
- One exception: pass structs consisting of bitfields <= 64 bits by value
23. Know Data Type Performance
- int32 and int64 have equivalent perf
- float and double have equivalent perf
- int8 and int16 are slower than int
- They generate extra instructions
- High bits cleared or sign-extended
- Example: int32 adds 2X faster than int16 adds
- Recommendations
- Store as smallest type required
- Load into int32, int64 or double for calculations
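The "store small, compute wide" recommendation might look like this in practice. The function name and the choice of 16-bit samples are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Samples are stored as int16 to keep the working set small, but each value
// is widened once on load and all arithmetic happens in a 32-bit integer.
std::int32_t sumSamples(const std::int16_t* samples, std::size_t n) {
    std::int32_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::int32_t s = samples[i];  // widen on load
        total += s;                         // native-width add
    }
    return total;
}
```

Accumulating in the wider type also avoids overflow that a 16-bit accumulator would hit almost immediately.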
24. Use Native Vector Types
- In CS 101, you learned to create abstract data types, such as matrices
    typedef std::array<float,4> vec;
    typedef std::array<vec,4> matrix;
- This code is an abomination
- At least on Xbox 360 and PS3
- Xbox 360 and PS3 have dedicated vector math units called VMX units
- Use them!
25. Your Math Buddies
- __vector4 (4 32-bit floats; 128-bit register)
- XMVECTOR (typedef for __vector4)
- XMMATRIX (array of 4 __vector4s)
- XMVECTOR operators (+,-,*,/)
- Hundreds of XMVECTOR and XMMATRIX functions
- Xbox 360-specific, but similar constructs in PS3 compilers
26. Avoid Floating-Point Branches
- FP branches are slow
- Cache has to be flushed
- 10X slower than int branches
- Avoid loops with float test expressions
- Eliminate altogether if possible
- Can be faster to calculate values you won't use!
- Compare integers instead
- Replace with fsel when possible
- 10-20X performance gain
27. The fsel Option in Detail
- Definition of hardware implementation
    float fsel( float a, float b, float c )
    {
        return ( a < 0.0f ) ? b : c;
    }
- You can replace expressions like
    v = ( w < x ) ? y : z;     // slow
- With faster expressions like
    v = fsel( w - x, y, z );   // turbo
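The rewrite on this slide can be tried on any platform with a portable stand-in. `fselSketch` below follows the definition given on the slide and is hypothetical: it is not a compiler intrinsic, and on hardware without a select instruction it may still compile to a branch. The PowerPC `fsel` instruction itself tests `a >= 0.0`, so the operand order of a real intrinsic should be checked against the compiler documentation rather than taken from this sketch.

```cpp
// Portable stand-in that mirrors the slide's definition of fsel.
inline float fselSketch(float a, float b, float c) {
    return (a < 0.0f) ? b : c;
}

// Branchy original: compares two floats directly.
inline float selectBranchy(float w, float x, float y, float z) {
    return (w < x) ? y : z;
}

// Select form: the comparison becomes a subtraction plus a sign test.
// Equivalent for ordinary finite inputs; NaN and overflow in (w - x) can
// give different answers from the direct comparison.
inline float selectFsel(float w, float x, float y, float z) {
    return fselSketch(w - x, y, z);
}
```

Note that both `y` and `z` are evaluated before the select, which is the "calculate values you won't use" trade-off mentioned on the previous slide.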
28. Prefer Platform-Specific Funcs
- The C runtime (CRT) is not usually the best option when performance matters
- Xbox 360 examples
- Prefer CreateFile to fopen or C streams
- Options for asynchronous reads and other goodness
- Prefer XMemCpy to memcpy
- 2-6X faster
- Prefer XMemSet to memset
- 8-14X faster
29. Avoid Hidden C++ Inefficiencies
- C++ rocks the house!
- C++ can bring your game to its knees!
- Consider these innocuous snippets
    Quaternion q;
    s.push_back( k );
    if( (float)i > f )
    obj->Draw();
    GameObject arr[1000];
    a = b + c;
    ++i;
30. C++ is Dangerous
- With power comes responsibility
- Beware constructors
- Is initialization the right thing to do?
- Beware hidden allocations
- Conversion casts may have significant cost
- Use virtual functions with care
- Beware overloaded operators
- Stick to known idioms
- Operator++ should be a constant-time operation.
- Really.
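As one illustration of a hidden allocation, the sketch below counts how often `push_back` has to grow a `std::vector`. The `countGrowths` helper is hypothetical and exists only to make the cost visible.

```cpp
#include <cstddef>
#include <vector>

// Counts how many times push_back had to grow the buffer. Each growth is a
// hidden allocation plus a copy or move of every existing element.
inline std::size_t countGrowths(std::size_t n, bool reserveFirst) {
    std::vector<int> v;
    if (reserveFirst) {
        v.reserve(n);  // one up-front allocation, outside the hot loop
    }
    std::size_t growths = 0;
    std::size_t cap = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != cap) {  // capacity changed: a reallocation occurred
            ++growths;
            cap = v.capacity();
        }
    }
    return growths;
}
```

With `reserveFirst` set the count is zero. Without it the count depends on the library's growth policy, but it is never zero for a non-empty sequence.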
31. Summary
- There absolutely are many things you can do to efficiently program next-gen consoles
- Two key issues: L2/memory and in-order processing
- Treat memory as you would a hard disk
- Watch out for those branches; use tricks like fsel
- Prefer a light C++ touch
32. What's Next
- Our games are only as good as the weakest member of the team
- Share what you've learned
- "The sharing of ideas allows us to stand on one another's shoulders instead of on one another's feet" (Jim Warren)
33. Questions
- pkisensee_at_msn.com
- Fill out your feedback forms