Title: Parallel White Noise Generation on a GPU via Cryptographic Hash
1Parallel White Noise Generation on a GPU via Cryptographic Hash
- Stanley Tzeng, Li-Yi Wei
- Microsoft Research Asia
2What is White Noise?
- Spatial domain: uniform random numbers
- Frequency domain: white noise
[Figure: the same noise shown in the spatial domain and in the frequency domain]
3Importance
- Mother of all random numbers
- Commonly used, e.g. rand() in C/C++
- Major algorithms are sequential
- e.g. $x_n = (a x_{n-1} + b) \bmod c$
- Processors are becoming parallel
- GPU, multi-core CPU, Cell
- sequential algorithms cannot leverage that
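To make the sequential dependency concrete, here is a minimal sketch of the recurrence above. The constants `a`, `b`, `c` and the name `lcg` are illustrative assumptions, not values from the talk.

```python
def lcg(seed, a=1103515245, b=12345, c=2**31):
    """Yield x_n = (a * x_{n-1} + b) mod c. The constants are illustrative only."""
    x = seed
    while True:
        # Each value needs the previous one, so sample n cannot be
        # computed without first computing samples 1 .. n-1.
        x = (a * x + b) % c
        yield x

gen = lcg(seed=1)
samples = [next(gen) for _ in range(5)]
```

Because every sample depends on its predecessor, the loop cannot be split across threads without extra machinery.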
4Contribution
- Parallel algorithm for white noise
- independent evaluation for every sample
- easy implementation as a GPU pixel shader
- speed: faster than sequential algorithms
- quality: same or better
- usage: similar to texture mapping
5PRNG (Pseudo Random Number Generator)
- The main source of randomness in programs
- Desirable properties
- white noise statistics
- repeatable
- fast computation
- low memory usage
6Core Idea
- input: trivially prepared in parallel, e.g. linear ramp
- feed input value into hash, independently and in parallel
- output: white noise
- key idea
- borrow cryptographic hash!
[Diagram: input -> hash -> output]
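The three steps above can be sketched on the CPU as follows. This is only an illustration: `hashlib.md5` stands in for the shader hash (MD5 is the hash the talk settles on later), and the name `white_noise` and the choice to keep only the first 32 bits are assumptions.

```python
import hashlib
import struct

def white_noise(index):
    """Hash one ramp value and map the first 32 output bits to [0, 1)."""
    digest = hashlib.md5(struct.pack("<I", index)).digest()
    word = struct.unpack("<I", digest[:4])[0]
    return word / 2**32

ramp = range(8)                          # input: a linear ramp
noise = [white_noise(i) for i in ramp]   # each call is independent of the others
```

Since no call reads the result of another, the list comprehension could be replaced by one thread (or one pixel) per index.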
7Hash
- (however nice) input -> (unrecognizable) mess
8Cryptographic Hash
- A subclass of hash
- Commonly used for security applications
- e.g. password, digital signature
- Properties
- irreversible: cannot find input from hash output
- decorrelating: similar inputs, dissimilar outputs
- uniform probability: all outputs equally likely to occur
9Cryptographic Hash - Example
- irreversible, decorrelating, uniform probability
CHash("The quick brown fox jumps over the lazy dog") = 9e107d9d372bb6826bd81d3542a419d6
CHash("The quick brown fox jumps over the lazy eog") = ffd93f16876049265fbaef4da268dd0e
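The example can be reproduced with a short sketch, assuming CHash here is MD5 (the digests are 128 bits long, and MD5 is the hash chosen later in the talk). The helper name `chash` and the bit-count measure are illustrative additions.

```python
import hashlib

def chash(text):
    """MD5 digest of an ASCII string, as a hex string."""
    return hashlib.md5(text.encode("ascii")).hexdigest()

a = chash("The quick brown fox jumps over the lazy dog")
b = chash("The quick brown fox jumps over the lazy eog")

# Count how many of the 128 output bits differ after a one-letter change.
differing_bits = bin(int(a, 16) ^ int(b, 16)).count("1")
```

A decorrelating hash is expected to flip roughly half of the output bits for such a small input change.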
10Cryptographic Hash as a PRNG
- White noise statistics
- CHash is cryptographically secure
- Repeatable
- CHash is invariant with same input
- Fast computation
- CHash is parallel, with constant cost per sample
- Low memory usage
- CHash maintains no state
- Order-independent, i.e. random accessible
- important for parallel GPU applications
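Repeatability and order independence can be checked with a small sketch. `hashlib.md5` again stands in for the shader, and the helper name `sample` is an assumption.

```python
import hashlib
import random

def sample(i):
    """Stateless sample: the result depends only on the index i."""
    digest = hashlib.md5(i.to_bytes(4, "little")).digest()
    return int.from_bytes(digest[:4], "little")

indices = list(range(16))
in_order = {i: sample(i) for i in indices}

# Evaluate the same indices in a scrambled order, as parallel threads might.
shuffled = indices[:]
random.Random(0).shuffle(shuffled)
out_of_order = {i: sample(i) for i in shuffled}
```

Both dictionaries hold identical values, which is what lets any pixel be evaluated at any time without shared state.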
11Which Cryptographic Hash?
- Many options
- MD5, SHA, RIPEMD, Tiger, block cipher, etc
- Desirable properties
- white noise quality
- fast computation
- power-of-2 aligned (output and operations)
- pure pixel shader, no state maintenance
12Our Hash of Choice: MD5 [Rivest 1992]
- 128-bit outputs and 32-bit operation
- Small number of constants fit entirely in shader
- Fastest among those satisfying quality criteria
- Not 100% secure [Wang and Yu 2005]
- but good enough for our goal
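A 128-bit output splits evenly into four 32-bit words, which is one way to read the alignment point above. The sketch below assumes the input is a pixel coordinate and that the four words map to the four channels of an RGBA pixel; both are illustrative assumptions, as is the name `hash_to_rgba`.

```python
import hashlib
import struct

def hash_to_rgba(x, y):
    """Hash a pixel coordinate and split the 128-bit digest into four 32-bit words."""
    digest = hashlib.md5(struct.pack("<II", x, y)).digest()  # 16 bytes = 128 bits
    return struct.unpack("<4I", digest)

r, g, b, a = hash_to_rgba(3, 7)
```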
13MD5 Algorithm Overview
[Diagram: Input -> Scrambling (bit op, table, arithmetic) over 64 rounds, using a shift table and a sin table -> Output]
14Performance Bottlenecks for Pixel Shader
[Diagram: Input -> Scrambling (bit op, table, arithmetic) over 64 rounds, using a shift table and a sin table -> Output]
15Our Optimization
[Diagram: Input -> Scrambling (bit op, table, arithmetic) over 64 rounds -> Output, annotated with:]
- sin table -> sin function
- shift table -> reduced shift table
- 64 rounds -> loop unrolling
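The sketch below shows how a sin function can replace the sin table: the standard MD5 round constants are floor(2^32 * |sin(i + 1)|), so they can be computed on the fly. Reading "reduced shift table" as 16 stored shift amounts (4 per group of 16 rounds) instead of 64 is an assumption, as are all function names. Loop unrolling is a shader compilation detail and is not shown.

```python
import hashlib
import math
import struct

# Assumed form of the "reduced shift table": 4 shift amounts per group of 16 rounds.
SHIFTS = [[7, 12, 17, 22], [5, 9, 14, 20], [4, 11, 16, 23], [6, 10, 15, 21]]

def sin_constant(i):
    """Round constant computed from sin instead of read from a 64-entry table."""
    return int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF

def rotl(x, s):
    """Rotate a 32-bit word left by s bits."""
    return ((x << s) | (x >> (32 - s))) & 0xFFFFFFFF

def md5_single_block(message):
    """MD5 of a message shorter than 56 bytes (fits in one 64-byte block)."""
    assert len(message) < 56
    padded = message + b"\x80" + b"\x00" * (55 - len(message))
    padded += struct.pack("<Q", 8 * len(message))
    m = struct.unpack("<16I", padded)
    init = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)
    a, b, c, d = init
    for i in range(64):
        group = i // 16
        if group == 0:
            f, g = (b & c) | (~b & d), i
        elif group == 1:
            f, g = (d & b) | (~d & c), (5 * i + 1) % 16
        elif group == 2:
            f, g = b ^ c ^ d, (3 * i + 5) % 16
        else:
            f, g = c ^ (b | ~d), (7 * i) % 16
        f = (f + a + sin_constant(i) + m[g]) & 0xFFFFFFFF
        a, d, c = d, c, b
        b = (b + rotl(f, SHIFTS[group][i % 4])) & 0xFFFFFFFF
    words = [(x + y) & 0xFFFFFFFF for x, y in zip(init, (a, b, c, d))]
    return struct.pack("<4I", *words)

digest = md5_single_block(b"abc")
```

Comparing `digest` against `hashlib.md5(b"abc").digest()` is a quick way to confirm that the computed constants match the tabulated ones.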
16Previous PRNG
- GPU
  - BBS [Blum et al. 1986; Olano 2005]
    - O extremely fast
    - X not good quality
  - CEICG [Entacher et al. 1998; Sussman et al. 2006]
    - O decent quality
    - X processing time varies
  - AES [NIST 2001; Yamanouchi 2007]
    - O invertible (not hash)
    - X not good quality
- CPU
  - rand
    - O commonly used
    - X not good quality
  - drand48
    - O better quality
    - X slower
  - Mersenne Twister [Matsumoto and Nishimura 1998]
    - O high quality and fast
    - X not random accessible
17Assessing Quality: DIEHARD [Marsaglia 1995]
- De facto standard on measuring PRNG quality
- Runs 15 different tests on the bits generated
- Outputs a p-value. If p is (close to) 0 or 1, the test fails.
BIRTHDAY SPACINGS TEST, M= 512 N=2**24 LAMBDA= 2.0000
Results for aes.bin
For a sample of size 500:     mean
aes.bin using bits 1 to 24    2.036

duplicate    number      number
spacings     observed    expected
0            66.         67.668
1            130.        135.335
2            148.        135.335
3            80.         90.224
4            44.         45.112
5            20.         18.045
6 to INF     12.         8.282

Chisquare with 6 d.o.f. = 4.50 p-value= .391147
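The chi-square statistic in this sample output can be recomputed from the observed and expected columns. The sketch below uses the closed-form chi-square CDF for 6 degrees of freedom; treating the reported p-value as that CDF is an assumption, made because it reproduces the printed number closely.

```python
import math

observed = [66, 130, 148, 80, 44, 20, 12]
expected = [67.668, 135.335, 135.335, 90.224, 45.112, 18.045, 8.282]

# Pearson chi-square: sum of (observed - expected)^2 / expected over all bins.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_cdf_6dof(x):
    """Closed-form chi-square CDF for 6 degrees of freedom."""
    h = x / 2
    return 1 - math.exp(-h) * (1 + h + h * h / 2)

p_value = chi_square_cdf_6dof(chi_square)
```

The result lands near the printed 4.50 and .391147; small differences come from the rounded expected counts.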
18Cumulative Distribution Function
- Shows how data is distributed within set
- Given x in data, what % of data values are <= x
[Figure: cumulative distribution functions of a normal distribution and of a uniform distribution; x axis from 0 to 1, y axis from 0 to 100%]
19Kolmogorov-Smirnov Test
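- Determines how alike two sets of data are
- Looks at the max difference D between their distribution functions
[Figure: two pairs of cumulative distribution functions with the max difference D marked; one pair labeled "not alike", the other "alike"; x axis from 0 to 1, y axis from 0 to 100%]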
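A minimal sketch of the D statistic, comparing a sample against the uniform distribution on [0, 1]. The function name and the two toy data sets are illustrative assumptions.

```python
def ks_distance_to_uniform(samples):
    """Max gap between the empirical CDF of samples and the uniform CDF on [0, 1]."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF jumps from i/n to (i+1)/n at x; the uniform CDF there is x.
        d = max(d, abs((i + 1) / n - x), abs(x - i / n))
    return d

evenly_spread = [(i + 0.5) / 10 for i in range(10)]   # close to uniform
clumped = [0.9 + i / 100 for i in range(10)]          # far from uniform

d_even = ks_distance_to_uniform(evenly_spread)
d_clumped = ks_distance_to_uniform(clumped)
```

The evenly spread set gives a small D, the clumped set a large one, matching the "alike" and "not alike" cases.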
20Assessing Quality: DIEHARD
- Run the DIEHARD test results (p-values) through a KS test and look at the D-value
- Smaller D is better quality!
[Figure: cumulative distribution function of the p-values plotted against the uniform distribution curve, with the D-value marked; y axis from 0 to 100%]
21Assessing Quality: Power Spectrum
- Radial mean should be uniform
- Radial variance should be low and uniform
[Figure: power spectrum density, radial mean, radial variance (anisotropy)]
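The radial mean can be sketched in plain Python on a small image. Everything here is an illustrative assumption: the 32 x 32 size, the naive DFT, the `noise` helper built on `hashlib.md5`, and the rounding of frequencies to integer radii. Only the radial mean is computed; radial variance would follow the same binning.

```python
import cmath
import hashlib
import math

N = 32  # kept small so a plain (non-FFT) DFT stays cheap

def noise(x, y):
    """Hash a pixel coordinate to a value in [0, 1); MD5 stands in for the shader."""
    digest = hashlib.md5(bytes([x, y])).digest()
    return int.from_bytes(digest[:4], "little") / 2**32

def dft(values):
    """Naive 1D discrete Fourier transform."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values))
            for k in range(n)]

image = [[noise(x, y) for x in range(N)] for y in range(N)]
rows = [dft(row) for row in image]                                  # transform along x
spectrum = [dft([rows[y][u] for y in range(N)]) for u in range(N)]  # then along y

sums, counts = {}, {}
for u in range(N):
    for v in range(N):
        # Fold frequencies above N/2 back so the radius is measured from the DC term.
        radius = round(math.hypot(min(u, N - u), min(v, N - v)))
        if radius == 0:
            continue  # skip the DC term, which only carries the image mean
        sums[radius] = sums.get(radius, 0.0) + abs(spectrum[u][v]) ** 2 / N**2
        counts[radius] = counts.get(radius, 0) + 1

radial_mean = {r: sums[r] / counts[r] for r in sorted(sums)}
```

With this normalization, uniform noise in [0, 1) has an expected power of 1/12 (about 0.083) at every non-zero frequency, so a flat `radial_mean` near that value indicates white noise.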
22Assessing Speed: Batch Rendering
- Clock time to generate random bits
- n2 x 128-bit image, with n = 512, 1024, 2048 and 4096
[Figure: square image with both sides labeled n2]
23Assessing Speed: Texture Subset (for random accessibility)
- A huge virtual texture
- clock time for accessing A and B
- measure difference
- (smaller is better)
[Figure: a 2^20 x 2^20 virtual texture containing two regions, A and B]
24Test Results: DIEHARD Results
[Charts: two result plots, one annotated "the higher the better" and one annotated "the lower the better"]
25Test Results: Power Spectrum Tests
[Figure: power spectrum results for MD5, Mersenne Twister, and GPU BBS]
26Test Results: Batch Render Speed
27Test Results: Texture Subset Speed
28Trading Quality for Speed
- Reducing the number of rounds
- O faster speed
- X lower quality
Rounds  Time (ms)  DIEHARD tests passed  KS D-value
64      6.3        15/15                 0.2029
48      4.7        14/15                 0.2042
32      3.1        13/15                 0.2295
16      1.6        13/15                 0.253
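A quick arithmetic check on the table, as a sketch: dividing each time by its round count shows the cost is close to linear in the number of rounds. The variable names are illustrative.

```python
# (rounds, time in ms) pairs copied from the table above.
rows = [(64, 6.3), (48, 4.7), (32, 3.1), (16, 1.6)]

ms_per_round = [time_ms / rounds for rounds, time_ms in rows]
```

Every row comes out at roughly 0.1 ms per round, so halving the rounds roughly halves the time, while the quality columns degrade more gradually.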
29Applications
Texture tiling (fragment shader)
Fractal terrain (vertex shader)
30Future Work
- Implement our method in hardware
- very similar to texture unit but much smaller
- (no need for cache)
- Alternative hashes
- ride with advances in cryptographic hash
31Thank You!