CPE 619 Testing Random-Number Generators - PowerPoint PPT Presentation

About This Presentation
Title:

CPE 619 Testing Random-Number Generators

Description:

CPE 619 Testing Random-Number Generators Aleksandar Milenkovic The LaCASA Laboratory Electrical and Computer Engineering Department The University of Alabama in ... – PowerPoint PPT presentation

Slides: 55
Provided by: Mil36
Learn more at: http://www.ece.uah.edu

1
CPE 619: Testing Random-Number Generators
  • Aleksandar Milenkovic
  • The LaCASA Laboratory
  • Electrical and Computer Engineering Department
  • The University of Alabama in Huntsville
  • http://www.ece.uah.edu/milenka
  • http://www.ece.uah.edu/lacasa

2
Overview
  • Chi-square test
  • Kolmogorov-Smirnov Test
  • Serial-correlation Test
  • Two-level tests
  • K-dimensional uniformity or k-distributivity
  • Serial Test
  • Spectral Test

3
Testing Random-Number Generators
  • Goal: To ensure that the random-number generator
    produces a random stream
  • Plot histograms
  • Plot quantile-quantile plot
  • Use other tests
  • Passing a test is necessary but not sufficient
  • Pass ≠ Good; Fail ⇒ Bad
  • New tests ⇒ Old generators fail the test
  • Tests can be adapted for other distributions

4
Chi-Square Test
  • Most commonly used test
  • Can be used for any distribution
  • Prepare a histogram of the observed data
  • Compare observed frequencies with theoretical
  • k = Number of cells
  • o_i = Observed frequency for ith cell
  • e_i = Expected frequency
  • $D = \sum_{i=1}^{k} (o_i - e_i)^2 / e_i$
  • D = 0 ⇒ Exact fit
  • D has a chi-square distribution with k-1 degrees
    of freedom.
  • ⇒ Compare D with $\chi^2_{[1-\alpha;\,k-1]}$. Pass with
    confidence α if D is less
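
The chi-square procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name `chi_square_statistic`, the seed, and the use of Python's built-in generator are assumptions, and the critical value 14.68 is the tabulated 0.90 quantile of the chi-square distribution with 9 degrees of freedom.

```python
import random

def chi_square_statistic(samples, k):
    """Return D = sum((o_i - e_i)^2 / e_i) for U(0,1) samples in k equal cells."""
    observed = [0] * k
    for u in samples:
        # min() guards against u == 1.0 landing outside the last cell
        observed[min(int(u * k), k - 1)] += 1
    expected = len(samples) / k
    return sum((o - expected) ** 2 / expected for o in observed)

rng = random.Random(1)        # hypothetical seed, for reproducibility only
samples = [rng.random() for _ in range(1000)]
D = chi_square_statistic(samples, k=10)
CHI2_090_9DF = 14.68          # tabulated 0.90 quantile, 9 degrees of freedom
print(f"D = {D:.3f}; pass: {D < CHI2_090_9DF}")
```

With k = 10 cells there are k - 1 = 9 degrees of freedom, so the generator passes when D falls below the tabulated quantile.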

5
Example 27.1
  • 1000 random numbers with x_0 = 1
  • Observed difference = 10.380
  • Observed is less than the table value ⇒ Accept IID U(0, 1)

6
Chi-Square for Other Distributions
  • Errors in cells with a small e_i affect the
    chi-square statistic more
  • Best when e_i's are equal
  • ⇒ Use an equi-probable histogram with variable
    cell sizes
  • Combine adjoining cells so that the new cell
    probabilities are approximately equal
  • The number of degrees of freedom should be
    reduced to k-r-1 (in place of k-1), where r is
    the number of parameters estimated from the
    sample
  • Designed for discrete distributions and for large
    sample sizes only ⇒ Lower significance for finite
    sample sizes and continuous distributions
  • If a cell has fewer than 5 observations, combine
    neighboring cells

7
Kolmogorov-Smirnov Test
  • Developed by A. N. Kolmogorov and N. V. Smirnov
  • Designed for continuous distributions
  • Difference between the observed CDF (cumulative
    distribution function) F_o(x) and the expected CDF
    F_e(x) should be small

8
Kolmogorov-Smirnov Test
  • K+ = maximum observed deviation above the
    expected CDF: $K^+ = \sqrt{n}\,\max_x [F_o(x) - F_e(x)]$
  • K- = maximum observed deviation below the
    expected CDF: $K^- = \sqrt{n}\,\max_x [F_e(x) - F_o(x)]$
  • K+ < K_[1-α; n] and K- < K_[1-α; n] ⇒ Pass at α
    level of significance
  • Don't use max/min of F_e(x_i) - F_o(x_i)
  • Use F_e(x_{i+1}) - F_o(x_i) for K-
  • For U(0, 1): F_e(x) = x
  • F_o(x) = j/n, where x > x_1, x_2, ..., x_{j-1}
9
Example 27.2
  • 30 random numbers using a seed of x_0 = 15
  • The numbers are: 14, 11, 2, 6, 18, 23, 7,
    21, 1, 3, 9, 27, 19, 26, 16, 17, 20,
    29, 25, 13, 8, 24, 10, 30, 28, 22, 4,
    12, 5, 15.

10
Example 27.2 (contd)
  • The normalized numbers obtained by dividing the
    sequence by 31 are: 0.45161, 0.35484, 0.06452,
    0.19355, 0.58065, 0.74194, 0.22581, 0.67742,
    0.03226, 0.09677, 0.29032, 0.87097, 0.61290,
    0.83871, 0.51613, 0.54839, 0.64516, 0.93548,
    0.80645, 0.41935, 0.25806, 0.77419, 0.32258,
    0.96774, 0.90323, 0.70968, 0.12903, 0.38710,
    0.16129, 0.48387.

11
Example 27.2 (contd)
  • K_[0.9; n] value for n = 30 and α = 0.1 is
    1.0424
  • Observed < Table ⇒ Pass
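
The K-S computation for this example can be sketched as follows. The recurrence x_n = 3 x_{n-1} mod 31 is an inference: it reproduces the 30 numbers listed for seed 15, but the transcript does not state the generator. The function name `ks_statistics` is hypothetical, and the statistics are scaled by sqrt(n) on the assumption that this is the scaling the quoted table value 1.0424 uses.

```python
import math

def ks_statistics(samples):
    """Return (K+, K-) for testing samples against U(0,1), where F_e(x) = x."""
    xs = sorted(samples)
    n = len(xs)
    # The observed CDF steps from (j-1)/n to j/n at the j-th smallest value.
    d_plus = max(j / n - x for j, x in enumerate(xs, start=1))
    d_minus = max(x - (j - 1) / n for j, x in enumerate(xs, start=1))
    return math.sqrt(n) * d_plus, math.sqrt(n) * d_minus

# Assumed generator (inferred from the listed numbers): x_n = 3 * x_{n-1} mod 31
x, seq = 15, []
for _ in range(30):
    x = (3 * x) % 31
    seq.append(x)

k_plus, k_minus = ks_statistics([v / 31 for v in seq])
K_TABLE = 1.0424   # K_[0.9; 30] as quoted in the example
print(f"K+ = {k_plus:.4f}, K- = {k_minus:.4f}, "
      f"pass: {k_plus < K_TABLE and k_minus < K_TABLE}")
```

Evaluating the observed CDF on both sides of each step is what the "Don't use max/min of F_e(x_i) - F_o(x_i)" advice amounts to: K- needs the value of F_o just before the jump.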

12
Chi-square vs. K-S Test

13
Serial-Correlation Test
  • Nonzero covariance ⇒ Dependence. The inverse is
    not true
  • R_k = Autocovariance at lag k = Cov[x_n, x_{n+k}]:
    $R_k = \frac{1}{n-k} \sum_{i=1}^{n-k} (x_i - \frac{1}{2})(x_{i+k} - \frac{1}{2})$
  • For large n, R_k is normally distributed with a
    mean of zero and a variance of 1/[144(n-k)]
  • 100(1-α)% confidence interval for the
    autocovariance is $R_k \mp z_{1-\alpha/2} / (12\sqrt{n-k})$
  • For k ≥ 1: Check if CI includes zero
  • For k = 0: R_0 = variance of the sequence.
    Expected to be 1/12 for IID U(0,1)
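
A minimal sketch of the serial-correlation check, assuming the usual estimator that averages (x_i - 1/2)(x_{i+k} - 1/2) over the n-k available pairs and the variance 1/[144(n-k)] stated above. The function name `autocovariance_ci`, the seed, and the use of Python's built-in generator are illustrative assumptions.

```python
import math
import random

Z_095 = 1.645  # standard normal 0.95 quantile, giving a 90% confidence interval

def autocovariance_ci(u, k, z=Z_095):
    """Return (R_k, low, high) for lag k >= 1 of a U(0,1) sequence u."""
    n = len(u)
    r_k = sum((u[i] - 0.5) * (u[i + k] - 0.5) for i in range(n - k)) / (n - k)
    half_width = z / (12 * math.sqrt(n - k))   # z times sqrt(1 / (144 (n-k)))
    return r_k, r_k - half_width, r_k + half_width

rng = random.Random(1)   # hypothetical seed
u = [rng.random() for _ in range(10_000)]
for k in range(1, 11):
    r_k, low, high = autocovariance_ci(u, k)
    verdict = "includes zero" if low <= 0 <= high else "excludes zero"
    print(f"lag {k}: R_k = {r_k:+.6f}, CI = ({low:+.6f}, {high:+.6f}) {verdict}")
```

At 90% confidence, roughly one lag in ten can be expected to exclude zero even for a good generator, so a single miss is not by itself evidence of dependence.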

14
Example 27.3 Serial Correlation Test
  • 10,000 random numbers with x_0 = 1

15
Example 27.3 (contd)
  • All confidence intervals include zero ⇒ All
    covariances are statistically insignificant at
    90% confidence.

16
Two-Level Tests
  • If the sample size is too small, the test results
    may apply locally, but not globally to the
    complete cycle.
  • Similarly, a global test may not apply locally
  • Use two-level tests
  • ⇒ Use chi-square test on n samples of size k
    each and then use a chi-square test on the set
    of n chi-square statistics so obtained
  • ⇒ Chi-square on chi-square test.
  • Similarly, K-S on K-S
  • Can also use this to find a "nonrandom" segment
    of an otherwise random sequence.

17
k-Distributivity
  • k-Dimensional Uniformity
  • Chi-square ⇒ uniformity in one dimension ⇒ Given
    two real numbers a_1 and b_1 between 0 and 1 such
    that b_1 > a_1: $P(a_1 \le u_n < b_1) = b_1 - a_1$
  • This is known as the 1-distributivity property of u_n.
  • The 2-distributivity is a generalization of this
    property in two dimensions:
    $P(a_1 \le u_{n-1} < b_1 \text{ and } a_2 \le u_n < b_2) = (b_1 - a_1)(b_2 - a_2)$
  • For all choices of a_1, b_1, a_2, b_2 in [0, 1],
    b_1 > a_1 and b_2 > a_2

18
k-Distributivity (contd)
  • k-distributed if
    $P(a_1 \le u_n < b_1, \ldots, a_k \le u_{n+k-1} < b_k) = (b_1 - a_1) \cdots (b_k - a_k)$
  • For all choices of a_i, b_i in [0, 1], with b_i > a_i,
    i = 1, 2, ..., k.
  • A k-distributed sequence is always
    (k-1)-distributed. The inverse is not true.
  • Two tests
  • Serial test
  • Spectral test
  • Visual test for 2-dimensions: Plot successive
    overlapping pairs of numbers

19
Example 27.4
  • Tausworthe sequence generated by
  • The sequence is k-distributed for k up to ⌈ /l⌉,
    that is, k = 1.
  • In two dimensions: Successive overlapping pairs
    (x_n, x_{n+1})

20
Example 27.5
  • Consider the polynomial
  • Better 2-distributivity than Example 27.4

21
Serial Test
  • Goal: To test for uniformity in two dimensions or
    higher.
  • In two dimensions, divide the space between 0
    and 1 into K^2 cells of equal area

22
Serial Test (contd)
  • Given x_1, x_2, ..., x_n, use n/2 non-overlapping
    pairs (x_1, x_2), (x_3, x_4), ..., and count the points
    in each of the K^2 cells
  • Expected = n/(2K^2) points in each cell
  • Use chi-square test to find the deviation of the
    actual counts from the expected counts
  • The degrees of freedom in this case are K^2-1
  • For k-dimensions use k-tuples of non-overlapping
    values
  • k-tuples must be non-overlapping
  • Overlapping ⇒ numbers of points in the cells are
    not independent; chi-square test cannot be used
  • In visual check one can use overlapping or
    non-overlapping
  • In the spectral test overlapping tuples are used
  • Given n numbers, there are n-1 overlapping pairs,
    n/2 non-overlapping pairs
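
The two-dimensional serial test described above can be sketched like this. The function name `serial_test_statistic`, the grid size K = 10, the seed, and the use of Python's built-in generator are assumptions made for illustration.

```python
import random

def serial_test_statistic(u, K):
    """Chi-square statistic for non-overlapping pairs on a K x K grid."""
    counts = [[0] * K for _ in range(K)]
    pairs = len(u) // 2
    for i in range(pairs):
        a, b = u[2 * i], u[2 * i + 1]      # (x1, x2), (x3, x4), ...
        row = min(int(a * K), K - 1)
        col = min(int(b * K), K - 1)
        counts[row][col] += 1
    expected = pairs / (K * K)             # n / (2 K^2)
    D = sum((c - expected) ** 2 / expected for r in counts for c in r)
    return D, K * K - 1                    # statistic, degrees of freedom

rng = random.Random(1)  # hypothetical seed
u = [rng.random() for _ in range(20_000)]
D, dof = serial_test_statistic(u, K=10)
print(f"D = {D:.2f} with {dof} degrees of freedom")
```

D is then compared with the tabulated chi-square quantile for K^2 - 1 degrees of freedom, exactly as in the one-dimensional test.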

23
Spectral Test
  • Goal: To determine how densely the k-tuples (x_1,
    x_2, ..., x_k) can fill up the k-dimensional
    hyperspace
  • The k-tuples from an LCG fall on a finite number
    of parallel hyper-planes
  • Successive pairs would lie on a finite number of
    lines
  • In three dimensions, successive triplets lie on a
    finite number of planes

24
Example 27.6 Spectral Test
  • All points lie on three straight lines.
  • Or
  • Plot of overlapping pairs

25
Example 27.6 (contd)
  • In three dimensions, the points (x_n, x_{n-1}, x_{n-2})
    for the above generator would lie on five planes
    given by
  • Obtained by adding the following to equation
  • Note that k + k_1 will be an integer between 0 and
    4.

26
Spectral Test (More)
  • Marsaglia (1968): Successive k-tuples obtained
    from an LCG fall on, at most, (k!m)^{1/k} parallel
    hyper-planes, where m is the modulus used in the
    LCG.
  • Example: m = 2^32, fewer than 2,953 hyper-planes
    will contain all 3-tuples, fewer than 566
    hyper-planes will contain all 4-tuples, and
    fewer than 41 hyper-planes will contain all
    10-tuples. Thus, this is a weakness of LCGs.
  • Spectral Test: Determine the max distance
    between adjacent hyper-planes.
  • Larger distance ⇒ worse generator
  • In some cases, it can be done by complete
    enumeration
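
The hyper-plane counts quoted for m = 2^32 follow directly from Marsaglia's bound, as this short check illustrates. The function name `marsaglia_bound` is hypothetical.

```python
import math

def marsaglia_bound(k, m):
    """Upper bound (k! * m)^(1/k) on the number of parallel hyper-planes."""
    return (math.factorial(k) * m) ** (1.0 / k)

for k in (3, 4, 10):
    bound = marsaglia_bound(k, 2**32)
    print(f"k = {k:2d}: at most {math.floor(bound)} hyper-planes")
```

The bound grows very slowly with m and shrinks quickly with k, which is why high-dimensional tuples from an LCG are confined to so few hyper-planes.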

27
Example 27.7
  • Compare the following two generators
  • Using a seed of x_0 = 15, first generator
  • Using the same seed in the second generator

28
Example 27.7 (contd)
  • Every number between 1 and 30 occurs once and
    only once
  • ⇒ Both sequences will pass the chi-square test
    for uniformity

29
Example 27.7 (contd)
  • First Generator

30
Example 27.7 (contd)
  • Three straight lines of positive slope or ten
    lines of negative slope
  • Since the distance between the lines of positive
    slope is more, consider only the lines with
    positive slope
  • Distance between two parallel lines y = ax + c_1 and
    y = ax + c_2 is given by $|c_2 - c_1| / \sqrt{1 + a^2}$
  • The distance between the above lines is 9.80

31
Example 27.7 (contd)
  • Second Generator

32
Example 27.7 (contd)
  • All points fall on seven straight lines of
    positive slope or six straight lines of negative
    slope.
  • Considering lines with negative slopes
  • The distance between lines is 5.76.
  • The second generator has a smaller maximum
    distance and, hence, the second generator has a
    better 2-distributivity
  • The set with a larger distance may not always be
    the set with fewer lines
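
For small moduli the two-dimensional spectral test can be done by complete enumeration. The sketch below assumes multiplicative generators of the form x_n = a x_{n-1} mod m. The transcript does not state the generators, so the multipliers 3 and 13 and the modulus 31 are assumptions, chosen because they reproduce the distances 9.80 and 5.76 quoted in this example. It also relies on a property of two-dimensional lattices: the widest spacing between adjacent parallel lines equals m divided by the length of the shortest nonzero lattice vector.

```python
import math

def max_line_spacing(a, m):
    """Largest spacing between adjacent parallel lines covering the pairs
    (x_{n-1}, x_n) of the generator x_n = a * x_{n-1} mod m."""
    # The pairs lie on the lattice spanned by (1, a) and (0, m).
    shortest = float(m)                  # (0, m) is always a lattice vector
    for i in range(1, m):
        j = (a * i) % m
        for dy in (j, j - m):            # both nearest representatives of a*i mod m
            shortest = min(shortest, math.hypot(i, dy))
    return m / shortest

d_first = max_line_spacing(3, 31)        # assumed first generator
d_second = max_line_spacing(13, 31)      # assumed second generator
print(f"{d_first:.2f} {d_second:.2f}")
```

The generator with the smaller maximum spacing fills the unit square more evenly, which matches the conclusion that the second generator has the better 2-distributivity.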

33
Example 27.7 (contd)
  • Either overlapping or non-overlapping k-tuples
    can be used
  • With overlapping k-tuples, we have k times as
    many points, which makes the graph visually more
    complete. The number of hyper-planes and the
    distance between them are the same with either
    choice.
  • With serial test, only non-overlapping k-tuples
    should be used.
  • For generators with a large m and for higher
    dimensions, finding the maximum distance becomes
    quite complex.
  • See Knuth (1981)

34
Summary
  • Chi-square test is a one-dimensional
    test. Designed for discrete distributions and
    large sample sizes
  • K-S test is designed for continuous variables
  • Serial correlation test for independence
  • Two-level tests find local non-uniformity
  • k-dimensional uniformity = k-distributivity,
    tested by spectral test or serial test

35
Random Variate Generation

36
Overview
  • Inverse transformation
  • Rejection
  • Composition
  • Convolution
  • Characterization

37
Random-Variate Generation
  • General Techniques
  • Only a few techniques may apply to a particular
    distribution
  • Look up the distribution in Chapter 29

38
Inverse Transformation
  • Used when F^{-1} can be determined either
    analytically or empirically

39
Proof

40
Example 28.1
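  • For exponential variates: $F(x) = 1 - e^{-\lambda x} = u$,
    so $x = -\frac{1}{\lambda} \ln(1 - u)$
  • If u is U(0,1), 1-u is also U(0,1)
  • Thus, exponential variables can be generated by
    $x = -\frac{1}{\lambda} \ln(u)$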
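
A minimal sketch of inverse transformation for the exponential distribution, assuming the rate parameterization F(x) = 1 - e^{-λx}. The function name `exponential_variate`, the rate 2.0, and the seed are illustrative assumptions.

```python
import math
import random

def exponential_variate(lam, rng=random):
    """Inverse transformation: x = -ln(u) / lam with u uniform on (0, 1]."""
    u = 1.0 - rng.random()     # random() is in [0, 1), so u is never 0
    return -math.log(u) / lam

rng = random.Random(1)  # hypothetical seed
xs = [exponential_variate(2.0, rng) for _ in range(100_000)]
print(sum(xs) / len(xs))       # sample mean, expected to be near 1/lam = 0.5
```

Using 1 - random() rather than random() is the "1-u is also U(0,1)" observation put to practical use: it avoids taking the logarithm of zero.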

41
Example 28.2
  • The packet sizes (trimodal) probabilities
  • The CDF for this distribution is

42
Example 28.2 (contd)
  • The inverse function is
  • Note: CDF is continuous from the right ⇒ the
    value on the right of the discontinuity is
    used ⇒ The inverse function is continuous from
    the left ⇒ u = 0.7 ⇒ x = 64
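
For a discrete distribution the inverse CDF becomes a table lookup. The sketch below is illustrative only: the transcript gives just the mapping u = 0.7 ⇒ x = 64, so the packet sizes 64, 128, and 512 and the probabilities 0.7, 0.1, and 0.2 are assumed values, and `packet_size` is a hypothetical name.

```python
import bisect
import random

# Assumed trimodal packet-size distribution (only u = 0.7 -> 64 is in the slides)
SIZES = [64, 128, 512]
CDF = [0.7, 0.8, 1.0]          # cumulative probability at each size

def packet_size(u):
    """Inverse CDF, continuous from the left: u = 0.7 still maps to 64."""
    return SIZES[bisect.bisect_left(CDF, u)]

rng = random.Random(1)  # hypothetical seed
print([packet_size(rng.random()) for _ in range(10)])
```

`bisect_left` returns the first index whose cumulative probability is at least u, which is what makes the inverse continuous from the left at the jump points.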

43
Applications of the Inverse-Transformation Technique

44
Rejection
  • Can be used if a pdf g(x) exists such that c g(x)
    majorizes the pdf f(x) ⇒ c g(x) ≥ f(x) ∀ x
  • Steps
  • 1. Generate x with pdf g(x)
  • 2. Generate y uniform on [0, c g(x)]
  • 3. If y < f(x), then output x and
    return. Otherwise, repeat from step 1 ⇒ Continue
    rejecting the random variates x and y until y <
    f(x)
  • Efficiency = how closely c g(x) envelops f(x).
    Large area between c g(x) and f(x) ⇒ Large
    percentage of (x, y) generated in steps 1 and 2
    are rejected
  • If generating variates with pdf g(x) is complex,
    this method may not be efficient

45
Example 28.3
  • Beta(2,4) density function: f(x) = 20x(1-x)^3, 0 ≤ x ≤ 1
  • Bounded inside a rectangle of height 2.11 ⇒
    Steps
  • Generate x uniform on [0, 1]
  • Generate y uniform on [0, 2.11]
  • If y < 20x(1-x)^3, then output x and
    return. Otherwise repeat from step 1
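
The three steps of the Beta(2,4) example translate almost line for line into code. This is a sketch with a hypothetical function name `beta_2_4`; the sample size and seed are arbitrary.

```python
import random

def beta_2_4(rng=random):
    """Rejection sampling for f(x) = 20 x (1 - x)^3 under a rectangle of height 2.11."""
    while True:
        x = rng.random()                 # step 1: x uniform on [0, 1]
        y = rng.uniform(0.0, 2.11)       # step 2: y uniform on [0, 2.11]
        if y <= 20 * x * (1 - x) ** 3:   # step 3: accept x, otherwise retry
            return x

rng = random.Random(1)  # hypothetical seed
samples = [beta_2_4(rng) for _ in range(50_000)]
print(sum(samples) / len(samples))       # Beta(2, 4) has mean 2 / (2 + 4) = 1/3
```

Because the density integrates to 1 and the rectangle has area 2.11, roughly 1/2.11 of the candidate points are accepted, so each variate costs a little over two attempts on average.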

46
Composition
  • Can be used if CDF F(x) = weighted sum of n other
    CDFs: $F(x) = \sum_{i=1}^{n} p_i F_i(x)$
  • Here, $p_i \ge 0$, $\sum_{i=1}^{n} p_i = 1$, and
    F_i's are distribution functions.
  • n CDFs are composed together to form the desired
    CDF. Hence, the name of the technique.
  • The desired CDF is decomposed into several other
    CDFs ⇒ Also called decomposition
  • Can also be used if the pdf f(x) is a weighted
    sum of n other pdfs

47
  • Steps
  • Generate a random integer I such that $P(I = i) = p_i$
  • This can easily be done using the
    inverse-transformation method.
  • Generate x with the ith pdf f_i(x) and return.

48
Example 28.4
  • pdf: $f(x) = \frac{1}{2a} e^{-|x|/a}$
  • Composition of two exponential pdf's
  • Generate u_1 ~ U(0,1) and u_2 ~ U(0,1)
  • If u_1 < 0.5, return x = -a ln u_2; otherwise return x = a ln u_2.
  • Inverse transformation better for Laplace
49
Convolution
  • Sum of n variables: x = y_1 + y_2 + ... + y_n
  • Generate n random variates y_i's and sum
  • For sums of two variables, pdf of x =
    convolution of pdfs of y_1 and y_2. Hence the name
  • Although no convolution in generation
  • If pdf or CDF = Sum ⇒ Composition
  • Variable x = Sum ⇒ Convolution

50
Convolution Examples
  • Erlang-k = Σ_{i=1}^{k} Exponential_i
  • Binomial(n, p) = Σ_{i=1}^{n} Bernoulli(p) ⇒ Generate n
    U(0,1), return the number of RNs less than p
  • χ²(n) = Σ_{i=1}^{n} N(0,1)^2
  • Γ(a, b_1) + Γ(a, b_2) = Γ(a, b_1 + b_2) ⇒ Non-integer value
    of b = integer + fraction
  • Σ_{i=1}^{n} Any = Normal ⇒ Σ U(0,1) = Normal
  • Σ_{i=1}^{m} Geometric = Pascal
  • Σ_{i=1}^{2} Uniform = Triangular
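
Two of the convolution examples above, sketched in Python. The function names `erlang_variate` and `binomial_variate`, the parameter values, and the seed are illustrative assumptions; the exponential terms use the rate parameterization.

```python
import math
import random

def erlang_variate(k, lam, rng=random):
    """Erlang-k as the sum of k exponential variates with rate lam."""
    return sum(-math.log(1.0 - rng.random()) / lam for _ in range(k))

def binomial_variate(n, p, rng=random):
    """Binomial(n, p) as the number of n U(0,1) draws that fall below p."""
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(1)  # hypothetical seed
erlangs = [erlang_variate(3, 2.0, rng) for _ in range(50_000)]
binomials = [binomial_variate(10, 0.3, rng) for _ in range(50_000)]
print(sum(erlangs) / len(erlangs))       # expected near k / lam = 1.5
print(sum(binomials) / len(binomials))   # expected near n * p = 3.0
```

In both cases the variate is produced by adding simpler variates; no convolution integral is ever evaluated.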

51
Characterization
  • Use special characteristics of distributions ⇒
    characterization
  • Exponential inter-arrival times ⇒ Poisson number
    of arrivals ⇒ Continuously generate exponential
    variates until their sum exceeds T and return the
    number of variates generated before the sum
    exceeded T as the Poisson variate.
  • The ath smallest number in a sequence of a + b - 1
    U(0,1) uniform variates has a β(a, b)
    distribution.
  • The ratio of two unit normal variates is a
    Cauchy(0, 1) variate.
  • A chi-square variate with even degrees of freedom
    χ²(n) is the same as a gamma variate γ(2, n/2).
  • If x_1 and x_2 are two gamma variates γ(a, b) and
    γ(a, c), respectively, the ratio x_1/(x_1 + x_2) is a
    beta variate β(b, c).
  • If x is a unit normal variate, e^{μ + σx} is a
    lognormal(μ, σ) variate.
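
The exponential-to-Poisson characterization can be sketched as follows. The function name `poisson_variate`, the rate, the interval length T, and the seed are assumptions. The variate whose arrival time overshoots T is not counted, so the result is the number of arrivals inside [0, T].

```python
import math
import random

def poisson_variate(rate, T, rng=random):
    """Count exponential inter-arrival times whose running sum stays within T."""
    count, elapsed = 0, 0.0
    while True:
        elapsed += -math.log(1.0 - rng.random()) / rate
        if elapsed > T:
            return count       # the overshooting arrival is excluded
        count += 1

rng = random.Random(1)  # hypothetical seed
counts = [poisson_variate(3.0, 1.0, rng) for _ in range(20_000)]
print(sum(counts) / len(counts))   # expected near rate * T = 3.0
```

The cost grows with rate × T, since one exponential variate is generated per arrival, so this approach suits small means best.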

52
Summary
  • Is CDF a sum of other CDFs? Yes ⇒ Use composition
  • Is pdf a sum of other pdfs? Yes ⇒ Use composition

53
Summary (contd)
  • Is the variate a sum of other variates? Yes ⇒ Use convolution
  • Is the variate related to other variates? Yes ⇒ Use characterization
  • Does a majorizing function exist? Yes ⇒ Use rejection
  • No ⇒ Use empirical inversion

54
Homework 6
  • Submit answers to exercise 27.1
  • Submit answers to exercise 27.4
  • Due: Monday, April 7, 2008, 12:45 PM
  • Submit a hard copy to instructor