Title: Statistical Procedures for Comparing Real-World System and Simulation Output Data
1Statistical Procedures for Comparing Real-World
System and Simulation Output Data
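- There are three approaches to comparing observations from a real-world system with output data from a corresponding simulation model: the inspection approach, the confidence-interval approach, and time-series approaches.
- Inspection Approach
- Compute one or more statistics from the real-world system observations and the corresponding statistics from the model output data, then compare them without the use of a formal statistical procedure.
- This basic inspection approach is dangerous, because statistics such as the sample mean and sample variance are themselves random, and each comparison rests on a single realization of them.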
2Statistical Procedures (cont.)
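- To avoid the danger of the basic inspection approach, use the correlated inspection approach, in which the model is driven by the historical input data observed on the system, so that system and model outputs are compared under the same inputs.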
3Statistical Procedures (cont.)
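- Confidence-Interval (c.i.) Approach Based on Independent Data
- A more reliable approach than inspection.
- It is used to answer two questions:
- How large is the mean difference, and how precise is the estimator of the mean difference?
- Is there a significant difference between the actual system and the model? The answer will be one of the following, where $E_{rs}$ and $E_{sm}$ denote the expected responses of the real system and of the simulation model:
- If the c.i. for $E_{rs} - E_{sm}$ does not contain 0, then there is strong evidence that the system and model means differ.
- If the c.i. for $E_{rs} - E_{sm}$ contains 0, then, based on the data at hand, there is no strong evidence that the means differ.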
4Statistical Procedures (cont.)
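- A two-sided 100(1-α)% c.i. for $E_{rs} - E_{sm}$ will always be of the form
  $$(\bar{Y}_{rs} - \bar{Y}_{sm}) \pm t_{\alpha/2,\nu}\,\mathrm{se}(\bar{Y}_{rs} - \bar{Y}_{sm})$$
- where $\bar{Y}$ is the sample mean, $\nu$ is the degrees of freedom, $t_{\alpha/2,\nu}$ is the 100(1-α/2) percentage point of the t distribution with $\nu$ degrees of freedom (as used, for example, in the paired-t interval), and se(·) is the standard error of the specified estimator.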
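As a minimal sketch of the interval above, assume the system and model observations can be paired (equal sample sizes, differenced pairwise) and that the critical value is read from a t table. The function name `paired_t_ci`, the sample data, and the choice to pass `t_crit` as an argument are illustrative assumptions, not part of the original slides.

```python
import math
import statistics


def paired_t_ci(system_obs, model_obs, t_crit):
    """Two-sided paired-t confidence interval for the mean difference.

    t_crit is the 100(1 - alpha/2) percentage point of the t distribution
    with n - 1 degrees of freedom, looked up by the caller (assumption).
    """
    if len(system_obs) != len(model_obs):
        raise ValueError("paired-t needs equally many system and model observations")
    diffs = [s - m for s, m in zip(system_obs, model_obs)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)              # point estimate of the mean difference
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of d_bar
    half_width = t_crit * se
    return d_bar - half_width, d_bar + half_width


# Hypothetical data: 5 paired observations, t_{0.025, 4} = 2.776 from a t table.
system = [10, 12, 11, 13, 12]
model = [9, 11, 12, 12, 11]
lower, upper = paired_t_ci(system, model, t_crit=2.776)
contains_zero = lower <= 0.0 <= upper  # True here: no strong evidence of a difference
```

The half-width answers the "how precise" question, and checking whether the interval contains 0 answers the "significant difference" question.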
5Statistical Procedures (cont.)
- Time-Series Approaches
- Spectral-analysis approach
- Compute the sample spectrum (i.e., the Fourier cosine transform of the estimated autocovariance function) of each output process, then use existing theory to construct a confidence interval for the difference of the logarithms of the two spectra.
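The first step of the spectral approach can be sketched as follows. This is an unsmoothed, truncated estimate under assumed conventions (autocovariances divided by n, no lag window, no scaling constant); normalizations differ between texts, and the confidence-interval step is not shown. The function names are illustrative.

```python
import math


def autocovariance(x, max_lag):
    """Estimated autocovariances C_0, ..., C_max_lag (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


def sample_spectrum(x, max_lag, freqs):
    """Fourier cosine transform of the estimated autocovariance function.

    Returns C_0 + 2 * sum_k C_k * cos(2*pi*f*k) for each frequency f
    (cycles per observation, 0 <= f <= 0.5).
    """
    c = autocovariance(x, max_lag)
    return [
        c[0] + 2.0 * sum(c[k] * math.cos(2.0 * math.pi * f * k)
                         for k in range(1, max_lag + 1))
        for f in freqs
    ]


# Hypothetical output series that alternates, so its power sits at f = 0.5.
series = [1, -1, 1, -1, 1, -1, 1, -1]
spectrum = sample_spectrum(series, max_lag=1, freqs=[0.0, 0.25, 0.5])
```

In practice a lag window is usually applied to smooth the estimate before logarithms are taken, since a raw truncated estimate like this one can be negative at some frequencies.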
6Statistical Procedures (cont.)
- Alternative approach
- Fit a parametric time-series model to each set of output data, then apply a hypothesis test to see whether the two models appear to be the same.
- Chen and Sargent's method
- Construct a confidence interval for the difference between the steady-state mean of the system and the corresponding steady-state mean of the model.
7Selecting Input Probability Distributions
- How might an analyst go about specifying the input probability distributions?
- All real systems contain one or more sources of randomness.
- Sources of randomness for common simulation applications are given in the following table.
8(Table: sources of randomness for common simulation applications; no transcript available)
9Selecting Input (cont.)
- Useful probability distributions
- Parameterization of continuous distributions
- The parameters can be classified as location (γ), scale (β), or shape (α) parameters.
- Continuous distributions
- Continuous random variables can be used to describe random phenomena in which the variable of interest can take on any value in some interval.
10Continuous distributions (cont.)
- Uniform distribution U(a, b): models complete uncertainty, since all outcomes are equally likely.
- Exponential distribution expo(β): models the time between independent events, or a process time that is memoryless.
- Gamma distribution gamma(α, β): an extremely flexible distribution used to model nonnegative random variables.
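The memoryless property mentioned for the exponential distribution can be written out explicitly. Taking β as the mean (an assumption about the parameterization), for any s, t ≥ 0:

```latex
P(X > s + t \mid X > s)
  = \frac{P(X > s + t)}{P(X > s)}
  = \frac{e^{-(s+t)/\beta}}{e^{-s/\beta}}
  = e^{-t/\beta}
  = P(X > t)
```

In words, the time already waited says nothing about the remaining wait.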
11Continuous distributions (cont.)
- Beta distribution beta(α1, α2): an extremely flexible distribution used to model bounded random variables.
- Weibull distribution Weibull(α, β): models the time to failure for components.
- Normal distribution N(µ, σ²): models the distribution of a process that can be thought of as the sum of a number of component processes.
12Continuous distributions (cont.)
- Lognormal distribution LN(µ, σ²): models the distribution of a process that can be thought of as the product of (that is, multiplying together) a number of component processes.
- Pearson type V distribution PT5(α, β): models the time to perform some task (similar to the lognormal).
13Continuous distributions (cont.)
- Pearson type VI distribution PT6(α, β): models the time to perform some task.
- Triangular distribution triang(a, b, c): models a process when only the minimum, most likely, and maximum values of the distribution are known.
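For experimenting with the shapes of the continuous distributions listed above, Python's standard `random` module provides generators for most of them. The sketch below uses arbitrary parameter values chosen only for illustration. Note that the library's argument conventions do not always match the slide notation (rate instead of mean, standard deviation instead of variance, scale before shape for the Weibull). The Pearson types are omitted because the standard library has no direct generator for them.

```python
import random

rng = random.Random(12345)  # fixed seed so that runs are repeatable

# Illustrative parameter values only; none of them come from the slides.
samplers = {
    "uniform": lambda: rng.uniform(2.0, 5.0),             # U(a=2, b=5)
    "exponential": lambda: rng.expovariate(1.0 / 3.0),    # mean 3; argument is the rate
    "gamma": lambda: rng.gammavariate(2.0, 1.5),          # shape 2, scale 1.5
    "beta": lambda: rng.betavariate(2.0, 3.0),            # bounded on [0, 1]
    "weibull": lambda: rng.weibullvariate(2.0, 1.5),      # Python order: (scale, shape)
    "normal": lambda: rng.normalvariate(10.0, 2.0),       # second argument is sigma
    "lognormal": lambda: rng.lognormvariate(0.0, 0.5),    # mu, sigma of the underlying normal
    "triangular": lambda: rng.triangular(1.0, 4.0, 2.0),  # Python order: (low, high, mode)
}


def draw(name, n):
    """Return n independent variates from the named distribution."""
    return [samplers[name]() for _ in range(n)]


sample_means = {name: sum(draw(name, 2000)) / 2000 for name in samplers}
```

Plotting a histogram of each sample is a quick way to see why, for example, the gamma and beta families are described as flexible.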
14Discrete distributions
- Bernoulli distribution Bernoulli(p): used to generate other discrete random variates.
- Discrete uniform distribution DU(i, j): models complete uncertainty, since all outcomes are equally likely.
- Binomial distribution bin(t, p): models the number of successes in t trials, when the trials are independent with common success probability p.
15Discrete distributions (cont.)
- Geometric distribution geom(p): the number of failures before the first success in a sequence of independent Bernoulli trials with probability p of success on each trial.
- Negative binomial distribution negbin(s, p): models the number of trials required to achieve s successes.
- Poisson distribution Poisson(λ): models the number of independent events that occur in a fixed amount of time or space.
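The remark that Bernoulli variates are used to generate other discrete variates can be illustrated directly from the definitions above. This is a sketch built on repeated Bernoulli trials for clarity, not an efficient generator. The function names are illustrative, and `trials_until_successes` follows the "number of trials" wording used for the negative binomial here.

```python
import random

rng = random.Random(2024)  # fixed seed so that runs are repeatable


def bernoulli(p):
    """Return 1 (success) with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0


def binomial(t, p):
    """Number of successes in t independent Bernoulli(p) trials."""
    return sum(bernoulli(p) for _ in range(t))


def geometric(p):
    """Number of failures before the first success."""
    failures = 0
    while bernoulli(p) == 0:
        failures += 1
    return failures


def trials_until_successes(s, p):
    """Number of trials needed to reach s successes."""
    trials = successes = 0
    while successes < s:
        trials += 1
        successes += bernoulli(p)
    return trials


# With p = 0.5 the geometric mean (1 - p) / p is 1 failure.
mean_failures = sum(geometric(0.5) for _ in range(5000)) / 5000
```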
16Empirical distributions
- If the modeler has been unable to find a
theoretical distribution that provides a good
model for the input data, it may be necessary to
use the empirical distribution of the data.
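One common way to use the data directly is a continuous, piecewise-linear empirical distribution sampled by inverse transform. The sketch below assumes that construction, with F at the i-th smallest observation set to (i - 1)/(n - 1). Other definitions exist, and the function names are illustrative.

```python
import random


def make_empirical_sampler(data, rng=None):
    """Build an inverse CDF and a sampler from raw observations."""
    xs = sorted(data)
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    rng = rng or random.Random()

    def inverse_cdf(u):
        # Locate the pair of order statistics that brackets u, then interpolate.
        p = (len(xs) - 1) * u
        i = min(int(p), len(xs) - 2)
        return xs[i] + (p - i) * (xs[i + 1] - xs[i])

    def sample():
        return inverse_cdf(rng.random())

    return inverse_cdf, sample


# Hypothetical observations.
inverse_cdf, sample = make_empirical_sampler([3.0, 1.0, 2.0, 5.0], random.Random(7))
draws = [sample() for _ in range(1000)]
```

A limitation of this construction is that it can never generate values outside the range of the observed data.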
17Identify the distribution with data
- Methods for selecting families of input distributions when data are available are discussed next.
- Activity I
- Hypothesizing families of distributions: decide what general families appear to be appropriate on the basis of their shapes, without worrying about the specific parameter values for these families.
18Identify the distribution (cont.)
- Histograms and line graphs: a frequency distribution or histogram is useful in identifying the shape of a distribution.
- A histogram is constructed as follows:
- Divide the range of the data (X1, X2, ..., Xn) into k disjoint adjacent intervals $[b_{j-1}, b_j)$ of equal width $\Delta b = b_j - b_{j-1}$, j = 1, 2, ..., k.
- Label the horizontal axis to conform to the intervals selected.
19Identify the distribution (cont.)
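- Determine the frequency of occurrences within each interval. Define the function
  $$h(x) = \begin{cases} 0 & \text{if } x < b_0 \\ h_j & \text{if } b_{j-1} \le x < b_j,\ j = 1, 2, \ldots, k \\ 0 & \text{if } b_k \le x \end{cases}$$
  where $h_j$ is the proportion of the observations that fall in the jth interval.
- Label the vertical axis so that the total occurrences can be plotted for each interval.
- Plot the frequencies on the vertical axis.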
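The construction steps above can be sketched as a small function. It assumes k equal-width intervals spanning the minimum to the maximum of the data, with the largest observation counted in the last interval. The function name and the data are illustrative.

```python
def histogram(data, k):
    """Return interval edges, counts, and proportions for k equal-width intervals."""
    b0, bk = min(data), max(data)
    if bk == b0:
        raise ValueError("all observations are equal; no range to divide")
    width = (bk - b0) / k
    counts = [0] * k
    for x in data:
        j = min(int((x - b0) / width), k - 1)  # put the maximum in the last interval
        counts[j] += 1
    edges = [b0 + j * width for j in range(k + 1)]
    proportions = [c / len(data) for c in counts]
    return edges, counts, proportions


# Hypothetical data with a long right tail.
observations = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts, proportions = histogram(observations, k=4)

# A crude text plot: one '*' per observation in each interval.
for j, c in enumerate(counts):
    print(f"[{edges[j]:.1f}, {edges[j + 1]:.1f}) {'*' * c}")
```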
20Identify the distribution (cont.)
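- The histogram is then compared with plots of the densities of various distributions on the basis of shape alone, because for some number y in $[b_{j-1}, b_j)$
  $$P(b_{j-1} \le X < b_j) = \int_{b_{j-1}}^{b_j} f(x)\,dx = \Delta b\, f(y)$$
  so the height of the histogram over each interval is roughly proportional to the density there.
- A rule is needed for choosing the number of intervals k.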
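The specific rule for choosing k is not preserved in the text. One widely quoted rule of thumb, offered here only as an assumption about what might be meant, is Sturges's rule, k = floor(1 + log2 n). In practice it is common to try several values of k and keep the one that gives the smoothest-looking histogram.

```python
import math


def sturges_intervals(n):
    """Sturges's rule of thumb for the number of histogram intervals (assumed rule)."""
    if n < 1:
        raise ValueError("need at least one observation")
    return int(math.floor(1 + math.log2(n)))


k_for_100 = sturges_intervals(100)  # 1 + log2(100) is about 7.64, so k = 7
```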
21Identify the distribution (cont.)
- Quantile summaries and box plots
- Useful for determining whether the underlying probability density is symmetric or skewed to the right or to the left.
- Suppose F(x) is continuous and strictly increasing where 0 < F(x) < 1.
- Then for 0 < q < 1, the q-quantile of F(x) is the number $x_q$ such that $F(x_q) = q$.
- In terms of the inverse of F(x), $x_q = F^{-1}(q)$.
- The median is $x_{0.5}$.
22Identify the distribution (cont.)
- The lower and upper quartiles are $x_{0.25}$ and $x_{0.75}$.
- The lower and upper octiles are $x_{0.125}$ and $x_{0.875}$.
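A quantile summary can be sketched with the standard library. `statistics.quantiles` with `n=8` returns the seven octile cut points, from which the median, quartiles, and octiles are read off. Its default interpolation method is one convention among several, so estimates may differ slightly from other definitions. The midpoint comparison at the end is the usual symmetry check: for a symmetric distribution the quartile and octile midpoints are close to the median.

```python
import statistics


def quantile_summary(data):
    """Median, quartiles, octiles, and their midpoints for a sample."""
    cuts = statistics.quantiles(data, n=8)  # estimates of x_0.125, x_0.25, ..., x_0.875
    lower_oct, lower_q, median, upper_q, upper_oct = (
        cuts[0], cuts[1], cuts[3], cuts[5], cuts[6]
    )
    return {
        "median": median,
        "quartiles": (lower_q, upper_q),
        "octiles": (lower_oct, upper_oct),
        "quartile_midpoint": (lower_q + upper_q) / 2,
        "octile_midpoint": (lower_oct + upper_oct) / 2,
    }


# Hypothetical symmetric sample: the integers 1 through 15.
summary = quantile_summary(list(range(1, 16)))
```

If the midpoints increase from median to quartiles to octiles, the sample suggests skewness to the right; if they decrease, skewness to the left.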
23Identify the distribution (cont.)
- Activity II
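- Estimation of parameters
- After a family of distributions has been selected, the next step is to estimate the parameters of the distribution.
- MLEs (maximum-likelihood estimators)
- The likelihood function for the unknown parameter θ, given observations X1, X2, ..., Xn with density $f_\theta$, is
  $$L(\theta) = f_\theta(X_1)\, f_\theta(X_2) \cdots f_\theta(X_n)$$
- The MLE $\hat{\theta}$ is the value of θ that maximizes $L(\theta)$.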
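As a worked instance, take the expo(β) family with β as the mean. The log-likelihood is ln L(β) = -n ln β - (X1 + ... + Xn)/β, and setting its derivative to zero gives the sample mean as the MLE. The sketch below computes that closed form and checks it against a grid of alternative values. The data and function names are illustrative.

```python
import math


def exp_log_likelihood(beta, data):
    """Log-likelihood of expo(beta), with beta the mean, for the given data."""
    return -len(data) * math.log(beta) - sum(data) / beta


def exp_mle(data):
    """Closed-form MLE of beta for the exponential family: the sample mean."""
    return sum(data) / len(data)


# Hypothetical interarrival times.
times = [1.2, 0.4, 2.7, 0.9, 1.8]
beta_hat = exp_mle(times)

# Numerical check: no value on a coarse grid beats the closed-form estimate.
grid = [0.1 * i for i in range(1, 51)]
best_on_grid = max(grid, key=lambda b: exp_log_likelihood(b, times))
```

For families without a closed-form MLE, the same log-likelihood function would be handed to a numerical optimizer instead.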
24MLE Properties
- For most common distributions, the MLE is unique ($L(\hat{\theta}) > L(\theta)$ for any other value of θ)
- MLEs need not be unbiased
- MLEs are invariant
- MLEs are asymptotically normally distributed
- MLEs are strongly consistent