A Refresher on Probability and Statistics - PowerPoint PPT Presentation

About This Presentation
Title: A Refresher on Probability and Statistics
Description: Appendix C -- A Refresher on Probability and Statistics. Author: Kelton/Sadowski/Sadowski. Last modified by: Administrator. Created Date: 6/23/2001 8:49:48 PM.

Slides: 59
Provided by: KeltonSad69
Transcript and Presenter's Notes

1
A Refresher on Probability and Statistics
2
What We'll Do ...
  • Ground-up review of probability and statistics
    necessary to do and understand simulation
  • Outline
  • Probability: basic ideas, terminology
  • Random variables, joint distributions
  • Sampling
  • Statistical inference: point estimation,
    confidence intervals, hypothesis testing

3
Monte Carlo Simulation
  • Monte Carlo method: probabilistic simulation
    technique used when a process has a random
    component
  • Identify a probability distribution
  • Set up intervals of random numbers to match the
    probability distribution
  • Obtain the random numbers
  • Interpret the results
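
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the demand distribution `pmf` and the helper names `build_intervals` and `draw` are hypothetical, not part of the slides.

```python
import random

# Step 1: a hypothetical probability distribution (illustrative values only).
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

def build_intervals(pmf):
    """Step 2: return (upper_bound, value) pairs that partition [0, 1)."""
    intervals, cumulative = [], 0.0
    for value, prob in pmf.items():
        cumulative += prob
        intervals.append((cumulative, value))
    return intervals

def draw(intervals, u):
    """Map a random number u in [0, 1) to the value whose interval contains it."""
    for upper, value in intervals:
        if u < upper:
            return value
    return intervals[-1][1]  # guard against floating-point round-off

# Step 3: obtain the random numbers (seeded so the run is repeatable).
rng = random.Random(42)
intervals = build_intervals(pmf)
sample = [draw(intervals, rng.random()) for _ in range(10_000)]

# Step 4: interpret the results, here as observed relative frequencies.
freq = {v: sample.count(v) / len(sample) for v in pmf}
```

With many draws, the entries of `freq` should land close to the probabilities in `pmf`.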

4
Probability Basics
  • Experiment: activity with uncertain outcome
  • Flip coins, throw dice, pick cards, draw balls
    from urn, ...
  • Drive to work tomorrow: Time? Accident?
  • Operate a (real) call center: Number of calls?
    Average customer hold time? Number of customers
    getting busy signal?
  • Simulate a call center: same questions as above
  • Sample space: complete list of all possible
    individual outcomes of an experiment
  • Could be easy or hard to characterize
  • May not be necessary to characterize

5
Probability Basics (contd.)
  • Event: a subset of the sample space
  • Describe by either listing outcomes, physical
    description, or mathematical description
  • Usually denote by E, F, G or E1, E2, etc.
  • Ex.: arrival of a customer, start of work on a job
  • Probability of an event is the relative
    likelihood that it will occur when you do the
    experiment
  • A real number between 0 and 1 (inclusive)
  • Denote by P(E), P(E ∩ F), etc.
  • Interpretation: proportion of time the event
    occurs in many independent repetitions
    (replications) of the experiment

6
Probability Basics (contd.)
  • Some properties of probabilities
  • If S is the sample space, then P(S) = 1
  • If Ø is the empty event (empty set), then P(Ø)
    = 0
  • If E^C is the complement of E, then P(E^C) = 1 -
    P(E)
  • P(E ∪ F) = P(E) + P(F) - P(E ∩ F)
  • If E and F are mutually exclusive (i.e., E ∩ F
    = Ø), then
  • P(E ∪ F) = P(E) + P(F)
  • If E is a subset of F (i.e., the occurrence of E
    implies the occurrence of F), then P(E) ≤ P(F)
  • If o1, o2, ... are the individual outcomes in the
    sample space, then $\sum_i P(o_i) = 1$
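
These properties can be checked by brute force on a small sample space. The sketch below assumes two fair dice with equally likely outcomes; the events `E` and `F` are illustrative choices, not from the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair dice (an illustrative choice).
S = set(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event (a subset of S) under equally likely outcomes."""
    return Fraction(len(event & S), len(S))

E = {o for o in S if o[0] + o[1] == 7}   # the sum is 7
F = {o for o in S if o[0] == 1}          # the first die shows 1

complement_E = S - E
union_lhs = P(E | F)                     # P(E or F), computed directly
union_rhs = P(E) + P(F) - P(E & F)       # the addition rule
```

Exact fractions are used so the two sides of the addition rule can be compared with `==` rather than with a floating-point tolerance.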

7
Probability Basics (contd.)
  • Conditional probability
  • Knowing that an event F occurred might affect the
    probability that another event E also occurred
  • Reduce the effective sample space from S to F,
    then measure size of E relative to its overlap
    (if any) in F, rather than relative to S
  • Definition (assuming P(F) ≠ 0):
    $P(E \mid F) = P(E \cap F) / P(F)$
  • E and F are independent if P(E ∩ F) = P(E) P(F)
  • Implies P(E|F) = P(E) and P(F|E) = P(F), i.e.,
    knowing that one event occurs tells you nothing
    about the other
  • If E and F are mutually exclusive, are they
    independent?
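
A small enumeration makes the definition and the closing question concrete. This is a sketch assuming two fair dice; the event names and the helpers `P_given` and `independent` are hypothetical.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))  # two fair dice, equally likely outcomes

def P(event):
    return Fraction(len(event), len(S))

def P_given(E, F):
    """P(E | F) = P(E and F) / P(F); requires P(F) != 0."""
    if not F:
        raise ValueError("conditioning event has probability 0")
    return P(E & F) / P(F)

def independent(E, F):
    """True when P(E and F) equals P(E) * P(F)."""
    return P(E & F) == P(E) * P(F)

sum_is_7 = {o for o in S if sum(o) == 7}
first_is_1 = {o for o in S if o[0] == 1}
sum_is_2 = {o for o in S if sum(o) == 2}

# "Sum is 7" and "sum is 2" cannot both occur, so they are mutually exclusive.
mutually_exclusive = not (sum_is_7 & sum_is_2)
```

Here `independent(sum_is_7, first_is_1)` holds, while `independent(sum_is_7, sum_is_2)` does not: when both events have positive probability, knowing that one occurred rules the other out, which is the opposite of telling you nothing.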

8
Random Variables
  • One way of quantifying, simplifying events and
    probabilities
  • A random variable (RV) is a number whose value is
    determined by the outcome of an experiment
  • Assigns value to each point in the sample space
  • Associates a number with each possible outcome of
    the experiment
  • Usually denoted as capital letters: X, Y, W1,
    W2, etc.
  • Probabilistic behavior described by distribution
    function

9
Discrete vs. Continuous RVs
  • Two basic flavors of RVs, used to represent or
    model different things
  • Discrete: can take on only certain separated
    values
  • Number of possible values could be finite or
    infinite
  • Continuous: can take on any real value in some
    range
  • Number of possible values is always infinite
  • Range could be bounded on both sides, just one
    side, or neither (-∞, +∞)

10
RV in Simulation
  • Input
  • Uncertain time duration (service or inter-arrival
    times)
  • Number of customers in an arriving group
  • Which of several part types a given arriving part
    is
  • Output
  • Average time in system
  • Number of customers served
  • Maximum length of buffer

11
Discrete Distributions
  • Let X be a discrete RV with possible values
    (range) x1, x2, ... (finite or infinite list)
  • Probability Mass Function (PMF)
  • p(xi) = P(X = xi) for i = 1, 2, ...
  • The statement "X = xi" is an event that may or
    may not happen, so it has a probability of
    happening, as measured by the PMF
  • Can express PMF as numerical list, table, graph,
    or formula
  • Since X must be equal to some xi, and since the
    xi's are all distinct, $\sum_i p(x_i) = 1$

12
Discrete Distributions (contd.)
  • Cumulative distribution function (CDF):
    probability that the RV will be ≤ a fixed value
    x, $F(x) = P(X \le x) = \sum_{x_i \le x} p(x_i)$
  • Properties of discrete CDFs
  • 0 ≤ F(x) ≤ 1 for all x
  • As x → -∞, F(x) → 0
  • As x → +∞, F(x) → 1
  • F(x) is nondecreasing in x
  • F(x) is a step function continuous from the right
    with jumps at the xi's of height equal to the PMF
    at that xi
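
A discrete CDF can be built from a PMF by running sums. The sketch below uses a hypothetical four-point PMF; `values`, `probs`, and `cdf` are illustrative names.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical PMF of a discrete RV X (illustrative values only).
values = [1, 2, 3, 4]            # possible values x1 < x2 < x3 < x4
probs = [0.1, 0.4, 0.3, 0.2]     # p(x1), ..., p(x4); they sum to 1
cum = list(accumulate(probs))    # running sums F(x1), F(x2), ...

def cdf(x):
    """F(x) = P(X <= x): the sum of p(xi) over all xi <= x."""
    k = bisect_right(values, x)  # number of possible values that are <= x
    return 0.0 if k == 0 else min(cum[k - 1], 1.0)
```

Because `bisect_right` counts values less than or equal to `x`, the function is flat between possible values and jumps exactly at each xi, which is the right-continuous step shape described above.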

13
Example of CDF
14
Example of CDF
15
Discrete Distributions (contd.)
  • Computing probabilities about a discrete RV:
    usually use the PMF
  • Add up p(xi) for those xi's satisfying the
    condition for the event
  • With discrete RVs, must be careful about weak vs.
    strong inequalities: endpoints matter!
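
To see why endpoints matter, compare a weak and a strict inequality on the same RV. This sketch assumes X is the number shown by one fair die; `prob` is a hypothetical helper.

```python
from fractions import Fraction

# Hypothetical PMF: X = number shown by one fair die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def prob(condition):
    """P(condition(X)): add up p(xi) over the xi that satisfy the condition."""
    return sum((p for x, p in pmf.items() if condition(x)), Fraction(0))

p_weak = prob(lambda x: 2 <= x <= 4)   # P(2 <= X <= 4): includes 2, 3, 4
p_strict = prob(lambda x: 2 < x < 4)   # P(2 < X < 4): includes only 3
```

The two answers are 1/2 and 1/6, so dropping the endpoints changes the probability by a factor of three.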

16
Discrete Expected Values
  • Data set has a "center": the average (mean)
  • RVs have a "center": expected value,
    $E(X) = \sum_i x_i \, p(x_i)$
  • Also called the mean or expectation of the RV X
  • Other common notation: μ, μX
  • Weighted average of the possible values xi, with
    weights being their probability (relative
    likelihood) of occurring
  • What expectation is not: the value of X you
    "expect" to get
  • E(X) might not even be among the possible values
    x1, x2, ...
  • What expectation is:
  • Repeat the experiment many times, observe many
    X1, X2, ..., Xn
  • E(X) is what the sample mean of X1, ..., Xn
    converges to (in a certain sense) as n → ∞
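
Both points (what expectation is and is not) show up with a fair die, used here as an assumed example. The function name `sample_mean` and the seed are illustrative.

```python
import random

# Fair die: possible values and their probabilities (illustrative example).
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

# E(X) as the probability-weighted average of the possible values.
expected = sum(x * p for x, p in zip(values, probs))

rng = random.Random(1)

def sample_mean(n):
    """Average of n independent observations X1, ..., Xn of the die."""
    draws = rng.choices(values, weights=probs, k=n)
    return sum(draws) / n

small_run = sample_mean(10)        # can be far from E(X)
large_run = sample_mean(100_000)   # should be close to E(X)
```

E(X) works out to 3.5, which is not a value the die can show, yet the average of many rolls settles near it.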

17
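Discrete Variances and Standard Deviations
  • Data set has measures of "dispersion"
  • Sample variance:
    $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$
  • Sample standard deviation: $s = \sqrt{s^2}$
  • RVs have corresponding measures:
    $\mathrm{Var}(X) = \sum_i (x_i - \mu)^2 \, p(x_i)$
  • Other common notation: $\sigma^2$, $\sigma_X^2$
  • Weighted average of squared deviations of the
    possible values xi from the mean
  • Standard deviation of X is $\sigma = \sqrt{\mathrm{Var}(X)}$
  • Interpretation analogous to that for E(X)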
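
As a short numerical check of the weighted-average-of-squared-deviations idea, again assuming a fair die as the RV:

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die (illustrative)

mu = sum(x * p for x, p in pmf.items())                  # E(X)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())     # Var(X) by definition
var_shortcut = sum(x * x * p for x, p in pmf.items()) - mu ** 2  # E(X^2) - mu^2
std = float(var) ** 0.5                                  # standard deviation
```

The definition and the common shortcut E(X^2) - μ^2 give the same exact value, 35/12.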

18
Continuous Distributions
  • Now let X be a continuous RV
  • Possibly limited to a range bounded on left or
    right or both
  • No matter how small the range, the number of
    possible values for X is always (uncountably)
    infinite
  • Not sensible to ask about P(X = x) even if x is
    in the possible range
  • Technically, P(X = x) is always 0
  • Instead, describe behavior of X in terms of its
    falling between two values

19
Continuous Distributions (contd.)
  • Probability density function (PDF) is a function
    f(x) with the following three properties:
  • f(x) ≥ 0 for all real values x
  • The total area under f(x) is 1:
    $\int_{-\infty}^{\infty} f(x)\,dx = 1$
  • For any fixed a and b with a ≤ b, the probability
    that X will fall between a and b is the area
    under f(x) between a and b:
    $P(a \le X \le b) = \int_a^b f(x)\,dx$

20
CDF and PDF
21
Continuous Distributions (contd.)
  • Cumulative distribution function (CDF):
    probability that the RV will be ≤ a
    fixed value x,
    $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
  • Properties of continuous CDFs
  • 0 ≤ F(x) ≤ 1 for all x
  • As x → -∞, F(x) → 0
  • As x → +∞, F(x) → 1
  • F(x) is nondecreasing in x
  • F(x) is a continuous function with slope equal to
    the PDF
  • f(x) = F'(x)
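
The link between PDF and CDF can be checked numerically. The sketch below assumes an exponential distribution with a hypothetical rate of 0.5; the trapezoidal helper `area` is an illustrative stand-in for "area under f(x)".

```python
from math import exp

RATE = 0.5  # hypothetical rate parameter of an exponential distribution

def pdf(x):
    """f(x) = RATE * exp(-RATE * x) for x >= 0, and 0 otherwise."""
    return RATE * exp(-RATE * x) if x >= 0 else 0.0

def cdf(x):
    """F(x) = 1 - exp(-RATE * x) for x >= 0, and 0 otherwise."""
    return 1.0 - exp(-RATE * x) if x >= 0 else 0.0

def area(f, a, b, steps=10_000):
    """Trapezoidal-rule approximation of the integral of f from a to b."""
    h = (b - a) / steps
    inner = sum(f(a + i * h) for i in range(1, steps))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))

prob_1_to_3 = area(pdf, 1.0, 3.0)                          # P(1 <= X <= 3) as an area
slope_at_2 = (cdf(2.0 + 1e-6) - cdf(2.0 - 1e-6)) / 2e-6    # numerical F'(2)
```

The area between 1 and 3 should agree with F(3) - F(1), and the numerical slope of F at 2 should agree with f(2).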

22
Continuous Expected Values, Variances, and
Standard Deviations
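  • Expectation or mean of X is
    $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
  • Roughly, a weighted "continuous" average of
    possible values for X
  • Same interpretation as in discrete case: average
    of a large number (infinite) of observations on
    the RV X
  • Variance of X is
    $\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$
  • Standard deviation of X is $\sigma = \sqrt{\mathrm{Var}(X)}$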
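
As a worked instance of these integrals, assume X is uniform on [a, b], so that f(x) = 1/(b - a) on that range and 0 elsewhere. The uniform choice is an illustration, not taken from the slides.

```latex
\begin{align*}
E(X) &= \int_a^b x \,\frac{1}{b-a}\,dx
      = \frac{b^2 - a^2}{2(b-a)}
      = \frac{a+b}{2} \\
\mathrm{Var}(X) &= \int_a^b \left(x - \frac{a+b}{2}\right)^2 \frac{1}{b-a}\,dx
      = \frac{(b-a)^2}{12} \\
\sigma &= \sqrt{\mathrm{Var}(X)} = \frac{b-a}{\sqrt{12}}
\end{align*}
```

For example, with a = 0 and b = 10 the mean is 5 and the standard deviation is about 2.89.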

23
Joint Distributions
  • So far: looked at only one RV at a time
  • But they can come up in pairs, triples, ...,
    tuples, forming jointly distributed RVs or random
    vectors
  • Input: (T, P, S) = (type of part, priority,
    service time)
  • Output: W1, W2, W3, ... = output process of
    times in system of exiting parts
  • One central issue is whether the individual RVs
    are independent of each other or related
  • Will take the special case of a pair of RVs (X1,
    X2)
  • Extends naturally (but messily) to higher
    dimensions

24
Joint Distributions (contd.)
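  • Joint CDF of (X1, X2) is a function of two
    variables:
    $F(x_1, x_2) = P(X_1 \le x_1, X_2 \le x_2)$
  • Same definition for discrete and continuous
  • If both RVs are discrete, define the joint PMF
    $p(x_1, x_2) = P(X_1 = x_1, X_2 = x_2)$
  • If both RVs are continuous, define the joint PDF
    f(x1, x2) as a nonnegative function with total
    volume below it equal to 1, and
    $P(a_1 \le X_1 \le b_1, a_2 \le X_2 \le b_2) = \int_{a_1}^{b_1} \int_{a_2}^{b_2} f(x_1, x_2)\,dx_2\,dx_1$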

25
Covariance Between RVs
  • Measures linear relation between X1 and X2
  • Covariance between X1 and X2 is
    $\mathrm{Cov}(X_1, X_2) = E[(X_1 - E(X_1))(X_2 - E(X_2))]$
  • Covariance tells us whether the two random
    variables are linearly related and, if they are,
    whether the relationship is positive or negative.
  • Interpreting the value of covariance is difficult
    since it depends on units of measurement

26
Correlation Between RVs
  • Correlation (coefficient) between X1 and X2 is
    $\mathrm{Cor}(X_1, X_2) = \mathrm{Cov}(X_1, X_2) / (\sigma_{X_1} \sigma_{X_2})$
  • Always between -1 and 1
  • Ex.: Correlation of 0.85 means strong
    relationship, 0.10 means weak.
  • Cor(X, Y) > 0 means positive correlation
  • X and Y move in the same direction (X ↑, Y ↑)
  • Cor(X, Y) = 0 means no correlation
  • Cor(X, Y) < 0 means negative correlation: X ↑ and Y ↓
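
The unit dependence of covariance, and the unit independence of correlation, can be seen on a small data set. This sketch uses the sample versions (divisor n - 1) as stand-ins for the population quantities on the slides, and the data values are made up for illustration.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance with divisor n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Sample correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical paired observations (illustrative values only).
minutes = [1.0, 2.0, 3.0, 4.0, 5.0]
output = [2.1, 3.9, 6.2, 8.1, 9.8]
seconds = [60 * m for m in minutes]   # same data, different time unit

cov_minutes = covariance(minutes, output)
cov_seconds = covariance(seconds, output)   # 60 times larger
cor_minutes = correlation(minutes, output)
cor_seconds = correlation(seconds, output)  # unchanged by the unit change
```

Changing minutes to seconds multiplies the covariance by 60 but leaves the correlation where it was, which is why correlation is the easier number to interpret.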

27
Independent RVs
  • X1 and X2 are independent if their joint CDF
    factors into the product of their marginal CDFs:
    $F(x_1, x_2) = F_{X_1}(x_1)\,F_{X_2}(x_2)$ for all $x_1, x_2$
  • Equivalent to use PMF or PDF instead of CDF
  • Properties of independent RVs
  • They have nothing (linearly) to do with each
    other
  • Independence ⇒ uncorrelated
  • But not vice versa, unless the RVs have a joint
    normal distribution
  • Tempting just to assume it whether justified or
    not
  • Independence in simulation
  • Input: Usually assume separate inputs are indep.
    Valid?
  • Output: Standard statistics assumes indep.
    Valid?!?!?!?
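
The "not vice versa" point can be illustrated with a standard textbook-style example, which is an assumption here rather than something from the slides: X equally likely to be -1, 0, or 1, and Y = X squared.

```python
from fractions import Fraction

third = Fraction(1, 3)
# Joint PMF of (X, Y) with Y = X**2 and X uniform on {-1, 0, 1}.
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

def E(g):
    """Expected value of g(X, Y) under the joint PMF."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

# Cov(X, Y) = E(XY) - E(X) E(Y)
cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)

def marginal(index, value):
    """Marginal probability that coordinate `index` equals `value`."""
    return sum(p for pair, p in joint.items() if pair[index] == value)

p_joint = joint.get((0, 1), Fraction(0))        # P(X = 0, Y = 1)
p_product = marginal(0, 0) * marginal(1, 1)     # P(X = 0) * P(Y = 1)
```

The covariance is exactly 0, so X and Y are uncorrelated, but the joint PMF does not factor (0 versus 2/9), so they are not independent: Y is completely determined by X.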

28
Sampling
  • Statistical analysis: estimate or infer
    something about a population or process based on
    only a sample from it
  • Think of a RV with a distribution governing the
    population
  • Random sample is a set of independent and
    identically distributed (IID) observations X1,
    X2, ..., Xn on this RV
  • In simulation, sampling is making some runs of
    the model and collecting the output data
  • Don't know parameters of population (or
    distribution) and want to estimate them or infer
    something about them based on the sample

29
Sampling (contd.)
  • Population parameter
  • Population mean μ = E(X)
  • Population variance σ²
  • Population proportion
  • Parameter: need to know whole population
  • Fixed (but unknown)
  • Sample estimate
  • Sample mean
  • Sample variance
  • Sample proportion
  • Sample statistic: can be computed from a sample
  • Varies from one sample to another: is a RV
    itself, and has a distribution, called the
    sampling distribution
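
The idea that a sample statistic is itself a random variable can be seen by drawing many samples and looking at how the sample mean varies. The population below (exponential with mean 2) and the sample sizes are assumptions chosen for illustration.

```python
import random
from statistics import mean, variance

rng = random.Random(7)
n = 25               # observations per sample
replications = 4000  # number of independent samples

# Hypothetical population: exponential with mean 2, so its variance is 4.
sample_means = [
    mean(rng.expovariate(0.5) for _ in range(n))
    for _ in range(replications)
]

center = mean(sample_means)      # should be near the population mean, 2
spread = variance(sample_means)  # should be near sigma^2 / n = 4 / 25 = 0.16
```

Each entry of `sample_means` is one realization of the sample mean; together they approximate its sampling distribution.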

30
Point Estimation
  • A sample statistic that estimates (in some sense)
    a population parameter
  • Properties
  • Unbiased: E(estimate) = parameter
  • Efficient: Var(estimate) is lowest among
    competing point estimators
  • Consistent: Var(estimate) decreases (usually to
    0) as the sample size increases
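
Unbiasedness can be checked exactly on a tiny case by listing every possible sample. This sketch assumes a fair coin coded 0/1 (population variance 1/4) and samples of size 2; it compares the variance estimator with divisor n - 1 against the one with divisor n.

```python
from fractions import Fraction
from itertools import product

population = [0, 1]   # fair coin coded 0/1, so sigma^2 = 1/4
n = 2
samples = list(product(population, repeat=n))   # all equally likely samples

def var_estimate(sample, divisor):
    """Sum of squared deviations from the sample mean, over the given divisor."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / divisor

# E(estimate): average the estimate over every possible sample.
expected_unbiased = sum(var_estimate(s, n - 1) for s in samples) / len(samples)
expected_biased = sum(var_estimate(s, n) for s in samples) / len(samples)
```

With divisor n - 1 the expected estimate equals 1/4, the true variance; with divisor n it is 1/8, so that estimator is biased low.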

31
Confidence Intervals
  • A point estimator is just a single number, with
    some uncertainty or variability associated with
    it
  • Confidence interval quantifies the likely
    imprecision in a point estimator
  • An interval that contains (covers) the unknown
    population parameter with specified (high)
    probability 1 - α
  • Called a 100(1 - α)% confidence interval for the
    parameter
  • Confidence interval for the population mean μ:
    $\bar{X} \pm t_{n-1,\,1-\alpha/2} \sqrt{s^2 / n}$
  • CIs for some other parameters in textbook
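
The "covers with probability 1 - α" statement can be checked by simulation. The sketch below assumes normally distributed data with a hypothetical true mean of 5, samples of size 10, and the tabulated t value 1.833 for 9 degrees of freedom (90% two-sided interval).

```python
import random
from math import sqrt
from statistics import mean, variance

T_9_095 = 1.833   # t table value: 9 degrees of freedom, 0.95 quantile
TRUE_MEAN = 5.0   # hypothetical population mean, known only to the experiment

def confidence_interval(xs, t_crit):
    """Return (low, high) = sample mean -/+ t_crit * sqrt(s^2 / n)."""
    half_width = t_crit * sqrt(variance(xs) / len(xs))
    center = mean(xs)
    return center - half_width, center + half_width

rng = random.Random(123)
trials, covered = 2000, 0
for _ in range(trials):
    xs = [rng.gauss(TRUE_MEAN, 2.0) for _ in range(10)]
    low, high = confidence_interval(xs, T_9_095)
    covered += low <= TRUE_MEAN <= high   # True counts as 1

coverage = covered / trials   # fraction of intervals that contain the true mean
```

Roughly 90% of the intervals should contain the true mean; any single interval either does or does not.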

32
Confidence Intervals in Simulation
  • Run simulations, get results
  • View each replication of the simulation as a data
    point
  • Random input ⇒ random output
  • Form a confidence interval
  • Brackets (with probability 1 - α) the "true"
    expected output (what you'd get by averaging an
    infinite number of replications)

33
Example
  • Data: 1.2, 1.5, 1.68, 1.89, 0.95, 1.49, 1.58,
    1.55, 0.50, 1.09
  • Calculate the 90% confidence interval
  • Sample mean = 1.34
  • Sample variance s² = 0.17
  • 90% confidence interval means α = 1 - 0.90 = 0.1
  • Degrees of freedom = n - 1 = 10 - 1 = 9
  • 1.34 ± t9,0.95 √(0.17 / 10). Look in the t
    distribution table for t9,0.95 = 1.83
  • 1.34 ± 1.83 √(0.17 / 10) = 1.34 ± 0.24
  • ⇒ Confidence interval: [1.10, 1.58]
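
The same calculation can be reproduced in code. This sketch takes the critical value from a t table, as the slide does, rather than computing it; the function name is illustrative.

```python
from math import sqrt
from statistics import mean, variance

data = [1.2, 1.5, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.50, 1.09]
T_9_095 = 1.833   # t table value for 9 degrees of freedom, 0.95 quantile

def confidence_interval(xs, t_crit):
    """Return (low, high) = sample mean -/+ t_crit * sqrt(s^2 / n)."""
    n = len(xs)
    half_width = t_crit * sqrt(variance(xs) / n)   # variance() divides by n - 1
    center = mean(xs)
    return center - half_width, center + half_width

x_bar = mean(data)        # about 1.343
s2 = variance(data)       # about 0.1675
low, high = confidence_interval(data, T_9_095)
```

Carrying full precision gives roughly [1.106, 1.580]; the slide's lower limit of 1.10 comes from rounding the mean and half-width to two decimals before subtracting.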

34
Hypothesis Tests
  • Test some assertion about the population or its
    parameters
  • Null hypothesis (H0): what is to be tested
  • Alternate hypothesis (H1 or HA): denial of H0
  • H0: μ = 6 vs. H1: μ ≠ 6
  • H0: σ < 10 vs. H1: σ ≥ 10
  • H0: μ1 = μ2 vs. H1: μ1 ≠ μ2
  • Develop a decision rule to decide on H0 or H1
    based on sample data

35
Errors in Hypothesis Testing
  • Type-I error (rejecting H0 when it is true) is
    often called the producer's risk
  • The probability of a type-I error is the level of
    significance of the hypothesis test and is
    denoted by α
  • Type-II error (not rejecting H0 when it is false)
    is often called the consumer's risk, for not
    rejecting a possibly worthless product
  • The probability of a type-II error is denoted by
    β. The quantity 1 - β is known as the power of
    the test
  • H0 and H1 are not given equal treatment. Benefit
    of doubt is given to H0

36
p-Values for Hypothesis Tests
  • Traditional method is "Accept" or "Reject" H0
  • Alternate method: compute p-value of the test
  • p-value = probability, computed assuming H0 is
    true, of getting a test result more in favor of
    H1 than what you got from your sample
  • Small p (< 0.01) is convincing evidence against
    H0
  • Large p (> 0.10) indicates lack of evidence
    against H0
  • Connection to traditional method
  • If p < α, reject H0
  • If p ≥ α, do not reject H0
  • p-value quantifies confidence about the decision
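
A minimal sketch of the p-value rule, for H0: μ = μ0 against a two-sided H1. To stay within the standard library it assumes the population standard deviation is known (a z-test), which is a simplification and not the slides' method; the data and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def two_sided_p_value(xs, mu0, sigma):
    """P-value for H0: mu = mu0 vs. H1: mu != mu0, with sigma assumed known."""
    n = len(xs)
    z = (mean(xs) - mu0) / (sigma / sqrt(n))       # standardized test statistic
    return 2 * (1 - NormalDist().cdf(abs(z)))      # both tails beyond |z|

def decide(p, alpha=0.05):
    """Connection to the traditional method: reject H0 when p < alpha."""
    return "reject H0" if p < alpha else "do not reject H0"

# Hypothetical samples for H0: mu = 6, with sigma assumed to be 1.
close_to_h0 = [5.8, 6.3, 6.1, 5.9, 6.0, 6.2, 5.7, 6.0]
far_from_h0 = [7.1, 6.9, 7.3, 6.8, 7.0, 7.2, 6.7, 7.0]

p_close = two_sided_p_value(close_to_h0, 6.0, 1.0)
p_far = two_sided_p_value(far_from_h0, 6.0, 1.0)
```

The first sample has mean 6.0, so its p-value is 1 and H0 is not rejected; the second has mean 7.0, giving a p-value well under 0.01.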

37
Goodness-of-fit Test
  • Chi-square test
  • Kolmogorov-Smirnov test
  • Both tests ask how close the fitted distribution
    is to the empirical distribution defined directly
    by the data
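
Both tests reduce "how close" to a single number. The sketch below computes only the two test statistics in their usual textbook forms; turning them into a decision needs critical values or p-values from tables or a statistics package, which is not shown. Function names and the toy inputs are illustrative.

```python
def chi_square_statistic(observed, expected):
    """Sum over intervals of (observed - expected)^2 / expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def ks_statistic(data, fitted_cdf):
    """Largest vertical gap between the empirical CDF and the fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    return max(
        max((i + 1) / n - fitted_cdf(x),   # gap just after the jump at x
            fitted_cdf(x) - i / n)         # gap just before the jump at x
        for i, x in enumerate(xs)
    )

def uniform_cdf(x):
    """CDF of a uniform(0, 1) distribution, used as the fitted distribution."""
    return min(max(x, 0.0), 1.0)

chi2 = chi_square_statistic([8, 12], [10, 10])          # two intervals
ks = ks_statistic([0.25, 0.5, 0.75], uniform_cdf)       # three data points
```

Larger values of either statistic mean a poorer fit between the fitted distribution and the data.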

38
Hypothesis Testing in Simulation
  • Input side
  • Specify input distributions to drive the
    simulation
  • Collect real-world data on corresponding
    processes
  • Fit a probability distribution to the observed
    real-world data
  • Test H0: the data are well represented by the
    fitted distribution
  • Output side
  • Have two or more competing designs modeled
  • Test H0: all designs perform the same on output,
    or test H0: one design is better than another

39
Case Study
40
Case Study: Printed Circuit Assembly Manufacturing
  • The company, engaged in electronic assembly
    contract manufacturing, wants to achieve the
    following goals:
  • Maximize equipment utilization
  • Minimize machine downtime
  • Increase inventory control accuracy
  • Provide material traceability
  • Minimize time and resources spent looking for
    materials and tools on the shop-floor

41
Electronics Assembly
  • Surface Mount Technology (SMT) or Pin
    Through-Hole (PTH) is used to place components
    on bare boards
  • An SMT assembly line typically includes
  • Screen printer - to apply solder paste on the
    bare board
  • High-speed placement machine - typically for
    chips
  • Fine-pitch placement machine - typically for
    larger components
  • Oven - to bake the board after components are
    placed
  • The company has 3 assembly lines

42
Typical Reasons for Assembly Line Down Time
  • Poor line balance and flexibility
  • Poor machine balance within assembly lines
  • Large number of setups and total setup time
  • Part shortage during the run
  • Feeder problems
  • Long reel changeovers
  • Operator is not attending the machine
  • Setup kit is not delivered on time
  • Placing wrong parts
  • Component data problems
  • Process control: 1st piece inspection
  • Operator waiting for support
  • Machine program changeover time

43
Real-Time Performance Monitoring
44
Machine Utilization
45
Machine Utilization
46
Assembly Line Performance Metrics
  • Assembly efficiency - the ratio (as a
    percentage) of the desired assembly time to
    the actual assembly time required to complete a
    board: (desired time / actual time) × 100; target
    95-100%
  • Minimum cycle time - the largest machine
    operation time within the assembly line
  • Average cycle time - the average time until a
    board is completed, i.e., until the last
    operation is completed
  • The average number of boards in the queue
    between two placement machines
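
The first two metrics are simple enough to sketch directly. The machine names and times below are hypothetical values for illustration, not data from the case study.

```python
def assembly_efficiency(desired_time, actual_time):
    """(desired time / actual time) * 100, as a percentage."""
    return desired_time / actual_time * 100

def minimum_cycle_time(machine_times):
    """The largest machine operation time in the line (the bottleneck)."""
    return max(machine_times.values())

# Hypothetical per-board operation times in seconds for one SMT line.
line_times = {
    "screen_printer": 20.0,
    "high_speed_placement": 45.0,
    "fine_pitch_placement": 38.0,
    "oven": 30.0,
}

efficiency = assembly_efficiency(desired_time=57.0, actual_time=60.0)
bottleneck = minimum_cycle_time(line_times)
```

With these made-up numbers the efficiency is about 95%, at the low end of the stated target, and the line cannot complete boards faster than one every 45 seconds.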

47
A Guided Tour Through Arena
48
Flowchart and Spreadsheet Views
  • Model window split into two views
  • Flowchart view
  • Graphics
  • Process flowchart
  • Animation, drawing
  • Edit things by double-clicking on them, get into
    a dialog
  • Spreadsheet view
  • Displays model data directly
  • Can edit, add, delete data in spreadsheet view
  • Displays all similar kinds of modeling elements
    at once
  • Many model parameters can be edited in either
    view
  • Horizontal splitter bar to apportion the two
    views
  • View/Split Screen to see only the most recently
    selected view

49
Modules
  • Basic building blocks of a simulation model
  • Two basic types: flowchart and data
  • Different types of modules for different actions,
    specifications
  • Blank modules are on the Project Bar
  • To add a flowchart module to your model, drag it
    from the Project Bar into the flowchart view of
    the model window
  • To use a data module, select it (single-click) in
    the Project Bar and edit in the spreadsheet view
    of the model window

50
Relations Among Modules
  • Flowchart and data modules are related via names
    for objects
  • Queues, Resources, Entity types, Variables,
    others
  • Arena keeps internal lists of different kinds of
    names
  • Presents existing lists to you where appropriate
  • Helps you remember names, protects you from typos
  • All names you make up in a model must be unique
    across the model, even across different types of
    modules

51
Create Module
52
Process Module
53
Queue-Length Plot
54
Dispose Module
55
Setting the Run Conditions
  • Run/Setup menu dialog: five tabs
  • Project Parameters: Title, your name, output
    statistics
  • Replication Parameters: Number of Replications,
    Length of Replication (and Time Units), Base Time
    Units (output measures, internal computations),
    Warm-up Period (when statistics are cleared),
    Terminating Condition (complex stopping rules),
    Initialization options Between Replications
  • Other three tabs specify animation speed, run
    conditions, and reporting preferences
  • Terminating your simulation
  • You must specify: part of modeling
  • Arena has no default termination
  • If you don't specify termination, Arena will
    usually keep running forever

56
Viewing the Reports
  • Click Yes in the Arena box at the end of the run
  • Opens up a new reports window (separate from
    model window) inside the Arena window
  • Project Bar shows Reports panel, with different
    reports (each one would be a new window)
  • Remember to close all reports windows before
    future runs
  • Default installation shows Category Overview
    report: summarizes many things about the run
  • Reports have pages to browse. Also, table of
    contents tree at left for quick jumps via
  • Times are in Base Time Units for the model

57
Types of Statistics Reported
  • Many output statistics are one of three types
  • Tally: avg., max, min of a discrete list of
    numbers
  • Used for discrete-time output processes like
    waiting times in queue, total times in system
  • Time-persistent: time-average, max, min of a
    plot of something where the x-axis is continuous
    time
  • Used for continuous-time output processes like
    queue lengths, WIP, server-busy functions (for
    utilizations)
  • Counter: accumulated sums of something, usually
    just nose counts of how many times something
    happened
  • Often used to count entities passing through a
    point in the model
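
The difference between a tally average and a time-persistent average can be sketched as follows. This is a generic illustration, not Arena's internal code; the waiting times and queue-length history are made up.

```python
def tally_average(observations):
    """Plain average of a discrete list of numbers (e.g., waiting times)."""
    return sum(observations) / len(observations)

def time_average(change_points, end_time):
    """Time-weighted average of a piecewise-constant curve.

    change_points: (time, new_value) pairs sorted by time; the value holds
    until the next change point, and the last value holds until end_time.
    """
    area = 0.0
    following = change_points[1:] + [(end_time, None)]
    for (t, value), (t_next, _) in zip(change_points, following):
        area += value * (t_next - t)   # rectangle under the curve
    return area / (end_time - change_points[0][0])

# Hypothetical outputs from one run.
waits = [1.0, 3.0, 2.0]                          # one number per entity
queue_length = [(0, 0), (2, 1), (5, 2), (6, 0)]  # (time, length after change)

avg_wait = tally_average(waits)
avg_queue = time_average(queue_length, end_time=10)
counter = len(waits)   # a counter statistic: how many entities were observed
```

The queue-length average weights each level by how long it lasted (area 5 over 10 time units, so 0.5), whereas the tally average weights every observation equally.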

58
Homework 2
  • Work as a team of 2.
  • Problem 1: Question C4 from Appendix C
  • Problem 2: Question 3.6
  • Due 9/9/03.
  • Electronic submission