Transcript: Probability Distributions continued... (Module 2b)
1
Probability Distributions continued...
  • Module 2b

2
Hypergeometric Distribution
  • Sampling without replacement - consider a batch
    of 20 microwave modules, of which 3 are
    defective.
  • We sample and test one module first, and find it
    is defective.
  • at this time, there was a 3/20 chance of
    obtaining a defect
  • We sample a second time, without replacing the
    module
  • this time, there is a 2/19 chance of obtaining a
    defect
  • the outcome of the second trial is no longer
    independent of the first trial
  • probability of success/failure changes with each
    trial

3
Hypergeometric Distribution
  • Modeling this situation
  • back away from looking at probability for each
    trial, and return to a counting approach
  • suppose we have a total of N objects in the
    batch, of which d are defective
  • we take samples of n objects, and we want to know
    the probability of x of them being defective
  • there are NCn ways of taking the sample of n
    objects
  • within the sample, there are N-dCn-x ways of
    choosing the n-x non-defective objects
  • within the sample, there are dCx ways of choosing
    the defective objects

4
Hypergeometric Distribution
  • the total number of ways of obtaining a sample
    with x defective objects is N-dCn-x · dCx
  • using the counting approach, the probability of
    obtaining x defects is n(E)/n(S), where n(E) is
    the number of outcomes in the event obtain x
    defects
  • hypergeometric probability function

p_X(x) = \frac{\binom{d}{x}\,\binom{N-d}{n-x}}{\binom{N}{n}}
5
Hypergeometric Distribution
  • Example
  • given a batch of 200 dashboard components, of
    which 10% (i.e., 20) are typically defective
  • we take a sample of 10 components and test
    without replacement
  • what is the probability of 3 defective
    components?
  • probability of 0 defects is 0.34

p_X(3) = \frac{\binom{20}{3}\,\binom{180}{7}}{\binom{200}{10}} \approx 0.055
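A quick numerical check of this example; a minimal sketch using only Python's standard library (math.comb requires Python 3.8+):

    from math import comb

    # hypergeometric probability: N objects, d defective, sample of n, x defective in sample
    def hypergeom_pmf(x, N, d, n):
        return comb(d, x) * comb(N - d, n - x) / comb(N, n)

    N, d, n = 200, 20, 10              # batch of 200, 20 defective, sample of 10
    print(hypergeom_pmf(3, N, d, n))   # approx. 0.055
    print(hypergeom_pmf(0, N, d, n))   # approx. 0.34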
6
Poisson Distribution
  • used when considering discrete occurrences in a
    continuous interval
  • e.g., number of auto accidents in a 100 km
    stretch of road
  • e.g., number of breakages in a length of yarn
  • obtained via a Binomial distribution argument, in
    which the number of trials is very large
  • key assumption - independence in the interval
    (think of infinitely many trials) - occurrences
    are statistically independent

7
Poisson Distribution
  • link to Binomial Distribution
  • consider continuous interval divided into
    sub-intervals
  • each sub-interval can be considered as a trial
  • assumption of independence is used here
  • take limit as number of sub-intervals goes to
    infinity, and obtain Poisson distribution

8
Poisson Distribution
  • probability function - probability of k
    occurrences in the interval
  • λ - average number of occurrences in the interval
    (parameter)
  • we can also fix the distribution in terms of α -
    the average number of occurrences per unit
    interval - we then have λ = αt, where t is the
    length of the interval

P(k) = \frac{\lambda^k e^{-\lambda}}{k!}

P(k) = \frac{(\alpha t)^k e^{-\alpha t}}{k!}
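A minimal sketch of this probability function in Python, showing both parameterizations (λ for the whole interval, or rate α per unit interval with λ = αt):

    from math import exp, factorial

    def poisson_pmf(k, lam):
        # P(k occurrences) when the average count over the interval is lam
        return lam**k * exp(-lam) / factorial(k)

    def poisson_pmf_rate(k, alpha, t):
        # same distribution, parameterized by rate alpha per unit interval and interval length t
        return poisson_pmf(k, alpha * t)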
9
Poisson Distribution
  • Mean
  • we identified λ as the average number of
    occurrences in the interval

\mu = E[X] = \lambda

  • Variance

\sigma^2 = E[(X - \mu)^2] = \lambda

Note - the average number of occurrences in the
interval, λ, can be estimated from observations
10
Poisson Distribution
  • Additional Notes
  • Poisson distribution can be used to approximate
    Binomial distribution, when number of independent
    trials is very large, and p is very small (i.e.,
    np is relatively constant)
  • use λ = np (compared numerically in the sketch below)
  • why is the approximation necessary? - if number
    of trials n is 100, we will have a term 100! in
    the Binomial probability function - difficult to
    compute on a calculator
  • e.g., for n > 20, p < 0.05 - approximation is
    good
  • e.g., for n > 100, p < 0.01 - approximation is
    very good
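A small comparison illustrating the rules of thumb above; a sketch assuming SciPy is available:

    from scipy.stats import binom, poisson

    for n, p in [(20, 0.05), (100, 0.01)]:
        lam = n * p                       # Poisson parameter, lambda = n*p
        for k in range(4):
            exact = binom.pmf(k, n, p)    # exact Binomial probability
            approx = poisson.pmf(k, lam)  # Poisson approximation
            print(f"n={n}, p={p}, k={k}: binomial={exact:.4f}, poisson={approx:.4f}")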

11
Poisson Distribution - Example
  • Consider a 100 km section of the 401, in which
    the discrete occurrence is an accident. The
    average number of accidents in the 100 km stretch
    (monthly) is 15.
  • What is the probability of
  • a) 0 accidents occurring
  • b) 10 accidents occurring
  • c) 15 accidents occurring
  • in this stretch?

12
Poisson Distribution - Example
  • Note
  • discrete occurrences (accidents) in continuous
    interval (distance)
  • λ = 15
  • a) soln -

P(0) = \frac{15^0 e^{-15}}{0!} = e^{-15} \approx 3.06 \times 10^{-7}

    (virtually no chance of no accidents occurring)
  • b) soln -

P(10) = \frac{15^{10} e^{-15}}{10!} \approx 0.05

  • c) soln - P(15) ≈ 0.10
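These answers can be verified directly from the probability function; a sketch using only the standard library:

    from math import exp, factorial

    lam = 15   # average number of accidents per month in the 100 km stretch

    def poisson_pmf(k, lam):
        return lam**k * exp(-lam) / factorial(k)

    print(poisson_pmf(0, lam))    # approx. 3.06e-07
    print(poisson_pmf(10, lam))   # approx. 0.049
    print(poisson_pmf(15, lam))   # approx. 0.102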
13
Continuous Random Variables
14
Continuous Random Variables
  • take values on the real line
  • e.g., temperature, pressure, composition, density
  • Can we define a probability function for a
    continuous random variable?
  • First pass - follow the discrete case
  • have a function pX(x) that assigns a probability
    that X = x
  • problem - we have infinitely many values of x -
    we can't assign small enough probabilities so
    that they sum to 1 over the entire sample space
  • effectively, P(X = x) = 0 - probability of a single
    value is 0
  • (hand-waving - think of the counting approach to
    computing probabilities)

This doesn't work!
15
Probability Density Function
  • Instead, consider a probability density function
    fX(x)
  • Interpretation
  • fX(x) gives us the probability that the values
    lie in an infinitesimally small neighbourhood
    around x - intuitive - not strictly rigorous
  • fX(x) - represents frequency of occurrence, and
    can be considered as the continuous histogram
  • restrictions on fX(x) follow from the
    restrictions on probability - fX(x) ≥ 0, and


i.e., P(S) = 1 - something must happen

\int_{-\infty}^{\infty} f_X(x)\,dx = 1
16
Probability Density Function
  • Example - Normal probability density function

- the familiar bell-shaped curve
f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
17
Cumulative Distribution Function
  • What is P(X < t)?
  • (t is some number of interest)
  • e.g., P(Temperature < 350)
  • the event of interest is those values of
    temperature less than 350 C - sum of
    probabilities of outcomes in this event
  • sum becomes integral in the continuous case

Cumulative Distribution Function (also known as
cumulative density function)
F_X(t) = \int_{-\infty}^{t} f_X(x)\,dx
18
Expected Value
  • We can also define the expected value operation
    in a manner analogous to the discrete case
  • weighting is performed by the probability density
    function
  • summation is replaced by integral because we're
    dealing with a continuum of values



E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx

The mean is \mu = E[X], as in the discrete case.

19
Variance
  • is defined using the expected value
  • Standard deviation is the square root of
    variance.


\sigma^2 = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2\,f_X(x)\,dx = E[X^2] - \mu^2

20
Expected Values
  • can be taken of a general function of a random
    variable
  • Note - the expected value is a linear operation
    (as in the discrete case)
  • additivity -- E(X1 + X2) = E(X1) + E(X2)
  • scaling -- E(kX) = k E(X)



E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx
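As a concrete illustration, these integrals can be evaluated numerically for any density; the sketch below uses scipy.integrate.quad with a Normal density, where μ = 2 and σ = 0.5 are arbitrary values chosen for the example:

    import numpy as np
    from scipy.integrate import quad

    mu, sigma = 2.0, 0.5    # arbitrary parameters for illustration

    def f_X(x):
        # Normal probability density function
        return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

    lo, hi = mu - 10 * sigma, mu + 10 * sigma       # effectively the whole real line here
    EX, _ = quad(lambda x: x * f_X(x), lo, hi)      # E[X]   = integral of x f_X(x) dx
    EX2, _ = quad(lambda x: x**2 * f_X(x), lo, hi)  # E[X^2] = integral of x^2 f_X(x) dx
    print(EX)              # approx. 2.0, the mean
    print(EX2 - EX**2)     # approx. 0.25, the variance E[X^2] - mu^2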
21
Building a library of continuous distributions
  • We will consider -
  • uniform distribution
  • exponential distribution
  • Normal distribution
  • and later, as needed -
  • Student's t-distribution
  • Chi-squared distribution
  • F-distribution

These are needed for statistical inference -
decision-making in the presence of uncertainty.
22
Uniform Distribution
  • We have values that occur in an interval
  • e.g., composition between 0 and 2 g/L, and the
    probability is equal (uniform) across the
    interval

We have a rectangular histogram - values occur
with equal frequency over the range.
[Figure: rectangular density fX(x) vs. x, constant between a and b]
23
Uniform Distribution
  • What is the probability density function?
  • constant
  • what is the height?
  • Area under the curve must equal 1
  • Area is (b-a) × height
  • height = 1/(b-a)

f_X(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{everywhere else} \end{cases}
24
Uniform Distribution
  • Mean -
  • matches intuition
  • Variance -
  • variance grows as width of interval for uniform
    distribution grows


E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx = \int_a^b \frac{x}{b-a}\,dx = \frac{a+b}{2} = \mu

\sigma^2 = E[(X - \mu)^2] = \frac{(b-a)^2}{12}
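A quick simulation check of these two formulas; a sketch using NumPy, with a = 0 and b = 2 chosen arbitrarily:

    import numpy as np

    a, b = 0.0, 2.0                         # arbitrary interval
    rng = np.random.default_rng(0)
    x = rng.uniform(a, b, size=1_000_000)   # samples from the uniform distribution

    print(x.mean(), (a + b) / 2)            # sample mean vs. (a+b)/2
    print(x.var(), (b - a)**2 / 12)         # sample variance vs. (b-a)^2/12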
25
Uniform distribution
  • What might we model with such a distribution?
  • Example - instrument readout to the nearest
    integer
  • pressure gauge
  • if we are provided only with the nearest integer,
    the true pressure could be 0.5 below reading, or
    0.5 above
  • in absence of any additional information, we
    assume that values are distributed uniformly
    between these two limits
  • additional example - numerical roundoff in
    computations

26
Normal Distribution
  • arguably one of the most important distributions
  • probability density function

parameterized by the mean and variance - written
in a specific form
f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
27
Normal Distribution
  • is symmetric
  • centre is at the mean
  • variance - standard deviation - measure of width
    (dispersion) of the distribution
  • Cumulative distribution function
  • integral has no analytical (closed-form) solution
    - must rely on tables or numerical computation

F_X(t) = \int_{-\infty}^{t} f_X(x)\,dx = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) dx
28
Standard Normal Distribution
  • Problem
  • cumulative distributions must be computed
    numerically - summarize in table form
  • can't have a table for each possible value of mean,
    standard deviation
  • Solution
  • consider a new random variable
  • where X is normally distributed with mean μ_X and
    standard deviation σ_X

Z = \frac{X - \mu_X}{\sigma_X}
29
Standard Normal Distribution
  • mean of Z
  • variance of Z
  • Standard Normal Distribution
  • scaling and centering to produce zero mean, unit
    variance

E[Z] = E\!\left[ \frac{X - \mu_X}{\sigma_X} \right] = \frac{E[X] - \mu_X}{\sigma_X} = 0

E[Z^2] = E\!\left[ \left( \frac{X - \mu_X}{\sigma_X} \right)^{\!2} \right] = \frac{E[(X - \mu_X)^2]}{\sigma_X^2} = \frac{\sigma_X^2}{\sigma_X^2} = 1
30
Standard Normal Distribution
  • Values are available from tables - cumulative
    distribution values

31
Using the Standard Normal Tables
  • What is P(Z < 1.96)?
  • What is P(Z < -1.96)?
  • What is P(-1.96 < Z < 1.96)?
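These lookups can also be computed directly; a sketch using scipy.stats.norm, with the last two lines showing the standardization step for a general Normal variable (the μ = 350, σ = 5, x = 355 values are hypothetical, for illustration only):

    from scipy.stats import norm

    print(norm.cdf(1.96))                     # P(Z < 1.96)         approx. 0.975
    print(norm.cdf(-1.96))                    # P(Z < -1.96)        approx. 0.025
    print(norm.cdf(1.96) - norm.cdf(-1.96))   # P(-1.96 < Z < 1.96) approx. 0.95

    # standardizing: P(X < x0) = P(Z < (x0 - mu)/sigma) for X ~ Normal(mu, sigma)
    mu, sigma, x0 = 350.0, 5.0, 355.0         # hypothetical values
    print(norm.cdf((x0 - mu) / sigma))        # approx. 0.84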

Interpretation
32
Central Limit Theorem
  • why the Normal distribution is so important
  • given N independent random variables, each having
    the same distribution with mean μ and variance σ²,
    then
  • the sum of the N random variables follows a
    Normal distribution, AND
  • for the sample mean, we have the limit below,
    where Z is the standard Normal distribution

\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i, \qquad \lim_{N \to \infty} \frac{\bar{X} - \mu}{\sigma / \sqrt{N}} = Z
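A simulation sketch of this statement (NumPy assumed): sample means of a decidedly non-Normal distribution, here Exponential with mean 1, behave like the standard Normal once standardized:

    import numpy as np

    rng = np.random.default_rng(1)
    N, trials = 50, 100_000
    mu, sigma = 1.0, 1.0    # mean and standard deviation of the Exponential(1) distribution

    xbar = rng.exponential(mu, size=(trials, N)).mean(axis=1)   # sample means of N values
    z = (xbar - mu) / (sigma / np.sqrt(N))                      # standardized sample means

    print(z.mean(), z.std())          # approx. 0 and 1
    print(np.mean(np.abs(z) < 1.96))  # approx. 0.95, as for the standard Normal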
33
Central Limit Theorem - Consequences
  • in many instances, the Normal distribution
    provides a reasonable approximation for
    quantities that are a sum of independent random
    variables
  • e.g., Normal approximation to Binomial
  • e.g., Normal approximation to Poisson
  • many quantities measured physically tend to a
    Normal distribution

34
Failures in Time
  • we have an important pump on a recirculation line
  • the packing fails on average 0.6 times/year
  • what is the probability of the pump packing
    failing before 1 year?
  • What is probability that the time to failure is
    less than 1 year?

35
Exponential Distribution
  • Events occur in time at an average rate λ per
    unit time
  • What is the probability that the time to the
    event occurring is less than a given time t?
  • Approach -
  • similar to Poisson problem - think in terms of
    small time increments - independent trials
  • P(event occurs before a given time) = 1 - P(event
    doesn't occur in given time)

36
Exponential Distribution
  • event doesn't occur in a given time ⇒ 0
    occurrences
  • Poisson - with occurrence rate of λt in interval
    t
  • P(event occurs before this time)

P(0\ \text{occurrences}) = \frac{(\lambda t)^0 e^{-\lambda t}}{0!} = e^{-\lambda t}

P(\text{event occurs before this time}) = 1 - e^{-\lambda t}
37
Exponential Distribution
  • Denote X as time to occurrence.
  • Cumulative distribution
  • function
  • Density function -
  • distributions are parameterized by λ - average
    number of occurrences per unit time
  • can also parameterize in terms of θ - mean time
    to failure - then θ = 1/λ

F_X(t) = P(X \le t) = 1 - e^{-\lambda t}

f_X(t) = \lambda e^{-\lambda t}
38
Pump Failure Problem
  • packing fails on average 0.6 times / year
  • P(pump fails within year)
  • 45% chance of failure within year

P(X < 1) = 1 - e^{-0.6(1)} \approx 0.45
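The same number as a one-line check from the exponential cumulative distribution (standard library only):

    from math import exp

    lam = 0.6                   # packing failures per year
    t = 1.0                     # one year
    print(1 - exp(-lam * t))    # P(time to failure < 1 year), approx. 0.45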
39
Exponential Distribution - Notes
  • the exponential random variable is a continuous
    random variable - time to occurrence takes on a
    continuum of values
  • development assumes failure rate is constant
  • development assumes that failures are
    independent, and that each time increment is an
    independent trial - cf. Poisson distribution
  • mean and variance

\mu = E[X] = \frac{1}{\lambda}

\sigma^2 = E[(X - \mu)^2] = \frac{1}{\lambda^2}
40
Exponential Distribution
  • Problem Variations -
  • given mean time to failure, determine probability
    that time to failure is less than a given value
  • given fraction of components failing in a
    specified time, what is probability that time to
    failure is less than a given value?
  • what is probability that a component lasts at
    least a given time?

41
Exponential Distribution
  • Memoryless Property
  • given that a component has operated for 100
    hours, what is the probability that it operates
    for at least 200 hours before failing, i.e.,
    P(X > 200 | X > 100)?
  • consider A = {X > 100}, B = {X > 200}
  • A ∩ B = B
  • recall conditional probability
  • for our events, we have

P(B \mid A) = \frac{P(A \cap B)}{P(A)}

P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{P(B)}{P(A)}
42
Exponential Distribution
  • Memoryless property
  • probability of individual events
  • for our events,

P(X > 100) = 1 - P(X < 100) = 1 - (1 - e^{-100\lambda}) = e^{-100\lambda}

P(X > 200) = e^{-200\lambda}

P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{P(B)}{P(A)} = \frac{e^{-200\lambda}}{e^{-100\lambda}} = e^{-100\lambda}
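A numerical check of the memoryless property; a sketch in which λ = 0.01 failures per hour is an arbitrary value chosen for illustration:

    from math import exp

    lam = 0.01   # arbitrary failure rate per hour

    def p_greater(t, lam):
        # P(X > t) for an exponential time to failure
        return exp(-lam * t)

    # P(X > 200 | X > 100) = P(X > 200) / P(X > 100)
    conditional = p_greater(200, lam) / p_greater(100, lam)
    print(conditional, p_greater(100, lam))   # both equal e^(-100*lam)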
43
Exponential Distribution
  • Memoryless property
  • interpretation - probability of component lasting
    for another 100 hours given that it has
    functioned for 100 hours is simply probability of
    it lasting 100 hours
  • prior history, in form of conditional
    probability, has no influence on probability of
    failure
  • consequence of form of distribution which results
    in part from assumption of independence of time
    slices - note that A and B are NOT independent
  • general result for exponential random variable

P(X > a + b \mid X > a) = P(X > b)