Normal Dist1 - PowerPoint PPT Presentation

About This Presentation
Title:

Normal Dist1

Description:

Continuous Probability Distributions: The Normal Distribution – PowerPoint PPT presentation

Slides: 69
Provided by: Penel51
Learn more at: http://people.umass.edu

Transcript and Presenter's Notes



1
Continuous Probability Distributions: The Normal
Distribution
2
Towards the Meaning of Continuous Probability
Distribution Functions
When we introduced probabilities, we spoke of
discrete events:
  • $S$ = the collection of all possible sample points $e_i$
  • $0 \le P(e_i) \le 1$: the probability of any event is between zero and
    one
  • $\sum_i P(e_i) = 1$: the probabilities of all elementary
    events sum to 1 (something happens)
3
In particular, for the binomial distribution

  • For the random variable X
  • x stands for a particular value

$0 \le \Pr(X = x) \le 1$
The probability that the random variable X takes
the value x is between 0 and 1, inclusive.

$\sum_x \Pr(X = x) = 1$
The sum of the probabilities over all possible
values of x is 1.
4
A continuous variable has infinitely many
possible values.
With infinitely many possible
values, the probability of observing any one
particular value is essentially zero: $\Pr(X = x) = 0$,
e.g., for $x = 1.0$ vs 1.02 vs
1.0195 vs 1.01947, ...
$\Pr(X = x)$ is meaningless for a continuous random
variable. Instead, we consider a range of
values for X: $\Pr(a \le X \le b)$. We can make this range
quite broad or very narrow.
5
Comparing Probability Distributions for Discrete
vs Continuous Random Variables
We need new
notation to describe probability distributions
for continuous variables.
  • Discrete: List all possible sample points, e.g., $S = \{e_i\}$,
    $i = 1$ to $k$.
  • Continuous: State the range of possible values of X, e.g.,
    $-\infty < X < \infty$.
Note: $\infty$ is the symbol for infinity.
6
  • For a continuous random variable, X,
  • $P(X = x) = 0$
  • Instead, we compute the probability of X within
    some interval

$\Pr(a \le X \le b) = \int_a^b f_X(x)\,dx$
The function $f_X(x)$ is the probability density function
of X.
Don't worry if you don't know or have forgotten
calculus; I won't be asking you to work with this
notation.
7
  • Much of statistical inference is based upon a
    particular choice of a probability density
    function, $f_X(x)$:
  • the Normal distribution.
  • This function is a mathematical model describing
    one particular pattern of variation of values.
  • It is appropriate for continuous variables only.

8
  • Practically speaking, the normal distribution
    function is appropriate for
  • Many phenomena that occur naturally.
  • Special cases of other phenomena, e.g.,
    averages of phenomena that individually are not
    normally distributed.
  • For example, the sampling distribution of means
    may follow a normal distribution even when the
    underlying data do not.

9
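The Normal Probability Density Function
$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
  • Features to note
  • The range of X is $-\infty$ to $\infty$
  • $\pi$ is the mathematical constant 3.14159...
  • $e$ is the mathematical constant 2.71828...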

10
The Normal Probability Density Function
  • Features to note
  • $\mu$ is the mean of the distribution
  • $\sigma$ is the standard deviation of the distribution
  • $\sigma^2$ is the variance
  • $(x - \mu)^2$, the squared deviation from the mean,
    appears in the function
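
The features listed above can be tried out numerically. The following is a minimal Python sketch, assuming the usual form of the normal density, $f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$. The helper name `normal_pdf` is made up for illustration and does not come from the slides; the standard library's `statistics.NormalDist` is used only as a cross-check.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x (illustrative helper, not from the slides)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Only the squared deviation (x - mu)^2 enters, so points equally far
# from the mean on either side get the same density.
peak = normal_pdf(0.0)
left, right = normal_pdf(-1.3), normal_pdf(1.3)
print(peak, left, right)

# Cross-check against the standard library for a non-standard mean and sd.
print(normal_pdf(1.2, mu=3.0, sigma=2.0), NormalDist(3.0, 2.0).pdf(1.2))
```

Because the deviation from the mean is squared, the sign of the deviation drops out, which is where the symmetry of the curve comes from.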

11
Notation: $X \sim N(\mu, \sigma^2)$
We say "X follows a Normal distribution with mean $\mu$ and variance $\sigma^2$"
or "X is Normally distributed with mean $\mu$ and
variance $\sigma^2$."
12
A Picture of the Normal Distribution
[Figure: the infamous bell-shaped curve, plotted against x]
13
  • There are infinitely many normal distributions,
    each determined by different values of $\mu$ and $\sigma^2$.
  • The shape of the Normal distribution is
    characteristically
  • Smooth
  • Defined everywhere on the real axis
  • Bell-shaped
  • Symmetric about the mean $\mu$ (it is defined in
    terms of deviations about the mean)




14
The area under the curve represents probability,
and the total area under the curve = 1.
15
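[Figure: normal curve with the area from $-\infty$ up to x shaded, labeled $\Pr[X < x]$; $\mu$ marks the center]
The area under the curve up to the value x is
often represented by the notation $\Pr[X < x]$.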
16
A Feeling for the Shape of the Normal
Distribution
$\mu$ locates the center, and
$\sigma$ measures the spread
17
  • If $\mu$ alone is changed by adding a constant c,
  • the entire curve is shifted in location
  • but the shape remains the same.

18
  • If $\sigma$ alone is changed by multiplying by a
    constant c,
  • the shape of the bell is changed
  • a larger variance implies a wider spread (or
    flatter curve); the area under the curve is
    always 1

[Figure: normal curves with standard deviations $\sigma$ and $c\sigma$]
19
Picturing the Normal Probability Density
  • As the variance, $\sigma^2$, increases
  • Bell flattens (gets wide)
  • Values close to the mean are less likely
  • Values farther from the mean more likely.
  • As the variance decreases
  • Bell narrows
  • Most values are close to the mean
  • Values close to the mean are more likely

20
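A Very Handy Rough Rule of Thumb
If X follows a Normal distribution, then 68% of the values
of X are in the interval $\mu \pm \sigma$.
[Figure: normal curve with the central 68% of the area shaded between $\mu - \sigma$ and $\mu + \sigma$]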
21
If X follows a Normal distribution, then
  • 95% of the values of X are in the interval $\mu \pm 1.96\sigma$
  • 99% of the values of X are in the interval $\mu \pm 2.576\sigma$
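
These rules of thumb can be checked with software. Below is a small Python sketch using the standard library's `statistics.NormalDist`; the helper name `coverage` is an illustrative choice, not something from the slides.

```python
from statistics import NormalDist

def coverage(k, mu=0.0, sigma=1.0):
    """Pr(mu - k*sigma <= X <= mu + k*sigma) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# The three multipliers quoted in the rule of thumb.
for k in (1.0, 1.96, 2.576):
    print(f"within {k} standard deviations: {coverage(k):.4f}")

# The answer depends only on k, not on the particular mean and sd.
print(coverage(1.96, mu=100.0, sigma=15.0))
```

The last line illustrates that the percentages hold for every normal distribution, since they are stated in units of standard deviations from the mean.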
22
  • Why is the Normal Distribution So Important?
  • There are two types of data that follow a normal
    distribution
  • A number of naturally occurring phenomena
  • For example
  • heights of men (or women)
  • total blood cholesterol of adults
  • Special functions of some non-normally
    distributed phenomena, in particular sums and
    averages
  • The sampling distribution of sample means tends
    to be Normal.

23
Research often focuses on sample means.
Example: Blood pressure can vary with time of day,
stress, food, illness, etc. One reading may not
be a good representation of what is typical.
The distribution of a single reading of blood
pressure for an individual tends to be skewed,
with a few high values.
24
To have a better gauge of an individual's BP, we
might use the average of 5 readings.
The sampling distribution of the mean of 5 readings for
an individual tends to be Normal, even when
the original distribution is not.
25
  • A Feeling for the Central Limit Theorem.
  • Shake a pair of dice.
  • On each roll, note the total of the two die
    faces.
  • This total can range from 2 to 12.
  • The most likely total is 7. (Why?)
  • How often do the other totals arise?

[Figure: histogram of dice totals for n = 100 trials of rolling a pair of dice]
26
[Figure: histogram of dice totals for n = 1000 trials of rolling a pair of dice]
As the number of trials n increases, the histogram
of the sum of the 2 dice settles into a symmetric shape,
peaked at 7, that begins to resemble a normal curve.
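
The dice experiment is easy to mimic in code. The Python sketch below first enumerates all 36 equally likely outcomes of a pair of dice to get the exact distribution of the total, then simulates n = 100 and n = 1000 rolls. The seed and the function name `simulate` are arbitrary choices for illustration.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import product

# Exact distribution: count how many of the 36 (die1, die2) pairs give each total.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))
exact = {total: Fraction(count, 36) for total, count in ways.items()}

def simulate(n_trials, seed=0):
    """Roll a pair of dice n_trials times and tally the totals (seed is arbitrary)."""
    rng = random.Random(seed)
    return Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_trials))

for n in (100, 1000):
    counts = simulate(n)
    print(f"n = {n}")
    for total in range(2, 13):
        observed = counts[total] / n
        print(f"  total {total:2d}: observed {observed:.3f}, exact {float(exact[total]):.3f}")
```

The enumeration also answers the "Why?" on the slide: a total of 7 can be made in six ways (1+6, 2+5, 3+4, 4+3, 5+2, 6+1), more than any other total, while 2 and 12 can each be made in only one way.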
27
  • A Statement of the Central Limit Theorem
  • For any population with
  • mean $\mu$ and finite variance $\sigma^2$,
  • the sampling distribution of means, $\bar{X}$,
  • from samples of size n from this population,
  • will be approximately normally distributed
  • with mean $\mu$,
  • and variance $\sigma^2/n$,
  • for n large.
  • That is, for n large, and $X \sim\ ??(\mu, \sigma^2)$ (any distribution with this mean and variance),
  • then $\bar{X}_n \sim N(\mu, \sigma^2/n)$, approximately.

28
  • This is the main reason for our interest in the
    normal distribution
  • regardless of the underlying distribution
  • if we take a large enough sample
  • we can make probability statements about means
    from such samples
  • based upon the normal distribution.
  • This is true, even when the underlying
    distribution is discrete.

29
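Example: The Central Limit Theorem works even for
VERY non-normal data.
A population has only 3 outcomes in it: 1, 2, 9, each with $P(X = x) = 1/3$.
[Figure: bar chart of $P(X = x) = 1/3$ at x = 1, 2, and 9]
  • sum of 1, 2, 9 = 12
  • mean of 1, 2, 9: $\mu = 4$
  • standard deviation of 1, 2, 9: $\sigma = 3.6$
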

30
Experiment:
  • Take a sample of size n with replacement.
  • Compute the sum of all n values.
  • Repeat.
  • Look at the sampling distribution of the sums.
[Figure: sampling distributions of the sums for n = 25, n = 50, and n = 100]
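
Instead of repeating the experiment many times, the distribution of the sum can be computed exactly for this tiny population. The Python sketch below is one way to do that, under the assumption that each of the outcomes 1, 2, 9 is drawn with probability 1/3. It builds the exact distribution of the sum of n draws and measures how far its cumulative distribution is from a normal distribution with the same mean and variance. All function names are illustrative.

```python
from statistics import NormalDist

OUTCOMES = (1, 2, 9)  # each drawn with probability 1/3, sampling with replacement

def sum_pmf(n):
    """Exact probabilities for the sum of n independent draws from OUTCOMES."""
    pmf = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for total, p in pmf.items():
            for x in OUTCOMES:
                nxt[total + x] = nxt.get(total + x, 0.0) + p / len(OUTCOMES)
        pmf = nxt
    return pmf

def moments(pmf):
    """Mean and variance of a distribution given as {value: probability}."""
    mean = sum(s * p for s, p in pmf.items())
    var = sum((s - mean) ** 2 * p for s, p in pmf.items())
    return mean, var

def max_cdf_gap(n):
    """Largest gap, over the attainable sums, between the exact CDF and the normal CDF."""
    pmf = sum_pmf(n)
    mean, var = moments(pmf)
    approx = NormalDist(mean, var ** 0.5)
    gap, running = 0.0, 0.0
    for s in sorted(pmf):
        running += pmf[s]
        gap = max(gap, abs(running - approx.cdf(s)))
    return gap

for n in (25, 50, 100):
    mean, var = moments(sum_pmf(n))
    print(f"n = {n}: mean {mean:.1f}, variance {var:.1f}, max CDF gap {max_cdf_gap(n):.4f}")
```

The mean of the sum is 4n and its variance is n times the population variance (38/3, whose square root is the 3.6 quoted above), so the printed gap shows how close the normal approximation is at each sample size.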
31
  • To compute probabilities for a normal
    distribution.
  • Recall that we are looking at intervals of values
    of the random variable, X.
  • The probability that X has a value in the
    interval between a and b is the area under the
    curve corresponding to that interval

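Note: since $\Pr(X = a)$, or the probability of any exact value, is zero,
this can be written as $\Pr(a \le X \le b)$ or $\Pr(a < X < b)$.
[Figure: normal curve with the area between a and b shaded]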
32
  • The symmetry of the normal distribution can also
    help in computing probabilities.
  • The normal distribution is symmetric about the
    mean µ.
  • This tells us that the probability of a value
    less than the mean is 0.5, or 50%,
  • and the probability of a value greater than the
    mean is also 0.5, or 50%.

[Figure: normal curve with area 0.5 on each side of the mean]
33
The Standard Normal Distribution
The standard normal distribution is just one of
infinitely many possible normal distributions.
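It has mean $\mu = 0$ and variance $\sigma^2 = 1$.
[Figure: standard normal curve with $\mu = 0$ and $\sigma = 1$ marked]
By convention we let the letter Z represent a
random variable that is distributed Normally with
$\mu = 0$ and $\sigma^2 = 1$: $Z \sim N(0, 1)$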


34
  • The standard normal distribution is important for
    several reasons
  • Probabilities of Z within any interval have been
    computed and tabulated.
  • It is possible to look up $\Pr(a \le Z \le b)$ for any
    values of a and b in such tables.
  • Any other normal distribution can be transformed
    to a standard normal for computing probabilities.
  • Distances from the mean are equivalent to number
    of standard deviations from the mean.
  • This last is perhaps of greatest interest to us,
    now that software does much of the transformation
    and computation for us.

35
  • Table 3 in the Appendix of Rosner gives areas
    under the normal curve, in 4 different ways
  • Column A gives areas between $-\infty$ and z, where z
    is a particular value of the standard normal
    distribution. (Note: Rosner uses X rather than
    Z.)
  • That is, column A gives values for
    $\Pr(-\infty \le Z \le z) = \Pr(Z \le z)$. z is also known as a standard
    normal deviate.

[Figure: standard normal curve with the area from $-\infty$ up to z shaded, labeled $\Pr[Z < z]$]
36
  • Table 3 in the Appendix of Rosner
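  • Column B gives areas between z and $\infty$:
    $\Pr(z \le Z \le \infty) = \Pr(z \le Z) = \Pr(Z \ge z)$
  • Column C gives areas between 0 and z:
    $\Pr(0 \le Z \le z)$
  • Column D gives areas between -z and z:
    $\Pr(-z \le Z \le z)$

[Figure: three standard normal curves with the Column B, C, and D areas shaded]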
37
  • A probability calculation for any random
    variable $X \sim \text{Normal}(\mu, \sigma^2)$ can be re-expressed as
    an equivalent probability calculation for a
    standard Normal (0,1). This is nice because
  • we have tables for probabilities of the Normal
    (0,1) distribution.
  • We can interpret probabilities in terms of the number of
    standard deviations from the mean.
  • Of course, we can also use computer programs to
    compute probabilities for any Normal distribution;
    the program does the translation for us.

38
The Normal (0,1) or Standard Normal
Table.
Positive values of z are read from the
first column (under x in Rosner).
The shaded area, which is the probability of $Z \le z$,
is shown under Col A of the table: $\Pr(Z < 0.31) = .6217$

z      A       B       C       D
0.0    .5000   .5000   .0      .0
0.01   .5040   .4960   .0040   .0080
0.30   .6179   .3821   .1179   .2358
0.31   .6217   .3783   .1217   .2434

A check that this makes sense: any positive value
of z is above the mean, and should have a
probability > .5.
[Figure: standard normal curve with the area to the left of z = 0.31 shaded, labeled $\Pr[Z < 0.31]$]
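
The four table columns can be reproduced from a single cumulative probability. The Python sketch below assumes the column definitions given above (A: area below z, B: area above z, C: area between 0 and z, D: area between -z and z) and uses the standard library's `statistics.NormalDist`. The helper name `table_row` is made up for illustration.

```python
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)  # the standard normal

def table_row(z):
    """Columns A, B, C, D of the table for a non-negative z."""
    a = Z.cdf(z)          # A: Pr(Z <= z)
    b = 1.0 - a           # B: Pr(Z >= z), since the total area is 1
    c = a - 0.5           # C: Pr(0 <= Z <= z), since half the area lies below 0
    d = a - Z.cdf(-z)     # D: Pr(-z <= Z <= z)
    return a, b, c, d

print("z      A      B      C      D")
for z in (0.0, 0.01, 0.30, 0.31):
    a, b, c, d = table_row(z)
    print(f"{z:<6} {a:.4f} {b:.4f} {c:.4f} {d:.4f}")
```

Each column is a simple rearrangement of Column A, which is why software that only reports cumulative probabilities is enough to replace the whole table.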
39
  • Note that only positive values of z are
    tabulated.
  • We can take advantage of a few important features
    of the standard normal, to compute probabilities
    for values of z less than zero
  • Symmetry $\Rightarrow$ $\Pr(Z \le -z) = \Pr(Z \ge z)$
  • Zero is the median $\Rightarrow$ $\Pr(Z \le 0) = \Pr(Z \ge 0) = .50$
  • Total area is 1 $\Rightarrow$ $\Pr(Z \le z) + \Pr(Z \ge z) = 1$

40
For example, we cannot read $\Pr(Z < -0.31)$
directly from the tables. We can, however, use the
property of symmetry.
$\Pr(Z > 0.31) = .3783$ (we can read this probability from Col B)
$\Pr(Z < -0.31) = .3783$ (use the property of symmetry to get this)
[Figure: standard normal curve with equal shaded tails below z = -0.31 and above z = 0.31]
41
[Figure: standard normal curve with -z, 0, and z marked on the axis]
42
Example Word Problem: What is the probability of a
value of Z more than 1 standard deviation below
the mean?
Solution: Since $\mu = 0$ and $\sigma = 1$, 1
standard deviation below the mean is
$z = \mu - (1 \times \sigma) = 0 - 1 = -1$
$\Pr(Z < -1) = 0.1587$
[Figure: standard normal curve with the area to the left of -1 shaded]
The probability of observing a value more than 1
standard deviation below the mean is .1587, or
just under 16%.
43
Example: What is the probability Z is between
-1.5 and 1.5? We can read this from Column D of
the Table in Rosner: $\Pr[-1.50 \le Z \le 1.50] = 0.8664$.
Example: What is the
probability of Z more than 1.5 standard
deviations from the mean in either
direction? Since probabilities sum to 1,
$\Pr[Z \le -1.50 \text{ or } 1.50 \le Z] = 1 - 0.8664 = 0.1336$
By symmetry, half of this, or 0.0668, lies at either
end.
[Figure: standard normal curve with tail areas of .0668 below -1.50 and above 1.50]
44
Exercise
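Find the area under the standard normal curve
between Z = 1 and Z = 2.
Solution.
It helps to draw pictures!
[Figure: three standard normal curves showing the area between 1 and 2, the area below 2, and the area below 1]
$\Pr(1 < Z < 2) = \Pr(Z < 2) - \Pr(Z < 1) = 0.9772 - 0.8413 = 0.1359$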
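
The last few examples all reduce to differences of cumulative areas. A short Python sketch, using the standard library's `statistics.NormalDist` in place of the printed table (the helper `prob_between` is an illustrative name):

```python
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)

def prob_between(a, b, dist=Z):
    """Pr(a <= X <= b) as the area below b minus the area below a."""
    return dist.cdf(b) - dist.cdf(a)

below_minus_1 = Z.cdf(-1.0)                # table value quoted above: 0.1587
central_1_5 = prob_between(-1.5, 1.5)      # table value quoted above: 0.8664
both_tails = 1.0 - central_1_5             # quoted above: 0.1336
one_tail = both_tails / 2.0                # quoted above: 0.0668
between_1_and_2 = prob_between(1.0, 2.0)   # quoted above: 0.1359

for name, value in [("Pr(Z < -1)", below_minus_1),
                    ("Pr(-1.5 <= Z <= 1.5)", central_1_5),
                    ("both tails beyond 1.5", both_tails),
                    ("one tail beyond 1.5", one_tail),
                    ("Pr(1 < Z < 2)", between_1_and_2)]:
    print(f"{name}: {value:.4f}")
```

Drawing the picture first, as the slides suggest, makes it clear which two cumulative areas to subtract.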
45
  • Notes on using Standard Normal Tables
  • These come in a variety of formats. The examples
    given here are for the version seen in Rosner,
    Table 3 in the Appendix.
  • Look at the accompanying picture of the
    distribution to be clear what probability is
    listed in the body of the table.
  • Draw a sketch (paper and pencil) when computing
    probabilities it always helps you keep track of
    what you are doing.
  • Minitab provides the same probabilities as Column
    A, $\Pr(X < x)$, when Cumulative Probability is
    selected

46
Using Minitab: Calc > Probability Distributions
> Normal
Select Cumulative Probability for $\Pr(Z < z)$ or $\Pr(X < x)$
Enter value of z (or x)
47
Finding Percentiles of the Normal
Distribution
Example: What is the 75th
percentile of N(0,1)?
Solution: Again, it helps to draw a picture!
[Figure: standard normal curve with area 0.75 shaded to the left of $z_{.75}$]
We want the area under the curve to be 75%.
The value of z we want is the value below which
75% of values are found. That is, find $z_{.75}$ so
that $\Pr(Z < z_{.75}) = .75$
48
Use the Inverse Cumulative Option in Minitab
Input desired percentile
Inverse Cumulative Distribution Function
Normal with mean = 0 and standard deviation = 1.00000
P( X < x)      x
0.7500         0.6745
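
For readers without Minitab, the same inverse cumulative lookup can be sketched in Python. `statistics.NormalDist.inv_cdf` plays the role of the Inverse Cumulative option here; that substitution is ours, not something the slides use.

```python
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)

# The 75th percentile: the z below which 75% of the area lies.
z_75 = Z.inv_cdf(0.75)
print(f"z_.75 = {z_75:.4f}")

# Going back through the cumulative function recovers the percentile.
print(f"Pr(Z < z_.75) = {Z.cdf(z_75):.4f}")

# By symmetry, the 25th percentile is the same distance below zero.
z_25 = Z.inv_cdf(0.25)
print(f"z_.25 = {z_25:.4f}")
```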
49
Standardizing a Normal Random Variate: From
$N(\mu, \sigma^2)$ to $N(0, 1)$
We can transform any Normal distribution to a
standard normal by means of a simple
transformation:
$Z = \frac{X - \mu}{\sigma}$
50
Standardizing a Normal Random Variate: From
$N(\mu, \sigma^2)$ to $N(0, 1)$
Adding a constant: for $X \sim N(\mu, \sigma^2)$, $(X + b) \sim N(?, ?)$
The mean is shifted over b units, but the
variance or spread of the data is unchanged by
adding a constant: $(X + b) \sim N(\mu + b, \sigma^2)$
51
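Multiplying by a constant: for $X \sim N(\mu, \sigma^2)$, $(aX) \sim N(?, ?)$
[Figure: normal curve centered at $a\mu$ with standard deviation $a\sigma$]
The mean is adjusted to a times the original
mean, and the variance to $a^2$ times the original
variance; this is a shift in scale: $(aX) \sim N(a\mu, a^2\sigma^2)$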
52
Adding a constant, multiplying by a constant: for
$X \sim N(\mu, \sigma^2)$, $(aX + b) \sim N(?, ?)$
Both adjustments are made. The mean is adjusted
to a times the original mean plus b, and the
variance to $a^2$ times the original
variance: $(aX + b) \sim N(a\mu + b, a^2\sigma^2)$
53
Now, let $a = 1/\sigma$ and $b = -\mu/\sigma$. Then for
$X \sim N(\mu, \sigma^2)$, $Z = aX + b = \frac{X - \mu}{\sigma} \sim N(?, ?)$, or $Z \sim N(0, 1)$
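
The shift-and-scale rules can be checked numerically. The Python sketch below applies the rule "mean becomes $a\mu + b$, variance becomes $a^2\sigma^2$" with $a = 1/\sigma$ and $b = -\mu/\sigma$. The starting values of the mean and variance are arbitrary illustrative numbers, and `linear_transform` is a made-up helper name. As a cross-check it relies on the fact that `statistics.NormalDist` objects can be multiplied and shifted by constants.

```python
from statistics import NormalDist

def linear_transform(mu, var, a, b):
    """Mean and variance of aX + b when X has mean mu and variance var."""
    return a * mu + b, a ** 2 * var

mu, var = 10.0, 6.25      # illustrative values only
sigma = var ** 0.5

# Choosing a = 1/sigma and b = -mu/sigma gives mean 0 and variance 1.
z_mean, z_var = linear_transform(mu, var, a=1 / sigma, b=-mu / sigma)
print(z_mean, z_var)

# Same transformation applied to a NormalDist object.
Z = NormalDist(mu, sigma) * (1 / sigma) + (-mu / sigma)
print(Z.mean, Z.stdev)
```

Whatever mean and variance one starts from, this particular choice of a and b lands on the standard normal.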
54
$Z = \frac{X - \mu}{\sigma}$
  • We have transformed the original scale
  • to units measured in multiples of standard
    deviations
  • centered around zero
  • A value of z = -1 means the value of x is 1
    standard deviation below the mean
  • A value of z = 2.5 means the value of x is 2.5
    standard deviations above the mean

55
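This transformation is also important because, if
we want to know $\Pr(a \le X \le b)$, then we can
convert it to an equivalent calculation:
$\Pr(a \le X \le b) = \Pr\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right)$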
56
Word Problem
The profit from the Massachusetts state lottery
on any given week is distributed Normally with
mean 10.0 million dollars and variance 6.25 (in
millions of dollars, squared). What is the probability that this week's
profit is between 8 and 10.5 million?
Let X = weekly profit in millions.
Then $X \sim N(\mu, \sigma^2)$,
where $\mu = 10$ and $\sigma^2 = 6.25$ (so $\sigma = 2.5$).
What is $\Pr(8 \le X \le 10.5)$?
57
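What is $\Pr(8 \le X \le 10.5)$? Translate to Standard
Normal:
$\Pr(8 \le X \le 10.5) = \Pr\left(\frac{8 - 10}{2.5} \le Z \le \frac{10.5 - 10}{2.5}\right) = \Pr(-0.8 \le Z \le 0.2)$
[Figure: standard normal curve with the area between -.8 and .2 shaded]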
58
$\Pr(-0.8 \le Z \le 0.2) = \Pr(Z < 0.2) - \Pr(Z < -0.8)$
(read from Table 3 or use Minitab or other program)
$= 0.5793 - 0.2119 = 0.3674$
The probability of a weekly profit between 8 and
10.5 million dollars is 36.74%.
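
The lottery calculation can be reproduced two ways in Python: by standardizing the endpoints first, or by letting the software work on the original scale. This sketch uses the standard library's `statistics.NormalDist` as the "other program"; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 10.0, 6.25 ** 0.5    # mean and sd of weekly profit, in millions
a, b = 8.0, 10.5                 # interval of interest, in millions

# Route 1: standardize the endpoints, then use the standard normal.
Z = NormalDist(0.0, 1.0)
z_a, z_b = (a - mu) / sigma, (b - mu) / sigma
p_standardized = Z.cdf(z_b) - Z.cdf(z_a)

# Route 2: work directly with N(10, 2.5^2); the program does the translation.
X = NormalDist(mu, sigma)
p_direct = X.cdf(b) - X.cdf(a)

print(f"z-values: {z_a:.1f} and {z_b:.1f}")
print(f"standardized route: {p_standardized:.4f}")
print(f"direct route:       {p_direct:.4f}")
```

Both routes give the same probability, which is the point of the standardizing transformation: the table for N(0,1) is enough for any normal distribution.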
59
  • Application of the Central Limit Theorem
  • Means of samples of size n
  • from a population with
  • mean $\mu$ and variance $\sigma^2$
  • follow a normal distribution
  • with mean $\mu$ and variance $\sigma^2/n$, for n large.
  • That is, for $X \sim\ ?(\mu, \sigma^2)$ (any distribution),
  • for n large,
  • $\bar{X} \sim N(\mu, \sigma^2/n)$

60
Example: Consider a population of families with
$\mu = 3.4$ children per family and $\sigma^2 = 4.37$. What
percentage of samples of size n = 4 families will
have means greater than 5 children per
family?
Sample means from samples with n = 4
follow a normal distribution with $\mu_{\bar{x}} = 3.4$ and
$\sigma_{\bar{x}}^2 = \sigma^2/n = 4.37/4 = 1.09$. Then $\sigma_{\bar{x}} = 1.045$.
We want $\Pr(\bar{X} > 5)$, where $\bar{X} \sim N(3.4, 1.09)$.
61
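$\Pr(\bar{X} > 5) = \Pr\left(Z > \frac{5 - 3.4}{1.045}\right) = \Pr(Z > 1.53) = 0.06$
[Figure: standard normal curve with the area to the right of 1.53 shaded]
The probability of observing a sample with a mean
of 5 children per family or larger, when n = 4, is
about 6%.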
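
The same calculation in Python, as a rough sketch. It takes the normal approximation for the sample mean at face value, as the example does, even though n = 4 is small. Variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, var, n = 3.4, 4.37, 4            # population mean, population variance, sample size

se = math.sqrt(var / n)              # standard deviation of the sample mean
xbar = NormalDist(mu, se)            # approximate sampling distribution of the mean

z = (5.0 - mu) / se                  # how many standard errors 5 lies above the mean
p_over_5 = 1.0 - xbar.cdf(5.0)       # Pr(sample mean > 5)

print(f"standard error: {se:.3f}")
print(f"z = {z:.2f}")
print(f"Pr(mean > 5) = {p_over_5:.3f}")
```

Note that the spread used here is the standard deviation of the mean, $\sqrt{\sigma^2/n}$, not the standard deviation of individual family sizes.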
62
  • So far we have gone from
  • $X \sim N(\mu, \sigma^2) \rightarrow Z \sim N(0, 1)$
  • We may be interested in the reverse
  • $Z \sim N(0, 1) \rightarrow X \sim N(\mu, \sigma^2)$

63
Example: The distribution of IQ scores is normal
with a mean of 100 and a standard deviation of
15. What is the 95th percentile of this
distribution?
Step 1: Find the 95th percentile
of the standard normal. Use Minitab, or another
program, to compute it:
Inverse Cumulative Distribution Function
Normal with mean = 0 and standard deviation = 1.00000
P( X < x)      x
0.9500         1.6449
or $z_{.95} = 1.645$
64
Step 2: We know $X \sim N(100, 15^2)$, and $z_{.95} = 1.645$.
$x_{.95} = \sigma z_{.95} + \mu = (15)(1.645) + 100 = 124.7$
The 95th percentile of the IQ
distribution is 124.7.
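
The two-step route (find z, then convert back with $x = \sigma z + \mu$) can be compared with a direct lookup on the IQ scale. A minimal Python sketch with the standard library's `statistics.NormalDist`; variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0

# Step 1: 95th percentile of the standard normal.
z_95 = NormalDist().inv_cdf(0.95)

# Step 2: convert back to the IQ scale with x = sigma * z + mu.
x_95_two_step = sigma * z_95 + mu

# Alternatively, ask for the percentile of N(100, 15^2) directly.
x_95_direct = NormalDist(mu, sigma).inv_cdf(0.95)

print(f"z_.95 = {z_95:.4f}")
print(f"two-step x_.95 = {x_95_two_step:.1f}")
print(f"direct   x_.95 = {x_95_direct:.1f}")
```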
65
Another Example: Taking samples of size n = 4 from
the population of families with $\mu = 3.4$ children
per family and $\sigma^2 = 4.37$, what is the middle 50%
of the sampling distribution? That is,
find a and b so that $\Pr(a \le \bar{X} \le b) = .50$
  • a is the 25th percentile of the sampling distribution of $\bar{X}$
  • b is the 75th percentile of the sampling distribution of $\bar{X}$
[Figure: normal curve with the middle 50% between a and b and 25% in each tail]
66
Use Minitab to find 25th and 75th percentiles of
standard normal:
Inverse Cumulative Distribution Function
P( X < x)      x
0.2500         -0.6745
0.7500         0.6745
For $\bar{X} \sim N(\mu, \sigma^2/n)$, where $\mu = 3.4$ and $\sigma^2/n = 1.09$,
convert z back to x: $x = z\,\sigma_{\bar{x}} + \mu$
$x_{.75} = .675\,(1.045) + 3.4 = 4.11$
$x_{.25} = -.675\,(1.045) + 3.4 = 2.69$
$\Rightarrow \Pr(2.69 < \bar{X} < 4.11) = .50$
50% of samples of size 4 from this population will have
mean family size between 2.69 and 4.11 children
per family.
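
A Python sketch of the same "middle 50%" calculation, again taking the normal approximation for the sample mean as given. The helper name `central_interval` is made up for illustration. Because the code carries more decimal places than the hand calculation, its endpoints can differ from 2.69 and 4.11 in the last digit.

```python
import math
from statistics import NormalDist

def central_interval(dist, coverage):
    """Endpoints (a, b) with equal tail areas so that Pr(a <= X <= b) = coverage."""
    tail = (1.0 - coverage) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

# Sampling distribution of the mean for n = 4 families.
xbar = NormalDist(3.4, math.sqrt(4.37 / 4))

a, b = central_interval(xbar, 0.50)
print(f"25th percentile a = {a:.3f}")
print(f"75th percentile b = {b:.3f}")
print(f"Pr(a <= mean <= b) = {xbar.cdf(b) - xbar.cdf(a):.2f}")
```

The same helper gives other central ranges, for example the middle 95%, by changing the coverage argument.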
67
  • Recap. . . Introduction to the Normal
    Distribution
  • For continuous variables, we speak of a
  • probability density function
  • We calculate the probabilities of intervals of
    values, not individual values
  • The normal distribution is a good description of
  • many naturally occurring phenomena
  • the average of non-normal phenomena
  • This last is particularly important since much
    statistical inference is based on the behavior of
    averages.

68
  • While there are infinitely many normal
    distributions, each determined by $\mu$ and $\sigma^2$,
  • they can all be standardized by using the
    transformation $Z = (X - \mu)/\sigma$
  • We use the standardized form to compute
    probabilities for any normal distribution.
  • In the standardized form, distance from the mean
    is in units of standard deviation