Normal Distribution - PowerPoint PPT Presentation

Slides: 28
Provided by: drmarkjk
Transcript and Presenter's Notes

Title: Normal Distribution


1
Normal Distribution
  • HED 489 Biostatistics

2
  • Odds are that you've come across the normal
    distribution below. You may know it by its more
    common name, the "bell-shaped curve." In case
    you've never seen it before, that's the normal
    distribution on the right. We spend time
    understanding the normal distribution for two
    reasons:
  • it forms the basis of probability, and
    probability forms the basis of all statistical
    tests
  • whether the distribution is normal is one of
    the (perhaps the) primary determinants of which
    family of statistical tests you will apply.

3
Formation of The Normal Curve
  • Assume for the moment that the data in the slide
    to the right represent ages of a bunch of people.
    As you can see, young people are on the left and
    older people on the right. Most of the people are
    between 9 and 11, right? That's the big bunch in
    the middle.

4
           
  • Now, let's say that we draw a line connecting the
    top midpoint of each bar. Here's what that would
    look like:

5
  • See the nice straight lines you get? That's
    because, for purposes of this explanation, I've
    created the distribution such that those straight
    lines would result! Now, how about if we smooth
    the lines so they become a nice curve? That's
    what this next slide shows.

6
  • See how we have the outline of a bell-shaped
    curve? This would actually happen with these
    data. The last thing that happens when we have a
    normal distribution is that the outline becomes
    filled in with data. That's what this next slide
    shows.

7
  • So, if we strip away the unessential information,
    we're left with the first slide you saw in this
    lesson. Remember what it looked like?

8
Normal Curve Properties
  • All normal distributions share some standard
    properties:
  • the mean bisects the distribution exactly, such
    that the two halves of the distribution form a
    mirror image of each other.
  • the standard deviation of the standard normal
    distribution (the one plotted in z-scores, with a
    mean of 0) will always be 1.0; any other normal
    distribution can be rescaled to it.
  • the "tails" of the distribution never contact the
    x-axis.

9
Moving Towards Probability
  • As you can see on the slide to the right, we can
    plot the location of various values of the
    standard deviation by adding or subtracting the
    standard deviation (1.0) to or from the mean (0).
    See where positive and negative standard
    deviations fall on this distribution? Normally,
    we don't go beyond 3.0 standard deviations,
    commonly abbreviated "s.d." We'll talk about why
    in just a bit.
  • Remember, if we were measuring age, our s.d.
    would be in increments of years. If we were
    measuring water consumption, our increment might
    be ounces.

10
  • If you collect the ages of, say, 10,000 people,
    and build a frequency distribution, it will
    contain all 10,000 people, right? That goes
    without saying, doesn't it? Well, the same thing
    can be said of the normal distribution. That is,
    all of the data, or 100% of it, for a particular
    variable (like age) will be contained within the
    distribution. For any normal distribution, no
    matter what the data are, 99.7% of the data will
    be contained in the space from -3 to +3 s.d. Do
    you see that in the slide on the right? See all
    the blue area between -3 and +3? That's where
    most of the data are. See how little blue there
    is to the left of -3 and to the right of +3?
    There are very few cases there. Combined, in
    fact, only 0.3% of the data are in those little
    "tails."

11
  • So, if you had those 10,000 ages on slips of
    paper, and you selected one at random, it would
    fall within ±3.0 s.d. 99.7% of the time. That is
    to say there is a probability of .997 of
    selecting an age at random that falls within
    ±3.0 s.d. Probability, folks, is where we're
    going with this. Because the largest area under
    the curve falls within ±1.0 s.d., it stands to
    reason that most of the cases will be contained
    in this area, too. As you can see from the slide,
    about 68.2% of the cases (34.1% × 2) fall within
    ±1.0 s.d.
  • An additional 13.6% (about 14%) of the cases are
    contained between 1.0 and 2.0 s.d., and 13.6%
    more fall between -1.0 and -2.0 s.d. See that? In
    total, 95.4% of the cases (68.2% + 13.6% × 2)
    fall within ±2.0 s.d.
  • Because there's not much area under the curve
    beyond 2.0 s.d., only about 2.1% of the cases
    will fall between 2.0 and 3.0 s.d., and another
    2.1% between -2.0 and -3.0 s.d. Remember we said
    earlier that 99.7% of the cases fall within
    ±3.0 s.d.? Well, this is how we get to 99.7%
    (95.4% + 2.1% × 2, give or take rounding).
  • If you didn't follow that, go back and study it
    again until you understand it.
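
If you'd rather check those percentages than take the slide's word for them, here is a minimal Python sketch. It assumes the standard normal distribution (mean 0, s.d. 1) and uses `NormalDist` from Python's standard `statistics` module. The helper names `area_within` and `area_between` are made up for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    """Area under the curve between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

def area_between(a, b):
    """Area under the curve between a and b standard deviations."""
    return std_normal.cdf(b) - std_normal.cdf(a)

for k in (1, 2, 3):
    print(f"within ±{k} s.d.: {area_within(k):.1%}")

print(f"between 1 and 2 s.d.: {area_between(1, 2):.1%}")
print(f"between 2 and 3 s.d.: {area_between(2, 3):.1%}")
```

The printed figures can differ from the slide in the last digit, because the slide adds up values that were already rounded.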

12
Taking Another Step
  • This slide summarizes what you just learned. That
    is, about 68% of the cases fall within ±1.0
    s.d., about 95% within ±2.0 s.d., and about 99%
    fall within ±3.0 s.d. Now, pay close attention;
    it's going to get tricky. If about 95% of the
    cases fall within ±2.0 s.d., then about 5% will
    be beyond -2.0 or +2.0 s.d. Do you see
    that? If all, or 100%, of the cases fall somewhere
    along the curve, and you account for 95% of them
    within ±2.0 s.d., then 5% are left. About 2.5%
    of the cases fall to the left of -2.0 and another
    2.5% fall to the right of +2.0 s.d.
  • Still with me? Ok, contemplate this: if you pick
    a value at random, there is a probability of .05
    that it will be beyond -2.0 or +2.0 s.d.
    Why? Because there's a probability of .95 that it
    will be within ±2.0 s.d.
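
The "what's left over" reasoning can be sketched in a few lines of Python. This is only an illustration: `prob_beyond` is a hypothetical helper name, and the code assumes a standard normal curve. Note that the slide's "about 5%" is a rounded figure; the exact leftover beyond 2.0 s.d. comes out slightly smaller.

```python
from statistics import NormalDist

std_normal = NormalDist()  # defaults to mean 0, s.d. 1

def prob_beyond(k):
    """Probability of a value falling beyond -k or +k s.d. (both tails)."""
    inside = std_normal.cdf(k) - std_normal.cdf(-k)
    return 1.0 - inside  # everything not inside is left over in the tails

both_tails = prob_beyond(2.0)
one_tail = both_tails / 2  # the curve is symmetrical, so split evenly

print(f"beyond ±2.0 s.d.: {both_tails:.4f}")  # 0.0455
print(f"each tail:        {one_tail:.4f}")    # 0.0228
```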

13
Calculating the Z Score
  • Calculating the z score is relatively simple.
    Just follow the formula:
    z = (score - mean) / standard deviation

14
  • You will need to know the mean and the standard
    deviation.  Just punch in the score you need and
    you'll get the z score.  For example, if you
    scored 100 on the test, and the mean is 80 and
    the standard deviation is 10, you'll have the
    formula as such:
  • z = (100 - 80) / 10 = 20 / 10 = 2
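
The same calculation as a small Python function, in case you want to punch in other scores. The name `z_score` and the check on the standard deviation are choices made for this sketch, not part of the lesson.

```python
def z_score(score, mean, sd):
    """How many standard deviations a score lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (score - mean) / sd

# The test-score example: score 100, mean 80, s.d. 10.
print(z_score(100, 80, 10))  # 2.0
```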

15
The Normal Distribution: So What Does the z Score
Mean?
  • It's time for you to open your textbook to the
    inside back cover (to Table A).  Click slide and
    it will appear on-screen (it is also shown on the
    next slide in this program). Now, what this table
    shows is the area under the curve from zero to
    any point along the curve, out to three decimal
    points, to 4.0 s.d.
  • This table is also referred to as a table of
    "z-scores." See the "z" in the upper left cell?
    For a normal distribution of data, a z-score is
    a distance from the mean counted in standard
    deviations, so z = 1.0 means 1.0 s.d. More on
    that in a moment. First, I want to get you
    comfortable with this table.
  • Remember we said that about 34% of the cases fall
    between zero and 1 s.d.? In case you forgot that,
    just look at the slide above and it will remind
    you. And, remember that we said that if we
    selected a case at random from a normal
    distribution of data that the probability is
    about .34 that the value of that case will fall
    between zero and 1.0 s.d.?

16
  • Remember that? Ok, now, look down the left column
    of Table A, either in the book or on the slide at
    the top, until you reach 1.0. Did you get there?
    Now, move your finger one column to the right.
    That's the one labelled ".00". The value in the
    cell where your finger is pointing is .3413,
    right? That's where I got "about" 34% and
    "about" .34.

17
  • The above formula is used to calculate the z
    score.  What happens if you're interested in some
    s.d. other than 1.0, 2.0, or 3.0? Here's where
    the table really comes in handy! Say you're
    interested in the area under the curve (also
    known as "probability," right?) for a s.d. (or
    z-score, as I want you to begin thinking, as
    well) of 1.96. How do you get there? Well, run
    your finger down the first column until you get
    to 1.9, and then run across until you get to .06.
    1.9 + .06 = 1.96. What value did you find? .4750?
    If so, you did it just right!
  • Note that the Normal Curve table just shows half
    of the curve, that is, from zero to the right.
    That's because the curve is symmetrical, so if
    you want to know the area, or probability, for
    both halves of the curve, just double the
    tabled value. For example, if you wanted to know
    the probability for ±1.96, you'd multiply .4750
    times 2. That would give you .9500, and you'd say
    that the probability of selecting a value at
    random that would fall within ±1.96 s.d. would
    be .95. Ok? Now, I want to give you some practice
    working with the normal curve so I know that
    you've become comfortable with it.
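
If you don't have the textbook handy, the table lookup can be imitated in Python. This sketch assumes the table lists the area from the mean (z = 0) out to z, as described above; `table_area` is a hypothetical name, and `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, s.d. 1

def table_area(z):
    """Area from the mean (z = 0) out to z, as a one-sided table lists it."""
    return std_normal.cdf(abs(z)) - 0.5

print(f"{table_area(1.00):.4f}")      # 0.3413
print(f"{table_area(1.96):.4f}")      # 0.4750
print(f"{2 * table_area(1.96):.4f}")  # 0.9500, both halves of the curve
```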

18
Calculate the area under the curve between the
mean and a z-score (or s.d.) of -1.39.
  • To do this, use the Normal Curve table. Run your
    finger down the first column until you reach 1.3.
    Run your finger along the 1.3 row until you reach
    the .09 column. The figure in that cell is the
    area under the curve from 0.00 to 1.39. It is
    also the probability of selecting a value at
    random and having it fall between 0.00 and 1.39.
    Since the curve is symmetrical, the value for the
    area 0.00 to 1.39 is the same as for 0.00 to
    -1.39. Note this is probability or area under the
    curve, not percentage. Click for table.
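
Once you've found the table value yourself, you can confirm the symmetry argument with a short Python check. This is a sketch under the assumption of a standard normal curve; the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, s.d. 1

# Area from the mean down to -1.39, and from the mean up to +1.39.
left = std_normal.cdf(0.0) - std_normal.cdf(-1.39)
right = std_normal.cdf(1.39) - std_normal.cdf(0.0)

print(f"{left:.4f} {right:.4f}")  # 0.4177 0.4177
```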

19
Calculate the probability of selecting a value at
random greater than a z-score (or s.d.) of 2.01.
  • You should be able to find the area under the
    curve, also known as probability, for a value of
    2.01. Run down the first column until you get to
    2.0. Then across the row to the .01 column.
  • However, consider the curve at the lower right.
    See the figure .4772? That's the area under the
    curve for 0.00 to 2.00. What you want is the area
    beyond, or to the right of 2.01. To get that, you
    have to subtract the probability for 0.00 to 2.01
    from all of the area on the right side of the
    curve. That is, from 0.00 to infinity. What is
    this figure? Can't remember? Look at the Normal
    Curve table.
  • Remember, report probability, not percentage.
    Click to get table.
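
After working it out with the table, here is one way to check the subtraction in Python. It is a sketch that assumes a standard normal curve, and the variable names are made up for the illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, s.d. 1

z = 2.01
area_mean_to_z = std_normal.cdf(z) - 0.5  # what the table lists for 2.01
area_right_half = 0.5                     # from 0.00 out to infinity
area_beyond_z = area_right_half - area_mean_to_z

print(f"{area_mean_to_z:.4f}")  # 0.4778
print(f"{area_beyond_z:.4f}")   # 0.0222
```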

20
Calculate the probability of selecting a value at
random greater than a z-score of ±2.00 (note the ±).
  • This assignment requires you to consider both
    sides of the normal curve. Like the last
    assignment, you need to calculate "what's left"
    under the curve, this time outside the spans
    0.00 to -2.00 and 0.00 to 2.00. To do this, you
    need to determine the probability from 0.00 to
    2.00, double that value to include the left side
    of the curve, and subtract that figure from the
    probability for the entire curve.
  • Click to get table.

21
  • Calculate the z-score equivalents of these
    systolic blood pressure values: 100, 120, 130,
    140, and 190, where the mean equals 126.2, and
    the standard deviation equals 18.8.  Click here
    to open an Excel file to do the work.
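
If you'd like to check your spreadsheet work, the same calculation can be sketched in Python. The variable names are illustrative, and rounding to two decimals is an assumption made to match how z-scores are looked up in the table.

```python
mean_bp = 126.2
sd_bp = 18.8
readings = [100, 120, 130, 140, 190]

# z = (score - mean) / standard deviation, rounded for table lookup.
z_scores = [round((bp - mean_bp) / sd_bp, 2) for bp in readings]

for bp, z in zip(readings, z_scores):
    print(f"{bp}: z = {z:+.2f}")
```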

22
  • Calculate the probability of selecting a blood
    pressure value at random greater than 160, where
    the mean is 126.20 and the standard deviation is
    18.80.
  • To determine this value, you first need to
    calculate the equivalent z-score, as you did in
    the prior assignment. Then, determine the area
    under the curve represented by that z-score.
    Finally, calculate the area beyond, or greater
    than, that value. The resulting value represents
    probability. Click here for an Excel file to work
    on.
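
The three steps just described can be sketched in Python as a way to check your answer. This assumes blood pressure follows a normal distribution with the stated mean and s.d.; the variable names are made up for the illustration.

```python
from statistics import NormalDist

mean_bp, sd_bp = 126.2, 18.8
std_normal = NormalDist()  # mean 0, s.d. 1

z = (160 - mean_bp) / sd_bp               # step 1: equivalent z-score
area_mean_to_z = std_normal.cdf(z) - 0.5  # step 2: area from the mean to z
p_above = 0.5 - area_mean_to_z            # step 3: area beyond z

print(f"z = {z:.2f}, probability = {p_above:.3f}")
```

Expect a small difference from a table-based answer, since the table forces you to round z to two decimals before looking it up.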

23
  • Calculate the probability of selecting a blood
    pressure value at random between 110 and 135
    where the mean is 126.20 and the s.d. is 18.80.
    Click here for an Excel file to work on.
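
For a "between two values" problem, one shortcut is to describe the blood pressure distribution directly instead of converting to z-scores first. The sketch below does both and shows they agree. It assumes blood pressure is normally distributed with the stated mean and s.d., and all names are illustrative.

```python
from statistics import NormalDist

mean_bp, sd_bp = 126.2, 18.8

# Route 1: work in the original units (blood pressure readings).
bp = NormalDist(mu=mean_bp, sigma=sd_bp)
p_between = bp.cdf(135) - bp.cdf(110)

# Route 2: convert both endpoints to z-scores, then use the standard curve.
std_normal = NormalDist()
z_low = (110 - mean_bp) / sd_bp
z_high = (135 - mean_bp) / sd_bp
p_between_z = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(f"{p_between:.3f} {p_between_z:.3f}")
```

Because 110 is below the mean and 135 is above it, the table method adds the two areas on either side of zero rather than subtracting one from the other.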

24
  • We're about to make the transition from
    descriptive to inferential statistics. The heart
    of inferential statistics is "statistical
    significance." This is about the probability that
    the value you ended up with when you completed
    your calculation happened by random chance
    rather than being "real." Here's an example:

25
  • Let's say that you're working with a group of
    people to get their blood pressure down. High
    blood pressure is unhealthy, right? One of the
    things you do is put them on a gentle exercise
    program, both to get their weight down a bit and
    to strengthen their cardiovascular system. All
    things being equal, if your program is
    successful, their blood pressure should come
    down.
  • If you take the individuals' blood pressure at
    the beginning of the study, calculate the mean,
    and take it and average it again at the end, the
    mean blood pressure at the end should be lower
    than at the beginning if your program worked.
    But, how much lower does ending blood pressure
    have to be in order for us to assert with some
    confidence that our program was successful? If
    the beginning group average was 128 and the
    ending was 124, is this difference large enough
    to claim programmatic success? What if the ending
    average was 120? 110? This is the issue central
    to statistical significance: is the difference
    between the two values so close to the mean of
    the normal distribution (zero) that the probability
    of selecting the value at random is too high to
    accept as "real?" Or, is it so far from the mean
    that it's out in one of the tails of the
    distribution, where the chances of pulling it out
    of the hat of values randomly are very small, and,
    therefore, more likely "real?"
  • Here are some ways that a difference in mean
    values before and after your high blood pressure
    program could occur by chance alone. For any
    particular group of people, their blood pressure
    might go down for reasons other than our exercise
    program. Maybe
    they had a high salt diet when they started and
    cut down on their sodium intake. Maybe they were
    experiencing a lot of stress and they got it
    under control. Or, maybe their blood pressure
    just went down unexpectedly.
  • Conversely, maybe your program actually worked.
    If so, you should be able to select another,
    similar group, conduct the same program, and find
    a similar difference between starting and
    ending blood pressure values. Not exactly the
    same, but the difference should fall in the
    same general area on the normal curve. If you
    conducted this program with 100 similar groups
    and found about the same difference, see how
    you'd be pretty confident that your program
    actually worked? Well, you probably can only run
    it once, so you need that difference value to
    fall a good long way from the mean in order to
    have confidence that your program worked. That's
    what statistical significance does for you.

26
  • Ok, assume that the five blood pressure values
    that you worked with for the last assignment, and
    that are now plotted nicely on the normal curve to
    the right, actually represent the difference in
    mean blood pressure between the beginning and
    ending values for five groups of people. In other
    words, you conducted your study five times.
  • Many researchers use a statistical significance
    level of .05 as their critical level. This means
    that if the value is statistically-significant,
    it will occur by chance alone, less than 5 times
    in 100. This means that if you conducted your
    study 100 times and found a mean difference about
    this large each time, you would be right in
    claiming that your program worked at least 96
    times; that's not too bad for this kind of
    program.
  • Another way of looking at the .05 critical value
    is that it has to fall into one of the two tails
    of the normal distribution, and not from that big
    bunch of scores in the middle. Why? Because there
    are lots of scores in the middle, and you have a
    very good chance of selecting one of them by
    chance alone. So, if the mean difference score is
    in the "bulge" of the distribution, the
    probability that it happened by chance alone is
    too great for you to assume your program worked.

27
  • So, on the normal curve, in order for a value to
    be statistically-significant at the 95% level or
    greater, you have to have a z-score beyond
    ±1.96. That is, the value has to come from
    the area to the left of -1.96 or from the area to
    the right of +1.96. This is the only way that you
    can reduce the odds that the value you
    calculated occurred by random chance alone to
    less than 5 times in 100.
  • If the value you calculate is
    statistically-significant at, let's say, the .05
    level, you write it like this: p < .05.
  • "<" stands for "less than." If it's not
    statistically-significant, you write p > .05. ">"
    stands for "greater than." Because I may have
    left you wondering, the reason that we almost
    always consider both sides of the normal curve is
    because our calculated value might be greater or
    less than the mean. For the blood pressure
    example, our program may have failed so badly
    that we actually caused the average ending blood
    pressure to increase! We'll talk a bit more about
    one-tail vs two-tail tests a little later.
  • I accept this may be confusing for you. I assure
    you that we'll do lots of work with probability,
    statistical significance, critical values, and
    the tables in the back of the textbook before
    we're done. I'm pretty sure it will all become
    clear before we're done.
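
To tie the 1.96 cutoff to the .05 level, here is a minimal Python sketch of the two-tail check. It assumes your calculated value is already expressed as a z-score on the standard normal curve; `two_tailed_p` and `is_significant` are hypothetical helper names, not functions from the textbook or any statistics package.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, s.d. 1

def two_tailed_p(z):
    """Probability of a z-score at least this far from 0, in either tail."""
    return 2 * (1 - std_normal.cdf(abs(z)))

def is_significant(z, alpha=0.05):
    """True when the two-tailed probability falls below alpha."""
    return two_tailed_p(z) < alpha

# The z-score that leaves .025 in each tail, .05 in total.
critical_z = std_normal.inv_cdf(1 - 0.05 / 2)
print(f"critical z: {critical_z:.2f}")  # 1.96

for z in (0.5, -2.3):
    label = "p < .05" if is_significant(z) else "p > .05"
    print(f"z = {z:+.1f}: {label}")
```

Taking the absolute value of z is what makes this a two-tail check: a value far to the left of the mean counts just as much as one far to the right.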