Statistics or What's normal about the normal curve, and what's standard about the standard deviation - PowerPoint PPT Presentation

About This Presentation

Slides: 37
Provided by: chriswe

Transcript and Presenter's Notes



1
Statistics or What's normal about the normal
curve, and what's standard about the standard
deviation, and what co-relates in a correlation
2
Overview
  • What are statistics?
  • What's normal about the normal curve?
  • The nature of the confusion
  • One formal answer
  • An intuitive answer (real-time demo)
  • What's standard about a standard deviation?
  • Z-scores
  • What co-relates in a correlation?

3
What are statistics?
  • Recall that a probability is a proportion: the
    ratio of the part of the possibility space taken
    up by an event to the space of all possibilities
  • The sum of the probabilities of all possibilities
    is 1

[Figure: a circle representing all possibilities, with a region inside it for the possibility of some particular thing]
4
What are statistics?
  • In the last few classes we have seen that some
    events can occur in more ways than other events:
    they are more probable
  • For example, it is more probable that we will
    roll a 7 with two dice than that we will roll a 2,
    and more probable that we will get 5 heads out of
    ten coin flips than that we will get one, because
    there are more ways for it to happen.
  • A better representation of probability space (the
    space of all possible events) than our circle
    would make it more obvious that some events can
    happen in more ways than others: it would
    represent the distribution of the probabilities
    of events directly
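A short enumeration makes the "more ways" idea concrete. This is a sketch assuming fair six-sided dice and a fair coin; it only counts outcomes and does not simulate anything:

```python
from itertools import product
from math import comb

# Count the ordered outcomes (first die, second die) that give each sum.
ways_by_sum = {}
for a, b in product(range(1, 7), repeat=2):
    ways_by_sum[a + b] = ways_by_sum.get(a + b, 0) + 1

print("ways to roll a 7:", ways_by_sum[7], "of 36")
print("ways to roll a 2:", ways_by_sum[2], "of 36")

# Count the sequences of ten coin flips that contain exactly k heads.
for k in (1, 5):
    print(f"ways to get {k} heads in 10 flips:", comb(10, k), "of", 2**10)
```

Six of the 36 dice outcomes sum to 7 but only one sums to 2; 252 of the 1024 flip sequences contain 5 heads but only 10 contain exactly one.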

5
What are statistics?
  • We ended last time considering this probability
    distribution

6
What are statistics?
  • Statistics are methods for making calculations
    about distributions of probabilities
  • In particular, we might ask: How likely is some
    set of events (e.g. fewer than 2 heads)? Is one
    distribution different from another?

[Figure: two probability distributions, labeled P = 0.8 and P = 0.5]
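As a sketch of the first question, the probability of fewer than 2 heads can be found by brute force over every possible sequence of flips. The choice of 10 flips is an assumption, and the two values of p simply mirror the P = 0.5 and P = 0.8 labels on this slide ("heads" stands for whichever outcome has probability p):

```python
from itertools import product

def prob_fewer_than(k, n, p):
    """P(fewer than k heads in n independent flips, where P(heads) = p on each flip),
    found by brute force over all 2**n head/tail sequences."""
    total = 0.0
    for seq in product((0, 1), repeat=n):  # 1 = heads, 0 = tails
        heads = sum(seq)
        if heads < k:
            total += p**heads * (1 - p) ** (n - heads)
    return total

for p in (0.5, 0.8):
    print(f"p = {p}: P(fewer than 2 heads in 10 flips) = {prob_fewer_than(2, 10, p):.6f}")
```

For a fair coin the answer is exactly 11/1024 (one sequence with no heads plus ten with one head), about 0.011; with p = 0.8 it is far smaller, which is one way of seeing that the two distributions differ.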
7
What's normal about the normal curve(s)?
  • The normal curve is not a single curve, but a
    class of curves of probability distributions,
    that share properties in common
  • There are a number of ways of mathematically
    defining and estimating the normal distribution
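  • The actual definition (which you don't need to
    know) is
    $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$,
    where μ is the mean and σ is the standard deviation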

8
What's normal about the normal curve(s)?
  • The main questions I want to address today are:
  • What does that math actually mean?
  • Why are so many things normally distributed?
  • What makes sure that those things stay
    distributed normally?
  • What stops other things from being normally
    distributed?

9
What is the normal curve?
  • The normal curve has the following properties
  • It is bell-shaped
  • It is symmetric
  • The total area under the curve is 1 (why?)
  • The normal curve extends indefinitely in both
    directions, getting infinitely close to zero in
    either direction.
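The symmetry and total-area properties can be checked numerically. This sketch uses the standard form of the normal density, $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$, and a simple midpoint-rule sum; cutting the tails off at 10 SDs from the mean is an assumption that loses only a negligible sliver of area:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area(f, lo, hi, steps=100_000):
    """Approximate the area under f between lo and hi with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

# The tails never reach zero, but beyond 10 SDs they add almost nothing to the area.
print(f"area, mean 0, SD 1: {area(normal_pdf, -10, 10):.6f}")
print(f"area, mean 3, SD 2: {area(lambda x: normal_pdf(x, 3, 2), 3 - 20, 3 + 20):.6f}")

# Symmetry: equal heights at equal distances either side of the mean.
print(normal_pdf(1.5) == normal_pdf(-1.5))
```

The area comes out as 1 whatever the mean and SD, which is what lets any member of the family be read as a probability distribution.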

10
Normal curve shapes
  • The height of finite (non-infinite) normal
    approximations simply reflects the probability of
    the most probable event, which depends on the
    number of possible events and on their
    probability of occurrence

[Figure: normal approximations for 10 flips and for 1000 flips]
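To see why the peak is lower for more flips, this sketch (assuming a fair coin and an even number of flips) computes the probability of the single most likely outcome:

```python
from math import comb

def peak_probability(n):
    """Probability of exactly n/2 heads, the most likely outcome of n fair flips (n even)."""
    return comb(n, n // 2) / 2**n

for n in (10, 1000):
    print(f"{n} flips: P(exactly {n // 2} heads) = {peak_probability(n):.4f}")
```

With 10 flips the most likely outcome has a probability of about 0.25; with 1000 flips it is only about 0.025, because the total probability of 1 is spread over many more possible outcomes.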
11
From Wilensky, U. (1997). What is Normal
Anyway? Therapy for Epistemological Anxiety.
Educational Studies in Mathematics, Special Issue
on Computational Environments in Mathematics
Education, Noss, R. (Ed.), Volume 33, No. 2, pp.
171-202.
  • U: Why do you think height is distributed
    normally?
  • L: Come again? (sarcastic)
  • U: Why is it that women's height can be graphed
    using a normal curve?
  • L: That's a strange question.
  • U: Strange?
  • L: No one's ever asked me that before.....
    (thinking to herself for a while) I guess there
    are 2 possible theories: either it's just a fact
    about the world, some guy collected a lot of
    height data and noticed that it fell into a
    normal shape.....
  • U: Or?
  • L: Or maybe it's just a mathematical trick.
  • U: A trick? How could it be a trick?

12
  • L: Well... Maybe some mathematician somewhere
    just concocted this crazy function, you know, and
    decided to say that height fit it.
  • U: You mean...
  • L: You know the height data could probably be
    graphed with lots of different functions and the
    normal curve was just applied to it by this one
    guy and now everybody has to use his function.
  • U: So you're saying that in the one case, it's a
    fact about the world that height is distributed
    in a certain way, and in the other case, it's a
    fact about our descriptions but not about height?
  • L: Yeah.
  • U: Well, if you had to commit to one of these
    theories, which would it be?
  • L: If I had to choose just one?
  • U: Yeah.
  • L: I don't know. That's really interesting. Which
    theory do I really believe? I guess I've always
    been uncertain which to believe and it's been
    there in the background you know, but I don't
    know. I guess if I had to choose, if I have to
    choose one, I believe it's a mathematical trick,
    a mathematician's game. ....What possible reason
    could there be for height, ....for nature, to
    follow some weird bizarro function?

13
Formal answer 1: The binomial distribution I
  • The chance of an event of probability p happening
    r times out of n tries:
  • $P(r) = \frac{n!}{r!\,(n-r)!}\, p^{r} (1-p)^{n-r}$
  • (Recall: we wondered about this generalization
    last class.)
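The formula translates directly into code. A minimal sketch, using Python's `math.comb` for the n!/(r!(n - r)!) term:

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(r): chance that an event of probability p happens exactly r times in n tries."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Ten fair coin flips: the probability of each possible number of heads.
n, p = 10, 0.5
dist = [binomial_pmf(r, n, p) for r in range(n + 1)]
print(f"P(5 heads in 10 flips) = {dist[5]:.4f}")
print(f"sum over r = 0..10: {sum(dist):.6f}")
```

Summing P(r) over every possible r gives 1, matching the rule that the probabilities of all possibilities sum to 1.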

14
Formal answer 1: The binomial distribution II
  • Why is it called the binomial distribution?
  • Bi = 2, Nom = thing
  • the "two-thing" distribution
  • It can be used wherever:
  • 1. Each trial has two possible outcomes (say,
    success and failure or heads and tails)
  • 2. The trials are independent: the outcome of
    one trial has no influence over the outcome of
    another trial.
  • 3. The outcomes are mutually exclusive
  • 4. The events are randomly selected

15
Let's try it out (Example 6.3 from our first
probability class)
  • What are the odds of there being exactly one
    seven out of two rolls?
  • One way is to roll a 7 first, but not second
  • - the probability of this is 1/6 × 5/6
    (independent events) = 5/36 ≈ 0.139
  • - the probability of rolling a 7 second, but not
    first, is 5/6 × 1/6 (independent events)
    = 5/36 ≈ 0.139
  • - since these two outcomes are mutually
    exclusive, we can add them to get 5/36 + 5/36
    = 10/36 ≈ 0.278
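The same answer can be checked by brute force, listing every equally likely outcome of two rolls. This sketch assumes fair dice and treats each "roll" as a throw of two dice, as in the example:

```python
from fractions import Fraction
from itertools import product

# Every ordered outcome of one throw of two fair six-sided dice (36 in total).
one_roll = list(product(range(1, 7), repeat=2))

# Every ordered pair of such throws: 36 * 36 = 1296 equally likely outcomes.
favorable = 0
for first, second in product(one_roll, repeat=2):
    sevens = (sum(first) == 7) + (sum(second) == 7)  # how many of the two throws sum to 7
    if sevens == 1:
        favorable += 1

prob = Fraction(favorable, len(one_roll) ** 2)
print(favorable, prob, float(prob))
```

360 of the 1296 outcomes contain exactly one 7, giving 360/1296 = 5/18 ≈ 0.278, in agreement with the hand calculation.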

16
The generalization (Example 6.3 from last class)
  • What are the odds of there being exactly one
    seven out of two rolls?

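An event of probability p happens r times out of
n tries: $P(r) = \frac{n!}{r!\,(n-r)!}\, p^{r} (1-p)^{n-r}$
With $p = 1/6$, $n = 2$, $r = 1$:
$P(1) = \frac{2!}{1!\,1!} \left(\tfrac{1}{6}\right)^{1} \left(\tfrac{5}{6}\right)^{1} = \tfrac{10}{36} \approx 0.278$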
17
What does this have to do with the normal
distribution?
18
What does this have to do with the normal
distribution?
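One standard way to state the connection is the de Moivre-Laplace theorem: for large n, the binomial distribution is closely approximated by a normal curve with mean np and standard deviation √(np(1 - p)). This sketch compares the two for n = 100 and p = 0.5, both arbitrary choices:

```python
import math

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n trials with success probability p."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

n, p = 100, 0.5
mu = n * p                          # mean of the binomial distribution
sigma = math.sqrt(n * p * (1 - p))  # its standard deviation

gaps = [abs(binomial_pmf(r, n, p) - normal_pdf(r, mu, sigma)) for r in range(n + 1)]
print(f"peak binomial height: {binomial_pmf(50, n, p):.4f}")
print(f"peak normal height:   {normal_pdf(50, mu, sigma):.4f}")
print(f"largest gap anywhere: {max(gaps):.5f}")
```

The largest gap is a tiny fraction of the peak height, and it shrinks further as n grows.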
19
Why does this normal distribution happen?
  • See http://ccl.northwestern.edu/netlogo/
  • for the NetLogo demo shown in class.
  • Can you understand:
  • What effect changing the probabilities of each
    event has?
  • What has to change to skew a normal curve?
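For the last question, this sketch computes the skewness (third central moment divided by the SD cubed) of the binomial distribution directly from its probabilities; the values of n and p are arbitrary choices:

```python
import math

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n trials with success probability p."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

def skewness(n, p):
    """Skewness (third central moment / SD cubed), computed from the binomial probabilities."""
    probs = [binomial_pmf(r, n, p) for r in range(n + 1)]
    mean = sum(r * q for r, q in enumerate(probs))
    var = sum((r - mean) ** 2 * q for r, q in enumerate(probs))
    third = sum((r - mean) ** 3 * q for r, q in enumerate(probs))
    return third / var**1.5

for n, p in [(10, 0.5), (10, 0.8), (100, 0.8)]:
    print(f"n = {n}, p = {p}: skewness = {skewness(n, p):+.3f}")
```

With p = 0.5 the skewness is 0 (a symmetric curve); with p = 0.8 it is negative, so the long tail points toward fewer successes. The skew also shrinks as n grows, matching the closed form (1 - 2p)/√(np(1 - p)).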

20
Why are so many things normally distributed?
  • The normal curve is a picture of how randomness
    distributes itself if it is left alone
  • That is: normality arises as a direct consequence
    of randomness
  • This is itself a remarkable and almost holy fact:
    deep structure arises from that which is
    unstructured!
  • But why are so many things (especially
    psychometric things) unstructured and therefore
    structured normally?
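The claim that normality arises from randomness can be tried out with a small simulation of the central limit effect. Assumptions: each "thing" is the sum of 12 independent, flat (uniform) random influences, and the seed and sample size are arbitrary:

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

def sum_of_influences(k=12):
    """Add up k independent uniform(0, 1) influences and shift the total to have mean 0."""
    return sum(random.random() for _ in range(k)) - k / 2

# Each uniform draw has variance 1/12, so the sum of 12 has variance 1 (SD = 1).
samples = [sum_of_influences() for _ in range(50_000)]

within_1sd = sum(abs(x) < 1 for x in samples) / len(samples)
within_2sd = sum(abs(x) < 2 for x in samples) / len(samples)
print(f"within 1 SD of the mean: {within_1sd:.3f} (normal curve: about 0.68)")
print(f"within 2 SD of the mean: {within_2sd:.3f} (normal curve: about 0.95)")
```

None of the individual influences is normally distributed, yet their sum falls within 1 and 2 SDs of the mean in close to the proportions the normal curve predicts.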

21
The standard deviation
From http://www.psychstat.smsu.edu/introbook/sbk00.htm
  • Given the non-linear shape of the normal
    distribution, one has two choices:
  • A.) Keep the amount of variation in each division
    constant, but vary the size of the divisions
  • B.) Keep the size of each division constant, but
    vary the amount of variation in each division

22
The standard deviation (SD)
  • The definition of SD takes the second approach:
    it keeps the size of each division constant, but
    it varies the amount of variation in each
    division
  • The SD is a measure of average deviation
    (difference) from the mean
  • It is the square root of the variance, which is
    the average squared difference from the mean.
    Why do we square the difference?
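The definition can be computed by hand on a small made-up data set. This sketch uses the population SD (dividing by N), which matches "the average squared difference from the mean"; a sample SD would divide by N - 1 instead:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, chosen for easy arithmetic

mean = sum(data) / len(data)
deviations = [x - mean for x in data]

# Raw deviations cancel out: their mean is always 0.
print("mean of raw deviations:", sum(deviations) / len(deviations))

variance = sum(d * d for d in deviations) / len(data)  # average squared deviation
sd = math.sqrt(variance)
print("variance:", variance, "SD:", sd)

# The standard library's population SD agrees.
print(math.isclose(sd, statistics.pstdev(data)))
```

Here the mean is 5, the variance 4 and the SD 2.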

23
Z-scores
  • If we express differences from the mean by
    dividing them by population SDs, we get z-scores:
    standard units of difference from the mean
  • THESE Z-SCORES COME IN EXTREMELY USEFUL IN
    PSYCHOMETRICS!
  • For example, we might want to know:
  • If a 12-foot elephant is taller (compared to the
    height of average elephants) than a 230-pound
    man is heavy (compared to the weight of average men)
  • If Wayne Gretzky was a better hockey player than
    Tiger Woods is a golfer (a prize for the person
    who proves one or the other!)
  • If a person with a WAIS IQ of 140 is rarer (i.e.
    less probable) than a person with a GPA of 3.9
  • Etc.
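The IQ comparison can be sketched in a few lines. WAIS IQ scores are scaled to a mean of 100 and an SD of 15; the GPA mean and SD below are purely hypothetical, and the sketch assumes both measures are roughly normally distributed:

```python
import math

def z_score(x, mean, sd):
    """How many SDs x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def upper_tail(z):
    """Proportion of a normal distribution lying above z standard units."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# WAIS IQ scores are scaled to a mean of 100 and an SD of 15.
z_iq = z_score(140, 100, 15)
# HYPOTHETICAL GPA norms (mean 3.0, SD 0.5), for illustration only.
z_gpa = z_score(3.9, 3.0, 0.5)

print(f"IQ 140:  z = {z_iq:.2f}, proportion of people above = {upper_tail(z_iq):.4f}")
print(f"GPA 3.9: z = {z_gpa:.2f}, proportion of people above = {upper_tail(z_gpa):.4f}")
```

With these made-up GPA figures, an IQ of 140 (z ≈ 2.67) is rarer than a GPA of 3.9 (z = 1.8); real GPA norms could change the answer.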

24
Z-scores
  • Remember this: z-scores are just a way of talking
    about distributions of probabilities; that is,
    they are a shorthand way of talking about how
    large a portion of probability space we are
    lopping out of the distribution of probabilities
    represented by our normal curve
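To make the "portion of probability space" idea concrete, this sketch uses the standard normal cumulative distribution, written with `math.erf`, to find how much of the curve lies between -z and +z:

```python
import math

def normal_cdf(z):
    """Proportion of the standard normal curve lying below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for z in (1, 2, 3):
    share = normal_cdf(z) - normal_cdf(-z)
    print(f"between z = -{z} and z = +{z}: {share:.4f} of the probability space")
```

These are the familiar shares of roughly 68%, 95% and 99.7% for 1, 2 and 3 SDs.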

25
What co-relates in a correlation?
  • In a correlation, we want to find the equation
    for the (one and only) line (the line of
    regression) which describes the relation between
    variables with the least error.
  • This is done mathematically, but the idea is
    simply that we draw the line so that the sum of
    the squared distances of the points (in two or
    more dimensions) from the line is smaller than it
    would be for any other line
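A minimal sketch of the least-squares idea on made-up data. It assumes ordinary least squares, which measures each point's distance from the line vertically (in Y); the final loop checks that nudging the fitted line in any direction increases the squared error:

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x (minimizes squared vertical distances)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def sse(a, b, xs, ys):
    """Sum of squared vertical distances from the points to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"best line: y = {a:.2f} + {b:.2f}x, squared error = {sse(a, b, xs, ys):.2f}")

# Nudging the line in any direction makes the squared error larger.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(a + da, b + db, xs, ys) > sse(a, b, xs, ys)
```

For these points the best line is y = 2.2 + 0.6x.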

26
We need first to know: What is covariance?
  • Covariance is closely related to variance (which
    is, recall, the average of the squared deviations
    from a mean)
  • The covariance of two features X and Y measures
    their tendency to vary together, i.e., to
    co-vary.
  • It is defined as the average of (differences from
    the mean for X multiplied by the differences from
    the mean for Y)
  • That is the average of the products of the
    deviations from the mean of X and Y
  • In variance (one variable), we square the
    differences of each data point from their mean
  • In covariance (two variables), we multiply the
    difference from one mean by the difference from
    the other mean
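The definition translates almost word for word into code. A sketch on made-up data, using the population form (dividing by N) to match the slide's "average":

```python
import statistics

def covariance(xs, ys):
    """Population covariance: the average product of paired deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
print("c(X, Y) =", covariance(xs, ys))

# Pairing X with itself turns each product into a square, so c(X, X) is the variance of X.
print(covariance(xs, xs), statistics.pvariance(xs))
```

Because multiplying a deviation by itself is squaring it, the variance is just the special case c(X, X).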

27
We need first to know: What is covariance?
  • Covariance is the average of the products of the
    deviations from the mean of X and Y
  • Properties
  • If X and Y tend to increase together, then c(X,Y)
    > 0
  • If one tends to decrease when the other
    increases, then c(X,Y) < 0
  • If X and Y are independent, then c(X,Y) = 0
  • c(X,Y) ≤ the product of the standard
    deviations of X and Y
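The first two properties and the bound can be checked on small made-up data sets. This sketch uses the population covariance and takes each SD as the square root of c(X, X):

```python
import math

def covariance(xs, ys):
    """Population covariance: average product of paired deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def pop_sd(xs):
    """Population SD, the square root of the variance c(X, X)."""
    return math.sqrt(covariance(xs, xs))

xs = [1, 2, 3, 4, 5]
up = [2, 4, 5, 4, 5]    # hypothetical: tends to rise as xs rises
down = [9, 7, 8, 5, 4]  # hypothetical: tends to fall as xs rises

print("rise together:", covariance(xs, up))
print("one falls:    ", covariance(xs, down))
for ys in (up, down):
    # c(X, Y) never exceeds the product of the two SDs.
    print(covariance(xs, ys) <= pop_sd(xs) * pop_sd(ys))
```

The first covariance is positive (1.2), the second negative (-2.4), and both stay below the product of the SDs.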

28
We need first to know: What is covariance?
  • Covariance is the average of the products of the
    deviations from the mean of X and Y
  • Properties
  • If X and Y tend to increase together, then c(X,Y)
    > 0
  • Why? Because negative distances from the X mean
    will be likely to be paired with negative distances
    from the Y mean (so their product will be
    positive) and positive distances from the X mean
    will be likely to be paired with positive distances
    from the Y mean (so their product will also be
    positive)

29
We need first to know: What is covariance?
  • Covariance is the average of the products of the
    deviations from the mean of X and Y
  • Properties
  • If one tends to decrease when the other
    increases, then c(X,Y) < 0
  • Why? Because negative distances from the X mean
    will be likely to be paired with positive distances
    from the Y mean (so their product will be
    negative) and positive distances from the X mean
    will be likely to be paired with negative distances
    from the Y mean (so their product will also be
    negative)

30
We need first to know: What is covariance?
  • Covariance is the average of the products of the
    deviations from the mean of X and Y
  • Properties
  • If X and Y are independent, then c(X,Y) = 0
  • Why? Because positive distances from the X mean
    are just as likely to be paired with positive as
    with negative distances from the Y mean, and short
    distances from the X mean are as likely to be
    paired with short as with long distances from the Y
    mean, so their products will be as likely to be
    positive as negative, and as likely to be large as
    small, and will tend to average out.
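For independence, a simulation is more convincing than a hand-picked example. This sketch (seed and sample size arbitrary) draws X and Y independently from a standard normal distribution; in any finite sample the covariance comes out close to 0 rather than exactly 0:

```python
import math
import random

random.seed(2)  # fixed seed so the run is repeatable

def covariance(xs, ys):
    """Population covariance: average product of paired deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

# Two independently generated features (hypothetical data): knowing X says nothing about Y.
xs = [random.gauss(0, 1) for _ in range(100_000)]
ys = [random.gauss(0, 1) for _ in range(100_000)]

c = covariance(xs, ys)
bound = math.sqrt(covariance(xs, xs) * covariance(ys, ys))  # product of the two SDs
print(f"c(X, Y) = {c:.4f}, product of SDs = {bound:.4f}")
```

The positive and negative products cancel almost completely, leaving a covariance that is tiny next to the product of the SDs.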

31
We need first to know: What is covariance?
  • Covariance is the average of the products of the
    deviations from the mean of X and Y
  • Properties
  • c(X,Y) ≤ the product of the standard
    deviations of X and Y
  • Why? The SD is a measure of average deviation
    (difference) from the mean, so it is a measure of
    how far away from the mean X and Y are
  • If every X is exactly as far from its mean, in the
    same direction, as its paired Y is from its mean,
    c(X,Y) equals the product of the standard
    deviations of X and Y

32
What is a correlation?
  • r = the covariance of X and Y / the product of
    the SDs of X and Y
  • It is a standardized measure of covariance,
    standardized as a fraction of the total possible
    deviation from the mean
  • When X and Y are closely related, the covariance
    will be close to the product of the SDs of X and
    Y, so r will be close to 1 (or close to -1 when
    one falls as the other rises).
  • When X and Y are unrelated, the item-by-item
    differences from the means will not line up, so
    c(X,Y) will be close to 0, much smaller than
    SD(X) × SD(Y), and r will be close to 0.
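Putting the pieces together, here is a sketch of r as the covariance divided by the product of the SDs. The population N cancels out, so dividing by N - 1 throughout would give the same r; the data are made up:

```python
import math

def covariance(xs, ys):
    """Population covariance: average product of paired deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Pearson r: covariance divided by the product of the two SDs."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

xs = [1, 2, 3, 4, 5]
print(correlation(xs, [2, 4, 5, 4, 5]))          # a loose positive relation
print(correlation(xs, [2 * x + 1 for x in xs]))  # a perfect positive line
print(correlation(xs, [-x for x in xs]))         # a perfect negative line
```

The loose relation gives r ≈ 0.775, while the two perfect lines give 1 and -1.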

33
Significance tests for correlation
  • It is possible to transform a correlation into a
    t-score, so that we can calculate how reliable it
    is (that is, test the null hypothesis that r = 0)
  • $t = \frac{r}{\sqrt{(1 - r^2)/(N - 2)}}$, with df = N - 2
  • Note that, with r constant, t increases as N
    increases
  • What are the implications of this?
  • For on-line calculation: http://faculty.vassar.edu/lowry/rsig.html
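A sketch of the t formula, illustrating the point about N: with r held at 0.5, t grows as the sample gets larger. Turning t into a p-value needs the t distribution with N - 2 degrees of freedom (for example `scipy.stats.t.sf` in SciPy), which this standard-library sketch leaves out:

```python
import math

def t_for_r(r, n):
    """t statistic for testing the null hypothesis r = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r**2) / (n - 2))

for n in (10, 30, 100):
    print(f"r = 0.5, N = {n}: t = {t_for_r(0.5, n):.2f}, df = {n - 2}")
```

With r = 0.5, t is about 1.63 at N = 10 but about 5.72 at N = 100.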

34
Significance tests for correlation
  • It is also possible to calculate how reliable a
    difference between two correlations is, using a
    z-transformation
  • $z = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right)$
  • ln = natural logarithm (log to the base e,
    where e ≈ 2.718281828459), for reasons that need
    not concern us here.
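A sketch of the z-transformation and of one common way to use it. The comparison assumes two independent samples and uses the usual textbook standard error √(1/(n₁ - 3) + 1/(n₂ - 3)); the correlations and sample sizes are made up:

```python
import math

def fisher_z(r):
    """Fisher's z-transformation: z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def compare_correlations(r1, n1, r2, n2):
    """z statistic for the difference between correlations from two independent samples."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se

print(f"{fisher_z(0.5):.4f}")  # same value as math.atanh(0.5)
print(f"{compare_correlations(0.5, 50, 0.3, 50):.2f}")
```

Here r = 0.5 versus r = 0.3, with 50 people in each group, gives z ≈ 1.16, short of the 1.96 usually needed for significance at the 5% level (two-tailed).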

35
Correlation squared
  • The square of a correlation tells us the
    percentage of total variance that is accounted
    for in one variable by the other
  • We will not attempt to understand why in this
    class
  • The relation is symmetric (for linear
    correlations only): X accounts for as much of
    the variance in Y as Y does of the variance in X
  • If r = 0.1, then 0.01 (1%) of the variance in one
    variable is accounted for by the other
  • If r = 0.5, then 0.25 (25%) of the variance in
    one variable is accounted for by the other
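Even without deriving it, the claim can be checked numerically. This sketch fits a least-squares line (an assumption about what "accounted for" means here) to made-up data with r = √0.6, and measures what share of the total squared variation the line removes, in both directions:

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def share_explained(xs, ys):
    """Proportion of the variance in ys accounted for by a least-squares line on xs."""
    a, b = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation left over
    sst = sum((y - my) ** 2 for y in ys)                       # total variation
    return 1 - sse / sst

xs = [1, 2, 3, 4, 5]  # hypothetical data with r = sqrt(0.6), so r squared = 0.6
ys = [2, 4, 5, 4, 5]
print(f"{share_explained(xs, ys):.3f}")  # Y accounted for by X
print(f"{share_explained(ys, xs):.3f}")  # X accounted for by Y: the same share
```

Both directions give 0.600, which is r².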

36
Visual help
  • Check out the normal curve and correlation
    real-time demos (as well as many 2-dice
    problems!) at
  • http://noppa5.pc.helsinki.fi/koe/