Data Handling Lecture 2: The normal distribution

Transcript and Presenter's Notes

1
Data Handling Lecture 2: The normal distribution
  • Andrew Jackson

2
Computer Lab Recap
  • How do I know I can trust the user uploaded R
    packages?
  • Good ones will have an accompanying peer-reviewed
    published paper
  • Always have to be aware of potential bugs
  • Why did we pull random numbers from the binomial
    distribution?

3
The statistical method
  • Think about your data and what question you are
    asking.
  • Form a hypothesis and null hypothesis
  • Decide on the appropriate model
  • Observe some data
  • Can you reject the null hypothesis?

4
1 - Think about our system
  • Repeated coin tosses
  • There is some probability of getting a head, P(H)
  • With a corresponding probability of getting a
    tail, P(T) = 1 - P(H)

5
2 - Form your Hypotheses
  • Hypothesis is that the coin is unfair
  • What is the null hypothesis H0?
  • That the coin has 0.5 chance of heads and 0.5
    chance of tails
  • i.e. the coin is fair

6
Decide on the model
  • You just have to know what this is going to be
  • Comes with familiarity and exposure to lots of
    different kinds
  • In this case it's a Binomial distribution
  • Observation ~ Binom(p, Ntrials)
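
As a rough sketch of what this model implies for the coin example, the probability of each possible number of heads in 6 tosses can be tabulated with R's dbinom. The variable names (n_trials, p_head, heads, probs) are illustrative and not from the lecture.

```r
# Binomial model for the coin example: 6 tosses, P(H) = 0.5 under H0
n_trials <- 6
p_head <- 0.5

# Probability of observing 0, 1, ..., 6 heads under this model
heads <- 0:n_trials
probs <- dbinom(heads, size = n_trials, prob = p_head)

# Show the whole distribution as a small table
print(data.frame(heads = heads, probability = round(probs, 3)))
```

The probabilities sum to 1, and the table is symmetric around 3 heads because the coin is assumed fair.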

7
Observe some data
  • Toss a coin 6 times, like we did
  • For each observation write down the number of
    heads
  • Or
  • Simulate some data using a computer
  • u <- rbinom(23, 6, 0.5)
  • 4 3 1 2 3 4 4 3 4 5 3 2 3 3 6 5 5 1 5 4 4 2 3
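
The simulation on this slide can be repeated as below. The seed is an arbitrary assumption added only to make the sketch repeatable, so the numbers drawn will not match the ones shown on the slide.

```r
# Arbitrary seed (an assumption, not from the lecture) for repeatability
set.seed(1)

# 23 observations, each the number of heads in 6 tosses of a fair coin
u <- rbinom(23, 6, 0.5)
print(u)

# Count how often each possible number of heads (0 to 6) occurred
print(table(factor(u, levels = 0:6)))
```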

8
Accept or Reject H0?
  • Say we get 5 heads
  • What is the probability of getting 5 or more
    heads given the model?
  • P(5) = 0.094
  • P(6) = 0.016
  • p = 0.11 (1-tailed test)
  • p = 0.22 (2-tailed test)
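
These values can be reproduced by hand from the binomial probabilities. The sketch below assumes 6 tosses and a fair coin, as in the example. The variable names are illustrative.

```r
# Probability of exactly 5 and exactly 6 heads in 6 fair tosses
p5 <- dbinom(5, size = 6, prob = 0.5)  # 6/64
p6 <- dbinom(6, size = 6, prob = 0.5)  # 1/64

# One-tailed p-value: 5 or more heads
p_one_tailed <- p5 + p6                # 7/64

# Two-tailed p-value: doubling is valid here because the
# fair-coin binomial is symmetric (0 or 1 heads is as extreme as 5 or 6)
p_two_tailed <- 2 * p_one_tailed       # 14/64

print(c(p5 = p5, p6 = p6, one_tailed = p_one_tailed, two_tailed = p_two_tailed))
```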

9
You can check this in R
  • Use the binomial test
    (http://en.wikipedia.org/wiki/Binomial_test)
  • In R, type ?binom.test to bring up the help files
  • binom.test(x, n, p = 0.5, alternative = c("two.sided",
    "less", "greater"), ...)
  • For one-sided: binom.test(5, 6, p = 0.5,
    alternative = "greater")
  • For two-sided: binom.test(5, 6, p = 0.5,
    alternative = "two.sided")
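
A minimal sketch of running both tests and pulling out just the p-values, assuming the same 5 heads in 6 tosses. The object names one_sided and two_sided are illustrative.

```r
# One-sided test: is the probability of heads greater than 0.5?
one_sided <- binom.test(5, 6, p = 0.5, alternative = "greater")

# Two-sided test: is the probability of heads different from 0.5?
two_sided <- binom.test(5, 6, p = 0.5, alternative = "two.sided")

# The p-value is stored in the p.value element of the result
print(one_sided$p.value)
print(two_sided$p.value)
```

Printing one_sided or two_sided in full also shows the estimated probability of heads and a confidence interval.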

10
The normal distribution
  • Also known as the Gaussian distribution
  • First introduced by Abraham de Moivre in 1733 to
    approximate the Binomial for very large Ntrials
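
To see how close the approximation is, the sketch below compares the exact binomial probabilities with a normal density of matching mean and standard deviation. The choice of 1000 trials and all variable names are assumptions for illustration.

```r
# A large number of trials (1000 is an arbitrary illustrative choice)
n_trials <- 1000
p_head <- 0.5

# Mean and standard deviation of the binomial
mu <- n_trials * p_head
sigma <- sqrt(n_trials * p_head * (1 - p_head))

# Exact binomial probabilities and the matching normal density
k <- 0:n_trials
exact <- dbinom(k, size = n_trials, prob = p_head)
normal_approx <- dnorm(k, mean = mu, sd = sigma)

# Largest absolute disagreement between the two
max_diff <- max(abs(exact - normal_approx))
print(max_diff)
```

Rerunning with a small n_trials, such as 6, shows a noticeably larger disagreement.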

11
Characteristics
  • Continuous
  • unlike the discrete binomial
  • Can take any real value
  • unlike the binomial, which must be a non-negative
    integer no greater than Ntrials

12
Where does it come from?
  • Binomial from repeated trials
  • yes/no, head/tail, present/absent, true/false
  • All things being equal we would all be the same
    height
  • But lots of reasons why we are not
  • Some effects will make us taller, some shorter
  • Sum these additive effects and you get a normal
    distribution!

13
Demonstration
  • Draw lots of uniform random numbers between -0.5
    and 0.5
  • i.e. some additive some subtractive

14
The normal appears
  • Sum all these numbers
  • Do this again and again
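
The demonstration can be sketched in R as follows. The seed, the number of uniform draws per sum (100) and the number of repeats (10000) are assumptions, since the transcript does not give the values used in the lecture.

```r
set.seed(42)        # arbitrary seed for repeatability

n_numbers <- 100    # uniform numbers in each sum (assumed)
n_repeats <- 10000  # how many times the sum is repeated (assumed)

# Each sum adds n_numbers draws between -0.5 and 0.5,
# i.e. some additive and some subtractive effects
sums <- replicate(n_repeats, sum(runif(n_numbers, min = -0.5, max = 0.5)))

# The histogram of the sums takes on the familiar bell shape
hist(sums, breaks = 50, freq = FALSE,
     main = "Sums of uniform random numbers", xlab = "Sum")
```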

15
The normal distribution
  • Arises as the limit of adding up a very large
    number of small, independent random additive
    errors (uniformly distributed in the demonstration)
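
One way to check this numerically: a uniform number between -0.5 and 0.5 has mean 0 and variance 1/12, so a sum of n of them should have standard deviation sqrt(n/12), and if the sums are close to normal about 95% of them should fall within 1.96 standard deviations of zero. The sketch below assumes n = 50 and 20000 repeats. These values and the variable names are illustrative.

```r
set.seed(7)       # arbitrary seed for repeatability

n_numbers <- 50   # uniform numbers per sum (assumed)
sums <- replicate(20000, sum(runif(n_numbers, min = -0.5, max = 0.5)))

# Standard deviation expected from the variance of a uniform, 1/12 per draw
expected_sd <- sqrt(n_numbers / 12)
print(c(observed_sd = sd(sums), expected_sd = expected_sd))

# Fraction of sums within 1.96 standard deviations of zero;
# for a normal distribution this is about 0.95
frac_within <- mean(abs(sums) < 1.96 * expected_sd)
print(frac_within)
```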

16
Getting a p-value

17
Cumulative Distribution Function (CDF)
  • Obtained by integration
  • Say we observe a value of 2
  • CDF at 2 = 0.98
  • One-sided p-value = 0.02
  • Two-sided p-value = 0.04
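
In R the CDF of the normal distribution is pnorm. The sketch below assumes a standard normal (mean 0, standard deviation 1), which is consistent with the CDF value quoted on the slide. The unrounded results are roughly 0.977, 0.023 and 0.046, so the slide's figures are rounded.

```r
# Observed value
x <- 2

# CDF of the standard normal at x (mean 0, sd 1 assumed)
cdf_at_x <- pnorm(x)

# One-sided p-value: probability of a value at least as large as x
p_one_sided <- 1 - cdf_at_x

# Two-sided p-value: doubled, because the normal is symmetric
p_two_sided <- 2 * p_one_sided

print(c(cdf = cdf_at_x, one_sided = p_one_sided, two_sided = p_two_sided))
```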

18
Populations and sampling
  • Populations are often very large
  • We can't measure everyone
  • So we have to sample
  • and hope that our samples bear some relation to
    the population!
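
A small simulation can show how samples relate to the population. Everything here is hypothetical: the population is 100000 simulated heights with mean 170 and standard deviation 10, and each sample has 30 individuals.

```r
set.seed(123)  # arbitrary seed for repeatability

# Hypothetical population: 100000 heights in cm (values assumed)
population <- rnorm(100000, mean = 170, sd = 10)

# Draw 1000 samples of 30 individuals and record each sample mean
sample_means <- replicate(1000, mean(sample(population, size = 30)))

# The sample means scatter around the population mean
print(mean(population))
print(range(sample_means))

# Their spread is roughly the population sd divided by sqrt(30)
print(sd(sample_means))
```

No single sample mean equals the population mean exactly, but the sample means cluster around it.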

19
Sampling Error

20
Comparison of Populations Based on Samples

21
Summary
  • Statistical method
  • Form a hypothesis and appropriate null H0
  • Pick an appropriate model
  • Collect data and test H0
  • Normal distribution represents random additive
    error
  • Sampling Error
  • Estimating the population values