Normal and Poisson Distributions - PowerPoint PPT Presentation

About This Presentation

Normal and Poisson Distributions


Normal and Poisson Distributions GTECH 201 Lecture 14 Sampling Population The entire group of objects about which information is sought Unit Any individual member of ... – PowerPoint PPT presentation

Number of Views:303
Avg rating:3.0/5.0
Slides: 48
Provided by: lax3


Transcript and Presenter's Notes

Title: Normal and Poisson Distributions

Normal and Poisson Distributions
  • GTECH 201
  • Lecture 14

  • Population
  • The entire group of objects about which
    information is sought
  • Unit
  • Any individual member of the population
  • Sample
  • A part or a subset of the population used to gain
    information about the whole
  • Sampling Frame
  • The list of units from which the sample is chosen

Simple Random Sampling
  • A simple random sample of size n is a sample of n
    units chosen in such a way that every collection
    of n units from a sampling frame has the same
    chance of being chosen

Random Sampling in R
  • In R you can simulate random draws
  • For example, to pick five numbers at random from
    the set 140, you cangt sample(140,5)1 4
    30 28 40 13

Sampling with Replacement
  • Default in R is without replacementsample(c
    ("H", "T"), 10, replaceT)1 "T" "T" "T" "T"
    "T" "H" "H" "T" "H" "Tprobc(.9,.1)
  • sample(c("S", "F"), 10, replaceT, prob)

Random Number Tables
  • A table of random digits is
  • A list of 10 digits 0 through 9 having the
    following properties
  • The digit in any position in the list has the
    same chance of being any of of 0 through 9
  • The digits in different positions are
    independent, in that the value of one has no
    influence on the value of any other
  • Any pair of digits has the same chance of being
    any of the 100 possible pairs, i.e., 00,01,02,
    ..98, 99
  • Any triple of digits has the same chance of being
    any of the 1000 possible triples, i.e., 000, 001,
    002, 998, 999

Using Random Number Tables
  • A health inspector must select a SRS of size 5
    from 100 containers of ice cream to check for E.
    coli contamination
  • The task is to draw a set of units from the
    sampling frame
  • Assign a number to each individual
  • Label the containers 00, 01,02,99
  • Enter table and read across any line
  • 81486 69487 60513 09297
  • 81, 48, 66, 94, 87, 60, 51, 30, 92, 97

Random Number Generation in R
  • gt rnorm(10)
  • gt rnorm(10, mean7, sd5)
  • gt rbinom(10, size20, prob.5)
  • We will revisit the meaning of the parameters at
    the end of todays session

Combinatorics 1
  • Back to draw five out of 40sample(140,5)
  • The probability for any given number is 1/40 in
    the first sample,, 1/39 in the second, and so on
  • ? P(x ) 1/(403938373635)
  • gt 1/prod(4036)1 1.266449e-08
  • But

Combinatorics 2
  • We dont care about the order of the five numbers
    out of 40
  • There are 54321 combinations for the five
    drawn numbers
  • ? gt prod(15) / prod(4036)1 1.519738e-06
  • Shorthand for the above in gt 1/choose(40,5)

Binomial Distribution
  • Discrete probability distribution
  • Events have only 2 possible outcomes
  • binary, yes-no, presence-absence
  • Computing probability of multiple events or
  • Examples
  • Probability that x number of people are alive at
    the age of 65
  • Probability of a river reaching flood stage for
    three consecutive years

When to Apply Binomial
  • If sample is less than 10 of a large population
    in which a proportion p have acharacteristic of
    interest, then the distribution X, the number in
    the sample with that characteristic, is
    approximately binomial (n, p), where n is the
    sample size

Geometric Distribution
  • Tossing a biased coin until the first head
    appears pr(H) p
  • pr(X x) pr(TTT H) pr(T1 n T2 n ..n Hx)
    (1 p)x-1 p

The geometric distribution is the distributionof
the number of tosses of a biased coin up toand
including the first head
Poisson Distribution
  • Discrete probability distribution
  • Named in honor of Simeon Poisson (1781-1840)
  • What is it used for?
  • To model the frequency with which a specified
    event occurs over a period of time
  • The specified event occurs randomly
  • Independent of past or future occurrences
  • Geographers also use this distribution to model
    how frequently an event occurs across a
    particular area
  • We can also examine a data set (of frequency
    counts in order to determine whether a random
    distribution exists

Poisson Distribution is used
  • To analyze the number of patients arriving at a
    hospital emergency room between 6 AM and 7 AM on
    a particular day
  • Obvious implications for resource allocation
  • To analyze the number of phone calls per day
    arriving at a telephone switchboard
  • To analyze the number of cars using the drive
    through window at a fast-food restaurant
  • To analyze hailstorm occurrence in one Canadian

The Poisson Probability Formula
Lambda (?) is a positive real number (mean
frequency) e 2.718 (mathematical constant) X
0, 1, 2, 3, .(frequency of an occurrence) X! X
Example - 1
  • General Hospital, located in Phoenix, keeps
    records of emergency room traffic. From these
    records, we find that the number of patients
    arriving between 10 AM and 12 Noon has a Poisson
    distribution of with parameter
  • ? 6.9
  • Determine the probability that, on any given
    day, the number of patients arriving at that
    emergency room between 10 AM and 12 Noon will be
  • Exactly four
  • At the most two

Exactly four arrivals, x4
At the most, two arrivals
Revisiting Mean and Standard Deviation
Standard Dev.
What if
  • We wanted to obtain a table of probabilities for
    the random variable X, the number of patients
    arriving between 10AM and 12 Noon?

Discrete versus Continuous Distributions
  • Moving from individual probabilities to total
    number of successes or failures
  • Probability distribution f (x ) P (Xx) for
    discrete events
  • Probability distribution for continuous events

Expected Values
  • Population standard deviationsquare root of the
    average squared distance of X from the mean m

Expected Values
  • Mean and Poisson distribution
  • It can be shown that this adds to l. Thus, for
    Poisson-distributed populations E(X) l
  • The standard deviation sd(X) for Poisson(l) is vl

Probability Density Functions
  • Moving from the discrete to the continuous
  • Increasing the frequency of observations results
    in an ever finer histogram
  • Total area under the curve 1

Probability Density Functions
  • Population means and standard devs
  • mx balances the distribution
  • The standard deviation is calculated as for
    discrete density functions

The Normal Distribution
Properties of a Normal Distribution
  • Continuous Probability Distribution
  • Symmetrical about a central point
  • No skewness
  • Central point in this dataset corresponds to all
    three measures of central tendency
  • Also called a Bell Curve
  • If we accept or assume that our data is normally
    distributed, then,
  • We can compute the probability of different


Properties of a Normal Distribution
  • Using the symmetrical property of the
    distribution, we can conclude
  • 50 of values must lie to the right, i.e. they
    are greater than the mean
  • 50 of values must lie to the left, i.e.
  • If the data is normally distributed, the
    probability values are also normally distributed
  • The total area under the normal curve represents
    all (100) of probable outcomes
  • What can you say about data values in a normally
    distributed data set?

Normal Distribution and Standard Deviations
Approximating a Normal Distribution
  • In reality,
  • If a variables distribution is shaped roughly
    like a normal curve,
  • Then the variable approximates a normal
  • Normal Distribution is determined by
  • Mean
  • Standard Deviation
  • These measures are considered parameters of a
    Normal Distribution / Normal Curve

Equation of a Normal Curve
Areas Within the Normal Curve
  • For a normally distributed variable,
  • the percentage of all possible observations that
    lie within any specified range equals the
    corresponding area under its associated normal
    curve expressed as a percentage
  • A college has an enrollment of 3264 female
    students. Mean height is 64.4 inches, standard
    deviation is 2.4 inches
  • Frequency and relative frequency are presented

Frequency and Relative Frequency Table
0.0735, i.e. 7.35 of the students are between
67 and 68 inches tall
Relative Frequency Histogram with Normal Curve
0.0735 the area that has been cross-hatched
Shaded area under the normal curve approximates
the percentage of students who are between 67-68
inches tall
Standardizing a Normal Variable
  • Once we have mean and standard deviation of a
    curve, we know its distribution and the
    associated normal curve
  • Percentages for a normally distributed variable
    are equal to the areas under the associated
    normal curve
  • There could be hundreds of different normal
    curves (one for each choice of mean or std. dev.
  • How can we find the areas under a standard normal
  • A normally distributed variable with a mean of 0
    and a standard deviation of 1 is said to have a
    standard normal distribution

Z Score
The variable z is called the standardized version
of x, or the standardized variable corresponding
to x, with the mean 0 and standard deviation 1
Almost all observations in a dataset will lie
within three standard deviations to either side
of the mean, i.e., almost all possible
observations will have z scores between 3 and
Normal Curve Properties
  • The total area under the standard normal curve is
    equal to 1
  • The standard normal curve extends infinitely in
    both directions, approaching but never touching
    the horizontal axis
  • Standard normal curve is symmetric about 0
  • Most of the area under a standard normal curve
    lies between 3 and 3

Using the Standard Normal Table
  • The times taken for runners to complete a local
    10 km race is normally distributed with a mean of
    61 minutes and a standard deviation of 9 minutes.
    Let x be the finish time of a randomly selected
    runner. Find the probability that x gt 75 minutes
  • Step 1
  • Calculate the standard score
  • z 75-61/9 z 1.56
  • Step 2
  • Determine the probability from the normal
  • For z of 1.56, p 0.4406
  • Step 3
  • Interpret the result
  • p (xgt75) 0.5 0.446
  • 0.054 or 5.4 chance

Using the Standard Normal Table
  • In the previous example, what is the probability
    that someone finishes in less than 45 minutes?
  • Step 1
  • Calculate the standard score
  • z 45-61/9 z -1.78
  • Step 2
  • Determine the probability from the normal
  • For z of -1.78, area 0.4625
  • Step 3
  • Interpret the result
  • p (xlt45) 1- (0.50.4625)
  • 0.038 or 3.8 of the runners finish in
    less than 45 minutes

Three Distributions
Normal Approximations for Discrete Distributions
  • Approximation of the Binomial
  • Binomial is used for large n and small p
  • If p is moderate (not close to 0 or 1), then the
    Binomial can be approximated by the normal
  • Rule of thumb np (1-p) 10
  • Other normal approximations
  • If X Poisson(l), normal works well for l 10

Built-in Distributions in
  • Four fundamental items can be calculated for a
    statistical distribution
  • Density or point probability
  • Cumulated probability, distribution function
  • Quantiles
  • Pseudo-random numbers
  • In there are functions for each of these

Density of a Normal Distribution
  • gt x seq(-4, 4, 0.1)
  • gt plot (x, dnorm(x), type"l")

For Discrete Distributions..
  • gt x 050
  • gt plot (x, dbinom(x, size50, prob.33, type"h")

Cumulative Distribution Functions
  • Could be graphed but is not very informative
  • Example
  • Blood sugar concentration in the US population
    has a mean of 132 and a standard deviation of
    13.How special is a patient with a value 160?
  • 1 pnorm(160, mean132, sd13)1 0.01562612
    or 1.5

Random Number Generation in R
  • gt rnorm(10)
  • gt rnorm(10, mean7, sd5)
  • gt rbinom(10, size20, prob.5)
  • Now you understand the parameters
Write a Comment
User Comments (0)