Topic 4: Discrete Random Variables and Probability Distributions

1
Topic 4: Discrete Random Variables and
Probability Distributions
  • CEE 11 Spring 2002
  • Dr. Amelia Regan

These notes draw liberally from the class text,
Probability and Statistics for Engineering and
the Sciences by Jay L. Devore, Duxbury 1995 (4th
edition)
2
Definition
  • For a given sample space S of some experiment, a
    random variable is any rule that associates a
    number with each outcome in S
  • Random variables may take on finitely or
    infinitely many values
  • Examples: {0, 1}; the number of sequential tosses
    of a coin until the outcome "head" is observed
  • A set is discrete either if it consists of a
    finite number of elements or if its elements may
    be listed in sequence so that there is a first
    element, a second element, a third element, and
    so on, in the list
  • A random variable is said to be discrete if its
    set of possible values is a discrete set.

3
Definition
  • The probability distribution or probability mass
    function (pmf) of a discrete random variable X is
    defined for every number x by
    p(x) = P(X = x)
  • The cumulative distribution function (cdf) F(x)
    of a discrete rv X with pmf p(x) is defined for
    every number x by
    F(x) = P(X ≤ x) = Σ_{y ≤ x} p(y)

4
Examples
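The slide's worked examples are not reproduced in this transcript; the following is a minimal sketch in Python, assuming as an example the pmf of the number of heads in two fair coin tosses:

```python
# Assumed example: X = number of heads in two fair coin tosses
pmf = {0: 0.25, 1: 0.50, 2: 0.25}

def cdf(x):
    """F(x) = P(X <= x): accumulate the mass at or below x."""
    return sum(p for value, p in pmf.items() if value <= x)

print(cdf(0))    # 0.25
print(cdf(1))    # 0.75
print(cdf(1.5))  # 0.75 -- F is a step function, flat between mass points
print(cdf(2))    # 1.0
```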
5
Definition
  • Let X be a discrete rv with set of possible
    values D and pmf p(x). The expected value or
    mean value of X, denoted E(X) or μ_X, is given by
    E(X) = Σ_{x ∈ D} x · p(x)
  • If the rv X has a set of possible values D and
    pmf p(x), then the expected value of any function
    h(X), denoted by E[h(X)] or μ_{h(X)}, is computed by
    E[h(X)] = Σ_{x ∈ D} h(x) · p(x)
  • Note: according to Ross (1988), p. 255, this is
    known as the law of the unconscious statistician

6
Example
  • Let X be a discrete rv with the following pmf and
    corresponding cdf
  • Now let h(x) = x² + 2
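The slide's pmf table is not reproduced here; a sketch with a hypothetical pmf shows how E(X) and E[h(X)] would be computed:

```python
# Hypothetical pmf standing in for the slide's table
pmf = {1: 0.2, 2: 0.5, 3: 0.3}

# E(X) = sum over D of x * p(x)
mean = sum(x * p for x, p in pmf.items())

# E[h(X)] = sum over D of h(x) * p(x), with h(x) = x^2 + 2 as above
def h(x):
    return x**2 + 2

mean_h = sum(h(x) * p for x, p in pmf.items())

print(mean)    # 2.1
print(mean_h)  # 6.9
```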

7
Class exercise
  • Let X be a discrete rv with the following pmf
  • Calculate the cdf of X
  • Calculate E(X)
  • Now let h(x) = 3x³ - 100 and calculate E[h(X)]

8
Definition
  • For linear functions of X we have a simple rule:
    E(aX + b) = a · E(X) + b

9
Variance of a random variable
  • If X is a random variable with mean μ, then the
    variance of X, denoted by Var(X), is defined by
    Var(X) = E[(X - μ)²] = Σ_{x ∈ D} (x - μ)² · p(x)
  • Recall that we previously defined the variance of
    a population as the average of the squared
    deviations from the mean. The expected value is
    nothing other than the average or mean so this
    form corresponds exactly to the one we used
    earlier.

10
Variance of a random variable
  • It's often convenient to use a different form of
    the variance which, applying the rules of
    expected value we just learned and remembering
    that E(X) = μ, we derive in the following way:
    Var(X) = E[(X - μ)²] = E(X²) - 2μ·E(X) + μ²
    = E(X²) - μ²
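A quick numerical check of the shortcut, reusing the hypothetical pmf from above:

```python
pmf = {1: 0.2, 2: 0.5, 3: 0.3}  # hypothetical pmf, as before

mu = sum(x * p for x, p in pmf.items())

# Definition: expected squared deviation from the mean
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Shortcut: E(X^2) - mu^2
var_shortcut = sum(x**2 * p for x, p in pmf.items()) - mu**2

print(var_definition, var_shortcut)  # both ~0.49
```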

11
The Bernoulli distribution
  • Any random variable whose only two possible
    values are 0 and 1 is called a Bernoulli random
    variable.
  • Example: Suppose a set of buildings is examined
    in a western city for compliance with new,
    stricter earthquake engineering specifications.
    After 25% of the city's buildings are examined at
    random, and 12% are found to be out of code while
    88% are found to conform to the new
    specifications, it is supposed that buildings in
    the region have a 12% likelihood of being out of
    code.
  • Let X = 1 if the next randomly selected building
    is within code and X = 0 otherwise.
  • The distribution of buildings in and out of code
    is a Bernoulli random variable with parameter
    p = 0.88 (and 1 - p = 0.12).

12
The Bernoulli distribution
  • We write this as follows:
    p(x; α) = 1 - α if x = 0, α if x = 1, and 0
    otherwise

13
The Bernoulli distribution
  • In general form, α is the parameter for the
    Bernoulli distribution, but we usually refer to
    this parameter as p, the probability of success
  • The mean of the Bernoulli distribution with
    parameter p is E(X) = p
  • The variance of the Bernoulli distribution with
    parameter p is Var(X) = p(1 - p)

14
The Binomial distribution
  • Now consider a random variable that is made up of
    successive (independent) Bernoulli trials and
    define the random variable X as the number of
    successes among n trials.
  • The Binomial distribution has the following
    probability mass function:
    b(x; n, p) = C(n, x) · p^x · (1 - p)^(n-x),
    for x = 0, 1, 2, …, n
  • Remembering what we learned about combinations,
    this makes intuitive sense. The binomial
    coefficient C(n, x) represents the number of ways
    to distribute the x successes among n trials. p^x
    represents the probability that we have x
    successes in the n trials, while (1 - p)^(n-x)
    represents the probability that we have n - x
    failures in the n trials.
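A sketch of this pmf computed directly from the combination formula (math.comb gives the binomial coefficient):

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Sanity check: the probabilities over x = 0..n sum to 1
n, p = 6, 0.4
print(sum(binom_pmf(x, n, p) for x in range(n + 1)))  # ~1.0
```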

15
The Binomial distribution
  • Computing the mean and the variance of the
    binomial distribution is straightforward. First
    remember that the binomial random variable is the
    sum of the number of successes in n consecutive
    independent Bernoulli trials. Therefore
    E(X) = E(X₁) + ⋯ + E(Xₙ) = np and
    Var(X) = Var(X₁) + ⋯ + Var(Xₙ) = np(1 - p)

16
The Binomial distribution
17
The Binomial distribution
18
The Binomial distribution
19
Class exercise
  • A software engineer has historically delivered
    completed code to clients on schedule 40% of the
    time. If her performance record continues, the
    number of on-schedule completions in the next 6
    jobs can be described by the binomial
    distribution.
  • Calculate the probability that exactly four jobs
    will be completed on schedule
  • Calculate the probability that at most 5 jobs
    will be completed on schedule
  • Calculate the probability that at least two jobs
    will be completed on schedule
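One way to evaluate these probabilities, sketched with scipy.stats (not part of the original slides):

```python
from scipy.stats import binom

n, p = 6, 0.40  # six upcoming jobs, 40% on-schedule rate

print(binom.pmf(4, n, p))      # P(X = 4): exactly four on schedule
print(binom.cdf(5, n, p))      # P(X <= 5): at most five on schedule
print(1 - binom.cdf(1, n, p))  # P(X >= 2): at least two on schedule
```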

20
The Hypergeometric distribution
  • The binomial distribution is made up of
    independent trials in which the probability of
    success does not change.
  • Another distribution in which the random variable
    X represents the number of successes among n
    trials is the hypergeometric distribution.
  • The hypergeometric distribution assumes a fixed
    population in which the proportion or number of
    successes is known.
  • We think of the hypergeometric distribution as
    involving trials without replacement and the
    binomial distribution as involving trials with
    replacement.
  • The classic illustration of the difference
    between the hypergeometric and the binomial
    distribution is that of black and white balls in
    an urn. Assume the proportion of black balls is
    p. The distribution of the number of black balls
    selected in n trials is binomial(x; n, p) if we
    put the balls back in the urn after selection and
    hypergeometric(x; n, M, N) if we set them aside
    after selection. (engineering examples to come)

21
The Hypergeometric distribution
  • The probability mass function of the
    hypergeometric distribution is given by the
    following, where M is the number of possible
    successes in the population, N is the total size
    of the population, and x is the number of
    successes in the n trials:
    h(x; n, M, N) = C(M, x) · C(N - M, n - x) / C(N, n)

22
The Hypergeometric distribution
  • Example: Suppose out of 100 bridges in a region,
    30 have been recently retrofitted to be more
    secure during earthquakes. Ten bridges are
    selected randomly from the 100 for inspection.
    What is the probability that at least three of
    these bridges will be retrofitted?
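A sketch of the computation with scipy.stats.hypergeom; note that scipy's parameter order (population size, successes in population, draws) differs from the slides' (n, M, N) notation:

```python
from scipy.stats import hypergeom

# Population of 100 bridges, 30 retrofitted, 10 inspected
rv = hypergeom(100, 30, 10)  # scipy order: (N, M, n) in the slides' notation

# P(X >= 3) = 1 - P(X <= 2); sf(2) returns P(X > 2)
print(rv.sf(2))
```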

23
The Hypergeometric distribution
  • The mean and variance of the hypergeometric
    distribution are given below:
    E(X) = n · (M/N)
    Var(X) = n · (M/N) · (1 - M/N) · (N - n)/(N - 1)

24
The Hypergeometric and the Binomial distributions
  • Note the following: with p = M/N, the
    hypergeometric mean and variance can be written
    E(X) = np and Var(X) = np(1 - p) · (N - n)/(N - 1)
  • Sometimes we refer to M/N, the proportion of the
    population with a particular characteristic, as p.
  • This is very close to E(X) = np and Var(X) =
    np(1 - p), which are the mean and variance of the
    Binomial distribution. In fact we can think of
    (N - n)/(N - 1)
    as a correction term which accounts for the fact
    that in the hypergeometric distribution we sample
    without replacement.
  • Question: Should the variance of the
    Hypergeometric distribution with proportion p be
    greater than or less than the variance of the
    Binomial with parameter p? Can you give some
    intuitive reason why this is so?

25
The Hypergeometric and the Binomial distributions
  • Example: Binomial with p = 0.40, three trials, vs.
  • Hypergeometric with M/N = 2/5 = 0.40, three trials

26
The Hypergeometric and the Binomial distributions
  • If n, the number of trials, is small relative to
    N, the number in the population, then we can
    approximate the hypergeometric distribution with
    the binomial distribution in which p = M/N.
  • This used to be very important. As recently as
    five years ago computers and hand calculators
    could not calculate large factorials. The
    calculator I use cannot calculate 70!. A few
    years ago 50! was out of the question.
  • Despite improvements in calculators it's still
    important to know that if the ratio of n to N is
    less than 5% (we are sampling 5% of the
    population) then we can approximate the
    hypergeometric distribution with the binomial.
  • Check your calculators now -- what is the maximum
    factorial they can handle?

27
Class exercise
  • The system administrator in charge of a campus
    computer lab has identified 9 machines out of 80
    with defective motherboards.
  • While he is on vacation the lab is moved to a
    different room, in order to make room for
    graduate student offices.
  • The administrator kept notes about which machines
    were bad, based on their location in the old lab.
    During the move the computers were placed on
    their new desks in a random fashion, so all of
    the machines must be checked again.
  • If the administrator checks three of the machines
    for defects, what is the probability that one of
    the three will be defective?
  • Calculate using both the hypergeometric
    distribution and the binomial approximation to
    the hypergeometric.
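A sketch of both calculations using scipy.stats (not part of the original slides); here n/N = 3/80 < 5%, so the approximation should be close:

```python
from scipy.stats import binom, hypergeom

N_pop, M_def, n_checked = 80, 9, 3

# Exact: hypergeometric (sampling without replacement);
# scipy's argument order is (population, successes, draws)
p_exact = hypergeom.pmf(1, N_pop, M_def, n_checked)

# Approximate: binomial with p = M/N = 9/80
p_approx = binom.pmf(1, n_checked, M_def / N_pop)

print(p_exact, p_approx)  # ~0.272 vs ~0.266
```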

28
The Geometric distribution
  • The geometric distribution refers to the random
    variable represented by the number of consecutive
    Bernoulli trials until a success is achieved.
  • Suppose that independent trials, each having
    probability p, 0 < p < 1, of being a success, are
    performed until a success occurs. If we let X
    equal the number of trials prior to the first
    success, then
    P(X = x) = (1 - p)^x · p, for x = 0, 1, 2, …
  • The above definition is the one used in our
    textbook. A more common definition is the
    following: let X equal the number of trials
    including the last trial, which by definition is
    a success. Then we get the following:
    P(X = x) = (1 - p)^(x-1) · p, for x = 1, 2, 3, …
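A sketch of the two conventions side by side (the helper names are illustrative; scipy.stats.geom, for reference, uses the second form):

```python
def geom_pmf_failures(x, p):
    """First form: X = trials before the first success; x = 0, 1, 2, ..."""
    return (1 - p) ** x * p

def geom_pmf_trials(x, p):
    """Second form: X = trials up to and including the success; x = 1, 2, ..."""
    return (1 - p) ** (x - 1) * p

# The two pmfs are shifted copies of one another
print(geom_pmf_failures(2, 0.3))  # 0.147
print(geom_pmf_trials(3, 0.3))    # 0.147
```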

29
The geometric distribution
  • The expected value and variance of the geometric
    distribution are given for the first form by
    E(X) = (1 - p)/p and Var(X) = (1 - p)/p²
  • For the second form, E(X) = 1/p, which has more
    intuitive appeal. Can you convince yourself that
    the value is correct?
  • Please note that the variance, (1 - p)/p², is the
    same in both cases.
  • Explain why this is so.

30
The geometric distribution
  • We can derive E(X) in the following way

31
The geometric distribution
These steps are taken so that we can work with the
geometric series 1 + x + x² + x³ + ⋯ = 1/(1 - x),
so p(1 + q + q² + ⋯) = p · 1/(1 - q). Here we just
substitute p for 1 - q.
32
The geometric distribution
Try deriving the variance of the geometric
distribution by finding E(X²)
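A numerical check of the claimed mean and variance for the second form (truncating the infinite sums, which converge quickly):

```python
p = 0.3
q = 1 - p

# E(X) and E(X^2) for X = 1, 2, 3, ... with pmf q^(x-1) * p
e_x = sum(x * q ** (x - 1) * p for x in range(1, 500))
e_x2 = sum(x**2 * q ** (x - 1) * p for x in range(1, 500))

print(e_x)            # ~1/p = 3.333...
print(e_x2 - e_x**2)  # ~(1 - p)/p**2 = 7.777...
```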
33
Poisson Distribution
  • One of the most useful distributions for many
    branches of engineering is the Poisson
    Distribution
  • The Poisson distribution is often used to model
    the number of occurrences during a given time
    interval or within a specified region.
  • The time interval involved can have a variety of
    lengths, e.g., a second, minute, hour, day, year,
    and multiples thereof.
  • Poisson processes may be temporal or spatial.
    The region in question can also be a line
    segment, an area, a volume, or some n-dimensional
    space.
  • Poisson processes or experiments have the
    following characteristics:

34
Poisson Distribution
  • 1. The number of outcomes occurring in any given
    time interval or region is independent of the
    number of outcomes occurring in any other
    disjoint time interval or region.
  • 2. The probability of a single outcome occurring
    in a very short time interval or very small
    region is proportional to the length of the time
    interval or the size of the region. This value is
    not affected by the number of outcomes occurring
    outside this particular time interval or region.
  • 3. The probability of more than one outcome
    occurring in a very short time interval or very
    small region is negligible.
  • Taken together, the first two characteristics are
    known as the memoryless property of Poisson
    processes.
  • Transportation engineers often assume that the
    number of vehicles passing by a particular point
    on a road is approximately Poisson distributed.
  • Do you think that this model is more appropriate
    for a rural highway or a city street?

35
Poisson Distribution
  • The pmf of the Poisson distribution is the
    following:
    p(x; λ) = e^(-λ) · λ^x / x!, for x = 0, 1, 2, …
  • The parameter λ is equal to αt, where α is the
    intensity of the process (the average number of
    events per time unit) and t is the number of time
    units in question. In a spatial Poisson process
    α represents the average number of events per
    unit of space and t represents the number of
    spatial units in question.
  • For example, the number of vehicles crossing a
    bridge in a rural area might be modeled as a
    Poisson process. If the average number of
    vehicles per hour during the hours of 10:00 AM
    to 3:00 PM is 20, we might be interested in the
    probability that fewer than three vehicles cross
    from 12:30 to 12:45 PM. In this case λ = (20
    per hour)(0.25 hours) = 5.
  • The expected value and the variance of the
    Poisson distribution are identical and are equal
    to λ.
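A sketch of the bridge-crossing calculation (scipy usage assumed, not part of the original slides):

```python
from scipy.stats import poisson

lam = 20 * 0.25  # lambda = alpha * t = (20 per hour)(0.25 hours) = 5

# P(fewer than three vehicles) = P(X <= 2)
print(poisson.cdf(2, lam))  # ~0.125
```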

36
Class exercise
  • An urban planner believes that the number of gas
    stations in an urban area is approximately
    Poisson distributed with parameter α = 3 per
    square mile. Let's assume she is correct in her
    assumption.
  • Calculate the expected number of gas stations in
    a four-square-mile region of the urban area, as
    well as the variance of this number.
  • Calculate the probability that this region of
    four square miles has fewer than six gas stations.
  • Calculate the probability that, in four adjacent
    regions of one square mile each, at least two of
    the four regions contain more than three gas
    stations.
  • Do you think the situation is accurately modeled
    by a Poisson process?
  • Why or why not?
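A sketch of the calculations (scipy usage assumed; the final binomial step relies on the independence of disjoint regions):

```python
from scipy.stats import binom, poisson

lam = 3 * 4  # alpha = 3 per square mile, t = 4 square miles, so lambda = 12

print(lam)                  # expected count; the variance is also 12
print(poisson.cdf(5, lam))  # P(fewer than six stations) = P(X <= 5)

# Each 1-square-mile region exceeds three stations with probability
# 1 - P(X <= 3) at lambda = 3; "at least two of four regions" is binomial
p_exceed = 1 - poisson.cdf(3, 3)
print(1 - binom.cdf(1, 4, p_exceed))  # P(at least two of four exceed three)
```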

37
Some random variables that typically obey the
Poisson probability law (Ross, p.130)
  • The number of misprints on a page (or group of
    pages) of a book
  • The number of people in a community living to 100
    years of age
  • The number of wrong telephone numbers that are
    dialed in a day
  • The number of packages of dog biscuits sold in a
    particular store each day
  • The number of customers entering a post office
    (bank, store) in a given time period
  • The number of vacancies occurring during a year
    in the Supreme Court
  • The number of particles discharged in a fixed
    period of time from some radioactive material
  • WHAT ABOUT ENGINEERING?
  • Poisson processes are at the heart of queueing
    theory, which is one of the most important
    topics in transportation and logistics. Lots of
    other applications too: water, structures,
    geotech, etc.

38
The Poisson Distribution as an approximation to
the Binomial
  • When n is large and p is small, the Poisson
    distribution with parameter np is a very good
    approximation to the binomial (the number of
    successes in n independent trials when the
    probability of success is equal to p for each
    trial).
  • Example: Suppose that the probability that an
    item produced by a certain machine will be
    defective is 0.10. Find the probability that a
    sample of 10 items will contain at most one
    defective item.

np = (0.10)(10) = 1.0
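A quick comparison of the exact binomial answer and the Poisson approximation (a sketch using scipy.stats):

```python
from scipy.stats import binom, poisson

n, p = 10, 0.10
lam = n * p  # 1.0

# P(at most one defective item) under each model
print(binom.cdf(1, n, p))   # exact binomial: ~0.7361
print(poisson.cdf(1, lam))  # Poisson approximation: ~0.7358
```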
39
References
  • Ross, S. (1988). A First Course in Probability.
    Macmillan.