Probability and probability distributions - PowerPoint PPT Presentation

Provided by: moll2
1
Probability and probability distributions
  • PU5005 Lecture 2

2
Administration
  • SPSS practicals in Computer room 3 (behind 2nd
    floor refectory); this will be the room for every
    SPSS practical
  • Occasionally there will be a classroom based
    tutorial
  • Sign attendance register at lecture and practical
  • 5 to be paid to office for handbook if not
    already done so
  • All people attending must be registered for this
    course including those staff and PhD students
    taking this course only

3
Room change to Med Chi Hall
  • Med Chi Hall (Medico-Chirurgical Hall), Ground
    Floor, Polwarth Building
  • If entering the Polwarth Building from the west,
    follow the main corridor east past the medical
    library, through one set of double doors and take
    the stairs to your left down one flight. The Med
    Chi Hall is through a single wooden door at the
    bottom of the stairs. The lecture room itself is
    to the left.
  • If entering the Polwarth Building from the east
    through the main entrance, go through the foyer
    past the main stairs on the left and through an
    open set of double doors. Turn left signposted to
    Medical Microbiology. Follow this corridor to its
    end where there will be a single wooden door to
    the Med Chi Hall. The lecture room itself is to
    the left.

4
Room change to Med Chi Hall
  • Larger room. The chairs have arms, but no tables.
    Bring hard folder or paper pad to lean on.
  • 12 Oct (Next Thursday)
  • Not 19 Oct when we are in this room (1.147) again
  • 26 Oct
  • 2 Nov
  • 9 Nov
  • 16 Nov
  • 23 Nov
  • 30 Nov
  • 7 Dec

5
Why do we need probability?
  • Basis for statistical inference
  • Using data from a sample to address questions
    regarding the population

6
What is probability?
  • Nothing new: you use it all the time in everyday
    life
  • Time spent waiting in a supermarket queue to be
    served
  • Number of yellow cars to pass in 30 minutes
  • Chance that a baby will be a boy or girl
  • Chance that a person will get cancer in their
    lifetime

7
Definitions
  • An experiment is an activity whose outcome is
    uncertain.
  • Examples: throwing a die (1, 2, 3, 4, 5 or 6);
    having a disease (yes/no)
  • An event consists of one or more possible
    outcomes.
  • Experiment: throwing a die
  • Event A: an even number

8
Approaches to computing probabilities
  • Classical
  • Uses the equally likely outcomes approach
  • Probability of an event A
    = (number of equally likely outcomes in A)
      / (total number of equally likely outcomes)
  • Relative frequency
  • the proportion of times the event occurs if the
    experiment were repeated a large number of times

9
Approaches to computing probabilities
  • What is the probability that on one throw of a
    die, the 6 will face upwards?

Classical
Number of outcomes in A = 1
Total number of equally likely outcomes = 6
Probability = 1/6
10
Approaches to computing probabilities
  • What is the probability that on one throw of a
    loaded die, the 6 will face upwards?

Relative frequency
Generate a large number of die throws and
build a frequency distribution over the outcomes
1, 2, 3, 4, 5, 6
P(6) = number of sixes / total number of throws
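The relative frequency approach can be tried on a computer. The sketch below is only an illustration: the loading (a six three times as likely as any other face), the number of throws and the function name relative_frequencies are assumptions, not values from the lecture.

```python
import random
from collections import Counter


def relative_frequencies(weights, n_throws, seed=0):
    """Throw a six-sided die n_throws times; return the proportion of each face."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    faces = [1, 2, 3, 4, 5, 6]
    throws = rng.choices(faces, weights=weights, k=n_throws)
    counts = Counter(throws)
    return {face: counts[face] / n_throws for face in faces}


# Hypothetical loading: a six is three times as likely as any other face,
# so the true P(6) under this assumption is 3/8.
loaded_weights = [1, 1, 1, 1, 1, 3]
freqs = relative_frequencies(loaded_weights, n_throws=60_000)
print(freqs[6])  # relative-frequency estimate of P(6)
```

With more throws the estimate should settle closer to the true value for whatever loading is assumed.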
11
Quick experiment
12
Toss one die 600 times
13
Toss a die 600 times
14
Probability
  • Probability measures uncertainty
  • A probability measures the chance of an event
    occurring
  • Probabilities must lie between 0 and 1
  • The sum of the probabilities of all possible
    mutually exclusive events is 1
  • Probability is central to statistical inference

15
Rules of probability
  • There are rules that allow us to compute the
    probability of an event occurring
  • Addition rule
  • Multiplication rule
  • Terminology
  • Independent events
  • Mutually exclusive events
  • Conditional probability

16
Independent events
  • Definition: Two events are independent if one
    happening has no bearing on whether the other
    happens or not.
  • Examples
  • An experiment involves throwing two dice. The
    value of the first die tells us nothing about
    what the value of the second die will be.
  • Pick two people in the class. Knowing the eye
    colour of one will tell me nothing about the eye
    colour of the other.

17
Mutually exclusive events
Definition: Two events are mutually exclusive if
one happening means that the other cannot happen.

The sum of the probabilities of all possible
mutually exclusive events is equal to 1.
18
Addition Rule
  • Consider two events A and B: what is the
    probability that either event occurs?
  • The addition rule states that
  • P(A or B) = P(A) + P(B) - P(A and B)

19
Example
  • A die is thrown and the upper face is observed.
  • Event A is that an even number is observed:
    {2, 4, 6}
  • Event B is that a number < 4 is observed: {1, 2, 3}
  • P(A) = 3/6, P(B) = 3/6, P(A and B) = 1/6

Using the addition rule
P(A or B) = P(A) + P(B) - P(A and B)
          = 3/6 + 3/6 - 1/6 = 5/6
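The addition rule for this fair-die example can be checked by listing the outcomes. This is a minimal sketch; the helper name prob is an assumption introduced for illustration.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}            # fair die: outcomes equally likely
A = {x for x in outcomes if x % 2 == 0}  # even number
B = {x for x in outcomes if x < 4}       # number less than 4


def prob(event):
    """Classical probability: outcomes in the event / total outcomes."""
    return Fraction(len(event), len(outcomes))


lhs = prob(A | B)                      # P(A or B) counted directly
rhs = prob(A) + prob(B) - prob(A & B)  # addition rule
print(lhs, rhs)
```

Subtracting P(A and B) stops the outcome 2, which lies in both events, from being counted twice.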
20
Pictorial explanation
Venn diagram: A only {4, 6}; B only {1, 3}; both A and B {2}
21
Example P14
  • A loaded die is thrown and the upper face is
    observed.
  • Event A is that an even number is observed
  • Event B is that a number < 4 is observed
  • P(A) = 0.4, P(B) = 0.2, P(A and B) = 0.1

Using the addition rule
P(A or B) = P(A) + P(B) - P(A and B)
          = 0.4 + 0.2 - 0.1 = 0.5
22
Pictorial explanation
Venn diagram: P(A) = 0.4, P(B) = 0.2, P(A and B) = 0.1
23
Mutually Exclusive Events
  • Two events are mutually exclusive if one
    happening means that the other cannot happen.
  • e.g. Throw a die
  • Event A: even number
  • Event B: odd number
  • P(A or B) = P(A) + P(B)
  • = 3/6 + 3/6
  • = 1

24
Multiplication rule of probability
  • If two events are independent then
  • P(event A and event B)
  • = P(A) × P(B)
  • In a die toss experiment (fair/unloaded die),
    events A and B are
  • A = observe an even number and
  • B = observe a number ≤ 4
  • What is the probability of observing A and B?

25
Solution
  • P(A) = P({2, 4, 6}) = 3/6 = 1/2
  • P(B) = P({1, 2, 3, 4}) = 4/6 = 2/3
  • P(A and B) = P({2, 4}) = 2/6 = 1/3
  • Note: P(A and B) = 1/2 × 2/3 = P(A) × P(B)
  • This implies that events A and B are independent,
    as P(A and B) = P(A) × P(B)
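Whether two die events are independent can be tested by comparing P(A and B) with P(A) × P(B). The sketch below does this for the event pair in the solution, and for contrast with "a number < 4", which is not independent of "even". The function names are assumptions for illustration.

```python
from fractions import Fraction

OUTCOMES = range(1, 7)  # fair six-sided die


def prob(pred):
    """Probability of the event described by the predicate pred."""
    return Fraction(sum(1 for x in OUTCOMES if pred(x)), len(OUTCOMES))


def independent(pred_a, pred_b):
    """True if P(A and B) equals P(A) * P(B)."""
    p_both = prob(lambda x: pred_a(x) and pred_b(x))
    return p_both == prob(pred_a) * prob(pred_b)


def is_even(x):
    return x % 2 == 0


print(independent(is_even, lambda x: x <= 4))  # 1/3 versus 1/2 * 2/3
print(independent(is_even, lambda x: x < 4))   # 1/6 versus 1/2 * 1/2
```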

26
Conditional Probability
  • Independence is a strong assumption.
  • Consider instead the probability of one event
    happening given that the other has occurred.
  • Probability of event A occurring GIVEN that B has
    occurred is denoted with a vertical line:
  • P(A|B)
  • P(A|B) = P(A and B) / P(B)

27
Definition of conditional probability
  • Conditional probability
  • P(B|A) = P(A and B) / P(A)
  • P(B given A has occurred) = P(A and B) / P(A)
  • P(A and B) = P(B|A) × P(A)

28
Example
  • A European pack has 52 playing cards. There are
    13 each of 4 suits: hearts ♥, diamonds ♦, clubs ♣
    and spades ♠
  • Choose two cards from the pack. What is the
    probability that both are hearts under two
    scenarios?
  • The first card is replaced
  • The first card is not replaced

29
Scenario A
  • First card is replaced in pack
  • The two events are independent
  • When drawing each card all 52 cards are
    available.
  • P(1st card is a heart) = 13/52
  • P(2nd card is a heart) = 13/52
  • P(1st and 2nd card are hearts) = 13/52 × 13/52
  • = 0.0625

30
Scenario B
  • The first card is not replaced in the pack
  • Therefore, the probability of the second card
    being a heart is conditional on the first card.
  • P(1st card is a heart) = 13/52
  • P(2nd card is a heart given that the first was a
    heart) = 12/51
  • P(1st heart and 2nd heart)
  • = P(2nd heart | 1st heart) × P(1st heart)
  • = 12/51 × 13/52
  • = 0.0588
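The two card scenarios differ only in the second factor of the product. A short sketch using exact fractions follows; the variable names are assumptions for illustration.

```python
from fractions import Fraction

HEARTS, PACK = 13, 52

p_first = Fraction(HEARTS, PACK)  # P(1st card is a heart)

# Scenario A: first card replaced, so the draws are independent.
with_replacement = p_first * Fraction(HEARTS, PACK)

# Scenario B: first card not replaced, so the second draw is conditional
# on the first: 12 hearts remain among 51 cards.
without_replacement = p_first * Fraction(HEARTS - 1, PACK - 1)

print(with_replacement, float(with_replacement))
print(without_replacement, round(float(without_replacement), 4))
```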

31
Example
  • A computing magazine reported that one third of
    graduates working in computing have degrees in IT
    and two-thirds have degrees in other subjects.
  • They conclude that arts and science graduates
    are twice as likely as IT graduates to pursue
    careers as IT professionals
  • Is this statement definitely correct?

32
Discussion of example
  • one third of graduates working in computing have
    degrees in IT and two-thirds have degrees in
    other subjects

P(arts or science graduate | works in IT) = 2/3
P(graduate works in IT | arts or science
graduate) ≠ 2/3
P(A|B) ≠ P(B|A)
33
Diagnostic tests Conditional probability
  • In certain situations it is necessary to know the
    probability of a particular event or outcome
    happening given that we already know that another
    event or outcome has already occurred.
  • P(B|A) = P(A and B) / P(A)
  • The main application of these types of
    probabilities is in diagnostic tests and screening
    programmes.

34
Screening example
  • As part of the Breast Cancer Screening Project of
    the Health Insurance Plan of Greater New York
    (HIP Program) 64,810 women aged between 40 and 64
    were screened for breast cancer by mammography
    and physical examination.
  • A total of 1,115 women were positive on
    screening, of whom 132 had breast cancer
    diagnosed. During a five-year follow-up period
    45 further cases of cancer were detected among
    women who were negative on screening. Source:
    American Journal of Epidemiology (1974), 100:
    357-366.

35
Screening
What is the probability that a woman will develop
breast cancer given that she has had a positive
test?
P(breast cancer | positive test result)
36
Solution
P(cancer | +ve screening) = P(cancer and +ve
screening) / P(+ve screening)
P(cancer and +ve screening) = 132 / 64,810
P(+ve screening) = 1,115 / 64,810
P(cancer | +ve screening) = 132 / 1,115 = 0.118
37
Predictive values
  • We have just calculated the positive predictive
    value of the screening test in this sample.
  • Screening or diagnostic tests are used to
    identify diseases and to help make a diagnosis.
  • It is important to know the probability that the
    test is giving the correct diagnosis (positive or
    negative).
  • Positive predictive value (PPV) is the
    probability of a person having the disease given
    a positive test result.
  • PPV = P(test +ve and disease +ve) / P(test +ve)

38
Sensitivity
  • Sensitivity is the probability that the test is
    positive given that the disease is present (i.e.
    true positives).
  • In conditional probability notation this is
  • P(Test is positive given that we know that the
    Disease is present)
  • = P(Test +ve | Disease +ve)
  • = P(Test +ve and Disease +ve) / P(Disease +ve)

39
Calculations - Sensitivity
  • P(Test +ve | Disease +ve) = P(Test +ve and Disease
    +ve) / P(Disease +ve)

= (132/64,810) / (177/64,810)
= 132/177 = 0.75
40
Specificity
  • Specificity is the probability that the test is
    negative given that the disease is absent (i.e.
    true negatives).
  • In conditional probability notation this is
  • P(Test is negative given we know that the Disease
    is absent) = P(Test -ve | Disease -ve)
  • = P(Test -ve and Disease -ve) / P(Disease -ve)

41
Calculations - Specificity
  • P(Test -ve | Disease -ve) = P(Test -ve and Disease
    -ve) / P(Disease -ve)

= (63,650/64,810) / (64,633/64,810)
= 63,650/64,633 = 0.985
42
Sensitivity, Specificity, PPV
  • Sensitivity is the proportion of disease
    positives that are correctly identified by the
    test
  • 132/177 = 0.746 = 74.6%
  • Specificity is the proportion of disease
    negatives that are correctly identified by the
    test
  • 63,650/64,633 = 0.985 = 98.5%
  • Positive predictive value is the proportion of
    patients with positive test results who are
    correctly diagnosed
  • 132/1,115 = 0.118 = 11.8%
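All three measures come from the same two-by-two table of test result against disease status. The sketch below rebuilds that table from the four figures quoted in the screening example. The variable names are assumptions, and the 45 cancers found during follow-up are treated as false negatives, as in the slides.

```python
total = 64_810      # women screened
test_pos = 1_115    # positive on screening
true_pos = 132      # screen positive and cancer diagnosed
false_neg = 45      # screen negative, cancer found during follow-up

disease_pos = true_pos + false_neg  # 177 women with cancer
disease_neg = total - disease_pos   # 64,633 women without cancer
test_neg = total - test_pos         # 63,695 screen negative
true_neg = test_neg - false_neg     # 63,650 screen negative, no cancer

sensitivity = true_pos / disease_pos  # P(test +ve | disease +ve)
specificity = true_neg / disease_neg  # P(test -ve | disease -ve)
ppv = true_pos / test_pos             # P(disease +ve | test +ve)

print(round(sensitivity, 3), round(specificity, 3), round(ppv, 3))
```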

43
A note on these diagnostic values
The predictive values are only of limited
validity. In clinical practice the predictive
values depend critically on the prevalence of the
abnormality in the patients being tested. This
may well differ from the prevalence in the published
study assessing the usefulness of the test.
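The dependence on prevalence can be made concrete by writing the positive predictive value in terms of sensitivity, specificity and prevalence using the conditional probability rules. In the sketch below the sensitivity and specificity are the screening-study values; the other two prevalences (1% and 10%) are hypothetical values chosen only for illustration, and the function name is an assumption.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | test +ve) from the conditional probability rules."""
    true_pos = sensitivity * prevalence                # P(test +ve and disease)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(test +ve and no disease)
    return true_pos / (true_pos + false_pos)


sens = 132 / 177        # from the screening example
spec = 63_650 / 64_633  # from the screening example

study_prevalence = 177 / 64_810
for prevalence in (study_prevalence, 0.01, 0.10):  # last two are hypothetical
    ppv = positive_predictive_value(sens, spec, prevalence)
    print(f"prevalence {prevalence:.4f} -> PPV {ppv:.3f}")
```

At the study prevalence this reproduces 132/1,115; with the same test, a higher prevalence gives a higher predictive value.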
44
So what!
  • The rules of probability allow us to compute the
    probability of each of a set of mutually
    exclusive events.
  • From these we can produce theoretical probability
    distributions.
  • Each probability distribution is defined by
    certain parameters (e.g. the mean and variance)
    which characterise the distribution.
  • Probability distributions can be discrete or
    continuous.

45
Some discrete and continuous probability
distributions
  • Discrete probability distributions
  • Binomial (number of trials n and probability of
    success p)
  • Poisson (count of the number of events occurring
    independently (average rate))
  • Continuous probability distributions
  • Normal (Mean and variance)
  • t (degrees of freedom)
  • F
  • Chi-square (χ²)

46
Binomial distribution
  • This type of distribution is used to calculate
    probabilities when we have a dichotomous variable
    (e.g. success/failure following treatment)
  • 3 patients have a headache
  • They are each given a tablet to relieve their
    headache
  • The outcomes (relief/no relief) for the three
    patients are independent of one another
  • Of the three patients there are 3 combinations in
    which 2 of the three will have relief following
    treatment (Patients 1 and 2, or 1 and 3, or 2 and
    3)

47
Binomial distribution example
  • Cystic fibrosis is a serious congenital disease
    which results in abnormal amounts of thick sticky
    secretions in the lungs. It is autosomal
    recessive, so that if both parents are carriers
    there is a 1 in 4 chance that each child they
    have will be affected.
  • If the parents are both carriers and have 3
    children
  • What is the probability that none of the three
    children are affected?
  • What is the probability that two of the three are
    affected?

48
Answer
  • P(none has CF)
  • = P(child 1 has no CF and child 2 has no CF and
    child 3 has no CF)
  • = 3/4 × 3/4 × 3/4 = 0.4219
  • The probability that none of the children will
    have CF is 0.42 (i.e. there is a 42% chance that
    none will have CF)

49
Probability that two of the three children will
have CF
P(child 1 has CF and child 2 has CF and child 3
has no CF
or child 1 has CF and child 2 has no CF and child
3 has CF
or child 1 has no CF and child 2 has CF and child
3 has CF)
  • = (1/4 × 1/4 × 3/4) + (1/4 × 3/4 × 1/4) + (3/4 ×
    1/4 × 1/4)
  • = 3 × (1/4 × 1/4 × 3/4) = 0.1406
  • The probability that two out of three will have
    CF is 0.14.
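Both answers are instances of the binomial formula P(r) = C(n, r) × p^r × (1 - p)^(n - r), where C(n, r) counts the combinations (3 ways to choose which 2 of 3 children are affected). A minimal sketch follows; the function name binomial_pmf is an assumption.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


p_cf = 0.25  # chance each child is affected when both parents are carriers

p_none = binomial_pmf(0, 3, p_cf)  # no child affected
p_two = binomial_pmf(2, 3, p_cf)   # exactly two of three affected
print(round(p_none, 4), round(p_two, 4))
```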

50
Binomial distribution
51
Binomial distribution
  • Probabilities can be computed using a
    mathematical formula.
  • These are listed in statistical tables, where it
    is easy to find the probability of r events
    occurring in n trials with probability of
    success p.
  • However, with computers these can be easily
    accessed without the need to consult tables of
    probabilities.
  • When the number of trials (n) increases, the
    binomial distribution can be approximated by the
    Normal distribution.

52
Continuous probability distributions
  • For continuous probability distributions we can
    only calculate the probability of a random
    variable taking values in a certain range.
  • A curve can be drawn from the equation that
    represents the appropriate probability
    distribution.
  • The area under the curve must equal 1 since the
    curve represents the probability of all possible
    events.

53
Normal distribution
  • Turning our attention to continuous variables
    such as height, weight or blood pressure it is
    also possible to calculate probabilities.
  • The distributions of many medical measurements
    approximate to the Normal distribution (e.g. serum
    uric acid levels, cholesterol levels).

54
The Normal distribution
  • This distribution is a smooth bell-shaped
    distribution which is symmetrical about its mean
    value.
  • It will be flatter if the variance is larger and
    more peaked if the variance is smaller.
  • Areas under this Normal curve correspond to
    probabilities.

55
Properties of Normal curve
Figure labels: P = 0.68, P = 0.95, P = 0.999
56
Standard Normal Distribution
  • Since there are an infinite number of Normal
    distributions, it is easier to work with the
    Standard Normal distribution.
  • The Standard Normal distribution has a mean of 0
    and a standard deviation of 1.
  • The standardised Normal deviate (Z score) is a
    random variable that has a Standard Normal
    distribution.
  • A Z-score establishes how many standard
    deviations a particular value is from the mean
    value.

57
How to calculate Z-scores
  • It is easier to work with Z-scores
  • The Z-score is found as Z = (X - μ) / σ, where
  • X is the random variable of interest
  • μ represents the population mean of the Normal
    distribution that the random variable follows
  • σ represents the population standard deviation.

58
Properties of Normal curve
Figure labels: P = 0.68, P = 0.95, P = 0.999
59
  • The probability of lying between any two points
    may be calculated as the area under the curve.
    These areas are available in tables, but roughly:
  • the probability that a Normal random variable
    will take a value
  • within 1 standard deviation either side of the
    mean is 0.68
  • within 1.96 standard deviations either side of
    the mean is 0.95
  • within 2.58 standard deviations either side of
    the mean is 0.99
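These areas can be computed rather than looked up. The sketch below uses the standard library's statistics.NormalDist for the Standard Normal distribution; the helper name prob_within is an assumption.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal: mean 0, standard deviation 1


def prob_within(k):
    """Area under the Standard Normal curve between -k and +k."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 1.96, 2.58):
    print(f"within {k} SD of the mean: {prob_within(k):.4f}")
```

Because any Normal variable can be standardised to a Z-score, the same three areas apply whatever the mean and standard deviation.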

60
Example
  • Suppose we know that in a population of patients,
    diastolic blood pressure follows a Normal
    distribution with mean 100 mmHg and standard
    deviation 8 mmHg.
  • Find the probability that the diastolic blood
    pressure of a particular patient is less than 92
    mmHg.

61
Solution
  • If we let X represent diastolic blood pressure,
    μ represent mean blood pressure and σ
    represent the standard deviation,
  • we can perform the following:
  • Calculate the probability that blood pressure is
    less than 92 mmHg, denoted P(X < 92)
  • If X comes from the Normal distribution with mean
    100 and SD = 8,
  • then standardise the variable into a Z score,
    thus
  • P(Z < (92 - 100) / 8) = P(Z < -1),
  • where Z has a Standard Normal distribution
    (Z ~ N(0, 1)).

62
P(Z < -1) = 0.16
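The blood pressure calculation can be reproduced in two ways: directly from the Normal distribution with mean 100 and SD 8, or via the Z-score and the Standard Normal distribution. A minimal sketch using statistics.NormalDist follows; the variable names are assumptions.

```python
from statistics import NormalDist

bp = NormalDist(mu=100, sigma=8)  # diastolic blood pressure, mmHg

# Route 1: area below 92 under the N(100, 8^2) curve.
p_direct = bp.cdf(92)

# Route 2: standardise first, then use the Standard Normal curve.
z = (92 - bp.mean) / bp.stdev
p_via_z = NormalDist().cdf(z)

print(z, round(p_direct, 2), round(p_via_z, 2))
```

Both routes give the same area, which is why tables need only be printed for the Standard Normal distribution.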
63
The Normal distribution
  • The Normal distribution is for continuous data
    when the population mean and variance are known.
  • When these are not known and we only have sample
    information about the mean and variance then we
    use Student's t-distribution. As the sample
    size increases this tends towards the Normal.
  • In fact, many distributions tend towards
    Normality, especially when the sample size is
    large. Hence, the Normal distribution has
    become an important part of the theory of
    statistics.

64
Chi-Squared
  • Another important distribution related to the
    normal is the Chi-squared distribution.
  • It is used when investigating categorical
    data.

65
Crosstab example
  • If eye and hair colour are not associated then,
    for example, the expected number with blue eyes
    and blond hair would be (total with blue eyes ×
    total with blond hair) / overall total.

66
  • So the chi-squared statistic is found by looking
    at a function of the discrepancy between observed
    and expected counts in each cell,
  • summed over all combinations of hair and eye
    colour.
  • If this is large and in the tail of the
    distribution, then it may be said that the
    observed is not as expected!
  • More of this later.
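As a preview, the sketch below works through a small crosstab. The counts are hypothetical, invented only for illustration, and the discrepancy function is assumed to be the usual Pearson form, (observed - expected)² / expected summed over cells; the slides do not state which function they use.

```python
# Hypothetical 2 x 2 crosstab (rows: blue eyes, other eyes;
# columns: blond hair, other hair). These counts are made up.
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count if eye and hair colour are not associated:
# row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-squared statistic (assumed form of the discrepancy function).
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected)
print(round(chi_squared, 3))
```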

67
Summary
  • Probabilities are integral to all things around
    us.
  • We can derive and understand probabilities.
  • We have seen that probabilities build together to
    form probability distributions.
  • Some are theoretical distributions that are well
    understood, the most important being the Normal.
  • Using these theoretical distributions we can
    begin to make inferences about the population on
    the basis of samples.

68
On a brighter note
  • Don't worry if you find the theory underlying
    these distributions confusing.
  • In practice, it is more important that you know
    when and how to use each of the probability
    distributions rather than understand how the
    probabilities are calculated.
  • They allow us to quantify how likely an event,
    measurement, or statistic is to occur.