Random Variables

About This Presentation

Title:

Random Variables

Description:

Random Variables & Expectation Example: What are the mean, variance, & standard deviation for our binomial distribution example in which n=5 & p=1/3? – PowerPoint PPT presentation

Slides: 222

Provided by: WidenerUn8

Transcript and Presenter's Notes

Title: Random Variables

1
Random Variables & Expectation
2
Random Variable

A random variable (r.v.) is a well-defined rule
for assigning a numerical value to every possible
outcome of an experiment.
Example:
experiment: taking a course
outcomes: grades A, B, C, D, F
sample space S: discrete & finite
random variable Y:
Y = 4 if grade is A
Y = 3 if grade is B
Y = 2 if grade is C
Y = 1 if grade is D
Y = 0 if grade is F

3
Experiment: throw 2 dice. What are the possible
outcomes?

1,1 2,1 3,1 4,1 5,1 6,1
1,2 2,2 3,2 4,2 5,2 6,2
1,3 2,3 3,3 4,3 5,3 6,3
1,4 2,4 3,4 4,4 5,4 6,4
1,5 2,5 3,5 4,5 5,5 6,5
1,6 2,6 3,6 4,6 5,6 6,6

4
Define the random variable X to be the sum of the
dots on the 2 dice.
5
For which outcomes does X = 9?

1,1 2,1 3,1 4,1 5,1 6,1
1,2 2,2 3,2 4,2 5,2 6,2
1,3 2,3 3,3 4,3 5,3 6,3
1,4 2,4 3,4 4,4 5,4 6,4
1,5 2,5 3,5 4,5 5,5 6,5
1,6 2,6 3,6 4,6 5,6 6,6

6
For which outcomes does X = 9? The four outcomes 3,6  4,5  5,4  6,3.

1,1 2,1 3,1 4,1 5,1 6,1
1,2 2,2 3,2 4,2 5,2 6,2
1,3 2,3 3,3 4,3 5,3 6,3
1,4 2,4 3,4 4,4 5,4 6,4
1,5 2,5 3,5 4,5 5,5 6,5
1,6 2,6 3,6 4,6 5,6 6,6

7
What is Pr(X = 9)?

1,1 2,1 3,1 4,1 5,1 6,1
1,2 2,2 3,2 4,2 5,2 6,2
1,3 2,3 3,3 4,3 5,3 6,3
1,4 2,4 3,4 4,4 5,4 6,4
1,5 2,5 3,5 4,5 5,5 6,5
1,6 2,6 3,6 4,6 5,6 6,6

Since there are 36 equally likely outcomes, each
has a probability of 1/36. Since there are 4
outcomes that yield X = 9, Pr(X = 9) = 4/36 = 1/9.
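The count above can be checked by brute force. This is a minimal Python sketch, assuming two fair six-sided dice as in the slides; the variable names are only illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes for which the random variable X (sum of the dots) equals 9.
favorable = [(a, b) for a, b in outcomes if a + b == 9]

pr_x_equals_9 = Fraction(len(favorable), len(outcomes))
print(favorable)       # [(3, 6), (4, 5), (5, 4), (6, 3)]
print(pr_x_equals_9)   # 1/9
```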
8-19
Let's calculate the probabilities of all the
possible values x of the random variable X.

x     Pr(X = x)
2     1/36
3     2/36
4     3/36
5     4/36
6     5/36
7     6/36
8     5/36
9     4/36
10    3/36
11    2/36
12    1/36

1,1 2,1 3,1 4,1 5,1 6,1
1,2 2,2 3,2 4,2 5,2 6,2
1,3 2,3 3,3 4,3 5,3 6,3
1,4 2,4 3,4 4,4 5,4 6,4
1,5 2,5 3,5 4,5 5,5 6,5
1,6 2,6 3,6 4,6 5,6 6,6
20
Let's graph the probability distribution of X.

x Pr(X = x)
2 1/36
3 2/36
4 3/36
5 4/36
6 5/36
7 6/36
8 5/36
9 4/36
10 3/36
11 2/36
12 1/36

21
Pr(X = x) = f(x) = p(x), as described in this table
or graph, is called the probability distribution
or probability mass function (p.m.f.).

x Pr(X = x)
2 1/36
3 2/36
4 3/36
5 4/36
6 5/36
7 6/36
8 5/36
9 4/36
10 3/36
11 2/36
12 1/36
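The whole p.m.f. table can be built the same way. This is a rough Python sketch, again assuming two fair dice; `counts` and `pmf` are names made up for the illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 outcomes give each sum x.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability mass function: Pr(X = x) = (number of outcomes with sum x) / 36.
pmf = {x: Fraction(counts[x], 36) for x in sorted(counts)}

for x in pmf:
    print(x, f"{counts[x]}/36")

# The probabilities of all possible values must add up to one.
print(sum(pmf.values()))
```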

22
Properties of Probability Distributions

0 ≤ Pr(X = x) ≤ 1 for all x

23
Cumulative Mass Function
24
Cumulative Mass Function (2 dice problem)
[Graph: F(x) plotted against x]

x     Pr(X = x)   F(x) = Pr(X ≤ x)
2     1/36        1/36
3     2/36        3/36
4     3/36        6/36
5     4/36        10/36
6     5/36        15/36
7     6/36        21/36
8     5/36        26/36
9     4/36        30/36
10    3/36        33/36
11    2/36        35/36
12    1/36        1
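The cumulative column is just a running total of the p.m.f. column. Here is a short Python sketch of that idea; the shortcut formula (6 - |x - 7|)/36 for the two-dice p.m.f. is used here only as a compact way to write the table.

```python
from fractions import Fraction
from itertools import accumulate

xs = list(range(2, 13))

# Two-dice p.m.f.: 1/36, 2/36, ..., 6/36, ..., 2/36, 1/36.
pmf = [Fraction(6 - abs(x - 7), 36) for x in xs]

# F(x) = Pr(X <= x) is the running sum of the p.m.f.
cdf = list(accumulate(pmf))

for x, p, F in zip(xs, pmf, cdf):
    print(x, f"{int(p * 36)}/36", f"{int(F * 36)}/36")
```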
25
Expectation, Expected Value, or Mean of a Random
Variable
E(X) = μ = Σ x p(x)
26
Notice the similarity of the definitions of the
mean of a random variable & the mean of a
frequency distribution for a population.
Recall that probability p(x) is the relative
frequency f/N with which something occurs over
the long run. So these definitions are saying
the same thing.
27-30
Example: Suppose that a stock broker wants to
estimate the price of a certain stock one year
from now. If the probability mass function of
the price in a year is as given, determine the
expected price.

x (price in one year)   p(x)   x p(x)
94                      0.25   23.5
98                      0.25   24.5
102                     0.25   25.5
106                     0.25   26.5
Total                   1.00   100.0

Notice that you do NOT divide by the number of
observations when you're done adding. Also, the
probabilities do not have to be equal; they just
have to add up to one.
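The same calculation as a minimal Python sketch; the list names are illustrative, and the numbers are the ones in the stock table.

```python
prices = [94, 98, 102, 106]
probs = [0.25, 0.25, 0.25, 0.25]

# The probabilities must add up to one.
total_prob = sum(probs)

# E(X) = sum of x * p(x); no division by the number of observations.
expected_price = sum(x * p for x, p in zip(prices, probs))

print(total_prob)       # 1.0
print(expected_price)   # 100.0
```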
31
Theorem: Suppose that g(X) is a function of a
random variable X, & the probability mass
function of X is pX(x). Then the expected value
of g(X) is
E[g(X)] = Σ g(x) pX(x)
32-40
Example: Suppose Y = X² & the distribution of X
is as given below. Determine the mean of Y = g(X) by
using 1. the definition of expected value, 2.
the previous theorem.

x     p(x)
-2    0.1
-1    0.2
1     0.3
2     0.4

1. Using the definition of expected value, with the distribution of Y:

y     p(y)   y p(y)
1     0.5    0.5
4     0.5    2.0
E(Y) = 2.5

2. Using the previous theorem:

x     p(x)   y = x²   y pX(x)
-2    0.1    4        0.4
-1    0.2    1        0.2
1     0.3    1        0.3
2     0.4    4        1.6
E(Y) = 2.5
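Both routes to E(Y) can be sketched in a few lines of Python. The dictionary layout and the helper `g` are assumptions made for the illustration; the numbers are those of the example.

```python
import math
from collections import defaultdict

pmf_x = {-2: 0.1, -1: 0.2, 1: 0.3, 2: 0.4}

def g(x):
    """The function of X in the example: Y = g(X) = X squared."""
    return x ** 2

# Method 1: build the distribution of Y first, then apply the definition.
pmf_y = defaultdict(float)
for x, p in pmf_x.items():
    pmf_y[g(x)] += p            # x = -1 and x = 1 both land on y = 1, etc.
mean_by_definition = sum(y * p for y, p in pmf_y.items())

# Method 2: the theorem, E[g(X)] = sum of g(x) * pX(x), without finding p(y).
mean_by_theorem = sum(g(x) * p for x, p in pmf_x.items())

print(dict(pmf_y))
print(round(mean_by_definition, 10), round(mean_by_theorem, 10))
```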

41
Definition: Variance of a random variable X
V(X) = σ² = E[(X - μ)²] = Σ (x - μ)² p(x)
42
Theorem: The variance of X can also be calculated
as follows:
V(X) = E(X²) - [E(X)]²
43
Standard Deviation of a random variable X
σ = √V(X)
44
Example: Suppose sales at a donut shop are
distributed as below. Calculate (a) the mean
number of donuts sold, (b) the variance (using
both the definition of the variance & the
theorem), (c) the standard deviation.
x p(x)
1 0.08
2 0.27
4 0.10
6 0.33
12 0.22

45-46
First, the mean.
x     p(x)   x p(x)
1     0.08   0.08
2     0.27   0.54
4     0.10   0.40
6     0.33   1.98
12    0.22   2.64
μ = 5.64

47-50
Next, the variance using the definition.
x     p(x)   x p(x)   x-μ     (x-μ)²   (x-μ)² p(x)
1     0.08   0.08     -4.64   21.53    1.72
2     0.27   0.54     -3.64   13.25    3.58
4     0.10   0.40     -1.64   2.69     0.27
6     0.33   1.98     0.36    0.13     0.04
12    0.22   2.64     6.36    40.45    8.90
μ = 5.64, σ² = 14.51

51-54
Now, the variance using the theorem
V(X) = E(X²) - [E(X)]².
x     p(x)   x p(x)   x²    x² p(x)
1     0.08   0.08     1     0.08
2     0.27   0.54     4     1.08
4     0.10   0.40     16    1.60
6     0.33   1.98     36    11.88
12    0.22   2.64     144   31.68
μ = 5.64, E(X²) = 46.32
σ² = V(X) = E(X²) - [E(X)]² = 46.32 - (5.64)² = 14.51
55
And lastly, the standard deviation, by taking the
square root of the variance.
σ² = V(X) = E(X²) - [E(X)]² = 46.32 - (5.64)² = 14.51
σ = √14.51 = 3.81
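The donut-shop numbers can be reproduced with a short Python sketch. The dictionary `pmf` is just one assumed way of storing the table, and the results are rounded to two decimals as on the slides.

```python
import math

# Donut sales distribution from the example: x -> p(x).
pmf = {1: 0.08, 2: 0.27, 4: 0.10, 6: 0.33, 12: 0.22}

# (a) Mean: sum of x * p(x).
mean = sum(x * p for x, p in pmf.items())

# (b) Variance by the definition: sum of (x - mean)^2 * p(x).
var_by_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

# (b) Variance by the theorem: E(X^2) - [E(X)]^2.
e_x2 = sum(x ** 2 * p for x, p in pmf.items())
var_by_theorem = e_x2 - mean ** 2

# (c) Standard deviation: square root of the variance.
sd = math.sqrt(var_by_definition)

print(round(mean, 2), round(e_x2, 2))                       # 5.64 46.32
print(round(var_by_definition, 2), round(var_by_theorem, 2))  # 14.51 14.51
print(round(sd, 2))                                         # 3.81
```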
56
Important Theorem

If X has mean μ and variance σ², then (X - μ)/σ
has mean 0 and variance 1.

57
Example: (G - μ)/σ

Suppose your course grades have a mean of 2.7 and
a standard deviation of 1.2.
Suppose you took your grades, subtracted 2.7 from
each one, then divided those results by 1.2.
The new set of numbers would have a mean of 0 and
a standard deviation of 1.
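The individual grades are not listed, so the Python sketch below checks the same idea on the donut-shop distribution from the earlier example instead. That substitution is an assumption made purely for illustration.

```python
import math

# Stand-in distribution (donut example): x -> p(x).
pmf = {1: 0.08, 2: 0.27, 4: 0.10, 6: 0.33, 12: 0.22}

mu = sum(x * p for x, p in pmf.items())
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))

# Standardize: subtract the mean, then divide by the standard deviation.
z_pmf = {(x - mu) / sigma: p for x, p in pmf.items()}

z_mean = sum(z * p for z, p in z_pmf.items())
z_var = sum((z - z_mean) ** 2 * p for z, p in z_pmf.items())

print(round(z_mean, 10), round(z_var, 10))
```

Up to floating-point rounding, the standardized values have mean 0 and variance 1, whatever the original mean and standard deviation were.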

58
Expectation Rules
Let k, a, b be constants.

1. E(k) = k. The mean of a constant is the constant.
2. V(k) = 0. The variance of a constant is zero.
3. E(a + bX) = a + b E(X)
4. V(a + bX) = b² V(X)

59
Example: If X has a mean of 3 and a variance of
2/3, what are the mean and variance of Y = 5 + 2X?

First find the mean E(Y) = E(5 + 2X).
E(a + bX) = a + b E(X).
Let a = 5 & b = 2. Then just plug into the formula.
So,
E(Y) = E(5 + 2X) = 5 + 2 E(X) = 5 + 2(3) = 11.
Next find the variance V(Y) = V(5 + 2X).
V(a + bX) = b² V(X).
Again let a = 5 and b = 2 and just plug into the
formula.
V(Y) = V(5 + 2X) = 2² V(X) = 4 V(X) = 4(2/3) = 8/3.
Notice that the constant term shifts the mean but
has no effect on the spread of the distribution.
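The example gives only the mean and variance of X, not its distribution. To check the rules numerically, the Python sketch below assumes X is equally likely to be 2, 3, or 4, which happens to have mean 3 and variance 2/3. That choice is purely illustrative.

```python
from fractions import Fraction

def mean_var(pmf):
    """Return (mean, variance) of a discrete distribution given as value -> probability."""
    m = sum(x * p for x, p in pmf.items())
    v = sum((x - m) ** 2 * p for x, p in pmf.items())
    return m, v

# Assumed distribution with E(X) = 3 and V(X) = 2/3.
pmf_x = {2: Fraction(1, 3), 3: Fraction(1, 3), 4: Fraction(1, 3)}

a, b = 5, 2
# Distribution of Y = a + bX: same probabilities, transformed values.
pmf_y = {a + b * x: p for x, p in pmf_x.items()}

mean_x, var_x = mean_var(pmf_x)
mean_y, var_y = mean_var(pmf_y)

print(mean_x, var_x)   # 3 2/3
print(mean_y, var_y)   # 11 8/3
```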

60
Joint Probability Distribution for 2 Discrete
Random Variables X & Y

p(x,y) = Pr(X = x and Y = y)

61
Properties of Joint Probability Distributions
62
Example: Consider the following joint
distribution of the number of jobs & the number
of promotions of college graduates in their 1st 5
years out of college.

                      Number of promotions (y)
Number of jobs (x)    1      2      3      4
1                     0.10   0.15   0.12   0.06
2                     0.05   0.07   0.10   0.05
3                     0.04   0.02   0.14   0.10
63
For example, the probability of 3 jobs & 2
promotions is p(3,2) = 0.02.
64
We can determine the marginal distributions of
the 2 random variables X & Y just as we did
before for 2 events. Just add across the row or
down the column.
65-69
For the probability of 1 job, add across the first
row. Similarly for the probabilities of 2 or 3 jobs.
For the probability of 1 promotion, add down the
first column, and likewise for the probabilities of
2, 3, or 4 promotions.

                      Number of promotions (y)
Number of jobs (x)    1      2      3      4      pX(x)
1                     0.10   0.15   0.12   0.06   0.43
2                     0.05   0.07   0.10   0.05   0.27
3                     0.04   0.02   0.14   0.10   0.30
pY(y)                 0.19   0.24   0.36   0.21   1.00

pX(x) is the marginal probability of x; pY(y) is the marginal probability of y.
Notice again that you must get a total of one when
you total the marginal probabilities for x and
for y.
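Adding across rows and down columns can be sketched in Python as follows. Storing the joint table as a dictionary keyed by (x, y) is an assumption made for the illustration.

```python
from collections import defaultdict

# Joint distribution p(x, y): x = number of jobs, y = number of promotions.
joint = {
    (1, 1): 0.10, (1, 2): 0.15, (1, 3): 0.12, (1, 4): 0.06,
    (2, 1): 0.05, (2, 2): 0.07, (2, 3): 0.10, (2, 4): 0.05,
    (3, 1): 0.04, (3, 2): 0.02, (3, 3): 0.14, (3, 4): 0.10,
}

p_x = defaultdict(float)   # marginal of X: add across each row
p_y = defaultdict(float)   # marginal of Y: add down each column
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

p_x = {x: round(p, 2) for x, p in p_x.items()}
p_y = {y: round(p, 2) for y, p in p_y.items()}

print(p_x)   # {1: 0.43, 2: 0.27, 3: 0.3}
print(p_y)   # {1: 0.19, 2: 0.24, 3: 0.36, 4: 0.21}
```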
70
Conditional Probabilities for Random Variables
Example

The probability that X is 2 given that Y is 3:
pX|Y(2|3) = Pr(X = 2 | Y = 3)
= Pr(X = 2 and Y = 3)/Pr(Y = 3).
The probability that Y is 2 given that X is 3:
pY|X(2|3) = Pr(Y = 2 | X = 3)
= Pr(Y = 2 and X = 3)/Pr(X = 3).

71
Let's do the calculations using our previous
example, the joint distribution of jobs (x) & promotions (y) with its marginals.
pX|Y(2|3) = Pr(X = 2 | Y = 3) = Pr(X = 2 and Y = 3)/Pr(Y = 3)
= 0.10/0.36 = 0.278.
pY|X(2|3) = Pr(Y = 2 | X = 3) = Pr(Y = 2 and X = 3)/Pr(X = 3)
= 0.02/0.30 = 0.067.
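The two conditional probabilities can be checked with a small Python sketch. The helper names `cond_x_given_y` and `cond_y_given_x` are hypothetical, chosen just for this illustration.

```python
# Joint distribution p(x, y): x = number of jobs, y = number of promotions.
joint = {
    (1, 1): 0.10, (1, 2): 0.15, (1, 3): 0.12, (1, 4): 0.06,
    (2, 1): 0.05, (2, 2): 0.07, (2, 3): 0.10, (2, 4): 0.05,
    (3, 1): 0.04, (3, 2): 0.02, (3, 3): 0.14, (3, 4): 0.10,
}

def cond_x_given_y(x, y):
    """Pr(X = x | Y = y) = p(x, y) / pY(y)."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    return joint[(x, y)] / p_y

def cond_y_given_x(y, x):
    """Pr(Y = y | X = x) = p(x, y) / pX(x)."""
    p_x = sum(p for (xx, _), p in joint.items() if xx == x)
    return joint[(x, y)] / p_x

print(round(cond_x_given_y(2, 3), 3))   # 0.278
print(round(cond_y_given_x(2, 3), 3))   # 0.067
```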
72
Cumulative Joint Mass Function for 2 Discrete
Random Variables X & Y

F(x,y) = Pr(X ≤ x and Y ≤ y)

73-83
Job/Promotion Example: Find the probability that a
person had 2 or fewer jobs & 3 or fewer promotions.

                      Number of promotions (y)
Number of jobs (x)    1      2      3      4      pX(x)
1                     0.10   0.15   0.12   0.06   0.43
2                     0.05   0.07   0.10   0.05   0.27
3                     0.04   0.02   0.14   0.10   0.30
pY(y)                 0.19   0.24   0.36   0.21   1.00

F(2,3) = f(1,1) + f(1,2) + f(1,3) + f(2,1) + f(2,2) + f(2,3)
= 0.10 + 0.15 + 0.12 + 0.05 + 0.07 + 0.10
= 0.59
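In code, F(2,3) is just the sum of the joint probabilities over the cells with x ≤ 2 and y ≤ 3. This is a minimal Python sketch; the function name `joint_cdf` is made up for the illustration.

```python
# Joint distribution p(x, y): x = number of jobs, y = number of promotions.
joint = {
    (1, 1): 0.10, (1, 2): 0.15, (1, 3): 0.12, (1, 4): 0.06,
    (2, 1): 0.05, (2, 2): 0.07, (2, 3): 0.10, (2, 4): 0.05,
    (3, 1): 0.04, (3, 2): 0.02, (3, 3): 0.14, (3, 4): 0.10,
}

def joint_cdf(x_max, y_max):
    """F(x, y) = Pr(X <= x and Y <= y): add every cell at or below both limits."""
    return sum(p for (x, y), p in joint.items() if x <= x_max and y <= y_max)

print(round(joint_cdf(2, 3), 2))   # 0.59
```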
84
Independence

Recall that 2 events A & B were independent if
Pr(A∩B) = Pr(A) Pr(B).
Similarly, 2 random variables are independent if
p(x,y) = pX(x) pY(y) for all values of x & y.

85
In our previous example, are the number of jobs
& number of promotions independent?
We must have p(x,y) = pX(x) pY(y) for all
values of x & y. To start, does p(1,1) equal
pX(1) pY(1)?
p(1,1) = 0.10
pX(1) pY(1) = (0.43)(0.19) = 0.0817 ≠ 0.10
So X & Y are not independent. If that case
had been equal, we wouldn't be done yet. We'd
have to verify that equality held for all the
cells.
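The cell-by-cell check can be written as a short Python sketch. The function `is_independent` and its tolerance are assumptions for the illustration, not something defined on the slides.

```python
import math

# Joint distribution p(x, y): x = number of jobs, y = number of promotions.
joint = {
    (1, 1): 0.10, (1, 2): 0.15, (1, 3): 0.12, (1, 4): 0.06,
    (2, 1): 0.05, (2, 2): 0.07, (2, 3): 0.10, (2, 4): 0.05,
    (3, 1): 0.04, (3, 2): 0.02, (3, 3): 0.14, (3, 4): 0.10,
}

# Marginal distributions.
p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (1, 2, 3)}
p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (1, 2, 3, 4)}

def is_independent(tol=1e-9):
    """True only if p(x, y) = pX(x) * pY(y) holds in every cell."""
    return all(math.isclose(p, p_x[x] * p_y[y], abs_tol=tol)
               for (x, y), p in joint.items())

print(joint[(1, 1)], round(p_x[1] * p_y[1], 4))   # 0.1 0.0817
print(is_independent())                           # False
```

One failing cell is enough to rule out independence, while passing requires every cell to match.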
86
Theorem: mean of a function of 2 random
variables X & Y
E[g(X,Y)] = Σ Σ g(x,y) p(x,y), summing over all x & y
87
Suppose that, based on the joint distribution of
the length X & width Y of lumber sold by a
lumberyard, we would like to determine the mean
length, mean width, & mean area of the lumber.

So we want to calculate
E(X),
E(Y), and
E(XY).

88-92
Given the joint distribution below, calculate
E(X), E(Y), & E(XY).
First, determine the marginal distributions of X & Y,
and check that the marginal distribution
probabilities sum to 1.

         Y = 2   Y = 4   Y = 6   pX(x)
X = 4    0.05    0.05    0.10    0.20
X = 8    0.10    0.50    0.20    0.80
pY(y)    0.15    0.55    0.30    1.00
93-98
Next we calculate the mean length & mean width.
For E(X), remember we need to multiply the
values by their probabilities and add up.
We get the values of X and their probabilities,
multiply, and add up.

x    p(x)   x p(x)
4    0.20   0.80
8    0.80   6.40
E(X) = 7.20
99-103
For E(Y), we do the same thing.
Get the values of Y and their probabilities,
multiply, and add up.

y    p(y)   y p(y)
2    0.15   0.30
4    0.55   2.20
6    0.30   1.80
E(Y) = 4.30
104-120
To calculate the mean area E(XY), we use the
theorem. For the mean area, E(XY), the theorem translates
to E(XY) = Σ Σ xy p(x,y).
To keep track of the xy terms, we are going to
put them in our table, in parentheses after each probability.

         Y = 2       Y = 4       Y = 6       pX(x)
X = 4    0.05 (8)    0.05 (16)   0.10 (24)   0.20
X = 8    0.10 (16)   0.50 (32)   0.20 (48)   0.80
pY(y)    0.15        0.55        0.30        1.00

Next, we need to multiply the xy terms by the
corresponding probabilities, and then add it all up.
So we have 0.05 (8) + 0.05 (16) + 0.10 (24)
+ 0.10 (16) + 0.50 (32) + 0.20 (48) = 30.8 for
the mean area.
121
You might wonder if we could get E(XY) by just
multiplying E(X) by E(Y).

The answer is generally not.
In our example, we had E(X) = 7.2, E(Y) = 4.3, &
E(XY) = 30.8.
E(X) E(Y) = 30.96, not 30.80.
Close in this case, but not the same.
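The three means, and the comparison with E(X) E(Y), can be sketched in Python using the theorem for a function of two random variables. The helper `expect` is a hypothetical name introduced only for this illustration.

```python
import math

# Lumber example: joint distribution p(x, y) of length X and width Y.
joint = {
    (4, 2): 0.05, (4, 4): 0.05, (4, 6): 0.10,
    (8, 2): 0.10, (8, 4): 0.50, (8, 6): 0.20,
}

def expect(g):
    """E[g(X, Y)] = sum over all cells of g(x, y) * p(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)        # mean length
e_y = expect(lambda x, y: y)        # mean width
e_xy = expect(lambda x, y: x * y)   # mean area

print(round(e_x, 2), round(e_y, 2), round(e_xy, 2))   # 7.2 4.3 30.8
print(round(e_x * e_y, 2))                            # 30.96
```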

122
If X and Y are independent, then it is true that
E(XY) = E(X) E(Y).

It may also hold occasionally in other cases.
But generally, it doesn't work.

123
Definition: Covariance of X & Y
C(X,Y) = E[(X - μX)(Y - μY)] = Σ Σ (x - μX)(y - μY) p(x,y)
What does this mean?
124
Suppose that two variables tend to move in the
same direction, like study time and grades. Then
when x is large, so that it is larger than its
mean, x - μX > 0. When x is large, y tends to
be large as well, so that y - μY > 0
also. Remember that the p(x,y) values are
probabilities and therefore cannot be negative. So
those terms in the formula would look like
(+)(+)(+)
These products are positive.
125
Similarly, since x and y tend to be small
together, we have x - μX < 0 with y - μY < 0
too. Those terms would look like
(-)(-)(+)
These products are positive too. So we're adding
up a lot of positive numbers. What all that means
is that when 2 variables tend to move in the same
direction, the covariance will be positive.
126
When 2 variables tend to move in opposite
directions,

their covariance C(X,Y) < 0,
perhaps like party time and grades.

127
If variables don't tend to move either in the
same or opposite directions,

their covariance C(X,Y) = 0.
This case includes independent variables.

128
It is usually easier to calculate covariances
using this theorem.

Theorem: C(X,Y) = E(XY) - E(X) E(Y)

129
Returning to the lumber example

Remember we had E(X) = 7.2, E(Y) = 4.3, E(XY) =
30.8.
Then the covariance would be
C(X,Y) = E(XY) - E(X) E(Y)
= (30.8) - (7.2)(4.3)
= -0.16
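
As a check, the covariance can be computed both from the definition and from the shortcut theorem. This is a small Python sketch under the assumption that the lumber table is held in a dictionary called `joint` (an illustrative name, not from the slides).

```python
# Joint probabilities p(x, y) for the lumber example.
joint = {
    (4, 2): 0.05, (4, 4): 0.05, (4, 6): 0.10,
    (8, 2): 0.10, (8, 4): 0.50, (8, 6): 0.20,
}

mean_x = sum(x * p for (x, y), p in joint.items())       # E(X)
mean_y = sum(y * p for (x, y), p in joint.items())       # E(Y)
mean_xy = sum(x * y * p for (x, y), p in joint.items())  # E(XY)

# Definition: sum of (x - mean_x)(y - mean_y) p(x, y) over all cells.
cov_definition = sum(
    (x - mean_x) * (y - mean_y) * p for (x, y), p in joint.items()
)

# Shortcut theorem: C(X,Y) = E(XY) - E(X) E(Y).
cov_shortcut = mean_xy - mean_x * mean_y

print(round(cov_definition, 4), round(cov_shortcut, 4))  # -0.16 -0.16
```

Both routes give the same number; the shortcut simply needs fewer subtractions.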

130
Difficulty

The value of the covariance changes when you
change units.
That is, you get different answers if you use
feet, inches, or meters.
So it's difficult to tell if a particular answer
means a strong relationship or not.
Fortunately, we have a solution to this problem.

131
Correlation Coefficient

The correlation coefficient is similar to the
covariance, but it doesn't vary with the units
used.

132
Correlation Coefficient
The correlation coefficient is denoted by the
Greek letter rho, ρ. It's computed by dividing
the covariance of X and Y by the product of the
standard deviations of X and of Y:
ρ = C(X,Y) / (σX σY)
133
The correlation coefficient is always between -1
and 1.
-1 ≤ ρ ≤ 1.
134
Correlation Coefficient
-1 ≤ ρ ≤ 1
So, if your correlation coefficient ρ is close to
1, you have a strong positive relationship. If it
is close to -1, you have a strong negative
relationship. If it is close to zero, there is no
strong linear relationship at all.
135
Back to the lumber example again

We had C(X,Y) = -0.16.
We need the standard deviations of X and Y, which
we have not calculated yet.

136
This is what we had for X so far.
x    p(x)    xp(x)
4    0.20    0.80
8    0.80    6.40
E(X) = 7.20
137
Recall we said previously that we can calculate
V(X) as V(X) = E(X²) - [E(X)]².
x    p(x)    xp(x)
4    0.20    0.80
8    0.80    6.40
E(X) = 7.20
We have E(X) but we need E(X²). The theorem
E[g(X)] = Σ g(x) p(x) gives us E(X²) = Σ x² p(x).
138
E(X²) = Σ x² p(x)
x    p(x)    xp(x)    x²    x²p(x)
4    0.20    0.80     16
8    0.80    6.40     64
E(X) = 7.20
139
E(X²) = Σ x² p(x)
x    p(x)    xp(x)    x²    x²p(x)
4    0.20    0.80     16    3.2
8    0.80    6.40     64    51.2
E(X) = 7.20
140
E(X²) = Σ x² p(x)
x    p(x)    xp(x)    x²    x²p(x)
4    0.20    0.80     16    3.2
8    0.80    6.40     64    51.2
E(X) = 7.20           E(X²) = 54.4
141
Now we need to subtract to get V(X).
x    p(x)    xp(x)    x²    x²p(x)
4    0.20    0.80     16    3.2
8    0.80    6.40     64    51.2
E(X) = 7.20           E(X²) = 54.4
V(X) = E(X²) - [E(X)]²
142
x    p(x)    xp(x)    x²    x²p(x)
4    0.20    0.80     16    3.2
8    0.80    6.40     64    51.2
E(X) = 7.20           E(X²) = 54.4
V(X) = E(X²) - [E(X)]² = 54.4 - (7.2)²
143
x    p(x)    xp(x)    x²    x²p(x)
4    0.20    0.80     16    3.2
8    0.80    6.40     64    51.2
E(X) = 7.20           E(X²) = 54.4
V(X) = E(X²) - [E(X)]² = 54.4 - (7.2)² = 2.56
144
Take the square root to get the standard
deviation σX.
x    p(x)    xp(x)    x²    x²p(x)
4    0.20    0.80     16    3.2
8    0.80    6.40     64    51.2
E(X) = 7.20           E(X²) = 54.4
V(X) = E(X²) - [E(X)]² = 54.4 - (7.2)² = 2.56, so σX = 1.60
145
We do the same thing with Y.
y    p(y)    yp(y)
2    0.15    0.30
4    0.55    2.20
6    0.30    1.80
E(Y) = 4.30

146
Get y².
y    p(y)    yp(y)    y²    y²p(y)
2    0.15    0.30     4
4    0.55    2.20     16
6    0.30    1.80     36
E(Y) = 4.30

147
Multiply by p(y).
y    p(y)    yp(y)    y²    y²p(y)
2    0.15    0.30     4     0.60
4    0.55    2.20     16    8.80
6    0.30    1.80     36    10.80
E(Y) = 4.30

148
Add to get E(Y²).
y    p(y)    yp(y)    y²    y²p(y)
2    0.15    0.30     4     0.60
4    0.55    2.20     16    8.80
6    0.30    1.80     36    10.80
E(Y) = 4.30           E(Y²) = 20.20

149
Subtract to get V(Y).
y    p(y)    yp(y)    y²    y²p(y)
2    0.15    0.30     4     0.60
4    0.55    2.20     16    8.80
6    0.30    1.80     36    10.80
E(Y) = 4.30           E(Y²) = 20.20
V(Y) = E(Y²) - [E(Y)]² = 20.20 - (4.3)² = 1.71
150
Take the square root to get the standard
deviation σY.
y    p(y)    yp(y)    y²    y²p(y)
2    0.15    0.30     4     0.60
4    0.55    2.20     16    8.80
6    0.30    1.80     36    10.80
E(Y) = 4.30           E(Y²) = 20.20
V(Y) = E(Y²) - [E(Y)]² = 20.20 - (4.3)² = 1.71, so σY = 1.31
151
Now we have everything we need to compute the
correlation coefficient for the lumber problem.
ρ = C(X,Y) / (σX σY) = -0.16 / ((1.60)(1.31)) ≈ -0.076
This number is much closer to 0 than it is to
-1. So the negative relation between the length
and width of the lumber is very weak.
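
The earlier claim that covariance depends on the units while correlation does not can be illustrated with the lumber table. This Python sketch is only an illustration: the helper `covariance_and_correlation` and the factor of 12 (a hypothetical change of units, such as feet to inches) are assumptions, not part of the slides.

```python
from math import sqrt

# Joint probabilities p(x, y) for the lumber example.
joint = {
    (4, 2): 0.05, (4, 4): 0.05, (4, 6): 0.10,
    (8, 2): 0.10, (8, 4): 0.50, (8, 6): 0.20,
}

def covariance_and_correlation(joint):
    """Return (C(X,Y), rho) for a joint distribution given as {(x, y): p}."""
    e_x = sum(x * p for (x, y), p in joint.items())
    e_y = sum(y * p for (x, y), p in joint.items())
    v_x = sum(x * x * p for (x, y), p in joint.items()) - e_x ** 2
    v_y = sum(y * y * p for (x, y), p in joint.items()) - e_y ** 2
    cov = sum(x * y * p for (x, y), p in joint.items()) - e_x * e_y
    return cov, cov / sqrt(v_x * v_y)

cov, rho = covariance_and_correlation(joint)

# Hypothetical change of units: every length and width multiplied by 12.
rescaled = {(12 * x, 12 * y): p for (x, y), p in joint.items()}
cov_rescaled, rho_rescaled = covariance_and_correlation(rescaled)

print(round(cov, 2), round(rho, 3))                    # -0.16 -0.076
print(round(cov_rescaled, 2), round(rho_rescaled, 3))  # -23.04 -0.076
```

The covariance is multiplied by 144 when both variables are rescaled, while the correlation coefficient stays the same.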
152
Theorem

E(aX + bY) = aE(X) + bE(Y)
V(aX + bY) = a²V(X) + b²V(Y) + 2abC(X,Y)

153
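Example: The mean and variance of X are 1 and 5,
respectively. The mean and variance of Y are 2 and 6,
respectively. The covariance of X and Y is 7.
Determine the mean and variance of 4X + 3Y.
(These numbers only serve to practice the formulas: with V(X) = 5 and
V(Y) = 6, a covariance of 7 would give a correlation above 1.)

Recall: E(aX + bY) = aE(X) + bE(Y)
V(aX + bY) = a²V(X) + b²V(Y) + 2abC(X,Y)
To solve this problem, what should a and b be?
a is 4 and b is 3.
E(aX + bY) = aE(X) + bE(Y) = 4(1) + 3(2)
= 4 + 6 = 10
V(aX + bY) = a²V(X) + b²V(Y) + 2abC(X,Y)
= 4²V(X) + 3²V(Y) + 2(4)(3)C(X,Y)
= 16(5) + 9(6) + 24(7)
= 80 + 54 + 168
= 302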
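
The two formulas translate directly into a short function. This is a sketch only; the function name and argument order are illustrative choices, not something the slides define.

```python
def linear_combination_moments(a, b, mean_x, mean_y, var_x, var_y, cov_xy):
    """Mean and variance of aX + bY from the moments of X and Y."""
    mean = a * mean_x + b * mean_y
    variance = a ** 2 * var_x + b ** 2 * var_y + 2 * a * b * cov_xy
    return mean, variance

# The example above: a = 4, b = 3, E(X) = 1, E(Y) = 2,
# V(X) = 5, V(Y) = 6, C(X,Y) = 7.
mean, variance = linear_combination_moments(4, 3, 1, 2, 5, 6, 7)
print(mean, variance)  # 10 302
```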

154
Consider the following joint distribution of X
and Y.
         y = 2    y = 4
x = 1    0.20     0.25
x = 3    0.15     0.20
x = 5    0.15     0.05

Determine the following:
The mean and variance of X
The mean and variance of Y
The covariance and correlation coefficient of X and Y
The mean and variance of X + Y

155
First, determine the marginal distribution of X
         y = 2    y = 4    pX(x)
x = 1    0.20     0.25     0.45
x = 3    0.15     0.20     0.35
x = 5    0.15     0.05     0.20

156
and the marginal distribution of Y.
         y = 2    y = 4    pX(x)
x = 1    0.20     0.25     0.45
x = 3    0.15     0.20     0.35
x = 5    0.15     0.05     0.20
pY(y)    0.50     0.50
157
Verify that they sum to 1.
         y = 2    y = 4    pX(x)
x = 1    0.20     0.25     0.45
x = 3    0.15     0.20     0.35
x = 5    0.15     0.05     0.20
pY(y)    0.50     0.50     1
158
Set up a table to compute the mean and variance of X.
x    p(x)    xp(x)    x²p(x)

159
Fill in the values of X and their probabilities.
x    p(x)    xp(x)    x²p(x)
1    0.45
3    0.35
5    0.20
160
Multiply x by p(x).
x    p(x)    xp(x)    x²p(x)
1    0.45    0.45
3    0.35    1.05
5    0.20    1.00

161
Add to get the mean of X.
x    p(x)    xp(x)    x²p(x)
1    0.45    0.45
3    0.35    1.05
5    0.20    1.00
E(X) = 2.50

162
To calculate the variance, first compute E(X²) =
Σ x² p(x).
x    p(x)    xp(x)    x²p(x)
1    0.45    0.45     0.45
3    0.35    1.05     3.15
5    0.20    1.00     5.00
E(X) = 2.50

163
To calculate the variance, first compute E(X²) =
Σ x² p(x).
x    p(x)    xp(x)    x²p(x)
1    0.45    0.45     0.45
3    0.35    1.05     3.15
5    0.20    1.00     5.00
E(X) = 2.50           E(X²) = 8.60

164
Calculate the variance as V(X) = E(X²) - [E(X)]².
x    p(x)    xp(x)    x²p(x)
1    0.45    0.45     0.45
3    0.35    1.05     3.15
5    0.20    1.00     5.00
E(X) = 2.50           E(X²) = 8.60
V(X) = E(X²) - [E(X)]² = 8.6 - (2.5)² = 2.35
165
Set up a table to compute the mean and variance of Y.
y    p(y)    yp(y)    y²p(y)

166
Fill in the values of Y and their probabilities.
y    p(y)    yp(y)    y²p(y)
2    0.5
4    0.5

167
Multiply y by p(y)
y    p(y)    yp(y)    y²p(y)
2    0.5     1
4    0.5     2

168
and add to get E(Y).
y    p(y)    yp(y)    y²p(y)
2    0.5     1
4    0.5     2
E(Y) = 3

169
To calculate the variance, first compute E(Y²) =
Σ y² p(y).
y    p(y)    yp(y)    y²p(y)
2    0.5     1        2
4    0.5     2        8
E(Y) = 3

170
To calculate the variance, first compute E(Y²) =
Σ y² p(y).
y    p(y)    yp(y)    y²p(y)
2    0.5     1        2
4    0.5     2        8
E(Y) = 3              E(Y²) = 10

171
Calculate the variance as V(Y) = E(Y²) - [E(Y)]².
y    p(y)    yp(y)    y²p(y)
2    0.5     1        2
4    0.5     2        8
E(Y) = 3              E(Y²) = 10
V(Y) = E(Y²) - [E(Y)]² = 10 - (3)² = 1
172
To determine C(X,Y) = E(XY) - E(X) E(Y), we
need E(XY) = Σ Σ xy p(x,y).
173
As before, we'll put the xy values in the table
next to the probability values.
         y = 2        y = 4        pX(x)
x = 1    0.20 (2)     0.25 (4)     0.45
x = 3    0.15 (6)     0.20 (12)    0.35
x = 5    0.15 (10)    0.05 (20)    0.20
pY(y)    0.50         0.50         1.00
174
Then we multiply and add.
         y = 2        y = 4        pX(x)
x = 1    0.20 (2)     0.25 (4)     0.45
x = 3    0.15 (6)     0.20 (12)    0.35
x = 5    0.15 (10)    0.05 (20)    0.20
pY(y)    0.50         0.50         1.00
E(XY) = (0.20)(2) + (0.25)(4) + (0.15)(6)
+ (0.20)(12) + (0.15)(10) + (0.05)(20)
= 0.40 + 1.00 + 0.90 + 2.40
+ 1.50 + 1.00 = 7.20
175
C(X,Y) = E(XY) - E(X) E(Y)

Since E(XY) = 7.2, E(X) = 2.5, E(Y) = 3.0,
C(X,Y) = 7.2 - (2.5)(3)
= 7.2 - 7.5
= -0.3

176
Next, the correlation coefficient.
Since C(X,Y) = -0.3, V(X) = 2.35, V(Y) = 1,
ρ = C(X,Y) / (σX σY) = -0.3 / (√2.35 √1) ≈ -0.196
177
The next part of the problem asked for E(X + Y).

We know that E(X) = 2.5 and E(Y) = 3.0.
E(aX + bY) = a E(X) + b E(Y)
What should a and b be?
1 and 1
So E(X + Y) = 1 E(X) + 1 E(Y)
= E(X) + E(Y)
= 2.5 + 3.0
= 5.5

178
Lastly, V(X + Y)

We know V(X) = 2.35, V(Y) = 1, C(X,Y) = -0.3.
V(aX + bY) = a² V(X) + b² V(Y) + 2ab C(X,Y)
What are a and b?
1 and 1
V(X + Y) = 1² V(X) + 1² V(Y) + 2(1)(1) C(X,Y)
= V(X) + V(Y) + 2C(X,Y)
= 2.35 + 1 + 2 (-0.3)
= 2.75
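
The whole worked example can be reproduced in a few lines. The following Python sketch assumes the joint table is stored as a dictionary; all names are illustrative. It also builds the distribution of X + Y directly, as an independent check on the variance theorem.

```python
from collections import defaultdict
from math import sqrt

# Joint probabilities p(x, y) from the worked example.
joint = {
    (1, 2): 0.20, (1, 4): 0.25,
    (3, 2): 0.15, (3, 4): 0.20,
    (5, 2): 0.15, (5, 4): 0.05,
}

# Marginal distributions and the distribution of the sum X + Y.
p_x, p_y, p_sum = defaultdict(float), defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p
    p_sum[x + y] += p

def mean(dist):
    return sum(v * p for v, p in dist.items())

def variance(dist):
    return sum(v * v * p for v, p in dist.items()) - mean(dist) ** 2

e_x, e_y = mean(p_x), mean(p_y)
v_x, v_y = variance(p_x), variance(p_y)
cov = sum(x * y * p for (x, y), p in joint.items()) - e_x * e_y
rho = cov / sqrt(v_x * v_y)

print(round(e_x, 2), round(v_x, 2), round(e_y, 2), round(v_y, 2))  # 2.5 2.35 3.0 1.0
print(round(cov, 2), round(rho, 3))                                # -0.3 -0.196
print(round(mean(p_sum), 2), round(variance(p_sum), 2))            # 5.5 2.75
```

The last line comes from the distribution of X + Y itself and matches V(X) + V(Y) + 2C(X,Y).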

179
Specific Discrete Distributions

Uniform
Binomial
Hypergeometric
Multinomial
Poisson

180
Uniform Distribution

The uniform distribution assigns all the possible
values equal probabilities.
Example: a fair die has possible values
1, 2, 3, 4, 5, and 6, each with
probability 1/6.

181
Graph of Uniform Distribution. Example: Fair Die
182
Binomial Distribution

Example: What is the probability of getting 3
heads on 5 tosses of an unfair (lopsided) coin
whose probability of getting a head on any toss
is 1/3?

183
What is the probability of getting specifically
HTHHT ?

(1/3) (2/3) (1/3) (1/3) (2/3)
= (1/3)³ (2/3)²
What is the probability of any other specific
outcome with 3 heads on 5 tosses?
The same.
So we just have to figure out how many different
ways you can get 3 heads on 5 tosses, and
multiply that by the probability of each
individual outcome.
That will give us the probability of getting 3
heads on 5 tosses.

184
How many ways can you get 3 heads on 5 tosses?

It's the number of combinations of 5 objects
taken 3 at a time: 5! / (3! 2!) = 10.

185
So the probability of getting 3 heads on 5 tosses
is [5! / (3! 2!)] (1/3)³ (2/3)² = 10 (1/27)(4/9) = 40/243 ≈ 0.1646.
186
In general, the probability of getting x
successes on n trials in which the probability of
success on any given trial is p is
p(x) = [n! / (x! (n-x)!)] p^x (1-p)^(n-x), for x = 0, 1, ..., n.
This is the binomial distribution.
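
A minimal Python sketch of the binomial probability, using the standard library function `math.comb` for the number of combinations. The function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes on n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# 3 heads on 5 tosses of a coin with P(head) = 1/3.
print(round(binomial_pmf(3, 5, 1 / 3), 4))  # 0.1646
```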
187
Notes

0! = 1
Each trial that can result in either success or
failure is called a Bernoulli trial.

188
Example: If the probability that any person
passes this course is 0.95, what is the
probability that in a class of 30 people,
exactly 28 people pass?
p(28) = [30! / (28! 2!)] (0.95)^28 (0.05)^2 = 435 (0.95)^28 (0.05)^2 ≈ 0.2586
189
Let's go back to the example in which we flipped
a coin 5 times and the probability of heads on each
toss was 1/3.

For 3 heads, the probability was 0.1646.
Using the binomial formula, we can determine the
probabilities of the other possibilities.

x    p(x)
0    0.1317
1    0.3292
2    0.3292
3    0.1646
4    0.0412
5    0.0041
     1
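
The table can be regenerated with a short loop. This sketch assumes the same n = 5 and p = 1/3 as the coin example; the variable names are illustrative.

```python
from math import comb

n, p = 5, 1 / 3

# p(x) for every possible number of heads x = 0, 1, ..., n.
table = {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

for x, prob in table.items():
    print(x, f"{prob:.4f}")
print("total", f"{sum(table.values()):.4f}")
```

Rounded to four places, the printed probabilities should match the table, and the total should print as 1.0000.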
190
If we graph this distribution, it looks like
[Graph of p(x) against x for the binomial distribution with n = 5, p = 1/3]
Notice that there is a bump on the left and a
tail on the right. Such a distribution is said to
be skewed to the right. The skew is where the
tail is.
191
Binomial Distribution

The binomial distribution graph we just did was
for p = 1/3 and the skew was to the right.
A binomial distribution with p < ½ will always
have a skew to the right.
What do you think the distribution will look like
if p > ½?
It will be skewed to the left. (The tail will be
on the left and the bump will be on the right.)

192
Binomial Distribution

What do you think the distribution will look like
if p = ½?
It will be symmetric. The left and right sides
will be mirror images of each other.
If the number of trials n (tosses in our example)
is large, the graph will be roughly symmetric
even if p ≠ ½.
How large does n have to be for the graph to be
roughly symmetric? That depends on how far p is
from ½.
There are two rules that are sometimes
used to determine if the graph is roughly
symmetric.
One rule requires that np ≥ 5 and n(1-p) ≥ 5.
The other rule requires that np(1-p) ≥ 3.
These rules are not exactly equivalent, but they
both work reasonably well.
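
The two rules of thumb are easy to apply mechanically. The sketch below assumes the comparisons are "at least 5" and "at least 3" (the inequality signs did not survive in the transcript), and the function name is illustrative.

```python
def roughly_symmetric(n, p):
    """Apply both rules of thumb; returns (first_rule, second_rule)."""
    first_rule = n * p >= 5 and n * (1 - p) >= 5   # np and n(1-p) at least 5
    second_rule = n * p * (1 - p) >= 3             # np(1-p) at least 3
    return first_rule, second_rule

print(roughly_symmetric(5, 1 / 3))    # (False, False)
print(roughly_symmetric(100, 0.64))   # (True, True)
print(roughly_symmetric(10, 0.5))     # (True, False)
```

The last case shows that the rules are not exactly equivalent: 10 tosses of a fair coin pass the first rule but not the second.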

193
Mean and Variance of the Binomial Distribution

Mean: μ = np
Variance: σ² = np(1-p)

194
Example: What are the mean, variance, and standard
deviation for our binomial distribution example
in which n = 5 and p = 1/3?

Mean: μ = np = (5)(1/3) = 5/3
Variance: σ² = np(1-p) = (5)(1/3)(2/3) = 10/9
Standard deviation: σ = √(10/9) ≈ 1.054
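
These shortcut formulas agree with the general definitions E(X) = Σ x p(x) and V(X) = E(X²) - [E(X)]². The Python sketch below checks this for n = 5 and p = 1/3; the variable names are illustrative.

```python
from math import comb, sqrt

n, p = 5, 1 / 3
pmf = {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

# Mean and variance straight from the probability table.
mean = sum(x * px for x, px in pmf.items())
variance = sum(x * x * px for x, px in pmf.items()) - mean ** 2
std_dev = sqrt(variance)

print(round(mean, 4), round(n * p, 4))                 # 1.6667 1.6667
print(round(variance, 4), round(n * p * (1 - p), 4))   # 1.1111 1.1111
print(round(std_dev, 4))                               # 1.0541
```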

195
Using Excel to calculate Binomial Probabilities

On an Excel spreadsheet, you can get the binomial
distribution as follows:
click insert, and then click function;
select statistical as the category of function,
scroll down to the binomdist function, and click
on it;
fill in the information in the dialog box.

196
Suppose that you wanted to calculate a messy
binomial, such as the probability of between 60
and 70 successes inclusive, on 100 trials with
success probability on each trial of 0.64.

This would be a lot of work with just a
calculator. You would have to calculate 11
separate binomial probabilities (the
probabilities for 60, 61, 62, ..., 70) and then add
them up.
It's much easier with Excel.

197
Remember you want the probability of between 60
and 70 successes inclusive, on 100 trials with
success probability on each trial of 0.64.

You can calculate the (cumulative) probability of
70 or fewer successes.
Then calculate the cumulative probability of 59
or fewer successes.
Then take the difference.

198
To get the probability of 70 or fewer successes,
specify the following

number of successes: 70
number of trials: 100
probability of success on any trial: 0.64
cumulative: True (because you want 70 or fewer,
not just 70)

199
To get the probability of 59 or fewer successes,
specify the following

number of successes: 59
number of trials: 100
probability of success on any trial: 0.64
cumulative: True

200
Then just subtract the two cumulative function
values you calculated.

If you do this, you get
0.91368 - 0.17394 = 0.7397
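
The same "difference of two cumulative probabilities" idea works outside Excel. Here is a Python sketch using only the standard library; `binomial_cdf` is an illustrative helper written for this note, not a built-in function.

```python
from math import comb

def binomial_cdf(k, n, p):
    """Cumulative probability of k or fewer successes on n trials."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k + 1))

# Between 60 and 70 successes inclusive, on 100 trials with p = 0.64.
prob = binomial_cdf(70, 100, 0.64) - binomial_cdf(59, 100, 0.64)
print(round(prob, 4))
```

The printed value should land close to the 0.7397 obtained from the two Excel cumulative values.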

201
We can also study binomial problems using
proportions.

For example, we might want to know the
probability of getting 60% heads on 5 tosses of a
coin with probability of heads on each toss of
1/3. (This is the same as getting 3 heads.)
In general, if X is the number of successes on n
trials, the proportion of successes is X/n.
We can easily determine the mean and variance of
this binomial proportion variable X/n.
If p again is the probability of success on any
given trial,
E(X/n) = p
V(X/n) = p(1-p)/n

202
When can we use the binomial distribution?

We have exactly two possibilities on each trial
(success or failure, heads or tails, male or
female, yes or no, etc.)
The probability of success is the same on each
trial.
The trials are independent. (What happens on one
trial has no effect on what happens on the next
trial.)

203
Sampling with and without Replacement

Suppose we have a bowl with 6 red and 4 green
marbles. We select 3 marbles at random without
replacement. We want to know the probability of
selecting exactly 2 red marbles.
What's the probability of getting a red marble on
the 1st draw?
6/10
What's the probability of getting a red marble on
the 2nd draw?
It depends on what we got on the first draw.
If we got a red one, then the probability is 5/9.
If we got a green one, then the probability is
6/9.
Since the probability varies from trial to trial,
we cannot use the binomial distribution.
We will discuss very shortly what we use instead.

204
What if we selected the marbles with replacement?

Then the probability of a red marble would be the
same on each draw, regardless of what you pulled
out previously.
Then we could use the binomial distribution.

205
Suppose that instead of having 6 red marbles and 4
green marbles, we had 6000 red ones and 4000
green ones.

The probability of red on the 1st draw would be
6,000/10,000 = 0.6.
If we got red on the 1st draw, the probability of
red on the 2nd draw would be 5999/9999 ≈ 0.59996.
If we got green on the 1st draw, the probability
of red on the 2nd would be 6000/9999 ≈ 0.60006.
These three numbers are very close.
So you could use the binomial distribution to get
a very good approximation of the probability.

206
So if we have two options on each trial, when
can we use the binomial distribution?

If we sample with replacement, or
if we sample without replacement, but the sample is
small relative to the population.
A rule that is often used is that the sample
is less than 5% of the population (n < 0.05 N).

207
If our sample is more than 5% of our population,
then we will use the hypergeometric
distribution.
208
Let's return to our marble problem.

Suppose we have a bowl with 6 red and 4 green
marbles. We select 3 marbles at random without
replacement. We want to know the probability of
selecting exactly 2 red marbles.
Remember that the number of ways of selecting x
objects from n is n! / (x! (n-x)!).
So there are 6! / (2! 4!) = 15 ways of selecting 2 red
marbles from 6.
There are 4! / (1! 3!) = 4 ways of selecting 1 green
marble from 4.
There are 10! / (3! 7!) = 120 ways of selecting 3 marbles
from 10.

209
So the probability of getting exactly 2 red
marbles on 3 draws will be
(ways to pick 2 red from 6) (ways to pick 1 green from 4) / (ways to pick 3 marbles from 10)
210
and our probability is (15)(4) / 120 = 60/120 = 0.5.
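
The marble calculation, and the earlier remark about a bowl with 6000 red and 4000 green marbles, can be compared side by side. This Python sketch uses illustrative function names and the standard library function `math.comb`.

```python
from math import comb

def hypergeometric_pmf(x, n, successes, population):
    """P(exactly x successes in n draws without replacement)."""
    failures = population - successes
    return comb(successes, x) * comb(failures, n - x) / comb(population, n)

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent draws, i.e. with replacement)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

small_bowl = hypergeometric_pmf(2, 3, 6, 10)          # 6 red, 4 green
large_bowl = hypergeometric_pmf(2, 3, 6000, 10000)    # 6000 red, 4000 green
with_replacement = binomial_pmf(2, 3, 0.6)

print(round(small_bowl, 4), round(large_bowl, 4), round(with_replacement, 4))
# 0.5 0.4321 0.432
```

For the small bowl the sample is 30% of the population and the binomial value is well off; for the large bowl the two answers nearly coincide, in line with the 5% rule.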
211
The hypergeometric distribution can also be used
if you have more than 2 categories.