Introduction to statistics

Slides: 212

Provided by: charle375


Transcript and Presenter's Notes


1
Introduction to statistics

Programme Bioinformatics
Master Grid Computing
April 2007

Prof dr AHC van Kampen Bioinformatics
Laboratory KEBB, AMC
2
Descriptive Statistics
3
Describing data
4
Quartile

In descriptive statistics, a quartile is any of
the three values which divide the sorted data set
into four equal parts, so that each part
represents 1/4th of the sample or population.
First quartile (designated Q1): lower quartile,
cuts off the lowest 25% of data; the 25th percentile
Second quartile (designated Q2): median, cuts the
data set in half; the 50th percentile
Third quartile (designated Q3): upper quartile,
cuts off the highest 25% of data, or lowest 75%;
the 75th percentile
The difference between the upper and lower
quartiles is called the interquartile range.
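
A minimal sketch of the quartile definitions above, using Python's standard library. The data values are illustrative and not from the slides; note that software packages differ slightly in how they interpolate quartiles, so other tools may report somewhat different cut points.

```python
import statistics

# Illustrative data (an assumption, not taken from the slides).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# With n=4, statistics.quantiles returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Interquartile range: difference between upper and lower quartiles.
iqr = q3 - q1

print("Q1:", q1, "Q2 (median):", q2, "Q3:", q3, "IQR:", iqr)
```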

5
Variance, S.D. of a Sample
Variance: s² = Σ(xi - x̄)² / (n - 1)
Degrees of freedom: n - 1
Standard deviation: s = sqrt(s²)
In statistics, the term degrees of freedom (df)
is a measure of the number of independent pieces
of information on which the precision of a
parameter estimate is based
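
As a rough illustration of the sample variance and standard deviation, the sketch below divides the sum of squared deviations by the degrees of freedom, n - 1. The sample values are made up for the example.

```python
import math
import statistics

# Illustrative sample (an assumption, not from the slides).
sample = [4.0, 7.0, 6.0, 3.0, 5.0]

n = len(sample)
mean = sum(sample) / n

# Sum of squared deviations from the sample mean.
sum_sq = sum((x - mean) ** 2 for x in sample)

# Sample variance uses n - 1 degrees of freedom, because one piece of
# information was already used to estimate the mean.
variance = sum_sq / (n - 1)
std_dev = math.sqrt(variance)

# The standard library computes the same quantities.
print(variance, statistics.variance(sample))
print(std_dev, statistics.stdev(sample))
```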
6
Skewness
7
Box-whisker plots
8
Distributions

Normal, binomial, Poisson, hypergeometric,
t-distribution, chi-square
What parameters describe their shapes
How these distributions can be useful

9
Normal distribution
10
The Normal Distribution

Also called a Gaussian distribution
Centered around the mean µ with a width
determined by the standard deviation σ
Total area under the curve = 1.0

11
A Normal Distribution . . .

For a mean of 5 and a standard deviation of 1

12
What Does a Normal Distribution Describe?

Imagine that you go to the lab and very carefully
measure out 5 ml of liquid and weigh it.
Imagine repeating this process many times.
You won't get the same answer every time, but if
you make a lot of measurements, a histogram of
your measurements will approach the appearance of
a normal distribution.

13
What Does a Normal Distribution Describe?

Any situation in which the exact value of a
continuous variable is altered randomly from
trial to trial.
The random uncertainty or random error

14
How Do You Use The Normal Distribution?

Use the area UNDER the normal distribution
For example, the area under the curve between x = a
and x = b is the probability that your next
measurement of x will fall between a and b

15
A normal distribution with a mean of 75 and a
standard deviation of 10. The shaded area
contains 95% of the area and extends from 55.4 to
94.6. For all normal distributions, 95% of the
area is within 1.96 standard deviations of the
mean.
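
The area quoted on this slide can be checked with a short sketch. It assumes Python 3.8 or later, where `statistics.NormalDist` is available; the variable names are only illustrative.

```python
from statistics import NormalDist

# Normal distribution from the slide: mean 75, standard deviation 10.
dist = NormalDist(mu=75, sigma=10)

# Area under the curve between 55.4 and 94.6 (difference of two CDF values).
area = dist.cdf(94.6) - dist.cdf(55.4)

# The same statement in standard units: area within 1.96 standard deviations.
std = NormalDist()
area_z = std.cdf(1.96) - std.cdf(-1.96)

print(round(area, 4), round(area_z, 4))
```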
16
How Do You Get µ and σ?

To draw a normal distribution you must know µ and
σ
If you made an infinite number of measurements,
their mean would be µ and their standard
deviation would be σ
In practice, you have a finite number of
measurements with mean x̄ and standard deviation s
For now, µ and σ will be given
Later we'll use x̄ and s to estimate µ and σ

17
The Standard Normal Distribution

It is tedious to integrate a new normal
distribution for every population, so use a
standard normal distribution with standard
tabulated areas.
Convert your measurement x to a standard score
(z-score)
z = (x - µ) / σ
Use the standard normal distribution
µ = 0 and σ = 1
(areas tabulated in any statistics text book)

The z-score indicates the number of standard
deviations that value x is away from the mean µ
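
A small sketch of the z-score conversion. The numbers (x = 37, mean 30, standard deviation 5) are illustrative values chosen for the example, and `z_score` is a hypothetical helper name. It also shows that the tail area from the standard normal distribution matches the one computed directly from the original distribution.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mu) / sigma


# Illustrative values (assumed for this example).
z = z_score(37, mu=30, sigma=5)

# Area to the right of z under the standard normal curve (mu=0, sigma=1).
tail_standard = 1 - NormalDist().cdf(z)

# Same area computed without standardizing.
tail_direct = 1 - NormalDist(mu=30, sigma=5).cdf(37)

print(z, round(tail_standard, 4), round(tail_direct, 4))
```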
18
(No Transcript)
19
Probability density function
z-transform
green curve is standard normal distribution
20
Cumulative distribution functions
21
Exercises 1
1. If scores are normally distributed with a mean
of 30 and a standard deviation of 5, what
percent of the scores is (a) greater than 30?
(b) greater than 37? (c) between 28 and 34?
2. What proportion of a normal distribution is
within one standard deviation of the mean?
3. What proportion is more than 1.8 standard
deviations from the mean?
4. A test is normally distributed with a mean of
40 and a standard deviation of 7. What value
would be needed to be in the 85th percentile?
Stat tables: http://www.statsoft.com/textbook/sttable.html
22
Binomial distribution
23
What Does the Binomial Distribution Describe?

yes/no experiments (two possible outcomes)
The probability of getting all tails if you
throw a coin three times
The probability of getting all male puppies in a
litter of 8
The probability of getting two defective
batteries in a package of six

24
Exercise 2

What is the probability of getting one 2 when
you roll six dice?

25
The Binomial Distribution

The probability of getting the result of interest
k times out of n, if the overall probability of
the result is p:
P(k) = C(n, k) p^k (1 - p)^(n - k)
where C(n, k) = n! / (k! (n - k)!) is the
binomial coefficient
Note that here, k is a discrete variable
Integer values only
26
Binomial Distribution

n = 6 (number of dice rolled)
p = 1/6 (probability of rolling a 2)
k = 0, 1, 2, 3, 4, 5, 6 (number of 2s out of 6)

P(k = 1) = 0.402
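
The dice example can be reproduced with a few lines of Python. This is a sketch only; `binomial_pmf` is a hypothetical helper name, and `math.comb` (Python 3.8+) supplies the binomial coefficient.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 6, 1 / 6  # six dice, probability 1/6 of rolling a 2

# Probability of exactly one 2 among six dice.
p_one = binomial_pmf(1, n, p)

# The probabilities over all possible k must add up to 1.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print(round(p_one, 3), round(total, 6))
```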
27
Binomial Distribution

n = 8 (number of puppies in litter)
p = 1/2 (probability of any pup being male)
k = 0, 1, 2, 3, 4, 5, 6, 7, 8 (number of males out of 8)

28
The Shape of the Binomial Distribution

Shape is determined by values of n and p
Only truly symmetric if p = 0.5
Approaches normal distribution if n is large,
unless p is very small
Mean number of successes is np
Variance of distribution is
variance(X) = n p (1 - p)

29
Exercise 3

While you are in the bathroom, your little
brother claims to have rolled a Yahtzee in 6s
(five dice all 6s) in one roll of the five dice.
How justified would you be in beating him up for
cheating?

30
Poisson distribution
Pn(µ) = µ^n e^(-µ) / n!
Pn(µ) = probability of getting n counts (n = 0, 1,
2, ...); µ = average of distribution
variance = mean
31
Poisson distribution
Randomly placed dots over 50 scale divisions. On
average µ = 1 dot per interval
Pn(µ) = probability of getting n counts; µ = average
of distribution
32
Exercise 4
Pn(µ) = probability of getting n counts; µ = average
of distribution
Average number of phone calls in 1 hour = 2.1.
What is the probability of getting 4 calls?
33
Exercise 5
Pn(µ) = probability of getting a discrete value
n; µ = average of distribution
Average number of phone calls in 1 hour = 2.1.
What is the probability of getting 0
calls? Does this simplify the formula?
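
One way to check answers to the two phone-call exercises is to code the Poisson probability directly. The sketch below assumes the standard Poisson formula, µ^n e^(-µ) / n!; `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial


def poisson_pmf(n, mu):
    """Probability of observing exactly n counts when the average is mu."""
    return mu**n * exp(-mu) / factorial(n)


mu = 2.1  # average number of phone calls per hour

p_four = poisson_pmf(4, mu)
p_zero = poisson_pmf(0, mu)

# For n = 0 the formula reduces to exp(-mu), since mu**0 = 1 and 0! = 1.
print(round(p_four, 4), round(p_zero, 4), round(exp(-mu), 4))
```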
34
Hypergeometric distribution
35
Hypergeometric Distribution

Suppose that we have an urn with N balls in it;
of these m are white and the others are black.
Then k balls are drawn from the urn without
replacement and of these X are observed to be
white.
X is a random variable following a hypergeometric
distribution

N = 20, m = n = 10
draw k = 10 balls
X = 6
36
Hypergeometric Distribution
P(X = x) = C(m, x) C(N - m, k - x) / C(N, k)

37
Fisher's Exact Test

We often want to ask whether there are more white
balls in the sample than expected by chance.

P(X ≥ x)

If the probability is small, it is less likely
that we get the result by chance.

38
Hypergeometric example

Extract a cluster of 36 samples from leukemia
microarray dataset
Whole dataset: 47 ALL, 25 AML
Extracted: 29 ALL, 7 AML
Is this sample enriched for ALL samples?

Pr(extracted ALL ≥ 29)

= 0.006

Conclusion: This cluster is significantly
enriched with ALL samples.
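
The enrichment probability is an upper-tail hypergeometric sum. The sketch below is one way to compute it with the standard library; `hypergeom_pmf` and `hypergeom_tail` are hypothetical helper names, and the result can be compared with the value quoted on the slide. The urn numbers (20 balls, 10 white, 10 drawn, 6 white observed) are used as a second example.

```python
from math import comb


def hypergeom_pmf(x, N, m, k):
    """P(X = x): x white balls among k drawn from N balls, m of them white."""
    return comb(m, x) * comb(N - m, k - x) / comb(N, k)


def hypergeom_tail(x, N, m, k):
    """P(X >= x): probability of at least x white balls in the draw."""
    return sum(hypergeom_pmf(i, N, m, k) for i in range(x, min(m, k) + 1))


# Urn example: N = 20 balls, m = 10 white, draw k = 10, observe 6 white.
p_urn = hypergeom_pmf(6, N=20, m=10, k=10)

# Leukemia example: 72 samples in total, 47 ALL, cluster of 36, 29 ALL in it.
p_enriched = hypergeom_tail(29, N=72, m=47, k=36)

print(round(p_urn, 4), round(p_enriched, 4))
```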

39
Sampling Distribution

Every time we take a random sample and calculate
a statistic, the value of the statistic changes
(remember, a statistic is a random variable).
If we continue to take random samples and
calculate a given statistic over time, we will
build up a distribution of values for the
statistic. This distribution is referred to as a
sampling distribution.
A sampling distribution is a distribution that
describes the chance fluctuations of a statistic
calculated from a random sample.

40
Sampling Distribution of the Mean

The probability distribution of x̄ is called
the sampling distribution of the mean.
The distribution of x̄, for a given sample
size, n, describes the variability of sample
averages around the population mean µ.

41
Sampling Distribution of the Mean

If a random sample of size n is taken from a
normal population having mean µx and variance
σx², then x̄ is a random variable which is also
normally distributed with mean µx and variance
σx²/n.
Further,
Z = (x̄ - µx) / (σx / sqrt(n))
is a standard normal random variable.

42
Sampling Distribution of the Mean
(Figure, four panels. 1: original population,
N(100, 5). 2: N(100, 3.54). 3: N(100, 1.58).
4: N(100, 1).)
5/sqrt(2) = 3.54
43
Sampling Distribution of the Mean

Example: A manufacturer of steel rods claims that
the length of his bars follows a normal
distribution with a mean of 30 cm and a standard
deviation of 0.5 cm.
(a) Assuming that the claim is true, what is the
probability that a given bar will exceed 30.1 cm?
(b) Assuming the claim is true, what is the
probability that the mean of 10 randomly chosen
bars will exceed 30.1 cm?
(c) Assuming the claim is true, what is the
probability that the mean of 100 randomly chosen
bars will exceed 30.1 cm?

44
Sampling Distribution of the Mean

Example: A manufacturer of steel rods claims that
the length of his bars follows a normal
distribution with a mean of 30 cm and a standard
deviation of 0.5 cm.
(a) Assuming that the claim is true, what is the
probability that a given bar will exceed 30.1 cm?
(z = (30.1 - 30)/0.5 = 0.2 → p = 0.42)
(b) Assuming the claim is true, what is the
probability that the mean of 10 randomly chosen
bars will exceed 30.1 cm?
(z = (30.1 - 30)/(0.5/sqrt(10)) = 0.63 → p = 0.26)
(c) Assuming the claim is true, what is the
probability that the mean of 100 randomly chosen
bars will exceed 30.1 cm?
(z = (30.1 - 30)/(0.5/sqrt(100)) = 2 → p = 0.02)
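
The three probabilities in the steel bar example can be checked numerically. In this sketch the sampling distribution of the mean is modelled as a normal distribution whose standard deviation is sigma / sqrt(n); `prob_mean_exceeds` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 30.0, 0.5  # claimed mean and standard deviation in cm


def prob_mean_exceeds(threshold, n):
    """P(mean of n bars > threshold), assuming the claim is true."""
    standard_error = sigma / sqrt(n)
    return 1 - NormalDist(mu, standard_error).cdf(threshold)


p_1 = prob_mean_exceeds(30.1, 1)      # a single bar
p_10 = prob_mean_exceeds(30.1, 10)    # mean of 10 bars
p_100 = prob_mean_exceeds(30.1, 100)  # mean of 100 bars

print(round(p_1, 2), round(p_10, 2), round(p_100, 2))
```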

45
Sampling Distribution of the Mean
(Figure: shaded tail areas of .42, .26 and .02)
46
Inference on Population Mean

Example: Suppose that it is very important to our
manufacturing process that we detect a deviation
in the bar mean of 0.1 cm or more.
Will sampling one bar allow us to detect a shift
of 0.1 cm in the population mean?
Will sampling ten bars allow us to detect a
shift of 0.1 cm in the population mean?
Will sampling one hundred bars allow us to
detect a shift of 0.1 cm in the population mean?

47
Inference on Population Mean
48
Inference on Population Mean
49
Inference on Population Mean
50
Properties of Sample Mean as Estimator of
Population Mean

Expected value of the sample mean is the
population mean (UNBIASED): E(x̄) = µ
Among UNBIASED estimators, the mean has the
SMALLEST variance
Variance: σx̄² = σ²/n
As n increases, the standard error
σx̄ = σ/sqrt(n) decreases.
51
(No Transcript)
52
When the Population is Normal, the Sampling
Distribution is Also Normal
(Figure: population distribution, with central
tendency µ and variation σ, and sampling
distributions of x̄ with σx̄ = 5 for n = 4 and
σx̄ = 2.5 for n = 16)
53
Central Limit Theorem
As the sample size gets large enough, the
sampling distribution becomes almost normal
regardless of the shape of the population
54
When The Population is Not Normal
(Figure: population distribution, with central
tendency µ = 50 and variation σ = 10, and
sampling distributions of x̄ with σx̄ = 5 for
n = 4 and σx̄ = 1.8 for n = 30)
55
Central Limit Theorem

As the sample size increases the sampling
distribution of the sample mean approaches the
normal distribution with mean µ and variance
σ²/n
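
A small simulation can make the central limit theorem concrete. The sketch below is only an illustration: it draws samples from an exponential population (clearly not normal, with mean 1 and variance 1), and checks that the sample means cluster around the population mean with variance close to σ²/n. The seed, sample size and number of trials are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

n = 30         # size of each sample
trials = 5000  # number of samples drawn

# Each entry is the mean of one sample of size n from an exponential
# population with mean 1 and variance 1.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(trials)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.variance(sample_means)

print("mean of sample means:", round(mean_of_means, 3))
print("variance of sample means:", round(var_of_means, 4))
print("sigma^2 / n:", round(1.0 / n, 4))
```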

56
Example: Sampling Distribution
(Figure: sampling distribution with 7.8, 8 and
8.2 marked, and the standardized normal
distribution with µ = 0 and σ = 1; shaded areas
.1915 + .1915 = .3830)
57
Central Limit Theorem

Rule of thumb: the normal approximation for x̄ will
be good if n > 30. If n < 30, the approximation
is only good if the population from which you are
sampling is not too different from normal.
Otherwise, use the t-distribution

58
t-Distribution

So far, we have been assuming that we knew the
value of σ. This may be true if one has a large
amount of experience with a certain process.
However, it is often true that one is estimating
σ along with µ from the same set of data.

59
t-Distribution

To allow for such a situation, we will consider
the t statistic
t = (x̄ - µ) / (s / sqrt(n))
which follows a t-distribution.

s / sqrt(n) = standard error of the mean
60
t-Distribution
(Figure: t(n = ∞) = Z, t(n = 6), t(n = 3))
61
t-Distribution

If x̄ is the mean of a random sample of size n
taken from a normal population having the mean µ
and variance σ², and
S² = Σ(xi - x̄)² / (n - 1)
then
t = (x̄ - µ) / (S / sqrt(n))
is a random variable following the t-distribution
with parameter ν = n - 1, where ν is
degrees of freedom.

62
t-Distribution

The t-distribution has been tabulated.
tα represents the t-value that has an area of α
to the right of it.
Note, due to symmetry, t(1-α) = -tα

(Figure: t-distribution with t.95, t.80, t.20 and
t.05 marked)
63
Example t-Distribution

The resistivity of batches of electrolyte follows
a normal distribution. We sample 5 batches and
get the following readings: 1400, 1450, 1375,
1500, 1550.
Do these data support or refute a population
average of 1400?

64
Example t-Distribution
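(Figure: t-distribution with "Refute" regions
beyond t = 2.78, p = 0.025 in each tail, and a
"Support" region between them; observed t = 1.71)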
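
The t statistic for the electrolyte readings can be recomputed as a sketch. The critical value 2.78 is taken from the slide (a t-table value for 4 degrees of freedom) rather than computed, since the Python standard library has no t-distribution.

```python
from math import sqrt
import statistics

readings = [1400, 1450, 1375, 1500, 1550]
mu0 = 1400  # hypothesized population average

n = len(readings)
x_bar = statistics.mean(readings)
s = statistics.stdev(readings)  # sample standard deviation (n - 1)

# t statistic: distance from mu0 in units of the standard error.
t = (x_bar - mu0) / (s / sqrt(n))

t_crit = 2.78  # critical value quoted on the slide, 4 degrees of freedom
refute = abs(t) > t_crit

print("mean:", x_bar, "t:", round(t, 2), "refute:", refute)
```

Since |t| stays below the critical value, the readings do not refute a population average of 1400.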
65
Sampling Distribution of the Variance

The probability distribution of S² is called the
sampling distribution of the variance.
The distribution of S², for a given sample size,
n, describes the variability of sample variances
around the population variance σ².

66
Sampling Distribution S2

If S² is the variance of a random sample of size
n taken from a normal population having the
variance σ², then the statistic
χ² = (n - 1) S² / σ²
has a chi-squared distribution with ν = n - 1
degrees of freedom.

67
Chi-Squared Distribution
(Figure: χ²(n = 3), χ²(n = 6), χ²(n = 11))
68

Introduction to Hypothesis Testing

69
Nonstatistical Hypothesis Testing

A criminal trial is an example of hypothesis
testing without the statistics.
In a trial a jury must decide between two
hypotheses. The null hypothesis is
H0: The defendant is innocent
The alternative hypothesis or research hypothesis
is
H1: The defendant is guilty
The jury does not know which hypothesis is true.
They must make a decision on the basis of
evidence presented.

70
Nonstatistical Hypothesis Testing

In the language of statistics convicting the
defendant is called rejecting the null hypothesis
in favor of the alternative hypothesis. That is,
the jury is saying that there is enough evidence
to conclude that the defendant is guilty (i.e.,
there is enough evidence to support the
alternative hypothesis).
If the jury acquits it is stating that there is
not enough evidence to support the alternative
hypothesis. Notice that the jury is not saying
that the defendant is innocent, only that there
is not enough evidence to support the alternative
hypothesis. That is why we never say that we
accept the null hypothesis.

71
Nonstatistical Hypothesis Testing

There are two possible errors.
A Type I error occurs when we reject a true null
hypothesis. That is, a Type I error occurs when
the jury convicts an innocent person.
A Type II error occurs when we don't reject a
false null hypothesis. That occurs when a guilty
defendant is acquitted.

72
Nonstatistical Hypothesis Testing

The probability of a Type I error is denoted as
α.
The probability of a Type II error is β.
The two probabilities are inversely related.
Decreasing one increases the other.

73
Nonstatistical Hypothesis Testing

In the (US) system Type I errors are regarded as
more serious. We try to avoid convicting innocent
people. We are more willing to acquit guilty
people.
We arrange to make α small by requiring the
prosecution to prove its case and instructing the
jury to find the defendant guilty only if there
is evidence beyond a reasonable doubt.

74
Nonstatistical Hypothesis Testing

The critical concepts are these:
1. There are two hypotheses, the null and the
alternative hypotheses.
2. The procedure begins with the assumption that
the null hypothesis is true.
3. The goal is to determine whether there is
enough evidence to infer that the alternative
hypothesis is true.
4. There are two possible decisions:
Conclude that there is enough evidence to support
the alternative hypothesis.
Conclude that there is not enough evidence to
support the alternative hypothesis.

75
Nonstatistical Hypothesis Testing

5. Two possible errors can be made.
Type I error: Reject a true null hypothesis
Type II error: Do not reject a false null
hypothesis.
P(Type I error) = α
P(Type II error) = β

76
Introduction

Hypothesis testing is a procedure for making
inferences about a population.
Hypothesis testing allows us to determine whether
enough statistical evidence exists to conclude
that a belief (i.e. hypothesis) about a parameter
is supported by the data.

77
Concepts of Hypothesis Testing (1)

There are two hypotheses. One is called the null
hypothesis and the other the alternative or
research hypothesis. The usual notation is
H0: the null hypothesis
H1: the alternative or research hypothesis
The null hypothesis (H0) will always state that
the parameter equals the value specified in the
alternative hypothesis (H1)

78
Concepts of Hypothesis Testing

Consider an example: mean demand for computers
during assembly lead time. Rather than estimate
the mean demand, our operations manager wants to
know whether the mean is different from 350
units. We can rephrase this request into a test
of the hypothesis
H0: µ = 350
Thus, our research hypothesis becomes
H1: µ ≠ 350

This is what we are interested in determining
79
Concepts of Hypothesis Testing

The testing procedure begins with the assumption
that the null hypothesis is true.
Thus, until we have further statistical evidence,
we will assume
H0: µ = 350 (assumed to be TRUE)

80
Concepts of Hypothesis Testing

The goal of the process is to determine whether
there is enough evidence to infer that the
alternative hypothesis is true.
That is, is there sufficient statistical
information to determine if this statement,
H1: µ ≠ 350, is true?

This is what we are interested in determining
81
Concepts of Hypothesis Testing

There are two possible decisions that can be
made
Conclude that there is enough evidence to support
the alternative hypothesis
(also stated as rejecting the null hypothesis in
favor of the alternative)
Conclude that there is not enough evidence to
support the alternative hypothesis
(also stated as not rejecting the null
hypothesis in favor of the alternative)
NOTE: we do not say that we accept the null
hypothesis

82
Concepts of Hypothesis Testing

Once the null and alternative hypotheses are
stated, the next step is to randomly sample the
population and calculate a test statistic (in
this example, the sample mean).
If the test statistic's value is inconsistent
with the null hypothesis we reject the null
hypothesis and infer that the alternative
hypothesis is true.
For example, if we're trying to decide whether
the mean is not equal to 350, a large value of
x̄ (say, 600) would provide enough evidence. If
x̄ is close to 350 (say, 355) we could not say that
this provides a great deal of evidence to infer
that the population mean is different from 350.

83
Concepts of Hypothesis Testing

Two possible errors can be made in any test:
A Type I error occurs when we reject a true null
hypothesis and
A Type II error occurs when we don't reject a
false null hypothesis.
There are probabilities associated with each type
of error:
P(Type I error) = α
P(Type II error) = β
α is called the significance level.

84
Types of Errors

A Type I error occurs when we reject a true null
hypothesis (i.e. Reject H0 when it is TRUE)
A Type II error occurs when we don't reject a
false null hypothesis (i.e. Do NOT reject H0 when
it is FALSE)

85
Types of Errors

Back to our example, we would commit a Type I
error if we
Reject H0 when it is TRUE
We reject H0 (µ = 350) in favor of H1
(µ ≠ 350) when in fact the real value of µ is
350.
We would commit a Type II error in the case
where we
Do NOT reject H0 when it is FALSE
We believe H0 is correct (µ = 350), when in
fact the real value of µ is something other
than 350.

86
Recap

The null hypothesis must specify a single value
of the parameter (e.g. µ = ___)
Assume the null hypothesis is TRUE.
Sample from the population, and build a statistic
related to the parameter hypothesized (e.g. the
sample mean, x̄)
Compare the statistic with the value specified in
the first step

87
Example

A department store manager determines that a new
billing system will be cost-effective only if the
mean monthly account is more than 170.
A random sample of 400 monthly accounts is drawn,
for which the sample mean is 178. The accounts
are approximately normally distributed with a
standard deviation of 65.
Can we conclude that the new system will be
cost-effective?

88
Example

The system will be cost effective if the mean
account balance for all customers is greater than
170.
We express this belief as our research
hypothesis, that is
H1: µ > 170 (this is what we want to
determine)
Thus, our null hypothesis becomes
H0: µ = 170 (this specifies a single value
for the parameter of interest)

89
Example

What we want to show:
H1: µ > 170
H0: µ = 170 (we'll assume this is true)
We know:
n = 400
x̄ = 178
σ = 65
Hmm. What to do next?!

90
Example

To test our hypotheses, we can use two different
approaches
The rejection region approach (typically used
when computing statistics manually), and
The p-value approach (which is generally used
with a computer and statistical software).
We will explore both in turn

91
Example. Rejection Region

The rejection region is a range of values such
that if the test statistic falls into that range,
we decide to reject the null hypothesis in favor
of the alternative hypothesis.

x̄L is the critical value of x̄ to reject H0.
92
Example

It seems reasonable to reject the null hypothesis
in favor of the alternative if the value of the
sample mean is large relative to 170, that is if
x̄ > x̄L, where x̄L is the critical value of x̄.

α = P(x̄ > x̄L) is also P(rejecting H0
given that H0 is true) = P(Type I error)
93
Example

All that's left to do is calculate the critical
value x̄L and compare it to 178.

We can calculate this based on any level of
significance (α) we want
94
Example

At a 5% significance level (i.e. α = 0.05), we
get, for the critical value x̄L,
(x̄L - 170) / (65 / sqrt(400)) = z.05 = 1.645
Solving we compute x̄L = 175.34
Since our sample mean (178) is greater than the
critical value we calculated (175.34), we reject
the null hypothesis in favor of H1, i.e. that
µ > 170 and that it is cost effective
to install the new billing system
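
The critical value in the billing example can be recomputed as a sketch, assuming a right-tail z test with known standard deviation. `NormalDist().inv_cdf` returns the z value that leaves the given area to its left; the slide's 175.34 uses the table value 1.645, so the last digit may differ slightly.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 170      # mean under the null hypothesis
sigma = 65     # population standard deviation
n = 400        # sample size
alpha = 0.05   # significance level
x_bar = 178    # observed sample mean

# z value with area alpha to its right under the standard normal curve.
z_alpha = NormalDist().inv_cdf(1 - alpha)

# Critical value of the sample mean for a right-tail test.
x_crit = mu0 + z_alpha * sigma / sqrt(n)

reject_h0 = x_bar > x_crit

print(round(z_alpha, 3), round(x_crit, 2), reject_h0)
```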

95
Example The Big Picture
(Figure: sampling distribution under H0 with the
critical value 175.34 and the sample mean 178
marked. Reject H0 in favor of H1.)
96
Standardized Test Statistic

An easier method is to use the standardized test
statistic
z = (x̄ - µ) / (σ / sqrt(n))
and compare its result to zα (rejection
region z > zα)
Since z = 2.46 > 1.645 (z.05), we reject H0 in
favor of H1

97
p-Value

The p-value of a test is the probability of
observing a test statistic at least as extreme as
the one computed given that the null hypothesis
is true.
In the case of our department store example, what
is the probability of observing a sample mean at
least as extreme as the one already observed
(i.e. x̄ = 178), given that the null hypothesis
(H0: µ = 170) is true?

p-value = P(x̄ > 178) = P(Z > 2.46) = .0069
98
Interpreting the p-value

The smaller the p-value, the more statistical
evidence exists to support the alternative
hypothesis.
We observe a p-value of .0069, hence there is
evidence to support H1: µ > 170.
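
A short sketch of the p-value calculation for the department store example, again assuming a right-tail z test with known standard deviation (values from the example: n = 400, sample mean 178, standard deviation 65, null mean 170).

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar = 170, 65, 400, 178

# Standardized test statistic.
z = (x_bar - mu0) / (sigma / sqrt(n))

# Right-tail p-value: probability of a z at least this large under H0.
p_value = 1 - NormalDist().cdf(z)

print(round(z, 2), round(p_value, 4))
```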

99
Interpreting the p-value
p < .01: Overwhelming Evidence (Highly Significant)
.01 < p < .05: Strong Evidence (Significant)
.05 < p < .10: Weak Evidence (Not Significant)
p > .10: No Evidence (Not Significant)
Here p = .0069
100
Interpreting the p-value

Compare the p-value with the selected value of
the significance level α
If the p-value is less than α, we judge the
p-value to be small enough to reject the null
hypothesis.
If the p-value is greater than α, we do not
reject the null hypothesis.
Since p-value = .0069 < α = .05, we reject H0
in favor of H1

101
Another example

The objective of the study is to draw a
conclusion about the mean payment period. Thus,
the parameter to be tested is the population
mean. We want to know whether there is enough
statistical evidence to show that the population
mean is less than 22 days. Thus, the alternative
hypothesis is
H1: µ < 22
The null hypothesis is
H0: µ = 22

102
Another example

The test statistic is z = (x̄ - µ) / (σ / sqrt(n))
We wish to reject the null hypothesis in favor of
the alternative only if the sample mean and hence
the value of the test statistic is small enough.
As a result we locate the rejection region in the
left tail of the sampling distribution.
We set the significance level at 10%.

103
Another example

Rejection region
Assume
and
p-value = P(Z < -.91) = .5 - .3186 = .1814

Conclusion: There is not enough evidence to infer
that the mean is less than 22.
104
One- and Two-Tail Testing

The department store example was a one tail test,
because the rejection region is located in only
one tail of the sampling distribution
More correctly, this was an example of a right
tail test.

105
One- and Two-Tail Testing

The payment period example is a left tail test
because the rejection region was located in the
left tail of the sampling distribution.

106
Right-Tail Testing

Calculate the critical value of the mean (x̄L)
and compare against the observed value of the
sample mean (x̄)

107
Left-Tail Testing

Calculate the critical value of the mean (x̄L)
and compare against the observed value of the
sample mean (x̄)

108
Two-Tail Testing

Two-tail testing is used when we want to test a
research hypothesis that a parameter is not equal
(≠) to some value

109
Example

KPN argues that its rates are such that customers
won't see a difference in their phone bills
between them and their competitors. They
calculate the mean and standard deviation for all
their customers at 17.09 and 3.87
(respectively).
They then sample 100 customers at random and
recalculate a monthly phone bill based on
competitors' rates.
What we want to show is whether or not
H1: µ ≠ 17.09. We do this by assuming that
H0: µ = 17.09

110
Example

The rejection region is set up so we can reject
the null hypothesis when the test statistic is
large or when it is small.
That is, we set up a two-tail rejection region.
The total area in the rejection region must sum
to α, so we divide this probability by 2.

stat is small
stat is large
111
Example

At a 5% significance level (i.e. α = .05), we
have
α/2 = .025. Thus, z.025 = 1.96 and our
rejection region is
z < -1.96 -or- z > 1.96

(Figure: standard normal curve centered at 0 with
rejection regions beyond -z.025 and z.025)
112
Example

From the data, we calculate x̄ = 17.55
Using our standardized test statistic
z = (x̄ - µ) / (σ / sqrt(n))
We find that
z = (17.55 - 17.09) / (3.87 / sqrt(100)) = 1.19
Since z = 1.19 is not greater than 1.96, nor less
than -1.96, we cannot reject the null hypothesis
in favor of H1. That is, there is insufficient
evidence to infer that there is a difference
between the bills of KPN and the competitor.
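
The two-tail test for the KPN example can be sketched as follows, assuming a z test with known standard deviation. The two-sided p-value is added for illustration; it does not appear on the slide.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 17.09, 3.87, 100
x_bar = 17.55
alpha = 0.05

# Standardized test statistic.
z = (x_bar - mu0) / (sigma / sqrt(n))

# Two-tail test: alpha is split over both tails.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = z < -z_crit or z > z_crit

# Two-sided p-value: area in both tails beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 2), round(z_crit, 2), reject_h0, round(p_value, 3))
```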

113
Summary of One- and Two-Tail Tests
114
Probability of a Type II Error

It is important that we understand the
relationship between Type I and Type II errors,
that is, how the probability of a Type II error
is calculated and its interpretation.
Recall the previous example:
H0: µ = 170
H1: µ > 170
At a significance level of 5% we rejected H0 in
favor of H1 since our sample mean (178) was
greater than the critical value of x̄ (175.34)

115
Probability of a Type II Error

A Type II error occurs when a false null
hypothesis is not rejected.
In our example this means that if x̄ is less
than 175.34 (our critical value) we will not
reject our null hypothesis, which means that we
will not install the new billing system.
Thus, we can see that
β = P(x̄ < 175.34 given that the null
hypothesis is false)

116
Example

β = P(x̄ < 175.34 given that the null
hypothesis is false)
The condition only tells us that the mean µ ≠ 170.
We need to compute β for some new value of
µ. For example, suppose the mean account
balance needs to be 180 in order to cost justify
the new billing system
β = P(x̄ < 175.34, given that µ = 180),
thus
β = P(Z < (175.34 - 180) / (65 / sqrt(400))) = P(Z < -1.43)
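
The Type II error probability for this example can be evaluated numerically. The sketch assumes the true mean is 180, as in the example, and that the sampling distribution of the mean is normal with standard error 65 / sqrt(400).

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 65, 400
x_crit = 175.34  # critical value from the 5% right-tail test
mu_true = 180    # assumed true mean under the alternative

standard_error = sigma / sqrt(n)

# beta = P(sample mean < critical value | mu = 180)
beta = NormalDist(mu_true, standard_error).cdf(x_crit)

print(round(beta, 4))
```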

117
Example
Our original hypothesis
our new assumption
118
Effects on β of Changing α

Decreasing the significance level α
increases the value of β and vice versa.
Consider this diagram again. Shifting the
critical value line to the right (to decrease
α) will mean a larger area under the lower curve
for β (and vice versa)

119
Judging the Test

A statistical test of hypothesis is effectively
defined by the significance level (α) and
the sample size (n), both of which are selected
by the statistics practitioner.
Therefore, if the probability of a Type II error
(β) is judged to be too large, we can reduce
it by
increasing α, and/or
increasing the sample size, n.

120
Judging the Test

For example, suppose we increased n from a sample
size of 400 account balances to 1,000.
The probability of a Type II error (β) goes
to a negligible level while α remains at 5%

121
Judging the Test

The power of a test is defined as 1 - β.
It represents the probability of rejecting the
null hypothesis when it is false.
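
To see how sample size affects the power of a test, the billing example can be wrapped in a small function. This is a sketch under the example's assumptions (right-tail z test, null mean 170, true mean 180, standard deviation 65, 5% significance level); `type_ii_error` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(n, mu0=170, mu_true=180, sigma=65, alpha=0.05):
    """P(not rejecting H0) for a right-tail z test when the mean is mu_true."""
    standard_error = sigma / sqrt(n)
    # Critical value of the sample mean at significance level alpha.
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    # Probability that the sample mean falls below the critical value.
    return NormalDist(mu_true, standard_error).cdf(x_crit)


beta_400 = type_ii_error(400)
beta_1000 = type_ii_error(1000)

power_400 = 1 - beta_400
power_1000 = 1 - beta_1000

print(round(beta_400, 4), round(power_400, 4))
print(round(beta_1000, 4), round(power_1000, 4))
```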

122
Error Rates and Power (H0 and H1: null and
alternative hypotheses)
123
Factors Affecting Power

Increasing overall sample size increases power
Having unequal group sizes usually reduces power
Larger size of effect being tested increases
power
Setting lower significance level decreases power
Violations of assumptions underlying test often
decrease power substantially

124
Exercises

Exercises: see Word document

125
The t-test
126
Recall t distribution.

Take a random sample of size n from a N(µ, σ²)
population.
Z = (x̄ - µ) / (σ / sqrt(n)) has a standard normal
distribution.
Consider t = (x̄ - µ) / (S / sqrt(n)).
This is approximately normal if n is large.
If n is small, S is not expected to be close to
σ. S introduces additional variability. Thus
this statistic will be more variable than a
standard normal random variable.
This statistic follows a t distribution with
n - 1 degrees of freedom.

127
The t distribution.
red: t with 1 d.f.; green: t with 5
d.f.; yellow: t with 10 d.f.; blue: standard
normal
The t distribution is similar in shape to the
normal distribution, but is more spread out. As
the degrees of freedom go to infinity the t
distribution approaches the standard normal
distribution.
128
Confidence Intervals.

Suppose that the population is normally
distributed with mean µ and variance σ². Then
If σ is known, a 100(1-α)% confidence interval
for µ is x̄ ± z(α/2) σ / sqrt(n).
If σ is not known, a 100(1-α)% confidence
interval for µ is x̄ ± t(α/2, n-1) S / sqrt(n).

129
Overview of the t-test

The t-test is used to help make decisions about
population values.
There are two main forms of the t-test, one for a
single sample and one for two samples.
The one sample t-test is used to test whether a
population has a specific mean value
The two sample t-test is used to test whether
population means are equal, e.g., do training and
control groups have the same mean.

130
One-sample t-test

We can use a confidence interval to test or
decide whether a population mean has a given
value.
For example, suppose we want to test whether the
mean height of women at USF is less than 68
inches.
We randomly sample 50 women students at USF.
We find that their mean height is 63.05 inches.
The SD of height in the sample is 5.75 inches.
Then we find the standard error of the mean by
dividing SD by sqrt(N): 5.75/sqrt(50) = .81.
The critical value of t with (50-1) df is
2.01 (find this in a t-table).
Our confidence interval is, therefore, 63.05
plus/minus 1.63.
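
The confidence interval from this slide can be rebuilt step by step. The sketch uses the numbers given above; the critical value 2.01 is the t-table value quoted on the slide, not something computed here.

```python
from math import sqrt

n = 50          # sample size
x_bar = 63.05   # sample mean height in inches
sd = 5.75       # sample standard deviation
t_crit = 2.01   # critical t for 49 d.f., taken from the slide
mu0 = 68        # hypothesized population mean

standard_error = sd / sqrt(n)
margin = t_crit * standard_error

lower, upper = x_bar - margin, x_bar + margin
contains_mu0 = lower <= mu0 <= upper

# Distance of the sample mean from mu0 in standard errors.
t = (x_bar - mu0) / standard_error

print(round(standard_error, 2), round(margin, 2))
print(round(lower, 2), round(upper, 2), contains_mu0, round(t, 2))
```

The interval does not contain 68, and the sample mean lies roughly six standard errors below it.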

131
One-sample t-test example
Take a sample, set a confidence interval around
the sample mean. Does the interval contain the
hypothesized value?
132
One-sample t-test Example
The sample mean is roughly six standard
deviations (St. Errors) from the hypothesized
population mean. If the population mean is
really 68 inches, it is very, very unlikely that
we would find a sample with a mean as small as
63.05 inches.
133
Two-sample t-test

Used when we have two groups, e.g.,
Experimental vs. control group
Males vs. females
New training vs. old training method
Tests whether group population means are the
same.
The hypothesis can be that the means are simply
the same or different (nondirectional),
or it can predict that one group is higher
(directional).

134
Sampling Distribution of Mean Differences

Suppose we sample 2 groups of size 50 at random
from USF.
We measure the height of each person and find the
mean for each group.
Then we subtract the mean for group 1 from the
mean for group 2. Suppose we do this over and
over.
We will then have a sampling distribution of mean
differences.
If the two groups are sampled at random from 1
population, the mean of the differences in the
long run will be zero because the mean for both
groups will be the same.
The standard deviation of the sampling
distribution will be
σdiff = sqrt(σx̄1² + σx̄2²)

The standard error of the difference is the root
of the sum of squared standard errors of the
mean.
135
Example of the Standard Error of the Difference
in Means
Suppose that at USF the mean height is 68 inches
and the standard deviation of height is 6 inches.
Suppose we sampled people 100 at a time into two
groups. We would expect that the average mean
difference would be zero. What would the
standard deviation of the distribution of
differences be?
The standard error for each group mean is
6/sqrt(100) = 0.6; for the difference in means, it
is sqrt(0.6^2 + 0.6^2) = 0.85.
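The idea of repeating the sampling "over and over" can be illustrated with a small simulation. This is a sketch under stated assumptions: heights are assumed to be normally distributed (the slides do not say so), and the 2000 repetitions and the seed are arbitrary choices.

```python
import math
import random
import statistics

population_mean = 68.0  # inches, from the slide
population_sd = 6.0     # inches, from the slide
group_size = 100        # people per group, from the slide

# Analytic standard errors.
se_mean = population_sd / math.sqrt(group_size)
se_difference = math.sqrt(se_mean**2 + se_mean**2)

# Simulation: draw two groups, record the difference in means, repeat.
# Normal heights, 2000 repetitions and the seed are assumptions.
rng = random.Random(0)

def group_mean():
    sample = [rng.gauss(population_mean, population_sd) for _ in range(group_size)]
    return statistics.fmean(sample)

differences = [group_mean() - group_mean() for _ in range(2000)]
simulated_mean = statistics.fmean(differences)
simulated_sd = statistics.stdev(differences)

print(f"analytic:  SE of mean = {se_mean:.2f}, SE of difference = {se_difference:.2f}")
print(f"simulated: mean difference = {simulated_mean:.2f}, SD = {simulated_sd:.2f}")
```

The simulated mean difference should land near zero and the simulated standard deviation near the analytic value, up to sampling noise.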
136
Estimating the Standard Error of Mean Differences
The USF scenario we just worked was based on
population information. That is
$\sigma_{diff} = \sqrt{\sigma_1^2/N_1 + \sigma_2^2/N_2}$
We generally don't have population values. We
usually estimate population values with sample
data, thus
$s_{diff} = \sqrt{s_1^2/N_1 + s_2^2/N_2}$
All this says is that we replace the population
variance of error with the appropriate sample
estimators.
137
Pooled Standard Error
We can use this formula when the sample sizes for
the two groups are equal.
When the sample sizes are not equal across
groups, we find the pooled standard error. The
pooled standard error is a weighted average,
where the weights are the groups' degrees of
freedom.
138
Back to the Two-Sample t
The formula for the two-sample t-test for
independent samples looks like this
$t = (\bar{X}_1 - \bar{X}_2) / s_{diff}$
This says we find the value of t by taking the
difference in the two sample means and dividing
by the standard error of the difference in means.
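A minimal Python sketch of the pooled standard error and the resulting t value follows. The function names and the two score lists are hypothetical, invented only for illustration; they are not the empathy data from the slides.

```python
import math
import statistics

def pooled_standard_error(group1, group2):
    """Standard error of the difference in means, pooling the two variances."""
    n1, n2 = len(group1), len(group2)
    df1, df2 = n1 - 1, n2 - 1
    # Pooled variance: sample variances averaged with the df as weights.
    pooled_var = (
        df1 * statistics.variance(group1) + df2 * statistics.variance(group2)
    ) / (df1 + df2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def two_sample_t(group1, group2):
    """t = difference in sample means / standard error of the difference."""
    diff = statistics.fmean(group1) - statistics.fmean(group2)
    return diff / pooled_standard_error(group1, group2)

# Hypothetical scores with unequal group sizes (illustration only).
group_a = [12, 15, 11, 14, 13, 16]
group_b = [10, 9, 12, 11]

t = two_sample_t(group_a, group_b)
df = len(group_a) + len(group_b) - 2
print(f"t = {t:.2f} with {df} df")
```

With equal group sizes the pooled value reduces to the simpler square root of the sum of squared standard errors, so one function covers both cases.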

139
Example of the two-sample t, Empathy by College
Major
Suppose we have a professionally developed test
of empathy. The test has people view film
clips and guess what people in the clips are
feeling. Scores come from comparing what
people guess to what the people in the films said
they felt at the time. We want to know
whether Psychology majors and Physics majors
differ in their average scores on this test.
We predict no direction; we just want to know if
there is a difference. So we find some (N = 15) of
each major and give each the test.
140
Empathy Scores
141
Empathy
142
Exercise

Exercises t-test, see word document

143
Chi-square
144

Background
1. Suppose there are n observations.
2. Each observation falls into a cell (or class).
3. Observed frequencies in each cell: O1, O2, O3,
. . . , Ok.
Sum of the observed frequencies is n.
4. Expected, or theoretical, frequencies: E1, E2,
E3, . . . , Ek.

145

Goal
1. Compare the observed frequencies with the
expected frequencies.
2. Decide whether the observed frequencies seem
to agree or seem to disagree with the expected
frequencies.
Methodology
Use a chi-square statistic
$\chi^2 = \sum_{i=1}^{k} (O_i - E_i)^2 / E_i$
Small values of χ²: observed frequencies close to
expected frequencies.
Large values of χ²: observed frequencies do not
agree with expected frequencies.
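The statistic can be computed directly from its definition, the sum over cells of (O - E)^2 / E. The sketch below uses hypothetical counts invented for illustration; they are not data from the slides.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts (illustration only): n = 100 observations in k = 5 cells.
observed = [18, 22, 20, 25, 15]
expected = [20, 20, 20, 20, 20]

statistic = chi_square_statistic(observed, expected)
print(f"chi-square = {statistic:.2f}")

# Perfect agreement between observed and expected gives a statistic of zero.
print(chi_square_statistic(expected, expected))
```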

146

Sampling Distribution of χ²
When n is large and all expected frequencies are
greater than or equal to 5, then χ² has
approximately a chi-square distribution.
Recall
Properties of the Chi-Square Distribution
1. χ² is nonnegative in value: it is zero or
positively valued.
2. χ² is not symmetrical: it is skewed to the
right.
3. χ² is distributed so as to form a family of
distributions, a separate distribution for each
different number of degrees of freedom.

147

Critical values for chi-square
1. See Table.
2. Identified by degrees of freedom (df) and the
area under the curve to the right of the critical
value.
3. χ²(df, α): critical value of a chi-square
distribution with df degrees of freedom and α
area to the right.
4. The chi-square distribution is not symmetrical:
critical values associated with right and left
tails are given separately.

148

Example: Find χ²(16, 0.05).

Portion of Table
χ²(16, 0.05) = 26.3
149

Testing Procedure
1. H0: The probabilities p1, p2, . . . , pk are
correct.
Ha: At least two probabilities are incorrect.
2. Test statistic: $\chi^2 = \sum_{i=1}^{k} (O_i - E_i)^2 / E_i$
3. Use a one-tailed critical region: the
right-hand tail.
4. Degrees of freedom: df = k - 1.
5. Expected frequencies: $E_i = n p_i$
6. To ensure a good approximation to the
chi-square distribution, each expected frequency
should be at least 5.

150

Example: A market research firm conducted a
consumer-preference experiment to determine which
of 5 new breakfast cereals was the most appealing
to adults. A sample of 100 consumers tried each
cereal and indicated the cereal he or she
preferred. The results are given in the
following table
Is there any evidence to suggest the consumers
had a preference for one cereal, or did they
indicate each cereal was equally likely to be
selected? Use α = 0.05.

151

Solution
If no preference was shown, we expect the 100
consumers to be equally distributed among the 5
cereals. Thus, if no preference is given, we
expect (100)(0.2) = 20 consumers in each class.
1. The Set-up
a. Population parameter of concern: preference
for each cereal, the probability that a
particular cereal is selected.
b. The null and alternative hypotheses:
H0: There was no preference shown (equally
distributed).
Ha: There was a preference shown (not equally
distributed).
2. The Hypothesis Test Criteria
a. Assumptions: The 100 consumers represent a
random sample.
b. Test statistic: χ² with df = k - 1 = 5 - 1
= 4.
c. Level of significance: α = 0.05.

152

3. The Sample Evidence
a. Sample information: table given in the
statement of the problem.
b. Calculate the value of the test statistic:
χ² = 3.2

153

4. The Probability Distribution (Classical
Approach)
a. Critical value: χ²(k - 1, 0.05) = χ²(4, 0.05)
= 9.49
b. χ² is not in the critical region (3.2 < 9.49).
4. The Probability Distribution (p-Value
Approach)
a. The p-value:
Using computer: P = 0.5249.
b. The p-value is larger than the level of
significance, α.
5. The Results
a. Decision: Fail to reject H0.
b. Conclusion: At the 0.05 level of
significance, there is no evidence to suggest
the consumers showed a preference for any one
cereal.
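The p-value for this example can be reproduced without a statistics package, because the right-tail area of a chi-square distribution has a simple closed form when df is even. The sketch below relies on that shortcut (an assumption that fits df = 4 but not odd df); the function name is illustrative.

```python
import math

def chi_square_p_value_even_df(statistic, df):
    """Right-tail area of a chi-square distribution with an even df.

    For df = 2m the tail area equals
    exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only covers positive even df")
    half = statistic / 2
    terms = sum(half**j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * terms

# Test statistic and df from the cereal example.
p_value = chi_square_p_value_even_df(3.2, 4)
print(f"p = {p_value:.4f}")
print("reject H0 at the 0.05 level:", p_value < 0.05)
```

For odd df, or as a cross-check, a library routine for the chi-square survival function would be the usual choice.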

154

r × c Contingency Table
r = number of rows, c = number of columns.
Used to test the independence of the row factor
and the column factor.
Degrees of freedom: df = (r - 1)(c - 1)
n = grand total.
5. Expected frequency in the ith row and the jth
column: $E_{i,j} = R_i C_j / n$
Each Ei,j should be at least 5.
6. R1, R2, . . . , Rr and C1, C2, . . . , Cc:
marginal totals.
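A short sketch of the contingency-table calculation: each expected count is the row total times the column total divided by the grand total, and df = (r - 1)(c - 1). The 2 x 3 table below is hypothetical, invented for illustration; it is not the tax-reform table from the slides.

```python
def expected_frequencies(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def contingency_chi_square(table):
    """Chi-square statistic and degrees of freedom for an r x c table."""
    expected = expected_frequencies(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical 2 x 3 table of counts (illustration only).
observed = [
    [20, 30, 50],
    [30, 30, 40],
]
expected = expected_frequencies(observed)
statistic, df = contingency_chi_square(observed)
print("expected:", expected)
print(f"chi-square = {statistic:.2f}, df = {df}")
```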

155

Contingency table showing sample results and
expected values

156

4. The Probability Distribution (Classical
Approach)
a. Critical value: χ²(4, 0.01) = 13.3
b. χ² is in the critical region.
4. The Probability Distribution (p-Value
Approach)
a. The p-value:
By computer: P = 0.0068.
b. The p-value is smaller than the level of
significance, α.
5. The Results
a. Decision: Reject H0.
b. Conclusion: There is evidence to suggest that
opinion on tax reform and political party are
not independent.

157
ANOVA

Analysis of Variance

158
From t to F

In the independent samples t test, you learned
how to use the t distribution to test the
hypothesis of no difference between two
population means.
Suppose, however, that we wish to know about the
relative effect of three or more different
treatments?

159
From t to F

We could use the t test to make comparisons among
each possible combination of two means.
However, this method is inadequate in several
ways.
It is tedious to compare all possible
combinations of groups.
Any statistic that is based on only part of the
evidence (as is the case when any two groups are
compared) is less stable than one based on all of
the evidence.
There are so many comparisons that some will be
significant by chance.

160
From t to F

What we need is some kind of survey test that
will tell us whether there is any significant
difference anywhere in an array of categories.
If it tells us no, there will be no point in
searching further.
Such an overall test of significance is the F
test, or the analysis of variance, or ANOVA.

161
The logic of ANOVA

Hypothesis testing in ANOVA is about whether the
means of the samples differ more than you would
expect if the null hypothesis were true.
This question about means is answered by
analyzing variances.
Among other reasons, you focus on variances
because when you want to know how several means
differ, you are asking about the variances among
those means.

162
Two Sources of Variability

In ANOVA, an estimate of variability between
groups is compared with variability within
groups.
Between-group variation is the variation among
the means of the different treatment conditions
due to chance (random sampling error) and
treatment effects, if any exist.
Within-group variation is the variation due to
chance (random sampling error) among individuals
given the same treatment.

163
Variability Between Groups

There is a lot of variability from one mean to
the next.
Large differences between means probably are not
due to chance.
It is difficult to imagine that all six groups
are random samples taken from the same
population.
The null hypothesis is rejected, indicating a
treatment effect in at least one of the groups.

164
Variability Within Groups

Same amount of variability between group means.
However, there is more variability within each
group.
The larger the variability within each group, the
less confident we can be that we are dealing with
samples drawn from different populations.

165
The F Ratio
166
Two Sources of Variability
167
Two Sources of Variability
168
The F Ratio
$F = MS_{between} / MS_{within}$
(mean squares between divided by mean squares within)
169
The F Ratio
$MS_{between} = SS_{between} / df_{between}$
(sum of squares between divided by degrees of freedom between)
$MS_{within} = SS_{within} / df_{within}$
(sum of squares within divided by degrees of freedom within)
170
The F Ratio
$SS_{total} = SS_{between} + SS_{within}$ (sum of squares total)
$df_{total} = df_{between} + df_{within}$ (degrees of freedom total)
171
The F Ratio SS Between
$SS_{between} = \sum_g T_g^2 / n_g - G^2 / N$
$T_g^2 / n_g$: find each group total, square it, and
divide by the number of subjects in the group.
$G^2$: grand total (add all of the scores together,
then square the total).
$N$: total number of subjects.
172
The F Ratio SS Within
$SS_{within} = \sum X^2 - \sum_g T_g^2 / n_g$
$\sum X^2$: square each individual score and then add
up all of the squared scores.
$T_g^2$: squared group total.
$n_g$: number of subjects in each group.
173
The F Ratio SS Total
$SS_{total} = \sum X^2 - G^2 / N$
$G^2$: grand total (add all of the scores together,
then square the total).
$\sum X^2$: square each score, then add all of the
squared scores together.
$N$: total number of subjects.
174
An Example ANOVA

A study compared the intensity of pain among
three groups of treatment.
Determine the significance of the difference
among groups, using the .05 level of
significance.
Treatment 1   Treatment 2   Treatment 3
7             12            8
6             8             10
5             9             12
6             11            10

175
An Example ANOVA

State the research hypothesis.
Do ratings of the intensity of pain differ for
the three treatments?
State the statistical hypothesis.

176
Nondirectional Test

In testing the hypothesis of no difference
between two means, a distinction was made between
directional and nondirectional alternative
hypotheses.
Such a distinction no longer makes sense when the
number of means exceeds two.
A directional test is possible only in situations
where there are only two ways (directions) that
the null hypothesis could be false.
H0 may be false in any number of ways.
Two or more group means may be alike and the
remainder differ, all may be different, and so on.

177
Degrees of Freedom

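Between: $df_{between} = k - 1$ (k = number of groups)
Within: $df_{within} = N - k$ (N = total number of subjects)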

178
An Example ANOVA

Set decision rule.

179
An Example ANOVA

Set the decision rule.

180
An Example ANOVA

Calculate the test statistic.

Grand Total = 104
181
An Example ANOVA

Calculate the test statistic.

Grand Total = 104
182
An Example ANOVA
183
An Example ANOVA

Determine if your result is significant.
Reject H0, 9.60 > 4.26
Interpret your results.
There is a significant difference between the
treatments.
ANOVA Summary Table
In the literature, the ANOVA results are often
summarized in a table.

Source          df   SS      MS      F
Between Groups  2    42.67   21.33   9.60
Within Groups   9    20.00   2.22
Total           11   62.67
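The whole calculation for the pain example can be sketched in Python using the computational sums-of-squares formulas. The data are the twelve ratings from the example; the critical value 4.26 is the one quoted on the slide, and the variable names are illustrative.

```python
# Pain ratings from the example, one list per treatment.
groups = [
    [7, 6, 5, 6],      # Treatment 1
    [12, 8, 9, 11],    # Treatment 2
    [8, 10, 12, 10],   # Treatment 3
]

all_scores = [x for g in groups for x in g]
n_total = len(all_scores)
grand_total = sum(all_scores)

# Building blocks of the computational formulas.
sum_of_squared_scores = sum(x**2 for x in all_scores)
group_term = sum(sum(g) ** 2 / len(g) for g in groups)
correction = grand_total**2 / n_total

ss_between = group_term - correction
ss_within = sum_of_squared_scores - group_term
ss_total = sum_of_squared_scores - correction

df_between = len(groups) - 1
df_within = n_total - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

f_critical = 4.26  # F(2, 9) at the .05 level, as quoted on the slide
print(f"grand total = {grand_total}")
print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}, SS total = {ss_total:.2f}")
print(f"F = {f_ratio:.2f}, reject H0: {f_ratio > f_critical}")
```

Run as written, this gives a grand total of 104 and an F ratio of about 9.6, which exceeds the critical value.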
184
After the F Test

When an F turns out to be significant, we know,
with some degree of confidence, that there is a
real difference somewhere among our means.
But if there are more than two groups, we don't
know where that difference is.
Post hoc tests have been designed for doing
pair-wise comparisons after a significant F is
obtained.

185
Exercise 6 ANOVA

A psychologist interested in artistic preference
randomly assigns a group of 15 subjects to one of
three conditions in which they view a series of
unfamiliar abstract paintings.
The 5 participants in the famous condition are
led to believe that these are each famous
paintings.
The 5 participants in the critically acclaimed
condition are led to believe that these are
paintings that are not famous but are highly
thought of by a group of professional art
critics.
The 5 in the control condition are given no
special information about the paintings.
Does what people are told about paintings make a
difference in how well they are liked? Use the
.01 level of significance.

186
Linear and non-linear models
187
Review linear regression

Simplest form
Fit a straight line through
data points (xi, yi), i = 1, ..., n, n > 2
y = ax + b
x: predictor
y: predicted value (outcome)
a: slope
b: y-axis intercept
Goal: determine the parameters a and b

188
Review linear regression
Find values for a and b such that the sum of squared
errors is minimized
$R = \sum_{i=1}^{n} (y_i - (a x_i + b))^2$
189
Review linear regression
Predicted values: y = ax + b. Measurements: yi. Minimize the sum of squared differences between them (R).
A minimum of a function (R) is characterized by a
zero first derivative with respect to the
parameters
190
Intermezzo minimum of function
191
Review linear regression
A minimum of a function (R) is characterized by a
zero first derivative with respect to the
parameters → this provides the parameter values
for the model function
192
Review linear regression
a
Explicit expressions for parameters a and b!!
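The explicit expressions themselves did not survive in this transcript. The sketch below uses the standard least-squares solution obtained by setting both first derivatives of the sum of squared errors to zero; the function name and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y = a*x + b."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Solving dR/da = 0 and dR/db = 0 for a and b gives:
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x**2)
    b = (sum_y - a * sum_x) / n
    return a, b

# Hypothetical data points (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
print(f"slope a = {a:.2f}, intercept b = {b:.2f}")
```

The denominator is zero when all x values are identical, so a real implementation would need to guard against that case.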
193
Linear and nonlinear models 1

(non)linear in the parameters (α, β, γ)
Examples of linear models
y = α + βx (linear)
y = α + βx + γx² (polynomial)
y = α + β log(x) (log)

194
Example
y varies linearly with α for fixed x
195
Example
y varies linearly with α for fixed x
196
Linear and nonlinear models 2

y = β0 + β1x1 + β2x2 + ε
- linear model (in parameters)
- y is a linear combination of the x's

- y is not a linear combination of the x's
- linear in the parameters
- We can use MLR if the variables are transformed:
x1' = 1/x1, with x2' the transformed x2, giving
y = β0 + β1x1' + β2x2' + ε
197
Linear and nonlinear models 3

Models like
cannot be linearized and must be solved with
nonlinear regression techniques

198
Linear and nonlinear models 4

Nonlinear model
At least one of the derivatives of the function
with respect to the parameters depends on at least
one of the parameters (thus, the slope of the line
at fixed x is not constant)

y = β log(x): dy/dβ = log(x). Linear model
y = β0 + β1x1 + β2x2: dy/dβ1 = x1. Linear model
Nonlinear model
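The derivative criterion can be checked numerically. In the sketch below the first model, y = β log(x), is taken from the slide; the second, y = exp(βx), is an assumed nonlinear example chosen for illustration, since the slide's own nonlinear formula is not present in this transcript.

```python
import math

def d_dbeta(model, x, beta, h=1e-6):
    """Central-difference estimate of d(model)/d(beta) at fixed x."""
    return (model(x, beta + h) - model(x, beta - h)) / (2 * h)

def log_model(x, beta):
    return beta * math.log(x)   # from the slide: y = beta * log(x)

def exp_model(x, beta):
    return math.exp(beta * x)   # assumed nonlinear example, not from the slide

x = 2.0
for name, model in (("beta*log(x)", log_model), ("exp(beta*x)", exp_model)):
    at_1 = d_dbeta(model, x, beta=1.0)
    at_3 = d_dbeta(model, x, beta=3.0)
    print(f"{name}: dy/dbeta = {at_1:.3f} at beta=1, {at_3:.3f} at beta=3")
```

For the linear model the derivative is the same at both parameter values; for the assumed nonlinear model it changes with β.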
199
Significance testing and multiple testing
correction
200
Multiple testing

Say that you perform a statistical test with a
0.05 threshold, but you repeat the test on twenty
different observations.
Assume that all of the observations are
explainable by the null hypothesis.
What is the chance that at least one of the
observations will receive a p-value less than
0.05?

201
Multiple testing

Say that you perform a statistical test with a
0.05 threshold, but you repeat the test on twenty
different observations. Assuming that all of the
observations are explainable by the null
hypothesis, what is the chance that at least one
of the observations will receive a p-value less
than 0.05?
Pr(making a mistake) = 0.05
Pr(not making a mistake) = 0.95
Pr(not making any mistake) = 0.95^20 = 0.358
Pr(making at least one mistake) = 1 - 0.358
= 0.642
There is a 64.2% chance of making at least one
mistake.
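The same calculation for any number of tests, as a small sketch. It assumes the tests are independent and that the null hypothesis holds for every one of them; the function name is illustrative.

```python
def family_wise_error_rate(alpha, n_tests):
    """Chance of at least one false positive among independent true-null tests."""
    return 1 - (1 - alpha) ** n_tests

# How the overall error rate grows with the number of tests at alpha = 0.05.
for n_tests in (1, 5, 20, 100):
    rate = family_wise_error_rate(0.05, n_tests)
    print(f"{n_tests:3d} tests: {rate:.3f}")
```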

202
Percentage sugar in candy (process 1)
Percentage sugar in candy (process 2)
no difference
statistical test (alpha = 0.05)
100 candy bars
100 candy bars
5% chance of finding a difference (e.g. p = 0.003)
Suppose the company is required to do an
expensive tuning of process 2 if a difference is
found. They are willing to accept a Type 1 error
of 5%. Thus there is only a 5% chance of making the
wrong decision.
203
Percentage sugar in candy (process 1)
Percentage sugar in candy (process 2)
no difference
Day 1
statistical test (alpha = 0.05)
Day 2
statistical test (alpha = 0.05)
A 64.2% chance of finding at least
one significant difference. Overall Type 1 error:
64.2%
Day 20
statistical test (alpha = 0.05)
204
Bonferroni correction

Assume that individual tests are independent.
Divide the desired p-value threshold by the
number of tests performed.
For the previous example, 0.05 / 20 = 0.0025.
Pr(making a mistake) = 0.0025
Pr(not making a mistake) = 0.9975
Pr(not making any mistake) = 0.9975^20 = 0.9512
Pr(making at least one mistake) = 1 - 0.9512
= 0.0488
meaning that the probability that at least one of
the tests is wrongly declared significant is of
magnitude alpha (0.0488).
This is also known as correcting for the family-wise
error (FWE). It is clear, though, that this
greatly increases the beta error (false negatives):
many tests that should show an effect fail to reach
the corrected threshold.
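A minimal sketch of the correction in Python. The twenty p-values are hypothetical, invented for illustration, and the function names are not from any library.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold: the desired overall alpha divided by the number of tests."""
    return alpha / n_tests

def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value that falls below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]

threshold = bonferroni_threshold(0.05, 20)
# Overall error rate after correction, assuming independent tests.
corrected_fwer = 1 - (1 - threshold) ** 20
print(f"threshold = {threshold:.4f}, family-wise error = {corrected_fwer:.4f}")

# Hypothetical p-values from 20 tests (illustration only).
p_values = [0.001, 0.004, 0.03, 0.2] + [0.5] * 16
flags = bonferroni_significant(p_values)
print("significant after correction:", sum(flags), "of", len(p_values))
```

In this made-up set, 0.004 and 0.03 would pass an uncorrected 0.05 threshold but not the corrected one, which illustrates the loss of power described above.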