Chapter 8 Fundamental Sampling Distributions and Data Distributions

- Wen-Hsiang Lu
- Department of Computer Science and Information Engineering
- National Cheng Kung University
- 2007/05/24

8.1 Random Sampling

- Outcome of a statistical experiment
- Numerical value: the total value of a pair of dice tossed
- Descriptive representation: blood types in a blood test
- Sampling from distributions or populations
- Sample mean and sample variance
- The use of high-speed computers enhances the use of formal statistical inference with graphical techniques.

Random Sampling

- Definition 8.1: A population consists of the totality of the observations with which we are concerned.
- Finite size: 600 students classified according to blood type form a population of size 600.
- Infinite size: measurements of atmospheric pressure. Some finite populations are so large that in theory they are treated as infinite.
- Each observation in a population is a value of a random variable $X$ having some probability distribution $f(x)$.
- If one is inspecting items coming off an assembly line for defects, then each observation in the population might be a value 0 or 1 of the binomial random variable $X$ with probability distribution $b(x; 1, p) = p^x q^{1-x}$, $x = 0, 1$, where 0 indicates a nondefective item and 1 indicates a defective item.

Random Sampling

- Sometimes it is impossible or impractical to observe the entire set of observations that make up the population.
- Definition 8.2: A sample is a subset of a population.
- If inferences from the sample to the population are to be valid, we must obtain representative samples.
- Bias: erroneous inferences result from selecting the most convenient members of the population.
- Random sample: the observations are made independently and at random.

Random Sampling

- Definition 8.3: Let $X_1, X_2, \ldots, X_n$ be $n$ independent random variables, each having the same probability distribution $f(x)$. We then define $X_1, X_2, \ldots, X_n$ to be a random sample of size $n$ from the population $f(x)$ and write its joint probability distribution as $f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$.
- If we assume the population of battery lives to be normal, the possible values of any $X_i$, $i = 1, 2, \ldots, 8$, in a random sample of size 8 will be precisely the same as those in the original population, and hence $X_i$ has the same normal distribution as $X$.

8.2 Some Important Statistics

- Definition 8.4: Any function of the random variables constituting a random sample is called a statistic.
- Definition 8.5: If $X_1, X_2, \ldots, X_n$ represent a random sample of size $n$, then the sample mean is defined by the statistic $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$.
- Definition 8.6: If $X_1, X_2, \ldots, X_n$ represent a random sample of size $n$, then the sample variance is defined by the statistic $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$.

Some Important Statistics

- Example 8.1: A comparison of coffee prices at 4 randomly selected grocery stores in San Diego showed increases from the previous month of 12, 15, 17, and 20 cents for a 1-pound bag. Find the variance of this random sample of price increases.
- Solution
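The worked solution is not reproduced in this transcript. As a minimal sketch, the variance asked for in Example 8.1 can be checked by applying the usual definitions of the sample mean and sample variance (divisor $n - 1$) to the four price increases. The function names are illustrative, not from the slides.

```python
def sample_mean(xs):
    """Sample mean: (1/n) * sum of the observations."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance with divisor n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


# Price increases in cents from Example 8.1.
increases = [12, 15, 17, 20]
xbar = sample_mean(increases)      # (12 + 15 + 17 + 20) / 4 = 16
s2 = sample_variance(increases)    # (16 + 1 + 1 + 16) / 3 = 34/3

print(f"mean = {xbar}, variance = {s2:.4f}")
```

The squared deviations from the mean of 16 are 16, 1, 1, and 16, so the sample variance is 34/3, about 11.33.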

Some Important Statistics

- Theorem 8.1: If $S^2$ is the variance of a random sample of size $n$, we may write $S^2 = \frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n(n-1)}$.
- Proof
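The proof itself is not reproduced in this transcript. One standard route, sketched here as an aid rather than as the slide's own derivation, starts from $S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2$, expands the square, and substitutes $\bar{X} = \frac{1}{n}\sum X_i$:

```latex
\begin{align*}
S^2 &= \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2
     = \frac{1}{n-1} \sum_{i=1}^{n} \left( X_i^2 - 2\bar{X} X_i + \bar{X}^2 \right) \\
    &= \frac{1}{n-1} \left[ \sum_{i=1}^{n} X_i^2 - 2\bar{X} \sum_{i=1}^{n} X_i + n\bar{X}^2 \right]
     = \frac{1}{n-1} \left[ \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right]
       \quad \text{since } \sum_{i=1}^{n} X_i = n\bar{X} \\
    &= \frac{1}{n-1} \left[ \sum_{i=1}^{n} X_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} X_i \right)^2 \right]
     = \frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n(n-1)}.
\end{align*}
```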

Some Important Statistics

- Definition 8.7: The sample standard deviation, denoted by $S$, is the positive square root of the sample variance.
- Example 8.2: Find the variance of the data 3, 4, 5, 6, 6, and 7, representing the number of trout caught by a random sample of 6 fishermen.
- Solution
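The solution is missing from this transcript. A minimal sketch of the computation follows, assuming Theorem 8.1 takes its usual computational form $S^2 = [n \sum x_i^2 - (\sum x_i)^2] / [n(n-1)]$. The function name is illustrative, and the standard-library `statistics.variance` is used only as a cross-check.

```python
import statistics


def sample_variance_shortcut(xs):
    """Sample variance via the computational formula of Theorem 8.1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


# Number of trout caught by each of the 6 fishermen (Example 8.2).
trout = [3, 4, 5, 6, 6, 7]
s2 = sample_variance_shortcut(trout)     # (6*171 - 31**2) / (6*5) = 65/30
s2_check = statistics.variance(trout)    # definition-based value

print(f"s^2 = {s2:.4f}, s = {s2 ** 0.5:.4f}")
```

Here $\sum x_i = 31$ and $\sum x_i^2 = 171$, so $s^2 = 65/30 = 13/6 \approx 2.17$.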

8.3 Data Displays and Graphical Methods

- Motivation: use creative displays to extract information about properties of a set of data.
- Stem-and-leaf plots give the viewer a look at the symmetry of the data.
- Normal probability plots and quantile plots are used to check whether data follow a normal distribution.
- Statistical analysis can be characterized as the process of drawing conclusions about system variability.
- Statistics provide single measures, whereas a graphical display adds information in the form of a picture.

Box and Whisker Plot or Boxplot

- A box-and-whisker plot encloses the interquartile range of the data in a box that has the median displayed within.
- Interquartile range: the range between the 75th percentile (upper quartile) and the 25th percentile (lower quartile).
- A boxplot gives the viewer information about outliers, which represent rare events.
- Example 8.3: Nicotine content was measured in a random sample of 40 cigarettes. The data are displayed at right.
- Mild outliers: 0.72, 0.85, and 2.55
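The nicotine data themselves are not in this transcript, so the sketch below uses made-up values. It shows how the quantities behind a boxplot could be computed, assuming the common convention that points more than 1.5 interquartile ranges beyond the box are flagged as outliers. Quartile conventions differ between textbooks and software; the `"inclusive"` method of `statistics.quantiles` is one choice among several.

```python
import statistics


def boxplot_summary(data):
    """Quartiles, IQR, 1.5*IQR fences, and flagged outliers for a sample."""
    xs = sorted(data)
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "outliers": outliers,
    }


# Hypothetical data with one extreme value (not the nicotine data).
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = boxplot_summary(values)
print(summary)
```

For these values the box runs from 3 to 7, the fences sit at -3 and 13, and only the value 100 is flagged.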

Box and Whisker Plot or Boxplot

- Example 8.4: Consider the following data, consisting of 30 samples measuring the thickness of paint can ears. Figure 8.2 depicts a box-and-whisker plot for this asymmetric set of data.

Quantile Plot

- Quantile plot
- Compare samples of data
- Draw distinctions
- Depict cumulative distribution function
- Definition 8.8: A quantile of a sample, $q(f)$, is a value for which a specified fraction $f$ of the data values is less than or equal to $q(f)$.
- Sample median: $q(0.5)$; 75th percentile: $q(0.75)$; 25th percentile: $q(0.25)$

Quantile Plot

- In Figure 8.3, the quantile plot shows all observations.
- Large clusters: slopes near zero
- Sparse data: steeper slopes
- E.g.
- Sparse data: 28-30
- High density: 36-38

Normal Quantile-Quantile Plot

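- Approximation of a quantile of the normal distribution: $q_{\mu,\sigma}(f) = \mu + \sigma \left\{ 4.91 \left[ f^{0.14} - (1-f)^{0.14} \right] \right\}$
- Definition 8.9: The normal quantile-quantile plot is a plot of $y_{(i)}$ (the ordered observations) against $q_{0,1}(f_i)$, where $f_i = \frac{i - 3/8}{n + 1/4}$.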
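The formulas on this slide were lost in extraction. The sketch below assumes the textbook convention in which the ordered observations $y_{(i)}$ are paired with standard normal quantiles at $f_i = (i - 3/8)/(n + 1/4)$, and in which the normal quantile may be approximated by $4.91[f^{0.14} - (1-f)^{0.14}]$. The sample values and function names are hypothetical.

```python
from statistics import NormalDist


def plotting_fractions(n):
    """Fractions f_i = (i - 3/8) / (n + 1/4) for i = 1..n."""
    return [(i - 3 / 8) / (n + 1 / 4) for i in range(1, n + 1)]


def approx_normal_quantile(f):
    """Closed-form approximation to the standard normal quantile q_{0,1}(f)."""
    return 4.91 * (f ** 0.14 - (1 - f) ** 0.14)


def normal_qq_pairs(data):
    """Pairs (standard normal quantile, ordered observation) for a Q-Q plot."""
    ys = sorted(data)
    fs = plotting_fractions(len(ys))
    return [(NormalDist().inv_cdf(f), y) for f, y in zip(fs, ys)]


# Hypothetical sample; plotting these pairs gives the normal Q-Q plot.
sample = [4.1, 5.3, 4.8, 5.0, 5.9]
pairs = normal_qq_pairs(sample)
for q, y in pairs:
    print(f"{q:8.4f}  {y}")
```

If the points fall close to a straight line, a normal model is plausible. The intercept and slope of that line then give rough estimates of $\mu$ and $\sigma$.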

Normal Quantile-Quantile Plot

- Construct a normal quantile-quantile plot and draw conclusions regarding whether or not it is reasonable to assume that the two samples are from the same $N(\mu, \sigma)$ distribution.
- Solution
- The plot is far from a straight line.
- Station 1 reflects a few values in the lower tail of the distribution and several in the upper tail.
- It is therefore unlikely that the two samples came from the same normal distribution.

8.4 Sampling Distribution

- Statistical inference is concerned with generalizations and predictions.
- For example, we might claim, based on the opinions of several people interviewed on the street, that in a forthcoming election 60% of the eligible voters in the city of Detroit favor a certain candidate.
- Definition 8.10: The probability distribution of a statistic is called a sampling distribution.
- E.g., the probability distribution of $\bar{X}$ is called the sampling distribution of the mean.
- The sampling distribution of a statistic depends on the size of the population, the size of the samples, and the method of choosing the samples.

8.5 Sampling Distribution of Means

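- Suppose that a random sample of $n$ observations is taken from a normal population with mean $\mu$ and variance $\sigma^2$.
- By the reproductive property of the normal distribution established in Theorem 7.11, $\bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$ has a normal distribution with mean $\mu_{\bar{X}} = \mu$ and variance $\sigma_{\bar{X}}^2 = \sigma^2 / n$.
- Theorem 7.11: If $X_1, X_2, \ldots, X_n$ are independent random variables having normal distributions with means $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, respectively, then the random variable $Y = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$ has a normal distribution with mean $\mu_Y = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_n \mu_n$ and variance $\sigma_Y^2 = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \cdots + a_n^2 \sigma_n^2$.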

Sampling Distribution of Means

- Theorem 8.2 (Central Limit Theorem): If $\bar{X}$ is the mean of a random sample of size $n$ taken from a population with mean $\mu$ and finite variance $\sigma^2$, then the limiting form of the distribution of $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$, as $n \to \infty$, is the standard normal distribution $n(z; 0, 1)$.
- The normal approximation for $\bar{X}$ will generally be good if $n \ge 30$.
- If $n < 30$, the approximation is good only if the population is not too different from a normal distribution.
- If the population is known to be normal, the sampling distribution of $\bar{X}$ will follow a normal distribution exactly, no matter how small the size of the samples.
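The Central Limit Theorem can be illustrated with a small simulation. The sketch below is not part of the slides. It draws repeated samples of size 30 from an exponential population (a clearly non-normal choice made here for illustration, with mean 1 and variance 1) and checks that the sample means have mean close to $\mu$, variance close to $\sigma^2 / n$, and roughly 95% of standardized values within $\pm 1.96$.

```python
import random
import statistics
from math import sqrt


def simulate_sample_means(n, reps, seed=0):
    """Means of `reps` random samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


n, reps = 30, 5000
mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)

means = simulate_sample_means(n, reps)
mean_of_means = statistics.fmean(means)
var_of_means = statistics.variance(means)

# Standardize each sample mean and count how many fall within +/- 1.96.
z_values = [(m - mu) / (sigma / sqrt(n)) for m in means]
frac_within = sum(abs(z) <= 1.96 for z in z_values) / reps

print(f"mean of sample means     = {mean_of_means:.4f} (theory: {mu})")
print(f"variance of sample means = {var_of_means:.4f} (theory: {sigma**2 / n:.4f})")
print(f"fraction with |z| <= 1.96 = {frac_within:.3f} (normal theory: 0.95)")
```

The exact numbers depend on the seed. Shrinking `n` well below 30 makes the skewness of the exponential population visible again in the sample means.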

Sampling Distribution of Means

- Example 8.6: An electrical firm manufactures light bulbs that have a mean life of 800 hours and a standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs will have an average life of less than 775 hours.
- Solution
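The solution is not reproduced in this transcript. A minimal sketch follows, assuming the sampling distribution of $\bar{X}$ is (at least approximately) normal with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$. With only 16 bulbs, this in turn assumes the lifetimes are roughly normal. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_below(threshold, mu, sigma, n):
    """Return (z, P(Xbar < threshold)) under a normal sampling distribution."""
    standard_error = sigma / sqrt(n)
    z = (threshold - mu) / standard_error
    return z, NormalDist().cdf(z)


# Example 8.6: mu = 800 hours, sigma = 40 hours, n = 16, threshold 775 hours.
z, p = prob_sample_mean_below(775, mu=800, sigma=40, n=16)
print(f"z = {z}, P(Xbar < 775) = {p:.4f}")
```

The standard error is $40 / \sqrt{16} = 10$, so $z = -2.5$ and the probability is about 0.0062.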

Sampling Distribution of Means

- Example 8.7: An engineer conjectures that the population mean diameter of a certain component part is 5.0 millimeters. An experiment is conducted in which 100 parts produced by the process are selected randomly and the diameter is measured on each. It is known that the population standard deviation is $\sigma = 0.1$. The experiment indicates a sample average diameter of $\bar{x} = 5.027$ millimeters. Does this sample information appear to support or refute the engineer's conjecture?
- Solution
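The solution is missing from this transcript. One way to reason about it, sketched here under the assumption that the Central Limit Theorem applies with $n = 100$, is to ask how likely a sample mean at least this far from 5.0 would be if the conjecture $\mu = 5.0$ were true:

```latex
\begin{align*}
z &= \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
   = \frac{5.027 - 5.0}{0.1 / \sqrt{100}} = 2.7, \\
P\left( |\bar{X} - 5.0| \ge 0.027 \right)
  &= 2\,P(Z \ge 2.7) \approx 2(0.0035) = 0.007.
\end{align*}
```

Under these assumptions, a deviation this large would occur in only about 7 of 1000 such experiments if $\mu = 5.0$, so the sample does not appear to support the conjecture.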

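Sampling Distribution of the Difference Between Two Averages

- Theorem 8.3: If independent samples of size $n_1$ and $n_2$ are drawn at random from two populations, discrete or continuous, with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then the sampling distribution of the difference of means, $\bar{X}_1 - \bar{X}_2$, is approximately normally distributed with mean and variance given by $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$ and $\sigma_{\bar{X}_1 - \bar{X}_2}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$. Hence $Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$ is approximately a standard normal variable.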

Sampling Distribution of the Difference Between Two Averages

- Example 8.8: Two independent experiments are being run in which two different types of paint are compared. Eighteen specimens are painted using type A, and the drying time in hours is recorded for each. The same is done with type B. The population standard deviations are both known to be 1.0. Assuming that the mean drying time is equal for the two types of paint, find $P(\bar{X}_A - \bar{X}_B > 1.0)$, where $\bar{X}_A$ and $\bar{X}_B$ are the average drying times for samples of size $n_A = n_B = 18$.
- Solution

Sampling Distribution of the Difference Between Two Averages

- Example 8.9: The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and a standard deviation of 0.9 year, while those of manufacturer B have a mean lifetime of 6.0 years and a standard deviation of 0.8 year. What is the probability that a random sample of 36 tubes from manufacturer A will have a mean lifetime that is at least 1 year more than the mean lifetime of a sample of 49 tubes from manufacturer B?
- Solution
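The solution is not included in this transcript. A minimal sketch of the calculation follows, assuming the difference of sample means is approximately normal with mean $\mu_1 - \mu_2$ and variance $\sigma_1^2 / n_1 + \sigma_2^2 / n_2$ (the usual statement of Theorem 8.3). The function name and argument names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_difference_at_least(d, mu1, sigma1, n1, mu2, sigma2, n2):
    """Return (z, P(Xbar1 - Xbar2 >= d)) using the normal approximation."""
    mean_diff = mu1 - mu2
    sd_diff = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = (d - mean_diff) / sd_diff
    return z, 1.0 - NormalDist().cdf(z)


# Example 8.9: manufacturer A (6.5, 0.9, n = 36) vs manufacturer B (6.0, 0.8, n = 49).
z, p = prob_mean_difference_at_least(
    1.0, mu1=6.5, sigma1=0.9, n1=36, mu2=6.0, sigma2=0.8, n2=49
)
print(f"z = {z:.2f}, P(XbarA - XbarB >= 1.0) = {p:.4f}")
```

The difference has mean 0.5 and standard deviation $\sqrt{0.0225 + 0.0131} \approx 0.189$, giving $z \approx 2.65$ and a probability of about 0.004.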

Sampling Distribution of $S^2$

- If a random sample of size $n$ is taken from a normal population with mean $\mu$ and variance $\sigma^2$ and the sample variance is computed, we obtain a value of the statistic $S^2$.
- Corollary: If $X_1, X_2, \ldots, X_n$ are independent random variables having identical normal distributions with mean $\mu$ and variance $\sigma^2$, then the random variable $Y = \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2$ has a chi-squared distribution with $v = n$ degrees of freedom.

Sampling Distribution of $S^2$

- Theorem 8.4: If $S^2$ is the variance of a random sample of size $n$ taken from a normal population having the variance $\sigma^2$, then the statistic $\chi^2 = \frac{(n-1) S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2}$ has a chi-squared distribution with $v = n - 1$ degrees of freedom.
- It is customary to let $\chi_\alpha^2$ represent the $\chi^2$ value above which we find an area of $\alpha$. This is illustrated by the shaded region in Figure 8.10.
- Table A.5

Sampling Distribution of $S^2$

- Example 8.10: A manufacturer of car batteries guarantees that his batteries will last, on the average, 3 years with a standard deviation of 1 year. If five of these batteries have lifetimes of 1.9, 2.4, 3.0, 3.5, and 4.2 years, should the manufacturer still be convinced that his batteries have a standard deviation of 1 year? Assume that the battery lifetime follows a normal distribution.
- Solution
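The solution is not reproduced in this transcript. The sketch below assumes the statistic of Theorem 8.4 in its usual form, $\chi^2 = (n-1) s^2 / \sigma^2$ with $n - 1$ degrees of freedom. It compares the computed value with the central 95% range for 4 degrees of freedom. The two critical values are hard-coded from a standard chi-squared table and are not computed here.

```python
import statistics

# Battery lifetimes in years from Example 8.10.
lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]
sigma0_sq = 1.0  # claimed population variance (standard deviation of 1 year)

n = len(lifetimes)
s2 = statistics.variance(lifetimes)   # sample variance with divisor n - 1
chi2 = (n - 1) * s2 / sigma0_sq       # chi-squared statistic, v = n - 1 = 4

# Tabulated values for v = 4: chi2_{0.975} and chi2_{0.025}.
CHI2_0975_4DF = 0.484
CHI2_0025_4DF = 11.143
within_central_95 = CHI2_0975_4DF < chi2 < CHI2_0025_4DF

print(f"s^2 = {s2:.3f}, chi2 = {chi2:.2f}, within central 95%: {within_central_95}")
```

Here $s^2 = 0.815$ and $\chi^2 = 3.26$, which lies between 0.484 and 11.143, so the data give no reason to doubt a standard deviation of 1 year.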

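Degrees of Freedom As a Measure of Sample Information

- Comparison
- Theorem 7.12: $\sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2}$ has a $\chi^2$ distribution with $n$ degrees of freedom.
- Theorem 8.4: $\frac{(n-1) S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2}$ has a $\chi^2$ distribution with $n - 1$ degrees of freedom. (When $\mu$ is not known, a degree of freedom is lost in the estimation of $\mu$, i.e., $\mu$ is replaced by $\bar{x}$.)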

t-Distribution

- Central Limit Theorem (Theorem 8.2)
- $\sigma$ might not be known.
- Consider $T = \frac{\bar{X} - \mu}{S / \sqrt{n}}$.
- In developing the sampling distribution of $T$, we shall assume that our random sample was selected from a normal population.

t-Distribution

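- Theorem 8.5: Let $Z$ be a standard normal random variable and $V$ a chi-squared random variable with $v$ degrees of freedom. If $Z$ and $V$ are independent, then the distribution of the random variable $T$, where $T = \frac{Z}{\sqrt{V / v}}$, is given by the density function $h(t) = \frac{\Gamma[(v+1)/2]}{\Gamma(v/2) \sqrt{\pi v}} \left( 1 + \frac{t^2}{v} \right)^{-(v+1)/2}$, $-\infty < t < \infty$. This is known as the t-distribution with $v$ degrees of freedom.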

t-Distribution

- Corollary: Let $X_1, X_2, \ldots, X_n$ be independent random variables that are all normal with mean $\mu$ and standard deviation $\sigma$. Let $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ and $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$. Then the random variable $T = \frac{\bar{X} - \mu}{S / \sqrt{n}}$ has a t-distribution with $v = n - 1$ degrees of freedom.
- Student t-distribution
- The probability distribution of $T$ was first published in 1908 in a paper by W. S. Gosset.
- Gosset was employed by an Irish brewery that disallowed publication of research by its staff.
- He published his work secretly under the name "Student."

t-Distribution

- $T$ is similar to $Z$: both are symmetric about a mean of 0 and bell-shaped.
- Difference between $T$ and $Z$: the variance of $T$ is greater than 1 and depends on the sample size $n$.
- $T$ and $Z$ are the same as $n \to \infty$.

t-Distribution

- The t-value with 10 degrees of freedom leaving an area of 0.025 to the right is $t = 2.228$.
- The t-distribution is symmetric about 0: $t_{1-\alpha} = -t_\alpha$.
- Example 8.11: The t-value with $v = 14$ degrees of freedom that leaves an area of 0.025 to the left, and therefore an area of 0.975 to the right, is $t_{0.975} = -t_{0.025} = -2.145$.
- Example 8.12: $P(-t_{0.025} < T < t_{0.05}) = 1 - 0.05 - 0.025 = 0.925$
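Tabulated t-values like these can be sanity-checked numerically. The sketch below assumes the standard t density, $h(t) = \frac{\Gamma[(v+1)/2]}{\Gamma(v/2)\sqrt{\pi v}} (1 + t^2/v)^{-(v+1)/2}$, and integrates it with Simpson's rule. The function names and the choice of integration rule are illustrative, not taken from the slides.

```python
import math


def t_density(t, v):
    """Density of the t-distribution with v degrees of freedom."""
    coeff = math.gamma((v + 1) / 2) / (math.gamma(v / 2) * math.sqrt(math.pi * v))
    return coeff * (1 + t * t / v) ** (-(v + 1) / 2)


def upper_tail_area(t, v, steps=2000):
    """P(T > t) for t >= 0, using composite Simpson's rule on [0, t].

    By symmetry about 0, P(T > t) = 0.5 - integral of the density from 0 to t.
    `steps` must be even.
    """
    if t < 0 or steps % 2:
        raise ValueError("t must be non-negative and steps must be even")
    h = t / steps
    total = t_density(0.0, v) + t_density(t, v)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, v)
    return 0.5 - total * h / 3


area_10 = upper_tail_area(2.228, 10)   # table value for v = 10
area_14 = upper_tail_area(2.145, 14)   # table value for v = 14
print(f"P(T > 2.228 | v=10) = {area_10:.4f}")
print(f"P(T > 2.145 | v=14) = {area_14:.4f}")
```

Both areas come out close to 0.025, agreeing with the tabulated values up to the rounding of the table entries.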

t-Distribution

- Example 8.13: Find $k$ such that $P(k < T < -1.761) = 0.045$, for a random sample of size 15 selected from a normal distribution and $T = \frac{\bar{X} - \mu}{S / \sqrt{n}}$.
- Solution
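The solution is missing from this transcript. A sketch of the reasoning follows, using standard t-table values for $v = 15 - 1 = 14$ degrees of freedom ($t_{0.05} = 1.761$ and $t_{0.005} = 2.977$):

```latex
\begin{align*}
-1.761 &= -t_{0.05} \quad (v = 14), \qquad \text{so } P(T < -t_{0.05}) = 0.05. \\
\text{Let } k &= -t_{\alpha}. \text{ Then }
P(-t_{\alpha} < T < -t_{0.05}) = 0.05 - \alpha = 0.045
\;\Longrightarrow\; \alpha = 0.005. \\
k &= -t_{0.005} = -2.977.
\end{align*}
```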

t-Distribution

- Exactly 95% of the values of a t-distribution with $v = n - 1$ degrees of freedom lie between $-t_{0.025}$ and $t_{0.025}$.
- A t-value that falls below $-t_{0.025}$ or above $t_{0.025}$ would tend to make us believe that either a very rare event has taken place or perhaps our assumption about $\mu$ is in error.
- Example 8.14: An engineer claims that the population mean of a process is 500 grams. To check this claim he samples 25 batches each month. If the computed t-value falls between $-t_{0.05}$ and $t_{0.05}$, he is satisfied with his claim. What conclusion should he draw from a sample that has a mean $\bar{x} = 518$ grams and a sample standard deviation $s = 40$ grams? Assume the distribution of yields to be approximately normal.
- Solution
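The solution is not reproduced in this transcript. A minimal sketch of the check follows, taking the sample mean as 518 grams, the value in the textbook version of this example (treat it as an assumption here). The critical value $t_{0.05} = 1.711$ for 24 degrees of freedom is hard-coded from a standard t-table.

```python
from math import sqrt


def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / sqrt(n))


# Example 8.14: claimed mean 500 g, n = 25 batches, s = 40 g.
# ASSUMPTION: sample mean of 518 g, as in the textbook version of the example.
t = t_statistic(xbar=518, mu0=500, s=40, n=25)

T_005_24DF = 1.711  # tabulated t_{0.05} for v = 24 degrees of freedom
satisfied = -T_005_24DF < t < T_005_24DF

print(f"t = {t}, engineer satisfied with claim: {satisfied}")
```

With these numbers, $t = 18 / 8 = 2.25$, which exceeds 1.711. The engineer would therefore not be satisfied that $\mu = 500$, and the data suggest the process mean is higher than claimed.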

t-Distribution

- The t-distribution is used extensively in problems that deal with:
- Inference about the population mean
- Comparative samples (two sample means)
- Use of the t-distribution for the statistic $T = \frac{\bar{X} - \mu}{S / \sqrt{n}}$ requires that $X_1, X_2, \ldots, X_n$ be normal.

F-Distribution

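- The F-distribution finds enormous application in comparing sample variances.
- Theorem 8.6: Let $U$ and $V$ be two independent random variables having chi-squared distributions with $v_1$ and $v_2$ degrees of freedom, respectively. Then the distribution of the random variable $F = \frac{U / v_1}{V / v_2}$ is given by the density $h(f) = \frac{\Gamma[(v_1 + v_2)/2] \, (v_1 / v_2)^{v_1/2}}{\Gamma(v_1/2) \, \Gamma(v_2/2)} \cdot \frac{f^{v_1/2 - 1}}{(1 + v_1 f / v_2)^{(v_1 + v_2)/2}}$ for $f > 0$, and $h(f) = 0$ for $f \le 0$. This is known as the F-distribution with $v_1$ and $v_2$ degrees of freedom.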

F-Distribution

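- Theorem 8.7: Writing $f_\alpha(v_1, v_2)$ for $f_\alpha$ with $v_1$ and $v_2$ degrees of freedom, we obtain $f_{1-\alpha}(v_1, v_2) = \frac{1}{f_\alpha(v_2, v_1)}$.
- E.g., the f-value with 6 and 10 degrees of freedom, leaving an area of 0.95 to the right, is $f_{0.95}(6, 10) = \frac{1}{f_{0.05}(10, 6)} = \frac{1}{4.06} = 0.246$.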

F-Distribution with Two Sample Variances

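- Suppose that random samples of size $n_1$ and $n_2$ are selected from two normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Let $\chi_1^2 = \frac{(n_1 - 1) S_1^2}{\sigma_1^2}$ and $\chi_2^2 = \frac{(n_2 - 1) S_2^2}{\sigma_2^2}$, which are random variables having chi-squared distributions with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom. Using Theorem 8.6, we obtain the following result.
- Theorem 8.8: If $S_1^2$ and $S_2^2$ are the variances of independent random samples of size $n_1$ and $n_2$ taken from normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then $F = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$ has an F-distribution with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom.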
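As a small illustration of Theorem 8.8, the sketch below computes the variance-ratio statistic $F = (S_1^2 / \sigma_1^2) / (S_2^2 / \sigma_2^2)$ together with its degrees of freedom. The two samples are made up, and the function name is illustrative. With both population variances left at 1 the statistic reduces to $S_1^2 / S_2^2$, the form used when testing whether two variances are equal.

```python
import statistics


def variance_ratio(sample1, sample2, sigma1_sq=1.0, sigma2_sq=1.0):
    """Return (F, v1, v2) with F = (S1^2 / sigma1^2) / (S2^2 / sigma2^2)."""
    s1_sq = statistics.variance(sample1)
    s2_sq = statistics.variance(sample2)
    f = (s1_sq / sigma1_sq) / (s2_sq / sigma2_sq)
    return f, len(sample1) - 1, len(sample2) - 1


# Hypothetical samples: the second is the first with every value doubled.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]

f_equal, v1, v2 = variance_ratio(a, b)                        # assumes equal variances
f_scaled, _, _ = variance_ratio(a, b, sigma1_sq=1.0, sigma2_sq=4.0)

print(f"F = {f_equal} with ({v1}, {v2}) degrees of freedom")
print(f"F with sigma2^2 = 4: {f_scaled}")
```

The resulting value would then be compared with tabulated $f_\alpha(v_1, v_2)$ values to judge whether the observed ratio is unusually large or small.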

F-Distribution

- Suppose we wish to determine whether population means are equivalent.
- The normal distribution applies nicely to the two-sample situation.
- However, what about three samples?
- The F-distribution is called the variance ratio distribution.
- Whether differences among sample averages could have occurred by chance depends on the variability within samples, as quantified by $S_A^2$, $S_B^2$, and $S_C^2$.
- The notion of the important components of variability is best seen through some simple graphics.

Analysis of Variance with F-Distribution

- Two key sources of variability
- Variability within samples
- Variability between samples
- If the variability within samples is considerably larger than the variability between samples, there will be considerable overlap in the sample data and a signal that the data could all have come from a common distribution.
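The two sources of variability can be made concrete with the usual one-way analysis-of-variance ratio. That ratio is assumed here, since the slides do not spell it out: the between-sample mean square divided by the within-sample mean square. The groups below are made-up data and the function name is illustrative.

```python
from statistics import fmean


def one_way_f(groups):
    """F ratio = (between-sample mean square) / (within-sample mean square)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])

    # Variability between samples: spread of group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variability within samples: spread of observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical samples A, B, C with heavy overlap ...
overlapping = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
# ... and with the same within-sample spread but widely separated means.
separated = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]

f_overlap = one_way_f(overlapping)
f_separated = one_way_f(separated)
print(f"overlapping samples: F = {f_overlap}")
print(f"separated samples:   F = {f_separated}")
```

Both data sets have the same within-sample variability, so the much larger ratio for the separated groups comes entirely from the between-sample term. A small ratio is what one expects when the data could all have come from a common distribution.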

Exercise

- 1, 14, 17, 29, 41, 51, 59