Last week we used stemplots and histograms to describe the shape, location, and spread of a distribution. This week we use numerical summaries of location and spread.

About This Presentation

Slides: 32

Provided by: budger

Learn more at: https://www.sjsu.edu

Transcript and Presenter's Notes

1
Summary Statistics

Last week we used stemplots and histograms to
describe the shape, location, and spread of a
distribution. This week we use numerical
summaries of location and spread.

2
Main Summary Statistics by Type

Central location
Mean
Median
Mode
Spread
Variance and standard deviation
Quartiles and interquartile range (IQR)
Shape
Statistical measures of shape (e.g., skewness
and kurtosis) are available but are seldom used
in practice (not covered)

3
Notation

n = sample size
X = variable
$x_i$ = value of individual i
$\sum$ = sum all values (capital sigma)
Illustrative example (sample.sav), data:
21 42 5 11 30 50 28 27 24 52
n = 10
X = age
$x_1 = 21$, $x_2 = 42$, ..., $x_{10} = 52$
$\sum x = 21 + 42 + \cdots + 52 = 290$

4
Sample Mean
Sample mean (xbar): $\bar{x} = \frac{1}{n} \sum x_i$
Illustrative example, n = 10 (data and intermediate
calculations on prior slide): $\bar{x} = 290 / 10 = 29.0$
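
The sample mean calculation can be checked with a few lines of Python. This is a minimal sketch using the ten ages listed on the notation slide; the variable names (`ages`, `total`, `xbar`) are chosen here for illustration and do not come from the slides.

```python
# Ages from the illustrative example (sample.sav)
ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]

n = len(ages)        # sample size
total = sum(ages)    # sum of all values (capital sigma)
xbar = total / n     # sample mean = sum divided by sample size

print(n, total, xbar)  # 10 290 29.0
```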
5
Population Mean

Same operation as sample mean, but based on
entire population (N = population size)
Not available in practice, but important
conceptually

6
Interpretation of xbar

Sample mean is used to predict:
an observation drawn at random from a sample
an observation drawn at random from the
population
the population mean
Gravitational center (balance point)

7
Median: a different kind of average

Middle value
Covered last week
Order data
Depth of median is (n + 1) / 2
When n is odd → middle value
When n is even → average two middle values
Illustrative example, n = 10 → median has depth
(10 + 1) / 2 = 5.5

05 11 21 24 27 28 30 42 50 52
median = average of 27 and 28 = 27.5
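
The depth rule can be written out as a short function. This is a sketch under the slide's definition (order the data, go to depth (n + 1) / 2, average the two neighbours when the depth is fractional); the function name `median_by_depth` is hypothetical.

```python
def median_by_depth(values):
    """Median via the depth rule: order the data, then go to depth (n + 1) / 2."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    depth = (n + 1) / 2              # 1-based position of the median
    if n % 2 == 1:
        # n odd: the depth is a whole number, so take the middle value
        return ordered[int(depth) - 1]
    # n even: the depth ends in .5, so average the two middle values
    lower = ordered[int(depth) - 1]
    upper = ordered[int(depth)]
    return (lower + upper) / 2


ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]
print(median_by_depth(ages))  # 27.5
```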
8
Median is robust

Robust = resistant to skews and outliers

This data set has a mean (xbar) of 1600:
1362 1439 1460 1614 1666 1792 1867
This data set has an outlier and a mean of 2743:
1362 1439 1460 1614 1666 1792 9867
The median is 1614 in both instances. The median
was not influenced by the outlier.
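
The comparison can be reproduced with Python's standard `statistics` module. This is a small sketch using the two data sets from this slide; the list names are chosen here for illustration.

```python
from statistics import mean, median

rates = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_outlier = rates[:-1] + [9867]   # replace the largest value with an outlier

# The mean shifts substantially when the outlier is introduced...
mean_original = mean(rates)
mean_outlier = round(mean(with_outlier))

# ...while the median stays where it was.
median_original = median(rates)
median_outlier = median(with_outlier)

print(mean_original, mean_outlier)      # 1600 2743
print(median_original, median_outlier)  # 1614 1614
```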
9
Mode

Mode = value with greatest frequency
e.g., 4, 7, 7, 7, 8, 8, 9 has mode 7
Used only in very large data sets
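
One way to find the mode is to tally frequencies and take the most frequent value. The sketch below uses `collections.Counter` from the Python standard library on the example data from this slide; ties between values are not handled here.

```python
from collections import Counter

data = [4, 7, 7, 7, 8, 8, 9]

counts = Counter(data)                      # frequency of each value
value, freq = counts.most_common(1)[0]      # most frequent value and its count

print(value, freq)  # 7 3
```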

10
Mean, Median, Mode

Symmetrical data: mean = median
Positive skew: mean > median (mean gets pulled
by tail)
Negative skew: mean < median

11
Spread = Variability

Variability = amount values spread above and
below the average
Measures of spread
Range and inter-quartile range
Standard deviation and variance (this week)

12
Range = max - min
The range is rarely used in practice because it tends
to underestimate the population range and is not
robust
13
Standard deviation
Most common descriptive measure of spread
Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
14
Standard deviation (formula)
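Sample standard deviation: $s = \sqrt{s^2}$
The sample variance $s^2$ is an unbiased estimator of
the population variance $\sigma^2$, and s is used to
estimate the population standard deviation $\sigma$.
Population standard deviation $\sigma$ is rarely known
in practice.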
15
New data set (Metabolic Rates)
This example is not in your lecture notes

Metabolic rates (cal/day), n = 7
1792 1666 1362 1614 1460 1867 1439

16
Metabolic rates showing mean (xbar) and deviations
of first two observations
17
Standard Deviation Calculation (metabolic.sav,
introduced on slide 15)
Observations   Deviations             Squared deviations

1792           1792 - 1600 = 192      (192)^2 = 36,864
1666           1666 - 1600 = 66       (66)^2 = 4,356
1362           1362 - 1600 = -238     (-238)^2 = 56,644
1614           1614 - 1600 = 14       (14)^2 = 196
1460           1460 - 1600 = -140     (-140)^2 = 19,600
1867           1867 - 1600 = 267      (267)^2 = 71,289
1439           1439 - 1600 = -161     (-161)^2 = 25,921
SUMS           0                      SS = 214,870
Sum of deviations will always equal zero
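
The table can be regenerated programmatically as a check on the hand calculation. This is a minimal sketch using the seven metabolic rates; the names `deviations`, `squared`, and `ss` are chosen here for illustration.

```python
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

xbar = sum(rates) / len(rates)            # mean of the metabolic rates
deviations = [x - xbar for x in rates]    # each observation minus the mean
squared = [d ** 2 for d in deviations]    # squared deviations
ss = sum(squared)                         # sum of squares (SS)

for x, d, sq in zip(rates, deviations, squared):
    print(f"{x:>6} {d:>8.0f} {sq:>10,.0f}")
print(f"{'SUMS':>6} {sum(deviations):>8.0f} {ss:>10,.0f}")
```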
18
Standard Deviation Metabolic data (cont.)
Variance: $s^2 = SS / (n - 1) = 214870 / 6 \approx 35811.667$
Standard deviation: $s = \sqrt{35811.667} \approx 189.240$
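
The variance and standard deviation for the metabolic data can be computed from the sum of squares and cross-checked against Python's `statistics` module. This sketch assumes the usual sample formulas (divide SS by n - 1, then take the square root), which is what `statistics.variance` and `statistics.stdev` implement; the results come out to roughly 35,811.67 and 189.24.

```python
import math
import statistics

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(rates)
xbar = sum(rates) / n
ss = sum((x - xbar) ** 2 for x in rates)   # sum of squared deviations

variance = ss / (n - 1)      # sample variance (divides by n - 1, not n)
sd = math.sqrt(variance)     # sample standard deviation

print(round(variance, 3), round(sd, 3))

# Cross-check against the standard library's sample variance and SD
print(statistics.variance(rates), statistics.stdev(rates))
```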
19
General rule for rounding means and standard
deviations

Report the mean to one more decimal place than
the data
To preserve accuracy, intermediate calculations
should carry at least one further decimal place
Illustrative example
Suppose data is recorded with one decimal
accuracy (i.e., xx.x)
Report mean with two decimal accuracy (i.e.,
xx.xx)
Carry all intermediate calculations with at least
three decimal accuracy (i.e., xx.xxx)

Even more important: always use common sense and
judgment.
20
TI-30XIIS (about $12)
In practice, we often use software or a
calculator to check our standard deviation
21
Interpretation of Standard Deviation

Larger standard deviation → greater variability
s1 = 15 and s2 = 10 → group 1 has more
variability
68-95-99.7 rule: Normal data only
68% of data within 1 SD of mean, 95% within 2 SD
of mean, and 99.7% within 3 SD of mean
e.g., if mean = 30 and SD = 10, then 95% of
individuals are in the range 30 ± (2)(10) = 30 ±
20 (10 to 50)
Chebychev's rule: all data
at least 75% of data within 2 SD of mean
e.g., mean = 30 and SD = 10, then at least 75% of
individuals are in the range 30 ± (2)(10) (10 to 50)
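
The two rules of thumb can be expressed as small helper functions. This is a sketch only: the function names are hypothetical, and the general form of Chebychev's bound (at least 1 - 1/k^2 of the data within k SD, for k > 1) is the standard statement of the rule, of which the slide gives the k = 2 case.

```python
def sd_interval(mean, sd, k):
    """Interval mean ± k * sd, returned as (low, high)."""
    return (mean - k * sd, mean + k * sd)


def chebychev_min_fraction(k):
    """Minimum fraction of any data set lying within k SD of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


# Example from the slide: mean = 30, SD = 10, two standard deviations
print(sd_interval(30, 10, 2))        # (10, 50)
print(chebychev_min_fraction(2))     # 0.75
```

For Normal data the 68-95-99.7 rule puts about 95% of individuals in the 10 to 50 interval; Chebychev's rule guarantees at least 75% for data of any shape.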

22
Quartiles and IQR

Quartiles divide the ordered data into four
equally-sized groups
Q0 = minimum
Q1 = 25th percentile
Q2 = 50th percentile (median)
Q3 = 75th percentile
Q4 = maximum

23
Rule for quartiles

Find the median → Q2
Middle of lower half of the data → Q1
Middle of upper half of the data → Q3

Bottom half: 05 11 21 24 27
Top half: 28 30 42 50 52
Q1 = 21, Q2 = 27.5, Q3 = 42
IQR = Q3 - Q1 = 42 - 21 = 21, which gives the spread of the
middle 50% of the data
24
5-Point Summary (sample.sav)

Q0 = 5 (minimum)
Q1 = 21 (lower hinge)
Q2 = 27.5 (median)
Q3 = 42 (upper hinge)
Q4 = 52 (maximum)

Best descriptive statistics for skewed data
25
Illustrative example (metabolic.sav)
1362 1439 1460 1614 1666 1792 1867
median = 1614
Bottom half: 1362 1439 1460 1614 → Q1 = (1439 + 1460) / 2 = 1449.5
Top half: 1614 1666 1792 1867 → Q3 = (1666 + 1792) / 2 = 1729
5-point summary: 1362, 1449.5, 1614, 1729, 1867
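
The quartile rule used in these slides can be sketched as a function. Note the convention this example implies: when n is odd, the median is counted in both the bottom and the top half (the "hinge" method). Library routines such as `statistics.quantiles` use other interpolation conventions and may give slightly different quartiles. The function names below are hypothetical.

```python
def _median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_point_summary(values):
    """Return (Q0, Q1, Q2, Q3, Q4) using the hinge rule from the slides."""
    if not values:
        raise ValueError("summary of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2              # odd n: the median belongs to both halves
    bottom = ordered[:half]
    top = ordered[n - half:]
    return (ordered[0], _median(bottom), _median(ordered), _median(top), ordered[-1])


rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]   # metabolic.sav, n = 7
ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]       # sample.sav, n = 10

print(five_point_summary(rates))  # (1362, 1449.5, 1614, 1729.0, 1867)
print(five_point_summary(ages))   # (5, 21, 27.5, 42, 52)

q0, q1, q2, q3, q4 = five_point_summary(ages)
print(q3 - q1)                    # IQR for the ages: 21
```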
26
Box-and-whiskers plot (boxplot)