Title: Review BPS chapter 1
1 Review BPS chapter 1
- Picturing Distributions with Graphs
- What is statistics?
- Individuals and variables
- Two types of data: categorical and quantitative
- Ways to chart categorical data: bar graphs and pie charts
- Ways to chart quantitative data: histograms and stemplots
- Interpreting histograms
- Time plots
2 Example (BPS chapter 1)
- Indicate whether each of the following variables is categorical or quantitative.
- a. We have data on 20 individuals measuring the amount of time it takes to climb five flights of stairs.
- b. During a clinical trial, an experimental pain relief drug is administered to individuals. Each individual is then asked whether s/he experienced any pain relief.
Answers: a. Quantitative; b. Categorical
3 Objectives (BPS chapter 2)
- Describing distributions with numbers
- Measures of center: mean and median
- Measures of spread: quartiles and standard deviation
- The five-number summary and boxplots
- IQR and outliers
- Choosing among summary statistics
4 Measure of center: the mean
The mean, or arithmetic average: to calculate the mean, add all the values, then divide by the number of individuals. It is the center of mass of the distribution.
Example: the sum of the heights is 1598.3 inches; divided by 25 women, the mean is 63.9 inches.
5 Mathematical notation
Learn right away how to get the mean using your calculator.
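In the usual notation (assumed here, since the slide text only names the topic), the observations are written x_1, ..., x_n and the mean is written x-bar. A short sketch of the formula, with the heights example as a check:

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
% Heights example: \bar{x} = 1598.3 / 25 \approx 63.9 \text{ inches}
```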
6 Measure of center: the median
- The median (M) is the midpoint of a distribution: the number such that half of the observations are smaller and half are larger.
1. Sort observations from smallest to largest.
2. Find the location of the median (L).
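The two steps can be sketched in Python. The location rule L = (n + 1) / 2 is the standard textbook rule and is an assumption here, since the slide text does not spell it out; the function name and the sample data are hypothetical.

```python
def median_with_location(values):
    """Return (location, median) following the two steps: sort, then locate."""
    ordered = sorted(values)          # step 1: sort smallest to largest
    n = len(ordered)
    location = (n + 1) / 2            # step 2: assumed location rule L = (n + 1) / 2
    if n % 2 == 1:
        # odd n: L is a whole number, so the median is the observation at position L
        median = ordered[int(location) - 1]
    else:
        # even n: L ends in .5, so average the two observations on either side of it
        median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    return location, median


# Hypothetical illustrative data, not taken from the slides
print(median_with_location([5, 1, 3]))       # odd number of observations
print(median_with_location([4, 1, 3, 2]))    # even number of observations
```

With an even number of observations the location falls halfway between two positions, which is why the two middle values are averaged.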
7 Comparing the mean and the median
- The mean and the median are the same when the distribution is exactly symmetrical. In a skewed distribution, the mean is usually farther out in the long tail than the median is. The median is a measure of center that is resistant to skew and outliers. The mean is not.
[Figure: mean and median for a symmetric distribution]
[Figure: mean and median for skewed distributions; panels: left skew, right skew]
8 Mean and median of a distribution with outliers
[Figure: percent of people dying]
9 Impact of skewed data
10 Example: STAT 200 Midterm Score
Midterm: 30 35 40 40 40 40 45 45 45 45 50 50 55 55 60 65 65 70 100 100
Descriptive Statistics: Midterm
Variable   N   Mean   StDev  Minimum  Q1     Median  Q3     Maximum
Midterm    20  53.75  18.98  30.00    40.00  47.50   63.75  100.00
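The table can be reproduced with Python's standard `statistics` module. This is a sketch, not the software that produced the slide: the `method="exclusive"` quartile convention is an assumption chosen because it matches the tabulated Q1 and Q3.

```python
import statistics

midterm = [30, 35, 40, 40, 40, 40, 45, 45, 45, 45,
           50, 50, 55, 55, 60, 65, 65, 70, 100, 100]

n = len(midterm)
mean = statistics.mean(midterm)
stdev = statistics.stdev(midterm)   # sample standard deviation (divides by n - 1)
# "exclusive" interpolates at positions based on (n + 1); assumed to match the table
q1, median, q3 = statistics.quantiles(midterm, n=4, method="exclusive")

print(n, mean, round(stdev, 2), min(midterm), q1, median, q3, max(midterm))
```

Quartile conventions differ between tools. Taking the median of each half by hand gives Q3 = 62.5 for these scores instead of 63.75, so small disagreements with software output are expected. Note also that the mean (53.75) sits above the median (47.50), pulled up by the two scores of 100.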
11 Measure of spread: quartiles
The first quartile, Q1, is the value in the sample that has 25% of the data at or below it.
The third quartile, Q3, is the value in the sample that has 75% of the data at or below it.
Q1 = first quartile = 2.2
M = median = 3.4
Q3 = third quartile = 4.35
12 Center and spread in boxplots
Five-number summary:
Largest = max = 6.1
Q3 = third quartile = 4.35
M = median = 3.4
Q1 = first quartile = 2.2
Smallest = min = 0.6
13 Boxplots for skewed data
[Figure: comparing boxplots for a normal and a right-skewed distribution]
Boxplots remain true to the data and clearly depict symmetry or skewness.
14 IQR and outliers
- The interquartile range (IQR) is the distance between the first and third quartiles (the length of the box in the boxplot).
- IQR = Q3 - Q1
- An outlier is an individual value that falls outside the overall pattern.
- How far outside the overall pattern does a value have to fall to be considered an outlier?
- The 1.5 × IQR rule for outliers:
Low outlier: any value < Q1 - 1.5 × IQR
High outlier: any value > Q3 + 1.5 × IQR
15 Example: STAT 200 Midterm Score
Midterm: 30 35 40 40 40 40 45 45 45 45 50 50 55 55 60 65 65 70 100 100
- IQR = Q3 - Q1 = 63.75 - 40.00 = 23.75
- Low outlier: any value < Q1 - 1.5 × IQR = 40.00 - 1.5(23.75) = 4.375
- High outlier: any value > Q3 + 1.5 × IQR = 63.75 + 1.5(23.75) = 99.375
- The two scores of 100 are above 99.375, so they are outliers!
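The same check can be written as a short Python sketch. The helper name `iqr_fences` is hypothetical; the quartiles are the values from the descriptive statistics table rather than recomputed.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (low, high) cutoffs of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


midterm = [30, 35, 40, 40, 40, 40, 45, 45, 45, 45,
           50, 50, 55, 55, 60, 65, 65, 70, 100, 100]

# Q1 and Q3 as reported in the descriptive statistics table
low_fence, high_fence = iqr_fences(40.00, 63.75)

# A value is flagged only if it lies strictly outside the fences
outliers = [x for x in midterm if x < low_fence or x > high_fence]
print(low_fence, high_fence, outliers)
```

No score falls below the low cutoff, so only the upper tail produces outliers here.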
16 Measure of spread: standard deviation
The standard deviation is used to describe the variation around the mean.
[Figure: mean ± 1 s.d.]
17 Calculations
Women's height (inches)
Mean = 63.4
Sum of squared deviations from the mean = 85.2
Degrees of freedom (df) = (n - 1) = 13
s² = variance = 85.2/13 = 6.55 inches squared
s = standard deviation = √6.55 = 2.56 inches
- We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator.
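For reference, the calculation can be sketched in Python. The individual heights are not listed on the slide, so the sketch reuses the summary numbers (85.2 and df = 13); `sample_sd` is a hypothetical helper showing the same steps for any list of raw values.

```python
import math


def sample_sd(values):
    """Sample standard deviation: sqrt of squared deviations summed over (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_sq_dev / (n - 1))


# Summary numbers from the slide (the raw heights themselves are not given)
sum_sq_dev_heights = 85.2
df = 13                                   # n - 1, so n = 14 women
variance = sum_sq_dev_heights / df        # in inches squared
sd = math.sqrt(variance)                  # back in inches

print(round(variance, 2), round(sd, 2))
```

Dividing by n - 1 rather than n is what makes this the sample standard deviation, the version most calculators label s.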
18 Choosing among summary statistics
- Because the mean is not resistant to outliers or skew, use it to describe distributions that are fairly symmetrical and don't have outliers.
  - Plot the mean and use the standard deviation for error bars.
- Otherwise, use the median in the five-number summary, which can be plotted as a boxplot.
[Figure: boxplot; mean ± s.d.]
19 Example 1
- Suppose a sample of twelve lab rats is found to have the following glucose levels:
- 3 4 4 6 6 6 8 8 9 10 12 15
- 1. Find the five-number summary of the data and construct a boxplot.
- 2. Based on the boxplot, the data set is
- a. skewed to the left
- b. roughly symmetric
- c. skewed to the right
Answer to 1: Min = 3, Q1 = 5, M = 7, Q3 = 9.5, Max = 15
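The five-number summary can be checked with a small Python sketch. It uses the hand rule of taking the median of each half of the sorted data, which is assumed to be the quartile method intended here because it reproduces the stated answer; the function names are hypothetical.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, M, Q3, max) using the median-of-halves rule for quartiles."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                                         # below the median's location
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]   # above the median's location
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


glucose = [3, 4, 4, 6, 6, 6, 8, 8, 9, 10, 12, 15]
print(five_number_summary(glucose))
```

The upper whisker (15 - 9.5 = 5.5) is much longer than the lower whisker (5 - 3 = 2), which is the pattern a boxplot shows when the data are skewed to the right.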
20 Example 2
Suppose a researcher is recording fifty values in a database. Suppose she records every value correctly except the lowest value, which is supposed to be 2 but which she incorrectly types as 200. In the above scenario, the effect of the researcher's error on the mean and median is:
a. Her calculated mean will be lower than it would have been without the error, but her calculated median will remain unchanged.
b. Her calculated mean will be higher than it would have been without the error, but her calculated median will remain unchanged.
c. Her calculated mean will remain unchanged, but her calculated median will be lower than it would have been without the error.
d. Her calculated mean will remain unchanged, but her calculated median will be lower than it would have been without the error.
21 Example 2
In the above scenario, the effect of the researcher's error on the standard deviation is:
a. The error will not affect the standard deviation.
b. Her calculated standard deviation will be smaller than it would have been without the error.
c. Her calculated standard deviation will be larger than it would have been without the error.
d. The error is likely to make the calculated standard deviation negative.
22 Example 3
- There are three children in a room, ages 3, 4, and 5. If a four-year-old child enters the room, the
- mean age and variance will stay the same.
- mean age and variance will increase.
- mean age will stay the same but the variance will increase.
- mean age will stay the same but the variance will decrease.
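One way to check the choices is to compute both statistics before and after the fourth child enters. A minimal sketch, assuming the sample variance (dividing by n - 1) is the one intended; the helper names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


before = [3, 4, 5]          # ages of the three children in the room
after = before + [4]        # a four-year-old enters

print(mean(before), sample_variance(before))
print(mean(after), sample_variance(after))
```

The new value equals the mean, so it adds nothing to the sum of squared deviations while the divisor grows. The same conclusion holds if the variance is computed by dividing by n instead.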