Chapter 1: Exploring Data - PowerPoint PPT Presentation

Slides: 139
Provided by: C937
Transcript and Presenter's Notes

Title: Chapter 1: Exploring Data


1
Chapter 1: Exploring Data
1.1 Displaying Distributions with Graphs
2
Types of Graphs
Categorical: Bar Chart, Pie Chart
Quantitative: Dotplot, Stemplot, Histogram, Ogive, Time Plot
3
Bar graph
Displays categorical variables
How to construct a bar graph
Step 1: Label your axes and title your graph.
Step 2: Scale your axes.
Step 3: Leave spaces between the bars.
4
Side-by-Side bar graph
Compares two categorical variables for the same individuals
5
Example 1: The table shows the results of a poll
asking adults whether they were looking forward
to the Super Bowl game, looking forward to the
commercials, or didn't plan to watch.

              Male  Female  Total
Game           279     200    479
Commercials     81     156    237
Won't Watch    132     160    292
Total          492     516   1008

Construct a side-by-side bar chart of their
preference based on gender. Note any trends that
appear.
6
[Side-by-side bar graph: Reason Looking Forward to Super Bowl. Bars for Game, Commercials, and Won't Watch, grouped by Male and Female; vertical axis from 0 to 300.]
Males overwhelmingly watch the Super Bowl for the
game, whereas women seem mixed as to why they want
to watch it.
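A side-by-side comparison is easier to judge when each group is put on the same scale. The sketch below is one possible way to do that in Python: it converts the poll counts from Example 1 into percents within each gender. The function name `column_percents` and the dictionary layout are illustrative choices, not part of the original slides.

```python
# Poll counts from Example 1, keyed by response and then by gender.
counts = {
    "Game":        {"Male": 279, "Female": 200},
    "Commercials": {"Male": 81,  "Female": 156},
    "Won't Watch": {"Male": 132, "Female": 160},
}


def column_percents(table, column):
    """Percent of one column's total that falls in each row."""
    total = sum(row[column] for row in table.values())
    return {label: 100 * row[column] / total for label, row in table.items()}


for gender in ("Male", "Female"):
    shares = column_percents(counts, gender)
    summary = ", ".join(f"{label} {pct:.1f}%" for label, pct in shares.items())
    print(f"{gender}: {summary}")
```

Comparing percents rather than raw counts matters here because the two groups have different sizes (492 males, 516 females).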
7
Describing Quantitative Distributions
When describing a graph, use CUSS
C - Center
Mean: Average value; add up the values, then divide by how many there are
Mode: Most frequent number. There can be many modes
Median: Number in the center when the data is lined up in order
8
Calculator Tip
To calculate mean and median:
STAT → EDIT → type in data → exit → STAT → CALC → 1-Var Stats → L1
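For readers without the calculator, the three measures of center can be checked with Python's standard `statistics` module. This is only a sketch; the five-value list is a small illustrative data set, not taken from a textbook exercise.

```python
import statistics

data = [1, 1, 2, 2, 3]  # small illustrative list

mean = statistics.mean(data)        # add up the values, divide by how many
median = statistics.median(data)    # middle value once the data is sorted
modes = statistics.multimode(data)  # every most-frequent value (can be several)

print("mean:", mean)
print("median:", median)
print("modes:", modes)
```

`multimode` is used instead of `mode` because, as noted on the slide, a data set can have more than one mode.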
9
Describing Quantitative Distributions
When describing a graph, use CUSS
U - Unusual points
Any data points that stand out as different
Don't call them outliers yet!
10
Describing Quantitative Distributions
When describing a graph, use CUSS
S - Shape
Symmetric: Fold it in half and the two sides match up
Bell/Normal: Special case, don't say it yet!
Uniform: All the same frequencies
11
S - Shape
Unimodal: One peak in the data
Bimodal: Two peaks in the data
12
S - Shape
Gaps: Space between the data
Cluster: Several data points grouped together
13
S - Shape
Skewed Right: The tail (or an unusual point) stretches to the right
Skewed Left: The tail (or an unusual point) stretches to the left
14
Describing Quantitative Distributions
When describing a graph, use CUSS
S - Spread
Range: Distance between largest and smallest values.
Range = Maximum - Minimum
Homogeneous: Data is all in a similar space (small spread)
15
(No Transcript)
16
Dotplot
Dots are used to keep count of the frequency of
each number
How to construct a dotplot
Step 1: Label your axis and title your graph.
Step 2: Mark a dot above the corresponding value
for each observation.
17
Example 2: The data below give the number of
hurricanes classified as major hurricanes in the
Atlantic Ocean each year from 1944 through 2006,
as reported by NOAA.
3 2 1 2 4 3 7 2 3 3 2 5 2 2 4 2 2
6 0 2 5 1 3 1 0 3 2 1 0 1 2 3 2 1
2 2 2 3 1 1 1 3 0 1 3 2 1 2 1 1 0
5 6 1 3 5 3 3 2 3 6 7 2 6 8
a. Make a dotplot of the data.

18
[Dotplot: Number of Hurricanes Classified as a Major Hurricane (1944-2006); horizontal axis from 0 to 8.]
b. Describe what you see in a few sentences.
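As a rough way to see the dotplot without drawing it, the sketch below tallies the hurricane values listed in Example 2 and prints one row of marks per value (a dotplot turned on its side). The helper name `dotplot_rows` and the use of the letter "o" as the dot are arbitrary choices for illustration.

```python
from collections import Counter

# Values exactly as listed in Example 2.
hurricanes = [
    3, 2, 1, 2, 4, 3, 7, 2, 3, 3, 2, 5, 2, 2, 4, 2, 2,
    6, 0, 2, 5, 1, 3, 1, 0, 3, 2, 1, 0, 1, 2, 3, 2, 1,
    2, 2, 2, 3, 1, 1, 1, 3, 0, 1, 3, 2, 1, 2, 1, 1, 0,
    5, 6, 1, 3, 5, 3, 3, 2, 3, 6, 7, 2, 6, 8,
]


def dotplot_rows(values):
    """One text row per value from min to max, with one 'o' per observation."""
    counts = Counter(values)
    return [f"{v:>2} | " + "o" * counts[v]
            for v in range(min(values), max(values) + 1)]


for row in dotplot_rows(hurricanes):
    print(row)
```

Reading the rows top to bottom gives the same information as reading a vertical dotplot left to right: where the peak is, how far the tail stretches, and whether any values stand apart.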
19
  • A dotplot is a simple display. It just places a
    dot along an axis for each case in the data.
  • The dotplot to the right shows Kentucky Derby
    winning times, plotting each race as its own dot.
  • You might see a dotplot displayed horizontally or
    vertically.

20
Guidelines for constructing stemplots (stem-and-leaf
plots)
1. Put the data in order from smallest to largest.
2. Separate each value into a STEM and a LEAF. The
leaf is a single digit, the rightmost digit of
the number. The stem consists of everything to
the left of the leaf.
3. Stems go in a vertical column from small to
large, and a vertical line is drawn to the
right of the stems.
4. Leaves are written to the right of their
stems from small to large.
21
Back-to-Back Stemplots
To compare two different sets of data
Split Stemplots
To spread out the data and see more trends when
values are grouped together. Each stem is split
into two rows: leaves 0-4 and leaves 5-9.
22
Example 3: The data below give the caffeine
content (in milligrams) for an 8-ounce
serving of popular soft drinks.
20 15 23 29 23 15 23 31 28 35 37 27 24 26 47 28 24 28 28
16 38 36 35 37 27 33 37 25 47 27 29 26 43 43 28 35 31 25
a. Construct a stemplot.
b. Construct a split stemplot.

23
Caffeine per 8 oz of soda
a.
1 | 5 5 6
2 | 0 3 3 3 4 4 5 5 6 6 7 7 7 8 8 8 8 8 9 9
3 | 1 1 3 5 5 5 6 7 7 7 8
4 | 3 3 7 7
Key: 1 | 5 = 15 milligrams
b.
1 | 5 5 6
2 | 0 3 3 3 4 4
2 | 5 5 6 6 7 7 7 8 8 8 8 8 9 9
3 | 1 1 3
3 | 5 5 5 6 7 7 7 8
4 | 3 3
4 | 7 7
c. Differences?
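The stemplot guidelines can be sketched in a few lines of Python. The function below is a hypothetical helper (the name `stemplot` and its `split` flag are assumptions for illustration); it assumes whole-number data, uses the ones digit as the leaf, and optionally splits each stem into leaves 0-4 and 5-9.

```python
from collections import defaultdict

# Caffeine values (mg per 8 oz) from Example 3.
caffeine = [
    20, 15, 23, 29, 23, 15, 23, 31, 28, 35, 37, 27, 24, 26, 47, 28, 24, 28, 28,
    16, 38, 36, 35, 37, 27, 33, 37, 25, 47, 27, 29, 26, 43, 43, 28, 35, 31, 25,
]


def stemplot(values, split=False):
    """Return stemplot rows as strings; leaf = ones digit, stem = the rest.

    With split=True each stem gets two rows: leaves 0-4, then leaves 5-9.
    Empty rows between the first and last used rows are kept.
    """
    halves = (0, 1) if split else (0,)
    rows = defaultdict(list)
    for v in sorted(values):               # sorting keeps leaves in order
        stem, leaf = divmod(v, 10)
        rows[(stem, 1 if split and leaf >= 5 else 0)].append(leaf)
    first, last = min(rows), max(rows)
    keys = [(s, h) for s in range(first[0], last[0] + 1) for h in halves
            if first <= (s, h) <= last]
    return [(f"{s} | " + " ".join(map(str, rows[(s, h)]))).rstrip()
            for s, h in keys]


print("\n".join(stemplot(caffeine)))
print()
print("\n".join(stemplot(caffeine, split=True)))
```

Keeping empty rows in the middle of the plot is deliberate: dropping them would hide gaps in the distribution.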
24
www.whfreeman.com/tps3e
1-Var Stats
25
(No Transcript)
26
Most people believe that you need to drink coffee
or an energy drink to get a good buzz off of the
caffeine. Below is a table with common caffeine
levels of tea, coffee, and energy drinks.
Coffee: 133 160 150 103 150 93 150 115 75 75 40
Energy Drink: 160 144 100 100 95 83 80 80 80 79 74 50 48
d. Make a back-to-back stemplot. Comment on the
difference in caffeine levels between coffee and
energy drinks.
27
 Coffee |    | Energy Drink
      0 |  4 | 8
        |  5 | 0
        |  6 |
    5 5 |  7 | 4 9
        |  8 | 0 0 0 3
      3 |  9 | 5
      3 | 10 | 0 0
      5 | 11 |
        | 12 |
      3 | 13 |
        | 14 | 4
  0 0 0 | 15 |
      0 | 16 | 0
Key: 1 | 5 = 15 mg
28
http//www.cspinet.org/new/cafchart.htm
29
http//www.cspinet.org/new/cafchart.htm
30
[Figure: splitting the values 562, 56.2, and 5.62 into stem and leaf, for example 56 | 2.]
31
Back-to-Back Stemplots
To compare two different sets of data
565
562
572
580
577
565
5 2 56 5 57 2 7 0 58
32
Split Stemplots
To spread out the data to see more trends if they
are grouped together. Leaves will split from 0-4
and 5-9.
565
562
572
580
577
565
2 56 5 56 5 57 2 57 7
0 58
33
[Figure: stemplot with leaves counted in from both ends toward the median.]
34
Calculator Tip
Sort values from smallest to largest:
STAT → EDIT → type in data → exit → STAT → SortA → L1
35
Calculator Tip
Clearing Lists
All lists: MEM → ClrAllLists → ENTER
One list: STAT → EDIT → highlight list name → CLEAR
36
Calculator Tip
Deleted a list?
STAT → SetUpEditor → ENTER
37
Calculator Tip
Save a list?
L1 → STO → any name or letter
To retrieve later: 2nd → LIST
38
Calculator Tip
Remove a number from a list?
Line up the number you want to delete, hit DEL
39
Histogram
1. Divide the range of data into classes of
equal width.
2. Count the number of observations in each
class. Ensure no one number falls into two
classes
3. Label and scale the axes and title your graph.

4. Draw a bar that represents the count in each
class. The base of a bar should cover its
class, and the bar height is the class
count. Leave no horizontal space between
the bars unless the class is empty.
40
Make a histogram. Pg. 59
Calculator Tip
STAT → EDIT → type in data → exit → STATPLOT → 1 →
On → histogram → L1 → Freq 1 → ZOOM → ZoomStat (9)
41
To adjust the classes: WINDOW
Xmin: Lowest value
Xmax: Highest value
Xscl: Scale on x-axis (width of bars)
Ymin: -0.2 typically
Ymax: Highest frequency (height of bars)
Yscl: Scale on y-axis
42
Ex. 4: Describe the distribution of the graph.
C: 4-5 words
U: 12 words
S (shape): Unimodal, slight skew right
S (spread): 1 to 12, range 11
43
Example 5: An executive finds that the subscriptions
(in millions of people) of the 20 leading
American magazines are as follows:

Magazine                    Subscriptions (millions)
Reader's Digest             17.9
TV Guide                    17.1
National Geographic         10.6
Modern Maturity             9.3
AARP News Bulletin          8.8
Better Homes and Gardens    8
Family Circle               7.2
Woman's Day                 7
McCall's                    6.4
Good Housekeeping           5.4
Ladies' Home Journal        5.3
National Enquirer           4.7
Time                        4.6
Playboy                     4.2
Redbook                     4
The Star                    3.7
Penthouse                   3.5
Newsweek                    3
Cosmopolitan                3
People Weekly               2.8

Make a histogram of the number of subscriptions
using classes of width 2 (million), with frequency
on the vertical axis. Then describe the graph.
44
[Histogram: Circulation in millions of people of American Magazines. Horizontal axis: Circulation (in millions), 2 to 20; vertical axis: Frequency, 1 to 8.]
Describe the features of the graph in detail.
C: mean 6.825, median 5.35
U: 17.1, 17.9
S (shape): Skewed to the right, unimodal
S (spread): 2.8 to 17.9, range of 15.1
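The class counts behind a histogram like this can be tallied directly. The sketch below uses the 20 subscription values from Example 5 with classes of width 2 starting at 2; the function name `class_counts` and the convention that each class includes its lower boundary but not its upper one are assumptions made for illustration.

```python
import statistics

# Subscriptions (millions) from Example 5.
circulation = [17.9, 17.1, 10.6, 9.3, 8.8, 8, 7.2, 7, 6.4, 5.4,
               5.3, 4.7, 4.6, 4.2, 4, 3.7, 3.5, 3, 3, 2.8]


def class_counts(values, width, start):
    """Count observations in classes [start, start+width), [start+width, ...)."""
    n_classes = int((max(values) - start) // width) + 1
    counts = [0] * n_classes
    for v in values:
        counts[int((v - start) // width)] += 1   # each value lands in one class
    return counts


counts = class_counts(circulation, width=2, start=2)
for i, c in enumerate(counts):
    lo = 2 + 2 * i
    print(f"{lo:>2} to <{lo + 2:<2} | " + "#" * c)

print("mean:", statistics.mean(circulation))
print("median:", statistics.median(circulation))
```

Because every value falls in exactly one half-open class, no observation is counted twice, which is the requirement in step 2 of the histogram instructions.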
45
Height of NBA Players
46
http//bcs.whfreeman.com/tps3e
Page 50 applets One-variable Statistical
calculator
  • How do you determine how many classes to make?
  • When is it good to split the stems on a stemplot?

47
HW
P2 1.1 Types of Graphs Bar graph Dotplot Stemplot Histogram Describing a Graph 19 7, 9
P2 1.1 Types of Graphs Bar graph Dotplot Stemplot Histogram Describing a Graph 47 57-58 109 3(ab only) 11 51
48
Day 3
1.1 1.2
49
Relative Cumulative Frequency Graph (Ogive)
Shows relative standing of an observation
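To make "relative cumulative frequency" concrete, the sketch below computes the points an ogive would connect. It reuses the soft-drink caffeine values from Example 3 only because they are already at hand; the class boundaries (10, 20, ..., 50) and the function name `ogive_points` are assumptions chosen for illustration.

```python
# Caffeine values (mg per 8 oz) from Example 3, reused as sample data.
caffeine = [
    20, 15, 23, 29, 23, 15, 23, 31, 28, 35, 37, 27, 24, 26, 47, 28, 24, 28, 28,
    16, 38, 36, 35, 37, 27, 33, 37, 25, 47, 27, 29, 26, 43, 43, 28, 35, 31, 25,
]


def ogive_points(values, start, width, n_classes):
    """(class boundary, percent of observations below it) for each boundary."""
    n = len(values)
    points = [(start, 0.0)]                      # nothing lies below the start
    for k in range(1, n_classes + 1):
        upper = start + k * width
        below = sum(1 for v in values if v < upper)
        points.append((upper, 100 * below / n))
    return points


for boundary, pct in ogive_points(caffeine, start=10, width=10, n_classes=4):
    print(f"below {boundary} mg: {pct:.1f}%")
```

Reading the plotted points answers "what percent of observations fall below this value?", which is the relative standing the slide refers to; the curve never decreases and ends at 100%.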
50
Example 6: The President of the United States
has to be at least 35 years old and be born in
America. Below is an ogive showing the relative
cumulative frequency of the ages at which previous
presidents were inaugurated.
a. What percent of presidents were younger than 60?
80%
51
b. What percent of presidents were between 50 and
55?
30%
52
c. There is a horizontal line between 35 and 40
years of age. What does that mean?
No presidents were less than 40 years old
53
d. What is the median age of the presidents at
inauguration?
55
54
e. President Obama was 47 when he was
inaugurated. What percent of presidents were
older than him?
85%
55
Time Plots
Plot each observation against the time at which
it was measured. Always mark the time scale on
the horizontal axis and the variable being
measured on the vertical axis.
Trend: A common overall pattern.
Seasonal Variations: A pattern that repeats itself
at regular time intervals
56
Ex. 7 Identify any trends and describe the
time plot.
Seems to fluctuate, peaking in 1983
57
(No Transcript)
58
Chapter 1: Exploring Data
1.2 Describing Distributions with Numbers
59
Mean: The average of a set of data. Add
the values in the data set and divide by the
number of observations.
For n observations,
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
or
$\bar{x} = \frac{1}{n} \sum x_i$
60
Ex 8: Find the mean for the two sets of data.
Data set A: 1 1 2 2 3
Data set B: 1 1 2 2 500,000
Data set A: mean = 9/5 = 1.8
Data set B: mean = 500,006/5 = 100,001.2
What happened?
The mean is strongly influenced by unusual values.
61
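Variance
Average of the squares of the deviations of the
observations from their mean
$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$
or
$s^2 = \frac{1}{n - 1} \sum (x_i - \bar{x})^2$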
62
Standard Deviation
The square root of the variance
Measures roughly the typical distance of the values
from the mean.
Degrees of Freedom
Dividing by n - 1
63
Calculator Tip
Standard Deviation
1-var stats L1
64
Ex 9: Calculate the standard deviation by hand.
Data set: 6, 4, 4, 3, 2, 6, 10
Mean = 5
$(6-5)^2 + (4-5)^2 + (4-5)^2 + (3-5)^2 + (2-5)^2 + (6-5)^2 + (10-5)^2$
$= (1)^2 + (-1)^2 + (-1)^2 + (-2)^2 + (-3)^2 + (1)^2 + (5)^2 = 42$
$s = \sqrt{42 / (7 - 1)} = \sqrt{7} \approx 2.64575$
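The hand calculation above can be mirrored step by step in Python. This is a sketch of the same arithmetic, dividing by n - 1 as the slides do; the variable names are illustrative.

```python
import math

data = [6, 4, 4, 3, 2, 6, 10]

n = len(data)
mean = sum(data) / n                              # 35 / 7
squared_devs = [(x - mean) ** 2 for x in data]    # squared deviations from the mean
variance = sum(squared_devs) / (n - 1)            # divide by n - 1, not n
sd = math.sqrt(variance)                          # back to the original units

print("mean:", mean)
print("sum of squared deviations:", sum(squared_devs))
print("variance:", variance)
print("standard deviation:", round(sd, 5))
```

The result should agree with what `1-Var Stats` reports as Sx, since that value also uses the n - 1 divisor.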
65
Example 10: Using the numbers 1-10, choose 4
numbers so the standard deviation will be the
smallest. Then choose 4 numbers so the standard
deviation will be the largest. (Repeats are ok.)
Smallest: 1, 1, 1, 1 (Sx = 0)
Largest: 1, 1, 10, 10 (Sx = 5.196)
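One way to check the answer to Example 10 is simply to try every possible choice. The sketch below enumerates all selections of 4 numbers from 1 to 10 (repeats allowed) and picks the ones with the smallest and largest sample standard deviation. Brute force is an illustrative approach here, not the method from the slides.

```python
from itertools import combinations_with_replacement
from statistics import stdev

# Every way to choose 4 numbers from 1..10 when repeats are allowed.
choices = list(combinations_with_replacement(range(1, 11), 4))

smallest = min(choices, key=stdev)   # first choice with the minimum spread
largest = max(choices, key=stdev)

print("smallest:", smallest, "Sx =", round(stdev(smallest), 3))
print("largest: ", largest, "Sx =", round(stdev(largest), 3))
```

Any four identical numbers give a standard deviation of 0, so the smallest case is not unique; `min` just reports the first one it meets. The largest case puts two values at each extreme, as far from the mean as possible.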
66
http//www.stat.tamu.edu/west/ph/stddev.html
67
Example 11 Which graph will have the larger
standard deviation? Why?
a. b. c.
d. e.
68
  • Properties of the standard deviation and
    variance
  • Sensitive to outliers.
  • Some deviations are positive and some are
    negative (that's why we square them!). Otherwise,
    they would add up to zero and tell us nothing
    about the spread around the mean. Then, to get
    back to the original units, we take the square root.
69
Properties of the standard deviation and variance
  • Standard deviation is at least ZERO, or
    greater, but never negative.
  • Values that are very close together have a
    small standard deviation and those far
    apart have a large standard deviation.
70
1.1 1.2 Ogives Time Plot Mean Variance Standard Deviation 64-69 89 101 13(ab only), 22, 23, 26 39, 43 54 Curriculum Night
71
Day 4 1.2
72
Median: The midpoint, or value where half of the
data is above the median and half is below the
median. (50% mark)
  • To find the median
  • Put all the data in order from smallest to
    largest
  • Cancel off the end data points until you find
    the middle

73
Resistant measure
A good estimate even when there are very unusual
values.
74
Ex 12: Find the median for the two sets of data.
Data set A: 1 1 2 2 3
Data set B: 1 1 2 2 500,000
Data set A: M = 2
Data set B: M = 2
Which one is a resistant measure? Mean or Median?
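The contrast between the two measures is easy to see by computing both for each data set. A minimal sketch using Python's `statistics` module:

```python
from statistics import mean, median

set_a = [1, 1, 2, 2, 3]
set_b = [1, 1, 2, 2, 500_000]   # same as A except for one extreme value

for name, data in (("A", set_a), ("B", set_b)):
    print(f"Data set {name}: mean = {mean(data)}, median = {median(data)}")
```

Changing a single value from 3 to 500,000 moves the mean enormously while the median stays where it was, which is what "resistant" means.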
75
pth percentile
p percent of the observations fall at or below it
76
Quartiles
25th percentile = first quartile = Q1
50th percentile = median = Q2
75th percentile = third quartile = Q3
Five-Number Summary
Min, Q1, M, Q3, Max
77
Boxplot
Uses the five-number summary. A box is drawn
connecting Q1 and Q3 with a line through the
median. Whiskers are drawn to the max and min.
[Figure: boxplot on a number line with min, Q1, med, Q3, and max labeled; each of the four sections holds 25% of the data.]
78
Interquartile Range
IQR = Q3 - Q1
Outliers
Data that is away from the majority of points
To Determine
Lower Outlier: below Q1 - 1.5(IQR)
Upper Outlier: above Q3 + 1.5(IQR)
All values should be between these two numbers
79
[Figure: modified boxplot on a number line with min, Q1, med, Q3, and max labeled and outliers plotted as separate points.]
Keep in mind, you don't know how much data is in
a boxplot!
80
(No Transcript)
81
Calculator Tip
Boxplots.
Pg. 81
STAT → EDIT → type data → exit → STATPLOT → 1 → On →
boxplot (with or without outliers) → L1
82
Calculator Tip
5-Number Summary
Pg. 81
STAT → CALC → 1-Var Stats → L1
83
Ex 13: The fuel economy of 2004 vehicles is
given.
13 15 16 16 17 19 20 22 23 23 23 24 25 25 26 28 28
28 29 32 66
a. Determine the 5-number summary.
Min: 13
Q1: 18
Med: 23
Q3: 28
Max: 66
84
b. Calculate the range and IQR for the data set.
Range = 66 - 13 = 53
IQR = 28 - 18 = 10
85
c. Make a box plot using the 5-number summary.
[Boxplot on a number line from 10 to 70.]
d. Describe the shape, center, and spread.
C: Median 23
U: 66
S (shape): Skewed right
S (spread): Range 53, IQR 10
86
e. Are there any potential outliers using the
1.5(IQR) criterion?
Q1 - 1.5(IQR) = 18 - 1.5(10) = 18 - 15 = 3
Q3 + 1.5(IQR) = 28 + 1.5(10) = 28 + 15 = 43
Yes, 66 is above 43.
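Parts a, b, and e of Example 13 can be reproduced with a short script. The sketch below finds the quartiles as the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd), which is the convention that matches the answers above; other software may use slightly different quartile rules. The function names are illustrative.

```python
from statistics import median

# Fuel economy values from Example 13.
mpg = [13, 15, 16, 16, 17, 19, 20, 22, 23, 23, 23, 24, 25, 25,
       26, 28, 28, 28, 29, 32, 66]


def five_number_summary(values):
    """Min, Q1, median, Q3, max with quartiles as medians of the two halves."""
    xs = sorted(values)
    half = len(xs) // 2
    lower, upper = xs[:half], xs[len(xs) - half:]   # middle value excluded if odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def outlier_fences(q1, q3):
    """Cutoffs from the 1.5(IQR) rule: anything outside is a potential outlier."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


low, q1, med, q3, high = five_number_summary(mpg)
lower_fence, upper_fence = outlier_fences(q1, q3)
outliers = [x for x in mpg if x < lower_fence or x > upper_fence]

print("five-number summary:", (low, q1, med, q3, high))
print("range:", high - low, " IQR:", q3 - q1)
print("fences:", (lower_fence, upper_fence))
print("potential outliers:", outliers)
```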
87
f. Construct a modified boxplot to account for
the outlier.
[Modified boxplot on a number line from 10 to 70, with 66 plotted as a separate point.]
88
Ozone and Outliers: The 'ozone hole' above
Antarctica provides the setting for one of the
most infamous outliers in recent history. It is a
great story to tell students who wantonly delete
outliers from a dataset merely because they are
outliers. In 1985 three researchers (Farman,
Gardiner and Shanklin) were puzzled by some data
gathered by the British Antarctic Survey showing
that ozone levels for Antarctica had dropped 10%
below normal January levels. The puzzle was why
the Nimbus 7 satellite, which had instruments
aboard for recording ozone levels, hadn't
recorded similarly low ozone concentrations. When
they examined the data from the satellite it
didn't take long to realize that the satellite
was in fact recording these low concentration
levels and had been doing so for years. But
because the ozone concentrations recorded by the
satellite were so low they were being treated as
outliers by a computer program and discarded! The
Nimbus 7 satellite had in fact been gathering
evidence of low ozone levels since 1976. The
damage to our atmosphere caused by
chlorofluorocarbons went undetected and untreated
for up to nine years because outliers were
discarded without being examined. Moral: Don't
just toss out outliers, as they may be the most
valuable members of a dataset.
89
(No Transcript)
90
Weight of NBA Players
91
(No Transcript)
92
  • Compare the histogram and boxplot for daily wind
    speeds
  • How does each display represent the distribution?

93
Matching Histograms and Boxplots Match each
histogram with its boxplot, by writing the letter
of the boxplot in the space provided.
94
1.
D
95
A
2.
96
3.
C
97
4.
E
98
5.
B
99
1970 Draft
Was the draft fair?
100
1971 Draft
101
(No Transcript)
102
(No Transcript)
103
(No Transcript)
104
(No Transcript)
105
1.2 Percentile Median Quartiles Boxplot IQR Determine outlier 82-84 106-107 33, 36, 37 61(a only), 62
106
Day 5 1.2
107
Comparing Distributions
Make sure you actually compare!
Don't just state CUSS; compare the values.
108
(No Transcript)
109
change in population from 1990 to 2000
110
http//www.ruf.rice.edu/lane/stat_sim/descriptive
/
Mean and median applet.
www.whfreeman.com/tps3e
Pg. 73
Mean and median applet.
111
If the data is uniform or symmetric, use
Center: Mean
Spread: Standard deviation
If the data is skewed, use
Center: Median
Spread: Five-number summary, range, IQR
112
(No Transcript)
113
Who's Counting: It's Mean to Ignore the Median
Reading Economic Numbers from Democratic,
Republican Points of View
Aug. 6, 2006 - Believe it or not, the difference
in the way the Democrats and Republicans react to
the performance of the U.S. economy is clarified by
a mathematical distinction studied in elementary
school. The distinction is between the mean,
which the Republicans emphasize, and the median,
which the Democrats prefer. The relevance of
this distinction is apparent in the just-released
figures on the U.S. economy for 2004, the latest
year for which there is complete data. The
Republicans chortle that the economy grew at a
healthy rate of 4.2 percent. (It's slowed since
then.) The Democrats point to data from the
Census Bureau for the same year (and earlier as
well), indicating that the real median family
income fell and that poverty increased.
114
Example 14: Should you use the mean or median to
discuss the center?
  1. Average price of home: Median
  2. Average age: Mean
  3. Average height: Mean
  4. Average gas mileage for all cars: Median
115
Linear Transformation
A change in the measurement unit in which you add
a constant to the data, multiply the data by a
constant, or both
116
Matching Histograms and Summary
Statistics Match each histogram with a set of
summary statistics, by writing the letter in the
space provided.
117
1. D. mean 10.2, standard deviation 4.1, median 11.9, IQR 6.8
118
2. A. mean 10.5, standard deviation 1.4, median 10.7, IQR 2.0
119
3. B. mean 10.1, standard deviation 2.7, median 10.1, IQR 4.2
120
4. E. mean 8.8, standard deviation 2.8, median 8.0, IQR 1.9
121
5. C. mean 10.2, standard deviation 2.1, median 10.5, IQR 2.5
122
Example 15: Consider the following data set: 1,
1, 1, 3, 4, 4, 5, 5. Transform the data.

          Original Data
Mean      3
Median    3.5
S.D.      1.77
Q1        1
Q3        4.5
IQR       3.5
Range     4
123
[Example 15: dotplot of the original data on an axis from 1 to 5.]
124
[Example 15: boxplot of the original data on an axis from 1 to 5.]
125
Example 15: Consider the following data set: 1,
1, 1, 3, 4, 4, 5, 5. Transform the data.

          Original Data   Multiply by 3
Mean      3               9
Median    3.5             10.5
S.D.      1.77            5.31
Q1        1               3
Q3        4.5             13.5
IQR       3.5             10.5
Range     4               12
126
[Example 15: dotplots of the original data (axis from 1 to 5) and of the data multiplied by 3 (axis from 3 to 15).]
127
[Example 15: boxplots of the original data (axis from 1 to 5) and of the data multiplied by 3 (axis from 3 to 15).]
128
Example 15: Consider the following data set: 1,
1, 1, 3, 4, 4, 5, 5. Transform the data.

          Original Data   Multiply by 3   Add 4
Mean      3               9               7
Median    3.5             10.5            7.5
S.D.      1.77            5.31            1.77
Q1        1               3               5
Q3        4.5             13.5            8.5
IQR       3.5             10.5            3.5
Range     4               12              4
129
[Example 15: dotplots of the original data (axis from 1 to 5) and of the data after adding 4 (axis from 1 to 9).]
130
[Example 15: boxplots of the original data (axis from 1 to 5) and of the data after adding 4 (axis from 1 to 9).]
131
Conclusion
Multiply: Changes both center and spread.
Add: Changes the mean and five-number summary.
Spread doesn't change.
Middle always Moves
Spread Sometimes Shifts
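The conclusion can be checked numerically on the Example 15 data. The sketch below builds the "multiply by 3" and "add 4" versions of the list and summarizes each; the helper name `describe` is an illustrative choice.

```python
from statistics import mean, median, stdev

original = [1, 1, 1, 3, 4, 4, 5, 5]
times3 = [3 * x for x in original]   # multiply every value by 3
plus4 = [x + 4 for x in original]    # add 4 to every value


def describe(values):
    """Center (mean, median) and spread (sd, range) of a list."""
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),
        "range": max(values) - min(values),
    }


for label, data in (("original", original), ("times 3", times3), ("plus 4", plus4)):
    summary = describe(data)
    print(label, {k: round(v, 2) for k, v in summary.items()})
```

Multiplying scales every summary value by 3, while adding shifts the measures of center by 4 and leaves the standard deviation and range untouched.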
132
Mean
Standard Deviation
133
  • Example 16
  • True or False.
  • a. If you add 7 to each entry on a list, that adds
    7 to the mean. TRUE
  • b. If you add 7 to each entry on a list, that adds
    7 to the standard deviation. FALSE
  • c. If you double each entry on a list, that
    doubles the mean. TRUE
134
  • Example 16
  • True or False.
  • d. If you double each entry on a list, that
    doubles the standard deviation. TRUE
  • e. Multiplying each entry on a list changes the
    mean. TRUE
  • f. Multiplying each entry on a list changes the
    standard deviation. TRUE
135
  • Example 16
  • True or False.
  • g. Adding to each entry on a list changes the
    mean. TRUE
  • h. Adding to each entry on a list changes the
    standard deviation. FALSE
136
Example 17: A college professor gave a test to
his students. The test had five questions, each
worth 20 points. The summary statistics for the
students' scores on the test are below. After
grading the test, the professor realized that,
because he had made a typographical error in
question number 2, no student was able to answer
the question. So he decided to adjust the
students' scores by adding 20 points to each one.
What will be the summary statistics for the new,
adjusted scores?

Summary Statistics for Scores
                     Original   NEW
Mean                 62         82
Median               60         80
Range                45         45
Standard Deviation   8          8
Q1                   48         68
Q3                   71         91
IQR                  23         23
137
Example 18: The summary statistics for the
property tax per property collected by one county
are below. This year, county residents voted to
increase property taxes by 2 percent to support
the local school system. What will be the
summary statistics for the new, increased
property taxes?

Summary Statistics for Property Tax
                     Original   NEW
Mean                 12,000     12,240
Median               8,000      8,160
Range                30,000     30,600
Standard Deviation   5,000      5,100
Q1                   5,000      5,100
Q3                   14,000     14,280
IQR                  9,000      9,180
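Examples 17 and 18 apply a transformation to summary statistics rather than to raw data. The sketch below encodes the rule for new = a + b(old) with b > 0: measures of location shift and scale, while measures of spread only scale. The function name `transform_summary`, the dictionary keys, and the use of `Fraction` to keep the 2 percent increase exact are all illustrative assumptions. The sketch takes the smaller of the two listed quartile values as Q1.

```python
from fractions import Fraction

LOCATION = {"mean", "median", "q1", "q3"}   # shift by a and scale by b
SPREAD = {"range", "sd", "iqr"}             # scale by b only


def transform_summary(stats, a=0, b=1):
    """Summary statistics of a + b*x, given those of x (requires b > 0)."""
    if b <= 0:
        raise ValueError("this sketch only handles b > 0")
    out = {}
    for name, value in stats.items():
        if name in LOCATION:
            out[name] = a + b * value
        elif name in SPREAD:
            out[name] = b * value
        else:
            raise KeyError(f"unknown statistic: {name}")
    return out


scores = {"mean": 62, "median": 60, "range": 45, "sd": 8,
          "q1": 48, "q3": 71, "iqr": 23}
taxes = {"mean": 12000, "median": 8000, "range": 30000, "sd": 5000,
         "q1": 5000, "q3": 14000, "iqr": 9000}

new_scores = transform_summary(scores, a=20)                 # add 20 points
new_taxes = transform_summary(taxes, b=Fraction(102, 100))   # increase by 2%

for name in scores:
    print(f"{name:>6}: scores {new_scores[name]}, taxes {new_taxes[name]}")
```

Restricting the sketch to b > 0 avoids the complication that a negative multiplier would swap the roles of Q1 and Q3.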
138
1.2 Mean vs. Median Describing a Graph Choosing a Summary Linear Transformations 55-57 74-75 82 89 97 102 110-111 7, 10 27, 31, 32 35 40, 42 45, 46 58 68, 70
Research Project Due Soon!