Title: Chapter 2 Exploring Data with Graphs and Numerical Summaries
1Chapter 2 Exploring Data with Graphs and
Numerical Summaries
- Learn:
- The Different Types of Data
- The Use of Graphs to Describe Data
- The Numerical Methods of Summarizing Data
2Section 2.1
- What are the Types of Data?
3In Every Statistical Study
- Questions are posed
- Characteristics are observed
4Characteristics are Variables
- A Variable is any characteristic that is
recorded for subjects in the study
5Variation in Data
- The terminology variable highlights the fact that
data values vary.
6Example Students in a Statistics Class
- Variables
- Age
- GPA
- Major
- Smoking Status
7Data values are called observations
- Each observation can be
- Quantitative
- Categorical
8Categorical Variable
- Each observation belongs to one of a set of
categories
- Examples
- Gender (Male or Female)
- Religious Affiliation (Catholic, Jewish, ...)
- Place of residence (Apt, Condo, ...)
- Belief in Life After Death (Yes or No)
9Quantitative Variable
- Observations take numerical values
- Examples
- Age
- Number of siblings
- Annual Income
- Number of years of education completed
10Graphs and Numerical Summaries
- Describe the main features of a variable
- For quantitative variables, the key features are
center and spread
- For categorical variables, the key feature is the
percentage in each of the categories
11Quantitative Variables
- Discrete Quantitative Variables
- Continuous Quantitative Variables
12Discrete
- A quantitative variable is discrete if its
possible values form a set of separate numbers
such as 0, 1, 2, 3, ...
13Examples of discrete variables
- Number of pets in a household
- Number of children in a family
- Number of foreign languages spoken
14Continuous
- A quantitative variable is continuous if its
possible values form an interval
15Examples of Continuous Variables
- Height
- Weight
- Age
- Amount of time it takes to complete an assignment
16Frequency Table
- A method of organizing data
- Lists all possible values for a variable along
with the number of observations for each value
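As a minimal sketch of the idea, the snippet below builds a frequency table for a categorical variable and adds the proportion and percentage for each category. The list of majors is hypothetical data made up for illustration; it does not come from the slides.

```python
from collections import Counter

# Hypothetical observations of a categorical variable (major); not from the slides.
majors = ["Biology", "History", "Biology", "Math",
          "History", "Biology", "Math", "Biology"]

counts = Counter(majors)   # number of observations for each value
n = len(majors)            # total number of observations

# One row per category: frequency, proportion (freq / n), percentage (100 * freq / n).
frequency_table = {
    category: {
        "frequency": freq,
        "proportion": freq / n,
        "percentage": 100 * freq / n,
    }
    for category, freq in counts.items()
}

for category, row in sorted(frequency_table.items()):
    print(f"{category:<10}{row['frequency']:>4}"
          f"{row['proportion']:>8.3f}{row['percentage']:>8.1f}")
```

The proportion is the category count divided by the total count, and the percentage is the proportion multiplied by 100.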
17Example Shark Attacks
18Example Shark Attacks
- What is the variable?
- Is it categorical or quantitative?
- How is the proportion for Florida calculated?
- How is the percentage for Florida calculated?
19Example Shark Attacks
- Insights: what the data tell us about shark
attacks
20Identify the following variable as categorical or
quantitative
- Choice of diet (vegetarian or non-vegetarian)
- Categorical
- Quantitative
21Identify the following variable as categorical or
quantitative
- Number of people you have known who have been
elected to political office
- Categorical
- Quantitative
22Identify the following variable as discrete or
continuous
- The number of people in line at a box office to
purchase theater tickets
- Continuous
- Discrete
23Identify the following variable as discrete or
continuous
- The weight of a dog
- Continuous
- Discrete
24Section 2.2
- How Can We Describe Data Using Graphical
Summaries?
25Graphs for Categorical Data
- Pie Chart: A circle having a slice of pie for
each category
- Bar Graph: A graph that displays a vertical bar
for each category
26Example Sources of Electricity Use in the U.S.
and Canada
27Pie Chart
28Bar Chart
29Pie Chart vs. Bar Chart
- Which graph do you prefer?
- Why?
30Graphs for Quantitative Data
- Dot Plot: shows a dot for each observation
- Stem-and-Leaf Plot: portrays the individual
observations
- Histogram: uses bars to portray the data
31Example Sodium and Sugar Amounts in Cereals
32Dotplot for Sodium in Cereals
- Sodium Data
0 210 260 125 220 290 210 140
220 200 125 170 250 150 170 70
230 200 290 180
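A rough text version of a dot plot for the sodium values can be sketched as follows. This is only an illustration: a drawn dot plot places the dots above a continuous number line, while this sketch lists only the values that actually occur.

```python
from collections import Counter

# Sodium values from the slide.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

# Count how many observations share each value; one "*" per observation.
dot_counts = Counter(sodium)
for value in sorted(dot_counts):
    print(f"{value:>4} | " + "*" * dot_counts[value])
```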
33Stem-and-Leaf Plot for Sodium in Cereal
- Sodium Data
0 210 260 125 220 290 210 140
220 200 125 170 250 150 170 70
230 200 290 180
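One way to build a stem-and-leaf plot from these values is sketched below. It assumes the stem is everything except the final digit and the leaf is the final digit (so 125 has stem 12 and leaf 5); the slide image may have used a different split.

```python
from collections import defaultdict

# Sodium values from the slide.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted list of leaves (value % 10)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

plot = stem_and_leaf(sodium)

# Print every stem between the smallest and largest, including empty ones,
# so that gaps in the data stay visible.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```

Because every leaf is printed, the individual data values can be read back from the plot.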
34Frequency Table
- Sodium Data
0 210 260 125 220 290 210 140
220 200 125 170 250 150 170 70
230 200 290 180
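For a quantitative variable with many distinct values, the frequency table usually groups the values into intervals, and a histogram draws one bar per interval. The sketch below assumes intervals of width 50 starting at 0; that width is an illustrative choice, not necessarily the one used on the slide.

```python
# Sodium values from the slide.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

width = 50  # assumed interval width (illustrative)

# Count the observations falling in each interval [start, start + width).
bins = {}
for start in range(0, 300, width):
    bins[(start, start + width)] = sum(start <= v < start + width for v in sodium)

# Print the table with a row of "#" as a sideways histogram bar.
for (lo, hi), count in bins.items():
    print(f"{lo:>3} to <{hi:<3} {count:>2}  " + "#" * count)
```

Changing `width` shows how the choice of intervals affects the apparent shape of the histogram.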
35Histogram for Sodium in Cereals
36Which Graph?
- Dot-plot and stem-and-leaf plot
- More useful for small data sets
- Data values are retained
- Histogram
- More useful for large data sets
- Most compact display
- More flexibility in defining intervals
37Shape of a Distribution
- Overall pattern
- Clusters?
- Outliers?
- Symmetric?
- Skewed?
- Unimodal?
- Bimodal?
38Symmetric or Skewed?
39Example Hours of TV Watching
40Identify the minimum and maximum sugar values
- 2 and 14
- 1 and 3
- 1 and 15
- 0 and 16
41Consider a data set containing IQ scores for the
general public
- What shape would you expect a histogram of this
data set to have?
- Symmetric
- Skewed to the left
- Skewed to the right
- Bimodal
42Consider a data set of the scores of students on
a very easy exam in which most score very well
but a few score very poorly
- What shape would you expect a histogram of this
data set to have?
- Symmetric
- Skewed to the left
- Skewed to the right
- Bimodal
43Section 2.3
- How Can We Describe the Center of Quantitative
Data?
44Mean
- The sum of the observations divided by the number
of observations
45Median
- The midpoint of the observations when they are
ordered from the smallest to the largest (or from
the largest to the smallest)
46Find the mean and median
- CO2 pollution levels in the 8 largest nations,
measured in metric tons per person
- 2.3 1.1 19.7 9.8 1.8 1.2 0.7 0.2
- Mean = 4.6, Median = 1.5
- Mean = 4.6, Median = 5.8
- Mean = 1.5, Median = 4.6
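The two definitions can be checked directly on the CO2 values. The sketch below computes the mean as the sum divided by the number of observations and uses the standard library's `statistics.median`, which averages the two middle values when the number of observations is even.

```python
import statistics

# CO2 values from the slide (metric tons per person).
co2 = [2.3, 1.1, 19.7, 9.8, 1.8, 1.2, 0.7, 0.2]

# Mean: sum of the observations divided by the number of observations.
mean = sum(co2) / len(co2)

# Median: midpoint of the ordered observations. With 8 values it is the
# average of the 4th and 5th ordered values.
median = statistics.median(co2)

print("ordered:", sorted(co2))
print("mean   =", round(mean, 1))
print("median =", round(median, 1))
```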
47Outlier
- An observation that falls well above or below the
overall set of data
- The mean can be highly influenced by an outlier
- The median is resistant: it is not strongly
affected by an outlier
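To see the difference in sensitivity, the sketch below recomputes the mean and median of the CO2 values from the earlier slide with the largest value removed. Treating 19.7 as the outlier is an assumption made here for illustration.

```python
import statistics

# CO2 values from the earlier slide (metric tons per person).
co2 = [2.3, 1.1, 19.7, 9.8, 1.8, 1.2, 0.7, 0.2]

# Assumption for illustration: treat the largest value, 19.7, as the outlier.
without_outlier = [x for x in co2 if x != 19.7]

mean_with = statistics.mean(co2)
mean_without = statistics.mean(without_outlier)
median_with = statistics.median(co2)
median_without = statistics.median(without_outlier)

# The mean shifts much more than the median when the outlier is dropped.
print(f"mean:   {mean_with:.2f} -> {mean_without:.2f}")
print(f"median: {median_with:.2f} -> {median_without:.2f}")
```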
48Mode
- The value that occurs most frequently.
- The mode is most often used with categorical data
49Section 2.4
- How Can We Describe the Spread of Quantitative
Data?
50Measuring Spread Range
- Range: the difference between the largest and
smallest observations
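As a quick illustration, the range of the cereal sodium values from the earlier slide can be computed in one line.

```python
# Sodium values from the earlier slide.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

# Range: largest observation minus smallest observation.
data_range = max(sodium) - min(sodium)
print("range =", data_range)
```

Because the range uses only the two most extreme observations, a single outlier can change it substantially.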
51Measuring Spread Standard Deviation
- Creates a measure of variation by summarizing the
deviations of each observation from the mean and
calculating an adjusted average of these
deviations
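The description above can be sketched step by step. The code assumes the usual sample formula, in which the "adjusted average" divides the sum of squared deviations by n - 1 rather than n, and it reuses the CO2 values from the earlier slide. The result is compared against the standard library's `statistics.stdev` as a check.

```python
import math
import statistics

# CO2 values from the earlier slide (metric tons per person).
co2 = [2.3, 1.1, 19.7, 9.8, 1.8, 1.2, 0.7, 0.2]

n = len(co2)
mean = sum(co2) / n

# Deviation of each observation from the mean; these always sum to about 0.
deviations = [x - mean for x in co2]

# Adjusted average of the squared deviations (divide by n - 1, assumed here).
variance = sum(d ** 2 for d in deviations) / (n - 1)

# Standard deviation: square root of the variance.
s = math.sqrt(variance)

print("sum of deviations:", round(sum(deviations), 10))
print("standard deviation:", round(s, 2))
print("matches statistics.stdev:", math.isclose(s, statistics.stdev(co2)))
```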
52Empirical Rule
- For bell-shaped data sets
- Approximately 68% of the observations fall within
1 standard deviation of the mean
- Approximately 95% of the observations fall within
2 standard deviations of the mean
- Approximately 100% of the observations fall
within 3 standard deviations of the mean
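The rule can be checked on simulated bell-shaped data. The sketch below draws 10,000 values from a normal distribution; the mean of 100 and standard deviation of 15 are hypothetical choices for illustration, and the exact shares depend on the random seed.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Simulated bell-shaped data; mean 100 and sd 15 are hypothetical choices.
data = [random.gauss(100, 15) for _ in range(10_000)]

mean = statistics.mean(data)
sd = statistics.stdev(data)

def share_within(k):
    """Fraction of observations within k standard deviations of the mean."""
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} sd: {100 * share:.1f}%")
```

For data that are skewed or bimodal, the shares can differ noticeably from 68%, 95%, and 100%.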
53Parameter and Statistic
- A parameter is a numerical summary of the
population
- A statistic is a numerical summary of a sample
taken from a population
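The distinction can be illustrated with a small simulation. The population of 5,000 ages below is hypothetical, generated only for this sketch; in practice the population is rarely fully observed, which is why the statistic is used to estimate the parameter.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: ages of 5,000 students (made up for illustration).
population = [random.randint(18, 30) for _ in range(5000)]

# Parameter: numerical summary of the whole population.
parameter = statistics.mean(population)

# Statistic: the same summary computed from a random sample of 50 students.
sample = random.sample(population, 50)
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

Drawing a different sample would give a different statistic, while the parameter stays fixed.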