Descriptive Statistics - PowerPoint PPT Presentation

Slides: 35

Provided by: Pena150

Learn more at: https://people.stat.sc.edu

Transcript and Presenter's Notes

Title: Descriptive Statistics

1
Descriptive Statistics

Lecture 02
Tabular and Graphical Presentation of Data and
Measures of Locations

2
Presentation of Qualitative Variables

The simplest way of presenting/summarizing a
qualitative variable is by using a frequency
table, which shows the frequency of occurrence of
each of the different categories.
Such a table could also include the relative
frequency, which indicates the proportion or
percentage of occurrence of each of the
categories.
The frequency table could then be pictorially
represented by a bar graph or a pie diagram.

3
An Example

A manufacturer of jeans has plants in California
(CA), Arizona (AZ), and Texas (TX). A sample of
25 pairs of jeans was randomly selected from a
computerized database, and the state in which
each was produced was recorded. The data are as
follows:
CA AZ AZ TX CA CA CA TX TX TX AZ AZ CA AZ TX CA
AZ TX TX TX CA AZ AZ CA CA
Quite uninformative at this stage!
Need to summarize to reveal information.

4
The Frequency Table
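
The frequency table for the jeans example can be tallied directly from the 25 recorded states. The sketch below is one minimal way to do it; the helper name `frequency_table` is an illustrative choice, not something from the lecture.

```python
from collections import Counter

# The 25 sampled states, exactly as listed in the example.
states = ("CA AZ AZ TX CA CA CA TX TX TX AZ AZ CA AZ TX CA "
          "AZ TX TX TX CA AZ AZ CA CA").split()

def frequency_table(values):
    """Return {category: (frequency, relative frequency)}."""
    counts = Counter(values)
    n = len(values)
    return {cat: (k, k / n) for cat, k in counts.items()}

table = frequency_table(states)
print("State  Freq  RelFreq")
for cat in sorted(table):
    freq, rel = table[cat]
    print(f"{cat:5s}  {freq:4d}  {rel:7.2f}")
```

The relative frequencies are the proportions that the later pie diagram converts into angles.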
5
The Bar Chart
[Bar chart: Frequency (vertical axis, ticks at 0, 5, 10) for each state (TX, CA, AZ)]
6
Example continued

By looking at this frequency table and bar graph,
one can see that roughly equal proportions of the
sampled pairs of jeans were manufactured in the
three states (36% in CA, 32% each in AZ and TX).
The frequency table and bar graph are certainly more
informative than the raw presentation of the
sample data.
Another method of pictorial presentation of
qualitative data is the pie diagram. In
this case a pie is divided into the categories,
with a given category's angle being equal to 360
degrees times the relative frequency of
occurrence of that category.

7
Pie Diagram
Angles (in degrees):
CA: (360)(.36) = 129.6
AZ: (360)(.32) = 115.2
TX: (360)(.32) = 115.2
[Pie diagram with slices of 129.6°, 115.2°, and 115.2°]
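
The angle rule (360 degrees times the relative frequency) is simple enough to check in a few lines. This is only a sketch; the dictionary names are illustrative.

```python
# Relative frequencies from the jeans example (9/25, 8/25, 8/25).
rel_freq = {"CA": 0.36, "AZ": 0.32, "TX": 0.32}

# Each category's slice angle is 360 degrees times its relative frequency.
angles = {cat: 360 * p for cat, p in rel_freq.items()}

for cat, angle in angles.items():
    print(f"{cat}: {angle:.1f} degrees")

# The slices must together fill the whole pie.
print(f"Total: {sum(angles.values()):.1f} degrees")
```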
8
Pie Chart from Minitab
9
Presentation of Quantitative Variables

When the quantitative variable is discrete (such
as counts), a frequency table and a bar graph
could also be used for summarizing it.
The only difference is that the values of the
variable cannot be reordered in the graph, because
they have a natural numerical order, in contrast to
when the variable is categorical or qualitative.
For example, suppose that we asked a sample of 20
students about the number of siblings in their
family. The sample data might be
4, 1, 6, 2, 2, 3, 4, 1, 2, 2, 3, 7, 2, 1, 1, 5,
3, 4, 6, 3
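
A frequency table for these sibling counts can be sketched as follows. The values are listed in numerical order, which is the ordering constraint mentioned above; the text bar drawn with `*` is just a stand-in for the bar graph.

```python
from collections import Counter

# Number of siblings reported by the 20 students.
siblings = [4, 1, 6, 2, 2, 3, 4, 1, 2, 2, 3, 7, 2, 1, 1, 5, 3, 4, 6, 3]

counts = Counter(siblings)

# Walk the values in numerical order; a value that never occurs gets count 0.
rows = [(v, counts[v]) for v in range(min(siblings), max(siblings) + 1)]

for value, freq in rows:
    print(f"{value}  {freq:2d}  {'*' * freq}")
```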

10
Its Bar Graph is
11
An Example of a Real Data Set: Poverty versus
PACT in SC
Lunch  ActualLang  ActualMath
74     48          54
77     43          55
94     41          62
88     49          62
78     50          59
79     46          58
61     41          47
45     26          34
87     49          62
68     36          52
76     45          56
32     22          31
63     39          53
33     20          26
64     44          53
39     20          22
37     21          27
47     23          30
40     29          41
43     25          27
37     24          31
64     37          43
59     36          45
70     32          41
55     37          46
90     38          47
45     32          35
31     25          24
35     29          32
15     14          18
73     30          41
31     24          30
75     45          57
57     29          40
80     51          63
54     30          44
67     28          33
76     45          50
87     61          61
54     27          33
60     32          41
35     26          35
51     29          36
50     35          42
43     23          26
66     32          44
86     63          75
54     25          33
87     60          69
49     29          37
46     38          43
50     38          44
57     40          50
90     60          75
26     17          20
47     23          27
53     37          39
58     34          43
16     13          15
59     32          38
46     26          30
90     63          67
29     17          24
41     24          26
51     30          41
41     25          30
43     32          36
70     33          36
93     50          66
84     50          66
64     27          32
52     36          43
50     31          43
53     28          35
78     36          41
57     31          42
51     39          42
55     41          53
60     37          45
96     46          66
75     34          45
60     29          36
71     43          53
68     42          51
76     47          52
82     49          55
12
Frequency Tables and Histograms
Consider the variable Lunch, which represents
the percentage of students in the school district
whose lunches are not free. The higher the value
of this variable, the richer the district.
n = Number of Observations = 86
LV = Lowest Value = 15
HV = Highest Value = 96
Let us construct a frequency table with classes
[10,20), [20,30), [30,40), ..., [90,100).
13
Frequency Table for Variable Lunch
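
One way to build the class frequencies is sketched below. The `lunch` list is an assumption: it takes the first value of each (Lunch, ActualLang, ActualMath) triple as read from the data listing on the earlier slide, and the function name `class_frequencies` is hypothetical. The sketch checks the list against the stated n = 86, LV = 15, and HV = 96 before counting.

```python
# Lunch values as read from the data listing (assumed transcription).
lunch = [
    74, 77, 94, 88, 78, 79, 61, 45, 87, 68, 76, 32, 63, 33, 64,
    39, 37, 47, 40, 43, 37, 64, 59, 70, 55, 90, 45, 31, 35, 15,
    73, 31, 75, 57, 80, 54, 67, 76, 87, 54, 60, 35, 51, 50, 43,
    66, 86, 54, 87, 49, 46, 50, 57, 90, 26, 47, 53, 58, 16,
    59, 46, 90, 29, 41, 51, 41, 43, 70, 93, 84, 64, 52, 50, 53,
    78, 57, 51, 55, 60, 96, 75, 60, 71, 68, 76, 82,
]

def class_frequencies(values, low, high, width):
    """Count values in the classes [low, low+width), ..., [high-width, high)."""
    freqs = [0] * ((high - low) // width)
    for v in values:
        if not low <= v < high:
            raise ValueError(f"{v} lies outside [{low}, {high})")
        freqs[(v - low) // width] += 1  # left endpoint included, right excluded
    return [(low + i * width, low + (i + 1) * width, f)
            for i, f in enumerate(freqs)]

# Sanity checks against the summary figures given in the lecture.
assert len(lunch) == 86 and min(lunch) == 15 and max(lunch) == 96

classes = class_frequencies(lunch, low=10, high=100, width=10)
n = len(lunch)
for a, b, f in classes:
    print(f"[{a},{b})  {f:2d}  {f / n:.3f}")
```

Using half-open classes means a value such as 90 falls in [90,100) and not in [80,90), so every observation is counted exactly once.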
14
Frequency Histogram
15
Stem-and-Leaf Plots

An important tool for presenting quantitative
data when the sample size is not too large is
the stem-and-leaf plot.
By using this method, there is usually no loss
of information in that the exact values of the
observations could be recovered (in contrast to a
frequency table for continuous data).
Basic idea: divide each observation into a
stem and a leaf.
The stems will serve as the body of the plant
while the leaves will serve as the branches or
leaves of the plant.
An illustration makes the idea transparent.

16
An Example

A random sample of 30 subjects from the 1910
subjects in the blood pressure data set was
selected. We present here the systolic blood
pressures of these 30 subjects.
30 Systolic Blood Pressures: 122 135 110 126 100
110 110 126 94 124 108 110 92 98 118 110 102 108
126 104 110 120 110 118 100 110 120 100 120 92
Lowest Value = 92, Highest Value = 135
Stems: 9, 10, 11, 12, 13
Leaves: Ones Digit

17
Stem-and-Leaf Plot

 9 | 224
 9 | 8
10 | 00024
10 | 88
11 | 00000000
11 | 88
12 | 00024
12 | 666
13 |
13 | 5
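
The plot can be generated mechanically from the 30 observations. The following is a minimal sketch, assuming two-part stems (leaves 0 to 4 on the first row of a stem, 5 to 9 on the second); the function name `split_stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# The 30 systolic blood pressures from the example.
bp = [122, 135, 110, 126, 100, 110, 110, 126, 94, 124, 108, 110, 92, 98,
      118, 110, 102, 108, 126, 104, 110, 120, 110, 118, 100, 110, 120,
      100, 120, 92]

def split_stem_and_leaf(values):
    """Return plot lines, with each stem split into leaves 0-4 and 5-9."""
    rows = defaultdict(list)
    for v in sorted(values):            # sorting puts leaves in ascending order
        stem, leaf = divmod(v, 10)      # stem = tens and above, leaf = ones digit
        rows[(stem, leaf >= 5)].append(str(leaf))
    lines = []
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        for upper_half in (False, True):
            leaves = "".join(rows.get((stem, upper_half), []))
            lines.append(f"{stem:2d} | {leaves}")
    return lines

lines = split_stem_and_leaf(bp)
print("\n".join(lines))
```

Because each leaf is the ones digit of an observation, the original values can be read back off the plot, which is the "no loss of information" property noted earlier.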

18
Stem-and-Leaf continued

In this stem-and-leaf plot, because there would
only be 5 stems if we used 9, 10, 11, 12, 13, we
decided to subdivide each stem into two parts,
corresponding to leaf values ≤ 4 and those ≥ 5.
Such a procedure usually produces better looking
distributions.
Looking at this stem-and-leaf plot, notice that
many of the observations are in the range of
100-126.
The exact values could be recovered from this
plot.
By arranging the leaves in ascending order, the
plot also becomes more informative.

19
Comparative Stem-and-Leaf Plots

When comparing the distributions of two groups
(e.g., when classified according to GENDER),
side-by-side stem-and-leaf plots (also
side-by-side histograms) could be used.
To illustrate, consider 30 observations from the
blood pressure data set with Gender and Systolic
Blood Pressure being the observed variables.
For the males (Sex = 0): 122, 120, 130, 110, 134,
136, 142, 100, 120, 162, 126, 132, 124, 130
For the females (Sex = 1): 132, 94, 104, 100,
130, 110, 102, 110, 130, 92, 125, 108, 100, 130,
100, 100

20
Comparing Male/Female Systolic Blood Pressures
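
A back-to-back stem-and-leaf display of the two groups can be sketched as below. This is an illustrative layout, not necessarily the one on the original slide: stems run down the middle, male leaves grow to the left, female leaves to the right, and stems are not split. The function names are hypothetical.

```python
males = [122, 120, 130, 110, 134, 136, 142, 100, 120, 162, 126, 132, 124, 130]
females = [132, 94, 104, 100, 130, 110, 102, 110, 130, 92, 125, 108, 100,
           130, 100, 100]

def leaves_by_stem(values):
    """Map each stem (tens and above) to its ascending list of ones digits."""
    out = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        out.setdefault(stem, []).append(str(leaf))
    return out

def back_to_back(left, right):
    """Return lines of the form '<left leaves> | stem | <right leaves>'."""
    l, r = leaves_by_stem(left), leaves_by_stem(right)
    lo = min(min(left), min(right)) // 10
    hi = max(max(left), max(right)) // 10
    width = max(len(v) for v in l.values())
    lines = []
    for stem in range(lo, hi + 1):
        # Left-hand leaves are reversed so they increase away from the stem.
        left_leaves = "".join(reversed(l.get(stem, [])))
        right_leaves = "".join(r.get(stem, []))
        lines.append(f"{left_leaves:>{width}} | {stem:2d} | {right_leaves}")
    return lines

lines = back_to_back(males, females)
print("Males | Stem | Females")
print("\n".join(lines))
```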
21
Scatterplots: Studying Relationship Between
Poverty and Math
Question: What kind of relationship is there
between Lunch and PACT Math Scores?
22
Numerical Summary Measures

Overview
Why do we need numerical summary measures?
Measures of Location
Measures of Variation
Measures of Position
Box Plots

23
Why Do We Need Summary Measures?

A picture is worth a thousand words, but beauty
is always in the eyes of the beholder!
Graphs or pictures are sometimes unwieldy
One usually wants a small set of numbers that could
provide the important features of the data set
When making decisions, objectivity is enhanced
when they are based on numbers!
Numerical summaries and tabular/graphical
presentations complement each other

24
The Setting

In defining and illustrating our summary
measures, assume that we have sample data
Sample Data: X_1, X_2, X_3, ..., X_n
Sample Size: n
These summary measures are thus (sample)
statistics.
If instead they are based on the population
values, they will be (population) parameters.

25
Measures of Location or Center

These are summary measures that provide
information on the center of the data set
Usually, these measures of location are where the
observations cluster, but not always
In layman's terms, these measures are what we
associate with averages
We will discuss two measures: the sample mean and
the sample median

26
Sample Mean or Arithmetic Average

The sample mean equals the sum of the
observations divided by the number of
observations.
It is defined symbolically via
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{X_1 + X_2 + \cdots + X_n}{n}

27
Properties of the Sample Mean

Center of Gravity
Sum of the deviations of the observations from
the mean is always zero (barring rounding errors)
Sample mean could however be affected drastically
by extreme values or outliers
The sample mean is very conducive to mathematical
analysis compared to other measures of location
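
Two of these properties are easy to check numerically. The sketch below uses the 30 systolic blood pressures from the stem-and-leaf example; the outlier value 235 is a made-up replacement for the largest reading, introduced only to illustrate sensitivity.

```python
bp = [122, 135, 110, 126, 100, 110, 110, 126, 94, 124, 108, 110, 92, 98,
      118, 110, 102, 108, 126, 104, 110, 120, 110, 118, 100, 110, 120,
      100, 120, 92]

def sample_mean(xs):
    """Sum of the observations divided by the number of observations."""
    return sum(xs) / len(xs)

mean = sample_mean(bp)

# Property: deviations from the mean sum to zero (up to rounding error).
deviation_sum = sum(x - mean for x in bp)

# Hypothetical outlier: replace the largest reading, 135, by 235.
bp_outlier = [235 if x == 135 else x for x in bp]
mean_outlier = sample_mean(bp_outlier)

print(f"mean = {mean:.1f}")
print(f"sum of deviations = {deviation_sum:.2e}")
print(f"mean with one outlier = {mean_outlier:.1f}")
```

Changing a single observation out of 30 shifts the mean by more than 3 units, which is the sensitivity to extreme values described above.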

28
Illustration

Consider the systolic blood pressure data set
considered in Lecture 01
Sample Size: n = 30
Data: 122, 135, 110, 126, 100, 110, 110, 126, 94,
124, 108, 110, 92, 98, 118, 110, 102, 108, 126,
104, 110, 120, 110, 118, 100, 110, 120, 100, 120,
92

29
Sample Mean Computation
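
Using the 30 observations from the previous slide, the computation can be written out as follows (3333 is the sum of the listed values):

```latex
\bar{X} = \frac{122 + 135 + 110 + \cdots + 120 + 92}{30} = \frac{3333}{30} = 111.1
```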

This value of 111.1 could be interpreted as the
balancing point of the 30 systolic blood pressure
observations.
Locating this in the histogram we have

30
Sample Mean in Histogram
31
Sample Median

Sample median (M): the value that divides the
arranged/ordered data set into two equal parts.
At least 50% of the observations are ≤ M and at
least 50% are ≥ M
Not sensitive to outliers but harder to deal with
mathematically
Appropriate when the histogram is left- or
right-skewed
Better to present both mean and median in practice

32
Illustration of Computation of Median

Consider again the blood pressure data earlier.
n = 30, an even number.
Median will be the average of the 15th and 16th
observations in arranged data.
Arranged data: 92, 92, 94, 98, 100, 100, 100,
102, 104, 108, 108, 110, 110, 110, 110, 110, 110,
110, 110, 118, 118, 120, 120, 120, 122, 124, 126,
126, 126, 135

33
Continued ...

The sample median is the average of 110 and 110,
which are the 15th and 16th observations in the
arranged data.
The median equals 110.
Note that it is very close to the sample mean
value of 111.1
This closeness is because of the near symmetry of
the distribution
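
The median rule used here (middle value for odd n, average of the two middle values for even n) can be sketched as a small function. The name `sample_median` is illustrative; Python's standard `statistics.median` follows the same rule.

```python
bp = [122, 135, 110, 126, 100, 110, 110, 126, 94, 124, 108, 110, 92, 98,
      118, 110, 102, 108, 126, 104, 110, 120, 110, 118, 100, 110, 120,
      100, 120, 92]

def sample_median(xs):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    # For even n, s[mid - 1] and s[mid] are the (n/2)th and (n/2 + 1)th values.
    return (s[mid - 1] + s[mid]) / 2

arranged = sorted(bp)
print("15th and 16th observations:", arranged[14], arranged[15])
print("median:", sample_median(bp))
```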

34
Relative Positions of Mean and Median

For symmetric distributions, the mean and the
median coincide.
For right-skewed distributions, the mean tends to
be larger than the median (mean pulled up by the
large extreme values)
For left-skewed distributions, the mean tends to
be smaller than the median (mean pulled down by
the small extreme values)
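
A small numerical illustration of these three cases, using made-up samples (the data below are hypothetical and chosen only to make the skew obvious):

```python
from statistics import mean, median

# Hypothetical samples, not from the lecture.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]        # one large extreme value
left_skewed = [21 - x for x in right_skewed]    # mirror image: one small extreme value

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(f"{name:13s} mean = {mean(data):6.2f}  median = {median(data):6.2f}")
```

In the right-skewed sample the single value 20 pulls the mean above the median, and the mirrored sample shows the opposite effect.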
