1.2 Displaying and Describing Categorical - PowerPoint PPT Presentation

Title: Qualitative (Categorical) Data Author: Center for Academic Computing Last modified by: LCPS Created Date: 1/13/1998 12:40:31 PM Document presentation format – PowerPoint PPT presentation

Slides: 55
Provided by: CenterforA153
Learn more at: https://www.lcps.org

Transcript and Presenter's Notes


1
1.2 Displaying and Describing Categorical and
Quantitative Data
2
You should be able to
  • Recognize when a variable is categorical or
    quantitative
  • Choose an appropriate display for a categorical
    variable and a quantitative variable
  • Summarize the distribution with a bar chart, pie
    chart, stem-and-leaf plot, histogram, dot plot,
    or box plot
  • Know how to make a contingency table
  • Describe the distribution of categorical
    variables in terms of relative frequencies
  • Describe the distribution of a quantitative
    variable in terms of its shape, center, and
    spread
  • Describe abnormalities or extraordinary features
    of a distribution
  • Discuss outliers and how they deviate from the
    overall pattern

3
3 RULES of EXPLORATORY DATA ANALYSIS
  1. MAKE A PICTURE: find patterns that are difficult
    to see from a chart
  2. MAKE A PICTURE: show important features in a graph
  3. MAKE A PICTURE: communicate your data to others

4
Concepts to know!
  • Bar graph
  • Histogram
  • Dot plot
  • Stem leaf plot
  • Scatterplots
  • Boxplots

5
Which graph to use?
  • Depends on type of data
  • Depends on what you want to illustrate

6
Categorical Data
  • The objects being studied are grouped into
    categories based on some qualitative trait.
  • The resulting data are merely labels or
    categories.

7
Categorical Data (Single Variable)
Eye Color            BLUE          BROWN         GREEN
Frequency (COUNTS)   20            50            5
Relative Frequency   20/75 = .27   50/75 = .67   5/75 = .07
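The relative frequencies in the eye-color table can be reproduced with a short calculation. This is a minimal sketch using the counts from the slide; the variable names are chosen here for illustration only.

```python
# Counts from the eye-color slide.
counts = {"BLUE": 20, "BROWN": 50, "GREEN": 5}

total = sum(counts.values())  # number of people observed

# Relative frequency = category count / total count.
rel_freq = {color: n / total for color, n in counts.items()}

for color, p in rel_freq.items():
    print(f"{color:<6} {counts[color]:>3}  {p:.2f}")
```

A quick sanity check for any relative frequency table is that the proportions add up to 1 (allowing for rounding).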
8
Pie Chart (Data is Counts or Percentages)
9
Bar Graph (Shows distribution of data)
10
Bar Graph
  • Summarizes categorical data.
  • Horizontal axis represents categories, while
    vertical axis represents either counts
    (frequencies) or percentages (relative
    frequencies).
  • Used to illustrate the differences in percentages
    (or counts) between categories.

11
Contingency Table (How data is distributed
across multiple variables)
                        Class
Survival   First   Second   Third   Crew   Total
ALIVE        203      118     178    212     711
DEAD         122      167     528    673    1490
Total        325      285     706    885    2201
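The row and column totals of a contingency table are sums over the cell counts. The sketch below rebuilds the margins from the eight cell counts on the slide; the dictionary layout is just one convenient representation, not something the slides prescribe.

```python
# Cell counts from the slide: rows are survival status, columns are class.
classes = ["First", "Second", "Third", "Crew"]
table = {
    "ALIVE": [203, 118, 178, 212],
    "DEAD": [122, 167, 528, 673],
}

# Row totals: the marginal distribution of survival.
row_totals = {status: sum(row) for status, row in table.items()}

# Column totals: the marginal distribution of class.
col_totals = [sum(col) for col in zip(*table.values())]

grand_total = sum(row_totals.values())

for name, total in zip(classes, col_totals):
    print(f"{name:<7} {total}")
print(f"{'All':<7} {grand_total}")
```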
12
What can go wrong when working with categorical
data?
  • Pay attention to the variables and what the
    percentages represent
  • (From the table above, 62.5% of first-class
    passengers survived, which is different from
    28.6% of survivors being first-class passengers!!!)
  • Make sure you have a reasonably large data set
    ("67% of the rats tested died and 1 lived" means
    only 3 rats were tested)
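The same cell count gives very different percentages depending on which total it is divided by. The sketch below uses the counts from the contingency table slide; the variable names are illustrative.

```python
alive_first = 203   # first-class passengers who survived
first_total = 325   # everyone in first class
alive_total = 711   # everyone who survived
grand_total = 2201  # everyone on board

# Same numerator, three different denominators.
pct_first_who_survived = 100 * alive_first / first_total
pct_survivors_in_first = 100 * alive_first / alive_total
pct_all_first_survivors = 100 * alive_first / grand_total

print(f"Of first class, {pct_first_who_survived:.1f}% survived")
print(f"Of survivors, {pct_survivors_in_first:.1f}% were first class")
print(f"Of everyone, {pct_all_first_survivors:.1f}% were first-class survivors")
```

Stating the denominator in words, as the print statements do, is one way to avoid mixing these percentages up.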

13
Analogy
Bar chart is to categorical data as histogram is
to ...
quantitative data.
14
Histogram
15
Histogram
  • Divide the measurement scale into equal-width
    categories (bins); the BIN WIDTH is the width
    of each category.
  • Determine the number (or percentage) of
    measurements falling into each category.
  • Draw a bar for each category so that the bar
    heights represent the number (or percent)
    falling into the categories.
  • Label and title appropriately.
  • http://www.stat.sc.edu/west/javahtml/Histogram.html
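The binning steps above can be sketched in a few lines. The data values, the starting point, and the bin width below are made up for illustration, and the function name is hypothetical; the bars are drawn as text rather than with a plotting library.

```python
def histogram_counts(data, low, bin_width, n_bins):
    """Count how many values fall in each equal-width bin [left, left + width)."""
    counts = [0] * n_bins
    for x in data:
        index = int((x - low) // bin_width)
        # A value sitting exactly on the top edge goes into the last bin.
        if index == n_bins and x == low + n_bins * bin_width:
            index = n_bins - 1
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

# Hypothetical measurements, made up for this example.
data = [3, 7, 8, 12, 13, 13, 18, 21, 22, 29]
counts = histogram_counts(data, low=0, bin_width=10, n_bins=3)

# One text "bar" per bin; the bar length is the count.
for i, c in enumerate(counts):
    left = i * 10
    print(f"[{left:>2}, {left + 10:>2})  {'#' * c}  {c}")
```

Changing `bin_width` and `n_bins` is a quick way to see the effect of too few or too many categories.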

16
Histogram
Use common sense in determining the number of
categories to use. Between 6 and 15 intervals is
preferable.
(Trial-and-error works fine, too.)
17
Too few categories
18
Too many categories
19
Dot Plot
20
Dot Plot
  • Summarizes quantitative data.
  • Horizontal axis represents measurement scale.
  • Plot one dot for each data point.

21
Stem-and-Leaf Plot
Stem-and-leaf of Shoes   N = 139
Leaf Unit = 1.0
  12   0  223334444444
  63   0  555555555555566666666677777778888888888888999999999
 (33)  1  000000000000011112222233333333444
  43   1  555555556667777888
  25   2  0000000000023
  12   2  5557
   8   3  0023
   4   3
   4   4  00
   2   4
   2   5  0
   1   5
   1   6
   1   6
   1   7
   1   7  5
22
Stem-and-Leaf Plot
  • Summarizes quantitative data.
  • Each data point is broken down into a stem and
    a leaf.
  • First, stems are aligned in a column.
  • Then, leaves are attached to the stems.
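The stem and leaf split can be sketched as follows, assuming non-negative whole numbers with a leaf unit of 1, so the ones digit is the leaf and the remaining digits are the stem. The data values and the function name are hypothetical. Unlike the output on the previous slide, this sketch uses one row per stem rather than splitting each stem into two rows.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: sorted leaves} for non-negative integers, leaf unit 1."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # ones digit is the leaf, the rest is the stem
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical data, made up for this example.
values = [12, 5, 23, 9, 15, 12, 30, 27, 8]
plot = stem_and_leaf(values)

# Stems are aligned in a column, then leaves are attached in order.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```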

23
Box Plot
24
Box Plot
  • Summarizes quantitative data.
  • Vertical (or horizontal) axis represents
    measurement scale.
  • Lines in box represent the 25th percentile
    (first quartile), the 50th percentile
    (median), and the 75th percentile (third
    quartile), respectively.

25
5 Number Summary
  • Minimum
  • Q1 (25th percentile)
  • Median (50th percentile)
  • Q3 (75th percentile)
  • Maximum
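A five-number summary can be computed as sketched below. Textbooks and software use several slightly different quartile rules; this sketch assumes the "median of each half" rule (the middle value is left out of both halves when the count is odd), which may not match the rule used elsewhere in the course. The data are made up.

```python
from statistics import median

def five_number_summary(data):
    """Min, Q1, median, Q3, max, with quartiles as medians of the two halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]          # values below the median position
    upper = xs[half + n % 2:]  # values above it (middle value skipped if n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Hypothetical data, made up for this example.
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 30]
summary = five_number_summary(data)
print(summary)
```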

26
An aside...
  • Roughly speaking:
  • The 25th percentile is the number such that 25%
    of the data points fall below the number.
  • The median or 50th percentile is the number
    such that half of the data points fall below the
    number.
  • The 75th percentile is the number such that 75%
    of the data points fall below the number.
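The "fraction of data points below a number" idea can be checked directly. This is a rough sketch on made-up data (the whole numbers 1 to 100); the helper name is hypothetical, and real percentile rules differ in how they handle ties and values between data points.

```python
def fraction_below(data, value):
    """Fraction of data points strictly below the given value."""
    return sum(1 for x in data if x < value) / len(data)

# Hypothetical data: the whole numbers 1 through 100.
data = list(range(1, 101))

print(fraction_below(data, 26))  # a quarter of the points are below 26
print(fraction_below(data, 51))  # half of the points are below 51
print(fraction_below(data, 76))  # three quarters of the points are below 76
```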

27
Box Plot (contd)
  • Whiskers are drawn to the most extreme data
    points that are not more than 1.5 times the
    length of the box (the IQR) beyond either quartile.
  • IQR = Q3 - Q1
  • Outliers (upper) > Q3 + 1.5 × IQR
  • Outliers (lower) < Q1 - 1.5 × IQR
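The 1.5 × IQR rule can be applied as sketched below. The data are made up, and the quartiles are taken as given (here Q1 = 4 and Q3 = 11, an assumption based on the median-of-halves rule); the function name is hypothetical.

```python
def outlier_fences(q1, q3):
    """Lower and upper cutoffs: 1.5 box lengths beyond each quartile."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data with assumed quartiles.
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 30]
q1, q3 = 4, 11

lower_fence, upper_fence = outlier_fences(q1, q3)

# Points beyond a fence are flagged as outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# The most extreme points that are still inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
extremes_inside = (min(inside), max(inside))

print(lower_fence, upper_fence, outliers, extremes_inside)
```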

28
Using Box Plots to Compare
Outliers
29
Strengths and Weaknesses of Graphs for
Quantitative Data
  • Histograms
      • Use intervals
      • Good for judging the shape of the data
      • Not good for small data sets
  • Stem-Leaf Plots
      • Good for sorting data (finding the median)
      • Not good for large data sets

30
Strengths and Weaknesses of Graphs for
Quantitative Data
  • Dotplots
      • Use individual data points
      • Good for showing general descriptions of
        center and variation
      • Not good for judging shape for large data sets
  • Boxplots
      • Good for showing an exact look at center,
        spread, and outliers
      • Not good for judging shape

31
Analogy
Contingency table is to categorical data with two
variables as scatterplot is to ..
quantitative data with two variables.
32
Scatter Plots
33
Scatter Plots
  • Summarizes the relationship between two
    quantitative variables.
  • Horizontal axis represents one variable and
    vertical axis represents second variable.
  • Plot one point for each pair of measurements.

34
No relationship
35
Summary
  • Many possible types of graphs.
  • Use common sense in reading graphs.
  • When creating graphs, don't summarize your data
    too much or too little.
  • When creating graphs, label everything for
    others.
  • Remember you are trying to communicate something
    to others!

36
  • GRAPHICAL ANALYSIS

37
  • CENTER
  • SPREAD

38
Graphical Analysis
Center (Location)
Spread (Variation)
Shape
39
Interesting Features Identified by Graphs
  • Center (Location)
  • Spread (Variability)
  • Shape
  • Individual Values
  • Compare Groups
  • Identify Outliers
  • LOOK FOR PATTERNS, CLUSTERS, GAPS!!!!!
  • LOOK FOR DEVIATIONS FROM THE GENERAL PATTERN!!!!!
  • http://bcs.whfreeman.com/bps3e/

40
Shape of Graph
41
Shape of Graph
  • Symmetry-Skewness
  • Modes (peaks)

42
SYMMETRY AND SKEWNESS
  • Skewness - a measure of symmetry, or more
    precisely, the lack of symmetry.
  • A distribution, or data set, is symmetric if it
    looks the same to the left and right of the
    center point.
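One common way to put a number on skewness is the third central moment divided by the 1.5 power of the second central moment. The slides do not say which formula they have in mind, so treat this as an assumed definition; the data sets are made up.

```python
def skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (one common definition)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]
left_skewed = [-x for x in right_skewed]  # mirror image of the right-skewed set

print(skewness(symmetric))     # zero for a symmetric data set
print(skewness(right_skewed))  # positive: long tail to the right
print(skewness(left_skewed))   # negative: long tail to the left
```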

43
Kurtosis
  • Kurtosis - a measure of whether the data are
    peaked or flat relative to a normal distribution.
  • Data sets with high kurtosis tend to have a
    distinct peak near the mean, decline rather
    rapidly, and have heavy tails.
  • Data sets with low kurtosis tend to have a flat
    top near the mean rather than a sharp peak.
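Kurtosis can be computed from the fourth central moment. The sketch below assumes the "excess kurtosis" convention, which subtracts 3 so that a normal distribution scores 0; the slides do not state a formula, and the data sets are made up.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Hypothetical data sets.
flat = [1, 2, 3, 4, 5]                           # evenly spread, no peak
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]       # sharp peak with two far-out values

print(excess_kurtosis(flat))    # negative: flatter than a normal distribution
print(excess_kurtosis(peaked))  # positive: peaked with heavy tails
```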

44
Symmetric -- Mesokurtic
45
Symmetric -- Platykurtic
46
Symmetric -- Leptokurtic
47
Nonsymmetric -- Skewed Positive (Skewed Right)
48
Nonsymmetric -- Skewed Negative (Skewed Left)
49
Symmetric -- Bimodal

50
Describing Distributions: SOCS Rock!
  • Shape
  • Outliers
  • Center
  • Spread

51
Graphical Analysis - Scale
SOL Scores Class 1
SOL Scores Class 2
52
Graphical Analysis - Scale
SOL Scores Class 1
SOL Scores Class 2
53
SOL Scores Class 1
SOL Scores Class 2
54
  • Comparing Histograms