Measures of Center and Variation Sections 3.1 and 3.3

1
Measures of Center and Variation Sections 3.1
and 3.3
  • Prof. Felix Apfaltrer
  • fapfaltrer_at_bmcc.cuny.edu
  • Office: N518
  • Phone: 212-220-8000 x7421
  • Office hours:
  • Mon-Thu 1:30-2:15 pm

2
Measures of center - mean
  • A measure of center is a value that represents
    the center of the data set
  • The mean is the most important measure of center
    (also called arithmetic mean)
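  • Sample mean: x̄ = Σx / n
  • Population mean: μ = Σx / N

Σ: addition of values; x: variable (individual data
values); n: sample size; N: population size
Example. Lead (Pb) in air at BMCC (μg/m³; 1.5 is
high): 5.4, 1.1, 0.42, 0.73, 0.48, 1.1
Mean: x̄ = 9.23/6 ≈ 1.54, above 1.5 only because of
the single reading 5.4.
Outlier has strong effect on mean!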
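
As a quick check of the lead example, the sketch below computes the mean with and without the 5.4 reading. The variable names are illustrative and not taken from the slides.

```python
from statistics import mean

# Lead (Pb) readings from the slide example
lead = [5.4, 1.1, 0.42, 0.73, 0.48, 1.1]

mean_all = mean(lead)                                    # includes the outlier 5.4
mean_without_outlier = mean([x for x in lead if x != 5.4])

print(round(mean_all, 2))              # 1.54
print(round(mean_without_outlier, 3))  # 0.766
```

Dropping the single outlier roughly halves the mean, which is the sensitivity the slide warns about.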
3
Measures of center - median
  • Mean is good but sensitive to outliers!
  • Large values can have dramatic effect!
  • The median is the middle value of the original
    data arranged in increasing order
  • If n is odd: the exact middle value
  • If n is even: the average of the 2 middle values
  • Previous example
  • Reorder the data:
  • 0.42, 0.48, 0.73, 1.1, 1.1, 5.4
  • n = 6 is even: median = (0.73 + 1.1)/2 = 0.915

If we had an extra data point: 5.4, 1.1, 0.42,
0.73, 0.48, 1.1, 0.66
After reordering we have: 0.42, 0.48, 0.66, 0.73, 1.1, 1.1, 5.4
n = 7 is odd: median = 0.73
Outlier has strong effect on mean, not so on
median!
Used, for example, in median household income:
$36,078
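
A minimal sketch of the two median cases (even and odd n), using the lead readings from the slides and Python's standard `statistics.median`:

```python
from statistics import median

lead = [5.4, 1.1, 0.42, 0.73, 0.48, 1.1]

median_even = median(lead)           # n = 6: average of the two middle values 0.73 and 1.1
median_odd = median(lead + [0.66])   # n = 7: the single middle value

print(round(median_even, 3))  # 0.915
print(median_odd)             # 0.73
```

`median` sorts the data internally, so the values do not need to be reordered first.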
4
Measures of Center - mode and midrange
  • Mode M: the value that occurs most frequently
  • If 2 values are most frequent: bimodal
  • If more than 2: multimodal
  • If no value is repeated: no mode
  • Does not need numerical values
  • Midrange
  • (highest value + lowest value)/2
  • Outliers have very strong weight
  • Examples
  • a. 5.4, 1.1, 0.42, 0.73, 0.48, 1.1
  • b. 27, 27, 27, 55, 55, 55, 88, 88, 99
  • c. 1, 2, 3, 6, 7, 8, 9, 10
  • Solutions (mode)
  • a. Unimodal: 1.1
  • b. Bimodal: 27 and 55
  • c. No mode

Solutions (midrange): a. (0.42 + 5.4)/2 = 2.91
b. (27 + 99)/2 = 63 c. (1 + 10)/2 = 5.5
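
The three examples can be checked with a short sketch. The helper names `modes` and `midrange` are illustrative; `statistics.multimode` (Python 3.8+) returns every value when nothing repeats, so the "no mode" case is handled explicitly here.

```python
from statistics import multimode

def modes(values):
    """Most frequent value(s); an empty list when no value is repeated."""
    if len(set(values)) == len(values):
        return []                      # every value occurs once: no mode
    return multimode(values)

def midrange(values):
    """Midrange: (highest value + lowest value) / 2."""
    return (max(values) + min(values)) / 2

examples = {
    "a": [5.4, 1.1, 0.42, 0.73, 0.48, 1.1],
    "b": [27, 27, 27, 55, 55, 55, 88, 88, 99],
    "c": [1, 2, 3, 6, 7, 8, 9, 10],
}

for label, data in examples.items():
    print(label, modes(data), round(midrange(data), 2))
```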
5
Mode and more
  • Mode is not much used with numerical data
  • Example
  • Survey shows students own:
  • 84% TV
  • 76% VCR
  • 69% CD player
  • 39% video game player
  • 35% DVD
  • Mean from frequency distribution
  • Weighted mean
  • (Dis)advantages of different measures of center

TV is the mode! No mean, median or midrange!
Round-off rule: carry one more decimal place than in the data!
6
Measures of variation
  • Variation measures consistency
  • Range: highest value - lowest value
  • Standard deviation

Figure: precision arrows vs. jungle arrows.
Same mean length, but different variation!
7
Standard deviation
  • Recipe
  • Compute the mean
  • Subtract the mean from the individual values
  • Square the differences
  • Add the squared differences
  • Divide by n-1.
  • Take the square root.
  • Example: waiting times
  • Bank Consistency: 6 5 4 4 6 5
  • Bank Unpredictable: 0 15 5 0 0 10
  • Bank Consistency, mean: (6+5+4+4+6+5)/6 = 5
  • Deviations: (6-5) = 1, (5-5) = 0, (4-5) = -1, (4-5) = -1, (6-5) = 1, (5-5) = 0
  • Squares: 1² = 1, 0² = 0, (-1)² = 1, (-1)² = 1, 1² = 1, 0² = 0
  • Σ: 1+0+1+1+1+0 = 4
  • n-1 = 6-1 = 5; 4/5 = 0.8
  • s = √0.8 ≈ 0.9
  • Measure of variation of all values from the mean
  • Positive, or zero only when all data values are equal
  • Larger deviations, larger s
  • Can increase dramatically with outliers
  • Same units as original data values
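
The recipe can be written out step by step. This is a sketch of the sample standard deviation (dividing by n-1) applied to both banks; the function name `sample_std` is illustrative.

```python
from math import sqrt

def sample_std(values):
    """Sample standard deviation, following the recipe on the slide."""
    n = len(values)
    mean = sum(values) / n                    # 1. compute the mean
    deviations = [x - mean for x in values]   # 2. subtract the mean from each value
    squares = [d ** 2 for d in deviations]    # 3. square the differences
    total = sum(squares)                      # 4. add the squared differences
    return sqrt(total / (n - 1))              # 5.-6. divide by n-1, take the square root

consistent = [6, 5, 4, 4, 6, 5]
unpredictable = [0, 15, 5, 0, 0, 10]

s_consistent = sample_std(consistent)
s_unpredictable = sample_std(unpredictable)

print(round(s_consistent, 2))     # 0.89
print(round(s_unpredictable, 2))  # 6.32
```

Both banks have a mean waiting time of 5, but the second standard deviation is about seven times larger.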

8
Standard deviation of sample and population
  • Standard deviation of a population:
  • divide by N (instead of n-1)
  • μ: mu (population mean)
  • σ: sigma (st. dev. of population)
  • Different notations in calculators
  • Excel: STDEVP instead of STDEV
  • Example using fast formula
  • Find values of n, Σx, Σx²
  • n = 6: 6 values in sample
  • Σx = 30: adding the values
  • Σx² = 6² + 5² + 4² + 4² + 5² + 6² = 154

Estimating s and σ: (highest value - lowest
value)/4
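
The transcript does not preserve the "fast formula" itself. The sketch below assumes it is the usual shortcut form s = √((n·Σx² - (Σx)²) / (n(n-1))), and compares it with the range-based estimate for the consistent bank data.

```python
from math import sqrt

data = [6, 5, 4, 4, 6, 5]

n = len(data)                        # 6
sum_x = sum(data)                    # 30
sum_x2 = sum(x * x for x in data)    # 154

# Assumed shortcut formula; algebraically equal to the step-by-step recipe
s_shortcut = sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

# Rough estimate: (highest value - lowest value) / 4
s_range_rule = (max(data) - min(data)) / 4

print(round(s_shortcut, 2))  # 0.89
print(s_range_rule)          # 0.5
```

The range-based value is only a rough estimate, so it is not expected to match the computed s closely.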
9
Example class grades
  • A statistics class of 20 students obtains the
    following grades
  • To rapidly approximate the mean, we take a random
    sample of 5 students. At random, we pick
    78, 92, 64, 83, 78
  • x̄ = (78+92+64+83+78)/5 = 395/5 = 79
  • s = √(((78-79)² + (92-79)² + (64-79)² + (83-79)²
    + (78-79)²)/4)
  • = √(((-1)² + 13² + (-15)² + 4² + (-1)²)/4)
  • = √((1 + 169 + 225 + 16 + 1)/4)
  • = √(412/4) = √103 ≈ 10.15
  • The population mean is obtained by adding all
    grades and dividing by 20, which is 79.95.
  • The population standard deviation is 10.71,
  • which we can obtain using Excel
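
A sketch of the sample calculation with Python's `statistics` module. Only the five sampled grades appear in the transcript, so the population figures (79.95 and 10.71) cannot be recomputed here. `stdev` divides by n-1 (like Excel's STDEV), while `pstdev` divides by N (like STDEVP).

```python
from statistics import mean, stdev, pstdev

sample = [78, 92, 64, 83, 78]   # the 5 grades picked at random

x_bar = mean(sample)            # 395 / 5
s = stdev(sample)               # sqrt(412 / 4) = sqrt(103)

print(x_bar)         # 79
print(round(s, 2))   # 10.15

# Dividing by n instead of n-1 gives a smaller value; pstdev is only
# appropriate when the data are the whole population.
print(round(pstdev(sample), 2))
```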

10
Variance and coefficient of variation
  • Variance
  • Variance = square of the standard deviation
  • sample: s²
  • population: σ²
  • General terms referring to variation: dispersion,
    spread, variation
  • Variance: specific definition
  • Ex: finding a variance: 0.8 (consistent bank),
    40 (unpredictable bank)
  • Examples
  • In class grade case, sample standard deviation
    was 10.15.
  • Therefore, s² = 103.
  • The population standard deviation was 10.71,
    therefore,
  • σ² = 10.71² = 114.7.
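
The two bank variances (0.8 and 40) can be confirmed with `statistics.variance`, which computes the sample variance (dividing by n-1). This is a small sketch; the variable names are illustrative.

```python
from statistics import variance

consistent = [6, 5, 4, 4, 6, 5]
unpredictable = [0, 15, 5, 0, 0, 10]

var_consistent = variance(consistent)        # 4 / 5
var_unpredictable = variance(unpredictable)  # 200 / 5

print(var_consistent)     # 0.8
print(var_unpredictable)  # 40

# Squaring the population standard deviation of the class grades
print(round(10.71 ** 2, 1))  # 114.7
```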

11
Coefficient of variation
  • Coefficient of variation CV p.155 ex. 49
  • Describes the standard deviation relative to the
    mean
  • Coefficient of variation allows us to compare
    dispersion of completely different data sets
  • Ex:
  • Consistent bank data set
  • 6, 5, 4, 4, 6, 5: x̄ = 5, s = 0.9
  • CV = 0.9/5 = 0.18
  • Class sample: x̄ = 79, s = 10.1
  • CV = 10.1/79 = 0.13
  • Variation of consistent bank is larger than that
    of the class in relative terms!

In previous example, CV(sample) = 10.1/79 = 12.8%
CV(population) = 10.71/79.95 = 13.4%
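
A minimal sketch of the coefficient of variation as a percentage, using the rounded values quoted on the slide. The function name is illustrative.

```python
def coefficient_of_variation(std, mean):
    """Standard deviation relative to the mean, as a percentage."""
    return std / mean * 100

cv_bank = coefficient_of_variation(0.9, 5)
cv_class_sample = coefficient_of_variation(10.1, 79)
cv_class_population = coefficient_of_variation(10.71, 79.95)

print(round(cv_bank, 1))              # 18.0
print(round(cv_class_sample, 1))      # 12.8
print(round(cv_class_population, 1))  # 13.4
```

Because the CV has no units, waiting times in minutes and grades in points can be compared directly.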
12
More on variance and standard deviation
  • Empirical rule for data with normal distribution
  • Why use variance, when standard deviation is more
    intuitive?
  • (Independent) variances have additive properties
  • Probabilistic properties
  • Standard deviation is more intuitive
  • Why divide by n-1 in the sample st. dev.?
  • Only n-1 free parameters

Example: Adult IQ scores have a bell-shaped
distribution with a mean of 100 and a standard
deviation of 15. What percentage of adults have
IQ in the 55-145 range? s = 15, 3s = 45, x̄ - 3s = 55,
x̄ + 3s = 145. Hence, 99.7% of adults have IQs in that
range.
Chebyshev's theorem: At least 1 - 1/k² of the data
lie within k standard deviations of the mean
(k > 1). Ex: At least 1 - 1/3² = 8/9 ≈ 89% of the
data lie within 3 st. dev. of the mean.
13
  • The mean and the median are often different
  • This difference gives us clues about the shape of
    the distribution
  • Is it symmetric?
  • Is it skewed left?
  • Is it skewed right?
  • Are there any extreme values?

14
  • Symmetric: the mean will usually be close to the
    median
  • Skewed left: the mean will usually be smaller
    than the median
  • Skewed right: the mean will usually be larger
    than the median
  • Skewness: Pearson's index
  • I = 3(mean - median)/s
  • If I < -1 or I > 1: significantly skewed
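
As an illustration, the sketch below applies Pearson's index to the lead readings from the earlier slides. Using that data set here is a choice made for this example, not something the slide does.

```python
from statistics import mean, median, stdev

def pearson_index(values):
    """Pearson's index of skewness: I = 3 (mean - median) / s."""
    return 3 * (mean(values) - median(values)) / stdev(values)

lead = [5.4, 1.1, 0.42, 0.73, 0.48, 1.1]
index = pearson_index(lead)

print(round(index, 2))           # 0.98
print(index < -1 or index > 1)   # False
```

The mean (about 1.54) sits well above the median (0.915), so the data lean right, but the index stays just under the cutoff of 1.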

15
  • For a mostly symmetric distribution, the mean and
    the median will be roughly equal
  • Many variables, such as birth weights below, are
    approximately symmetric

16
Summary Chapter 3 Sections 1 and 2
  • Mean
  • The center of gravity
  • Useful for roughly symmetric quantitative data
  • Median
  • Splits the data into halves
  • Useful for highly skewed quantitative data
  • Mode
  • The most frequent value
  • Useful for qualitative data
  • Range
  • The maximum minus the minimum
  • Not a resistant measurement
  • Variance and standard deviation
  • Measures deviations from the mean
  • Not a resistant measurement
  • Empirical rule
  • About 68% of the data is within 1 standard
    deviation of the mean
  • About 95% of the data is within 2 standard
    deviations of the mean

17
Summary Chapter 3 Section 3 (Grouped Data)
  • As an example, for the following frequency table,
  • we calculate the mean as if
  • The value 1 occurred 3 times
  • The value 3 occurred 7 times
  • The value 5 occurred 6 times
  • The value 7 occurred 1 time

Class       0-1.9   2-3.9   4-5.9   6-7.9
Midpoint    1       3       5       7
Frequency   3       7       6       1
18
  • Evaluating this formula:
    (1·3 + 3·7 + 5·6 + 7·1)/(3 + 7 + 6 + 1) = 61/17
  • The mean is about 3.6
  • In mathematical notation: mean = Σ(f·x)/Σf
  • This would be μ for the population mean and
    x̄ for the sample mean
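
A sketch of the grouped-data calculation for the frequency table, treating each class as if all its values sat at the midpoint. The grouped standard deviation uses the shortcut form s = √((n·Σ(f·x²) - (Σ(f·x))²) / (n(n-1))), which is an assumption, since the transcript does not show the formula.

```python
from math import sqrt

midpoints = [1, 3, 5, 7]
frequencies = [3, 7, 6, 1]

n = sum(frequencies)                                                 # 17
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))          # 61
sum_fx2 = sum(f * x ** 2 for f, x in zip(frequencies, midpoints))    # 265

grouped_mean = sum_fx / n
# Assumed shortcut formula for s from a frequency distribution
grouped_std = sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

print(round(grouped_mean, 1))  # 3.6
print(round(grouped_std, 1))   # 1.7
```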

19
Variance and Standard deviation (grouped data)
  • Interpreting a known value of the standard
    deviation s: if the standard deviation s is
    known, use it to find rough estimates of the
    minimum and maximum usual sample values by
    using
  • max usual value = mean + 2 × (st. dev.)
  • min usual value = mean - 2 × (st. dev.)
  • Finding s from a frequency distribution

Why n-1? Data: 3, 6, 9; μ = 6, σ² = 6. Samples of size n = 2 (with replacement):

Sample                     3,3   3,6   3,9   6,3   6,6   6,9   9,3   9,6   9,9
x̄                          3     4.5   6     4.5   6     7.5   6     7.5   9
Σ(x-x̄)²                    0     4.5   18    4.5   0     4.5   18    4.5   0
s² (divide by n-1 = 2-1)   0     4.5   18    4.5   0     4.5   18    4.5   0
Divide by n = 2            0     2.25  9     2.25  0     2.25  9     2.25  0

Mean value of s² (dividing by n-1): 54/9 = 6 = σ²
Mean value when dividing by n: 27/9 = 3
Example: cotinine levels of smokers
using Excel we obtain
with which we calculate
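
The n-1 comparison on this slide (population 3, 6, 9 with all samples of size 2 drawn with replacement) can be reproduced with a short sketch. The helper name is illustrative.

```python
from itertools import product
from statistics import mean

population = [3, 6, 9]
samples = list(product(population, repeat=2))   # all 9 samples of size 2, with replacement

def sum_squared_deviations(sample):
    """Sum of (x - sample mean)^2 over the sample."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample)

var_n_minus_1 = [sum_squared_deviations(s) / (len(s) - 1) for s in samples]
var_n = [sum_squared_deviations(s) / len(s) for s in samples]

print(mean(var_n_minus_1))  # 6.0, equal to the population variance
print(mean(var_n))          # 3.0, too small
```

Averaged over every possible sample, dividing by n-1 recovers σ² = 6, while dividing by n systematically underestimates it.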
20
Measures of relative standing
  • Useful for comparing different data sets
  • z scores
  • Number of standard deviations that a value x is
    above or below the mean: z = (x - μ)/σ
  • Percentiles
  • Percentile of value x: Px = (number of values
    less than x)/(total number of values) × 100

Example: data point 48 in Smoker data:
8/40 × 100 = 20th percentile, P20. Exercise: Locate
the percentiles of data points 1, 130 and 250.
  • Example
  • NBA: Jordan 78, μ = 69, σ = 2.8
  • WNBA: Lobo 76, μ = 63.6, σ = 2.5
  • J: z = (x-μ)/σ = (78-69)/2.8 = 3.21
  • L: z = (x-μ)/σ = (76-63.6)/2.5 = 4.96
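
A minimal sketch of the z-score comparison, using the values given on the slide. The function name is illustrative.

```python
def z_score(x, mean, std):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std

z_jordan = z_score(78, 69, 2.8)
z_lobo = z_score(76, 63.6, 2.5)

print(round(z_jordan, 2))  # 3.21
print(round(z_lobo, 2))    # 4.96
```

Although Jordan's value is larger in absolute terms, Lobo's is further above the mean of her own group.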

21
Quartiles and percentiles
22
Percentiles and Quartiles
  • Quartiles
  • Q1 = P25, Q2 = P50 = median, Q3 = P75

Percentile of the Lth sorted value: k = (L - 1)/n × 100.
Example: data point 48 in Smoker data is 9th on the
table, n = 40. (9 - 1)/40 × 100 = 20, so 48 is at P20,
the 20th percentile, in the first quarter of the data
(below Q1). Data point 234 is 28th:
k = (28 - 1)/40 × 100 = 67.5, rounded to the 68th
percentile, in the third quarter of the data
(between Q2 and Q3).
  • Example: In class table (n = 20)
  • find the value of the 21st percentile
  • L = 21/100 × 20 = 4.2
  • round up to the 5th data point
  • → P21 = 71
  • find the 80th percentile
  • L = 80/100 × 20 = 16,
  • WHOLE NUMBER: average the 16th and 17th values
  • P80 = (89 + 92)/2 = 90.5
  • Conversely, if you are looking for data in the
    kth percentile
  • L = (k/100) n
  • n: total number of values
  • k: percentile being used
  • L: locator that gives position of a value
  • (for the 12th value in the sorted list, L = 12)
  • Pk: kth percentile (ex: P25 is the 25th percentile)
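
The locator procedure can be sketched as follows. The class table is not in the transcript, so the 20 values below are synthetic (10, 20, ..., 200) and serve only to exercise the two cases; the function names are illustrative.

```python
def percentile_value(sorted_values, k):
    """Value of the kth percentile using the locator L = (k/100) * n."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    n = len(sorted_values)
    whole, remainder = divmod(k * n, 100)   # integer arithmetic avoids float round-off
    if remainder == 0:
        # L is a whole number: average the Lth and (L+1)th values
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # Otherwise round L up and take that value (1-based position)
    return sorted_values[whole]

def percentile_of(sorted_values, x):
    """Percentile of x: (number of values less than x) / n * 100, rounded."""
    below = sum(1 for v in sorted_values if v < x)
    return round(below / len(sorted_values) * 100)

data = list(range(10, 210, 10))    # synthetic, already sorted, n = 20

print(percentile_value(data, 21))  # L = 4.2 -> 5th value: 50
print(percentile_value(data, 80))  # L = 16 -> (160 + 170) / 2 = 165.0
print(percentile_of(data, 90))     # 8 values below 90 -> 40
```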

23
Exploratory Data Analysis
  • Exploratory data analysis is the process of using
    statistical tools (graphs, measures of center and
    variation) to investigate data sets in order to
    understand their characteristics.
  • Box plots have less information than histograms
    and stem-and-leaf plots
  • Not that often used with only one set of data
  • Good when comparing many different sets of data
  • Outlier: extreme value (often they are typos
    when collecting data, but not always).
  • can have a dramatic effect on the mean
  • can have a dramatic effect on the standard deviation
  • can have a dramatic effect on the histogram