Descriptive Statistics Part II PowerPoint PPT Presentation

Title: Descriptive Statistics Part II


1
Chapter 3
  • Descriptive Statistics Part II
  • Describing Central Tendency
  • Measures of Variation

2
Important Characteristics of Data
  • Center: A value that indicates where the middle
    of the data set is located.
  • Variation: A measure of the amount that the data
    values vary among themselves.
  • Distribution: The nature or shape of the
    distribution of the data (such as bell-shaped,
    uniform, or skewed).
  • Outliers: Sample values that lie very far away
    from the vast majority of the other sample values.

3
Describing Central Tendency
  • Mean, μ, is the average or expected value
  • Median, Md, is the middle point of the ordered
    measurements
  • Mode, Mo, is the most frequent value
  • Percentiles and Quartiles

4
Basic Symbols
  • The sample size, i.e. the number of items in the
    sample, is denoted n.
  • The population size, i.e. the total number of
    items in the entire population, is denoted N.

5
Mean
  • If the data are from a population, the mean is
    denoted by μ (mu): μ = (Σ xi) / N.
  • If the data are from a sample, the mean is
    denoted by x̄ (x-bar): x̄ = (Σ xi) / n.
  • The sample mean x̄ is a point estimate of the
    population mean μ.

6
Example 3.1
  • Suppose we compiled a sample of the weights of 5
    professional football players:
  • 255, 216, 346, 300, 270
  • The sample mean is x̄ = (255 + 216 + 346 + 300 + 270)/5
    = 1387/5 = 277.4
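
As a minimal sketch, the sample mean for these five weights can be computed in Python as follows. The function name sample_mean is an illustrative choice and does not come from the slides.

```python
# Weights of the five players from Example 3.1.
weights = [255, 216, 346, 300, 270]


def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by n."""
    return sum(values) / len(values)


x_bar = sample_mean(weights)
print(x_bar)
```

The same function gives a population mean when the list holds every item in the population; only the symbol (μ instead of x̄) changes.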

7
Example 3.2
Given below is a sample of monthly rent values ($)
for one-bedroom apartments. The data are a sample
of 70 apartments in a particular city, presented
in ascending order.

Anderson, Sweeney, and Williams
8
The Median
  • The median is a value such that at least 50%
    of all measurements are less than or equal to it
    and at least 50% of all measurements are greater
    than or equal to it.
  • The median is the measure of location most often
    reported for annual income and property value
    data.
  • This measure is used instead of the mean since a
    few extremely large incomes or property values
    can inflate the mean.

9
  • The median Md is found as follows:
  • Arrange values in ascending order (smallest to
    largest).
  • If the number of measurements is odd, the median
    is the middle value.
  • If the number of measurements is even, the median
    is the average of the two middle values.

10
Example 3.3
  • Suppose the following represent a sample of the
    salaries of 13 internists (in thousands of dollars):
  • 127 132 138 141 144 146 152 154
    165 171 177 192 241
  • Since n = 13 is odd, the median is the
    middle (7th) measurement: Md = 152
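
The odd/even rule for the median can be sketched in Python as below. The function name median is an illustrative choice, and the 12-value case (the same salaries with the largest one dropped) is a made-up variation used only to exercise the even branch.

```python
def median(values):
    """Median by the rule above: middle value, or mean of the two middle values."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


# Internist salaries from Example 3.3 (in thousands of dollars).
salaries = [127, 132, 138, 141, 144, 146, 152, 154, 165, 171, 177, 192, 241]

print(median(salaries))        # odd case, n = 13
print(median(salaries[:-1]))   # even case, n = 12 (hypothetical variation)
```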

11
Example 3.2 Revisited
With n = 70 (even), the median is the average of
the 35th and 36th values:
Median = (475 + 475)/2 = 475
12
Mode
The mode, Mo, is the measurement that occurs
most frequently.
  • The greatest frequency can occur at two or more
    different values.
  • If the data have exactly two modes, the data are
    bimodal.
  • If the data have more than two modes, the data
    are multimodal.
  • The mode is an important measure of location for
    qualitative data, for which the mean and median
    cannot be computed.
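
A short Python sketch of finding the mode, including the bimodal and qualitative cases described above. The function name modes and all of the sample lists are hypothetical and chosen only for illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the greatest frequency, sorted."""
    if not values:
        return []
    counts = Counter(values)                  # value -> number of occurrences
    top = max(counts.values())                # the greatest frequency
    return sorted(v for v, c in counts.items() if c == top)


print(modes([450, 450, 475, 500]))       # one mode
print(modes([1, 1, 2, 2, 3]))            # two modes: bimodal
print(modes(["red", "blue", "red"]))     # qualitative data
```

Returning a list rather than a single value keeps the bimodal and multimodal cases visible instead of silently picking one of the tied values.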

13
Mode
450 occurred most frequently (7 times), so
Mode = 450
14
Percentiles and Quartiles
  • A percentile provides information about how the
    data are spread over the interval from the
    smallest value to the largest value.
  • Admission test scores for colleges and
    universities are frequently reported in terms of
    percentiles.

15
Percentiles
  • The pth percentile of a data set is a value such
    that at least p percent of the items take on this
    value or less and at least (100 - p) percent of
    the items take on this value or more.
  • Steps for computing percentiles:
  • 1. Arrange the data in ascending order.
  • 2. Compute the index i, the position of the pth
    percentile: i = (p/100)n
  • 3a. If i is not an integer, round up. The pth
    percentile is the value in this position.
  • 3b. If i is an integer, the pth percentile is
    the average of the values in positions i and i + 1.
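
The steps above can be sketched in Python as follows. This follows the slide's rule only; statistical packages often use different interpolation rules and may return different values. The function name percentile, the restriction to whole-number p, and the four-value list are assumptions made for illustration; the salary data come from Example 3.3.

```python
def percentile(values, p):
    """p-th percentile by the slide's rule; p is a whole number, 0 < p < 100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)                  # step 1: ascending order
    n = len(ordered)
    # Step 2: i = (p/100)n. Integer divmod avoids floating-point round-off
    # when deciding whether i is an integer.
    whole, remainder = divmod(p * n, 100)
    if remainder:
        # Step 3a: i is not an integer, so round up to position whole + 1
        # (1-based), which is index `whole` in 0-based indexing.
        return ordered[whole]
    # Step 3b: i is an integer, so average positions i and i + 1 (1-based).
    return (ordered[whole - 1] + ordered[whole]) / 2


salaries = [127, 132, 138, 141, 144, 146, 152, 154, 165, 171, 177, 192, 241]
print(percentile(salaries, 50))           # i = 6.5, rounds up to position 7
print(percentile([10, 20, 30, 40], 50))   # i = 2, average of positions 2 and 3
```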

16
Example 3.2 Revisited
90th Percentile
  • i = (p/100)n = (90/100)70 = 63
  • Since i is an integer, average the 63rd and 64th
    data values
  • 90th Percentile = (580 + 590)/2 = 585

17
65th Percentile
i = (p/100)n = (65/100)70 = 45.5. This is not an
integer, so round i up to 46. The data value in
position 46 is 500, so the 65th Percentile = 500.
18
Quartiles
  • Quartiles are specific percentiles
  • First Quartile = 25th Percentile
  • Second Quartile = 50th Percentile = Median
  • Third Quartile = 75th Percentile

19
Example 3.2 Revisited
  • Third quartile = 75th percentile
  • i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53
  • Third quartile = value in position 53 = 525
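
To trace the positions used in the rent examples (n = 70) without the data table itself, the index step can be isolated as in this sketch. The helper name percentile_positions is hypothetical, and p is assumed to be a whole number.

```python
def percentile_positions(p, n):
    """Return the 1-based position(s) that determine the p-th percentile."""
    whole, remainder = divmod(p * n, 100)   # i = (p/100)n in integer arithmetic
    if remainder:
        return (whole + 1,)                 # non-integer i: round up
    return (whole, whole + 1)               # integer i: average these two


n = 70
for p in (90, 65, 75, 50):
    print(p, percentile_positions(p, n))
```

For p = 50 the helper returns positions 35 and 36, which matches the median rule for an even number of measurements.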

20
Measures of Variation
  • The range is the largest minus the smallest
    measurement.
  • The variance is the average of the squared
    deviations from the mean.
  • The standard deviation is the square root of the
    variance.
  • In a comparison of multiple variables, the one
    with the largest variance shows the most
    variability in the data.

21
Example 3.3 Revisited
Internists' salaries (in thousands of dollars):
127 132 138 141 144 146 152 154 165 171 177 192 241
Range = 241 - 127 = 114 ($114,000)
22
Example 3.2 Revisited
Range = largest value - smallest value
Range = 615 - 425 = 190
23
Variance
If the data set is a sample, the variance is
denoted by s²: s² = Σ(xi - x̄)² / (n - 1).
If the data set is a population, the variance is
denoted by σ²: σ² = Σ(xi - μ)² / N.
24
Standard Deviation
If the data set is a sample, the standard
deviation is denoted s.
If the data set is a population, the standard
deviation is denoted σ (sigma).
25
Example 3.1 Revisited (recall x̄ = 277.4)
  xi     x̄       xi - x̄    (xi - x̄)²
  255    277.4   -22.4      501.76
  216    277.4   -61.4     3769.96
  346    277.4    68.6     4705.96
  300    277.4    22.6      510.76
  270    277.4    -7.4       54.76
                   Sum =   9543.20
Since this is sample data, s² = 9543.2/(5 - 1) = 2385.8
and s = √2385.8 ≈ 48.84.
26
Example 3.1 Revisited (continued)
If this were data from the entire population (μ = 277.4):
  xi     μ       xi - μ    (xi - μ)²
  255    277.4   -22.4      501.76
  216    277.4   -61.4     3769.96
  346    277.4    68.6     4705.96
  300    277.4    22.6      510.76
  270    277.4    -7.4       54.76
                   Sum =   9543.20
Then σ² = 9543.2/5 = 1908.64 and σ = √1908.64 ≈ 43.69.
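
The two tables differ only in the divisor: n - 1 for a sample and N for a population. A minimal Python sketch of both calculations on the football weights follows; the function names are illustrative choices.

```python
import math

weights = [255, 216, 346, 300, 270]   # Example 3.1


def sum_squared_deviations(values):
    """Sum of (xi - mean)^2 over all values."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def sample_variance(values):
    """s^2: squared deviations divided by n - 1."""
    return sum_squared_deviations(values) / (len(values) - 1)


def population_variance(values):
    """sigma^2: squared deviations divided by N."""
    return sum_squared_deviations(values) / len(values)


s2 = sample_variance(weights)
sigma2 = population_variance(weights)
print(round(s2, 2), round(math.sqrt(s2), 2))          # sample variance, s
print(round(sigma2, 2), round(math.sqrt(sigma2), 2))  # population variance, sigma
```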
27
Example 3.4
Compute the standard deviation of the following
sample data: 4, 5, 1, -2, 7
  xi    x̄    xi - x̄    (xi - x̄)²
   4    3      1          1
   5    3      2          4
   1    3     -2          4
  -2    3     -5         25
   7    3      4         16
               Sum =     50
Since this is sample data, s² = 50/(5 - 1) = 12.5
and s = √12.5 ≈ 3.54.
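
As a cross-check on the hand calculation, Python's standard statistics module provides sample versions (variance, stdev, dividing by n - 1) and population versions (pvariance, pstdev, dividing by N). A short sketch on the Example 3.4 data:

```python
import statistics

data = [4, 5, 1, -2, 7]              # Example 3.4

mean = statistics.mean(data)         # sample mean
s2 = statistics.variance(data)       # sample variance, divides by n - 1
s = statistics.stdev(data)           # sample standard deviation

print(mean, s2, round(s, 2))
print(round(statistics.pstdev(data), 2))   # population version, divides by N
```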
28
Z score
  • The z-score is often called the standardized
    value.
  • It is a measure of location that tells how far a
    particular observation is from the mean.
  • It denotes the number of standard deviations a
    data value xi is from the mean.
  • A data value less than the sample mean will have
    a z-score less than zero.
  • A data value greater than the sample mean will
    have a z-score greater than zero.
  • A data value equal to the sample mean will have a
    z-score of zero.

z = (xi - x̄)/s, where xi is the data value for
which you want the z-score, x̄ is the sample mean,
and s is the sample standard deviation.
29
Example 3.1 Revisited
255, 216, 346, 300, 270
The z-score for the data value 216 is
z = (216 - 277.4)/48.84 ≈ -1.26
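
A minimal Python sketch of the z-score for this example, taking the sample mean and sample standard deviation from the standard statistics module. The function name z_score is an illustrative choice.

```python
import statistics

weights = [255, 216, 346, 300, 270]   # Example 3.1


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


x_bar = statistics.mean(weights)      # sample mean
s = statistics.stdev(weights)         # sample standard deviation (n - 1)

z = z_score(216, x_bar, s)
print(round(z, 2))
```

The result is negative because 216 is below the sample mean, as the bullets above describe.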
30
Chebyshev's Rule
  • Chebyshev's rule applies to any data set,
    regardless of the shape of the distribution of
    the data.
  • It can be used to make statements about the
    proportion of data values that must be within a
    specified number of standard deviations of the
    mean.
  • Chebyshev's Rule: At least (1 - 1/k²) of the
    items in any data set will be within k standard
    deviations of the mean, where k is any value
    greater than 1.
  • Implications:
  • At least 75% of the items must be within k = 2
    standard deviations of the mean (i.e. within the
    interval (x̄ - 2s, x̄ + 2s)).
  • At least 89% of the items must be within k = 3
    standard deviations of the mean (i.e. within the
    interval (x̄ - 3s, x̄ + 3s)).
  • At least 94% of the items must be within k = 4
    standard deviations of the mean (i.e. within the
    interval (x̄ - 4s, x̄ + 4s)).

31
Example 3.1 Revisited
Let k = 1.5 with x̄ = 277.4 and s = 48.84.
According to Chebyshev's Rule, at least
(1 - 1/(1.5)²) = 1 - 0.44 = 0.56, or 56%, of the
football players' weights are between
x̄ - k(s) = 277.4 - 1.5(48.84) = 204.14 and
x̄ + k(s) = 277.4 + 1.5(48.84) = 350.66
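
The bound and the interval in this example can be reproduced with the Python sketch below. The function names are hypothetical; the final line compares the guaranteed minimum with the share of the five weights that actually fall inside the interval.

```python
def chebyshev_bound(k):
    """Minimum proportion of items within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def chebyshev_interval(mean, std_dev, k):
    """Interval (mean - k*s, mean + k*s)."""
    return (mean - k * std_dev, mean + k * std_dev)


weights = [255, 216, 346, 300, 270]   # Example 3.1
x_bar, s, k = 277.4, 48.84, 1.5

bound = chebyshev_bound(k)
low, high = chebyshev_interval(x_bar, s, k)
observed = sum(low <= w <= high for w in weights) / len(weights)

print(round(bound, 2), round(low, 2), round(high, 2))
print(observed >= bound)   # the observed share must meet the bound
```

Chebyshev's rule gives only a lower bound, so the observed share can be well above it.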
32
Empirical Rule for Normal Populations
  • For a Normal distribution, the Empirical rule can
    be used to make statements about the proportion
    of data values that must be within a specified
    number of standard deviations from the mean
  • If a population has mean μ and standard
    deviation σ and is described by a normal curve,
    then:
  • 68.26% of the population measurements lie within
    one standard deviation of the mean: [μ - σ, μ + σ]
  • 95.44% of the population measurements lie within
    two standard deviations of the mean: [μ - 2σ,
    μ + 2σ]
  • 99.73% of the population measurements lie within
    three standard deviations of the mean: [μ - 3σ,
    μ + 3σ]
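
These proportions can be checked numerically. For a normal distribution, the proportion within k standard deviations of the mean equals erf(k/√2), which the sketch below evaluates with Python's math.erf. The function name normal_within is an illustrative choice. The first two exact values round to 68.27% and 95.45%; the figures 68.26 and 95.44 are what one gets by doubling four-decimal normal-table entries (2 × 0.3413 and 2 × 0.4772).

```python
import math


def normal_within(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * normal_within(k), 2))
```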

33
Outliers
  • Outliers are defined as sample values that lie
    very far away from the vast majority of the other
    sample values.
  • Your book uses the term unusual to refer to
    outliers.
  • We will use the course notes to numerically
    determine an outlier or an unusual value, NOT the
    criteria set in your textbook.
  • If we assume the data are normally distributed,
    there are two ways to numerically determine if a
    sample value is an outlier:
  • 1.) Determine the interval (x̄ - 3s, x̄ + 3s).
    If a value is OUTSIDE the interval, then
    this value is an outlier.
  • 2.) Compute the z-value for a sample value. If
    z > 3 or z < -3, then the value is an outlier.

34
Example (Outliers)
  • For the first quiz in MATH 123, the average quiz
    score was 16 (out of 20 pts.), with a variance of
    4 pts. Would a score of 11 be considered an
    unusually low score?
  • We were given:
  • x̄ = 16, s = 2, and xi = 11
  • Note: s = 2, not 4, because we must take the square
    root of the variance to get the standard
    deviation.
  • Using the 1st method, we determine the interval:
  • (x̄ - 3s, x̄ + 3s) = (16 - 3(2), 16 + 3(2)) = (10, 22).
  • Since 11 is within this interval, this score is not
    unusually low.
  • Using the 2nd method, we determine the z-value:
  • z = (xi - x̄)/s = (11 - 16)/2 = -2.5
  • Since -2.5 is not less than -3, this score is not
    unusually low.
  • Note: Either method can be used to determine
    whether a value is an outlier, because both will
    ALWAYS yield the same result.
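
Both methods can be sketched in Python as follows, using the quiz numbers above. The function names are hypothetical, and the score of 9 is a made-up value added only to show a case that both methods flag.

```python
import math


def outlier_by_interval(x, mean, std_dev):
    """Method 1: x lies outside (mean - 3s, mean + 3s)."""
    return x < mean - 3 * std_dev or x > mean + 3 * std_dev


def outlier_by_z(x, mean, std_dev):
    """Method 2: the z-value is greater than 3 or less than -3."""
    z = (x - mean) / std_dev
    return z > 3 or z < -3


mean, variance = 16, 4
s = math.sqrt(variance)        # standard deviation is the root of the variance

for score in (11, 9):          # 9 is a hypothetical extra score
    print(score, outlier_by_interval(score, mean, s), outlier_by_z(score, mean, s))
```

The two checks agree because z < -3 is the same condition as x < x̄ - 3s rearranged, and likewise for z > 3.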


35
The End