QBM117 Business Statistics - PowerPoint PPT Presentation

Provided by: local199


1
QBM117 Business Statistics
  • Descriptive Statistics
  • Numerical Descriptive Measures

2
Objectives
  • To introduce numerical measures for describing
    the central location of data
  • To introduce numerical measures for describing
    the variability of data

3
Numerical Descriptive Methods
  • We have looked at tabular and graphical methods
    for presenting data.
  • Although these methods help us to highlight
    important features of the data, they do not tell
    the whole story.
  • Numerical descriptive measures allow us to be
    more precise in describing the characteristics of
    the data.

4
Numerical Descriptive Methods for Quantitative Data
  • Most numerical descriptive measures are obtained
    through arithmetic operations on the data.
  • Arithmetic calculations can only be applied to
    quantitative data.
  • Consequently, most of the numerical descriptive
    measures we will discuss are for quantitative
    data.

5
Parameters and Statistics
  • Recall the terms introduced in Lecture 2, Week 1:
    population, sample, parameter and statistic.
  • Numerical measures calculated from sample data
    are called sample statistics.
  • Numerical measures calculated from population
    data are called population parameters.

6
  • We will look at a number of descriptive
    statistics and for each we will learn how to
    calculate both the population parameter and the
    sample statistic.
  • In practice we usually collect data from a sample
    and calculate sample statistics to use as
    estimates of population parameters.

7
Notation
  • Statistics are usually represented by Roman
    letters:
  • sample mean x̄
  • sample standard deviation s
  • Parameters are usually represented by Greek
    letters:
  • population mean μ
  • population standard deviation σ

8
Properties of numerical data
  • Three major properties that describe quantitative
    data are:
  • - measures of central tendency
  • - measures of dispersion
  • - measures of shape

9
Measures of Central Tendency
  • In most sets of data there is a tendency for the
    data to group about a central point.
  • This phenomenon is referred to as central
    tendency.
  • We will look at three measures of central
    tendency: the mean, the median and the mode.

10
The Mean
  • The most popular and useful measure of central
    tendency is the arithmetic mean, widely known as
    the average.
  • The mean is calculated by summing all the
    observations and dividing by the number of
    observations.
  • It can easily be calculated using the statistics
    function on your calculator.

11
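  • The mean of a sample of n measurements
    x_1, x_2, ..., x_n is defined as
    $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
  • The mean of a population of N measurements
    x_1, x_2, ..., x_N is defined as
    $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$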

12
  • I have shown you the formulas so that you
    understand how the mean is calculated.
  • However, it is expected that you will calculate
    the mean using the statistics function on your
    calculator.
  • If you are unsure of how to use the statistics
    functions on your calculator, refer to your
    calculator manual.
  • The population mean and the sample mean are
    calculated using the same button on your
    calculator.

13
Example 1
  • The following data are the price-earnings ratios
    for a set of stocks whose prices are quoted by
    NASDAQ
  • Calculate the mean of the data.

4 20 16 28 31
10 23 37 29 15
33 21 18 35 29
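As an illustrative check only (the course expects you to use the calculator's statistics function), the definition above, summing the observations and dividing by their number, can be sketched in Python. The helper name `mean` is our own and not part of the course material.

```python
# Example 1: price-earnings ratios of 15 stocks quoted by NASDAQ
pe_ratios = [4, 20, 16, 28, 31,
             10, 23, 37, 29, 15,
             33, 21, 18, 35, 29]

def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)

print(sum(pe_ratios), len(pe_ratios))  # 349 15
print(f"{mean(pe_ratios):.2f}")        # 23.27
```

So the mean price-earnings ratio is 349/15, approximately 23.27.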
14
The Median
  • The median is the middle value when the data are
    arranged in order.
  • To calculate the median
  • - Order the data from smallest to largest
  • - If the number of observations is odd, the
    median is the middle value.
  • - If the number of observations is even, the
    median is the mean of the two middle
    observations.
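The three steps above translate directly into code. This is a minimal sketch with a hypothetical helper name `median`; it is checked on Example 1 (odd n) and on the Example 2 viewing times (even n).

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)           # order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                     # odd n: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle values

pe_ratios = [4, 20, 16, 28, 31, 10, 23, 37, 29, 15, 33, 21, 18, 35, 29]  # Example 1, n = 15
tv_hours = [14, 9, 12, 4, 20, 26, 17, 15, 18, 15, 10, 6, 16, 15, 8, 5]   # Example 2, n = 16

print(median(pe_ratios))  # 23
print(median(tv_hours))   # 14.5
```

Note that Python indexes from 0, so `ordered[7]` is the 8th value in the ordered list.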

15
Example 1 revisited
  • The following data are the price-earnings ratios
    for a set of stocks whose prices are quoted by
    NASDAQ
  • Calculate the median.

4 20 16 28 31
10 23 37 29 15
33 21 18 35 29
16
  • Order the data.
  • 4 10 15 16 18 20 21 23 28 29 29 31 33
    35 37
  • There are 15 observations and so the median will
    be the middle value.
  • It will be the 8th value: median = 23.

17
Stem and Leaf Display
  • A useful tool for ordering data is the stem and
    leaf display.
  • To construct a stem and leaf display, separate
    each observation into
  • a stem, consisting of all but the last digit,
  • and a leaf, the final digit.
  • Write the stems in a vertical column (smallest at
    the top).
  • Write each leaf in the row to the right of the
    stem.
  • Redraw, ordering the leaves.

18
Example 1 revisited
  • The following data are the price-earnings ratios
    for a set of stocks whose prices are quoted by
    NASDAQ
  • Construct a stem and leaf display and
    calculate the median.

4 20 16 28 31
10 23 37 29 15
33 21 18 35 29
19
0 | 4
1 | 6 0 5 8
2 | 0 8 3 9 1 9
3 | 1 7 3 5
  • Ordered

0 | 4
1 | 0 5 6 8
2 | 0 1 3 8 9 9
3 | 1 3 5 7
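The construction steps from slide 17 can be sketched in Python. This assumes non-negative whole numbers, so the stem is everything but the last digit and the leaf is the last digit, which fits Example 1. The function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return (stem, ordered leaves) rows for non-negative integer data."""
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)     # e.g. 37 -> stem 3, leaf 7
        rows[stem].append(leaf)
    # Every stem from smallest to largest (an empty list if a stem has no leaves),
    # with the leaves in each row put in order
    return [(s, sorted(rows[s])) for s in range(min(rows), max(rows) + 1)]

pe_ratios = [4, 20, 16, 28, 31, 10, 23, 37, 29, 15, 33, 21, 18, 35, 29]
for stem, leaves in stem_and_leaf(pe_ratios):
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
# 0 | 4
# 1 | 0 5 6 8
# 2 | 0 1 3 8 9 9
# 3 | 1 3 5 7
```

Counting down the ordered display, the 8th value is the third leaf on stem 2, i.e. 23, which agrees with the median found by ordering the data.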
20
The Mode
  • The mode is the value that occurs most
    frequently.
  • The mode doesn't necessarily lie in the middle.
  • Its claim to be a measure of central tendency is
    based on the fact that it indicates the location
    of greatest concentration of values.
  • The mode is a measure of central tendency that
    can be used for qualitative data.

21
Example 1 revisited
  • The following data are the price-earnings ratios
    for a set of stocks whose prices are quoted by
    NASDAQ
  • Calculate the mode.
  • mode = 29

4 20 16 28 31
10 23 37 29 15
33 21 18 35 29
22
  • If no data value occurs more than once then there
    is no mode.
  • A data set may have more than one mode.
  • If there are two modes then the data are bimodal.
  • If there are more than two modes the data are
    multimodal.
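The rules for the mode on these slides (no mode, one mode, or several modes) can be sketched as follows. The helper name `modes` is hypothetical; it returns a list so that bimodal and multimodal data can be reported.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often, in ascending order.

    An empty list means there is no mode (no value occurs more than once).
    """
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:                       # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

pe_ratios = [4, 20, 16, 28, 31, 10, 23, 37, 29, 15, 33, 21, 18, 35, 29]
print(modes(pe_ratios))        # [29]    (Example 1, unimodal)
print(modes([3, 7, 9]))        # []      (no mode)
print(modes([2, 2, 5, 5, 8]))  # [2, 5]  (bimodal)
```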

23
Example 2
  • A survey of television-viewing habits among
    university students provided the following data
    on viewing time in hours per week
  • Calculate the mean, median and mode.

14 9 12 4 20 26 17 15
18 15 10 6 16 15 8 5
24
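  • mean = 13.125
  • Ordered data: 4 5 6 8 9 10 12 14 15 15 15 16 17
    18 20 26
  • There are 16 observations, so the median is the
    mean of the 8th and 9th values:
  • median = (14 + 15)/2 = 14.5
  • mode = 15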
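Python's standard `statistics` module provides `mean`, `median` and `multimode` (the last added in Python 3.8), which can be used to double-check the Example 2 answers.

```python
import statistics

# Example 2: weekly television-viewing time (hours) for 16 students
tv_hours = [14, 9, 12, 4, 20, 26, 17, 15,
            18, 15, 10, 6, 16, 15, 8, 5]

print(statistics.mean(tv_hours))       # 13.125
print(statistics.median(tv_hours))     # 14.5
print(statistics.multimode(tv_hours))  # [15]
```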

25
Mean, Median or Mode
  • There are several factors to consider when making
    our choice of measure of central tendency.
  • The mean is generally our first selection.
  • However, there are circumstances when the median
    is better.
  • The mode is seldom the best measure of central
    tendency.

26
  • The mean is a popular measure because it is
    simple to calculate and interpret, and lends
    itself to mathematical manipulation.
  • However, the mean is sensitive to skewness and
    outliers.
  • The mean can be thought of as the balance point
    of the data.
  • If there are a few data points that are far from
    the bulk of the data, the mean moves towards them
    in order to maintain balance.
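To illustrate the sensitivity described above, the sketch below takes Example 1 and, purely hypothetically, replaces the largest price-earnings ratio (37) with 370. The altered value is invented for illustration and is not part of the data.

```python
import statistics

pe_ratios = [4, 20, 16, 28, 31, 10, 23, 37, 29, 15, 33, 21, 18, 35, 29]
# Hypothetical outlier: suppose the 37 had been recorded as 370
with_outlier = [370 if x == 37 else x for x in pe_ratios]

print(f"{statistics.mean(pe_ratios):.2f}", statistics.median(pe_ratios))        # 23.27 23
print(f"{statistics.mean(with_outlier):.2f}", statistics.median(with_outlier))  # 45.47 23
```

The single extreme value roughly doubles the mean, while the median, which depends only on the middle of the ordered data, does not move.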

27
  • The mean is the preferred measure of central
    tendency.
  • However, if the data are skewed or contain
    outliers then the median is the preferred measure
    of central tendency.
  • If the data are qualitative, the mode must be
    used.

28
Relationship between Mean, Median and Mode
  • If the data are unimodal and symmetric, the mean,
    median and mode coincide.
  • If the data are unimodal and positively skewed,
    the mean is greater than the median, which is
    greater than the mode.
  • If the data are unimodal and negatively skewed,
    the mean is less than the median, which is less
    than the mode.

29
Measures of Dispersion
  • In addition to knowing the central location of
    the data values, it is important to know how the
    values vary about this point.
  • We are now going to look at measures of
    dispersion, also referred to as:
  • - measures of spread
  • - measures of variability
  • We will look at three measures of dispersion:
    the range, the standard deviation and the
    coefficient of variation.

30
The Range
  • The range is the difference between the largest
    and smallest observations in a data set.
  • The range measures the total spread of the data
    set.
  • Although the range is a simple measure of
    variability, it does not take into account how
    the data are distributed between the smallest and
    largest values.
  • Hence the range is seldom used as the only
    measure.

31
Example 1 revisited
  • The following data are the price-earnings ratios
    for a set of stocks whose prices are quoted by
    NASDAQ
  • Calculate the range.
  • range = 37 - 4 = 33

4 20 16 28 31
10 23 37 29 15
33 21 18 35 29
32
Variance and Standard Deviation
  • The variance and the standard deviation are the
    two most widely accepted measures of dispersion.
  • The standard deviation is the square root of the
    variance.
  • Both measures take into account how far each data
    value is away from the mean.

33
Population Variance
  • The variance of a population of N measurements
    x_1, x_2, ..., x_N having mean μ is defined as
    $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

34
Sample Variance
  • The variance of a sample of n measurements
    x_1, x_2, ..., x_n having mean x̄ is defined as
    $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

35
Standard Deviation
  • Calculating the variance involves squaring the
    deviations of the original measurements from the
    mean, and hence the unit attached to the variance
    is the square of the unit attached to the
    original measurements.
  • Taking the square root of the variance gives us a
    measure of variability that is in the same units
    as the data.
  • This measure is the standard deviation.

36
Population Standard Deviation
  • The standard deviation of a population of N
    measurements having mean μ is defined as
    $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

37
Sample Standard Deviation
  • The standard deviation of a sample of n
    measurements having mean x̄ is defined as
    $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
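The only difference between the population and sample formulas above is the divisor (N versus n - 1). A minimal sketch, using hypothetical helper names `variance` and `std_dev` and a `sample` flag of our own choosing, checked against the Example 2 answers given later in these notes:

```python
import math

def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(values)
    centre = sum(values) / n
    squared_deviations = sum((x - centre) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

def std_dev(values, sample=True):
    """Square root of the variance, so it is in the same units as the data."""
    return math.sqrt(variance(values, sample))

# Example 2 viewing times (hours per week)
tv_hours = [14, 9, 12, 4, 20, 26, 17, 15, 18, 15, 10, 6, 16, 15, 8, 5]
print(f"{variance(tv_hours):.2f}", f"{std_dev(tv_hours):.2f}")  # 35.05 5.92
print(f"{variance(tv_hours, sample=False):.2f}")                # 32.86
```

This mirrors the two separate standard deviation buttons on a calculator: the same deviations, divided by a different count.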

38
Calculating the Standard Deviation and Variance
  • As with the mean, you are expected to calculate
    the standard deviation and variance using the
    statistics functions on your calculator.
  • You are not to use the formulae; they have been
    provided to help you understand what the standard
    deviation and variance are.
  • Note that the population standard deviation and
    sample standard deviation are calculated using
    different buttons on your calculator.

39
Important Points about the Standard Deviation
  • The standard deviation cannot be negative.
  • The standard deviation is zero if, and only if,
    all of the observations have the same value.
  • Like the mean, the standard deviation is not
    resistant. Strong skewness or a few outliers can
    greatly increase the standard deviation.

40
Example 1 revisited
  • The following data are the price-earnings ratios
    for a set of stocks whose prices are quoted by
    NASDAQ
  • Calculate the standard deviation and the
    variance.

4 20 16 28 31
10 23 37 29 15
33 21 18 35 29
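The slides leave the Example 1 answer to you. One way to check your calculator is Python's `statistics` module, whose `variance`/`stdev` use the sample divisor n - 1 and whose `pvariance`/`pstdev` use the population divisor N. The figures printed below were computed here; they are not taken from the original slides.

```python
import statistics

pe_ratios = [4, 20, 16, 28, 31, 10, 23, 37, 29, 15, 33, 21, 18, 35, 29]

s2 = statistics.variance(pe_ratios)       # sample variance, divisor n - 1
s = statistics.stdev(pe_ratios)           # sample standard deviation
sigma2 = statistics.pvariance(pe_ratios)  # population variance, divisor N
sigma = statistics.pstdev(pe_ratios)      # population standard deviation

print(f"{s2:.2f} {s:.2f}")          # 91.50 9.57
print(f"{sigma2:.2f} {sigma:.2f}")  # 85.40 9.24
```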
41
Coefficient of Variation
  • In some situations we may be interested in a
    measure of variability that indicates how large
    the standard deviation is in relation to the
    mean.
  • This measure is called the coefficient of
    variation (CV) and is calculated by dividing the
    standard deviation of a data set by the mean.
  • The CV allows us to compare the variability of
    two data sets having different units of
    measurement.

42
  • A standard deviation of 1mm would be considered
    very large for the measured thickness of CDs on a
    production line.
  • However, a standard deviation of 1mm would be
    considered small for the height of a telephone
    pole.
  • When the means for data sets differ greatly we do
    not get an accurate picture of the relative
    variability in the two data sets by comparing the
    standard deviations.
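To make the CD and telephone-pole comparison concrete, the sketch below divides the same 1 mm standard deviation by two very different means. The means (1.2 mm for a CD, 10 000 mm for a pole) are hypothetical figures chosen only for illustration.

```python
# Hypothetical figures for illustration only: same standard deviation, very different means
sd_mm = 1.0
cd_mean_mm, pole_mean_mm = 1.2, 10_000.0

print(f"CD:   CV = {sd_mm / cd_mean_mm:.4f}")    # 0.8333
print(f"Pole: CV = {sd_mm / pole_mean_mm:.4f}")  # 0.0001
```

The identical standard deviations hide the fact that the CDs are, relative to their size, thousands of times more variable.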

43
Calculating the Coefficient of Variation
  • The sample coefficient of variation is calculated
    by $cv = \frac{s}{\bar{x}}$
  • The population coefficient of variation is
    calculated by $CV = \frac{\sigma}{\mu}$

44
Example 1 revisited
  • The following data are the price-earnings ratios
    for a set of stocks whose prices are quoted by
    NASDAQ
  • Calculate the coefficient of variation.

4 20 16 28 31
10 23 37 29 15
33 21 18 35 29
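A minimal sketch of the two coefficient of variation formulas, applied to Example 1. The helper name `coefficient_of_variation` and its `sample` flag are hypothetical; the printed values were computed here, not taken from the slides.

```python
import statistics

def coefficient_of_variation(values, sample=True):
    """Standard deviation divided by the mean (s / x-bar, or sigma / mu if sample=False)."""
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / statistics.mean(values)

pe_ratios = [4, 20, 16, 28, 31, 10, 23, 37, 29, 15, 33, 21, 18, 35, 29]
print(f"{coefficient_of_variation(pe_ratios):.2f}")                # 0.41
print(f"{coefficient_of_variation(pe_ratios, sample=False):.2f}")  # 0.40
```

Because the standard deviation and the mean share the same units, the CV has no units, which is what allows comparisons across data sets measured differently.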
45
Example 2 revisited
  • A survey of television-viewing habits among
    university students provided the following data
    on viewing time in hours per week
  • Calculate the range, standard deviation,
    variance and coefficient of variation.

14 9 12 4 20 26 17 15
18 15 10 6 16 15 8 5
46
  • range = 26 - 4 = 22
  • standard deviation s = 5.92 (2 d.p.)
  • variance s² = 35.05
  • coefficient of variation cv = 0.45 (2 d.p.)
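These Example 2 answers can be checked with the standard `statistics` module. Note that the smallest viewing time in the data is 4 hours, so the range is 26 - 4 = 22.

```python
import statistics

tv_hours = [14, 9, 12, 4, 20, 26, 17, 15,
            18, 15, 10, 6, 16, 15, 8, 5]

data_range = max(tv_hours) - min(tv_hours)  # largest minus smallest observation
s = statistics.stdev(tv_hours)              # sample standard deviation
s2 = statistics.variance(tv_hours)          # sample variance
cv = s / statistics.mean(tv_hours)          # sample coefficient of variation

print(data_range)   # 22
print(f"{s:.2f}")   # 5.92
print(f"{s2:.2f}")  # 35.05
print(f"{cv:.2f}")  # 0.45
```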

47
Interpreting the Standard Deviation
  • The standard deviation, as a measure of average
    deviation around the mean, helps you understand
    how the observations are distributed above and
    below the mean.
  • A data set with a large standard deviation has
    much dispersion with values widely scattered
    around its mean.
  • A data set with a small standard deviation has
    little dispersion with the values tightly
    clustered about the mean.

48
Chebyshev's Theorem
  • More than a century ago, the Russian mathematician
    Pafnuty Chebyshev found that, regardless of how a
    data set is distributed, the proportion of
    observations that lie within k standard
    deviations of the mean is at least
    $1 - 1/k^2$ (for k > 1).
  • This is known as Chebyshev's theorem.

49
  • Regardless of the shape of the distribution,
    Chebyshev's theorem states that:
  • At least 75% of the observations must lie within
    2 standard deviations of the mean
  • At least 89% of the observations must lie within
    3 standard deviations of the mean
  • At least 94% of the observations must lie within
    4 standard deviations of the mean
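The three percentages above come straight from the bound $1 - 1/k^2$. A short sketch (the function name `chebyshev_lower_bound` is hypothetical) reproduces them before rounding:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        # For k <= 1 the bound is 0 or negative, so it says nothing useful
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    print(k, f"{chebyshev_lower_bound(k):.2%}")
# 2 75.00%
# 3 88.89%
# 4 93.75%
```

The slide figures of 89% and 94% are these values rounded to whole percentages.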

50
Example 3.11 from text (pg 86)
  • The durations (in minutes) of a sample of 30
    long-distance telephone calls placed by a firm in
    Melbourne in a given week are given in Table 3.2
    on page 86 of the text.
  • The 30 telephone-call durations have a mean of
    10.26 and a standard deviation of 4.29.
  • Chebyshev's theorem states that at least 75% of
    the call durations lie within 2 standard
    deviations of the mean, i.e. in the interval
    10.26 ± 2(4.29) = [1.68, 18.84].

51
  • When we look at the data we find that all but
    the largest of the 30 durations fall within this
    interval.
  • That is, the interval actually contains 96.7% of
    the call durations.

52
Empirical Rule
  • A more exact rule applies if the distribution of
    the data is bell-shaped.
  • The empirical rule has evolved from empirical
    studies that have produced samples possessing
    bell-shaped distributions.

53
  • The empirical rule states that for data with a
    bell-shaped distribution:
  • About 68% of all observations lie within 1
    standard deviation of the mean
  • About 95% of all observations lie within 2
    standard deviations of the mean
  • Almost all (about 99.7%) of the observations lie
    within 3 standard deviations of the mean

54
Example 3.12 from text (pg 87)
  • The data in the sample of telephone-call
    durations in Table 3.2 have a mean of 10.26, a
    standard deviation of 4.29, and the durations
    have an approximately bell-shaped distribution
    (see Figure 3.5).
  • According to the empirical rule, approximately
    68% of the observations should lie in the
    interval x̄ ± s = 10.26 ± 4.29 = [5.97, 14.55].

55
  • According to the empirical rule, approximately
    68% of the observations should lie in the
    interval x̄ ± s = 10.26 ± 4.29 = [5.97, 14.55].
  • If we look at the data we see that 21 out of the
    30 durations are contained in this interval, i.e.
    70%.
  • This is very close to the empirical rule's
    approximation.

56
  • According to the empirical rule, approximately
    95% of the observations should lie in the
    interval x̄ ± 2s = 10.26 ± 8.58 = [1.68, 18.84].
  • If we look at the data we see that 29 out of the
    30 durations are contained in this interval, i.e.
    96.7%.
  • This is very close to the empirical rule's
    approximation.
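The raw Table 3.2 durations are not reproduced in these notes, so the sketch below uses only the quoted summary (mean 10.26, standard deviation 4.29, n = 30) and the counts of 21 and 29 reported on the slides. It computes the empirical-rule intervals and the observed percentages.

```python
x_bar, s, n = 10.26, 4.29, 30  # summary statistics for Table 3.2 quoted in the text

# Interval x_bar ± k*s for k = 1, 2, 3
intervals = {k: (x_bar - k * s, x_bar + k * s) for k in (1, 2, 3)}
for k, (low, high) in intervals.items():
    print(f"mean ± {k}s: [{low:.2f}, {high:.2f}]")
# mean ± 1s: [5.97, 14.55]
# mean ± 2s: [1.68, 18.84]
# mean ± 3s: [-2.61, 23.13]

observed = {1: 21, 2: 29}  # counts inside the 1s and 2s intervals, as reported on the slides
for k, count in observed.items():
    print(f"within {k}s: {count}/{n} = {count / n:.1%}")
# within 1s: 21/30 = 70.0%
# within 2s: 29/30 = 96.7%
```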

57
  • Reading for next lecture
  • Chapter 3 Sections 3.5 - 3.6
  • Exercises
  • 3.7
  • 3.20
  • 3.25a
  • 3.31