PUAF 610 TA - PowerPoint PPT Presentation

About This Presentation
Title:

PUAF 610 TA

Slides: 36
Provided by: Jiah154
Transcript and Presenter's Notes


1
PUAF 610 TA
  • Session 2

2
Today
  • Class review: summary statistics
  • STATA introduction
  • Reminder: HW this week

3
Review: Two types of statistics
  • Descriptive statistics summarize numerical
    information.
  • Inferential statistics use a sample to draw
    conclusions about the population.

4
Summary statistic
  • In descriptive statistics, summary statistics are
    used to summarize a set of observations.
  • Typically, they answer three questions:
  • What is the central value?
  • How widely are values spread from the center?
  • Are there data that are very atypical?

5
Summary statistic
  • a measure of location, or central tendency
  • a measure of statistical dispersion
  • a measure of the shape of the distribution

6
Central tendency
  • Central tendency relates to the way in which
    quantitative data tend to cluster around some
    value.
  • A measure of central tendency is any of a number
    of ways of specifying the central value.

7
Basic measures of central tendency
  • Mean
  • Median
  • Mode

8
Mean
  • the sum of all measurements divided by the number
    of observations in the data set
  • population mean (µ) vs. sample mean (x̄)

9
Example
  • Assume 4 people take PUAF 610, and their final
    exam scores are 95, 87, 93, 83. What is the mean
    exam score?

10
Example
  • Mean = (95 + 87 + 93 + 83)/4 = 358/4 = 89.5
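
The same calculation can be checked in code. This is a minimal sketch in Python rather than STATA (a choice made here for illustration, not something the session uses); the variable names are hypothetical.

```python
from statistics import mean

# Final exam scores of the four students in the example
scores = [95, 87, 93, 83]

# Mean = sum of all measurements / number of observations
by_hand = sum(scores) / len(scores)

print(by_hand)        # 89.5
print(mean(scores))   # same value from the standard library
```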

11
Median
  • the middle observation, when data are ordered
    from smallest to largest
  • the point of a distribution that divides the
    bottom 50% from the top 50% of the data. The
    median is the 50th percentile.

12
Median
  • If there is an odd number of observations, the
    median is the middle observation
  • If there is an even number of observations, the
    median is the average of the two middle
    observations
  • If the dataset is arranged in increasing order,
    the median is located at position (n + 1)/2

13
Example
  • Calculate the sample median for the following
    observations: 1, 5, 2, 8, 7.
  • Start by sorting the values: 1, 2, 5, 7, 8.
  • The median is located at position (n + 1)/2 = 3,
    thus it is 5.
  • An odd number of values.

14
Example
  • Calculate the sample median for the following
    observations: 1, 5, 2, 8, 7, 2.
  • Start by sorting the values: 1, 2, 2, 5, 7, 8.
  • The median is located at position (n + 1)/2 = 3.5.
    Thus, it is the average of the two middlemost
    terms: (2 + 5)/2 = 3.5.
  • An even number of values.
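
Both median examples can be reproduced with the position rule. The sketch below is written in Python (an assumption for illustration; the session itself uses STATA), and `median_by_position` is a hypothetical helper, not a library function.

```python
from statistics import median

def median_by_position(values):
    """Median via the (n + 1)/2 position rule; expects a non-empty list."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2               # 1-based position in the sorted data
    if position.is_integer():            # odd n: a single middle observation
        return ordered[int(position) - 1]
    lower = ordered[int(position) - 1]   # even n: the two middle observations
    upper = ordered[int(position)]
    return (lower + upper) / 2

odd_sample = [1, 5, 2, 8, 7]
even_sample = [1, 5, 2, 8, 7, 2]

print(median_by_position(odd_sample))    # 5
print(median_by_position(even_sample))   # 3.5
```

The standard-library `statistics.median` gives the same answers and is the simpler choice in practice; the helper only spells out the rule.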

15
Mode
  • the most frequent value in the data set
  • It is possible for a distribution to have more
    than one mode or not to have a mode at all.

16
Example
  • Find the mode for each of the following data sets:
  • (1) 1, 2, 2, 3, 4, 7, 9
  • (2) 12, 26, 26, 53, 84, 71, 71, 79
  • (3) 32, 46, 53, 94, 37, 29
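
One way to check the three data sets is to count how often each value occurs. This is a sketch in Python (chosen here for illustration); `modes` is a hypothetical helper, and treating a data set in which no value repeats as having no mode is one common convention, assumed here.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value, or [] when no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:                 # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 2, 3, 4, 7, 9]))              # [2]
print(modes([12, 26, 26, 53, 84, 71, 71, 79]))   # [26, 71]
print(modes([32, 46, 53, 94, 37, 29]))           # []
```

The three results illustrate one mode, more than one mode, and no mode at all.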

17
Comparing Mode, Median, and Mean
  • Pros and cons
  • For descriptive purposes, we might use the measure
    that suits the data.
  • If we would like to infer from samples to
    populations, the mean is the measure of choice
    because it can be manipulated mathematically.

18
Summary statistic
  • a measure of location, or central tendency
  • a measure of statistical dispersion, or variation
  • a measure of the shape of the distribution

19
Measures of Variation
  • Variation is variability or spread in a variable
  • Measures of variation are lengths of intervals on
    the measurement scale that indicate the spread of
    values in a distribution.

20
Measures of Variation
  • Range
  • Quartiles
  • Interquartile range
  • Variance
  • Standard Deviation

21
Range
  • the length of the smallest interval which
    contains all the data
  • (highest value − lowest value) + 1
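
As a small illustration, the range of the four exam scores from the earlier example can be computed as follows. This Python sketch reads the slide's formula as (highest − lowest) + 1, which is an assumption about the original notation; many texts report the simple difference without the added 1.

```python
scores = [95, 87, 93, 83]

simple_range = max(scores) - min(scores)   # highest value - lowest value
inclusive_range = simple_range + 1         # with the "+ 1" convention

print(simple_range)      # 12
print(inclusive_range)   # 13
```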

22
Quartiles
  • any of the three values which divide the sorted
    data set into four equal parts, so that each part
    represents one fourth of the sampled population.

23
Quartiles
  • first quartile (Q1) = lower quartile = cuts off
    lowest 25% of data = 25th percentile
  • second quartile (Q2) = median = cuts data set in
    half = 50th percentile
  • third quartile (Q3) = upper quartile = cuts off
    highest 25% of data, or lowest 75% = 75th
    percentile
  • The difference between the upper and lower
    quartiles is called the interquartile range.
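
The quartiles and interquartile range can be sketched in Python with the standard-library `statistics.quantiles` function. The data set is borrowed from the median example; note that software packages use different interpolation rules, so Q1 and Q3 from this sketch may differ slightly from what STATA or a textbook method reports.

```python
from statistics import median, quantiles

data = [1, 2, 2, 5, 7, 8]

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1                      # interquartile range

print(q1, q2, q3, iqr)
print(q2 == median(data))          # Q2 is the median
```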

24
Variance
  • Describes how far values lie from the mean.
  • To get rid of the minus signs, either take the
    absolute values of the deviation scores or square
    them.
  • Averages of absolute values are hard to use in
    more advanced analyses.
  • By averaging the sum of squared deviations (sum
    of squares) we can get a measure that is
    susceptible to further algebraic manipulations
    that are difficult or impossible with absolute
    values.

25
Variance
  • Less intuitive and more difficult to interpret,
    because it is measured in squared units rather
    than original units
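  • Rarely reported on its own
  • $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$ (in population) and
    $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ (in sample),
    where µ is the population mean, N is the size of the
    population, x̄ is the sample mean, and n is the
    sample size.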
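
To make the sum-of-squares idea concrete, here is a sketch in Python using the four exam scores from the mean example. It assumes the usual conventions of dividing by N for a population and by n − 1 for a sample; the variable names are hypothetical.

```python
from statistics import pvariance, variance

scores = [95, 87, 93, 83]
n = len(scores)
mean_score = sum(scores) / n

# Sum of squared deviations from the mean ("sum of squares")
sum_of_squares = sum((x - mean_score) ** 2 for x in scores)

population_var = sum_of_squares / n        # divide by N
sample_var = sum_of_squares / (n - 1)      # divide by n - 1

print(population_var)   # 22.75
print(sample_var)
```

The standard-library functions `pvariance` and `variance` compute the same two quantities directly.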
26
Standard deviation
  • A widely used measure of the variability or
    dispersion.
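  • It shows how much variation there is from the
    "average" (the mean).
  • Standard deviation is obtained by taking the square
    root of the variance, i.e.
  • $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$ (population),
    $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ (sample)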

27
Standard deviation
  • A low standard deviation indicates that the data
    points tend to be very close to the mean.
  • A high standard deviation indicates that the data
    is spread out over a large range of values.
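
The contrast between a low and a high standard deviation can be illustrated with two made-up samples that share the same mean. This is a Python sketch; both data sets are hypothetical and are treated as whole populations, so the population formula (`pstdev`) is used.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance

tight = [88, 89, 90, 91, 92]      # hypothetical scores close to the mean
spread = [70, 80, 90, 100, 110]   # hypothetical scores far from the mean

for name, data in (("tight", tight), ("spread", spread)):
    # Standard deviation is the square root of the variance
    by_hand = sqrt(pvariance(data))
    print(name, mean(data), by_hand, pstdev(data))
```

Both samples have a mean of 90, but the second has a standard deviation ten times larger.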

28
Summary statistic
  • a measure of location, or central tendency
  • a measure of statistical dispersion, or variation
  • a measure of the shape of the distribution

29
Shape of the distribution
  • Skewness
  • Kurtosis

30
Skewness
  • a measure of the asymmetry of the distribution
  • The skewness value can be positive or negative,
    or even undefined.

31
Skewness
  • negative skew: The left tail is longer; the mass
    of the distribution is concentrated on the right
    of the figure. It has relatively few low values.

32
Skewness
  • positive skew: The right tail is longer; the mass
    of the distribution is concentrated on the left
    of the figure. It has relatively few high values.

33
Skewness
  • A zero value indicates that the values are
    relatively evenly distributed on both sides of
    the mean.
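
The three cases (positive, negative, and zero skew) can be sketched in Python. The `skewness` helper below is hypothetical and uses the moment coefficient m3 / m2^1.5 in its population form, which is one of several definitions in use; STATA or a textbook may apply a small-sample correction. The data sets are made up for illustration.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(values)
    mean_value = sum(values) / n
    m2 = sum((x - mean_value) ** 2 for x in values) / n   # variance
    m3 = sum((x - mean_value) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 3, 3, 10]   # a few high values: positive skew
left_tail = [1, 8, 8, 8, 9, 9, 10]    # a few low values: negative skew
symmetric = [1, 2, 3, 4, 5]           # evenly balanced: zero skew

print(skewness(right_tail))
print(skewness(left_tail))
print(skewness(symmetric))
```

If every value is identical, the variance m2 is zero and the function raises `ZeroDivisionError`, which is one way skewness ends up undefined.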

34
Kurtosis
  • a measure of the "peakedness" of the distribution
  • Higher kurtosis means more of the variance is the
    result of infrequent extreme deviations, as
    opposed to frequent modestly sized deviations
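
The point about infrequent extreme deviations can be illustrated with two made-up samples that have the same mean and variance. This Python sketch uses a hypothetical `kurtosis` helper based on the moment coefficient m4 / m2^2 (population form, without subtracting 3); other definitions exist and software may report "excess" kurtosis instead.

```python
def kurtosis(values):
    """Moment coefficient of kurtosis: m4 / m2**2 (about 3 for a normal curve)."""
    n = len(values)
    mean_value = sum(values) / n
    m2 = sum((x - mean_value) ** 2 for x in values) / n   # variance
    m4 = sum((x - mean_value) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2

# Both hypothetical samples have mean 0 and variance 4
frequent_modest = [-2, 2, -2, 2, -2, 2, -2, 2]   # many modest deviations
rare_extreme = [0, 0, 0, 0, 0, 0, -4, 4]         # few extreme deviations

print(kurtosis(frequent_modest))   # 1.0
print(kurtosis(rare_extreme))      # 4.0
```

The variance is identical in both samples, but in the second it comes entirely from two extreme observations, and the kurtosis is higher.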

35
That's all for the class review. So far so
good? Let's go to STATA!