Basic Quantitative Methods in the Social Sciences (AKA Intro Stats) - PowerPoint PPT Presentation

About This Presentation
Title:

Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)

Description:

Basic Quantitative Methods in the Social Sciences (AKA Intro Stats) 02-250-01 Lecture 2 – PowerPoint PPT presentation

Slides: 79
Provided by: HPAutho76

Transcript and Presenter's Notes

Title: Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)


1
Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)
  • 02-250-01
  • Lecture 2

2
Sign Up for Participant Pool!!
  • see Psychology research first hand!
  • earn up to 2 bonus points
  • HOW????
  • sign up on the web (takes less than 5 minutes)
  • www.uwindsor.ca/psychology/signup
  • or access through psych homepage
  • You MUST sign up by May 19 to be included

3
Major Points Today
  • Types of Measurement
  • Summation Notation
  • Organizing Data
  • Stem and Leaf Displays
  • Graphs
  • Measures of Central Tendency

4
Types of Measurement
  • There are 4 types of measurement most often used
    in statistics
  • Nominal
  • Ordinal
  • Interval
  • Ratio

5
Nominal Measurement
  • Nominal Measurement: the classification of
    measurements into a set of categories
  • The numbers produced by nominal measurement are
    frequencies of occurrence in the categories
    (e.g., 22 ducks, 12 chickens, 2 geese, etc.)

6
Nominal Measurement cont.
  • A second example is gender: 2 categories, male
    and female
  • Nominal measurement applies to qualitative
    variables - elements are assigned to a category
    because they possess one characteristic or
    another
  • Nominal data is also termed qualitative data

7
Ordinal Measurement
  • Ordinal Measurement: the rank ordering of
    elements on a continuum
  • Ordinal measurement does not measure the amount
    of the variable - it represents the individual's
    placement on a continuum (a ranking; e.g., the
    winner of a race is in first place)

8
Ordinal Measurement cont.
  • It is important to note that the difference in
    the amount of the variable between adjacent rank
    positions is not constant - the difference in
    amount of talent between the 1st and 2nd place
    finishers in a race cannot be assumed to be the
    same as the difference in amount of talent
    between the 5th and 6th place finishers
  • Ordinal data can tell you that the person in 1st
    place finished before the person in 3rd place,
    but not by how much

9
Interval Measurement
  • Interval Measurement: the assignment of numerical
    quantity to the variable in such a way that:
  • the number assigned reflects the amount of the
    variable
  • the size of the measurement unit remains constant
  • and the zero point is defined arbitrarily and
    does not represent an absence of the property
    being measured

10
Interval Measurement cont.
  • The best example is temperature
  • 40°C represents how hot something is (the amount
    of heat it has)
  • The unit of measurement (1°C) represents the same
    amount of heat regardless of where it occurs in
    the range of measurement (the amount of change in
    temperature is the same between 25°C and 26°C as
    between 32°C and 33°C)
  • The zero point (0°C) is arbitrary - it represents
    the point at which water freezes, not the absence
    of temperature

11
Interval Measurement cont.
  • Interval measurement can contain negative
    numbers, whereas Nominal and Ordinal Measurement
    do not

12
Ratio Measurement
  • Ratio Measurement: the assignment of numerical
    quantity to the variable in such a way that:
  • the number assigned reflects the amount of the
    variable
  • the size of the measurement unit remains constant
  • and the zero point represents an absence of the
    property being measured

13
Ratio Measurement
  • Good examples are time and length
  • A ratio scale cannot produce negative numbers
  • Interval and ratio measurement are equivalent
    for statistical purposes and are often referred
    to as one thing (interval/ratio data)

14
Summation Notation
  • We commonly use the letters X and Y to
    represent the variables we have measured
  • The upper case Greek letter sigma (Σ) is known as
    the summation operator; it means "the sum of"

15
Σ Example
  • Suppose we keep a record for 6 days of every time
    someone slips in the CAW Student Centre Cafeteria
    (represented by X). The data may look like this:

16
Data Example
  • Day X
  • Mon 10
  • Tues 5
  • Weds 12
  • Thurs 11
  • Fri 21
  • Sat 28

17
ΣX
  • $\sum X$ means the sum of all the X scores, so that
  • $\sum X = X_1 + X_2 + X_3 + \dots + X_N$
  • $= 10 + 5 + 12 + 11 + 21 + 28$
  • $= 87$
  • Note: $X_1$ means the first X score; $X_N$ means
    the last X score

18
(ΣX)²
  • $(\sum X)^2$ means the square of the sum (total all
    numbers within the parentheses and then square), so
    that
  • $(\sum X)^2 = (X_1 + X_2 + X_3 + \dots + X_N)^2$
  • $= (10 + 5 + 12 + 11 + 21 + 28)^2$
  • $= (87)(87)$
  • $= 7569$

19
ΣX²
  • $\sum X^2$ means the sum of the squares (square each
    number and then sum), so that
  • $\sum X^2 = X_1^2 + X_2^2 + X_3^2 + \dots + X_N^2$
  • $= 10^2 + 5^2 + 12^2 + 11^2 + 21^2 + 28^2$
  • $= 100 + 25 + 144 + 121 + 441 + 784$
  • $= 1615$
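
The three quantities above differ only in the order of squaring and summing. A minimal Python sketch, using the slip counts from the data example (the variable names are illustrative, not part of the lecture):

```python
# Slip counts (X) for Mon..Sat, copied from the data example.
x = [10, 5, 12, 11, 21, 28]

sum_x = sum(x)                           # sum of X: add the scores
square_of_sum = sum(x) ** 2              # (sum of X) squared: add first, then square
sum_of_squares = sum(v ** 2 for v in x)  # sum of X squared: square first, then add

print(sum_x, square_of_sum, sum_of_squares)  # 87 7569 1615
```

The square of the sum and the sum of the squares are generally different numbers, so the placement of the parentheses matters.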

20
More Summation Notation
  • Suppose you also keep track of the number of
    pieces of garbage dropped on the floor of the CAW
    Student Centre for the same days as above
    (variable Y) and the data were as follows

21
Example Data
  • Day X Y
  • Mon 10 210
  • Tues 5 160
  • Weds 12 245
  • Thurs 11 240
  • Fri 21 340
  • Sat 28 415

22
ΣXY
  • $\sum XY$ means the sum of the products
  • $\sum XY = (X_1)(Y_1) + (X_2)(Y_2) + (X_3)(Y_3) + \dots + (X_N)(Y_N)$
  • $= (10)(210) + (5)(160) + (12)(245) + (11)(240) + (21)(340) + (28)(415)$
  • $= 2100 + 800 + 2940 + 2640 + 7140 + 11620$
  • $= 27240$
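
The sum of the products pairs each X with the Y from the same day before adding. A short sketch in Python, assuming the two columns are stored as parallel lists (the names are illustrative):

```python
# Slips (X) and pieces of garbage (Y) for Mon..Sat, from the example data.
x = [10, 5, 12, 11, 21, 28]
y = [210, 160, 245, 240, 340, 415]

# Multiply each pair from the same day, then add the products.
sum_xy = sum(a * b for a, b in zip(x, y))

# For contrast: the product of the two sums is a different quantity.
product_of_sums = sum(x) * sum(y)

print(sum_xy)           # 27240
print(product_of_sums)  # 140070
```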

23
Organizing Data
  • Frequency Distribution: a table which shows the
    number of individuals or events that occurred at
    each measurement value
  • This is the most common form of organizing data
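
As a rough illustration, a frequency distribution can be tallied from raw observations with Python's `collections.Counter`. The list of majors below is invented for the sketch and is not data from the lecture:

```python
from collections import Counter

# Hypothetical raw observations: one entry per student.
majors = ["Art", "Music", "Biology", "Music", "Psychology", "Art", "Music"]

# Count how many times each measurement value (major) occurs.
freq = Counter(majors)

for major in sorted(freq):
    print(f"{major:<12}{freq[major]}")
```

Each line of the printed table pairs a measurement value with the number of times it occurred.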

24
Frequency Distributions
  • The following hypothetical frequency distribution
    shows the number of women in different majors at
    the University of Windsor
  • Major   Number of Women
  • Art 15
  • Biology 35
  • Chemistry 34
  • Music 85
  • Psychology 97

25
Frequency Distributions
  • This frequency distribution organizes the data
    into nominal categories (by major)
  • Frequency distributions can also organize data by
    points of measurement on a continuous variable,
    as follows

26
Frequency Distributions
  • Age of Students in 02-250
  • Age Frequency
  • 18 14
  • 19 85
  • 20 58
  • 21 40
  • 22 35
  • 23 16
  • 24 10
  • 25 6
  • 26 4

27
Frequency Distributions
  • Frequency distributions should not exceed 15 to
    20 lines, as the point is to summarize the data
    in a way that represents all the information
    concisely
  • When there are more data than can be classified
    in 20 lines, the data can be grouped into score
    ranges known as class intervals, as in this
    example

28
Class Interval Example
  • Canada Population Estimates for the Year 2016 (in
    millions)
  • Age       Pop     Age       Pop
  • 0 - 4     2.05    50 - 54   2.79
  • 5 - 9     2.07    55 - 59   2.69
  • 10 - 14   2.12    60 - 64   2.31
  • 15 - 19   2.19    65 - 69   1.97
  • 20 - 24   2.38    70 - 74   1.42
  • 25 - 29   2.48    75 - 79   0.99
  • 30 - 34   2.54    80 - 84   0.71
  • 35 - 39   2.53    85 - 89   0.47
  • 40 - 44   2.51    90+       0.33
  • 45 - 49   2.57

29
Frequency Distributions cont.
  • Looking at the frequency distribution tells us:
  • The most frequently occurring ages are expected to
    be in the 50-54 age range (because this is the
    largest population estimate, 2.79 million)
  • The age frequencies are expected to be fairly
    evenly distributed from 0 to 70 years old and
    then fall off
  • The expected distribution of ages is not
    symmetrical: very low (young) and very high (old)
    ages do not occur with equal likelihood

30
Frequency Distributions cont.
  • Dividing the data into class intervals makes the
    data more accessible
  • Data which has been divided into class intervals
    is sometimes referred to as grouped data
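
One way to sketch the grouping step in Python, assuming integer scores and class intervals of equal width that start at a multiple of the width (the ages below are invented for illustration):

```python
from collections import Counter

def class_interval(score, width=5):
    """Return the (low, high) class interval of the given width containing score."""
    low = (score // width) * width
    return (low, low + width - 1)

# Hypothetical raw ages, not data from the lecture.
ages = [3, 7, 8, 12, 14, 14, 21, 22, 23, 24]

# Grouped data: frequency of scores falling in each class interval.
grouped = Counter(class_interval(a) for a in ages)

for (low, high), count in sorted(grouped.items()):
    print(f"{low} - {high}: {count}")
```

An interval with no scores simply does not appear in the output; a full table would list it with a frequency of 0.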

31
Cumulative Frequency Distributions
  • Frequency distributions can be made to contain
    more information, as when a column of cumulative
    frequencies is added
  • Cumulative Frequency Distribution: a table in
    which the frequency of individuals or events at
    each measurement value is added to previous
    frequencies so that each line reads as the total
    frequency of that and lower measurement values

32
Cumulative Frequency Ex.
  • Age of Students in 02-250
  • Age Frequency Cumulative Frequency
  • 18 14 14
  • 19 85 99
  • 20 58 157
  • 21 40 197
  • 22 35 232
  • 23 16 248
  • 24 10 258
  • 25 6 264
  • 26 4 268
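
The cumulative column is a running total of the frequency column. A minimal sketch using `itertools.accumulate` from the Python standard library, with the age frequencies from the table above:

```python
from itertools import accumulate

ages = [18, 19, 20, 21, 22, 23, 24, 25, 26]
freqs = [14, 85, 58, 40, 35, 16, 10, 6, 4]

# Each entry is the frequency at that age plus all frequencies at lower ages.
cum_freqs = list(accumulate(freqs))

for age, f, cf in zip(ages, freqs, cum_freqs):
    print(age, f, cf)
```

The last cumulative frequency equals the total number of students (268).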

33
More Frequency Distributions
  • Frequency distributions can also contain
    information about the percentages and cumulative
    percentages of observations at the various scores

34
More Frequency Distributions
  • Age of Students in 02-250
  • Age   Frequency   Cumulative Frequency   %   Cumulative %
  • 18 14 14 5.22 5.22
  • 19 85 99 31.72 36.94
  • 20 58 157 21.64 58.58
  • 21 40 197 14.93 73.51
  • 22 35 232 13.06 86.57
  • 23 16 248 5.97 92.54
  • 24 10 258 3.73 96.27
  • 25 6 264 2.24 98.51
  • 26 4 268 1.49 100.00
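
The percentage columns can be derived from the frequencies alone. The sketch below assumes percentages are rounded to two decimals and that each cumulative percentage is computed from the cumulative frequency (rather than by adding rounded percentages, which could drift):

```python
from itertools import accumulate

freqs = [14, 85, 58, 40, 35, 16, 10, 6, 4]   # ages 18..26
total = sum(freqs)                            # 268 students

# Percentage of all observations at each age.
pct = [round(100 * f / total, 2) for f in freqs]

# Percentage of all observations at that age or lower.
cum_pct = [round(100 * c / total, 2) for c in accumulate(freqs)]

for p, cp in zip(pct, cum_pct):
    print(f"{p:6.2f} {cp:7.2f}")
```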

35
Exact Limits
  • All measurements are expressed in discrete units,
    such as seconds or centimeters
  • No matter how small the unit of measurement, it
    is always possible to imagine finer measurement
  • e.g., 1 cm = 10 mm

36
Exact Limits
  • So, for continuous variables, any measure should
    be viewed as representing a range of values
  • This range has a width equal to the unit of
    measurement used, and the boundaries of this
    range are the exact limits of the measure

37
Exact Limits
  • E.g., If we say an event lasted 12 seconds, we
    mean it is closer to 12 seconds than to 11 or 13
    seconds. A score of 12 represents a range of
    values. This range is one second wide (one unit
    of the measurement) and extends between 11.5 and
    12.5 seconds

38
Exact Limits
  • Exact limits identify the upper and lower ends of
    the range represented by the raw score and are
    the real boundaries of the measure in question

39
Exact Limits
  • Exact Limits: values one-half unit of measurement
    above and below the score or class interval.
    Exact limits are the boundaries of the range of
    values represented by the measure
  • Some authors refer to exact limits as real limits

40
Exact Limits Examples
  • Measure Exact Limits
  • 52 51.5 - 52.5
  • 51 50.5 - 51.5
  • 52.2 52.15 - 52.25
  • 52.1 52.05 - 52.15

41
Exact Limits Examples
  • Measure Exact Limits
  • 50.02 50.015 - 50.025
  • 50.01 50.005 - 50.015
  • Class Interval Exact Limits
  • 50 - 54 49.5 - 54.5
  • 55 - 59 54.5 - 59.5
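
The rule of "one-half unit above and below" can be sketched in Python. This version assumes the measure is passed as a string so that its written precision (the unit of measurement) is preserved, and it uses `decimal.Decimal` to avoid floating-point rounding; the function name is hypothetical:

```python
from decimal import Decimal

def exact_limits(measure: str):
    """Return (lower, upper) exact limits for a measure written as text."""
    value = Decimal(measure)
    # The unit of measurement is the place value of the last written digit,
    # e.g. 1 for "52", 0.1 for "52.2", 0.01 for "50.02".
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half = unit / 2
    return value - half, value + half

print(exact_limits("52"))     # lower 51.5, upper 52.5
print(exact_limits("52.2"))   # lower 52.15, upper 52.25

# For a class interval, take the lower limit of its lowest score
# and the upper limit of its highest score.
interval_limits = (exact_limits("50")[0], exact_limits("54")[1])
print(interval_limits)        # lower 49.5, upper 54.5
```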

42
Stem-and-Leaf Displays
  • Stem-and-Leaf Display: partitions each score into
    a stem and a leaf and groups the scores
    according to common stems
  • The Leaf is the rightmost digit
  • The Stem is the digit (or digits) to the left
    of the leaf (the stem is 0 for 1 digit numbers)

43
Stem-and-Leaf
  • E.g.,
  • Number   Stem   Leaf
  • 4        0      4
  • 54       5      4
  • 123      12     3
  • 1234     123    4
  • The numbers 24 and 26 have different leaves (4
    and 6) but the same stem (2)

44
Stem-and-Leaf
  • Consider this raw data and their stem-and-leaf
    plot

45
Stem-and-Leaf
  • stem leaf
  • 3 6
  • 4 477
  • 5 05899
  • 6 01225788
  • 7 24559
  • 8 578
  • 9 2

  • Data: 36, 44, 47, 47, 50, 55, 58, 59, 59, 60, 61,
    62, 62, 65, 67, 68, 68, 72, 74, 75, 75, 79, 85,
    87, 88, 92

46
Stem-and-Leaf
  • Or this example
  • Data: 102, 104, 115, 116, 116, 125, 127, 128,
    129, 129, 131, 136, 137, 145, 145
  • stem leaf
  • 10 24
  • 11 566
  • 12 57899
  • 13 167
  • 14 55

47
Stem-and-Leaf
  • Unlike frequency distributions, stem-and-leaf
    plots give an indication of the overall
    distribution of the scores (e.g., evenly spread
    or bunched, symmetrical or nonsymmetrical)
  • Note: Make sure you include every instance of a
    given value, e.g., if 57 occurs 3 times in the
    data set, this should be represented in the stem
    and leaf display with a stem of 5 and three 7s in
    the leaf.
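
A stem-and-leaf display can be built by splitting each score into its rightmost digit (leaf) and the remaining digits (stem). A minimal sketch, assuming non-negative whole-number scores (the function name is illustrative):

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Map each stem to a string of its leaves, in ascending order."""
    plot = defaultdict(str)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)   # leaf = rightmost digit
        plot[stem] += str(leaf)          # repeated scores add repeated leaves
    return dict(plot)

data = [102, 104, 115, 116, 116, 125, 127, 128,
        129, 129, 131, 136, 137, 145, 145]

for stem, leaves in stem_and_leaf(data).items():
    print(stem, leaves)
```

Because every score contributes one leaf, a value that occurs several times appears several times in its row.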

48
Graphs
  • Graph: any pictorial, or graphic, representation
    of data
  • We will consider histograms and frequency
    polygons

49
Graphs
  • The horizontal axis (X axis) is labeled with
    units representing points of measurement and the
    vertical axis (Y axis) is labeled with values
    representing frequency of occurrence
  • Histograms and frequency polygons are like
    2-dimensional representations of frequency
    distributions

50
Histogram
  • Histogram: a graphic in which the horizontal axis
    identifies points of measurement, and the
    vertical axis represents frequency of occurrence
  • Solid bars are used to represent the frequency at
    each point of measurement (a histogram is a bar
    graph)
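
As a rough stand-in for a drawn histogram, the sketch below prints a text version of the age data using only the Python standard library. It is rotated relative to the definition above (measurement values run down the page and bar length shows frequency), and the scale of one `#` per 5 students is an arbitrary choice made for this example:

```python
import math

ages = [18, 19, 20, 21, 22, 23, 24, 25, 26]
freqs = [14, 85, 58, 40, 35, 16, 10, 6, 4]

def text_histogram(values, counts, per_mark=5):
    """Return one bar per value; each '#' stands for up to per_mark cases."""
    rows = []
    for value, count in zip(values, counts):
        bar = "#" * math.ceil(count / per_mark)
        rows.append(f"{value} | {bar} ({count})")
    return rows

for row in text_histogram(ages, freqs):
    print(row)
```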

51
Age Data Histogram Example
  • [Figure: histogram of the age data]

52
Frequency Polygon
  • Frequency Polygon: a graphic in which the
    horizontal axis identifies points of measurement,
    and the vertical axis represents frequency of
    occurrence (a frequency polygon is a line graph)

53
Age Data Frequency Polygon
  • [Figure: frequency polygon of the age data]

54
Graphs cont.
  • Both histograms and frequency polygons can be
    embellished by the simultaneous plotting of more
    than one variable, as shown next

55
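Graphs cont.
  • [Figure: graph plotting more than one variable]

56
Graphs cont.
  • [Figure: graph plotting more than one variable]
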
57
Describing Data
  • Averages: an average is a numerical value that
    indicates the middle point or central region of
    the raw data
  • Averages are sometimes referred to as measures of
    central tendency

58
Averages
  • 3 statistics are commonly termed averages
  • Mode
  • Median
  • Mean

59
Mode
  • Mode: the most frequently occurring score
  • A distribution with a single most frequently
    occurring score (one hump) is termed a unimodal
    (single mode) distribution
  • A distribution with 2 values that share the
    quality of being most frequently occurring (2
    humps) is termed bimodal (2 modes)

60
Mode Example
  • Age of Students in 02-250
  • Age Frequency
  • 18 14
  • 19 85
  • 20 58
  • 21 40
  • 22 35
  • 23 16
  • 24 10
  • 25 6
  • 26 4
  • In this example, the mode is 19, as it has the
    highest frequency (85)

61
A la Mode
  • The mode does not take into account all of the
    data - only the one most frequently occurring
    score
  • The mode is the score with the highest bar in a
    histogram, or the highest point in a frequency
    polygon
  • When the data are combined into class intervals,
    the mode is the mid-point of the class interval
    that contains the most scores
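
Finding the mode from a frequency table amounts to picking the value with the largest frequency. A small sketch with the age data; the second part uses `statistics.multimode` (Python 3.8+) on an invented list to show a bimodal case:

```python
from statistics import multimode

ages = [18, 19, 20, 21, 22, 23, 24, 25, 26]
freqs = [14, 85, 58, 40, 35, 16, 10, 6, 4]

# Mode(s) from a frequency table: every value that ties for the top frequency.
highest = max(freqs)
modes = [age for age, f in zip(ages, freqs) if f == highest]
print(modes)  # [19]

# Hypothetical raw scores with two equally common values (bimodal).
bimodal_scores = [1, 2, 2, 3, 3]
print(multimode(bimodal_scores))  # [2, 3]
```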

62
Median
  • Median: the middle point of the distribution, or
    the score which bisects the distribution (divides
    it into upper and lower halves)

63
Median
  • If there is an ODD number of scores, the median
    is the middle score
  • 1, 3, 6, 7, 8, 13, 15, 17, 18, 21, 23
  • Median = 13 (the 6th of the 11 scores)
  • There are 5 scores above the median,
    and 5 below

64
Median
  • If there is an EVEN number of scores, the median
    is the midpoint between the two middle scores
  • 1, 3, 6, 7, 8, 13, 15, 17, 18, 23
  • The two middle scores are 8 and 13
  • Median = (8 + 13)/2 = 10.5

65
Median Notes
  • NOTE!
  • When determining the median, you must arrange the
    scores in ascending or descending order first!

66
Steps to Finding the Median
  1. Arrange data in ascending or descending order
  2. Count the number of scores (N)
  3. If there is an odd number of scores, find the
    middle point (the score where there are the same
    number of scores above and below it) - this is
    the median

67
Steps to Finding the Median
  4. If there is an even number of scores,
    find the 2 middle scores - add them,
    and divide by 2 - this is the median
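
The four steps can be sketched as a small Python function. It assumes a non-empty list of numeric scores; the standard library's `statistics.median` does the same job, and the hand-written version is shown only to mirror the steps:

```python
def median(scores):
    """Median of a non-empty list of numbers, following the four steps."""
    ordered = sorted(scores)      # step 1: arrange in ascending order
    n = len(ordered)              # step 2: count the scores
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # step 3: odd n, take the middle score
    # step 4: even n, average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 3, 6, 7, 8, 13, 15, 17, 18, 21, 23]))  # 13
print(median([23, 1, 18, 3, 17, 6, 15, 7, 13, 8]))      # 10.5
```

The second call passes the scores out of order to show that sorting happens inside the function.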

68
More Median
  • When a distribution is viewed as area, the median
    divides the total area in half

  • [Figure: distribution with the median dividing
    the area into two halves of 50% each]

69
Median cont.
  • The median is based on the value of one or two
    scores, and does not take into account all of the
    data
  • When the data are grouped into class intervals,
    the median can be viewed as the midpoint of the
    class interval which contains the middle score
    (the point at which the cumulative percentage
    reaches 50%). This is only a rough estimate

70
Arithmetic Mean
  • Arithmetic Mean: the sum of the scores divided by
    the number of scores (what is generally thought
    of as the average)

71
Mean
  • The mean of a sample of X scores is symbolized as
    $\bar{X}$, which is read as "X bar"
  • The mean of a population of X scores is
    symbolized by the Greek letter mu (µ)
  • Greek letters tend to be used for parameters,
    while conventional letters are used for statistics

72
Mean
  • The algebraic definition of the population mean
    is as follows:
  • $\mu = \frac{\sum X}{N}$
  • N is used to refer to the number of scores
    in the data set (termed population size)

73
Sample Mean
  • The algebraic definition of the sample mean is as
    follows:
  • $\bar{X} = \frac{\sum X}{n}$
  • n is used to refer to the number of scores
    in the data set (termed sample size)

74
Mean cont.
  • The algebraic formula for the sample mean and the
    population mean is the same (although some other
    statistics have different formulae for samples
    and populations)

75
Mean cont.
  • The mean is used as the measure of average almost
    exclusively (rather than the mode or median)
    because it is defined algebraically and considers
    all the raw scores in the data set

76
Mean cont.
  • In any group of scores, the sum of the deviations
    from the mean equals zero
  • n = 6
  • $\bar{X} = \frac{\sum X}{n} = \frac{33}{6} = 5.50$
  • X    $X - \bar{X}$
  • 3    3 - 5.50 = -2.50
  • 5    5 - 5.50 = -0.50
  • 9    9 - 5.50 = 3.50
  • 2    2 - 5.50 = -3.50
  • 8    8 - 5.50 = 2.50
  • 6    6 - 5.50 = 0.50
  • $\sum X = 33$    $\sum (X - \bar{X}) = 0.00$
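
The same check can be run in Python on the six scores from the table. This is a minimal sketch; with other data the sum of deviations may come out as a tiny non-zero number because of floating-point rounding, so a tolerance test such as `math.isclose` is the safer comparison in general:

```python
import math

x = [3, 5, 9, 2, 8, 6]
n = len(x)

mean = sum(x) / n                       # sum of X divided by n = 33 / 6
deviations = [v - mean for v in x]      # X minus the mean, for each score

print(mean)             # 5.5
print(deviations)       # [-2.5, -0.5, 3.5, -3.5, 2.5, 0.5]
print(math.isclose(sum(deviations), 0.0, abs_tol=1e-12))  # True
```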

77
Relative Characteristics of Averages
  • If the distribution is symmetrical and unimodal,
    the mean, median, and mode have the same value
  • The longer tail of a non-symmetrical distribution
    pulls the mean more than the mode and median
  • Therefore the mean is more affected by outliers
    (very large or very small data points) than are
    the mode and median
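
A quick way to see the effect of an outlier is to add one extreme score and recompute. The sketch below reuses the six scores from the deviation example; the added score of 100 is invented for illustration:

```python
from statistics import mean, median

scores = [2, 3, 5, 6, 8, 9]        # the six scores from the deviation example
with_outlier = scores + [100]      # hypothetical extreme score added

print(mean(scores), median(scores))              # 5.5 5.5
print(mean(with_outlier), median(with_outlier))  # 19 6
```

One extreme value moves the mean from 5.5 to 19, while the median moves only from 5.5 to 6.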

78
Relative Characteristics of Averages
  • Relative positions of the mean and median
  • Note: The mode is the highest point in the
    distribution