Basic Quantitative Methods in the Social Sciences (AKA Intro Stats) - PowerPoint PPT Presentation

Slides: 79

Provided by: HPAutho76

Transcript and Presenter's Notes

1
Basic Quantitative Methods in the Social
Sciences (AKA Intro Stats)

02-250-01
Lecture 2

2
Sign Up for Participant Pool!!

see Psychology research first hand!
earn up to 2 bonus points
HOW????
sign up on the web (takes less than 5 minutes)
www.uwindsor.ca/psychology/signup
or access through psych homepage
You MUST sign up by May 19 to be included

3
Major Points Today

Types of Measurement
Summation Notation
Organizing Data
Stem and Leaf Displays
Graphs
Measures of Central Tendency

4
Types of Measurement

There are 4 types of measurement most often used
in statistics
Nominal
Ordinal
Interval
Ratio

5
Nominal Measurement

Nominal Measurement: the classification of
measurements into a set of categories
The numbers produced by nominal measurement are
frequencies of occurrence in the categories
(e.g., 22 ducks, 12 chickens, 2 geese, etc.)

6
Nominal Measurement cont.

A second example is gender: 2 categories, male
and female
Nominal measurement applies to qualitative
variables - elements are assigned to a category
because they possess one characteristic or
another
Nominal data is also termed qualitative data

7
Ordinal Measurement

Ordinal Measurement: the rank ordering of
elements on a continuum
Ordinal measurement does not measure the amount
of the variable - it represents the individual's
placement on a continuum (or ranking; e.g., the
winner of a race is in first place)

8
Ordinal Measurement cont.

It is important to note that the amount of
variable difference between rank position is not
constant - the difference in amount of talent
between the 1st and 2nd place finishers in a race
cannot be assumed to be the same as the
difference in amount of talent between the 5th
and 6th place finishers
Ordinal data can tell you that the person in 1st
place finished before the person in 3rd place,
but not by how much

9
Interval Measurement

Interval Measurement: the assignment of numerical
quantity to the variable in such a way that
- the number assigned reflects the amount of the
variable
- the size of the measurement unit remains constant
- the zero point is defined arbitrarily and
does not represent an absence of the property
being measured

10
Interval Measurement cont.

The best example is temperature
40°C represents how hot something is (the amount
of heat it has)
The unit of measurement (1°C) represents the same
amount of heat regardless of where it occurs in
the range of measurement (the amount of change in
temperature is the same between 25°C - 26°C and
32°C - 33°C)
The zero point (0°C) is arbitrary - it represents
the point at which water freezes, not the absence
of temperature
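
One consequence of an arbitrary zero point is that differences are meaningful on an interval scale but ratios are not. The sketch below illustrates this by converting Celsius to Kelvin. The Kelvin scale is not mentioned in the lecture; it is brought in here only as an assumption for illustration, because its zero is a true zero. The function name `celsius_to_kelvin` is hypothetical.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius reading to Kelvin (a scale with a true zero)."""
    return c + 273.15

# Differences survive the change of zero point...
diff_c = 40 - 20
diff_k = celsius_to_kelvin(40) - celsius_to_kelvin(20)

# ...but ratios do not: 40°C is not "twice as hot" as 20°C.
ratio_c = 40 / 20
ratio_k = celsius_to_kelvin(40) / celsius_to_kelvin(20)

print(diff_c, round(diff_k, 2))    # same 20-degree difference on both scales
print(ratio_c, round(ratio_k, 3))  # 2.0 in Celsius, about 1.068 in Kelvin
```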

11
Interval Measurement cont.

Interval measurement can contain negative
numbers, whereas Nominal and Ordinal Measurement
do not

12
Ratio Measurement

Ratio Measurement: the assignment of numerical
quantity to the variable in such a way that
- the number assigned reflects the amount of the
variable
- the size of the measurement unit remains constant
- the zero point represents an absence of the
property being measured

13
Ratio Measurement

Good examples are time and length
A ratio scale cannot produce negative numbers
Interval and ratio measurement are equivalent
for statistical purposes and are often referred
to as one thing (interval/ratio data)

14
Summation Notation

We commonly use the letters X and Y to
represent the variables we have measured
The upper case Greek letter sigma (Σ) is known as
the summation operator; it means "the sum of"

15
Σ Example

Suppose we keep a record for 6 days of every time
someone slips in the CAW Student Centre Cafeteria
(represented by X). The data may look like this:

16
Data Example

Day X
Mon 10
Tues 5
Weds 12
Thurs 11
Fri 21
Sat 28

17
ΣX

ΣX means the sum of all the X scores, so that
$\sum X = X_1 + X_2 + X_3 + \dots + X_N$
$= 10 + 5 + 12 + 11 + 21 + 28$
$= 87$
Note: $X_1$ means the first X score; $X_N$ means
the last X score

18
(ΣX)²

(ΣX)² means the square of the sum (total all
numbers within parentheses and then square), so
that
$(\sum X)^2 = (X_1 + X_2 + X_3 + \dots + X_N)^2$
$= (10 + 5 + 12 + 11 + 21 + 28)^2$
$= (87)(87)$
$= 7569$

19
ΣX²

ΣX² means the sum of the squares (square each
number and then sum), so that
$\sum X^2 = X_1^2 + X_2^2 + X_3^2 + \dots + X_N^2$
$= 10^2 + 5^2 + 12^2 + 11^2 + 21^2 + 28^2$
$= 100 + 25 + 144 + 121 + 441 + 784$
$= 1615$
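
The difference between the square of the sum and the sum of the squares is easy to mix up, so a minimal Python sketch using the slip counts from the slides may help. The variable names are illustrative only.

```python
# Slip counts (X) for Mon-Sat, taken from the Data Example slide.
x = [10, 5, 12, 11, 21, 28]

sum_x = sum(x)                            # ΣX: add the scores
square_of_sum = sum_x ** 2                # (ΣX)²: sum first, then square
sum_of_squares = sum(v ** 2 for v in x)   # ΣX²: square each score, then sum

print(sum_x, square_of_sum, sum_of_squares)  # 87 7569 1615
```

The order of operations is the whole point: squaring after summing gives 7569, while squaring before summing gives 1615.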

20
More Summation Notation

Suppose you also keep track of the number of
pieces of garbage dropped on the floor of the CAW
Student Centre for the same days as above
(variable Y) and the data were as follows

21
Example Data

Day X Y
Mon 10 210
Tues 5 160
Weds 12 245
Thurs 11 240
Fri 21 340
Sat 28 415

22
ΣXY

ΣXY means the sum of the products
$\sum XY = (X_1)(Y_1) + (X_2)(Y_2) + (X_3)(Y_3) + \dots + (X_N)(Y_N)$
$= (10)(210) + (5)(160) + (12)(245) + (11)(240) + (21)(340) + (28)(415)$
$= 2100 + 800 + 2940 + 2640 + 7140 + 11620$
$= 27240$
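
As a quick check of the sum of products, here is a small Python sketch using the X and Y columns from the Example Data slide. It also computes (ΣX)(ΣY), which is not part of the slide, only to show that the two quantities differ.

```python
# X = slips, Y = pieces of garbage, Mon-Sat (from the Example Data slide).
x = [10, 5, 12, 11, 21, 28]
y = [210, 160, 245, 240, 340, 415]

# ΣXY: multiply each pair first, then add the products.
sum_xy = sum(a * b for a, b in zip(x, y))

# (ΣX)(ΣY): add each column first, then multiply the totals.
product_of_sums = sum(x) * sum(y)

print(sum_xy)           # 27240
print(product_of_sums)  # 140070
```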

23
Organizing Data

Frequency Distributions: A frequency distribution
is a table which shows the number of individuals
or events that occurred at each measurement value;
this is the most common form of organizing data

24
Frequency Distributions

The following hypothetical frequency distribution
shows the number of women in different majors at
the University of Windsor
Major        Number of Women
Art          15
Biology      35
Chemistry    34
Music        85
Psychology   97

25
Frequency Distributions

This frequency distribution organizes the data
into nominal categories (by major)
Frequency distributions can also organize data by
points of measurement on a continuous variable,
as follows

26
Frequency Distributions

Age of Students in 02-250
Age Frequency
18 14
19 85
20 58
21 40
22 35
23 16
24 10
25 6
26 4

27
Frequency Distributions

Frequency distributions should not exceed 15 to
20 lines, as the point is to summarize the data
in a way that represents all the information
concisely
When there are more data than can be classified
in 20 lines, the data can be grouped into score
ranges known as class intervals, as in this
example

28
Class Interval Example

Canada Population Estimates for the Year 2016 (in
millions)
Age       Pop     Age       Pop
0 - 4     2.05    50 - 54   2.79
5 - 9     2.07    55 - 59   2.69
10 - 14   2.12    60 - 64   2.31
15 - 19   2.19    65 - 69   1.97
20 - 24   2.38    70 - 74   1.42
25 - 29   2.48    75 - 79   0.99
30 - 34   2.54    80 - 84   0.71
35 - 39   2.53    85 - 89   0.47
40 - 44   2.51    90+       0.33
45 - 49   2.57

29
Frequency Distributions cont.

Looking at the frequency distribution tells us:
The most frequently occurring age is expected to
be in the 50-54 age range (because this is the
largest population estimate, 2.79 million).
The age frequencies are expected to be fairly
evenly distributed from 0 to 70 years old and
then fall off
The expected distribution of ages is not
symmetrical: very low (young) and high (old) ages
do not occur with equal likelihood

30
Frequency Distributions cont.

Dividing the data into class intervals makes the
data more accessible
Data which has been divided into class intervals
is sometimes referred to as grouped data
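
Grouping raw scores into class intervals can be sketched in a few lines of Python. The ages below are hypothetical (the lecture does not give raw data for this step), and the interval width of 5 simply mirrors the 0 - 4, 5 - 9, ... intervals used in the population table.

```python
from collections import Counter

# Hypothetical raw ages, for illustration only.
ages = [3, 7, 12, 12, 18, 21, 24, 33, 41, 47]
width = 5  # each class interval covers 5 whole years

# Map every age to the lower bound of its interval, then count.
counts = Counter((age // width) * width for age in ages)

for lower in sorted(counts):
    print(f"{lower} - {lower + width - 1}: {counts[lower]}")
```

Intervals that contain no scores are simply absent from `counts`; a full table would list them with a frequency of 0.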

31
Cumulative Frequency Distributions

Frequency distributions can be made to contain
more information, as when a column of cumulative
frequencies is added
Cumulative Frequency Distribution: A table in
which the frequency of individuals or events at
each measurement value is added to previous
frequencies so that each line reads as the total
frequency of that and lower measurement values

32
Cumulative Frequency Ex.

Age of Students in 02-250
Age Frequency Cumulative Frequency
18 14 14
19 85 99
20 58 157
21 40 197
22 35 232
23 16 248
24 10 258
25 6 264
26 4 268
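
The cumulative column is a running total of the frequency column. A minimal Python sketch, using the age frequencies from the table and the standard-library `itertools.accumulate`:

```python
from itertools import accumulate

ages = [18, 19, 20, 21, 22, 23, 24, 25, 26]
freqs = [14, 85, 58, 40, 35, 16, 10, 6, 4]

# Each entry is the frequency at that age plus all frequencies below it.
cum_freqs = list(accumulate(freqs))

for age, f, cf in zip(ages, freqs, cum_freqs):
    print(age, f, cf)
```

The last cumulative frequency (268) equals the total number of students, which is a handy check on the arithmetic.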

33
More Frequency Distributions

Frequency distributions can also contain
information about the percentages and cumulative
percentages of observations at the various scores

34
More Frequency Distributions

Age of Students in 02-250
Age   Frequency   Cumulative Frequency   %       Cumulative %
18    14          14                     5.22    5.22
19    85          99                     31.72   36.94
20    58          157                    21.64   58.58
21    40          197                    14.93   73.51
22    35          232                    13.06   86.57
23    16          248                    5.97    92.54
24    10          258                    3.73    96.27
25    6           264                    2.24    98.51
26    4           268                    1.49    100.00
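
Each percentage is the frequency divided by the total number of observations, times 100; the cumulative percentage does the same with the cumulative frequency. A short Python sketch of that calculation, rounding to two decimals as an assumption about how the table was prepared:

```python
from itertools import accumulate

freqs = [14, 85, 58, 40, 35, 16, 10, 6, 4]
total = sum(freqs)  # 268 students in all

# Percentage of observations at each age.
pct = [round(100 * f / total, 2) for f in freqs]

# Percentage of observations at each age or lower.
cum_pct = [round(100 * c / total, 2) for c in accumulate(freqs)]

print(pct)
print(cum_pct)
```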

35
Exact Limits

All measurements are expressed in discrete units,
such as seconds or centimeters
No matter how small the unit of measurement, it
is always possible to imagine finer measurement
1 cm = 10 mm

36
Exact Limits

So, for continuous variables, any measure should
be viewed as representing a range of values
This range has a width equal to the unit of
measurement used, and the boundaries of this
range are the exact limits of the measure

37
Exact Limits

E.g., If we say an event lasted 12 seconds, we
mean it is closer to 12 seconds than to 11 or 13
seconds. A score of 12 represents a range of
values. This range is one second wide (one unit
of the measurement) and extends between 11.5 and
12.5 seconds

38
Exact Limits

Exact limits identify the upper and lower ends of
the range represented by the raw score and are
the real boundaries of the measure in question

39
Exact Limits

Exact Limits: values one-half unit of measurement
above and below the score or class interval.
Exact limits are the boundaries of the range of
values represented by the measure
Some authors refer to exact limits as real limits

40
Exact Limits Examples

Measure Exact Limits
52 51.5 - 52.5
51 50.5 - 51.5
52.2 52.15 - 52.25
52.1 52.05 - 52.15

41
Exact Limits Examples

Measure Exact Limits
50.02 50.015 - 50.025
50.01 50.005 - 50.015
Class Interval Exact Limits
50 - 54 49.5 - 54.5
55 - 59 54.5 - 59.5
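
The rule "half a unit below and half a unit above" can be written as a small helper. This is a sketch under stated assumptions: the function name `exact_limits` is hypothetical, values are passed as strings, and `decimal.Decimal` is used so that results such as 52.15 come out exactly rather than as binary floating-point approximations.

```python
from decimal import Decimal

def exact_limits(low, high, unit):
    """Return (lower, upper) exact limits for a score or class interval.

    low, high: the stated bounds (equal for a single score), as strings
    unit: the unit of measurement, as a string (e.g. "1", "0.1", "0.01")
    """
    half = Decimal(unit) / 2
    return Decimal(low) - half, Decimal(high) + half

print(exact_limits("52", "52", "1"))           # limits 51.5 and 52.5
print(exact_limits("52.2", "52.2", "0.1"))     # limits 52.15 and 52.25
print(exact_limits("50.02", "50.02", "0.01"))  # limits 50.015 and 50.025
print(exact_limits("50", "54", "1"))           # limits 49.5 and 54.5
```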

42
Stem-and-Leaf Displays

Stem-and-Leaf Display: a display that partitions
each score into a stem and a leaf and groups the
scores according to common stems
The Leaf is the rightmost digit
The Stem is the digit (or digits) to the left
of the leaf (the stem is 0 for 1-digit numbers)

43
Stem-and-Leaf

E.g.,
Number   Stem   Leaf
4        0      4
54       5      4
123      12     3
1234     123    4
The numbers 24 and 26 have different leaves (4
and 6) but the same stem (2)

44
Stem-and-Leaf

Consider this raw data and their stem-and-leaf
plot

45
Stem-and-Leaf

stem leaf
3 6
4 477
5 05899
6 01225788
7 24559
8 578
9 2

Data: 36, 44, 47, 47, 50, 55, 58, 59, 59, 60, 61,
62, 62, 65, 67, 68, 68, 72, 74, 75, 75, 79, 85,
87, 88, 92

46
Stem-and-Leaf

Or this example
Data: 102, 104, 115, 116, 116, 125, 127, 128,
129, 129, 131, 136, 137, 145, 145
stem leaf
10 24
11 566
12 57899
13 167
14 55
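
Building a stem-and-leaf display by hand follows a mechanical recipe, sketched below in Python for non-negative whole numbers. The function name `stem_and_leaf` is hypothetical; the data are the scores from this slide.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Group scores by stem; the leaf is the rightmost digit."""
    groups = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)  # e.g. 125 -> stem 12, leaf 5
        groups[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in groups.items()}

data = [102, 104, 115, 116, 116, 125, 127, 128,
        129, 129, 131, 136, 137, 145, 145]

for stem, leaves in stem_and_leaf(data).items():
    print(stem, leaves)
```

Because every score contributes one leaf, repeated values (such as 116 and 129 here) show up as repeated digits in the leaf column.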

47
Stem-and-Leaf

Unlike frequency distributions, stem-and-leaf
plots give an indication of the overall
distribution of the scores (e.g., evenly spread
or bunched, symmetrical or nonsymmetrical)
Note: Make sure you include every instance of a
given value, e.g., if 57 occurs 3 times in the
data set, this should be represented in the stem
and leaf display with a stem of 5 and three 7s in
the leaf.

48
Graphs

Graph: refers to all manner of pictorial, or
graphic, representation of data
We will consider histograms and frequency
polygons

49
Graphs

The horizontal axis (X axis) is labeled with
units representing points of measurement and the
vertical axis (Y axis) is labeled with values
representing frequency of occurrence
Histograms and frequency polygons are like
2-dimensional representations of frequency
distributions

50
Histogram

Histogram: A graphic in which the horizontal axis
identifies points of measurement, and the
vertical axis represents frequency of occurrence
Solid bars are used to represent the frequency at
each point of measurement (a histogram is a bar
graph)
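
To get a feel for what a histogram shows, the age frequencies from the earlier table can be drawn as a rough text chart. This is only a sketch: the bars run sideways rather than vertically, the function name `text_histogram` is hypothetical, and the scale of one `#` per 5 students is an arbitrary choice.

```python
ages = [18, 19, 20, 21, 22, 23, 24, 25, 26]
freqs = [14, 85, 58, 40, 35, 16, 10, 6, 4]

def text_histogram(values, counts, scale=5):
    """Return one line per value, with a bar of '#' proportional to its count."""
    lines = []
    for value, count in zip(values, counts):
        bar = "#" * round(count / scale)
        lines.append(f"{value} | {bar} ({count})")
    return lines

for line in text_histogram(ages, freqs):
    print(line)
```

The longest bar belongs to age 19, and the bars shrink steadily toward age 26, which is the shape a drawn histogram of these data would show.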

51
Age Data Histogram Example
52
Frequency Polygon

Frequency Polygon: A graphic in which the
horizontal axis identifies points of measurement,
and the vertical axis represents frequency of
occurrence (a frequency polygon is a line graph)

53
Age Data Frequency Polygon
54
Graphs cont.

Both histograms and frequency polygons can be
embellished by the simultaneous plotting of more
than one variable, as shown next

55
Graphs cont.
56
Graphs cont.
57
Describing Data

Averages: an average is a numerical value that
indicates the middle point or central region of
the raw data
Averages are sometimes referred to as measures of
central tendency

58
Averages

3 statistics are commonly termed averages
Mode
Median
Mean

59
Mode

Mode: The most frequently occurring score
A distribution with a single most frequently
occurring score (one hump) is termed a unimodal
(single mode) distribution
A distribution with 2 values that share the
quality of being most frequently occurring (2
humps) is termed bimodal (2 modes)

60
Mode Example

Age of Students in 02-250
Age Frequency
18 14
19 85
20 58
21 40
22 35
23 16
24 10
25 6
26 4
In this example, the Mode is 19, as it has the
highest frequency (85)
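
Finding the mode from a frequency table amounts to looking for the largest frequency. A minimal Python sketch with the age data follows; it collects every value that ties for the highest frequency, so a bimodal table would return two values.

```python
# Age -> frequency, from the table on this slide.
freq = {18: 14, 19: 85, 20: 58, 21: 40, 22: 35,
        23: 16, 24: 10, 25: 6, 26: 4}

highest = max(freq.values())
modes = [age for age, f in freq.items() if f == highest]

print(modes)  # [19]
```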

61
A la Mode

The mode does not take into account all of the
data - only the one most frequently occurring
score
The mode is the score with the highest bar in a
histogram, or the highest point in a frequency
polygon
When the data are combined into class intervals,
the mode is the mid-point of the class interval
that contains the most scores

62
Median

Median: The middle point of the distribution, or
the score which bisects the distribution (divides
it into upper and lower halves)

63
Median

If there are an ODD number of scores, the median
is the middle score
1, 3, 6, 7, 8, 13, 15, 17, 18, 21, 23
Median = 13 (the 6th of the 11 scores)
There are 5 scores above the median,
and 5 below

64
Median

If there are an EVEN number of scores, the median
is the midpoint between the two middle scores
1, 3, 6, 7, 8, 13, 15, 17, 18, 23
Median = (8 + 13)/2 = 10.5

65
Median Notes

NOTE!
When determining the median, you must arrange the
scores in ascending or descending order first!

66
Steps to Finding the Median

1. Arrange data in ascending or descending order
2. Count the number of scores (N)
3. If there are an odd number of scores, find the
middle point (the score where there are the same
number of scores above and below it) - this is
the median

67
Steps to Finding the Median

4. If there are an even number of scores,
find the 2 middle scores - add them,
and divide by 2 - this is the median
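
The four steps translate almost line for line into code. The sketch below is one way to write them in Python; it defines its own `median` function for illustration, although the standard library's `statistics.median` does the same job.

```python
def median(scores):
    """Median of a list of numbers, following the four steps above."""
    ordered = sorted(scores)       # step 1: arrange in ascending order
    n = len(ordered)               # step 2: count the scores
    mid = n // 2
    if n % 2 == 1:                 # step 3: odd count -> the middle score
        return ordered[mid]
    # step 4: even count -> average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 3, 6, 7, 8, 13, 15, 17, 18, 21, 23]))  # 13
print(median([1, 3, 6, 7, 8, 13, 15, 17, 18, 23]))      # 10.5
```

Sorting inside the function means the caller does not have to remember the ordering step, which is the most common source of mistakes when finding a median by hand.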

68
More Median

When a distribution is viewed as area, the median
divides the total area in half

[Figure: distribution curve with the median dividing the area into two halves of 50% each]

69
Median cont.

The median is based on the value of one or two
scores, and does not take into account all of the
data
When the data are grouped into class intervals,
the median can be viewed as the midpoint of the
class interval which contains the middle score
(the score at the 50% point of the cumulative
frequencies). This is only a rough estimate

70
Arithmetic Mean

Arithmetic Mean: the sum of the scores divided by
the number of scores (what is generally thought
of as the average)

71
Mean

The mean of a sample of X scores is symbolized as
$\bar{X}$, which is read as "X bar"
The mean of a population of X scores is
symbolized by the Greek letter mu (µ)
Greek letters tend to be used for parameters,
while conventional letters are used for statistics

72
Mean

The algebraic definition of the population mean
is as follows
$\mu = \frac{\sum X}{N}$
N is used to refer to the number of scores
in the data set (termed population size)

73
Sample Mean

The algebraic definition of the sample mean is as
follows
$\bar{X} = \frac{\sum X}{n}$
n is used to refer to the number of scores
in the data set (termed sample size)

74
Mean cont.

The algebraic formula for the sample mean and the
population mean is the same (although some other
statistics have different formulae for samples
and populations)

75
Mean cont.

The mean is used as the measure of average almost
exclusively (rather than the mode or median)
because it is defined algebraically and considers
all the raw scores in the data set

76
Mean cont.

In any group of scores, the sum of the deviations
from the mean equals zero
n = 6
$\bar{X} = \sum X / n = 33/6 = 5.50$
X    $X - \bar{X}$
3    3 - 5.50 = -2.50
5    5 - 5.50 = -0.50
9    9 - 5.50 = 3.50
2    2 - 5.50 = -3.50
8    8 - 5.50 = 2.50
6    6 - 5.50 = 0.50
$\sum X = 33$    $\sum (X - \bar{X}) = 0.00$
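
The zero-sum property of deviations can be checked directly. A minimal Python sketch with the six scores from this slide:

```python
x = [3, 5, 9, 2, 8, 6]

mean = sum(x) / len(x)                # ΣX / n = 33 / 6
deviations = [v - mean for v in x]    # each score minus the mean

print(mean)             # 5.5
print(deviations)       # [-2.5, -0.5, 3.5, -3.5, 2.5, 0.5]
print(sum(deviations))  # 0.0
```

With other data sets, floating-point rounding can leave a tiny nonzero remainder instead of an exact 0.0; that is a property of computer arithmetic, not of the statistics.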

77
Relative Characteristics of Averages

If the distribution is symmetrical and unimodal,
the mean, median, and mode have the same value
The longer tail of a non-symmetrical distribution
pulls the mean more than the mode and median
Therefore the mean is more affected by outliers
(very large or very small data points) than are
the mode and median
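
The pull of an outlier on the mean can be seen with a small experiment. The scores below are hypothetical, chosen only for illustration; the functions come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode

# Hypothetical scores, then the same scores with one very large value added.
scores = [2, 3, 3, 4, 5]
with_outlier = scores + [50]

print(mean(scores), median(scores), mode(scores))
print(mean(with_outlier), median(with_outlier), mode(with_outlier))
```

Adding the single score of 50 moves the mean from 3.4 to roughly 11.17, while the median only shifts from 3 to 3.5 and the mode stays at 3.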

78
Relative Characteristics of Averages