Title: Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)
1Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)
2Sign Up for Participant Pool!!
- see Psychology research first hand!
- earn up to 2 bonus points
- HOW????
- sign up on the web (takes less than 5 minutes)
- www.uwindsor.ca/psychology/signup
- or access through psych homepage
- You MUST sign up by May 19 to be included
3Major Points Today
- Types of Measurement
- Summation Notation
- Organizing Data
- Stem and Leaf Displays
- Graphs
- Measures of Central Tendency
4Types of Measurement
- There are 4 types of measurement most often used in statistics:
- Nominal
- Ordinal
- Interval
- Ratio
5Nominal Measurement
- Nominal Measurement: the classification of measurements into a set of categories
- The numbers produced by nominal measurement are frequencies of occurrence in the categories (e.g., 22 ducks, 12 chickens, 2 geese, etc.)
6Nominal Measurement cont.
- A second example is gender: 2 categories, male and female
- Nominal measurement applies to qualitative variables: elements are assigned to a category because they possess one characteristic or another
- Nominal data are also termed qualitative data
7Ordinal Measurement
- Ordinal Measurement: the rank ordering of elements on a continuum
- Ordinal measurement does not measure the amount of the variable; it represents the individual's placement on a continuum (a ranking, e.g., the winner of a race is in first place)
8Ordinal Measurement cont.
- It is important to note that the difference in the amount of the variable between rank positions is not constant: the difference in amount of talent between the 1st and 2nd place finishers in a race cannot be assumed to be the same as the difference in amount of talent between the 5th and 6th place finishers
- Ordinal data can tell you that the person in 1st place finished before the person in 3rd place, but not by how much
9Interval Measurement
- Interval Measurement: the assignment of numerical quantity to the variable in a way that
- the number assigned reflects the amount of the variable
- the size of the measurement unit remains constant
- and the zero point is defined arbitrarily and does not represent an absence of the property being measured
10Interval Measurement cont.
- The best example is temperature
- 40°C represents how hot something is (the amount of heat it has)
- The unit of measurement (1°C) represents the same amount of heat regardless of where it occurs in the range of measurement (the amount of change in temperature is the same between 25°C and 26°C as between 32°C and 33°C)
- The zero point (0°C) is arbitrary: it represents the point at which water freezes, not the absence of temperature
11Interval Measurement cont.
- Interval measurement can contain negative
numbers, whereas Nominal and Ordinal Measurement
do not
12Ratio Measurement
- Ratio Measurement: the assignment of numerical quantity to the variable in such a way that
- the number assigned reflects the amount of the variable
- the size of the measurement unit remains constant
- and the zero point represents an absence of the property being measured
13Ratio Measurement
- Good examples are time and length
- A ratio scale cannot produce negative numbers
- Interval and ratio measurement are equivalent
for statistical purposes and are often referred
to as one thing (interval/ratio data)
14Summation Notation
- We commonly use the letters X and Y to represent the variables we have measured
- The upper case Greek letter sigma (Σ) is known as the summation operator; it means "the sum of"
15Σ Example
- Suppose we keep a record for 6 days of every time someone slips in the CAW Student Centre Cafeteria (represented by X). The data may look like this:
16Data Example
- Day X
- Mon 10
- Tues 5
- Weds 12
- Thurs 11
- Fri 21
- Sat 28
17ΣX
- ΣX means the sum of all the X scores, so that
- $\sum X = X_1 + X_2 + X_3 + \dots + X_N$
- $= 10 + 5 + 12 + 11 + 21 + 28$
- $= 87$
- Note: $X_1$ means the first X score; $X_N$ means the last X score
18(ΣX)²
- (ΣX)² means the square of the sum (total all numbers within the parentheses and then square), so that
- $(\sum X)^2 = (X_1 + X_2 + X_3 + \dots + X_N)^2$
- $= (10 + 5 + 12 + 11 + 21 + 28)^2$
- $= (87)(87)$
- $= 7569$
19ΣX²
- ΣX² means the sum of the squares (square each number and then sum), so that
- $\sum X^2 = X_1^2 + X_2^2 + X_3^2 + \dots + X_N^2$
- $= 10^2 + 5^2 + 12^2 + 11^2 + 21^2 + 28^2$
- $= 100 + 25 + 144 + 121 + 441 + 784$
- $= 1615$
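The three quantities above are easy to confuse, so a small Python sketch may help. It uses the six slip counts from the data example; the variable names are illustrative choices, not part of the course notation.

```python
# Slip counts (X) for Mon through Sat, copied from the data example.
x = [10, 5, 12, 11, 21, 28]

sum_x = sum(x)                            # ΣX: add the raw scores
square_of_sum = sum_x ** 2                # (ΣX)²: total first, then square
sum_of_squares = sum(v ** 2 for v in x)   # ΣX²: square each score, then total

print("sum of X:", sum_x)
print("square of the sum:", square_of_sum)
print("sum of the squares:", sum_of_squares)
```

The order of operations is the whole point: squaring after summing and summing after squaring give different results for the same data.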
20More Summation Notation
- Suppose you also keep track of the number of
pieces of garbage dropped on the floor of the CAW
Student Centre for the same days as above
(variable Y) and the data were as follows
21Example Data
- Day X Y
- Mon 10 210
- Tues 5 160
- Weds 12 245
- Thurs 11 240
- Fri 21 340
- Sat 28 415
22ΣXY
- ΣXY means the sum of the products
- $\sum XY = (X_1)(Y_1) + (X_2)(Y_2) + (X_3)(Y_3) + \dots + (X_N)(Y_N)$
- $= (10)(210) + (5)(160) + (12)(245) + (11)(240) + (21)(340) + (28)(415)$
- $= 2100 + 800 + 2940 + 2640 + 7140 + 11620$
- $= 27240$
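As a rough check on the sum of products, the same calculation can be sketched in Python using the X and Y columns from the example data. The names `x`, `y`, and `sum_xy` are illustrative only.

```python
# Slips (X) and pieces of garbage (Y) for Mon through Sat, from the example data.
x = [10, 5, 12, 11, 21, 28]
y = [210, 160, 245, 240, 340, 415]

# ΣXY: multiply each day's X by the same day's Y, then add the products.
products = [xi * yi for xi, yi in zip(x, y)]
sum_xy = sum(products)

# (ΣX)(ΣY) is a different quantity: total each column first, then multiply.
product_of_sums = sum(x) * sum(y)

print("products:", products)
print("sum of products:", sum_xy)
print("product of sums:", product_of_sums)
```

Pairing scores with `zip` mirrors the subscripts in the notation: the first X goes with the first Y, and so on.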
23Organizing Data
- Frequency Distributions: a frequency distribution is a table which shows the number of individuals or events that occurred at each measurement value
- this is the most common form of organizing data
24Frequency Distributions
- The following hypothetical frequency distribution shows the number of women in different majors at the University of Windsor
- Major # of Women
- Art 15
- Biology 35
- Chemistry 34
- Music 85
- Psychology 97
25Frequency Distributions
- This frequency distribution organizes the data into nominal categories (by major)
- Frequency distributions can also organize data by points of measurement on a continuous variable, as follows
26Frequency Distributions
- Age of Students in 02-250
- Age Frequency
- 18 14
- 19 85
- 20 58
- 21 40
- 22 35
- 23 16
- 24 10
- 25 6
- 26 4
27Frequency Distributions
- Frequency distributions should not exceed 15 to 20 lines, as the point is to summarize the data in a way that represents all the information concisely
- When there are more data than can be classified in 20 lines, the data can be grouped into score ranges known as class intervals, as in this example
28Class Interval Example
- Canada Population Estimates for the Year 2016 (in millions)
- Age        Pop     Age        Pop
- 0 - 4      2.05    50 - 54    2.79
- 5 - 9      2.07    55 - 59    2.69
- 10 - 14    2.12    60 - 64    2.31
- 15 - 19    2.19    65 - 69    1.97
- 20 - 24    2.38    70 - 74    1.42
- 25 - 29    2.48    75 - 79    0.99
- 30 - 34    2.54    80 - 84    0.71
- 35 - 39    2.53    85 - 89    0.47
- 40 - 44    2.51    90+        0.33
- 45 - 49    2.57
29Frequency Distributions cont.
- Looking at the frequency distribution tells us:
- The most frequently occurring age is expected to be in the 50-54 age range (because this is the largest population estimate, 2.79 million)
- The age frequencies are expected to be fairly evenly distributed from 0 to 70 years old and then fall off
- The expected distribution of ages is not symmetrical: very low (young) and very high (old) ages do not occur with equal likelihood
30Frequency Distributions cont.
- Dividing the data into class intervals makes the data more accessible
- Data which have been divided into class intervals are sometimes referred to as grouped data
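One possible way to group raw scores into class intervals is sketched below in Python. The raw ages are hypothetical, made up purely for illustration, and the interval width of 5 and the helper name `class_interval` are assumptions chosen to match the 0 - 4, 5 - 9 style of the population table.

```python
from collections import Counter

# Hypothetical raw ages, invented for this illustration only.
ages = [3, 7, 12, 12, 18, 21, 24, 29, 33, 41]

def class_interval(score, width=5):
    """Return the (lower, upper) class interval containing a whole-number score."""
    lower = (score // width) * width
    return (lower, lower + width - 1)

# Count how many scores fall into each class interval.
grouped = Counter(class_interval(age) for age in ages)

for (lower, upper), count in sorted(grouped.items()):
    print(f"{lower:>2} - {upper:<2}  {count}")
```

Intervals with no scores are simply absent from the output; a fuller version might list them with a frequency of 0.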
31Cumulative Frequency Distributions
- Frequency distributions can be made to contain more information, as when a column of cumulative frequencies is added
- Cumulative Frequency Distribution: a table in which the frequency of individuals or events at each measurement value is added to previous frequencies so that each line reads as the total frequency of that and lower measurement values
32Cumulative Frequency Ex.
- Age of Students in 02-250
- Age Frequency Cumulative Frequency
- 18 14 14
- 19 85 99
- 20 58 157
- 21 40 197
- 22 35 232
- 23 16 248
- 24 10 258
- 25 6 264
- 26 4 268
33More Frequency Distributions
- Frequency distributions can also contain
information about the percentages and cumulative
percentages of observations at the various scores
34More Frequency Distributions
- Age of Students in 02-250
- Age Frequency Cumulative Frequency Percent Cumulative Percent
- 18 14 14 5.22 5.22
- 19 85 99 31.72 36.94
- 20 58 157 21.64 58.58
- 21 40 197 14.93 73.51
- 22 35 232 13.06 86.57
- 23 16 248 5.97 92.54
- 24 10 258 3.73 96.27
- 25 6 264 2.24 98.51
- 26 4 268 1.49 100.00
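The cumulative frequency, percentage, and cumulative percentage columns can all be derived from the Age and Frequency columns alone. The Python sketch below shows one way to do that; the variable names are illustrative.

```python
from itertools import accumulate

# Age and Frequency columns from the 02-250 table.
ages = [18, 19, 20, 21, 22, 23, 24, 25, 26]
freq = [14, 85, 58, 40, 35, 16, 10, 6, 4]

n = sum(freq)                                   # total number of students
cum_freq = list(accumulate(freq))               # running total of frequencies
pct = [100 * f / n for f in freq]               # percentage at each age
cum_pct = [100 * c / n for c in cum_freq]       # percentage at or below each age

for age, f, c, p, cp in zip(ages, freq, cum_freq, pct, cum_pct):
    print(f"{age:>3} {f:>4} {c:>4} {p:6.2f} {cp:7.2f}")
```

Computing the cumulative percentage from the cumulative frequency, rather than by adding rounded percentages, avoids accumulating rounding error.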
35Exact Limits
- All measurements are expressed in discrete units, such as seconds or centimeters
- No matter how small the unit of measurement, it is always possible to imagine finer measurement
- 1 cm = 10 mm
36Exact Limits
- So, for continuous variables, any measure should be viewed as representing a range of values
- This range has a width equal to the unit of measurement used, and the boundaries of this range are the exact limits of the measure
37Exact Limits
- E.g., If we say an event lasted 12 seconds, we
mean it is closer to 12 seconds than to 11 or 13
seconds. A score of 12 represents a range of
values. This range is one second wide (one unit
of the measurement) and extends between 11.5 and
12.5 seconds
38Exact Limits
- Exact limits identify the upper and lower ends of
the range represented by the raw score and are
the real boundaries of the measure in question
39Exact Limits
- Exact Limits: values one-half unit of measurement above and below the score or class interval. Exact limits are the boundaries of the range of values represented by the measure
- Some authors refer to exact limits as real limits
40Exact Limits Examples
- Measure Exact Limits
- 52 51.5 - 52.5
- 51 50.5 - 51.5
- 52.2 52.15 - 52.25
- 52.1 52.05 - 52.15
41Exact Limits Examples
- Measure Exact Limits
- 50.02 50.015 - 50.025
- 50.01 50.005 - 50.015
- Class Interval Exact Limits
- 50 - 54 49.5 - 54.5
- 55 - 59 54.5 - 59.5
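The half-unit rule can be sketched in Python as follows. This is only an illustration: it assumes the measure is written as a decimal string so that the number of decimal places reveals the unit of measurement, and the function names `exact_limits` and `interval_exact_limits` are hypothetical.

```python
from decimal import Decimal

def exact_limits(measure):
    """Return (lower, upper) exact limits for a measure given as a string.

    The unit of measurement is inferred from the decimal places:
    "52" -> unit 1, "52.2" -> unit 0.1, "50.02" -> unit 0.01.
    """
    value = Decimal(measure)
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half_unit = unit / 2
    return value - half_unit, value + half_unit

def interval_exact_limits(low, high):
    """Exact limits of a class interval: lower limit of low, upper limit of high."""
    return exact_limits(low)[0], exact_limits(high)[1]

print(exact_limits("52"))
print(exact_limits("52.2"))
print(exact_limits("50.02"))
print(interval_exact_limits("50", "54"))
```

`Decimal` is used instead of `float` so that values such as 52.15 are represented exactly rather than as binary approximations.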
42Stem-and-Leaf Displays
- Stem-and-Leaf Display: partitions each score into a stem and a leaf and groups the scores according to common stems
- The Leaf is the rightmost digit
- The Stem is the digit (or digits) to the left of the leaf (the stem is 0 for 1-digit numbers)
43Stem-and-Leaf
- E.g.,
- Number Stem Leaf
- 4 0 4
- 54 5 4
- 123 12 3
- 1234 123 4
- The numbers 24 and 26 have different leaves (4 and 6) but the same stem (2)
44Stem-and-Leaf
- Consider these raw data and their stem-and-leaf plot
45Stem-and-Leaf
- stem leaf
- 3 6
- 4 477
- 5 05899
- 6 01225788
- 7 24559
- 8 578
- 9 2
- Data: 36, 44, 47, 47, 50, 55, 58, 59, 59, 60, 61, 62, 62, 65, 67, 68, 68, 72, 74, 75, 75, 79, 85, 87, 88, 92
46Stem-and-Leaf
- Or this example
- Data: 102, 104, 115, 116, 116, 125, 127, 128, 129, 129, 131, 136, 137, 145, 145
- stem leaf
- 10 24
- 11 566
- 12 57899
- 13 167
- 14 55
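A stem-and-leaf display like the ones above can be built mechanically. The Python sketch below is one way to do it, assuming non-negative whole-number scores; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Group non-negative whole-number scores by stem; leaves are kept in order."""
    groups = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)   # leaf = rightmost digit, stem = the rest
        groups[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in sorted(groups.items())}

data = [102, 104, 115, 116, 116, 125, 127, 128,
        129, 129, 131, 136, 137, 145, 145]

for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem:>3} | {leaves}")
```

Every occurrence of a score contributes its own leaf, so repeated values such as 116 and 129 appear as repeated digits. Stems with no scores are omitted in this sketch.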
47Stem-and-Leaf
- Unlike frequency distributions, stem-and-leaf plots give an indication of the overall distribution of the scores (e.g., evenly spread or bunched, symmetrical or nonsymmetrical)
- Note: Make sure you include every instance of a given value, e.g., if 57 occurs 3 times in the data set, this should be represented in the stem-and-leaf display with a stem of 5 and three 7s in the leaf.
48Graphs
- Graph: refers to all manner of pictorial, or graphic, representation of data
- We will consider histograms and frequency polygons
49Graphs
- The horizontal axis (X axis) is labeled with units representing points of measurement and the vertical axis (Y axis) is labeled with values representing frequency of occurrence
- Histograms and frequency polygons are like 2-dimensional representations of frequency distributions
50Histogram
- Histogram: a graphic in which the horizontal axis identifies points of measurement, and the vertical axis represents frequency of occurrence
- Solid bars are used to represent the frequency at each point of measurement (a histogram is a bar graph)
51Age Data Histogram Example
52Frequency Polygon
- Frequency Polygon: a graphic in which the horizontal axis identifies points of measurement, and the vertical axis represents frequency of occurrence (a frequency polygon is a line graph)
53Age Data Frequency Polygon
54Graphs cont.
- Both histograms and frequency polygons can be
embellished by the simultaneous plotting of more
than one variable, as shown next
55Graphs cont.
56Graphs cont.
57Describing Data
- Averages: an average is a numerical value that indicates the middle point or central region of the raw data
- Averages are sometimes referred to as measures of central tendency
58Averages
- 3 statistics are commonly termed averages
- Mode
- Median
- Mean
59Mode
- Mode: the most frequently occurring score
- A distribution with a single most frequently occurring score (one hump) is termed a unimodal (single mode) distribution
- A distribution with 2 values that share the quality of being most frequently occurring (2 humps) is termed bimodal (2 modes)
60Mode Example
- Age of Students in 02-250
- Age Frequency
- 18 14
- 19 85
- 20 58
- 21 40
- 22 35
- 23 16
- 24 10
- 25 6
- 26 4
- In this example, the Mode is 19, as it has the highest frequency
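Finding the mode from a frequency table amounts to looking for the largest frequency. A minimal Python sketch using the age table follows; it returns a list so that a bimodal table would yield two values. The names are illustrative.

```python
# Age -> frequency, copied from the 02-250 table.
freq = {18: 14, 19: 85, 20: 58, 21: 40, 22: 35, 23: 16, 24: 10, 25: 6, 26: 4}

highest = max(freq.values())

# Keep every age that reaches the highest frequency (more than one if tied).
modes = [age for age, count in freq.items() if count == highest]

print("highest frequency:", highest)
print("mode(s):", modes)
```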
61A la Mode
- The mode does not take into account all of the data, only the one most frequently occurring score
- The mode is the score with the highest bar in a histogram, or the highest point in a frequency polygon
- When the data are combined into class intervals, the mode is the mid-point of the class interval that contains the most scores
62Median
- Median: the middle point of the distribution, or the score which bisects the distribution (divides it into upper and lower halves)
63Median
- If there are an ODD number of scores, the median is the middle score
- 1, 3, 6, 7, 8, 13, 15, 17, 18, 21, 23
- Median = 13 (the 6th of the 11 scores)
- There are 5 scores above the median, and 5 below
64Median
- If there are an EVEN number of scores, the median is the midpoint between the two middle scores
- 1, 3, 6, 7, 8, 13, 15, 17, 18, 23
- Median = (8 + 13)/2 = 10.5
65Median Notes
- NOTE!
- When determining the median, you must arrange the
scores in ascending or descending order first!
66Steps to Finding the Median
- 1. Arrange data in ascending or descending order
- 2. Count the number of scores (N)
- 3. If there are an odd number of scores, find the middle point (the score where there are the same number of scores above and below it); this is the median
67Steps to Finding the Median
- 4. If there are an even number of scores, find the 2 middle scores, add them, and divide by 2; this is the median
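The steps above can be sketched as a small Python function. This is an illustration that assumes a non-empty list of numbers; the function name `median` is our own, and Python's standard library offers `statistics.median` for the same purpose.

```python
def median(scores):
    """Median of a non-empty list of numbers, following the steps above."""
    ordered = sorted(scores)            # step 1: arrange in ascending order
    n = len(ordered)                    # step 2: count the scores
    middle = n // 2
    if n % 2 == 1:                      # step 3: odd count, take the middle score
        return ordered[middle]
    # step 4: even count, average the two middle scores
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_scores = [1, 3, 6, 7, 8, 13, 15, 17, 18, 21, 23]
even_scores = [1, 3, 6, 7, 8, 13, 15, 17, 18, 23]

print(median(odd_scores))
print(median(even_scores))
```

Sorting inside the function means the caller cannot forget the first step, which is the most common mistake when finding a median by hand.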
68More Median
- When a distribution is viewed as area, the median
divides the total area in half
[Figure: a distribution divided at the median into two halves, each containing 50% of the area]
69Median cont.
- The median is based on the value of one or two scores, and does not take into account all of the data
- When the data are grouped into class intervals, the median can be viewed as the midpoint of the class interval which contains the middle score (the score at the 50% point of the cumulative frequencies). This is only a rough estimate
70Arithmetic Mean
- Arithmetic Mean: the sum of the scores divided by the number of scores (what is generally thought of as the average)
71Mean
- The mean of a sample of X scores is symbolized as $\bar{X}$, which is said as "X bar"
- The mean of a population of X scores is symbolized by the Greek letter mu (µ)
- Greek letters tend to be used for parameters, while conventional letters are used for statistics
72Mean
- The algebraic definition of the population mean is as follows:
- $\mu = \frac{\sum X}{N}$
- N is used to refer to the number of scores in the data set (termed population size)
73Sample Mean
- The algebraic definition of the sample mean is as follows:
- $\bar{X} = \frac{\sum X}{n}$
- n is used to refer to the number of scores in the data set (termed sample size)
74Mean cont.
- The algebraic formula for the sample mean and the population mean is the same (although some other statistics have different formulae for samples and populations)
75Mean cont.
- The mean is used as the measure of average almost
exclusively (rather than the mode or median)
because it is defined algebraically and considers
all the raw scores in the data set
76Mean cont.
- In any group of scores, the sum of the deviations from the mean equals zero
- n = 6, $\bar{X} = \sum X / n = 33/6 = 5.50$
- X    $X - \bar{X}$
- 3    3 - 5.50 = -2.50
- 5    5 - 5.50 = -0.50
- 9    9 - 5.50 = 3.50
- 2    2 - 5.50 = -3.50
- 8    8 - 5.50 = 2.50
- 6    6 - 5.50 = 0.50
- $\sum X = 33$    $\sum (X - \bar{X}) = 0.00$
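The zero-sum property of deviations can be checked numerically. The Python sketch below uses the six scores from the table; the variable names are illustrative.

```python
# The six X scores from the deviation table.
x = [3, 5, 9, 2, 8, 6]

n = len(x)
mean = sum(x) / n                       # sample mean: ΣX / n

# Deviation of each score from the mean.
deviations = [score - mean for score in x]

print("mean:", mean)
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
```

With these scores the arithmetic is exact. For other data, floating-point rounding can leave a sum that is extremely close to zero rather than exactly zero, so a comparison such as `math.isclose(sum(deviations), 0, abs_tol=1e-9)` is the safer check.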
77Relative Characteristics of Averages
- If the distribution is symmetrical and unimodal, the mean, median, and mode have the same value
- The longer tail of a non-symmetrical distribution pulls the mean more than the mode and median
- Therefore the mean is more affected by outliers (very large or very small data points) than are the mode and median
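To see the pull of an outlier, the three averages can be compared before and after adding one extreme score. The Python sketch below uses the standard `statistics` module; the scores are hypothetical, invented only for this illustration.

```python
import statistics

# Hypothetical scores, invented for illustration.
scores = [2, 3, 3, 4, 5]
with_outlier = scores + [50]            # add one very large data point

for label, data in (("original", scores), ("with outlier", with_outlier)):
    print(label,
          "mean:", round(statistics.mean(data), 2),
          "median:", statistics.median(data),
          "mode:", statistics.mode(data))
```

In this made-up example the single extreme score moves the mean by several points, while the median shifts only slightly and the mode does not change at all.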
78Relative Characteristics of Averages
- Relative positions of the mean and median
- Note: The mode is the highest point in the distribution