
Introduction to Inferential Statistics

Inferential statistics

- So far we've assessed relationships between variables two ways
  - Categorical variables: tables and proportions (percentages)
  - Continuous variables: scattergrams and simple correlation (r)
- Inferential statistics are an extension of these procedures
  - Provide far more precise assessments of relationships

Higher rank → more stress

r = -.6, r² = .36

Higher income → less crime
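The link between r and r² on this slide can be checked by hand: squaring r = -.6 gives r² = .36. The sketch below computes both from raw scores. The income and crime figures are invented for illustration and are not taken from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: income (in $1,000s) and crimes per 1,000 residents
income = [20, 30, 40, 50, 60, 70]
crime = [9, 8, 8, 5, 6, 3]

r = pearson_r(income, crime)
r_squared = r ** 2  # proportion of variation in crime accounted for by income
print(f"r = {r:.2f}, r2 = {r_squared:.2f}")
```

A negative r means the variables move in opposite directions. Squaring it drops the sign, so r² reports only how much of the variation is shared.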

Using inferential statistics

- Examples of inferential statistics
  - Categorical variables: Chi-Square (X²)
  - Combination of categorical independent and continuous dependent variable: Difference between the means test (t statistic)
  - Continuous variables
    - Correlation and regression (r and r²) can be used inferentially
    - b statistic, generated through regression analysis
  - Combination of nominal and continuous variables
    - Logistic regression, generates b and exp b (odds ratio) statistics
- Requirements
  - Must use probability sampling techniques (e.g., random sampling)
  - Parametric inferential statistics, including r, r², b and t
    - Variables must be continuous and approximately normally distributed in the population
  - Non-parametric statistics
    - Variables need not be normally distributed. We will cover one: Chi-Square (X²).
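As a minimal sketch of the probability sampling requirement, the following draws a simple random sample using only Python's standard library. The population of 1,000 case IDs and the sample size of 100 are assumptions made for illustration.

```python
import random

# Hypothetical sampling frame: ID numbers for a population of 1,000 cases
population = list(range(1, 1001))

rng = random.Random(42)  # fixed seed so the draw can be repeated

# Simple random sample: every case has the same chance of selection,
# and no case can be drawn twice (sampling without replacement)
sample = rng.sample(population, k=100)

print(sorted(sample)[:10])
```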

General procedure

- Types of hypotheses
  - Working hypothesis: what a regular hypothesis is called
  - Null hypothesis: Fixed presumption that any observed relationship between two variables is caused by chance
- Draw one or more samples and code the independent and dependent variables
- Use a test statistic (e.g., r) to assess the hypothesized relationship
- The computer calculates a coefficient for the test statistic (e.g., r = .21)
- These coefficients are the sum of two components
  - Systematic variance: The actual, systematic relationship between variables
  - Error variance: An apparent relationship, caused by sampling error. The size of this component can be precisely calculated and shrinks as sample size increases.

The big question: Once we remove the error component, is there enough of a real relationship left to reject the null hypothesis?

Systematic variance

Error variance
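The claim that the error component shrinks as sample size increases can be illustrated with a small simulation. The sketch below assumes a population in which two variables are truly unrelated, so any r that appears in a sample is pure sampling error. The sample sizes, number of replications, and seed are arbitrary choices.

```python
import math
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spread_of_r(sample_size, replications, rng):
    """Standard deviation of r across repeated samples drawn from a
    population where the systematic component is zero."""
    rs = []
    for _ in range(replications):
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]  # independent of xs
        rs.append(pearson_r(xs, ys))
    return statistics.stdev(rs)

rng = random.Random(1)
spread_small = spread_of_r(sample_size=20, replications=300, rng=rng)
spread_large = spread_of_r(sample_size=500, replications=300, rng=rng)
print(f"typical chance r with n=20:  {spread_small:.3f}")
print(f"typical chance r with n=500: {spread_large:.3f}")
```

With small samples, chance alone regularly produces coefficients that look like modest relationships. With large samples the chance coefficients cluster tightly around zero.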

Test statistics and the null hypothesis

- To reject the null hypothesis of no relationship, the test statistic coefficient (e.g., r = .7) must be sufficiently large after subtracting sampling error
- How much room is required? Enough to yield a probability of less than five in one hundred (p < .05) that the relationship between variables was produced by chance.
- If the computer decides that the coefficient is sufficiently large it will award at least one asterisk (*). The relationship between variables is statistically significant and the null hypothesis (no relationship) is rejected as FALSE.
- If the coefficient is too small, no asterisk is awarded. The association between variables is deemed non-significant and the null hypothesis is retained as TRUE. Working hypotheses that depend on this relationship must be rejected.
- For significant relationships, one to three asterisks usually appear next to the test statistic's coefficient (e.g., .25*, .36**, .41***). More asterisks mean greater confidence that a relationship is systematic, not the product of chance.
  - One asterisk (*): Probability less than 5 in 100 that a coefficient was produced by chance (p < .05)
  - Two asterisks (**): Probability less than 1 in 100 that a coefficient was produced by chance (p < .01)
  - Three asterisks (***): Probability less than 1 in 1,000 that a coefficient was produced by chance (p < .001)
- Instead of asterisks, sometimes the actual probability that a coefficient was produced by chance is given, usually in a column labeled "p"
- Again, significant relationships are denoted by p's less than .05

Good Better Best
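The asterisk convention can be sketched in a few lines. Statistical packages normally derive p from the test statistic's theoretical distribution. To stay within Python's standard library, this sketch substitutes a permutation test: it shuffles one variable many times and counts how often chance alone yields an r at least as large as the observed one. The data, the number of shuffles, and the function names are assumptions made for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(xs, ys, shuffles=2000, seed=0):
    """Estimated probability of an |r| at least this large arising by chance."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(shuffled)  # destroys any real link between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add 1 to both counts so the estimate is never exactly zero
    return (hits + 1) / (shuffles + 1)

def stars(p):
    """Map a p-value to the conventional asterisk notation."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # non-significant: no asterisk

# Hypothetical scores for twelve cases
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
ys = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11]

p = permutation_p_value(xs, ys)
print(f"r = {pearson_r(xs, ys):.2f}{stars(p)}  (p = {p:.4f})")
```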

Some statistics used for testing relationships

Procedure | Level of Measurement | Statistic | Interpretation
Correlation | All variables continuous | r | Range -1 to 1, with 0 meaning no relationship. For example, .35 denotes a moderately strong positive relationship.
Regression | All variables continuous | r², R² | Proportion of change in the dependent variable accounted for by change in the independent variable. R² denotes cumulative effect of multiple independent variables.
Regression | All variables continuous | b | Unit change in the dependent variable caused by a one-unit change in the independent variable.
Logistic regression | DV nominal and dichotomous, IVs nominal or continuous | b | Don't try.
Logistic regression | DV nominal and dichotomous, IVs nominal or continuous | exp(B) | Odds that DV will change if IV changes one unit, or, if IV is dichotomous, if it changes its state. Range 0 to infinity; 1 denotes even odds, or no relationship. Higher than 1 means positive relationship, lower means negative relationship. Use percentage to describe likelihood of effect.
Chi-Square | All variables categorical, not ordinal | X² | Reflects difference between Observed and Expected frequencies. Use table to determine if coefficient is sufficiently large to reject null hypothesis.
Difference between means | IV dichotomous, DV continuous | t | Reflects magnitude of difference. Use table to determine if coefficient is sufficiently large to reject null hypothesis.
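Three of the statistics in the table are simple enough to compute by hand. The sketch below shows one way to do so, using invented data: Chi-Square from observed and expected frequencies, a pooled-variance t for the difference between two means, and exp(b) as an odds ratio. The critical values in the comments (3.841 for Chi-Square with 1 degree of freedom, 2.306 for t with 8 degrees of freedom, both at p < .05, two-tailed for t) are the standard table values.

```python
import math

def chi_square(observed):
    """Chi-Square for a table of observed frequencies (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Frequency expected if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            total += (obs - expected) ** 2 / expected
    return total

def pooled_t(group_a, group_b):
    """t statistic for the difference between two group means (pooled variance)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    pooled_var = (ss_a + ss_b) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

def odds_ratio(b):
    """Convert a logistic regression b into exp(b), the odds ratio."""
    return math.exp(b)

# Hypothetical 2x2 table: rows = gender, columns = arrested yes/no
x2 = chi_square([[30, 10], [20, 40]])
print(f"X2 = {x2:.2f}  (critical value 3.841, df = 1)")

# Hypothetical scores for two groups of five cases each
t = pooled_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(f"t = {t:.2f}  (critical value 2.306, df = 8)")

# Hypothetical coefficient: b = 0.405 gives an odds ratio of about 1.5,
# i.e. roughly 50 percent higher odds per one-unit increase in the IV
ratio = odds_ratio(0.405)
print(f"exp(b) = {ratio:.2f}, change in odds = {(ratio - 1) * 100:.0f}%")
```

In this example the Chi-Square coefficient exceeds its critical value, so the null hypothesis is rejected. The t coefficient falls short of 2.306 in absolute value, so it is not.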

A caution on hypothesis testing

- Probability statistics are the most common way to evaluate relationships, but they are being criticized for suggesting misleading results.
- We normally use p-values to accept or reject null hypotheses. But the actual meaning is more subtle:
  - Formally, p < .05 means that, if an association between variables was tested an infinite number of times, a test statistic coefficient at least as large as the one actually obtained (say, an r of .3) would come up less than five times in a hundred if the null hypothesis of no relationship was actually true.
  - For our purposes, as long as we keep in mind the inherent sloppiness of social science, and the difficulties of accurately quantifying social science phenomena, it's sufficient to use p-values to accept or reject null hypotheses.
- We should always be skeptical of findings of significance, particularly when very large samples are involved, as even weak relationships will tend to be statistically significant. (More on this later.)
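The warning about large samples follows from the standard conversion of r into a t statistic, t = r × sqrt((n - 2) / (1 - r²)). The sketch below holds a weak correlation fixed and varies only the sample size. The value r = .05 and the sample sizes are arbitrary. The 1.96 cutoff is the large-sample approximation for p < .05 (two-tailed), so it is slightly lenient for the smallest sample.

```python
import math

def t_from_r(r, n):
    """t statistic for testing whether a correlation r differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

weak_r = 0.05  # accounts for only 0.25 percent of the variance (r squared = .0025)

for n in (50, 500, 5000, 50000):
    t = t_from_r(weak_r, n)
    # 1.96 is the large-sample two-tailed critical value for p < .05
    verdict = "significant" if abs(t) > 1.96 else "not significant"
    print(f"n = {n:>6}: t = {t:5.2f} -> {verdict}")
```

The relationship is equally weak in every row. Only the sample size changes the verdict, which is why significance should be read alongside the size of the coefficient.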

Examples of tables from articles, panels 1-12

1

Hypothesis: Alcohol consumption → Victimization
Method: Logistic regression
Statistics: b and Odds Ratio (Exp b)

Richard B. Felson and Keri B. Burchfield, "Alcohol and the Risk of Physical and Sexual Assault Victimization," Criminology (42:4, 2004)

2

Hypothesis: Black race and related factors → Distrust of police
Method: Logistic regression
Statistic: b (called the "Estimate")

Elaine B. Sharp and Paul E. Johnson, "Accounting for Variation in Distrust of Local Police," Justice Quarterly (26:1, 2009)

3

Hypothesis: Race and class → Satisfaction with police
Method: Logistic regression
Statistics: b and Exp b (odds ratio)

Yuning Wu, Ivan Y. Sun and Ruth A. Triplett, "Race, Class or Neighborhood Context: Which Matters More in Measuring Satisfaction With Police?," Justice Quarterly (26:1, 2009)

4

Hypothesis: Low self-control → More contact with police
Method: Logistic regression
Statistics: b and Exp b (odds ratio)

Kevin M. Beaver, Matt DeLisi, Daniel P. Mears and Eric Stewart, "Low Self-Control and Contact with the Criminal Justice System in a Nationally Representative Sample of Males," Justice Quarterly (26:4, 2009)

5

Hypothesis: Gender and race of victim → Imposition of death sentence
Method: Logistic regression
Statistics: b (coefficient) and odds ratio (exp b)

Marian R. Williams, Stephen Demuth and Jefferson E. Holcomb, "Understanding the Influence of Victim Gender in Death Penalty Cases: The Importance of Victim Race, Sex-Related Victimization, and Jury Decision Making," Criminology (45:4, 2007)

6

Hypothesis: Academic performance → Delinquency
Method: Tobit regression
Statistic: b

Richard B. Felson and Jeremy Staff, "Explaining the Academic Performance-Delinquency Relationship," Criminology (44:2, 2006)

Note: Tobit regression is best when the DV has a zero value for a large proportion of cases.

7

Hypothesis: Strains of imprisonment → Recidivism
Method: Logistic regression
Statistics: B and exp B (odds ratio)

Shelley Johnson Listwan, Christopher J. Sullivan, Robert Agnew, Francis T. Cullen and Mark Colvin, "The Pains of Imprisonment Revisited: The Impact of Strain on Inmate Recidivism," Justice Quarterly (30:1, 2013)

8

Hypothesis: Father's incarceration → Son's delinquency
Method: Logistic regression
Statistic: Odds ratio (Standard Error in parentheses)

Michael E. Roettger and Raymond R. Swisher, "Associations of Fathers' History of Incarceration With Sons' Delinquency and Arrest Among Black, White and Hispanic Males in the United States," Criminology (49:4, 2011)

9

Hypothesis: Officer and driver race → Vehicle search
Method: Logistic regression
Statistic: Odds ratio (Standard Error in parentheses)

Jeff Rojek, Richard Rosenfeld and Scott Decker, "Policing Race: The Racial Stratification of Searches in Police Traffic Stops," Criminology (50:4, 2012)

10

Brian D. Johnson and Stephanie M. Dipietro, "The Power of Diversion: Intermediate Sanctions and Sentencing Disparity Under Presumptive Guidelines," Criminology (50:3, 2012)

11

Hypothesis: Child abuse and neighborhood factors → Child's subsequent violent behavior
Method: Logistic regression
Statistic: b (coefficient)

Emily M. Wright and Abigail A. Fagan, "The Cycle of Violence in Context: Exploring the Moderating Roles of Neighborhood Disadvantage and Cultural Norms," Criminology (51:2, 2013)

12

Hypothesis: Marriage → Desistance from crime
Method: HLM (like logistic regression)
Statistic: b (Coeff.) (Can compute log odds)

Bianca E. Bersani and Elaine Eggleston Doherty, "When the Ties That Bind Unwind: Examining the Enduring and Situational Processes of Change Behind the Marriage Effect," Criminology (51:2, 2013)