Is the Association Statistically Significant Session 16 - PowerPoint PPT Presentation


Slides: 26
Provided by: hollyb


1
Is the Association Statistically Significant
Session 16
2
Tests of Statistical Significance
  • Nominal
      • Lambda: t test
      • Phi: χ²
      • Contingency Coefficient: χ²
      • Cramer's V: χ²
  • Ordinal
      • Gamma: t test
      • Somers' d: t test
      • Tau-b, Tau-c: t test
  • Interval
      • Pearson's r: t test

3
Chi-Square Test: Statistical Significance for
Nominal and Ordinal Level Variables
  • To be used when
      • Variables are not interval level
      • Variables are not normally distributed

4
One-Way Chi-Square: Distribution of Values Across
a Single Variable
  • Are variations across cell frequencies chance
    variations, or are they a pattern?
  • The null hypothesis is that there is no pattern of
    distribution. In other words, cases are
    distributed evenly across cells.
  • The alternative hypothesis is that frequencies vary
    across categories: the distribution is not the same
    across all categories.

5
  • We have two sets of cell frequencies:
      • The expected cell frequencies that correspond
        with the null hypothesis
      • The observed cell frequencies
  • How large is the discrepancy between these two
    sets of values?

6
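Formula
  • $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$, where
    $f_o$ is the observed frequency and $f_e$ the
    expected frequency in each cell
  • The closer $f_o$ is to $f_e$, the smaller the value
    of the chi-square statistic
  • The larger the discrepancy, the larger the value
    of the chi-square statistic
  • For given degrees of freedom, a larger value means
    we are more likely to reject the null and say that
    there is a pattern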

7
  • Degrees of freedom: $df = k - 1$,
    where $k$ = the number of categories
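A minimal Python sketch of the one-way chi-square from the formula and degrees of freedom above, assuming the null of an even distribution across categories. The counts are hypothetical, the function name `one_way_chi_square` is illustrative, and `CRITICAL_05` holds standard table values of chi-square at α = .05 for df 1 to 4 (extend it for more categories).

```python
# Standard chi-square critical values at alpha = .05, keyed by degrees of freedom
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def one_way_chi_square(observed):
    """Return (chi2, df) for observed counts vs. an even expected split."""
    k = len(observed)
    n = sum(observed)
    expected = [n / k] * k  # null: cases spread evenly across the k cells
    chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    df = k - 1
    return chi2, df


observed = [30, 20, 10]  # hypothetical counts for three categories
chi2, df = one_way_chi_square(observed)
print(f"chi2 = {chi2:.2f}, df = {df}")
print("reject null" if chi2 > CRITICAL_05[df] else "fail to reject null")
```

Here each expected count is 20, so the statistic is 5 + 0 + 5 = 10 with 2 degrees of freedom, which exceeds 5.991: the uneven spread is unlikely to be chance variation.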

8
Two-Way Chi-Square: Distribution of Values Across
Two Variables
  • Used to compare two frequency distributions; in
    other words, a crosstab
  • The null hypothesis is that the two variables are
    independent: cases are distributed across cells in
    proportion to the marginal totals.
  • The alternative hypothesis is that there is variation
    in the distribution of values of one variable
    across categories of the other.

9
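Formulas and Calculations
  • Expected frequency for the null hypothesis is based
    on marginal values:
    $f_e = \frac{(\text{row total})(\text{column total})}{N}$
  • Formula for the chi-square test is the same
  • Degrees of freedom:
    $df = (r - 1)(c - 1)$, where $r$ = the number of rows
    and $c$ = the number of columns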
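The two-way version differs only in how the expected frequencies are built: each comes from its row and column totals. A minimal sketch, assuming the crosstab is given as a list of rows of counts; the table is hypothetical and `two_way_chi_square` is an illustrative name.

```python
def two_way_chi_square(table):
    """Chi-square test of independence for a crosstab given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            # Expected frequency under independence, from the marginals
            fe = row_totals[i] * col_totals[j] / n
            chi2 += (fo - fe) ** 2 / fe
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


crosstab = [[30, 10],  # hypothetical 2 x 2 crosstab
            [20, 40]]
chi2, df = two_way_chi_square(crosstab)
print(f"chi2 = {chi2:.3f}, df = {df}")
```

The expected counts are 20, 20, 30, 30, giving a statistic of about 16.667 with 1 degree of freedom, well above the .05 critical value of 3.841. A table whose cells already match the marginals (for example `[[10, 20], [20, 40]]`) gives 0.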

10
  • Median test (pp. 302-305): skip this

11
Linear Regression
  • Provides a way to evaluate the influence of one
    independent variable on the dependent variable,
    controlling for the influence of other variables

12
Statistics Produced by Regression
  • a: the constant
  • ŷ (y-hat): the predicted values of y given certain
    values of the independent variables
  • e: the error, the discrepancy between the actual
    observed value of y and the predicted value of y;
    the slush factor
  • beta coefficients (b): the change in y for a one-unit
    change in x, holding the other independent
    variables constant
  • standardized beta coefficients: put the
    independent variables in the same metric
    (standard deviations), so their influence can be
    compared
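One common way to standardize a slope is to rescale it by the ratio of the standard deviations, $b^* = b \, s_x / s_y$. A minimal sketch, assuming sample standard deviations from Python's `statistics.stdev`; the data are hypothetical, `standardized_beta` is an illustrative name, and 0.6 is the least-squares slope for these particular points.

```python
from statistics import stdev


def standardized_beta(b, x, y):
    """Convert a slope b into a standardized beta: the change in y, in
    standard deviations, for a one-standard-deviation change in x."""
    return b * stdev(x) / stdev(y)


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
print(round(standardized_beta(0.6, x, y), 4))
```

With a single independent variable the standardized beta equals Pearson's r (about 0.775 here); with several, it lets you compare predictors measured in different units.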

13
Statistics Produced by Regression
  • t tests: the statistical significance of the
    effect of x on y
  • p values: the probability of observing a beta
    coefficient at least as large (in absolute value)
    as the one observed if the true influence were 0
  • R-squared: how well the model fits the data, or
    the percentage of variation in y explained by the
    variation in the independent variables

14
  • The adjusted R-squared, or coefficient of multiple
    determination: collectively, urbanization,
    population growth, and GDP explain 79% of the
    variation in the female literacy rate.

15
  • For each one-point increase in the percentage
    of people living in cities, the female literacy
    rate increases by .61 points, or about six tenths
    of one percentage point
  • For each one-point increase in the annual
    population growth rate, the female literacy rate
    decreases by 13.7 percentage points
  • Gross domestic product per capita does not have a
    statistically significant influence on female
    literacy
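These interpretations follow from the slope: holding the other independent variables constant, the predicted change in y is b times the change in x. The changes in x below (a 10-point rise in urbanization, a half-point rise in population growth) are hypothetical, chosen only to show the arithmetic with the slide's coefficients.

```latex
\Delta\hat{y} = b\,\Delta x
\qquad
0.61 \times 10 = 6.1
\qquad
-13.7 \times 0.5 = -6.85
```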

16
Why Use Multivariate Analysis - Regression?
  • Descriptive statistics: one variable
  • Measures of association: two variables
  • Multivariate analysis: three variables or more

17
Why Multivariate Analysis (Regression)?
  • To identify spurious relationships
  • To correctly specify relationships
  • To thoroughly describe a process
  • The goal of research is to explain why variables
    vary

18
The Basic Regression Model
  • $Y = a + bX + e$
  • Y is the observed value of the dependent variable
  • a is the expected value of Y when X = 0 (a
    baseline value)
  • b is the slope: steep when X has a strong
    influence on Y (in other words, when the absolute
    value of b is larger)
  • e is the amount of variation in Y that can't be
    explained by X

19
The Regression Line
  • The regression line is drawn to minimize the sum
    of the squared vertical distances between the line
    and the plotted points, which are the observed
    values of the dependent variable
  • The regression line represents predicted values
    (predicted by the equation), and the points
    represent actual observed values

20
  • Using the slope coefficient (b), the actual
    values of X, and the value of a, we can plot the
    regression line.
  • The error term, e, is the vertical distance between
    the regression line and each observed point: the
    observed value of the dependent variable minus its
    predicted value.
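The pieces of $Y = a + bX + e$ can be computed directly with ordinary least squares. A minimal sketch for one independent variable, assuming hypothetical data; `fit_line` is an illustrative name, and a statistics package would normally do this for you.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for Y = a + bX + e."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariation of X and Y, and variation of X, around their means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # predicted Y when X = 0
    return a, b


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
y_hat = [a + b * xi for xi in x]           # points on the regression line
e = [yi - yh for yi, yh in zip(y, y_hat)]  # observed minus predicted
print(f"a = {a:.2f}, b = {b:.2f}")
print("errors:", [round(ei, 2) for ei in e])
```

This gives a = 2.2 and b = 0.6. With an intercept in the model, the least-squares errors always sum to (essentially) zero: points above the line balance points below it.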

21
Assumptions for Regression
  • Both the independent and the dependent variables
    are measured at the interval level
  • The relationship is linear
  • Variables (strictly, the errors) must be normally
    distributed, or the sample must be large
  • The sample must be random for tests of statistical
    significance to be valid

22
The Significance of the Errors
  • Back to the proportionate reduction in error (PRE)
  • The errors should be as small as possible
  • Without X, we use the average value of Y as our
    guess for each observation
  • We compare the errors from that guess with the
    errors from our guess using the value of X for
    that observation
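The comparison can be written as a proportionate reduction in error, $(E_1 - E_2)/E_1$, where $E_1$ is the squared error when guessing the mean of Y and $E_2$ the squared error when guessing from X. A minimal sketch, assuming squared errors (as least squares uses) and hypothetical data; `pre_r_squared` is an illustrative name.

```python
def pre_r_squared(y, y_hat):
    """Proportionate reduction in error: (E1 - E2) / E1."""
    mean_y = sum(y) / len(y)
    e1 = sum((yi - mean_y) ** 2 for yi in y)              # guessing the mean of Y
    e2 = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # guessing with X
    return (e1 - e2) / e1


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6      # least-squares intercept and slope for these points
y_hat = [a + b * xi for xi in x]
print(round(pre_r_squared(y, y_hat), 3))
```

Here $E_1 = 6$ and $E_2 = 2.4$, so using X cuts the error by 60%. That reduction is the R-squared.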

23
Pearson's r, Regression, and the Coefficient of
Determination
  • The coefficient of determination is also known as
    R-squared. With a single independent variable, it
    equals the square of Pearson's r. With more than one
    independent variable, we usually report the adjusted
    R-squared. Enough names for ya? The adjusted
    R-squared takes the number of independent variables
    into consideration, lowering R-squared as more are
    added. Kind of like degree of difficulty in diving
    and gymnastics.
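A sketch using the common adjustment $1 - (1 - R^2)\frac{n - 1}{n - k - 1}$, where n is the number of observations and k the number of independent variables. The R-squared of 0.60 and the sample size of 30 are hypothetical; `adjusted_r_squared` is an illustrative name.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Same hypothetical R-squared, more independent variables: a bigger penalty
for k in (1, 3, 5):
    print(k, round(adjusted_r_squared(0.60, 30, k), 3))
```

The same raw fit earns a lower adjusted score as independent variables are added, so a predictor has to explain enough extra variation to pay for itself.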

24
  • The R-squared value is the percent of variation
    in the dependent variable that is explained by
    the independent variables, collectively.

25
The Other Statistics
  • T-score and p-value: the statistical
    significance of the individual coefficients. In
    other words, is the influence of this independent
    variable on the dependent variable statistically
    different from 0?
  • The beta coefficient: the magnitude of the
    influence of X on Y; the amount of change in Y
    for a one-unit change in X.
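For one independent variable, the t-score is the slope divided by its standard error, with n - 2 degrees of freedom. A minimal sketch with hypothetical data; `slope_t_score` is an illustrative name. Turning t into a p-value needs the t distribution, for example SciPy's `scipy.stats.t.sf` (two-tailed p = 2 × `t.sf(abs(t), df)`), which this stdlib-only sketch leaves out.

```python
from math import sqrt


def slope_t_score(x, y):
    """Slope b, its standard error, and t = b / SE(b) for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Sum of squared errors around the regression line
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = sqrt(sse / (n - 2) / sxx)  # n - 2 degrees of freedom
    return b, se_b, b / se_b


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
b, se_b, t = slope_t_score(x, y)
print(f"b = {b:.2f}, SE = {se_b:.3f}, t = {t:.2f}, df = {len(x) - 2}")
```

Here t is about 2.12 with 3 degrees of freedom, below the two-tailed .05 critical value of 3.182, so with only five cases this slope would not be statistically different from 0.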