SW318 Social Work Statistics Slide 1 - PowerPoint PPT Presentation

Slides: 65
Provided by: KimJi5
Learn more at: http://www.utexas.edu
Transcript and Presenter's Notes


1
Regression Analysis
  • We have previously studied the Pearson's r
    correlation coefficient and the r² coefficient of
    determination as measures of association for
    evaluating the relationship between an interval
    level independent variable and an interval level
    dependent variable.
  • These statistics are components of a broader set
    of statistical techniques for evaluating the
    relationship between two interval level
    variables, called regression analysis (sometimes
    referred to in combination as correlation and
    regression analysis).

2
Regression Analysis vs. Chi-Square Test of
Independence
  • Our purpose now is to use a hypothesis test to
    conclude that there is a relationship between two
    interval level variables in the population
    represented by our sample data.
  • We could use a chi-square test of independence to
    determine whether or not a relationship exists
    between two variables in the population
    represented by our data, provided we grouped the
    values of both variables to create a bivariate
    table.
  • However, it is preferable to test for the
    presence of a relationship retaining the
    variables as interval level data because this
    strategy is more effective at detecting the
    existence of a relationship. We might find a
    relationship using interval level statistics that
    we do not find using nominal level statistics
    because the nominal level statistics are less
    precise.

3
Elements of Regression Analysis
  • We will first review previous material on
    regression and correlation
  • The scatterplot or scattergram
  • The regression equation
  • Then, we will examine the statistical evidence to
    determine whether or not the relationships found
    in our sample data are applicable to the
    population represented by the sample using a
    hypothesis test.

4
Purpose of Regression Analysis
  • The purpose of regression analysis is to answer
    the same three questions that have been
    identified as requirements for understanding the
    relationships between variables
  • Is there a relationship between the two
    variables?
  • How strong is the relationship?
  • What is the direction of the relationship?

5
Scatterplots - 1
  • The relationship between two interval variables
    can be graphed as a scatterplot or a scatter
    diagram which shows the position of all of the
    cases in an x-y coordinate system.
  • The independent variable is plotted on the
    x-axis, or the horizontal axis.
  • The dependent variable is plotted on the y-axis,
    or the vertical axis.
  • A dot in the body of the chart represents the
    intersection of the data on the x-axis and the
    y-axis.

6
Scatterplots - 2
  • The trendline or regression line is plotted on
    the chart in a contrasting color
  • The overall pattern of the dots, or data points,
    succinctly summarizes the nature of the
    relationship between the two variables.
  • The clarity of the pattern formed by the dots can
    be enhanced by drawing a straight line through
    the cluster such that the line touches every dot
    or comes as close to doing so as possible.
  • This summarizing line is called the regression
    line.
  • We will see later how this line is obtained, but
    for now, we will look at how it helps us
    understand the scatterplot.

7
Scatterplots - 3
The pattern of the points on the scatterplot
gives us information about the relationship
between the variables. The regression line,
drawn in red, makes it easier for us to
understand the scatterplot.
8
The Uses of Scatterplots
  • Scatterplots give us information about our three
    questions about the relationship between two
    interval variables
  • Is there a relationship between the two
    variables?
  • How strong is the relationship?
  • What is the direction of the relationship?
  • In addition, the regression line on the
    scatterplot can be used to estimate the value of
    the dependent variable for any value of the
    independent variable.

9
Scatterplots: Evidence of a Relationship
The angle between the regression line and the
horizontal x-axis provides evidence of a
relationship. If there is no relationship, the
regression line will be parallel to the axis.
When there is no relationship between two
variables, the regression line is parallel to the
horizontal axis.
When there is a relationship between two
variables, the regression line lies at an angle
to the horizontal axis, sloping either upward or
downward.
10
Scatterplots: Strength of a Relationship
The strength of a relationship is indicated by
the narrowness of the band of points spread
around the regression line: the tighter the
band, the stronger the relationship.
The spread of the points around the regression
line is narrow, indicating a stronger
relationship. We should check the scale of the
vertical axis to make sure the narrow band is not
the result of an excessively large scale.
In this scatterplot, the points are very spread
out around the regression line. The relationship
is weak.
11
Scatterplots: Direction of Relationship
When the regression line slopes upward to the
right, there is a positive, or direct,
relationship between the variables. When the
regression line slopes downward, the relationship
is negative, or inverse.
In this scatterplot, the regression line slopes
downward to the right, indicating a negative or
inverse relationship. The values of the
variables move in opposite directions.
In this scatterplot, the regression line slopes
upward to the right, indicating a positive or
direct relationship. The values of both
variables increase and decrease at the same time.
12
Scatterplots: Predicting Scores
For any value of the independent variable on the
horizontal x-axis, the predicted value for the
dependent variable will be the corresponding
value on the vertical y-axis.
For a value of the independent variable on the
horizontal axis, e.g. 52, we draw a perpendicular
line upward from the x-axis to the regression
line.
The estimate for the dependent variable is
obtained by drawing a line parallel to the x-axis
from the regression line to the vertical y-axis
and reading the value where this line crosses the
y-axis, e.g. 50.
13
The Effect of Scaling on the Scatterplot
  • The scale used for the vertical y-axis can change
    the appearance of the scatterplot and alter our
    interpretation of the strength of the
    relationship. The three scatterplots on this
    slide all use the same data.

In the original plot, the y-axis is scaled from 0
to 80.
14
The Assumption of Linearity
  • An underlying assumption of regression analysis
    is that the relationship between the variables is
    linear, meaning that the points in the
    scatterplot must form a pattern that can be
    approximated with a straight line.
  • While we could test the assumption of linearity
    with a test of statistical significance of the
    correlation coefficient, we will make a visual
    assessment of scatterplots.
  • If the scatterplot indicates that the points do
    not follow a linear pattern, the techniques of
    linear correlation and regression should not be
    applied.

15
Examples of Linear Relationships
  • These two scatterplots are for data on poverty of
    nations. The plots below show strong linear
    relationships. The points are evenly distributed
    on either side of the regression line.

16
Examples of Non-linear Relationships
  • These scatterplots show a non-linear
    relationship. The points are not evenly
    distributed on either side of the regression
    line. We will often see a concentration of
    points on one side of the regression line and an
    absence of points on the other side.

17
The Regression Equation
  • The regression equation is the algebraic formula
    for the regression line, which states the
    mathematical relationship between the independent
    and the dependent variable.
  • We can use the regression line to estimate the
    value of the dependent variable for any value of
    the independent variable.
  • The stronger the relationship between the
    independent and dependent variables, the closer
    these estimates will come to the actual score
    that each case had on the dependent variable.

18
Components of the Regression Equation
  • The regression equation has two components.
  • The first component is a number called the
    y-intercept that defines where the line crosses
    the vertical y axis.
  • The second component is called the slope of the
    line, and is a number that multiplies the value
    of the independent variable.
  • These two elements are combined in the general
    form for the regression equation:
  • the estimated score on the dependent variable =
    the y-intercept + the slope × the score on the
    independent variable

19
The Standard Form of the Regression Equation
  • The standard form for the regression equation or
    formula is
  • Y = a + bX
  • where
  • Y is the estimated score for the dependent
    variable
  • X is the score for the independent variable
  • b is the slope of the regression line, or the
    multiplier of X
  • a is the intercept, or the point where the
    regression line crosses the vertical y-axis

20
Depicting the Regression Equation
The regression equation includes both the
y-intercept and the slope of the line. The
y-intercept is 1.0 and the slope is 0.5.
The slope is the multiplier of x. It is the
amount of change in y for a change of one unit in
x. If x changes one unit from 2.0 to 3.0,
depicted by the blue arrow, y will change by 0.5
units, from 2.0 to 2.5 as depicted by the red
arrow.
  • The y-intercept is the point on the vertical
    y-axis where the regression line crosses the
    axis, i.e. 1.0.
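
As a small worked sketch of the line depicted on this slide, the function below evaluates Y = a + bX with the slide's intercept (1.0) and slope (0.5). The function name `predict` is only illustrative and does not come from the slides.

```python
def predict(x, intercept=1.0, slope=0.5):
    """Return the estimated score Y = a + bX for a given X."""
    return intercept + slope * x

# A one-unit change in x (2.0 -> 3.0) changes the prediction by the slope.
y_at_2 = predict(2.0)
y_at_3 = predict(3.0)
change = y_at_3 - y_at_2

# When x is zero, the prediction is the y-intercept itself.
y_at_0 = predict(0.0)

print(y_at_2, y_at_3, change, y_at_0)
```

The change in the prediction equals the slope, and the prediction at x = 0 equals the intercept, which is how the two components are read off the chart.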

21
Deriving the Regression Equation
  • In this plot, none of the points fall on the
    regression line.
  • The difference between the actual value for the
    dependent variable and the predicted value for
    each point is shown by the red lines. This
    difference is called the residual, and represents
    the error between the actual and predicted
    values.
  • The regression equation is computed to minimize
    the total amount of error in predicting values
    for the dependent variable. The method for
    deriving the equation is called the "method of
    least squares," meaning that the regression line
    minimizes the sum of the squared residuals, or
    errors between actual and predicted values.
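
The method of least squares described above can be sketched in a few lines. The data values and the function names here are hypothetical and are used for illustration only; they are not taken from the slides.

```python
def least_squares(xs, ys):
    """Return (intercept a, slope b) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the line passes through (mean_x, mean_y)
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Total squared error between actual and predicted values."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
# A line with a slightly different slope has more squared error.
worse = sum_squared_residuals(xs, ys, a, b + 0.1)
print(round(a, 3), round(b, 3), round(best, 3), round(worse, 3))
```

Comparing `best` with `worse` shows the sense in which the fitted line is the best one: moving the slope away from the least squares value increases the sum of squared residuals.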

22
Interpreting the Regression Equation: the
Intercept
  • The intercept is the point on the vertical axis
    where the regression line crosses the axis. It
    is the predicted value for the dependent variable
    when the independent variable has a value of
    zero.
  • This may or may not be useful information
    depending on the context of the problem.

23
Interpreting the Regression Equation: the Slope
  • The slope is interpreted as the amount of change
    in the predicted value of the dependent variable
    associated with a one unit change in the value of
    the independent variable.
  • If the slope has a negative sign, the direction
    of the relationship is negative or inverse,
    meaning that the scores on the two variables move
    in opposite directions.
  • If the slope has a positive sign, the direction
    of the relationship is positive or direct,
    meaning that the scores on the two variables move
    in the same direction.

24
Interpreting the Regression Equation when the
Slope equals 0
  • If there is no relationship between two
    variables, the slope of the regression line is
    zero and the regression line is parallel to the
    horizontal axis.
  • A slope of zero means that the predicted value of
    the dependent variable will not change, no matter
    what value of the independent variable is used.
  • If there is no relationship, using the regression
    equation to predict values of the dependent
    variable is no improvement over using the mean of
    the dependent variable.

25
Assumptions Required for Utilizing a Regression
Equation
  • The assumptions required for utilizing a
    regression equation are the same as the
    assumptions for the test of significance of a
    correlation coefficient.
  • Both variables are interval level.
  • Both variables are normally distributed.
  • The relationship between the two variables is
    linear.
  • The variance of the values of the dependent
    variable is uniform for all values of the
    independent variable (equality of variance).

26
Assumption of Normality
  • Strictly speaking, the test requires that the two
    variables be bivariate normal, meaning that the
    combined distribution of the two variables is
    normal. It is usually assumed that the variables
    are bivariate normal if each variable is normally
    distributed, so this assumption is tested by
    checking the normality of each variable.
  • Each variable will be considered normal if its
    skewness and kurtosis statistics fall between
    -1.0 and +1.0 or if the sample size is
    sufficiently large to apply the Central Limit
    Theorem.
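
The skewness and kurtosis rule can be sketched as follows. This assumes the common bias-adjusted sample formulas for skewness and excess kurtosis; the slides do not say which formulas SPSS applies, and both data sets below are hypothetical.

```python
import math

def skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) using bias-adjusted sample formulas.

    Assumes at least 4 values that are not all identical.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5      # moment-based skewness
    g2 = m4 / m2 ** 2 - 3.0  # moment-based excess kurtosis
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return skew, kurt

def looks_normal(values, limit=1.0):
    """Apply the rule of thumb: both statistics between -limit and +limit."""
    skew, kurt = skewness_kurtosis(values)
    return -limit <= skew <= limit and -limit <= kurt <= limit

# Hypothetical samples, for illustration only.
symmetric = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 8]
skewed = [1, 1, 1, 1, 2, 2, 3, 20]

print(looks_normal(symmetric), looks_normal(skewed))
```

The sketch covers only the skewness and kurtosis criterion; the alternative route through the Central Limit Theorem depends on sample size and is not checked here.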

27
Assumption of Linearity
  • Linearity means that the pattern of the points in
    a scatterplot forms a band, like the pattern in
    the chart on the right.
  • When the pattern of the points follows a curve,
    like the scatterplot on the right, the
    correlation coefficient will not accurately
    measure the relationship.

28
Test of Linearity
  • The test of linearity is a diagnostic statistical
    test of the null hypothesis that the linear model
    is an appropriate fit for the data points. The
    desired outcome for this test is to fail to
    reject the null hypothesis.
  • If the probability for the test of linearity
    statistic is less than or equal to the level of
    significance for the problem, we reject the null
    hypothesis, concluding that the data are not
    linear and that regression analysis is not
    appropriate for the relationship between the two
    variables.
  • If the probability for the test of linearity
    statistic is greater than the level of
    significance for the problem, we fail to reject
    the null hypothesis and conclude that we satisfy
    the assumption of linearity.

29
Assumption of Homoscedasticity
  • Homoscedasticity (equality of variances) means
    that the points are evenly dispersed on either
    side of the regression line for the linear
    relationship.

In this scatterplot, the points extend about the
same distance above and below the regression line
for most of the length of the regression line.
This scatterplot meets the assumption of
homoscedasticity.
In this scatterplot, the spread of the points
around the regression line is narrower at the
left end of the regression line than at the right
end of the regression line. This funnel shape
is typical of a scatterplot showing violations of
the assumption of homoscedasticity.
30
Test of Homoscedasticity
  • When we compared groups, we used the Levene test
    of population variances to test for the
    assumption that the group variances were equal.
  • In order to use this test for the assumption of
    homoscedasticity, we will convert the interval level
    independent variable into a dichotomous variable
    with low scores in one group and high scores in
    the other group. We can then compare the
    variances of the two groups derived from the
    independent variable.

31
Levene Test of Homogeneity of Variances
  • The Levene test of equality of population
    variances tests whether or not the variances for
    the two groups are equal. It is a test of the
    research hypothesis that the variance
    (dispersion) of the group with low scores is
    different from the variance of the group with
    high scores. The null hypothesis is that the
    variances (dispersion) of both groups are equal.
  • If the probability of the test statistic is
    greater than 0.05, we do not reject the null
    hypothesis and conclude that the variances are
    equal. This is the desired outcome.
  • If the probability of the test statistic is less
    than or equal to 0.05, we conclude the variances
    are different and regression analysis is not
    an appropriate test for the relationship between
    the two variables.
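
A minimal sketch of the dichotomize-then-compare idea follows. It assumes the cut point is the median of the independent variable and uses the Levene statistic based on absolute deviations from group means; the slides do not state either detail, and the data are hypothetical. The p-value is not computed here: it would come from comparing W with an F distribution with (1, N - 2) degrees of freedom.

```python
def split_low_high(xs, ys):
    """Split dependent scores by whether x is at or below the median of x."""
    ordered = sorted(xs)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    low = [y for x, y in zip(xs, ys) if x <= median]
    high = [y for x, y in zip(xs, ys) if x > median]
    return low, high

def levene_w(group_a, group_b):
    """Levene statistic W for two groups (absolute deviations from group means).

    Assumes the absolute deviations are not all identical within both groups.
    """
    z_groups = []
    for g in (group_a, group_b):
        mean_g = sum(g) / len(g)
        z_groups.append([abs(v - mean_g) for v in g])
    n_total = sum(len(z) for z in z_groups)
    k = len(z_groups)
    z_means = [sum(z) / len(z) for z in z_groups]
    z_grand = sum(sum(z) for z in z_groups) / n_total
    between = sum(len(z) * (zm - z_grand) ** 2
                  for z, zm in zip(z_groups, z_means))
    within = sum((v - zm) ** 2
                 for z, zm in zip(z_groups, z_means) for v in z)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
even_ys = [2, 4, 3, 5, 6, 8, 7, 9]      # similar spread at both ends
funnel_ys = [4, 5, 4, 5, 2, 10, 1, 11]  # spread widens as x increases

w_even = levene_w(*split_low_high(xs, even_ys))
w_funnel = levene_w(*split_low_high(xs, funnel_ys))
print(w_even, w_funnel)
```

A W near zero is consistent with equal variances in the low and high groups, while a large W, as in the funnel-shaped data, points toward a violation of homoscedasticity.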

32
The hypothesis test of r²
  • The purpose of the hypothesis test of r² is to
    test the applicability of our findings to the
    population represented by the sample.
  • When we studied association between two interval
    variables, we stated that the Pearson r
    correlation coefficient and its square, the
    coefficient of determination, measure the strength
    of the relationship between two interval
    variables. When the correlation coefficient and
    coefficient of determination are zero (0), there
    is no relationship.
  • The hypothesis test of r² is a test of whether or
    not r² is larger than zero in the population.

33
The hypothesis test of r²
  • The research hypothesis states that r² is larger
    than zero (a relationship exists).
  • The null hypothesis states that r² is equal to
    zero (no relationship).
  • Recall that we interpreted the coefficient of
    determination r² as the reduction in error
    attributable to the relationship between the
    variables.
  • The test statistic is an ANOVA F-test which tests
    whether or not the reduction in error associated
    with using the regression equation is really
    greater than zero.

34
How the regression ANOVA test works
We will use the sample data we used for
correlation and regression to examine how the
hypothesis test for r² works.
We are interested in the relationship between
family size and number of credit cards.
35
The scatter diagram or scatterplot
The dependent variable is plotted on the Y or
vertical axis.
The independent variable is plotted on the x or
horizontal axis.
36
The mean as the best guess
Without taking into account the independent
variable, our best guess for the number of credit
cards for any subject is the mean, 7.0.
37
Errors using the mean as estimate
Errors are measured by computing the difference
between the mean and each Y value, squaring the
differences, and then summing them. When we
compute the answer in SPSS, it will tell us that
the total amount of error is 22.0.
38
The regression line
The regression line minimizes the error (the best
fitting or least squares line)
39
The equation for the regression line
SPSS will give us the formula for the regression
line in the form Y = a + bX, or for these
variables: Number of Credit Cards = 2.871 + 0.971
× Family Size
40
PRE: reduction in error
SPSS also tells us the amount of error using only
the mean and using the regression line.
Error using mean only (total): 22.000
Error using regression line: 5.486
Reduction in error associated with the regression: 16.514
PRE measure (r²) = (22.0 - 5.486) / 22.0 = 0.751
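
The error figures on the preceding slides can be retraced by hand. The transcript does not list the raw data, so the eight (family size, credit cards) pairs below are an assumption: hypothetical values chosen because they reproduce the reported mean, errors, and coefficients.

```python
# ASSUMPTION: hypothetical data chosen to reproduce the slides' figures;
# the actual sample is not included in the transcript.
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
credit_cards = [4, 6, 6, 7, 8, 7, 8, 10]

n = len(family_size)
mean_x = sum(family_size) / n
mean_y = sum(credit_cards) / n

# Error using the mean only: squared differences from the mean, summed.
error_mean = sum((y - mean_y) ** 2 for y in credit_cards)

# Least squares slope and intercept.
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(family_size, credit_cards))
sxx = sum((x - mean_x) ** 2 for x in family_size)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Error using the regression line: squared residuals, summed.
error_line = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(family_size, credit_cards))

reduction = error_mean - error_line
pre = reduction / error_mean  # proportional reduction in error (r squared)

print(mean_y, error_mean, round(intercept, 3), round(slope, 3))
print(round(error_line, 3), round(reduction, 3), round(pre, 3))
```

Under this assumed data set, the rounded results agree with the values reported from SPSS: a mean of 7.0, total error of 22.0, and a PRE of about 0.751.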
41
The ANOVA test for the regression
The F statistic is calculated as the ratio of the
error reduced by the regression to the error
remaining, each divided by its degrees of
freedom. If the ratio were near 1, the regression
would have reduced no more error than we would
expect by chance, we would have no evidence of a
relationship, and the p-value would not let us
reject the null hypothesis. In this problem, the
amount of error reduced by the regression is
large relative to the amount remaining, so the F
statistic is large and the p-value (0.005) is
smaller than the alpha level of significance, so
we reject the null hypothesis.
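
As a rough sketch of the arithmetic behind the F statistic, the ratio can be computed from the two error figures reported earlier. The sample size is not stated in the transcript, so n = 8 below is an assumption used only to illustrate the degrees of freedom.

```python
# Sums of squares reported on the slides.
error_reduced = 16.514   # reduction in error associated with the regression
error_remaining = 5.486  # error using the regression line

# ASSUMPTION: n = 8 is a hypothetical sample size; the transcript does not
# give the number of cases.
n = 8
df_regression = 1    # one independent variable
df_residual = n - 2  # cases minus the two estimated coefficients

mean_square_regression = error_reduced / df_regression
mean_square_residual = error_remaining / df_residual
f_statistic = mean_square_regression / mean_square_residual

print(round(f_statistic, 2))
```

The p-value itself comes from the F distribution with (1, n - 2) degrees of freedom, which SPSS reports in the ANOVA table; it is not computed in this sketch.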
42
Interpreting Pearson's r correlation coefficient
The square root of r² is the size of Pearson's r,
the correlation coefficient. If we want to
characterize the strength of the relationship, we
compare the size of r to the interpretive
guidelines for measures of association.
43
Interpreting the direction of the relationship
To interpret the direction of the relationship
between the variables, we look at the coefficient
for the independent variable. In this example,
the coefficient of 0.971 is positive, so we would
interpret this relationship as: families with
more members had more credit cards.
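
The two interpretive steps, recovering r from r² and reading the direction from the coefficient, can be sketched together using the values reported for this example. The variable names are illustrative only.

```python
import math

r_squared = 0.751  # PRE measure reported on the slides
slope = 0.971      # coefficient for family size reported on the slides

# The square root gives the size of r; the sign of the slope gives its sign.
r = math.copysign(math.sqrt(r_squared), slope)

if slope > 0:
    direction = "positive (direct)"
elif slope < 0:
    direction = "negative (inverse)"
else:
    direction = "none"

print(round(r, 3), direction)
```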
44
Testing Assumptions in Homework Problems
  • The process of testing assumptions can easily
    overwhelm the task of testing the significance of
    the relationship.
  • Since our emphasis here is testing the hypothesis
    that the relationship is generalizable to the
    population represented by the sample data, we
    will assume that our data satisfies the
    assumptions without explicitly testing
    assumptions.

45
Homework Problem Questions
  • The question in the homework problems requires us
    to look at three things
  • Does the hypothesis test support the existence of
    a relationship in the population?
  • Is the strength of the relationship characterized
    correctly?
  • Is the direction of the relationship between the
    variables correctly stated?

46
Practice Problem 1
This question asks you to use linear regression
to examine the relationship between marital and
age. Linear regression requires that the
dependent variable and the independent variables
be interval. Ordinal variables may be included as
interval variables if a caution is added to any
true findings. The dependent variable marital
is nominal level which does not satisfy the
requirement for a dependent variable. The
independent variable age is interval level,
satisfying the requirement for an independent
variable.
47
Practice Problem 2
This question asks you to use linear regression
to examine the relationship between fund and
attend. The level of measurement requirements
for linear regression are satisfied: fund is
ordinal level, and attend is ordinal level. A
caution is added because ordinal level variables
are included in the analysis. Given the
assumption that the distributional requirements
for linear regression are satisfied, you can
conduct a linear regression using SPSS without
examining distributional assumptions for the
variables.
48
Linear Regression Hypothesis Test in SPSS (1)
You can conduct a linear regression
using Analyze > Regression > Linear
49
Linear Regression Hypothesis Test in SPSS (2)
Move the dependent variable to Dependent and
the independent variable to Independent(s)
boxes and then click OK button.
50
Linear Regression Hypothesis Test in SPSS (3)
Based on the ANOVA table for the linear
regression (F(1, 604) = 70.579, p < 0.001), there
was a relationship between the dependent
variable "degree of religious fundamentalism" and
the independent variable "frequency of attendance
at religious services". Since the probability of
the F statistic (p < 0.001) was less than or equal
to the level of significance (0.05), the null
hypothesis that the correlation coefficient (R) was
equal to 0 was rejected. The research
hypothesis that there was a relationship between
the variables was supported.
51
Linear Regression Hypothesis Test in SPSS (4)
Given the significant F-test result, the
correlation coefficient (R) can be interpreted.
The correlation coefficient for the
relationship between the independent variable and
the dependent variable was 0.323, which would be
characterized as a weak relationship using the
rule of thumb that a correlation between 0.0 and
0.20 is very weak; 0.20 to 0.40 is weak; 0.40 to
0.60 is moderate; 0.60 to 0.80 is strong; and
greater than 0.80 is very strong. The
relationship between the independent variable
and the dependent variable was incorrectly
characterized as a moderate relationship. The
relationship should have been characterized as a
weak relationship. The answer to the problem is
false.
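
The rule of thumb can be written as a small lookup. The slide leaves the boundary values ambiguous (0.20 appears in two ranges), so this sketch assumes a boundary value belongs to the lower category, following the wording "greater than 0.80 is very strong". The function name is illustrative.

```python
def strength(r):
    """Characterize a correlation using the rule of thumb on this slide.

    ASSUMPTION: a value exactly on a boundary (e.g. 0.20) falls in the
    lower category; the slide does not say which range it belongs to.
    """
    size = abs(r)
    if size > 0.80:
        return "very strong"
    if size > 0.60:
        return "strong"
    if size > 0.40:
        return "moderate"
    if size > 0.20:
        return "weak"
    return "very weak"

print(strength(0.323))   # the correlation in this problem
print(strength(-0.65))   # the sign does not affect strength
```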
52
Practice Problem 3
This question asks you to use linear regression
to examine the relationship between educ and
age. educ and age are interval level,
satisfying the level of measurement requirements
for regression. Given the assumption that the
distributional requirements for linear regression
are satisfied, you can conduct a linear
regression using SPSS without examining
distributional characteristics of variables.
53
Linear Regression Hypothesis Test in SPSS (5)
You can conduct a linear regression
using Analyze > Regression > Linear
54
Linear Regression Hypothesis Test in SPSS (6)
Move the dependent variable to Dependent and
the independent variable to Independent(s)
boxes and then click OK button.
55
Linear Regression Hypothesis Test in SPSS (7)
Based on the ANOVA table for the linear
regression (F(1, 659) = 9.983, p = 0.002), there
was a relationship between the dependent
variable "highest year of school completed" and
the independent variable "age". Since the
probability of the F statistic (p = 0.002) was less
than or equal to the level of significance
(0.05), the null hypothesis that the correlation
coefficient (R) was equal to 0 was rejected.
The research hypothesis that there was a
relationship between the variables was supported.
56
Linear Regression Hypothesis Test in SPSS (8)
Given the significant F-test result, the
correlation coefficient (R) can be interpreted.
The correlation coefficient for the
relationship between the independent variable and
the dependent variable was 0.122, which can be
characterized as a very weak relationship.
57
Linear Regression Hypothesis Test in SPSS (9)
The b coefficient for the independent variable
"age" was -0.021, indicating an inverse
relationship with the dependent variable. Higher
numeric values for the independent variable "age"
(age) are associated with lower numeric values
for the dependent variable "highest year of
school completed" (educ). The statement in the
problem that "survey respondents who were older
had completed more years of school" is incorrect.
The direction of the relationship is stated
incorrectly.
58
Practice Problem 4
This question asks you to use linear regression
to examine the relationship between sei and
age. sei and age are interval level,
satisfying the level of measurement requirements
for regression. Given the assumption that the
distributional requirements for linear regression
are satisfied, you can conduct a linear
regression using SPSS without examining
distributional characteristics of variables.
59
Linear Regression Hypothesis Test in SPSS (10)
You can conduct a linear regression
using Analyze > Regression > Linear
60
Linear Regression Hypothesis Test in SPSS (11)
Move the dependent variable to Dependent and
the independent variable to Independent(s)
boxes and then click OK button.
61
Linear Regression Hypothesis Test in SPSS (12)
Based on the ANOVA table for the linear
regression (F(1, 629) = 0.266, p = 0.606), there was
no relationship between the dependent variable
"socioeconomic index" and the independent
variable "age". Since the probability of the F
statistic (p = 0.606) was greater than the level of
significance (0.05), the null hypothesis that the
correlation coefficient (R) was equal to 0 was
not rejected. The research hypothesis that
there was a relationship between the variables
was not supported.
62
Steps in solving Linear Regression Hypothesis
Test Problems - 1
The following is a guide to the decision process
for answering homework problems about the Linear
Regression Hypothesis Test.
Are the dependent and independent variables
ordinal or interval level?
No: Incorrect application of a statistic
Yes: Make sure that the assumption that the
distributional requirements for linear regression
are satisfied is made. Otherwise, you have to
check the assumption first.
Our regression problems will assume that the
assumptions are met.
63
Steps in solving Linear Regression Hypothesis
Test Problems - 2
Conduct the linear regression analysis.
Is the p-value in the ANOVA table for the F ratio
test less than or equal to alpha?
No: False
Yes: go on to the next question.
Is the interpretation of the strength of the
correlation coefficient correct?
No: False
Yes: go on to the next question.
64
Steps in solving Linear Regression Hypothesis
Test Problems - 3
Is the direction of the relationship correctly
stated?
No: False
Yes: go on to the next question.
Is either of the variables ordinal level?
No: True
Yes: True with caution
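
The decision process on these three slides can be summarized as one function. This is only a sketch: the function and parameter names are hypothetical, and the inputs for the practice problems are read from the worked solutions above, with p = 0.0005 standing in for the reported "p < 0.001".

```python
def answer_problem(levels_ok, p_value, alpha, strength_correct,
                   direction_correct, any_ordinal):
    """Walk through the decision steps and return the answer to a problem."""
    if not levels_ok:
        return "Incorrect application of a statistic"
    if p_value > alpha:
        return "False"  # no significant relationship
    if not strength_correct:
        return "False"
    if not direction_correct:
        return "False"
    return "True with caution" if any_ordinal else "True"

# Practice Problem 1: the dependent variable is nominal level.
problem_1 = answer_problem(False, None, 0.05, True, True, False)

# Practice Problem 2: significant, but the strength was mischaracterized.
# ASSUMPTION: 0.0005 is a placeholder for the reported "p < 0.001".
problem_2 = answer_problem(True, 0.0005, 0.05, False, True, True)

# Practice Problem 4: p = 0.606 is greater than alpha.
problem_4 = answer_problem(True, 0.606, 0.05, True, True, False)

print(problem_1, problem_2, problem_4, sep="\n")
```

Checking the level of measurement first means the p-value is never examined for a problem where regression should not have been applied at all.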