# SW318 Social Work Statistics Slide 1 - PowerPoint PPT Presentation

## SW318 Social Work Statistics Slide 1

Description:

### Regression Analysis: We have previously studied Pearson's r correlation coefficient and the r² coefficient of determination as measures of association for ...

Slides: 65
Provided by: KimJi5
Transcript and Presenter's Notes

1
Regression Analysis
• We have previously studied Pearson's r
correlation coefficient and the r² coefficient of
determination as measures of association for
evaluating the relationship between an interval
level independent variable and an interval level
dependent variable.
• These statistics are components of a broader set
of statistical techniques for evaluating the
relationship between two interval level
variables, called regression analysis (sometimes
referred to in combination as correlation and
regression analysis).

2
Regression Analysis vs. Chi-Square Test of
Independence
• Our purpose now is to use a hypothesis test to
conclude that there is a relationship between two
interval level variables in the population
represented by our sample data.
• We could use a chi-square test of independence to
determine whether or not a relationship exists
between two variables in the population
represented by our data, provided we grouped the
values of both variables to create a bivariate
table.
• However, it is preferable to test for the
presence of a relationship retaining the
variables as interval level data because this
strategy is more effective at detecting the
existence of a relationship. We might find a
relationship using interval level statistics that
we do not find using nominal level statistics
because the nominal level statistics are less
precise.

3
Elements of Regression Analysis
• We will first review previous material on
regression and correlation
• The scatterplot or scattergram
• The regression equation
• Then, we will examine the statistical evidence to
determine whether or not the relationships found
in our sample data are applicable to the
population represented by the sample using a
hypothesis test.

4
Purpose of Regression Analysis
• The purpose of regression analysis is to answer
the same three questions that have been
identified as requirements for understanding the
relationships between variables
• Is there a relationship between the two
variables?
• How strong is the relationship?
• What is the direction of the relationship?

5
Scatterplots - 1
• The relationship between two interval variables
can be graphed as a scatterplot or a scatter
diagram which shows the position of all of the
cases in an x-y coordinate system.
• The independent variable is plotted on the
x-axis, or the horizontal axis.
• The dependent variable is plotted on the y-axis,
or the vertical axis.
• A dot in the body of the chart represents the
intersection of the data on the x-axis and the
y-axis.

6
Scatterplots - 2
• The trendline or regression line is plotted on
the chart in a contrasting color
• The overall pattern of the dots, or data points,
succinctly summarizes the nature of the
relationship between the two variables.
• The clarity of the pattern formed by the dots can
be enhanced by drawing a straight line through
the cluster such that the line touches every dot
or comes as close to doing so as possible.
• This summarizing line is called the regression
line.
• We will see later how this line is obtained, but
for now, we will look at how it helps us
understand the scatterplot.

7
Scatterplots - 3
The pattern of the points on the scatterplot
gives us information about the relationship
between the variables. The regression line,
drawn in red, makes it easier for us to
understand the scatterplot.
8
The Uses of Scatterplots
• Scatterplots give us information about our three
questions about the relationship between two
interval variables
• Is there a relationship between the two
variables?
• How strong is the relationship?
• What is the direction of the relationship?
• In addition, the regression line on the
scatterplot can be used to estimate the value of
the dependent variable for any value of the
independent variable.

9
Scatterplots: Evidence of a Relationship
The angle between the regression line and the
horizontal x-axis provides evidence of a
relationship. If there is no relationship, the
regression line will be parallel to the axis.
When there is no relationship between two
variables, the regression line is parallel to the
horizontal axis.
When there is a relationship between two
variables, the regression line lies at an angle
to the horizontal axis, sloping either upward or
downward.
10
Scatterplots: Strength of a Relationship
The strength of a relationship is indicated by
the narrowness of the band of points spread
around the regression line: the tighter the
band, the stronger the relationship.
The spread of the points around the regression
line is narrow, indicating a stronger
relationship. We should check the scale of the
vertical axis to make sure the narrow band is not
the result of an excessively large scale.
In this scatterplot, the points are very spread
out around the regression line. The relationship
is weak.
11
Scatterplots: Direction of Relationship
When the regression line slopes upward to the
right, there is a positive, or direct,
relationship between the variables. When the
regression line slopes downward, the relationship
is negative, or inverse.
In this scatterplot, the regression line slopes
downward to the right, indicating a negative or
inverse relationship. The values of the
variables move in opposite directions.
In this scatterplot, the regression line slopes
upward to the right, indicating a positive or
direct relationship. The values of both
variables increase and decrease at the same time.
12
Scatterplots: Predicting Scores
For any value of the independent variable on the
horizontal x-axis, the predicted value for the
dependent variable will be the corresponding
value on the vertical y-axis.
For a given value of the independent variable on
the horizontal axis, e.g. 52, we draw a
perpendicular line upward from that value on the
x-axis to the regression line.
The estimate for the dependent variable is
obtained by drawing a line parallel to the x-axis
from the regression line to the vertical y-axis
and reading the value where this line crosses the
y-axis, e.g. 50.
13
The Effect of Scaling on the Scatterplot
• The scale used for the vertical y-axis can
change the appearance of the scatterplot and
alter our interpretation of the strength of the
relationship. The three scatterplots on this
slide all use the same data.

In the original plot, the y-axis is scaled from 0
to 80.
14
The Assumption of Linearity
• An underlying assumption of regression analysis
is that the relationship between the variables is
linear, meaning that the points in the
scatterplot must form a pattern that can be
approximated with a straight line.
• While we could test the assumption of linearity
with a test of statistical significance of the
correlation coefficient, we will make a visual
assessment of scatterplots.
• If the scatterplot indicates that the points do
not follow a linear pattern, the techniques of
linear correlation and regression should not be
applied.

15
Examples of Linear Relationships
• These two scatterplots are for data on poverty of
nations. The plots below show strong linear
relationships. The points are evenly distributed
on either side of the regression line.

16
Examples of Non-linear Relationships
• These scatterplots show a non-linear
relationship. The points are not evenly
distributed on either side of the regression
line. We will often see a concentration of
points on one side of the regression line and an
absence of points on the other side.

17
The Regression Equation
• The regression equation is the algebraic formula
for the regression line, which states the
mathematical relationship between the independent
and the dependent variable.
• We can use the regression line to estimate the
value of the dependent variable for any value of
the independent variable.
• The stronger the relationship between the
independent and dependent variables, the closer
these estimates will come to the actual score
that each case had on the dependent variable.

18
Components of the Regression Equation
• The regression equation has two components.
• The first component is a number called the
y-intercept that defines where the line crosses
the vertical y axis.
• The second component is called the slope of the
line, and is a number that multiplies the value
of the independent variable.
• These two elements are combined in the general
form for the regression equation:
• the estimated score on the dependent variable =
the y-intercept + the slope × the score on the
independent variable

19
The Standard Form of the Regression Equation
• The standard form for the regression equation or
formula is
• Y = a + bX
• where
• Y is the estimated score for the dependent
variable
• X is the score for the independent variable
• b is the slope of the regression line, or the
multiplier of X
• a is the intercept, or the point on the vertical
axis where the regression line crosses the
vertical y-axis

20
Depicting the Regression Equation
The regression equation includes both the
y-intercept and the slope of the line. The
y-intercept is 1.0 and the slope is 0.5.
The slope is the multiplier of x. It is the
amount of change in y for a change of one unit in
x. If x changes one unit from 2.0 to 3.0,
depicted by the blue arrow, y will change by 0.5
units, from 2.0 to 2.5 as depicted by the red
arrow.
• The y-intercept is the point on the vertical
y-axis where the regression line crosses the
axis, i.e. 1.0.
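
The arithmetic on this slide can be checked with a short Python sketch. The function name `predict` is an illustrative choice, not something taken from the slides; the intercept of 1.0 and the slope of 0.5 are the values stated above.

```python
def predict(x, intercept=1.0, slope=0.5):
    """Estimated score on the dependent variable: Y = a + bX."""
    return intercept + slope * x

y_at_2 = predict(2.0)      # 1.0 + 0.5 * 2.0
y_at_3 = predict(3.0)      # 1.0 + 0.5 * 3.0
change = y_at_3 - y_at_2   # change in y for a one-unit change in x
print(y_at_2, y_at_3, change)
```

A one-unit change in x always changes the predicted y by the slope, and `predict(0.0)` returns the intercept.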

21
Deriving the Regression Equation
• In this plot, none of the points fall on the
regression line.
• The difference between the actual value for the
dependent variable and the predicted value for
each point is shown by the red lines. This
difference is called the residual, and represents
the error between the actual and predicted
values.
• The regression equation is computed to minimize
the total amount of error in predicting values
for the dependent variable. The method for
deriving the equation is called the "method of
least squares," meaning that the regression line
minimizes the sum of the squared residuals, or
errors between actual and predicted values.
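
As a minimal sketch of the method of least squares for one independent variable, the code below uses the standard closed-form formulas for the slope and intercept. The data values and the function names are hypothetical, invented only for illustration; they do not come from the slides.

```python
from statistics import mean

def least_squares(x, y):
    """Return (intercept a, slope b) that minimize the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def sum_squared_residuals(x, y, a, b):
    """Total squared error between actual y values and predictions a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares(x, y)
best = sum_squared_residuals(x, y, a, b)
# A line with a slightly different slope leaves more squared error.
worse = sum_squared_residuals(x, y, a, b + 0.1)
print(a, b, best, worse)
```

Comparing `best` with `worse` illustrates the defining property of the regression line: changing the fitted slope or intercept can only increase the sum of squared residuals.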

22
Interpreting the Regression Equation: the
Intercept
• The intercept is the point on the vertical axis
where the regression line crosses the axis. It
is the predicted value for the dependent variable
when the independent variable has a value of
zero.
• This may or may not be useful information
depending on the context of the problem.

23
Interpreting the Regression Equation: the Slope
• The slope is interpreted as the amount of change
in the predicted value of the dependent variable
associated with a one unit change in the value of
the independent variable.
• If the slope has a negative sign, the direction
of the relationship is negative or inverse,
meaning that the scores on the two variables move
in opposite directions.
• If the slope has a positive sign, the direction
of the relationship is positive or direct,
meaning that the scores on the two variables move
in the same direction.

24
Interpreting the Regression Equation when the
Slope equals 0
• If there is no relationship between two
variables, the slope of the regression line is
zero and the regression line is parallel to the
horizontal axis.
• A slope of zero means that the predicted value of
the dependent variable will not change, no matter
what value of the independent variable is used.
• If there is no relationship, using the regression
equation to predict values of the dependent
variable is no improvement over using the mean of
the dependent variable.

25
Assumptions Required for Utilizing a Regression
Equation
• The assumptions required for utilizing a
regression equation are the same as the
assumptions for the test of significance of a
correlation coefficient.
• Both variables are interval level.
• Both variables are normally distributed.
• The relationship between the two variables is
linear.
• The variance of the values of the dependent
variable is uniform for all values of the
independent variable (equality of variance).

26
Assumption of Normality
• Strictly speaking, the test requires that the two
variables be bivariate normal, meaning that the
combined distribution of the two variables is
normal. It is usually assumed that the variables
are bivariate normal if each variable is normally
distributed, so this assumption is tested by
checking the normality of each variable.
• Each variable will be considered normal if its
skewness and kurtosis statistics fall between
-1.0 and +1.0 or if the sample size is
sufficiently large to apply the Central Limit
Theorem.
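
The skewness and kurtosis rule can be sketched in Python as follows. This assumes the bias-adjusted sample formulas that statistical packages commonly report (kurtosis expressed as excess kurtosis, so a normal distribution scores 0); the slides do not say which formula SPSS uses, so treat the exact values as an approximation of its output. The function names and the sample scores are hypothetical.

```python
from statistics import mean, stdev

def skewness(data):
    """Bias-adjusted sample skewness (needs at least 3 cases)."""
    n = len(data)
    m, s = mean(data), stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in data)

def excess_kurtosis(data):
    """Bias-adjusted sample excess kurtosis (needs at least 4 cases)."""
    n = len(data)
    m, s = mean(data), stdev(data)
    fourth = sum(((v - m) / s) ** 4 for v in data)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

def within_normal_range(data, limit=1.0):
    """Apply the rule of thumb: both statistics between -limit and +limit."""
    return abs(skewness(data)) <= limit and abs(excess_kurtosis(data)) <= limit

# Hypothetical scores, invented for illustration only.
scores = [1, 2, 3, 4, 5]
print(skewness(scores), excess_kurtosis(scores), within_normal_range(scores))
```

These evenly spaced scores are perfectly symmetric, so skewness is zero, but the flat shape gives a kurtosis below -1.0, so the rule of thumb is not met for this tiny sample.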

27
Assumption of Linearity
• Linearity means that the pattern of the points in
a scatterplot forms a band, like the pattern in
the chart on the right.
• When the pattern of the points follows a curve,
like the scatterplot on the right, the
correlation coefficient will not accurately
measure the relationship.

28
Test of Linearity
• The test of linearity is a diagnostic statistical
test of the null hypothesis that the linear model
is an appropriate fit for the data points. The
desired outcome for this test is to fail to
reject the null hypothesis.
• If the probability for the test of linearity
statistic is less than or equal to the level of
significance for the problem, we reject the null
hypothesis, concluding that the data is not
linear and that regression analysis is not
appropriate for the relationship between the two
variables.
• If the probability for the test of linearity
statistic is greater than the level of
significance for the problem, we fail to reject
the null hypothesis and conclude that we satisfy
the assumption of linearity.

29
Assumption of Homoscedasticity
• Homoscedasticity (equality of variances) means
that the points are evenly dispersed on either
side of the regression line for the linear
relationship.

In this scatterplot, the points extend about the
same distance above and below the regression line
for most of the length of the regression line.
This scatterplot meets the assumption of
homoscedasticity.
In this scatterplot, the spread of the points
around the regression line is narrower at the
left end of the regression line than at the right
end of the regression line. This funnel shape
is typical of a scatterplot showing violations of
the assumption of homoscedasticity.
30
Test of Homoscedasticity
• When we compared groups, we used the Levene test
of population variances to test for the
assumption that the group variances were equal.
• In order to use this test for the assumption of
homoscedasticity, we will convert the interval level
independent variable into a dichotomous variable
with low scores in one group and high scores in
the other group. We can then compare the
variances of the two groups derived from the
independent variable.

31
Levene Test of Homogeneity of Variances
• The Levene test of equality of population
variances tests whether or not the variances for
the two groups are equal. It is a test of the
research hypothesis that the variance
(dispersion) of the group with low scores is
different from the variance of the group with
high scores. The null hypothesis is that the
variances (dispersion) of both groups are equal.
• If the probability of the test statistic is
greater than 0.05, we do not reject the null
hypothesis and conclude that the variances are
equal. This is the desired outcome.
• If the probability of the test statistic is less
than or equal to 0.05, we conclude the variances
are different and the Regression Analysis is not
an appropriate test for the relationship between
the two variables.
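
A rough sketch of this procedure in Python is shown below. It makes several assumptions that the slides do not state: the independent variable is split at its median, the variances compared are those of the dependent variable in each group, and the mean-centered form of the Levene statistic is used. Only the W statistic is computed; its probability would come from an F distribution with 1 and N - 2 degrees of freedom, which SPSS reports directly. The function name and data are hypothetical.

```python
from statistics import mean, median

def levene_w(x, y):
    """Levene W for y, with cases grouped by a median split on x (two groups)."""
    cut = median(x)
    low = [yi for xi, yi in zip(x, y) if xi <= cut]
    high = [yi for xi, yi in zip(x, y) if xi > cut]
    # Absolute deviation of each y value from its own group mean.
    z = [[abs(v - mean(g)) for v in g] for g in (low, high)]
    n_total = len(low) + len(high)
    k = 2  # number of groups
    z_means = [mean(g) for g in z]
    z_grand = mean([v for g in z for v in g])
    between = sum(len(g) * (zm - z_grand) ** 2 for g, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for g, zm in zip(z, z_means) for v in g)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_equal_spread = [1, 2, 3, 4, 11, 12, 13, 14]       # same spread in both halves
y_funnel = [5, 5.1, 4.9, 5, 2, 8, 3, 9]             # spread widens as x grows

w_equal = levene_w(x, y_equal_spread)
w_funnel = levene_w(x, y_funnel)
print(w_equal, w_funnel)
```

A W near zero is consistent with equal variances, while a large W, as in the funnel-shaped data, points toward a violation of homoscedasticity.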

32
The hypothesis test of r²
• The purpose of the hypothesis test of r² is to
test the applicability of our findings to the
population represented by the sample.
• When we studied association between two interval
variables, we stated that the Pearson r
correlation coefficient and its square, the
coefficient of determination measure the strength
of the relationship between two interval
variables. When the correlation coefficient and
coefficient of determination are zero (0), there
is no relationship.
• The hypothesis test of r² is a test of whether or
not r² is larger than zero in the population.

33
The hypothesis test of r²
• The research hypothesis states that r² is larger
than zero (a relationship exists).
• The null hypothesis states that r² is equal to
zero (no relationship).
• Recall that we interpreted the coefficient of
determination r² as the reduction in error
attributable to the relationship between the
variables.
• The test statistic is an ANOVA F-test which tests
whether or not the reduction in error associated
with using the regression equation is really
greater than zero.

34
How the regression ANOVA test works
We will use the sample data we used for
correlation and regression to examine how the
hypothesis test for r² works.
We are interested in the relationship between
family size and number of credit cards.
35
The scatter diagram or scatterplot
The dependent variable is plotted on the Y or
vertical axis.
The independent variable is plotted on the x or
horizontal axis.
36
The mean as the best guess
Without taking into account the independent
variable, our best guess for the number of credit
cards for any subject is the mean, 7.0.
37
Errors using the mean as estimate
Errors are measured by computing the difference
between the mean and each Y value, squaring the
differences, and then summing them. When we
compute the answer in SPSS, it will tell us that
the total amount of error is 22.0.
38
The regression line
The regression line minimizes the error (the best
fitting or least squares line)
39
The equation for the regression line
SPSS will give us the formula for the regression
line in the form Y = a + bX, or for these
variables: Number of Credit Cards = 2.871 + 0.971
× Family Size
40
PRE: reduction in error
SPSS also tells us the amount of error using only
the mean and using the regression line.
Error using mean only (total): 22.000
Error using regression line: 5.486
Reduction in error associated with the regression: 16.514
PRE measure (r²) = (22.0 - 5.486) / 22.0 = 0.751
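
The PRE calculation can be traced end to end in Python. The slides do not list the raw data, so the eight cases below are illustrative values chosen because they reproduce the summary figures shown for this example (a mean of 7.0, total error of 22.0, and the fitted equation); they should not be read as the actual course data set.

```python
from statistics import mean

# Illustrative values only: the slides do not give the raw data. These cases
# were picked because they reproduce the summary figures in the example.
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
credit_cards = [4, 6, 6, 7, 8, 7, 8, 10]

x_bar, y_bar = mean(family_size), mean(credit_cards)

# Least squares slope (b) and intercept (a).
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(family_size, credit_cards))
     / sum((x - x_bar) ** 2 for x in family_size))
a = y_bar - b * x_bar

# Error using the mean only, and error using the regression line.
error_mean = sum((y - y_bar) ** 2 for y in credit_cards)
error_line = sum((y - (a + b * x)) ** 2
                 for x, y in zip(family_size, credit_cards))

# Proportional reduction in error (r squared).
r_squared = (error_mean - error_line) / error_mean
print(round(a, 3), round(b, 3))
print(round(error_mean, 3), round(error_line, 3), round(r_squared, 3))
```

The reduction in error, `error_mean - error_line`, is the quantity the ANOVA table attributes to the regression.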
41
The ANOVA test for the regression
The F statistic is calculated as the ratio of the
error reduced by the regression divided by the
error remaining, with each amount first divided
by its degrees of freedom. If the ratio were
close to 1, the error reduced by the regression
would be no larger than we would expect by
chance, there would be no evidence of a
relationship, and the p-value would not let us
reject the null hypothesis. In this problem, the
amount of error reduced by the regression is
large relative to the amount remaining, so the F
statistic is large. The p-value (0.005) is
smaller than the alpha level of significance, so
we reject the null hypothesis.
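
A minimal sketch of the F ratio, using the error figures from the PRE slide. The sample size is an assumption: the slides do not state how many cases are in the example, and 8 is used here only for illustration. The function name is hypothetical, and the p-value itself is left to SPSS because it requires the F distribution.

```python
def f_statistic(error_reduced, error_remaining, n_cases, n_predictors=1):
    """F ratio for a regression: mean square regression / mean square residual."""
    df_regression = n_predictors
    df_residual = n_cases - n_predictors - 1
    ms_regression = error_reduced / df_regression
    ms_residual = error_remaining / df_residual
    return ms_regression / ms_residual, df_regression, df_residual

# n_cases=8 is an assumption; the slides do not state the sample size.
f_value, df1, df2 = f_statistic(16.514, 5.486, n_cases=8)
print(round(f_value, 2), df1, df2)
```

The two degrees-of-freedom values are the ones reported in parentheses after F, as in the F(1, 604) notation used in the practice problems.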
42
Interpreting Pearson's r correlation coefficient
The square root of r² is Pearson's r, the
correlation coefficient. If we want to
characterize the strength of the relationship, we
compare the size of r to the interpretive
guidelines for measures of association.
43
Interpreting the direction of the relationship
To interpret the direction of the relationship
between the variables, we look at the coefficient
for the independent variable. In this example,
the coefficient of 0.971 is positive, so we would
interpret this relationship as: "Families with
more members had more credit cards."
44
Testing Assumptions in Homework Problems
• The process of testing assumptions can easily
overwhelm the task of testing the significance of
the relationship.
• Since our emphasis here is testing the hypothesis
that the relationship is generalizable to the
population represented by the sample data, we
will assume that our data satisfies the
assumptions without explicitly testing
assumptions.

45
Homework Problem Questions
• The question in the homework problems requires us
to look at three things
• Does the hypothesis test support the existence of
a relationship in the population?
• Is the strength of the relationship characterized
correctly?
• Is the direction of the relationship between the
variables correctly stated?

46
Practice Problem 1
This question asks you to use linear regression
to examine the relationship between marital and
age. Linear regression requires that the
dependent variable and the independent variables
be interval. Ordinal variables may be included as
interval variables if a caution is added to any
true findings. The dependent variable marital
is nominal level which does not satisfy the
requirement for a dependent variable. The
independent variable age is interval level,
satisfying the requirement for an independent
variable.
47
Practice Problem 2
This question asks you to use linear regression
to examine the relationship between fund and
attend. The level of measurement requirements
for linear regression are satisfied: fund is
ordinal level, and attend is ordinal level. A
caution is added because ordinal level variables
are included in the analysis. Given the
assumption that the distributional requirements
for linear regression are satisfied, you can
conduct a linear regression using SPSS without
examining distributional assumptions for the
variables.
48
Linear Regression Hypothesis Test in SPSS (1)
You can conduct a linear regression
using Analyze > Regression > Linear
49
Linear Regression Hypothesis Test in SPSS (2)
Move the dependent variable to the Dependent box
and the independent variable to the
Independent(s) box, and then click the OK button.
50
Linear Regression Hypothesis Test in SPSS (3)
Based on the ANOVA table for the linear
regression (F(1, 604) = 70.579, p < 0.001), there
was a relationship between the dependent
variable "degree of religious fundamentalism" and
the independent variable "frequency of attendance
at religious services". Since the probability of
the F statistic (p < 0.001) was less than or equal
to the level of significance (0.05), the null
hypothesis that the correlation coefficient (R)
was equal to 0 was rejected. The research
hypothesis that there was a relationship between
the variables was supported.
51
Linear Regression Hypothesis Test in SPSS (4)
Given the significant F-test result, the
correlation coefficient (R) can be interpreted.
The correlation coefficient for the
relationship between the independent variable and
the dependent variable was 0.323, which would be
characterized as a weak relationship using the
rule of thumb that a correlation between 0.0 and
0.20 is very weak; 0.20 to 0.40 is weak; 0.40 to
0.60 is moderate; 0.60 to 0.80 is strong; and
greater than 0.80 is very strong. The
relationship between the independent variable
and the dependent variable was incorrectly
characterized as a moderate relationship. The
relationship should have been characterized as a
weak relationship. The answer to the problem is
false.
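
The rule of thumb above can be written as a small lookup, sketched here in Python. The rule does not say which label applies to a value that falls exactly on a boundary such as 0.20, so the choice below (each boundary belongs to the higher category, except 0.80, which stays "strong") is an assumption, as is the function name.

```python
def describe_strength(r):
    """Label the strength of a correlation using the rule of thumb."""
    size = abs(r)  # strength ignores the sign; direction is read separately
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size <= 0.80:
        return "strong"
    return "very strong"

print(describe_strength(0.323))
print(describe_strength(0.122))
```

Applied to the practice problems, 0.323 is labeled weak and 0.122 very weak, matching the interpretations given in the slides.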
52
Practice Problem 3
This question asks you to use linear regression
to examine the relationship between educ and
age. educ and age are interval level,
satisfying the level of measurement requirements
for regression. Given the assumption that the
distributional requirements for linear regression
are satisfied, you can conduct a linear
regression using SPSS without examining
distributional characteristics of variables.
53
Linear Regression Hypothesis Test in SPSS (5)
You can conduct a linear regression
using Analyze > Regression > Linear
54
Linear Regression Hypothesis Test in SPSS (6)
Move the dependent variable to the Dependent box
and the independent variable to the
Independent(s) box, and then click the OK button.
55
Linear Regression Hypothesis Test in SPSS (7)
Based on the ANOVA table for the linear
regression (F(1, 659) = 9.983, p = 0.002), there
was a relationship between the dependent
variable "highest year of school completed" and
the independent variable "age". Since the
probability of the F statistic (p = 0.002) was less
than or equal to the level of significance
(0.05), the null hypothesis that the correlation
coefficient (R) was equal to 0 was rejected.
The research hypothesis that there was a
relationship between the variables was supported.
56
Linear Regression Hypothesis Test in SPSS (8)
Given the significant F-test result, the
correlation coefficient (R) can be interpreted.
The correlation coefficient for the
relationship between the independent variable and
the dependent variable was 0.122, which can be
characterized as a very weak relationship.
57
Linear Regression Hypothesis Test in SPSS (9)
The b coefficient for the independent variable
"age" was -0.021, indicating an inverse
relationship with the dependent variable. Higher
numeric values for the independent variable "age"
[age] are associated with lower numeric values
for the dependent variable "highest year of
school completed" [educ]. The statement in the
problem that "survey respondents who were older
had completed more years of school" is incorrect.
The direction of the relationship is stated
incorrectly.
58
Practice Problem 4
This question asks you to use linear regression
to examine the relationship between sei and
age. sei and age are interval level,
satisfying the level of measurement requirements
for regression. Given the assumption that the
distributional requirements for linear regression
are satisfied, you can conduct a linear
regression using SPSS without examining
distributional characteristics of variables.
59
Linear Regression Hypothesis Test in SPSS (10)
You can conduct a linear regression
using Analyze > Regression > Linear
60
Linear Regression Hypothesis Test in SPSS (11)
Move the dependent variable to the Dependent box
and the independent variable to the
Independent(s) box, and then click the OK button.
61
Linear Regression Hypothesis Test in SPSS (12)
Based on the ANOVA table for the linear
regression (F(1, 629) = 0.266, p = 0.606), there was
no relationship between the dependent variable
"socioeconomic index" and the independent
variable "age". Since the probability of the F
statistic (p = 0.606) was greater than the level of
significance (0.05), the null hypothesis that
the correlation coefficient (R) was equal to 0 was
not rejected. The research hypothesis that
there was a relationship between the variables
was not supported.
62
Steps in solving Linear Regression Hypothesis
Test Problems - 1
The following is a guide to the decision process
for Regression Hypothesis Test problems.
• Are the dependent and independent variables
ordinal or interval level?
• No: Incorrect application of a statistic.
• Yes: Make sure that the assumption that the
distributional requirements for linear regression
are satisfied is made. Otherwise, you have to
check the assumption first.
• Our regression problems will assume that the
assumptions are met.
63
Steps in solving Linear Regression Hypothesis
Test Problems - 2
• Conduct the linear regression analysis.
• Is the p-value in the ANOVA table for the F ratio
test ≤ alpha?
• No: False.
• Yes: Is the interpretation of the strength of the
correlation coefficient correct?
• No: False.
• Yes: Continue to the next step.
64
Steps in solving Linear Regression Hypothesis
Test Problems - 3
• Is the direction of the relationship correctly
stated?
• No: False.
• Yes: Are either of the variables ordinal level?
• No: True.
• Yes: True with caution.
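
The decision process on the last three slides can be summarized as a single function, sketched here in Python. The function name, argument names, and the way measurement levels are passed in are all hypothetical; the logic follows the questions in the order the slides ask them and assumes, as the slides do, that the distributional assumptions are met.

```python
def evaluate_statement(levels, p_value, strength_correct, direction_correct,
                       alpha=0.05):
    """Return the answer for a regression hypothesis test problem.

    levels: measurement levels of the dependent and independent variables,
            e.g. ("interval", "ordinal").
    """
    # Step 1: both variables must be ordinal or interval level.
    if any(level not in ("ordinal", "interval") for level in levels):
        return "Incorrect application of a statistic"
    # Step 2: the F test must be significant.
    if p_value > alpha:
        return "False"
    # Step 3: the strength of the relationship must be characterized correctly.
    if not strength_correct:
        return "False"
    # Step 4: the direction of the relationship must be stated correctly.
    if not direction_correct:
        return "False"
    # Step 5: ordinal variables call for a caution.
    if "ordinal" in levels:
        return "True with caution"
    return "True"

# Practice Problem 4: interval variables, p = 0.606, so the test is not significant.
print(evaluate_statement(("interval", "interval"), 0.606, True, True))
```

In Practice Problem 1 the dependent variable is nominal, so the function stops at the first step without needing a p-value.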