Linear Models Lecture 1

Transcript and Presenter's Notes
1
Linear Models Lecture 1 Regression Analysis
2
  • Lecture Agenda
  • Homework Review/Questions
  • Linear Regression & General Linear Models (GLMs)
  • Confounding, Interactions and Regression
  • Assumptions and Model Fitting
  • In-depth review of pre-model work
  • Working with categorical variables
  • Lab

3
Lecture Agenda: Readings for Next Week
Chapters 10-12, 14-16
4
General Linear Models (GLMs)
[Diagram: statistical techniques split into descriptive and
inferential; GLMs are a family of inferential techniques]
5
  • General Linear Models (GLMs).
  • Class of inferential statistical procedures used
    to estimate the expected value of a dependent
    (outcome) variable as a linear function of 1 or
    more independent (predictor) variables.
  • All based on a variant of the basic algebraic linear
    equation

Y = β0 + β1X1 + ε
or
E(y|x) = α + βX1
6
The Regression Equation

ŷ = b0 + b1X1 + b2X2 + ... + bkXk + e

where
ŷ = predicted value of y
b0 = intercept, the estimated value of y when all Xs = 0
b1, b2, ..., bk = best-fitting coefficients for X1, X2, ..., Xk
k = number of independent variables in the regression equation
e = the residual

Here, we are assuming that all variables (Y and the Xs)
are continuous.
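The lecture works in SPSS, but the equation is easy to see in code. Below is a minimal Python sketch (simulated data; the variable names stress, age, and eating are hypothetical, not from the lecture) that estimates b0, b1, and b2 by ordinary least squares with statsmodels:

```python
# Minimal OLS sketch of y-hat = b0 + b1*X1 + b2*X2 + e.
# All data are simulated; variable names are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
stress = rng.uniform(1, 20, 100)              # X1
age = rng.uniform(18, 65, 100)                # X2
eating = 2.0 + 0.7 * stress + 0.05 * age + rng.normal(0, 2, 100)  # y

X = sm.add_constant(np.column_stack([stress, age]))  # prepend intercept column
fit = sm.OLS(eating, X).fit()
print(fit.params)     # b0, b1, b2
print(fit.resid[:5])  # the residuals e
```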
7
  • General Linear Models (GLMs).
  • Many common forms of GLM
  • ANOVA
  • Linear Regression
  • ANCOVA
  • t-test
  • Pearson correlation
  • Many more
  • All are designed for modeling a continuous
    dependent variable assuming certain conditions.
  • The specific form of the equation varies in
    appearance based on the nature and number of
    predictors being considered, as well as on the
    relationships between observations.

8
  • General Linear Models (GLMs).

All of these techniques are really special cases
or expressions of the same basic modeling
procedure.
9
  • Basics of Linear Regression / GLM Analysis
  • Measures the strength of the linear relationship
    between a dependent variable (y) and one or more
    independent variables (X).
  • Estimates the equation that minimizes the spread
    of points around the best fitting line (known as
    Least Squares Estimation).
  • Use components of the equation to test specific
    hypotheses about one or more predictors.

y = β0 + β1X1 + ε
r = 0.74
10
  • Relationship between Regression and Correlation
  • Regression finds the coefficients that produce
    the best fitting line
  • Correlation measures how tightly those points fit
    around the line; in fact, the square of the multiple
    correlation (called R-squared) is the primary
    means of assessing the quality of a regression
    model.

y = β0 + β1X1 + ε
r = 0.74, Rsq = 0.55
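To make that connection concrete, here is a small sketch (simulated data) showing that in simple regression the model R-squared equals the squared Pearson correlation:

```python
# Sketch: with one predictor, R-squared = r^2. Simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.5 * x + rng.normal(size=200)

r = np.corrcoef(x, y)[0, 1]
fit = sm.OLS(y, sm.add_constant(x)).fit()
print(r ** 2, fit.rsquared)  # the two values match
```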
11
Simple Regression vs. Multiple Regression
  • Simple regression - 1 predictor variable
  • Example: what is the impact of stress on eating
    problems?
  • Good for controlled research settings and
    hypothesis generation
  • Multiple regression - 2 or more predictor variables
  • Example: what are the impacts of stress, gender
    and age on eating problems?
  • Much more commonly used in research than simple
    regression, to deal with problems of uncontrolled
    confounding and possible interaction effects.
  • Also very useful for generating multivariable
    predictions

12
Types of hypotheses you can test with predictors
in multiple regression models
  • Q: Is the entire set of predictor variables
    associated with the outcome?
  • H0: β1 = β2 = ... = βk = 0, or R² = 0
  • HA: not H0
  • Tested with the ANOVA table
  • Q: Is a partial set of predictor variables
    associated with the outcome?
  • H0: the subset's βs = 0 (given the remaining βs),
    or R² change = 0
  • HA: not H0
  • Tested with the R² change test based on the ANOVA
    table
  • Q: Is an individual predictor variable associated
    with the outcome?
  • H0: β1 = 0, or R²(X1,Y | X2-Xk) = 0
  • H1: β1 ≠ 0, or R²(X1,Y | X2-Xk) ≠ 0
  • Tested with Wald tests for individual coefficient
    estimates
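All three tests can be illustrated in code (a sketch on simulated data with statsmodels; the lecture itself uses SPSS). The whole-model F-test comes from the ANOVA decomposition, the R² change test is the nested-model F comparison, and the per-coefficient Wald tests are the usual t-statistics:

```python
# Sketch of the three hypothesis-testing levels on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({"x1": rng.normal(size=150),
                   "x2": rng.normal(size=150),
                   "x3": rng.normal(size=150)})
df["y"] = 1 + 2 * df.x1 + 0.5 * df.x2 + rng.normal(size=150)

full = smf.ols("y ~ x1 + x2 + x3", data=df).fit()

# 1. Entire set of predictors: overall F-test (H0: all slopes = 0)
print(full.fvalue, full.f_pvalue)

# 2. Partial set (x2, x3 given x1): nested-model F / R-squared change test
reduced = smf.ols("y ~ x1", data=df).fit()
print(full.compare_f_test(reduced))  # (F, p-value, df difference)

# 3. Individual predictors: Wald t-tests on each coefficient
print(full.tvalues, full.pvalues)
```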

13
Logic for Multivariable Regression
Models: Confounding Effects
  • What is confounding?
  • Bias in an estimated effect due to 1 or more
    uncontrolled 3rd variables
  • Artifact of uncontrolled study designs
  • Why worry about confounding?
  • Can cause misleading results and incorrect
    inferences
  • Can make you look foolish

14
Confounding and Interaction Effects
Example: study of the effects of coffee drinking
on lung cancer
[Path diagrams]
Simple model: coffee → lung cancer (B = 2.5)
Multivariable model: coffee → lung cancer (B = 1.2),
smoking → lung cancer (B = 3.0)
15
Confounding Effects
  • How can multivariable modeling help with
    confounding?
  • Produces estimates for effects of interest
    adjusted for potential confounding effects
  • Provides a means of looking at data acknowledging
    multifactor causes

16
Logic for Multivariable Regression
Models: Interaction Effects
  • What is an interaction effect?
  • A relationship between 2 independent variables
    that represents a non-additive relationship with
    the dependent variable
  • Why worry about interactions?
  • Can represent important, scientifically
    interesting findings
  • Failure to specify can result in a sub-optimal
    model fit and incorrect inferences
  • Handle with care, though: interactions can also be
    nonsense effects, particularly with very large
    samples.

17
Interaction Effects
Example: study of the effects of asbestos exposure
and smoking on lung cancer
[Path diagram]
asbestos → lung cancer (B = 2.0)
smoking → lung cancer (B = 2.0)
asbestos × smoking → lung cancer (B = 10.0)
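A sketch of fitting this interaction model in Python (the lecture uses SPSS; the data here are simulated to roughly echo the coefficients in the diagram). In the statsmodels formula API, `a * b` expands to the two main effects plus their product term:

```python
# Sketch: non-additive (interaction) model on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({"asbestos": rng.integers(0, 2, 500),
                   "smoking": rng.integers(0, 2, 500)})
df["lung_ca"] = (2.0 * df.asbestos + 2.0 * df.smoking
                 + 10.0 * df.asbestos * df.smoking
                 + rng.normal(size=500))

fit = smf.ols("lung_ca ~ asbestos * smoking", data=df).fit()
print(fit.params)  # main effects plus the asbestos:smoking term
```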
18
Confounding and Interaction Effects
  • How can multivariable modeling help with
    interactions?
  • Allows you to specify the specific interactions of
    interest
  • Lets you judge whether the inclusion of a
    specific interaction is justified based on
    significance, improvement in model fit, increased
    complexity and logical interpretation

19
Assumptions of GLM / Linear Regression Modeling
model: y = β0 + β1X1 + β2X2 + β3X1X2 + ε
  • Statistical assumptions of Linear Regression /
    GLM.
  • Random Sample
  • DV (y) is continuous
  • IVs (Xs) are linearly related to the DV
  • Errors (e) normally distributed around best
    fitting line
  • Errors (e) have constant variance around best
    fitting line
  • Errors (e) uncorrelated with each other and all
    IVs.
  • Related Assumptions of Good Modeling
  • Variables used appropriately
  • Simpler is better (Occam's razor)
  • IVs not perfectly correlated with each other
  • Variables measured without or nearly without
    error
  • Model is correctly specified and makes sense
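The error assumptions are usually checked graphically. A minimal sketch (simulated data): plot residuals against fitted values to look for non-constant variance or curvature, and a Q-Q plot to check normality:

```python
# Sketch of graphical residual diagnostics for the assumptions above.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=100)
y = 1 + 2 * x + rng.normal(size=100)
fit = sm.OLS(y, sm.add_constant(x)).fit()

plt.scatter(fit.fittedvalues, fit.resid)  # fan shape -> non-constant variance
plt.axhline(0)                            # curvature around 0 -> non-linearity
sm.qqplot(fit.resid, line="s")            # points near line -> roughly normal errors
plt.show()
```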

20
Regression Example: a patient comes to the clinic with
a standardized stress level score of 16 (X). What
do we predict for his eating disorder level (Y)?
We predict his eating problems score to be 13.5.
[Scatterplot: eating problems (1-19) vs. stress (1-20),
with Bob's stress = 16 marked]
Q: If you know a patient's stress level, can you predict
the number of eating problems?
21
How do we find the best line for the data?
  • Method of Least-Squares
  • Finds the bs that minimize the squared
    deviations between each observed point and the
    regression line

[Scatterplot: eating problems vs. stress, with labeled deviations
(also called residuals) A-F between observed points and the line]
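For a single predictor, the least-squares coefficients have a closed form: b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1·x̄. A sketch with simulated stress/eating data:

```python
# Sketch: least-squares slope and intercept by hand (one predictor).
import numpy as np

rng = np.random.default_rng(5)
stress = rng.uniform(1, 20, 50)
eating = 1 + 0.8 * stress + rng.normal(0, 2, 50)

sxy = np.sum((stress - stress.mean()) * (eating - eating.mean()))
sxx = np.sum((stress - stress.mean()) ** 2)
b1 = sxy / sxx
b0 = eating.mean() - b1 * stress.mean()

resid = eating - (b0 + b1 * stress)  # the deviations A-F in the figure
print(b0, b1, np.sum(resid ** 2))    # this sum is what least squares minimizes
```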
22
How good is the model?
  • Variance Explained
  • R-Squared and Adjusted R-Squared
  • Ratio of variance explained by the model to total
    variance in Y
  • Ranges from 0 to 1
  • Often interpreted as the proportion of variance
    explained by the joint set of predictors as
    specified
  • Statistical Significance
  • F-Test for Model R-square
  • Based on the ratio of the Model Variance and
    Error Variance
  • A p < .05 indicates that the model performs
    better than chance at predicting Y.
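These quantities follow directly from the sums of squares; a sketch (hypothetical helper function; y_obs, y_hat, and k come from whatever model you fit):

```python
# Sketch: R-squared, adjusted R-squared, and the model F from sums of squares.
import numpy as np

def model_summary(y_obs, y_hat, k):
    """k = number of predictors; returns (R2, adjusted R2, F)."""
    n = len(y_obs)
    ss_total = np.sum((y_obs - y_obs.mean()) ** 2)  # total variation in Y
    ss_error = np.sum((y_obs - y_hat) ** 2)         # unexplained
    ss_model = ss_total - ss_error                  # explained
    r2 = ss_model / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f = (ss_model / k) / (ss_error / (n - k - 1))   # model vs. error variance
    return r2, adj_r2, f
```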

23
How potent is the model?
Interpretation depends on correct specification
24
What can we learn from the βs?
  • Looking at each βi
  • Each represents the expected change in y for each
    unit change in its relevant Xi
  • The sign (+/-) of each tells us the direction of
    the relationship (direct or inverse) between Xi
    and Y.
  • If standardized, the magnitude tells us the
    importance of Xi relative to the other Xis.
  • Each represents the marginal effect of Xi
    adjusted for all other Xis in the model.
  • Taken as a group, the equation can be used to
    estimate the expected value of Y for any
    combination of Xis.

25
Are the βs significant?
  • To determine whether a b is significant, we
    calculate a t-test of whether b = 0 (if b = 0, it
    has no influence on the D.V.)
  • To calculate the t-value, we must rely on SPSS to
    produce standard errors of the b, which are based
    on the variance of the deviations around the line.

df = n - 2
  • An alternative way to test whether a b is
    significant is to estimate the R² change test
    based on the model with and without Xi (we'll go
    into this more later).
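For simple regression, the pieces SPSS reports can be computed directly: SE(b1) = √(MSE / Σ(x - x̄)²), t = b1 / SE(b1), df = n - 2. A sketch on simulated data:

```python
# Sketch: t-test for a simple-regression slope, df = n - 2.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.normal(size=40)
y = 0.6 * x + rng.normal(size=40)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)
mse = np.sum(resid ** 2) / (len(x) - 2)    # variance of deviations around the line
se_b1 = np.sqrt(mse / sxx)                 # standard error of b1
t = b1 / se_b1
p = 2 * stats.t.sf(abs(t), df=len(x) - 2)  # two-sided p-value
print(t, p)
```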

26
What can we learn from the βs?
[Table of estimated betas]
27
Interpreting Regression Output
  • Examine model specification / missing values -
    does it all look kosher?
  • Examine the ANOVA table - determines whether the
    model is potent beyond chance and how much
    variance in the DV it accounts for
  • F-test
  • Adjusted R-squared
  • Examine the Model Coefficients table
  • Look at the sign and magnitude of coefficients -
    do they make sense?
  • Look at tests of significance (called Wald tests)
  • Examine model diagnostics
  • Write up results

28
Basic Model Fitting Approach
  • 1. Pre-model specification work
  • 2. Pre-model descriptives and diagnostics
  • 3. Final prep of analysis variables
  • 4. Select best set of predictor variables
  • 5. Fit preliminary model(s)
  • 6. Model diagnostics
  • 7. Remedial measures
  • 8. Fit final best-fitting model
  • 9. Derive tests of interest
  • 10. Interpret and summarize findings

Repeat steps 4-6 as needed
29
Before Fitting a Linear Model (1)
  • 1. Develop logic model for variables of interest
    based on your study hypotheses
  • Outcome measure (DV)
  • Main effect or effects of interest (IVs)
  • Potential confounders (IVs)
  • Potential interactions among IVs
  • Write out and/or draw out hypotheses!
  • H0: b1 = b2 = b3 = 0

30
Before Fitting a Linear Model (1)
  • Develop logic model for variables of interest
    based on your study hypotheses
  • Good idea to draw it out

H0: b1 (given b2-b6) = 0

[Path diagram: IV 1, IV 2, IV 3 and Mediator 1 linked to the DV
through paths b1-b6]
Logic model has huge implications for how you fit
your linear model.
31
Before Fitting a Linear Model (2)
  • 2. Run descriptives on all variables of interest
  • Understand your data before you begin
  • Identify potential data errors
  • Identify missing values
  • Recode and/or toss variables if necessary

32
Pre-model check of model variables
  • Check distributions of variables of interest
  • Nonsense values
  • Missing values

Issues: recoding may be necessary
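In pandas, the same checks are one-liners. A sketch with a toy data frame that deliberately contains a nonsense code (999) and a missing value:

```python
# Sketch of pre-model descriptives: ranges, missing values, recoding.
import numpy as np
import pandas as pd

df = pd.DataFrame({"stress": [4, 9, 16, 999, 7],  # 999 is a nonsense code
                   "eating": [3.0, 8.0, 13.5, 6.0, np.nan]})

print(df.describe())    # a max of 999 flags the bad value
print(df.isna().sum())  # missing values per variable

# Recode out-of-range values to missing before modeling
df["stress"] = df["stress"].where(df["stress"].between(1, 20))
```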
33
Pre-model check of model variables
  • Check distributions of variables of interest
  • box plots / histograms
  • stem-and-leaf plots

34
Before Fitting a Linear Model (3)
  • 3. Run scatterplots between IVs and the DV; these
    help answer key questions.
  • Do relationships look linear?
  • Are there outlier or non-fitting points?

35
Graphical Tests of the assumptions
  • Linear relationship between each X and Y

[Two example scatterplots: "Assumption Met" vs. "Assumption NOT Met"]
36
Pre-model check of model variables
Possible outliers
Linearity may be a problem if left unaddressed
37
Pre-model check of model variables: simple fixes
The assumption of linearity becomes more tenable by
recoding educ < 12 to equal 12
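The recode itself is a one-liner; a sketch with toy education values (bottom-coding at 12, as on the slide):

```python
# Sketch: recode educ < 12 to equal 12 (bottom-coding).
import pandas as pd

df = pd.DataFrame({"educ": [8, 10, 12, 14, 16, 20]})
df["educ12"] = df["educ"].clip(lower=12)  # values below 12 become 12
print(df)
```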
38
Before Fitting a Linear Model (4)
  • 4. Select the best set of predictors.
  • Does each variable contribute anything meaningful
    to the model?
  • Association with the DV
  • Association with the main effect of interest
    (confounding)
  • Theory vs. statistical fishing
  • Look at correlations
  • Among predictors
  • Part and partial correlations - correlations between
    the y-residual and either x or the x-residual (see
    the sketch after this list)
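Those residual definitions can be computed directly. A sketch on simulated data: regress the adjustment variable out of both y and x and correlate the residuals for the partial correlation; leave x raw for the part version, as defined above.

```python
# Sketch of the slide's residual-based definitions. Simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x2 = rng.normal(size=300)             # the variable being adjusted for
x1 = 0.5 * x2 + rng.normal(size=300)
y = x1 + x2 + rng.normal(size=300)

Z = sm.add_constant(x2)
y_resid = sm.OLS(y, Z).fit().resid    # y with x2 regressed out
x1_resid = sm.OLS(x1, Z).fit().resid  # x1 with x2 regressed out

partial = np.corrcoef(y_resid, x1_resid)[0, 1]  # y-residual with x-residual
part = np.corrcoef(y_resid, x1)[0, 1]           # y-residual with raw x
print(partial, part)
```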

39
Pre-model check of partial correlations
40
What if the Xs aren't continuous?
  • Classic linear regression requires that all
    predictor variables be continuous.
  • Researchers often want to consider predictor
    variables that are categorically coded, however
    (gender, presence/absence of a risk factor,
    presence/absence of an infection)
  • Categorical variables can be re-coded for
    inclusion in a linear model alongside continuous
    predictors.
  • Slightly different interpretation
  • Model effectively becomes an ANCOVA model

41
Dummy Coding Categorical Xs
  • Dummy Coding - create one vector (or new
    variable) for each level (minus one) of the
    categorical variable you are re-coding. Use a 1
    if the person belongs in the group and a 0 if
    they don't.
  • Examples
  • Gender --> Male / Female
  • 2 levels - so you will need 1 vector
  • Vector 1 --> make all females 1, all males 0
  • Marital Status --> married, single, divorced,
    widowed
  • 4 levels - so you will need 3 vectors

               x1  x2  x3
    Married     1   0   0
    Single      0   1   0
    Divorced    0   0   1
    Widowed     0   0   0

The β for each vector tests whether the group
with a 1 is significantly different from the
group with all 0s. For example, the b-weight
for x1 tests whether married is significantly
different from widowed.
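In pandas, the dummy vectors for the marital-status example can be generated rather than typed by hand; a sketch (ordering the categories so that widowed is the omitted reference group):

```python
# Sketch: dummy coding marital status with 'widowed' as the reference.
import pandas as pd

status = pd.Categorical(
    ["married", "single", "divorced", "widowed", "married"],
    categories=["widowed", "married", "single", "divorced"])

dummies = pd.get_dummies(status, drop_first=True)  # drops the first level, widowed
print(dummies)  # columns: married, single, divorced (x1-x3 on this slide)
```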
42
Using Dummy Coded Xs in Regression
  • Dummy-coded Xs can be used right alongside
    continuous predictors.
  • Estimated bs are interpreted as the expected
    mean difference in the outcome for the dummy-coded
    group relative to the reference group.
  • Properties of dummy variables
  • More flexible than continuous variables
  • Because they represent only one level, linearity
    of the relationship is not a concern.
  • Not as efficient as continuous variables - they
    require a model degree of freedom for each level
    (minus one)
  • Tend not to be highly correlated with other
    predictors
  • Pre-model steps are simpler with categorical
    variables
  • Descriptives boil down to frequencies
  • One-way ANOVAs take the place of correlations
  • Suggest skipping pre-model partial correlation
    testing.

43
Using Dummy Coded Xs in Regression
44
Lab
45
Demo
Using the world95.sav data set included with
SPSS, use correlation and regression techniques
to build a linear regression model of log gross
domestic product (log_gdp).
H0: B1 (lit_fema) = 0, given B2-Bk (covariates)
Unit of Analysis: Country
Key covariates:
  Region (categorical)
  Population Density (continuous)
  Urban Population Percentage (continuous)
46
Demo
Topics covered: Descriptives, Frequencies,
Scatterplots, Including Categorical Covariates,
Regression
47
Homework
  • Using the employee data.sav data set included
    with SPSS, use correlation and regression
    techniques to model current salary using
    education and previous experience.
  • Model diagram/hypotheses
  • Descriptives / Frequencies
  • Scatterplots
  • Simple correlations
  • Fit multiple regression and review output
  • Interpret
  • Add minority status and female gender to the
    model
  • Create dummy variables where necessary
  • Are one or both categorical effects significant?
  • Interpret the regression output