Empirical Modeling : Linear Regression - PowerPoint PPT Presentation

About This Presentation
Title:

Empirical Modeling : Linear Regression

Description:

Knowing only the mean of the response variable is just as good in predicting ... information about fellow students average amount of study time per week as well as GPA ... – PowerPoint PPT presentation

Slides: 25
Provided by: cri105

Transcript and Presenter's Notes

Title: Empirical Modeling : Linear Regression


1
Empirical Modeling Linear Regression

2
Fitting a continuous function to noisy data
  • Can we find a relationship between these two
    variables?
  • What about a linear relationship?
  • Intuitively, we say that two variables are
    linearly dependent if, as one increases (or
    decreases), the other changes at a roughly
    constant rate.

3
Linear dependence
  • But how do we know if there is a strong linear
    relationship (dependence) between the variables?
  • The correlation coefficient, r, between the two
    variables tells you how strongly the variables
    are linearly dependent. It always lies between
    -1 and 1.

Mathematica
Excel
4
Linear dependence
  • If r = 1 or r = -1, then the values lie on a
    straight line (with positive slope if r = 1 and
    negative slope if r = -1).
  • If r = 0, there is no linear dependence between
    the variables.
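
As a minimal sketch of how r can be computed by hand, the function below uses the usual sample formula (cross-products of deviations divided by the square root of the product of the squared deviations). The function name `pearson_r` and all data are hypothetical, chosen only to illustrate the three cases r = 1, r = -1, and r = 0.

```python
import math

def pearson_r(xs, ys):
    """Return the sample correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: points lying exactly on a rising and on a falling line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson_r(x, [2 * v + 1 for v in x])      # line with positive slope
r_down = pearson_r(x, [-3 * v + 10 for v in x])  # line with negative slope

# Hypothetical data: y = x^2 on a symmetric range depends on x, but not linearly.
x_sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_curve = pearson_r(x_sym, [v ** 2 for v in x_sym])

print(r_up, r_down, r_curve)
```

The last case is worth noting: y is completely determined by x, yet r is 0, because r only measures linear dependence.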

5
CORRELATION DOES NOT IMPLY CAUSALITY
  • When we say that x and y are highly correlated,
    this means that x and y vary together.
  • There is no implication that changes in x cause
    changes in y.

6
CORRELATION DOES NOT IMPLY CAUSALITY
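  • The following table provides information on life
    expectancies for a sample of 22 countries. It
    also lists the number of people per television
    set in each country.
  • The correlation coefficient is r = -0.758: life
    expectancy tends to be lower where more people
    share each television set. That does not mean
    that adding television sets would lengthen lives.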
7
Linear Regression
  • Objectives
  • Assess the significance of the predictor
    variables in explaining the variability or
    behavior of the response variable
  • Predict the values of the response variable given
    the values of the predictor variables

8
The Simple Linear Regression Baseline Model
  • In the baseline model, there is no association
    between the response variable and the predictor
    variable
  • Knowing only the mean of the response variable is
    just as good for predicting values of the
    response variable as also knowing the values of
    the predictor variable.
  • The mean of the response variable is
    $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$,
    where $y_1, \dots, y_n$ are the observed
    responses and there are n data points.

9
The Simple Linear Regression
  • The relationship between the response variable
    and the predictor variable can be characterized
    by $Y = b_0 + b_1 X + e$
  • where
  • $Y$ = response variable
  • $X$ = predictor variable
  • $b_0$ = intercept parameter
  • $b_1$ = slope parameter
  • $e$ = error term representing deviations of $Y$
    from $b_0 + b_1 X$

10
The Simple Linear Regression
  • In order to determine whether a simple linear
    regression model is better than the baseline
    model, you must compare the explained variability
    to the unexplained variability.
  • The explained variability is related to the
    difference between the regression line and the
    mean of the response variable
  • The unexplained variability is related to the
    difference between the observed data values and
    the regression line.
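
One common way to make this comparison precise uses sums of squares. The labels SST, SSR, and SSE are conventional names assumed here, not taken from the slides; $\hat{y}_i$ denotes the value on the regression line at $x_i$ and $\bar{y}$ the mean of the response.

```latex
\begin{align*}
\text{SST} &= \sum_{i=1}^{n} (y_i - \bar{y})^2       && \text{(total variability)} \\
\text{SSR} &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 && \text{(explained variability)} \\
\text{SSE} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2     && \text{(unexplained variability)} \\
\text{SST} &= \text{SSR} + \text{SSE}
\end{align*}
```

The last identity holds for a least-squares line that includes an intercept, so the total variability splits cleanly into an explained and an unexplained part.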

[Figure: total variability split into explained and unexplained variability]
11
Assumptions for Linear Regression
  • The mean of the Ys is accurately modeled by a
    linear function of the Xs
  • The errors are independent.

12
The Simple Linear Regression Model Hypothesis Test
  • Null Hypothesis
  • The simple linear regression model does not fit
    the data better than the baseline model:
    $b_1 = 0$
  • Alternative Hypothesis
  • The simple linear regression model does fit the
    data better than the baseline model:
    $b_1 \neq 0$
  • The test will help decide if the linear term,
    $b_1 x$, is significant (important) to the model
13
The Simple Linear Regression Model Hypothesis Test
  • $H_0\colon b_1 = 0$ vs. $H_a\colon b_1 \neq 0$
  • A p-value in the hypothesis test will tell you
    if there is enough evidence to reject the null
    hypothesis.
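  • You can reject the null hypothesis with 95%
    confidence if the two-sided p-value is below
    0.05 (equivalently, a one-tailed p < 0.025).
  • You can reject the null hypothesis with 99%
    confidence if the two-sided p-value is below
    0.01 (equivalently, a one-tailed p < 0.005).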
  • If there is not enough evidence to reject the
    null hypothesis, then the linear term is not
    significant to the model.
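
The slides do not say which test statistic is used. A standard choice for this hypothesis, assumed here, is the t statistic for the slope: the estimated slope divided by its standard error, with n - 2 degrees of freedom. The sketch below computes that statistic in plain Python; the function name `slope_t_statistic` and the data are hypothetical.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (t, df) for testing H0: b1 = 0 in simple linear regression."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx                 # least-squares slope
    b0 = mean_y - b1 * mean_x      # least-squares intercept
    # Unexplained variability: squared residuals about the fitted line.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    if sse == 0:
        raise ValueError("perfect fit: the standard error of the slope is zero")
    df = n - 2
    se_b1 = math.sqrt(sse / df / sxx)  # standard error of the slope
    return b1 / se_b1, df

# Hypothetical noisy data that roughly follows a line with slope 2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
t_stat, df = slope_t_statistic(xs, ys)
print(t_stat, df)
```

For 3 degrees of freedom the two-sided 5% critical value of the t distribution is about 3.18, so a statistic as large as the one produced by this made-up data would lead to rejecting $H_0\colon b_1 = 0$. Converting the statistic into a p-value needs the t distribution itself, which a statistics package or Excel's regression output normally supplies.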

14
Fitting a Line to the data
  • Assume that the data points are $(x_i, y_i)$.
  • We would like to fit a function to the data.
  • Once we determine that the data appear to be
    linear, we decide to fit the data with
    $\hat{y} = b_0 + b_1 x$
  • Therefore, when we plug $x_i$ into the equation
    of the line, we get the estimates
    $\hat{y}_i = b_0 + b_1 x_i$ for the data

15
Fitting a Line to the data Using Least Squares
Criterion
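  • We would like to pick the line in such a way
    that the error, or residual, between the actual
    $y_i$ and the estimated $\hat{y}_i$ is minimized
  • Least squares determines $b_0$ and $b_1$ by
    minimizing
    $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$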
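
For a straight line, the least-squares criterion has a well-known closed-form solution: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The sketch below applies it; the names `fit_line` and `sum_squared_residuals` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals for y = b0 + b1*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least 2 paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x  # the fitted line passes through (mean_x, mean_y)
    return b0, b1

def sum_squared_residuals(xs, ys, b0, b1):
    """Evaluate the least-squares criterion for a given line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical noisy data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)
# Any other line gives a larger criterion value, e.g. a shifted intercept.
shifted = sum_squared_residuals(xs, ys, b0 + 0.1, b1)
print(b0, b1, best, shifted)
```

Comparing `best` with `shifted` shows the defining property of the fit: moving the line away from the least-squares solution increases the sum of squared residuals.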

16
Fitting a Line to the data Using Least Squares
Criterion
  • An important question is how well a model fits
    your observed data. The $R^2$ statistic measures
    the proportion of the variation in the data that
    is explained by a model.
  • Excel
  • The data on the right were fitted with a line
    (the equation of the line is above the graph)
  • The $R^2$ value for this model is approximately
    0.985
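
A minimal sketch of the $R^2$ calculation, assuming the common definition of one minus the ratio of unexplained to total variability. The function name `r_squared`, the data, and the fitted values are hypothetical and are not the Excel example from the slide.

```python
def r_squared(ys, y_hats):
    """Return R^2 = 1 - (unexplained variability) / (total variability)."""
    if len(ys) != len(y_hats) or len(ys) < 2:
        raise ValueError("need at least 2 observations and matching predictions")
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    if total == 0:
        raise ValueError("R^2 is undefined when the response is constant")
    unexplained = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))
    return 1 - unexplained / total

# Hypothetical data and predictions from the line y = 0.05 + 1.99*x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r2_line = r_squared(ys, [0.05 + 1.99 * x for x in xs])

# The baseline model predicts the mean of the response for every point.
mean_y = sum(ys) / len(ys)
r2_baseline = r_squared(ys, [mean_y] * len(ys))

print(r2_line, r2_baseline)
```

The baseline model scores an $R^2$ of 0 by construction, so $R^2$ can be read as the share of the baseline's error that the fitted line removes.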

17
Does Increased Study Time Increase GPA?
  • You are curious about how much you need to study
    per week to keep your GPA up, so you collect
    information about fellow students' average
    amount of study time per week as well as their
    GPAs
  • Goal: to create a model that predicts the GPA
    of a student given the amount of time they are
    willing to study per week.

Excel
18
Multiple Linear Regression with 2 predictor
variables
  • Consider the two-variable model
    $Y = b_0 + b_1 X_1 + b_2 X_2 + e$
  • where
  • $Y$ = response (dependent) variable
  • $X_1$ and $X_2$ = predictor (independent)
    variables
  • $e$ = error term
  • $b_0$, $b_1$, and $b_2$ = unknown parameters
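
The unknown parameters of the two-variable model can be estimated by least squares as well. One way, sketched below in plain Python, is to form the normal equations (X^T X) b = X^T y for the columns 1, X1, X2 and solve the resulting 3-by-3 system. The function names and the data are hypothetical; the data were generated exactly from Y = 1 + 2 X1 - 3 X2 so that the recovered coefficients can be checked.

```python
def solve_linear_system(a, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; coefficients are not unique")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - known) / m[r][r]
    return sol

def fit_two_predictors(x1, x2, ys):
    """Return [b0, b1, b2] for the least-squares fit of y = b0 + b1*x1 + b2*x2."""
    if not (len(x1) == len(x2) == len(ys)) or len(ys) < 3:
        raise ValueError("need at least 3 complete observations")
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]  # design matrix with intercept column
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear_system(xtx, xty)

# Hypothetical, noise-free data generated from Y = 1 + 2*X1 - 3*X2.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
ys = [1 + 2 * u - 3 * v for u, v in zip(x1, x2)]
coeffs = fit_two_predictors(x1, x2, ys)
print(coeffs)
```

Solving the normal equations directly is fine for a small illustration like this one; numerical libraries usually prefer more stable factorizations for larger or nearly collinear problems.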

19
Multiple Linear Regression with 2 predictor
variables
  • If there is no relationship between Y and the
    predictors X1 and X2, the model looks like a
    horizontal plane ($Y = b_0$, with $b_1 = 0$ and
    $b_2 = 0$)

21
The Multiple Linear Regression Model Hypothesis
Test
  • Null Hypothesis
  • The multiple linear regression model does not fit
    the data better than the baseline model:
    $b_1 = b_2 = 0$
  • Alternative Hypothesis
  • The multiple linear regression model does fit the
    data better than the baseline model
  • $b_1$ and $b_2$ are not both 0.
  • The test will help decide if the linear terms,
    $b_1 x_1$ and $b_2 x_2$, are significant
    (important) to the model

22
What Affects GPA?
  • You are curious about what variables might affect
    your GPA, so you collect information about fellow
    students' average amount of study time per week,
    time spent in the teacher's office per week,
    number of alcoholic drinks per week, and GPA
  • Goal: to create a model that predicts the GPA
    of a student.

Excel
23
Multiple Linear Regression In General
  • With k predictor variables, the model is
    $Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e$

24
The Multiple Linear Regression Model Hypothesis
Test
  • Null Hypothesis
  • The multiple linear regression model does not fit
    the data better than the baseline model
  • $b_1 = b_2 = \dots = b_k = 0$
  • Alternative Hypothesis
  • The multiple linear regression model does fit the
    data better than the baseline model
  • Not all of the $b_i$ are 0.
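
The slides do not name the test statistic. The usual choice for this overall hypothesis, assumed here, is the F statistic, which compares explained variability per predictor with unexplained variability per residual degree of freedom. In the notation below (not from the slides), SSR is $\sum_i (\hat{y}_i - \bar{y})^2$, SSE is $\sum_i (y_i - \hat{y}_i)^2$, n is the number of data points, and k is the number of predictors.

```latex
\[
F = \frac{\text{SSR} / k}{\text{SSE} / (n - k - 1)}
\]
```

Under the null hypothesis, and with the usual assumption of independent, normally distributed errors, F follows an F distribution with k and n - k - 1 degrees of freedom. A large F gives a small p-value, which is evidence that not all of the slope parameters are 0.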