Regression PowerPoint PPT Presentation


Title: Regression


1
Regression
  • How much should you pay for a house?
  • Would you consider the median or mean sales price
    in your area over the past year a reasonable
    price?
  • What factors are important in determining a
    reasonable price?
  • Amenities
  • Location
  • Square footage
  • To determine a price, you might consider a model
    of the form
  • Price = f(square footage) + e

2
Scatter plots
  • To determine the proper functional relationship
    between two variables, construct a scatter plot.
  • For the home sales data below, what sort of
    functional relationship exists between Price and
    SQFT (square footage)?

3
Simple linear regression
  • The simplest model form to consider is
  • Yi = b0 + b1 Xi + ei
  • Yi is called the dependent variable or response.
  • Xi is called the independent variable or
    predictor.
  • ei is the random error term, which is typically
    assumed to have a Normal distribution with mean 0
    and variance sigma^2.
  • We also assume that error terms are independent
    of each other.
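
The model assumptions above can be made concrete by simulating data from them. The following is a minimal sketch, assuming made-up parameter values (b0 = 20, b1 = 0.05, sigma = 7, with price in thousands of dollars); none of these numbers come from the home sales data, and the function name is hypothetical.

```python
import random

def simulate_responses(x_values, b0, b1, sigma, seed=0):
    """Draw Yi = b0 + b1*Xi + ei with independent ei ~ Normal(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [b0 + b1 * x + rng.gauss(0.0, sigma) for x in x_values]

# Hypothetical inputs, NOT taken from the home sales data.
sqft = [1000, 1500, 2000, 2500, 3000]
prices = simulate_responses(sqft, b0=20.0, b1=0.05, sigma=7.0)
# With sigma = 0 every response falls exactly on the line b0 + b1*x.
noiseless = simulate_responses(sqft, b0=20.0, b1=0.05, sigma=0.0)
print(prices)
print(noiseless)
```

Each simulated response is the point on the line plus an independent Normal error, which is exactly what the model states.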

4
Least squares criterion
  • If the simple linear model is appropriate then we
    need to estimate the values b0 and b1.
  • To determine the line that best fits our data, we
    choose the line that minimizes the sum of squared
    vertical deviations from our observed points to
    the line.
  • In other words, we minimize the sum over all
    observations of (Yi - b0 - b1 Xi)^2 with respect
    to b0 and b1.
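
As an illustration of the criterion, the sketch below evaluates the sum of squared vertical deviations for two candidate lines. The five data points are hypothetical (price in thousands of dollars) and are not the home sales data; the function name is also an assumption made for this example.

```python
def sum_squared_deviations(x, y, b0, b1):
    """Sum of squared vertical distances from the points to the line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, NOT the home sales data.
sqft = [1000, 1500, 2000, 2500, 3000]
price = [70, 100, 110, 150, 170]  # in $1000s

q_a = sum_squared_deviations(sqft, price, b0=20.0, b1=0.05)
q_b = sum_squared_deviations(sqft, price, b0=0.0, b1=0.06)
print(round(q_a, 6), round(q_b, 6))  # 150.0 400.0
```

Under the least squares criterion the first line is preferred, because its sum of squared deviations is smaller.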

5
Least squares estimators
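
The formulas on this slide did not survive in the transcript. As a hedged sketch, the standard closed-form least squares estimates for simple linear regression are slope = Sxy / Sxx and intercept = Ybar - slope * Xbar, where Sxx is the sum of (Xi - Xbar)^2 and Sxy is the sum of (Xi - Xbar)(Yi - Ybar). The data and names below are hypothetical, not the home sales data.

```python
def least_squares_fit(x, y):
    """Return (b0_hat, b1_hat) minimizing the sum of squared deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1_hat = sxy / sxx               # estimated slope
    b0_hat = y_bar - b1_hat * x_bar  # estimated intercept
    return b0_hat, b1_hat

# Hypothetical data, NOT the home sales data.
sqft = [1000, 1500, 2000, 2500, 3000]
price = [70, 100, 110, 150, 170]  # in $1000s

b0_hat, b1_hat = least_squares_fit(sqft, price)
print(round(b0_hat, 4), round(b1_hat, 4))  # 20.0 0.05
```

For these made-up points the fitted line is price = 20 + 0.05 * sqft, that is, about $50 per additional square foot.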

6
Least squares estimators

7
Home sales example
  • For the home sales data, what are least squares
    estimates for the line of best fit for Price as a
    function of SQFT?

8
Inference
  • Often, inference for the slope parameter, b1, is
    most important.
  • b1 tells us the expected change in Y per unit
    change in X.
  • If we conclude that b1 equals 0, then we are
    concluding that there is no linear relationship
    between Y and X.
  • If we conclude that b1 equals 0, then it makes no
    sense to use our linear model with X to predict
    Y.
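  • The least squares estimator of b1 has a Normal
    distribution with a mean of b1 and a variance of
    sigma^2 / sum of (Xi - Xbar)^2, where sigma^2 is
    the error variance and Xbar is the mean of the Xi.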

9
Hypothesis test for b1
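  • To test H0: b1 = D0, where D0 is the hypothesized
    slope, use the test statistic
    t = (estimate of b1 - D0) / (standard error of the
    estimate), which has a t distribution with n - 2
    degrees of freedom when H0 is true.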
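
A minimal sketch of the slope test, assuming the usual standard error sqrt(MSE / Sxx) with MSE equal to the residual sum of squares divided by n - 2. The data are hypothetical (not the home sales data), and the function name and the `d0` parameter are introduced only for this example.

```python
import math

def slope_t_statistic(x, y, d0=0.0):
    """Return (t, degrees of freedom) for testing H0: b1 = d0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar
    # Mean squared error: residual sum of squares over n - 2.
    sse = sum((yi - (b0_hat + b1_hat * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    se_b1 = math.sqrt(mse / sxx)  # standard error of the slope estimate
    return (b1_hat - d0) / se_b1, n - 2

# Hypothetical data, NOT the home sales data.
sqft = [1000, 1500, 2000, 2500, 3000]
price = [70, 100, 110, 150, 170]  # in $1000s

t_stat, df = slope_t_statistic(sqft, price)
print(round(t_stat, 2), df)  # 11.18 3
```

For a two-sided test at the 5% level with 3 degrees of freedom the table critical value is 3.182, so with these made-up numbers the hypothesis b1 = 0 would be rejected.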

10
Home sales example
  • For the home sales data, is the linear
    relationship between Price and SQFT significant?

11
Confidence interval for b1
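  • A 100(1 - alpha)% confidence interval for b1 is
    (estimate of b1) ± t × (standard error of the
    estimate), where t is the upper alpha/2 critical
    value of the t distribution with n - 2 degrees of
    freedom.
  • For the home sales data, what is a 95% confidence
    interval for the expected increase in price for
    each additional square foot?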
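
The interval can be sketched as follows. The critical value is passed in as a table lookup (3.182 is the upper 2.5% point of the t distribution with 3 degrees of freedom) so that only the standard library is needed. The data and function name are hypothetical and are not the home sales data.

```python
import math

def slope_confidence_interval(x, y, t_crit):
    """Return (low, high) for the slope: estimate ± t_crit * standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar
    sse = sum((yi - (b0_hat + b1_hat * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    margin = t_crit * se_b1
    return b1_hat - margin, b1_hat + margin

# Hypothetical data, NOT the home sales data.
sqft = [1000, 1500, 2000, 2500, 3000]
price = [70, 100, 110, 150, 170]  # in $1000s

T_CRIT_95_DF3 = 3.182  # table value for 95% confidence, n - 2 = 3
low, high = slope_confidence_interval(sqft, price, T_CRIT_95_DF3)
print(round(low, 4), round(high, 4))  # 0.0358 0.0642
```

With price in thousands of dollars, this made-up interval corresponds to roughly $36 to $64 per additional square foot.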

12
Confidence interval for mean response
  • Sometimes we want a confidence interval for the
    average (expected) value of Y at a given value of
    X = x.
  • With the home sales data, suppose a realtor says
    the average sales price of a 2000 square foot
    home is $120,000. Do you believe her?
  • The estimated mean response at X = x has a Normal
    distribution with a mean of b0 + b1 x and a
    variance of sigma^2 [1/n + (x - Xbar)^2 / Sxx],
    where Sxx is the sum of (Xi - Xbar)^2.

13
Confidence interval for mean response
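  • A 100(1 - alpha)% confidence interval for
    b0 + b1 x is (fitted value at x) ± t × s ×
    sqrt(1/n + (x - Xbar)^2 / Sxx), where s^2 is the
    mean squared error, Sxx is the sum of
    (Xi - Xbar)^2, and t is the upper alpha/2 critical
    value of the t distribution with n - 2 degrees of
    freedom.
  • With the home sales data, do you believe the
    realtor's claim?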
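
A minimal sketch of a confidence interval for the mean response, assuming the standard formula fitted value ± t × sqrt(MSE × (1/n + (x - Xbar)^2 / Sxx)). The data are hypothetical and are not the home sales data; the critical value 3.182 is the table value for 95% confidence with 3 degrees of freedom.

```python
import math

def mean_response_interval(x, y, x_new, t_crit):
    """Return (low, high) for the expected value of Y at X = x_new."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar
    sse = sum((yi - (b0_hat + b1_hat * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    fitted = b0_hat + b1_hat * x_new
    # Uncertainty in the estimated line only (no new-observation error).
    se_mean = math.sqrt(mse * (1 / n + (x_new - x_bar) ** 2 / sxx))
    return fitted - t_crit * se_mean, fitted + t_crit * se_mean

# Hypothetical data, NOT the home sales data.
sqft = [1000, 1500, 2000, 2500, 3000]
price = [70, 100, 110, 150, 170]  # in $1000s

low, high = mean_response_interval(sqft, price, x_new=2000, t_crit=3.182)
print(round(low, 1), round(high, 1))  # 109.9 130.1
```

A claimed average price is consistent with the data at this confidence level when it falls inside the interval; with these made-up numbers a claim of 120 (thousand dollars) would.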

14
Prediction interval for a new response
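  • Sometimes we want a prediction interval for a new
    value of Y at a given value of X = x.
  • A 100(1 - alpha)% prediction interval for Y when
    X = x is (fitted value at x) ± t × s ×
    sqrt(1 + 1/n + (x - Xbar)^2 / Sxx), where s^2 is
    the mean squared error, Sxx is the sum of
    (Xi - Xbar)^2, and t is the upper alpha/2 critical
    value of the t distribution with n - 2 degrees of
    freedom.
  • With the home sales data, what is a 95%
    prediction interval for the amount you will pay
    for a 2000 square foot home?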
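
A minimal sketch of a prediction interval, assuming the standard formula fitted value ± t × sqrt(MSE × (1 + 1/n + (x - Xbar)^2 / Sxx)). The extra 1 inside the square root accounts for the error of the new observation itself, so the interval is wider than the confidence interval for the mean response. The data are hypothetical and are not the home sales data.

```python
import math

def prediction_interval(x, y, x_new, t_crit):
    """Return (low, high) for a single new response Y at X = x_new."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar
    sse = sum((yi - (b0_hat + b1_hat * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    fitted = b0_hat + b1_hat * x_new
    # The leading 1 adds the variance of the new observation's error term.
    se_pred = math.sqrt(mse * (1 + 1 / n + (x_new - x_bar) ** 2 / sxx))
    return fitted - t_crit * se_pred, fitted + t_crit * se_pred

# Hypothetical data, NOT the home sales data.
sqft = [1000, 1500, 2000, 2500, 3000]
price = [70, 100, 110, 150, 170]  # in $1000s

low, high = prediction_interval(sqft, price, x_new=2000, t_crit=3.182)
print(round(low, 1), round(high, 1))  # 95.4 144.6
```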

15
Extrapolation
  • Prediction outside the range of the data is risky
    and not appropriate as these predictions can be
    grossly inaccurate. This is called
    extrapolation.
  • For our home sales example, the prediction
    formula was developed for homes smaller than 3750
    square feet. Is it appropriate to use the
    regression model to predict the price of a home
    that is 5000 square feet?
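
One simple safeguard, sketched below, is to refuse to predict outside the range of X values used to fit the line. The function name, the coefficients (20 and 0.05) and the observed square footages are all hypothetical and do not come from the home sales data.

```python
def predict_within_range(x_new, x_observed, b0, b1):
    """Predict b0 + b1*x_new, but only inside the observed range of X."""
    lowest, highest = min(x_observed), max(x_observed)
    if not lowest <= x_new <= highest:
        raise ValueError(
            f"x = {x_new} is outside the observed range "
            f"[{lowest}, {highest}]; predicting here is extrapolation"
        )
    return b0 + b1 * x_new

# Hypothetical fitted line and data range, NOT the home sales data.
sqft = [1000, 1500, 2000, 2500, 3000]
inside = predict_within_range(2000, sqft, b0=20.0, b1=0.05)
print(round(inside, 4))  # 120.0

try:
    predict_within_range(5000, sqft, b0=20.0, b1=0.05)
except ValueError as err:
    print(err)
```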

16
Correlation
  • The correlation coefficient, r, describes the
    direction and strength of the straight-line
    association between two variables.
  • We will use StatCrunch to calculate r and focus
    on interpretation.
  • If r is negative, then the association is
    negative. (A car's value vs. its age)
  • If r is positive, then the association is
    positive. (Height vs. weight)
  • r is always between -1 and 1 (-1 ≤ r ≤ 1).
  • At -1 or 1, there is a perfect straight-line
    relationship.
  • The closer to -1 or 1, the stronger the
    relationship.
  • The closer to 0, the weaker the relationship.
  • Understanding Correlation
  • Correlation by eye
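
The slides rely on StatCrunch to compute r. As a rough illustration of what is being computed, the sketch below uses the usual formula r = Sxy / sqrt(Sxx × Syy) on two hypothetical data sets (neither is the home sales data): one with a positive association and one with a negative association.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, NOT the home sales data.
sqft = [1000, 1500, 2000, 2500, 3000]
price = [70, 100, 110, 150, 170]  # in $1000s
age = [1, 2, 3, 4, 5]             # car age in years
value = [20, 17, 15, 12, 10]      # car value in $1000s

r_house = correlation(sqft, price)
r_car = correlation(age, value)
print(round(r_house, 3), round(r_car, 3))  # 0.988 -0.998
```

Both values are close to 1 in absolute size, indicating strong straight-line associations; the sign gives the direction.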

17
Home sales example
  • For the home sales data, consider the correlation
    between the variables.

18
Correlation and regression
  • The square of the correlation, r^2, is the
    proportion of variation in the value of Y that is
    explained by the regression model with X.
  • 0 ≤ r^2 ≤ 1 always. The closer r^2 is to 1, the
    better our model fits the data and the more
    confident we are in our prediction from the
    regression model.
  • For the home sales example, r^2 = 0.7137 between
    price and square footage, so about 71% of the
    variation in price is explained by square
    footage. Other factors are responsible for the
    remaining variation.
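
The "proportion of variation explained" reading can be checked directly: r^2 equals 1 - SSE/SST, where SSE is the sum of squared residuals from the fitted line and SST is the total sum of squares of Y about its mean. The sketch below uses hypothetical data, not the home sales data, so its result is unrelated to the 0.7137 quoted above.

```python
def r_squared(x, y):
    """Proportion of variation in y explained by the fitted line: 1 - SSE/SST."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar
    sse = sum((yi - (b0_hat + b1_hat * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# Hypothetical data, NOT the home sales data.
sqft = [1000, 1500, 2000, 2500, 3000]
price = [70, 100, 110, 150, 170]  # in $1000s

r2 = r_squared(sqft, price)
print(round(r2, 4))  # 0.9766
```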

19
Association and causation
  • A strong relationship between two variables does
    not always mean a change in one variable causes
    changes in the other.
  • The relationship between two variables is often
    due to both variables being influenced by other
    variables lurking in the background.
  • The best evidence for causation comes from
    properly designed randomized comparative
    experiments.

20
Does smoking cause lung cancer?
  • Unethical to investigate this relationship with a
    randomized comparative experiment.
  • Observational studies show strong association
    between smoking and lung cancer.
  • The evidence from several studies shows a
    consistent association between smoking and lung
    cancer.
  • The more and the longer a person smokes, the more
    often lung cancer occurs.
  • Smokers with lung cancer usually began smoking
    before they developed lung cancer.
  • It is plausible that smoking causes lung cancer.
  • Together, these observations serve as evidence
    that smoking causes lung cancer, though not
    evidence as strong as that from an experiment.