Title: Quantitative Methods


1
  • Quantitative Methods Week 5
  • Linear Regression Analysis

Roman Studer, Nuffield College, roman.studer_at_nuffield.ox.ac.uk
2
Homework (II)
  • Many factors (variables) are potentially
    associated with the drop in crime rates in the
    US. Where do you find correlations between a
    variable and the falling crime rate? Which
    variables are positively, which ones negatively
    correlated with the crime rate variable? Where do
    you not find any correlation? Explanations?

Concept/Variable | Measured Variable | Correlation
Strong economy | Unemployment rate, poverty level |
Reliance on prisons | Number of prisoners |
Education | Educational attainment |
Legalizing abortion | Number of abortions |
  • How did you solve the lag problem with the
    abortion variable?

3
Simple Linear Regression: Introduction
  • As with correlation, we are still looking at the
    relationship between two (ratio-level) variables,
    but we no longer treat them symmetrically: we now
    distinguish between the two

x influences y
y: dependent variable (explained variable)
x: independent variable (explanatory variable)
  • As a consequence, the questions that can be
    answered with correlation analysis and regression
    analysis are different

Correlation | Regression
Is there an association between X and Y? | How exactly does Y change if X changes?
How strong? Positive or negative? | How much of the change in Y is explained?
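
The contrast between the two columns can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made-up toy values, the helper names (`cross_dev`, `correlation`, `slope`) are hypothetical, and the slope uses the standard least-squares formula that the later slides introduce.

```python
# Hypothetical toy data, not taken from the course datasets.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def mean(values):
    return sum(values) / len(values)


def cross_dev(u, v):
    """Sum of products of the deviations of u and v from their means."""
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))


def correlation(u, v):
    """Pearson correlation coefficient of u and v."""
    return cross_dev(u, v) / (cross_dev(u, u) * cross_dev(v, v)) ** 0.5


def slope(dep, indep):
    """Least-squares slope from regressing dep on indep."""
    return cross_dev(indep, dep) / cross_dev(indep, indep)


# Correlation treats the two variables symmetrically ...
print(correlation(x, y), correlation(y, x))
# ... but the regression slope depends on which variable is the dependent one.
print(slope(y, x))  # y regressed on x
print(slope(x, y))  # x regressed on y
```

Swapping the arguments leaves the correlation unchanged but changes the slope, which is why regression forces a decision about which variable is dependent.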
4
Simple Linear Regression: Introduction (II)
  • Therefore, we move from issues of mere
    association to issues of causality
  • HOWEVER: a simple linear regression cannot
    establish causation!
  • We assume that causality runs from x to y
  • Sometimes this assumption is questionable, then
    more sophisticated models and tests are needed
  • Simultaneous equation models
  • Two-stage least square regressions
  • Causality tests
  • So we have to be careful when making such
    assumptions. Let's look at some pairs of
    variables.
  • Which is the explanatory variable and which the
    dependent variable?
  • In which cases might you expect a mutual
    interaction between the two variables?
  • Height and weight
  • Birth rate and marriage rate
  • Rainfall and crop yield
  • Rate of unemployment and level of relief
    expenditure in British parishes
  • Government spending on welfare programmes and the
    voting share of left-liberal parties
  • CO2 emissions and global warming

5
Simple Linear Regression: Introduction (III)
  • Straight line (linear) relationship between two
    variables
  • The relationship between two variables can also
    take a nonlinear form
  • Kuznets (1955) hypothesised that the relationship
    between the level of economic development (x) and
    income inequality (y) is nonlinear, following an
    inverted-U shape

[Figure: income inequality (vertical axis) plotted against economic development (horizontal axis)]
  • Multiple or multivariate regression includes two
    or more explanatory variables

6
The Equation of a Straight Line
  • The equation of a straight line: $Y = a + bX$

Example: $y = 2 + 3x$
7
The Equation of a Straight Line (II)
  • Equation of a straight line: $Y = a + bX$
  • Y and X are the dependent and the independent
    variable respectively, and $y_1, y_2, \dots, y_n$ and
    $x_1, x_2, \dots, x_n$ their values
  • a is the intercept. It determines the level at
    which the straight line crosses the vertical
    axis, i.e. it gives the value of y when $x = 0$
  • b measures the slope of the line
  • Positive relationship: $b > 0$
  • No relationship: $b = 0$
  • Negative relationship: $b < 0$
  • If x increases by one unit, y changes by b units
    (an increase if b is positive, a decrease if b is
    negative)
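
The roles of a and b can be checked with a small Python sketch. It assumes the example on the previous slide reads y = 2 + 3x (the operators are lost in the transcript), and the function name `line` is hypothetical.

```python
def line(x, a=2.0, b=3.0):
    """Return y = a + b*x; the defaults assume the example y = 2 + 3x."""
    return a + b * x


intercept = line(0)        # value of y at x = 0, i.e. a
step = line(5) - line(4)   # change in y for a one-unit rise in x, i.e. b
print(intercept, step)
```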

8
Fitting the Regression Line
  • How do we find the line of best fit, i.e. the
    line that best describes the linear relationship
    between X and Y?
  • This is an issue because, in the real world,
    relationships between two variables never follow
    a completely linear pattern

9
Fitting the Regression Line (II)
  • The regression line predicts the values of Y
    based on the values of X. Thus, the best line
    will minimise the deviation between the predicted
    and the actual values (the error, e)

10
Fitting the Regression Line (III)
  • However, to avoid the problem that positive and
    negative deviations cancel out, we look at
    squared deviations $(Y_i - \hat{Y}_i)^2$
  • Also, as we want to minimise the total error, we
    are interested in minimising the sum of all
    squared errors
  • Therefore, the regression line is the line that
  • minimises the sum of the squares of the vertical
    deviations of all pairs of values of X and Y
    from the regression line
  • This estimation procedure is known as ordinary
    least squares regression (OLS). Its formal
    derivation yields the two formulae needed to
    calculate the regression line, i.e. a formula for
    the intercept (a) and a formula for the slope (b)
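
The two formulae themselves are not reproduced in this transcript. As a hedged reminder, the standard textbook result of the least-squares minimisation is given below, with $\bar{X}$ and $\bar{Y}$ denoting the sample means (notation assumed here, not taken from the slides):

```latex
\min_{a,\,b} \sum_{i=1}^{n} \left( Y_i - a - b X_i \right)^2
\quad\Longrightarrow\quad
b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
a = \bar{Y} - b\,\bar{X}
```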

11
Fitting the Regression Line (IV)
  • With these two formulae, we can easily calculate
    the regression line for small datasets by hand.
    However, for large datasets and when we have
    more than one explanatory variable, we use
    Stata to do it for us
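
The hand calculation can be mirrored step by step in Python. This is a minimal sketch using the standard least-squares formulas for one explanatory variable; the data are made-up toy values and the function name `ols_fit` is hypothetical.

```python
def ols_fit(x, y):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares of the deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical toy data, not taken from the course datasets.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = ols_fit(x, y)
print(a, b)
```

Each intermediate quantity (the means, the two sums, then b, then a) corresponds to one column or one line of a hand-calculation table.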

12
The Goodness of Fit
  • Once we get the best fit regression line, we
    still want to know how good a fit this line
    really is
  • How much of the variation in Y is explained by
    this regression line?
  • The measure to describe the explanatory power of
    a regression is the coefficient of determination
    r², which is equal to the square of the
    correlation coefficient
  • It is a measure of the success with which the
    movements in Y are explained by the movements in
    X
  • In particular, it measures how much of the
    deviation of Y from the mean of Y is explained
    by the regression

13
The Goodness of Fit (II)
  • This is the interactive part of the class.
  • Please explain the concept depicted in the
    following graph in your own words

[Figure: graph showing the regression line, with x on the horizontal axis]
14
The Goodness of Fit (III)
Total variation = explained variation + unexplained variation
TSS = ESS + USS
R² = ESS/TSS = Explained Sum of Squares / Total Sum of Squares
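
The decomposition can be verified numerically. The following Python sketch assumes a simple regression fitted with the standard least-squares formulas; the data are made-up toy values and all variable names are illustrative.

```python
# Hypothetical toy data, not taken from the course datasets.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]   # predicted values on the regression line

tss = sum((yi - y_bar) ** 2 for yi in y)                # total variation
ess = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation
uss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation

r_squared = ess / tss
print(tss, ess, uss, r_squared)
```

Up to floating-point rounding, `tss` equals `ess + uss`, and `r_squared` is the share of the total variation that the line accounts for.
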
15
The Goodness of Fit (IV)
  • R² gives the proportion of the sample variation
    in y that is explained by x
  • R² = 0.35 means that the explanatory variable
    explains 35% of the variation in the dependent
    variable
  • R² ranges between 0 and 1
  • The higher R², the better the fit of the
    regression line to the data
  • R² can be used to compare the explanatory power
    of different regression models (with the same
    dependent variable)
  • R² has to be interpreted in the light of what we
    would expect
  • A low R² does not necessarily mean that an OLS
    regression equation is useless

16
Computer Class
  • Regression Analysis

17
Exercises
  • Weimar elections: unemployment and votes for the
    Nazi party
  • A) Descriptive Statistics
  • Get the dataset on the Weimar election of
    1932 at
    http://www.nuff.ox.ac.uk/users/studer/teaching.htm
  • Look at the variables (votes for the Nazi party,
    level of unemployment) in turn
  • Get a first visualisation of the data: does it
    look normally distributed?
  • Compute the mean, median, standard deviation,
    coefficient of variation, kurtosis and skewness
    for each variable
  • Make a scatter plot. Do you think the two
    variables are associated? How and how strongly?
  • B) Regression Analysis
  • Which one is the dependent variable?
  • Estimate the regression line
  • What is the interpretation?
  • What is the explanatory power of the regression?
  • Draw a scatter plot and add the regression line
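
For readers who want to check the descriptive statistics in part A outside Stata, here is a minimal Python sketch. The values are invented placeholders, not the Weimar data, and the moment-based definitions of skewness and kurtosis used here are one common convention; other software may apply small-sample corrections.

```python
import statistics

# Hypothetical values standing in for one variable (e.g. an unemployment
# rate in percent); they are NOT taken from the Weimar dataset.
values = [10.0, 12.5, 15.0, 20.0, 30.0]

n = len(values)
mean = statistics.mean(values)
median = statistics.median(values)
sd = statistics.stdev(values)   # sample standard deviation (divides by n - 1)
cv = sd / mean                  # coefficient of variation

# Central moments (dividing by n) for the two shape measures.
m2 = sum((v - mean) ** 2 for v in values) / n
m3 = sum((v - mean) ** 3 for v in values) / n
m4 = sum((v - mean) ** 4 for v in values) / n
skewness = m3 / m2 ** 1.5       # 0 for a symmetric distribution
kurtosis = m4 / m2 ** 2         # 3 for a normal distribution

print(mean, median, sd, cv, skewness, kurtosis)
```

A mean well above the median together with positive skewness, as in this toy example, is a first hint that a variable is not normally distributed.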

18
Exercises (II)
2. Weimar elections: Nazi votes and the share of
Catholics. Do the same exercises as with the
unemployment rate:
  • A) Descriptive Statistics
  • B) Regression Analysis
19
Appendix: STATA Commands
  • regress depvar indepvars: Linear regression. The
    dependent and independent variables are
    indicated by their order: the dependent variable
    depvar comes first, then the independent
    variable(s) indepvars follow
  • predict yhat, xb: Calculates the linear
    prediction for each observation, i.e.
    yhat_i = a + b*indepvar_i
  • predict res, resid: The option resid after the
    comma makes STATA calculate and save the
    residuals, i.e. res_i = y_i - yhat_i
  • generate newvar = exp: Creates a new variable.
    STATA provides numerous functions; see STATA's
    Help on 'Functions and expressions'/'math
    functions'
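
What these commands compute in the one-regressor case can be sketched in Python. This is a rough analogue only: the function `regress` below is a hypothetical helper that borrows the command's name, not a STATA or library API, and the data are made-up toy values.

```python
def regress(depvar, indepvar):
    """Return intercept a and slope b of the least-squares line."""
    n = len(depvar)
    x_bar = sum(indepvar) / n
    y_bar = sum(depvar) / n
    b = sum(
        (xi - x_bar) * (yi - y_bar) for xi, yi in zip(indepvar, depvar)
    ) / sum((xi - x_bar) ** 2 for xi in indepvar)
    return y_bar - b * x_bar, b


# Hypothetical toy data, not taken from the course datasets.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = regress(y, x)                         # dependent variable comes first
yhat = [a + b * xi for xi in x]              # linear prediction per observation
res = [yi - yh for yi, yh in zip(y, yhat)]   # residuals: actual minus predicted
print(yhat)
print(res)
```

In a least-squares fit with an intercept the residuals sum to zero (up to rounding), which is a quick sanity check on saved residuals.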

20
Homework
  • Readings
  • Feinstein and Thomas, Ch. 5
  • Problem Set 4
  • Finish the exercises from today's computer class
    if you haven't done so already. Include all the
    results and answers in the file you send me
  • On the next slide you see part of the
    macroeconomic dataset we used in week 3. Look at
    the variables GDP per head and education,
    assuming that GDP per head is the dependent
    variable and education the explanatory variable.
    Calculate by hand:
  • The regression coefficient, b
  • The intercept, a
  • The total sum of squares
  • The explained sum of squares
  • The residual sum of squares
  • The coefficient of determination
  • (hint: graph the values first by hand and then
    look at tables 4.1 and 4.2)
  • Download the complete data set at
    http://www.nuff.ox.ac.uk/users/studer/teaching.htm
    (macro data) and calculate the regression line
    and the coefficient of determination
  • Interpret the results
  • Do the results differ from the correlation
    results?

21
Homework (II)
Country | GDP per Head | Agriculture | Education
Norway | 54360 | 1.60 | 81
Switzerland | 49660 | 1.40 | 49
United States | 39430 | 1.20 | 83
Brazil | 3340 | 10.10 | 21
Iran | 2340 | 13.70 | 21