Correlation and Multiple Regression

1
Correlation and Multiple Regression

Robert K. Toutkoushian
Associate Professor
Educational Leadership and Policy Studies
Indiana University

2
Objectives of Module

Review statistical procedures such as correlation
and multiple regression analysis
Examine ways in which these procedures can be
applied to institutional research
Practice using SPSS to implement these procedures
Discuss more involved procedures and applications

3
My Approach

Aim for a middle ground in terms of difficulty
(higher UG/lower G level)
Focus more on the intuition behind procedures
than on proofs and derivations
Assume familiarity with descriptive stats and
hypothesis testing
STRONGLY encourage questions at any time!

4
Covariance and Correlation

Both measure the extent to which two variables
move together. They differ only in units of
measure
Positive covariance/correlation: both variables
tend to move in the same direction
Negative covariance/correlation: the variables
tend to move in opposite directions

5
(No Transcript)
6
Remember...

When looking for correlations, you may have to
first reorder one of the variables
If two variables are related, then knowing the
value of one variable may help with guesses as to
the value of the other (e.g., retention and
SAT/ACT scores)
Correlation does not imply causation!

7
Calculating the Covariance

Calculate the means for X and Y (denoted x-bar
and y-bar)
Subtract the mean for X from each X value and
repeat for Y
Multiply the differences together for each
observation, then sum and divide by degrees of
freedom (n-1)
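
The three steps above can be sketched in Python (rather than SPSS). The data values and the function name `covariance` are hypothetical and chosen only for illustration; they are not the figures behind the worked slide that follows.

```python
def covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n                      # step 1: means
    y_bar = sum(ys) / n
    cross = [(x - x_bar) * (y - y_bar)       # steps 2-3: deviations, multiplied
             for x, y in zip(xs, ys)]
    return sum(cross) / (n - 1)              # sum, then divide by n - 1

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
cov_xy = covariance(x, y)
print(cov_xy)
```

A positive result means the two variables tend to move in the same direction.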

8
Covariance = -132,000/(4-1) = -44,000
9
Correlation Coefficient

Properties of correlation coefficient
A standardized measure of covariance that
ranges between -1 and 1
Positive: 0 < r ≤ 1
Negative: -1 ≤ r < 0
No correlation: r = 0
Cov(x,y) and r will have the same sign
Stronger relationship as r moves away from zero

10
Calculating Correlation Coefficient

Calculate cov(x,y) as before
Calculate st. devs for X and Y
Divide cov(x,y) by product of standard deviations
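
A minimal Python sketch of this calculation, assuming sample (n - 1) formulas throughout. The function names and data are hypothetical.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance with n - 1 in the denominator."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(sample_cov(xs, xs))  # variance = covariance of X with itself
    sd_y = math.sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (sd_x * sd_y)

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
r = correlation(x, y)
print(r)
```

Because the units cancel in the division, r always falls between -1 and 1 and has the same sign as the covariance.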

11
(No Transcript)
12
Institutional Research Applications

Correlations can be useful in IR when one
variable of interest is unobservable, and a
correlated variable is observable
College performance (correlated with HS
performance)
Faculty experience (correlated with age, years
since degree)
Teaching quality (correlated with student
evaluations)

13
Limitations

Weak correlations are less useful for making
inferences
Correlations vary across factors, so it is
difficult to compare across factors (e.g., stock
prices and faculty salaries)
May be multiple factors affecting a single factor
of interest
Does not measure non-linear relationships

14
Class Example 1

The file TUITION.SAV contains data on average
public tuition rates, state appropriations, and
median family income by state in 1994. In SPSS:
Calculate the means and standard deviations for
these three variables.
Calculate the covariances and correlations
between state appropriations and (a) public
tuition rates, (b) median family income.

15
(No Transcript)
16
Linear Regression (OLS)

Objective: find the best linear (straight-line)
relationship between two or more variables.
Ordinary Least Squares (OLS) is the technique
most often used to choose the best line.
This linear relationship is based on the
covariance between two variables.
Regression analysis requires the analyst to
specify the direction of causation.

17
Advantages of Linear Regression

Can predict/forecast one variable (Y) based on
values of another variable (X)
Can perform hypothesis tests to determine if X
affects Y
Can control for differences in Y due to X
Very flexible with regard to functional form,
model specification, etc.

18
Example: Gender Equity in Salaries

Your President asks you to examine faculty
salaries at your institution and determine if
there is a gender equity problem. Descriptive
stats show that on average men earn more than
women.
How can you control for salary differences due to
justifiable factors such as experience,
productivity?
How can you determine if the remaining pay
difference is large enough to conclude that this
is a problem?

19
Ordinary Least Squares
20
Three Formulations

Slope: β in the population, b in the sample
Error term (ε or e) encompasses the effects of all
omitted factors
Parameters in the population model are
unobservable
Sample line is what you estimate with OLS

21
Assumptions in Linear Regression

The error term has a mean of zero and constant
variance
The errors are unrelated to each other
The errors are unrelated to the independent
variable(s) in the model
The error term is normally distributed (needed
for hypothesis testing)

22
Ordinary Least Squares
OLS specifies that the best line is the one
that minimizes the sum of squared errors
(minimize Σ e_i²)
Intercept (a)
23
Notes on OLS

The slope formula is the covariance between X and
Y divided by the variance for X
The slope and covariance will always have the
same sign
b > 0 indicates a positive relationship
b < 0 indicates a negative relationship
b = 0 indicates no linear relationship
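
The slope formula in these notes can be sketched in Python as below. The intercept line uses the standard OLS result a = ȳ - b·x̄, which is an addition here and not taken from the slide; the function name and data are hypothetical.

```python
def ols_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # covariance / variance (the n - 1 terms cancel)
    a = y_bar - b * x_bar    # the fitted line passes through the point of means
    return a, b

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
a, b = ols_line(x, y)
print(a, b)
```

Since the variance of X is always positive, the slope takes its sign from the covariance.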

24
Example: An IR analyst is asked to help forecast
applications. She believes there is a
relationship between HS grads and resident
applications each year.
25

Regression line: Y = -358.28 + 0.29X
Interpretation: For each additional HS grad,
predicted applications will rise by 0.29.
The intercept may not have much meaning.
Can predict applications given projections of HS
grads. If HS grads = 36,000, then
Y = -358.28 + 0.29(36,000) ≈ 10,082
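
The forecast above is simple substitution into the fitted line. A small Python sketch, using the coefficients quoted on this slide (the function name is hypothetical):

```python
def predict_applications(hs_grads, intercept=-358.28, slope=0.29):
    """Point forecast of resident applications from the fitted line."""
    return intercept + slope * hs_grads

forecast = predict_applications(36_000)
print(round(forecast))
```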

26
Goodness of Fit

Measures the strength of the relationship between
X and Y
R-squared (or coefficient of determination) =
proportion of total deviation in Y that is
explained by X(s)
R-squared is bounded between 0 and 1 (R² = 1 if
perfect fit, R² = 0 if no fit)
R-squared = square of correlation coefficient
(with only one X variable in the model)
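
A Python sketch of these quantities for a one-variable model. The naming is an assumption inferred from a later slide, where the standard error is computed from ESS: TSS is the total sum of squares, ESS the error sum of squares, and RSS the regression (explained) sum of squares. The data are hypothetical.

```python
import math

def r_squared_and_r(xs, ys):
    """Return (R-squared, r) for a simple regression of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    tss = sum((y - y_bar) ** 2 for y in ys)                       # total
    ess = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))     # error
    rss = tss - ess                                               # explained
    r = sxy / math.sqrt(sxx * tss)                                # correlation
    return rss / tss, r

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
r2, r = r_squared_and_r(x, y)
print(r2, r ** 2)
```

With a single X, the two printed values agree, which is the last property listed above.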

27
More on R-squared...

When there is no covariance, the slope of the
regression line is zero and R² = 0.
Adding variables to the regression model will
almost always raise R², but this does not mean
that the resulting model is better
Adjusted R² attempts to correct for this, but no
longer has the same interpretation
R² varies depending on the dependent variable.
Do not use it to compare regression models with
different Ys.

28
Predicting Resident Applications
Note that HS grads account for 88.5% of the total
deviation in applications.
29
Class Example 2

Using TUITION.SAV, in SPSS:
Calculate a regression line showing how median
income affects average tuition
Calculate R², TSS, RSS, ESS, and corr(x,y). SPSS
syntax:
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT tuition
  /METHOD=ENTER income.

30
(No Transcript)
31
Equation: Tuition = 313.119 + 0.0719 Income
32
Hypothesis Testing for β

In the social sciences, it is rarely known for
sure whether X affects Y
A hypothesis test can be used to determine if the
data provide sufficient evidence of a
relationship
For most variables, the sample slope b will not
exactly equal zero. How far from zero must it be
in order to safely conclude that β ≠ 0?

33
Steps in Hypothesis Testing

Specify null (H0) and alternative (HA)
hypotheses
Identify test statistic and find critical
value(s) based on degrees of freedom and
significance level
Calculate test statistic and compare to critical
value(s)

34
Common Hypotheses for β

β = 0 (X has no effect on Y)
β > 0 (X has a positive effect on Y)
β < 0 (X has a negative effect on Y)
β ≠ 0 (X has some effect on Y, + or -)
Choose two hypotheses that are mutually exclusive
and exhaustive.
The null hypothesis (H0) should always contain
some form of equal sign.

35
Test Statistic for β
If ε ~ N(0, σ²), then b ~ N(β, Var(b))
Therefore the t-ratio
t = (b - β) / se(b)
will follow a Student t-distribution with n-k
degrees of freedom (k parameters to be
estimated)
The t-ratio is defined as the random variable
minus its mean (when H0 is true), divided by its
standard deviation.
36
Notes on Hypothesis Testing

The t-ratio simply counts how many standard
deviations the slope is from zero (distance)
The greater the distance, the less likely you
would have found the value of b if β = 0.
For significance tests, since β = 0 under the null,
the t-ratio is the slope divided by its standard
deviation (or standard error)
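
A Python sketch of the t-ratio for the slope in a one-variable regression. It assumes the usual textbook standard error, se(b) = sqrt(s² / Σ(x - x̄)²) with s² = (sum of squared residuals) / (n - 2); the function name and data are hypothetical.

```python
import math

def slope_t_ratio(xs, ys):
    """Return (b, se_b, t) for the slope of a simple regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ess = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # squared residuals
    s2 = ess / (n - 2)            # two estimated parameters: intercept and slope
    se_b = math.sqrt(s2 / sxx)
    return b, se_b, b / se_b      # t-ratio under H0: beta = 0

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
b, se_b, t = slope_t_ratio(x, y)
print(b, se_b, t)
```

The resulting t would then be compared with a critical value for n - 2 degrees of freedom.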

37
Example: t-ratio of 2.40
This shows that there is only a 1.3% chance of
finding a t-ratio of 2.40 or greater if in fact
β = 0. Therefore, if you found a t-value this
high, it is unlikely that β = 0.
38
R² = 0.025
TSS = 1.9E10, ESS = 1.9E10, se = √(ESS/826) = 4766
39
Do undergraduate enrollments have a significant
effect on average costs per student?
Null hypothesis: β = 0. Alternative hypothesis: β ≠ 0.
For 826 df and a 1% significance level, reject the
null when the calculated t-ratio exceeds 2.575 in
absolute value.
40
P-value: probability of drawing a more extreme
sample value given that the null hypothesis is
true
P-value = Pr(b < -0.175) = Pr(t < -4.577) = 0.000
41
Units of Measurement

The significance levels of any variable will not
be influenced by the units of measure used for X
or Y
The coefficient represents the change in Y (in
Y's units) due to a one-unit change in X
When the units of measure change, both the
coefficients and standard errors change
proportionately (t-ratio remains the same)
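
A Python sketch of the last point, assuming the usual simple-regression standard error. The same hypothetical X is measured first in dollars and then in thousands of dollars: the slope and its standard error both scale by 1,000, while the t-ratio is unchanged.

```python
import math

def slope_stats(xs, ys):
    """Return (slope, standard error, t-ratio) for a simple regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ess = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(ess / (n - 2) / sxx)
    return b, se_b, b / se_b

# Hypothetical data, for illustration only
x_dollars = [1000.0, 2000.0, 3000.0, 4000.0]
x_thousands = [v / 1000 for v in x_dollars]
y = [2.0, 4.0, 5.0, 9.0]

b1, se1, t1 = slope_stats(x_dollars, y)
b2, se2, t2 = slope_stats(x_thousands, y)
print(b1, se1, t1)
print(b2, se2, t2)
```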

42
Out-of-Sample Forecasts/Predictions

The regression model can be used to derive
predictions of Y given values of X(s)
Point estimates are found by substituting X into
the equation and solving for Y ("I predict that
the grad rate will be 70%")
Interval estimates are predictions that Y will
fall within a certain interval ("I am 95% certain
that the grad rate will be between 68% and 72%")
Interval estimates are more conservative, and
convey the uncertainty in predictions.

43
Two Types of Intervals

C.I. For expected value (mean) of Y
For given X, what is the predicted average value
of Y
C.I. For a single value of Y
For given X, what is the predicted single value
of Y (more uncertainty, so wider interval)
The two methods yield very similar intervals.
Most IR applications use C.I.s for single value.
Intervals can be obtained in SPSS using the
save subcommand.

44
Predict HS Grads in New Hampshire

An IR analyst is charged with developing a model
to help predict changes in HS grads in the state
through 2006.
File AIR1.SAV in SPSS has two vars: HS grads in
year t (HSGRAD), and 2nd grade enrollments in
year t-10 (GRADE2).
Find correlation between HSGRAD and GRADE2
Estimate a regression model
Form point and 95% CI estimates of high school
grads for the next ten years.

45
Under Statistics > Correlate > Bivariate
Note that r = 0.959, cov(x,y) = 701,160 (n = 12)
46
Under Statistics > Regression > Linear
47
In 2006, the model predicts there will be 14,919
high school grads
95% certain that in 2006 there will be between
14,185 and 15,652 high school grads
48
(No Transcript)
49
Multiple Regression Analysis

In most IR applications, the dependent variable
may be influenced by multiple factors
Grad rate = f(avg. SAT, gender composition, avg.
HS rank, students on campus,...)
Faculty Salary = f(education, experience,
productivity, field,...)
Education Costs = f(enrollments, research
intensity, student/faculty ratio,...)

50
Assumptions in Multiple Regression

Error term has a mean of zero and constant
variance
Error terms are unrelated to each other
Error term is unrelated to independent vars
Error term is normally distributed
Independent variables are not collinear with each
other (no multicollinearity)

51
Ordinary Least Squares
52
Least Squares Estimates

The coefficients are referred to as partial
effects because they show the effect of one
variable on Y holding other vars constant.
The OLS formula takes into account any
relationships between the X variables. For this
reason, the coefficients usually change when
variables are dropped/added from model.

53
Other Stats in Multiple Regression

Hypothesis tests for significance of coefficients
can be performed as before, except degrees of
freedom change (n-k-1).
Goodness of fit measures are calculated as
before. R-squared now represents the proportion of
deviation in Y explained by all Xs together. Thus,
R² usually rises as Xs are added.
Confidence intervals and point estimates can be
calculated as before.

54
Example Average Public Tuition Rates

An IR analyst is asked to help explain why there
are variations across states in their tuition
rates at public institutions. She feels that
factors such as state aid given to students and
state appropriations help account for these
differences.
Open the file TUITION.SAV in SPSS.

55
Question 1: How do state appropriations affect
average tuition?
State appropriations account for 13.5% of
differences in tuition.
56
Question 2: How do state appropriations and aid
to students affect average tuition?
These two variables account for 40.4% of
differences in tuition.
A $1 increase in appropriations reduces tuition
by 22.6 cents, holding constant state aid per
student.
57
Extensions of Regression Model

So far, we have only considered linear models
where Xs and Ys were continuous. We will now
examine how to handle
Categorical Xs
Interactions among Xs
Non-linear relationships between X and Y

58
Categorical Variables

There are many examples of independent variables
that are not numerical (e.g., gender, race,
institution attended, attitudes/beliefs)
Likert scale variables (which assign numbers to
categorical responses) should not be used in
regression models in their present form due to
problems in interpreting changes in units.
Slope = change in Y due to a one-unit
change in X (but Likert numbers are artificial)

59
Dummy Variables

However, categorical Xs can be used if they are
first recoded into dummy variables
Dummy variable has only two values (0,1)
Need to specify an assignment rule. Can be used
for categorical, Likert, and continuous
variables.
The variable can now be used in regression
It does not matter which group is assigned 1
Coef represents the difference in intercepts for
the two groups
Must omit one of the dummy variables for a
construct to avoid multicollinearity
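
A Python sketch of the point that the coefficient on a dummy variable is the difference between the two groups. With the dummy as the only X, the intercept equals the mean of the group coded 0 and the slope equals the difference in group means. The salary figures and names are hypothetical.

```python
def ols_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical salaries (in $1,000s) and a 0/1 group indicator
salary = [40.0, 42.0, 44.0, 50.0, 52.0, 54.0]
group = [0, 0, 0, 1, 1, 1]

intercept, dummy_coef = ols_line(group, salary)
mean_0 = sum(s for s, g in zip(salary, group) if g == 0) / group.count(0)
mean_1 = sum(s for s, g in zip(salary, group) if g == 1) / group.count(1)
print(intercept, dummy_coef, mean_1 - mean_0)
```

Swapping which group is coded 1 only flips the sign of the coefficient and changes which group's mean the intercept reports.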

60
Examples of Assignment Rules

Let X = 1 if (0 otherwise):
Teaches in Psychology Department
Enrolled in public university
Family income exceeds $100,000
Student is very satisfied with the quality of
instruction
Student graduated from campus
Student dropped out of college

61
Note: Both equations have the same slope for RANK.
Question: Does living on campus matter?
62
Variable Interactions

It is possible that the joint occurrence of two
Xs has an effect on Y separate from each X's
effect
Academic performance of students with high SAT
scores and HS ranks
State appropriations for higher ed in states with
low incomes and high tax rates
The salary increase from promotions for men and
women may be different

63
Interactions (cont'd)

In these examples, there is something special
about the joint occurrence of two variables.
To test these assertions, an interaction
variable can be created and added to the
regression model.
Interaction variables are created by defining a
third variable as the product of the two
variables in question.

64
The interaction variable is then added to the
regression model and treated as any other
variable
To find the effect of x1 on y, you need to
differentiate the equation with respect to x1
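
A Python sketch of that derivative, assuming the model y = a + b1·x1 + b2·x2 + b3·x1·x2 (the slide's own equation did not survive in the transcript). Differentiating with respect to x1 gives b1 + b3·x2, so the effect of x1 depends on the level of x2. All coefficient values are hypothetical.

```python
def predict(x1, x2, a, b1, b2, b3):
    """Fitted value from a model with an x1 * x2 interaction term."""
    return a + b1 * x1 + b2 * x2 + b3 * x1 * x2

def marginal_effect_x1(x2, b1, b3):
    """dy/dx1 = b1 + b3 * x2."""
    return b1 + b3 * x2

# Hypothetical coefficients, for illustration only
coefs = dict(a=1.0, b1=2.0, b2=3.0, b3=0.5)

effect_at_0 = marginal_effect_x1(0.0, coefs["b1"], coefs["b3"])
effect_at_4 = marginal_effect_x1(4.0, coefs["b1"], coefs["b3"])

# Cross-check: a one-unit change in x1, holding x2 fixed at 4
step = predict(1.0, 4.0, **coefs) - predict(0.0, 4.0, **coefs)
print(effect_at_0, effect_at_4, step)
```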
65
Non-linear Functional Forms

Regression analysis can also be used in
situations where X has a non-linear relationship
with Y
Linear: The change in Y due to a one-unit change
in X is constant.
Non-linear: The change in Y due to a one-unit
change in X can vary with the level of X.

66
Graphs of Non-linear Functions
[Two graphs of Y against X. Exponential: Y = exp(X). Logarithmic: Y = ln(X).]
67
Graphs of Quadratic Functions
[Two graphs of Y against X: one in which Y reaches a maximum, one in which Y reaches a minimum.]
68
Possible IR Examples

Exponential: Implies that as X increases, Y
increases at a faster rate.
Y = salary, X = years of experience
Logarithmic: Implies that as X increases, Y
increases at a slower rate.
Y = college GPA, X = hrs/week studying
Y = retention rate, X = avg. student SAT score

69
Possible IR Quadratic Examples

Maximize Y: There is some value of X at which Y
is maximized.
Y = tuition revenue, X = tuition rate
Y = student gains, X = class size
Minimize Y: There is some value of X at which
Y is minimized.
Y = costs/student, X = enrollments

70
Using Non-linear Functions

Regression analysis requires a linear
relationship between X and Y.
When there is a non-linear relationship, you can
transform one or more variables and then use the
transformed variables in the regression model.
As long as there is a linear relationship between
the transformed variables, regression analysis is
appropriate.

71
Exponential Transformations

The coefficient estimate for β represents the
approximate percentage change in Y due to a
one-unit increase in x.
The variable x always has the same directional
effect on Y (positive or negative)
The change in Y due to a change in x increases
at an increasing rate
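
A Python sketch of the "approximate percentage change" interpretation, assuming the model is ln(Y) = a + βx (the slide's equation is missing from the transcript). The exact change from a one-unit increase in x is 100·(exp(β) - 1) percent, which is close to 100·β when β is small. The β values are hypothetical.

```python
import math

def pct_change_exact(beta):
    """Exact percentage change in Y for a one-unit increase in x."""
    return 100 * (math.exp(beta) - 1)

def pct_change_approx(beta):
    """Rule-of-thumb approximation: 100 * beta percent."""
    return 100 * beta

# Hypothetical coefficients: the approximation degrades as beta grows
for beta in (0.05, 0.50):
    print(beta, pct_change_approx(beta), pct_change_exact(beta))
```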

72
Natural Log Function
The natural log function is the inverse of the
exponential function: ln(exp(X)) = X
73
Logarithmic
This can also be used for a subset of Xs.
74
Double-Log Function Elasticities
75
Quadratic Transformations
If X is believed to have a quadratic effect on Y,
then create a new variable as the square of X and
add this to the regression model
The change in Y due to a one-unit change in X1
would be found by differentiating the equation
with respect to this variable
Hill-shaped if β3 < 0, U-shaped if β3 > 0, linear
if β3 = 0
76
More on Quadratic Functions

The value of X that maximizes or minimizes Y can
have important implications. This is found by
solving for X in the first-derivative.
Higher-order functions (e.g., cubic) can also be
used in regression. They can yield better
representations of relationships, but are harder
to explain and interpret.
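
A Python sketch of finding the turning point, assuming the quadratic model Y = b1 + b2·X + b3·X² (the coefficient numbering is an assumption based on the β3 notation used for the squared term). Setting the first derivative b2 + 2·b3·X to zero gives X* = -b2 / (2·b3). The coefficient values are hypothetical.

```python
def turning_point(b2, b3):
    """X value at which Y = b1 + b2*X + b3*X**2 is maximized or minimized."""
    if b3 == 0:
        raise ValueError("no turning point: the relationship is linear")
    return -b2 / (2 * b3)

def shape(b3):
    """Classify the curve by the sign of the squared term's coefficient."""
    if b3 < 0:
        return "hill-shaped"
    if b3 > 0:
        return "U-shaped"
    return "linear"

# Hypothetical coefficients, for illustration only
b2, b3 = 8.0, -2.0
x_star = turning_point(b2, b3)
print(x_star, shape(b3))
```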

77
SPSS Exercise Faculty Salaries

An IR analyst is asked to investigate if female
faculty are paid less than comparable males. She
draws a sample of 432 faculty and creates these
variables:
Salary = monthly base salary (in dollars)
Rank = 1 if Full, 2 if Associate, 3 if Assistant
Gender = 1 if male, 0 otherwise
Prevexp = days of experience before current job
Npleave = days of non-professional leave
Potenexp = days since highest degree
Nine12 = 1 if nine-month appointment, 0 otherwise
Cite85 = citations in 1985 to all publications

78
Tasks

Open the SPSS system file FACSAL.SAV
Estimate a regression model showing how gender
affects salary. How do these results compare to
a two-sample t-test?
How do your findings change when potential
experience and citations are added?
An economist argues that salaries rise
exponentially with potential experience,
citations, and gender. How can this be addressed?

79
Answer to first task...
Note: the mean difference is $916, which has a
t-value of 6.227 and is significant.
80
(No Transcript)
81
Answer to second task...
82
Answer to third task...
83
More Tasks...

The VP for Finance argues that individuals with
high experience levels often get smaller
percentage salary increases than others. How
could this be addressed (use same function as in
previous example)?
A female faculty member claims that women face
discrimination in part because they are rewarded
less for each citation they receive. How could
you test this?

84
Answer to fourth task...
85
Answer to fifth task...
86
Model Selection

For most IR problems, there are many alternative
models from which to choose. How should the
best model be selected?
Begin with published studies that look at the
same (similar) Ys. What variables and
functional forms do they use?
Is there a theory that can be used as a guide?
human capital theory => salary models
median voter theory => state funding for HE
Tinto's model => student retention

87
More model selection comments

Better to include too many factors than to omit
important variables (omitted variable bias)
Can estimate several competing model
specifications and compare results. Be careful
not to simply select model with the most
appealing results!
Keep in mind the trade-off between simplicity and
accuracy. A simple model is worth its weight in
gold when explaining results to decision makers!

88
Faculty Salary Example

Return to FACSAL.SAV and create a dummy variable
for full professors
Estimate a model explaining salary as a function
of gender, then gender and full professor.
Estimate a model explaining salary as a function
of gender, full professor, and potential
experience.

89
(No Transcript)
90
Problems in Regression Analysis

There are three main problems which may arise in
multiple regression
Autocorrelation
Heteroscedasticity
Multicollinearity
We will briefly discuss what each means, how they
can be detected, and what can be done about them
when they occur.

91
Autocorrelation

This can occur in time-series data when the error
in one period is related to the error in the
next.
Violates the assumption E(e_i e_j) = 0 for i ≠ j
Causes the computer to calculate incorrect
standard errors, thereby affecting t-ratios.
Usually, st. errs are too small, so t-ratios are
too high (making X appear significant when it
isn't.)
Possible IR examples: predicting applications, HS
grads, state funding for HE.

92
First-order autocorrelation
[Graph: error term e_t (positive and negative values around 0) plotted against time.]
93
Durbin-Watson test
Calculates a d-statistic that reflects the
correlation among subsequent error terms
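
A Python sketch of the d-statistic, using the standard definition d = Σ(e_t - e_{t-1})² / Σ e_t². Values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. The residual series are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: squared successive differences over squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals, for illustration only
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]   # errors tend to repeat
alternating = [1.0, -1.0, 1.0, -1.0]             # errors flip sign each period

d_persistent = durbin_watson(persistent)
d_alternating = durbin_watson(alternating)
print(d_persistent, d_alternating)
```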
94
Correcting Autocorrelation

If autocorrelation is detected, it can be
corrected through transforming the data to yield
correct standard errors (generalized least
squares).
Cochrane-Orcutt and Prais-Winsten are two
commonly used methods
Standard autocorrelation option in SPSS does
not do this. Use SPSS Trends or another program.
Keep in mind that autocorrelation affects the
standard errors and not coefficients.

95
Heteroscedasticity

May occur in cross-section data when the variance
of the error term is related to one or more
independent variables (σ_i² not constant).
Affects standard errors, and hence t-ratios (but
not coefficient estimates)
Potential IR examples
Effects of enrollments on average costs
Effect of tax revenues on state appropriations
Effect of program size on expenditures

96
Graph of Heteroscedasticity
Dependent variable
Regression line
Independent variable
As X increases, the possible errors become larger.
97
Testing for Heteroscedasticity

Visual: Plot residuals against the variable
thought to be causing the problem.
Park-Glejser test: Estimate model and save
residuals. Regress the log of squared residuals
against the log of variable thought to cause the
problem.
Other tests: White (1980), Goldfeld-Quandt.
SPSS will not do these by default (must do by
hand or with other software).

98
Correcting for Heteroscedasticity

Weighted least squares: Weight observations by
the variable causing heteroscedasticity. However,
you must know the form.
For example, if σ_i² = σ²X1i, then dividing each
observation by the square root of X1i will yield
correct standard errors.
An option that does not require knowing the form
of heteroscedasticity is White's correction
(heteroscedasticity-consistent standard errors).

99
Multicollinearity

Multicollinearity arises when there is an
extremely high correlation between two or more
independent variables in the model.
The coefficient estimates become unstable and
imprecise: the stats program cannot tell how to
apportion the effect among the correlated
variables
Standard errors increase, making t-ratios small

100
Multicollinearity (cont'd)

Potential IR examples include (1) effect of
current and previous experience on faculty
salaries, (2) effect of SAT score and high school
rank on academic performance, (3) effect of
family income and wealth on student demand for
higher education.
A significant correlation between Xs does not
necessarily lead to multicollinearity. Only when
the correlation is very high does this occur.

101
Testing/Correcting Multicollinearity

There is no universally-accepted test for
multicollinearity.
Variance inflation factors (VIF) estimate how
much the variance of a coefficient (its squared
standard error) is inflated by correlation with
other Xs. There is no single cutoff point for VIFs.
Signs of multicollinearity include
Two similar variables have widely different
effects on Y (e.g., only one is signif.)
The standard errors are large
To test, drop one of the variables from the model
and compare results. If the coef and st. err.
change considerably, this may be a problem.
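
A Python sketch of a variance inflation factor for the simplest case of two X variables, using the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one X on the other (with two Xs this is just their squared correlation). The function names and data are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation between two lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for either X in a model with exactly two X variables."""
    r = correlation(x1, x2)
    return 1 / (1 - r ** 2)

# Hypothetical, strongly correlated predictors
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 5.0, 9.0]
vif = vif_two_predictors(x1, x2)
print(vif, math.sqrt(vif))
```

The square root of the VIF is the factor by which the standard error grows relative to uncorrelated Xs.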

102
Correcting Multicollinearity

There is also no uniformly-accepted solution to
this problem. However, you can drop one of the
problem variables from the model.
Multicollinearity may not be an important issue
if the collinearity occurs between unimportant
variables.

103
Making Multicollinearity

Return to the faculty salary data and create a
new variable
newpot = potenexp / 365 (years of experience) and
add this to the regression model
Then, make slight changes to the first two data
points: change 27.02 to 13 and change 19.01
to 27.
Estimate the regression model again, using gender,
potenexp, newpot

104
Using gender and potenexp
105
Using gender, potenexp and newpot
Variable POTENEXP drops out of the equation
because it is perfectly correlated with NEWPOT.
106
Using gender, potenexp, newpot (after changes)
Gender is significant throughout all three models
Standard errors are about forty-three times
larger than before!
107
Limited Dependent Variables

Thus far, we have considered instances where Y
was continuous and unbounded. However, there are
many situations where this is violated
Individual student data are often dichotomous
(0,1) variables: 1 if graduate, 1 if return, 1 if
apply/enroll.
Some data are discrete counts: number of journal
articles or citations, number of times a student
changes his/her major

108
Problems with OLS when Y is (0,1)

Predictions can be > 1 or < 0
Coefficients may be biased
Heteroscedasticity is present (σ² = P(1-P))
Error term is not normally distributed (only two
possible values), so hypothesis tests are invalid
Of these problems, the last is the most severe.

109
Maximum Likelihood Estimation

In this instance, there are advantages to using a
technique (MLE) in place of OLS.
MLE Find the coefficients that maximize the
likelihood of generating the observations on Y in
the sample.
Recall that OLS chooses the coefficients based on
those that minimize the sum of squared errors.

110
Logit and probit analysis

When Y is (0,1), the two most commonly used
functional forms in MLE are the cumulative
logistic distribution (logit analysis or
logistic regression) and the cumulative normal
distribution (probit analysis).
The two choices usually yield similar results
Each avoids the four problems noted with OLS

111
Logistic regression
For logistic regression, the following functional
form is used:
ln[P/(1-P)] = a + b1X1 + b2X2
where P = probability that Y = 1
All you have to do, however, is create the dummy
variable for Y and tell SPSS to use logistic
regression to estimate the model. SPSS will
create the log odds for you.
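
A Python sketch of how the log odds map back to a probability: solving ln[P/(1-P)] = z for P gives P = 1 / (1 + exp(-z)). The coefficient and X values are hypothetical and are not output from SPSS.

```python
import math

def log_odds(x1, x2, a, b1, b2):
    """Linear predictor: ln[P / (1 - P)] = a + b1*x1 + b2*x2."""
    return a + b1 * x1 + b2 * x2

def probability(z):
    """Invert the log odds to get P(Y = 1)."""
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients and X values, for illustration only
z = log_odds(x1=1.0, x2=2.0, a=-1.5, b1=0.8, b2=0.3)
p = probability(z)
print(z, p)
```

However large or small the log odds become, the implied probability stays strictly between 0 and 1, which avoids the out-of-range predictions that OLS can produce.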
112
Interpreting results

The coefficients from logistic regression are
hard to interpret and explain.
Focus on the signs of the coefficients
If the sign is positive and significant, then as
X increases, the probability that Y = 1 will also
increase
If the sign is negative and significant, then as
X increases, the probability that Y = 1 will
decrease
If the coefficient is not significant, then there
is no evidence that X affects the probability
that Y = 1.

113
Example Faculty Rank

Return to the faculty dataset, and estimate a
logistic regression model to explain whether a
faculty member is a Full professor (under
Regression > Binary Logistic)
Xs include gender, potenexp, prevexp, and cite85
Need to create a dummy variable for Full
Professor first
SPSS Probit module is different than used here.

114
Wald chi-square statistic = (coefficient /
standard error)²
Note that these chi-square values are the square
of the standard t-ratios
Coefficients: effect of X on log odds
Standard errors
Odds ratio
P-value
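
A Python sketch of two of the quantities labelled above. The Wald formula is the one given on this slide; the odds ratio is computed as exp(coefficient), which is the standard definition and an addition here. The coefficient and standard error are hypothetical.

```python
import math

def wald_chi_square(coef, se):
    """Wald statistic: the squared ratio of coefficient to standard error."""
    return (coef / se) ** 2

def odds_ratio(coef):
    """Multiplicative change in the odds for a one-unit increase in X."""
    return math.exp(coef)

# Hypothetical logit output, for illustration only
coef, se = 0.8, 0.4
wald = wald_chi_square(coef, se)
ratio = odds_ratio(coef)
print(wald, ratio)
```

A positive coefficient gives an odds ratio above 1, a negative coefficient gives one below 1, and a zero coefficient gives exactly 1.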
115
Results from rank analysis

Since the coefficient for GENDER is positive and
significant, it means that men are more likely
than women to hold the rank of Full professor
after controlling for experience and citations.
The positive and significant coef for CITE85
means that a faculty member is more likely to be
a Full Prof as citations rise

116
Final Exam: SATDATA.SAV
File contains data on 1,999 NH high school
seniors in 1996 who have taken the SAT

ASSOC = 1 if highest planned degree is AA
MA = 1 if highest planned degree is Master's
PHD = 1 if highest planned degree is Doctorate
MALE = 1 if male
FIRSTGEN = 1 if 1st generation
INCOME = family income
INCOME2 = income squared
PUBHS = 1 if attend public high school

SATCOMB = combined SAT score
SATCOMB2 = SAT squared
ANYAP = 1 if taken any AP course
GRADEAVG = high school GPA
UNH = 1 if sent SAT score to UNH
KSC = 1 if sent SAT score to KSC
PSC = 1 if sent SAT score to PSC

117
Questions

How do family income, student ability, and
student intentions affect whether a student
submits SAT scores to UNH, KSC, or PSC?
Do SAT takers from poor families and/or first
generation families do worse on the SAT than
other students?
