Title: Topic 13: Multiple Linear Regression Example
Outline
- Description of Example
- Descriptive Summaries
- Investigation of Various Models
- Conclusions
Study of CS students
- Computer science majors at Purdue have a large
  dropout rate
- Can we find predictors of success?
- Predictors must be available at time of entry
  into the program
Data available
- GPA after three semesters
- High school math grades
- High school science grades
- High school English grades
- SAT Math
- SAT Verbal
- Gender (of interest for other reasons)
Data for CS Example
- Y is grade point average
- 3 HS grades and 2 SATs are the explanatory
  variables (p = 6, counting the intercept)
- Have n = 224 students
Descriptive Statistics
data a1;
  infile 'C:\...\csdata.dat';
  input id gpa hsm hss hse satm satv genderm1;
proc means data=a1 maxdec=2;
  var gpa hsm hss hse satm satv;
run;
Output from Proc Means
Var     N     Mean     Std Dev   Minimum   Maximum
gpa     224   2.64     0.78      0.12      4.00
hsm     224   8.32     1.64      2.00      10.00
hss     224   8.09     1.70      3.00      10.00
hse     224   8.09     1.51      3.00      10.00
satm    224   595.29   86.40     300.00    800.00
satv    224   504.55   92.61     285.00    760.00
Descriptive Statistics
proc univariate data=a1;
  var gpa hsm hss hse satm satv;
  histogram gpa hsm hss hse satm satv / normal;
run;
[Figures: histograms with fitted normal curves, one per
variable: GPA, High School Math, High School Science,
High School English, SAT Math, SAT Verbal]
Interactive Data Analysis
- Click on menu
- Solutions > Analysis > Interactive Data
  Analysis
- Obtain SAS/Insight window
- Open library work
- Click on Data Set A1 (if it exists)
- Open
Scatter Plot Matrix
- (shift) Click on GPA, SATM, SATV
- Go to menu Analyze
- Choose option Scatterplot(XY)
- Try some other options
[Figure: Scatter Plot Matrix]
Correlations
proc corr data=a1;
  var hsm hss hse;
proc corr data=a1;
  var satm satv;
proc corr data=a1;
  var hsm hss hse satm satv;
  with gpa;
run;
Output from Proc Corr
(correlation, with P-value beneath)
        hsm       hss       hse
hsm     1.00      0.57      0.44
                  <.0001    <.0001
hss     0.57      1.00      0.57
        <.0001              <.0001
hse     0.44      0.57      1.00
        <.0001    <.0001
Output from Proc Corr
        satm      satv
satm    1.00      0.46
                  <.0001
satv    0.46      1.00
        <.0001
Output from Proc Corr
        hsm       hss       hse       satm      satv
gpa     0.43      0.32      0.28      0.25      0.11
        <.0001    <.0001    <.0001    0.0001    0.0873
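The P-values under each correlation with gpa come from a t test of zero correlation, t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom. The sketch below reworks that arithmetic in Python (used here only as a calculator, not part of the SAS session) from the rounded correlations above, so the results are approximate. The cutoff of roughly 1.97 for a two-sided 5% test with 222 degrees of freedom is an approximation, and the function name is illustrative.

```python
import math

def corr_t_statistic(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

n = 224  # number of students in the study

# Rounded correlations with gpa, copied from the Proc Corr output
corr_with_gpa = {"hsm": 0.43, "hss": 0.32, "hse": 0.28, "satm": 0.25, "satv": 0.11}

t_values = {name: corr_t_statistic(r, n) for name, r in corr_with_gpa.items()}

for name, t in t_values.items():
    # |t| above about 1.97 corresponds to P < .05 (two-sided, 222 df)
    print(f"{name}: r = {corr_with_gpa[name]:.2f}, t = {t:.2f}")
```

Only satv falls below the cutoff, which agrees with its reported P-value of 0.0873. In a regression with a single predictor, the t statistic for the slope is the same quantity as this correlation t statistic.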
Use High School Grades to predict GPA
proc reg data=a1;
  model gpa=hsm hss hse;
run;
R-Square 0.2046
Var    DF   Par Est   St Err   t      P
Int    1    0.58      0.29     2.00   0.0462
hsm    1    0.16      0.03     4.75   <.0001
hss    1    0.03      0.03     0.91   0.3619
hse    1    0.04      0.03     1.17   0.2451
CS ANOVA Table
Source   DF    Sum of Squares   Mean Square   F
Model    3     27.71            9.23          18.86
Error    220   107.75           0.48
Total    223   135.46
P-value < .0001
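The F statistic and R-Square for this model follow directly from the ANOVA table: each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the two mean squares, and R-Square is the model sum of squares over the total. A minimal Python check, using only the rounded values printed in the table (the variable names are illustrative):

```python
# Sums of squares and degrees of freedom copied from the ANOVA table
ss_model, df_model = 27.71, 3
ss_error, df_error = 107.75, 220
ss_total = ss_model + ss_error  # matches the Total row, 135.46

ms_model = ss_model / df_model   # mean square for the model
ms_error = ss_error / df_error   # mean square error
f_stat = ms_model / ms_error     # overall F test of hsm, hss, hse together
r_square = ss_model / ss_total   # proportion of variation in gpa explained

print(f"MS model = {ms_model:.2f}, MS error = {ms_error:.2f}")
print(f"F = {f_stat:.2f}, R-Square = {r_square:.4f}")
```

Dividing the rounded mean squares shown in the table (9.23 / 0.48) gives a slightly different ratio, so the check starts from the sums of squares.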
Remove HSS
proc reg data=a1;
  model gpa=hsm hse;
run;
R-Square 0.2016
Var    DF   Par Est   St Err   t      P
Int    1    0.62      0.29     2.14   0.0335
hsm    1    0.18      0.03     5.72   <.0001
hse    1    0.06      0.03     1.75   0.0820
Rerun with HSM only
proc reg data=a1;
  model gpa=hsm;
run;
R-Square 0.1905
Var    DF   Par Est   St Err   t      P
Int    1    0.90      0.24     3.73   0.0002
hsm    1    0.20      0.02     7.23   <.0001
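To see what the HSM-only fit implies, the estimated line can be evaluated at a few high school math grades. This is a rough sketch: the coefficients are the two-decimal values printed above, so the predictions are approximate, and the function name predict_gpa is hypothetical.

```python
def predict_gpa(hsm, intercept=0.90, slope=0.20):
    """Fitted GPA from the HSM-only model, using the rounded estimates."""
    return intercept + slope * hsm

# hsm ranges from 2 to 10 in these data, so stay inside that range
for grade in (2, 6, 10):
    print(f"hsm = {grade:2d}: predicted gpa = {predict_gpa(grade):.2f}")
```

Each additional point of high school math grade corresponds to an estimated 0.20 increase in GPA after three semesters.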
SATs
proc reg data=a1;
  model gpa=satm satv;
run;
R-Square 0.0634
Var    DF   Par Est   St Err   t       P
Int    1    1.28      0.37     3.43    0.0007
satm   1    0.00      0.00     3.44    0.0007
satv   1    -0.00     0.00     -0.04   0.9684
HS and SATs
proc reg data=a1;
  model gpa=satm satv hsm hss hse;
  *Does general linear test;
  sat: test satm, satv;
  hs: test hsm, hss, hse;
run;
R-Square 0.2115
Var    DF   Par Est   St Err   t       P
Int    1    0.32      0.40     0.82    0.4149
satm   1    0.00      0.00     1.38    0.1702
satv   1    -0.00     0.00     -0.69   0.4915
hsm    1    0.14      0.03     3.72    0.0003
hss    1    0.03      0.03     0.95    0.3432
hse    1    0.05      0.03     1.40    0.1637
Test sat Results for Dep Var gpa
Source   DF    Mean Square   F      Pr > F
Num      2     0.46566       0.95   0.3882
Den      218   0.49000
Test hs Results for Dep Var gpa
Source   DF    Mean Square   F       P
Num      3     6.68660       13.65   <.0001
Den      218   0.49000
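The two test results can be approximately reproduced from the R-Square values of the models fitted earlier. A general linear test compares the error sum of squares of a reduced model with that of the full model: F = [(SSE_R − SSE_F)/(df_R − df_F)] / (SSE_F/df_F). The sketch below assumes each SSE can be recovered as (1 − R-Square) × total SS, with total SS = 135.46 taken from the ANOVA table; because the R-Square values are rounded, the F values agree with the SAS output only to about two decimals. The function names are illustrative.

```python
def general_linear_f(sse_reduced, df_reduced, sse_full, df_full):
    """F statistic comparing a reduced model with the full model."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full  # mean square error of the full model
    return numerator / denominator

n = 224            # students
ss_total = 135.46  # total sum of squares for gpa, from the ANOVA table

def sse_from_r2(r_square):
    """Error sum of squares recovered from a (rounded) R-Square."""
    return (1 - r_square) * ss_total

# R-Square values reported for the full, HS-only, and SAT-only models
r2_full, r2_hs_only, r2_sat_only = 0.2115, 0.2046, 0.0634
df_full = n - 6  # five predictors plus the intercept

# Test "sat": drop satm and satv, leaving hsm hss hse (error df = n - 4)
f_sat = general_linear_f(sse_from_r2(r2_hs_only), n - 4,
                         sse_from_r2(r2_full), df_full)

# Test "hs": drop hsm, hss, hse, leaving satm satv (error df = n - 3)
f_hs = general_linear_f(sse_from_r2(r2_sat_only), n - 3,
                        sse_from_r2(r2_full), df_full)

print(f"F for sat test = {f_sat:.2f}, F for hs test = {f_hs:.2f}")
```

The SAT scores add little once the high school grades are in the model, while the high school grades add a great deal to a model that already contains the SAT scores.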
Best Model?
- Likely the one with just HSM.
- We'll discuss comparison methods in Chapters 7
  and 8
Key ideas from case study
- First, look at graphical and numerical summaries
  for one variable at a time
- Then, look at relationships between pairs of
  variables with graphical and numerical summaries.
- Use plots and correlations
Key ideas from case study
- The relationship between a response variable and
  an explanatory variable depends on what other
  explanatory variables are in the model
- A variable can be a significant (P < .05) predictor
  alone and not significant (P > .05) when other Xs
  are in the model
Key ideas from case study
- Regression coefficients, standard errors and the
results of significance tests depend on what
other explanatory variables are in the model
Key ideas from case study
- Significance tests (P values) do not tell the
  whole story
- Squared multiple correlations (which give the
  proportion of variation in the response variable
  explained by the explanatory variables) can give
  a different view
- We often express R² as a percent
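As a small illustration of the last two points, the R-Square values reported for the five models in this case study can be put side by side as percents. The values are copied from the SAS output; the dictionary keys simply list the predictors in each model.

```python
# R-Square for each fitted model, copied from the Proc Reg output
r_square = {
    "hsm hss hse": 0.2046,
    "hsm hse": 0.2016,
    "hsm": 0.1905,
    "satm satv": 0.0634,
    "satm satv hsm hss hse": 0.2115,
}

# Express each R-Square as a percent of variation in gpa explained
percent = {model: round(100 * value, 2) for model, value in r_square.items()}

for model, pct in percent.items():
    print(f"{model:<24s} {pct:6.2f}%")

# Extra variation explained by all five predictors beyond hsm alone
gain = percent["satm satv hsm hss hse"] - percent["hsm"]
print(f"Gain over hsm alone: {gain:.2f} percentage points")
```

HSM alone accounts for about 19% of the variation in GPA, and adding the other four predictors raises that by only about 2 percentage points.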
Key ideas from case study
- You can fully understand the theory in terms of
  Y = Xβ + ε
- To effectively use this methodology in practice
  you need to understand how the data were
  collected, the nature of the variables, and how
  they relate to each other
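For readers who want to see the matrix form in action, the sketch below solves the normal equations (X'X)b = X'y by hand for the simplest case, an intercept column plus one predictor. The data are synthetic and noise-free (y = 1 + 2x), chosen only so the answer is easy to verify; they are not from the CS study, and the function name is hypothetical.

```python
def least_squares_one_predictor(x, y):
    """Solve (X'X) b = X'y where X has a column of ones and a column x."""
    n = len(x)
    sum_x = sum(x)
    sum_xx = sum(v * v for v in x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))

    # X'X = [[n, sum_x], [sum_x, sum_xx]] and X'y = [sum_y, sum_xy]
    det = n * sum_xx - sum_x * sum_x
    b0 = (sum_xx * sum_y - sum_x * sum_xy) / det  # intercept
    b1 = (n * sum_xy - sum_x * sum_y) / det       # slope
    return b0, b1

# Synthetic data lying exactly on the line y = 1 + 2x (no error term)
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

b0, b1 = least_squares_one_predictor(x, y)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

With more predictors the same equations apply, but X'X grows and is normally solved by software such as Proc Reg.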
Background Reading
- Cs2.sas contains the SAS commands used in this
  topic