Topic 13: Multiple Linear Regression Example - PowerPoint PPT Presentation

About This Presentation
Slides: 43
Provided by: georgep56

Transcript and Presenter's Notes



1
Topic 13: Multiple Linear Regression Example
2
Outline
  • Description of Example
  • Descriptive Summaries
  • Investigation of Various Models
  • Conclusions

3
Study of CS students
  • Computer science majors at Purdue have a high
    dropout rate
  • Can we find predictors of success?
  • Predictors must be available at the time of entry
    into the program

4
Data available
  • GPA after three semesters
  • High school math grades
  • High school science grades
  • High school English grades
  • SAT Math
  • SAT Verbal
  • Gender (of interest for other reasons)

5
Data for CS Example
  • Y is the grade point average (GPA after three
    semesters)
  • The 3 HS grades and 2 SAT scores are the
    explanatory variables (p = 6 parameters, counting
    the intercept)
  • We have n = 224 students
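Written out, the full model behind this setup is sketched below. This assumes the standard multiple regression model with independent normal errors; the symbols β0 through β5 and σ² are labels introduced here for illustration, not names used on the slides.

```latex
\[
\text{gpa}_i = \beta_0 + \beta_1\,\text{hsm}_i + \beta_2\,\text{hss}_i + \beta_3\,\text{hse}_i
  + \beta_4\,\text{satm}_i + \beta_5\,\text{satv}_i + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0,\sigma^2),\quad i = 1,\dots,224
\]
\[
p = 5 + 1 = 6, \qquad \text{error df} = n - p = 224 - 6 = 218
\]
```

The 218 error degrees of freedom are the denominator df that appear in the general linear tests for the five-predictor model.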

6
Descriptive Statistics
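/* Read the CS data (path abbreviated as on the original slide) */
data a1;
  infile 'C:\...\csdata.dat';
  input id gpa hsm hss hse satm satv genderm1;
/* N, mean, std dev, min and max, shown to 2 decimals */
proc means data=a1 maxdec=2;
  var gpa hsm hss hse satm satv;
run;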
7
Output from Proc Means
Var     N      Mean    Std Dev
gpa    224     2.64     0.78
hsm    224     8.32     1.64
hss    224     8.09     1.70
hse    224     8.09     1.51
satm   224   595.29    86.40
satv   224   504.55    92.61
8
Output from Proc Means
Var     Minimum   Maximum
gpa       0.12      4.00
hsm       2.00     10.00
hss       3.00     10.00
hse       3.00     10.00
satm    300.00    800.00
satv    285.00    760.00
9
Descriptive Statistics
proc univariate data=a1;
  var gpa hsm hss hse satm satv;
  /* histograms with a fitted normal curve overlaid */
  histogram gpa hsm hss hse satm satv / normal;
run;
10
GPA
11
High School Math
12
High School Science
13
High School English
14
SAT Math
15
SAT Verbal
16
Interactive Data Analysis
  • Click on the menu
  • Solutions > Analysis > Interactive Data Analysis
  • This opens the SAS/Insight window
  • Open the library WORK
  • Click on data set A1 (if it exists)
  • Click Open

17
Scatter Plot Matrix
  • Shift-click on GPA, SATM and SATV to select them
  • Go to the Analyze menu
  • Choose the Scatterplot(XY) option
  • Try some other options

18
Scatter Plot Matrix
19
Correlations
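proc corr data=a1; var hsm hss hse;    /* HS grades with each other */
proc corr data=a1; var satm satv;      /* SAT scores with each other */
proc corr data=a1; var hsm hss hse satm satv;
  with gpa;                            /* each predictor with gpa */
run;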
20
Output from Proc Corr
(second line of each row: P-value for H0: rho = 0)

        hsm      hss      hse
hsm    1.00     0.57     0.44
                <.0001   <.0001
hss    0.57     1.00     0.57
       <.0001            <.0001
hse    0.44     0.57     1.00
       <.0001   <.0001
21
Output from Proc Corr
        satm     satv
satm   1.00     0.46
                <.0001
satv   0.46     1.00
       <.0001
22
Output from Proc Corr
        hsm      hss      hse      satm     satv
gpa    0.43     0.32     0.28     0.25     0.11
       <.0001   <.0001   <.0001   0.0001   0.0873
23
Use High School Grades to predict GPA
proc reg data=a1;
  model gpa=hsm hss hse;
run;
24
R-Square = 0.2046

Var    DF   Estimate   Std Err      t        P
Int     1     0.58       0.29      2.00    0.0462
hsm     1     0.16       0.03      4.75    <.0001
hss     1     0.03       0.03      0.91    0.3619
hse     1     0.04       0.03      1.17    0.2451
25
CS ANOVA Table
Source    DF   Sum of Squares   Mean Square      F      Pr > F
Model      3        27.71           9.23       18.86    <.0001
Error    220       107.75           0.48
Total    223       135.46
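As a check on slides 24 and 25, R² and the F statistic follow from the sums of squares in the ANOVA table. The arithmetic below uses the rounded values printed there, so the last digit can differ from SAS's unrounded output.

```latex
\[
SSM + SSE = 27.71 + 107.75 = 135.46 = SST, \qquad 3 + 220 = 223
\]
\[
R^2 = \frac{SSM}{SST} = \frac{27.71}{135.46} \approx 0.2046
\]
\[
MSM = \frac{27.71}{3} \approx 9.237, \qquad
MSE = \frac{107.75}{220} \approx 0.4898, \qquad
F = \frac{MSM}{MSE} \approx \frac{9.237}{0.4898} \approx 18.86
\]
```

Rounded to two decimals these mean squares are 9.24 and 0.49; the 9.23 and 0.48 shown in the table appear to be truncated.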
26
Remove HSS
proc reg data=a1;
  model gpa=hsm hse;
run;
27
R-Square = 0.2016

Var    DF   Estimate   Std Err      t        P
Int     1     0.62       0.29      2.14    0.0335
hsm     1     0.18       0.03      5.72    <.0001
hse     1     0.06       0.03      1.75    0.0820
28
Rerun with HSM only
proc reg data=a1;
  model gpa=hsm;
run;
29
R-Square = 0.1905

Var    DF   Estimate   Std Err      t        P
Int     1     0.90       0.24      3.73    0.0002
hsm     1     0.20       0.02      7.23    <.0001
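Slides 26 to 29 drop one predictor at a time, each time removing the HS grade with the largest P-value (HSS, P = 0.3619, then HSE, P = 0.0820). When a single predictor is dropped, the general linear test F for that step equals the square of its t statistic in the larger model. A rough check uses the standard R² form of the test, where R²_F and R²_R are the full and reduced R² values, df_E is the full model's error df, and q = 1 is the number of dropped predictors:

```latex
\[
F = \frac{(R^2_F - R^2_R)/q}{(1 - R^2_F)/df_E}
\]
\[
\text{drop hss: } F = \frac{(0.2046 - 0.2016)/1}{(1 - 0.2046)/220} \approx 0.83,
\qquad t^2 = 0.91^2 \approx 0.83
\]
\[
\text{drop hse: } F = \frac{(0.2016 - 0.1905)/1}{(1 - 0.2016)/221} \approx 3.07,
\qquad t^2 = 1.75^2 \approx 3.06
\]
```

The two sides agree up to rounding of the four-decimal R² values and two-decimal t values.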
30
SATs
proc reg data=a1;
  model gpa=satm satv;
run;
31
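R-Square = 0.0634

Var    DF   Estimate   Std Err      t        P
Int     1     1.28       0.37      3.43    0.0007
satm    1     0.00       0.00      3.44    0.0007
satv    1    -0.00       0.00     -0.04    0.9684

(The SAT estimates and standard errors print as 0.00
because they are small per SAT point and are shown to
two decimals; the nonzero t values show they are not
exactly zero.)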
32
HS and SATs
proc reg data=a1;
  model gpa=satm satv hsm hss hse;
  /* general linear tests: are all SAT (or all HS) coefficients zero? */
  sat: test satm, satv;
  hs: test hsm, hss, hse;
run;
33
R-Square = 0.2115

Var    DF   Estimate   Std Err      t        P
Int     1     0.32       0.40      0.82    0.4149
satm    1     0.00       0.00      1.38    0.1702
satv    1    -0.00       0.00     -0.69    0.4915
hsm     1     0.14       0.03      3.72    0.0003
hss     1     0.03       0.03      0.95    0.3432
hse     1     0.05       0.03      1.40    0.1637
34
Test sat Results for Dependent Variable gpa

Source         DF   Mean Square      F      Pr > F
Numerator       2     0.46566       0.95    0.3882
Denominator   218     0.49000
35
Test hs Results for Dependent Variable gpa

Source         DF   Mean Square      F      Pr > F
Numerator       3     6.68660      13.65    <.0001
Denominator   218     0.49000
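Both tests compare the five-predictor model (R² = 0.2115, 218 error df) with reduced models fit earlier: the HS-grades model (R² = 0.2046) for the sat test and the SAT model (R² = 0.0634) for the hs test. Writing the general linear test in its standard R² form, with q the number of coefficients set to zero, reproduces the printed F values up to rounding, and so does the ratio of the mean squares in the tables:

```latex
\[
F = \frac{(R^2_F - R^2_R)/q}{(1 - R^2_F)/df_E}
\]
\[
F_{\text{sat}} = \frac{(0.2115 - 0.2046)/2}{(1 - 0.2115)/218} \approx 0.95,
\qquad \frac{0.46566}{0.49000} \approx 0.95
\]
\[
F_{\text{hs}} = \frac{(0.2115 - 0.0634)/3}{(1 - 0.2115)/218} \approx 13.65,
\qquad \frac{6.68660}{0.49000} \approx 13.65
\]
```

Once the HS grades are in the model the SATs add little (P = 0.3882), while the HS grades add a great deal even with the SATs present (P < .0001).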
36
Best Model?
  • Likely the one with just HSM.
  • We'll discuss model comparison methods in
    Chapters 7 and 8
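For concreteness, the HSM-only model from slide 29 gives the fitted equation below. This is a sketch using the rounded estimates, so the predicted values are approximate; the HSM values 6 and 10 are chosen only for illustration.

```latex
\[
\widehat{\text{gpa}} = 0.90 + 0.20\,\text{hsm}
\]
\[
\text{hsm} = 6:\ 0.90 + 0.20(6) = 2.10, \qquad
\text{hsm} = 10:\ 0.90 + 0.20(10) = 2.90
\]
```

Each additional point of high school math grade is associated with about 0.20 higher predicted GPA, and this single predictor explains about 19% of the variation in GPA (R² = 0.1905).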

37
Key ideas from case study
  • First, look at graphical and numerical summaries
    for one variable at a time
  • Then, look at relationships between pairs of
    variables with graphical and numerical summaries.
  • Use plots and correlations

38
Key ideas from case study
  • The relationship between a response variable and
    an explanatory variable depends on what other
    explanatory variables are in the model
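  • A variable can be a significant (P < 0.05) predictor
    alone and not significant (P > 0.05) when other Xs
    are in the model (for example, HSE's correlation
    with GPA has P < .0001, but with HSM and HSS in the
    model its P-value is 0.2451)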

39
Key ideas from case study
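  • Regression coefficients, standard errors and the
    results of significance tests depend on what
    other explanatory variables are in the model (for
    example, the HSM estimate is 0.20 on its own, 0.16
    with HSS and HSE, and 0.14 with all five predictors)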

40
Key ideas from case study
  • Significance tests (P values) do not tell the
    whole story
  • Squared multiple correlations (R², the proportion
    of variation in the response variable explained
    by the explanatory variables) can give a
    different view
  • We often express R² as a percent (for example,
    20.5% for the three HS grades versus 6.3% for the
    two SATs)

41
Key ideas from case study
  • You can fully understand the theory in terms of
    Y = Xβ + ε
  • To effectively use this methodology in practice
    you need to understand how the data were
    collected, the nature of the variables, and how
    they relate to each other
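For the full CS model, the matrix form has the dimensions below (a sketch assuming X has full column rank, so XᵀX is invertible). Row i of X is (1, hsm_i, hss_i, hse_i, satm_i, satv_i), and b is the least-squares estimate that PROC REG reports as the parameter estimates.

```latex
\[
\underset{224\times 1}{\mathbf{Y}}
  = \underset{224\times 6}{\mathbf{X}}\;\underset{6\times 1}{\boldsymbol{\beta}}
  + \underset{224\times 1}{\boldsymbol{\varepsilon}},
\qquad
\mathbf{b} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{Y}
\]
```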

42
Background Reading
  • Cs2.sas contains the SAS commands used in this
    topic