Logistic regression, results from factor analysis, multi-level analysis, last survey issues (online next week)

Slides: 52

Provided by: csnij

Transcript and Presenter's Notes

1
Logistic regression
Results from factor analysis
Multi-level analysis
Last survey issues (online next week)
2
AMMBR week 6

Logistic regression: example case
We have data on the births of nearly 200
babies. Some of these babies have a
dangerously low birth weight (1 = low,
0 = not low). The question is which factors
are associated with a low birth weight.

3
Logistic regression assignment: what steps?

1. Look at the descriptives. Any transformations
necessary/useful?
2. Is there a problem of multicollinearity
between the (transformed) independent variables?
3. Sampling adequacy? Are all cells in the
multivariate tables of the categorical variables
filled with enough cases?
4. Were the cases sampled independently?
5. Are there any relevant 'third' variables
missing?
6. Does the chosen model fit the data?

4
Logistic regression assignment: what steps?

7. Any interaction effects?
8. Any curvilinear relationships?
9. Any outliers?
10. Homogeneity of the error variance?
11. Interpretation of the results of the finally
chosen model

5
1. Descriptives/transformations
6
1. Descriptives/transformations
new_weight = 1/weight
7
1. Descriptives/transformations
ln_age = ln(age)
8
1. Descriptives/transformations
RECODE naararts (0=0) (1 thru Highest=1)
(ELSE=SYSMIS) INTO new_naararts .
EXECUTE .
9
two dummy variables for race

RECODE
ras
(2=1) (1=0) (3=0) INTO black .
EXECUTE .
RECODE
ras
(1=1) (2=0) (3=0) INTO white .
EXECUTE .
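
For readers without SPSS, the same dummy coding can be sketched in Python. This is only an illustration: the function name is made up, and the meaning of the codes (1 = white, 2 = black, 3 = other) is inferred from the RECODE commands above rather than from a codebook.

```python
def race_dummies(ras):
    """Return (black, white) dummy values for a race code.

    ASSUMPTION: 1 = white, 2 = black, 3 = other, as implied by the RECODE syntax.
    Code 3 gets (0, 0) and therefore acts as the reference category.
    """
    if ras not in (1, 2, 3):
        raise ValueError(f"unexpected race code: {ras!r}")
    black = 1 if ras == 2 else 0
    white = 1 if ras == 1 else 0
    return black, white


for code in (1, 2, 3):
    print(code, race_dummies(code))
```

Two dummies are enough for three categories; adding a third dummy would make the predictors perfectly collinear with the constant.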

10
2. multicollinearity?

no strong bivariate correlations between the
x-variables
REGRESSION
/MISSING LISTWISE
/STATISTICS COLLIN TOL
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT babylicht
/METHOD=ENTER black white rookt new_weight
new_naararts hypertensie ln_age .
no VIF > 10, all tolerances > 0.2
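
The two cut-offs on this slide describe the same quantity from two sides, because VIF = 1/tolerance. A minimal sketch (the function name is illustrative; nothing here is SPSS output) showing how the two rules of thumb relate:

```python
def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of the tolerance.

    Tolerance = 1 - R^2 of one predictor regressed on all other predictors.
    """
    if not 0.0 < tolerance <= 1.0:
        raise ValueError("tolerance must lie in (0, 1]")
    return 1.0 / tolerance


for tol in (0.2, 0.1):
    print(f"tolerance {tol} corresponds to VIF {vif_from_tolerance(tol):.1f}")
```

A tolerance of 0.2 corresponds to a VIF of 5, so "all tolerances above 0.2" is the stricter of the two checks.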

11
3. sampling adequacy

frequencies and (higher order!) cross-tabs
only 12 cases with hypertension
no analysis of interaction effects with
hypertension possible
Check standard error of estimate carefully

12
4. independent sampling

here we have to assume independent sampling...
if you are doing 'real' research you have to
check...

13
5. all relevant 'third' variables?

how do we know? -> theory
importance of theory-guided data collection
here we have to assume....

14
6. model fit (simple model)

LOGISTIC REGRESSION babylicht
/METHOD ENTER black white ln_age new_weight
rookt new_naararts hypertensie
/SAVE PRED COOK ZRESID
/PRINT GOODFIT
/CRITERIA PIN(.05) POUT(.10) ITERATE(20)
CUT(.5) .

15
6. model fit (simple model)
Nagelkerke's R-square = 18.2%
16
7. interaction effects

COMPUTE smoke_arts = rookt * new_naararts .
Compare the difference in fit between the simple
model and the model with the interaction:
Chi-square = 26.193 - 26.163 = 0.03, df = 1, p > 0.5
no significant improvement
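
The p-value of this chi-square difference test can be checked by hand. A minimal sketch, assuming the larger value (26.193) belongs to the model with the interaction term; the helper name is illustrative, and the shortcut via erfc only holds for df = 1:

```python
import math


def chi2_upper_tail_df1(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom.

    With df = 1 the variable is a squared standard normal,
    so the upper tail equals erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(x / 2.0))


chi2_simple = 26.163       # model chi-square of the simple model (from the slide)
chi2_interaction = 26.193  # model chi-square with smoke_arts added (ASSUMED assignment)

difference = chi2_interaction - chi2_simple
p_value = chi2_upper_tail_df1(difference)
print(f"chi-square difference = {difference:.3f}, df = 1, p = {p_value:.2f}")
```

The p-value comes out well above 0.5, in line with the slide's conclusion that the interaction does not improve the fit.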

17
7. interaction effects

COMPUTE smoke_arts = rookt * new_naararts .

18
8. linear relationship?

COMPUTE square_age = ln_age * ln_age .
EXECUTE .
LOGISTIC REGRESSION babylicht
/METHOD ENTER black white new_weight rookt
new_naararts hypertensie square_age
/SAVE PRED COOK ZRESID
/PRINT GOODFIT
/CRITERIA PIN(.05) POUT(.10) ITERATE(20)
CUT(.5) .
Note 1: replacement of (transformed) age by age squared
Note 2: no 'nested' models

19
8. linear relationship?
Model fit is not better than the fit of the simpler
(linear) model.
Note: you cannot take the chi-square difference as
test statistic here, because the models are not nested.
20
8. linear relationship?
COMPUTE weight_square = (new_weight * new_weight) * 10000 .
EXECUTE .
Is the effect of weight linear or quadratic?
21
9. Outliers

FREQUENCIES
VARIABLES=ZRE_1
/STATISTICS=STDDEV RANGE MINIMUM MAXIMUM MEAN
MEDIAN SKEWNESS SESKEW
KURTOSIS SEKURT
/HISTOGRAM
/ORDER= ANALYSIS .
LOGISTIC REGRESSION babylicht
/METHOD ENTER black white rookt new_weight
new_naararts hypertensie
ln_age
/SAVE PRED COOK LEVER DFBETA ZRESID
/CLASSPLOT /CASEWISE OUTLIER(2)
/CRITERIA PIN(.05) POUT(.10) ITERATE(20)
CUT(.5) .

22
9. Outliers
Residuals and influence statistics
23
9. Outliers
Plot of residuals against probabilities
(classplot)
24
9. Outliers
[Classification plot, step number 1: observed groups and predicted probabilities. Vertical axis: frequency. Horizontal axis: predicted probability of membership for 1, from 0 to 1. The cut value is .50. Symbols: 0 = group 0, 1 = group 1. Each symbol represents 1 case.]
25
9. Influential Outliers?

FREQUENCIES
VARIABLES=COO_1 LEV_1 DFB0_1 DFB1_1 DFB2_1
DFB3_1 DFB4_1 DFB5_1 DFB6_1
DFB7_1
/STATISTICS=STDDEV MINIMUM MAXIMUM MEAN
SKEWNESS SESKEW
/HISTOGRAM
/ORDER= ANALYSIS .

Crit. val. < 1
Crit. val. 3(k+1)/n = .127
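
The second critical value can be reproduced with a few lines. A minimal sketch, with two assumptions that are not stated on the slide: that the rule 3(k+1)/n is meant for the leverage values (LEV_1), and that n = 189 (the slide only says "nearly 200" babies; 189 is the sample size that gives .127 with k = 7 predictors).

```python
def leverage_cutoff(k, n):
    """Rule-of-thumb cut-off 3(k + 1)/n, with k predictors and n cases."""
    if k < 0 or n <= 0:
        raise ValueError("need k >= 0 and n > 0")
    return 3 * (k + 1) / n


# black, white, rookt, new_weight, new_naararts, hypertensie, ln_age
k = 7
n = 189  # ASSUMPTION: inferred from the reported value .127, not given on the slide

cutoff = leverage_cutoff(k, n)
print(f"3(k+1)/n = {cutoff:.3f}")
```

Cases above the cut-off are candidates for a closer look, not automatic deletions.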
26
9. Influential Outliers
Crit val 1

27
9. Influential Outliers
28
9. Influential Outliers
29
10. Variance homogeneity

GRAPH
/SCATTERPLOT(BIVAR)=PRE_1 WITH ZRE_1
/MISSING=LISTWISE .

30
11. Interpretation of chosen model
31
11. Interpreting the coefficients

In multiple regression
y = c0 + c1x1 + c2x2
c1 represents the relation between x1 and
y, irrespective of the value of x2.
In logistic regression this is no longer the case.
-> In logistic regression, the size of the
effect of x1 on the probability of y = 1 depends on
the values of the other independent variables.
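
A small numerical illustration of this point. The coefficients below are hypothetical and are not taken from the birth-weight model; they only serve to show that the same one-unit change in x1 shifts the probability by different amounts depending on x2, while the shift in log-odds stays equal to c1.

```python
import math

# HYPOTHETICAL coefficients, chosen for illustration only
c0, c1, c2 = -2.0, 1.0, 1.5


def probability(x1, x2):
    """Predicted P(y = 1) from a logistic model with two predictors."""
    z = c0 + c1 * x1 + c2 * x2
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Logit transformation of a probability."""
    return math.log(p / (1.0 - p))


# Effect of raising x1 from 0 to 1, evaluated at x2 = 0 and at x2 = 1
effect_when_x2_is_0 = probability(1, 0) - probability(0, 0)
effect_when_x2_is_1 = probability(1, 1) - probability(0, 1)

print(f"change in probability at x2 = 0: {effect_when_x2_is_0:.3f}")
print(f"change in probability at x2 = 1: {effect_when_x2_is_1:.3f}")
print(f"change in log-odds at x2 = 0: {log_odds(probability(1, 0)) - log_odds(probability(0, 0)):.3f}")
print(f"change in log-odds at x2 = 1: {log_odds(probability(1, 1)) - log_odds(probability(0, 1)):.3f}")
```

So the coefficient has a constant interpretation on the log-odds scale, but not on the probability scale.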

MULTI LEVEL ANALYSIS

33
Multi-level models
Bayesian hierarchical models
mixed models
hierarchical linear models
random effects models
random coefficient models
subject-specific models
variance component models
variance heterogeneity
Dealing with clustered data. One solution: the
variance component model
34
Clustered data / multi-level models

Pupils within schools
(within regions within countries)
Firms within regions (or sectors)
In the ICOP case: more than one observation per
person

General issue: observations are not independent
35
On individual vs aggregate data
For instance:
X = extraversion, Y = school results
X = innovativeness of building, Y = extent to which building is liked
36
Had we only known, that the data are clustered!

So the effect of X within clusters can be
different from the effect between clusters!

37
MAIN MESSAGE

Make sure that you distinguish between two kinds of
effects: those at the "micro-level" vs those at
the aggregate level, and ...
... that you do not test a micro-hypothesis with
aggregate data (or vice versa)

38
A toy example: two schools, two pupils
Two schools, each with two pupils. We first
calculate the means.
(taken from Rasbash)
Overall mean = (3 + 2 + (-1) + (-4))/4 = 0
39
Now the variance
The total variance is the sum of the squared
departures of the observations from the overall mean,
divided by the sample size (4):
(9 + 4 + 1 + 16)/4 = 7.5
40
The variance of the school means around the
overall mean
(2.5^2 + (-2.5)^2)/2 = 6.25
Total variance = 7.5
41
The variance of the pupils' scores around their
school's mean
((3-2.5)^2 + (2-2.5)^2 + (-1-(-2.5))^2 +
(-4-(-2.5))^2)/4 = 1.25
The variance of the school means around the overall
mean = (2.5^2 + (-2.5)^2)/2 = 6.25
Total variance = 7.5 = 6.25 + 1.25
42
-> So you can partition the variance into an
individual-level part and a school-level part.
How much of the variability in pupil attainment
is attributable to school-level factors and how
much to pupil-level factors?
In terms of our toy example we can now say:
6.25/7.5 = 83% of the total variation in pupils'
attainment is attributable to school-level factors.
And this is important: we want to know how to
explain (in this example) school attainment.
1.25/7.5 = 17% of the total variation in pupils'
attainment is attributable to pupil-level factors.
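
The toy example can be recomputed in a few lines. A minimal sketch using only the four scores from the slides; the variable names are illustrative, and averaging the squared school-mean deviations without weights is only valid here because both schools have the same number of pupils.

```python
from statistics import fmean

# Scores from the toy example: two schools with two pupils each
schools = {"school_1": [3, 2], "school_2": [-1, -4]}

scores = [score for pupils in schools.values() for score in pupils]
overall_mean = fmean(scores)

# Total variance: mean squared deviation from the overall mean (divide by n, not n - 1)
total_variance = fmean([(score - overall_mean) ** 2 for score in scores])

# Between-school variance: spread of the school means around the overall mean
school_means = {name: fmean(pupils) for name, pupils in schools.items()}
between_variance = fmean([(mean - overall_mean) ** 2 for mean in school_means.values()])

# Within-school variance: spread of the pupils around their own school mean
within_variance = fmean(
    [(score - school_means[name]) ** 2 for name, pupils in schools.items() for score in pupils]
)

school_share = between_variance / total_variance
pupil_share = within_variance / total_variance

print(f"total = {total_variance}, between = {between_variance}, within = {within_variance}")
print(f"school share = {school_share:.1%}, pupil share = {pupil_share:.1%}")
```

The share of variance at the school level is what the multi-level literature calls the intraclass correlation.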
43
Standard multiple regression won't do
So:
You can use all the data and just run a
multiple regression, but then you disregard the
clustering effect, which gives incorrect
confidence intervals.
You can aggregate within clusters, and then run a
multiple regression on the aggregate data. Two
problems: no individual-level testing is possible,
and you get fewer data points.
Or you can run a multi-level analysis.
44
Multi-level models

The usual multiple regression model assumes
... with the subscript "i" defined at the
case-level.
... and the epsilons independently distributed
with covariance matrix I.
And with clustered data, you know these
assumptions are not met.

45
Solution 1: add dummy variables per person

So just multiple regression, but with as many
dummy variables as you have
... where, in this example, there are j1
persons.
IF the clustering is (largely) due to differences
in the intercept between persons, this might
work.
BUT if there are only a handful of cases per
person, this necessitates a huge number of extra
variables

46
Solution 2: split your micro-level X-vars

Say you have a micro-level variable x1,
then create a "group mean" variable and a
"difference from group mean" variable
and add these as predictors (instead of x1)

Make sure that you understand what is happening
here, and why it is of use.
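
What the split does can be seen on a handful of made-up numbers. A minimal sketch: the data and names below are hypothetical, and the two new columns follow the "group mean" and "difference from group mean" wording used later in these slides.

```python
from collections import defaultdict
from statistics import fmean

# HYPOTHETICAL data: (person, x1) pairs, several observations per person
observations = [("p1", 4.0), ("p1", 6.0), ("p2", 1.0), ("p2", 3.0), ("p2", 5.0)]


def split_micro_variable(rows):
    """Return (group, group_mean, difference_from_group_mean) for every row."""
    values_per_group = defaultdict(list)
    for group, value in rows:
        values_per_group[group].append(value)
    group_means = {group: fmean(values) for group, values in values_per_group.items()}
    return [(group, group_means[group], value - group_means[group]) for group, value in rows]


for group, group_mean, difference in split_micro_variable(observations):
    print(group, group_mean, difference)
```

The group-mean column only varies between persons and the difference column only varies within persons, so their coefficients estimate the between-cluster and the within-cluster effect separately.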
47
Solution 3: the variance component model
In the variance component model, we split the
randomness into a "personal part" and a "rest part"
48
Now how do you do this in SPSS?

Somewhat problematic: not available for a binary
Y-variable.
<See SPSS demo>

49
What we skipped

Assumption checking: for instance, is the
delta-variable normally distributed?
Run the analysis with dummies per group, take
the set of dummy coefficients, and check whether these
follow a normal distribution.
What do we do when we have more than 2 levels?
What do we do when Y is binary? (In our case: run
MIXED anyway. It's wrong but it will do.)
Random coefficients?

50
When you have multi-level data (2 levels)

If applicable: consider whether using separate
dummies per group might help.
Run an empty mixed model (i.e., with just the constant
included) in SPSS. Look at the level at which
most of the variance resides.
If applicable: split micro-variables into "group
mean" variables and "difference from group mean"
variables.
Re-run your mixed model with these variables
included (as you would a multiple regression
analysis).

51
To Do

Have a look at your questions one last time. Uwe
and I will create the survey in the coming week
(there were still too many issues raised to start
creating it now).
Practise multi-level analysis with the data
available on the Moodle site: start practising
and/or brushing up your SPSS skills (once again).
