Logistic regression Results from factor analysis Multilevel analysis Last survey issues online next - PowerPoint PPT Presentation

Provided by: csnij

Transcript and Presenter's Notes:
1
Logistic regression
Results from factor analysis
Multi-level analysis
Last survey issues (online next week)
2
AMMBR week 6
  • logistic regression: example case
  • We have data on nearly 200 births. Some of these
    babies have a dangerously low birth weight
    (1 = low, 0 = not low). The question is which
    factors are associated with a low birth weight.

3
Logistic regression assignment: what steps?
  • 1. Look at the descriptives. Any transformations
    necessary/useful?
  • 2. Is there a problem of multicollinearity
    between the (transformed) independent variables?
  • 3. Sampling adequacy? Are all cells in the
    multivariate tables of the categorical variables
    filled with enough cases?
  • 4. Were the cases sampled independently?
  • 5. Are there any relevant 'third' variables
    missing?
  • 6. Does the chosen model fit the data?

4
Logistic regression assignment: what steps?
  • 7. Any interaction effects?
  • 8. Any curvilinear relationships?
  • 9. Any outliers?
  • 10. Homogeneity of the error variance?
  • 11. Interpretation of the results of the finally
    chosen model

5
1. Descriptives/transformations
6
1. Descriptives/transformations
new_weight = 1/weight
7
1. Descriptives/transformations
ln_age = ln(age)
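The two transformations above (inverse weight, log age) can be sketched in Python; variable names and sample values here are hypothetical:

```python
import math

weight = [110.0, 120.0, 150.0]   # hypothetical mothers' weights
age = [19, 25, 33]               # hypothetical ages

new_weight = [1 / w for w in weight]   # SPSS: new_weight = 1/weight
ln_age = [math.log(a) for a in age]    # SPSS: ln_age = ln(age)
```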
8
1. Descriptives/transformations
RECODE naararts (0=0) (1 thru Highest=1)
(ELSE=SYSMIS) INTO new_naararts .
EXECUTE .
9
two dummy variables for race
  • RECODE
  • ras
  • (2=1) (1=0) (3=0) INTO black .
  • EXECUTE .
  • RECODE
  • ras
  • (1=1) (2=0) (3=0) INTO white .
  • EXECUTE .
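The same race recode as a Python sketch (codes assumed from the RECODE above: 1 = white, 2 = black, 3 = other; the sample values are made up):

```python
ras = [1, 2, 3, 2, 1]   # hypothetical race codes for five cases

# SPSS: RECODE ras (2=1) (1=0) (3=0) INTO black / (1=1) (2=0) (3=0) INTO white
black = [1 if r == 2 else 0 for r in ras]
white = [1 if r == 1 else 0 for r in ras]
```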

10
2. multicollinearity?
  • no strong bi-variate correlations between the
    x-variables
  • REGRESSION
  • /MISSING LISTWISE
  • /STATISTICS COLLIN TOL
  • /CRITERIA=PIN(.05) POUT(.10)
  • /NOORIGIN
  • /DEPENDENT babylicht
  • /METHOD=ENTER black white rookt new_weight
    new_naararts hypertensie ln_age
  • no VIF > 10, all tolerances > 0.2
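The VIF check can be reproduced outside SPSS: VIF_j = 1/(1 - R²_j), where R²_j comes from regressing predictor j on the remaining predictors. A minimal NumPy sketch on synthetic data (the real variables are not reproduced here):

```python
import numpy as np

def vif(X):
    """VIF per column: regress each predictor on the others
    (with intercept) and return 1 / (1 - R^2)."""
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
        vifs.append(1 / (1 - r2))
    return vifs

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))   # three (nearly) independent predictors
print([round(v, 2) for v in vif(X)])   # all close to 1, well below 10
```

Tolerance is simply 1/VIF, so the slide's rule of thumb (no VIF > 10, all tolerances > 0.2) carries over unchanged.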

11
3. sampling adequacy
  • frequencies and (higher order!) cross-tabs
  • only 12 cases with hypertension
  • no analysis of interaction effects with
    hypertension possible
  • Check standard error of estimate carefully

12
4. independent sampling
  • here we have to assume independent sampling...
  • if you are doing 'real' research you have to
    check...

13
5. all relevant 'third' variables?
  • how do we know? → theory
  • importance of theory guided data collection
  • here we have to assume....

14
6. model fit (simple model)
  • LOGISTIC REGRESSION babylicht
  • /METHOD ENTER black white ln_age new_weight
    rookt new_naararts hypertensie
  • /SAVE PRED COOK ZRESID
  • /PRINT GOODFIT
  • /CRITERIA PIN(.05) POUT(.10) ITERATE(20)
    CUT(.5) .
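A model like this can also be fitted by hand; below is a minimal Newton-Raphson maximum-likelihood sketch in NumPy on synthetic data (the real babylicht data are not reproduced here, and the coefficients are made up):

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """ML logistic regression via Newton-Raphson; X includes an
    intercept column. Returns coefficients and the deviance (-2LL)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                       # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])  # observed information
        beta += np.linalg.solve(hess, grad)
    p = 1 / (1 + np.exp(-X @ beta))
    dev = -2 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, dev

rng = np.random.default_rng(1)
x = rng.normal(size=500)
true_p = 1 / (1 + np.exp(-(-0.5 + 1.0 * x)))        # true coefficient: 1.0
y = (rng.random(500) < true_p).astype(float)
X = np.column_stack([np.ones(500), x])
beta, dev = fit_logit(X, y)   # beta[1] should land near the true 1.0
```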

15
6. model fit (simple model)
Nagelkerke's R-Square: 18.2%
16
7. interaction effects
  • COMPUTE smoke_arts = rookt * new_naararts .
  • compare the difference in fit between the simple
    model and the model with interaction
  • Chi-square = 26.193 - 26.163 = 0.03, df = 1, p > 0.5
  • no significant improvement
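The likelihood-ratio test above (difference in -2LL between nested models) can be checked without any stats library, using the identity P(χ²₁ > x) = erfc(√(x/2)) for one degree of freedom; the two -2LL values are the ones quoted on the slide:

```python
import math

lr = 26.193 - 26.163               # difference in -2 log-likelihood
# for df = 1: P(chi-square > x) = erfc(sqrt(x/2))
p = math.erfc(math.sqrt(lr / 2))
print(round(p, 2))                 # 0.86, so p > 0.5: no improvement
```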

17
7. interaction effects
  • COMPUTE smoke_arts = rookt * new_naararts .

18
8. linear relationship?
  • COMPUTE square_age = ln_age * ln_age .
  • EXECUTE .
  • LOGISTIC REGRESSION babylicht
  • /METHOD ENTER black white new_weight rookt
    new_naararts hypertensie square_age
  • /SAVE PRED COOK ZRESID
  • /PRINT GOODFIT
  • /CRITERIA PIN(.05) POUT(.10) ITERATE(20)
    CUT(.5) .
  • Note 1: replacement of (transformed) age by age²
  • Note 2: no 'nested' models

19
8. linear relationship?
Model fit is not better than the fit of the simpler
(linear) model. Note: you cannot take the chi-square
difference as test statistic here (the models are not nested).
20
8. linear relationship?
COMPUTE weight_square = (new_weight *
new_weight) * 10000 .
EXECUTE .
is the effect of weight linear or quadratic?
21
9. Outliers
  • FREQUENCIES
  • VARIABLES=ZRE_1
  • /STATISTICS=STDDEV RANGE MINIMUM MAXIMUM MEAN
    MEDIAN SKEWNESS SESKEW
    KURTOSIS SEKURT
  • /HISTOGRAM
  • /ORDER=ANALYSIS .
  • LOGISTIC REGRESSION babylicht
  • /METHOD=ENTER black white rookt new_weight
    new_naararts hypertensie
    ln_age
  • /SAVE PRED COOK LEVER DFBETA ZRESID
  • /CLASSPLOT /CASEWISE OUTLIER(2)
  • /CRITERIA PIN(.05) POUT(.10) ITERATE(20)
    CUT(.5) .
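The ZRESID values saved above are standardized (Pearson) residuals, and /CASEWISE OUTLIER(2) flags cases where they exceed 2 in absolute value. That logic can be sketched directly (the fitted probabilities below are made up for illustration):

```python
import numpy as np

p = np.array([0.10, 0.80, 0.30, 0.95])   # hypothetical predicted probabilities
y = np.array([0, 1, 1, 0])               # observed outcomes

zresid = (y - p) / np.sqrt(p * (1 - p))  # standardized (Pearson) residuals
flagged = np.abs(zresid) > 2             # SPSS: /CASEWISE OUTLIER(2)
```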

22
9. Outliers
Residuals and influence statistics
23
9. Outliers
Plot of residuals against probabilities
(classplot)
24
9. Outliers
[Classification plot (/CLASSPLOT) omitted: histogram of predicted
probabilities (0 to 1) with observed group membership (0/1) as plot
symbols; the cut value is .50 and each symbol represents 1 case.]
25
9. Influential Outliers?
  • FREQUENCIES
  • VARIABLES=COO_1 LEV_1 DFB0_1 DFB1_1 DFB2_1
    DFB3_1 DFB4_1 DFB5_1 DFB6_1
    DFB7_1
  • /STATISTICS=STDDEV MINIMUM MAXIMUM MEAN
    SKEWNESS SESKEW
  • /HISTOGRAM
  • /ORDER=ANALYSIS .

Crit. val. (Cook's distance): < 1
Crit. val. (leverage): 3(k+1)/n = .127
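The second cut-off is the usual leverage rule of thumb 3(k+1)/n. Assuming k = 7 predictors and n = 189 cases (values inferred to reproduce the slide's .127, since the exact n is not stated):

```python
k = 7     # number of predictors (assumed)
n = 189   # number of cases (assumed; "nearly 200" births)
crit_leverage = 3 * (k + 1) / n
print(round(crit_leverage, 3))   # 0.127, as on the slide
```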
26
9. Influential Outliers
Crit. val. < 1

27
9. Influential Outliers
28
9. Influential Outliers
29
10. Variance homogeneity
  • GRAPH
  • /SCATTERPLOT(BIVAR)=PRE_1 WITH ZRE_1
  • /MISSING=LISTWISE .

30
11. Interpretation of chosen model
31
11. Interpreting the coefficients
  • In multiple regression
  • y = c0 + c1·x1 + c2·x2
  • the coefficient c1 represents the relation between
    x1 and y, irrespective of the value of x2.
  • In logistic regression this is no longer the case
  • → In logistic regression, the size of the
    effect depends on the values of the other
    independent variables.
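This point can be illustrated numerically. With hypothetical coefficients, the same one-unit change in x1 shifts the probability much more when x2 holds the case near p = 0.5 than when it pushes it near p = 1, while the odds ratio exp(c1) stays constant:

```python
import math

def prob(x1, x2, c0=-1.0, c1=0.8, c2=1.5):
    # hypothetical logistic coefficients, for illustration only
    return 1 / (1 + math.exp(-(c0 + c1 * x1 + c2 * x2)))

# effect of x1 going 0 -> 1, at two different values of x2:
d_low = prob(1, 0) - prob(0, 0)    # x2 = 0: large change in probability
d_high = prob(1, 3) - prob(0, 3)   # x2 = 3: much smaller change

odds = lambda p: p / (1 - p)
or_low = odds(prob(1, 0)) / odds(prob(0, 0))    # both equal exp(0.8)
or_high = odds(prob(1, 3)) / odds(prob(0, 3))
```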

32
  • MULTI LEVEL ANALYSIS

33
Multi-level models
Bayesian hierarchical models, mixed models,
hierarchical linear models, random effects models,
random coefficient models, subject-specific models,
variance component models, variance heterogeneity:
all names for ways of dealing with clustered data.
One solution: the variance component model.
34
Clustered data / multi-level models
  • Pupils within schools
  • (within regions within countries)
  • Firms within regions (or sectors)
  • In the ICOP case more than one observation per
    person

General issue observations are not independent
35
On individual vs aggregate data
For instance:
X = extraversion, Y = school results
X = innovativeness of building, Y = extent to which building is liked
36
Had we only known that the data are clustered!
  • So the effect of X within clusters can be
    different from the effect between clusters!

37
MAIN MESSAGE
  • Make sure that you distinguish between two kinds of
    effects: those at the "micro-level" vs those at
    the aggregate level, and ...
  • ... that you do not test a micro-hypothesis with
    aggregate data (or vice versa)

38
A toy example: two schools, two pupils
Two schools, each with two pupils. We first
calculate the means.
(taken from Rasbash)
Overall mean = (3 + 2 + (-1) + (-4)) / 4 = 0
39
Now the variance
The total variance is the sum of the squared
departures of the observations from the overall mean,
divided by the sample size (4):
(9 + 4 + 1 + 16) / 4 = 7.5
40
The variance of the school means around the
overall mean:
(2.5² + (-2.5)²) / 2 = 6.25
Total variance = 7.5
41
The variance of the pupils' scores around their
school's mean:
((3 - 2.5)² + (2 - 2.5)² + (-1 - (-2.5))² + (-4 - (-2.5))²) / 4 = 1.25
The variance of the school means around the overall
mean: (2.5² + (-2.5)²) / 2 = 6.25
Total variance: 7.5 = 6.25 + 1.25
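The whole toy decomposition can be verified in a few lines of Python, using the same four scores as above:

```python
scores = {"school1": [3, 2], "school2": [-1, -4]}

pupils = [s for vals in scores.values() for s in vals]
overall = sum(pupils) / len(pupils)                       # 0.0

total_var = sum((s - overall) ** 2 for s in pupils) / 4   # 7.5
means = {g: sum(v) / len(v) for g, v in scores.items()}   # 2.5 and -2.5
between = sum((m - overall) ** 2 for m in means.values()) / 2   # 6.25
within = sum((s - means[g]) ** 2
             for g, vals in scores.items() for s in vals) / 4   # 1.25
```

The ratio between / total_var is the share of variance at the school level; with equal group sizes, as here, between and within add up exactly to the total.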
42
→ So you can partition the variance into an
individual level and a school level.
How much of the variability in pupil attainment
is attributable to school-level factors, and how
much to pupil-level factors?
In terms of our toy example we can now say:
6.25/7.5 ≈ 83% of the total variation in pupils'
attainment is attributable to school-level factors.
And this is important: we want to know how to
explain (in this example) school attainment.
1.25/7.5 ≈ 17% of the total variation in pupils'
attainment is attributable to pupil-level factors.
43
Standard multiple regression won't do
So:
  • You can use all the data and just run a multiple
    regression, but then you disregard the clustering
    effect, which gives incorrect confidence intervals.
  • You can aggregate within clusters and then run a
    multiple regression on the aggregate data. Two
    problems: no individual-level testing is possible,
    and you get fewer data points.
  • Or you can run a multi-level analysis.
44
Multi-level models
  • The usual multiple regression model assumes
  • ... with the subscript "i" defined at the
    case-level.
  • ... and the epsilons independently distributed
    with covariance matrix σ²I.
  • And with clustered data, you know these
    assumptions are not met.

45
Solution 1 add dummy-variables per person
  • So just multiple regression, but with as many
    dummy variables as you have persons ...
  • ... where, in this example, there are j + 1
    persons.
  • IF the clustering is (largely) due to differences
    in the intercept between persons, this might
    work.
  • BUT if there are only a handful of cases per
    person, this necessitates a huge number of extra
    variables

46
Solution 2 split your micro-level X-vars
  • Say you have
  • then create
  • and add these as predictors (instead of x1)

Make sure that you understand what is happening
here, and why it is of use.
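What "splitting" x1 means concretely: replace it by the cluster mean plus the within-cluster deviation, and use those two as predictors. A small Python sketch with made-up clustered data:

```python
group = ["A", "A", "B", "B"]    # hypothetical cluster membership
x1 = [3.0, 2.0, -1.0, -4.0]     # hypothetical micro-level predictor

means = {g: sum(v for g2, v in zip(group, x1) if g2 == g) / group.count(g)
         for g in set(group)}
x1_between = [means[g] for g in group]                  # cluster-mean part
x1_within = [v - means[g] for g, v in zip(group, x1)]   # deviation part
```

The two new variables reconstruct x1 exactly (between + within = x1), but they let the between-cluster and within-cluster effects take different coefficients.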
47
Solution 3 the variance component model
In the variance component model, we split the
randomness in a "personal part" and a "rest part"
48
Now how do you do this in SPSS?
  • Somewhat problematic: not available for a binary
    Y-variable.
  • <See SPSS demo>

49
What we skipped
  • Assumption checking: for instance, is the
    delta-variable normally distributed?
  • run the analysis with dummies per group, check
    the set of coefficients, and compare whether these
    follow a normal distribution
  • What do we do when we have more than 2 levels?
  • What do we do when Y is binary? (in our case run
    MIXED anyway. It's wrong but it will do.)
  • Random coefficients?

50
When you have multi-level data (2 levels)
  • If applicable consider whether using separate
    dummies per group might help
  • Run an empty mixed model (i.e., just the constant
    included) in SPSS. Look at the level on which
    most of the variance resides.
  • If applicable divide micro-variables in "group
    mean" variables and "difference from group mean"
    variables.
  • Re-run your mixed model with these variables
    included (as you would a multiple regression
    analysis)

51
To Do
  • Have a look at your questions one last time. Uwe
    and I will create the survey in the coming week
    (there were still too many issues raised to start
    creating it now).
  • Train multi-level analysis with the data
    available on the Moodle site: start practising
    and/or brushing up your SPSS skills (once again)