Title: ANALYZING CATEGORICAL DATA WITH GRAPHICAL MODELS: A SOCIAL SCIENCE APPLICATION
- Renáta Németh (1), Tamás Rudas (2), Wicher Bergsma (3)
- (1) nmthrnt_at_freemail.hu. Department of Statistics, Faculty of Social Sciences, Eötvös Loránd University
- (2) rudas_at_tarki.hu. Department of Statistics, Faculty of Social Sciences, Eötvös Loránd University
- (3) W.P.Bergsma_at_lse.ac.uk. Department of Statistics, London School of Economics and Political Science
- SMABS/EAM Conference, 5 July 2006, Budapest
2 Overview
- 1. Marginal models
- 2. Graphical models
- 3. Graphical models for categorical data
- 4. Application: the classic Duncan model and its modified version (1992, Czechoslovakia, Hungary, USA)
3 Marginal models
- They impose restrictions on the marginal distributions of the contingency table
- They can be parameterized by marginal log-linear parameters (Bergsma, Rudas, 2002)
- Example: the value of the parameter pertaining to the EI effect depends on the marginal from which it is computed (see Simpson's paradox)
- F: father's occupation, E: educational background, O: occupation, I: income
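The dependence of the EI association on the marginal can be illustrated with a small sketch. The 2x2x2 table below is hypothetical (it is not the ISSP data) and is built so that E and I are independent given O. The log odds ratio is used as the association measure; for binary variables under effect coding, the EI log-linear interaction is proportional to it.

```python
import math
from itertools import product

# Hypothetical probabilities (illustration only, not the ISSP data).
p_o = {0: 0.5, 1: 0.5}            # P(O = o)
p_e1_given_o = {0: 0.2, 1: 0.8}   # P(E = 1 | O = o)
p_i1_given_o = {0: 0.2, 1: 0.8}   # P(I = 1 | O = o)

def bern(p_one, x):
    """P(X = x) for a binary X with P(X = 1) = p_one."""
    return p_one if x == 1 else 1.0 - p_one

# Joint EOI table, cells indexed (e, o, i); E and I are independent given O.
joint = {
    (e, o, i): p_o[o] * bern(p_e1_given_o[o], e) * bern(p_i1_given_o[o], i)
    for e, o, i in product((0, 1), repeat=3)
}

def log_odds_ratio(t):
    """Log odds ratio of a 2x2 table t[(row, col)]."""
    return math.log(t[(0, 0)] * t[(1, 1)] / (t[(0, 1)] * t[(1, 0)]))

# EI association computed within each level of O (from the EOI table).
conditional = {
    o: log_odds_ratio({(e, i): joint[(e, o, i)] for e in (0, 1) for i in (0, 1)})
    for o in (0, 1)
}

# EI association computed in the EI marginal (O summed out).
ei_marginal = {
    (e, i): sum(joint[(e, o, i)] for o in (0, 1))
    for e in (0, 1) for i in (0, 1)
}
marginal = log_odds_ratio(ei_marginal)

print("conditional EI log odds ratios:", conditional)
print("marginal EI log odds ratio:", marginal)
```

In this constructed table the conditional association is zero at both levels of O, while the association in the EI marginal is clearly positive, so the "EI effect" is only well defined once the marginal is named.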
4 Parameterizing marginal models
- There are many different ways to parameterize a given table using marginal log-linear parameters.
- Two parameterizations for the EOI joint distribution [figure not recovered]
- Consider the model $E \perp I \mid O$ (Education and Income are conditionally independent given Occupation).
- It holds if and only if $\lambda^{EOI}_{EI} = 0$, where the subscript names the effect and the superscript the marginal in which it is computed.
- General requirements for a parameterization:
  - Independence models should be identified by certain parameters being zero
  - A distribution should exist when some of the parameters are linearly restricted
  - Parameters should be easily interpretable
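A minimal sketch of the first requirement, under stated assumptions: all three variables are binary, parameters use effect (sum-to-zero) coding, and the table is hypothetical, built so that Education and Income are independent given Occupation. The function `loglinear_parameters` is an illustrative helper, not part of any library. The sketch also assumes that the zero restriction on the EI effect is read as covering every term of the EOI table that contains both E and I.

```python
import math
from itertools import product

# Hypothetical conditional probabilities (illustration only).
p_o1 = 0.4                        # P(O = 1)
p_e1_given_o = {0: 0.3, 1: 0.7}   # P(E = 1 | O = o)
p_i1_given_o = {0: 0.25, 1: 0.6}  # P(I = 1 | O = o)

def bern(p_one, x):
    """P(X = x) for a binary X with P(X = 1) = p_one."""
    return p_one if x == 1 else 1.0 - p_one

# Cells are indexed (e, o, i); E and I are independent given O by construction.
table = {
    (e, o, i): bern(p_o1, o) * bern(p_e1_given_o[o], e) * bern(p_i1_given_o[o], i)
    for e, o, i in product((0, 1), repeat=3)
}

def loglinear_parameters(p, names="EOI"):
    """Effect-coded log-linear parameters of a 2x2x2 table, one per effect."""
    params = {}
    for mask in product((0, 1), repeat=3):
        effect = "".join(n for n, m in zip(names, mask) if m) or "const"
        total = 0.0
        for cell, prob in p.items():
            # Contrast sign: +1 for level 0, -1 for level 1,
            # multiplied over the variables that belong to the effect.
            sign = 1
            for level, m in zip(cell, mask):
                if m and level == 1:
                    sign = -sign
            total += sign * math.log(prob)
        params[effect] = total / 8.0
    return params

params = loglinear_parameters(table)
for effect, value in params.items():
    print(f"{effect:>5}: {value: .4f}")
```

The EI and EOI terms come out as zero (up to rounding), while the EO and OI terms do not, so the conditional independence is visible directly in the parameter values.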
5 Graphical models (see e.g. Lauritzen 1996)
- Applications in statistical physics, genetics, artificial intelligence...
- Graph: node = variable, edge/arrow = undirected/directed association. Unconnected nodes are assumed to be conditionally independent.
- Modularity: models are built by combining simpler parts
- Complex systems can be easily visualized and interpreted
6 Conditional independences in graphical models I
- Directed acyclic graph (DAG): a variable V is conditionally independent of its nondescendants, given its parents.
- [Figure: example DAGs. Recoverable node labels: Father's occupation, Education, Occupation, Income; "... of roofs", "... of birth", "... of storks".]
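The DAG rule can be sketched in a few lines. The graph below is hypothetical and chosen only for illustration (F: father's occupation, E: education, O: occupation, I: income); the helper names are likewise illustrative. For each variable, the code lists the nondescendants that are not parents, which are the variables it is independent of given its parents.

```python
# Hypothetical DAG for illustration, stored as child -> list of parents.
parents = {
    "F": [],
    "E": ["F"],
    "O": ["F", "E"],
    "I": ["E", "O"],
}

def descendants(node, parents):
    """All nodes reachable from `node` by following arrows."""
    children = {v: [c for c, ps in parents.items() if v in ps] for v in parents}
    found, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def local_markov_statements(parents):
    """Triples (V, independent_of, given) implied by the DAG rule."""
    statements = []
    for v in parents:
        rest = set(parents) - {v} - descendants(v, parents) - set(parents[v])
        if rest:  # skip variables with no nontrivial statement
            statements.append((v, sorted(rest), sorted(parents[v])))
    return statements

for v, indep, given in local_markov_statements(parents):
    print(f"{v} _||_ {', '.join(indep)} | {', '.join(given)}")
```

For this graph the only nontrivial statement is that I is independent of F given E and O, because every other variable has all of its nondescendants as parents.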
7 Conditional independences in graphical models II
- Chain graph model: the variables are divided into ordered groups; variables in a group are responses to variables in earlier groups. Associations within a group are represented by edges; arrows point from the explanatory variables to the responses.
- A missing edge between two variables means conditional independence of the two, given all other variables in their group and in all earlier groups.
- A missing arrow means conditional independence of the explanatory variable and the response, given all other variables in the group of the response and in all earlier groups.
- Example: determinants of health from two-wave panel data. Groups, in order:
  - Socioeconomic (wave 1)
  - Health behaviour (wave 1): smoking, alcohol consumption, physical activity
  - Health (wave 1): coronary heart disease, high blood pressure
  - Health (wave 2): coronary heart disease, high blood pressure
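The two rules for missing edges and missing arrows reduce to one computation of a conditioning set, sketched below. The group contents follow the health example; `"SES"` is a hypothetical placeholder for the socioeconomic variables, which are not listed, and the short variable names and helper functions are illustrative.

```python
# Ordered groups of a chain graph, earliest first.
# "SES" is a hypothetical stand-in for the unlisted socioeconomic variables.
blocks = [
    ["SES"],
    ["Smoking", "Alcohol", "Activity"],
    ["CHD_1", "HBP_1"],
    ["CHD_2", "HBP_2"],
]

def block_index(var, blocks):
    """Position of the group that contains `var`."""
    for k, block in enumerate(blocks):
        if var in block:
            return k
    raise ValueError(f"unknown variable: {var}")

def conditioning_set(a, b, blocks):
    """Variables to condition on when the edge or arrow between a and b is missing.

    The later of the two variables is treated as the response: the set holds
    all other variables in its group and in all earlier groups.
    """
    k = max(block_index(a, blocks), block_index(b, blocks))
    pool = [v for block in blocks[: k + 1] for v in block]
    return [v for v in pool if v not in (a, b)]

# Missing edge within the health behaviour group.
print(conditioning_set("Smoking", "Alcohol", blocks))
# Missing arrow from a wave 1 behaviour to a wave 2 health outcome.
print(conditioning_set("Smoking", "CHD_2", blocks))
```

A missing edge between smoking and alcohol consumption conditions on the socioeconomic group and physical activity only, whereas a missing arrow from smoking to wave 2 heart disease conditions on everything else up to and including the wave 2 group.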
8 Categorical data: graphical models as marginal models
- To parameterize DAG or chain graph models, marginal log-linear parameters can be used. The parameterization can be chosen so that ordered decomposability and hierarchicity hold (Rudas, Bergsma, 2004; Rudas, Bergsma, Németh, 2006a, 2006b), which implies the following (Bergsma, Rudas, 2002):
  - the model can be defined by setting some of the parameters to zero; the distributions within the model are parameterized by the remaining unrestricted parameters,
  - the parameters are variation independent (hence they are easily interpretable, and any assignment of values to them defines a non-empty model),
  - the model has standard asymptotic behavior (maximum likelihood estimates for parameters, likelihood ratio statistic for testing models).
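9 Application, Model 1
- Status Attainment Model (Duncan et al., 1968)
- Fe: father's education
- Fo: father's occupation
- E: education
- O: occupation
- I: income
- Conditional independences: $O \perp Fe \mid E, Fo$ and $I \perp Fe, Fo \mid E, O$.
- Zero parameters (effect as subscript, marginal as superscript): $\lambda^{FeFoEO}_{FeO}$, $\lambda^{FeFoEOI}_{FeI}$, $\lambda^{FeFoEOI}_{FoI}$.
- Free parameters: $\lambda^{FeFo}$, $\lambda^{FeFoE}_{E}$, $\lambda^{FeFoEO}_{O}$, $\lambda^{FeFoEO}_{EO}$, $\lambda^{FeFoEO}_{FoO}$, $\lambda^{FeFoEO}_{FoEO}$, $\lambda^{FeFoEOI}_{I}$, $\lambda^{FeFoEOI}_{EI}$, $\lambda^{FeFoEOI}_{OI}$, $\lambda^{FeFoEOI}_{EOI}$.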
10 Application, Model 2
- The modified Status Attainment Model, a chain graph model (Boguszak et al., 1990)
- Conditional independences: $I \perp Fe, Fo \mid E, O$.
- Zero parameters: $\lambda^{FeFoEOI}_{FeI}$, $\lambda^{FeFoEOI}_{FoI}$.
- Free parameters: $\lambda^{FeFo}$, $\lambda^{FeFoEO}$, $\lambda^{FeFoEOI}_{I}$, $\lambda^{FeFoEOI}_{EI}$, $\lambda^{FeFoEOI}_{OI}$, $\lambda^{FeFoEOI}_{EOI}$.
- Zero parameters of Model 2 are also zero parameters of Model 1, so Model 1 is nested in Model 2.
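The degrees of freedom of the two models can be cross-checked by counting the parameters set to zero. The sketch below rests on one assumption that the slides do not state explicitly: each zero restriction is read hierarchically, so every effect in the named marginal that contains the listed pair of variables is set to zero. The numbers of categories are those given for the variables in the application (two for Fe, E and I; three for Fo and O); the helper `count_zero_parameters` is illustrative.

```python
from itertools import combinations
from math import prod

# Number of categories per variable in the application.
levels = {"Fe": 2, "Fo": 3, "E": 2, "O": 3, "I": 2}

def count_zero_parameters(marginal, generators):
    """Count parameters of effects in `marginal` that contain any generator."""
    total = 0
    for size in range(1, len(marginal) + 1):
        for effect in combinations(marginal, size):
            if any(set(g) <= set(effect) for g in generators):
                # An effect over variables with k_1, ..., k_m categories
                # carries (k_1 - 1) * ... * (k_m - 1) free parameters.
                total += prod(levels[v] - 1 for v in effect)
    return total

m4 = ("Fe", "Fo", "E", "O")
m5 = ("Fe", "Fo", "E", "O", "I")

# Model 2: FeI and FoI effects vanish in the full table.
model2_df = count_zero_parameters(m5, [("Fe", "I"), ("Fo", "I")])
# Model 1: additionally, FeO effects vanish in the FeFoEO marginal.
model1_df = model2_df + count_zero_parameters(m4, [("Fe", "O")])

print(model1_df, model2_df)  # 42 30
```

Under the hierarchical reading the counts are 42 and 30, which agree with the degrees of freedom reported for Model 1 and Model 2, and their difference of 12 is the number of extra restrictions in Model 1.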
11 Application: comparing Model 2 to Model 1
- Data for Czechoslovakia, Hungary, and the USA, from the International Social Survey Programme (ISSP) 1992.
- Variable categories:
  - E and Fe: 1 = below higher education, 2 = higher education
  - I: 1 = below the country-specific sample median income, 2 = above the median
  - O and Fo: 1 = lower class, 2 = middle class, 3 = upper class
- Results:

                         Model 1   Model 2   Model 2 vs. Model 1
  df                     42        30        12
  USA             L²     25.6      14.7      25.6 - 14.7 = 10.9
                  p      0.978     0.991     0.538
  Hungary         L²     38.1      23.3      38.1 - 23.3 = 14.8
                  p      0.643     0.802     0.253
  Czechoslovakia  L²     38.8      25.5      38.8 - 25.5 = 13.3
                  p      0.614     0.700     0.348
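The p-values in the last column can be recomputed from the differences in the likelihood ratio statistics. The sketch below assumes the usual chi-square reference distribution with 42 - 30 = 12 degrees of freedom, and uses the closed-form upper tail that exists for even degrees of freedom so that no external library is needed. The function name `chi2_sf_even_df` is illustrative.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid for even df only."""
    if df % 2 != 0:
        raise ValueError("the closed form used here requires even df")
    half = x / 2.0
    # For df = 2k: P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

# Likelihood ratio statistics for (Model 1, Model 2), copied from the results.
lr = {
    "USA": (25.6, 14.7),
    "Hungary": (38.1, 23.3),
    "Czechoslovakia": (38.8, 25.5),
}
df_model1, df_model2 = 42, 30

p_diff = {}
for country, (m1, m2) in lr.items():
    p_diff[country] = chi2_sf_even_df(m1 - m2, df_model1 - df_model2)
    print(f"{country}: difference = {m1 - m2:.1f}, p = {p_diff[country]:.3f}")
```

None of the three differences is significant at conventional levels, so the additional restrictions of Model 1 are not rejected in any of the three countries.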
12 Application
- Model 1, estimates for the $\lambda$ parameters. From top to bottom: USA, Hungary, Czechoslovakia.
13 References
- Bergsma, W., Rudas, T. (2002). Marginal models for categorical data. The Annals of Statistics, 30(1), 140-159.
- Boguszak, M., Gabal, I., Matějů, P. (1990). Ke koncepcím vývoje sociální struktury v ČSSR. Sociologický časopis, 26(3), 168-186.
- Duncan, O. D., Featherman, D. L., Duncan, B. (1968). Socioeconomic Background and Occupational Achievement: Extensions of a Basic Model. Washington, D.C.: U.S. Department of Health, Education, and Welfare, Office of Education, Bureau of Research.
- International Social Survey Programme (ISSP) (1992). Social Inequality II. Zentralarchiv für Empirische Sozialforschung, Köln.
- Lauritzen, S. L. (1996). Graphical Models. Oxford University Press.
- Rudas, T., Bergsma, W. (2004). On application of marginal models for categorical data. Metron, 62(1), 1-23.
- Rudas, T., Bergsma, W., Németh, R. (2006a). Parameterization and estimation of path models for categorical data. Proceedings of the IASC 17th Compstat Symposium, 2006, Rome.
- Rudas, T., Bergsma, W., Németh, R. (2006b). Graphical and path models for categorical variables. (manuscript)