ANALYZING CATEGORICAL DATA WITH GRAPHICAL MODELS: A SOCIAL SCIENCE APPLICATION

1
ANALYZING CATEGORICAL DATA WITH GRAPHICAL MODELS
- A SOCIAL SCIENCE APPLICATION -
  • Renáta Németh 1, Tamás Rudas 2, Wicher Bergsma 3
  • 1 nmthrnt_at_freemail.hu. Department of Statistics,
    Faculty of Social Sciences, Eötvös Loránd
    University
  • 2 rudas_at_tarki.hu. Department of Statistics,
    Faculty of Social Sciences, Eötvös Loránd
    University,
  • 3 W.P.Bergsma_at_lse.ac.uk. Department of
    Statistics, London School of Economics and
    Political Science
  • SMABS/EAM Conference
  • 5 July 2006, Budapest

2
Overview
  • 1. Marginal models
  • 2. Graphical models
  • 3. Graphical models for categorical data
  • 4. Application: the classic Duncan model and its
    modified version (1992, Czechoslovakia, Hungary,
    USA)

3
Marginal models
  • They impose restrictions on the marginal
    distributions of the contingency table
  • Their parameterization can be obtained by
    marginal loglinear parameters (Bergsma, Rudas,
    2002)
  • Example: the value of the parameter pertaining to
    the EI effect depends on the marginal from which
    it is computed (see Simpson's paradox)
  • F: father's occupation, E: educational
    background, O: occupation, I: income
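
The dependence of the EI effect on the chosen marginal can be illustrated with a small sketch. The counts below are hypothetical (they are not from the presentation) and all three variables are assumed to be binary. The log odds ratio stands in for the EI association: it is positive within each level of O, yet negative in the EI marginal obtained by summing over O.

```python
import math

# counts[o][e][i]: HYPOTHETICAL frequencies, chosen only to produce a reversal.
COUNTS = [
    [[36, 234], [6, 81]],    # O = 0: rows are E = 0, 1; columns are I = 0, 1
    [[25, 55], [71, 192]],   # O = 1
]


def log_odds_ratio(table):
    """Log odds ratio of a 2x2 table given as [[n00, n01], [n10, n11]]."""
    return math.log(table[0][0] * table[1][1] / (table[0][1] * table[1][0]))


# EI association computed within each level of O (i.e. from the EOI table).
conditional = [log_odds_ratio(table) for table in COUNTS]

# EI association computed from the EI marginal (O summed out).
marginal_table = [
    [sum(COUNTS[o][e][i] for o in range(2)) for i in range(2)]
    for e in range(2)
]
marginal = log_odds_ratio(marginal_table)

print("conditional log odds ratios:", conditional)
print("marginal log odds ratio:", marginal)
```

The sign change between the two computations is the Simpson paradox mentioned above.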

4
Parameterizing marginal models
  • There are many different ways to parameterize a
    given table using marginal log-linear
    parameters, e.g. two parameterizations for the
    EOI joint distribution
  • Consider the model $E \perp I \mid O$ (Education is
    conditionally independent of Income, given
    Occupation).
  • It holds if and only if $\lambda^{EOI}_{EI} = 0$.
  • General requirements for a parameterization:
  • Independence models should be identified by
    certain parameters being zero
  • A distribution should exist when some of the
    parameters are linearly restricted
  • Parameters should be easily interpretable
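
The first requirement can be checked numerically in a minimal sketch. It assumes binary variables, effect coding (+1/-1), and a hypothetical distribution constructed so that Education and Income are independent given Occupation; none of the numbers come from the presentation. It also assumes that the single parameter on the slide stands for every term containing EI in the EOI table, i.e. both the EI and the EOI interaction.

```python
import math
from itertools import product

# HYPOTHETICAL binary distribution with E independent of I given O.
P_O = [0.5, 0.5]            # P(O = o)
P_E1_GIVEN_O = [0.2, 0.8]   # P(E = 1 | O = o)
P_I1_GIVEN_O = [0.3, 0.7]   # P(I = 1 | O = o)


def bern(p1, x):
    """Probability of x under a Bernoulli distribution with P(1) = p1."""
    return p1 if x == 1 else 1.0 - p1


# Cells are indexed as (e, o, i); p(e, o, i) = P(o) P(e | o) P(i | o).
joint = {
    (e, o, i): P_O[o] * bern(P_E1_GIVEN_O[o], e) * bern(P_I1_GIVEN_O[o], i)
    for e, o, i in product((0, 1), repeat=3)
}


def marginalize(table, keep):
    """Sum the table over all positions not listed in `keep`."""
    out = {}
    for cell, p in table.items():
        key = tuple(cell[k] for k in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def loglinear_term(table, effect):
    """Effect-coded log-linear parameter of a table of binary variables.

    `effect` lists the positions of the variables in the interaction.
    """
    total = 0.0
    for cell, p in table.items():
        sign = 1
        for k in effect:
            sign *= 1 if cell[k] == 1 else -1
        total += sign * math.log(p)
    return total / len(table)


lam_EI_in_EOI = loglinear_term(joint, effect=(0, 2))
lam_EOI_in_EOI = loglinear_term(joint, effect=(0, 1, 2))
lam_EI_in_EI = loglinear_term(marginalize(joint, keep=(0, 2)), effect=(0, 1))

print("EI term in the EOI table:", lam_EI_in_EOI)
print("EOI term in the EOI table:", lam_EOI_in_EOI)
print("EI term in the EI marginal:", lam_EI_in_EI)
```

Both terms computed in the EOI table vanish up to rounding error, while the EI term computed in the EI marginal does not, so the independence is identified by zeros only when the parameters are taken from the appropriate marginal.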

5
Graphical models (see e.g. Lauritzen, 1996)
  • Applications in statistical physics, genetics,
    artificial intelligence...
  • Graph: node = variable, edge/arrow =
    undirected/directed association. Unconnected
    nodes are assumed to be conditionally
    independent.
  • Modularity: models are built by combining simpler parts
  • Complex systems can be easily visualized and
    interpreted

6
Conditional independences in graphical models I
  • Directed acyclic graph (DAG): a variable V is
    conditionally independent of its nondescendants,
    given its parents.
  • [Figure: two example DAGs. Nodes of the first:
    Father's occupation, Education, Occupation,
    Income. Nodes of the second: # of roofs,
    # of births, # of storks.]

7
Conditional independences in graphical models II
  • Chain graph model: the variables are divided into
    ordered groups; variables in a group are
    responses to variables in earlier groups.
    Associations within the groups are represented by
    edges, and arrows point from the explanatory
    variables to the responses.
  • A missing edge between two variables means
    conditional independence of the two, given all other
    variables in the group and in all earlier groups.
  • A missing arrow means conditional independence of
    the explanatory variable and the response, given
    all other variables in the group of the response
    and in all earlier groups.
  • Example: determinants of health from two-wave
    panel data, with the variable groups in order
  • Socioeconomic (wave 1): Income, Marital status
  • Health behaviour (wave 1): Smoking, Alcohol
    consumption, Physical activity
  • Health (wave 1): Coronary heart disease, High
    blood pressure
  • Health (wave 2): Coronary heart disease, High
    blood pressure
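
The two rules for missing edges and missing arrows lead to the same recipe for the conditioning set, which can be sketched as follows. The assignment of variables to groups is an assumption read off the example above, and the function name `conditioning_set` is hypothetical. Which edges or arrows are actually missing in the example graph is not recoverable from the transcript, so the pair used below is purely illustrative.

```python
# Ordered groups of the chain graph (ASSUMED grouping of the health example).
GROUPS = [
    ["Income", "Marital status"],
    ["Smoking", "Alcohol consumption", "Physical activity"],
    ["CHD (wave 1)", "High blood pressure (wave 1)"],
    ["CHD (wave 2)", "High blood pressure (wave 2)"],
]


def conditioning_set(groups, a, b):
    """Variables to condition on when the edge or arrow between a and b is missing.

    The set consists of all other variables in the later of the two groups
    (the group of the response) and in all earlier groups.
    """
    group_of = {v: g for g, block in enumerate(groups) for v in block}
    last = max(group_of[a], group_of[b])
    pool = [v for block in groups[: last + 1] for v in block]
    return [v for v in pool if v not in (a, b)]


# Illustrative only: a missing arrow from Smoking to CHD (wave 1).
given = conditioning_set(GROUPS, "Smoking", "CHD (wave 1)")
print("Smoking independent of CHD (wave 1) given:", given)
```

For a missing edge within one group, both variables belong to the same group, and the function returns the rest of that group together with all earlier groups.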

8
Categorical data: graphical models as marginal
models
  • To obtain a parameterization of DAG or chain graph
    models, marginal log-linear parameters can be
    used. The parameterization can be chosen in such a
    way that ordered decomposability and hierarchicity
    hold (Rudas, Bergsma, 2004; Rudas, Bergsma,
    Németh, 2006a, 2006b), which implies the
    following (Bergsma, Rudas, 2002):
  • the model can be defined by setting the values of
    some of the parameters to zero, and the distributions
    within the model are parameterized by the
    remaining unrestricted parameters,
  • the parameters are variation independent (hence
    they are easily interpretable and any assignment of
    values to them defines a non-empty model),
  • the model has standard asymptotic behavior (maximum
    likelihood estimates for parameters, LR statistic
    for testing models).

9
Application, Model 1
  • Status Attainment Model
  • (Duncan et al., 1968)
  • Fe: father's education
  • Fo: father's occupation
  • E: education
  • O: occupation
  • I: income

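Conditional independences: $O \perp Fe \mid E, Fo$; $I \perp Fe, Fo \mid E, O$.
Zero parameters: $\lambda^{FeFoEO}_{FeO}$, $\lambda^{FeFoEOI}_{FeI}$, $\lambda^{FeFoEOI}_{FoI}$.
Free parameters: $\lambda^{FeFo}$, $\lambda^{FeFoE}_{E}$, $\lambda^{FeFoEO}_{O}$, $\lambda^{FeFoEO}_{EO}$, $\lambda^{FeFoEO}_{FoO}$, $\lambda^{FeFoEO}_{FoEO}$, $\lambda^{FeFoEOI}_{I}$, $\lambda^{FeFoEOI}_{EI}$, $\lambda^{FeFoEOI}_{OI}$, $\lambda^{FeFoEOI}_{EOI}$.
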
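
The two conditional independences stated for Model 1 can be derived from the DAG rule (a variable is independent of its nondescendants, given its parents) with a short sketch. The parent sets below are an assumption: the transcript does not show the graph, so the arrows were chosen to be consistent with the stated independences, and the orientation Fe -> Fo is arbitrary, picked only to make the graph acyclic. The function names are hypothetical.

```python
# ASSUMED parent sets for the Model 1 DAG (not shown in the transcript).
PARENTS = {
    "Fe": [],
    "Fo": ["Fe"],        # arbitrary orientation of the Fe-Fo association
    "E": ["Fe", "Fo"],
    "O": ["Fo", "E"],
    "I": ["E", "O"],
}


def descendants(children, node):
    """All nodes reachable from `node` by following arrows."""
    seen = set()
    stack = list(children[node])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(children[v])
    return seen


def local_markov(parents):
    """Map each node to (independent set, conditioning set), skipping empty statements."""
    nodes = set(parents)
    children = {v: set() for v in nodes}
    for child, pas in parents.items():
        for p in pas:
            children[p].add(child)
    statements = {}
    for v in nodes:
        nondescendants = nodes - descendants(children, v) - {v}
        rest = nondescendants - set(parents[v])
        if rest:
            statements[v] = (frozenset(rest), frozenset(parents[v]))
    return statements


statements = local_markov(PARENTS)
for node in sorted(statements):
    independent, given = statements[node]
    print(node, "independent of", sorted(independent), "given", sorted(given))
```

Under these assumed arrows, only O and I yield non-trivial statements, matching the two independences listed for Model 1.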
10
Application, Model 2
  • The modified Status Attainment Model
  • a chain graph model
  • (Boguszak et al, 1990)

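Conditional independences: $I \perp Fe, Fo \mid E, O$.
Zero parameters: $\lambda^{FeFoEOI}_{FeI}$, $\lambda^{FeFoEOI}_{FoI}$.
Free parameters: $\lambda^{FeFo}$, $\lambda^{FeFoEO}$, $\lambda^{FeFoEOI}_{I}$, $\lambda^{FeFoEOI}_{EI}$, $\lambda^{FeFoEOI}_{OI}$, $\lambda^{FeFoEOI}_{EOI}$.
The zero parameters of Model 2 are also zero parameters of Model 1, so Model 1 is nested in Model 2.
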
11
Application: comparing Model 2 to Model 1
  • Data for Czechoslovakia, Hungary, and the USA,
    from the International Social Survey Programme
    (ISSP) 1992.
  • Variable categories
  • E and Fe: 1 - below higher education, 2 - higher
    education
  • I: 1 - below country-specific sample median
    income, 2 - above the median
  • O and Fo: 1 - lower class, 2 - middle class, 3 -
    upper class
  • Results (L²: likelihood ratio statistic)

                          Model 1   Model 2   Model 2 to Model 1
    df                    42        30        12
    USA             L²    25.6      14.7      25.6 - 14.7 = 10.9
                    p     .978      .991      .538
    Hungary         L²    38.1      23.3      38.1 - 23.3 = 14.8
                    p     .643      .802      .253
    Czechoslovakia  L²    38.8      25.5      38.8 - 25.5 = 13.3
                    p     .614      .700      .348
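
The degrees of freedom and the p-values of the model comparison can be recomputed with a short sketch. It assumes that the degrees of freedom of the conditional independence statements add up, which reproduces the values reported above, and it uses the closed-form chi-square tail probability that is valid for even degrees of freedom only. The function names are hypothetical.

```python
import math

# Number of categories per variable, as listed under "Variable categories".
LEVELS = {"Fe": 2, "Fo": 3, "E": 2, "O": 3, "I": 2}


def n_cells(variables):
    """Number of joint categories of a list of variables."""
    n = 1
    for v in variables:
        n *= LEVELS[v]
    return n


def ci_df(a, b, given):
    """Degrees of freedom of 'a independent of b given `given`'."""
    return (n_cells(a) - 1) * (n_cells(b) - 1) * n_cells(given)


def chi2_sf_even_df(x, df):
    """Upper tail probability of a chi-square variable; EVEN df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


df_model2 = ci_df(["I"], ["Fe", "Fo"], ["E", "O"])   # I indep. of Fe, Fo given E, O
df_diff = ci_df(["O"], ["Fe"], ["E", "Fo"])          # O indep. of Fe given E, Fo
df_model1 = df_model2 + df_diff

# Likelihood ratio statistics (Model 1, Model 2) from the results table.
L2 = {
    "USA": (25.6, 14.7),
    "Hungary": (38.1, 23.3),
    "Czechoslovakia": (38.8, 25.5),
}
p_diff = {
    country: chi2_sf_even_df(m1 - m2, df_diff)
    for country, (m1, m2) in L2.items()
}

print("df:", df_model1, df_model2, df_diff)
for country, p in p_diff.items():
    print(country, round(p, 3))
```

None of the three differences is significant at conventional levels, so the additional restriction of Model 1 is not rejected in any of the three countries.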

12
Application
Model 1, estimates for the $\lambda$ parameters. From top to
bottom: USA, Hungary, Czechoslovakia.

13
References
  • Bergsma, W., Rudas, T. (2002) Marginal models
    for categorical data. The Annals of Statistics,
    30(1), 140-159.
  • Boguszak, M., Gabal, I., Matějů, P. (1990) Ke
    koncepcím vývoje sociální struktury v ČSSR.
    Sociologický časopis, 26(3), 168-186.
  • International Social Survey Programme (ISSP):
    Social Inequality II, 1992. Zentralarchiv für
    Empirische Sozialforschung, Köln.
  • Lauritzen, S. L. (1996) Graphical Models. Oxford
    University Press.
  • Rudas, T., Bergsma, W., Németh, R. (2006a)
    Parameterization and estimation of path models
    for categorical data. Proceedings of the IASC
    17th Compstat Symposium, 2006, Rome.
  • Rudas, T., Bergsma, W., Németh, R. (2006b)
    Graphical and path models for categorical
    variables. (manuscript)
  • Rudas, T., Bergsma, W. (2004) On application of
    marginal models for categorical data. Metron,
    62(1), 1-23.
  • Duncan, O. D., Featherman, D. L., Duncan, B.
    (1968) Socioeconomic Background and Occupational
    Achievement: Extensions of a Basic Model.
    Washington, D.C.: U.S. Department of Health,
    Education, and Welfare, Office of Education,
    Bureau of Research.