Title: Definizione e Attivazione della Rete per il Supporto Metodologico e l'Attività di Ricerca del Dipartim
1 Statistical methods for the integration of data from multiple sources
Rome, 9-10 December 2004
2 Bayesian networks and complex sampling from finite populations
- Marco Ballin, Mauro Scanu
- Istituto Nazionale di Statistica
- Paola Vicard
- Università degli Studi - Roma Tre
Prin 2002 - Unità di Perugia
3 Aim of the methodological project
- To make Bayesian networks available in the framework of complex sampling designs on finite populations
Focus of the talk:
- Definition of Bayesian networks in the case of a stratified sampling design
4 Why BNs?
Bayesian networks (BNs) are a powerful tool for handling problems with many variables.
In which contexts are BNs developed and applied?
- Medicine and biostatistics (diagnosis)
- Forensic statistics (identification)
- Finance (customer segmentation and classification)
- Troubleshooting (decision problems)
- ...
5 Are BNs useful in official statistics?
Some experiences:
- Imputation (Di Zio et al., later this afternoon)
- Statistical matching (Istat working group on SAM, 2004)
- Description of census results (Getoor et al., 2001)
Generally useful characteristics of BNs:
- Straightforward description of the statistical relationships among variables through a graphical representation
- Fast and easy propagation of evidence
- Useful tool for describing possible scenarios
- Simple updating of high-dimensional distributions given auxiliary information
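6 What is a BN?
- Nodes = variables
- Directed edges = directed relationships
- Each node has an associated probability distribution conditional on its parents
Chain rule: $P(X_1,\dots,X_k) = \prod_{i=1}^{k} P(X_i \mid X_1,\dots,X_{i-1}) = \prod_{i=1}^{k} P(X_i \mid \mathrm{parents}(X_i))$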
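The factorisation can be illustrated with a minimal sketch. The three-node network (A -> B, A -> C), the variable names and all probabilities below are hypothetical and chosen only for illustration; they do not come from the talk. The joint distribution is built as a product of each node's distribution given its parents, and evidence on B is then propagated to A by brute-force enumeration, which is feasible only for toy networks.

```python
from itertools import product

# Hypothetical network A -> B, A -> C with binary variables.
# All numbers are illustrative assumptions, not data from the talk.
p_a = {0: 0.7, 1: 0.3}                 # P(A)
p_b_given_a = {0: {0: 0.9, 1: 0.1},    # P(B | A=0)
               1: {0: 0.4, 1: 0.6}}    # P(B | A=1)
p_c_given_a = {0: {0: 0.8, 1: 0.2},    # P(C | A=0)
               1: {0: 0.5, 1: 0.5}}    # P(C | A=1)


def joint(a, b, c):
    """P(A=a, B=b, C=c) as the product of each node given its parents."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c]


# The factorised joint must sum to one over all configurations.
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))


def posterior_a_given_b(b):
    """P(A | B=b) by enumeration: sum out C, then normalise."""
    unnorm = {a: sum(joint(a, b, c) for c in (0, 1)) for a in (0, 1)}
    z = sum(unnorm.values())
    return {a: v / z for a, v in unnorm.items()}


post = posterior_a_given_b(1)
print(round(total, 10), {a: round(p, 4) for a, p in post.items()})
```

Real BN software replaces the enumeration step with more efficient propagation algorithms; the sketch only shows what is being computed.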
7 What is a BN?
A BN is defined by:
- The set of edges (structure)
- The conditional probability distributions (parameters)
Methods and software for BN estimation (structure and parameters) are developed under the i.i.d. assumption.
But...
8 What's the problem with using BNs in complex surveys?
- The sample design is not taken into account by the usual methods and software
Therefore:
- The estimated structure and parameters are not consistent with those obtained through the usual finite population techniques
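A small numerical sketch may help to see the inconsistency. The two strata, their population sizes and the sample values below are hypothetical. The naive estimate pools the sample as if it were i.i.d., while the design-based estimate weights each unit by N_h / n_h, the inverse inclusion probability under simple random sampling within strata.

```python
# Hypothetical stratified sample of a binary variable X.
# Stratum names, population sizes (N) and sample values are assumptions.
strata = {
    "small_farms": {"N": 9000, "sample": [0] * 45 + [1] * 5},   # 50 units, 10% with X=1
    "large_farms": {"N": 1000, "sample": [0] * 20 + [1] * 30},  # 50 units, 60% with X=1
}

# Naive estimate: pool all units and ignore the design (i.i.d. assumption).
pooled = [x for s in strata.values() for x in s["sample"]]
naive = sum(pooled) / len(pooled)

# Design-based estimate: each unit in stratum h carries weight N_h / n_h.
N = sum(s["N"] for s in strata.values())
weighted = sum(
    s["N"] / len(s["sample"]) * sum(s["sample"]) for s in strata.values()
) / N

print(f"naive={naive:.3f}  design-based={weighted:.3f}")
```

Because the second stratum is heavily oversampled in this made-up example, the pooled proportion is far from the design-based one; the same distortion would carry over to conditional probability tables estimated from pooled counts.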
9 What's the problem with using BNs in complex surveys?
Our proposal to overcome the problem:
- Introduce an additional node S describing the sampling design
- Add arrows from S to each variable
- Learn the structure and estimate the parameters conditionally on S
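One way to read this proposal for a stratified design is sketched below, under several assumptions that are not stated on the slide: S is the stratum indicator, P(S = h) is set to the known population share N_h / N, the structure among the survey variables (here a single hypothetical edge X -> Y) is taken as given rather than learned, and parameters are estimated by within-stratum relative frequencies. All counts are invented.

```python
from collections import Counter, defaultdict

# Hypothetical cell counts (stratum, x, y) from SRS within each stratum,
# and hypothetical stratum population sizes.
cell_counts = {
    ("h1", 0, 0): 40, ("h1", 0, 1): 5, ("h1", 1, 0): 2, ("h1", 1, 1): 3,
    ("h2", 0, 0): 10, ("h2", 0, 1): 10, ("h2", 1, 0): 6, ("h2", 1, 1): 24,
}
population_sizes = {"h1": 9000, "h2": 1000}

# P(S = h): fixed by the design (assumed known), not estimated from the sample.
N = sum(population_sizes.values())
p_s = {h: n / N for h, n in population_sizes.items()}

# Sample counts needed for the conditional tables.
n_h, n_hx = Counter(), Counter()
for (h, x, y), c in cell_counts.items():
    n_h[h] += c
    n_hx[h, x] += c

# Parameters estimated conditionally on S (network: S -> X, S -> Y, X -> Y).
p_x_given_s = {(h, x): c / n_h[h] for (h, x), c in n_hx.items()}
p_y_given_xs = {(h, x, y): c / n_hx[h, x] for (h, x, y), c in cell_counts.items()}

# Marginalise S out to obtain a population-level joint for (X, Y).
p_xy = defaultdict(float)
for (h, x, y), p in p_y_given_xs.items():
    p_xy[x, y] += p_s[h] * p_x_given_s[h, x] * p

print({k: round(v, 4) for k, v in sorted(p_xy.items())})
```

With simple random sampling inside each stratum, marginalising over S in this way reproduces the usual stratified estimator of each cell proportion, which is the kind of consistency with finite population techniques that the proposal aims for.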
10 A toy example on real data
- Survey: business survey on farms
- Variables:
- multifunctionality,
- altimetry,
- Internet,
- classes of Gross Operative Margin
- n = 989 (Tuscany)
- Sample design: stratified (9 strata, SRS within strata)
11 Future developments
- Understanding the properties of the BN-based estimator and its analogies with calibration methods
- Developing alternative methods to learn the structure in the case of finite populations and complex sample designs
- Developing software suitable for the finite population context
- Application of BNs in the following contexts:
- Consistency among the results of different surveys
- Integration of different sources
- What happens when the data set is partially observed
- Application to the imputation framework
- Description (choosing contingency tables)