SUDA: A program for Detecting Special Uniques

Provided by: annama51
Learn more at: https://unece.org

1
SUDA: A program for Detecting Special Uniques
  • Mark Elliot, Anna Manning, Ken Mayes, John Gurd,
    Michael Bane
  • University of Manchester
  • Mark.Elliot_at_manchester.ac.uk
  • Cathie Marsh Centre for Census and Survey
    Research, University of Manchester 

2
Overview
  • Principle of Special Uniqueness
  • Basic SUDA algorithms
  • Description of Software
  • Current and Future work

3
Principles of Special Uniqueness
4
Special Uniqueness
  • Definition: A microdata record which is sample
    unique on key variable set K, and which is also
    unique on a proper subset of K.
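The definition can be illustrated with a minimal sketch. This is a brute-force check, not SUDA's own algorithm; the function names, variable names and the four-record sample are all hypothetical, chosen only for illustration.

```python
from itertools import combinations


def is_unique(records, index, variables):
    """True if records[index] is the only record with its values on `variables`."""
    key = tuple(records[index][v] for v in variables)
    matches = sum(1 for r in records if tuple(r[v] for v in variables) == key)
    return matches == 1


def is_special_unique(records, index, key_vars):
    """Sample unique on key_vars AND unique on some non-empty proper subset."""
    if not is_unique(records, index, key_vars):
        return False
    for size in range(1, len(key_vars)):
        for subset in combinations(key_vars, size):
            if is_unique(records, index, subset):
                return True
    return False


# Hypothetical sample; K = (age, sex, mstat).
KEY = ("age", "sex", "mstat")
sample = [
    {"age": "25-44", "sex": "F", "mstat": "married"},  # unique on K only
    {"age": "25-44", "sex": "F", "mstat": "single"},   # unique on mstat alone
    {"age": "25-44", "sex": "M", "mstat": "married"},  # unique on sex alone
    {"age": "16-24", "sex": "F", "mstat": "married"},  # unique on age alone
]

flags = [is_special_unique(sample, i, KEY) for i in range(len(sample))]
print(flags)
```

In this toy sample every record is sample unique on K, but the first record is unique only on the full key, so it is a random unique rather than a special unique.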

5
  • Elliot (2000), Elliot and Manning (2002) and Merrett
    et al. (2004) show that special uniques are rarer
    in the population than non-special (a.k.a. random)
    uniques.

6
Paradigm proposition
Population counts
  A      B      A+B
  C      D      C+D
  A+C    B+D    A+B+C+D
Sample counts
  a      b      a+b
  c      d      c+d
  a+c    b+d    a+b+c+d
Where a = 1: pr(A = 1 | a+b = 1) > pr(A = 1 | a+b > 1)
Related to the neighbourhoods concept of Rinott and Shlomo (2005)
7
Hunt for proof
  • As yet we have no proof for the proposition.
  • Simulation work shows us that:
  • if the variables are not independent then the
    proposition is true;
  • the degree of marginalisation is related to
    pr(A = 1). This is true by induction.
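A simulation of this kind might be sketched as follows. This is not the authors' simulation design: the population model, the parameter values and the function name are all assumptions made for illustration. It builds a population with two dependent variables, draws a sample, and estimates pr(A = 1) separately for sample-unique cells whose row marginal is also 1 and for those whose row marginal exceeds 1.

```python
import random
from collections import Counter


def estimate_proposition(pop_size=5000, n_rows=400, n_cols=5,
                         sample_fraction=0.1, seed=1):
    """Estimate pr(A=1 | a=1, a+b=1) and pr(A=1 | a=1, a+b>1) by simulation."""
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        row = rng.randrange(n_rows)
        # Assumed dependence: the column usually follows from the row.
        col = row % n_cols if rng.random() < 0.7 else rng.randrange(n_cols)
        population.append((row, col))
    sample = rng.sample(population, int(pop_size * sample_fraction))

    pop_cells = Counter(population)             # A for each cell
    samp_cells = Counter(sample)                # a for each cell
    samp_rows = Counter(r for r, _ in sample)   # a + b for each row

    # For each kind: [population uniques, sample uniques]
    tallies = {"marginal_unique": [0, 0], "marginal_not_unique": [0, 0]}
    for cell, a in samp_cells.items():
        if a != 1:
            continue
        kind = "marginal_unique" if samp_rows[cell[0]] == 1 else "marginal_not_unique"
        tallies[kind][1] += 1
        tallies[kind][0] += int(pop_cells[cell] == 1)
    return {k: (hits / total if total else None)
            for k, (hits, total) in tallies.items()}


estimates = estimate_proposition()
print(estimates)
```

Comparing the two estimates over several seeds and dependence strengths gives an informal check of the proposition. It is not a proof.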

8
Basics of SUDA design
9
Design Principle
  • SUDA is designed around the observation that
    'Every superset of a unique attribute set
    (minimal or otherwise) is itself unique'
    (referred to as the Superset Relationship; Elliot
    et al. 2002).

10
The Minimal uniques search
  • The SUDA algorithm searches the lattice of all
    possible uniqueness patterns within the data for
    unique combinations.
  • The lattice can get very big as the number of
    variables increases.
  • Efficiency savings are made through grouping
    records of
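A minimal sketch of a search for minimal sample uniques (MSUs) over the lattice is given below. It is a brute-force, level-by-level enumeration and does not reproduce SUDA's actual search or its efficiency savings. It uses the Superset Relationship only to discard unique sets that contain an already-found MSU. The function name and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def minimal_sample_uniques(records, variables, max_size=None):
    """Return {record index: [MSU, ...]}, each MSU a tuple of variable names."""
    max_size = max_size or len(variables)
    msus = {i: [] for i in range(len(records))}
    # Visit lattice levels from small to large variable sets, so any unique
    # set with no smaller MSU inside it must itself be minimal.
    for size in range(1, max_size + 1):
        for subset in combinations(variables, size):
            keys = [tuple(r[v] for v in subset) for r in records]
            counts = Counter(keys)
            for i, key in enumerate(keys):
                if counts[key] != 1:
                    continue  # record i is not unique on this subset
                # Superset Relationship: a superset of an MSU is unique
                # but not minimal, so skip it.
                if any(set(m) <= set(subset) for m in msus[i]):
                    continue
                msus[i].append(subset)
    return msus


# Hypothetical four-record sample.
sample = [
    {"age": "25-44", "sex": "F", "mstat": "married"},
    {"age": "25-44", "sex": "F", "mstat": "single"},
    {"age": "25-44", "sex": "M", "mstat": "married"},
    {"age": "16-24", "sex": "F", "mstat": "married"},
]
result = minimal_sample_uniques(sample, ("age", "sex", "mstat"))
print(result)
```

The number of subsets visited grows as 2^n in the number of variables, which is why the lattice gets very big and why a practical implementation needs pruning beyond this sketch.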

11
Example lattice
12
The IS score
  • The IS metric is used in subsequent output metrics.
    In essence it corresponds to the proportion of
    the lattice which is unique for a given record.
  • This is a principled construct, but not a
    standard statistical one. It is, though, strongly
    correlated with the underlying risk measure 1/Fj.
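The "proportion of the lattice which is unique" idea can be sketched directly by enumeration. This is an illustration of the idea only, not the IS score formula from the paper. It assumes the lattice is the set of all non-empty subsets of the key variables, and the function name and data are hypothetical.

```python
from itertools import combinations


def proportion_of_lattice_unique(records, index, variables):
    """Fraction of non-empty variable subsets on which records[index] is unique."""
    unique_nodes = 0
    total_nodes = 0
    for size in range(1, len(variables) + 1):
        for subset in combinations(variables, size):
            total_nodes += 1
            key = tuple(records[index][v] for v in subset)
            matches = sum(
                1 for r in records if tuple(r[v] for v in subset) == key
            )
            unique_nodes += int(matches == 1)
    return unique_nodes / total_nodes


# Hypothetical four-record sample with three key variables (7 lattice nodes).
sample = [
    {"age": "25-44", "sex": "F", "mstat": "married"},
    {"age": "25-44", "sex": "F", "mstat": "single"},
    {"age": "25-44", "sex": "M", "mstat": "married"},
    {"age": "16-24", "sex": "F", "mstat": "married"},
]
VARS = ("age", "sex", "mstat")
proportions = [proportion_of_lattice_unique(sample, i, VARS) for i in range(4)]
print(proportions)
```

Here the first record is unique on 1 of 7 nodes (the full key only), while a record with a single-variable MSU is unique on 4 of 7 nodes, since every superset of that MSU is also unique.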

13
Description of Software
15
Record level output
  • IS metric: This is the total IS metric, calculated as
    described in section 2 of the paper.
  • 3) Scoring metric: The 3rd column contains either
    the Proportion of lattice metric or the DIS-SUDA
    metric, depending on which the user asked for.
  • 4->N) MSUs: The sequence of columns after the
    output metrics gives the number of MSUs of each
    size for the record, up to the maximum the user
    specified.
  • N+1 -> N+K) Contribution percentage: The final
    set of columns is headed with the names of
    the variables the user has chosen.
    These columns record the percentage contribution
    of each variable to the total IS metric. This is
    simply the IS metric for the MSUs involving that
    variable over the IS metric for the record.
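The contribution percentage described above can be sketched as below. The per-MSU scores are made-up numbers and the function name is hypothetical. How SUDA actually scores an individual MSU is described in the paper and is not reproduced here.

```python
def contribution_percentages(msu_scores, variables):
    """Percentage of a record's total score carried by MSUs involving each variable.

    msu_scores maps each MSU (a tuple of variable names) to its score.
    """
    total = sum(msu_scores.values())
    if total == 0:
        return {v: 0.0 for v in variables}
    return {
        v: 100.0 * sum(s for msu, s in msu_scores.items() if v in msu) / total
        for v in variables
    }


# Hypothetical record with two MSUs and illustrative scores.
scores = {("age",): 4.0, ("sex", "mstat"): 1.0}
percentages = contribution_percentages(scores, ("age", "sex", "mstat"))
print(percentages)
```

Because an MSU with several variables counts toward each of them, the percentages for one record need not sum to 100.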

16
File level output
  • Example: Attribute contribution
  • col2 att 'age' percentage contribution 88.8954
  • col3 att 'sex' percentage contribution 14.5084
  • col4 att 'mstat' percentage contribution 26.7168
  • col5 att 'econpr' percentage contribution 43.2581
  • col6 att 'residents' percentage contribution 47.5376
  • col7 att 'depchild' percentage contribution 26.2359

17
Example Attribute value contribution output
  • col2 att 'age'0 percentage contribution 0.2813
  • col2 att 'age'1 percentage contribution 0.4001
  • col2 att 'age'2 percentage contribution 0.5090
  • col2 att 'age'3 percentage contribution 0.3256
18
Current and Future work
19
GRID STAD
  • This project aims to Grid-enable SUDA. This
    further increases efficiency and effectively
    overcomes all normal limits on SUDA's operation.
  • Distributed analyses might seem like overkill,
    but we have big plans.

20
Algorithm improvements
  • We have so far avoided the lure of making
    modelling assumptions in SUDA. Our approach has
    been non-parametric.
  • However, we are considering biting the bullet and
    evaluating combining SUDA's combinatorial power
    with a more theoretically grounded model-based
    approach.

21
SUDA 2
  • New recursive algorithm solves many of the
    computational limitations of SUDA 1
  • Full assessment of cross-classifications of up to
    50 variables is now possible in usable time.
    Scenario keys of 15 or so variables run in seconds.