Panorama for detecting outliers methods in structural surveys implementation on French and Ukrainian

1
Panorama for detecting outliers methods in
structural surveys implementation on French and
Ukrainian data
  • Olga A. Vasyechko, Research Institute of
    Statistics of Ukraine, O.Vasechko_at_ukrstat.gov.ua
  • Noureddine Benlagha, Université Paris 2, ERMES
    UMR 7017 (CNRS), blnouri2002_at_yahoo.fr
  • Michel Grun-Rehomme, Université Paris 2, ERMES
    UMR 7017 (CNRS), grun_at_u-paris2.fr

2
Introduction (1)
  • Statistical analysis requires ensuring:
  • The quality of the data
  • The robustness of the indicators

3
Introduction (2)
  • Two categories of enterprises:
  • Small enterprises
  • Large and medium-sized enterprises
  • We consider turnover as the main variable

4
Why is it necessary to detect outliers before
mining the data?
  • Theoretical reasons: extreme values inflate the
    variance and degrade the accuracy of estimates
  • Practical reasons: outliers have to be detected to
    prepare the next surveys

5
Some classical outlier detection tests
  • Grubbs (1950, 1969)
  • Grubbs and Beck, Tietjen and Moore (1972)
  • Rosner (1975)
  • Atkinson A.C., Koopman S.J., Shephard N. (1997)
  • Tancredi et al. (2002)
  • F. Dominici, L. Cope, D.Q. Naiman and S.L. Zeger
    (2005)

6
Objective
  • Use different methods to detect atypical
    units in structural business surveys

7
Different Methods
  • Algebraic method
  • New nonparametric method
  • Graphical method: box plot
  • Probabilistic model: extreme value theory

8
Application
  • These various methods are applied to
  • French data
  • Ukrainian data

9
Algebraic methods(1)
  • The distance from the unit to the center of the
    distribution

where $x_i$ is the value of unit $i$, $m$ is a central tendency
parameter, and $s$ is a scale parameter.

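
As a rough illustration of the distance rule above, the sketch below computes a scaled distance for each unit. The slide does not say which parameters were used, so the absolute-value form |x_i - m| / s, the median as m, the median absolute deviation as s, the cutoff of 3, and the turnover figures are all assumptions made for this example.

```python
from statistics import median

def scaled_distances(values):
    """Return |x_i - m| / s for every unit (m = median, s = MAD; both assumed)."""
    m = median(values)                           # central tendency parameter m
    s = median([abs(x - m) for x in values])     # scale parameter s (MAD)
    if s == 0:
        raise ValueError("scale parameter is zero; pick another estimate of s")
    return [abs(x - m) / s for x in values]

def flag_units(values, cutoff=3.0):
    """Flag units whose scaled distance exceeds the (assumed) cutoff."""
    return [d > cutoff for d in scaled_distances(values)]

# Invented turnover figures, for illustration only.
turnover = [120.0, 135.0, 128.0, 140.0, 131.0, 125.0, 2400.0]
flags = flag_units(turnover)
```

A median-based pair (m, s) is used here because the mean and standard deviation are themselves pulled by the extreme units the rule is meant to find.
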
10
Algebraic methods(2)
  • Hidiroglou and Berthelot's interval
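
The slide names the interval without giving its formula. The sketch below follows the commonly cited form of the Hidiroglou-Berthelot procedure for periodic surveys, which compares each unit's value in two periods. The availability of a previous-period value, the parameter defaults U, A and C, the function name, and the sample figures are assumptions of this example, not settings taken from the presentation.

```python
from statistics import median, quantiles

def hb_flags(prev, curr, U=0.5, A=0.05, C=4.0):
    """Flag units outside a Hidiroglou-Berthelot style interval.

    prev, curr: strictly positive values of the same units in two periods.
    U, A, C: tuning parameters (illustrative defaults, not from the slides).
    """
    ratios = [c / p for p, c in zip(prev, curr)]
    r_med = median(ratios)
    # Symmetric transformation of the ratios around their median.
    s = [1.0 - r_med / r if r < r_med else r / r_med - 1.0 for r in ratios]
    # Size effect: larger units get more weight.
    e = [si * max(p, c) ** U for si, p, c in zip(s, prev, curr)]
    e_q1, e_med, e_q3 = quantiles(e, n=4, method="inclusive")
    d_q1 = max(e_med - e_q1, abs(A * e_med))
    d_q3 = max(e_q3 - e_med, abs(A * e_med))
    lower, upper = e_med - C * d_q1, e_med + C * d_q3
    return [ei < lower or ei > upper for ei in e], (lower, upper)

# Invented figures: the last unit's turnover is multiplied by five.
prev = [100.0] * 10
curr = [100.0, 102.0, 98.0, 101.0, 99.0, 103.0, 97.0, 100.0, 101.0, 500.0]
flags, (lower, upper) = hb_flags(prev, curr)
```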

11
Graphic method
  • Box plot method
  • Tukey's limits of a box plot
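
Tukey's limits are usually taken as the quartiles extended by 1.5 times the interquartile range. A minimal sketch, assuming that conventional multiplier and an inclusive quartile definition (the presentation does not say which were used); the data are invented.

```python
from statistics import quantiles

def tukey_limits(values, k=1.5):
    """Return (lower, upper) box plot limits: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def outside_limits(values, k=1.5):
    """Return the values lying outside Tukey's limits."""
    lower, upper = tukey_limits(values, k)
    return [x for x in values if x < lower or x > upper]

# Invented sample with one very large value.
sample = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 11.0, 13.0, 12.0, 100.0]
lower, upper = tukey_limits(sample)
extremes = outside_limits(sample)
```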

12
A nonparametric method to detect extreme
values (1)
  • Two aspects:
  • The distance
  • The form of the distribution

13
A nonparametric method of extreme value
detection (2)
  • The contribution indicator

14
A nonparametric method of extreme value
detection (4)
  • Properties of the indicator
  • This indicator has the following properties:
  • (1) $0 \le I_n(i) \le 1$ for any i and N, and it can
    reach its end points
  • (2) $I_n(i)$ is increasing over the values above
    the mean of X and decreasing otherwise
  • (3) $I_n(i)$ can admit a point of inflection
    over the values above the mean of X

15
A nonparametric method of extreme value
detection (5)
  • The consequences
  • If $I_n(i)$ is close to 1, then i is an extreme value
  • Otherwise, i is a normal observation

16
The extreme value theory
  • Two approaches:
  • The classical method (EVT)
  • The peaks-over-threshold method (POT)

17
The peaks-over-threshold method
  • Two problems:
  • Estimating the three parameters of the distribution
  • Fixing the threshold

18
The generalized Pareto distribution
Where

19
The generalized Pareto distribution
  • G(y): the generalized extreme value distribution
  • H(y): the generalized Pareto distribution
  • ξ: the shape parameter (the tail index)
  • µ: the location parameter
  • σ: the scale parameter
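
For reference, the standard textbook forms of the two distribution functions named above are given below, written with shape ξ (nonzero), location µ and scale σ > 0. The formula shown on the original slide was not preserved, so this is the usual parametrization and not necessarily the exact notation of the presentation.

```latex
\[
G(y) = \exp\left\{-\left[1 + \xi\,\frac{y-\mu}{\sigma}\right]^{-1/\xi}\right\},
\qquad 1 + \xi\,\frac{y-\mu}{\sigma} > 0,
\]
\[
H(y) = 1 - \left[1 + \xi\,\frac{y-\mu}{\sigma}\right]^{-1/\xi},
\qquad y \ge \mu,\quad 1 + \xi\,\frac{y-\mu}{\sigma} > 0,
\]
\[
H(y) = 1 + \log G(y) \quad \text{whenever } \log G(y) \ge -1 .
\]
```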

20
Estimation of the tail index (ξ)
  • Maximum likelihood estimator
  • Hill estimator (1975)
  • Pickands estimator (1975)
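
The Hill and Pickands estimators listed above can be sketched from the upper order statistics of a sample. This is a minimal illustration using their usual textbook definitions. The function names, the choice of k, and the synthetic Pareto-type sample (built with tail index 0.5) are assumptions of the example, not values from the presentation.

```python
import math

def hill_estimator(values, k):
    """Hill estimate of the tail index from the k largest (positive) values."""
    xs = sorted(values)
    n = len(xs)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    threshold = xs[n - k - 1]                  # order statistic X_(n-k)
    return sum(math.log(x / threshold) for x in xs[n - k:]) / k

def pickands_estimator(values, k):
    """Pickands estimate of the tail index; needs 4*k <= n."""
    xs = sorted(values)
    n = len(xs)
    if not 0 < 4 * k <= n:
        raise ValueError("k must satisfy 0 < 4*k <= n")
    a, b, c = xs[n - k], xs[n - 2 * k], xs[n - 4 * k]
    return math.log((a - b) / (b - c)) / math.log(2)

# Synthetic sample: exact quantiles of a Pareto law with tail index 0.5.
n = 1000
sample = [(1 - j / (n + 1)) ** -0.5 for j in range(1, n + 1)]
hill = hill_estimator(sample, k=100)
pickands = pickands_estimator(sample, k=100)
```

Both estimates depend on k, the number of upper order statistics retained, which is the same difficulty as fixing the threshold in the POT method.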

21
Choice of the threshold (1)
  • Two methods:
  • The mean excess function
  • Using an extreme quantile

22
Choice of the threshold (2)
  • The mean excess function.
  • Then

23
Choice of the threshold (3)
  • Using an extreme quantile.
  • Where $F^{-1}$ is the inverse of the distribution
    function of X.
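
The two threshold choices above can be sketched empirically. The formulas on the slides were not preserved, so the code below uses the usual sample versions: the mean of the excesses over a candidate threshold u, and an empirical quantile standing in for F^{-1}. The function names, the interpolation rule, and the data are assumptions of this example.

```python
from statistics import mean

def mean_excess(values, u):
    """Empirical mean excess: average of (x - u) over the values above u."""
    excesses = [x - u for x in values if x > u]
    if not excesses:
        raise ValueError("no observation exceeds the threshold u")
    return mean(excesses)

def quantile_threshold(values, p):
    """Empirical quantile of order p (linear interpolation), used as threshold."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = int(pos)                        # index of the lower neighbour
    hi = min(lo + 1, len(xs) - 1)        # index of the upper neighbour
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

# Invented data, for illustration only.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
e_at_3 = mean_excess(data, 3.0)          # mean of (4 - 3) and (5 - 3)
u_median = quantile_threshold(data, 0.5)
```

In practice the mean excess is plotted against a range of candidate thresholds, and a threshold is chosen where the plot becomes roughly linear.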

24
Approximation of the GPD extreme quantile
  • The function is
  • Problem: estimating $\xi_n$, $\mu_n$, $\sigma_n$
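
The slide's formula was not preserved. A common peaks-over-threshold approximation of an extreme quantile, given here as a sketch and not as the authors' formula, is x_p = u + (σ/ξ) [((n/N_u)(1 - p))^(-ξ) - 1], where u is the threshold, N_u the number of exceedances among n observations, and ξ and σ the fitted shape and scale. The function name and the numerical values are hypothetical, and the parameter estimates are assumed to come from a separate fitting step.

```python
import math

def pot_quantile(p, u, xi, sigma, n, n_exceed):
    """Approximate the quantile of order p from a GPD fitted above threshold u."""
    if not 0 < n_exceed <= n:
        raise ValueError("n_exceed must satisfy 0 < n_exceed <= n")
    tail = (n / n_exceed) * (1.0 - p)
    if not 0 < tail <= 1:
        raise ValueError("p must correspond to a quantile at or above u")
    if xi == 0:
        # Limiting case of the formula as the shape parameter tends to zero.
        return u - sigma * math.log(tail)
    return u + (sigma / xi) * (tail ** (-xi) - 1.0)

# Hypothetical fitted values: 100 exceedances of u = 10 among 1000 units.
q99 = pot_quantile(0.99, u=10.0, xi=0.5, sigma=2.0, n=1000, n_exceed=100)
```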

25
Data
  • French and Ukrainian data: turnover volumes
    (in 2003) of small enterprises
  • 4 divisions:
  • Metalworking (28)
  • Construction (45)
  • Retail trade (52)
  • Computer operations (72)

26
Empirical results
  • Software used:
  • SAS software
  • Extreme software
  • Matlab software

27
Results
  • Number of extreme values

28
Conclusion
  • A principal component analysis
  • The first principal axis: 55% of explained
    inertia, the pair (box plot, $I_n$) vs. the other
    criteria.
  • The second principal axis: 26% of explained
    inertia, representing the extreme value theory.
  • The last axis corresponds to the expert's point
    of view.

30
Q2006
  • Thank you.