Geometric and combinatorial issues in data depth

1
Geometric and combinatorial issues in data depth
  • Greg Aloupis
  • Université Libre de Bruxelles

2
What is data depth?
  • A quantitative measurement of how central a point
    is with respect to a data set.
  • Goals: to be able to rank data points, and to
    find the center of the data cloud.

3
Some geometric bivariate medians
  • Convex hull peeling (Tukey 70s)
      • Chazelle 85: Θ(n log n)
  • Halfspace median (Tukey 74)
      • Langerman-Steiger 01: O(n log^3 n); Chan 03:
        O(n log n), randomized
  • Oja median (Oja 83)
      • G.A.-Langerman-Soss-Toussaint 01: O(n log^3 n)
  • Simplicial median (Liu 88)
      • ALST 01 (the same four authors): O(n^4)

4
Convex hull peeling
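A minimal Python sketch of convex hull peeling, assuming the usual definition (the slide itself shows only figures): repeatedly strip the convex hull of the remaining points, and call the layer at which a point is removed its depth. The function names are hypothetical, and treating points that lie on a hull edge but not at a vertex as part of a later layer is a choice made here, not something the slides state. Each hull is recomputed from scratch, so this does not attempt to match the n log n bound cited for Chazelle.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); > 0 means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain. Returns hull vertices only
    (points in the interior of a hull edge are left out)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def peeling_depth(points):
    """Map each point to the index (starting at 1) of the layer that removes it."""
    remaining = set(points)
    depth = {}
    layer = 1
    while remaining:
        for p in convex_hull(list(remaining)):
            depth[p] = layer
            remaining.discard(p)
        layer += 1
    return depth
```

Under these assumptions, the peel median is the set of points with the largest layer index in the returned dictionary.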
8
Halfspace, simplicial and Oja depths of a point θ
in bivariate data set S
  • Each median is a point with max depth (halfspace,
    simplicial) or min depth (Oja).

9
Halfspace, simplicial and Oja depths of a point θ
in bivariate data set S
  • (Tukey) halfspace depth
  • For every line through θ, count points
    above/below. (Counting only the points above
    suffices, since rotating the line by 180°
    swaps the two sides.)
  • Return minimum number counted over all lines.
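The counting rule above can be sketched as a brute-force O(n^2) Python function. This is an illustration only, not one of the O(n log n) algorithms cited in the talk. It assumes the closed-halfplane convention (points coinciding with the query point are always counted), and `halfspace_depth` is a hypothetical name. To avoid ties, each candidate line is taken just counterclockwise of the direction from the query point to a data point.

```python
def halfspace_depth(theta, points):
    """Minimum number of points in a closed halfplane whose
    boundary line passes through theta (brute force)."""
    tx, ty = theta
    vecs = [(x - tx, y - ty) for x, y in points]
    # Points equal to theta lie in every such halfplane.
    coincident = sum(1 for v in vecs if v == (0, 0))
    vecs = [v for v in vecs if v != (0, 0)]
    best = len(vecs)
    for dx, dy in vecs:
        # Line through theta with direction d, rotated slightly
        # counterclockwise so that no data point lies on it.
        left = 0
        for vx, vy in vecs:
            cr = dx * vy - dy * vx
            dot = dx * vx + dy * vy
            # Strictly left of d, or collinear with d but pointing
            # the opposite way (ends up on the left after rotation).
            if cr > 0 or (cr == 0 and dot < 0):
                left += 1
        best = min(best, left, len(vecs) - left)
    return best + coincident
```

With integer coordinates the arithmetic is exact. For the four corners of a square plus its center, the center gets depth 3, a corner gets depth 1, and any point outside the square gets depth 0.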

13
Halfspace, simplicial and Oja depths of a point θ
in bivariate data set S
  • (Liu) simplicial depth
  • Count the closed triangles with vertices in S
    that contain θ, over all triples of points.
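A direct O(n^3) Python sketch of this count, given for illustration rather than as one of the faster algorithms cited later. The function names are hypothetical, and skipping degenerate (collinear) triples is an assumption made here so that a zero-area "triangle" is never counted.

```python
from itertools import combinations


def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_closed_triangle(p, a, b, c):
    """True if p lies inside or on the boundary of triangle abc
    (abc is assumed to have nonzero area)."""
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def simplicial_depth(theta, points):
    """Number of closed triangles with vertices in points that contain theta."""
    return sum(
        1
        for a, b, c in combinations(points, 3)
        if cross(a, b, c) != 0 and in_closed_triangle(theta, a, b, c)
    )
```

Because the triangles are closed, a data point counts every non-degenerate triangle it is a vertex of.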

19
Halfspace, simplicial and Oja depths of a point θ
in bivariate data set S
  • Oja depth
  • Sum the areas of all triangles with vertices
    (θ, s_i, s_j), over all pairs s_i, s_j in S.
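The sum can be written down directly as an O(n^2) Python sketch (`oja_depth` is a hypothetical name). Note that for this measure smaller is more central, so the Oja median minimizes it.

```python
from itertools import combinations


def oja_depth(theta, points):
    """Sum of the areas of all triangles (theta, s_i, s_j)."""
    tx, ty = theta
    total = 0.0
    for (ax, ay), (bx, by) in combinations(points, 2):
        # Triangle area = half the absolute cross product of the two edge vectors.
        total += abs((ax - tx) * (by - ty) - (ay - ty) * (bx - tx)) / 2.0
    return total
```

For the corners of a 2 x 2 square, the center scores 4.0 and a corner scores 6.0, so the center is the more central of the two.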

23
Halfspace, simplicial and Oja depths of a point θ
in bivariate data set S
  • Upper bound O(n log n): Khuller-Mitchell 89,
    Gil-Steiger-Wigderson 92, Rousseeuw-Ruts 96
  • Lower bound Ω(n log n): G.A.-Cortes-Gomez-Soss-Toussaint 01,
    Langerman-Steiger 01, G.A.-McLeish 05

24
Issue 1: What is the complexity of computing the
depth k of a point if k is known to be
small/large?
  • If the peel median has depth k > 1 then can we
    compute it faster? (GSW 92)
  • !!! This just in: simplicial depth in
    O(n + n log(1 + k/n))
  • Elmasry-Elbassioni, CCCG last week
  • Is there a lower bound, sensitive to parameter k?
  • Something similar for halfspace depth?
  • Current attempts aim for O(n log k)

25
Issue 2: (Improve) simplicial median computation
  • Remember that horrible O(n^4) result a few slides
    back?

26

Easy observation
  • I: the set of line segments between pairs of
    points in S.
  • The simplicial median is on an intersection of
    two segments in I.

27
Outline of a method
  • Preprocessing: O(n^3) by brute force, actually O(n^2)
      • Count number of points above/below each segment.
      • Compute depth of all points.
  • For each segment:
      • Sort all intersections with other segments:
        O(n^2 log n).
      • Calculate depth of each intersection in O(1)
        time: O(n^2).
  • Overall: O(n^4 log n)
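The candidate set behind this method can be illustrated with a much slower brute-force Python sketch: enumerate every crossing of two segments between data points, evaluate the simplicial depth of each candidate from scratch, and keep the deepest. This is not the method outlined above (it skips the preprocessing and the O(1) depth updates and costs roughly O(n^7)); it only shows which points need to be examined. Integer input coordinates, the helper names, and the skipping of collinear triples are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import combinations


def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def simplicial_depth(theta, points):
    """Count closed non-degenerate triangles on points containing theta."""
    count = 0
    for a, b, c in combinations(points, 3):
        if cross(a, b, c) == 0:
            continue  # skip collinear triples
        d = (cross(a, b, theta), cross(b, c, theta), cross(c, a, theta))
        if not (min(d) < 0 and max(d) > 0):
            count += 1
    return count


def segment_intersection(p, q, r, s):
    """Exact crossing point of segments pq and rs, or None if the
    segments are parallel or do not meet."""
    d1 = (q[0] - p[0], q[1] - p[1])
    d2 = (s[0] - r[0], s[1] - r[1])
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if denom == 0:
        return None
    t = Fraction((r[0] - p[0]) * d2[1] - (r[1] - p[1]) * d2[0], denom)
    u = Fraction((r[0] - p[0]) * d1[1] - (r[1] - p[1]) * d1[0], denom)
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (p[0] + t * d1[0], p[1] + t * d1[1])
    return None


def simplicial_median(points):
    """Return (depth, point) for a deepest candidate point."""
    segments = list(combinations(points, 2))
    candidates = set(points)
    for (p, q), (r, s) in combinations(segments, 2):
        x = segment_intersection(p, q, r, s)
        if x is not None:
            candidates.add(x)
    return max((simplicial_depth(c, points), c) for c in candidates)
```

On the four corners of a square, the only interior candidate is the crossing of the two diagonals, which lies in all four triangles.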

37
  • Constant time to update the depth as we walk
    along a segment from one intersection to the next

38
  • Instead of sorting intersection points and
    processing each segment alone, we can use
    topological sweep.
  • The time complexity becomes O(n^4) and the space
    used is O(n^2).
  • Can we improve this?
  • I.e., can we find some structure in this depth
    function?

39
Conjecture
  • A point of maximum simplicial depth can always be
    found on the intersection of two halving segments.
  • (Weak) experiments have not contradicted this.

40
Desirable properties of data depth functions
  • Affine invariance (at the very least)
  • Robustness: outliers should not influence the
    center.
  • Monotonicity: the center should move in the same
    direction as perturbations.

41
monotonicity
43
Robustness to outliers
  • Breakdown point: fraction of data that must be
    moved/added so that the median is placed at infinity.
  • Oja median: was considered to be robust, but
    finally it was shown that the breakdown point can
    be near zero for certain configurations (planar
    case) (Niinimaa, Oja, Tableman 90).
  • Simplicial median: don't know. But the data
    point of maximum depth can be moved away with few
    corrupting points (GSW 92) (planar case).
  • Halfspace median: great! 1/(d+1)
    (Donoho, Gasko 92)

44
Robustness to outliers
  • Breakdown point: fraction of data that must be
    moved/added so that the median is placed at infinity.
  • Maximum breakdown: 1/2
  • In 1D, only the median is affine invariant,
    monotonic and has maximum breakdown.
  • Is there such an estimator in higher dimensions?
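A small 1D experiment makes the breakdown notion concrete. The Python sketch below assumes the "moved" (replacement) form of contamination; the helper `corrupt`, the sample data, and the stand-in value 1e12 for "infinity" are all hypothetical choices for illustration.

```python
import statistics


def corrupt(data, m, value=1e12):
    """Replace the m largest observations by a huge outlier value."""
    kept = sorted(data)[: len(data) - m]
    return kept + [value] * m


data = list(range(1, 12))  # 11 observations: 1, 2, ..., 11

# One corrupted observation is enough to carry the mean away.
mean_one_outlier = statistics.mean(corrupt(data, 1))

# The median stays put while fewer than half the points are corrupted.
median_five_outliers = statistics.median(corrupt(data, 5))

# It breaks down once 6 of the 11 points (just over half) are corrupted.
median_six_outliers = statistics.median(corrupt(data, 6))
```

Here the median is still 6 with five corrupted points and jumps to the outlier value with six, which matches the limiting breakdown value of 1/2 as the sample grows.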

45
Issue 3: How does the breakdown point depend on
the depth of the median?
  • Convex peeling: breakdown is zero, unless depth
    is linear (GSW 92)
  • Halfspace: breakdown is higher (1/3) for
    centrosymmetric data distributions, where depth
    is roughly 1/2, instead of 1/(d+1)
  • So what can we say about other estimators?
      • For the deepest point in the plane
      • For the deepest data point

46
Issue 4: Non-strategic breakdown
  • All work so far involved carefully placing
    outliers (erroneous or corrupt data), to move an
    estimator far away.
  • (Is corrupt data really placed carefully in
    practice?)
  • What about:
      • average outliers (random or evenly spaced
        placement)
      • strong breakdown (should work regardless of
        direction at infinity)
      • special-case outliers (axis-parallel, or radial
        extension, or ?)

47
Issue 5: Computing/analyzing other estimators
  • Projection outlyingness of q
    (Donoho-Gasko 92)
      • Take the max of the following, over all
        projections:
      • |q - median| / (median deviation from
        median), all measured along the projection
      • Find an algorithm for the least outlying point.
  • Gil-Steiger-Wigderson
      • superposition of unit vectors to data points:
        v(ai)
      • median is a (data) point with v(ai) R < 1
        ???
      • computation in o(n^2)? Properties?
  • Zonoid depth, Delaunay depth
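The projection outlyingness listed above can be approximated in the plane by sampling directions. The Python sketch below is a heuristic illustration, not an exact algorithm: the name `projection_outlyingness`, the default of 360 evenly spaced directions, and the decision to skip directions whose median absolute deviation is zero are all assumptions.

```python
import math
import statistics


def projection_outlyingness(q, points, n_dirs=360):
    """Approximate max over directions of |proj(q) - median| / MAD,
    using n_dirs evenly spaced directions in [0, pi)."""
    worst = 0.0
    for k in range(n_dirs):
        phi = math.pi * k / n_dirs
        ux, uy = math.cos(phi), math.sin(phi)
        proj = [ux * x + uy * y for x, y in points]
        med = statistics.median(proj)
        # Median absolute deviation of the projected data from its median.
        mad = statistics.median(abs(v - med) for v in proj)
        if mad > 0:
            ratio = abs(ux * q[0] + uy * q[1] - med) / mad
            worst = max(worst, ratio)
    return worst
```

On a symmetric 3 x 3 grid centered at the origin, the origin has outlyingness close to zero while a far-away point such as (10, 10) scores much higher. A finer direction sample tightens the approximation at proportional cost.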

48
Issue 6: Points of high depth
  • A point w/ Tukey depth ≥ n/(d+1) is a
    centerpoint.
  • Guaranteed to exist, by Helly's thm.
  • O(n) time computation (Jadhav-Mukhopadhyay 94)
  • Can be considered to be a median generalization.
  • 1/4 C(n,3) ≥ max simplicial depth ≥ 2/9 C(n,3)
    (Boros-Füredi 82)
  • (in R^2, ignoring quadratic terms)
  • Can we compute a high depth point quickly?
  • Tverberg points in R^2 have depth ≥ 1/27 C(n,3)
    and can be computed in O(n) time. Anything
    better?
  • Is there a point with high Oja depth?
    (normalized)

49
Things I may have mentioned in the abstract but
forgot to include here
  • Is it faster to locate a deep point without
    computing its depth?
  • How many points have depth > k?
  • When do simplicial depth levels become
    disconnected?

50
Thank you