Analysis of Tree Edit Distance Algorithms, Serge Dulucq and Hélène

1
Analysis of Tree Edit Distance Algorithms
Serge Dulucq and Hélène
  • B89902009 ???
  • B89902011 ???
  • B89902045 ???

2
Outline
  • Introduction
  • Edit Distance for Trees and Forests
  • Cover Strategies

3
Introduction

4
Motivation
  • One way of comparing two ordered trees is by
    measuring their edit distance
  • Application areas
  • Comparison of hierarchically structured data
  • Alignment of RNA secondary structures in
    computational biology
  • Two algorithms using dynamic programming
  • Zhang-Shasha
  • Klein

5
Purpose
  • A general analysis of dynamic programming for
    edit distance algorithms
  • Study the complexity of these decompositions by
    counting the exact number of distinct recursive
    calls
  • Define a new edit distance algorithm for trees
    that improves on the original algorithms with respect
    to the number of recursive calls

6
Edit Distance for Trees and Forests

7
Trees and forests
  • A tree is a node (called the root) connected to
    an ordered sequence of disjoint trees
  • Such a sequence is called a forest
  • We write l(A1∘...∘An) for the tree composed of the
    node l connected to the sequence of trees A1, ...,
    An

[Figure: example trees with nodes 1 to 5, and the tree l(A1∘...∘An) with root l above the subtrees A1, A2, ..., An]
8
  • |F| denotes the number of nodes of the forest F
  • SF(F) is the set of all subforests of F
  • F(i), where i is a node of F, denotes the subtree of F
    rooted at i
  • deg(i) is the degree of i, that is, the number of
    children of i

[Figure: a forest F with nodes 1 to 10, so |F| = 10; a subforest belonging to SF(F); the subtree F(2); deg(4) = 2]
9
Edit distance
  • Let F and G be two forests. The edit distance
    between F and G, denoted d(F, G), is the minimal
    cost of edit operations needed to transform F
    into G
  • Operations
  • Substitution
  • Insertion
  • Deletion
  • Let Cs, Ci, Cd denote the costs of substitution,
    insertion, deletion

10
Recursive relationship (1/3)
  • Strings
  • u, v are strings; x, y are alphabet symbols
  • d(xu, yv) = min { Cd(x) + d(u, yv),
                      Ci(y) + d(xu, v),
                      Cs(x, y) + d(u, v) }
  • d(ux, vy) = min { Cd(x) + d(u, vy),
                      Ci(y) + d(ux, v),
                      Cs(x, y) + d(u, v) }

[Figure: the strings u and v with the symbols x and y attached at one end]
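The string recurrence d(xu, yv) can be sketched directly as a memoized function. This is a minimal illustration, not code from the slides: the name `string_edit_distance` and the unit default costs are assumptions.

```python
from functools import lru_cache

def string_edit_distance(s, t, cd=1, ci=1, cs=lambda x, y: 0 if x == y else 1):
    """Edit distance following d(xu, yv); default costs are an assumption."""

    @lru_cache(maxsize=None)
    def d(i, j):
        # d(i, j) is the distance between the suffixes s[i:] and t[j:].
        if i == len(s):
            return ci * (len(t) - j)      # only insertions remain
        if j == len(t):
            return cd * (len(s) - i)      # only deletions remain
        return min(
            cd + d(i + 1, j),                   # delete x
            ci + d(i, j + 1),                   # insert y
            cs(s[i], t[j]) + d(i + 1, j + 1),   # substitute x by y
        )

    return d(0, 0)
```

Memoizing on the pair of suffix positions means each pair of suffixes is evaluated once, which is the same "count the distinct recursive calls" viewpoint the slides apply to forests.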
11
Recursive relationship (2/3)
  • Trees
  • l, l' are roots; F, F' are forests
  • d(l(F), l'(F')) = min { Cd(l) + d(F, l'(F')),
                            Ci(l') + d(l(F), F'),
                            Cs(l, l') + d(F, F') }

[Figure: the trees l(F) and l'(F') with their roots l and l']
12
Recursive relationship (3/3)
  • Forests
  • T, T' are forests
  • Left decomposition
  • d(l(F)∘T, l'(F')∘T') = min { Cd(l) + d(F∘T, l'(F')∘T'),
                                 Ci(l') + d(l(F)∘T, F'∘T'),
                                 d(l(F), l'(F')) + d(T, T') }
  • Right decomposition
  • d(T∘l(F), T'∘l'(F')) = min { Cd(l) + d(T∘F, T'∘l'(F')),
                                 Ci(l') + d(T∘l(F), T'∘F'),
                                 d(l(F), l'(F')) + d(T, T') }
  • We use the word direction to indicate left or right
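The left and right decompositions can be sketched as one memoized function that asks a strategy for the direction at each step. This is a hedged illustration under assumptions that are not in the slides: a tree is a `(label, children)` tuple, a forest is a tuple of trees, costs are unit costs, and the names `size` and `forest_distance` are hypothetical.

```python
def size(forest):
    """Number of nodes of a forest (a tuple of (label, children) trees)."""
    return sum(1 + size(children) for _, children in forest)

def forest_distance(F, G, strategy=lambda F, G: "left"):
    """Return (d(F, G), number of distinct memoized calls) under unit costs."""
    memo = {}   # only pairs of non-empty forests are stored

    def d(F, G):
        if not F or not G:
            return size(F) + size(G)   # only deletions or only insertions remain
        if (F, G) in memo:
            return memo[(F, G)]
        left = strategy(F, G) == "left"
        k = 0 if left else -1
        (l, Fc), (l2, Gc) = F[k], G[k]
        T = F[1:] if left else F[:-1]
        T2 = G[1:] if left else G[:-1]
        join = (lambda c, t: c + t) if left else (lambda c, t: t + c)
        delete = 1 + d(join(Fc, T), G)      # Cd(l)  + d(F∘T, l'(F')∘T')
        insert = 1 + d(F, join(Gc, T2))     # Ci(l') + d(l(F)∘T, F'∘T')
        if not T and not T2:
            # Both arguments are trees: substitute the roots (tree recurrence).
            match = (0 if l == l2 else 1) + d(Fc, Gc)
        else:
            match = d((F[k],), (G[k],)) + d(T, T2)
        memo[(F, G)] = min(delete, insert, match)
        return memo[(F, G)]

    return d(F, G), len(memo)
```

The distance does not depend on the strategy, but the second returned value does, which is what the rest of the presentation sets out to count.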

13
Example
[Figure: the successive subforests produced by the left decomposition and by the right decomposition of an example tree with nodes 1 to 5]
14
Strategy, relevant forests
  • Let F and G be two forests. A strategy is a
    mapping from SF(F) × SF(G) to {left, right}
  • Let (F, F') be a pair of forests provided with a
    strategy φ. The set RFφ(F, F') of relevant forests
    is defined as the least subset of SF(F) × SF(F')
    such that if the decomposition of (F, F') meets
    the pair (G, G'), then (G, G') belongs to RFφ(F, F')
  • RFφ(F) and RFφ(F') denote the projections of
    RFφ(F, F') on SF(F) and SF(F')
  • #relevant denotes the number of relevant forests

15
Proposition (1/2)
  • F = F' = ∅ ⇒ RFφ(F, F') = ∅
  • φ(F, F') = left, F = l(G)∘T, F' = ∅
    ⇒ RFφ(F, F') = {(F, F')} ∪ RFφ(G∘T, F')
  • φ(F, F') = right, F = T∘l(G), F' = ∅
    ⇒ RFφ(F, F') = {(F, F')} ∪ RFφ(T∘G, F')
  • φ(F, F') = left, F = ∅, F' = l'(G')∘T'
    ⇒ RFφ(F, F') = {(F, F')} ∪ RFφ(F, G'∘T')

Reminder of the two decompositions:
d(l(G)∘T, l'(G')∘T') = min { Cd(l) + d(G∘T, l'(G')∘T'), Ci(l') + d(l(G)∘T, G'∘T'), d(l(G), l'(G')) + d(T, T') }
d(T∘l(G), T'∘l'(G')) = min { Cd(l) + d(T∘G, T'∘l'(G')), Ci(l') + d(T∘l(G), T'∘G'), d(l(G), l'(G')) + d(T, T') }
16
Proposition (2/2)
  • φ(F, F') = right, F = ∅, F' = T'∘l'(G')
    ⇒ RFφ(F, F') = {(F, F')} ∪ RFφ(F, T'∘G')
  • φ(F, F') = left, F = l(G)∘T, F' = l'(G')∘T'
    ⇒ RFφ(F, F') = {(F, F')} ∪ RFφ(G∘T, F') ∪
      RFφ(F, G'∘T') ∪ RFφ(l(G), l'(G')) ∪ RFφ(T, T')
  • φ(F, F') = right, F = T∘l(G), F' = T'∘l'(G')
    ⇒ RFφ(F, F') = {(F, F')} ∪ RFφ(T∘G, F') ∪
      RFφ(F, T'∘G') ∪ RFφ(l(G), l'(G')) ∪ RFφ(T, T')

Reminder of the two decompositions:
d(l(G)∘T, l'(G')∘T') = min { Cd(l) + d(G∘T, l'(G')∘T'), Ci(l') + d(l(G)∘T, G'∘T'), d(l(G), l'(G')) + d(T, T') }
d(T∘l(G), T'∘l'(G')) = min { Cd(l) + d(T∘G, T'∘l'(G')), Ci(l') + d(T∘l(G), T'∘G'), d(l(G), l'(G')) + d(T, T') }
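The proposition reads as a closure rule, so the set of relevant pairs can be sketched with a worklist. This is an illustration under assumptions, not the presenters' code: trees are `(label, children)` tuples, forests are tuples of trees, labels are distinct, and `relevant_pairs` is a hypothetical name. When both forests are single trees, the sketch follows the tree recurrence and adds the pair of child forests.

```python
def relevant_pairs(F, G, strategy):
    """Least set of pairs of forests closed under the decomposition rules."""
    seen, todo = set(), [(F, G)]
    while todo:
        F, G = todo.pop()
        if (F, G) in seen or (not F and not G):
            continue                     # the pair (empty, empty) is not counted
        seen.add((F, G))
        left = strategy(F, G) == "left"
        k = 0 if left else -1

        def strip(H):
            # Remove the leftmost (or rightmost) root of H. Returns the forest
            # after removal, the decomposed tree as a forest, and the rest T.
            (_, children), T = H[k], (H[1:] if left else H[:-1])
            return (children + T if left else T + children), (H[k],), T

        if F:
            todo.append((strip(F)[0], G))        # deletion
        if G:
            todo.append((F, strip(G)[0]))        # insertion
        if F and G:
            (_, tF, TF), (_, tG, TG) = strip(F), strip(G)
            if TF or TG:
                todo += [(tF, tG), (TF, TG)]     # tree part and forest part
            else:
                todo.append((F[0][1], G[0][1]))  # two trees: compare children
    return seen
```

For two single-node trees a and b the result is {(a, b), (a, ∅), (∅, b)}, whatever the direction.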
17
Lemma 1
  • Given a tree A = l(A1∘...∘An), for any strategy we
    have
  • #relevant(A) ≥ |A| - |Ai| + #relevant(A1) + ... + #relevant(An)
  • where i ∈ [1..n] is such that the size of Ai is
    maximal

18
Proof (1/2)
  • Let F = A1∘...∘An ⇒ RF(A) = {A} ∪ RF(F)
  • ⇒ #relevant(A) = 1 + #relevant(F)
  • When n = 1
  • F = A1, A = l(A1)
  • ⇒ #relevant(A) = 1 + #relevant(A1)
  •   = |A| - |A1| + #relevant(A1)
  • When n > 1
  • Suppose the direction is left. Let A1 = l1(F1), T = A2∘...∘An
  • RF(F) = {F} ∪ RF(A1) ∪ RF(T) ∪ RF(F1∘T)
  • |RF(F1∘T) \ (RF(F1) ∪ RF(T))| ≥ min(|F1|, |T|)
  • ⇒ #relevant(F) ≥ 1 + #relevant(A1)
  •   + #relevant(T) + min(|F1|, |T|)
  • Let j ∈ [2..n] be such that |Aj| is maximal among A2, ...,
    An
  • ⇒ #relevant(F) ≥ 1 + #relevant(A1) + ...
  •   + #relevant(An) + |T| - |Aj|
      + min(|F1|, |T|)

19
Take a look
  • #relevant(A) ≥ |A| - |Ai|
  •   + #relevant(A1) + ... + #relevant(An)
  • ⇔ #relevant(F) ≥ |F| - |Ai|
  •   + #relevant(A1) + ... + #relevant(An)
  • #relevant(F) ≥ 1 + |T| - |Aj| + min(|F1|, |T|)
  •   + #relevant(A1) + ... + #relevant(An)

20
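Proof (2/2)
  • To show: 1 + |T| - |Aj| + min(|F1|, |T|) ≥ |F| - |Ai|
  • 1) If |F1| ≤ |T|
  • ⇒ 1 + |T| + min(|F1|, |T|) = |F|
  • Since |Aj| ≤ |Ai|
  • ⇒ 1 + |T| - |Aj| + min(|F1|, |T|) = |F| - |Aj|
  •   ≥ |F| - |Ai|
  • 2) If |F1| > |T|
  • ⇒ |F| - |Ai| = |T| (since i = 1)
  • ⇒ 1 + |T| - |Aj| + min(|F1|, |T|) = 1 + |T|
      + |T| - |Aj|
  •   ≥ 1 + |T|
  •   > |F| - |Ai|
  • ⇒ #relevant(F) ≥ |F| - |Ai| + #relevant(A1) + ...
      + #relevant(An)
  • ⇒ #relevant(A) ≥ |A| - |Ai| + #relevant(A1) + ...
      + #relevant(An)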

21
Lemma 2
  • For every natural number n, there exists a tree A
    of size n such that for any strategy,
    #relevant(A) has a lower bound in Ω(n log n)
  • For the complete balanced binary tree Tn of size n,
  • prove by induction on n that
  • #relevant(Tn) ≥ (n+1) log2(n+1) / 2
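The induction behind Lemma 2 can be checked numerically by unfolding the bound of Lemma 1 on complete binary trees. This is a small sketch: the tuple representation `(label, children)` and the names `lemma1_bound` and `complete_binary_tree` are assumptions made for the illustration.

```python
import math

def size(tree):
    """Number of nodes of a tree given as (label, children)."""
    return 1 + sum(size(c) for c in tree[1])

def lemma1_bound(tree):
    """Unfold |A| - max|Ai| + sum of the children's bounds (1 for a leaf)."""
    children = tree[1]
    if not children:
        return 1   # the only relevant forest of a leaf is the leaf itself
    return (size(tree) - max(size(c) for c in children)
            + sum(lemma1_bound(c) for c in children))

def complete_binary_tree(height):
    """Complete balanced binary tree with 2**height - 1 nodes (labels unused)."""
    if height == 1:
        return ("x", ())
    child = complete_binary_tree(height - 1)
    return ("x", (child, child))

# For n = 2**h - 1 nodes the unfolded bound equals (n + 1) * log2(n + 1) / 2.
for h in range(1, 7):
    n = 2 ** h - 1
    assert lemma1_bound(complete_binary_tree(h)) == (n + 1) * math.log2(n + 1) / 2
```

With n = 7 the bound is 7 - 3 + 4 + 4 = 12 = 8 · 3 / 2, matching the formula on the slide.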

22
Cover Strategies

23
Idea
  • Suppose the direction is left
  • RF(l(F)∘T) = {l(F)∘T} ∪ RF(l(F)) ∪ RF(F∘T) ∪ RF(T)

Since T is a subforest of F∘T, we want to eliminate the
nodes of F in F∘T first, so that RF(F∘T) and RF(T)
share as many relevant forests as possible!
24
Cover
  • Let F be a forest. A cover r of F is a mapping
    from F to F ∪ {left, right} satisfying for each
    node i in F
  • if deg(i) = 0 or 1, then r(i) ∈ {left, right}
  • if deg(i) > 1, then r(i) is a child of i

[Figure: example covers on a tree with nodes 1 to 4; nodes of degree 0 or 1 are mapped to left or right]
25
Cover strategy
  • Given a pair of trees (A, B) and a cover r for A,
    we associate a unique strategy φ as follows.
  • if deg(i) = 0 or 1, then φ(A(i), G) = r(i), for
    each forest G in B
  • If A(i) is of the form l(A1∘...∘An) with n > 1,
    then let p ∈ {1, ..., n} be such that the favorite child
    r(i) is the root of Ap. For each forest G of B,
    we define
  • φ(A(i), G) = right whenever p = 1, left otherwise
  • φ(T∘Ap∘...∘An, G) = left, for each forest T of
    A1∘...∘Ap-1
  • φ(Ap∘T, G) = right, for each forest T of
    Ap+1∘...∘An
  • The tree A is called the cover tree. A strategy
    is a cover strategy if there exists a cover tree
    associated to it

26
φ(A(i), G) = right whenever p = 1, left otherwise
φ(T∘Ap∘...∘An, G) = left, for each forest T of A1∘...∘Ap-1
φ(Ap∘T, G) = right, for each forest T of Ap+1∘...∘An
[Figure: the tree A(i) with root i and subtrees A1, A2, A3, A4, compared with a forest G]
27
Some Tasks
  • The order of our tasks
  • Analyze tree A
  • Analyze tree B
  • Combine the analyses of tree A and tree B
  • Count the distinct pairs (recursively)

28
Analysis of Tree A
29
Tree A
  • Focus on relevant(A) (detail)
  • Cover strategies in A
  • A ???? B ?

30
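Lemma 3
  • For every node i of F and every node j of G,
    (F(i), G(j)) ∈ RF(F, G)

[Figure: the subtree F(i) inside F and the subtree G(j) inside G]
This is trivial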
31
Lemma 4
  • RF(l(F)∘T)
  •   = {l(F)∘T, F1∘T, ..., Fk∘T} ∪ RF(l(F)) ∪ RF(T)
  • where k = |F| is the number of nodes of F
  • Fi+1 is obtained from Fi by a left decomposition,
  • so F1, F2, ..., Fk are the forests produced by
  • successive left decompositions of F
  • Assumption: a cover strategy with φ(l(F)∘T) = left

32
RF(l(F)∘T)
Since this is a cover strategy, the direction is left
[Figure: l(F)∘T splits into the tree l(F), the forest T, and the forest F∘T, giving RF(l(F)), RF(T), and RF(F∘T)]
RF(l(F)∘T) = {l(F)∘T} ∪ RF(l(F)) ∪ RF(T) ∪ RF(F∘T)
33
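RF(F∘T)
Continue...
Since this is a cover strategy, the direction is left
[Figure: F∘T is decomposed again from the left; the trees split off are already in RF(l(F)) and the remainder is covered by RF(T)]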
34
So...
[Figure: repeating the left decomposition on F∘T yields the forests F1∘T, ..., Fk∘T]
35
Conclusion
  • RF(l(F)∘T)
  •   = {l(F)∘T, F1∘T, ..., Fk∘T} ∪ RF(l(F)) ∪ RF(T)

36
Lemma 5
  • #relevant(A)
  •   = |A| - |Aj| + #relevant(A1) + #relevant(A2) + ...
      + #relevant(An)
  • where A = l(A1∘A2∘...∘An)
  • and Aj is the favorite child of A
  • This gives the number of relevant forests of a cover tree

37
[Figure: the tree A with root l and subtrees A1, ..., Aj, ..., An]
Aj is the favorite child of A, j ∈ [1..n]
38
Part 1: |A| - |Aj|
Note:
φ(A(i), G) = right whenever p = 1, left otherwise
φ(T∘Ap∘...∘An, G) = left, for each forest T of A1∘...∘Ap-1
φ(Ap∘T, G) = right, for each forest T of Ap+1∘...∘An
Since Aj is the favorite child of A, |A| - |Aj| is the number of relevant forests of A that strictly contain Aj
39
Part 2: #relevant(A1) + #relevant(A2) + ... + #relevant(An)
Note:
RF(A1∘A2∘...∘An) = {A1∘A2∘...∘An} ∪ RF(F1∘A2∘...∘An) ∪ RF(A1) ∪ RF(A2∘...∘An)
[Figure: the forest A1, A2, A3, A4, ..., An]
40
Conclusion
  • #relevant(A)
  •   = |A| - |Aj| + #relevant(A1) + #relevant(A2) + ...
  •   + #relevant(An)
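Lemma 5 can be checked on small examples by enumerating the relevant forests of a cover tree and comparing the count with the formula. The sketch below is an illustration under assumptions: trees are `(label, children)` tuples with distinct labels, the cover is passed as a hypothetical function `favorite(t)` that returns the index of the favorite child of a node with more than one child, and directions of nodes of degree 0 or 1 are ignored because they do not change the set of subforests of A.

```python
def size(tree):
    return 1 + sum(size(c) for c in tree[1])

def relevant_forests(tree, favorite):
    """Set of relevant forests of A (as tuples of trees) under a cover strategy."""
    out = set()

    def visit_tree(t):
        if (t,) in out:
            return
        out.add((t,))
        children = t[1]
        if len(children) == 1:
            visit_tree(children[0])
        elif children:
            p = favorite(t)
            visit_forest(children[:p], children[p], children[p + 1:])

    def visit_forest(L, fav, R):
        # The forest is L ∘ fav ∘ R, where fav is the favorite subtree Ap.
        if not L and not R:
            return visit_tree(fav)
        forest = L + (fav,) + R
        if forest in out:
            return
        out.add(forest)
        if L:                        # direction left: strip the leftmost root
            (_, G), T = L[0], L[1:]
            visit_tree(L[0])
            visit_forest(G + T, fav, R)
            visit_forest(T, fav, R)
        else:                        # direction right: strip the rightmost root
            (_, G), T = R[-1], R[:-1]
            visit_tree(R[-1])
            visit_forest((), fav, T + G)
            visit_forest((), fav, T)

    visit_tree(tree)
    return out

def lemma5_count(tree, favorite):
    """|A| - |Aj| + sum of #relevant(Ai), with Aj the favorite child."""
    children = tree[1]
    if not children:
        return 1
    fav = children[favorite(tree)] if len(children) > 1 else children[0]
    return size(tree) - size(fav) + sum(lemma5_count(c, favorite) for c in children)
```

On the tree 1(2(3, 4), 5, 6(7)), choosing the leftmost, middle, or rightmost child as favorite gives 11, 13, and 12 relevant forests. The smallest count comes from the largest child, in line with the lower bound of Lemma 1.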

41
Free node
  • What is a free node?
  • Definition: a node is free if it is
  • the root of A, or
  • a node whose parent has degree greater than 1
    and which is not the favorite child

[Figure: a tree with the favorite children and the free nodes marked]
42
Analysis of Tree B
43
Tree B
  • Unlike A, B has no cover
  • So there is no cover strategy for B
  • Focus on the following three things
  • Rightmost forests
  • Leftmost forests
  • Special forests

44
Three Things (1)
Rightmost ∪ leftmost = special?
NO!
  • Definition
  • Rightmost forests
  • the subforests obtained from B by applying only
    left decompositions
  • Leftmost forests
  • the subforests obtained from B by applying only
    right decompositions
  • Special forests
  • the subforests obtained from B by applying left or
    right decompositions

45
Example
[Figure: applying left decompositions to B produces all rightmost forests of B]
46
Three Things (2)
  • Three categories
  • The relevant forests of A fall within three
    categories
  • (α) those that are compared with all rightmost forests
    of B
  • (β) those that are compared with all leftmost forests
    of B
  • (δ) those that are compared with all special forests
    of B

Why?
47
Three Things (3)
  • The number of rightmost, leftmost, and special
    forests (#right, #left, #special)
  • #right(B) = Σ(|B(i)|, i ∈ B) - Σ(|B(i)|, i is a
    rightmost child)
  • #left(B) = Σ(|B(i)|, i ∈ B) - Σ(|B(i)|, i is a
    leftmost child)
  • #special(B) = |B|(|B|+3) / 2 - Σ(|B(i)|, i ∈ B)
48
Computing #right(B) and #left(B)
  • Rightmost forests: they are the relevant forests of the
    cover strategy whose favorite child is always the
    rightmost child, because every decomposition is left
  • Leftmost forests: they are the relevant forests of the
    cover strategy whose favorite child is always the
    leftmost child, because every decomposition is right

#right(B) = Σ(|B(i)|, i ∈ B) - Σ(|B(i)|, i is a rightmost child)
#right(B) = |B| - |Bn| + #right(B1) + ... + #right(Bn) (recursively)
#left(B) = Σ(|B(i)|, i ∈ B) - Σ(|B(i)|, i is a leftmost child)
#left(B) = |B| - |B1| + #left(B1) + ... + #left(Bn) (recursively)
Review: #relevant(B) = |B| - |Bj| + #relevant(B1) + ... + #relevant(Bn)
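The three closed forms for B can be computed in a single traversal that accumulates subtree sizes. This is a hedged sketch: the tuple representation `(label, children)` and the name `count_forests` are assumptions, and an only child is treated as both the leftmost and the rightmost child, which is what the recursive forms |B| - |Bn| + ... and |B| - |B1| + ... imply.

```python
def count_forests(tree):
    """Return (#right, #left, #special) for a tree given as (label, children)."""
    total = rightmost = leftmost = 0   # sums of subtree sizes |B(i)|

    def walk(t, is_rightmost, is_leftmost):
        nonlocal total, rightmost, leftmost
        children = t[1]
        n = 1
        for k, c in enumerate(children):
            n += walk(c, k == len(children) - 1, k == 0)
        total += n
        if is_rightmost:
            rightmost += n
        if is_leftmost:
            leftmost += n
        return n

    n = walk(tree, False, False)       # the root is nobody's child
    return total - rightmost, total - leftmost, n * (n + 3) // 2 - total
```

For a root with three leaf children this gives (6, 6, 7): the special forests are B itself, the forests [1,2,3], [1,2], [2,3], and the three single leaves.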
50
Comparison
  • Two types (for A)
  • Tree comparison
  • free node
  • favorite child
  • Forest comparison

51
Lemma 6
  • Let F be a relevant forest of A
  • if the direction is left, then F is at least
    compared with all rightmost forests of B
  • if the direction is right, then F is at least
    compared with all leftmost forests of B

Why?
52
Free nodes comparison
  • Lemma 7
  • Let i be a free node of A
  • if the direction of i is left, then A(i) is (α)
  • if the direction of i is right, then A(i) is (β)

(α) those that are compared with all rightmost forests of B
(β) those that are compared with all leftmost forests of B
(δ) those that are compared with all special forests of B
53
Proof of Lemma 7
If the direction of i is left, then A(i) is (α)
  • Consider G, the largest forest of B such that
    (A(i), G) belongs to RF(A, B) and G is not a
    rightmost forest
  • G is not B itself, so the pair (A(i), G) is produced
    from another relevant pair
  • There are four cases

G is a tree!
not free node!
Case 1 (insertion): since the direction of A(i) is left, there exist a node l and two forests H and P such that G = H∘P and (A(i), l(H)∘P) is in RF(A, B), with (A(i), l(H)∘P) → (A(i), G). Since G is the largest forest that is not rightmost, l(H)∘P is a rightmost forest of B, so G = H∘P is also a rightmost forest of B: contradiction.
Case 2 (deletion): there is a node l such that (l∘A(i), G) → (A(i), G), (A(i)∘l, G) → (A(i), G), or (l(A(i)), G) → (A(i), G).
Case 3 (tree part of a decomposition): (A(i)∘F1, G∘F2) → (A(i), G) or (F1∘A(i), F2∘G) → (A(i), G).
Case 4 (forest part of a decomposition): (T1∘A(i), T2∘G) → (A(i), G) or (A(i)∘T1, G∘T2) → (A(i), G).
54
Forests comparison
  • Lemma 8
  • Let F be a relevant forest of A that is not a tree.
    Let i be the lowest common ancestor of the set of
    nodes of F and j be the favorite child of i
  • if F is a rightmost forest whose leftmost tree
    is not A(j), then F has the same category
    as A(i)
  • if F is a leftmost forest, then F has the same
    category as A(i)
  • else F is (δ)

55
Proof of Lemma 8, cases (1) and (2)
  • The fact
  • cases (1) and (2) are straightforward

?????? forest , ?? ??(1) -gt decomposition ????
favorite child (???) ??(2) -gt decomposition
???? favorite child (???)
category
????(LCA)???forests
????
56
Proof of Lemma 8, case (3)
  • ?????? forest , ?????(1) (2) ,
  • ?????? tree ????????? (favorite child) ,
  • ?????? direction ? right
  • now consider a forest G

?? G is a rightmost forest of B ?? F is
not a leftmost forest ?? F ?????????????
gt A(i) ???? left by lemma gt ?? (A(i) ,
G) (A(i) , G) -gt (F , G) by ???
????,???? !!
?? G is not a rightmost forest of B B
??? G ??? right decomposition ??? F ??????
right ?? (F , G) ??
57
Favorite children comparison
  • Lemma 9
  • Let i be a node of A that is not free, and j be the
    parent of i
  • if the direction of i is left, i is the
    rightmost child of j, and A(j) is left, then A(i)
    is (α)
  • if the direction of i is right, i is the
    leftmost child of j, and A(j) is right, then A(i)
    is (β)
  • else A(i) is (δ)

58
Proof of Lemma 9
  • The fact
  • all cases are straightforward

(1) left ??? (2) right ??? (3) ??
59
Final Task
60
Notation
  • Let i be a node of A, and let j be the parent of i
    (if i is not the root)
  • Free(A(i)) = #relevant(A(i), B) if i is free
  • Right(A(i)) = #relevant(A(i), B) if A(j) is (α)
  • Left(A(i)) = #relevant(A(i), B) if A(j) is (β)
  • All(A(i)) = #relevant(A(i), B) if A(j) is (δ)
  • So, #relevant(A, B) = Free(A)

61
Theorem
  • Let (A, B) be a pair of trees, with A a cover tree
  • 7 cases

62
Case 1
  • If A is reduced to a single node whose direction
    is right

Free(A) = #left(B)
Right(A) = #special(B)
Left(A) = #left(B)
All(A) = #special(B)
63
Case 2
  • If A is reduced to a single node whose direction
    is left

Free(A) = #right(B)
Right(A) = #right(B)
Left(A) = #special(B)
All(A) = #special(B)
64
Case 3
  • If A = l(A') and the direction of l is right
  • (A' is a tree)

Free(A) = #left(B) + Left(A')
Right(A) = #special(B) + All(A')
Left(A) = #left(B) + Left(A')
All(A) = #special(B) + All(A')
65
Case 4
  • If A = l(A') and the direction of l is left
  • (A' is a tree)

Free(A) = #right(B) + Right(A')
Right(A) = #right(B) + Right(A')
Left(A) = #special(B) + All(A')
All(A) = #special(B) + All(A')
66
Case 5
  • If A = l(A1∘...∘An) and the favorite child is the
    leftmost child

Free(A) = #left(B)·(|A| - |A1|) + Left(A1) + Free(A2) + ... + Free(An)
Right(A) = #special(B)·(|A| - |A1|) + All(A1) + Free(A2) + ... + Free(An)
Left(A) = #left(B)·(|A| - |A1|) + Left(A1) + Free(A2) + ... + Free(An)
All(A) = #special(B)·(|A| - |A1|) + All(A1) + Free(A2) + ... + Free(An)
67
Case 6
  • If A = l(A1∘...∘An) and the favorite child is the
    rightmost child

Free(A) = #right(B)·(|A| - |An|) + Right(An) + Free(A1) + ... + Free(An-1)
Right(A) = #right(B)·(|A| - |An|) + Right(An) + Free(A1) + ... + Free(An-1)
Left(A) = #special(B)·(|A| - |An|) + All(An) + Free(A1) + ... + Free(An-1)
All(A) = #special(B)·(|A| - |An|) + All(An) + Free(A1) + ... + Free(An-1)
68
Case 7
  • If A = l(A1∘...∘An) and the favorite child is Aj,
    with 1 < j < n

Free(A) = #right(B)·(1 + |A1∘...∘Aj-1|) + #special(B)·|Aj+1∘...∘An| + All(Aj) + Free(A1) + ... + Free(Aj-1) + Free(Aj+1) + ... + Free(An)
Right(A) = #right(B)·(1 + |A1∘...∘Aj-1|) + #special(B)·|Aj+1∘...∘An| + All(Aj) + Free(A1) + ... + Free(Aj-1) + Free(Aj+1) + ... + Free(An)
Left(A) = #special(B)·(|A| - |Aj|) + All(Aj) + Free(A1) + ... + Free(Aj-1) + Free(Aj+1) + ... + Free(An)
All(A) = #special(B)·(|A| - |Aj|) + All(Aj) + Free(A1) + ... + Free(Aj-1) + Free(Aj+1) + ... + Free(An)
69
Conclusion
  • Steps
  • Given two trees A and B
  • Compute #right(B), #left(B), #special(B)
  • Compute Free(A) recursively by the theorem
  • #relevant(A, B) = Free(A)
70
Example
  • For the Zhang-Shasha algorithm

#relevant(A, B) = #right(A) × #right(B)
Why?
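The seven cases of the theorem can be turned into one recursive function, which also gives a way to check the Zhang-Shasha identity on an example. This is a sketch under assumptions, not the presenters' code: trees are `(label, children)` tuples, the three counts for B are passed in as numbers, and the cover is a hypothetical function `cover(t)` that returns "left" or "right" for a node with at most one child and the index of the favorite child otherwise.

```python
def size(t):
    return 1 + sum(size(c) for c in t[1])

def counts(A, cover, right_B, left_B, special_B):
    """Return (Free, Right, Left, All) for the cover tree A."""
    children = A[1]
    rec = [counts(c, cover, right_B, left_B, special_B) for c in children]
    r = cover(A)
    if len(children) <= 1:
        # Cases 1 to 4: add the counts of the only child A', if there is one.
        _, ri, le, al = rec[0] if rec else (0, 0, 0, 0)
        if r == "right":
            return left_B + le, special_B + al, left_B + le, special_B + al
        return right_B + ri, right_B + ri, special_B + al, special_B + al
    fav = rec[r]
    others = sum(x[0] for k, x in enumerate(rec) if k != r)  # Free of the others
    rest = size(A) - size(children[r])                       # |A| - |Aj|
    spec = special_B * rest + fav[3] + others
    if r == 0:                         # Case 5: favorite is the leftmost child
        lft = left_B * rest + fav[2] + others
        return lft, spec, lft, spec
    if r == len(children) - 1:         # Case 6: favorite is the rightmost child
        rgt = right_B * rest + fav[1] + others
        return rgt, rgt, spec, spec
    before = 1 + sum(size(c) for c in children[:r])          # Case 7
    after = sum(size(c) for c in children[r + 1:])
    mixed = right_B * before + special_B * after + fav[3] + others
    return mixed, mixed, spec, spec
```

With the cover that always picks the rightmost child and direction left (the Zhang-Shasha choice), only cases 2, 4, and 6 are used and every term is a multiple of #right(B). Free(A) then unfolds to #right(B) times |A| - |An| + #right(A1) + ... + #right(An), which is #right(A) × #right(B). For A = 1(2(3, 4), 5, 6(7)) and B a root with three leaves, #right(A) = 12 and #right(B) = 6, so Free(A) = 72.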
72
Choose the favorite child (1)
  • Choose a good favorite child to minimize Free(A)

Case 5 (favorite child is the leftmost child)
Case 6 (favorite child is the rightmost child)
Case 7 (favorite child is neither)
Free(A) = min over the three cases
73
Choose the favorite child (2)
  • Is this really good?


Not necessarily !!
Why?
Need preprocessing time !!
74
The end
Happy New Year !