Title: Succinct Data Structure


1
Succinct Data Structure
2006/07/05 ?3?????????????_at_???
  • ??? ??
  • ???????????????????????
  • hillbig_at_is.s.u-tokyo.ac.jp

2
??
  • ???????
  • ???????????
  • ????????
  • ??????????????
  • ??????????????????

?? ?????????????????????????
3
?????
  • ??
  • Succinct Data Structure (SDS)
  • ????????SDS
  • ???????SDS
  • ???????SDS
  • SDS??????????
  • Suffix Arrays, Burrows-Wheeler transform
  • FM-index, Compressed Suffix Arrays
  • ????????

5
??(1/2)
  • ????? word RAM
  • ????n?? log n ???????????
  • ? ?? n232 64 ???64bit???????
  • ???????
  • The information-theoretic lower bound L for a set D is L = log(number of elements in D)
  • ?????????????????D??????????????
  • Example: for a tree T with n nodes, L = 2n - Θ(log n)

6
??(2/2)???????? Manzini 01
  • H0: 0th-order empirical entropy
  • n_c: the number of occurrences of character c in T
  • Hk: k-th order empirical entropy
  • Ts s?Sk?????????????
  • Hk ? Hk-1 ?? H1 ?H0 ?1?????
  • ????????????????????

7
Example
  • T = aacbbcbc
  • |T| = 8, n_a = 2, n_b = 3, n_c = 3
  • For k = 0:
  • H0(T) = (2/8) log(8/2) + ... ≈ 0.47
  • For k = 2:
  • Contexts s ∈ Σ^2: ac, cb, bb, bc, c$
  • T_ac = a, T_cb = ab, T_bb = c
  • H2(T) = (1/8)·0 + (2/8)·1 + ... ≈ 0.25
  • ?????H5??: ?? 0.23, DNA 0.24, XML 0.10
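
The k = 0 figure can be checked numerically. The following is a minimal sketch, assuming the usual definition H0(T) = Σ_c (n_c / n) log(n / n_c); the function name `h0` and its `base` parameter are illustrative and do not come from the slides. The slide's 0.47 corresponds to base-10 logarithms; with base 2 the same string gives about 1.56 bits per character.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>

// Zeroth-order empirical entropy of `text`, using logarithms of the given base:
// H0(T) = sum over characters c of (n_c / n) * log(n / n_c).
double h0(const std::string& text, double base = 2.0) {
    assert(!text.empty() && base > 1.0);
    std::map<char, int> count;  // n_c for every character c that occurs
    for (char c : text) ++count[c];
    const double n = static_cast<double>(text.size());
    double h = 0.0;
    for (const auto& entry : count) {
        const double nc = entry.second;
        h += (nc / n) * (std::log(n / nc) / std::log(base));
    }
    return h;
}
```

Calling `h0("aacbbcbc", 10.0)` reproduces the slide's value to two decimals.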

9
Succinct Data Structure (SDS)??
  • ???????D ??????????
  • ??????? ??????????????
  • ????????? ??????????
  • Information-theoretic lower bound: L = log(number of elements in D)
  • Size of the succinct representation: (1 + o(1)) L bits
  • ??????
  • ?????O(1)????o(L)?????????????????
  • ???????(??) ???????????????????

10
????(??)????SDS
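  • Bit vector B[0..n-1], where each B[i] is 1 or 0
  • It represents a set D ⊆ {0, ..., n-1}: B[i] = 1 if i ∈ D, otherwise B[i] = 0
  • Supported operations:
  • lookup(B, i): the value of B[i]
  • rank1(B, i): the number of 1s in B[0..i]
  • select1(B, j): the position of the (j+1)-th 1 in B

Example: B[0..20] = 000100101010010010010
lookup(B, 0) = 0, lookup(B, 6) = 1
rank1(B, 10) = 4, rank1(B, 15) = 5
select1(B, 0) = 3, select1(B, 4) = 13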
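
The three operations can be pinned down with a naive reference implementation. This is only a sketch that scans the vector in O(n) time and stores B as a string of '0' and '1' characters (an assumption made for readability); it uses the conventions implied by the example values, namely that rank1 includes position i and select1 counts from j = 0.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// lookup(B, i): the bit B[i].
int lookup(const std::string& B, std::size_t i) {
    assert(i < B.size());
    return B[i] - '0';
}

// rank1(B, i): number of 1s in B[0..i], both ends inclusive.
int rank1(const std::string& B, std::size_t i) {
    assert(i < B.size());
    int ones = 0;
    for (std::size_t k = 0; k <= i; ++k) ones += B[k] - '0';
    return ones;
}

// select1(B, j): position of the (j+1)-th 1 in B, or -1 if there is none.
int select1(const std::string& B, int j) {
    for (std::size_t k = 0; k < B.size(); ++k) {
        if (B[k] == '1' && j-- == 0) return static_cast<int>(k);
    }
    return -1;
}
```

A succinct structure answers the same queries in O(1) time with only o(n) extra bits; this version is useful mainly as a test oracle.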
11
????(??)????SDS???
  • ??????
  • ??????????????
  • ?????????????????
  • ?????????????(c.f. word RAM ???)
  • ????????
  • n + o(n) bits [Jacobson 89] [M96]
  • nH0(B) + o(n) bits [Grossi 02]
  • ??????????
  • ??????????????Gonzalez 05 Kim 05

12
????(??)????SDS???rank??
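  • Divide B into blocks of log^2 n bits (SB: Super-Block)
  • Divide each SB into blocks of (log n)/2 bits (TB: Tiny-Block)
  • Store the rank at each SB boundary: O(n / log n) bits
  • Store the rank at each TB boundary relative to the start of its SB (relative rank): O(n log log n / log n) bits
  • The rank inside a TB is computed by table lookup or popCount (????)
  • rank(B, i) = SB[i / log^2 n] + TB[i / ((log n)/2)] + (rank inside the TB)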

[Figure: B divided into SBs of log^2 n bits; each SB divided into TBs of (log n)/2 bits]
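
The two-level directory can be sketched as follows. The block sizes are assumptions chosen for simplicity, not the slide's log^2 n and (log n)/2: a tiny block is one 32-bit word and a super-block is 8 words. The names `RankDirectory`, `sb` and `tb` are illustrative, and `std::bitset::count` stands in for popCount.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit i of B is bit (i % 32) of words[i / 32], least significant bit first.
struct RankDirectory {
    static constexpr std::size_t kWordsPerSB = 8;
    std::vector<std::uint32_t> words;
    std::vector<std::uint32_t> sb;  // 1s before each super-block (absolute)
    std::vector<std::uint16_t> tb;  // 1s before each word, relative to its super-block

    explicit RankDirectory(const std::vector<bool>& bits)
        : words((bits.size() + 31) / 32, 0) {
        for (std::size_t i = 0; i < bits.size(); ++i)
            if (bits[i]) words[i / 32] |= std::uint32_t{1} << (i % 32);
        std::uint32_t total = 0, inSB = 0;
        for (std::size_t w = 0; w < words.size(); ++w) {
            if (w % kWordsPerSB == 0) { sb.push_back(total); inSB = 0; }
            tb.push_back(static_cast<std::uint16_t>(inSB));
            const std::uint32_t ones =
                static_cast<std::uint32_t>(std::bitset<32>(words[w]).count());
            total += ones;
            inSB += ones;
        }
    }

    // rank1(i): number of 1s in B[0..i], inclusive, with three lookups.
    std::uint32_t rank1(std::size_t i) const {
        const std::size_t w = i / 32;
        assert(w < words.size());
        // Keep only bits 0..(i % 32) of the word that contains position i.
        const std::uint32_t mask = 0xFFFFFFFFu >> (31 - i % 32);
        return sb[w / kWordsPerSB] + tb[w] +
               static_cast<std::uint32_t>(std::bitset<32>(words[w] & mask).count());
    }
};
```

The query adds the super-block count, the relative tiny-block count and one popcount, mirroring the three terms of the rank formula.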
13
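popcount(x): the number of 1 bits in x
  • unsigned int popCount(unsigned int r) {
        // Sum adjacent 1-bit, then 2-bit, then 4-bit fields in parallel.
        r = ((r & 0xAAAAAAAA) >> 1) + (r & 0x55555555);
        r = ((r & 0xCCCCCCCC) >> 2) + (r & 0x33333333);
        r = ((r >> 4) + r) & 0x0F0F0F0F;
        // Fold the four byte counts into the low byte (at most 32, so 6 bits).
        r = (r >> 8) + r;
        return ((r >> 16) + r) & 0x3F;
    }
  • 0xAAAAAAAA = 1010101010...10 (binary)
  • 0x55555555 = 0101010101...01 (binary)
  • 0xCCCCCCCC = 1100110011...00 (binary)
  • 0x33333333 = 0011001100...11 (binary)
  • 0x0F0F0F0F = 0000111100...11 (binary)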

14
????(??)????SDS???select??
  • Using rank (binary search): O(log n) time
  • select1(B, j): the k such that rank1(B, k) < j ≤ rank1(B, k+1)
  • o(n)???????????????
  • 1?1??????n??????
  • Algorithm I Kim 2005
  • B???log1/2n????????
  • 1???????????????
  • log2n???1??????????
  • log1/2n???1????????????(1?1??????2log1/2?????????
    ?)
  • 1??????????????????
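
The O(log n) approach, select by binary search over rank, can be sketched as follows. A plain prefix-count array stands in for a constant-time rank structure (an assumption for brevity), and the function names are illustrative. The convention is the 0-based one from the bit vector example: `select1(j)` is the position of the (j+1)-th 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// prefix[i] = number of 1s in B[0..i] (inclusive); plays the role of rank1(B, i).
std::vector<int> buildRank(const std::vector<bool>& B) {
    std::vector<int> prefix(B.size());
    int ones = 0;
    for (std::size_t i = 0; i < B.size(); ++i) {
        ones += B[i] ? 1 : 0;
        prefix[i] = ones;
    }
    return prefix;
}

// select1(j): the smallest k with rank1(B, k) >= j + 1.
// Returns -1 if B has fewer than j + 1 ones.
long select1(const std::vector<int>& prefix, int j) {
    assert(j >= 0);
    if (prefix.empty() || prefix.back() < j + 1) return -1;
    std::size_t lo = 0, hi = prefix.size() - 1;  // invariant: the answer lies in [lo, hi]
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (prefix[mid] >= j + 1) hi = mid;  // enough 1s up to mid
        else lo = mid + 1;                   // look further right
    }
    return static_cast<long>(lo);
}
```

Because rank is non-decreasing in the position, the binary search needs O(log n) rank queries.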

15
?????SDSJacobson 85 Munro 01 Geary 05
Benoit 05
  • ???n???????2n bit???
  • ??????????O(nlogn) bit(??????????????????96n
    bit)
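  • Balanced Parentheses (BP) [Munro 01] [Geary 05]
  • Traverse the tree in DFS order, writing "(" when a node is entered and ")" when it is left
  • Depth First Unary Degree Sequence (DFUDS)
  • Start with one "(", then traverse the tree in DFS order and write each node with k children as k "(" followed by one ")"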

BP:    (()(()())) = 0010010111
DFUDS: ((())(())) = 0001100111
(with "(" written as 0 and ")" as 1)
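
Both encodings can be generated with a short DFS. This is a sketch under assumptions: the tree is held as child lists with node 0 as the root, and the five-node tree used in the usage note is inferred from the parenthesis strings on the slide, not stated there.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Ordered tree as child lists; node 0 is the root.
using Tree = std::vector<std::vector<int>>;

// BP: write "(" when the DFS enters a node and ")" when it leaves.
void bpVisit(const Tree& t, int v, std::string& out) {
    out += '(';
    for (int c : t[v]) bpVisit(t, c, out);
    out += ')';
}

std::string toBP(const Tree& t) {
    assert(!t.empty());
    std::string out;
    bpVisit(t, 0, out);
    return out;
}

// DFUDS: one leading "(", then for each node in preorder,
// k copies of "(" for its k children followed by one ")".
void dfudsVisit(const Tree& t, int v, std::string& out) {
    out.append(t[v].size(), '(');
    out += ')';
    for (int c : t[v]) dfudsVisit(t, c, out);
}

std::string toDFUDS(const Tree& t) {
    assert(!t.empty());
    std::string out = "(";
    dfudsVisit(t, 0, out);
    return out;
}
```

For a root with two children, where the second child has two leaf children (`Tree t = {{1, 2}, {}, {3, 4}, {}, {}}`), `toBP(t)` gives `(()(()()))` and `toDFUDS(t)` gives `((())(()))`, each using 2n = 10 symbols.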
16
BP?????????
  • ?????child,childrank???????????
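  • parent(x): the parent of x
  • firstchild(x): the first child of x
  • sibling(x): the next sibling of x
  • depth(x): the depth of x (???????)
  • desc(x): the number of descendants of x
  • rank(x): the preorder rank of x
  • select(i): the node whose preorder rank is i
  • LA(x, d): the ancestor of x whose depth is d (level-ancestor)
  • lca(x, y): the lowest common ancestor of x and y
  • degree(x): the number of children of x
  • child(x, i): the i-th child of x
  • childrank(x): the rank of x among its siblings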

???????????
DFUDS?lca, depth, LA??????
17
BP???
  • n????,?????????????B02n-1????????
  • ??DFS??????????(????)
  • (??????????????B??rank(? select(
    ?BP???????????????
  • Divide B into blocks of M = (log n)/2 bits: B0, ..., B(2n-1)/M
  • m(x): the position of the parenthesis matching x
  • b(x): the number of the block that contains x

BP (in blocks of 6): (()((( ))()() )())
18
BP???????
  • BP???????????(???????)
  • findopen(x), findclose(x): the position of the parenthesis matching x
  • enclose(x): the closest pair of parentheses that encloses x
  • ????????????
  • parent(x) = enclose(x)
  • sibling(x) = findclose(x) + 1
  • first-child(x) = x + 1
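
The way tree navigation reduces to parenthesis operations can be sketched with naive linear scans. These are stand-ins for the o(n)-bit structures the following slides describe, not the slides' algorithm. A node is identified by the position of its "(", and returning -1 for "no such node" is a convention chosen for this sketch.

```cpp
#include <cassert>
#include <string>

// findclose(x): position of the ")" matching the "(" at x.
int findclose(const std::string& bp, int x) {
    assert(x >= 0 && x < static_cast<int>(bp.size()) && bp[x] == '(');
    int depth = 0;
    for (int i = x; i < static_cast<int>(bp.size()); ++i) {
        depth += (bp[i] == '(') ? 1 : -1;
        if (depth == 0) return i;
    }
    return -1;  // unbalanced input
}

// enclose(x): the "(" of the closest pair that encloses x.
int enclose(const std::string& bp, int x) {
    int depth = 0;
    for (int i = x - 1; i >= 0; --i) {
        depth += (bp[i] == ')') ? 1 : -1;
        if (depth < 0) return i;  // an unmatched "(" to the left of x
    }
    return -1;  // x is the root
}

int parent(const std::string& bp, int x) { return enclose(bp, x); }

int firstChild(const std::string& bp, int x) {
    return bp[x + 1] == '(' ? x + 1 : -1;  // ")" right after x means x is a leaf
}

int sibling(const std::string& bp, int x) {
    const int next = findclose(bp, x) + 1;
    return (next < static_cast<int>(bp.size()) && bp[next] == '(') ? next : -1;
}
```

On the string `(()(()()))`, the node at position 3 has parent 0, and the leaf at position 1 has next sibling 3.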

[Figure: parent shown on the BP string (()(()()))]
19
findclose???(1/4)Munro 01 Geary 05
  • ???x ????????
  • x is near ⇔ b(x) = b(m(x)): x and its matching parenthesis are in the same block
  • x is far ⇔ b(x) ≠ b(m(x))
  • findclose(x) (???????)
  • x?near???????????O(n1/2(logn)2) o(n)
  • x?far????????
  • ??????far???????nlogn?????

((((((
((((((
))))))
))))))
???far???
20
findclose???(2/4)
  • ???far?????????????
  • x?pioneer x?far???x????far??????x?????b(m(x))?b
    (m(x))
  • ??m(x)?pioneer???x?pioneer???
  • Theorem: with T blocks, there are at most 4T - 6 pioneers
  • ?? pioneer?????????????????????? ??????????

[Figure: BP string (()((( ))()() )()) in blocks of 6, with one "(" marked as a pioneer and another as far]
21
findclose???(3/4)
  • Pioneer????????????BP2???
  • BP2???????????????O(n/logn)
  • BP2 ??BP?????????(???)
  • A bit vector P[0..2n-1] links BP and BP2
  • P[i] = 1 if B[i] is a pioneer, otherwise P[i] = 0
  • P??rank, select?BP??pioneer?BP2?????????

22
findclose???(4/4)
  • findclose(x) (???????)
  • x?near??? ???
  • x?pioneer??? BP2???????
  • x?far???(1)???far x ?select(rank(P,x))????(2)
    y µ(x)????,B(y)????? (B(y)
    ?B(µ(x))???)(3)x?x???(???)???????,
    B(µ(x))????????????
  • findclose????enclose???????

23
BP??
P   = 100010 010000 0001
BP  = (()((( ))()() )())   (position 4 is a pioneer, position 3 is far)
BP2 = (())
findclose(4)B4?pioneer???BP2???????(1rank(P,4)
-1)??findclose????????2????select(P,2)
7 findclose(3) B3?far??????pioneer?????????????
??????????
24
???????SDS
  • T[0..n-1] with T[i] ∈ Σ; for a character c ∈ Σ, compute rank_c(T, i) and select_c(T, i)
  • ??????????????
  • S2???????????????
  • Sgt2???
  • ????????SDS????SB,TB??????????????????????????O(
    S (n/logn n loglog n/ logn))
  • Sgtlogn???????????????n?????(???????S2566553
    6 lognlt32)

25
???????SDS
  • ????????
  • Slto(n/loglogn)???Generalized Wavelet Tree
    ?????? Ferragina 2004
  • nH0(T) bits ? rank, select?????
  • ?????Wavelet Tree??? Grossi 2003
  • nH0(T) bits ? rank, select?log(S)??
  • ?????????????Huffman?
  • ?????????
  • rank??????select??????????rank,select??????

26
Wavelet??(1/2)
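  • Σ = {a, b, c}, with codes a = 0, b = 10, c = 11 (binary)
  • T = abbccbaacbab

[Figure: wavelet tree shaped by the Huffman codes.
 Root (first code bit): abbccbaacbab → 011111001101; 0-branch: a, 1-branch: b and c.
 Child (second code bit): bbccbcbb → 00110100; 0-branch: b, 1-branch: c.]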
27
Wavelet Tree??(2/2)
  • Σ = {a, b, c}, with codes a = 0, b = 10, c = 11 (binary)
  • T = abbccbaacbab

[Figure: computing rank_b(T, 8) = 3 on the Huffman-shaped wavelet tree.
 Root: abbccbaacbab → 011111001101, rank1(8) = 5.
 Child: bbccbcbb → 00110100, rank0(5) = 3.]
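
The rank computation on this example can be reproduced with a small code-shaped wavelet tree. This is a sketch with assumptions: node bit vectors are kept as '0'/'1' strings and ranked by a linear scan, where a real structure would use a rank-enabled bit vector; the names `Node`, `build`, `rankBit` and `rankChar` are illustrative. Here rank counts within the first i symbols, which is the convention that reproduces rank1(8) = 5, rank0(5) = 3 and rank_b(T, 8) = 3.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct Node {
    std::string bits;                // one code bit per symbol reaching this node
    std::unique_ptr<Node> child[2];  // null for a leaf or an empty branch
};

using Code = std::map<char, std::string>;  // prefix code, e.g. a = "0", b = "10"

std::unique_ptr<Node> build(const std::string& text, const Code& code,
                            std::size_t depth = 0) {
    // All symbols reaching a leaf are equal and their code ends here.
    if (text.empty() || code.at(text[0]).size() == depth) return nullptr;
    std::unique_ptr<Node> node(new Node);
    std::string part[2];
    for (char c : text) {
        const int b = code.at(c)[depth] - '0';
        node->bits += static_cast<char>('0' + b);
        part[b] += c;
    }
    node->child[0] = build(part[0], code, depth + 1);
    node->child[1] = build(part[1], code, depth + 1);
    return node;
}

// Occurrences of `bit` among the first i entries of `bits` (naive scan).
std::size_t rankBit(const std::string& bits, char bit, std::size_t i) {
    std::size_t r = 0;
    for (std::size_t k = 0; k < i; ++k) r += (bits[k] == bit);
    return r;
}

// rank_c(T, i): occurrences of c among the first i symbols of T.
std::size_t rankChar(const Node* root, const Code& code, char c, std::size_t i) {
    const std::string& cw = code.at(c);
    const Node* node = root;
    for (std::size_t d = 0; d < cw.size() && node != nullptr; ++d) {
        assert(i <= node->bits.size());
        i = rankBit(node->bits, cw[d], i);  // prefix length inside the child
        node = node->child[cw[d] - '0'].get();
    }
    return i;
}
```

Each query performs one bit-vector rank per code bit, so frequent symbols with short Huffman codes are answered with fewer steps.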

29
(??)?????????/??
  • ????
  • Text to search: T (length n, alphabet Σ)
  • Pattern P (length m)
  • occ(P): the number of occurrences of P in T
  • loc(P): the positions at which P occurs in T
  • ??
  • ??????????????????????????
  • ???? (??????????????)
  • ?????????????????????????????
  • ?????????????????
  • ??????
  • ???????????????????????????

30
????
  • ??????????P?????????
  • ?????????????????????
  • ?????????????????????????????????????(???????????
    ?)
  • ??????????????????
  • n-gram????????????????
  • ??n?????????????
  • ?????????????????O(N)????????????????????

31
??????
  • ?????????????????????
  • ????????????????????????O(1)??????????????
  • ??(??)?????????????????
  • ????????????
  • ?????????????
  • ?????????????????????????
  • Suffix Arrays (BW??)?SDS???????
  • ???? nHkbit?occ(P)?O(m) ??Ferragina 2005

32
Suffix Arrays (SA) Manber 1989
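  • Text T = t1 t2 t3 .. tN
  • Suffix of T: Sk = tk tk+1 tk+2 .. tN

Example: T = abraca
(1) List all suffixes of T:
    S1 abraca, S2 braca, S3 raca, S4 aca, S5 ca, S6 a, S7 (empty)
(2) Sort the suffixes in lexicographic order:
    S7 (empty), S6 a, S1 abraca, S4 aca, S2 braca, S5 ca, S3 raca
(3) Store only the indices:
    SA = 7 6 1 4 2 5 3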
33
SA??????
  • Text T = abracadabra, query P = bra

Binary search over the suffix array
Search time: occ(P) O(m log n), loc(P) O(m log n + occ(P))
Index size: n log n bits (5n bytes)
With the Hgt array, occ(P) takes O(m + log n)
SA  Suffix
11  (empty)
10  a
7   abra
0   abracadabra
3   acadabra
5   adabra
8   bra
1   bracadabra
4   cadabra
6   dabra
9   ra
2   racadabra
Comparisons: bra > adabra, bra = bra, bra < cadabra
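
The search can be sketched with the standard library's binary searches. Assumptions: the suffix array is built by naive sorting (far slower than a real construction algorithm), positions are 0-based as in the listing, and the names `buildSA` and `saRange` are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Naive construction: sort suffix start positions lexicographically.
std::vector<int> buildSA(const std::string& T) {
    std::vector<int> sa(T.size() + 1);  // includes the empty suffix at position n
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&T](int a, int b) {
        return T.compare(a, std::string::npos, T, b, std::string::npos) < 0;
    });
    return sa;
}

// Half-open range [first, last) of SA rows whose suffix starts with P.
// Two binary searches, each step comparing at most m characters: O(m log n).
std::pair<int, int> saRange(const std::string& T, const std::vector<int>& sa,
                            const std::string& P) {
    auto lower = std::lower_bound(sa.begin(), sa.end(), P,
        [&T](int pos, const std::string& p) {
            return T.compare(pos, p.size(), p) < 0;  // suffix prefix < P
        });
    auto upper = std::upper_bound(sa.begin(), sa.end(), P,
        [&T](const std::string& p, int pos) {
            return T.compare(pos, p.size(), p) > 0;  // P < suffix prefix
        });
    return {static_cast<int>(lower - sa.begin()),
            static_cast<int>(upper - sa.begin())};
}
```

For T = abracadabra and P = bra the range is rows 6 and 7, so occ(P) = 2 and loc(P) = {8, 1}.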
34
Compressed Suffix Arrays (CSA) [Grossi, Vitter 00] [Sadakane 03] [Grossi, Gupta, Vitter 03]
  • SA?????????????????????????????
  • ??SA??????? nlogn bit?SA?0??N-1??????????????
  • SA???????????????????
  • Ψ[i] = SA^-1[SA[i] + 1]
  • SA?????????SAkiSAik??????
  • ?????SA?????SAk??SA????
  • SA[i] = SA[Ψ[i]] - 1 = SA[Ψ^2[i]] - 2 = ... = SA[Ψ^n[i]] - n
  • If SA[Ψ^n[i]] = p, then SA[i] = p - n

35
????
  • ??SA??????????
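  • Property: if T[SA[i]] = T[SA[i+1]], then Ψ[i] < Ψ[i+1]
  • Proof: if T[SA[i]] = T[SA[i+1]], the order of SA[i] and SA[i+1] is decided by the suffixes that start one character later, so SA^-1[SA[i]+1] < SA^-1[SA[i+1]+1], i.e. Ψ[i] < Ψ[i+1]
  • d[i] = Ψ[i+1] - Ψ[i]; d can be encoded in nH0 bits (nHk bits is also possible) [Sadakane 2003]
  • Storing d with a wavelet tree takes nHk bits [Grossi 2003]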

abra abracadabra
bra bracadabra
1????????
2?????(SAi1?SAi11)??????????????????
36
Backward Search Sadakane 2002 Makinen 2004
  • ???????SA?????SA?lookup?????????????????????
  • Search P = CAGTA backward (P[m], P[m-1], ...)

[Figure: the suffix array range narrows as the pattern is matched from its last character: A, TA, GTA, AGTA, CAGTA]
????????
??????prefix Pim??????????????(???????????
?????)
37
Burrows-Wheeler Transform 1994 (BWT)
  • ???????????(????)
  • Definition: BWT[i] = T[SA[i]-1]
  • If SA[i] = 0, then BWT[i] = T[n]
  • Example: the BWT of abracadabra$ is ard$rcaaaabb
  • BWT????????????????
  • ???????????????????
  • c.f. Compression boosting Ferragina 2005

t hese are possible ... t hese were not of
.. t hese ...
38
BWT?
  • When Farmer Oak smiled, the corners of his mouth
    spread till they were within an unimportant
    distance of his ears, his eyes were reduced to
    chinks, and diverging wrinkles appeared round
    them, extending upon his countenance like the
    rays in a rudimentary sketch of the rising sun.
    His Christian name was Gabriel, and on working
    days he was a young man of sound judgment, easy
    motions, proper dress, and general good
    character. On Sundays he was a man of misty
    views, rather given to postponing, and hampered
    by his best clothes and umbrella upon the whole,
    one who felt himself to occupy morally that vast
    ..??

BW??
BWT?
Ioooooioororooorooooooooorooorromrrooomooroooooooo
rmoorooororioooroormmmmmuuiiiiiIiuuuuuuuiiiUiiiiii
oooooooooooorooooiiiioooioiiiiiiiiiiioiiiiiieuiiii
iiiiiiiiiiiiiouuuuouuUUuuuuuuooouuiooriiiriirriiii
riiiiiiaiiiiioooooooooooooiiiouioiiiioiiuiiuiiiiii
iiiiiiiiiiiiiiiiioiiiiioiuiiiiiiiiiiiiioiiiiiiiiii
iiioiiiiiiuiiiioiiiiiiiiiiiioiiiiiiiiiioiiiioiiiii
iioiiiaiiiiiiiiiiiiiiiiioiiiiiioiiiiiiiiiiiiiiiuii
iiiiiiiiiiiiiiiioiiiiiiiioiiiiiiiiiiiiiiiiiiiiiiii
iiiiiiiiiiiiuuuiioiiiiiuiiiiiiiiiiiiiiiiiiiiiiiioi
iiiuioiuiiiiiiioiiiiiiiuiiiiiiiiiiiiiiiiiiiiiiiiii
iiiioaoiiiiioioiiiiiiiioooiiiiiooioiiioiiiiiouiiii
iiiiiiiiooiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiioiiiiiiii
iiiiiiiiiiioiooiiiiiiiiiiioiiiiiuiiiiiiiiiiiiiiiii
iiiiiiiiiiiiiiiiiiioiiiiiiiiiiiiioiiiuiiiiiiiiiioi
iiiiiiiiiiiuoiiioiiioiiiiiiiiiiiiiiiiiiiiiiuiiiiuu
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiuiuiiiiiuuiiiii
iiiiiiiiiiiiiiiiiiiuiiiiiiiiiiiiiiiiiiiiiiiiiiiioi
iiiiiioiiiiiiiiiiiiiiiiiiiiioiiiiiiiiioiiiiuiiiioi
iiioiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiioiioiiii
iiuiiiiiiiiiiiiiiiooiiiiiiiiiiiiiiiiiiiioooiiiiiii
ioiiiiouiiiiiiiiiiiiiii..??
39
BWT??????
  • F ?Suffix????????????
  • ??
  • BWT??????c????????????????????????????F????????
    ??
  • ??
  • BWT,F??????????????????????????????????????????
    ???

BWT  F   rest of suffix
a1   $
r    a1
d    a2  bra
$    a3  bracadabra
r    a4  cadabra
c    a5  dabra
a2   b   ra
a3   b   racadabra
a4   c   adabra
a5   d   abra
b    r   a
b    r   acadabra
40
BWT??????(?)
  • SA^-1: the inverse of SA; if SA[k] = i then SA^-1[i] = k
  • cum[c]: the number of characters in T that are smaller than c
  • ???????????????????
  • SA^-1[SA[i]-1] = rank_c(BWT, i-1) + cum[c], where c = BWT[i]
  • SA^-1[SA[i]+1] = select_x(BWT, i - cum[x]), where x is the character with cum[x] ≤ i < cum[x+1]
  • BWT???????(lf-mapping)?????

41
i   SA  SA^-1  SA^-1[SA[i]+1]  BWT  Suffix
0   11  3      3               a    $
1   10  7      0               r    a
2   7   11     6               d    abra
3   0   4      7               $    abracadabra
4   3   8      8               r    acadabra
5   5   5      9               c    adabra
6   8   9      10              a    bra
7   1   2      11              a    bracadabra
8   4   6      5               a    cadabra
9   6   10     2               a    dabra
10  9   1      1               b    ra
11  2   0      4               b    racadabra
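
The columns of this table can be regenerated from the definitions. This is a minimal sketch under assumptions: the text carries an explicit terminator '$' that sorts before every letter, the suffix array is built by naive sorting, and the names `Index` and `buildIndex` are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

struct Index {
    std::vector<int> sa, isa, psi;
    std::string bwt;
};

// T must end with a unique terminator '$' that sorts before all other characters.
Index buildIndex(const std::string& T) {
    const int n = static_cast<int>(T.size());
    assert(n > 0 && T.back() == '$');
    Index x;
    x.sa.resize(n);
    std::iota(x.sa.begin(), x.sa.end(), 0);
    std::sort(x.sa.begin(), x.sa.end(), [&T](int a, int b) {
        return T.compare(a, std::string::npos, T, b, std::string::npos) < 0;
    });
    x.isa.resize(n);
    for (int i = 0; i < n; ++i) x.isa[x.sa[i]] = i;  // SA^-1[SA[i]] = i
    x.psi.resize(n);
    x.bwt.resize(n);
    for (int i = 0; i < n; ++i) {
        x.psi[i] = x.isa[(x.sa[i] + 1) % n];      // Psi[i] = SA^-1[SA[i] + 1], wrapping at n
        x.bwt[i] = T[(x.sa[i] + n - 1) % n];      // BWT[i] = T[SA[i] - 1], or '$' when SA[i] = 0
    }
    return x;
}
```

With `buildIndex("abracadabra$")`, the `sa`, `isa`, `psi` and `bwt` members match the four numeric and character columns row by row.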
42
?BW?? (LF-mapping)
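  • #include <cstdio>
    #include <cstring>

    // Prints the original text from its BWT, followed by the terminator '$'.
    // bwt must contain exactly one '$'.
    void revBWT(const char* bwt, int n) {
        int count[0x100];
        memset(count, 0, sizeof(int) * 0x100);
        for (int i = 0; i < n; i++) count[(unsigned char)bwt[i]]++;
        for (int i = 1; i < 0x100; i++) count[i] += count[i-1];
        int* LFmapping = new int[n];
        for (int i = n-1; i >= 0; i--)
            LFmapping[--count[(unsigned char)bwt[i]]] = i;
        // the position of '$'
        int next = (int)((const char*)memchr(bwt, '$', n) - bwt);
        for (int i = 0; i < n; i++) {
            next = LFmapping[next];
            putchar(bwt[next]);
        }
        delete[] LFmapping;
    }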

43
FM-index Ferragina 2000
  • BWT????????????????
  • BWTi TSAi-1 ?????
  • BWT??rank, select???SA???
  • SA^-1[SA[i]-1] = rank(BWT, c) + cum[c]
  • SA^-1[SA[i]+1] = select(BWT, c)
  • CSA????????????????????????????
  • ?????????LZ-index Karkkainen 96???

44
FM-index???
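Input: P[0..m-1], the search pattern; BWT[0..n-1], the BWT of the text T; C[0..|Σ|-1], where C[c] is the number of characters in BWT smaller than c
Output: sp, ep, the range of the suffix array whose suffixes have P as a prefix
If ep < sp, P does not occur in T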
  1. i = m-1
  2. sp = 0, ep = n-1
  3. while (sp ≤ ep) and (i ≥ 0) do
  4.   c = P[i]
  5.   sp = C[c] + rank(BWT, c, sp-1) + 1
  6.   ep = C[c] + rank(BWT, c, ep)
  7.   i--
  8. end
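
The loop above can be sketched in runnable form. Assumptions: rank is a naive scan here, where an FM-index would use a rank structure such as a wavelet tree; the BWT string includes the terminator '$', whose suffix occupies row 0; and C[c] counts the characters other than '$' that are smaller than c, which is the convention under which C[c] + 1 is the first row starting with c. The function names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <utility>

// rank(BWT, c, i): occurrences of c in BWT[0..i], inclusive; i = -1 gives 0.
int rankUpTo(const std::string& bwt, char c, int i) {
    int r = 0;
    for (int k = 0; k <= i; ++k) r += (bwt[k] == c);
    return r;
}

// Returns {sp, ep}: the rows whose suffixes start with P. P occurs iff sp <= ep.
std::pair<int, int> backwardSearch(const std::string& bwt, const std::string& P) {
    assert(P.find('$') == std::string::npos);
    std::array<int, 256> count{};
    for (unsigned char ch : bwt)
        if (ch != '$') ++count[ch];
    std::array<int, 256> C{};  // C[c] = number of non-'$' characters smaller than c
    for (int c = 1; c < 256; ++c) C[c] = C[c - 1] + count[c - 1];

    int sp = 0, ep = static_cast<int>(bwt.size()) - 1;
    for (int i = static_cast<int>(P.size()) - 1; sp <= ep && i >= 0; --i) {
        const unsigned char c = static_cast<unsigned char>(P[i]);
        sp = C[c] + rankUpTo(bwt, static_cast<char>(c), sp - 1) + 1;
        ep = C[c] + rankUpTo(bwt, static_cast<char>(c), ep);
    }
    return {sp, ep};
}
```

On BWT = ard$rcaaaabb (T = abracadabra) with P = abr, the range moves through [10, 11], [6, 7] and ends at rows 2 to 3, the suffixes abra and abracadabra. Each step costs two rank queries, so occ(P) = ep - sp + 1 is found with O(m) rank operations.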

45
I   SA  BWT  Head of Suffix
0   11  a1   $
1   10  r1   a1
2   7   d    a2
3   0   $    a3
4   3   r2   a4
5   5   c    a5
6   8   a2   b1
7   1   a3   b2
8   4   a4   c
9   6   a5   d
10  9   b1   r1
11  2   b2   r2
P = abr, T = abracadabra, BWT = ard$rcaaaabb
Initially: sp = 0, ep = 11
i = 2, c = r: sp = 9 + 0 + 1 = 10, ep = 9 + 2 = 11 (suffixes starting with r)
i = 1, c = b: sp = 5 + 0 + 1 = 6, ep = 5 + 2 = 7 (suffixes starting with br)
i = 0, c = a: sp = 0 + 1 + 1 = 2, ep = 0 + 3 = 3 (suffixes starting with abr)

47
???
  • Succinct Data Structures (SDS)
  • ??????(???????)???(????)??????
  • ???????????????????
  • ??????
  • Suffix Arrays, Burrows-Wheeler ?????????????rank?select?????
  • SDS???????????nHk bits???

48
?????
  • Succinct Data Structures
  • ????????????
  • ?????????????????
  • ?????????????
  • nHkbits O(1)??
  • ?????????????? (Gap Measure Gupta 06)
  • ??????????
  • ?????????????(??????????)
  • ????????????
  • ???????Huynh 2005, ????, ????
  • ????????
  • logn???????????????SDS Makinen 06