PAT tree (Patricia tree) - PowerPoint PPT Presentation

Author: huxg. Last modified by: huxg. Created Date: 4/17/2005 7:50:23 AM. Document presentation format: Company.

Slides: 27
Provided by: huxg
Transcript and Presenter's Notes


1
?????
  • ???
  • ???????

2
??
  • ????
  • ????
  • ????
  • ????
  • PAT tree (Patricia tree)
  • ????

3
??
  • ????????
  • ?????????????
  • ???????????????
  • ????????
  • ??(search)
  • ?????????????
  • ??(query)
  • Query???????????????
  • ?????????????????????
  • ????????????
  • ????????????????????

4
????
  • ??????
  • Brute Force
  • Knuth-Morris-Pratt
  • Boyer-Moore
  • Shift-Or
  • Suffix Automaton
  • ??????
  • Dynamic Programming
  • Non-deterministic Finite Automaton
  • Bit-Parallelism
  • ??????????

5
??
  • ????
  • ?????,????????????
  • ????,????,?????????

6
????
  • Karp-Rabin????
  • ????????????A????B????
  • ?A?B???????hash (A)?hash (B)
  • If hash (A) ≠ hash (B), then A ≠ B
  • If hash (A) = hash (B), then A = B is possible (it must still be verified)
  • Karp-Rabin????
  • ??? x[0..5] = A A C T C T
    Hash(x[0..5]) = 17579
  • ?? y[0..9] = G C A A C T C T C A
    Hash(y[0..5]) = 17819
  • ?? y[0..9] = G C A A C T C T C A
    Hash(y[1..6]) = 17533
  • ?? y[0..9] = G C A A C T C T C A
    Hash(y[2..7]) = 17579
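
The sliding comparison in the example above can be sketched as follows. This is a minimal illustration, not the slide author's code: the function name `karp_rabin`, the base 256, and the modulus are assumptions, so the hash values it computes are not the ones printed on the slide. Only the matching position is comparable.

```python
def karp_rabin(pattern: str, text: str, base: int = 256, mod: int = 1_000_003) -> list[int]:
    """Return every 0-based start index where pattern occurs in text.

    base and mod are illustrative choices (assumptions), not values from the slides.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leftmost character in a window
    hp = ht = 0
    for i in range(m):
        hp = (hp * base + ord(pattern[i])) % mod
        ht = (ht * base + ord(text[i])) % mod
    matches = []
    for s in range(n - m + 1):
        # Equal hashes only suggest a match, so confirm by direct comparison.
        if hp == ht and text[s:s + m] == pattern:
            matches.append(s)
        if s + m < n:
            # Roll the window: drop text[s], append text[s + m].
            ht = ((ht - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return matches


print(karp_rabin("AACTCT", "GCAACTCTCA"))
```

With the pattern and text from the slide, the only window whose hash equals the pattern's hash and whose characters also match is y[2..7].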

7
????
  • ?????
  • ???????????F????Signature
  • ???????????,??????????????
  • ????(superimposed coding)
  • ????????????????Signature
  • ?????????Signature?????????
  • ????(False drop)
  • ??????????????,????Signature?????????
  • Signature??????????????,?????

8
????
Block 1            Block 2            Block 3         Block 4
This is a text.    A text has many    words. Words    are made from letters.

Block signatures:
000101             110101             100100          101101

Word signatures:
h(text) = 000101    h(many) = 110000    h(words) = 100100
h(made) = 001100    h(letters) = 100001
9
????
  • ??
  • ??????,??????????
  • ????,??,??,??????
  • ?????,???????????
  • ??
  • ?????,??????
  • ??,?False Drop????????????
  • ??
  • ???????????????????

10
????
  • ??????
  • ?????????????????
  • ????????????????????????????????
  • ??????
  • ???(Vocabulary)
  • By Heaps' law, the vocabulary size is O(n^β), with β between 0.4 and 0.6
  • ??????????????????(index file)
  • ????(Occurrence)
  • ??, O(n), ??????? 30% to 40%
  • ???????????????????(posting file)

11
????
Text (word start positions: 1 6 9 11 17 19 24 28 33 40 46 50 55 60):
This is a text. A text has many words. Words are made from letters.

Vocabulary    Occurrences
letters       60
made          50
many          28
text          11, 19
words         33, 40

  • addressing granularity
  • inverted list
  • word positions
  • character positions
  • inverted file
  • document

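
The vocabulary and occurrence lists for the sample text can be derived mechanically. The following is a minimal sketch under stated assumptions: the function name `build_inverted_index` is hypothetical, positions are 1-based character offsets as in the figure, and the indexed vocabulary is restricted to the five words the figure shows.

```python
import re

TEXT = "This is a text. A text has many words. Words are made from letters."
# Assumption: only the words shown in the figure's vocabulary are indexed.
VOCABULARY = {"letters", "made", "many", "text", "words"}


def build_inverted_index(text: str, vocabulary: set[str]) -> dict[str, list[int]]:
    """Map each vocabulary word to the 1-based character positions where it starts."""
    index: dict[str, list[int]] = {}
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group().lower()
        if word in vocabulary:
            index.setdefault(word, []).append(match.start() + 1)
    return index


for word, positions in sorted(build_inverted_index(TEXT, VOCABULARY).items()):
    print(word, positions)
```

Each occurrence list comes out sorted by text position, because the text is scanned from left to right.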
12
????
  • ?????
  • ???????????,????????
  • ?????????,??????????

Block 1            Block 2            Block 3         Block 4
This is a text.    A text has many    words. Words    are made from letters.

Vocabulary    Occurrences (block numbers)
letters       4
made          4
many          2
text          1, 2
words         3

Inverted index
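
Block addressing stores block numbers instead of exact positions, which shortens the occurrence lists at the cost of scanning the block at query time. A minimal sketch, assuming the four blocks from the figure and a hypothetical helper `build_block_index`:

```python
import re

BLOCKS = ["This is a text.", "A text has many", "words. Words", "are made from letters."]
# Assumption: only the words shown in the figure's vocabulary are indexed.
VOCABULARY = {"letters", "made", "many", "text", "words"}


def build_block_index(blocks: list[str], vocabulary: set[str]) -> dict[str, list[int]]:
    """Map each vocabulary word to the 1-based numbers of the blocks containing it."""
    index: dict[str, list[int]] = {}
    for number, block in enumerate(blocks, start=1):
        for word in re.findall(r"[a-z]+", block.lower()):
            # Record each block at most once per word, however often the word repeats.
            if word in vocabulary and number not in index.setdefault(word, []):
                index[word].append(number)
    return index


for word, numbers in sorted(build_block_index(BLOCKS, VOCABULARY).items()):
    print(word, numbers)
```

The two occurrences of "words" in block 3 collapse into a single entry, which is where the space saving comes from.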
13
????
  • ???????
  • ?????????????????
  • ?????????????
  • ????????????????
  • ???????????????????
  • ??????????IO??,??????????
  • ??????????
  • ??Hash???
  • ??????????
  • ??Trie?,B?????
  • ???????
  • ????????(delta compression)

14
????
  • ????????
  • ????????,???????
  • ?????????????
  • ??????????,???????????
    ??????,??2??????64K??
  • ??
  • ????
  • ????
  • ??
  • ??????????????
  • ???????????,??????
  • ???????????

15
????
  • ????????
  • ???????????
  • ???????????
  • ??
  • ???? ??????(??????)
  • ??
  • ?????????
  • ?????????????????????
  • ?????????Nlog N (??????)
  • ????????????????????????
  • ??????????????IO????,???????
  • ???????
  • ????logN?????????????
  • ??????????????????

16
????
  • Lucene???????????
  • ????????16,000??
  • indexInterval = 16
  • ???????????? 16 + log(1000) ≈ 26 ?
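
The arithmetic above (16,000 terms, every 16th term kept in memory, about 16 + log2(1000) ≈ 26 comparisons) corresponds to a two-level lookup: binary search over the sampled terms, then a short linear scan. The sketch below illustrates that idea only. It is not Lucene's implementation, and the function names and the synthetic term list are assumptions.

```python
import bisect


def build_sparse_index(sorted_terms: list[str], interval: int = 16) -> list[str]:
    """Keep every interval-th term of the sorted dictionary in memory."""
    return sorted_terms[::interval]


def lookup(sorted_terms: list[str], sparse: list[str], term: str, interval: int = 16) -> int:
    """Return the position of term in sorted_terms, or -1 if it is absent."""
    # Binary search: last sampled term that is <= the query term.
    k = bisect.bisect_right(sparse, term) - 1
    if k < 0:
        return -1
    start = k * interval
    # Linear scan over at most `interval` consecutive terms.
    for i in range(start, min(start + interval, len(sorted_terms))):
        if sorted_terms[i] == term:
            return i
    return -1


# Synthetic dictionary of 16,000 sorted terms (hypothetical data).
terms = [f"t{i:05d}" for i in range(16000)]
sparse = build_sparse_index(terms)
print(len(sparse), lookup(terms, sparse, "t12345"))
```

With 16,000 terms and an interval of 16, the in-memory index holds 1,000 entries, so the binary search takes about 10 steps and the scan at most 16.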

17
????
  • ???????
  • ????????????????
  • ???, B-tree, Trie
  • ?????
  • ??????,????????
  • B-tree
  • ??????,???????,?????
  • Trie
  • ????????????
  • ???????????
  • ??????????????
  • Log (????) > E(??) E????

18
Trie
  • ???trie?
  • trie????????????????
  • trie??????????????????
  • ?????????????????
  • ?trie????????????????
  • ??,??????,??????????????,??????trie??

19
  • ???? a, b, c, aa, ab, ac, ba, ca, aba, abc, baa, bab, bac, cab, abba, baba, caba, abaca, caaba
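
A trie over the word set on this slide can be sketched as below. This is a minimal dictionary-based version for illustration. The class and method names are assumptions, and it uses a hash map per node rather than a fixed-size child array.

```python
class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.is_word = False  # True if a word ends at this node


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _find(self, prefix: str) -> "TrieNode | None":
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word: str) -> bool:
        node = self._find(word)
        return node is not None and node.is_word

    def with_prefix(self, prefix: str) -> list[str]:
        """All stored words starting with prefix, in lexicographic order."""
        found: list[str] = []

        def walk(node: TrieNode, path: str) -> None:
            if node.is_word:
                found.append(path)
            for ch in sorted(node.children):
                walk(node.children[ch], path + ch)

        start = self._find(prefix)
        if start is not None:
            walk(start, prefix)
        return found


WORDS = ["a", "b", "c", "aa", "ab", "ac", "ba", "ca", "aba", "abc", "baa", "bab",
         "bac", "cab", "abba", "baba", "caba", "abaca", "caaba"]
trie = Trie()
for w in WORDS:
    trie.insert(w)
print(trie.contains("abba"), trie.contains("abb"), trie.with_prefix("ab"))
```

Lookup cost depends on the length of the key rather than on the number of stored words, and every word sharing a prefix is reachable from a single node.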

20
Trie
  • ??
  • ?????,???????
  • Trie???????????????
  • ?????????????13??
  • ?????????????
  • ???????????????????
  • ???????????????????
  • ?????,?????
  • ??,????Trie????????????
  • ????????????,Trie?????
  • ?????,????????????
  • ??
  • ??????
  • ?????m??,????????
  • ??Trie??,?????????
  • ????? ?? ?????? ????? ????
  • ??20000 6 256 4 120M
  • ?????

21
????(Delta Compression)
  • ????
  • ????????????
  • ?????????ID,?????????Pos
  • ????
  • ????ID???ID???
  • ????Pos???Pos???
  • ??????????ID,Pos?????
  • ?????A???13,124,346???
  • ?????,??346 > 256,???????
  • ?346 - 124 = 222 < 256,??????
  • ????
  • Lucene?????????????????
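
The example with the list 13, 124, 346 can be worked through in code. The sketch below stores each value as the gap from its predecessor. The function names are hypothetical, and the one-byte-per-gap packing at the end is only an illustration of the 256 threshold, not Lucene's actual encoding.

```python
def delta_encode(values: list[int]) -> list[int]:
    """Replace each value in a sorted list by its difference from the previous one."""
    gaps, previous = [], 0
    for value in values:
        gaps.append(value - previous)
        previous = value
    return gaps


def delta_decode(gaps: list[int]) -> list[int]:
    """Rebuild the original values by accumulating the gaps."""
    values, total = [], 0
    for gap in gaps:
        total += gap
        values.append(total)
    return values


postings = [13, 124, 346]
gaps = delta_encode(postings)
print(gaps)

# Illustration only: every gap is below 256, so each fits in a single byte,
# whereas the raw value 346 would not.
packed = bytes(gaps)
print(len(packed), delta_decode(list(packed)))
```

Because the lists are sorted, gaps are never negative and are usually much smaller than the values themselves, which is what makes short encodings possible.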

22
PAT tree (Patricia tree)
  • ???Patricia?
  • Patricia??Trie??????
  • ???????????????????
  • ???(Suffix tree)
  • ????????????Patricia?
  • ???????????????????
  • ????
  • ??????
  • ??????
  • ????
  • ????(Suffix array)
  • ???????????,????

23
Text (word start positions: 1 6 9 11 17 19 24 28 33 40 46 50 55 60):
This is a text. A text has many words. Words are made from letters.

[Figure: suffix trie and suffix tree for the text, with leaves at positions 60, 50, 28, 19, 11, 40, 33]

space overhead 120% to 240% over the text size
24
difference between suffix array and inverted list
  • suffix array: the occurrences of each word are
    sorted lexicographically by the text following
    the word
  • inverted list: the occurrences of each word are
    sorted by text position

Text (word start positions: 1 6 9 11 17 19 24 28 33 40 46 50 55 60):
This is a text. A text has many words. Words are made from letters.

[Figure: vocabulary supra-index, suffix array, and inverted list built over the text]
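
The ordering difference described on this slide can be seen by building a word-level suffix array over the sample text. The sketch below is illustrative: the index points are the seven word positions used as leaves in the suffix trie figure, comparison is assumed to be case-insensitive, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

TEXT = "This is a text. A text has many words. Words are made from letters."
# Assumption: index points are the 1-based start positions of the indexed words.
INDEX_POINTS = [11, 19, 28, 33, 40, 50, 60]


def build_suffix_array(text: str, points: list[int]) -> list[int]:
    """Sort index points by the text that starts at each point (case-insensitive)."""
    return sorted(points, key=lambda p: text[p - 1:].lower())


def search(text: str, suffix_array: list[int], prefix: str) -> list[int]:
    """Positions whose suffix starts with prefix, found by binary search."""
    prefix = prefix.lower()
    # Truncating sorted suffixes to the prefix length keeps them in sorted order.
    keys = [text[p - 1:p - 1 + len(prefix)].lower() for p in suffix_array]
    lo, hi = bisect_left(keys, prefix), bisect_right(keys, prefix)
    return sorted(suffix_array[lo:hi])


sa = build_suffix_array(TEXT, INDEX_POINTS)
print(sa)
print(search(TEXT, sa, "text"), search(TEXT, sa, "word"))
```

In the suffix array, position 19 ("text has ...") sorts before position 11 ("text. A ...") because a space precedes a period, whereas an inverted list would keep them in text order as 11, 19.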
25
????
  • ????
  • ??????????????????
  • ????
  • ????????????????
  • ????????,?????????
  • ????
  • ???????????????
  • ??????????????????
  • ????(??????????)
  • ?????????????????????
  • ???????????????
  • ??????,?????????????
  • ??Trie ??,???????????,??E(??)

26
Thank you!