Combinatorial aspects of the Burrows-Wheeler transform

1
Combinatorial aspects of the Burrows-Wheeler
transform
Sabrina Mantaci Antonio Restivo Marinella
Sciortino
University of Palermo
2
Burrows-Wheeler Transform
  • In 1994 M. Burrows and D. Wheeler introduced a
    new data compression method based on
    preprocessing the input string. This
    preprocessing, named after them the
    Burrows-Wheeler Transform (BWT), produces a
    permutation of the letters of the input string
    such that
      • the transformed string is easier to compress
        than the original one;
      • the original string can be recovered.
  • This preprocessing made it possible to define a
    class of lossless data compression algorithms
    that
      • achieve speed comparable to algorithms based
        on the techniques of Lempel and Ziv;
      • obtain a compression ratio close to that of the best
        statistical modelling techniques.

3
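How does BWT work?
  • INPUT: w = abraca
  • Lexicographically sort the cyclic rotations of w.
    F and L denote the first and last columns of
    the sorted list; BWT(w) = L, and I is the row
    that contains w.
  • The following properties hold:
      • the character $L_i$ is followed in w by $F_i$
        (reading w cyclically);
      • for each character ch, the i-th occurrence of ch
        in F corresponds to the i-th occurrence of ch in
        L.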
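
The construction can be reproduced with a few lines of Python. This is a minimal sketch of the sorted-rotation definition, not an efficient implementation; the function name `bwt` and the 0-based index convention are assumptions made for illustration.

```python
def bwt(w):
    """Return (F, L, I) for a nonempty word w via sorted cyclic rotations."""
    n = len(w)
    # All cyclic rotations of w, sorted lexicographically.
    rotations = sorted(w[i:] + w[:i] for i in range(n))
    first = "".join(r[0] for r in rotations)   # column F
    last = "".join(r[-1] for r in rotations)   # column L = BWT(w)
    index = rotations.index(w)                 # 0-based row that holds w
    return first, last, index


F, L, I = bwt("abraca")
print(F, L, I)  # aaabcr caraab 1
```

Each row is a rotation of w, so its last letter is followed cyclically in w by its first letter, which is the first property listed above.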

4
Reversibility
The Burrows-Wheeler transform is reversible, in
the sense that given BWT(w) and an index I, it is
possible to recover w.
  • Given L = BWT(w) = caraab and I = 1
  • Construct F by alphabetically sorting the
    letters in L
  • Define a permutation $\tau$ on $\{0, 1, \dots, n-1\}$,
    establishing a correspondence between the
    positions of the same letters in F and in L
  • Starting from position I, we can recover
    $w = w_0 w_1 \cdots w_{n-1}$ as follows:
  • $w_i = F[\tau^i(I)]$, where $\tau^0(x) = x$ and $\tau^{i+1}(x) = \tau(\tau^i(x))$
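
The recovery steps above can be sketched as follows. This is an illustrative sketch under the assumption that positions are 0-based and that the k-th occurrence of a letter in F is matched with its k-th occurrence in L; the name `inverse_bwt` is hypothetical.

```python
from collections import defaultdict


def inverse_bwt(last, index):
    """Recover w from L = BWT(w) and the 0-based index I."""
    first = sorted(last)  # column F
    # Positions of each letter in L, in increasing order.
    positions = defaultdict(list)
    for pos, ch in enumerate(last):
        positions[ch].append(pos)
    # tau[j] = position in L of the occurrence matched with F[j]
    # (k-th occurrence of a letter in F <-> k-th occurrence in L).
    seen = defaultdict(int)
    tau = []
    for ch in first:
        tau.append(positions[ch][seen[ch]])
        seen[ch] += 1
    # w_i = F[tau^i(I)]
    letters = []
    x = index
    for _ in range(len(last)):
        letters.append(first[x])
        x = tau[x]
    return "".join(letters)


print(inverse_bwt("caraab", 1))  # abraca
```

For L = caraab the permutation is (1, 3, 4, 5, 0, 2), and following it from I = 1 reads off a, b, r, a, c, a.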

5
We can deduce that
Therefore we can study combinatorial properties
of the BWT by studying the conjugacy classes of
primitive words.
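
The reduction to conjugacy classes of primitive words can be checked experimentally. The sketch below is not taken from the slides. It relies on two facts that follow from the sorted-rotation construction: conjugate words have the same rotations, hence the same last column, and for a power u^k every rotation of u appears k times, so each letter of BWT(u) is repeated k times. The helper names are illustrative.

```python
def bwt_last(w):
    """Last column of the sorted cyclic rotations of w (index I ignored)."""
    rotations = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(r[-1] for r in rotations)


def conjugates(w):
    """All cyclic rotations (conjugates) of w."""
    return [w[i:] + w[:i] for i in range(len(w))]


# Every conjugate of "abraca" has the same transform.
print({bwt_last(v) for v in conjugates("abraca")})  # {'caraab'}

# For the square of "abraca", each letter of "caraab" is doubled.
print(bwt_last("abraca" * 2))  # ccaarraaaabb
```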
6
Standard Words
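Let $d_1, d_2, \dots, d_n, \dots$ be a sequence of natural numbers with $d_1 \ge 0$
and $d_i > 0$ for $i = 2, \dots, n, \dots$. Consider the sequence $\{s_n\}_{n \ge 0}$
defined as
$s_0 = b$, $s_1 = a$, $s_{n+1} = s_n^{d_n} s_{n-1}$ for $n \ge 1$.
  • $s = \lim_{n \to \infty} s_n$ is a characteristic
    Sturmian word
  • $\{s_n\}_{n \ge 0}$ is called the approximating sequence of s
  • $(d_1, d_2, \dots, d_n, \dots)$ is the directive sequence of s
  • Each finite word $s_n$ is a standard word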
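
A short sketch of how the approximating sequence can be generated, assuming the usual recurrence from the literature on Sturmian words: s_0 = b, s_1 = a, s_{n+1} = s_n^{d_n} s_{n-1}. The function name and the list-based interface are illustrative choices.

```python
def standard_words(directive):
    """Return [s_0, s_1, ..., s_{m+1}] for directive = [d_1, ..., d_m]."""
    words = ["b", "a"]  # assumed base cases: s_0 = b, s_1 = a
    for d in directive:
        # s_{n+1} = s_n^{d_n} s_{n-1}
        words.append(words[-1] * d + words[-2])
    return words


# The directive sequence (1, 1, 1, 1) gives finite Fibonacci words.
print(standard_words([1, 1, 1, 1]))
# ['b', 'a', 'ab', 'aba', 'abaab', 'abaababa']
```

The last word, abaababa, has length 8 and is the word used in the rotation example later in the talk.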

7
Characterization of standard words
  • A word w is standard if and only if it is a
    letter or w = vab (or equivalently w = vba), where v has
    periods p, q such that gcd(p, q) = 1 and
    $|v| = p + q - 2$ (the extremal case of the Fine and Wilf
    theorem).
  • A word w is standard if and only if it is a
    letter or there exist palindrome words P, Q, R
    such that w = QR = Pxy, where $\{x, y\} = \{a, b\}$.
  • Standard words correspond to an extremal case of
    the Knuth-Morris-Pratt algorithm.
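
The first characterization lends itself to a direct test. The sketch below is a brute-force check, assuming words over {a, b} and assuming that a period may be as large as |v| + 1 (such periods hold vacuously), which is needed to cover short words such as ab, where v is empty and p = q = 1. The function names are hypothetical.

```python
from math import gcd


def is_period(v, p):
    """p is a period of v if v[i] == v[i + p] wherever both are defined."""
    return all(v[i] == v[i + p] for i in range(len(v) - p))


def is_standard(w):
    """Test the period-based characterization for a word over {a, b}."""
    if w in ("a", "b"):
        return True
    if w[-2:] not in ("ab", "ba"):
        return False
    v = w[:-2]
    n = len(v)
    # Assumption: periods up to |v| + 1 are allowed (vacuously true).
    periods = [p for p in range(1, n + 2) if is_period(v, p)]
    return any(
        gcd(p, q) == 1 and p + q - 2 == n
        for p in periods
        for q in periods
    )


print(is_standard("abaababa"))  # True
print(is_standard("abab"))      # False
```

For w = abaababa the prefix v = abaaba has periods 3 and 5, with gcd(3, 5) = 1 and 3 + 5 - 2 = 6 = |v|.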

8
Rotations
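Standard words can also be generated by
rotations. Let $p, q \ge 2$ be such that gcd(p, q) = 1 and
n = p + q. The rotation $\rho_p : \{0, 1, \dots, n-1\} \to \{0, 1, \dots, n-1\}$ is defined as
$\rho_p(z) = z + p \pmod{n}$.
If n = 8, p = 3, q = 5, then w = abaababa.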
9
A new characterization of standard words
10
Idea of the proof
The permutation $\tau$ giving the correspondence
between the positions of characters in F and L is
$\tau(z) = z + p \pmod{n}$. Starting, for example, from the
position I = p we can recover the word u:
$u_i = F[\tau^i(p)]$.
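
A small sketch of this recovery, assuming the transform has the shape L = b^p a^q, so that F = a^q b^p. This shape can be verified directly for the example abaababa with p = 3 and q = 5. The function name is illustrative.

```python
def word_from_rotation(p, q):
    """Read u off F = a^q b^p by iterating z -> z + p (mod n) from I = p."""
    n = p + q
    first = "a" * q + "b" * p  # column F, assuming L = b^p a^q
    z = p
    letters = []
    for _ in range(n):
        letters.append(first[z])  # u_i = F[tau^i(p)]
        z = (z + p) % n           # tau(z) = z + p (mod n)
    return "".join(letters)


print(word_from_rotation(3, 5))  # abaababa
```

With p = 3 and q = 5 the visited positions are 3, 6, 1, 4, 7, 2, 5, 0, which spell abaababa, the word from the rotation example.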
11
Further Research
  • Study the extremal case of the BWT for k-letter
    alphabets with k > 2.
  • For instance, for k = 3, characterize the words w
    such that BWT(w) belongs to $c^* a^* b^*$ or $b^* c^* a^*$.
  • This property holds neither for 3-standard
    words nor for balanced words.
  • Is there a relation between the complexity function
    of a word w and the structure of BWT(w)?
  • Given a language L, one can define
    $BWT(L) = \{BWT(w) \mid w \in L\}$. One can ask whether BWT
    preserves some properties of a language L, such
    as belonging to a certain family of languages in
    the Chomsky hierarchy.
  • We found negative results:

$L_1 = (ab)^*$, $BWT(L_1) = \{b^n a^n \mid n \ge 0\}$, a context-free
language
$L_2 = (abc)^*$, $BWT(L_2) = \{c^n a^n b^n \mid n \ge 0\}$, a context-sensitive
language
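
The two examples can be checked for small n with the sorted-rotation definition. This is a quick sanity check rather than a proof, and the helper name `bwt_last` is an illustrative choice.

```python
def bwt_last(w):
    """Last column of the sorted cyclic rotations of w."""
    rotations = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(r[-1] for r in rotations)


for n in range(6):
    # BWT((ab)^n) = b^n a^n
    assert bwt_last("ab" * n) == "b" * n + "a" * n
    # BWT((abc)^n) = c^n a^n b^n
    assert bwt_last("abc" * n) == "c" * n + "a" * n + "b" * n

print(bwt_last("ab" * 3), bwt_last("abc" * 3))  # bbbaaa cccaaabbb
```

Both source languages are regular, while their images are the classic examples of a non-regular context-free language and a non-context-free language.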
12
Further Research
  • Is it possible to characterize interesting
    families of words in terms of their BWT?