1
DNA CODES BASED ON HAMMING STEM SIMILARITIES
A.G. Dyachkov, A.N. Voronina
Dept. of Probability Theory, MechMath., Moscow State University, Russia
2
OUTLINE
  1. DNA background
  2. Modeling the hybridization energy
  3. DNA codes
  4. Example of code construction
  5. Bounds on the rate of DNA codes
  6. On sphere sizes
  7. Further generalizations
  8. Bibliography

3
DNA STRANDS
Single DNA strand
  • DNA strands consist of nucleotides, each composed of
    a sugar-phosphate backbone and 1 base
  • There are 4 types of bases: adenine (A), guanine (G),
    cytosine (C) and thymine (T)
  • Base A is said to be complementary to T, and C to G
  • DNA strands are oriented, from the 5' end to the
    3' end. Thus, for example,
    strand AATG is different from strand GTAA
  • 2 oppositely directed strands containing
    complementary bases at corresponding positions are
    called reverse-complement strands. For example,
    the 2 strands shown in the figure are reverse-complement

[Figure: two oppositely directed strands. Labels: 5' end, 3' end, bases, nucleotide, sugar-phosphate backbone]
4
HYBRIDIZATION
  • 2 oppositely directed DNA strands are capable of
    coalescing into a duplex, or double helix
  • The process of forming a duplex is referred to
    as hybridization
  • The basis of this process is the formation of
    hydrogen bonds between complementary bases
  • A duplex formed from reverse-complement strands is
    called a Watson-Crick duplex

[Figure: example of a Watson-Crick duplex]
5
CROSS-HYBRIDIZATION AND ENERGY OF HYBRIDIZATION
  • However, hybridization is not a perfect process,
    and non-complementary strands can also hybridize
  • The figure shows one example of cross-hybridization

[Figure: example of cross-hybridization; the marked bases are not complementary]
  • The indicator of the strength, or stability, of a
    formed duplex is its energy of hybridization. Its
    value depends on the total number of bonds formed
  • Thus, the greatest hybridization energy is
    obtained when a Watson-Crick duplex is formed
    rather than in the case of cross-hybridization
6
LONE BONDS AND PAIRWISE METRIC
  • If a pair of bases is bonded but neither of its
    neighboring pairs forms a bond, then it is
    called a lone bond

[Figure: duplex with hybridization energy 3. Labels: a lone bond does not contribute to the hybridization energy; a triplet of bonds is counted as 2 adjacent pairs; a pair of bonds adds 1 to the total hybridization energy]
  • A lone bond is too weak to form a strong
    connection, so it does not contribute much to
    the total energy of hybridization
  • Moreover, the energy of hybridization
    depends not on the number of bonds formed, but on
    the number of pairs of adjacent bonds
  • Thus, if we suppose that the hybridization energy is
    equal to the number of pairs of adjacent bonds, then
    in the example above it is equal to 3, not 5 or 6
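
The counting rule on this slide can be sketched in a few lines of Python. The bond pattern below is hypothetical (the slide's figure is not preserved); it is chosen only so that it has 6 bonds, 5 of them non-lone, and 3 pairs of adjacent bonds, matching the numbers quoted above. The function names are illustrative assumptions.

```python
def hybridization_energy(bonds):
    """Model energy: the number of pairs of adjacent bonded positions."""
    return sum(1 for a, b in zip(bonds, bonds[1:]) if a and b)


def lone_bonds(bonds):
    """Count bonded positions whose neighbors on both sides are not bonded."""
    n = len(bonds)
    count = 0
    for i, bonded in enumerate(bonds):
        left = bonds[i - 1] if i > 0 else 0
        right = bonds[i + 1] if i < n - 1 else 0
        if bonded and not left and not right:
            count += 1
    return count


# Hypothetical duplex: 1 = bonded position, 0 = no bond.
# A triplet, then a pair, then a lone bond.
pattern = [1, 1, 1, 0, 1, 1, 0, 1]

total_bonds = sum(pattern)                   # all bonds
non_lone = total_bonds - lone_bonds(pattern)  # bonds with a bonded neighbor
energy = hybridization_energy(pattern)        # triplet gives 2, pair gives 1
print(total_bonds, non_lone, energy)
```

Under this model the lone bond and the isolated gaps contribute nothing, which is why the energy is smaller than either bond count.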

8
NOTATIONS
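  • General notations
  • Let $q = 2, 4, \dots$ be an arbitrary
    even integer
  • Denote by $A = \{0, 1, \dots, q-1\}$
    the standard alphabet of size $q$
  • Denote by $\lfloor u \rfloor$ ($\lceil u \rceil$) the largest (smallest)
    integer $\le u$ ($\ge u$)
  • Reverse-complementation
  • For any letter $x \in A$, define
    $\bar{x} = (q-1) - x$, the complement of the
    letter $x$
  • For any q-ary sequence
    $\mathbf{x} = (x_1, x_2, \dots, x_n) \in A^n$, define its reverse
    complement $\tilde{\mathbf{x}} = (\bar{x}_n, \bar{x}_{n-1}, \dots, \bar{x}_1)$
  • Note that if $\mathbf{y} = \tilde{\mathbf{x}}$, then
    $\mathbf{x} = \tilde{\mathbf{y}}$ for any $\mathbf{x} \in A^n$.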
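
A minimal Python sketch of reverse-complementation on a q-ary alphabet, assuming the complement of a letter x is (q-1) - x and sequences are stored as tuples of integers. The function names and the sample sequence are illustrative, not taken from the slides.

```python
def complement(x, q):
    """Complement of a letter x in the alphabet {0, ..., q-1}."""
    return (q - 1) - x


def reverse_complement(seq, q):
    """Reverse the sequence and complement every letter."""
    return tuple(complement(x, q) for x in reversed(seq))


q = 4
x = (0, 1, 1, 3)
y = reverse_complement(x, q)
print(y)                          # reverse complement of x
print(reverse_complement(y, q))   # applying the map twice returns x
```

For q = 4, one possible labeling (an assumption here) is A, C, G, T as 0, 1, 2, 3, under which this complement swaps A with T and C with G.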

9
STEM HAMMING SIMILARITY
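  • For 2 q-ary sequences of length n,
    $\mathbf{x} = (x_1, x_2, \dots, x_n)$
    and $\mathbf{y} = (y_1, y_2, \dots, y_n)$,
    the stem Hamming similarity is equal to
    $S(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n-1} s_i(\mathbf{x}, \mathbf{y})$,
    where $s_i(\mathbf{x}, \mathbf{y}) = 1$ if $x_i = y_i$ and $x_{i+1} = y_{i+1}$,
    and $s_i(\mathbf{x}, \mathbf{y}) = 0$ otherwise
  • $S(\mathbf{x}, \mathbf{y})$ is equal to the total number of
    common 2-blocks containing adjacent symbols in
    the longest common Hamming subsequence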
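
The definition can be sketched directly in Python, assuming the similarity counts positions i where both x_i = y_i and x_{i+1} = y_{i+1}. The sequences below are hypothetical and the function name is illustrative.

```python
def stem_similarity(x, y):
    """Number of positions i with x[i] == y[i] and x[i+1] == y[i+1]."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(
        1
        for i in range(len(x) - 1)
        if x[i] == y[i] and x[i + 1] == y[i + 1]
    )


# Hypothetical sequences over the alphabet {0, 1, 2, 3}
x = (0, 1, 2, 3, 0, 1)
y = (0, 1, 2, 0, 0, 1)

# Positions where the two sequences coincide (0-based)
matches = [i for i in range(len(x)) if x[i] == y[i]]
print(matches)                # coinciding positions
print(stem_similarity(x, y))  # common 2-stems among them
```

Here the sequences coincide in 5 positions, but only 3 pairs of adjacent coinciding positions exist, so the stem similarity is 3.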

10
HAMMING VS. STEM HAMMING
  • Hamming similarity is element-wise, while stem
    Hamming similarity is pair-wise (though still
    additive)
  • Applying the same re-ordering of positions to both
    sequences does not change the Hamming similarity,
    but may change the stem Hamming similarity
  • Example
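
The example from the slide is not preserved in this transcript. As a hypothetical illustration of the re-ordering remark, the sketch below applies one permutation of positions to two sequences and compares both similarities before and after; all names and values are assumptions.

```python
def hamming_similarity(x, y):
    """Number of positions where the sequences coincide."""
    return sum(1 for a, b in zip(x, y) if a == b)


def stem_similarity(x, y):
    """Number of pairs of adjacent positions where the sequences coincide."""
    return sum(
        1
        for i in range(len(x) - 1)
        if x[i] == y[i] and x[i + 1] == y[i + 1]
    )


def permute(seq, order):
    """Re-order the positions of seq according to order."""
    return tuple(seq[i] for i in order)


x = (0, 1, 2, 3)
y = (0, 1, 3, 2)
order = (0, 2, 1, 3)  # swap the two middle positions

before = (hamming_similarity(x, y), stem_similarity(x, y))
after = (
    hamming_similarity(permute(x, order), permute(y, order)),
    stem_similarity(permute(x, order), permute(y, order)),
)
print(before, after)
```

The two coinciding positions are adjacent before the permutation and separated after it, so only the stem similarity changes.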

11
STEM HAMMING DISTANCE
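  • Note that
    $S(\mathbf{x}, \mathbf{y}) \le S(\mathbf{x}, \mathbf{x}) = n - 1$, and
    $S(\mathbf{x}, \mathbf{y}) = n - 1$ if and
    only if $\mathbf{x} = \mathbf{y}$
  • The stem Hamming distance between $\mathbf{x}, \mathbf{y} \in A^n$
    is $D(\mathbf{x}, \mathbf{y}) = (n - 1) - S(\mathbf{x}, \mathbf{y})$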
  • Example
  • Let and
  • The longest common Hamming subsequence is
  • Stem Hamming similarity is equal to
  • Stem Hamming distance is equal to
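
The numbers of the slide's example are not preserved. The sketch below uses hypothetical sequences and assumes the distance is (n - 1) minus the stem similarity. It also checks one way to read this quantity: as the ordinary Hamming distance between the two sequences of adjacent pairs (2-stems), which is an observation added here rather than a statement from the slides.

```python
from itertools import product


def stem_similarity(x, y):
    return sum(
        1
        for i in range(len(x) - 1)
        if x[i] == y[i] and x[i + 1] == y[i + 1]
    )


def stem_distance(x, y):
    """Assumed definition: (n - 1) minus the stem similarity."""
    return (len(x) - 1) - stem_similarity(x, y)


def stems(x):
    """Sequence of adjacent pairs (2-stems) of x."""
    return tuple(zip(x, x[1:]))


def hamming_distance(u, v):
    return sum(1 for a, b in zip(u, v) if a != b)


x = (0, 1, 2, 3, 0, 1)
y = (0, 1, 2, 0, 0, 1)
print(stem_similarity(x, y), stem_distance(x, y))

# Brute-force check of the triangle inequality for q = 2, n = 4
words = list(product(range(2), repeat=4))
triangle_ok = all(
    stem_distance(a, c) <= stem_distance(a, b) + stem_distance(b, c)
    for a in words for b in words for c in words
)
print(triangle_ok)
```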

13
MOTIVATION
  • The study of DNA codes was motivated by the needs of
    DNA computing and biomolecular nanotechnology
  • In these applications, one must form a collection
    of DNA strands that will serve as markers,
    while the collection of their reverse-complement
    strands will be used
    for reading, or recognition

  1. Collection of mutually reverse-complement pairs
  2. No self-reverse-complement words
  3. No cross-hybridization

[Figure labels: Coding Strands for Ligation; Probing Complement Strands for Reading. Each strand in the first row below is the reverse complement of the strand beneath it]

TACGCGACTTTC ATCAAACGATGC TGTGTGCTCGTC ATTTTTGCGTTA CACTAAATACAA GAAAAAGAAGAA
GAAAGTCGCGTA GCATCGTTTGAT GACGAGCACACA TAACGCAAAAAT TTGTATTTAGTG TTCTTCTTTTTC
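
The first two requirements can be checked mechanically on the strands listed on this slide. The sketch below is illustrative: the helper name and the split into two rows of six 12-letter strands are assumptions about how the extracted text was laid out.

```python
# Complement table: A <-> T, C <-> G
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    """Complement every base, then reverse the strand."""
    return strand.translate(COMPLEMENT)[::-1]


row_1 = ["TACGCGACTTTC", "ATCAAACGATGC", "TGTGTGCTCGTC",
         "ATTTTTGCGTTA", "CACTAAATACAA", "GAAAAAGAAGAA"]
row_2 = ["GAAAGTCGCGTA", "GCATCGTTTGAT", "GACGAGCACACA",
         "TAACGCAAAAAT", "TTGTATTTAGTG", "TTCTTCTTTTTC"]

# Requirement 1: the strands form mutually reverse-complement pairs
pairs_ok = all(reverse_complement(a) == b for a, b in zip(row_1, row_2))

# Requirement 2: no strand is its own reverse complement
no_self_rc = all(reverse_complement(s) != s for s in row_1 + row_2)

print(pairs_ok, no_self_rc)
```

Requirement 3 (no cross-hybridization) depends on the similarity model and the chosen threshold, so it is not checked here.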
14
DNA CODE

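  • $X$ is a q-ary code of length $n$ and size $N$
  • $X = \{\mathbf{x}(1), \mathbf{x}(2), \dots, \mathbf{x}(N)\}$, where
    $\mathbf{x}(j) = (x_1(j), \dots, x_n(j)) \in A^n$, $j = 1, \dots, N$,
    are the codewords of code $X$
  • $X$ is called a DNA $(n, D)$-code based
    on stem Hamming similarity if the following 2
    conditions are fulfilled
  • For any $j$, there exists
    $j' \ne j$ such that $\mathbf{x}(j') = \tilde{\mathbf{x}}(j) \ne \mathbf{x}(j)$
  • For any $j \ne j'$, $S(\mathbf{x}(j), \mathbf{x}(j')) \le n - 1 - D$
  • Let $N(n, D)$ be the maximal size of
    DNA $(n, D)$-codes.
  • $R(d) = \varlimsup_{n \to \infty} \frac{\log_q N(n, \lfloor nd \rfloor)}{n}$, $0 < d < 1$,
    is called the rate of DNA codes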
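
A minimal checker for the two conditions, assuming they read as follows: the code is closed under reverse-complementation with no self-reverse-complement words, and any two distinct codewords have stem Hamming similarity at most n - 1 - D. The function names and the two small codes are hypothetical.

```python
from itertools import combinations


def reverse_complement(x, q):
    return tuple((q - 1) - s for s in reversed(x))


def stem_similarity(x, y):
    return sum(
        1
        for i in range(len(x) - 1)
        if x[i] == y[i] and x[i + 1] == y[i + 1]
    )


def is_dna_code(code, q, D):
    """Check both conditions for a non-empty collection of equal-length tuples."""
    words = set(code)
    n = len(next(iter(words)))
    for x in words:
        rc = reverse_complement(x, q)
        # Condition 1: the reverse complement is a different codeword
        if rc == x or rc not in words:
            return False
    # Condition 2: bounded similarity between distinct codewords
    return all(
        stem_similarity(x, y) <= n - 1 - D
        for x, y in combinations(words, 2)
    )


code_a = [(0, 0, 0, 0), (3, 3, 3, 3), (0, 2, 0, 2), (1, 3, 1, 3)]
code_b = [(0, 0, 0, 0), (3, 3, 3, 3), (0, 0, 1, 1), (2, 2, 3, 3)]

print(is_dna_code(code_a, q=4, D=3))
print(is_dna_code(code_b, q=4, D=3), is_dna_code(code_b, q=4, D=2))
```

In `code_b` the words (0, 0, 0, 0) and (0, 0, 1, 1) share one 2-stem, so the code meets the conditions for D = 2 but not for D = 3.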

16
Q-ARY REED-MULLER CODES
  • q-ary Reed-Muller code. Let
  • Define mapping
    , with
  • Reed-Muller code of order
    is the image
  • The Reed-Muller code of order 1
    satisfies the condition of reverse-complementarity
  • It may contain self-reverse-complement words,
    which should be excluded from the final
    construction
17
EXAMPLE OF CODE
Let q = 4 and m = 1
Mutually reverse-complement
0 0 0 0    0 1 2 3    0 2 0 2    0 3 2 1
1 1 1 1    1 2 3 0    1 3 1 3    1 0 3 2
2 2 2 2    2 3 0 1    2 0 2 0    2 1 0 3
3 3 3 3    3 0 1 2    3 1 3 1    3 2 1 0
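
The 16 words listed on this slide coincide with the value tables of the polynomials a + b·x over the integers modulo 4, evaluated at x = 0, 1, 2, 3. The sketch below rebuilds them under that assumption (inferred from the listed words, since the slide's formulas are not preserved) and separates the self-reverse-complement words that the construction has to exclude.

```python
from itertools import product

Q = 4              # alphabet size
POINTS = range(Q)  # evaluation points for m = 1


def reverse_complement(word, q=Q):
    return tuple((q - 1) - s for s in reversed(word))


# All words (a + b*x mod Q) for x in POINTS, with a, b in {0, ..., Q-1}
code = {
    tuple((a + b * x) % Q for x in POINTS)
    for a, b in product(range(Q), repeat=2)
}

closed = all(reverse_complement(w) in code for w in code)
self_rc = sorted(w for w in code if reverse_complement(w) == w)
dna_words = sorted(w for w in code if reverse_complement(w) != w)

print(len(code), closed)
print(self_rc)
print(len(dna_words))
```

After removing the self-reverse-complement words, the remaining words split into mutually reverse-complement pairs.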
18
OUTLINE
  • DNA background
  • Modeling the hybridization energy
  • DNA codes
  • Example of DNA codes
  • Bounds on the rate of DNA codes
      • Lower Gilbert-Varshamov bound
      • Upper bounds
      • Graphs
  • On sphere sizes
  • Possible generalizations
  • Bibliography

19
RANDOM CODING
  • $\mathbf{x}$ and $\mathbf{y}$ are independent identically
    distributed random sequences with uniform
    distribution on $A^n$
  • Define
  • Probability distribution of
  • Sum of
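
For small parameters, the distribution of the stem Hamming similarity between two independent uniform sequences can be computed exactly by enumeration. The sketch below is only an illustration of the random model, assuming the similarity counts common 2-stems; it does not reproduce the slide's formulas, which are not preserved.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def stem_similarity(x, y):
    return sum(
        1
        for i in range(len(x) - 1)
        if x[i] == y[i] and x[i + 1] == y[i + 1]
    )


def similarity_distribution(q, n):
    """Exact distribution of S(x, y) for independent uniform x, y in A^n."""
    counts = Counter()
    for x in product(range(q), repeat=n):
        for y in product(range(q), repeat=n):
            counts[stem_similarity(x, y)] += 1
    total = q ** (2 * n)
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}


dist = similarity_distribution(q=2, n=4)
mean = sum(s * p for s, p in dist.items())
print(dist)
print(mean)
```

Each of the n - 1 adjacent pairs coincides with probability 1/q², so the mean equals (n - 1)/q².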

20
GILBERT-VARSHAMOV BOUND
  • Let . Introduce
  • We construct a random code as a collection of
    independent random variables and their
    reverse-complements. This requires a
    special random coding technique for DNA codes
  • One can check, that
  • Random coding bound (Gilbert-Varshamov bound)
    if then

21
CALCULATION OF THE BOUND
  • are dependent variables and
    both depend on and
  • do not constitute a Markov chain

  • vs.
  • are deterministic functions of Markov chain
  • and
  • We cannot apply the standard technique used in the case of
    Hamming similarity
  • We have to use the Large Deviations Principle for
    Markov chains for

22
GILBERT-VARSHAMOV BOUND
  • Introduce
  • Gilbert-Varshamov lower bound on the rate
    If
    then , where
  • and is a
    decreasing -convex function with

24
UPPER BOUNDS
  • Plotkin upper bound
  • If , then
    and

  • if
  • Elias upper bound. If
    , then ,
    where is presented by parametric
    equation
  • Elias bound improves Plotkin bound for small
    values of . We
    calculated and
    .

26
BOUNDS ON THE RATE (Q = 2)
[Figure: bounds on the rate of DNA codes, q = 2; labeled value 0.75]
27
BOUNDS ON THE RATE (Q = 4)
[Figure: bounds on the rate of DNA codes, q = 4; labeled value 0.9375]
29
FIBONACCI NUMBERS
  • q-ary Fibonacci numbers are defined by the recurrence
    relation
  • with initial conditions
  • q-ary Fibonacci numbers may also be calculated as a
    sum
  • The q-ary Fibonacci number may be
    interpreted as the number of q-ary sequences of
    length $n$ that do not contain 2-stems of the
    form (0,0)
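
The slide's formulas are not preserved in this transcript. One recurrence consistent with the combinatorial interpretation above is F(n) = (q-1)(F(n-1) + F(n-2)) with F(0) = 1 and F(1) = q; the indexing is an assumption and may differ from the authors' convention. The sketch compares this recurrence, a binomial sum, and a brute-force count of sequences without the 2-stem (0,0).

```python
from itertools import product
from math import comb


def fib_q(n, q):
    """Assumed recurrence: F(n) = (q-1) * (F(n-1) + F(n-2)), F(0)=1, F(1)=q."""
    a, b = 1, q
    for _ in range(n):
        a, b = b, (q - 1) * (a + b)
    return a


def fib_q_sum(n, q):
    """Choose k non-adjacent zeros, fill the other n-k places with non-zero letters."""
    return sum(comb(n - k + 1, k) * (q - 1) ** (n - k) for k in range(n + 1))


def count_without_00(n, q):
    """Brute-force count of q-ary sequences of length n with no two adjacent zeros."""
    return sum(
        1
        for s in product(range(q), repeat=n)
        if all(not (s[i] == 0 and s[i + 1] == 0) for i in range(n - 1))
    )


for q in (2, 4):
    print(q, [fib_q(n, q) for n in range(7)])
```

For q = 2 the values reduce to the ordinary Fibonacci sequence 1, 2, 3, 5, 8, 13, 21.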

30
COMBINATORIAL CALCULATION
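  • The space $A^n$ with the stem Hamming metric is
    homogeneous, i.e., the volume of a sphere does
    not depend on its center
  • Define
  • for any
  • Consider a sphere with center
    $\mathbf{0} = (0, 0, \dots, 0)$. Any sequence
    at the maximal distance $n - 1$ from the center must have no
    common 2-stems (pairs) with $\mathbf{0}$. In other
    words, it must have no 2-stems of type (0,0).
    Thus, the number of such sequences is the q-ary
    Fibonacci number for length $n$
  • Sphere sizes for other distances may be obtained using
    the same technique with corresponding
    modifications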
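
For small parameters, both claims can be checked by enumeration. The sketch below assumes a sphere is the set of sequences at an exact stem Hamming distance from the center, with the distance taken as (n - 1) minus the number of common 2-stems; the function names are illustrative.

```python
from collections import Counter
from itertools import product


def stem_distance(x, y):
    n = len(x)
    common = sum(
        1 for i in range(n - 1) if x[i] == y[i] and x[i + 1] == y[i + 1]
    )
    return (n - 1) - common


def sphere_sizes(center, q):
    """Number of sequences at each exact distance from the given center."""
    n = len(center)
    return Counter(
        stem_distance(center, y) for y in product(range(q), repeat=n)
    )


q, n = 2, 5
profiles = [sphere_sizes(c, q) for c in product(range(q), repeat=n)]

# Homogeneity: every center gives the same profile of sphere sizes
homogeneous = all(p == profiles[0] for p in profiles)
print(homogeneous)
print(sorted(profiles[0].items()))
```

For q = 2 and n = 5, the sphere of radius n - 1 = 4 around the all-zero sequence contains the 13 binary sequences of length 5 without two adjacent zeros.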

31
GRAPH OF PROBABILITIES
[Figure: probability distribution]
33
B-STEM HAMMING SIMILARITY
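  • $b$-stem Hamming similarity, instead of
    counting the number of 2-stems (pairs),
    counts the number of $b$-stems:
    $S_b(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n-b+1} s_i^{(b)}(\mathbf{x}, \mathbf{y})$,
  • where $s_i^{(b)}(\mathbf{x}, \mathbf{y}) = 1$ if $x_{i+j} = y_{i+j}$ for all
    $j = 0, 1, \dots, b-1$, and $s_i^{(b)}(\mathbf{x}, \mathbf{y}) = 0$ otherwise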
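
A sketch of the generalization, assuming a b-stem is a block of b consecutive positions in which the two sequences coincide. Under this reading, b = 2 gives the stem Hamming similarity and b = 1 the ordinary Hamming similarity. The sequences are hypothetical.

```python
def b_stem_similarity(x, y, b):
    """Number of windows of b consecutive positions where x and y coincide."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    if b < 1:
        raise ValueError("b must be a positive integer")
    n = len(x)
    return sum(
        1
        for i in range(n - b + 1)
        if all(x[i + j] == y[i + j] for j in range(b))
    )


x = (0, 1, 2, 3, 0, 1)
y = (0, 1, 2, 0, 0, 1)
print([b_stem_similarity(x, y, b) for b in (1, 2, 3, 4)])
```

The similarity can only decrease as b grows, because every common (b+1)-stem contains a common b-stem starting at the same position.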

34
WEIGHTED STEM HAMMING SIMILARITY
  • Weighted stem Hamming similarity assigns a weight
    to each type of q-ary pair and takes it into
    account when calculating the sum
  • Let
    be a weight function such that
  • Similarity is defined as follows
  • , where
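
The slide's formula is not preserved. One natural reading, assumed here, is that each common 2-stem contributes the weight of its pair of letters instead of 1. The weight function below is purely hypothetical and is passed in as a parameter so that any other weighting can be substituted.

```python
def weighted_stem_similarity(x, y, weight):
    """Sum of weight(x[i], x[i+1]) over all common 2-stems of x and y."""
    return sum(
        weight(x[i], x[i + 1])
        for i in range(len(x) - 1)
        if x[i] == y[i] and x[i + 1] == y[i + 1]
    )


def example_weight(a, b):
    """Hypothetical weights: pairs made only of letters 1 and 2 count double."""
    return 2 if a in (1, 2) and b in (1, 2) else 1


def unit_weight(a, b):
    """Constant weight 1 recovers the unweighted stem Hamming similarity."""
    return 1


x = (0, 1, 2, 3, 0, 1)
y = (0, 1, 2, 0, 0, 1)
print(weighted_stem_similarity(x, y, example_weight))
print(weighted_stem_similarity(x, y, unit_weight))
```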

35
INSERTION-DELETION STEM SIMILARITY
  • Insertion-deletion stem similarity allows loops
    and shifts in the DNA duplex
  • is a common block
    subsequence between and , if is an
    ordered collection of non-overlapping common (
    , )-blocks of length
  • common ( , )-block of length ,
    is a subsequence of and ,
    consisting of consecutive elements of and
  • is the set of all common block
    subsequences between and
  • is the minimal number of
    blocks of consecutive elements of and in
    the given subsequence
  • Similarity is defined as follows

[Figure labels: Shift, Loop]
37
BIBLIOGRAPHY
  • Probability theory and Large Deviation Principle
      • V.N. Tutubalin, The Theory of Probability and
        Random Processes. Moscow: Publishing House of
        Moscow State University, 1992 (in Russian).
      • A. Dembo, O. Zeitouni, Large Deviations
        Techniques and Applications. Boston, MA: Jones
        and Bartlett, 1993.
  • DNA codes
      • A.G. D'yachkov, A.J. Macula, D.C. Torney,
        P.A. Vilenkin, P.S. White, I.K. Ismagilov,
        R.S. Sarbayev, On DNA Codes. Problemy Peredachi
        Informatsii, 2005, V. 41, N. 4, P. 57-77 (in
        Russian). English translation: Problems of
        Information Transmission, 2005, V. 41, N. 4, P.
        349-367.
      • M.A. Bishop, A.G. D'yachkov, A.J. Macula, T.E. Renz,
        V.V. Rykov, Free Energy Gap and Statistical
        Thermodynamic Fidelity of DNA Codes. Journal of
        Computational Biology, 2007, V. 14, N. 8, P.
        1088-1104.
      • A. Dyachkov, A. Macula, T. Renz and V. Rykov,
        Random Coding Bounds for DNA Codes Based on
        Fibonacci Ensembles of DNA Sequences. Proc. of
        2008 IEEE International Symposium on Information
        Theory, Toronto, Canada, 2008, in print.