Genome Rearrangements - PowerPoint PPT Presentation

Slides: 65
Provided by: timoth73
1
Genome Rearrangements
  • CIS 667 April 13, 2004

2
Genome Rearrangements
  • We have seen how differences in genes at the
    sequence level can be used to infer evolutionary
    relations among species
  • Differences in sequences in (one or more) genes
    resulted from point mutations (insert, delete,
    substitute)
  • These are not the only type of changes that can
    occur in the genome

3
Genome Rearrangements
  • Repair of broken chromosomes is an important
    process
  • Mistakes can occur, however
  • Mistakes can also occur during crossover
  • These mistakes cause changes in gene order
  • A large piece of chromosome can be moved or
    copied to another location
  • It can also move from one chromosome to another
  • We call these movements genome rearrangements

4
Crossover
5
Chromosome Repair
6
Genome Rearrangements
  • These have important (usually fatal) consequences
    for the organism and its evolution
  • Alignments do not capture genome rearrangements
  • Two species may have nearly the same gene
    sequences, but in a different order (why would
    the two species then be different?)

7
Genome Rearrangements
  • We need some other way to compare entire genomes
    (i.e. compare at a higher level)
  • Rather than by simple point mutations, one genome is
    obtained from another by a number of rearrangements
    of a special kind: reversals
  • Use the number of reversals needed to transform
    one genome into another to measure evolutionary
    distance

8
The Method
  • Use combinatorial optimization techniques in an
    attempt to infer a most economical sequence of
    rearrangement operations to account for
    differences among the genomes
  • Compare with character-based methods for
    phylogenetics (parsimony)

9
Reversals
  • Consider the genome of a species as a sequence of
    blocks
  • A block is some sequence of the genome (possibly
    containing more than one gene) transcribed as a
    unit
  • Blocks are oriented since they can be transcribed
    from either strand of DNA
  • Give homologous blocks the same label

10
Reversals
  • Relation between chloroplast genomes of alfalfa
    and garden pea

11
Reversals
  • Reversal operation for oriented blocks
  • Inverts the order of affected blocks and changes
    their orientation (arrow)
  • Affects a contiguous segment of blocks
  • What sequence of reversal operations could have
    changed alfalfa into garden pea?
  • Would like to have a polynomial time algorithm to
    find the shortest sequence

13
Genome Comparison vs. Gene Comparison
  • In the late 1980s, J. Palmer and his colleagues
    studied the mitochondrial genomes of cabbage and
    turnips
  • The gene sequences are very similar (some genes
    are 99% identical)
  • Gene order, however, differs dramatically
  • Genome rearrangements are now considered to be a
    common mode of molecular evolution

14
Genome Comparison vs. Gene Comparison
  • Extreme conservation of genes on X chromosomes
    across mammalian species provides an opportunity
    to study the evolutionary history of X chromosome
    independently of the rest of the genomes
  • According to Ohno's law, the gene content of the X
    chromosome has barely changed throughout
    mammalian development in the last 125 million
    years.
  • However, the order of genes on X chromosomes has
    been disrupted several times.

15
Human and Mouse X Chromosomes
16
Human and Mouse X Chromosomes
-4 -6 1 7 2 -3 5 8
3 2 7 -1 6 4 5 8
1 7 2 -3 6 4 5 8
1 2 7 -3 6 4 5 8
1 2 7 3 -6 4 5 8
1 2 -5 -4 -3 6 7 8
1 2 3 4 5 6 7 8
17
Genome Comparison vs. Gene Comparison
  • The traditional molecular evolutionary technique
    is a gene comparison to construct a phylogenetic
    tree
  • In the cabbage and turnip case this is hardly
    suitable, since rate of point mutations in their
    mitochondrial genes is so low that their genes
    are almost identical
  • Genome comparison (i.e. comparison of gene
    orders) is the method of choice in the case of
    very slowly evolving genomes
  • Another area is the case where genomes evolve
    very rapidly (genes not very similar)

18
Genome Comparison
  • Only about 178 ± 39 genome rearrangements have
    happened since human and mouse diverged 80
    million years ago
  • Mouse and human genomes can be viewed as a
    collection of about 200 fragments which are
    shuffled in mice as compared to humans
  • A comparative mouse-human genetic map gives the
    position of a human gene given the location of a
    related mouse gene

19
Man-Mouse Comparative Physical Map
20
Definitions
  • A signed permutation a over the set of labels
    $L = \{1, 2, \ldots, n\}$ is a permutation such that
    $a(i) = +x$ or $-x$, where $x \in L$
  • Example: the labels 3, 2, 1, each carrying a sign, form
    a signed permutation over $L = \{1, 2, 3\}$
  • Note that no label may appear twice in the
    permutation
  • A reversal $[i, j]$ is an operation that transforms
    one signed permutation into another, reversing
    the order of a contiguous portion and flipping
    the signs

21
Definitions
  • $a \mapsto a\,[i,j] = (a(1), \ldots, a(i-1), -a(j), \ldots, -a(i), a(j+1), \ldots, a(n))$
  • We are interested in the problem of sorting by
    reversals: given two signed permutations a and b,
    find a shortest sequence of reversals $r_1, \ldots, r_t$
    that transforms a into b, i.e. $a r_1 \cdots r_t = b$
  • The reversal distance is $d_b(a) = t$
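
The reversal operation defined above can be sketched in a few lines of Python. The function name `apply_reversal` and the list-of-signed-integers representation are assumptions made for this illustration; positions are 1-based as on the slide.

```python
def apply_reversal(perm, i, j):
    """Return perm after the reversal [i, j] (1-based, inclusive).

    perm is a list of non-zero signed integers, e.g. [3, -2, -1, 4, -5].
    The segment perm[i..j] is reversed and every sign in it is flipped.
    """
    if not 1 <= i <= j <= len(perm):
        raise ValueError("need 1 <= i <= j <= n")
    segment = [-x for x in reversed(perm[i - 1:j])]
    return perm[:i - 1] + segment + perm[j:]


a = [3, -2, -1, 4, -5]
print(apply_reversal(a, 2, 3))  # [3, 1, 2, 4, -5]
```

Applying the same reversal twice restores the original permutation, which is why reversals can be undone step by step.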

22
Definitions
  • Note that the reversal operation does not
    directly correspond to the biological operations
    (inversion, translocation, fission, fusion)
  • Given a and b, can we always transform a into b
    using only the reversal operation? If so, how
    many reversals are required in the worst case?
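
For very small n, these questions can be explored by brute force. The sketch below is not the polynomial-time method developed later in the slides; it is a plain breadth-first search over all signed permutations, and the name `reversal_distance_bfs` is an assumption of this example.

```python
from collections import deque


def reversal_distance_bfs(a, b):
    """Exact reversal distance from a to b by breadth-first search.

    Only practical for small n: there are 2**n * n! signed permutations.
    """
    start, target = tuple(a), tuple(b)
    n = len(start)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        # Try every reversal of a contiguous segment cur[i..j] (0-based).
        for i in range(n):
            for j in range(i, n):
                flipped = tuple(-x for x in reversed(cur[i:j + 1]))
                nxt = cur[:i] + flipped + cur[j + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    return None  # not reached when a and b use the same labels


print(reversal_distance_bfs([3, -2, -1, 4, -5], [1, 2, 3, 4, 5]))  # 3
print(reversal_distance_bfs([2, 1], [1, 2]))                        # 3
```

The second call shows that even two blocks can need three reversals once orientation matters.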

23
Breakpoints
  • A breakpoint is a point between consecutive
    labels in the initial permutation that must
    necessarily be separated by at least one reversal
    to reach the target permutation
  • The two consecutive labels are not consecutive in
    the target, or their orientations are not the
    same in a relative sense

24
Breakpoints
  • To formalize the idea of breakpoint, we introduce
    the extended version of a
  • Let $a = (a(1), \ldots, a(n))$
  • Then the extended version of a is $(L, a(1), \ldots, a(n), R)$
  • For example let extended a be (L, 2, 3, 1, 6,
    5, 4, R) and let extended b be (L, 1, 2, 3,
    4, 5, 6, R)
  • The breakpoints are (L,2), (2,3), (3,1),
    (1,6), (6,5), (4,R)
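
A minimal sketch of how breakpoints can be listed. The function name `breakpoints` is an assumption, and so are the orientation signs in the sample input: (-2, -3, +1, +6, -5, -4) is one signing of the labels above that reproduces the six breakpoints listed, chosen only for illustration. A pair (x, y) counts as adjacent in the target if it occurs there as (x, y) or, read on the opposite strand, as (-y, -x).

```python
def breakpoints(a, b):
    """List the breakpoints of signed permutation a with respect to target b."""
    ext_a = ["L"] + list(a) + ["R"]
    ext_b = ["L"] + list(b) + ["R"]
    adjacent = set()
    for x, y in zip(ext_b, ext_b[1:]):
        adjacent.add((x, y))
        # Same adjacency seen from the other strand: order and signs flipped.
        if isinstance(x, int) and isinstance(y, int):
            adjacent.add((-y, -x))
    return [(x, y) for x, y in zip(ext_a, ext_a[1:]) if (x, y) not in adjacent]


a = [-2, -3, 1, 6, -5, -4]   # assumed signs, see the note above
b = [1, 2, 3, 4, 5, 6]
print(len(breakpoints(a, b)))  # 6
```

With these signs the only pair that is not a breakpoint is (-5, -4), which is the target adjacency (4, 5) read backwards.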

25
Breakpoints
  • The number of breakpoints of a permutation a is
    denoted by b(a)
  • In the example, $b(a) = 6$
  • Can you characterize the situations where L is
    involved in a breakpoint? When R is involved in a
    breakpoint?

26
A Lower Bound
  • A reversal can remove at most two breakpoints
  • Cuts the permutation in exactly two places
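  • So, if $a r_1 \cdots r_t = b$, then
  • $b(a) - b(a r_1) \le 2$
  • $b(a r_1) - b(a r_1 r_2) \le 2$
  • $b(a r_1 \cdots r_{t-1}) - b(a r_1 \cdots r_t) \le 2$
  • Adding these up, and noting that the target has no
    breakpoints, gives $b(a) \le 2t$. If $t = d(a)$, then
    $b(a)/2 \le d(a)$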

27
Reality and Desire Diagram
  • The lower bound found is not very tight
  • We can derive a better l.b. based on a structure
    called the reality-desire diagram of a
    permutation with respect to another
  • To draw the diagram, we will represent a label +x with
    the tuple (-x +x) and -x with the tuple (+x -x)
  • The orientation is given by the rightmost member
    of the tuple

28
Reality and Desire Diagram
  • A permutation is a sequence of adjacent tuples
  • a = (+3, -2, -1, +4, -5) can be represented as
    L---(-3 +3)---(+2 -2)---(+1 -1)---(-4 +4)---(+5 -5)---R
  • b = L---(-1 +1)---(-2 +2)---(-3 +3)---(-4 +4)---(-5 +5)---R

29
Reality and Desire Diagram
  • Now we will draw a graph to represent a = (L, +3,
    -2, -1, +4, -5, R)
  • The reality diagram

30
Reality and Desire Diagram
  • Suppose that b is the identity (L, +1, +2, +3,
    +4, +5, R)
  • We will add desire edges to the previous graph to
    represent b

[Figure: nodes L, -3, +3, +2, -2, +1, -1, -4, +4, +5, -5, R joined by reality and desire edges]
31
Reality and Desire Diagram
  • a is the reality
  • b is what is desired
  • The diagram (a multigraph) shows both reality and
    desire
  • Call it RD(a)
  • We can rearrange the nodes of the graph to make
    it easier to understand

32
Reality and Desire Diagram
[Figure: RD(a) with its nodes rearranged between L and R; legend: Desire, Reality]
33
Properties of RD(a)
  • Each vertex has degree 2
  • Each node is incident to one edge from A, the set
    of reality edges, and one edge from B, the set of
    desire edges
  • The connected components of the graph are
    alternating cycles (edges alternate between
    reality - blue - and desire - red)
  • Each cycle has an even number of edges, half
    reality and half desire

34
Properties of RD(a)
  • The number of cycles of RD(a) is denoted by $c_b(a)$
  • Note that $c_b(b) = n + 1$ since b has no
    breakpoints
  • All cycles are two parallel edges between the
    same pair of nodes
  • We have $2n + 2$ nodes, so $n + 1$ cycles
  • This is the only permutation for which $c_b(a) = n + 1$
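
The cycle count can be computed directly from the two permutations. The sketch below follows the tuple convention from the earlier slides (a signed label x becomes the node pair (-x, x), framed by L and R); the helper names `partner_map` and `count_cycles` are assumptions of this example.

```python
def partner_map(perm):
    """Map every node to the node at the other end of its edge.

    Edges join the right end of one tuple to the left end of the next,
    so perm = a gives the reality edges and perm = b the desire edges.
    """
    nodes = ["L"]
    for x in perm:
        nodes += [-x, x]          # label x becomes the tuple (-x, x)
    nodes.append("R")
    partner = {}
    for k in range(0, len(nodes), 2):
        u, v = nodes[k], nodes[k + 1]
        partner[u], partner[v] = v, u
    return partner


def count_cycles(a, b):
    """Number of alternating cycles in the reality-desire diagram of a w.r.t. b."""
    reality, desire = partner_map(a), partner_map(b)
    seen, cycles = set(), 0
    for start in reality:
        if start in seen:
            continue
        cycles += 1
        node = start
        while node not in seen:
            # Follow one reality edge, then one desire edge.
            seen.update((node, reality[node]))
            node = desire[reality[node]]
    return cycles


print(count_cycles([3, -2, -1, 4, -5], [1, 2, 3, 4, 5]))  # 3
print(count_cycles([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))     # 6, i.e. n + 1
```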

35
Properties of RD(a)
  • So transforming a into b can be seen as
    transforming RD(a) into a graph with as many
    cycles as possible, namely $n + 1$
  • Now we need to see how a reversal affects the
    cycles of RD(a)
  • Note that a reversal is characterized by the two
    points where it cuts the current permutation,
    which each correspond to a reality edge

36
Reversals and RD(a)
  • Let r be a reversal defined by two reality edges
    (s,t) and (u,v), then RD(ar) differs from RD(a)
    as follows
  • Reality edges (s,t) and (u,v) are replaced by
    (s,u) and (t,v)
  • Vertices u, ..., t are reversed
  • Desire edges remain unchanged
  • See example on following slide

37
Example
[Figure: RD(a) before and after the reversal; some nodes/edges omitted]
38
Orientation of Cycles
  • How many cycles are affected by a reversal?
  • First we define convergent and divergent edges
  • Two reality edges on the same cycle converge if
    they are traversed in the same direction
    (clockwise or counterclockwise on the circle in
    the diagram) on the cycle
  • Otherwise they diverge

39
Orientation of Cycles
Convergent: (+3,+2) and (-1,-4)
Divergent: (L,-3) and (+3,+2)
40
Reversals and Cycles
  • Let r be a reversal acting on two reality edges e
    and f
  • If e and f belong to different cycles,
    $c(ar) = c(a) - 1$
  • If e and f belong to the same cycle and converge,
    $c(ar) = c(a)$
  • If e and f belong to the same cycle and diverge,
    $c(ar) = c(a) + 1$

41
First Case
  • If e and f belong to different cycles,
    $c(ar) = c(a) - 1$

42
Second Case
  • If e and f belong to the same cycle and converge,
    $c(ar) = c(a)$

43
Third Case
  • If e and f belong to the same cycle and diverge,
    $c(ar) = c(a) + 1$
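
The three cases can be checked experimentally on the running example a = (+3, -2, -1, +4, -5) with the identity as target. The sketch below tries every reversal and collects the change in the number of cycles; all function names are assumptions of this example, and the diagram construction is the tuple convention used earlier.

```python
def partner_map(perm):
    """Node -> node at the other end of its edge (reality for a, desire for b)."""
    nodes = ["L"]
    for x in perm:
        nodes += [-x, x]          # label x becomes the tuple (-x, x)
    nodes.append("R")
    partner = {}
    for k in range(0, len(nodes), 2):
        u, v = nodes[k], nodes[k + 1]
        partner[u], partner[v] = v, u
    return partner


def count_cycles(a, b):
    reality, desire = partner_map(a), partner_map(b)
    seen, cycles = set(), 0
    for start in reality:
        if start in seen:
            continue
        cycles += 1
        node = start
        while node not in seen:
            seen.update((node, reality[node]))
            node = desire[reality[node]]
    return cycles


def reversal(perm, i, j):
    """Reverse perm[i..j] (0-based, inclusive) and flip the signs."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


a, b = [3, -2, -1, 4, -5], [1, 2, 3, 4, 5]
base = count_cycles(a, b)
deltas = {count_cycles(reversal(a, i, j), b) - base
          for i in range(len(a)) for j in range(i, len(a))}
print(sorted(deltas))  # [-1, 0, 1]
```

Every reversal changes the cycle count by -1, 0 or +1, and all three cases occur for this permutation.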

44
Reversals and Cycles
  • Note that the number of cycles changes by at most
    one with each reversal
  • Use that to find another lower bound for reversal
    distance
  • Suppose we have $a r_1 r_2 \cdots r_t = b$. We know that
    $c(b) = n + 1$, and we have
  • $c(a r_1) - c(a) \le 1$
  • $c(a r_1 r_2) - c(a r_1) \le 1$
  • $c(a r_1 r_2 \cdots r_t) - c(a r_1 r_2 \cdots r_{t-1}) \le 1$
  • Adding and cancelling terms we get
  • $n + 1 - c(a) \le t$
  • If $r_1 r_2 \cdots r_t$ is optimal then $t = d(a)$, so
    $n + 1 - c(a) \le d(a)$
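
The "adding and cancelling" step is a telescoping sum. Written out, with the empty product $a r_1 \cdots r_0$ read as a itself (a notational convenience assumed here):

```latex
\begin{align*}
c(b) - c(a)
  &= \sum_{k=1}^{t} \bigl[ c(a r_1 \cdots r_k) - c(a r_1 \cdots r_{k-1}) \bigr]
   \le \sum_{k=1}^{t} 1 = t, \\
n + 1 - c(a) &\le t \qquad \text{since } c(b) = n + 1 .
\end{align*}
```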

45
Interleaving Graph
  • This new lower bound is better than the old one,
    $b(a)/2$
  • For most signed permutations it is close to the
    actual distance; however, it is not always attained
    (we can't always choose two divergent edges)
  • We can classify the cycles of RD(a) as good or
    bad
  • A cycle is good if it has two divergent reality
    edges
  • Otherwise it is bad
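
A sketch of the good/bad classification. Each cycle is walked once, and for every reality edge the code records whether it is traversed towards R or towards L in the layout of a; a cycle with both directions has two divergent edges. The label "trivial" for cycles with a single reality edge, and all function names, are assumptions of this example.

```python
def node_sequence(perm):
    nodes = ["L"]
    for x in perm:
        nodes += [-x, x]          # label x becomes the tuple (-x, x)
    return nodes + ["R"]


def partner_map(perm):
    nodes = node_sequence(perm)
    partner = {}
    for k in range(0, len(nodes), 2):
        partner[nodes[k]], partner[nodes[k + 1]] = nodes[k + 1], nodes[k]
    return partner


def classify_cycles(a, b):
    """Label each cycle of the diagram as 'good', 'bad' or 'trivial'."""
    seq = node_sequence(a)
    pos = {node: k for k, node in enumerate(seq)}
    reality, desire = partner_map(a), partner_map(b)
    seen, labels = set(), []
    for start in seq:
        if start in seen:
            continue
        directions, n_reality, node = set(), 0, start
        while node not in seen:
            partner = reality[node]
            seen.update((node, partner))
            # True: reality edge walked left to right; False: right to left.
            directions.add(pos[partner] > pos[node])
            n_reality += 1
            node = desire[partner]
        if n_reality == 1:
            labels.append("trivial")
        else:
            labels.append("good" if len(directions) == 2 else "bad")
    return labels


print(classify_cycles([3, -2, -1, 4, -5], [1, 2, 3, 4, 5]))  # ['good', 'trivial', 'good']
print(classify_cycles([2, 1], [1, 2]))                        # ['bad']
```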

46
Interleaving Graph
  • The classification only applies to proper cycles
    (those with at least four edges)
  • Those with two edges don't need to be touched,
    since reality = desire
  • If we have only good cycles in a permutation,
    then the lower bound previously given is an
    equality
  • We sort, increasing the number of cycles by one
    per reversal

47
Interleaving Graph
  • If a desire edge from one cycle crosses some
    desire edge from another cycle we say that the
    two cycles interleave
  • Interleaved cycles allow us to change a bad cycle
    into a good one while breaking another cycle
  • This good cycle can then be broken in the next step
  • To find interleaving cycles, we construct an
    interleaving graph

48
Interleaving Graph
49
Interleaving Graph
  • Nodes in the interleaving graph are cycles
  • Edge between two nodes if the cycles interleave
  • The connected components of the graph are called
    bad components if they consist entirely of bad
    cycles
  • Otherwise the component is a good component
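
One possible way to build the interleaving graph, as a sketch: desire edges are treated as chords between node positions in the layout of a, two chords cross when their endpoints alternate, and components are found by a depth-first search. Only proper cycles are included, and the good/bad labelling of components is left out. All names are assumptions of this example.

```python
from itertools import combinations


def node_sequence(perm):
    nodes = ["L"]
    for x in perm:
        nodes += [-x, x]          # label x becomes the tuple (-x, x)
    return nodes + ["R"]


def partner_map(perm):
    nodes = node_sequence(perm)
    partner = {}
    for k in range(0, len(nodes), 2):
        partner[nodes[k]], partner[nodes[k + 1]] = nodes[k + 1], nodes[k]
    return partner


def proper_cycles(a, b):
    """Each proper cycle as a list of desire chords (pairs of positions in a)."""
    seq = node_sequence(a)
    pos = {node: k for k, node in enumerate(seq)}
    reality, desire = partner_map(a), partner_map(b)
    seen, cycles = set(), []
    for start in seq:
        if start in seen:
            continue
        chords, node = [], start
        while node not in seen:
            partner = reality[node]
            seen.update((node, partner))
            node = desire[partner]
            chords.append((pos[partner], pos[node]))
        if len(chords) >= 2:      # at least four edges in total
            cycles.append(chords)
    return cycles


def chords_cross(c1, c2):
    """True if exactly one endpoint of c2 lies strictly between those of c1."""
    (p, q), (r, s) = sorted(c1), sorted(c2)
    return (p < r < q) != (p < s < q)


def interleaving_components(cycles):
    adj = {i: set() for i in range(len(cycles))}
    for i, j in combinations(range(len(cycles)), 2):
        if any(chords_cross(x, y) for x in cycles[i] for y in cycles[j]):
            adj[i].add(j)
            adj[j].add(i)
    seen, components = set(), []
    for i in adj:
        if i in seen:
            continue
        stack, comp = [i], []
        while stack:
            k = stack.pop()
            if k not in seen:
                seen.add(k)
                comp.append(k)
                stack.extend(adj[k] - seen)
        components.append(sorted(comp))
    return components


b = [1, 2, 3, 4, 5]
print(interleaving_components(proper_cycles([3, -2, -1, 4, -5], b)))  # [[0], [1]]
print(interleaving_components(proper_cycles([3, 2, 1], [1, 2, 3])))    # [[0, 1]]
```

In the first call the two proper cycles do not interleave, so each forms its own component; in the second, the two cycles of (+3, +2, +1) interleave and form a single component.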

50
Interleaving Graph
  • What is the interleaving graph of the previous
    example?
  • Suppose that F and C are good cycles.
  • Which components of the interleaving graph are
    good and which are bad?

51
Sorting Good Components
  • We need to choose two divergent edges in the same
    cycle to define a reversal that increases the
    number of cycles
  • Example
  • A reversal characterized by two divergent edges
    of the same cycle is a sorting reversal if and
    only if it does not lead to the creation of bad
    components

52
Bad Components
  • Having used this criterion to sort all of the good
    components, we must now sort the bad ones
  • Give a hierarchy of bad components
  • We say a component B separates components A and C
    if all chords in RD(a) that link a terminal in A
    to a terminal in C cross a desire edge of B

53
Diagram with no Good Components
54
Bad Components
  • Reversal through reality edges in different
    components A and C will result in every component
    B that separates A and C being twisted
  • A bad component becomes good when twisted
  • A good component can stay good or become bad when
    twisted
  • So twist only when there are no good components

55
Hierarchy of Bad Components
  • A hurdle is a bad component that does not
    separate any other two bad components
  • If a bad component separates others, then it is a
    nonhurdle
  • A hurdle A protects a nonhurdle B when removal of
    A would cause B to become a hurdle
  • B is protected by A when every time B separates
    two bad components, A is one of them

56
Hierarchy of Bad Components
  • A hurdle A is called a superhurdle if it protects
    some other nonhurdle B
  • Otherwise it is called a simple hurdle

Bad components: nonhurdles, hurdles
Hurdles: simple hurdles, super hurdles
57
Fortress
  • A signed permutation a is called a fortress iff
    RD(a) has an odd number of hurdles and all of
    them are super hurdles

58
Reversal Distance
  • The reversal distance of oriented permutations is
    given by
  • $d(a) = n + 1 - c(a) + h(a) + f(a)$
  • c(a) - number of cycles (proper and non-proper)
  • h(a) - number of hurdles
  • f(a) - 1 if a is a fortress, else 0
  • $n + 1 - c(a)$ - reversals for good components and for bad
    components which become good during the sort
  • h(a) - bad components (hurdles) require an extra reversal
  • f(a) - extra reversal for a fortress
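
The formula is simple arithmetic once c(a) and the hurdles are known. The sketch below assumes those quantities have already been computed elsewhere; the function name and the encoding of hurdles as a list of booleans (True for a super hurdle) are assumptions of this example, and the inputs in the last two calls are made-up numbers rather than values derived from a real permutation.

```python
def reversal_distance(n, cycles, hurdle_is_super):
    """d(a) = n + 1 - c(a) + h(a) + f(a).

    n               -- number of labels
    cycles          -- c(a), counting proper and non-proper cycles
    hurdle_is_super -- one boolean per hurdle, True if it is a super hurdle
    """
    h = len(hurdle_is_super)
    # Fortress: an odd number of hurdles, all of them super hurdles.
    f = 1 if h % 2 == 1 and all(hurdle_is_super) else 0
    return n + 1 - cycles + h + f


# Running example a = (+3, -2, -1, +4, -5): n = 5, c(a) = 3, no hurdles.
print(reversal_distance(5, 3, []))                    # 3
# Made-up inputs, only to show the fortress term:
print(reversal_distance(10, 4, [True, True, True]))   # 11
print(reversal_distance(10, 4, [True, False, True]))  # 10
```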

59
Algorithm
  • If we don't have a good cycle we must use either
    a reversal on two convergent edges or a reversal
    on edges in different cycles
  • In first case, number of cycles is constant
  • In second case, number of cycles decreases by one
  • Choose case one on a hurdle
  • Transforms bad component into good
  • Number of cycles remains constant

60
Algorithm
  • Getting rid of a nonhurdle doesn't change the
    number of hurdles or fortress status, so distance
    remains the same
  • If we reverse a superhurdle, the nonhurdle it
    protects becomes a hurdle so h remains constant
  • Call a reversal on some cycle in a hurdle hurdle
    cutting

61
Algorithm
  • In order not to increase f(a), use hurdle cutting
    only when h(a) is odd
  • Using a reversal on edges in two different cycles
    decreases c(a) by one
  • However, d(a) will still decrease if we can decrease
    h(a) by two
  • Choose edges from two different hurdles - this is
    called hurdle merging
  • The two hurdles as well as any nonhurdle
    separating them become good components

62
Algorithm
  • We have to be careful that hurdle merging doesn't
    transform a nonhurdle into a hurdle
  • A and B are called opposite hurdles when we find
    the same number of hurdles walking the circle
    clockwise from A to B as we do walking
    counterclockwise
  • This can only happen if h(a) is even
  • Choosing opposite hurdles, we don't turn a
    nonhurdle into a hurdle

63
Algorithm
  • To avoid creating a fortress where we don't have
    one, we choose the opposite hurdles when they
    exist
  • If h(a) is odd and we have a simple hurdle, do
    hurdle cutting to avoid a fortress
  • If neither case is possible, we already have a
    fortress, so f(a) doesn't increase with any hurdle
    merging

64
Algorithm
Algorithm Sorting Reversal
  input: distinct permutations a and b
  output: a sorting reversal for a with target b
  if there is a good component in RD_b(a) then
    pick two divergent edges e and f in this component,
      making sure the corresponding reversal does not
      create any bad components
    return the reversal characterized by e and f
  else if h(a) is even then
    return merging of two opposite hurdles
  else if h(a) is odd and there is a simple hurdle then
    return a reversal cutting this hurdle
  else // fortress
    return merging of any two hurdles
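
The case analysis of the algorithm can be sketched as a small decision function. It only mirrors the branching: finding good components, hurdles and the actual edges is assumed to happen elsewhere, and the function name, parameters and returned strings are all hypothetical.

```python
def choose_sorting_strategy(has_good_component, num_hurdles, has_simple_hurdle):
    """Return which kind of reversal the algorithm would pick next.

    has_good_component -- True if the diagram has a good component
    num_hurdles        -- h(a)
    has_simple_hurdle  -- True if at least one hurdle is a simple hurdle
    """
    if has_good_component:
        return "reversal on two divergent edges that creates no bad component"
    if num_hurdles % 2 == 0:
        return "merge two opposite hurdles"
    if has_simple_hurdle:
        return "cut a simple hurdle"
    # Odd number of hurdles, all of them super hurdles: a fortress.
    return "merge any two hurdles"


print(choose_sorting_strategy(True, 0, False))
print(choose_sorting_strategy(False, 3, True))
```

Repeating this choice until a equals b yields a full sorting sequence, provided each step is realised by a concrete reversal.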