Genome Rearrangements

Slides: 65

Provided by: timoth73

1
Genome Rearrangements

CIS 667 April 13, 2004

2
Genome Rearrangements

We have seen how differences in genes at the
sequence level can be used to infer evolutionary
relations among species
Differences in sequences in (one or more) genes
resulted from point mutations (insert, delete,
substitute)
These are not the only type of changes that can
occur in the genome

3
Genome Rearrangements

Repair of broken chromosomes is an important
process
Mistakes can occur, however
Mistakes can also occur during crossover
These mistakes cause changes in gene order
A large piece of chromosome can be moved or
copied to another location
It can also move from one chromosome to another
We call these movements genome rearrangements

4
Crossover
5
Chromosome Repair
6
Genome Rearrangements

These have important (usually fatal) consequences
for the organism and its evolution
Alignments do not capture genome rearrangements
Two species may have nearly the same gene
sequences, but in a different order (why would
the two species then be different?)

7
Genome Rearrangements

We need some other way to compare entire genomes
(i.e. compare at a higher level)
Rather than by simple point mutations, one genome is
obtained from another by a number of rearrangements
of a special kind: reversals
Use the number of reversals needed to transform
one genome into another to measure evolutionary
distance

8
The Method

Use combinatorial optimization techniques in an
attempt to infer a most economical sequence of
rearrangement operations to account for
differences among the genomes
Compare with character-based methods for
phylogenetics (parsimony)

9
Reversals

Consider the genome of a species as a sequence of
blocks
A block is some sequence of the genome (possibly
containing more than one gene) transcribed as a
unit
Blocks are oriented since they can be transcribed
from either strand of DNA
Give homologous blocks the same label

10
Reversals

Relation between chloroplast genomes of alfalfa
and garden pea

11
Reversals

Reversal operation for oriented blocks
Inverts the order of affected blocks and changes
their orientation (arrow)
Affects a contiguous segment of blocks
What sequence of reversal operations could have
changed alfalfa into garden pea?
Would like to have a polynomial time algorithm to
find the shortest sequence

13
Genome Comparison vs. Gene Comparison

In the late 1980s, J. Palmer and his colleagues
studied the mitochondrial genomes of cabbage and
turnips
The gene sequences are very similar (some genes
are 99% identical)
Gene order, however, differs dramatically
Genome rearrangements are now considered to be a
common mode of molecular evolution

14
Genome Comparison vs. Gene Comparison

Extreme conservation of genes on X chromosomes
across mammalian species provides an opportunity
to study the evolutionary history of X chromosome
independently of the rest of the genomes
According to Ohno's law, the gene content of the X
chromosome has barely changed throughout
mammalian development in the last 125 million
years.
However, the order of genes on X chromosomes has
been disrupted several times.

15
Human and Mouse X Chromosomes
16
Human and Mouse X Chromosomes
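A sequence of six reversals relating the two gene
orders (each row differs from the previous one by
a single reversal)
-4 -6  1  7 -2 -3  5  8
 3  2 -7 -1  6  4  5  8
 1  7 -2 -3  6  4  5  8
 1  2 -7 -3  6  4  5  8
 1  2 -7 -6  3  4  5  8
 1  2 -5 -4 -3  6  7  8
 1  2  3  4  5  6  7  8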
17
Genome Comparison vs. Gene Comparison

The traditional molecular evolutionary technique
is a gene comparison to construct a phylogenetic
tree
In the cabbage and turnip case this is hardly
suitable, since rate of point mutations in their
mitochondrial genes is so low that their genes
are almost identical
Genome comparison (i.e. comparison of gene
orders) is the method of choice in the case of
very slowly evolving genomes
Another area is the case where genomes evolve
very rapidly (genes not very similar)

18
Genome Comparison

Only about 178 ± 39 genome rearrangements have
happened since human and mouse diverged 80
million years ago
Mouse and human genomes can be viewed as a
collection of about 200 fragments which are
shuffled in mice as compared to humans
A comparative mouse-human genetic map gives the
position of a human gene given the location of a
related mouse gene

19
Man-Mouse Comparative Physical Map
20
Definitions

A signed permutation a over the set of labels L =
{1, 2, ..., n} is a permutation such that a(i) = +x
or -x, where x ∈ L
Example: (+3, -2, -1) is a signed permutation over
L = {1, 2, 3}
Note that no label may appear twice in the
permutation
A reversal [i, j] is an operation that transforms
one signed permutation into another, reversing
the order of a contiguous portion and flipping
the signs

21
Definitions

a → a[i, j] = (a(1), ..., a(i-1), -a(j), ..., -a(i),
a(j+1), ..., a(n))
We are interested in the problem of sorting by
reversals: given two signed permutations a and b,
find the minimum number of reversals r1, ..., rt
that will transform a into b, i.e. a r1 ... rt = b
The reversal distance is d_b(a) = t
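
The reversal operation defined above can be sketched in a few lines of Python. This is only an illustration: the function name `apply_reversal` and the list representation of a signed permutation are assumptions, not part of the slides.

```python
def apply_reversal(perm, i, j):
    """Apply the reversal [i, j] (1-indexed, inclusive) to a signed permutation.

    The segment perm[i..j] is reversed in order and every sign in it is flipped.
    """
    if not 1 <= i <= j <= len(perm):
        raise ValueError("need 1 <= i <= j <= n")
    segment = [-x for x in reversed(perm[i - 1:j])]
    return perm[:i - 1] + segment + perm[j:]


a = [3, -2, -1, 4, -5]
step1 = apply_reversal(a, 5, 5)      # flip the last block
step2 = apply_reversal(step1, 1, 3)  # reverse the first three blocks
step3 = apply_reversal(step2, 3, 3)  # flip the third block
print(step1, step2, step3)
```

Here three reversals turn (+3, -2, -1, +4, -5) into the identity; whether three is the minimum is exactly the sorting-by-reversals question.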

22
Definitions

Note that the reversal operation does not
directly correspond to the biological operations
(inversion, translocation, fission, fusion)
Given a and b, can we always transform a into b
using only the reversal operation? If so, how
many reversals are required in the worst case?

23
Breakpoints

A breakpoint is a point between consecutive
labels in the initial permutation that must
necessarily be separated by at least one reversal
to reach the target permutation
The two consecutive labels are not consecutive in
the target, or their orientations are not the
same in a relative sense

24
Breakpoints

To formalize the idea of breakpoint, we introduce
the extended version of a
Let a = (a(1), ..., a(n))
Then the extended version of a is (L, a(1), ...,
a(n), R)
For example let extended a be (L, -2, -3, +1, +6,
-5, -4, R) and let extended b be (L, +1, +2, +3,
+4, +5, +6, R)
The breakpoints are (L,-2), (-2,-3), (-3,+1),
(+1,+6), (+6,-5), (-4,R)
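
Counting breakpoints against the identity target can be sketched as follows. The function name `count_breakpoints` and the encoding of L as 0 and R as n + 1 are assumptions made for this illustration. The signs used for the example, (-2, -3, +1, +6, -5, -4), are also an assumption: they are one assignment consistent with the six breakpoints listed above.

```python
def count_breakpoints(perm):
    """Count breakpoints of a signed permutation relative to the identity.

    L is encoded as 0 and R as n + 1. A consecutive pair (x, y) is NOT a
    breakpoint exactly when y == x + 1, which covers both (+i, +(i+1)) and
    the reversed form (-(i+1), -i).
    """
    n = len(perm)
    extended = [0] + list(perm) + [n + 1]
    return sum(1 for x, y in zip(extended, extended[1:]) if y - x != 1)


example = [-2, -3, 1, 6, -5, -4]
print(count_breakpoints(example))
```

With this encoding, L is involved in a breakpoint whenever the first entry is not +1, and R whenever the last entry is not +n.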

25
Breakpoints

The number of breakpoints of a permutation a is
denoted by b(a)
In the example, b(a) = 6
Can you characterize the situations where L is
involved in a breakpoint? When R is involved in a
breakpoint?

26
A Lower Bound

A reversal can remove at most two breakpoints
Cuts the permutation in exactly two places
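So, if a r1 ... rt = b then
b(a) - b(a r1) ≤ 2
b(a r1) - b(a r1 r2) ≤ 2
...
b(a r1 ... rt-1) - b(a r1 ... rt) ≤ 2
Adding these and using b(a r1 ... rt) = b(b) = 0
gives b(a) ≤ 2t. If t = d(a), then b(a)/2 ≤ d(a)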
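
For small permutations the breakpoint bound can be checked against the true distance by exhaustive search. The sketch below is a brute-force breadth-first search, not the polynomial-time method developed later in the slides; the names `count_breakpoints` and `reversal_distance_bfs` are assumptions, and the target is taken to be the identity.

```python
from collections import deque


def count_breakpoints(perm):
    """Breakpoints relative to the identity (L encoded as 0, R as n + 1)."""
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for x, y in zip(extended, extended[1:]) if y - x != 1)


def reversal_distance_bfs(start):
    """Exact reversal distance to the identity by breadth-first search.

    Exponential in n, so only usable for very small permutations.
    """
    n = len(start)
    start = tuple(start)
    target = tuple(range(1, n + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for i in range(n):
            for j in range(i, n):
                flipped = tuple(-x for x in reversed(cur[i:j + 1]))
                nxt = cur[:i] + flipped + cur[j + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    raise RuntimeError("identity not reached")


a = (3, -2, -1, 4, -5)
b_a = count_breakpoints(a)
d_a = reversal_distance_bfs(a)
print(b_a, d_a, b_a / 2 <= d_a)
```

The search space has 2^n · n! states, which is why a structural characterization of the distance is needed for realistic genomes.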

27
Reality and Desire Diagram

The lower bound found is not very tight
We can derive a better l.b. based on a structure
called the reality-desire diagram of a
permutation with respect to another
To draw the diagram, we will represent a label +x with
the tuple (-x +x) and -x with the tuple (+x -x)
The orientation is given by the rightmost member
of the tuple

28
Reality and Desire Diagram

A permutation is a sequence of adjacent tuples
a = (+3, -2, -1, +4, -5) can be represented as
L---(-3 +3)---(+2 -2)---(+1 -1)---(-4 +4)---(+5 -5)---R
b = L---(-1 +1)---(-2 +2)---(-3 +3)---(-4 +4)---(-5 +5)---R
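
The tuple representation can be generated mechanically. The following is a small sketch; the function name `expand` and the use of the strings "L" and "R" for the end markers are assumptions made for illustration.

```python
def expand(perm):
    """Replace each signed label by its tuple: +x -> (-x, +x), -x -> (+x, -x).

    Returns the flat node sequence framed by the markers "L" and "R".
    """
    nodes = ["L"]
    for x in perm:
        nodes += [-x, x]  # works for both signs: x = -2 gives [2, -2]
    nodes.append("R")
    return nodes


print(expand([3, -2, -1, 4, -5]))
```

In every tuple the rightmost member carries the sign of the original label, which is how the orientation is read off.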

29
Reality and Desire Diagram

Now we will draw a graph to represent a = (L, +3,
-2, -1, +4, -5, R)
The reality diagram

30
Reality and Desire Diagram

Suppose that b is the identity (L, +1, +2, +3,
+4, +5, R)
We will add desire edges to the previous graph to
represent b

L  -3 +3  +2 -2  +1 -1  -4 +4  +5 -5  R
31
Reality and Desire Diagram

a is the reality
b is what is desired
The diagram (a multigraph) shows both reality and
desire
Call it RD(a)
We can rearrange the nodes of the graph to make
it easier to understand

32
Reality and Desire Diagram
(Figure: RD(a) with its nodes rearranged between L and R; edges are labeled Reality and Desire)
33
Properties of RD(a)

Each vertex has degree 2
Each node is incident to one edge from A, the set
of reality edges, and one from B, the set of desire edges
The connected components of the graph are
alternating cycles (edges alternate between
reality - blue - and desire - red)
Each cycle has an even number of edges, half
reality and half desire

34
Properties of RD(a)

The number of cycles of RD(a) is denoted by c_b(a)
Note that c_b(b) = n + 1 since b has no
breakpoints
All cycles are two parallel edges between the
same pair of nodes
We have 2n + 2 nodes, so n + 1 cycles
This is the only permutation for which c_b(a) = n + 1
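
Counting the alternating cycles can be sketched directly from the two edge sets. This is an illustrative sketch with the identity as target; the names `partners` and `count_cycles` are assumptions and do not come from the slides.

```python
def partners(perm):
    """Map each node to the node it is joined to by an edge.

    Nodes are "L", the tuple members -x/+x for each label, and "R";
    edges join consecutive nodes that belong to different tuples.
    """
    nodes = ["L"]
    for x in perm:
        nodes += [-x, x]
    nodes.append("R")
    pairs = {}
    for k in range(0, len(nodes), 2):
        pairs[nodes[k]] = nodes[k + 1]
        pairs[nodes[k + 1]] = nodes[k]
    return pairs


def count_cycles(perm):
    """Number of alternating cycles of RD(perm), target = identity."""
    reality = partners(perm)
    desire = partners(range(1, len(perm) + 1))
    seen, cycles = set(), 0
    for start in reality:
        if start in seen:
            continue
        cycles += 1
        node = start
        while node not in seen:
            seen.update((node, reality[node]))  # follow a reality edge
            node = desire[reality[node]]        # then a desire edge
    return cycles


print(count_cycles([3, -2, -1, 4, -5]), count_cycles([1, 2, 3, 4, 5]))
```

For the identity every reality edge coincides with a desire edge, so the count reaches its maximum of n + 1.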

35
Properties of RD(a)

So transforming a into b can be seen as
transforming RD(a) into a graph with as many
cycles as possible: n + 1
Now we need to see how a reversal affects the
cycles of RD(a)
Note that a reversal is characterized by the two
points where it cuts the current permutation,
which each correspond to a reality edge

36
Reversals and RD(a)

Let r be a reversal defined by two reality edges
(s,t) and (u,v), then RD(ar) differs from RD(a)
as follows
Reality edges (s,t) and (u,v) are replaced by
(s,u) and (t,v)
Vertices u, ..., t are reversed
Desire edges remain unchanged
See example on following slide

37
Example
(Figure: RD(a) before and after a reversal; some nodes/edges omitted)
38
Orientation of Cycles

How many cycles are affected by a reversal?
First we define convergent and divergent edges
Two reality edges on the same cycle converge if
they are traversed in the same direction
(clockwise or counterclockwise on the circle in
the diagram) on the cycle
Otherwise they diverge

39
Orientation of Cycles
Convergent: (3,2) and (-1,-4)
Divergent: (L,-3) and (3,2)
40
Reversals and Cycles

Let r be a reversal acting on two reality edges e
and f
If e and f belong to different cycles, c(ar) =
c(a) - 1
If e and f belong to the same cycle and converge,
c(ar) = c(a)
If e and f belong to the same cycle and diverge,
c(ar) = c(a) + 1

41
First Case

If e and f belong to different cycles, c(ar) =
c(a) - 1

42
Second Case

If e and f belong to the same cycle and converge,
c(ar) = c(a)

43
Third Case

If e and f belong to the same cycle and diverge,
c(ar) = c(a) + 1

44
Reversals and Cycles

Note that the number of cycles changes by at most
one with each reversal
Use that to find another lower bound for reversal
distance
Suppose we have a r1 r2 ... rt = b; we know that c(b) =
n + 1 and we have
c(a r1) - c(a) ≤ 1
c(a r1 r2) - c(a r1) ≤ 1
...
c(a r1 r2 ... rt) - c(a r1 r2 ... rt-1) ≤ 1
Adding and cancelling terms we get
n + 1 - c(a) ≤ t
If r1 r2 ... rt is optimal then t = d(a), so n + 1 - c(a)
≤ d(a)
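
The claim that a reversal changes the cycle count by at most one, and the resulting bound, can be checked numerically on a small example. This is a sketch under the assumption that the target is the identity; `count_cycles`, `cycle_lower_bound` and `max_cycle_change` are hypothetical helper names.

```python
def count_cycles(perm):
    """Number of alternating cycles of RD(perm), target = identity."""
    def partners(p):
        nodes = ["L"]
        for x in p:
            nodes += [-x, x]
        nodes.append("R")
        pairs = {}
        for k in range(0, len(nodes), 2):
            pairs[nodes[k]] = nodes[k + 1]
            pairs[nodes[k + 1]] = nodes[k]
        return pairs

    reality = partners(perm)
    desire = partners(range(1, len(perm) + 1))
    seen, cycles = set(), 0
    for start in reality:
        if start in seen:
            continue
        cycles += 1
        node = start
        while node not in seen:
            seen.update((node, reality[node]))
            node = desire[reality[node]]
    return cycles


def cycle_lower_bound(perm):
    """The bound n + 1 - c(a) on the reversal distance."""
    return len(perm) + 1 - count_cycles(perm)


def max_cycle_change(perm):
    """Largest |c(ar) - c(a)| over all reversals r of perm."""
    perm = tuple(perm)
    n, base = len(perm), count_cycles(perm)
    worst = 0
    for i in range(n):
        for j in range(i, n):
            after = perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]
            worst = max(worst, abs(count_cycles(after) - base))
    return worst


a = (3, -2, -1, 4, -5)
print(cycle_lower_bound(a), max_cycle_change(a))
```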

45
Interleaving Graph

This new lower bound is better than the old one,
b(a)/2
For most signed permutations, it is close to the
actual distance; however, it is not always attained
(we can't always choose two divergent edges)
We can classify the cycles of RD(a) as good or
bad
A cycle is good if it has two divergent reality
edges
Otherwise it is bad
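
The good/bad test can be sketched by walking each cycle and recording the direction in which every reality edge is traversed. In this sketch the direction is taken from node positions in the linear layout (left to right versus right to left), which is an assumed stand-in for clockwise versus counterclockwise on the circle. The function name `classify_cycles` and the label "trivial" for two-edge cycles are hypothetical.

```python
def classify_cycles(perm):
    """Label each cycle of RD(perm) (target = identity) as good, bad or trivial.

    A cycle is good when two of its reality edges are traversed in opposite
    directions (they diverge); two-edge cycles are reported as "trivial".
    """
    def expand(p):
        nodes = ["L"]
        for x in p:
            nodes += [-x, x]
        return nodes + ["R"]

    def partners(nodes):
        pairs = {}
        for k in range(0, len(nodes), 2):
            pairs[nodes[k]] = nodes[k + 1]
            pairs[nodes[k + 1]] = nodes[k]
        return pairs

    real_nodes = expand(perm)
    pos = {node: k for k, node in enumerate(real_nodes)}
    reality = partners(real_nodes)
    desire = partners(expand(range(1, len(perm) + 1)))

    seen, labels = set(), []
    for start in real_nodes:
        if start in seen:
            continue
        directions = []
        node = start
        while node not in seen:
            mate = reality[node]
            seen.update((node, mate))
            directions.append(pos[mate] > pos[node])  # True = left to right
            node = desire[mate]
        if len(directions) == 1:
            labels.append("trivial")
        elif len(set(directions)) == 2:
            labels.append("good")
        else:
            labels.append("bad")
    return labels


print(classify_cycles([3, -2, -1, 4, -5]))
print(classify_cycles([2, 1]))
```

The permutation (+2, +1) gives a single cycle whose reality edges all run the same way, so no reversal on it can increase the number of cycles.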

46
Interleaving Graph

The classification only applies to proper cycles
(those with at least four edges)
Those with two edges don't need to be touched
since reality = desire
If we have only good cycles in a permutation,
then the lower bound previously given is an
equality
We sort, increasing the number of cycles by one
per reversal

47
Interleaving Graph

If a desire edge from one cycle crosses some
desire edge from another cycle we say that the
two cycles interleave
Interleaved cycles allow us to change a bad cycle
into a good one while breaking another cycle
This good cycle can then be broken in the next step
To find interleaving cycles, we construct an
interleaving graph

48
Interleaving Graph
49
Interleaving Graph

Nodes in the interleaving graph are cycles
Edge between two nodes if the cycles interleave
The connected components of the graph are called
bad components if they consist entirely of bad
cycles
Otherwise the component is a good component

50
Interleaving Graph

What is the interleaving graph of the previous
example?
Suppose that F and C are good cycles.
Which components of the interleaving graph are
good and which are bad?

51
Sorting Good Components

We need to choose two divergent edges in the same
cycle to define a reversal that increases the
number of cycles
Example
A reversal characterized by two divergent edges
of the same cycle is a sorting reversal if and
only if it does not lead to the creation of bad
components

52
Bad Components

Having used this criterion to sort all of the good
components, we must now sort the bad ones
Give a hierarchy of bad components
We say a component B separates components A and C
if all chords in RD(a) that link a terminal in A
to a terminal in C cross a desire edge of B

53
Diagram with no Good Components
54
Bad Components

Reversal through reality edges in different
components A and C will result in every component
B that separates A and C being twisted
A bad component becomes good when twisted
A good component can stay good or become bad when
twisted
So twist only when there are no good components

55
Hierarchy of Bad Components

A hurdle is a bad component that does not
separate any other two bad components
If a bad component separates others, then it is a
nonhurdle
A hurdle A protects a nonhurdle B when removal of
A would cause B to become a hurdle
B is protected by A when every time B separates
two bad components, A is one of them

56
Hierarchy of Bad Components

A hurdle A is called a superhurdle if it protects
some other nonhurdle B
Otherwise it is called a simple hurdle

Bad Components
  Nonhurdles
  Hurdles
    Simple hurdles
    Super hurdles
57
Fortress

A signed permutation a is called a fortress iff
RD(a) has an odd number of hurdles and all of
them are super hurdles

58
Reversal Distance

The reversal distance of oriented permutations is
given by
d(a) = n + 1 - c(a) + h(a) + f(a)
c(a) - number of cycles (proper and non-proper)
h(a) - number of hurdles
f(a) - is a a fortress? (1 if so, else 0)
n + 1 - c(a) - reversals for good components and
bad components which become good during the sort
h(a) - bad components require an extra reversal
f(a) - extra reversal for a fortress
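
As a worked instance, take a = (+3, -2, -1, +4, -5) with the identity as target, the permutation used earlier for the reality and desire diagram. The values below are counted by hand for this illustration and are not stated on the slides: n = 5, the diagram has c(a) = 3 cycles, and both proper cycles contain a pair of divergent reality edges, so there are no hurdles and no fortress, giving h(a) = 0 and f(a) = 0.

```latex
\begin{align*}
d(a) &= n + 1 - c(a) + h(a) + f(a) \\
     &= 5 + 1 - 3 + 0 + 0 \\
     &= 3
\end{align*}
```

Three reversals do suffice here: flip -5, reverse the first three blocks to get (+1, +2, -3, +4, +5), then flip -3.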

59
Algorithm

If we don't have a good cycle we must use either
a reversal on two convergent edges or a reversal
on edges in different cycles
In first case, number of cycles is constant
In second case, number of cycles decreases by one
Choose case one on a hurdle
Transforms bad component into good
Number of cycles remains constant

60
Algorithm

Getting rid of a nonhurdle doesn't change the
number of hurdles or fortress status, so distance
remains the same
If we reverse a superhurdle, the nonhurdle it
protects becomes a hurdle so h remains constant
A reversal on two convergent edges of some cycle
in a hurdle is called hurdle cutting

61
Algorithm

In order not to increase f(a), use hurdle cutting
only when h(a) is odd
Using a reversal on edges in two different cycles
decreases c(a)
However d(a) will decrease if we can decrease
h(a) by two
Choose edges from two different hurdles - this is
called hurdle merging
The two hurdles as well as any nonhurdle
separating them become good components

62
Algorithm

We have to be careful that hurdle merging doesn't
transform a nonhurdle into a hurdle
A and B are called opposite hurdles when we find
the same number of hurdles walking the circle
clockwise from A to B as we do walking
counterclockwise
This can only happen if h(a) is even
Choosing opposite hurdles, we don't turn a
nonhurdle into a hurdle

63
Algorithm

To avoid creating a fortress where we don't have
one, we choose the opposite hurdles when they
exist
If h(a) is odd and we have a simple hurdle, do
hurdle cutting to avoid fortress
If neither case is possible, we already have a
fortress so f(a) doesn't increase with any hurdle
merging

64
Algorithm
Algorithm Sorting Reversal
input: distinct permutations a and b
output: a sorting reversal for a with target b
if there is a good component in RD_b(a) then
    pick 2 divergent edges e and f in this component,
    making sure the corresponding reversal does not
    create any bad components
    return the reversal characterized by e and f
else if h(a) is even then
    return merging of two opposite hurdles
else if h(a) is odd and there is a simple hurdle then
    return a reversal cutting this hurdle
else // fortress
    return merging of any two hurdles
