Multiple-Alignment - PowerPoint PPT Presentation

1 / 94
About This Presentation
Title:

Multiple-Alignment

Description:

... Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA ... Visualizing Interpreting the results The multiple ... data sets that can ... – PowerPoint PPT presentation

Number of Views:131
Avg rating:3.0/5.0
Slides: 95
Provided by: Alg90
Category:

less

Transcript and Presenter's Notes

Title: Multiple-Alignment


1
Multiple-Alignment
  • ???????????????????

2
Reference
  • Michael Brudno, Chuong B. Do, Gregory M. Cooper,
    Michael F. Kim, Eugene Davydov, NISC Comparative
    Sequencing Program, Eric D. Green, Arend Sidow,
    and Serafim BatzoglouLAGAN and Multi-LAGAN
    Efficient Tools for Large-Scale Multiple
    Alignment of Genomic DNAGenome Res., Apr 2003
    13 721 - 731 doi10.1101/gr.926603
  • Michael Brudno, Alexander Poliakov, Asaf Salamov,
    Gregory M. Cooper, Arend Sidow, Edward M. Rubin,
    Victor Solovyev, Serafim Batzoglou, and Inna
    DubchakAutomated Whole-Genome Multiple Alignment
    of Rat, Mouse, and HumanGenome Res., Apr 2004
    14 685 - 692 doi10.1101/gr.2067704

3
??
  • Multiple Sequence Alignment

AAA----CTGCAC----AG A--CTG-CT--ACTG---G ---CTGACTG
C----TTA-
NP-Complete
4
LAGAN toolkit
  • http//lagan.stanford.edu/
  • LAGAN
  • Multi-LAGAN
  • Suffle-LAGAN

5
LAGAN (??)
  • ????? sequence ? global alignment
  • Find local alignments. (seeds)
  • Compute a rough global map.
  • Restricted DP.

6
Multi-LAGAN (??)
  • ??? K ? sequence
  • ?? K-Clustering
  • ??????? (phylogenetic-tree )
  • ???????? sequence ????,??? K-1 ? LAGAN?

7
Phylogenetic Tree
8
????
  • ????
  • ROSETTA Set
  • 129 genes
  • ???
  • ?? 10 Kbp
  • ?? 12 ???? CFTR Region
  • ?? 1Mbp
  • ????? annotated exon ? align ?? 11
    ???,??????????????

9
ROSETTA
Aligner 100 exons 90 exons 70 exons Time (sec)
DIALIGN 89 96 98 388
MUMmer 0 1 3 17
GLASS 91 97 98 154
AVID 90 95 97 19
BlastZ 94 97 98 17
LAGAN 94 97 98 48
10
CFTR
11
MLAGAN
12
?? misaligned ???
13
Discussion
  • Multiple Alignments ??????????? Pairwise
    Alignments ????
  • Local Alignment v.s. Global Alignment
  • MLAGAN ?????,???????
  • LAGAN / MLAGAN ?????????,????????????????

14
LAGAN
15
Three main steps
  • 1.Generation of local alignments.
  • 2. Construction of a rough global map.
  • 3. Computation of the final global alignment.

16
1. Generation of Local Alignment
  • LAGAN uses CHAOS to find local homologies between
    two sequence.
  • Michael Brudno, Michael Chapman, Berthold
    Gottgens, Serafim Batzoglou, and Burkhard
    Morgenstern Fast and sensitive multiple
    alignment of long genomic sequences. BMC
    Bioinformatics, 466 2003.
  • CHAOS works by chaining short words, the seeds,
    which match between the two sequence.
  • Anchor chain of seeds, local alignment.

17
1. Generation of Local Alignment
  • k word length, c degeneracy
  • A (k, c)-seed is a pair of k-long words that
    match with at most c differences between the two
    sequence.
  • d maximum distance , s maximum shift.
  • Two seeds are x-letters and y-letters apart. They
    can be chained together if
  • x lt d and y lt d
  • x - y lt s

18
1. Generation of Local Alignment
  • Find seeds at current location in seq1
  • Find the previous seeds that fall into the
    search box
  • Do a range query seeds are indexed by their
    diagonal.
  • Pick a previous seed that maximizes the score
    of chain

Time O(n log n), where n is number of seeds.
19
1. Generation of Local Alignment
  • Scoring of Chains
  • I love SWEET COW (oo)
  • Match and mismatch penalties for each pair of
    chained seed.
  • Gap penalties proportional to x y for each
    pair of chained seed.
  • Chains are threw away if they score under a
    threshold t.
  • Rapid rescoring
  • For the chains that score above t.
  • Rescore them by performing ungapped extensions in
    both directions from each seed. Finding the
    optimal location to insert exactly one gap of
    size x y

20
2. Construction of a Rough Global Map
  • (b, e, b, e, s) represent a local alignment
    (anchor).
  • From (b, b) to (e, e)
  • s is the score of the alignment
  • A1 lt A2 iff e1 lt b2 and e1 lt b2
  • A1 (b1, e1, b1, e1, s1)
  • A2 (b2, e2, b2, e2, s2)
  • A chain of local alignment A1 lt A2 lt lt Ak, has
    score s1 s2 sk.
  • The optimal rough global map is the
    highest-scoring chain.
  • Computed using Sparse Dynamic Programming LIS,
    in time O(nlogn), n is the total number of local
    alignment.

21
2. Construction of a Rough Global Map
  • Recursive anchoring
  • The choice of parameter k (length of seeds), d
    (maximum degeneracy of seeds), and t (score
    threshold) is a tradeoff between speed and
    sensitivity.
  • Speed higher k, lower c.
  • Sensitivity lower k, higher c.
  • To achieve combination of speed and sensitivity,
    LAGAN calls CHAOS with a restrictive set of
    parameters in the regions between each anchor
    (local alignment) of the global map.

22
3. Computation of Global Alignment
  • Limits the area for each anchor
  • The rectangle (0, 0) to (ir, i-r).
  • The rectangle (i-r, j-r) to (M, N).
  • The band enclosed by the two diagonals
  • (i-r, jr) to (i-r, jr)
  • (ir, j-r) to (Ir, j-r)
  • r is a parameter, typically 15.

23
3. Computation of Global Alignment
  • Do dynamic programming method Needleman-Wunsch to
    this limited area.
  • In this sense the anchors in LAGAN are more
    flexible than the anchors in MUMer, AVID, and
    GLASS.
  • LAGAN provide only approximate locations by which
    the alignment should pass.

24
Memory-efficient Implementation
  • LAGAN performs the entire computation with memory
    proportional to the size of the largest
    rectangle.
  • LAGAN achieves this memory efficiency as follow
  • Allocates working memory for one rectangle and
    the neck that follows it. Compute
    Needleman-Wunsch matrix.
  • Traces back all optimal alignments ending in the
    cells at the rightmost column of the neck.
  • Soon converge upon a single optimal alignment in
    practice.
  • Deallocates all working memory, except the memory
    necessary to keep the traced-back alignments.
  • Repeat step 1 to step 3 for the next rectangle
    and neck.

25
LAGAN Running Time Analysis
  • The running time of LAGAN is dominated by the
    rectangles.
  • The running time of necks is Or(MN), which
    is linear in the sequence lengths.
  • Suppose there are n anchors, let (x0, y0),,(xn,
    yn) be dimension of the n1 rectangles. Let
  • denote the total length of the inter-anchor
    segments in each sequence. We can asume the
    anchors will be aligned in linear time and
    therefore ignore their length.

and
26
LAGAN Running Time Analysis
  • The total number of cells in these rectangles is
  • The first term depends only on the effective
    lengths of the sequences and the total number of
    anchors.
  • If we assume a lower bound on acceptable anchor
    density, then L1L2/n behaves linearly in sequence
    length, because L1/n and L2/n areO(1).

27
LAGAN Running Time Analysis
  • The total number of cells in these rectangles is
  • The second term is at most nsx sy where s
    denotes the standard deviation.
  • Assuming constant anchor density. (reasonable
    assumption for a fixed pair of organism.) Thus,
    linear in sequence length provided the standard
    deviations are constant.
  • If the anchors are spaced evenly, and with a
    constant density, the running time will be linear
    in sequence length.

28
References
  • LAGAN online
  • http//genome.lbl.gov/cgi-bin/VistaInput?align_pgm
    lagannum_seqs2
  • http//ai.stanford.edu/serafim/CS262_2005/index.h
    tml
  • LAGAN
  • http//lagan.stanford.edu/lagan_web/citing.shtml
  • Algorithms for Alignment of Genomic,
    SequencesMichael Brudno, Department of Computer
    Science, Stanford UniversityPGA Workshop
    07/16/2004

29
LAGAN and Multi-LAGAN
Efficient Tools for Large-Scale Multiple
Alignment of Genomic DNA
30
Outline
  • LAGAN
  • Multi-LAGAN
  • Performance Evaluation

31
Multi-LAGAN
32
Multiple Alignment
  • A natural extension of 2-sequence comparisons
  • More difficult than pairwise
  • the running time scales as the product of the
    lengths of all sequences
  • NP-complete problem
  • (need heuristic approaches)

33
Progressive Alignment
  • the most widely used heuristic approach
  • Successive applications of a pairwise alignment
    algorithm
  • CLUSTALW (best-known) and MLAGAN

34
MLAGAN (Multi-LAGAN)
  • A multiple aligner based on progressive alignment
    with LAGAN
  • 2 main phases
  • (1) Progressive alignment with LAGAN
  • (2) (optional) Iterative improvement
  • 1. successively remove each sequence
  • 2. realign it

35
Algorithm MLAGAN
  • Input
  • K sequences X1,,XK
  • A phylogenetic binary tree between them

36
Algorithm MLAGAN (cont.)
  • 3 main steps
  • (1) Generation of rough global maps.
  • Find the rough global map between
  • each pair of sequences.
  • (step 1, 2 of LAGAN)

37
Algorithm MLAGAN (cont.)
  • (2) Progressive multiple alignment with
  • anchors.
  • 2.1 Perform a global alignment
  • between the 2 closest sequences
  • according to the phylogenetic tree
  • using step 3 of LAGAN.

38
Algorithm MLAGAN (cont.)
  • 2.2 Find the rough global maps of the new
  • multi-sequence to all other multi-
  • sequences.
  • (details scoring metric in later)
  • 2.3 Iterate steps 2.1, 2.2 (K-1 times).
  • Repeat until left with a multiple
  • alignment of all sequences.

39
Algorithm MLAGAN (cont.)
  • (3) (Optional) Iterative refinement
  • with anchors.
  • For each sequence Xi in the multiple
  • alignment
  • 3.1 Find anchors between Xi the other
  • sequences that align better than a
  • given cutoff.

40
Algorithm MLAGAN (cont.)
  • 3.2 Align Xi to the multiple
  • alignment of the other
  • sequences with LAGAN.
  • (details in later)

41
Align 2 Multi-sequences
  • In the order of the given phylogenetic tree.
  • E.g. 1. (human, chimpanzee)
  • 2. (mouse, rat)
  • 3. (human/chimpanzee, mouse/rat)
  • 4. (human/chimpanzee/mouse/rat,
  • chicken)

42
Align 2 Multi-sequences (cont.)
  • Step 2.2 of MLAGAN
  • E.g. Compute the rough global map of
  • 2-sequence X/Y and 1-sequence Z
  • (1) Anchors in the rough global maps
  • between X Z, Y Z.
  • (2) Reweigh overlapped anchors
  • (s1s2)I/U

43
Align 2 Multi-sequences (cont.)
  • I length of intersection
  • U length of union
  • (3) The highest weight chain, by LIS.

44
Scoring with Affine Gaps
  • An open research area
  • (T-COFFEE)
  • 2 classical models
  • (1) sum-of-pairs model
  • (2) consensus model

45
Scoring with Affine Gaps (cont.)
  • sum-of-pairs model Sum of scores of all
    pairwise alignments
  • consensus model
  • Create a consensus string by a majority vote at
    each position.
  • Sum of pairwise scores between the consensus and
    each individual sequence

46
Scoring with Affine Gaps (cont.)
  • Each scoring scheme has advantages
    disadvantages.
  • E.g. consensus
  • We use a combination of both
  • sum-of-pairs gt substitutions.
  • consensus gt gaps.
  • p.s. Most similar to CLUSTALW
  • ? heuristically weighted per-sequence
    penalties gt gaps

47
Scoring with Affine Gaps (cont.)
  • Stacking effect (consensus affine-gap)
  • Because gap-open penalties are large compared
    to match mismatch scores, often it is favorable
    to artificially open additional gaps in order to
    stack the gap openings.
  • Solution use gap-end penalty
  • ( gap-open penalty)

48
Scoring with Affine Gaps (cont.)
  • consensus string
  • ATCTGT---CAG

49
Scoring with Affine Gaps (cont.)
  • Define
  • (Aij) K L alignment matrix
  • Aij belongs to A, C, G, T, -
  • (Bij) K L alignment matrix
  • Bij belongs to N, O, G, C

50
Scoring with Affine Gaps (cont.)
  • Bij
  • O (gap-open) the ones opening a gap.
  • G (gap-continue) Aij- except gap-open.
  • C (gap-close) the ones closing a gap.
  • N (nucleotide) Aij?- except gap-close.

51
Scoring with Affine Gaps (cont.)
  • Let m, d, g, c be the match, mismatch, gap-open,
    gap-continue penalties, respectively.
  • Define the function S(x, y) as follows
  • 0, if x- or y-
  • m, if xy
  • d, otherwise. (i.e. x?y)

52
Scoring with Affine Gaps (cont.)
  • Let Nj, Oj, Gj, Cj be the number of Ns, Os,
    Gs, Cs in the jth column of (Bij),
    respectively.
  • Define the function
  • T(i) min(Oj, K-Oj) (gc)
  • min(Gj, K-Gj) c
  • min(Cj, K-Cj) g

53
Scoring with Affine Gaps (cont.)
  • The multiple alignment score is then

54
Iterative Refinement With Anchors
  • A shortcoming of progressive alignment
  • The initial pairwise alignments are fixed, and
    early errors cant be corrected later.

55
Iterative Refinement With Anchors (cont.)
  • Solution Iterative refinement.
  • 2 versions
  • (1) standard
  • (2) limited-area

56
Iterative Refinement With Anchors (cont.)
  • (1) standard (e.g. CLUSTALW)
  • Repeat it for a of iterations, or until
    a
  • local maximum is reached.
  • (2) limited-area
  • Constraint (stay within radius r).
  • Improve the alignment locally, not allow
  • large-scale changes.

57
Iterative Refinement With Anchors (cont.)
  • MLAGAN
  • Introduce (2) limited-area version, but of
    allowing larger-scale adjustments (by anchors).
  • Cumulative contribution
  • si max 0, (the score of position i) si-1
  • When si ?threshold T, reset si 0 and an
    anchor is created at position i.
  • Re-align the removed sequence using LAGAN with
    the anchors.

58
Thanks! ?
59
Introduction(1/3)
  • Multiple sequence alignments provide
    identification and characterization of functional
    elements.
  • Similarity of evolutionary distances is detected
    by multiple alignment of homologous sequences.
  • There have been several schemes of multiple
    alignments for bacterial and yeast genomes.
  • Several strategies for pairwise genome alignments
    of human and mouse genomes based on local or
    local/global technique.

60
Introduction(2/3)
  • Comparing more than two large and structurally
    complex genomes presents several challenges
  • Obtaining a consistent map between several
    genomes
  • Performing large-scale multiple alignment
  • Visualizing
  • Interpreting the results
  • The multiple alignment method of this paper
    expands on the local/global approach of Couronne
    and colleagues.

61
Introduction(3/3)
  • The technique is fully automated and efficient
  • It dose not require a prebuilt synteny map
  • It is able to align the three mammalian genomes
    in lt1 day on 24-node computer cluster.
  • The Multi-VISTA browser is developed with an
    user-friendly visualization approach for
    exploring conserved regions among multiple
    genomes.

62
Overview of Strategy for Multiple alignment(1/3)
General scheme of the method. White boxes show
original and intermediate data green boxes,
mapping/alignment steps and yellow and grey
boxes, resulting data.
The multi-contigs are aligned to human, using
the union of all available BLAT local
alignments from mouse to human and from rat to
human mouse or rat sequences that could not be
aligned to the other rodent are also aligned to
human.
The mouse and rat genomes are aligned by BLAT,
followed by LAGAN global alignment of selected
regions.
Multi-contigs
63
Overview of Strategy for Multiple alignment(2/3)
  • This strategy allows to predict more accurately
    the ortholog of each multi-contig in the human
    genome
  • Only 0.8(2Mbp) of the rat genome and 7 of the
    rat contigs were mapped to multiple areas in the
    human genome.
  • Compared with 4.4 of the genome and 32 of the
    contigs in the original technique.
  • The substitution matrices derived specifically
    for the human, mouse, and rat genomes is modified
    from LAGAN.

64
Overview of Strategy for Multiple alignment(3/3)
  • It generated 11235 areas of three-way alignments,
    74 of which are longer then 200 Kbp in the human
    sequence.
  • Verifing the quality of the alignments
  • Determining the percentage of whole genomes and
    protein-coding exons covered by high-scoring
    subalignments.
  • Comparing our alignments with a syntenic map
    generated independently, based on gene
    predictions.

65
Method
  • Part 1 Progressive Alignment Strategy
  • Part 2 Finding Genomic Synteny by Exons

66
Progressive Alignment Strategy
  • Genomes
  • April 2003 Human
  • February 2003 Mouse
  • June 2003 Rat
  • RepeatMasker
  • RefSeq

67
Work Flow
Global Align Genomes of Mouse and Rat
Aligned Sequences
Unaligned Sequences
Aligned to Human Genome
68
Mouse/Rat Alignment
4. Filtering
3.Group and extend.
1. Rat genome is divided into regions roughly
250kb in size.
Mouse Genome
Rat Genome
ltL
5. Use LAGAN to find global alignment.
Mouse Genome
2. Use BLAT to map these region with mouse.
(Local alignment)
Mouse Genome
ltL
ltL
Mouse Genome
69
Mouse/Rat Alignment (cont.)
  • Filtering
  • Group score lt70,000
  • lt70 of the maximum
  • Looser threshold ? higher sensitivity?
  • LAGAN
  • Empirically derived gap penalties (-500 for
    mouse/rat and -800 for human/rodent)
  • fastreject option
  • Abandon the alignment if the homology looks
    weak. Currently tuned for human/mouse distance,
    or closer.

!
70
Human/Rodent Alignment
Rat/Mouse
Aligned and Unaligned seq. of rat (and mouse) are
aligned to Human genome. Similar procedure as
mouse/rat alignment.
Unaligned sequence
Aligned sequence
71
Human/Rodent Alignment (Cont.)
  • Use scorealign tool to clip aligned sequences of
    mouse and rat.
  • scorealign
  • Hidden Markov Model
  • Given a pairwise alignment and conservation
    cutoff k, it returns regions that are more
    likely to have resulted from a k conservation
    model rather than the background (25)
    conservation model.

72
Finding Genomic Synteny by Exons
  • Get Predicted exons from genome
  • EST_MAP
  • Based on gene prediction
  • Fgenesh
  • Use Chains of Coding Exons to find Synteny
  • Initially build human/mouse and human/rat
    pairwise map, and then resolve them into a single
    three way map for human.

73
Get Exons from Genome (Cont.)
  • RefRNAs are mapped onto genomes by the EST_MAP
    program.
  • Ab intio Fgenesh prediction is run on the rest of
    genome.
  • Use BLAST to search protein homologs of genome
    region of 2.
  • Run Fgenesh to production 3. (more accurate)
  • Second run of ab intio gene prediction to regions
    without prediction in 1 to 4.
  • Fgenesh gene predictions are run in large introns
    of known and predicted genes.

74
Get Predicted Exons from Genome
Exon from running Fgenesh on large intron
Exons from second run of ab inio Fgenesh and
Fgenesh
Chromosome
Exons from ab intio Fgenesh and Fgenesh
Exons from mapping RefSeq mRNAs (Real exons!)
75
Use Chains of Coding Exons to Find Synteny
  • Compile a set of nonredundant, nonoverlapping
    exons with at least 10 amino acids
  • In ascending order along each chromosome.

MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDSFG
TQRFFD
TAEEKAAVTSLW
AGGEALGRLLV
76
Use Chains of Coding Exons to Find Synteny (Cont.)
  • Use BLASTP to align each exon to an exon set from
    the chromosome of the other organism.
  • Homologous exon chain 10 exons at least
  • Syntenic segment
  • Share at least 5 pairs of exons with
    bidirectional hits.
  • Syntenic segments are extended into Syntenic
    blocks and used to create synteny map.

70
77
Exon-Based Map of Conserved Synteny(1/3)
  • Because most gene-prediction programs demonstrate
    higher accuracy in predicting exons than in
    predicting entire genes.
  • A three-way synteny map is built based on chains
    of Fgenesh predicted exons, rather than whole
    genes.
  • Built human/mouse and human/rat pairwise maps,
    then resolved them into a single three-way map
    for human, mouse, and rat.

78
Exon-Based Map of Conserved Synteny(2/3)
  • During the construction of pairwise maps, chains
    of exons are defined .
  • The same order in each of the two genomes is at
    least 70 of the exons have homologs in the other
    genome.
  • Pairwise syntenic maps are merged into a
    three-way synteny map by selecting a single
    genome as a base and merging overlapping parts of
    the pairwise maps.
  • The resulting map has a total of 4497 three-way
    synteny segments.

79
Exon-Based Map of Conserved Synteny(3/3)
  • Among the 4497 segments, the mouse segment is
    absent in 191 cases (4.2), and the rat segment
    in 315 cases (7).
  • The total length of three-way synteny segments in
    the human genome was 674 Mb, with average segment
    length of 150 kb.
  • These segments are further extended into larger
    blocks by merging those that are within 5 Mb of
    each other in every genome.
  • Finding 494 synteny blocks is shared among all
    three genomes.
  • The total length of three-way synteny blocks was
    2351 Mb, with an average block length of 4.76Mb.

80
Exon-based map of conserved synteny between the
rat, human, and mouse genomes. Each rat
chromosome ( presented along the x-axis) contains
two columns, colored according to conserved
synteny with chromosomes of the human and mouse
genomes. Chromosome color scheme is shown at the
bottom.
81
Agreement Between Alignments and the Exon-Based
Map(1/2)
  • The multiple alignment of the three genomes and
    the predicted exon-based synteny map produce
    complementary, independent data sets that can be
    used to evaluate the accuracy of both methods.
  • The alignments generated by the automatic
    alignment pipeline with the exon-based synteny
    map are compared.
  • A syntenic block and an alignment were considered
    matching if they overlapped.

82
Agreement Between Alignments and the Exon-Based
Map(2/2)
gt100 kb in human
  • The longer alignments exhibited greater than 97
    agreement between the two maps.
  • But for very short alignments, the correlation is
    dropped to 13.

110 kb
83
Genome Coverage by Three-Way Alignments(1/2)
  • One way to evaluate alignment sensitivity is to
    compute the percentage of the base pairs of all
    genomes that are reliably aligned.
  • The scoring techniques developed for comparison
    of the human and mouse genomes is used to
    computed overall coverage, as well as coverage of
    RefSeq exons.

84
Genome Coverage by Three-Way Alignments(2/2)
85
Multi-VISTA Browser
  • To visualize the results of comparative sequence
    analysis of multiple genomes in the VISTA format.
  • It can be accessed at http//pipeline.lbl.gov.
  • The Multi-VISTA Browser displays humanmouserat
    multiple alignments on the scale of whole
    chromosomes, along with annotations.
  • The user may select any of the three genomes as
    the reference and display the level of
    conservation between this reference and the
    sequences of the other two species in a
    particular interval.

86
Multi-VISTA Browser
87
Discuss
  • By comparing the alignment to an independently
    generated map of protein synteny between the
    genomes, it concludes that 97 of alignments with
    a human sequence gt100 kbp, and 87 of all
    alignments, agree with the map.
  • The difference between these numbers can be
    explained by the lower accuracy of both alignment
    and synteny map generation when dealing with very
    short regions of conserved synteny.

88
Discuss
  • Only 3.4 of the human base pairs in the whole
    genome alignment are within such nonmatching
    regions.
  • One drawback of global alignments is their
    inability to deal with small rearrangement
    events.
  • A previous study has suggested that as much as 2
    of the gene-coding regions of the human genome
    may have undergone some local rearrangement
    events.
  • Since the divergence between human and the
    rodents, and the local/global approach often is
    not able to cope with these events.

89
Discuss
  • Additional genomes will help to verify the
    quality of existing alignments and provide the
    biologists with additional comparative
    information with which to judge the evolutionary
    importance of a region.
  • Adding several other mammalian genomes will
    possibly allow us to locate constraints at the
    individual base pair level.
  • The availability of these genomes would make
    possible the use of comparative sequence analysis
    in new areas, such as the determination of
    individual binding sites.

90
??-syteny
  • ?????????
  • ??????????????????????????????????????????????????
    ??????????????????,??????????????????????,????????
    ??????????????????????,??????????????????????(inve
    rsion, translocation, duplication)????????????????
    ????????????????,?????????????????????,???????????
    ????????????synteny?

http//vschool.scu.edu.tw/biology/content/genetics
/ge082.htm
91
??-syteny
  • ????,?????????????????????????,????????(syntenic)?
    ????????????

http//vschool.scu.edu.tw/biology/content/genetics
/ge082.htm
92
??- RefSeq
  • RefSeq?Reference Sequences???-????????DNA
    contigs??????mRNAs??????????mRNAs???
    ???????????????Accession numbers????
    ??2??????6???,??NT_123456?NM_123456?
    NC_123456?NG_123456?XM_123456?XR_123456(??
    http//www.ncbi.nlm.nih.gov/RefSeq/)?

http//www.ascc.sinica.edu.tw/nl/92/1922/02.txt
93
??-orthologs
  • orthologs/orthologous (????)
  • ?????????(??????????)??????????????

http//pastime.cgu.edu.tw/petang/Bioinfomatics/Lec
ture/93-1/93-1_11/CGU_Bioinformatics_20041203.file
s/frame.htmslide0040.htm
94
Reference
  • http//www.cf.ac.uk/biosi/research/biosoft/Help/To
    pics/evolutionaryDistance.html
  • http//vschool.scu.edu.tw/biology/content/genetics
    /ge082.htm
  • http//pastime.cgu.edu.tw/petang/Bioinfomatics/Lec
    ture/93-1/93-1_11/CGU_Bioinformatics_20041203.file
    s/frame.htmslide0040.htm
  • http//www.ascc.sinica.edu.tw/nl/92/1922/02.txt
  • http//en.wikipedia.org/wiki/Binding_site
  • http//genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid769
    33989cchr7gsoftberryGene
Write a Comment
User Comments (0)
About PowerShow.com