Title: A Comparison of Methods for Aligning Genomic Sequences
1. A Comparison of Methods for Aligning Genomic Sequences
- JaNera Mitchom
- Fisk University
- Research Alliance for Minorities
- Oak Ridge National Laboratory
- Dr. Mike Leuze
- http://www.csm.ornl.gov
2. Outline
- Introduction
- Purpose of Genomic Sequence Analysis
- Background Information
- Central Dogma of Molecular Biology
- Gene Regulation
- Transcription Factor Binding Sites
- Computational Methods of Aligning Genomic Sequences
- CLUSTALW
- DIALIGN
- ORNL Method
- Results
- Acknowledgements
3. Introduction
- Genomic sequence analysis is used to identify genomic regulatory regions.
- Genes are regulated by proteins called transcription factors.
- Transcription factors bind to DNA sites found close to the gene, known as regulatory regions.
- These regions can be found by sequence comparison and may contain binding sites for proteins.
This research matters because a better understanding of how genes are regulated will aid in the development of treatments for genetic diseases.
4. Background Information
Central Dogma of Molecular Biology
- DNA specifies RNA, which specifies proteins.
- Francis Crick
5. Gene Regulation
- Each cell has a copy of the entire genome.
- Not every gene is expressed in every cell.
- A mechanism in each cell causes certain genes to be expressed and inhibits the expression of other genes.
6. Transcription Factor Binding Sites
- A transcription factor binding site is a section of DNA, usually found in close proximity to the gene, to which transcription factors bind in order to help transcribe DNA into RNA. Transcription factors are proteins that control gene expression.
7. Computational Methods of Aligning Genomic Sequences
- Computational methods have been developed to identify genomic regulatory regions by sequence comparison. Two such methods, CLUSTALW and DIALIGN, were compared with a method developed at ORNL to determine which is best at finding these regions.
8. FAS (Fatty Acid Synthase)
- Chicken
- Goose
- Important implications in obesity, diabetes, atherosclerosis, and cancer
>Chicken_FAS
GGATCCCGTGGGGAGGTTGAGCTGTGCACTAAATGGAGGAGGTGAATTCT
CACAGCTTTCTATTATGACCAAAATATTAACAAATAGTTCATTGATCTAA
AAAACAGATGGCAAAGAAGGGTGGCGAGGTGGCACCGCGGCTGCTACCGG
CTTGTGCAAGGAAAGGAGATCCTATAAGCAATGACTCCAGCCGACCACAG
AGGAAGGCAGAACTCGCCTCACAGCTCACCCCGCCCAACCCCCCGCACTG
>Goose_FAS
AGTCACCACACAATCGCTTATCGCCTAGCAACACCTACCGGGCACGCCAT
TGGCAGGCCGCGCCCCCGCCCAACGCCCCCGCCTGATTGGGTGCCGGCCC
AGGACTGCGCGTCCAGCCGCGCGCCCCTTCTCTTTGTGCGCTGCCGGGTG
CGCGGCGATCCGTGGCCCGGCGCCCGCCCGGCTTCAGCGCGCCCTGCCGA
GCCACGGTGCCGGCGCAGTAGTAGTCCCAACCACAGTGTGCACATCCGCG
GGGCGGGGGGAGAGGGACACAGAAAGGGACGCGGCGCTCGCTGCGATGGG
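Sequence excerpts like these are commonly stored in FASTA format, where a line beginning with `>` names a sequence and the lines that follow hold its bases. The Python sketch below shows one way such text might be read into a dictionary before alignment. The function name `parse_fasta` and the shortened sample (only the first 50 bases of each excerpt) are illustrative assumptions, not part of the original study.

```python
def parse_fasta(text):
    """Return {name: sequence} from FASTA-formatted text."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()  # header line names the sequence
            parts[name] = []
        elif name is None:
            raise ValueError("sequence data found before any '>' header")
        else:
            # drop stray spaces left by line wrapping and normalise case
            parts[name].append(line.replace(" ", "").upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


# Shortened sample: the first 50 bases of each excerpt shown on the slide.
sample = """>Chicken_FAS
GGATCCCGTGGGGAGGTTGAGCTGTGCACTAAATGG
AGGAGGTGAATTCT
>Goose_FAS
AGTCACCACACAATCGCTTATCGCCTAGCAACACCTAC
CGGGCACGCCAT
"""

records = parse_fasta(sample)
for name, seq in records.items():
    print(name, len(seq))
```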
9. CLUSTALW
- Uses a progressive method of multiple sequence alignment
- Basic alignment procedure:
  - Performs pair-wise alignment of the sequences
  - Produces a phylogenetic tree based on the alignment scores
  - Aligns the sequences progressively according to the branching order in the phylogenetic tree
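The pair-wise step of a progressive aligner can be illustrated with a classic global alignment by dynamic programming (Needleman-Wunsch). This is a minimal sketch, not CLUSTALW's own implementation: the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) and the function name are assumptions chosen for readability, whereas real tools use substitution matrices and separate gap-opening and gap-extension penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two bases
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(total)
print(top)
print(bottom)
```

In a progressive scheme, scores like `total` for every pair of sequences would feed the tree-building step. With only two sequences, as in the chicken and goose comparison, the pair-wise alignment is the whole alignment.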
10. DIALIGN (Diagonal Alignment)
- Uses an iterative method of multiple sequence alignment
- Basic alignment procedure:
  - Assembles pair-wise and multiple alignments of the sequences
  - Assigns each possible fragment a weight score based on its probability of random occurrence
  - Incorporates sequences into the multiple alignment based on weight score
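The idea of weighting a fragment by its probability of random occurrence can be sketched as follows. One common formulation, used here as an assumption, takes the weight of a gap-free fragment as the negative logarithm of the probability that a random fragment of the same length has at least as many matches. The match probability `p=0.25` (four equally likely bases), the length limits, and the function names are illustrative choices, not DIALIGN's actual parameters.

```python
from math import comb, log


def tail_probability(length, matches, p=0.25):
    """P(at least `matches` matches in `length` random positions)."""
    return sum(comb(length, k) * p**k * (1 - p)**(length - k)
               for k in range(matches, length + 1))


def fragment_weight(a, b, p=0.25):
    """Weight of a gap-free fragment pair: -log of its random-match probability."""
    if len(a) != len(b):
        raise ValueError("fragment halves must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return -log(tail_probability(len(a), matches, p))


def best_fragment(seq1, seq2, min_len=4, max_len=20):
    """Exhaustively find the highest-weight fragment: (weight, i, j, length)."""
    best = (0.0, 0, 0, 0)
    for i in range(len(seq1)):
        for j in range(len(seq2)):
            for size in range(min_len, max_len + 1):
                if i + size > len(seq1) or j + size > len(seq2):
                    break
                w = fragment_weight(seq1[i:i + size], seq2[j:j + size])
                if w > best[0]:
                    best = (w, i, j, size)
    return best


weight, i, j, size = best_fragment("GGGACGTACGT", "CCACGTACGTCC", max_len=8)
print(round(weight, 2), i, j, size)
```

Long fragments with many matches are unlikely to arise by chance and so receive high weights. A fragment with no matches gets a weight of zero and would never be selected.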
11. Two-Phase ORNL Method
- Developed within the last year
- Phase One: collection of small subsequences that match specified criteria
- Phase Two: assembly of pieces from the Phase One collection into larger patterns
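The slides do not state the matching criteria or the assembly rule, so the following Python sketch is only a hypothetical illustration of a collect-then-assemble scheme, not the ORNL implementation. It assumes that the Phase One criterion is an exact shared subsequence of length `k`, and that Phase Two merges pieces lying on the same diagonal when at most `max_gap` positions separate them. Both parameters and all function names are invented for this example.

```python
from collections import defaultdict


def collect_seeds(seq1, seq2, k=4):
    """Phase one (assumed criterion): all exact shared k-mers as (i, j) starts."""
    index = defaultdict(list)
    for j in range(len(seq2) - k + 1):
        index[seq2[j:j + k]].append(j)
    return [(i, j)
            for i in range(len(seq1) - k + 1)
            for j in index.get(seq1[i:i + k], [])]


def assemble(seeds, k=4, max_gap=3):
    """Phase two (assumed rule): merge nearby seeds that share a diagonal.

    Returns (start_in_seq1, start_in_seq2, length) tuples, longest first.
    """
    by_diagonal = defaultdict(list)
    for i, j in seeds:
        by_diagonal[i - j].append(i)
    patterns = []
    for diagonal, starts in by_diagonal.items():
        starts.sort()
        begin, end = starts[0], starts[0] + k
        for s in starts[1:]:
            if s <= end + max_gap:        # close enough: extend the pattern
                end = max(end, s + k)
            else:                         # too far: close it and start anew
                patterns.append((begin, begin - diagonal, end - begin))
                begin, end = s, s + k
        patterns.append((begin, begin - diagonal, end - begin))
    return sorted(patterns, key=lambda p: -p[2])


# The 28-base region reported for the ORNL method in the results slides.
chicken = "CCTGCGCGTCACGGCCCGGCCGTGGGTT"
goose = "CCTGCGCGTCAGGACCCGGCCGTGGGTT"
patterns = assemble(collect_seeds(chicken, goose))
print(patterns[0])
```

Because merging tolerates short unmatched stretches between pieces, the assembled pattern can contain substitutions even though every piece collected in the first phase is an exact match.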
12. Results
- The results showed the importance of similarity in the regions closest to the gene.
- The ORNL method found similarity in several regions.
- CLUSTALW and DIALIGN could not find those similarities, possibly because of their method of alignment.
The discovery of these potential regulatory regions suggests that the ORNL method may be a better method of genomic sequence analysis than CLUSTALW and DIALIGN.
13. Results
- Similar regions found by all three methods
- CLUSTALW
  - Chicken_FAS TGGCCGCCGGAGGTGGTGGCTGCTTAAATAGCGGTGGGAGCTAGAGGGAGA
  - Goose_FAS TGGCCGCCGGAGGTGGCGGCTGCTTAAATAGCGGTGGGAGCTAGAGGGAGA
- DIALIGN
  - Chicken_FAS GTGGCCGCCGGAGGTGGtG GCTGCTTAAA TAGCGGTGGG AGCTAGAGGG AGA
  - Goose_FAS GTGGCCGCCGGAGGTGGcG GCTGCTTAAA TAGCGGTGGG AGCTAGAGGG AGA
- ORNL Method
  - Length: 52, Substitutions: 1, Score: 67.033
  - 2 C AGAGGGAGATCGAGGGTGGCGATAAATTCGTCGGTGGTGGAGGCCGCCGGTG
  - 0 G AGAGGGAGATCGAGGGTGGCGATAAATTCGTCGGCGGTGGAGGCCGCCGGTG
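The length and substitution figures reported for the ORNL region can be checked directly from the strings on this slide. The short Python sketch below counts mismatched positions between the two ORNL sequences and also compares them with the DIALIGN strings. The helper name `count_substitutions` is an assumption for illustration, and the strings are copied as transcribed on the slide.

```python
def count_substitutions(a, b):
    """Count mismatched positions between two equal-length, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


# Adjacent string literals are concatenated, mirroring the blocks on the slide.
dialign_chicken = "GTGGCCGCCGGAGGTGGtG" "GCTGCTTAAA" "TAGCGGTGGG" "AGCTAGAGGG" "AGA"
dialign_goose = "GTGGCCGCCGGAGGTGGcG" "GCTGCTTAAA" "TAGCGGTGGG" "AGCTAGAGGG" "AGA"
ornl_chicken = "AGAGGGAGATCGAGGGTGGCGATAAATTCGTCGG" "TGGTGGAGGCCGCCGGTG"
ornl_goose = "AGAGGGAGATCGAGGGTGGCGA" "TAAATTCGTCGGCGGTGGAGGCCGCCGGTG"

print(len(ornl_chicken), count_substitutions(ornl_chicken, ornl_goose))  # 52 1
print(count_substitutions(dialign_chicken, dialign_goose))               # 1
# The ORNL strings read as the DIALIGN strings written in reverse order.
print(dialign_chicken.upper()[::-1] == ornl_chicken)                     # True
```

By this check, all three listings describe the same 52-base region with a single substitution. The ORNL output simply lists the bases in the opposite direction.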
14. Results
- Region found by ORNL
  - Length: 28, Substitutions: 2, Score: 30.660
  - 404 C CCTGCGCGTCACGGCCCGGCCGTGGGTT
  - 483 G CCTGCGCGTCAGGACCCGGCCGTGGGTT
- CLUSTALW match
  - Chicken_FAS TTGGGTGCCGGCCCGGCACTGCGCGTCC
  - Goose_FAS CCGGTCGCAGAGCGCGGCCTTCCACGGC
- DIALIGN match
  - Chicken_FAS TTG GGTGCCG-GC CCGGCACTGC GCGTCC
  - Goose_FAS CCG GTCGCAGaGC GCGGCCTTCC ACGGCC
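One way to quantify the difference between these pairings is percent identity, the share of aligned positions that carry the same base. The Python sketch below applies it to the ORNL pair and to the CLUSTALW pair as transcribed on this slide. The function name `percent_identity` is an assumption for illustration, and the measure is a simple stand-in rather than the score reported by any of the three tools.

```python
def percent_identity(a, b):
    """Percentage of columns in which two equal-length rows carry the same base."""
    if len(a) != len(b):
        raise ValueError("rows must have equal length")
    same = sum(x == y and x != "-" for x, y in zip(a.upper(), b.upper()))
    return 100.0 * same / len(a)


ornl_pair = ("CCTGCGCGTCACGGCCCGGCCGTGGGTT",
             "CCTGCGCGTCAGGACCCGGCCGTGGGTT")
clustalw_pair = ("TTGGGTGCCGGCCCGGCACTGCGCGTCC",
                 "CCGGTCGCAGAGCGCGGCCTTCCACGGC")

ornl_identity = percent_identity(*ornl_pair)
clustalw_identity = percent_identity(*clustalw_pair)
print(f"ORNL pair:     {ornl_identity:.1f}% identical")
print(f"CLUSTALW pair: {clustalw_identity:.1f}% identical")
```

The chicken rows of the two pairs are the same 28 bases written in opposite directions, so the comparison shows how differently the two methods paired that chicken region with goose sequence.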
15. Accomplishments
- Learned Perl
- Compared DNA sequences of two species using three computational methods
- Discussed potential applications of the comparison results
- Prepared a research document
16. Acknowledgements
- This research was performed under the Research
Alliance for Minorities Program administered
through the Computer Science and Mathematics
Division, Oak Ridge National Laboratory. This
program is sponsored by the Mathematical,
Information and Computational Sciences Division,
Office of Advanced Scientific Computing Research,
U.S. Department of Energy. Oak Ridge National
Laboratory is managed by UT-Battelle, LLC, for
the U.S. Department of Energy under contract
DE-AC05-00OR22725.
- This work has been authored by a contractor of
the U.S. Government under contract
DE-AC05-00OR22725. Accordingly, the U.S.
Government retains a nonexclusive, royalty-free
license to publish or reproduce the published
form of this contribution, or allow others to do
so, for U.S. Government purposes.
I would like to thank Dr. Mike Leuze, Dr. Andrey Gorin, Dr. Glenn Allgood, Jason McKay, Cheryl Hamby, and Debbie McCoy.