Lecture 20: Bioinformatics (continued)

Slides: 12
Provided by: mdsc2

Transcript and Presenter's Notes



1
Lecture 20: Bioinformatics (continued)
2
Outline
  • Genomics in general
  • DNA sequencing method
  • Human Genome Project
  • What to learn from HGP
  • Bioinformatics
  • predicting open reading frames
  • comparing two sequences --> information

3
Open Reading Frame Prediction
Let's use the YY1 gene as an example. Get the mRNA sequence for human YY1
(NM_003403) from
http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid960
Cut and paste the sequence into the window of one of the ORF prediction
programs:
1) http://searchlauncher.bcm.tmc.edu/seq-util/seq-util.html
2) http://www.ncbi.nlm.nih.gov/gorf/gorf.html
>hYY1, 1320 bases.
ccgcccgcccgcagccgaggagccgaggccg
ccgcggccgtggcggcggagccctcagccatggcctcgggcgacaccctc
tacatcgccacggacggctcggagatgccggccgagatcgtggagctgca
cgagatcgaggtggagaccatcccggtggagaccatcgagaccacagtgg
tgggcgaggaggaggaggaggacgacgacgacgaggacggcggcggtggc
gaccacggcggcgggggcggccacgggcacgccggccaccaccaccacca
ccatcaccaccaccaccacccgcccatgatcgctctgcagccgctggtca
ccgacgacccgacccaggtgcaccaccaccaggaggtgatcctggtgcag
acgcgcgaggaggtggtgggcggcgacgactcggacgggctgcgcgccga
ggacggcttcgaggatcagattctcatcccggtgcccgcgccggccggcg
gcgacgacgactacattgaacaaacgctggtcaccgtggcggcggccggc
aagagcggcggcggcggctcgtcgtcgtcgggaggcggccgcgtcaagaa
gggcggcggcaagaagagcggcaagaagagttacctcagcggcggggccg
gcgcggcgggcggcggcggcgccgacccgggcaacaagaagtgggagcag
aagcaggtgcagatcaagaccctggagggcgagttctcggtcaccatgtg
gtcctcagatgaaaaaaaagatattgaccatgagacagtggttgaagaac
agatcattggagagaactcacctcctgattattcagaatatatgacagga
aagaaacttcctcctggaggaatacctggcattgacctctcagatcccaa
acaactggcagaatttgctagaatgaagccaagaaaaattaaagaagatg
atgctccaagaacaatagcttgccctcataaaggctgcacaaagatgttc
agggataactcggccatgagaaaacatctgcacacccacggtcccagagt
ccacgtctgtgcagaatgtggcaaagcttttgttgagagttcaaaactaa
aacgacaccaactggttcatactggagagaagccctttcagtgcacgttc
gaaggctgtgggaaacgcttttcactggacttcaatttgcgcacacatgt
gcgaatccataccggagacaggccctatgtgtgccccttcgatggttgta
ataagaagtttgctcagtcaactaacctgaaatctcacatcttaacacat
gctaaggccaaaaacaaccagtgaaaagaagagagaaga
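A minimal sketch of what an ORF prediction program does with the pasted sequence: it reads the DNA in each of the three forward frames and translates codon by codon. The function names `translate` and `three_frames` are illustrative, not taken from either program above. The example assumes the standard genetic code and uses only the first 60 bases of the hYY1 sequence.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate dna starting at offset frame (0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    # Codons containing anything other than A, C, G, T become 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frames(dna):
    """Return {1: ..., 2: ..., 3: ...} with the translation of each forward frame."""
    return {frame + 1: translate(dna, frame) for frame in range(3)}


# First 60 bases of the hYY1 sequence shown above.
prefix = "ccgcccgcccgcagccgaggagccgaggccgccgcggccgtggcggcggagccctcagcc"
for number, protein in three_frames(prefix).items():
    print(f"frame{number}: {protein}")
```

Running the same functions on the full 1320-base sequence should give translations comparable to the three frames on the next slide, with `*` at each stop codon.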
4
Three possible frames of YY1
Which one is the right ORF? Hints --> use the rules: 1) an ATG start codon
and a STOP codon (*), and 2) the longest frame
>hYY1, 1320 bases., frame3, 439 bases, 1139 checksum.
ARPQPRSRGRRGRGGGALSHGLGRHPLHRHGRLGDAGRDR
GAARDRGGDHPGGDHRDHSGGRGGGGGRRRRGRRRWRPRRRGRPRAR RP
PPPPPSPPPPPAHDRSAAAGHRRPDPGAPPPGGDPGADARGGGGRRRLGR
AARRGRLRGSDSHPGARAGRRRRRLHTNAGHRGG GRQERRRRLVVVGR
RPRQEGRRQEERQEELPQRRGRRGGRRRRRPGQQEVGAEAGADQDPGGRV
LGHHVVLRKKRYPDSGRTD HWRELTSLFRIYDRKETSSWRNTWH
PLRSQTTGRICNEAKKNRRCSKNNSLPSRLHKDVQGLGHEKTSA
HPRSQSPRLCR MWQSFCEFKTKTTPTGSYWREALSVHVRRLWETLFTG
LQFAHTCANPYRRQALCVPLRWLEVCSVNPEISHLNTCGQKQPVK
RREK
>hYY1, 1320 bases., frame2, 439 bases, E51 checksum.
RPPAAEEPRPPRPWRRSPQPWPRATPSTSPRTARRCRPRS
WSCTRSRWRPSRWRPSRPQWWARRRRRTTTTRTAAVATTAAGAATGT PA
TTTTTITTTTTRPSLCSRWSPTTRPRCTTTRRSWCRRARRWWAATTRT
GCAPRTASRIRFSSRCPRRPAATTTTLNKRWSPWR RPARAAAAARRRRE
AAASRRAAARRAARRVTSAAGPARRAAAAPTRATRSGSRSRCRSRPWRAS
SRSPCGPQMKKKILTMRQWLKNR SLERTHLLIIQNIQERNFLLEEYLA
LTSQIPNNWQNLLESQEKLKKMMLQEQLALIKAAQRCSGITRPENIC
TPTVPESTSVQ NVAKLLLRVQNNDTNWFILERSPFSARSKAVGNAFHW
TSICAHMCESIPETGPMCAPSMVVIRSLLSQLTNLTSHMLRPKTTSE
KKRE
>hYY1, 1320 bases., frame1, 440 bases, 149 checksum.
PPARSRGAEAAAAVAAEPSAMASGDTLYIATDGSEMPAEI
VELHEIEVETIPVETIETTVVGEEEEEDDDDEDGGGGDHGGGGGHGH AG
HHHHHHHHHHHPPMIALQPLVTDDPTQVHHHQEVILVQTREEVVGGDDSD
GLRAEDGFEDQILIPVPAPAGGDDDYIEQTLVTVA AAGKSGGGGSSSSG
GGRVKKGGGKKSGKKSYLSGGAGAAGGGGADPGNKKWEQKQVQIKTLEGE
FSVTMWSSDEKKDIDHETVVEEQ IIGENSPPDYSEYMTGKKLPPGGIPG
IDLSDPKQLAEFARMKPRKIKEDDAPRTIACPHKGCTKMFRDNSAMRKHL
HTHGPRVHVCA ECGKAFVESSKLKRHQLVHTGEKPFQCTFEGCGKRFSL
DFNLRTHVRIHTGDRPYVCPFDGCNKKFAQSTNLKSHILTHAKAKNNQ
KEERR
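The two rules above (ATG start plus STOP codon, then the longest frame) can be sketched as a small search over the three forward frames. This is an illustrative sketch, not the algorithm of either ORF program. The name `longest_orf` is hypothetical, the standard genetic code is assumed, and the input is a made-up toy sequence, not YY1.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def longest_orf(dna):
    """Return (frame, start_index, protein) for the longest ATG-to-stop ORF.

    Only the three forward frames are scanned, and an ORF that runs off the
    end of the sequence without reaching a stop codon is ignored.
    """
    dna = dna.upper()
    best = (None, None, "")
    for frame in range(3):
        start = None      # index of the ATG that opened the current ORF
        residues = []
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            aa = CODON_TABLE.get(codon, "X")
            if start is None:
                if codon == "ATG":
                    start, residues = i, ["M"]
            elif aa == "*":
                # Stop codon closes the ORF; keep it if it is the longest so far.
                if len(residues) > len(best[2]):
                    best = (frame + 1, start, "".join(residues))
                start, residues = None, []
            else:
                residues.append(aa)
    return best


# Toy sequence: frame 3 reads ATG GCC TCG GGC GAC TGA ATG AAA TAA.
toy = "ccatggcctcgggcgactgaatgaaataa"
print(longest_orf(toy))
```

Length alone is only a hint. As the next slides point out, a database search is still needed to confirm the prediction.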
5
How to confirm the predicted ORF
  • We can test the existence of the predicted ORF
    by searching for similar proteins in a protein database.
  • The logic behind this approach:
  • many biological organisms have similar sets
    of protein machinery
  • if the predicted ORF is the right one, the search
    should find similar sequences in the existing database
  • Most popular sequence searching program
    --> BLAST (Basic Local Alignment Search Tool)
  • 1) you can use either DNA or protein sequences
    to find similar sequences from databases
  • demonstration using three ORFs

6
ORF prediction with another program
  • ORF Finder (Open Reading Frame Finder)
  • http://www.ncbi.nlm.nih.gov/gorf/gorf.html
  • These two approaches provide only hints; you (the human)
    still make the call on which ORF is right. (Use all the
    possible hints to predict the right one!)
  • exon-intron structure
  • ATG start and STOP codons
  • the longest one

7
Comparing two sequences
  • Sequence alignment: a procedure of comparing two
    (pair-wise) or more (multiple) DNA or protein sequences
    by looking for a series of characters or patterns that
    are in the same order in the sequences.
  • Sequence alignment can provide us with information about
  • Function
  • Structure
  • Evolution

8
Comparing two sequences
Many sequence alignment approaches have been developed, ranging from
the dot plot to BLAST.
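The dot plot is the simplest of these approaches: put one sequence on each axis and mark every position where the two agree. Below is a rough text-based sketch. The `window` parameter and the name `dot_plot` are illustrative assumptions, and real dot plot programs usually add filtering to reduce noise.

```python
def dot_plot(seq_a, seq_b, window=1):
    """Return text rows; '*' marks where the window-length words at (i, j) are identical."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        word_a = seq_a[i:i + window]
        row = "".join(
            "*" if word_a == seq_b[j:j + window] else "."
            for j in range(len(seq_b) - window + 1)
        )
        rows.append(row)
    return rows


# Similar sequences show up as a run of '*' along the diagonal.
for row in dot_plot("ATGGCCTCG", "ATGGCATCG", window=2):
    print(row)
```

A larger `window` removes most of the scattered single-base matches, which matters for DNA because there are only four letters.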
9
Comparing two sequences
Let's use the YY1 sequence again. Compare the two YY1 sequences derived
from human and mouse. Retrieve the YY1 sequence from the database
(use Map Viewer: http://www.ncbi.nlm.nih.gov/mapview/)
>Mus musculus YY1 transcription factor (Yy1), mRNA
CTTCCCCACGGCCGGCCGCCTCCTCGCCCGCCCGCCCT
CCCTCCCGCAGCCCAGGAGCCGACGCCGCCTGCCGCGGCGGCCGTGGC G
GCGGAGCCCTCAGCCATGGCCTCGGGCGACACCCTCTACATCGCCACGGA
CGGCTCGGAGATGCCGGCCGAGATCGTGGAGCTGC ATGAGATCGAGGTG
GAGACCATCCCGGTGGAGACCATCGAGACCACGGTGGTGGGCGAGGAGGA
GGAGGAGGACGACGACGACGAG GACGGCGGCGGCGGCGACCACGGCGGC
GGCGGGGGCGGCCACGGGCACGCCGGCCACCACCATCACCACCACCACCA
CCACCACCA CCACCCGCCCATGATCGCGCTGCAGCCGCTGGTGACGGAC
GACCCGACCCAAGTGCACCACCACCAGGAGGTGATCCTGGTGCAGA CGC
GCGAGGAGGTGGTCGGCGGGGACGACTCGGACGGGCTGCGCGCCGAGGAC
GGCTTCGAGGACCAGATCCTCATCCCGGTGCCC GCGCCGGCCGGCGGCG
ACGACGACTACATAGAGCAGACGCTGGTCACCGTGGCGGCGGCCGGCAAG
AGCGGCGGCGGGGCCTCGTC GGGCGGCGGTCGCGTGAAGAAGGGCGGCG
GCAAGAAGAGCGGCAAGAAGAGTTACCTGGGCGGCGGGGCCGGCGCGGCG
GGCGGCG GCGGCGCCGACCCGGGGAATAAGAAGTGGGAGCAGAAGCAGG
TGCAGATCAAGACCCTGGAGGGCGAGTTCTCGGTCACCATGTGG TCCTC
GGATGAAAAAAAAGATATTGACCATGAAACAGTGGTTGAAGAGCAGATCA
TTGGAGAGAACTCACCTCCTGATTATTCTGA ATATATGACAGGCAAGAA
ACTCCCTCCTGGAGGGATACCTGGCATTGACCTCTCAGACCCTAAGCAAC
TGGCAGAATTTGCCAGAA TGAAGCCAAGAAAAATTAAAGAAGATGATGC
TCCAAGAACAATAGCTTGCCCTCATAAAGGCTGCACAAAGATGTTCAGGG
ATAAC TCTGCTATGAGAAAGCATCTGCACACCCACGGTCCCAGAGTCCA
CGTCTGTGCAGAGTGTGGCAAAGCGTTCGTTGAGAGCTCAAA GCTAAAA
CGACACCAGCTGGTTCATACTGGAGAAAAGCCCTTTCAGTGCACATTCGA
AGGCTGCGGGAAGCGCTTTTCACTGGACT TCAATTTGCGCACACATGTG
CGAATCCATACCGGAGACAGGCCCTATGTGTGCCCCTTCGACGGTTGTAA
TAAGAAGTTTGCTCAG TCAACTAACCTGAAATCTCACATCTTAACACAC
GCTAAAGCCAAAAACAACCAGTGAAAAGAAGAGAGAAGACCTTCTCGACC
CGG GAAGCCTCTTCAGGAGTGTGATTGGGAATAAATATGCCTCTCCTTT
GTATATTATTTCTAGGAAGAATTTTAAAAATGAATCCTAC ACACTTAAG
GGACATGTTTTGATAAAGTAGTAAAAATTTAAAAAATACTTTAATAAGAT
GACATTGCTAAGATGCTATATCTTGCT CTGTAATCTCGTTTCAAAAACA
AGGTATTTTTGTAAAGTGTGGTCCCAACAGGAGGACAATTCATGAACTTC
GCATCAAAAGACAA TTCTTTATACAACAGTGCTAAAAATGGGACTTCTT
TTCACATTCTTATAAATATGAAGCTCACCTGTTGCTTACAATTTTTTTAA
T TTTGTATTTTCCAAGTGTGCATATTGTACACTTTTTGGGGATATGCTT
AGTAATGCTGTGTGATTTTCTGGAGGTTGATAACTTTG CTTGCGGTAGA
TTTTCTTTAAAAGAATGGGCAGTTACATGCATACTTCAAAAGTATTTTTC
CTGTACAAAAAAAAAAGTTATATAG GTTTTGTTTGCTATCTTAATTTTG
GTTGTATTCTTTGATGTTAACACATTTTGTATAATTGTATCGTATAGCTG
TATTGAATCATG TAGAATCAAATATTAGATGTGATTTAATAGTGTTAAT
CAATTTAAACCCATTTTAGTCACTTTTTTTTTCCCCAAAAATACTGCCA
GATGCTGATGTTCAGTGTAATTTCTTTGCCTGTTCAGTTACAGAAAGTGG
TGCTCAGTTGTAGAATGTATTGTACCTTTTAACATC TGATGTGTACATC
CGTGTAACAGGAAGGGCAACAATAAAATAGCGATCCTAAAGAAAGATTAC
GGCAGAAAGAGCTCTGTAAGCAC AGCCTTATTTTCTTCTGTTGTCCAGA
ATACTTAGAATTCTTGAGCCTCCCAGAAATTGGAAGCAAATAAAGCAACT
TGAGTTTCCT TTAAAAAA
10
Comparing two sequences
Compare the two species' sequences at the DNA and protein levels.
Use one of the BLAST programs: bl2seq (BLAST 2 SEQUENCES)
human YY1: mRNA (NM_003403), protein (NP_003394)
mouse YY1: mRNA (NM_009537), protein (NP_033563)
DNA sequence similarity --> 94 - 96%
Protein sequence similarity --> 98%
Why is the protein sequence similarity higher than the DNA sequence
similarity?
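One way to explore the question is to look at a short stretch where the two mRNAs differ. The 39-base fragments below are copied from the human and mouse sequences on the earlier slides; they differ at one base (CAC versus CAT), and both codons encode histidine. The sketch assumes the standard genetic code and compares position by position without gaps, which is simpler than what bl2seq does. The helper names are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate dna from its first base; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def percent_identity(a, b):
    """Percentage of identical positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# In-frame fragments of the YY1 coding region from the slides above.
human = "GCCGAGATCGTGGAGCTGCACGAGATCGAGGTGGAGACC"
mouse = "GCCGAGATCGTGGAGCTGCATGAGATCGAGGTGGAGACC"

print(f"DNA identity:     {percent_identity(human, mouse):.1f}%")
print(f"Protein identity: {percent_identity(translate(human), translate(mouse)):.1f}%")
```

Because several codons can encode the same amino acid, a base change (often at the third codon position) can leave the protein unchanged, so DNA identity drops while protein identity does not.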
11
The final exam will cover the following:
  • everything from Transcription to Genomics
  • try to solve all the practice and homework tests
  • go over each lecture slide and focus on the main
    points
  • the final review will be held at the Annex
    auditorium at 1:40 on Wednesday