Fr 4/10: Introduction History, Database Biology, Sequence Formats

Provided by: Asatisfied332
1
  • Fri 4/10: Introduction History, Database
    Biology, Sequence Formats
  • Fri 11/10: Pairwise comparison, scoring, matrices
  • Thu 17/10: B10, Perl review, demo
  • Fri 18/10: Multiple alignment, Basic and advanced
    Database Searching
  • Thu: no practical session
  • Fri 25/10: no class
  • Thu: no practical session
  • Fri 1/11: no class
  • Thu 7/11: Demo, exercises
  • Fri 8/11: Phylogenetics, Gene prediction, junk
    mining (RNA prediction)
  • Fri 15/11: no class
  • Fri 22/11: Protein structure, classification and
    engineering

4
Blast
  • Examples
  • Proteomics, 2D analysis
  • Yeast Two-Hybrid positives
  • Sequencing
  • Orthology Search
  • .

5
Blast
  • BLAST is actually a family of programs
  • BLASTN - Nucleotide query searching a nucleotide
    database.
  • BLASTP - Protein query searching a protein
    database.
  • BLASTX - Translated nucleotide query sequence (6
    frames) searching a protein database.
  • TBLASTN - Protein query searching a translated
    nucleotide (6 frames) database.
  • TBLASTX - Translated nucleotide query (6 frames)
    searching a translated nucleotide (6 frames)
    database.
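
The five programs differ only in the type of query, the type of database, and which side is translated in six frames. A minimal sketch of that mapping as a Perl hash follows; the hash layout and the helper name `blast_programs_for` are assumptions made for illustration and are not part of BLAST itself.

```perl
use strict;
use warnings;

# Each BLAST flavour: query type, database type, and which side(s) get
# translated in six frames before the comparison.
my %BLAST = (
    blastn  => { query => 'nucleotide', db => 'nucleotide', translated => 'none'  },
    blastp  => { query => 'protein',    db => 'protein',    translated => 'none'  },
    blastx  => { query => 'nucleotide', db => 'protein',    translated => 'query' },
    tblastn => { query => 'protein',    db => 'nucleotide', translated => 'db'    },
    tblastx => { query => 'nucleotide', db => 'nucleotide', translated => 'both'  },
);

# Hypothetical helper: names of the programs that accept this
# query/database combination, in alphabetical order.
sub blast_programs_for {
    my ($query, $db) = @_;
    my @hits = grep {
        $BLAST{$_}{query} eq $query && $BLAST{$_}{db} eq $db
    } sort keys %BLAST;
    return @hits;
}

print join(', ', blast_programs_for('nucleotide', 'protein')), "\n";
print join(', ', blast_programs_for('nucleotide', 'nucleotide')), "\n";
```

A nucleotide query against a nucleotide database returns two candidates (blastn and tblastx); the choice between them depends on whether the comparison should be made at the protein level.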

6
Exercises http://biochema.rug.ac.be/
  • Which genes are involved in Prader-Willi
    syndrome?
  • How many different human PDEs (phosphodiesterases)
    are available in GenBank?
  • How big is the anthrax genome and how many genes
    are present?
  • Which of the 4 sequences (seq1/2/3/4)
  • contains a hexokinase signature
    ([LIVM]-G-F-[TN]-F-S-[FY]-P-x(5)-[LIVM]-[DNST]-x(3)-[LIVM]-x(2)-W-T-K-x-[LF])?
  • How many of them?
  • Where (hint)?
  • Write a program (random.pl) to generate 10 random
    sequences of 1000 bp and write them to a file in
    FASTA format
  • Find the answer in ultimate-sequence.txt
  • (hint: use %AA1 to perform the translation(s))
  • Which restriction enzyme has the longest
    recognition site?
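
One possible way to approach the signature exercise is to turn the pattern into a Perl regular expression. The sketch below assumes the signature is written in PROSITE notation with the residue sets in square brackets, i.e. `[LIVM]-G-F-[TN]-...`. The sub names `prosite_to_regex` and `find_signature` are made up for this example, and only the elements that occur in this one pattern are handled (no `{...}` exclusion sets, no anchors).

```perl
use strict;
use warnings;

# Convert a PROSITE-style pattern into a Perl regex string.
# Supported: literal residues, [..] sets, x (any residue), (n) and (n,m).
sub prosite_to_regex {
    my ($pattern) = @_;
    my $re = $pattern;
    $re =~ s/\s+//g;                      # remove whitespace from line wraps
    $re =~ s/-//g;                        # dashes only separate positions
    $re =~ s/x/./g;                       # lowercase x means "any residue"
    $re =~ s/\((\d+(?:,\d+)?)\)/{$1}/g;   # (5) becomes {5}, (2,4) becomes {2,4}
    return $re;
}

# Return the 1-based start position of every match in a sequence.
sub find_signature {
    my ($seq, $regex) = @_;
    $seq =~ s/\s+//g;                     # slide sequences contain spaces
    my @starts;
    while ($seq =~ /$regex/g) {
        push @starts, $-[0] + 1;
    }
    return @starts;
}

my $signature = '[LIVM]-G-F-[TN]-F-S-[FY]-P-x(5)-[LIVM]-[DNST]-x(3)-'
              . '[LIVM]-x(2)-W-T-K-x-[LF]';
my $regex = prosite_to_regex($signature);
print "regex: $regex\n";

# Short fragment copied from SEQ2, used here only as a smoke test.
my $fragment = 'LPLGFTFSFPVRHEDIDKGILLNWTKGFKAS';
print "starts: @{[ find_signature($fragment, $regex) ]}\n";
```

Running `find_signature` on each of the four full sequences then answers "which", "how many" and "where" in one pass.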

7
Pattern.pl
  • We would like to find out whether the consensus
    sequence is contained (somewhere) in a given
    sequence $a.
  • Without quantifiers
  • if ($a =~ /ACCCCAGAGAGGTGT/) ...
  • With quantifiers
  • if ($a =~ /AC{4}(AG){3}(GT){2}/) ...
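
A small check that the two forms describe the same 15-mer, assuming the quantified version is meant to read `AC{4}(AG){3}(GT){2}` (the repeated unit `AG` has to be grouped, otherwise `{3}` applies to the `G` alone). The flanking bases in `$seq` are invented for the test.

```perl
use strict;
use warnings;

my $literal    = qr/ACCCCAGAGAGGTGT/;
my $quantified = qr/AC{4}(?:AG){3}(?:GT){2}/;   # (?:...) groups without capturing

# Hypothetical test sequence: the consensus between made-up flanks.
my $seq = 'TTGA' . 'ACCCCAGAGAGGTGT' . 'CCAT';

for my $re ($literal, $quantified) {
    if ($seq =~ $re) {
        # $-[0] holds the 0-based offset where the last match started
        print "match at position ", $-[0] + 1, "\n";
    }
    else {
        print "no match\n";
    }
}
```

Both patterns report the same start position, which confirms that the quantifiers are only a shorter spelling of the literal sequence.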

8
  • >SEQ1
  • MGNLFENCTHRYSFEYIYENCTNTTNQCGLIRNVASSIDVFHWLDVYIST
    TIFVISGILNFYCLFIALYT YYFLDNETRKHYVFVLSRFLSSILVIISL
    LVLESTLFSESLSPTFAYYAVAFSIYDFSMDTLFFSYIMIS
    LITYFGVVHYNFYRRHVSLRSLYIILISMWTFSLAIAIPLGLYEAASNSQ
    GPIKCDLSYCGKVVEWITCS LQGCDSFYNANELLVQSIISSVETLVGSL
    VFLTDPLINIFFDKNISKMVKLQLTLGKWFIALYRFLFQMT
    NIFENCSTHYSFEKNLQKCVNASNPCQLLQKMNTAHSLMIWMGFYIPSAM
    CFLAVLVDTYCLLVTISILK SLKKQSRKQYIFVVVRLSAAILIALCIII
    IQSTYFIDIPFRDTFAFFAVLFIIYDFSILSLLGSFTGVAM
    MTYFGVMRPLVYRDKFTLKTIYIIAFAIVLFSVCVAIPFGLFQAADEIDG
    PIKCDSESCELIVKWLLFCI ACLILMGCTGTLLFVTVSLHWHSYKSKKM
    GNVSSSAFNHGKSRLTWTTTILVILCCVELIPTGLLAAFGK
    SESISDDCYDFYNANSLIFPAIVSSLETFLGSITFLLDPIINFSFDKRIS
    KVFSSQVSMFSIFFCGKR
  • >SEQ2
  • MLDDRARMEA AKKEKVEQIL AEFQLQEEDL KKVMRRMQKE
    MDRGLRLETH EEASVKMLPT YVRSTPEGSE VGDFLSLDLG
    GTNFRVMLVK VGEGEEGQWS VKTKHQMYSI PEDAMTGTAE
    MLFDYISECI SDFLDKHQMK HKKLPLGFTF SFPVRHEDID
    KGILLNWTKG FKASGAEGNN VVGLLRDAIK RRGDFEMDVV
    AMVNDTVATM ISCYYEDHQC EVGMIVGTGC NACYMEEMQN
    VELVEGDEGR MCVNTEWGAF GDSGELDEFL LEYDRLVDES
    SANPGQQLYE KLIGGKYMGE LVRLVLLRLV DENLLFHGEA
    SEQLRTRGAF ETRFVSQVES DTGDRKQIYN ILSTLGLRPS
    TTDCDIVRRA CESVSTRAAH MCSAGLAGVI NRMRESRSED
    VMRITVGVDG SVYKLHPSFK ERFHASVRRL TPSCEITFIE
    SEEGSGRGAA LVSAVACKKA CMLGQ
  • >SEQ3
  • MESDSFEDFLKGEDFSNYSYSSDLPPFLLDAAPCEPESLEINKYFVVIIY
    VLVFLLSLLGNSLVMLVILY SRVGRSVTDVYLLNLALADLLFALTLPIW
    AASKVTGWIFGTFLCKVVSLLKEVNFYSGILLLACISVDRY
    LAIVHATRTLTQKRYLVKFICLSIWGLSLLLALPVLIFRKTIYPPYVSPV
    CYEDMGNNTANWRMLLRILP QSFGFIVPLLIMLFCYGFTLRTLFKAHMG
    QKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTWVIQET
    CERRNDIDRALEATEILGILHSCLNPLIYAFIGQKFRHGLLKILAIHGLI
    SKDSLPKDSRPSFVGSSSGH TSTTL
  • >SEQ4
  • MEANFQQAVK KLVNDFEYPT ESLREAVKEF DELRQKGLQK
    NGEVLAMAPA FISTLPTGAE TGDFLALDFG GTNLRVCWIQ
    LLGDGKYEMK HSKSVLPREC VRNESVKPII DFMSDHVELF
    IKEHFPSKFG CPEEEYLPMG FTFSYPANQV SITESYLLRW
    TKGLNIPEAI NKDFAQFLTE GFKARNLPIR IEAVINDTVG
    TLVTRAYTSK ESDTFMGIIF GTGTNGAYVE QMNQIPKLAG
    KCTGDHMLIN MEWGATDFSC LHSTRYDLLL DHDTPNAGRQ
    IFEKRVGGMY LGELFRRALF HLIKVYNFNE GIFPPSITDA
    WSLETSVLSR MMVERSAENV RNVLSTFKFR FRSDEEALYL
    WDAAHAIGRR AARMSAVPIA SLYLSTGRAG KKSDVGVDGS
    LVEHYPHFVD MLREALRELI GDNEKLISIG IAKDGSGIGA
    ALCALQAVKE KKGLA MEANFQQAVK KLVNDFEYPT ESLREAVKEF
    DELRQKGLQK NGEVLAMAPA FISTLPTGAE TGDFLALDFG
    GTNLRVCWIQ LLGDGKYEMK HSKSVLPREC VRNESVKPII
    DFMSDHVELF IKEHFPSKFG CPEEEYLPMG FTFSYPANQV
    SITESYLLRW TKGLNIPEAI NKDFAQFLTE GFKARNLPIR
    IEAVINDTVG TLVTRAYTSK ESDTFMGIIF GTGTNGAYVE
    QMNQIPKLAG KCTGDHMLIN MEWGATDFSC LHSTRYDLLL
    DHDTPNAGRQ IFEKRVGGMY LGELFRRALF HLIKVYNFNE
    GIFPPSITDA WSLETSVLSR MMVERSAENV RNVLSTFKFR
    FRSDEEALYL WDAAHAIGRR AARMSAVPIA SLYLSTGRAG
    KKSDVGVDGS LVEHYPHFVD MLREALRELI GDNEKLISIG
    IAKDGSGIGA ALCALQAVKE KKGLA

9
Use -w and strict
  • #!C:\Perl\Bin\perl.exe -w
  • $value = 42;
  • print "Value is OK\n" if $valu < 100;   # misspelled $valu: -w warns about it, "use strict" refuses to compile it
  • Using my
  • You can use "my" on a single variable, or on a
    list of variables: my $value = 42; my $a; my
    ($c,$d,$e,$f); my ($first,$second) = (1,2);

10
Sub routine
  • $a = 5; $b = 9;
  • my $sum = Add(5,9);
  • print "The SUM is $sum\n";
  • sub Add {
  •     $a = $_[0];
  •     $b = $_[1];
  •     # alternatively we could do this: my ($a,$b) = @_;
  •     my ($answer) = $a + $b;
  •     return $answer;
  • }

11
Sub routine
  • sub random_nucleotide {
  •     my (@nucs) = @_;
  •     return $nucs[rand @nucs];
  • }
  • Rand Function
  • $number = int(rand(5)); will produce possible
    numbers of 0,1,2,3,4
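
The random_nucleotide sub is the building block for the random.pl exercise (10 random sequences of 1000 bp in FASTA format). A minimal sketch follows; the helper names `random_sequence` and `to_fasta`, the output file name `random.fasta`, the header names and the 60-column line width are all assumptions, not requirements from the exercise.

```perl
use strict;
use warnings;

# Pick one element at random from the list passed in.
sub random_nucleotide {
    my (@nucs) = @_;
    return $nucs[rand @nucs];   # rand @nucs is a float in [0, 4); the index is truncated
}

# Build one random DNA sequence of the requested length.
sub random_sequence {
    my ($length) = @_;
    return join '', map { random_nucleotide(qw(A C G T)) } 1 .. $length;
}

# Format as FASTA: a '>' header line, then the sequence wrapped at $width.
sub to_fasta {
    my ($name, $seq, $width) = @_;
    $width ||= 60;
    (my $wrapped = $seq) =~ s/(.{1,$width})/$1\n/g;
    return ">$name\n$wrapped";
}

my $file = 'random.fasta';      # assumed output name
open(my $out, '>', $file) or die "Cannot write $file: $!";
print {$out} to_fasta("random_$_", random_sequence(1000)) for 1 .. 10;
close($out) or die "Cannot close $file: $!";
```

Passing the alphabet as an argument keeps random_nucleotide reusable, for example with a list that repeats G and C to bias the base composition.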

12
Translate.pl
  • >ultimate-sequence
  • ACTCGTTATGATATTTTTTTTGAACGTGAAAATACTTTTCGTGCTATGGA
    AGGACTCGTTATCGTGAAGTTGAACGTTCTGAATGTATGCCTCTTGAAAT
    GGAAAATACTCATTGTTTATCTGAAATTTGAATGGGAATTTTATCTACAA
    TGTTTTATTCTTACAGAACATTAAATTGTGTTATGTTTCATTTCACATTT
    TAGTAGTTTTTTCAGTGAAAGCTTGAAAACCACCAAGAAGAAAAGCTGGT
    ATGCGTAGCTATGTATATATAAAATTAGATTTTCCACAAAAAATGATCTG
    ATAAACCTTCTCTGTTGGCTCCAAGTATAAGTACGAAAAGAAATACGTTC
    CCAAGAATTAGCTTCATGAGTAAGAAGAAAAGCTGGTATGCGTAGCTATG
    TATATATAAAATTAGATTTTCCACAAAAAATGATCTGATAA

13
  • my %AA1 = (
  • 'UUU','F',
  • 'UUC','F',
  • 'UUA','L',
  • 'UUG','L',
  • 'UCU','S',
  • 'UCC','S',
  • 'UCA','S',
  • 'UCG','S',
  • 'UAU','Y',
  • 'UAC','Y',
  • 'UAA','',
  • 'UAG','',
  • 'UGU','C',
  • 'UGC','C',
  • 'UGA','',
  • 'UGG','W',
  • 'CUU','L',
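
The listing above breaks off after the first codons. As a sketch of how such a table can be used for translation, the code below builds all 64 entries of the standard genetic code from a compact string (codon positions ordered U, C, A, G, which reproduces the order of the listing) and maps stop codons to the empty string as the slide does. The sub name `translate`, the frame argument and the use of `X` for unrecognised codons are assumptions added for the example.

```perl
use strict;
use warnings;

# Standard genetic code, one letter per codon in UUU, UUC, UUA, UUG, UCU ...
# order; '*' marks the three stop codons.
my @bases = qw(U C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

my %AA1;
my $k = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            my $aa = substr($aas, $k++, 1);
            $AA1{"$b1$b2$b3"} = $aa eq '*' ? '' : $aa;   # stop codons give ''
        }
    }
}

# Translate a DNA or RNA string in reading frame 0, 1 or 2.
sub translate {
    my ($seq, $frame) = @_;
    $frame ||= 0;
    $seq = uc $seq;
    $seq =~ s/\s+//g;            # ignore line breaks and spaces
    $seq =~ tr/T/U/;             # the table is keyed on RNA codons
    my $protein = '';
    for (my $i = $frame; $i + 3 <= length($seq); $i += 3) {
        my $codon = substr($seq, $i, 3);
        $protein .= exists $AA1{$codon} ? $AA1{$codon} : 'X';   # X = unknown codon
    }
    return $protein;
}

print translate('ATGGCCTGGTAA', $_), "\n" for 0 .. 2;
```

Calling translate with frames 0, 1 and 2 on the sequence in ultimate-sequence.txt gives the three forward translations the exercise hint refers to.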

14
Web application
  • Install/Run Apache (PHPTriad)
  • Input Form in C:\apache\HTDOCS
  • E.g. translate.html
  • Action to a .pl script in C:\apache\cgi-bin that
    generates HTML (using the CGI module)

15
Sliding window GC, Chou-Fasman, Dotplot
  • sub gc_content {
  •     my $seq = shift;
  •     my $win = shift;
  •     print "length ", length($seq), "\n";
  •     for (my $i = 0; $i <= length($seq) - $win; $i++) {
  •         my $segment = substr($seq, $i, $win);
  •         my $gc_count = ($segment =~ tr/GCgc/GCgc/);   # tr returns the number of G/C characters
  •         print $i+1, "\t", $segment, "\t", $gc_count, "\n";
  •     }
  • }
16
  • Combining BLAST and ClustalW
  • Structure/Function - Directed Evolution
  • Rather than rely upon natural variation, directed
    evolution generates new variation in one of two
    ways.
  • The first is by inducing mutations in the gene of
    interest, either by using mutator strains, or by
    error-prone PCR (in which the fidelity of Taq
    polymerase is decreased rather than optimised).
  • The second involves shuffling different regions
    of members of the same gene family to produce
    genes in which the inherent variation of the
    family is recombined to give novel gene products.
  • Once a population has been produced, screening
    and selection can isolate those members which
    most closely match the desired attributes. The
    whole process can then be repeated using these
    members as the starting point for the generation
    of new variation. After relatively few cycles,
    great improvements in performance have been seen.