BS961 - PowerPoint PPT Presentation

Provided by: stanwa2
1
BS961
  • Session 4
  • Bioinformatics

2
  • Session 4: Introduction to bioinformatics.
  • 1. Applications of sequence database searches to
    generate useful biological information.
  • 2. Sequence motifs. RNA structures. Phylogenetic
    trees.
  • 3. Molecular epidemiology and monitoring of
    therapy.
  • 4. The transcriptome and applications of
    microarray technology.
  • 5. Preparation for case studies on applications
    of sequence-based information on human health.
  • 6. Worksheet distributed.

3
Objectives
  • Discuss the applications of sequence database
    searches to generate useful biological
    information.
  • Explain what sequence motifs are.

4
Sequence database searches (Strachan and Read, pp
468-471)
  • Vast amount of sequence information in
    databases, from cloning specific genes,
    generation of random ESTs and genome projects.
  • Led to the development of one aspect of
    bioinformatics: obtaining biological information
    on gene structure and function, and on proteins,
    from raw sequence data.
  • Done by comparing the sequence under study (a
    gene or protein with unknown function) with
    databases, to find similarities with genes of
    known function. This can then give clues about
    the function of unknown genes.

5
Main programs used are
  • BLASTN - compares a nucleotide sequence against a
    nucleotide sequence database
  • BLASTP - compares an amino acid sequence against
    a protein sequence database
  • These and many other programs are available at
    http://www.ebi.ac.uk/Tools/index.html. These
    programs find the sequences most closely related
    to the test sequence, defined either by the
    greatest number of matches or the least number of
    mismatches.
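
The idea of ranking database sequences by number of matches can be sketched in a few lines. This is only a toy illustration, not how BLAST works internally: it slides the query along each database sequence without gaps and counts identical positions. The function names and the three-entry toy database are made up for the example.

```python
def best_ungapped_matches(query, subject):
    """Slide the query along the subject (no gaps) and return the best match count."""
    best = 0
    for offset in range(-len(query) + 1, len(subject)):
        matches = 0
        for i, q in enumerate(query):
            j = offset + i
            if 0 <= j < len(subject) and subject[j] == q:
                matches += 1
        best = max(best, matches)
    return best


def rank_database(query, database):
    """Return (match count, name) pairs, best match first."""
    scored = [(best_ungapped_matches(query, seq), name)
              for name, seq in database.items()]
    return sorted(scored, reverse=True)


# Hypothetical toy database, invented for this sketch.
toy_database = {
    "seq_1": "GGATCCATGAAACCC",
    "seq_2": "TTTTTTTTTT",
    "seq_3": "ATGAAAGGG",
}
ranking = rank_database("ATGAAACCC", toy_database)
for score, name in ranking:
    print(name, score)
```

Real search programs also allow gaps and weight each match or mismatch, so their ranking can differ from a simple count.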

6
Typical scenario
  • Full-length or partial amino acid sequence
    (usually predicted from the nucleotide sequence
    using a translation program such as the translate
    tool at http://www.expasy.ch/tools/dna.html) is
    compared against all protein sequences in the
    SWISSPROT database.
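
The translation step can be sketched as follows. This is a minimal example, assuming the standard genetic code and a single chosen reading frame; the names `CODON_TABLE` and `translate` are chosen for this sketch and are not taken from any particular tool.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate a nucleotide sequence in one reading frame (0, 1 or 2).

    Stop codons are written as '*'; codons with unexpected letters become 'X'.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for pos in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCAAATAA"))           # frame 0
print(translate("ATGGCCAAATAA", frame=1))  # frame 1
```

A full translation tool would report all six frames (three on each strand); only the forward frames are covered here.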

7
Typical scenario
  • This may reveal the presence of closely related
    proteins (homologs is the general term) in the
    same organism (paralogs, or paralogues) or the
    equivalent protein in other organisms (orthologs,
    or orthologues). If the function of these is
    already known, some idea may be derived about the
    function of the unknown protein.

8-31
Simplified scheme for origin of paralogs and
orthologs
(Animated diagram; labels in order of appearance)
  • Gene duplication
  • Divergence, possibly to related but different
    functions
  • Evolution into 2 organisms
  • Divergence
  • Duplication
  • Genes in the final diagram: A, B and C, D, D
  • A and B are paralogs. C and D/D are paralogs.
  • A and C are orthologs. B is an ortholog of both
    D and D.
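
The relationships in the scheme above can be checked with a small sketch: two genes are called orthologs if their last common ancestor in the gene tree is a speciation event, and paralogs if it is a duplication. The tree below is one reading of the diagram and is an assumption, as are the names `D1` and `D2` for the two D copies.

```python
# Internal nodes are (event, left, right); leaves are gene names.
# Assumed reading of the scheme: an ancestral duplication, then evolution
# into 2 organisms (A and B in one, C and D in the other), then a second
# duplication of D.
GENE_TREE = (
    "duplication",
    ("speciation", "A", "C"),
    ("speciation", "B", ("duplication", "D1", "D2")),
)


def leaves(node):
    """Return the set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def relationship(tree, gene1, gene2):
    """Classify two different genes by the event at their last common ancestor."""
    if gene1 == gene2:
        raise ValueError("expected two different genes")
    event, left, right = tree
    for child in (left, right):
        below = leaves(child)
        if gene1 in below and gene2 in below:
            return relationship(child, gene1, gene2)
    return "orthologs" if event == "speciation" else "paralogs"


for pair in [("A", "B"), ("C", "D1"), ("A", "C"), ("B", "D1"), ("B", "D2")]:
    print(pair, relationship(GENE_TREE, *pair))
```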
32
Example 1.
  • Neurofibromatosis 2
  • Genetic disorder leading to cranial and
    peripheral nerve tumours.
  • Inherited dominantly.
  • Affected individuals generally develop symptoms
    of eighth-nerve dysfunction in early adulthood,
    including deafness and balance disorder.

33
Example 1.
  • Neurofibromatosis 2
  • Location of tumours means that disease symptoms
    are serious and can be fatal.
  • Disease tracked down to a defect in one gene,
    the neurofibromatosis type 2 gene (NF2).
  • What is the function of the protein encoded?

34
Example 1.
  • Neurofibromatosis 2
  • Database searches using the protein sequence
    identified related proteins: moesin, ezrin and
    radixin.
  • These proteins act as structural links between
    cell membrane proteins and the actin
    cytoskeleton.
  • This gave some initial clues to the function of
    the NF2 gene product.

35
Example 1.
  • In the next slide we see the output from such a
    search: the match identified between the NF2
    protein (Query) and one of the Subject sequences,
    i.e. one of the sequences in the database, which
    turns out to be moesin.

36
gi|16878176|gb|AAH17293.1|AAH17293 (BC017293)
moesin [Homo sapiens] Length = 577
Score = 380 bits (975), Expect = e-104
Identities = 258/588 (43%), Positives = 354/588
(59%), Gaps = 23/588 (3%)

Query 78   DKKVLDHDVSKEEPVTFHFLAKFYPENAEEELVQEITQHLFFLQVKKQILDEKIYCPPEA 137
           +KKV   DV KE P+ F F AKFYPE+  EEL+Q+ITQ LFFLQVK+ IL++ IYCPPE
Sbjct 62   NKKVTAQDVRKESPLLFKFRAKFYPEDVSEELIQDITQRLFFLQVKEGILNDDIYCPPET 121

Query 138  SVLLASYAVQAKYGDYDPSVHKRGFLAQEELLPKRVINLYQMTPEMWEERITAWYAEHRG 197
           +VLLASYAVQ+KYGD++  VHK G+LA ++LLP+RV+  +++  + WEERI  W+ EHRG
Sbjct 122  AVLLASYAVQSKYGDFNKEVHKSGYLAGDKLLPQRVLEQHKLNKDQWEERIQVWHEEHRG 181

Query 198  RARDEAEMEYLKIAQDLEMYGVNYFAIRNKKGTELLLGVDALGLHIYDPENRLTPKISFP 257
             R++A +EYLKIAQDLEMYGVNYF+I+NKKG+EL LGVDALGL+IY+  +RLTPKI FP
Sbjct 182  MLREDAVLEYLKIAQDLEMYGVNYFSIKNKKGSELWLGVDALGLNIYEQNDRLTPKIGFP 241
37
Header
  • gi|16878176|gb|AAH17293.1|AAH17293 (BC017293)
    moesin [Homo sapiens]
  • Length = 577
  • Score = 380 bits (975), Expect = e-104
  • Identities = 258/588 (43%), Positives = 354/588
    (59%), Gaps = 23/588 (3%)

38-41
Alignment
  • Query 78  DKKVLDHDVSKEEPVTFHFL
  •           +KKV   DV KE P+ F F
  • Sbjct 62  NKKVTAQDVRKESPLLFKFR
  • Letter on the middle line: identical in both
    sequences
  • + on the middle line: similar type of amino acid
    in both sequences
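
The middle line of such an alignment can be recomputed from the two aligned sequences. The sketch below is an illustration under stated assumptions: a letter is written for an identical pair, and "+" for a pair taken to be "similar". The list of similar pairs was typed in by hand for this example as the non-identical pairs that score above zero in the commonly used BLOSUM62 matrix; the function name `midline` is also just chosen here.

```python
# Amino acid pairs treated as "similar" (assumed from BLOSUM62 scores above zero).
POSITIVE_PAIRS = {frozenset(pair) for pair in (
    "AS", "ST", "SN", "ND", "NH", "DE", "EQ", "EK", "QK", "QR", "KR",
    "HY", "YF", "YW", "FW", "IV", "IL", "IM", "LM", "LV", "MV",
)}


def midline(query, subject):
    """Build the middle line for two aligned sequences of equal length."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    symbols = []
    for q, s in zip(query, subject):
        if q == s and q != "-":
            symbols.append(q)        # identical in both sequences
        elif frozenset((q, s)) in POSITIVE_PAIRS:
            symbols.append("+")      # similar type of amino acid
        else:
            symbols.append(" ")      # mismatch or gap
    return "".join(symbols)


query = "DKKVLDHDVSKEEPVTFHFL"
sbjct = "NKKVTAQDVRKESPLLFKFR"
line = midline(query, sbjct)
identities = sum(ch.isalpha() for ch in line)
positives = identities + line.count("+")
print(query)
print(line)
print(sbjct)
print(f"Identities {identities}/{len(line)}, Positives {positives}/{len(line)}")
```

Counting letters and "+" signs over the whole alignment is how the Identities and Positives figures in the header are obtained.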

42
Header
  • gi|16878176|gb|AAH17293.1|AAH17293 (BC017293)
    moesin [Homo sapiens]
  • Length = 577
  • Score = 380 bits (975), Expect = e-104
  • Identities = 258/588 (43%), Positives = 354/588
    (59%), Gaps = 23/588 (3%)

43
Example 2.
  • Angiotensin-converting enzyme (ACE)
  • A dipeptidyl carboxypeptidase which cleaves two
    amino acids from the C-terminus of angiotensin I,
    giving angiotensin II.
  • Interesting because there are two allelic forms
    of the gene, D and I. D gives higher levels of
    ACE, and sprinters and short distance swimmers
    tend to have this allele, possibly because
    angiotensin regulates muscle size.

44
Example 2.
  • Angiotensin-converting enzyme (ACE)
  • Here we see a search using an unknown protein
    from Anopheles gambiae which shows a match with
    human ACE.
  • The known active site E (glutamic acid) and the 2
    histidines (H) used to bind a zinc atom needed
    for the activity are conserved.
  • It shows that this protein from Anopheles is also
    probably a carboxypeptidase.

45
Example 2.
  • Query  TTVNLEDLVVAHHEMGHIQYFMQYKD
  •        T V  +D +  HHE+ HIQYF+ Y++
  • Sbjct  TQVTHKDFITVHHELAHIQYFLNYRN

46
Gene (protein) families (Strachan and Read, p 153)
  • A number of proteins, or protein domains,
    clearly share the same evolutionary origin, as
    they have similar structures and functions, which
    is reflected in conserved amino acid sequences.

47
Classical gene families
  • Have members which exhibit a high degree of
    sequence identity over most of the gene length
    (or the protein-encoding region at least).
  • This identifies the genes as being closely
    related in evolutionary terms and therefore in
    functional terms.

48
Gene families encoding products with large,
highly conserved domains
  • Have members which show extensive identity within
    strongly conserved regions of the genes
  • The identity in the rest of the gene may be quite
    low, as seen in transcription factors.

49
Gene families encoding products with very short
conserved amino acid motifs/patterns/profiles
  • Have members not obviously related in sequence
    terms, but with a common general function (e.g.
    dehydrogenases, proteases).
  • They share short amino acid motifs, patterns or
    profiles which can give a clue about general
    function.

50
Examples of motifs/patterns
  • RGD - a motif recognised by integrins, proteins
    involved in cell-cell interactions and found in
    some viruses
  • DEAD box - found in several proteins which appear
    to act as RNA helicases
  • GXSG - found in trypsin-like proteases
  • GDD - found in polymerases
  • LIM domain - cysteine-rich 56 amino acid domain
    likely to be involved in protein-protein
    interactions
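
Short motifs like these can be searched for with regular expressions. The sketch below is a simplified illustration: the patterns are literal readings of the motif names on this slide (with X as "any amino acid"), whereas real motif databases use richer patterns and profiles. The toy protein sequence is invented for the example.

```python
import re

# Motif name -> regular expression ("." stands for X, any amino acid).
MOTIFS = {
    "RGD": "RGD",
    "DEAD box": "DEAD",
    "GXSG": "G.SG",
    "GDD": "GDD",
}


def find_motifs(protein):
    """Return {motif name: [1-based start positions]} for motifs that occur."""
    hits = {}
    for name, pattern in MOTIFS.items():
        positions = [match.start() + 1 for match in re.finditer(pattern, protein)]
        if positions:
            hits[name] = positions
    return hits


toy_protein = "MKRGDLLGDSGGPAVDEADTT"  # hypothetical sequence
hits = find_motifs(toy_protein)
print(hits)
```

A hit on such a short pattern is only a clue about general function, since three or four residues can easily occur by chance.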

51
Gene superfamilies
  • Have members which are functionally related in a
    general sense and show only weak sequence
    identity over a large segment, without
    significant amino acid motifs. Share common
    structural features e.g. immunoglobulin
    superfamily.

52
Alignments
  • Following the identification of related proteins,
    they can be aligned using programs such as
    CLUSTALW. In this example, 4 human proteins have
    been found and aligned. They are clearly
    paralogs as they are very similar. Absolutely
    conserved amino acids (*) would be expected to be
    the ones critical for function. Other positions
    require similar (: or .) amino acids.

53
Example of a CLUSTAL analysis of human paralogs
in a classical gene family
hrev5    MALARPRPRLGDLIEISRFGYAHWAIYVGDGYVVHLAPASEIAGAGAASVLSALTNKAIV 60
Hrev107  MRAPIPEPKPGDLIEIFRPFYRHWAIYVGDGYVVHLAPPSEVAGAGAASVMSALTDKAIV 60
TIG3     MASPHQEPKPGDLIEIFRLGYEHWALYIGDGYVIHLAPPSEYPGAGSSSVFSVLSNSAEV 60
hrev4    MAEGKPRPRPGDLIEIFRIGYEHWAIYVEDDCVVHLAPPSEESECG--SITSIFSNRAVV 58
         *     .*: ****** *  * ***:*: *. *:****.** . .*  *: * ::: * *

hrev5    KKELLSVVAGGDNYRVNNKHDDRYTPLPSNKIVKRAEELVGQELPYSLTSDNCEHFVNHL 120
Hrev107  KKELLYDVAGSDKYQVNNKHDDKYSPLPCTKIIQRAEELVGQEVLYKLTSENCEHFVNEL 120
TIG3     KRGRLEDVVGGCCYRVNNSLDHEYQPRPVEVIISSAKEMVGQKMKYSIVSRNCEHFVAQL 120
hrev4    KYSRLEDVLHAASWKVNNKLDGTYLPLPVDKIIQRTKKMVNKIVQYSLIEGNCEHFVNGL 118
         *   *  *  .  ::***. *  * * *   *:. ::::*.: : *.: . ******  *

54
Hr5    NKIVKRAEELVGQELPYSLTSDNCEHFVNHL
Hr107  TKIIQRAEELVGQEVLYKLTSENCEHFVNEL
TIG3   EVIISSAKEMVGQKMKYSIVSRNCEHFVAQL
Hrev4  DKIIQRTKKMVNKIVQYSLIEGNCEHFVNGL
         *:. ::::*.: : *.: . ******  *
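
The line of symbols under a CLUSTAL alignment can be derived column by column. The sketch below assumes the usual CLUSTALW convention: "*" for an absolutely conserved column, ":" when all residues fall in one "strong" similarity group, "." when they fall in one "weak" group, and a blank otherwise. The group lists were typed in by hand for this example and the function names are chosen here.

```python
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return the conservation symbol for one alignment column."""
    residues = set(column)
    if "-" in residues:
        return " "                     # columns with gaps are not marked
    if len(residues) == 1:
        return "*"                     # absolutely conserved
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"                     # strongly similar
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."                     # weakly similar
    return " "


def conservation_line(sequences):
    """Return the symbol line for a list of equal-length aligned sequences."""
    return "".join(column_symbol(column) for column in zip(*sequences))


segment = [
    "NKIVKRAEELVGQELPYSLTSDNCEHFVNHL",  # Hr5
    "TKIIQRAEELVGQEVLYKLTSENCEHFVNEL",  # Hr107
    "EVIISSAKEMVGQKMKYSIVSRNCEHFVAQL",  # TIG3
    "DKIIQRTKKMVNKIVQYSLIEGNCEHFVNGL",  # Hrev4
]
symbols = conservation_line(segment)
for sequence in segment:
    print(sequence)
print(symbols)
```

In this segment the run of six "*" marks the NCEHFV stretch shared by all four proteins.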

55
RNA structures
  • RNA structures are vital to a number of processes
    in the cell and in viruses etc.
  • Can be predicted through folding programs which
    look for the minimum free energy structure.
  • Suppression of codon variation can also point to
    a structured region.
  • Covariance (paired changes at base-paired
    positions) can be useful.

56
5' terminal region of type 2 and type 1 5'UTRs
Important in RNA replication
(Figure panels: Type 2, Type 1)
58
RNA folding
59
Enterovirus
(Figure labels: HEV-A, HEV-B, HEV-C; Inhibitor of RNase L?)
60

(Figure labels: I, II, III)

61
Phylogenetic trees
  • Way of showing the relationship between proteins
    or nucleic acids.
  • Different methods for generating these, e.g.
    neighbor joining:
  • Generate a distance matrix expressing the
    relationship between each pair of elements.
  • Find the closest neighbours and join them
    together.
  • Add on the next closest neighbours, keeping
    branch lengths to a minimum.
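
The neighbor joining steps can be sketched in code. This is a minimal illustration of the standard algorithm, not a replacement for a phylogenetics package: it starts from a distance matrix, repeatedly joins the pair of nodes that minimises the usual Q criterion, and records the branch lengths. The five-element distance matrix is a made-up toy example, and the `node1`, `node2` names are invented labels for the internal nodes.

```python
def neighbor_joining(labels, matrix):
    """Return a list of joins (node_a, node_b, new_node, branch_a, branch_b)."""
    nodes = list(labels)
    dist = {(a, b): float(matrix[i][j])
            for i, a in enumerate(labels) for j, b in enumerate(labels)}
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        totals = {a: sum(dist[a, b] for b in nodes) for a in nodes}
        # Pick the pair that minimises the neighbor joining Q criterion.
        _, a, b = min(((n - 2) * dist[a, b] - totals[a] - totals[b], a, b)
                      for i, a in enumerate(nodes) for b in nodes[i + 1:])
        branch_a = 0.5 * dist[a, b] + (totals[a] - totals[b]) / (2 * (n - 2))
        branch_b = dist[a, b] - branch_a
        new = f"node{len(joins) + 1}"
        remaining = [c for c in nodes if c not in (a, b)]
        # Distances from the new internal node to every remaining node.
        for c in remaining:
            dist[new, c] = dist[c, new] = 0.5 * (dist[a, c] + dist[b, c] - dist[a, b])
        dist[new, new] = 0.0
        nodes = remaining + [new]
        joins.append((a, b, new, branch_a, branch_b))
    # The last two nodes are connected by a single edge; its full length is
    # reported as branch_a and no new node is created.
    last_a, last_b = nodes
    joins.append((last_a, last_b, None, dist[last_a, last_b], 0.0))
    return joins


# Toy distance matrix, invented for this sketch.
labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins = neighbor_joining(labels, matrix)
for join in joins:
    print(join)
```

Ties in the Q criterion are broken here by label order, so other implementations may join tied pairs in a different order.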
62
Molecular epidemiology
  • Species origin of viruses
  • Relationship between viruses and geography/time
    etc.
63
Origin
  • Derived from a virus which infects other
    primates?
  • Classification: Lentivirus genus of Retroviridae
    (lenti = slow in Latin)

65
Monitoring treatment
Sequence of virus mutants linked to drug
resistance can be used to
  • design treatment strategies
  • predict the outcome of treatment
  • monitor treatment
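
One way to picture how sequence data feeds into monitoring is to list the amino acid substitutions in a patient's virus sequence relative to a reference, then look them up in a table of changes linked to resistance. Everything specific in this sketch is hypothetical: the sequences, the lookup table and the drug name are invented placeholders, and the sequences are assumed to be already aligned and of equal length.

```python
def list_substitutions(reference, sample):
    """List substitutions such as 'K3N' (reference residue, position, new residue)."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [f"{ref}{position}{new}"
            for position, (ref, new) in enumerate(zip(reference, sample), start=1)
            if ref != new]


# Hypothetical lookup table: substitution -> drug it is linked to.
RESISTANCE_TABLE = {"K3N": "drug X"}

reference = "MAKLV"  # hypothetical reference protein fragment
sample = "MANLI"     # hypothetical sequence from a patient sample

substitutions = list_substitutions(reference, sample)
flagged = {s: RESISTANCE_TABLE[s] for s in substitutions if s in RESISTANCE_TABLE}
print(substitutions)
print(flagged)
```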
66
Conclusions
  • Biological information can be obtained from
    analysing sequences, using programs such as BLAST
    and CLUSTAL.
  • There are copies of related genes in different
    organisms (orthologs) and in the same organism
    (paralogs).