The Genome Access Course: Protein Structure (PowerPoint presentation)


Slides: 39
Provided by: jamesg61

Transcript and Presenter's Notes



1
The Genome Access Course: Protein Structure
HSP70 (1DKG, 1DKZ) and prefoldin (1FXK)
2
Protein structure
  • What is the correct amino acid sequence?
  • Is the predicted protein complete (is the ATG start codon real?)
  • To be sure, use ORF Finder at NCBI
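What an ORF finder does can be sketched as below. This is a minimal illustration, not NCBI ORF Finder's algorithm: it assumes the standard genetic code (start ATG; stops TAA, TAG, TGA), scans only the three forward frames (the reverse strand is omitted), and uses a hypothetical minimum length `min_codons`. The function name `find_orfs` is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    start is the 0-based index of the A in ATG; end is the index just past
    the stop codon. Only the first ATG after the previous stop is used, and
    the length check counts codons including the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs
```

For example, `find_orfs("CCATGTTTTAGCC", min_codons=1)` returns `[(2, 2, 11)]`: an ORF in frame 2 running from the ATG at index 2 through the TAG stop. An ORF that starts at the first ATG in frame is only a candidate; whether that ATG is the real start still has to be checked.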

3
ORF finder to BLAST
http://www.ncbi.nlm.nih.gov/gorf/gorf.html
4
Protein Structural Elements
  • Secondary (2°) structural elements
    • α-helix
    • β-sheet
  • Globular regions
  • Domains
    • SH2
    • Leucine zipper

5
Protein function - different categories
  • Protein of known function
  • Protein of inferred function
  • Protein of unknown function

6
Protein of known function
  • Work already done
  • Ancillary databases (e.g. PubMed, OMIM, MGI, other
    organism-specific databases)
  • Warning - make sure it really is the SAME protein
  • First port of call - LocusLink/Entrez Gene

7
Human genes and OMIM
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
8
Mouse genes and MGI
http://www.informatics.jax.org/
9
And the list goes on.
10
Protein of inferred function
  • Similar to a protein of known function
    • Annotated
    • BLAST
  • Paralogue (same species), orthologue (different
    species), or just similar
  • Make sure key residues are conserved, e.g. by pairwise
    or multiple alignment
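Checking that key residues are conserved can be sketched as follows. This assumes you already have a pairwise alignment as two gapped strings of equal length ('-' for gaps), as printed by BLAST, and a list of key residue positions from the literature; the function name and the example sequences are hypothetical.

```python
def key_residues_conserved(
    aln_query: str,
    aln_subject: str,
    key_positions: list[int],
    query_start: int = 1,
) -> dict[int, bool]:
    """Report whether each key query residue is identical in the subject.

    aln_query / aln_subject: the two gapped rows of a pairwise alignment.
    key_positions: 1-based positions in the full (ungapped) query sequence.
    query_start: 1-based query position of the first aligned residue
    (local alignments such as BLAST HSPs rarely start at residue 1).
    Key positions outside the aligned region are left out of the result.
    """
    wanted = set(key_positions)
    result = {}
    query_pos = query_start - 1
    for q, s in zip(aln_query, aln_subject):
        if q == "-":
            continue  # insertion in the subject: no query residue here
        query_pos += 1
        if query_pos in wanted:
            result[query_pos] = q == s  # a gap in the subject counts as not conserved
    return result
```

For example, `key_residues_conserved("MK-TAYIA", "MKQTGYIA", [1, 4, 5])` returns `{1: True, 4: False, 5: True}`: the alignment looks good overall, but residue 4 has changed, which matters if it is a catalytic or binding residue.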

11
Protein of inferred function
Human X Chr
12
Protein of Unknown Function
  • Not similar at the primary sequence level to a
    protein of known function
  • Can you predict function? There are many caveats!
  • Transmembrane protein?
    • TMpred http://www.ch.embnet.org/software/TMPRED_form.html
  • Protein domains
    • Can infer function, e.g. homeobox
    • Warning - some domains are poorly and/or widely
      predicted
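The simplest way to ask "transmembrane protein?" is a hydropathy plot. The sketch below is the classic Kyte-Doolittle sliding-window method, not TMpred's own (statistics-based) algorithm. The window of 19 residues and the cut-off of 1.6 follow the values commonly quoted for Kyte-Doolittle transmembrane prediction; treat both as tunable assumptions. Function names are illustrative.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; scores[i] covers seq[i:i + window]."""
    seq = seq.upper()
    return [
        sum(KD[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_tm_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """0-based start indices of windows whose mean hydropathy exceeds threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, score in enumerate(profile) if score > threshold]
```

On a toy sequence of 10 lysines, 20 leucines and 10 lysines (`"K" * 10 + "L" * 20 + "K" * 10`), `candidate_tm_windows` flags windows starting at indices 5 to 16, i.e. the windows dominated by the leucine stretch. Real sequences need the caveats on the slide: amphipathic helices and signal peptides can also score high.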

13
Domains
  • Discrete structural units
  • Can infer boundaries from sequence analysis
  • 25-500 residues long
  • Most are < 200 residues
  • Domains of less than 50 residues are usually stabilized
    by disulfide (S-S) bonds or metal ions

14
Lipoxygenase Domain
> 500 residues
15
WW Domain
33 residues
16
Domain Determination
  • Internal duplications
    • Detect with a dotplot
  • Transmembrane segments
    • Hydrophobic, 15-35 residues
    • Segments easy to predict
    • Topology and multiple segments harder to predict
    • PHD, TMHMM, TMpred
  • Low-complexity segments
    • Composition typically non-random
    • Non-compact folds: coiled coils, rods, flexible
      domain linkers
    • Complexity function (SEG)
    • Short-period overlapping repeats (XNU)
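Two of the checks above can be sketched in a few lines: a self dotplot for internal duplications and a windowed composition score for low-complexity segments. This is an illustration under stated assumptions: the dotplot uses exact word matches (word length 3 is an arbitrary choice), and the complexity score is plain Shannon entropy in a 12-residue window, a simplified stand-in for SEG's measure, not the SEG or XNU programs. All function names are hypothetical.

```python
import math
from collections import Counter


def self_dotplot(seq: str, word: int = 3) -> list[tuple[int, int]]:
    """Positions (i, j), i < j, where seq[i:i + word] == seq[j:j + word].

    Plotted as dots, these form the upper triangle of a self dotplot; an
    internal duplication appears as an off-diagonal line, i.e. many hits
    sharing the same offset j - i.
    """
    n = len(seq) - word + 1
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if seq[i:i + word] == seq[j:j + word]
    ]


def repeat_offsets(seq: str, word: int = 3) -> Counter:
    """Count dotplot hits per diagonal offset; a large count hints at a repeat."""
    return Counter(j - i for i, j in self_dotplot(seq, word))


def window_entropy(seq: str, window: int = 12) -> list[float]:
    """Shannon entropy (bits) of residue composition in each full window.

    Values near 0 flag low-complexity stretches such as poly-Q; a window of
    12 different residues reaches the maximum, log2(12).
    """
    scores = []
    for i in range(len(seq) - window + 1):
        counts = Counter(seq[i:i + window])
        scores.append(-sum((c / window) * math.log2(c / window) for c in counts.values()))
    return scores
```

For a tandem duplication such as `"MKTAYIAKQR" * 2`, `repeat_offsets` returns `Counter({10: 8})`: every hit lies on the diagonal offset by the 10-residue repeat length.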

17
Protein sequence databases
  • Non-curated
    • TrEMBL - proteins translated automatically from CDS
      annotations in GenBank/EMBL/DDBJ
    • Entrez Protein http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=Protein
  • Curated
    • Swiss-Prot - proteins identified with confidence,
      manually added to the database
    • UniProt (e.g. hosted at EBI, http://www.expasy.uniprot.org/index.shtml)

18
Proteins of Unknown function
Protein domain databases e.g. Interpro
http://www.ebi.ac.uk/interpro/index.html
19
Comparison of Protein Family DBs
Pfam
SMART
CDD
PROSITE
SRS
20
  • Conserved Domain Database (NCBI)
  • Linked into other NCBI resources
  • Includes Pfam and SMART domains (but does not
    necessarily give the same answer)

21
Proteins in Ensembl
22
Proteins in UCSC
23
  • HMM family profiles constructed by hand
  • Structural data in alignments
  • No hierarchy
  • No specific compositional bias
  • Good graphical output

24
Pfam-A and Pfam-B
  • Pfam-A (75)
  • Curated, annotated families
  • Pfam-B (19)
  • Families derived automatically from ProDom
  • Other

25
  • Protein fingerprint database (fingerprints are
    groups of conserved motifs that characterize a
    protein family)
  • Regular grammar for describing profiles (e.g.
    [EDQ]-x-G-x-[DN]-A-x-x-[GALI])
  • Profile search is sensitive, but low coverage
    (signaling)
  • Pattern search has high false positive rate
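Because a PROSITE pattern is a regular grammar, it translates directly into a regular expression. The sketch below assumes standard PROSITE syntax (elements separated by '-', `x` for any residue, `[ABC]` for one of, `{ABC}` for none of, repeats such as `x(2)` or `x(2,4)`, and `<` / `>` for sequence ends); it does not handle rarer cases such as `>` inside brackets. The example input is the slide's pattern with the bracketed residue classes written out. The function name is hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")  # PROSITE patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]  # anchored at the N-terminus
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # anchored at the C-terminus
        # Split e.g. "x(2,4)" into core "x" and repeat bounds "2", "4".
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        # Single residues and [ABC] classes are already valid regex syntax.
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)
```

`prosite_to_regex("[EDQ]-x-G-x-[DN]-A-x-x-[GALI]")` gives `"[EDQ].G.[DN]A..[GALI]"`, and `re.search` with that expression finds a match starting at index 2 of the made-up sequence `"MKEAGCDAKLG"`. Any sequence that fits the pattern counts as a hit, with no score, which is one reason pattern searches produce many false positives.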

26
  • Highly conserved, ungapped MSAs
  • Derived from PROSITE

27
  • Fingerprints are sets of ungapped weight matrices
  • Hierarchical classification for important
    families
  • Families, domains, and proteins
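A single ungapped weight matrix, the building block of a fingerprint, can be sketched as a log-odds table built from aligned motif instances and slid along a sequence without gaps. This is an illustration, not the PRINTS scoring scheme: it assumes a uniform background frequency (1/20), a pseudocount of 1, uppercase standard residues only, and a hypothetical three-residue motif. A real fingerprint combines several such matrices and also checks their order and spacing.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(motifs: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds (bits) ungapped weight matrix from equal-length aligned motifs."""
    background = 1 / len(AMINO_ACIDS)  # uniform background: a simplifying assumption
    pwm = []
    for pos in range(len(motifs[0])):
        column = [m[pos] for m in motifs]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pwm.append({
            aa: math.log2((column.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pwm


def best_hit(pwm: list[dict[str, float]], seq: str) -> tuple[float, int]:
    """Slide the matrix along seq (no gaps); return (best score, 0-based start)."""
    width = len(pwm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(seq) - width + 1):
        score = sum(pwm[k][seq[start + k]] for k in range(width))
        if score > best_score:  # keep the first of any tied windows
            best_score, best_start = score, start
    return best_score, best_start
```

With `pwm = build_pwm(["GKS", "GKT", "GRS"])`, `best_hit(pwm, "AAAGKSAAA")` reports the best window starting at index 3, the embedded GKS.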

28
  • Simple Modular Architecture Research Tool
  • Collected by Ponting and Bork (641 HMMs)
  • Focuses on
    • Signaling domains
    • Extracellular domains
    • Nuclear domains
  • High-quality graphics

29
Alignment of Representative Members
Profile-HMM built with HMMER 2.0
Search Protein DB
Description
Full alignment
30
  • Profiles automatically built from PSI-BLAST
    alignments of Swiss-Prot
  • No annotation
  • As with other automated DBs (Pfam-B, DOMO),
    useful for seeing if region appears in different
    contexts

31
Protein Sequence Analysis
  • Biochemical/biophysical properties
  • Secondary Structure
  • Super-secondary (signal peptides, domains,
    motifs)
  • 3D prediction (Threading)

32
Amphipathic Helix
Edge Strand
Buried Strand
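The difference between an amphipathic helix, an edge strand and a buried strand can be quantified with a hydrophobic moment (after Eisenberg): each residue's hydropathy is treated as a vector pointing in the direction of its side chain, and the vectors are summed. The sketch below swaps in the Kyte-Doolittle scale for simplicity (Eisenberg used his own consensus scale) and assumes ideal geometry: 100° per residue for an α-helix and 180° for a β-strand, whose side chains alternate faces. The function name is illustrative.

```python
import math

# Kyte-Doolittle hydropathy values, used here in place of Eisenberg's scale
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq: str, angle_deg: float = 100.0) -> float:
    """Mean hydrophobic moment of a segment.

    angle_deg is the rotation between consecutive side chains: about 100
    for an ideal alpha-helix (3.6 residues per turn), 180 for a beta-strand.
    A high value means one face is hydrophobic and the other polar.
    """
    angle = math.radians(angle_deg)
    sin_sum = sum(KD[aa] * math.sin(i * angle) for i, aa in enumerate(seq))
    cos_sum = sum(KD[aa] * math.cos(i * angle) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)
```

At 180°, an alternating segment like `"LKLKLKLK"` (edge-strand-like) scores about 3.85, while an all-hydrophobic `"LLLLLLLL"` (buried-strand-like) scores close to 0, because its hydrophobicity is spread evenly around the strand.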
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
Viewing 3D Structures
  • Cn3D
  • Chime
  • RasMol
  • Protein Explorer

37
(No Transcript)
38
Protein of inferred or unknown function
  • All predictions must be taken as exactly that
  • PREDICTIONS!!
  • The true function of a protein is NOT known until
    it has been proven in the lab