26S Proteasome and Protein Stability

Provided by: MICROCOM1
Transcript and Presenter's Notes


1
Algorithms and databases for sequence and
structural analysis
Biology thru homology or analogy
2
In biomolecular sequences (DNA, RNA, or amino
acid sequences), high sequence similarity usually
implies significant functional or structural
similarity.
However, evolutionary and functionally
related molecular strings can differ
significantly throughout much of the string and
yet preserve the same three-dimensional
structure(s), or the same two-dimensional
substructure(s) (motifs, domains), or the same
active sites, or the same or related dispersed
residues (DNA or amino acid).
Dan Gusfield, Algorithms on Strings, Trees, and Sequences,
Cambridge University Press, 1997, p. 334.
3
Objectives
  • What is the function of this gene?
  • Do other genes have this functional motif?
  • Can I predict the higher order structure of this
    protein?
  • Is this gene a member of a known gene family?
  • Do other organisms have this gene?

4
Intuition
  • Similar sequences should have (long) regions of
    similar/identical residues.
  • Why?
  • Evolution: descent from a common ancestral
    sequence
  • Functional/structural convergence

5
General Database Search Issues
  • Search using amino acid sequence if possible
  • Why? Protein evolution is slower than DNA
    sequence evolution
  • Statistical theory is based on unrealistic
    assumptions; treat results as predictions.

6
Sequence Alignment
  • Sequence alignment is simply the optimal
    assignment of substitution and indel events to a
    pair of sequences.
  • Global alignment: align entire sequences
  • Local alignment: find best matching regions of
    sequences
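
The global/local distinction can be sketched with a small dynamic-programming routine. This is a minimal illustration, not the method of any particular tool: the function name `align_score`, the linear gap cost, and the default scores (+1 match, -1 mismatch, -1 per gap position) are all assumptions made for the example.

```python
def align_score(s, t, match=1, mismatch=-1, gap=-1, local=False):
    """Return the best alignment score of s and t.

    local=False: global alignment (every residue of both sequences is used).
    local=True:  local alignment (best-scoring pair of sub-regions).
    Scores and the linear gap cost are illustrative assumptions.
    """
    cols = len(t) + 1
    # First row: in a global alignment, leading gaps are penalised;
    # in a local alignment, an alignment may start anywhere at no cost.
    prev = [0 if local else j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(s) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, cols):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            val = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                val = max(val, 0)      # a local alignment can restart at zero
                best = max(best, val)  # and may end at any cell
            cur.append(val)
        prev = cur
    # A global alignment must end in the bottom-right cell.
    return best if local else prev[-1]
```

With `local=True` the routine ignores poorly matching flanks; with `local=False` those flanks are charged against the score.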

7
Measuring Alignment Quality
  • Good alignments should have
  • many exact matches
  • few mismatches
  • many of the mismatches should be similar
    residues
  • few gaps

8
Measuring Alignment Quality
Begin with...
Longest Exact Match
QTRPQNVLNPP
 STRQNVINPWAAQ
S = 3a
S = alignment score, a = match score
9
Measuring Alignment Quality
allow some mismatches
QTRPQNVLNPP
 STRQNVINPWAAQ
S = alignment score, a = match score, b = mismatch penalty
S = 5a - 1b
10
Measuring Alignment Quality
and finally, introduce some gaps
QTRPQNVLNPP
STR-QNVINPWAAQ
S = alignment score, a = match score, b = mismatch
penalty, c = gap penalty
S = 7a - 1b - 1c
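
The arithmetic on the last three slides can be checked with a short sketch. The function name `score_alignment` is hypothetical, and the code assumes each gap position is charged once (a linear gap penalty), which is what the slide's count of one gap implies.

```python
def score_alignment(top, bottom, a=1.0, b=1.0, c=1.0):
    """Count matches, mismatches and gap positions in an aligned region
    and return (matches, mismatches, gaps, S) with S = a*m - b*x - c*g.

    top and bottom must be the same length; '-' marks a gap.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    matches = mismatches = gaps = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps, a * matches - b * mismatches - c * gaps


# The gapped region from the slide: TRPQNVLNP aligned with TR-QNVINP.
region = score_alignment("TRPQNVLNP", "TR-QNVINP")
```

For the gapped region this gives 7 matches, 1 mismatch and 1 gap, i.e. S = 7a - 1b - 1c.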
11
Scoring Issues
  • Relative costs of matches, mismatches, and gaps
    should depend on their probabilities (rare events
    receive higher penalties)
  • In practice, the appropriate costs are rarely
    known.
  • A variety of scoring matrices are available.
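
One common way to tie scores to probabilities is a log-odds ratio: how often two residues are seen aligned in trusted alignments, relative to how often they would pair by chance. The sketch below assumes that form; the function name, the `scale` parameter, and the frequencies in the usage note are hypothetical.

```python
import math


def log_odds_score(q_ij, p_i, p_j, scale=1.0):
    """Log-odds score for aligning residue i with residue j.

    q_ij : observed frequency of the pair (i, j) in trusted alignments
    p_i, p_j : background frequencies of the two residues
    scale : unit multiplier (assumed; 1.0 gives a score in bits)
    """
    if min(q_ij, p_i, p_j) <= 0:
        raise ValueError("frequencies must be positive")
    return scale * math.log2(q_ij / (p_i * p_j))
```

For example, a pair seen twice as often as chance predicts (`q_ij=0.02`, `p_i=p_j=0.1`) scores about +1 bit, while a pair seen half as often as chance scores about -1 bit, so rarer events receive larger penalties.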

12
BLAST (www.ncbi.nlm.nih.gov/BLAST): Basic Local
Alignment Search Tool
BLAST is based on a systematic search of
conserved words. The query sequence is decomposed
into words of length W (W = 3 for amino acids, 11
for nucleotides); a list of these words and
similar words is then compared against entries in
the relational database. Sequences scoring below
a threshold are deleted from the list.
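
The word-decomposition step can be sketched as follows. This covers only the splitting of the query into overlapping words of length W; generating the "similar words" and scoring them against a threshold needs a scoring matrix and is not shown. The function names are hypothetical.

```python
def query_words(sequence, w=3):
    """Return every overlapping word of length w, in order of position."""
    if w < 1:
        raise ValueError("word length must be at least 1")
    return [sequence[i:i + w] for i in range(len(sequence) - w + 1)]


def index_words(sequence, w=3):
    """Map each word to the list of positions where it starts."""
    index = {}
    for pos, word in enumerate(query_words(sequence, w)):
        index.setdefault(word, []).append(pos)
    return index
```

A protein query of length n yields n - 2 words at W = 3; the index lets a search look up where a matching database word falls in the query.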
13
Scoring Matrices
  • Scoring matrix specifies a score, s_ij, for
    aligning residue i in sequence I with residue j
    in sequence II.
  • Choice of matrix depends on the divergence level
    of desired/expected hits.
  • Examples: PAM, BLOSUM
  • Both can be modified for different divergence
    levels (e.g., BLOSUM40, BLOSUM62)
  • Advice: try several matrices when possible.

14
(No Transcript)
15
(No Transcript)
16
PSI-BLAST: Position-Specific Iterated BLAST
1. BLAST with query
2. Keep hits with E-value below a cutoff (adjustable constant)
3. Multiple alignment of HSPs from step (2)
4. Build profile
5. BLAST with profile
6. Iterate (2)-(5) until no new hits are found
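
The control flow of the iteration can be sketched independently of any real search engine. In this sketch `search` and `build_profile` are hypothetical callables supplied by the caller, not PSI-BLAST APIs, and the default cutoff of 0.005 is an assumed value. The toy "database" at the bottom exists only to make the loop runnable.

```python
def iterate_search(query, search, build_profile, e_cutoff=0.005, max_rounds=10):
    """Iterated profile search (sketch).

    search(profile)           -> iterable of (hit_id, e_value)   [hypothetical]
    build_profile(query, ids) -> profile object for next round   [hypothetical]
    Stops when a round adds no new hits or after max_rounds.
    """
    accepted = set()
    profile = build_profile(query, [])          # round 1: the query alone
    for _ in range(max_rounds):
        hits = {hit for hit, e in search(profile) if e < e_cutoff}
        new_hits = hits - accepted
        if not new_hits:                        # converged: no new hits
            break
        accepted |= new_hits
        profile = build_profile(query, sorted(accepted))
    return accepted


# Toy stand-ins: each sequence "finds" its neighbours with a fixed E-value.
TOY_LINKS = {"q": [("a", 1e-10), ("x", 5.0)], "a": [("b", 1e-6)], "b": []}


def toy_search(profile):
    return [hit for member in profile for hit in TOY_LINKS.get(member, [])]


def toy_build_profile(query, ids):
    return [query] + list(ids)


result = iterate_search("q", toy_search, toy_build_profile)
```

In the toy run, "b" is found only in the second round, through "a"; "x" never passes the cutoff. The same transitivity is what makes a single wrongly accepted hit dangerous.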
17
PSI-BLAST: Position-Specific Iterated BLAST
Use with great caution!!! Once an unrelated
sequence is mistakenly incorporated into the
profile, subsequent iterations will incorporate
homologues of the unrelated sequence
(catastrophic transitivity). Human intervention
is essential.
18
The COG database: new developments in
phylogenetic classification of proteins from
complete genomes. Tatusov RL, Natale DA,
Garkavtsev IV, Tatusova TA, Shankavaram UT,
Rao BS, Kiryutin B, Galperin MY, Fedorova ND,
Koonin EV.
All-vs-all blastp of a genome sequence (primarily
microbial) database. Each COG consists of
individual orthologous genes or orthologous
groups of paralogs from three or more
phylogenetic lineages. In other words, any two
proteins from different lineages that belong to
the same COG are orthologs. Each COG is assumed
to have evolved from an individual ancestral
gene through a series of speciation and
duplication events.
19
Domains and insight into protein function
  • Proteins are modular, exhibiting discrete folding
    units known as domains
  • Switching and swapping domains is a mechanism for
    functional diversity in proteins
  • Domains can exhibit intrinsic function

20
Examples
  • SH2 binds phosphorylated tyrosine residues in
    protein partners
  • PDZ mediates protein-protein interactions
    between enzymes
  • HTH binds DNA in a site-specific manner
  • Once a domain acquires selectable functionality,
    it can be distributed to other gene products,
    providing a mechanism for evolution

21
Hidden Markov Models are sensitive tools for
domain detection
www.pfam.wustl.edu
www.tigr.org/TIGRFAMs/
www.smart.embl-heidelberg.de/
These tools use profiles generated from multiple
sequence alignments.
22
Rossmann fold - Profile HMM and PROSITE
  • GLGFFGV
  • GVGYFGV
  • GLGFFGL
  • GLGFFGL
  • GQGVLGL
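
The five aligned segments above can be summarised in two ways: as a PROSITE-style pattern (which residues are allowed at each position) and as per-column residue frequencies (the raw material of a profile). The sketch below is a simplification under stated assumptions: the function names are hypothetical, and a real profile HMM also models insertions and deletions and applies pseudocounts, which this does not.

```python
import re

ALIGNMENT = ["GLGFFGV", "GVGYFGV", "GLGFFGL", "GLGFFGL", "GQGVLGL"]


def prosite_like_pattern(seqs):
    """Build a pattern such as G-[LV]-G from an ungapped alignment."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    parts = []
    for column in zip(*seqs):
        residues = sorted(set(column))
        # One residue: write it as-is; several: list them in brackets.
        parts.append(residues[0] if len(residues) == 1
                     else "[" + "".join(residues) + "]")
    return "-".join(parts)


def pattern_to_regex(pattern):
    """Convert the dash-separated pattern into a compiled regex."""
    return re.compile("".join(pattern.split("-")))


def column_frequencies(seqs):
    """Per-column residue frequencies (no pseudocounts)."""
    return [{r: column.count(r) / len(column) for r in set(column)}
            for column in zip(*seqs)]


pattern = prosite_like_pattern(ALIGNMENT)
regex = pattern_to_regex(pattern)
```

The pattern treats every listed residue as equally acceptable, whereas the frequencies record that, for example, L is more common than V or Q at the second position. That extra weighting is one reason profile-based methods tend to be more sensitive than fixed patterns.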

23
Transition to structural classifications
  • Several useful databases link sequence analysis
    and protein structure information
  • CATH and SCOP are two of these, each containing
    950-1400 protein superfamilies
  • Since structure is more highly conserved than
    sequence during evolution, structural alignment
    algorithms and classifications enable more
    distant evolutionary relatives to be identified.

24
CATH
  • Contains 200,000 sequence domains, assigned to
    1200 CATH homologous superfamilies
  • Classification scheme: Class, Architecture,
    Topology, and Homology
  • Class: secondary structure composition and
    packing
  • Architecture: orientation of secondary
    structures in 3D, regardless of connectivity
  • Topology: both orientation and connectivity of
    secondary structure are accounted for
  • Homologous superfamily: grouped based on whether
    an evolutionary relationship exists (clustered at
    different levels of sequence ID)
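
Because the four levels are nested, a domain's place in the hierarchy can be written as four dot-separated numbers (C.A.T.H), and two domains can be compared by finding the deepest level they share. The sketch below assumes that dotted form; the class and function names are hypothetical, and the codes in the usage note are example strings only.

```python
from dataclasses import dataclass

LEVELS = ("class", "architecture", "topology", "homologous superfamily")


@dataclass(frozen=True)
class CathCode:
    c: int  # Class
    a: int  # Architecture
    t: int  # Topology
    h: int  # Homologous superfamily

    @classmethod
    def parse(cls, text):
        parts = text.strip().split(".")
        if len(parts) != 4:
            raise ValueError("expected four dot-separated numbers")
        return cls(*(int(p) for p in parts))

    def as_tuple(self):
        return (self.c, self.a, self.t, self.h)


def deepest_shared_level(x, y):
    """Name of the deepest level at which x and y agree, or None."""
    shared = None
    for name, u, v in zip(LEVELS, x.as_tuple(), y.as_tuple()):
        if u != v:
            break
        shared = name
    return shared
```

For instance, `deepest_shared_level(CathCode.parse("3.40.50.720"), CathCode.parse("3.40.50.300"))` returns `"topology"`: under this scheme the two domains share a fold but are placed in different homologous superfamilies.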

25
SCOP database
  • Classification scheme: Class, Fold, Superfamily,
    and Family
  • Class: type and organization of secondary
    structure
  • Fold: share a common core structure, with the same
    secondary structure elements in the same
    arrangement with the same topological connections
  • Superfamily: share very similar structure and
    function
  • Family: protein domains share a clear common
    evolutionary origin as evidenced by sequence
    identity or similar structure/function

26
HMMs also useful at SCOP
  • For instance, SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)
    HMMs are derived from the PDB databank at
    www.rcsb.org
  • Identify sequence signatures for specific domains

27
Structural Alignments
  • Various algorithms allow structure vs. structure
    comparisons
  • VAST, DALI
  • CATH (http://www.biochem.ucl.ac.uk/bsm/cath/)
    also has SSAP and GRATH (one computationally
    intensive, one not)
  • Sequence similarity to structural families for
    modeling often extracted using PSI-BLAST
    (Gene3D)
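
To give a feel for what a structure-versus-structure comparison measures, the sketch below compares two sets of coordinates through their internal distance matrices, which do not change when a structure is rotated or translated. This is an illustration only and is not the scoring used by VAST, DALI, SSAP or GRATH. It also assumes the residues are already paired one-to-one, whereas finding that pairing is the hard part the real tools solve. Function names and coordinates are hypothetical.

```python
import math


def distance_matrix(coords):
    """All pairwise distances between points given as (x, y, z) tuples."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def distance_matrix_difference(coords_a, coords_b):
    """Mean absolute difference between corresponding internal distances.

    0.0 means the two point sets have identical internal geometry.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of points")
    n = len(coords_a)
    if n < 2:
        return 0.0
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    total = sum(abs(da[i][j] - db[i][j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


# Toy coordinates: 'moved' is 'toy' rotated 90 degrees about z and shifted;
# 'stretched' changes the internal geometry.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 1.0)]
moved = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in toy]
stretched = [(2.0 * x, y, z) for x, y, z in toy]
```

The rotated and shifted copy scores essentially zero against the original, while the stretched copy does not, which is the property that lets structures be compared without first superimposing them.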

28
Comparison of sequence and structure alignments
[1] Taylor WR, Orengo CA, 1989, Protein structure
alignment. J Mol Biol 208:1-22
[4] Mueller L, 2003, Protein structure alignment.
Paper presentations 27.5., 16:30h
29
Multiple structural alignments
  • CORA from CATH (where?)
  • MultiProt - http://bioinfo3d.cs.tau.ac.il/MultiProt/
  • DMAPS (pre-calculated) - http://dmaps.sdsc.edu/
  • CE-MC - http://bioinformatics.albany.edu/cemc/
  • Others?