Protein Analysis and Modeling - PowerPoint PPT Presentation

About This Presentation
Title:

Protein Analysis and Modeling

Description:

Conserved features (binding sites, catalytic centers, pockets) Conserved Domain Entry ... What can a homology model provide ... study patterns of conservation ...

Slides: 29
Provided by: pelt2

Transcript and Presenter's Notes

1
Protein Analysis and Modeling
  • BFB Workshop
  • Selected Methods in Bioinformatics
  • April 2009

2
Protein Comparison
3
Protein Comparison (1D)
  • Sequence-based: sequence alignment
      • goal: align conserved residues together
      • each pair of aligned residues obtains a score
      • residues that frequently occur at the same
        position in related proteins are more similar
        and obtain better scores than distinct residues
  • NCBI BLAST (Basic Local Alignment Search Tool)
      • database search for similar sequences
      • different specialized databases
      • nucleotide or peptide sequences
      • pairwise alignment
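
The scoring idea above can be sketched as a small local (Smith-Waterman style) alignment score. This is only an illustration: the similarity groups and the score values (2, 1, -1, gap -2) are toy assumptions, not the substitution matrices or heuristics that BLAST actually uses.

```python
# Toy residue similarity groups (an assumption for illustration only).
GROUPS = ["ILVM", "DE", "KR", "ST", "FYW", "NQ"]


def pair_score(x, y):
    """Score one aligned residue pair: identical > similar > distinct."""
    if x == y:
        return 2
    if any(x in g and y in g for g in GROUPS):
        return 1
    return -1


def local_alignment_score(a, b, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            curr[j] = max(
                0,                                         # start a new local alignment
                prev[j - 1] + pair_score(a[i - 1], b[j - 1]),  # align two residues
                prev[j] + gap,                             # gap in b
                curr[j - 1] + gap,                         # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(local_alignment_score("GKTLIT", "GKTLIS"))
```

Identical residues score highest, residues from the same toy group score lower, and unrelated residues are penalized, which mirrors the "more similar residues obtain better scores" rule on the slide.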

4
Protein Comparison (3D)
  • Structure-based
      • structure is more conserved than sequence → find
        distantly related proteins
      • identification of secondary structure elements
      • residues in corresponding structure elements are
        aligned together
      • residues that are aligned together have similar
        spatial positions
  • NCBI VAST (Vector Alignment Search Tool)
      • database search for similar 3D structures
      • secondary structure elements are represented
        as vectors
      • alignment of vectors in compared structures
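
The vector representation can be illustrated with a minimal sketch. It assumes each secondary structure element is reduced to the vector from its first to its last coordinate and that two elements are compared by the angle between their vectors; the helper names are hypothetical and this is not VAST's actual algorithm.

```python
import math


def sse_vector(start, end):
    """Vector from the first to the last coordinate of a structure element."""
    return tuple(e - s for s, e in zip(start, end))


def angle_deg(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos))


helix_a = sse_vector((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
helix_b = sse_vector((2.0, 1.0, 0.0), (2.0, 9.0, 0.0))
print(angle_deg(helix_a, helix_b))
```

Comparing such angles (and distances) between pairs of elements in two structures is one simple way to decide which elements could correspond.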

5
Search for Related Structures in NCBI
  • Protein entry → related structures link
  • Protein structure entry → structure summary page

Click on sequence bar to retrieve related
structures for entire chain or individual 3D
domains
6
Related Structures in NCBI
Click on sequence bar to view structure-based
sequence alignment
View 3D alignment
Colored residues correspond to aligned secondary
structure elements; red: highly conserved residue
7
Conserved Domains
8
Conserved Domains
  • are distinct functional units
  • often coincide with 3D protein domains (but are
    not the same!)
  • can help to elucidate the function of a protein
  • contain highly conserved sequence patterns
  • can be identified through multiple sequence
    alignment of related proteins
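
As a rough sketch of how conserved positions could be read off a multiple sequence alignment, the function below reports columns in which one residue dominates. The function name, the `min_fraction` parameter, and the three toy sequences are assumptions for illustration; this is not how CDD derives its domain models.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=1.0):
    """Return 0-based column indices where the most common residue
    (not a gap) makes up at least min_fraction of the column."""
    columns = []
    for i, column in enumerate(zip(*alignment)):
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / len(column) >= min_fraction:
            columns.append(i)
    return columns


# Toy alignment of equal-length rows; "-" marks a gap.
alignment = ["GKTLIS", "GKSLIT", "GRTLI-"]
print(conserved_columns(alignment))
print(conserved_columns(alignment, min_fraction=0.6))
```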

9
Conserved Domains in NCBI
  • NCBI Conserved Domain Database (CDD) contains
    conserved domains
  • Domains are derived from multiple sequence
    alignments of related proteins in different
    species
  • Structure information is used (if available)

10
Conserved Domains in NCBI
  • Related domains are hierarchically organized into
    families with common conserved residues and
    general function
  • Child nodes represent more specific domain models
    and contain additional conserved residues
    compared to parent nodes

Sub-family hierarchy
11
Conserved Domains in NCBI
  • Sequences in domain families are clustered based
    on their similarity

12
Detection of Conserved Domains
  • Sequence comparison of the query protein against
    multiple alignments in CDD
  • Search techniques
  • Enter protein sequence or accession code in CD
    search

13
Detection of Conserved Domains
  • Sequence comparison of the query protein against
    multiple alignments in CDD
  • Search techniques
  • Enter protein sequence or accession code in CD
    search
  • Structure summary page

14
Detection of Conserved Domains
  • Sequence comparison of the query protein against
    multiple alignments in CDD
  • Search techniques
  • Enter protein sequence or accession code in CD
    search
  • Structure summary page
  • Domains link for many Entrez search results
  • BLAST results page

15
Conserved Domain Search Results
Click to show all domain hits
Conserved features
Best-scoring domains
4 types of domain hits
16
Conserved Domain Search Results
  • 4 types of domain hits
      • Specific hits
          • domain-specific e-value threshold
          • high confidence that query protein belongs to the
            same family as the proteins used to identify the
            conserved domain
      • Non-specific hits
          • general e-value threshold
      • Domain super-family
          • including specific and non-specific hits
      • Multi-domains
          • computationally detected
          • likely to contain multiple single domains
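
The distinction between specific and non-specific hits can be sketched as a two-threshold rule. The function name and all threshold values below are hypothetical; the real cutoffs are set by the CD search service and by each domain model.

```python
def classify_hit(evalue, domain_threshold, general_threshold=0.01):
    """Classify a domain hit by e-value (smaller means more significant).

    domain_threshold: stricter, domain-specific cutoff (hypothetical value).
    general_threshold: looser, general cutoff (assumed default).
    """
    if evalue <= domain_threshold:
        return "specific"
    if evalue <= general_threshold:
        return "non-specific"
    return "no hit"


for e in (1e-30, 1e-5, 0.5):
    print(e, classify_hit(e, domain_threshold=1e-20))
```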

17
Conserved Domain Entry
Select individual domain hit
Search for proteins with similar domain
architecture
18
Conserved Domain Entry
Text summary
Conserved features (binding sites, catalytic
centers, pockets)
19
Conserved Domain Entry
Alignment of sequences used to derive the domain
Residues of conserved features
Query sequence embedded in the alignment
20
Homology Modeling
21
Homology Modeling
  • Given: protein sequence
  • Aim: model of the 3D structure of the target
    protein
  • Approach: use homologous proteins as templates

...MPKYTLHYFPLMGRAELCRFVLAAHG...
Sequence
Model
Template
22
Homology Modeling
  • Based on the observation that 3D structure is
    much more conserved than sequence
  • Take the known structure of a protein with
    sequence similarity to the modeling target as a
    structural template
  • Template and target proteins need not be
    evolutionarily related (comparative modeling)
  • Generation of topologically correct sequence
    alignments is the most important step in
    comparative modeling

23
4 Steps of Building a Model
  • Template selection
  • Target-template alignment
  • Model construction
  • Model refinement and assessment

24
Template Selection
  • Homology searching: database search for
    homologous proteins
  • Sequence similarity searching (BLAST, FASTA)
  • Sequence identity crucial for model reliability
      • > 50%: high accuracy → RMSE 1 Å
        (Swiss-Model Automated Mode)
      • 30 - 50%: medium accuracy → RMSE 1.5 Å
        (Swiss-Model Alignment Mode)
      • 20 - 30%: twilight zone (Swiss-Model Project
        Mode)
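
The identity ranges above can be turned into a small sketch that computes percent identity from a target-template alignment and maps it to the band named on the slide. The function names are hypothetical, identity is computed over non-gap columns only (one of several common conventions), and the handling of the exact boundary values (50, 30, 20) is an assumption.

```python
def percent_identity(target, template):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(target, template) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def accuracy_band(identity):
    """Map percent identity to the reliability bands listed on the slide."""
    if identity > 50:
        return "high accuracy"
    if identity >= 30:
        return "medium accuracy"
    if identity >= 20:
        return "twilight zone"
    return "below twilight zone"


identity = percent_identity("GKTLIT--NFSQEHIP", "GKTLISFLNFSQEHIP")
print(round(identity, 1), accuracy_band(identity))
```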

25
Alignment
  • Usually multiple template proteins
  • Structure-based alignment
      • superpose template structures
      • align conserved motifs
      • derive corresponding sequence alignment
      • embed target sequence

Figure: sequence alignment vs. structure alignment of
gktlit nfsqehip and gktlisflyeqnfsqehip
Most critical step!
26
Model Construction
  • Three sequential steps
      • Core
          • conserved regions
          • gapless aligned blocks
          • assignment of secondary structure elements
      • Loops
          • variable regions
          • de novo modeling
          • conformation databases
      • Side chains
          • energy minimization
          • molecular dynamics
          • rotamer databases
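
To make the split between core and loops concrete, the sketch below finds gapless aligned blocks in a target-template alignment; the gapped stretches between blocks are candidates for loop modeling. The function name and the `min_length` parameter are assumptions, not part of any particular modeling package.

```python
def gapless_blocks(target, template, min_length=3):
    """Return (start, end) column ranges, end exclusive, in which neither
    aligned sequence contains a gap and the run is at least min_length long."""
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    blocks = []
    start = None
    for i, (a, b) in enumerate(zip(target, template)):
        gapped = a == "-" or b == "-"
        if not gapped and start is None:
            start = i                      # a gapless run begins here
        elif gapped and start is not None:
            if i - start >= min_length:
                blocks.append((start, i))  # close the run before the gap
            start = None
    if start is not None and len(target) - start >= min_length:
        blocks.append((start, len(target)))
    return blocks


print(gapless_blocks("GKTLIT--NFSQEHIP", "GKTLISFLNFSQEHIP"))
```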

27
Model Refinement and Assessment
  • Check for unfavorable local conformations
      • Ramachandran plot
      • bond angles/distances
      • chirality
      • model refinement by energy minimization
  • Sequence-structure mapping
      • map target sequence onto modeled structure
      • score compatibility of sequence and structure by
        knowledge-based potentials or energy calculations
  • Retrospective benchmarking
      • comparison to experimental structure
      • RMSE
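
The comparison to an experimental structure can be sketched as a root-mean-square calculation over matched atom positions (the RMSE named on the slide, usually written RMSD in structural biology). The sketch assumes the two coordinate lists are already superposed and list equivalent atoms in the same order; no optimal superposition is performed here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) positions, assumed to be superposed already."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for p, q in zip(coords_a, coords_b):
        total += sum((x - y) ** 2 for x, y in zip(p, q))
    return math.sqrt(total / len(coords_a))


model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
experiment = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model, experiment))
```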

28
Use of Homology Modeling
  • What can a homology model provide?
      • study patterns of conservation
      • spatial proximity of residues to known active
        sites
      • surface exposure of residues
  • And what not?
      • atomic details of protein geometry
      • exact loop and side chain conformations
      • local shape
      • protein flexibility
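
One of the uses listed above, checking the spatial proximity of residues to a known active site, can be sketched as a simple distance filter. The residue labels, coordinates, and the 8 Å cutoff are made-up values for illustration, and each residue is reduced to a single representative coordinate.

```python
import math


def residues_near_site(residue_coords, site_coords, cutoff=8.0):
    """Return labels of residues whose representative coordinate lies
    within cutoff (in Å) of any active-site coordinate."""
    return [
        label
        for label, pos in residue_coords.items()
        if any(math.dist(pos, site) <= cutoff for site in site_coords)
    ]


# Hypothetical model coordinates (one point per residue).
residues = {"A10": (0.0, 0.0, 0.0), "K25": (5.0, 0.0, 0.0), "D90": (20.0, 0.0, 0.0)}
active_site = [(1.0, 0.0, 0.0)]
print(residues_near_site(residues, active_site))
```

Because a homology model is least reliable in loops and side chains, distances of this kind are best read as approximate.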