Bioinformatics Master Course: DNA/Protein Structure-Function Analysis and Prediction - PowerPoint PPT Presentation

About This Presentation
Title:

Bioinformatics Master Course: DNA/Protein Structure-Function Analysis and Prediction

Description:

Bioinformatics Master Course: DNA/Protein Structure-Function Analysis and Prediction Lecture 13: Protein Function Systems Biology is the study of the interactions ... – PowerPoint PPT presentation

Slides: 55
Provided by: AntonFe9
Learn more at: http://www.ibi.vu.nl
Transcript and Presenter's Notes

Title: Bioinformatics Master Course: DNA/Protein Structure-Function Analysis and Prediction


1
Bioinformatics Master Course: DNA/Protein
Structure-Function Analysis and Prediction
  • Lecture 13: Protein Function

2
Sequence-Structure-Function
  • Sequence → Structure → Function
  • Ab initio prediction and folding: impossible
    but for the smallest structures
  • Threading
  • Homology searching (BLAST)
  • Function prediction from structure: very
    difficult
3
Metabolomics fluxomics
4
Systems Biology
  • is the study of the interactions between the
    components of a biological system, and how these
    interactions give rise to the function and
    behaviour of that system (for example, the
    enzymes and metabolites in a metabolic pathway).
    The aim is to quantitatively understand the
    system and to be able to predict the system's
    time processes.
  • The interactions are nonlinear.
  • The interactions give rise to emergent
    properties, i.e. properties that cannot be
    explained by the individual components of the
    system.
  • Biological processes include many time-scales,
    many compartments and many interconnected network
    levels (e.g. regulation, signalling,
    expression, ...).

5
Systems Biology
  • Understanding is often achieved through modeling
    and simulation of the system's components and
    interactions.
  • Often, the "four Ms" cycle is adopted:
  • Measuring
  • Mining
  • Modeling
  • Manipulating

6
The silicon cell (some people think
silly-con cell)
7
(No Transcript)
8
A system response
Apoptosis: programmed cell death
Necrosis: accidental cell death
9
Human
Yeast
Comparative metabolomics
We need to be able to do automatic pathway
comparison (pathway alignment).
This pathway diagram shows a comparison of
pathways in (left) Homo sapiens (human) and
(right) Saccharomyces cerevisiae (baker's yeast).
Changes in controlling enzymes (square boxes in
red) and in the pathway itself have occurred (yeast
has one altered (overtaking) path in the graph).
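As a rough illustration of what an automatic comparison has to report, the sketch below treats each pathway as a set of directed (substrate, product) edges and lists the steps that are shared and those unique to each organism. This is only an edge-set comparison, not a real pathway-alignment algorithm, and the metabolite names and the function `compare_pathways` are made up for the example.

```python
def compare_pathways(a, b):
    """Compare two pathways given as sets of (substrate, product) edges.

    Returns the shared edges, the edges unique to each pathway, and the
    Jaccard similarity of the two edge sets.
    """
    shared = a & b
    only_a = a - b
    only_b = b - a
    union = a | b
    jaccard = len(shared) / len(union) if union else 1.0
    return shared, only_a, only_b, jaccard


# Toy pathways (hypothetical metabolites A-D, not real data):
# "yeast" bypasses intermediate C with a direct B -> D step.
human = {("A", "B"), ("B", "C"), ("C", "D")}
yeast = {("A", "B"), ("B", "D")}

shared, only_human, only_yeast, score = compare_pathways(human, yeast)
print("shared:", sorted(shared))
print("human only:", sorted(only_human))
print("yeast only:", sorted(only_yeast))
print("Jaccard similarity:", score)
```

A real alignment would also have to decide which enzymes in the two organisms correspond to each other, which this sketch leaves out.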
10
The citric-acid cycle
http://en.wikipedia.org/wiki/Krebs_cycle
11
The citric-acid cycle
Fig. 1. (a) A graphical representation of the
reactions of the citric-acid cycle (CAC),
including the connections with pyruvate and
phosphoenolpyruvate, and the glyoxylate shunt.
When there are two enzymes that are not
homologous to each other but that catalyse the
same reaction (non-homologous gene displacement),
one is marked with a solid line and the other
with a dashed line. The oxidative direction is
clockwise. The enzymes with their EC numbers are
as follows:
1, citrate synthase (4.1.3.7);
2, aconitase (4.2.1.3);
3, isocitrate dehydrogenase (1.1.1.42);
4, 2-ketoglutarate dehydrogenase (solid line;
1.2.4.2 and 2.3.1.61) and 2-ketoglutarate
ferredoxin oxidoreductase (dashed line; 1.2.7.3);
5, succinyl-CoA synthetase (solid line; 6.2.1.5)
or succinyl-CoA:acetoacetate-CoA transferase
(dashed line; 2.8.3.5);
6, succinate dehydrogenase or fumarate reductase
(1.3.99.1);
7, fumarase (4.2.1.2) class I (dashed line) and
class II (solid line);
8, bacterial-type malate dehydrogenase (solid
line) or archaeal-type malate dehydrogenase
(dashed line) (1.1.1.37);
9, isocitrate lyase (4.1.3.1);
10, malate synthase (4.1.3.2);
11, phosphoenolpyruvate carboxykinase (4.1.1.49)
or phosphoenolpyruvate carboxylase (4.1.1.32);
12, malic enzyme (1.1.1.40 or 1.1.1.38);
13, pyruvate carboxylase or oxaloacetate
decarboxylase (6.4.1.1);
14, pyruvate dehydrogenase (solid line; 1.2.4.1
and 2.3.1.12) and pyruvate ferredoxin
oxidoreductase (dashed line; 1.2.7.1).
M. A. Huynen, T. Dandekar and P. Bork, "Variation
and evolution of the citric acid cycle: a genomic
approach", Trends Microbiol, 7, 281-29 (1999)
12
The citric-acid cycle
(b) Individual species might not have a complete
CAC. This diagram shows the genes for the CAC for
each unicellular species for which a genome
sequence has been published, together with the
phylogeny of the species. The distance-based
phylogeny was constructed using the fraction of
genes shared between genomes as a similarity
criterion [29]. The major kingdoms of life are
indicated in red (Archaea), blue (Bacteria) and
yellow (Eukarya). Question marks represent
reactions for which there is biochemical evidence
in the species itself or in a related species but
for which no genes could be found. Genes that lie
in a single operon are shown in the same color.
Genes were assumed to be located in a single
operon when they were transcribed in the same
direction and the stretches of non-coding DNA
separating them were less than 50 nucleotides in
length.
M. A. Huynen, T. Dandekar and P. Bork, "Variation
and evolution of the citric acid cycle: a genomic
approach", Trends Microbiol, 7, 281-29 (1999)
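The operon assumption stated in the caption (same transcription direction, less than 50 nucleotides of non-coding DNA between neighbours) can be sketched as a simple grouping rule. The code below is a minimal illustration, not the authors' implementation. The `Gene` record, the function name `group_into_operons` and the coordinates are hypothetical.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # first nucleotide of the gene (1-based, inclusive)
    end: int     # last nucleotide of the gene (inclusive)
    strand: str  # "+" or "-"


def group_into_operons(genes, max_gap=50):
    """Group neighbouring genes into putative operons.

    Two consecutive genes are joined when they lie on the same strand
    and the non-coding stretch between them is shorter than max_gap.
    """
    operons = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons:
            prev = operons[-1][-1]
            gap = gene.start - prev.end - 1  # nucleotides between the genes
            if gene.strand == prev.strand and gap < max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return operons


# Made-up coordinates, for illustration only.
genes = [
    Gene("g1", 1, 100, "+"),
    Gene("g2", 120, 300, "+"),   # gap of 19 nt, same strand: joins g1
    Gene("g3", 400, 500, "+"),   # gap of 99 nt: starts a new group
    Gene("g4", 510, 600, "-"),   # gap of 9 nt but opposite strand: new group
]
operon_names = [[g.name for g in operon] for operon in group_into_operons(genes)]
print(operon_names)
```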
13
Experimental
  • Structural genomics
  • Functional genomics
  • Protein-protein interaction
  • Metabolic pathways
  • Expression data

14
Communicability Functional Genomics
  • Interpretation of genome-scale gene expression
    data

External Program
DNA-chip data
  • Cluster of coregulated genes
  • gene 1
  • gene 2
  • ...
  • gene n

PFMP query
  • Pathways affected
  • pathway 1
  • pathway 2
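The query step on this slide, going from a cluster of coregulated genes to the pathways affected, can be illustrated with a simple lookup. The slides do not describe the PFMP query interface, so the sketch below uses a plain dictionary as a stand-in for the database. The gene names, pathway names and the function `pathways_affected` are all hypothetical.

```python
from collections import Counter


def pathways_affected(cluster, gene_to_pathways):
    """Count how many genes of a cluster are annotated to each pathway.

    Genes without an annotation are ignored. Returns (pathway, count)
    pairs, most frequently hit pathway first.
    """
    counts = Counter()
    for gene in cluster:
        for pathway in gene_to_pathways.get(gene, ()):
            counts[pathway] += 1
    return counts.most_common()


# Toy annotation table standing in for a pathway database.
annotation = {
    "gene 1": ["pathway 1"],
    "gene 2": ["pathway 1", "pathway 2"],
    "gene 3": ["pathway 2"],
}
cluster = ["gene 1", "gene 2", "gene n"]  # "gene n" has no annotation

hits = pathways_affected(cluster, annotation)
print(hits)
```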

15
Communicability Functional Genomics
  • Interpretation of genome-scale gene expression
    data

External Programs
DNA-chip data
  • Cluster of coregulated genes
  • gene 1
  • gene 2
  • ...
  • gene n
  • Pattern discovery
  • gene 1
  • gene 2
  • ...
  • (putative regulatory sites)
  • Similarities with known regulatory sites
  • site 1 Factor 1
  • site 2 Factor 2
  • ...

PFMP query
16
Other Issues
  • Partial information (indirect interactions) and
    subsequent filling of the missing steps
  • Negative results (elements that have been shown
    not to interact, enzymes missing in an organism)
  • Putative interactions resulting from
    computational analyses

17
Protein function categories
  • Catalysis (enzymes)
  • Binding transport (active/passive)
  • Protein-DNA/RNA binding (e.g. histones,
    transcription factors)
  • Protein-protein interactions (e.g.
    antibody-lysozyme) (experimentally determined by
    yeast two-hybrid (Y2H) or bacterial two-hybrid
    (B2H) screening)
  • Protein-fatty acid binding (e.g. apolipoproteins)
  • Protein-small molecule interactions (drug
    interaction, structure decoding)
  • Structural component (e.g. crystallins)
  • Regulation
  • Signalling
  • Transcription regulation
  • Immune system
  • Motor proteins (actin/myosin)

18
Catalytic properties of enzymes
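Michaelis-Menten equation

$$V = \frac{V_{max}\,[S]}{K_m + [S]}$$

  • Reaction scheme: $E + S \rightleftharpoons ES \rightarrow E + P$
    (binding step labelled Km, catalytic step
    labelled kcat)
  • E: enzyme
  • S: substrate
  • ES: enzyme-substrate complex (transition state)
  • P: product
  • Km: Michaelis constant
  • kcat: catalytic rate constant (turnover number)
  • kcat/Km: specificity constant (useful for
    comparison)

Plot: reaction rate V (moles/s) against substrate
concentration [S]. The curve levels off at Vmax
and reaches Vmax/2 at [S] = Km.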
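A small numerical sketch of the Michaelis-Menten relation, V = Vmax [S] / (Km + [S]), may make the plot easier to read. The function name and the parameter values are arbitrary illustration choices, not data from the lecture.

```python
def michaelis_menten(s, vmax, km):
    """Reaction rate V for substrate concentration s.

    V = vmax * s / (km + s); s and km must be in the same units.
    """
    if s < 0 or km <= 0:
        raise ValueError("s must be non-negative and km must be positive")
    return vmax * s / (km + s)


# Arbitrary example values: vmax in moles/s, km and s in the same
# concentration unit.
vmax, km = 10.0, 2.0
for s in (0.0, 1.0, 2.0, 20.0, 2000.0):
    print(f"[S] = {s:7.1f}  ->  V = {michaelis_menten(s, vmax, km):.3f}")
```

At [S] = Km the rate is exactly Vmax/2, and for [S] much larger than Km the rate approaches Vmax, which are the two landmarks marked on the slide's plot.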
19
Protein interaction domains
http://pawsonlab.mshri.on.ca/html/domains.html
20
Energy difference upon binding
  • Examples of protein interactions (and functional
    importance) include
  • Protein-protein (pathway analysis)
  • Protein-small molecules (drug interaction,
    structure decoding)
  • Protein-peptides, DNA/RNA (function analysis)
  • The change in Gibbs Free Energy of the
    protein-ligand binding interaction can be
    monitored and expressed by the following:
  • $\Delta G = \Delta H - T\,\Delta S$
  • (H = Enthalpy, S = Entropy and T = Temperature)
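As a worked example of the relation ΔG = ΔH − TΔS, the sketch below evaluates it for one set of made-up values. The numbers and the function name are illustrative assumptions, not measurements.

```python
def gibbs_free_energy_change(delta_h, temperature, delta_s):
    """Return delta G = delta H - T * delta S.

    delta_h in kJ/mol, temperature in K, delta_s in kJ/(mol K).
    """
    return delta_h - temperature * delta_s


# Hypothetical binding event: favourable enthalpy, unfavourable entropy.
delta_g = gibbs_free_energy_change(delta_h=-50.0, temperature=300.0, delta_s=-0.125)
print(f"delta G = {delta_g} kJ/mol")  # -50.0 - 300.0 * (-0.125) = -12.5
```

A negative ΔG indicates that binding is favourable under these conditions. Here the enthalpy gain outweighs the entropy loss.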

21
Protein function
  • Many proteins combine functions
  • Some immunoglobulin structures are thought to
    have more than 100 different functions (and
    active/binding sites)
  • Alternative splicing can generate (partially)
    alternative structures

22
Protein function Interaction
Active site / binding cleft
Shape complementarity
23
Protein function evolution
Chymotrypsin
24
How to infer function
  • Experiment
  • Deduction from sequence: multiple sequence
    alignment (conservation patterns), homology
    searching
  • Deduction from structure: threading,
    structure-structure comparison, homology
    modelling
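To illustrate what "conservation patterns" in a multiple sequence alignment can mean in practice, the sketch below scores each alignment column by the fraction of sequences that carry the most common residue. This is one simple scoring choice among many, and the three aligned fragments are invented for the example.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column conservation of a multiple sequence alignment.

    Each score is the fraction of sequences sharing the most common
    residue in that column; gaps ('-') never count as a residue.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Toy alignment (made-up fragments, not real protein sequences).
alignment = ["GDSGG", "GDSGA", "GDTG-"]
scores = column_conservation(alignment)
for position, score in enumerate(scores, start=1):
    print(position, round(score, 2))
```

Columns that stay fully conserved across homologous sequences are candidate functional positions, for example residues of an active site.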

25
Cholesterol Biosynthesis
  • Cholesterol biosynthesis primarily occurs in
    eukaryotic cells. It is necessary for membrane
    synthesis, and is a precursor for steroid hormone
    production as well as for vitamin D. While the
    pathway had previously been assumed to be
    localized in the cytosol and ER, more recent
    evidence suggests that many of the enzymes
    in the pathway exist largely, if not exclusively,
    in the peroxisome (the enzymes listed in blue in
    the pathway to the left are thought to be at
    least partly peroxisomal). Patients with
    peroxisome biogenesis disorders (PBDs) have a
    variable deficiency in cholesterol biosynthesis.

26
Cholesterol Biosynthesis: from acetyl-CoA to
mevalonate
Mevalonate plays a role in epithelial cancers:
it can inhibit EGFR
27
Epidermal Growth Factor as a Clinical Target in
Cancer
  • A malignant tumour is the product of uncontrolled
    cell proliferation. Cell growth is controlled by
    a delicate balance between growth-promoting and
    growth-inhibiting factors. In normal tissue the
    production and activity of these factors result
    in differentiated cells growing in a controlled
    and regulated manner that maintains the normal
    integrity and functioning of the organ. The
    malignant cell has evaded this control: the
    natural balance is disturbed (via a variety of
    mechanisms) and unregulated, aberrant cell growth
    occurs. A key driver for growth is the epidermal
    growth factor (EGF), and the receptor for EGF (the
    EGFR) has been implicated in the development and
    progression of a number of human solid tumours
    including those of the lung, breast, prostate,
    colon, ovary, head and neck.

28
Energy housekeeping
  • Adenosine diphosphate (ADP) ⇌ Adenosine
    triphosphate (ATP)

29
Chemical Reaction
30
Enzymatic Catalysis
31
Gene Expression
32
Inhibition
33
Metabolic Pathway: Proline Biosynthesis
34
Transcriptional Regulation
35
Methionine Biosynthesis in E. coli
36
Shortcut Representation
37
High-level Interaction
38
Levels of Resolution
39
Cholesterol Biosynthesis
40
SREBP Pathway
41
Signal Transduction
Important signalling pathways: MAP-kinase (MAPK)
signalling pathway, or TGF-β pathway
42
Transport
43
Phosphate Utilization in Yeast
44
Multiple Levels of Regulation
  • Gene expression
  • Protein activity
  • Protein intracellular location
  • Protein degradation
  • Substrate transport

45
Graphical Representation: Gene Expression
46
Experimental Data: Gene Expression
47
Experimental Data: Transcriptional Regulation
48
Experimental Data: Transcriptional Regulation
49
Transcriptional Regulation: Integrated View
50
Pathways and Pathway Diagrams
  • Pathways
  • Set of nodes (entities) and edges (associations)
  • Pathway Diagrams
  • XY coordinates
  • Node splitting allowed
  • Multiple views of the same pathway
  • Different abstraction levels
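The distinction drawn on this slide, a pathway as a set of nodes and edges versus a diagram as one particular layout of it, can be sketched as two small data structures. The class and field names below are hypothetical, not taken from any specific pathway database, and the example entities are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class Pathway:
    """A pathway: entities (nodes) and associations between them (edges)."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, target, association)


@dataclass
class PathwayDiagram:
    """One view of a pathway: XY coordinates for its nodes.

    A node may be placed more than once (node splitting).
    """
    pathway: Pathway
    positions: dict = field(default_factory=dict)  # node -> list of (x, y)

    def place(self, node, x, y):
        if node not in self.pathway.nodes:
            raise KeyError(f"{node!r} is not part of the pathway")
        self.positions.setdefault(node, []).append((x, y))


# Toy pathway with two steps that both use ATP.
pathway = Pathway(
    nodes={"A", "B", "C", "ATP"},
    edges={("A", "B", "reaction"), ("B", "C", "reaction"),
           ("ATP", "B", "cofactor"), ("ATP", "C", "cofactor")},
)

# Two diagrams (views) of the same pathway object.
detailed = PathwayDiagram(pathway)
detailed.place("A", 0, 0)
detailed.place("B", 0, 1)
detailed.place("C", 0, 2)
detailed.place("ATP", 1, 0.5)  # ATP drawn twice: node splitting
detailed.place("ATP", 1, 1.5)

overview = PathwayDiagram(pathway)
overview.place("A", 0, 0)
overview.place("C", 0, 1)      # higher abstraction level: B and ATP not drawn
```

Keeping the layout separate from the graph is what allows multiple views and different abstraction levels of the same underlying pathway.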

51

Metabolic networks: Glycolysis and
Gluconeogenesis
KEGG database (Japan)
52
Gene Ontology (GO)
  • Not a genome sequence database
  • Developing three structured, controlled
    vocabularies (ontologies) to describe gene
    products, in a species-independent manner, in
    terms of
  • biological process
  • cellular component
  • molecular function

53
The GO ontology
54
Gene Ontology Members
  • FlyBase - database for the fruitfly Drosophila
    melanogaster
  • Berkeley Drosophila Genome Project (BDGP) -
    Drosophila informatics, GO database software,
    Sequence Ontology development
  • Saccharomyces Genome Database (SGD) - database
    for the budding yeast Saccharomyces cerevisiae
  • Mouse Genome Database (MGD) and Gene Expression
    Database (GXD) - databases for the mouse Mus
    musculus
  • The Arabidopsis Information Resource (TAIR) -
    database for the brassica family plant
    Arabidopsis thaliana
  • WormBase - database for the nematode
    Caenorhabditis elegans
  • EBI GOA project - annotation of UniProt
    (Swiss-Prot/TrEMBL/PIR) and InterPro databases
  • Rat Genome Database (RGD) - database for the rat
    Rattus norvegicus
  • DictyBase - informatics resource for the slime
    mold Dictyostelium discoideum
  • GeneDB S. pombe - database for the fission yeast
    Schizosaccharomyces pombe (part of the Pathogen
    Sequencing Unit at the Wellcome Trust Sanger
    Institute)
  • GeneDB for protozoa - databases for Plasmodium
    falciparum, Leishmania major, Trypanosoma brucei,
    and several other protozoan parasites (part of
    the Pathogen Sequencing Unit at the Wellcome
    Trust Sanger Institute)
  • Genome Knowledge Base (GK) - a collaboration
    between Cold Spring Harbor Laboratory and EBI
  • TIGR - The Institute for Genomic Research
  • Gramene - A Comparative Mapping Resource for
    Monocots
  • Compugen (with its Internet Research Engine)
  • The Zebrafish Information Network (ZFIN) -
    reference datasets and information on Danio rerio