Functional Coverage of the Human Genome by Existing Structures, Structural Genomics Targets and Homology Models

1
Functional Coverage of the Human Genome by
Existing Structures, Structural Genomics Targets
and Homology Models
  • Philip E. Bourne
  • Dept. of Pharmacology
  • University of California San Diego
  • pbourne_at_ucsd.edu

2
Agenda
  • What is structural genomics exactly?
  • What has it achieved thus far?
  • What are its goals going forward?
  • One possible strategy for selecting targets

Disclaimer: I am not associated with any
structural genomics project.
3
My Broad Working Definition
  • Structural genomics is the process of
    high-throughput determination of the
    3-dimensional structures of biological
    macromolecules

4
The Process - X-ray Crystallography
Basic Steps
  • Target Selection
  • Crystallomics: Isolation, Expression,
    Purification, Crystallization
  • Data Collection
  • Structure Solution
  • Structure Refinement
  • Functional Annotation
  • Publish
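
The steps on this slide can be sketched as an ordered pipeline that a target moves through. This is a minimal illustration only: the stage names come from the slide, but the ordering (target selection first, publication last) and the helper names `next_stage` and `remaining` are assumptions made for the example, not part of any structural genomics software.

```python
from enum import IntEnum
from typing import List, Optional


class Stage(IntEnum):
    """Pipeline stages named on the slide; the ordering is an assumption."""
    TARGET_SELECTION = 1
    CRYSTALLOMICS = 2  # isolation, expression, purification, crystallization
    DATA_COLLECTION = 3
    STRUCTURE_SOLUTION = 4
    STRUCTURE_REFINEMENT = 5
    FUNCTIONAL_ANNOTATION = 6
    PUBLISH = 7


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after the last one."""
    if stage is Stage.PUBLISH:
        return None
    return Stage(stage + 1)


def remaining(stage: Stage) -> List[str]:
    """List the names of all stages still ahead of a target."""
    return [s.name for s in Stage if s > stage]


print(remaining(Stage.STRUCTURE_REFINEMENT))
```

Using an ordered enumeration makes it easy to ask how far a given target has progressed, which is the kind of status information a target-tracking database records.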
5
What Has The Process Achieved Thus Far?
6
Much of the Data Discussed Will Come from
http://sg.rcsb.org
Nucleic Acids Research 2006 Database Issue
(accepted)
7
Current Status of All Centers: 90421 Targets
56626
2479 (7.5% of PDB)
Chen et al. 2004 Bioinformatics 20(16): 2860-2
http://targetdb.rcsb.org, Oct 20, 2005
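
As a quick sanity check on the figures above, the size of the PDB implied by the structural genomics share can be worked out directly. The sketch assumes the slide's "7.5" is a percentage of all PDB entries; the function name `implied_total` is made up for this example.

```python
def implied_total(part: int, percent: float) -> float:
    """Total size implied when `part` entries make up `percent` of the whole."""
    return part / (percent / 100.0)


sg_structures = 2479    # structural genomics structures, from the slide
sg_share_percent = 7.5  # share of the PDB, from the slide (read as a percentage)

pdb_total = implied_total(sg_structures, sg_share_percent)
print(f"Implied PDB size: about {pdb_total:,.0f} entries")
```

The result is roughly 33,000 entries, which is the PDB size implied by the slide's own numbers rather than an independently sourced count.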
8
Total Structures Released per Year
Chen et al. 2004 Bioinformatics 20(16): 2860-2
http://targetdb.rcsb.org, Oct 20, 2005
9
PepcDB: http://pepcdb.pdb.org/
Captures the protocols associated with the
experiment
10
(No Transcript)
11
(No Transcript)
12
What Has The Process Achieved Thus Far?
  • While only 7.5% of the current PDB (30 year
    history), structural genomics is now contributing
    15-20% of all structures in a given year
  • Higher throughput is being achieved; traditional
    laboratories benefit too
  • Useful data are being collected more
    systematically, but the situation could still be
    improved

13
Ah Yes, But What is the Goal?
  • The goal of the human genome project was clear
    cut. The goal of structural genomics is not so
    clear cut.
  • Provision of enough structural templates to
    facilitate homology modeling of most proteins
  • Structures of all proteins in a complete proteome
  • Structural elucidation of a complete biological
    pathway
  • Structural elucidation of a complete disease

14
Example Goals
The hyperthermophilic bacterium Thermotoga
maritima has been the target of choice for
pipeline development and genome-wide fold
coverage.
207
The SGPP consortium will determine and analyze
the three-dimensional structures of a large
number of proteins from major global pathogenic
protozoa, Leishmania major, Trypanosoma brucei,
Trypanosoma cruzi and Plasmodium falciparum.
35
Structural Genomics of Pathogenic Protozoa
It is aimed at determining structures of proteins
and protein complexes directly relevant to human
health and diseases.
79
15
Growth in the Number of Folds per Year According
to SCOP
(Chart series: New Folds, Total Folds)
2% of structures determined per year are new
folds
http://pdbbeta.rcsb.org, from Oct. 2005
16
Todd, Marsden, Thornton and Orengo 2005 JMB
348(5): 1235-60 provide the following data, but
based on 316 non-redundant structures:
  • Quality and size of structures are comparable
  • 29% of domains revealed an evolutionary
    relationship not apparent from sequence
  • 19% and 11% contributed new superfamilies and
    folds, respectively
  • 9287 reliable homology models built across 206
    completely sequenced genomes

17
What Should be the Target Selection Strategy
Going Forward?
18
One Approach - Pfam 5000 (Chandonia and Brenner 2005
Proteins 58(1): 166-179)
  • Would provide fold assignment for 68% of
    prokaryotic proteins and 61% of eukaryotic
    proteins
  • This is significantly greater than would be
    achieved by completing a single genome

19
Our Approach is to Consider Coverage Relative to
the Human Genome
  • What protein structures would tell us most about
    the human condition if determined?

20
Basic Logic of Our Approach to Target Selection
  • Given the functions of proteins currently in the
    PDB
  • And what we can ascertain about the function of
    structural genomics targets
  • And what we know about the functional coverage of
    the human genome
  • What structures should be determined to increase
    our coverage of functional space?
  • Which of those structures are most tractable?

21
Coverage of the Human Genome By Structure
PDB
Structural Genomics Targets
GO
Ensembl Human Genome Annotation
Superfamily
EC
Xie and Bourne 2005 PLoS Comp. Biol. 1(3): e31
http://sg.rcsb.org
22
Drill down to the Appropriate level
Define the level of redundancy
Coverage by domain(s) or structure
23
PDB vs Human Genome: Top Level EC Shows Even
Distribution
PDB
607 Structures
9698 Sequences
Ensembl Human Genome Annotation
Xie and Bourne 2005 PLoS Comp. Biol. 1(3): e31
http://sg.rcsb.org
24
PDB vs Human Genome: EC Transferases Begins to
Illustrate the Bias in the PDB
PDB
EC 2.5 (transferring alkyl or aryl groups) is
over-represented in the PDB; EC 2.4
(glycosyltransferases) is under-represented in
the PDB
Ensembl Human Genome Annotation
Xie and Bourne 2005 PLoS Comp. Biol. 1(3): e31
http://sg.rcsb.org
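
One simple way to put a number on this kind of bias is to compare the share of a class among PDB structures with its share among annotated genome sequences. The sketch below is an illustration under stated assumptions: the counts are invented for the example (they are not the data behind the slide), and `representation_ratio` is a hypothetical helper, not the measure used by Xie and Bourne.

```python
from typing import Dict, Optional


def share(counts: Dict[str, int], key: str) -> float:
    """Fraction of all counted entries that fall in class `key`."""
    total = sum(counts.values())
    return counts.get(key, 0) / total if total else 0.0


def representation_ratio(pdb_counts: Dict[str, int],
                         genome_counts: Dict[str, int],
                         ec_class: str) -> Optional[float]:
    """PDB share divided by genome share for one EC class.

    Values above 1 suggest over-representation in the PDB, values below 1
    suggest under-representation. Returns None if the class is absent
    from the genome counts.
    """
    genome_share = share(genome_counts, ec_class)
    if genome_share == 0:
        return None
    return share(pdb_counts, ec_class) / genome_share


# Invented counts for three EC sub-classes, for illustration only.
pdb_counts = {"2.4": 10, "2.5": 30, "2.7": 60}
genome_counts = {"2.4": 30, "2.5": 10, "2.7": 60}

for ec in sorted(pdb_counts):
    print(ec, round(representation_ratio(pdb_counts, genome_counts, ec), 2))
```

With these made-up numbers, class 2.5 comes out over-represented and class 2.4 under-represented, mirroring the direction of the bias described on the slide.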
25
Functional Coverage (GO Molecular Function) of
the Human Genome By Structure, Targets and Models
SG Targets
Human Genome
PDB
Homology Models
  • As expected, there are few structures of unknown
    function in the PDB at this stage, but a large
    number of targets of unknown function
  • Enzyme regulation is over-represented in the PDB:
    GTPase, kinase regulator, caspase regulator

Xie and Bourne 2005 PLoS Comp. Biol. 1(3): e31
http://sg.rcsb.org
26
Target Selection Relative to Disease
PDB
Structural Genomics Targets
OMIM
Swiss-Prot
Superfamily
Ensembl Human Genome Annotation
Xie and Bourne 2005 PLoS Comp. Biol. 1(3): e31
http://sg.rcsb.org
27
Human Disease Coverage
SG Targets
Human Genome
PDB
Homology Models
  • PDB covers 69% of OMIM disease categories
  • Diseases of the CNS are over-represented by
    targets
  • Diseases of the ear, nose and throat are
    under-represented in the PDB but covered by
    targets and models
  • Cancers: fewer targets at top level, but
    female-related cancers are over-represented and
    male-related cancers under-represented by
    structures

28
Structural Coverage of the Human Genome
  • Single domains cover 37% of the functional
    classes identified in the genome
  • Whole structures cover 25%
  • 37% goes to 56% with homology models
  • 25% goes to 31% with homology models
  • If all current structural genomics targets were
    solved (3x current PDB):
  • 37% goes to 69%
  • 25% goes to 44%

Xie and Bourne 2005 PLoS Comp. Biol. 1(3): e31
http://sg.rcsb.org
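
The coverage figures on this slide are percentages of functional classes that have at least one structural representative. A minimal sketch of that calculation with sets follows. The class labels and set contents are invented toy data, and taking the union of structures, models and targets is an assumption about how the cumulative figures are built, not a description of the published method.

```python
from typing import Set


def coverage(covered: Set[str], all_classes: Set[str]) -> float:
    """Percentage of functional classes with at least one representative."""
    if not all_classes:
        return 0.0
    return 100.0 * len(covered & all_classes) / len(all_classes)


# Ten hypothetical functional classes found in a genome.
genome_classes = {f"F{i}" for i in range(1, 11)}

# Hypothetical classes represented by each data source.
pdb_domains = {"F1", "F2", "F3", "F4"}
homology_models = {"F4", "F5", "F6"}
sg_targets = {"F6", "F7"}

structures_only = coverage(pdb_domains, genome_classes)
with_models = coverage(pdb_domains | homology_models, genome_classes)
with_targets = coverage(pdb_domains | homology_models | sg_targets,
                        genome_classes)

print(structures_only, with_models, with_targets)
```

Because the sources overlap, each additional source raises coverage by less than its own size, which is why coverage grows more slowly than the raw number of structures.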
29
Other Points to Note
  • Coverage by homology models is not even; more
    divergent families are less well represented
  • Transporters and receptors (non-membrane regions)
    are the most pressing
  • It is possible to create a most-wanted list of
    structures

Xie and Bourne 2005 PLoS Comp. Biol. 1(3): e31
http://sg.rcsb.org
30
The Most Wanted List
  • So Far We Have Considered the Functional Coverage
    of Structures, Models and Targets Relative to the
    Human Genome (Based on the Current Level of
    Functional Annotation)
  • What if we turn that round and, rather than ask
    what we know, ask what we do not know?

Xie and Bourne 2005 PLoS Comp. Biol. 1(3): e31
http://sg.rcsb.org
31
Bottom Line
  • There are approximately 1800 domains which have
    been functionally recognized in the human genome
    for which no structure exists (hence no homology
    models) and for which no target exists

Xie and Bourne 2005 PLoS Comp. Biol. 1(3): e31
http://sg.rcsb.org
32
How Do We Get To This List?
  • Start with functional categories without
    structures
  • Select those without Superfamily assignments,
    i.e., those that can't be modeled
  • Prefer those with a disease association
  • Remove those that appear less tractable based on
    prediction of transmembrane segments,
    coiled-coil regions and low-complexity regions
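
The selection steps above can be sketched as a filter followed by a ranking. This is a hedged illustration: the `Domain` record, its field names and the sample entries are all hypothetical, and treating "prefer" as "sort disease-linked domains first" is one possible reading of the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Domain:
    """Hypothetical record for one functionally annotated domain."""
    name: str
    has_structure: bool
    has_superfamily: bool
    disease_linked: bool
    transmembrane: bool = False
    coiled_coil: bool = False
    low_complexity: bool = False


def most_wanted(domains: List[Domain]) -> List[Domain]:
    """Apply the slide's steps: filter, drop intractable, rank by disease."""
    # Steps 1 and 2: no structure and no Superfamily assignment.
    candidates = [d for d in domains
                  if not d.has_structure and not d.has_superfamily]
    # Step 4: remove domains predicted to be hard to work with.
    tractable = [d for d in candidates
                 if not (d.transmembrane or d.coiled_coil or d.low_complexity)]
    # Step 3: prefer disease-associated domains (stable sort keeps input order).
    return sorted(tractable, key=lambda d: not d.disease_linked)


sample = [
    Domain("A", has_structure=False, has_superfamily=False, disease_linked=False),
    Domain("B", has_structure=False, has_superfamily=False, disease_linked=True),
    Domain("C", has_structure=True, has_superfamily=False, disease_linked=True),
    Domain("D", has_structure=False, has_superfamily=True, disease_linked=True),
    Domain("E", has_structure=False, has_superfamily=False, disease_linked=True,
           transmembrane=True),
]

print([d.name for d in most_wanted(sample)])
```

In this toy run, C and D drop out because they already have a structure or a Superfamily assignment, E drops out as a predicted transmembrane domain, and the disease-linked domain B is ranked ahead of A.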

33
Examples from the Most Wanted List
  • The most understudied structures are various
    kinds of receptors and transporters
  • For catalytic activity, the largest
    under-representation is in protein synthesis and
    gene regulation
  • Congenital adrenal hyperplasia appears to have
    tractable domains without structure representation

34
Acknowledgements
  • Lei Xie (Functional DB)
  • Andrei Kouranov, Joanna de la Cruz, Li Chen, John
    Westbrook, Helen Berman (TargetDB and PepcDB)
  • NIH GM63208
  • The RCSB PDB is supported by funds from the
    National Science Foundation (NSF), the National
    Institute of General Medical Sciences (NIGMS),
    the Office of Science, Department of Energy
    (DOE), the National Library of Medicine (NLM),
    the National Cancer Institute (NCI), the National
    Center for Research Resources (NCRR), the
    National Institute of Biomedical Imaging and
    Bioengineering (NIBIB), and the National
    Institute of Neurological Disorders and Stroke
    (NINDS). 