Structuring molecular biology databanks into a databank network


1
Structuring molecular biology databanks into a
databank network
Cambridge, 14. October 2001
  • Thure Etzold, MD of Lion Bioscience Ltd, Cambridge

2
Our situation
  • Lots of databanks
  • Databanks appear (many) and disappear (far fewer)
  • Some databanks are very different (heterogeneity)
  • Some databanks are very similar, but not quite
    the same
  • Many databanks have links to other databanks
  • What about analysis tools?

3
Different Degrees of Databank Interoperation
X03456
  • hypertext links

4
The Data Jungle
High-throughput screening
Pharm. / Tox.
Physiology
Molecular Biology
Genetics
Too much data - isolated research - suboptimal
communication
5
Our mantra: the library network
6
Querying several databanks simultaneously
  • Only fields that databanks have in common are
    shown in query form.
  • Only views that databanks have in common are
    displayed
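
The rule on this slide (only fields shared by all selected databanks appear in the query form) amounts to a set intersection. The following is a minimal sketch of that idea, not SRS code; the field names and the `DATABANK_FIELDS` table are hypothetical.

```python
# Hypothetical field lists for illustration; real databanks define their own.
DATABANK_FIELDS = {
    "EMBL": {"id", "accession", "description", "organism", "keywords"},
    "SWISSPROT": {"id", "accession", "description", "organism", "gene_name"},
    "PDB": {"id", "description", "resolution"},
}


def common_fields(databanks, fields_by_databank=DATABANK_FIELDS):
    """Return the sorted fields shared by every selected databank."""
    field_sets = [fields_by_databank[name] for name in databanks]
    if not field_sets:
        return []
    return sorted(set.intersection(*field_sets))


print(common_fields(["EMBL", "SWISSPROT"]))
print(common_fields(["EMBL", "SWISSPROT", "PDB"]))
```

Adding a third, more different databank shrinks the query form, which is the trade-off the slide describes. The same intersection would apply to views.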

7
Our mantra: the library network
8
Linking Databases with SRS
  • Explicit cross-references: unique ID
  • Implicit links by
      • organism name (Taxonomy)
      • gene name (?)
      • small compound names (structure similarity
        search)
      • gene function (GO)
      • literature citation (?)
      • sequence similarity (protein family databanks,
        sequence similarity search)

9
The Problem with Nomenclature
The same compound is referred to in BRENDA as the following:
  • Pyruvic acid: common or trivial name
  • Pyruvate: common name for the anionic species
  • 2-Oxopropanoic acid: IUPAC name
  • 2-Oxopropionic acid: systematic name
  • 2-Oxopropionate: systematic name for the anionic
    species
  • alpha-Keto-propanoic acid: systematic name
  • CH3COCOOH: line diagram notation
  • CH3COCOO-: line diagram notation for the anionic
    species

Other names (or descriptors) for this parent
structure include Acetylformic acid, BTS,
Pyroracemic acid, alpha-Oxo-propanoic acid and
2-Keto-propionate; there are plenty more
possibilities.
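
One common way to link on compound names despite this variety is a synonym table that maps every known name to a single canonical label. The sketch below uses only the names listed on this slide. The choice of "pyruvic acid" as the canonical label and the whitespace and case normalisation are assumptions made for illustration, and the acid and its anion are deliberately treated as one parent structure, as the slide does.

```python
# Names are taken from the slide; the canonical label is an assumption.
CANONICAL = "pyruvic acid"
SYNONYMS = [
    "Pyruvic acid", "Pyruvate", "2-Oxopropanoic acid", "2-Oxopropionic acid",
    "2-Oxopropionate", "alpha-Keto-propanoic acid", "CH3COCOOH", "CH3COCOO-",
    "Acetylformic acid", "BTS", "Pyroracemic acid",
    "alpha-Oxo-propanoic acid", "2-Keto-propionate",
]


def normalise(name):
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


# Lookup table from normalised synonym to canonical label.
SYNONYM_INDEX = {normalise(name): CANONICAL for name in SYNONYMS}


def canonical_name(name):
    """Return the canonical label for a known synonym, else None."""
    return SYNONYM_INDEX.get(normalise(name))


print(canonical_name("PYRUVATE"))
print(canonical_name("lactate"))
```

A table like this only covers names someone has entered, which is why the slide's list of implicit links also mentions structure similarity search for small compounds.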
10
Introducing Links
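[Figure: example entries in EMBL, SWISSPROT and PDB. Each entry has an ID line (ID ABC, ID ABC, ID XYZ) and a DR cross-reference line (DR EMABC, DR SW123, DR PDBXYZ).]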
  • Entries may contain references to other databases,
    e.g. an EMBL entry may contain a
    cross-reference to the SWISSPROT entry for the
    protein it encodes
  • Cross-references may be bidirectional or
    unidirectional
  • Cross-references can be indirect
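
To make the three points above concrete, here is a minimal sketch that reads ID and DR lines from entries and then follows cross-references transitively to find indirect ones. The record layout (`DR <databank>; <id>`) is a simplified, hypothetical format and the entry identifiers are invented; real EMBL and SWISS-PROT flat files are richer than this.

```python
from collections import defaultdict

# Simplified, hypothetical records: one "ID" line and zero or more
# "DR <databank>; <id>" cross-reference lines per entry.
RECORDS = {
    "EMBL": ["ID ABC\nDR SWISSPROT; P123"],
    "SWISSPROT": ["ID P123\nDR PDB; XYZ"],
    "PDB": ["ID XYZ"],
}


def build_xrefs(records):
    """Map (databank, id) to the set of (databank, id) entries it references."""
    xrefs = defaultdict(set)
    for databank, entries in records.items():
        for text in entries:
            entry_id = None
            for line in text.splitlines():
                tag, _, value = line.partition(" ")
                if tag == "ID":
                    entry_id = value.strip()
                elif tag == "DR" and entry_id is not None:
                    target_db, _, target_id = value.partition(";")
                    xrefs[(databank, entry_id)].add(
                        (target_db.strip(), target_id.strip())
                    )
    return xrefs


def reachable(xrefs, start):
    """Return every entry reachable from start via one or more references."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for target in xrefs.get(node, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


xrefs = build_xrefs(RECORDS)
print(reachable(xrefs, ("EMBL", "ABC")))
print(reachable(xrefs, ("PDB", "XYZ")))
```

In this toy data the EMBL entry reaches the PDB entry only indirectly, through SWISSPROT, and nothing is reachable from the PDB entry because the stored references are unidirectional.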

11
Different Degrees of Databank Interoperation
X03456
  • hypertext links
  • indexed links

12
The Link Operators
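[Figure: an entry with ID A1 and cross-reference DR B3 is indexed to build links between databank A (entries A1 to A5) and databank B (entries B1 to B4). The queries A > B and B < A return entries of B; the queries A < B and B > A return entries of A.]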
13
Queries Using Links
  • EMBL > Swissprot: proteins encoded by genes
  • EMBL < Swissprot: genes coding for proteins
  • Swissprot < EPD: all eukaryotic proteins for which
    the promoter is further characterised
  • Swissprot > Prosite > Swissprot: a single protein
    is expanded by all members of its family (find
    all similar sequences)
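
The behaviour of the two link operators in the queries above can be imitated with plain sets. This is a sketch of the semantics only, not of how SRS indexes links; the `LINKS` table, the entry identifiers and the function names `gt` and `lt` are hypothetical.

```python
# Hypothetical indexed links, stored once per databank pair as
# (id in first databank, id in second databank).
LINKS = {
    ("Swissprot", "Prosite"): {("P1", "PS1"), ("P2", "PS1"), ("P3", "PS2")},
}


def linked(links, a_db, b_db):
    """Return all links between a_db and b_db as (id in a_db, id in b_db)."""
    pairs = set(links.get((a_db, b_db), ()))
    pairs |= {(y, x) for (x, y) in links.get((b_db, a_db), ())}
    return pairs


def gt(links, a_db, a_ids, b_db):
    """A > B: entries of b_db linked to any of the given a_db entries."""
    return {b for (a, b) in linked(links, a_db, b_db) if a in a_ids}


def lt(links, a_db, a_ids, b_db, b_ids):
    """A < B: the given a_db entries that are linked to any given b_db entry."""
    return {a for (a, b) in linked(links, a_db, b_db)
            if a in a_ids and b in b_ids}


# Swissprot > Prosite > Swissprot: expand one protein to its whole family.
patterns = gt(LINKS, "Swissprot", {"P1"}, "Prosite")
family = gt(LINKS, "Prosite", patterns, "Swissprot")
print(sorted(family))
```

Because the link table can be read in either direction once it is indexed, the same data answers both `>` and `<` queries, which is what separates indexed links from one-way hypertext links.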

14
Integration Supports Enquiry
TIGR
HSSP
PDB
SwissProt
PATHWAY
ENZYME
All H. pylori genes encoding membrane-bound
proteins, involved in glucose metabolism, and
with a homologue of known 3D structure
with resolution better than 2 Å
15
Solving a murder case
  • Ideal: we find the knife with blood stains of the
    victim and the fingerprints of the murderer
  • More probable: circumstantial evidence
  • Murder weapon, but no fingerprints.
  • A cracked watch belonging to the victim, probably
    showing the precise time of the murder.
  • A suspect denies knowing the victim, but has the
    victim's address in his address book.
  • Alibi of suspect not watertight (no witnesses).
  • Witness saw a person of the same height as the suspect
    disappear from the victim's house shortly after the
    estimated time of murder, but it was too dark to
    see more.

16
Scientific discovery
  • Ideal: information for a new drug target
    contained in a single SWISS-PROT entry.
  • If it is that easy, someone else has found it
    already.
  • Real world: concepts are often distributed over
    many databanks, e.g., transcription regulation,
    gene, protein family.

17
Different Degrees of Databank Interoperation
X03456
  • hypertext links
  • indexed links
  • composite structures

new data structure
18
The Object Loader
19
Adding a genome to genomeSCOUT
  • Input is the genome sequence and a list of
    putative genes
  • Computation of protein function prediction
    (bioSCOUT)
  • Computation of orthologs for each genome pairing
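
Since orthologs are computed for each genome pairing, adding one genome to a collection of n genomes means n new pairings, and the total grows as n(n-1)/2. The sketch below only enumerates the pairings and says nothing about how genomeSCOUT computes orthologs; apart from H. pylori, the genome names are placeholders.

```python
from itertools import combinations


def all_pairings(genomes):
    """Every unordered pair of genomes in the collection."""
    return list(combinations(genomes, 2))


def new_pairings(existing, added):
    """Pairings that must be computed when one genome is added."""
    return [(other, added) for other in existing]


existing = ["H. pylori", "genome B", "genome C"]  # placeholder names
print(len(all_pairings(existing)))
print(new_pairings(existing, "genome D"))
print(len(all_pairings(existing + ["genome D"])))
```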

20
bioSCOUT: automating the pipeline
21
Searching genomeSCOUT databanks
22
Identification of Orthologs
23
Comparative Pathway View
Straightforward analysis of pathways and enzymes
in any selection of organisms
24
Applications in SRS
  • How many members of the TM4 family did I
    find?
  • Did I find any enzymes in the
    phenylalanine pathway?
  • Remove all viral
    sequences from my hit list
25
Comparing Results of Different Search Methods
  • Run a BLAST search
  • Run a FASTA search with the same sequence
  • Use links to
      • create a view that shows both result types
      • ask questions like "Give me all hits found by
        BLAST and FASTA"

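[Figure: BLAST and FASTA result sets both link to SWISS-PROT. A combined table lists each hit (1, 2, 3) with a FASTA column and a BLAST column.]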
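
Once both result sets link to the same SWISS-PROT entries, the question "give me all hits found by BLAST and FASTA" is a set intersection, and the combined view is a union with one column per method. This is a minimal sketch with hypothetical hit identifiers and scores; it does not run or parse real BLAST or FASTA output.

```python
# Hypothetical results: SWISS-PROT entry id mapped to the score each tool gave.
blast_hits = {"P1": 250.0, "P2": 180.5, "P4": 90.0}
fasta_hits = {"P1": 310.0, "P3": 120.0, "P4": 75.5}


def compare_hits(blast, fasta):
    """Split hit ids into those found by both tools and by only one."""
    return {
        "both": sorted(blast.keys() & fasta.keys()),
        "blast_only": sorted(blast.keys() - fasta.keys()),
        "fasta_only": sorted(fasta.keys() - blast.keys()),
    }


def combined_view(blast, fasta):
    """One row per hit: (id, BLAST score or None, FASTA score or None)."""
    return [(hit, blast.get(hit), fasta.get(hit))
            for hit in sorted(blast.keys() | fasta.keys())]


print(compare_hits(blast_hits, fasta_hits)["both"])
for row in combined_view(blast_hits, fasta_hits):
    print(row)
```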
26
The Blast results viewer
27
The protein story
28
Selective Display of Protein/Protein Interactions
29
Inspecting the evidence
30
Scalability is important
[Figure: frustration level plotted against the number of databanks (1 to 200) for SystemA, SystemB and SRS, with the mloft threshold marked.]
mloft: Maximum Level Of Frustration Tolerance
31
The system must be kept up to date
  • Obtain and install the latest versions of all
    involved databanks
  • Regenerate all derived databanks (e.g., FASTA
    files, nonredundant sequence databanks)
  • Update links between databanks
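
The three update steps depend on one another, so an automated updater has to run them in dependency order. The sketch below expresses that with Python's standard `graphlib` module (Python 3.9 or later). The task names and the dependency edges are assumptions read off the order of the bullets, and this is not a description of how SRS Prisma works.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish first.
# The edges are an assumption based on the order of the steps above.
TASKS = {
    "install_databanks": set(),
    "regenerate_derived": {"install_databanks"},
    "update_links": {"install_databanks", "regenerate_derived"},
}


def update_order(tasks):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(tasks).static_order())


for step in update_order(TASKS):
    print(step)
```

With many databanks the same structure scales naturally: each derived databank lists the primary databanks it is built from, and independent branches could be updated in parallel.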

32
Complete Automation with SRS Prisma
33
LION's Company Profile
  • Establishment: March 1997 - Heidelberg, Germany
  • IPO: August 2000
  • Employees: > 420
  • Locations: Heidelberg, Germany; Cambridge,
    UK; Cambridge MA, USA;
    San Diego (formerly Trega)
  • Represented by CTC in Japan
  • Revenues: 1. Software product licenses 2.
    Comprehensive Bio-IT solutions 3. Drug
    discovery partnerships