Overview of the Pathway Tools Software and Pathway/Genome Databases

1
Overview of the Pathway Tools Software and
Pathway/Genome Databases
2
Introductions
  • BRG Staff
  • Peter Karp
  • Tomer Altman
  • Joe Dale
  • Fred Gilham
  • John Myers
  • Suzanne Paley
  • Markus Krummenacker
  • Ingrid Keseler
  • Ron Caspi
  • Alex Shearer
  • Carol Fulcher
  • Attendees
  • Where from, what genome?
  • What do you hope to get out of the tutorial?

3
SRI International
  • Private nonprofit research institute
  • No permanent funding sources
  • 1300 staff in Menlo Park
  • Founded in 1946 as Stanford Research Institute
  • Separated from Stanford University in 1970
  • Name changed to SRI International in 1977

4
SRI Organization
Information and Computing Sciences
Bioinformatics Research Group
Engineering Systems And Sciences
Biopharmaceuticals And Pharmaceutical Discovery
Physical Sciences
Education and Policy
5
Research in the SRI Bioinformatics Research Group
  • BioCyc Database Collection
  • EcoCyc
  • MetaCyc
  • Pathway Tools
  • BioWarehouse

6
Outline for Tutorial
  • Monday
  • Introduction
  • Pathway/Genome Navigator
  • Introduction to Pathway/Genome Editors
  • Tuesday
  • PathoLogic tutorial
  • PathoLogic lab session: Build initial version of
    PGDB
  • Pathway hole filler lecture/lab
  • Wednesday
  • PathoLogic: Creating protein complexes, operon
    predictor, transport inference parser
  • Pathway Tools Schema
  • Model organism database projects
  • Thursday
  • Advanced Pathway/Genome Editors
  • Friday
  • Overviews and Omics Viewers
  • Comparative analysis
  • Structured Advanced Query Form
  • Metabolite Tracing

7
Outline for Tutorial
  • Monday
  • Introduction
  • Pathway/Genome Navigator
  • Metabolite Tracing
  • Omics Viewers
  • Tuesday
  • PathoLogic tutorial
  • PathoLogic lab session: Build initial version of
    PGDB
  • Pathway hole filler (run overnight)
  • Wednesday
  • PathoLogic: Creating protein complexes, operon
    predictor, transport inference parser
  • Pathway Tools Schema
  • Model organism database projects
  • Thursday
  • Editors
  • Feedback session
  • Friday
  • Writing programs to access and modify PGDBs

8
Tutorial Goals
  • General familiarity with Pathway Tools goals and
    functionality
  • Ability to create, edit, and navigate a new PGDB
  • Create new PGDB for genome(s) you brought with
    you
  • Familiarity with information resources available
    about Pathway Tools to continue your work

9
SRI's Support for Pathway Tools
  • NIH grant finances software development and user
    support
  • Additional grants finance other software
    development
  • Email us bug reports, suggestions, questions
  • Comprehensive bug reports are required for us to
    fix the problem you reported
  • Keep us posted regarding your progress

10
Administrative Details
  • Please wear badge at all times
  • Escort required outside this room/hallway
  • Let us know when you are leaving
  • Use E-Bldg Entrance
  • Phone numbers to call from entrance
  • Meals
  • Restrooms

11
Tutorial Format
  • Questions welcome during presentations
  • Lab sessions will take different amounts of time
    for different people
  • Refine your PGDB
  • Read Pathway Tools manuals
  • Computer logins
  • Internet connectivity

12
Pathway/Genome Database
[Diagram: contents of a Pathway/Genome Database for a cell. Labels: Pathways; Reactions; Compounds; Sequence Features; Proteins; RNAs; Operons; Promoters; DNA Binding Sites; Regulatory Interactions; Genes; Chromosomes; Plasmids]
13
BioCyc Collection of Pathway/Genome Databases
  • Pathway/Genome Database (PGDB) combines
    information about
  • Pathways, reactions, substrates
  • Enzymes, transporters
  • Genes, replicons
  • Transcription factors/sites, promoters, operons
  • Tier 1: Literature-Derived PGDBs
  • MetaCyc
  • EcoCyc -- Escherichia coli K-12
  • Tier 2: Computationally-derived DBs, Some
    Curation -- 20 PGDBs
  • HumanCyc
  • Mycobacterium tuberculosis
  • Tier 3: Computationally-derived DBs, No Curation
    -- 349 DBs

14
Terminology: Pathway Tools Software
  • PathoLogic
  • Predicts operons, metabolic network, and pathway
    hole fillers from the genome
  • Computational creation of new Pathway/Genome
    Databases
  • Pathway/Genome Editors
  • Distributed curation of PGDBs
  • Distributed object database system, interactive
    editing tools
  • Pathway/Genome Navigator
  • WWW publishing of PGDBs
  • Querying, visualization of pathways, chromosomes,
    operons
  • Analysis operations
  • Pathway visualization of gene-expression data
  • Global comparisons of metabolic networks

Bioinformatics 18:S225 2002
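
The notion of a pathway hole (a reaction in a predicted pathway that has no enzyme assigned) can be illustrated with a minimal Python sketch. The function name, the reaction identifiers, and the gene names are hypothetical. The sketch only locates holes; it does not attempt what the PathoLogic hole filler does, which is to propose candidate genes for them.

```python
def find_pathway_holes(pathway_reactions, enzyme_assignments):
    """Return the reactions of a pathway that have no enzyme assigned.

    pathway_reactions: list of reaction identifiers (hypothetical names).
    enzyme_assignments: dict mapping a reaction identifier to a list of
        genes whose products catalyze it; missing or empty means a hole.
    """
    return [rxn for rxn in pathway_reactions
            if not enzyme_assignments.get(rxn)]


# Toy data for illustration only; not taken from any real PGDB.
pathway = ["RXN-A", "RXN-B", "RXN-C"]
assignments = {"RXN-A": ["gene1"], "RXN-C": []}

print(find_pathway_holes(pathway, assignments))
```

Here RXN-B has no entry at all and RXN-C has an empty gene list, so both are reported as holes.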
15
Pathway Tools Software: PGDBs Created Outside SRI
  • 1000 licensees; 75 groups applying software to
    150 organisms
  • Saccharomyces cerevisiae, SGD project, Stanford
    University
  • pathway.yeastgenome.org/biocyc/
  • Mouse, MGD, Jackson Laboratory
  • dictyBase, Northwestern University
  • Under development
  • CGD (Candida albicans), Stanford University
  • Drosophila, P. Ebert in collaboration with
    FlyBase
  • C. elegans, P. Ebert in collaboration with
    WormBase
  • Planned
  • RGD (Rat), Medical College of Wisconsin
  • Arabidopsis thaliana, TAIR, Carnegie Institution
    of Washington
  • Tomato and Potato, Cornell University
  • GrameneDB, Cold Spring Harbor Laboratory
  • Medicago truncatula, Samuel Roberts Noble
    Foundation

16
Pathway Tools Software: PGDBs Created Outside SRI
  • NIAID BRCs: BioHealthBase (M. tuberculosis, F.
    tularensis), PATRIC, ApiDB (Cryptosporidium)
  • F. Brinkman, Simon Fraser Univ, Pseudomonas
    aeruginosa
  • V. Schachter, Genoscope, Acinetobacter
  • M. Bibb, John Innes Centre, Streptomyces
    coelicolor
  • G. Church, Harvard, Prochlorococcus marinus,
    multiple strains
  • E. Uberbacher, ORNL and G. Serres, MBL,
    Shewanella oneidensis
  • R.J.S. Baerends, University of Groningen,
    Lactococcus lactis IL1403, Lactococcus lactis
    MG1363, Streptococcus pneumoniae TIGR4, Bacillus
    subtilis 168, Bacillus cereus ATCC14579
  • Matthew Berriman, Sanger Centre, Trypanosoma
    brucei, Leishmania major
  • Herbert Chiang, Washington University,
    Bacteroides thetaiotaomicron
  • Sergio Encarnacion, UNAM, Sinorhizobium meliloti
  • Gregory Fournier, MIT, Mesoplasma florum
  • Mark van der Giezen, University of London,
    Entamoeba histolytica, Giardia intestinalis
  • Michael Göttfert, Technische Universität Dresden,
    Bradyrhizobium japonicum
  • Artiva Maria Goudel, Universidade Federal de
    Santa Catarina, Brazil, Chromobacterium violaceum
    ATCC 12472
  • Kenneth J. Kauffman, University of California,
    Riverside, Desulfovibrio vulgaris

17
Pathway Tools Software: PGDBs Created Outside SRI
  • Mike McLeod, University of British Columbia,
    Rhodococcus sp. RHA1
  • Robert S. Munson, Children's Research Institute,
    Ohio, Haemophilus ducreyi, Haemophilus influenzae
    86-026NP
  • John Nash, Canadian NRC, Campylobacter jejuni
  • Christopher S. Reigstad, Washington University,
    Escherichia coli UTI89
  • Haluk Resat, Pacific Northwest Lab, Rhodobacter
    sphaeroides
  • Gary Xie, Los Alamos Lab, Bacillus cereus
  • Large scale users
  • C. Medigue, Genoscope, 107 PGDBs
  • G. Burger, U Montreal, 48 PGDBs
  • Bart Weimer, Utah State University, Lactococcus
    lactis, Brevibacterium linens, Lactobacillus
    acidophilus, Lactobacillus plantarum,
    Lactobacillus johnsonii, Listeria monocytogenes
  • Partial listing of outside PGDBs at BioCyc.org

18
Terminology
  • Database (DB); Knowledge Base (KB);
    Pathway/Genome Database (PGDB)

19
Why Create PGDBs?
  • Extract more information from your genome
  • Create an up-to-date computable information
    repository about an organism
  • Perform analyses on the genome and pathway
    complement of the organism
  • Analyses of omics data
  • Analyses of cellular systems (dead-end
    metabolites)
  • Reports generated by Pathway Tools
  • Perform comparative analyses with other organisms
  • Generate a genome poster and metabolic wall chart
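
One of the analyses listed above, finding dead-end metabolites, can be illustrated with a minimal Python sketch. It assumes a deliberately simplified definition (a compound that is only produced or only consumed in the network) and a toy, hypothetical reaction list. It is not how Pathway Tools implements the analysis, which may apply further criteria.

```python
def dead_end_metabolites(reactions):
    """Return compounds that are only produced or only consumed.

    reactions: iterable of (substrates, products) pairs, each a set of
    compound names. Simplified definition, for illustration only.
    """
    produced, consumed = set(), set()
    for substrates, products in reactions:
        consumed |= set(substrates)
        produced |= set(products)
    # A compound on exactly one side of the network is a dead end here.
    return sorted(produced ^ consumed)


# Toy network (hypothetical compounds): A -> B, B -> C, B -> D
toy_network = [
    ({"A"}, {"B"}),
    ({"B"}, {"C"}),
    ({"B"}, {"D"}),
]

print(dead_end_metabolites(toy_network))
```

In this toy network only B is both produced and consumed, so A, C, and D are reported. A real analysis would normally exclude known nutrients and secreted products.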

20
Sequence Project Workflow
[Diagram: sequencing pipeline (Raw Sequence; Phred; Phrap; GeneMark/Glimmer; BLAST, BLOCKS) shown alongside Pathway Tools (PathoLogic; P/G Editors; P/G Navigator), with WWW Publishing and Analyses]
21
EcoCyc E. coli Dataset
Pathway/Genome Navigator
URL: EcoCyc.org
Pathways: 205
Reactions: 4,956; Metabolic: 993; Transport: 235
Compounds: 1,187
Citations: 15,880
Proteins: 4,316; RNAs: 277
Gene Regulation: Operons 3,133; Trans Factors 172; Promoters 1,649; TF Binding Sites 1,770
Genes: 4,516
22
EcoCyc Project: EcoCyc.org
  • E. coli Encyclopedia
  • Review-level Model-Organism Database for E. coli
  • Tracks evolving annotation of the E. coli genome
    and cellular networks
  • The two paradigms of EcoCyc
  • Multi-dimensional annotation of the E. coli K-12
    genome
  • Positions of genes; functions of gene products
    76 / 66 exp
  • Gene Ontology terms; MultiFun terms
  • Gene product summaries and literature citations
  • Evidence codes
  • Multimeric complexes
  • Metabolic pathways
  • Regulation of transcription initiation

Karp, Gunsalus, Collado-Vides, Paulsen
Nuc. Acids Res. 35:7577 2007; ASM News 70:25 2004; Science 293:2040
23
Paradigm 1: EcoCyc as Textual Review Article
  • All gene products for which experimental
    literature exists are curated with a minireview
    summary
  • Found on protein and RNA pages, not gene pages!
  • 3257 gene products contain summaries
  • Summaries cover function, interactions, mutant
    phenotypes, crystal structures, regulation, and
    more
  • Additional summaries found in pages for operons,
    pathways
  • EcoCyc cites 15,880 publications

24
Summaries in Gene Products
25
Paradigm 2: EcoCyc as Computational Symbolic
Theory
  • Highly structured, high-fidelity knowledge
    representation provides computable information
  • Each molecular species defined as a DB object
  • Genes, proteins, small molecules
  • Each molecular interaction defined as a DB object
  • Metabolic reactions
  • Transport reactions
  • Transcriptional regulation of gene expression
  • 220 database fields capture extensive properties
    and relationships
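
The idea that each molecular species and each interaction is its own database object can be sketched in Python as follows. This is an illustration only: the class names, field names, and identifiers are hypothetical and far simpler than the actual Pathway Tools schema.

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    """A molecular species stored as an object (hypothetical fields)."""
    frame_id: str
    common_name: str
    gene_id: str


@dataclass
class Reaction:
    """A molecular interaction stored as an object (hypothetical fields)."""
    frame_id: str
    left: list
    right: list
    enzymes: list = field(default_factory=list)  # Protein frame_ids


def reactions_of_gene(gene_id, proteins, reactions):
    """Follow object links: gene -> its proteins -> reactions they catalyze."""
    protein_ids = {p.frame_id for p in proteins if p.gene_id == gene_id}
    return [r.frame_id for r in reactions if protein_ids & set(r.enzymes)]


# Toy objects with made-up identifiers.
proteins = [Protein("PROT-1", "example kinase", "GENE-1")]
reactions = [
    Reaction("RXN-1", ["CPD-1"], ["CPD-2"], enzymes=["PROT-1"]),
    Reaction("RXN-2", ["CPD-2"], ["CPD-3"]),
]

print(reactions_of_gene("GENE-1", proteins, reactions))
```

Because relationships are stored as explicit fields, a question such as "which reactions depend on this gene?" becomes a short traversal rather than a text search.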

26
EcoCyc Procedures
  • DB updates performed by 5 staff curators
  • Information gathered from biomedical literature
  • Enter data into structured database fields
  • Author extensive summaries
  • Update evidence codes
  • Corrections submitted by E. coli researchers
  • Four releases per year
  • Quality assurance of data and software
  • Evaluate database consistency constraints
  • Perform element balancing of reactions
  • Run other checking programs
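
The element-balancing check listed above can be sketched in a few lines of Python. This is a minimal illustration, assuming flat chemical formulas without parentheses or charges, and neutral formulas chosen for the example. The function names are hypothetical, and this is not the EcoCyc checking program.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a flat formula such as 'C6H12O6' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(formulas):
    """Sum the atom counts over all compounds on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total.update(parse_formula(formula))
    return total


def is_balanced(left, right):
    """True if both sides of the reaction contain the same atoms."""
    return side_total(left) == side_total(right)


# glucose + ATP -> glucose 6-phosphate + ADP (neutral formulas)
left = ["C6H12O6", "C10H16N5O13P3"]
right = ["C6H13O9P", "C10H15N5O10P2"]

print(is_balanced(left, right))
```

Dropping ADP from the right-hand side makes the check fail, which is the kind of data-entry error such a check is meant to catch.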

27
MetaCyc: Metabolic Encyclopedia
  • Describe a representative sample of every
    experimentally determined metabolic pathway
  • Describe properties of metabolic enzymes
  • Literature-based DB with extensive references and
    commentary
  • Pathways, reactions, enzymes, substrates
  • Jointly developed by
  • P. Karp, R. Caspi, C. Fulcher, SRI International
  • L. Mueller, A. Pujar, Cornell Univ
  • S. Rhee, P. Zhang, Carnegie Institution

Nucleic Acids Research 2008
28
MetaCyc Data -- Version 11.6
Pathways: 1,010
Reactions: 6,576
Enzymes: 4,582
Small Molecules: 6,561
Organisms: 1,077
Citations: 15,875
29
Taxonomic Distribution of MetaCyc Pathways
Bacteria: 517
Green Plants: 372
Mammals: 90
Fungi: 89
Archaea: 65
30
Family of Pathway/Genome Databases
31
Comparison of BioCyc to KEGG: The Data
  • KEGG approach: Static collection of pathway
    diagrams that are color-coded to produce
    organism-specific views
  • KEGG vs MetaCyc: Resource on literature-derived
    pathways
  • KEGG pathway maps are composites of pathways in
    many organisms -- they do not identify which
    pathways were elucidated in which organisms
  • KEGG pathway maps encompass multiple biological
    pathways and are 2-4 times the size of MetaCyc
    pathways
  • KEGG has no literature citations, no summaries,
    less enzyme detail
  • KEGG vs BioCyc: organism-specific PGDBs
  • KEGG re-annotates entire genome for each organism
  • KEGG does not curate or customize pathway
    networks for each organism

32
Comparison of Pathway Tools to KEGG: The Software
  • KEGG has no pathway hole filler or transport
    inference parser or operon predictor
  • KEGG has no interactive editing tools: you
    cannot refine a KEGG pathway DB
  • KEGG has no algorithmic visualization tools:
    pathway diagrams are pre-drawn
  • May become out of date
  • Cannot show pathways at multiple detail levels
  • KEGG genome browser has very limited
    functionality
  • KEGG has one overview diagram with limited
    functionality
  • KEGG has no metabolite tracing tool
  • KEGG has no Structured Advanced Query Tool

33
Overviews and Omics Viewers
  • Genome-scale Visualizations
  • Metabolic map
  • Transcriptional regulatory network
  • Genome map
  • Overlay gene expression, proteomics, metabolomics
    data
  • Obtain pathway based visualizations of omics data
  • Numerical spectrum of expression values mapped to
    a color spectrum
  • Steps of overview painted with color
    corresponding to expression level(s) of genes
    that encode enzyme(s) for that step
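
The mapping from a numerical spectrum of expression values to a color spectrum can be sketched in Python as a linear interpolation. This is only an illustration: the blue-to-red scale, the clamping behavior, and the function name are assumptions, not the actual Omics Viewer color scheme.

```python
def value_to_rgb(value, vmin, vmax, low=(0, 0, 255), high=(255, 0, 0)):
    """Map an expression value to an RGB color by linear interpolation.

    Values outside [vmin, vmax] are clamped to the nearest end color.
    The default colors (blue for low, red for high) are an assumption.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    clamped = min(max(value, vmin), vmax)
    t = (clamped - vmin) / (vmax - vmin)  # 0.0 at vmin, 1.0 at vmax
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))


# Hypothetical log-ratio expression values for the genes of three steps.
step_expression = {"step-1": -2.0, "step-2": 0.5, "step-3": 2.0}

for step, level in step_expression.items():
    print(step, value_to_rgb(level, vmin=-2.0, vmax=2.0))
```

Each step of the overview would then be painted with the color computed for the gene or genes encoding its enzymes.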


34
Environment for Computational Exploration of
Genomes
  • Powerful ontology opens many facets of the
    biology to computational exploration
  • Global characterization of metabolic network
  • Analysis of interface between transport and
    metabolism
  • Nutrient analysis of metabolic network

35
Pathway Tools Implementation Details
  • Allegro Common Lisp
  • Sun, Linux, Windows, Macintosh platforms
  • Ocelot object database
  • 370,000 lines of code
  • Lisp-based WWW server at BioCyc.org
  • Manages 370 PGDBs

36
The Common Lisp Programming Environment
  • Gat studied Lisp and Java implementations of 16
    programs by 14 programmers (Intelligence 11:21
    2000)

37
Peter Norvig's Solution
  • I wrote my version in Lisp. It took me about 2
    hours (compared to a range of 2-8.5 hours for the
    other Lisp programmers in the study, 3-25 for
    C/C++ and 4-63 for Java) and I ended up with 45
    non-comment non-blank lines (compared with a
    range of 51-182 for Lisp, and 107-614 for the
    other languages). (That means that some Java
    programmer was spending 13 lines and 84 minutes
    to provide the functionality of each line of my
    Lisp program.)
  • http://www.norvig.com/java-lisp.html
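
The "13 lines and 84 minutes" figures in the quote can be reproduced from the numbers it gives. The short Python check below assumes that the upper ends of the quoted ranges (614 lines, 63 hours) belong to the same Java program, which the quote implies but does not state outright.

```python
norvig_lines = 45        # non-comment, non-blank lines in the Lisp version
max_other_lines = 614    # upper end of the range for the other languages
max_java_hours = 63      # upper end of the Java development-time range

# Lines of the longest program per line of the 45-line Lisp program.
lines_per_lisp_line = max_other_lines / norvig_lines

# Minutes of the slowest Java effort per line of the Lisp program.
minutes_per_lisp_line = max_java_hours * 60 / norvig_lines

print(f"{lines_per_lisp_line:.1f} lines per Lisp line")
print(f"{minutes_per_lisp_line:.0f} minutes per Lisp line")
```

The ratio of lines comes out at a little over 13, which the quote rounds down, and the time works out to 84 minutes.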

38
Survey
  • Please complete survey at end of each day

39
PGDB(s) That You Build
  • Before you leave
  • Tar up your PGDB directory and FTP it home, email
    it home, or copy it to flash disk
  • We will create a backup copy of your PGDB
    directory if the directory is still there at the
    end of the tutorial
  • Delete the PGDB directory if you don't want us to
    back it up
  • We will not give the backed up data to anyone else
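
Archiving a PGDB directory before copying it home can be done with the tar command or, as a minimal sketch, with Python's standard tarfile module. The function name and the example paths are hypothetical; substitute the actual location of your PGDB directory.

```python
import tarfile
from pathlib import Path


def archive_pgdb(pgdb_dir, archive_path):
    """Write a gzip-compressed tar archive of a PGDB directory.

    pgdb_dir: path of the directory to archive (hypothetical location).
    archive_path: where to write the .tar.gz file.
    """
    pgdb_dir = Path(pgdb_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps only the directory name inside the archive,
        # not the full path on the tutorial machine.
        tar.add(str(pgdb_dir), arcname=pgdb_dir.name)
    return archive_path


# Example call (paths are placeholders, not real tutorial locations):
# archive_pgdb("/path/to/mycyc", "mycyc.tar.gz")
```

The resulting file can then be transferred by FTP or email, or copied to a flash disk.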

40
Information Sources
  • Pathway Tools User's Guide
  • /root/aic-export/pathway-tools/ptools/11.5/doc/manuals/userguide.pdf
  • NOTE: Location of the aic-export directory can
    vary across different computers
  • Pathway Tools Web Site
  • Publications, FAQ, programming examples, etc.
  • http://bioinformatics.ai.sri.com/ptools/
  • BioCyc Publications Page
  • http://biocyc.org/publications.shtml
  • MetaCyc Guide
  • http://metacyc.org/MetaCycUserGuide.shtml
  • Slides from this tutorial
  • http://bioinformatics.ai.sri.com/ptools/tutorial/
  • BioCyc Webinars
  • http://biocyc.org/webinar.shtml

41
Reporting Pathway Tools Problems
  • ptools-support@ai.sri.com
  • Tell us
  • What platform you are running on
  • What version of Pathway Tools you are running
  • The error message
  • Result of [1] EC(2): :zoom :count :all
  • What operation were you performing when the error
    occurred?
  • New patches automatically downloaded and loaded
    when PTools starts up
  • Auto-Patch
  • Tools -> Instant Patch -> Download and Activate
    All Patches

42
Summary
  • Pathway Tools and Pathway/Genome Databases
  • Not just for pathways!
  • Computational inferences
  • Operons, metabolic pathways, pathway hole fillers
  • Editing tools
  • Analysis tools: Omics data on pathways
  • Web publishing of PGDBs
  • Main classes of users
  • Develop PGDB to extract more information from
    genome for genome paper
  • Develop a model-organism DB for the organism that
    is updated regularly and published on the web