Pathway Tools Schema and Semantic Inference Layer: Genes, Operons, and Replicons

Description: Computing with Pathway/Genome Databases

Slides: 65
Provided by: PeterK200


1
Pathway Tools Schema and Semantic Inference Layer: Genes, Operons, and Replicons
2
Representing a Genome
[Diagram: frames ORG, CHROM1, CHROM2, PLASMID1, Gene1, Gene2, Gene3, and Product1, connected by the slots genome, components, and product]
  • Classes
  • ORG is of class Organisms
  • CHROM1 is of class Chromosomes
  • PLASMID1 is of class Plasmids
  • Gene1 is of class Genes
  • Product1 is of class Polypeptides or RNA

3
    ;; Return the components of replicon CHROM that are instances of Genes.
    (defun genes-of-chrom (chrom)
      (loop for x in (get-slot-values chrom 'components)
            when (instance-all-instance-of-p x '|Genes|)
            collect x))

4
Polynucleotides
Review slots of COLI and of COLI-K12
5
Genetic-Elements
  • Sequence is stored in a separate file or database
    table

6
Polymer-Segments
Review slots of Genes
7
Complexities of Gene / Gene-Product Relationships
  • The Product of a gene can be an instance of
    Polypeptides or RNAs
  • An instance of Polypeptides can have more than
    one gene encoding it
  • Sequence position
  • Nucleotide positions of the starting and ending
    codons are specified in Left-End-Position and
    Right-End-Position (the latter is usually greater,
    except at the origin)
  • Transcription-Direction: + / -
  • Alternative splicing
  • Nucleotide positions of the starting and ending
    codons are specified in Left-End-Position and
    Right-End-Position
  • Intron positions are specified in
    Splice-Form-Introns of the gene product
  • E.g. (200 300) (350 400)
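
The relationship between the end positions and the intron list can be sketched in plain Common Lisp. This is only an illustration: exon-ranges is a hypothetical helper, not a Pathway Tools function, and it assumes inclusive coordinates, introns sorted by start position, and made-up gene end positions of 100 and 500.

```lisp
;; Hypothetical helper, not part of the Pathway Tools API.
;; Assumes inclusive coordinates and introns sorted by start position.
(defun exon-ranges (left-end right-end introns)
  "Return the (start end) ranges between LEFT-END and RIGHT-END
that are not covered by any of INTRONS."
  (let ((start left-end)
        (exons '()))
    (dolist (intron introns)
      (destructuring-bind (intron-start intron-end) intron
        ;; Everything from START up to the intron is an exon.
        (when (< start intron-start)
          (push (list start (1- intron-start)) exons))
        (setf start (1+ intron-end))))
    ;; The stretch after the last intron is the final exon.
    (when (<= start right-end)
      (push (list start right-end) exons))
    (nreverse exons)))

;; Gene end positions 100 and 500 are invented for the example.
(exon-ranges 100 500 '((200 300) (350 400)))
;; => ((100 199) (301 349) (401 500))
```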

8
Gene Reaction Schematic
9
Substring Search Example
  • Find all genes that contain a given substring
    within their common name or synonym list.

    (defun find-gene-by-substring (substring)
      (let (result)
        (loop for g in (get-class-all-instances '|Genes|)
              do (loop for name in (get-slot-values g 'names)
                       ;; Case-insensitive substring match
                       when (search substring name :test #'string-equal)
                       do (pushnew g result)))
        result))

10
Proteins
11
Proteins and Protein Complexes
  • Polypeptide: the monomer protein product of a
    gene (may have multiple isoforms, as indicated at
    gene level)
  • Protein complex: a protein consisting of multiple
    polypeptides or protein complexes
  • Example: DNA pol III
  • DnaE is a polypeptide
  • pol III core is DnaE and two other polypeptides
  • pol III holoenzyme is several protein complexes
    combined

12
Protein Complex Relationships
13
Slots of a protein (DnaE)
  • catalyzes
  • Is it a regulator/reactant/etc?
  • comment
  • component-of
  • dblinks
  • features (edited in feature editor)
  • Many other attributes possible

14
A complex at the frame level (pol III)
  • Most of the same attributes as polypeptide frame
  • component-of and components
  • note coefficients

15
Protein Complex Relationships
16
Relationships are Defined in Many Places
  • component-of comes from creating a complex
  • appears-in-left-side-of comes from defining a
    reaction (as do modified forms)
  • regulates comes from an enzymatic reaction or TU
  • can only edit dna-footprint if protein has been
    associated with a TU

17
Semantic Inference Layer
  • Reactions-of-protein (prot)
  • Returns a list of rxns this protein catalyzes
  • Transcription-units-of-proteins(prot)
  • Returns a list of TUs activated/inhibited by the
    given protein
  • Transporter? (prot)
  • Is this protein a transporter?
  • Polypeptide-or-homomultimer?(prot)
  • Transcription-factor? (prot)
  • Obtain-protein-stats
  • Returns 5 values
  • Length of all-polypeptides, complexes,
    transporters, enzymes, etc

18
Example
  • Find all enzymes that use pyridoxal phosphate as
    a cofactor or prosthetic group

    (loop for protein in (get-class-all-instances '|Proteins|)
          for enzrxn = (get-slot-value protein 'enzymatic-reaction)
          when (and enzrxn
                    (or (member-slot-value-p enzrxn 'cofactors
                                             'pyridoxal_phosphate)
                        (member-slot-value-p enzrxn 'prosthetic-groups
                                             'pyridoxal_phosphate)))
          collect protein)

  • (member-slot-value-p frame slot value): returns T
    if Value is one of the values of Slot of Frame.

19
Sample
  • Find all proteins without a comment anywhere

20
RNAs
21
RNAs
  • PGDBs only represent RNAs that are terminal gene
    products
  • tRNAs
  • rRNAs
  • Regulatory RNAs
  • Miscellaneous small RNAs
  • Slots similar to proteins
  • tRNAs can have an anticodon

22
(No Transcript)
23
The RNA Ontology
24
Compounds / Reactions / Pathways
25
Compounds / Reactions / Pathways
  • Think of a three tiered structure
  • Reactions built on top of compounds
  • Pathways built on top of reactions
  • Metabolic network is defined by reactions alone;
    pathways are an additional, optional structure
  • Some reactions not part of a pathway
  • Some reactions have no attached enzyme
  • Some enzymes have no attached gene

26
Compounds
  • Relatively few aspects of a compound defined
    within the compound editor
  • MW, formula calculated from edited structure
  • Most aspects defined in other editors
  • Pathway reactions come from reaction editing
    followed by pathway editing
  • Activator, etc. come from the protein editor

27
(No Transcript)
28
(print-frame 'TRP)
--- Instance TRP ---
Types: Amino-Acid, Aromatic-Amino-Acids, Non-polar-amino-acids
APPEARS-IN-LEFT-SIDE-OF: RXN0-287, TRANS-RXN-76, TRYPTOPHAN-RXN, TRYPTOPHAN--TRNA-LIGASE-RXN
APPEARS-IN-RIGHT-SIDE-OF: RXN0-2382, RXN0-301, TRANS-RXN-76, TRYPSYN-RXN
CHEMICAL-FORMULA: (C 11), (H 12), (N 2), (O 2)
COMMON-NAME: "L-tryptophan"
DBLINKS: (LIGAND-CPD "C00078" NIL kaipa 3311532640 NIL NIL), (CAS "6912-86-3"), (CAS "73-22-3")
NAMES: "L-tryptophan", "W", "tryptacin", "trofan", "trp", "tryptophan", "2-amino-3-indolylpropanic acid"
SMILES: "c1(c(CC(N)C(O)O)c2(c(nH1)cccc2))"
SYNONYMS: "W", "tryptacin", "trofan", "trp", "tryptophan", "2-amino-3-indolylpropanic acid"
29
Where is diphosphate in the ontology?
30
Semantic Inference Layer
  • Reactions-of-compound (cpd)
  • Pathways-of-compound (cpd)
  • Is-substrate-an-autocatalytic-enzyme-p (cpd)
  • Activated/inhibited-by? (cpds slots)
  • Returns a list of enzrxns for which a cpd in cpds
    is a modulator (example slots activators-all,
    activators-allosteric)
  • All-substrates (rxns)
  • All unique substrates specified in the given rxns
  • Has-structure-p (cpd)
  • Obtain-cpd-stats
  • Returns two values
  • Length of all-cpds, cpds with structures

31
Miscellaneous things.
  • History List
  • Back/Forward and History buttons
  • Default list is 50 items
  • Show frame
  • (print-frame frame)

32
(No Transcript)
33
Queries with Multiple Answers
  • Navigator queries
  • Example: Substring search for pyruvate
  • Selected list is placed on the Answer list
  • Use Next Answer button to view each one of them
  • Lisp queries
  • Example: Find reactions involving pyruvate as a
    substrate

    (get-class-all-instances '|Compounds|)

    (loop for rxn in (get-class-all-instances '|Reactions|)
          when (member 'pyruvate (get-slot-values rxn 'substrates))
          collect rxn)

    ;; * is the value of the previous expression at the Lisp prompt
    (replace-answer-list *)

34
Reactions
35
Reactions
  • Represents information about a reaction that is
    independent of enzymes that catalyze the reaction
  • Connected to enzyme(s) via enzymatic reaction
    frames
  • Classified with EC system when possible
  • Example: 2.7.7.7, DNA-directed DNA
    polymerization
  • Carried out by five enzymes in E. coli

36
Reaction Ontology
37
Where is 2.7.7.7 in the Ontology?
38
Slots of Reaction Frames
  • Balance-state
  • EC-number
  • Enzymatic-reaction
  • Generated in protein or reaction editor
  • In-pathway
  • Generated in pathway editor
  • Left and Right (reactants / products)
  • Can include modified forms of proteins, RNAs, etc
    here
  • Not all reactants/products need to be frames

39
(No Transcript)
40
Reaction relationships
41
Enzymatic Reactions (DnaE and 2.7.7.7)
  • A necessary bridge between enzymes and generic
    versions of reactions
  • Carries information specific to an
    enzyme/reaction combination
  • Cofactors and prosthetic groups
  • Alternative substrates
  • Links to regulatory interactions
  • Frame is generated when protein is associated
    with reaction (via protein or reaction editor)

42
(No Transcript)
43
Regulation of Enzyme Activity
44
Semantic Inference Layer
  • Genes-of-reaction (rxn)
  • Substrates-of-reaction (rxn)
  • Enzymes-of-reaction (rxn)
  • Lacking-ec-number (organism)
  • Returns list of rxns with no ec numbers in that
    database
  • Get-reaction-direction-in-pathway (pwy rxn)
  • Reaction-type (rxn)
  • Indicates the type of rxn: small-molecule rxn,
    transport rxn, protein-small-molecule rxn (one
    substrate is a protein and one is a small
    molecule), protein rxn (all substrates are
    proteins), etc.
  • All-rxns (type)
  • Specify the type of reaction (see above for types)
  • Obtain-rxn-stats
  • Returns six values
  • Length of all-rxns, transport, non-transport,
    etc
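
As a rough illustration of the Reaction-type classification described above, the following plain Common Lisp sketch classifies a reaction from the kinds of its substrates. It is a toy model, not the Pathway Tools implementation: toy-reaction-type and the keywords it uses are hypothetical, and transport reactions are left out because they depend on compartment information that this sketch does not model.

```lisp
;; Toy classifier; names and keywords are assumptions for illustration.
(defun toy-reaction-type (substrate-kinds)
  "SUBSTRATE-KINDS is a list with one of :protein or :small-molecule
per substrate. Returns a keyword naming the reaction type."
  (cond ((null substrate-kinds) :unknown)
        ;; All substrates are proteins.
        ((every (lambda (kind) (eq kind :protein)) substrate-kinds)
         :protein)
        ;; All substrates are small molecules.
        ((every (lambda (kind) (eq kind :small-molecule)) substrate-kinds)
         :small-molecule)
        ;; At least one protein and at least one small molecule.
        ((and (member :protein substrate-kinds)
              (member :small-molecule substrate-kinds))
         :protein-small-molecule)
        (t :unknown)))

(toy-reaction-type '(:protein :small-molecule))
;; => :PROTEIN-SMALL-MOLECULE
```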

45
Find all small-molecule reactions that have no
enzyme but are not spontaneous (orphan reactions)

    (defun orphan-reactions (&optional (verbose? t))
      (loop for r in (all-rxns :small-molecule)
            when (and (not (slot-has-value-p r 'enzymatic-reaction))
                      (not (get-slot-value r 'spontaneous?)))
            collect r))
46
Reaction Direction
  • Left/Right reflect direction of reaction as
    written by Enzyme Commission
  • Reflects systematic direction for different
    reaction classes
  • Left/Right do not necessarily correspond to
    physiological direction of a reaction
  • Get-rxn-direction(rxn)
  • Returns L2R or R2L or BOTH or NIL
  • Integrates all available info about direction of
    this reaction
  • Direction(s) it occurs in all pathways in the
    PGDB
  • Direction(s) as specified in Enzymatic-Reactions

47
Pathways
48
Outline
  • Pathways
  • Representation of Pathways
  • Querying Pathways Programmatically
  • How Pathway Diagrams are Generated
  • Future Work: Signalling Pathways
  • Cellular Overview Diagram
  • New Functionality
  • Under the Hood
  • How Overview Diagram is Generated
  • Using Overview Diagram for Global Queries

49
What is a Pathway?
  • An ordered set of interconnected, directed
    biochemical reactions
  • Reactions form a coherent unit, e.g.
  • Regulated as a single unit
  • Evolutionarily conserved across organisms as a
    single unit
  • When combined, perform a single cellular function
  • Historically grouped together as a unit
  • Includes metabolic pathways and signalling
    pathways
  • Evidence for all reactions in a single organism
  • Pathways can be linear, cyclical, branched, or
    some combination

50
Internal Representation of Pathways
  • REACTION-LIST: unordered list of reactions that
    comprise the pathway
  • PREDECESSORS: list of reaction pairs that define
    ordering relationships between reactions
  • E.g.
        R1      R2
    A ----> B ----> C
            |   R3
            +-----> D
  • (R2 R1): Predecessor of R2 is R1
  • (R3 R1): Predecessor of R3 is R1
  • (R1): R1 has no predecessor (can be omitted)
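
One way to see how the two slots work together is to derive a drawing order from them. The sketch below is plain Common Lisp over symbols, not the Pathway Tools implementation; order-reactions is a hypothetical name, and the approach (repeatedly picking a reaction whose predecessors are already placed) signals an error on cyclic pathways, which a real layout algorithm would have to handle.

```lisp
;; Toy sketch in plain Common Lisp; not the Pathway Tools implementation.
(defun order-reactions (reaction-list predecessors)
  "Return REACTION-LIST ordered so that each reaction follows its
predecessors. PREDECESSORS is a list of (reaction predecessor) pairs;
a pair with no second element means the reaction has no predecessor."
  (let ((ordered '())
        (remaining (copy-list reaction-list)))
    (loop while remaining
          do (let ((ready (find-if
                           (lambda (rxn)
                             ;; Ready when every predecessor is placed.
                             (loop for pair in predecessors
                                   always (or (not (eq (first pair) rxn))
                                              (null (second pair))
                                              (member (second pair) ordered))))
                           remaining)))
               (unless ready
                 (error "Cycle among reactions: ~S" remaining))
               (push ready ordered)
               (setf remaining (remove ready remaining))))
    (nreverse ordered)))

(order-reactions '(r3 r2 r1) '((r2 r1) (r3 r1) (r1)))
;; => (R1 R3 R2)
```

R2 and R3 are unordered relative to each other, so either could come second; this sketch simply keeps their order from the input list.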

51
What is missing from Pathway Representation?
  • Reaction directions
  • Some reactions are unidirectional, but many are
    reversible; how do we know in which direction to
    draw the reaction?
  • Main vs. side substrates
  • [Diagram with compounds A, B, C, D, E, and F
    illustrating main and side substrates]
  • Main compounds form the backbone of the pathway
  • substrates shared between connecting reactions
  • major inputs and outputs
  • Side compounds are omitted from pathway diagrams
    at low detail levels
  • Individual reactions do not necessarily have main
    and side compounds; a particular substrate may
    be either a main or a side depending on the
    pathway context.

52
Computing Directionality and Mains/Sides
  • Our philosophy: enable the curator to specify as
    little as possible and compute as much as possible.
    This reduces redundancy and the potential for
    inconsistencies.
  • Example
  • Reactions: R1: A + B = C + D
  • R2: B = E
  • Predecessors: (R2 R1)
  • The only substrate overlap is B
  • B must be a main substrate
  • A must be a side substrate
  • R1 must proceed from right to left
  • R2 must proceed from left to right
  • C + D → B → E, with A as a side product of R1
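
The reasoning in this example can be sketched in plain Common Lisp. This is a toy model under stated assumptions, not the Pathway Tools algorithm: link-reactions is a hypothetical name, a reaction is represented as a property list with :left and :right substrate lists, and only a single predecessor/successor pair is considered.

```lisp
;; Toy model: a reaction is a plist (:left (...) :right (...)).
(defun link-reactions (pred succ)
  "Given predecessor PRED and successor SUCC, return three values:
the shared (main) substrates, PRED's direction, and SUCC's direction.
Directions are :L2R, :R2L, or :AMBIGUOUS."
  (let* ((pred-all (append (getf pred :left) (getf pred :right)))
         (succ-all (append (getf succ :left) (getf succ :right)))
         (shared (intersection pred-all succ-all)))
    (flet ((side-of (rxn)
             ;; Which side of RXN holds the shared substrates?
             (let ((on-left (intersection shared (getf rxn :left)))
                   (on-right (intersection shared (getf rxn :right))))
               (cond ((and on-left (not on-right)) :left)
                     ((and on-right (not on-left)) :right)
                     (t :both-or-neither)))))
      (values shared
              ;; The predecessor must produce the shared substrate.
              (case (side-of pred) (:right :l2r) (:left :r2l) (t :ambiguous))
              ;; The successor must consume it.
              (case (side-of succ) (:left :l2r) (:right :r2l) (t :ambiguous))))))

;; R1: A + B = C + D, R2: B = E, predecessors (R2 R1)
(link-reactions '(:left (a b) :right (c d)) '(:left (b) :right (e)))
;; => (B), :R2L, :L2R
```

When the shared substrates sit on both sides of the predecessor, as in A + B = C + D followed by C + B = E, the first direction comes back as :AMBIGUOUS, which is the situation that needs extra information from the curator.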

53

But
  • Unfortunately, mains, sides, and reaction
    directions are sometimes ambiguous
  • At beginnings and ends of pathways
  • Use heuristics to determine main/side substrates
    at beginnings and ends of pathways
  • Not always what the curator wants
  • Substrate overlap with both sides of a reaction,
  • e.g. A + B = C + D
  • C + B = E
  • Solution: an additional slot, PRIMARIES, which
    should only be populated when necessary
  • PRIMARIES (R (A B) (C)) says that for reaction
    R, A and B are both main reactants, and C is a
    main product.

54
More Complications
  • ENZYME-USE: a reaction may be catalyzed by
    multiple enzymes, but not all the enzymes
    necessarily participate in a given pathway
  • Not present in the same compartment with rest of
    pathway enzymes
  • Down-regulated or not expressed under conditions
    in which pathway is active
  • ENZYME-USE slot tells us which enzymes catalyze
    reaction in pathway, if not all.
  • LAYOUT-ADVICE: helps software draw pathway
    correctly, e.g. in a cyclical pathway, tells
    which substrate should be at the top.
  • HYPOTHETICAL-REACTIONS: list of reactions in the
    pathway that are considered hypothetical (i.e. no
    direct experimental evidence)

55
Polymerization Pathways
  • [Diagram: polymerization chain through Xn and
    Xn+1 to X10]
  • POLYMERIZATION-LINKS: specifies reactions that
    should be connected by a polymerization link
  • (X R1 R1) --- REACTANT-NAME-SLOT: N-NAME
    --- PRODUCT-NAME-SLOT: N+1-NAME
  • CLASS-INSTANCE-LINKS: specifies when a link
    should be drawn between a substrate class and
    some instance of it (necessary only if instance
    is not a member of some reaction, so no
    predecessor relationship can be defined)
  • R1 --- PRODUCT-INSTANCES: X10

56
Super-Pathways
  • Collection of pathways that connect to each other
    via common substrates or reactions, or as part of
    some larger logical unit
  • Can contain both sub-pathways and additional
    connecting reactions
  • Can be nested arbitrarily
  • REACTION-LIST: a pathway ID instead of a reaction
    ID in this slot means include all reactions from
    the specified pathway
  • PREDECESSORS: a pathway ID instead of a tuple in
    this slot means include all predecessor tuples
    from the specified pathway
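
The "pathway ID means include everything from that pathway" convention for REACTION-LIST amounts to a recursive expansion. A minimal sketch in plain Common Lisp, assuming a toy association list from pathway ID to slot values (expand-reaction-list, *toy-pathways*, and all frame names are hypothetical, and the nesting is assumed to be acyclic):

```lisp
;; Toy model: an association list from pathway ID to its REACTION-LIST.
;; Any entry that is itself a key of the table is treated as a sub-pathway.
(defun expand-reaction-list (pwy table)
  "Return all reactions of PWY, expanding nested sub-pathways recursively."
  (remove-duplicates
   (loop for entry in (cdr (assoc pwy table))
         if (assoc entry table)
           append (expand-reaction-list entry table)
         else
           collect entry)
   :from-end t))

(defparameter *toy-pathways*
  '((super-pwy pwy-a rxn-link pwy-b)   ; two sub-pathways plus a connecting reaction
    (pwy-a rxn-1 rxn-2)
    (pwy-b rxn-2 rxn-3)))

(expand-reaction-list 'super-pwy *toy-pathways*)
;; => (RXN-1 RXN-2 RXN-LINK RXN-3)
```

RXN-2 appears in both sub-pathways but only once in the result, since the expanded list is meant to be a set of reactions.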

57
Pathway Links
  • Can be used as an alternative or in addition to
    defining super-pathways
  • Link must be to or from some main substrate in
    the pathway
  • Other end of link can be a pathway, a reaction,
    or an arbitrary text string
  • Software automatically computes direction of
    link, but curator can override it

58
Querying Pathways Programmatically
  • See http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
  • (all-pathways)
  • (base-pathways)
  • Returns list of all pathways that are not
    super-pathways
  • (genes-of-pathway pwy)
  • (unique-genes-of-pathway pwy)
  • Returns list of all genes of a pathway that are
    not also part of other pathways
  • (enzymes-of-pathway pwy)
  • (substrates-of-pathway pwy)
  • (variants-of-pathway pwy)
  • Returns all pathways in the same variant class as
    a pathway
  • (get-predecessors rxn pwy), (get-successors rxn
    pwy)
  • (get-rxn-direction-in-pathway pwy rxn)
  • (pathway-inputs pwy), (pathway-outputs pwy)
  • Returns all compounds consumed (produced) but not
    produced (consumed) by pathway (ignores
    stoichiometry)
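
The definition of pathway inputs and outputs given above is a pair of set differences. The following plain Common Lisp sketch illustrates it on a toy representation; toy-pathway-inputs-and-outputs is a hypothetical name, and each reaction is assumed to be already oriented in its pathway direction as a (consumed produced) pair.

```lisp
;; Toy sketch; not the Pathway Tools implementation of pathway-inputs
;; or pathway-outputs.
(defun toy-pathway-inputs-and-outputs (reactions)
  "REACTIONS is a list of (consumed produced) pairs, already oriented in
pathway direction. Returns two values: inputs and outputs.
Stoichiometry is ignored; the order of each result is unspecified."
  (let ((consumed (loop for (reactants nil) in reactions append reactants))
        (produced (loop for (nil products) in reactions append products)))
    (values (remove-duplicates (set-difference consumed produced))
            (remove-duplicates (set-difference produced consumed)))))

;; C + D -> B + A, then B -> E
(toy-pathway-inputs-and-outputs '(((c d) (b a)) ((b) (e))))
```

For this toy pathway the inputs are C and D and the outputs are A and E; B is both produced and consumed, so it appears in neither list.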

59
Example Queries
  • Find all genes involved in metabolic pathways

    (remove-duplicates
     (loop for p in (all-pathways)
           append (genes-of-pathway p)))

  • Find all compounds that are unique to a single
    pathway

    (loop for p in (base-pathways)
          append (loop for c in (substrates-of-pathway p)
                       when (null (remove p (pathways-of-compound c)))
                       collect (list c p)))

60
Regulation
61
Regulation
  • Reorganization and expansion of regulation under
    way in Pathway Tools
  • Initial application to EcoCyc
  • Class Regulation with subclasses that describe
    different biochemical mechanisms of regulation
  • Slots
  • Regulator
  • Regulated-Entity
  • Mode
  • Mechanism

62
Regulation of Enzyme Activity
  • Class Regulation-of-Enzyme-Activity
  • Each instance of the class describes one
    regulatory interaction
  • Slots
  • Regulator -- usually a small molecule
  • Regulated-Entity -- an Enzymatic-Reaction
  • Mechanism -- One of
  • Competitive, Uncompetitive, Noncompetitive,
    Irreversible, Allosteric, Other
  • Mode -- One of +, -
  • Physiologically-relevant? -- true/false

63
Transcription Initiation
  • Class Regulation-of-Transcription-Initiation
  • Transcription factor binds to DNA binding site to
    regulate transcription initiation from a promoter
  • Slots
  • Regulator -- instance of Proteins or Complexes
    (a transcription-factor)
  • Regulated-Entity -- instance of Promoters
  • Mode -- One of +, -
  • Associated-binding-site -- a DNA-Binding-Site

64
Attenuation
  • Class Transcriptional-Attenuation
  • Several subclasses depending on type of
    attenuation
  • Slots common to all
  • Regulator -- Depends on subtype of attenuation
  • Regulated-Entity -- instance of Terminators
  • Mode -- One of +, -

65
Attenuation Subtypes
  • Ribosome-Mediated-Attenuation
  • E.g. trp operon: the ribosome pauses based on
    levels of charged tRNA, which determines formation
    of terminator or antiterminator
  • RNA-Mediated-Attenuation
  • RNA (tRNA or sRNA) binds to transcript,
    determines formation of terminator or
    antiterminator
  • Protein-Mediated-Attenuation
  • Protein binds to transcript, determines formation
    of terminator or antiterminator
  • Small-Molecule-Mediated-Attenuation
  • Small molecule binds to transcript, determines
    formation of terminator or antiterminator
  • Rho-Blocking-Antitermination
  • RNA-Polymerase-Modification
  • Regulatory protein binds to site in transcription
    unit and interacts with RNA polymerase to
    determine termination

66
Transcriptional Regulation
[Diagram: transcriptional regulation of the trp operon, with frames trp, rxn001, apoTrpR, TrpRtrp, reg001, site001, pro001, trpL, trpLEDCBA, trpE, trpD, trpC, trpB, trpA, term001, reg002, and charged-tRNAtrp]
67
Data Exchange
68
Data Exchange
  • Java API and Perl API: read, modify
  • BioPAX Export since Pathway Tools 9.0
  • Biopax.org
  • Export of entire PGDB as Flatfiles
  • Export of Reactions as SBML -- sbml.org
  • Import/Export of Pathways between PGDBs
  • Import/Export of Selected Frames, for
    Spreadsheets
  • Import/Export of Compounds as Molfile, CML
  • Registering/Publishing PGDBs on WWW
  • Export PGDB as Genbank
  • BioWarehouse Loader for Flatfiles, SQL access
  • http://bioinformatics.ai.sri.com/biowarehouse/

69
Dump PGDB into Flatfiles
  • Export of entire PGDB as Flatfiles
  • Format description: UG v.I, section 4.5
  • Column-delimited: 1 line per frame
  • Attribute-value: 1 record per frame
  • Multiple slot values
  • Column-delimited: several values per column
  • Attribute-value: several lines for several values
70
Frame Import/Export
  • Import/Export of Selected Frames, for
    Spreadsheets
  • Frame selection, slot selection GUI
  • Format description: UG v.I, section 4.6.3
  • Column-delimited: 1 line per frame
  • Attribute-value: 1 record per frame
  • Multiple slot values
  • Column-delimited: several values per column
  • Attribute-value: several lines for several values