Porting CHADO and GMOD Tools to Oracle and Integration with dictyBase - PowerPoint PPT Presentation

About This Presentation
Title:

Porting CHADO and GMOD Tools to Oracle and Integration with dictyBase

Description:

Porting CHADO and GMOD Tools to Oracle and Integration with dictyBase Eric Just dictyBase http://dictybase.org Center for Genetic Medicine Northwestern University – PowerPoint PPT presentation

Number of Views:72
Avg rating:3.0/5.0
Slides: 20
Provided by: cgm69
Learn more at: http://gmod.org
Category:

less

Transcript and Presenter's Notes

Title: Porting CHADO and GMOD Tools to Oracle and Integration with dictyBase


1
Porting CHADO and GMOD Tools to Oracle and
Integration with dictyBase
  • Eric Just
  • dictyBase
  • http//dictybase.org
  • Center for Genetic Medicine
  • Northwestern University

2
WHY?
  • dictyBase based on SGD
  • Increase flexibility in feature storage
  • Want to use CHADO for feature data, but dicty
    SGD schema for the rest
  • dicty SGD (Oracle) needs to link to CHADO

Eric Just - dictyBase Northwestern University
3
Schema porting
  • SQL Fairy did most of this, but
  • Had to tweak Oracle Producer
  • Object name limited to 30 characters,
    systematically truncate names
  • Unique/primary keys on CLOBs (text) not allowed,
    changed to varchar2(4000)
  • SYNONYM reserved name in Oracle, changed name
    to SYNONYM _

Eric Just - dictyBase Northwestern University
4
ClassDBI
  • ClassDBI provides nice table level
    abstraction
  • CRUD, follow references WITHOUT WRITING SQL
  • Excellent tool for portability
  • GMOD ships with ClassDBI configured for CHADO
  • Had to fix/customize Oracle Driver

Eric Just - dictyBase Northwestern University
5
AutoDBI
  • Package which loads ClassDBI classes for each
    table
  • Keep class name ChadoSynonym but call
    set_up_table( synonym_ )
  • Made residues a lazy column of ChadoFeature
  • No other Significant porting needed

Eric Just - dictyBase Northwestern University
6
Data Migration
GFF3
CHADO
dicty SGD
  • Export chromosome sequences and locations in GFF3
  • Load GFF3 into CHADO schema
  • Update references to features with new tables and
    ids

Eric Just - dictyBase Northwestern University
7
GBrowse porting
  • rows method does not exist in Oracle DBI Driver

if (sth-gtrows() 0)
my rows_returned _at_sth-gtfetchall_arrayref()
sth-gtexecute or BioRootRoot-gtthrow() if (
rows_returned 0)
  • Oracle fetchrow_hashref() is case sensitive

sth-gtfetchrow_hashref()
sth-gtfetchrow_hashref("NAME_lc")
Eric Just - dictyBase Northwestern University
8
GBrowse porting - Queries
  • Oracle does not like anything in a using
    clause to also be in the where clause
  • select f.feature_id, f.name, fl.fmin,fl.fmax
  • from feature f
  • join featureloc fl using (feature_id)
  • where f.feature_id 221659 and fl.rank0

select f.feature_id, f.name, fl.fmin,fl.fmax
from feature f join featureloc fl
on f.feature_id fl.feature_id where
f.feature_id 221659 and fl.rank0
  • substring becomes substr
  • Any SQL containing synonym table must be
    modified
  • Any procedural SQL must be reproduced, in some
    cases this can be avoided

Eric Just - dictyBase Northwestern University
9
Tuning
  • Added is_deleted flag to feature table
  • Added some audit columns
  • Added audit table and triggers
  • Created Indexes Heuristically
  • Added hints to some difficult queries

Eric Just - dictyBase Northwestern University
10
Integrating into dictyBase I
Various middleware and presentation objects
dictyBase Presentation Layer
dictyBase Object Model
Dbtable database abstraction layer
dicty SGD
Eric Just - dictyBase Northwestern University
11
Integrating into dictyBase II
dictyBase Presentation Layer
Various middleware and presentation objects
dictyBase Object Model
Dbtable layer
ClassDBI layer
dicty SGD
CHADO
Eric Just - dictyBase Northwestern University
12
dictyBase Objects
  • Retrieve, insert, update, delete
  • Interface ignorant of schema
  • No presentation in data classes
  • Easy to use interfaces
  • Tuned with lazy evaluation most accessors
  • 75 80 unit test coverage

Eric Just - dictyBase Northwestern University
13
Use BioPerl
  • Use BioSeq to represent sequences
  • Use BioSeqFeatures to represent transcript and
    alignment locations
  • Harness the power of BioPerl for sequence tasks,
    file generation
  • NOTE BioPerl only used for sequence and location

Eric Just - dictyBase Northwestern University
14
Class Diagram
Feature
Aligned
mRNA
Contig
Chromosome
getOverlappingFeatures() getOverlappingAlignments(
)
BioSeqFeatureGeneTranscript
BioSeqFeatureGeneric
BioSeqFeatureGeneric
BioSeq
Eric Just - dictyBase Northwestern University
15
Object use case Add an Exon, dbxref, and
Description
!perl use dictyFeature my transcript new
dictyFeature( -feature_no gt 218420
) transcript-gtdescription( Gene model derived
from AU12345' ) transcript-gtadd_external_id(
-source gt GenBank Accession Number',
-id gt 'AU12345'
) bioperl transcript-gtbioperl() bioperl-gte
xons()-gt2-gtstart( 281050 ) my exon
BioSeqFeatureGeneExon-gtnew( -start gt
280921,
-end gt 280959,
-strand gt -1
) exon-gtis_coding(1) bioperl-gtadd_exon(exon)
transcript-gtupdate()
Eric Just - dictyBase Northwestern University
16
Using Apollo
GenBank file
GenBank file
GenBank file
GenBank file
Object layer
Chado
  • Request segment through SOAP message over HTTP
  • Object layer generates GenBank File
  • Send GenBank File via SOAP message
  • Modify, in Apollo send changed gene models back
    via SOAP
  • Adaptor changes gene models and updates the
    database

Eric Just - dictyBase Northwestern University
17
New Curation Tools
  • Gene and Feature curation had to be rewritten
  • Gene centric curation
  • Added more evidence qualifiers
  • Presentation classes that manipulate Object Layer

Eric Just - dictyBase Northwestern University
18
Where Are We Going
  • Utilize the flexibility New Feature Types,
    feature relations, and SO.
  • Contribute back to GMOD
  • Gradually port different areas into CHADO
  • Provide feedback and testing ground for database
    independence

Eric Just - dictyBase Northwestern University
19
Acknowlegments
  • Other Groups
  • Funding
  • NIH (NIGMS and NHGRI)
  • SGD
  • GMOD
  • CHADO
  • GBrowse
  • Apollo
  • BioPerl
  • dictyBase
  • PIs
  • Rex Chisholm, PhD
  • Warren Kibbe, PhD
  • Programmer
  • Sohel Merchant
  • Curators
  • Petra Fey
  • Pascale Gaudet, PhD
  • Karen Pilcher
  • Bioinformatics Core at Northwestern

Eric Just - dictyBase Northwestern University
Write a Comment
User Comments (0)
About PowerShow.com