Title: Porting CHADO and GMOD Tools to Oracle and Integration with dictyBase
1Porting CHADO and GMOD Tools to Oracle and
Integration with dictyBase
- Eric Just
- dictyBase
- http//dictybase.org
- Center for Genetic Medicine
- Northwestern University
2WHY?
- dictyBase based on SGD
- Increase flexibility in feature storage
- Want to use CHADO for feature data, but dicty
SGD schema for the rest - dicty SGD (Oracle) needs to link to CHADO
Eric Just - dictyBase Northwestern University
3Schema porting
- SQL Fairy did most of this, but
- Had to tweak Oracle Producer
- Object name limited to 30 characters,
systematically truncate names - Unique/primary keys on CLOBs (text) not allowed,
changed to varchar2(4000) - SYNONYM reserved name in Oracle, changed name
to SYNONYM _
Eric Just - dictyBase Northwestern University
4ClassDBI
- ClassDBI provides nice table level
abstraction - CRUD, follow references WITHOUT WRITING SQL
- Excellent tool for portability
- GMOD ships with ClassDBI configured for CHADO
- Had to fix/customize Oracle Driver
Eric Just - dictyBase Northwestern University
5AutoDBI
- Package which loads ClassDBI classes for each
table - Keep class name ChadoSynonym but call
set_up_table( synonym_ ) - Made residues a lazy column of ChadoFeature
- No other Significant porting needed
Eric Just - dictyBase Northwestern University
6Data Migration
GFF3
CHADO
dicty SGD
- Export chromosome sequences and locations in GFF3
- Load GFF3 into CHADO schema
- Update references to features with new tables and
ids
Eric Just - dictyBase Northwestern University
7GBrowse porting
- rows method does not exist in Oracle DBI Driver
if (sth-gtrows() 0)
my rows_returned _at_sth-gtfetchall_arrayref()
sth-gtexecute or BioRootRoot-gtthrow() if (
rows_returned 0)
- Oracle fetchrow_hashref() is case sensitive
sth-gtfetchrow_hashref()
sth-gtfetchrow_hashref("NAME_lc")
Eric Just - dictyBase Northwestern University
8GBrowse porting - Queries
- Oracle does not like anything in a using
clause to also be in the where clause
- select f.feature_id, f.name, fl.fmin,fl.fmax
- from feature f
- join featureloc fl using (feature_id)
- where f.feature_id 221659 and fl.rank0
select f.feature_id, f.name, fl.fmin,fl.fmax
from feature f join featureloc fl
on f.feature_id fl.feature_id where
f.feature_id 221659 and fl.rank0
- substring becomes substr
- Any SQL containing synonym table must be
modified - Any procedural SQL must be reproduced, in some
cases this can be avoided
Eric Just - dictyBase Northwestern University
9Tuning
- Added is_deleted flag to feature table
- Added some audit columns
- Added audit table and triggers
- Created Indexes Heuristically
- Added hints to some difficult queries
Eric Just - dictyBase Northwestern University
10Integrating into dictyBase I
Various middleware and presentation objects
dictyBase Presentation Layer
dictyBase Object Model
Dbtable database abstraction layer
dicty SGD
Eric Just - dictyBase Northwestern University
11Integrating into dictyBase II
dictyBase Presentation Layer
Various middleware and presentation objects
dictyBase Object Model
Dbtable layer
ClassDBI layer
dicty SGD
CHADO
Eric Just - dictyBase Northwestern University
12dictyBase Objects
- Retrieve, insert, update, delete
- Interface ignorant of schema
- No presentation in data classes
- Easy to use interfaces
- Tuned with lazy evaluation most accessors
- 75 80 unit test coverage
Eric Just - dictyBase Northwestern University
13Use BioPerl
- Use BioSeq to represent sequences
- Use BioSeqFeatures to represent transcript and
alignment locations - Harness the power of BioPerl for sequence tasks,
file generation - NOTE BioPerl only used for sequence and location
Eric Just - dictyBase Northwestern University
14Class Diagram
Feature
Aligned
mRNA
Contig
Chromosome
getOverlappingFeatures() getOverlappingAlignments(
)
BioSeqFeatureGeneTranscript
BioSeqFeatureGeneric
BioSeqFeatureGeneric
BioSeq
Eric Just - dictyBase Northwestern University
15Object use case Add an Exon, dbxref, and
Description
!perl use dictyFeature my transcript new
dictyFeature( -feature_no gt 218420
) transcript-gtdescription( Gene model derived
from AU12345' ) transcript-gtadd_external_id(
-source gt GenBank Accession Number',
-id gt 'AU12345'
) bioperl transcript-gtbioperl() bioperl-gte
xons()-gt2-gtstart( 281050 ) my exon
BioSeqFeatureGeneExon-gtnew( -start gt
280921,
-end gt 280959,
-strand gt -1
) exon-gtis_coding(1) bioperl-gtadd_exon(exon)
transcript-gtupdate()
Eric Just - dictyBase Northwestern University
16Using Apollo
GenBank file
GenBank file
GenBank file
GenBank file
Object layer
Chado
- Request segment through SOAP message over HTTP
- Object layer generates GenBank File
- Send GenBank File via SOAP message
- Modify, in Apollo send changed gene models back
via SOAP
- Adaptor changes gene models and updates the
database
Eric Just - dictyBase Northwestern University
17New Curation Tools
- Gene and Feature curation had to be rewritten
- Gene centric curation
- Added more evidence qualifiers
- Presentation classes that manipulate Object Layer
Eric Just - dictyBase Northwestern University
18Where Are We Going
- Utilize the flexibility New Feature Types,
feature relations, and SO. - Contribute back to GMOD
- Gradually port different areas into CHADO
- Provide feedback and testing ground for database
independence
Eric Just - dictyBase Northwestern University
19Acknowlegments
- Other Groups
- Funding
- NIH (NIGMS and NHGRI)
- SGD
- GMOD
- CHADO
- GBrowse
- Apollo
- BioPerl
- dictyBase
- PIs
- Rex Chisholm, PhD
- Warren Kibbe, PhD
- Programmer
- Sohel Merchant
- Curators
- Petra Fey
- Pascale Gaudet, PhD
- Karen Pilcher
- Bioinformatics Core at Northwestern
Eric Just - dictyBase Northwestern University