BioJava in 2002 - PowerPoint PPT Presentation

1
BioJava in 2002
  • An Open-Source Java Library for Bioinformatics
  • (Matthew Pocock, BioJava Consulting LTD)

2
What is BioJava?
  • Java code (Java 2 required: version 1.2 or higher)
  • Open-Source
  • Bioinformatics
  • Library for building Applications
  • Sequence Centric (we'd love to do more)
  • Part of the Open Bioinformatics Foundation (OBF)
  • Drop biojava.jar into your CLASSPATH and go

3
Where is BioJava?
  • http://www.biojava.org
  • mailto:biojava-l@biojava.org
  • biojava on irc.openprojects.net

4
Who is BioJava?
  • 35 Developers in most continents and time-zones
  • Core team >5 individuals
  • Ever expanding user group

5
A look at some API Stuff

6
What's Been There for a While?
  • Sequences with hierarchical features
  • Sequence databases
  • Sequence IO
  • Various sequence formats (EMBL, GenBank, GFF,
    SwissProt)
  • Object model can be bypassed for high-performance
    scanning
  • Probability distributions over symbols and
    Dynamic programming toolkit
  • BLAST Parsers

7
What's Reasonably New?
  • TagValue parser API
  • Sequence Search APIs
  • Interoperable with BioJava XML-based parsers for
    many common sequence search algorithms
  • Pure-Java SSAHA implementation
  • Bit-packed sequence storage
  • Taxonomies
  • Literature References
  • Phred
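
The idea behind bit-packed sequence storage can be sketched in a few lines. This is a minimal illustration, not BioJava's own implementation: the class name PackedDna and its methods are hypothetical, and it assumes an unambiguous four-letter DNA alphabet so that each base fits in 2 bits (32 bases per 64-bit word).

```java
/** Hypothetical sketch of 2-bit DNA packing; not a BioJava class. */
final class PackedDna {
    // The position of a base in this string is its 2-bit code.
    private static final String ALPHABET = "ACGT";

    private final long[] words; // 32 bases per 64-bit word
    private final int length;   // number of bases stored

    private PackedDna(long[] words, int length) {
        this.words = words;
        this.length = length;
    }

    /** Packs an unambiguous DNA string (case-insensitive) into 2 bits per base. */
    static PackedDna pack(String dna) {
        long[] words = new long[(dna.length() + 31) / 32];
        for (int i = 0; i < dna.length(); i++) {
            int code = ALPHABET.indexOf(Character.toUpperCase(dna.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException("Not an unambiguous base: " + dna.charAt(i));
            }
            words[i / 32] |= ((long) code) << ((i % 32) * 2);
        }
        return new PackedDna(words, dna.length());
    }

    /** Returns the base at position i (0-based) as an upper-case character. */
    char charAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        int code = (int) ((words[i / 32] >>> ((i % 32) * 2)) & 3L);
        return ALPHABET.charAt(code);
    }

    int length() {
        return length;
    }

    /** Bytes used by the packed representation. */
    int storageBytes() {
        return words.length * 8;
    }

    /** Expands the packed data back into an upper-case string. */
    String unpack() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }
}
```

The trade-off is that ambiguity symbols and gaps do not fit in 2 bits, so a packing like this only suits sequences over the four atomic bases; wider alphabets would need more bits per symbol.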

8
What's Recently Improved?
  • Gap handling
  • Consistent algebra for representing ambiguities
    (e.g. n), compound symbols (e.g. codons) and gaps
  • DAS Client is now very robust
  • Distributed sequence API allows DAS-like
    distributed sequence databases to be easily built
    and implemented
  • More framey annotation bundles
  • Sequence Rendering
      • Looks much better now
      • Handles dotter-style 2D rendering
  • We now actually write JUnit Tests!
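
One way to picture a single algebra covering ambiguities, compound symbols and gaps is to treat every symbol as the set of atomic symbols it can stand for. The sketch below is an assumption-laden illustration, not the BioJava API: the class SymbolSets is hypothetical, it covers only a handful of IUPAC nucleotide codes, and it models a gap as the empty set.

```java
/** Hypothetical sketch: symbols as bit-sets over the atomic bases A, C, G, T. */
final class SymbolSets {
    static final int A = 1, C = 2, G = 4, T = 8;

    private SymbolSets() {}

    /** Bit-set of atomic bases for a (partial) list of IUPAC codes; '-' is the gap. */
    static int mask(char symbol) {
        switch (Character.toLowerCase(symbol)) {
            case 'a': return A;
            case 'c': return C;
            case 'g': return G;
            case 't': return T;
            case 'r': return A | G;          // purine
            case 'y': return C | T;          // pyrimidine
            case 'n': return A | C | G | T;  // any base
            case '-': return 0;              // gap: matches nothing (assumption of this sketch)
            default:
                throw new IllegalArgumentException("Unsupported symbol: " + symbol);
        }
    }

    /** True if the atomic base is one of the bases the symbol can stand for. */
    static boolean matches(char symbol, char atomic) {
        int atom = mask(atomic);
        if (Integer.bitCount(atom) != 1) {
            throw new IllegalArgumentException("Not an atomic base: " + atomic);
        }
        return (mask(symbol) & atom) != 0;
    }

    /**
     * Number of atomic sequences a compound symbol (e.g. a codon) can expand to:
     * the product of the set sizes at each position.
     */
    static long expansions(String compound) {
        long n = 1;
        for (int i = 0; i < compound.length(); i++) {
            n *= Integer.bitCount(mask(compound.charAt(i)));
        }
        return n;
    }
}
```

With this view, an unambiguous codon such as "atg" expands to exactly one sequence, "ryn" expands to 2 × 2 × 4 = 16, and any compound symbol containing a gap expands to none.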

9
Java 1.4-reliant Source
  • Java 1.4 offers APIs that are really useful for
    Bioinformatics
      • Logging
      • NIO interfaces for fast IO and raw data access
      • Regular expressions
      • Cascading Exceptions
  • BioJava code relying on 1.4 APIs is
    conditionally built
      • SSAHA implementation
      • Some parsers and handlers for TagValue
      • Restriction enzyme digests
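
To show why the java.util.regex package (new in Java 1.4) is a natural fit for restriction digests, here is a minimal sketch. It is not BioJava's restriction enzyme code: the class DigestSketch and its digest method are hypothetical, it handles one strand of a linear sequence only, and it is written with current Java syntax (generics) rather than 1.4-era code. The usage note relies on the well-known EcoRI site GAATTC, cut after the first G.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a single-strand, linear restriction digest using regex. */
final class DigestSketch {
    private DigestSketch() {}

    /**
     * Cuts dna at every occurrence of the recognition site.
     *
     * @param dna       sequence to digest
     * @param site      recognition site, interpreted as a regular expression
     *                  (so "G[AT]ATTC" style ambiguity is possible)
     * @param cutOffset number of site positions to the left of the cut
     * @return fragments in order; concatenating them gives back dna
     */
    static List<String> digest(String dna, String site, int cutOffset) {
        if (site.isEmpty()) {
            throw new IllegalArgumentException("Recognition site must not be empty");
        }
        Matcher m = Pattern.compile(site, Pattern.CASE_INSENSITIVE).matcher(dna);
        List<String> fragments = new ArrayList<>();
        int fragmentStart = 0;
        int searchFrom = 0;
        while (searchFrom <= dna.length() && m.find(searchFrom)) {
            int cut = m.start() + cutOffset;
            // Only cut inside the sequence and past the previous cut.
            if (cut > fragmentStart && cut < dna.length()) {
                fragments.add(dna.substring(fragmentStart, cut));
                fragmentStart = cut;
            }
            searchFrom = m.start() + 1; // step by one so overlapping sites are found
        }
        fragments.add(dna.substring(fragmentStart));
        return fragments;
    }
}
```

For example, digest("ttGAATTCaaaGAATTCgg", "GAATTC", 1) models EcoRI on one strand and yields the three fragments "ttG", "AATTCaaaG" and "AATTCgg".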

10
OBDA and Fun Trips
  • Sponsored by O'Reilly and Electric Genetics
  • Developers attended a two-part Hackathon in
    Tucson, AZ, USA and Cape Town, South Africa
  • Representatives from BioJava, BioPerl, BioPython,
    BioRuby, Ensembl, EMBOSS and others
  • We hammered out and implemented a range of
    standards designed from the ground up to be
      • Interoperable between the Bio projects
      • Relatively easy to implement from scratch
  • We drank lots of red wine

11
OBDA Support
  • BioCORBA: CORBA sequence interfaces
  • BioSQL: relational tables and standard semantics
    for storing sequences
  • BioFetch: cgi-bin-based sequence fetching
  • XEMBL: XML-based sequence fetching
  • Bio Directories: configuration file for
    resolving resources
  • Flat-file Indexing: fetch records by ID and
    secondary ID from multiple ASCII files
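
The flat-file indexing idea (look up a record by ID or secondary ID across several ASCII files) can be sketched as follows. This is an in-memory illustration only and does not follow the OBDA on-disk index format: the class FlatFileIndex is hypothetical, file contents are passed in as strings rather than read from disk, and records are assumed to be EMBL-style (an "ID" line, an "AC" line for the accession, and a "//" terminator).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory sketch of ID / secondary-ID indexing over EMBL-style files. */
final class FlatFileIndex {
    /** Where one record lives: which file, and the character range inside it. */
    static final class Location {
        final String file;
        final int start;
        final int end;

        Location(String file, int start, int end) {
            this.file = file;
            this.start = start;
            this.end = end;
        }
    }

    private final Map<String, String> files = new HashMap<>();
    private final Map<String, Location> byId = new HashMap<>();
    private final Map<String, Location> bySecondaryId = new HashMap<>();

    /** Scans one file's content once and records the offsets of every record in it. */
    void addFile(String name, String content) {
        files.put(name, content);
        int recordStart = 0;
        String id = null;
        String accession = null;
        int pos = 0;
        while (pos < content.length()) {
            int eol = content.indexOf('\n', pos);
            int lineEnd = (eol < 0) ? content.length() : eol;
            int next = (eol < 0) ? content.length() : eol + 1;
            String line = content.substring(pos, lineEnd);
            if (line.startsWith("ID ")) {
                id = firstToken(line.substring(2));
            } else if (line.startsWith("AC ") && accession == null) {
                accession = firstToken(line.substring(2)); // first accession only
            } else if (line.startsWith("//")) {
                Location loc = new Location(name, recordStart, next);
                if (id != null) byId.put(id, loc);
                if (accession != null) bySecondaryId.put(accession, loc);
                recordStart = next;
                id = null;
                accession = null;
            }
            pos = next;
        }
    }

    /** First run of characters up to whitespace or ';'. */
    private static String firstToken(String s) {
        String t = s.trim();
        int end = 0;
        while (end < t.length() && !Character.isWhitespace(t.charAt(end)) && t.charAt(end) != ';') {
            end++;
        }
        return t.substring(0, end);
    }

    /** Returns the full record text for an ID or secondary ID, or null if unknown. */
    String fetch(String key) {
        Location loc = byId.get(key);
        if (loc == null) loc = bySecondaryId.get(key);
        if (loc == null) return null;
        return files.get(loc.file).substring(loc.start, loc.end);
    }
}
```

Because the files are ASCII, character offsets equal byte offsets, so an on-disk version could store the same (file, start, end) triples and seek directly to a record instead of keeping the content in memory.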

12
Things We'd Like To Do in the Near Future
  • Support non-DNA areas of Bioinformatics
      • Cladistics, evolutionary trees, clusters
      • Expression data
      • Proteomics
      • Networks/pathways
      • Biochemical reactions
  • Integrate pre- and post-1.4 exception systems
  • Modify the change notification system
      • Better synchronization and transaction support
      • Easier to optimize events that don't have
        listeners
      • More robust handling of event cascades
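
The point about optimizing events that have no listeners can be illustrated with a small sketch: if nobody is listening, the event object never needs to be built. This is one possible approach, not BioJava's change notification code; the class LazyChangeSupport and its Listener interface are hypothetical names, and it uses java.util.function.Supplier from current Java.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

/** Hypothetical sketch: change support that builds events only when needed. */
final class LazyChangeSupport<E> {
    /** Receives change events of type E. */
    interface Listener<E> {
        void changed(E event);
    }

    // Copy-on-write keeps iteration safe if listeners are added or removed while firing.
    private final List<Listener<E>> listeners = new CopyOnWriteArrayList<>();

    void addListener(Listener<E> listener) {
        listeners.add(listener);
    }

    void removeListener(Listener<E> listener) {
        listeners.remove(listener);
    }

    /**
     * Fires a change. The factory is invoked only if at least one listener is
     * registered, so unobserved objects pay no event-construction cost.
     *
     * @return true if an event was built and delivered
     */
    boolean fire(Supplier<E> eventFactory) {
        if (listeners.isEmpty()) {
            return false;
        }
        E event = eventFactory.get();
        for (Listener<E> listener : listeners) {
            listener.changed(event);
        }
        return true;
    }
}
```

A caller would write something like support.fire(() -> describeChange()) where describeChange is whatever expensive event construction the object needs; the lambda body runs only when a listener exists.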

13
What Will We See in BioJava 2?
  • Pervasive use of Ontologies
  • Storing and annotating data
  • Definition of processing pipelines (e.g.
    customizing parsers)
  • Bindings between BioJava interfaces and external
    data sources
  • DAS, BioSQL, BioCORBA
  • Pervasive querying making any BioJava
    application an Object Data Store with easy routes
    for data-providers to optimize searches
  • Much more code generation
  • Push most repetitive code into code generators
  • Auto-generate much of the event notification web
  • Much better transactionality
  • Reduce implementation cost for developers
  • Expose any/all BioJava instances through SOAP
  • Naming and Directory Services

14
And the Biggest Change of All?
  • Make the library accessible to casual developers
    for writing throw-away scripts as well as system
    architects
  • Documentation
  • Tutorials
  • Training
  • Utility classes (e.g. SeqIOTools)

15
Some Contributors
Brian Gilman, Brian King, Brian Osborne, Colin Hardman, David H. Klatte,
David Huen, David Waring, Gerald Loeffler, Greg Cox, Hanning Ni,
Jason Stajich, Kalle Näslund, Keith James, Kim Rutherford, Lei Lai,
Mark Schreiber, Martin Senger, Mathieu Wiepert, Matthew Pocock, Michael Jones,
Mike Jones, Nimesh Singh, Ron Kuhn, Samiul Hasan, Simon Brocklehurst,
Stuart Johnston, Thad Welch, Thomas Down, Tim Dilks, OBF