The Centralized Life Sciences Data (CLSD) service

1
The Centralized Life Sciences Data (CLSD) service
Michael Grobe
Scientific Data Services
Research Computing
University Information Technology Services
Indiana University at Indianapolis
(dgrobe_at_iupui.edu)
January 2007
2
Outline
  • Basic genome science processes and vocabulary
  • Basic relational algebra
  • Simple SQL as an expression of the relational algebra
  • DB2 and the Federated Server
  • CLSD data sources: relationalized, mirrored, and federated
  • Accessing CLSD
  • Directions for possible future work
      • Adding data sources
      • Integrating more completely with the TeraGrid
      • Integrating with other Grids
  • Questions, suggestions
3
Some chemistry
A polymer is a chemical composed of many
similar units, e.g. polyvinyl chloride, starches,
etc.
DNA is a (usually double-stranded) polymer
composed of four kinds of nucleotides,
distinguished by their bases: Thymine, Adenine,
Cytosine, and Guanine.
DNA carries genetic information. Individual units
of genetic information are stored in individual
(possibly quite long) segments of DNA.
RNA is a (usually single-stranded) polymer
composed of four kinds of nucleotides,
distinguished by their bases: Uracil, Adenine,
Cytosine, and Guanine.
There are many varieties of RNA (mRNA, snRNA,
rRNA, snoRNA, etc.), and they serve different
functions within a cell. For example, RNA
transfers genetic information, catalyses
reactions, and otherwise assists or interferes
with reactions.
4
Some more chemistry
  • Polymers are synthesized by catalysts called
    polymerases in a process called
    polymerization.
  • Proteins are polymers composed of (over 20
    different kinds of) amino acids, such as
    Methionine (M), Isoleucine (I), Cysteine (C),
    Histidine (H), Alanine (A), Glutamic acid (E),
    Leucine (L), etc.
  • Proteins
      • provide structure:
          • microfilaments (polymers of actin),
          • microtubules (polymers of tubulins),
          • channels through the cell wall, etc.
      • catalyse and co-catalyse reactions, as enzymes,
      • bind with DNA to enhance or inhibit
        transcription and translation,
      • are sometimes marked for transport or
        degradation.
  • Protein primary, secondary and tertiary
    structures are important.
  • Proteins are degraded within proteasomes.

5
Genetic material: 2 meters of DNA packaged into
less than 1.4 microns
(From Atherly, et al., 1999)
6
The central model of molecular genetics
DNA can be reliably replicated during the process
of cell division, by DNA-dependent DNA
polymerases.
DNA can be transcribed to messenger RNA (mRNA) by
DNA-dependent RNA polymerases. Transcription
takes place in the nucleus (or equivalent).
mRNA is transported to the cytoplasm, where
ribosomes use it as a template for creating
proteins in a process called translation.
The translation process encodes 1 amino acid for
each 3 bases in a sequence (a triplet, or codon).
The function mapping each of the 64 possible
triplets to an amino acid (or to a stop signal)
is the genetic code.
Ribosomes are complexes of RNA and protein.
7
The central model within the cell
Diagram from
http://www.ncbi.nih.gov/About/primer/images/proteinsynth4.GIF
(Don't forget about degradation and recycling of
amino acids.)
8
The central model in more detail
(Graphics of DNA and RNA from Atherly, et al.
1999)
9
Mutations and polymorphisms
              Nucleotide sequence   Translated AA sequence
Wildtype      ACTGAACTGATT          Thr-Glu-Leu-Ile
Substitution  ACTGACCTGATT          Thr-Asp-Leu-Ile
Deletion      ACTCTGATT             Thr-Leu-Ile
Insertion     ACTGAACCTGAACTGATT    Thr-Glu-Pro-Glu-Leu-Ile

If mutations like these occur in genetic material
within oocytes, they may be transmitted to
offspring, and define polymorphic gene
variations.
A Single Nucleotide Polymorphism (SNP) is a
variation where one base is changed and passed on
to offspring (and occurs with sufficient
frequency).
A Deletion/Insertion Polymorphism (DIP) is a
variation where multiple bases have been removed
from or inserted into a sequence.
dbSNP is a database of SNPs and DIPs containing
millions of entries, and over 120K unique
sequences that are inserted or deleted.
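The triplet-by-triplet translation used for the sequences on this slide can be sketched in a few lines of Java. This is only an illustration: the class name CodonDemo is made up here, and the lookup table holds just the six codons these four sequences need, not the full 64-entry genetic code.

```java
import java.util.Map;

/** Sketch: translate a DNA coding sequence three bases at a time. */
class CodonDemo {
    // Assumption: only the six codons needed for the slide's sequences.
    static final Map<String, String> CODE = Map.of(
        "ACT", "Thr", "GAA", "Glu", "GAC", "Asp",
        "CTG", "Leu", "ATT", "Ile", "CCT", "Pro");

    /** Returns the amino acids joined by '-', or "???" for an unknown codon. */
    static String translate(String dna) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            String aminoAcid = CODE.getOrDefault(dna.substring(i, i + 3), "???");
            if (out.length() > 0) {
                out.append('-');
            }
            out.append(aminoAcid);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] names = {"Wildtype", "Substitution", "Deletion", "Insertion"};
        String[] seqs = {"ACTGAACTGATT", "ACTGACCTGATT",
                         "ACTCTGATT", "ACTGAACCTGAACTGATT"};
        for (int i = 0; i < seqs.length; i++) {
            System.out.println(names[i] + ": " + seqs[i] + " -> " + translate(seqs[i]));
        }
    }
}
```

Note that the deletion and insertion shown here remove or add whole triplets, so the reading frame of the remaining bases is preserved.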
10
Scale of human genome data
Total number of bases: 3.2 Gbp (DNA from one half
of one chromosome (chromatid) from each of 24
chromosomes: 22 autosomal chromosome pairs plus
the sex chromosomes)
Percentage of genome consisting of protein coding
genes: < 2%
Average gene length: 3 Kbp (but up to 2.4 Mbp)
Average exon length: 200 bp
Average protein length: 500-600 AA
Percentage of "junk" DNA: often said to be 50%
Percentage of "junk" DNA now suspected to be
transcribed (the "dark matter" of the genome):
50% to 100%
Some of that "junk" is miRNA that negatively
regulates translation.
11
Process control: cancer-related reaction pathways
(from Hanahan, et al.)
12
Basic relational algebra
The relational algebra operates on relations,
which are sets of tuples of the same arity, which
is to say, collections of lists of the same
length. Here are two 4-tuples:
    ( 1, 2, 3, 4 )
    ( 8, 7, 9, 4 )
Relations are commonly represented as tables.
There are 5 primitive operations within the
relational algebra:
  • Projection: extract specific columns from a
    relation
  • Selection: extract specific rows
  • Set union: create a new table composed of all
    the rows of two other tables
  • Set difference: remove the rows in one relation
    that appear in another
  • Cartesian product: "multiply" two tables to
    create a third
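As a rough illustration, four of the five primitives can be written directly over Java sets of tuples (the Cartesian product is covered on the next slide). The class and method names below are invented for this sketch, and the second relation s is an assumed example. A LinkedHashSet is used only so that printed rows keep their insertion order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Sketch: relations as sets of integer tuples. */
class RelationalPrimitives {
    /** Projection: keep only the given (0-based) column positions. */
    static Set<List<Integer>> project(Set<List<Integer>> relation, int... columns) {
        Set<List<Integer>> out = new LinkedHashSet<>();
        for (List<Integer> tuple : relation) {
            List<Integer> kept = new ArrayList<>();
            for (int c : columns) {
                kept.add(tuple.get(c));
            }
            out.add(kept); // a set, so duplicate tuples collapse
        }
        return out;
    }

    /** Selection: keep only the tuples that satisfy the condition. */
    static Set<List<Integer>> select(Set<List<Integer>> relation,
                                     Predicate<List<Integer>> condition) {
        Set<List<Integer>> out = new LinkedHashSet<>();
        for (List<Integer> tuple : relation) {
            if (condition.test(tuple)) {
                out.add(tuple);
            }
        }
        return out;
    }

    /** Set union: all tuples that appear in either relation. */
    static Set<List<Integer>> union(Set<List<Integer>> a, Set<List<Integer>> b) {
        Set<List<Integer>> out = new LinkedHashSet<>(a);
        out.addAll(b);
        return out;
    }

    /** Set difference: tuples of a that do not appear in b. */
    static Set<List<Integer>> difference(Set<List<Integer>> a, Set<List<Integer>> b) {
        Set<List<Integer>> out = new LinkedHashSet<>(a);
        out.removeAll(b);
        return out;
    }

    public static void main(String[] args) {
        // The two 4-tuples from the slide, plus an assumed second relation.
        Set<List<Integer>> r = new LinkedHashSet<>(
            List.of(List.of(1, 2, 3, 4), List.of(8, 7, 9, 4)));
        Set<List<Integer>> s = new LinkedHashSet<>(
            List.of(List.of(8, 7, 9, 4)));
        System.out.println("projection on columns 0,3: " + project(r, 0, 3));
        System.out.println("projection on column 3:    " + project(r, 3));
        System.out.println("selection, column 0 > 5:   " + select(r, t -> t.get(0) > 5));
        System.out.println("union with s:              " + union(r, s));
        System.out.println("difference r minus s:      " + difference(r, s));
    }
}
```

Projecting both tuples onto their last column yields a single tuple, because relations are sets and duplicates disappear.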
13
Cartesian product in more detail
Relation 1 (arity 4, length 3)
8 7 9 1
1 2 3 4
7 6 2 3

Relation 2 (arity 3, length 2)
3 4 7
1 9 8

Cartesian product (arity 4 + 3 = 7, length 3 x 2 = 6)
8 7 9 1 3 4 7
8 7 9 1 1 9 8
1 2 3 4 3 4 7
1 2 3 4 1 9 8
7 6 2 3 3 4 7
7 6 2 3 1 9 8
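A minimal Java sketch of the Cartesian product, using the two relations from this slide. The class name is invented for illustration, and lists are used instead of sets so that rows print in the same order as the table.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: Cartesian product of two relations held as lists of tuples. */
class CartesianProductDemo {
    /** Pairs every tuple of r1 with every tuple of r2. */
    static List<List<Integer>> product(List<List<Integer>> r1, List<List<Integer>> r2) {
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> t1 : r1) {
            for (List<Integer> t2 : r2) {
                List<Integer> combined = new ArrayList<>(t1);
                combined.addAll(t2); // arity becomes arity(r1) + arity(r2)
                result.add(combined);
            }
        }
        return result; // length is length(r1) * length(r2)
    }

    public static void main(String[] args) {
        List<List<Integer>> relation1 = List.of(
            List.of(8, 7, 9, 1), List.of(1, 2, 3, 4), List.of(7, 6, 2, 3));
        List<List<Integer>> relation2 = List.of(
            List.of(3, 4, 7), List.of(1, 9, 8));
        for (List<Integer> row : product(relation1, relation2)) {
            System.out.println(row);
        }
    }
}
```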
14
Relational databases and query languages
  • Database management systems based on the
    relational algebra were described by Edgar F.
    Codd working for IBM in the early 1970s.
  • Codd's formulation included
      • indexes and keys,
      • decomposition into normal forms, and
      • integrity constraints.
  • Multiple languages and interfaces were developed
    to query and modify collections of relations,
    among them the Structured English Query Language,
    SEQUEL, developed by Chamberlin and Boyce.

15
SQL as an implementation of the relational algebra
The most successful such language, SQL, was based
on SEQUEL. SQL requires that each relation has a
tablename, and each tuple position has a
fieldname:

Players (arity 4, length 3)
Player  Innings  Hits  Teamnumber
8       7        9     1
1       2        3     4
7       6        2     3

Teams (arity 3, length 2)
t_num  games  rank
3      4      7
1      9      8
16
SQL as an implementation of the relational algebra
SQL commands map to the relational primitives as
follows, where * stands for all fields in a
table:

Projection:
    select fieldname_list from tablename
    ex: select t_num, rank from Teams
Selection:
    select * from tablename where <logical expression>
    ex: select * from Players where Teamnumber = 1
Union:
    (select fieldname_list from tablename1)
    union
    (select fieldname_list from tablename2)
    (use UNION ALL to keep duplicates)
Set difference:
    (select * from tablename1)
    except
    (select * from tablename2)
Cartesian product:
    select * from tablename1, tablename2

Note that SQL does not specify how to perform a
query, only what the result should be. It is a
declarative, rather than procedural, language.
17
The relational join operation
An SQL join is a Cartesian product followed by
a selection, as in

    select * from Players, Teams
    where Players.Teamnumber = Teams.t_num

which keeps only 2 rows of the Cartesian product
table (red on the original slide, marked with <--
here):

Player  Innings  Hits  Teamnumber  t_num  games  rank
8       7        9     1           3      4      7
8       7        9     1           1      9      8   <--
1       2        3     4           3      4      7
1       2        3     4           1      9      8
7       6        2     3           3      4      7   <--
7       6        2     3           1      9      8
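The "product followed by selection" reading of a join can be sketched in Java as a nested loop with a filter. This is only an illustration with an invented class name. The column positions (Teamnumber at index 3 of a player row, t_num at index 0 of a team row) are taken from the column order in the table above, and a real database engine is free to compute the same result in a much more efficient way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: a join as a Cartesian product filtered by a selection. */
class JoinDemo {
    static final int TEAMNUMBER = 3; // position of Teamnumber in a Players row
    static final int T_NUM = 0;      // position of t_num in a Teams row

    static List<int[]> join(int[][] players, int[][] teams) {
        List<int[]> result = new ArrayList<>();
        for (int[] player : players) {
            for (int[] team : teams) {
                // Selection step: keep the combined row only when the keys match.
                if (player[TEAMNUMBER] == team[T_NUM]) {
                    int[] row = Arrays.copyOf(player, player.length + team.length);
                    System.arraycopy(team, 0, row, player.length, team.length);
                    result.add(row);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] players = {{8, 7, 9, 1}, {1, 2, 3, 4}, {7, 6, 2, 3}};
        int[][] teams = {{3, 4, 7}, {1, 9, 8}};
        for (int[] row : join(players, teams)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```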
18
IBM's DB2 and WebSphere Federated Server, née
Information Integrator, née DiscoveryLink
DB2 is a fully-featured relational database
system that can house and serve large
databases. Data is usually imported in
relational form, structured as rows composed of
individual data values, possibly identified by
unique IDs (keys).
DB2 can also access data in tables managed by
other, usually physically remote, database
management systems, such as Oracle, MySQL or
DB2. This process is known as data federation.
DB2 can also federate some external resources
that are not normally accessed as relational
tables (e.g. Blast). Such resources are
transformed, or "relationalized", on-the-fly by
"wrappers". Once these resources have been
registered with their wrappers they may be
referred to within SQL queries like any other
resource.
19
WFS diagram from Del Prete
20
Some WFS jargon
Wrapper: a library to access a particular class
of data sources or protocols. Each wrapper
contains information about data source
characteristics. There are BLAST and PubMed
wrappers, and now a generic Script wrapper that
talks to user scripts.
Server: represents a specific data source (user
mappings may be required for authentication).
Nickname: a local table name (alias) for data on
a server (mapped to rows and columns).
A nickname looks like a table, but links to a
server, which links to a wrapper/data source,
where the wrapper knows how to process the data
from the source.
21
Using NCBI data within DB2: More than just
mirroring
  • Mirroring usually implies maintaining exact
    copies of data sources.
  • Most data mirrored by CLSD must not only be
    copied, but also inserted into the CLSD
    relational structure.
  • This is accomplished by a series of scripts that
      • download the data from its external site,
      • convert it to a form that can be used to
        update CLSD tables,
      • insert the data into tables, and
      • monitor the overall process to identify and
        log errors.
  • These scripts are run regularly from crontab
    entries, and monitoring results are examined
    after every run.
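The shape of such a run (download, convert, insert, with every outcome logged) might look roughly like the Java sketch below. Everything in it is hypothetical: the class name, the step names, and the placeholder step bodies are invented for illustration and say nothing about how the actual CLSD scripts are written.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

/** Sketch: run named steps in order and log what happened to each. */
class MirrorRunDemo {
    /** Runs steps in insertion order and stops at the first failure. */
    static List<String> run(Map<String, Callable<String>> steps) {
        List<String> log = new ArrayList<>();
        for (Map.Entry<String, Callable<String>> step : steps.entrySet()) {
            try {
                log.add("OK " + step.getKey() + ": " + step.getValue().call());
            } catch (Exception e) {
                // Later steps must not run on incomplete data.
                log.add("ERROR " + step.getKey() + ": " + e.getMessage());
                break;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // Hypothetical steps; real ones would fetch files and update tables.
        Map<String, Callable<String>> steps = new LinkedHashMap<>();
        steps.put("download", () -> "fetched 3 records");
        steps.put("convert", () -> "wrote 3 rows in load format");
        steps.put("insert", () -> "loaded 3 rows");
        for (String line : run(steps)) {
            System.out.println(line);
        }
    }
}
```

Stopping at the first failed step keeps a bad download or conversion from being loaded into the tables, and the returned log is what would be examined after each scheduled run.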

22
CLSD relationalized data sources
  • BIND -- Pathways, Gene interactions
  • ENZYME -- Enzyme nomenclature
  • ePCR -- ePCR results of UniSTS vs Homo sapiens
  • KEGG data sources
      • LIGAND -- Pathways, Reactions, Compounds
      • PATHWAY -- Pathway map coordinates
  • NCBI data sources
      • LocusLink -- Genetic Loci. (LocusLink has
        been inactive since July 1, 2005 when it
        was retired in favor of UniGene.)
      • UniGene -- Gene clusters
  • SGD -- Saccharomyces Genome Database
23
KEGG datasource info
PATHWAY  42,273 pathways generated from 306
         reference pathways
LIGAND   14,238 compounds, 4,111 drugs,
         10,951 glycans, 6,810 reactions,
         7,127 reactant pairs
24
CLSD federated data sources
Federated NCBI data sources (subject to hit rate
throttling):
  • Nucleotide -- Nucleotide sequences
  • PubMed -- Journal abstracts
Federated local mirrors of NCBI data sources
(not throttled):
  • Blast (updated monthly) is mirrored by UITS
  • dbSNP (updated at major builds) is mirrored by
    IUSM
Some KEGG resources are federated via the FS KEGG
user-defined functions.
25
Examples from the CLSD web site
(http://scidata.iu.edu/CLSD/sql-in-db2.shtml)
  • To get a list of genes containing "brain" in
    their LOCUS_NAME in dbSNP126_shared:
        select * from DBSNP126_SHARED.GENEIDTONAME
        where locus_name like '%brain%'
  • To get a list of Bind Genes and their species:
        select GeneNameA, Organism from
        bind.bind_interaction
  • To get a list of genes mentioning "HUMAN" in
    their descriptions in KEGG:
        select * from KEGG.GENE
        where description like '%HUMAN%'
  • To get some info from PubMed:
        select PMID, ArticleTitle FROM NCBI.pmarticles
        where entrez.contains (ArticleTitle,
                               'granulation') = 1
          AND entrez.contains (PubDate, '1992') = 1

26
BLAST Both mirrored and federated
NCBI Blast is typically accessed via a web page
at NCBI, or some mirrored site. Data is returned
in a typical web interface format suitable for
users.
Within CLSD, BLAST is accessed via an SQL query
and data is returned as a table that can be
manipulated like any other DB2 table. For
example, here is an SQL query that invokes a
blastall process running on libra00 from within
DB2:

    select GB_ACC_NUM, description, e_value
    from ncbi.BLASTN_NT
    where BlastSeq =
      'AGTACTAGCTAGCTAGCTACTAGCTGACTGACTGACTGATGCATCGATGATGC'

The local version of blastall conducts the search
and returns results encoded within XML (by
specifying the -m 7 parameter).
27
The DB2 federation software converts the XML
encoded results into something like this:

GB_ACC_NUM  DESCRIPTION                                  E_VALUE
(VARCHAR)   (VARCHAR)                                    (DOUBLE)
AE003644    Drosophila melanogaster chromosome 2L,       0.00666475
            section 53 of 83 of the complete sequence
AE003410    Drosophila melanogaster, chromosome 2L,      0.00666475
            region 34C4-36A7 (Adh region), section 4
            of 10 of the comple
AC092228    Drosophila melanogaster, chromosome 2L,      0.00666475
            region 35X-35X, BAC clone BACR21J17,
            complete sequence
AP008207    Oryza sativa (japonica cultivar-group)       0.0263349
            genomic DNA, chromosome 1, complete
            sequence
AP003197    Oryza sativa (japonica cultivar-group)       0.0263349
            genomic DNA, chromosome 1,
            BAC clone:B1015E06
AP003105    Human DNA sequence from chromosome 1,        0.0263349
            putative argumentativeness gene GROBE1
28
Modifying BLAST search settings via SQL
Parameters sent to blastall can be set by using
equality comparisons as assignment statements
within SQL conditionals, as in

    select Score, E_Value, HSP_Info, HSP_Q_Seq, HSP_H_Seq
    from ncbi.BLASTN_NT
    where BlastSeq = 'gagttgtcaatggcgagg'
      and gapcost = 8
      and E_Value < .0005

which will pass gapcost and e-value settings on
to blastall.
29
BLAST data sources available via CLSD
Here is a list showing which search types are
supported by the DB2 BLAST wrapper within CLSD.

BLAST search type  Data sources
BLASTN             NT, EST_HUMAN, EST_MOUSE, and EST_OTHER
                   A nucleotide sequence is compared with the
                   contents of a nucleotide sequence database.
BLASTP             NR, SP
                   An amino acid sequence is compared with the
                   contents of an amino acid database.
BLASTX             NR, SP
                   A nucleotide sequence is compared with the
                   contents of an amino acid sequence database.
                   Query is translated in all six reading frames.
30
Examples from IBM
  • Query 1: Given a search sequence, search
    nucleotide (NT), and return the hits for only
    those sequences not associated with a Cloning
    Vector. For each hit, display the Cluster ID and
    Title from Unigene, in addition to the Accession
    Number and E-Value. Only show the top 5 hits,
    based on the ones with the lowest E-values.
        Select nt.GB_ACC_NUM, nt.DESCRIPTION, nt.E_VALUE,
               useq.CLUSTER_ID, ugen.TITLE
        From ncbi.BLASTN_NT nt, unigene.SEQUENCE useq,
             unigene.GENERAL ugen
        Where BLASTSEQ = 'GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGATCACGAGGTCAGGAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCGTC'
          And nt.DESCRIPTION not like '%cloning vector%'
          And nt.GB_ACC_NUM = useq.ACC
          And useq.CLUSTER_ID = ugen.CLUSTER_ID
        Order by E_VALUE FETCH FIRST 5 ROWS ONLY

31
User-defined functions (supplied by IBM)
  • There exist special functions for manipulating
    sequence patterns:
      • LSPatternMatch
      • LSPrositePattern
  • To get a list of (aspartate aminotransferase)
    BLAST results filtered by a (pyridoxal phosphate
    attachment site) pattern specified in PROSITE
    pattern language:
        select gb_acc_num, HSP_H_SEQ from ncbi.blastp_nr
        where blastseq = 'MSQICKRGLLISNRLAPAALRCKSTWFSEVQMGPPDAILGVTE\
        AFKKDTNPKKINLGAGAYRDDNTQPFVLPSVREAEKRVVSRSLDKEYATIIGI\
        PEFYNKAIELALGKGSKRLAAKHNVTAQSISGTGALRIGAAFLAKFWQGNREI\
        YIPSPSWGNHVAIFEHAGLPVNRYRYYDKDT'
          and DB2LS.LSPatternMatch( HSP_H_SEQ,
                DB2LS.LSPrositePattern(
                  '[GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN].' ) ) > 0
  • Note the use of the period (.) to terminate the
    PROSITE pattern, and that the LSPatternMatch
    function returns the character position of the
    left-most substring matching the pattern, or zero
    if there is no match.
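To make the matching rule concrete, here is a small Java sketch that mimics the described behaviour with a regular expression. It is not IBM's implementation. The class and method names are invented, and the converter covers only part of the PROSITE syntax: bracketed alternatives such as [GS], excluded residues such as {P}, the wildcard x, repeat counts such as x(2) or x(2,4), and the terminating period. It assumes the bracketed form of the pattern on this slide, [GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN], and it ignores the anchors < and >.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: PROSITE-style pattern matching via java.util.regex. */
class PrositeDemo {
    /** Converts a PROSITE pattern (subset, see lead-in) to a Java regex. */
    static String toRegex(String prosite) {
        String p = prosite.trim();
        if (p.endsWith(".")) {
            p = p.substring(0, p.length() - 1); // drop the terminator
        }
        StringBuilder regex = new StringBuilder();
        for (String element : p.split("-")) {
            String repeat = "";
            int paren = element.indexOf('(');
            if (paren >= 0) {
                // x(2) -> .{2}, x(2,4) -> .{2,4}
                repeat = "{" + element.substring(paren + 1, element.length() - 1) + "}";
                element = element.substring(0, paren);
            }
            if (element.equals("x")) {
                regex.append('.');                       // any residue
            } else if (element.startsWith("{")) {
                regex.append("[^")                       // excluded residues
                     .append(element, 1, element.length() - 1)
                     .append(']');
            } else {
                regex.append(element);                   // residue or [..] class
            }
            regex.append(repeat);
        }
        return regex.toString();
    }

    /** 1-based position of the left-most match, or 0 when nothing matches. */
    static int patternMatch(String sequence, String prosite) {
        Matcher m = Pattern.compile(toRegex(prosite)).matcher(sequence);
        return m.find() ? m.start() + 1 : 0;
    }

    public static void main(String[] args) {
        String pattern = "[GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN].";
        System.out.println(toRegex(pattern));
        System.out.println(patternMatch("AAGLGKAAG", pattern)); // made-up test sequence
        System.out.println(patternMatch("AAAA", pattern));
    }
}
```

Because a non-matching sequence yields zero, the SQL condition "> 0" on the slide keeps exactly the hits that contain the pattern.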

32
Accessing CLSD: getting an account
To access CLSD you must have an account on the
Libra Cluster at IU (aka libra00.uits.iu.edu).
If you don't have an account and are associated
with Indiana University, request an account by
filling out a Research Systems Account
Application at
http://rac.uits.iu.edu/rats/forms/application.php.
In the comments section of the account request,
add that you need a "local and persistent"
password for use with CLSD.
Once you have a Libra account, send email to SDS
at data _at_ indiana.edu and request instructions
for defining a "local and persistent" password
for use with CLSD.
TeraGrid users should send e-mail to SDS at
data _at_ indiana.edu explaining how CLSD will be
used, and describing their TeraGrid activities.
SDS will then arrange for an appropriate Libra
account and send instructions for defining a
suitable password.
33
Accessing CLSD: options
  • DB2 can be accessed in a variety of ways:
      • DB2 Command Line Processor (Unix, Windows)
      • DB2 Control Center (wherever JRE is running)
      • DB2 driver for Perl DBI
      • DB2 drivers for the Java Database
        Connectivity (JDBC) Application Program
        Interface (API), especially the JDBC
        Universal Driver
  • Demonstration Web page (invokes a Java servlet
    that uses JDBC)
    http://discover.uits.indiana.edu:8421/access/
  • Demonstration WebService (invoked as a function
    call via JAX-RPC)
    http://discover.uits.indiana.edu:8421/axis/CLSDservice.jws?wsdl
  • Demonstration Web page (invokes a Java servlet
    that invokes the CLSD WebService)
    http://discover.uits.indiana.edu:8421/access/index-for-service.html
  • Experimental WSRF Resource (using WSRF within a
    GT4 container)
  • Experimental OGSA-DAI service (running within a
    GT4 container)

34
JDBC access
Connect to the CLSD:
    Class.forName( "com.ibm.db2.jcc.DB2Driver" );
    con = DriverManager.getConnection(
        "jdbc:db2://libra00.uits.iu.edu:50000/clsd2",
        accountName, accountPassword );
Prepare a query, send it to the db, and receive a
result:
    statement = con.createStatement();
    resultSet = statement.executeQuery( query );
Get some query meta-data (column labels and column
data types):
    ResultSetMetaData rsmd = resultSet.getMetaData();
    result  = rsmd.getColumnLabel( colCount );
    result2 = rsmd.getColumnTypeName( colCount );
35
JDBC access (continued)
Get a row of data:
    for( int colCount = 1; colCount <= numcols; colCount++ ) {
        String returnedString = ""; // Must be predefined.
        returnedString = resultSet.getString( colCount ) + "";
        out.println( "<td>" + returnedString + "</td>\n" );
    }
36
Accessing CLSD thru a WebService (JAX-RPC)
The Java API for XML-based Remote Procedure
Calls, or JAX-RPC, is a specification that
defines a system for building distributed
services (so-called WebServices) within the
client-server model.
JAX-RPC makes it possible for a function
invocation in a client like
    a_variable = function_name( parameter_list )
to cause the function, function_name, to run on a
remote server and return a response containing
the value to be assigned to the variable
a_variable, and a function invocation in a
client like
    returnString = queryCLSD( "select * from syscat.tables",
        "1", "5", "accountName", "accountPassword", "table" )
will return a (possibly very long) string
containing the response to the query (given that
various linkages have been prearranged).
37
Outline of the CLSDservice
public class CLSDservice {
    // Full source at
    // http://scidata.iu.edu/CLSD/examples/CLSDservice.jws.txt
    public String queryCLSD( String query, String startingRowToPrint,
                             String maxRows, String account,
                             String password, String format ) {
        // Get a query string, etc. from the command line or Web
        // browser.
        // Declare JDBC drivers and connect to DB2.
        // Prepare a JDBC statement containing the SQL query, submit
        // it to DB2, and capture the returned JDBC result set.
        // Query result set metadata for column names and types to
        // return as the first row, and then collect the contents of
        // each data row.
        return theResponse;
    } // end queryCLSD
} // end Class CLSDservice
38
SOAP and WSDL
  • JAX-RPC uses SOAP and WSDL to establish the
    various linkages required to implement remote
    procedure calls.
  • SOAP messages are usually encoded as XML messages
    within HTTP requests where
      • a SOAP request is an HTTP POST request with
        an XML body, and
      • a SOAP response is an HTTP response header
        followed by an XML body.
  • Such RPC functions are exposed as "operations"
    when described within web pages using the Web
    Services Description Language (WSDL).
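To show roughly what "an HTTP POST request with an XML body" looks like, the Java sketch below assembles such a request as text without sending it. The class and method names are invented, and the parameter element names arg0, arg1, ... are an assumption made for illustration: the names a real service expects come from its WSDL. The envelope namespace is the standard SOAP 1.1 one, and the operation namespace and endpoint are the ones used in the client examples in this presentation.

```java
import java.nio.charset.StandardCharsets;

/** Sketch: build (but do not send) a SOAP 1.1 request as text. */
class SoapRequestDemo {
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    /** Escapes the characters that would otherwise break the XML body. */
    static String escapeXml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps an operation call and its string arguments in a SOAP envelope. */
    static String envelope(String namespace, String operation, String... arguments) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<soapenv:Envelope xmlns:soapenv=\"").append(SOAP_ENV).append("\">\n");
        xml.append("  <soapenv:Body>\n");
        xml.append("    <ns1:").append(operation)
           .append(" xmlns:ns1=\"").append(namespace).append("\">\n");
        for (int i = 0; i < arguments.length; i++) {
            // Assumption: positional element names arg0, arg1, ...
            xml.append("      <arg").append(i).append('>')
               .append(escapeXml(arguments[i]))
               .append("</arg").append(i).append(">\n");
        }
        xml.append("    </ns1:").append(operation).append(">\n");
        xml.append("  </soapenv:Body>\n");
        xml.append("</soapenv:Envelope>\n");
        return xml.toString();
    }

    /** Prefixes the body with the HTTP POST request line and headers. */
    static String httpPost(String host, String path, String body) {
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Content-Type: text/xml; charset=utf-8\r\n"
             + "Content-Length: " + length + "\r\n"
             + "SOAPAction: \"\"\r\n"
             + "\r\n"
             + body;
    }

    public static void main(String[] args) {
        String body = envelope("http://soapinterop.org/", "queryCLSD",
            "select * from syscat.tables", "1", "5",
            "accountName", "accountPassword", "table");
        System.out.println(httpPost("discover.uits.indiana.edu:8421",
                                    "/axis/CLSDservice.jws", body));
    }
}
```

In practice a toolkit such as Axis generates and parses these messages, so client code only sees an ordinary function call.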

39
Java command-line client to access CLSD via
CLSDservice
import org.apache.axis.client.Call;
import org.apache.axis.client.Service;
import javax.xml.namespace.QName;

public class testCLSDClient {
    public static void main(String[] args) {
        try {
            String endpoint =
                "http://discover.uits.indiana.edu:8421/axis/CLSDservice.jws";
            Service service = new Service();
            Call call = (Call) service.createCall();
            call.setTargetEndpointAddress( new java.net.URL( endpoint ) );
            call.setOperationName(
                new QName( "http://soapinterop.org/", "queryCLSD" ) );
            // Pass the query and its settings as one argument array.
            String returnString = (String) call.invoke( new Object[] {
                "select * from syscat.tables", "1", "5",
                "accountName", "accountPassword", "table" } );
            System.out.println( returnString );
        } catch (Exception e) {
            System.err.println( e.toString() );
        }
    }
}
40
Perl command-line client to access CLSD via
CLSDservice
#!perl -w
use SOAP::Lite;

# Set up the call to CLSD using SOAP.
$host = 'discover.uits.indiana.edu';
$service = SOAP::Lite -> service(
    "http://$host:8421/axis/CLSDservice.jws?wsdl" );

# Make the call to CLSD.
$result = $service->queryCLSD(
    'select tabschema,tabname from syscat.tables', '1', '5',
    "DB2account", "password", "table" );
print $result;
41
OGSA
  • The Open Grid Services Architecture (OGSA) is an
    architecture for building computational grids.
  • In particular, OGSA "defines a set of core
    capabilities and behaviors that address key
    concerns in Grid systems." It does not,
    however, implement or define how to implement
    such core capabilities.
  • OGSA is NOT layered or object oriented.
    However, both will be exploited naturally in some
    implementations.
  • OGSA provides an architecture for building
    services such as
      • Service-Based distributed query processing,
      • Grid Workflow,
      • Grid Monitoring Architecture,
      • etc.

42
OGSA-DAI
OGSA-Data Access and Integration (OGSA-DAI) is a
very flexible and powerful data access framework
that can be used within an OGSA grid environment.
It provides various data movement,
virtualization, and manipulation services that
transform the use of data into a higher-level
workflow. The OGSA-DAI client shown in the next
slide uses the OGSA-DAI Client Toolkit to send a
hard-coded query to CLSD (here known as the
DB2Resource). The Toolkit allows clients to use
JDBC by creating a JDBC ResultSet object from an
OGSA-DAI WebRowSet. The response is encoded
using XML and may be retrieved as a single
string, or as individual fields by using
individual JDBC calls as shown below.
43
Java command-line client to access CLSD via
OGSA-DAI
public class queryCLSD {
    public static void main(String[] args) throws Exception {
        // Create an instance of the data service.
        String handle =
            "http://localhost:8080/wsrf/services/ogsadai/DataService";
        String id = "DB2Resource";
        DataService service =
            GenericServiceFetcher.getInstance().getDataService( handle, id );
        // Define a request composed of one activity.
        SQLQuery query = new SQLQuery(
            "select tabschema,tabname from syscat.tables" );
        WebRowSet rowset = new WebRowSet( query.getOutput() );
        ActivityRequest request = new ActivityRequest();
        request.add( query );
        request.add( rowset );
44
Java command-line client to access CLSD via
OGSA-DAI (continued)
        // Submit the request and retrieve results.
        Response response = service.perform( request );
        ResultSet result = rowset.getResultSet();
        ResultSetMetaData rsmd = result.getMetaData();
        int numCols = rsmd.getColumnCount();
        // Display each column from each row.
        while( result.next() ) {
            for( int colCount = 1; colCount <= numCols; colCount++ ) {
                out.print( result.getString( colCount ) );
            }
            out.println();
        }
    }
}
45
  • This client displays a small part of the
    functionality provided by OGSA-DAI. In addition,
    an OGSA-DAI service can be configured to
      • operate on XML or text data sources, as well
        as relational data sources,
      • perform a series of operations (also known as
        activities) as part of a single request,
      • deliver results to a third party (via FTP,
        GridFTP, SMTP, etc.) or to another data
        service,
      • deliver results asynchronously, which can be
        very useful for long-running requests, and
      • utilize authentication methods supported by
        WSRF to provide grid-based security.
  • Also, exposing a database via OGSA-DAI makes it
    available for OGSA Distributed Query Processing
    (OGSA-DQP), so that its use may be further
    virtualized within the DQP model.
  • In some cases, however, OGSA-DAI and DQP may
    introduce performance penalties.

46
Current and possible directions
  • Adding data sources: mirrored and federated
      • Requests for mirroring or federating will be
        gladly entertained.
      • DB2 now provides a user-configurable script
        wrapper that connects to a remote DB2 daemon
        that can start any co-located arbitrary
        script and return data encoded in XML
        (restricted to one foreign key per table).
      • Such a script could be built to relay any web
        resource that returns XML meeting key
        restrictions.
      • Wrappers could be constructed to relay some
        OGSA-DAI resources.
  • Implementing the OGSA-DAI service in production
    mode.
  • Integrating with the TeraGrid
      • CLSD is currently accessible from the
        TeraGrid, but authentication is local.
      • It may be possible to enforce TeraGrid-based
        X.509 authentication, using either WSRF or
        OGSA-DAI interfaces.

47
References
  • Atherly, Alan G., et al., The Science of
    Genetics, 1999.
  • Apache Foundation, AXIS User's Guide,
    http://ws.apache.org/axis/java/user-guide.html
  • Codd, Edgar F., A Relational Model of Data for
    Large Shared Data Banks,
    http://www.acm.org/classics/nov95/toc.html
    (See also http://en.wikipedia.org/wiki/Edgar_F._Codd)
  • CLSD web page: http://rac.uits.iu.edu/clsd/
  • Del Prete, Doug, Efficient access to Blast using
    IBM DB2 Information Integrator,
    http://www-03.ibm.com/industries/healthcare/doc/content/bin/blast.pdf
  • Foster, Ian, et al., The Open Grid Services
    Architecture, Version 1.5.
  • Sotomayor, Borja and Lisa Childers, Globus
    Toolkit 4: Programming Java Services
  • Sundaram, Babu, Understanding WSRF,
    http://www-128.ibm.com/developerworks/edu/gr-dw-gr-wsrf1-i.html
Questions, comments, suggestions?