About This Presentation
Title: The graph-based data model
Description: The graph-based data model: storing and manipulating data in distributed graphs (using RDF and Jena to put the SparQL in your smile, and the Twinkle in your eye)

Slides: 50
Provided by: dgrobe
Learn more at: http://people.cc.ku.edu
Transcript and Presenter's Notes


1
  • The graph-based data model
  • Storing and manipulating data in distributed graphs
  • (Using RDF and Jena to put the SparQL in your smile, and the Twinkle in your eye, and D2R too)
  • Michael Grobe
  • Biomedical Applications Group
  • Research Technologies
  • University Information Technology Services
  • Indiana University

2
  • Table of Contents
  • Using graphs to represent data
  • Using the RDF to represent graphs
  • Using D2R-server to expose relational data as RDF
  • Jena: a Java class library for manipulating RDF
  • Using SparQL graph templates to query RDF
  • Using Twinkle to make SparQL queries
  • Using iSPARQL graphical graph templates to query
    RDF
  • Thinking of SparQL queries as SQL
  • Table of Non-contents
  • OWL and inference over ontologies
  • Using the Semantic Web in bioinformatics research

3
  • Using graphs to represent data
  • Here are 2 graphs that represent 2 kinds of
    information associated with 4 different persons.
  • Graph 1: Person ages. Graph 2: Favorite friends.

4
  • Using graphs to represent data
  • Here the 2 graphs are combined using named edges
    to represent 2 kinds of information associated
    with the same 4 persons.
  • Graph 3: Person ages (age) and favorite friends
    (fav)
  • Read these links as "Smith has age 21" or "Jones
    has favorite friend Smith" to make them more
    sentence-like. Each arc is like the
    predicate of a sentence, connecting a subject
    with an object. (Note that a subject may have
    any number of arcs of each type.)

5
  • Using graphs to represent data
  • Data is sometimes represented using so-called
    blank nodes to help cluster attributes.
  • Graph 4: Blank nodes linking a name, an age,
    and a favorite friend via arcs named name,
    age, and fav.
  • Blank nodes are useful for specifying lists of
    items, but are discouraged within the Semantic
    Web. Use (dereferenceable) URIs (like
    http://www.iu.edu/) whenever possible.

6
  • Using RDF-XML to serialize graphs
  • Graphs can be serialized, that is, represented in a
    textual format. When graphs are serialized using
    RDF, each connection is represented by 3
    components, a so-called RDF triple. Each
    triple is composed of a subject, a predicate,
    and an object, where the edge between each pair
    of entities becomes a named predicate.
  • Each subject is represented as
  • - a blank node, such as _:2,
  • - a literal value, such as value^^type, where
    type is some URI that defines a data type,
    as in 21^^age (note that standard RDF allows
    literals only in the object position), or
  • - a URI, like http://fake.host.edu/smith
  • Each object is represented as
  • - a blank node,
  • - a literal value, or
  • - a URI
  • Each predicate is represented as
  • - a URI, like http://fake.host.edu/contact-schema#fav,
    or an abbreviated URI like example:age which
    represents a URI that will

7
  • Graph 3 as a set of 12 triples (3 for each
    person)
  • -------------------------------------
  • Subject  Predicate     Object
  • Blake    example:fav   Blake
  • Blake    example:age   "12"
  • Blake    example:name  "Blake"
  • Jones    example:fav   Smith
  • Jones    example:age   "35"
  • Jones    example:name  "Jones"
  • George   example:fav   Smith
  • George   example:age   "21"
  • George   example:name  "George"
  • Smith    example:fav   Jones
  • Smith    example:age   "21"
  • Smith    example:name  "Smith"
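As a rough illustration of the triple table, the sketch below holds Graph 3 as plain Python tuples and reads each one back as a sentence. It is not RDF tooling: plain strings stand in for URIs, the `PHRASES` wording and the helper names `as_sentence` and `arcs_from` are invented for this example, and the Smith name triple is taken from the RDF-XML examples elsewhere in the deck.

```python
# Graph 3 as (subject, predicate, object) tuples. Plain strings stand in for URIs.
GRAPH_3 = [
    ("Blake", "example:fav", "Blake"),
    ("Blake", "example:age", "12"),
    ("Blake", "example:name", "Blake"),
    ("Jones", "example:fav", "Smith"),
    ("Jones", "example:age", "35"),
    ("Jones", "example:name", "Jones"),
    ("George", "example:fav", "Smith"),
    ("George", "example:age", "21"),
    ("George", "example:name", "George"),
    ("Smith", "example:fav", "Jones"),
    ("Smith", "example:age", "21"),
    ("Smith", "example:name", "Smith"),
]

# Hypothetical wording per predicate, used only to print readable sentences.
PHRASES = {
    "example:fav": "has favorite friend",
    "example:age": "has age",
    "example:name": "has name",
}


def as_sentence(triple):
    """Read one triple as a subject-predicate-object sentence."""
    subject, predicate, obj = triple
    return f"{subject} {PHRASES[predicate]} {obj}"


def arcs_from(graph, subject):
    """Return every (predicate, object) arc leaving one subject node."""
    return [(p, o) for s, p, o in graph if s == subject]


if __name__ == "__main__":
    for triple in GRAPH_3:
        print(as_sentence(triple))
```

Calling `arcs_from(GRAPH_3, "Smith")` returns the three arcs that leave the Smith node, which is the per-person cluster the table groups together.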

8
  • Two ways to represent the Graph 3 triples using
    RDF-XML
  • Properties encoded as XML elements
  • <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:example="http://fake.host.edu/example-schema#">
      <example:Person>
        <example:name>Smith</example:name>
        <example:age>21</example:age>
        <example:fav>Jones</example:fav>
      </example:Person>
    </rdf:RDF>
  • Properties encoded as XML attributes
  • <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:example="http://fake.host.edu/example-schema#">
      <rdf:Description example:name="Smith"
                       example:age="21"
                       example:fav="Jones">
      </rdf:Description>
    </rdf:RDF>

9
  • Representing URIs
  • In work with RDF you will see URIs abbreviated in
    several ways, using namespaces, PREFIXes, and
    ENTITIES, depending on the context:
  • xmlns:lib="http://some.host.edu/directory"
  • or
  • PREFIX lib: <http://some.host.edu/directory>
  • or
  • <!ENTITY lib "http://some.host.edu/directory">
  • If the namespace abbreviations in the first
    (element-based) example above get expanded, then
    Smith is actually being represented as
  • <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <http://fake.host.edu/example-schema#Person>
        <http://fake.host.edu/example-schema#name>
          Smith
        </http://fake.host.edu/example-schema#name>
        <http://fake.host.edu/example-schema#age>
          21
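The expansion of abbreviated names can be sketched in a few lines of Python. This is only an illustration of the idea: the `PREFIXES` table and the `expand` helper are hypothetical, and the trailing `#` on the example-schema namespace is an assumption (the transcript appears to have lost that character).

```python
# Hypothetical prefix table; the trailing '#' on example-schema is an assumption.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "example": "http://fake.host.edu/example-schema#",
}


def expand(qname, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI; return other strings unchanged."""
    prefix, sep, local = qname.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return qname


if __name__ == "__main__":
    print(expand("example:age"))
    print(expand("rdf:type"))
```

A string whose prefix is not in the table, such as a full `http://...` URI, is passed through as-is, so the helper can be applied to every term of a triple.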

10
  • A simple RDF example
  • Here is an RDF document (taken from Miloslav)
    that describes 3 books. Note that each entity is
    a book title, and there are 2 links from each
    title, one to the author (aka creator) and one
    to a page count (aka pages).
  • <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:lib="http://www.zvon.org/library">
      <rdf:Description about="Heart of Darkness">
        <lib:creator>Joseph Conrad</lib:creator>
        <lib:pages>110</lib:pages>
      </rdf:Description>
      <rdf:Description about="Lord Jim">
        <lib:creator>Joseph Conrad</lib:creator>
        <lib:pages>314</lib:pages>
      </rdf:Description>
      <rdf:Description about="The Secret Agent">
        <lib:creator>Joseph Conrad</lib:creator>
        <lib:pages>249</lib:pages>
      </rdf:Description>
    </rdf:RDF>
  • Clearly, the triples have been represented here
    in a compressed form: namespaces define the major
    part of the URIs, and each rdf:Description element
    defines two predicate-object connections.

11
  • An alternative representation
  • (among several others)
  • The same information can be encoded in several
    other ways. Here is an example where everything
    is presented as attributes within the
    rdf:Description tag.
  • <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:lib="http://www.zvon.org/library">
      <rdf:Description about="Heart of Darkness"
          lib:creator="Joseph Conrad"
          lib:pages="110"/>
      <rdf:Description about="Lord Jim"
          lib:creator="Joseph Conrad"
          lib:pages="314"/>
      <rdf:Description about="The Secret Agent"
          lib:creator="Joseph Conrad"
          lib:pages="249"/>
    </rdf:RDF>

12
  • Graph 3 using resources to represent each
    person
  • Persons are modeled as resources by replacing
    the strings for each node identifier with URIs
  • -------------------------------------------------------------------------------
  • Subject                         Predicate     Object
  • <http://fake.host.edu/blake>    example:fav   <http://fake.host.edu/blake>
  • <http://fake.host.edu/blake>    example:age   "12"
  • <http://fake.host.edu/blake>    example:name  "Blake"
  • <http://fake.host.edu/jones>    example:fav   <http://fake.host.edu/smith>
  • <http://fake.host.edu/jones>    example:age   "35"
  • <http://fake.host.edu/jones>    example:name  "Jones"
  • <http://fake.host.edu/george>   example:fav   <http://fake.host.edu/smith>
  • <http://fake.host.edu/george>   example:age   "21"
  • <http://fake.host.edu/george>   example:name  "George"


13
  • Representing entries in Graph 3 as resources
  • Format 1
  • <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:example="http://fake.host.edu/example-schema#">
      <example:Person rdf:about="http://fake.host.edu/smith">
        <example:name>Smith</example:name>
        <example:age>21</example:age>
        <example:fav rdf:resource="http://fake.host.edu/jones" />
      </example:Person>
    </rdf:RDF>
  • - - - - - - - - - - - - - - - - - - - - -
    - - - - - - - - - - - - -
  • Format 2
  • <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:example="http://fake.host.edu/example-schema#">
      <rdf:Description rdf:about="http://fake.host.edu/smith"
          example:name="Smith"
          example:age="21">
        <example:fav rdf:resource="http://fake.host.edu/jones" />
      </rdf:Description>
    </rdf:RDF>
  • Note that the resource URI references in this
    example are not real documents; they are not
    dereferenceable.

14
  • A person record using FOAF (from Obitko)
  • <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/"
             xmlns="http://www.example.org/joe/contact.rdf#">
      <foaf:Person rdf:about="http://www.example.org/joe/contact.rdf#joesmith">
        <foaf:mbox rdf:resource="mailto:joe.smith@example.org"/>
        <foaf:homepage rdf:resource="http://www.example.org/joe/"/>
        <foaf:family_name>Smith</foaf:family_name>
        <foaf:givenname>Joe</foaf:givenname>
      </foaf:Person>
    </rdf:RDF>

15
  • RDF summary and implications for the Semantic Web
  • A graph may be represented as a collection of
    triples.
  • Graphs are stored in triple stores, or in quad
    stores when many graphs are stored together.
  • RDF-XML representations of graphs will contain
    URIs that
  • - serve to identify and/or reference syntactic
    elements (they define tag names), and
  • - identify and/or name resources: subjects,
    predicates, and/or objects.
  • Such URIs may be imaginary or provide addresses
    of actual, dereferenceable web documents, in
    possibly remote locations.
  • This can result in a Gigantic Global Graph, aka
    the Semantic Web, with RDF as one of W3C's
    Semantic Web architectural levels.
  • "If HTML and the Web make all online documents
    look like one huge book, RDF, schema, and
    inference languages will make all the data in the
    world look like one huge database." (TimBL)
  • Editor's note: Here TimBL is using the term
    schema to refer to an RDF schema, which defines
    RDF triples much more loosely than a relational
    database schema defines a collection of tables in
    a database.

16
  • Publishing relational data as virtual RDF
    stores
  • One of the great strengths of the RDF model is
    that it allows data to be stored and queried
    without first requiring a schema. (Newman,
    2007)
  • Another strength is that it can be used to
    represent both tables and trees in a natural way,
    whereas representing graphs and trees via tables
    is a bit complicated. (Mazzocchi, 2004)
  • As a result, legacy relational databases can be
    published as RDF stores on the Semantic Web by
    using gateways like D2R and Virtuoso
    (commercial).
  • The D2RQ approach requires 2 steps:
  • - interrogate the database via JDBC using
    generate-mapping to build a configuration (mapping)
    file from the relational table definitions, and then
  • - start the D2R server with the mapping file.
  • Notes:
  • - Each table row becomes a separate
    resource/graph,
  • - primary keys (if any) become resource
    identifiers, and
  • - rows in linked tables identified by foreign
    keys may be merged into the entity (?).
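The row-to-resource idea in the notes above can be sketched as follows. This is not D2RQ code or its mapping language: `rows_to_triples`, the `base` URI, the `vocab:` naming scheme, and the `SYMBOL` column are all assumptions made for illustration. Only the GENES table, its GENE_ID key, and gene 100 (ADA) come from the slides.

```python
def rows_to_triples(table, primary_key, rows, base="http://localhost/resource/"):
    """Turn each table row into a resource whose URI is built from its primary key."""
    triples = []
    for row in rows:
        subject = f"{base}{table}/{row[primary_key]}"
        # One triple records which table (class) the row came from.
        triples.append((subject, "rdf:type", f"vocab:{table}"))
        # Every non-null column becomes one predicate-object pair.
        for column, value in row.items():
            if value is not None:
                triples.append((subject, f"vocab:{table}_{column}", str(value)))
    return triples


if __name__ == "__main__":
    # The SYMBOL column name is hypothetical.
    genes = [{"GENE_ID": 100, "SYMBOL": "ADA"}]
    for triple in rows_to_triples("GENES", "GENE_ID", genes):
        print(triple)
```

Foreign keys are left as plain literal values here; a gateway could instead emit the URI of the referenced row, which is how linked tables would become arcs between resources.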

17
  • Publishing relational data as virtual RDF
    stores
  • The D2RQ package was used to publish 2 CLSD
    schemas (in the DB2 sense of the term) as
    virtual RDF graphs: DiseaseGeneNet and GO (minus
    their closures).
  • These are available for browsing (for a limited
    time only) from a web page at
  • kongo.uits.iupui.edu:6700
  • and for querying via a web form at
  • kongo.uits.iupui.edu:6700/sparql.
  • The next 3 slides show
  • - the starting browser page,
  • - the page for the Disease Gene Network Genes
    table, and
  • - a tabular representation of the graph for
    Gene 100, aka ADA.
  • The Genes page is a list of every resource
    defined within the Genes table, where a resource
    is one row in the table, identified by the
    primary key, Gene_ID.

18
  • D2R-server browsing interface

19
  • D2R-server browsing interface

20
  • D2R-server browsing interface

21
  • Portion of a D2R-server mapping file for CLSD
  • @prefix map: <file:/C:/d2r-server-0.4/mapping-clsd2-GO-DGN.n3#> .
  • @prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
  • map:database a d2rq:Database;
  •     d2rq:jdbcDriver "com.ibm.db2.jcc.DB2Driver";
  •     d2rq:jdbcDSN "jdbc:db2://libra45.uits.iu.edu:50000/clsd2";
  •     d2rq:username "account";
  •     d2rq:password "password";
  •     .
  • # Table DISEASE_GENE_NET.GENES
  • map:DISEASE_GENE_NET_GENES a d2rq:ClassMap;
  •     d2rq:dataStorage map:database;
  •     d2rq:uriPattern "DISEASE_GENE_NET.GENES/@@DISEASE_GENE_NET.GENES.GENE_ID@@";
  •     d2rq:class vocab:DISEASE_GENE_NET_GENES;
  •     .
  • map:DISEASE_GENE_NET_GENES__label a d2rq:PropertyBridge;
  •     d2rq:belongsToClassMap map:DISEASE_GENE_NET_GENES;

22
  • RDF graphs may be interrogated
  • - by physical inspection (for anyone willing to
    read XML),
  • - by writing programs that read RDF files,
    construct the represented graphs internally, and then
  •     - access graph triples in sequential order,
  •     - select triples according to specified
          content, and/or
  •     - apply SparQL queries and access results in
          sequential order,
  • - using command-line tools that apply SparQL
    queries, and/or
  • - using GUI interfaces accepting SparQL
  •     - commands written in text, or
  •     - commands represented graphically.

23
  • How to query an RDF graph using Jena
  • The Java-based Jena package from HP Labs allows
    users to manipulate and query RDF graphs.
  • You can write a program that uses Jena classes to
  • - retrieve and parse an RDF file containing a
    graph or a collection of graphs,
  • - store it in memory, and then
  • - examine each triple in turn, examine one
    component (say, the subject) of each triple in turn,
    or examine only triples that meet specified criteria.
  • For example, one might examine each stored triple
    searching for a specific reference URI, or for a
    specific literal value.
  • One might look for persons of a specific age,
    21^^xsd:age, in the object portion of each
    triple.
  • Jena also provides support for inference.

24
  • Jena example
  • In Jena, RDF nodes can have type Resource, URI
    Resource, Literal, or anonymous (slight
    extension to standard RDF).
  • A Jena model is created by a factory:
  • Model m = ModelFactory.createDefaultModel();
  • A Jena ontological model is a model along with a
    reasoner (sic):
  • OntModel m = ModelFactory.createOntologyModel();
  • Jena can
  • - read in an RDF serialized graph (from a
    file, URL, etc.),
  • - write a serialized model to a file or STDOUT,
    and
  • - perform standard operations on the model.
    For example, given the
    populated models m and n, Jena can then do
  • Model x = m.add( n ); // Union

25
  • Reading and writing a model in Jena
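  • import java.io.InputStream;
  • import com.hp.hpl.jena.rdf.model.*;
  • import com.hp.hpl.jena.util.FileManager;
  • String inputFileName = "Some-GO-entries-diddled.rdf";
  • Model model = ModelFactory.createDefaultModel();
  • InputStream in = FileManager.get().open( inputFileName );
  • if( in == null )
  •     throw new IllegalArgumentException( "File not found.\n" );
  • model.read( in, "" ); // Second argument is the base URI for relative URIs; empty here.
  • model.write( System.out, "RDF/XML-ABBREV" ); // or "N-TRIPLE" or "RDF/XML"
  • // which will yield a file of N-triple,
    RDF/XML, or XML-ABBREV records, depending on the format named.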

26
  • Canonical process to examine each triple in a
    model
  • StmtIterator iterator = model.listStatements();
  • while( iterator.hasNext() ) {
  •     Statement statement = iterator.nextStatement();
  •     Resource subject = statement.getSubject();
  •     Property predicate = statement.getPredicate();
  •     RDFNode object = statement.getObject(); // superclass of Resource and Literal
  •     // Now process the object; here it is just printed.
  •     System.out.print( subject.toString() );
  •     System.out.print( " " + predicate.toString() + " " );
  •     if( object instanceof Resource ) {
  •         // It's a resource.
  •         System.out.print( object.toString() );
  •     } else {
  •         // It's a literal that will be printed with surrounding quotes.
  •         System.out.print( "\"" + object.toString() + "\"" );
  •     }
  •     System.out.println( " ." );
  • }

27
  • Statement iterators for accessing selected
    components
  • There are several methods for creating iterators
    over a model:
  • - Some simply list the components of each
    triple:
  •     - model.listSubjects()
  •     - model.listObjects()
  • - Some compare a specific component with a
    specified value, as in
  • model.listSubjectsWithProperty( Property p, RDFNode
    o )
  • (which will get you a
    collection of subjects possessing
    property/predicate p and specific value o)
  • - Some compare all components against specific
    values in 2 steps:
  •     - define a selector possessing specific values
          s, p and o,
          where null or (RDFNode) null matches
          anything
  • Selector selector = new SimpleSelector(
    subject, predicate, object );
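The selector idea, where a null component matches anything, can be sketched in plain Python. This is not Jena's API: `select` is a hypothetical helper in which `None` plays the role of Jena's `null`, and the small graph reuses the ages from Graph 3.

```python
GRAPH = [
    ("Blake", "example:age", "12"),
    ("Jones", "example:age", "35"),
    ("George", "example:age", "21"),
    ("Smith", "example:age", "21"),
    ("Smith", "example:fav", "Jones"),
]


def select(graph, subject=None, predicate=None, obj=None):
    """Yield triples whose components equal the given values; None matches anything."""
    wanted = (subject, predicate, obj)
    for triple in graph:
        if all(w is None or w == v for w, v in zip(wanted, triple)):
            yield triple


# Persons of a specific age: fix the predicate and object, leave the subject open.
aged_21 = [s for s, _, _ in select(GRAPH, predicate="example:age", obj="21")]

if __name__ == "__main__":
    print(aged_21)
```

Calling `select(GRAPH)` with no arguments walks every triple, which corresponds to the plain statement iterator; each extra argument narrows the match.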

28
  • SparQL: a graph-based query language
  • SparQL is a language that lets users query RDF
    graphs . . . using graph patterns (written in
    N3?) containing variables.
  • The query engine will return an exhaustive list
    of triples that satisfy each query through value
    substitution (aka query by example, QBE).
  • "This process is not always intuitive, and/or SQL
    has perverted the minds of a generation of
    programmers" (J. Random Guy somewhere on the
    Web).
  • SparQL is implemented in Jena through the ARQ
    package, and queries may be made from within Java
    programs (McCarthy, 2005), or via a SparQL client
    distributed with Jena:
  • - build a query in a .rq file, and
  • - execute the query using
  • sparql --query filename.rq
  • or
  • sparql.bat --query filename.rq
  • SparQL does not do inference (except when used
    within Jena against an inference model ?).
  • The D2R server provides a Web form that can be
    used to interrogate the CLSD stores using SparQL.
29
  • A SparQL example
  • This SparQL example query simply asks for a list
    of the first 10 triples in the file specified in
    the FROM clause:
  • PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  • PREFIX example: <http://fake.host.edu/example-schema#>
  • select $s $o
  • from <http://kongo.uits.iupui.edu:8546/rdf-example-1.rdf>
  • where
  • { $s $p $o . }
  • LIMIT 10
  • $s, $p, and $o are variable names that will each
    be assigned a value as the query is satisfied.
    Variable names may also start with ?.
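To make the "value substitution" behind such a query concrete, here is a minimal pattern matcher over tuples. It is a sketch of the idea only, not how ARQ works internally: `match` and `query` are hypothetical helpers, terms starting with `$` or `?` are treated as variables, and the small `GRAPH` is made up from the Smith and Jones entries.

```python
GRAPH = [
    ("smith", "example:age", "21"),
    ("smith", "example:fav", "jones"),
    ("jones", "example:age", "35"),
    ("jones", "example:fav", "smith"),
]


def is_variable(term):
    return term.startswith("$") or term.startswith("?")


def match(pattern, triple, bindings):
    """Unify one triple pattern with one triple; return extended bindings or None."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if is_variable(term):
            # A variable binds on first use and must agree with that value afterwards.
            if extended.setdefault(term[1:], value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(graph, patterns, limit=None):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in graph:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions[:limit]


if __name__ == "__main__":
    print(query(GRAPH, [("$s", "$p", "$o")], limit=2))
    print(query(GRAPH, [("$a", "example:fav", "$b"), ("$b", "example:fav", "$a")]))
```

The second call uses two patterns that share variables, so only pairs of persons who name each other as favorite survive; sharing a variable across patterns is what turns a list of patterns into a join.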

30
  • SparQL: a graph-based query language
  • The basic syntax of a SparQL query is based on
    N3 (/Turtle) and similar to
  • BASE <some URI from which relative FROM and
    PREFIX entries will be offset>
  • PREFIX prefix_abbreviation: <some_URI>
  • SELECT | CONSTRUCT | ASK | DESCRIBE
    some_variable_list
  • FROM <some_RDF_source>
  • WHERE
  • { some_triple . . . .
  •   GRAPH <some_RDF_source>
      { another_triple . . . . }
  •   GRAPH $some_variable
      { yet_another_triple . . . . }
  • }
  • Notes:
  • - the < and > characters are required
    literals,
  • - the BASE and PREFIX entries are optional and
    BASE applies to relative
    URIs appearing in either PREFIX or FROM clauses,

31
  • Querying Graph 3 format 1 using SparQL
  • Here's a reminder of one of the representations
    used to store Graph 3, here stored in a file
    named rdf-example-1.rdf:
  • <?xml version="1.0" encoding="UTF-8"?>
  • <rdf:RDF
  •     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  •     xmlns:example="http://fake.host.edu/example-schema#"
  • >
  •   <example:Person rdf:about="http://fake.host.edu/smith">
  •     <example:name>Smith</example:name>
  •     <example:age>21</example:age>
  •     <example:fav rdf:resource="http://fake.host.edu/jones" />
  •   </example:Person>
  • </rdf:RDF>

32
  • A SparQL query against the first data
    representation
  • C:\Jena-2.5.7\Jena-2.5.7\bat> cat query-example-1.rq
  • PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  • PREFIX example: <http://fake.host.edu/example-schema#>
  • select *
  • from <http://kongo.uits.iupui.edu:8546/rdf-example-1.rdf>
  • where
  • { $s $p $o . }
  • C:\Jena-2.5.7\Jena-2.5.7\bat> sparql.bat --query query-example-1.rq
  • ------------------------------------------------------------------------------
  • s                              p            o
  • <http://fake.host.edu/smith>   example:fav  <http://fake.host.edu/jones>
  • <http://fake.host.edu/smith>   example:age  "21"

33
  • Querying Graph 3 format 2 using Sparql
  • Here's a reminder of the other representation of
    Graph 3, stored in a file named
    rdf-example-2.rdf:
  • <?xml version="1.0" encoding="UTF-8"?>
  • <rdf:RDF
  •     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  •     xmlns:example="http://fake.host.edu/example-schema#"
  • >
  •   <example:Person rdf:about="http://fake.host.edu/smith"
  •       example:name="Smith"
  •       example:age="21">
  •     <example:fav rdf:resource="http://fake.host.edu/jones" />
  •   </example:Person>
  • </rdf:RDF>

34
  • The same SparQL query against the second data
    representation
  • C:\Jena-2.5.7\Jena-2.5.7\bat> cat query-example-2.rq
  • PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  • PREFIX example: <http://fake.host.edu/example-schema#>
  • select *
  • from <http://kongo.uits.iupui.edu:8546/rdf-example-2.rdf>
  • where
  • { $s $p $o . }
  • C:\Jena-2.5.7\Jena-2.5.7\bat> sparql.bat --query query-example-2.rq
  • ------------------------------------------------------------------------------
  • s                              p            o
  • <http://fake.host.edu/smith>   example:fav  <http://fake.host.edu/jones>
  • <http://fake.host.edu/smith>   example:age  "21"

35
  • A distributed SparQL query against 4 separate
    RDF files
  • The next query searches 4 dereferenceable files
    holding live data in the first representation
    format above:
  • C:\Jena-2.5.7\Jena-2.5.7\bat> cat query-example-all.rq
  • PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  • PREFIX example: <http://fake.host.edu/example-schema#>
  • select *
  • from <http://kongo.uits.iupui.edu:8546/smith>
  • from <http://kongo.uits.iupui.edu:8546/jones>
  • from <http://kongo.uits.iupui.edu:8546/george>
  • from <http://kongo.uits.iupui.edu:8546/blake>
  • where
  • { $s $p $o . }

36
  • Results of the distributed SparQL query
  • C:\Jena-2.5.7\Jena-2.5.7\bat> sparql.bat --query query-example-all.rq
  • -------------------------------------------------------------------------------------------
  • s                                      p             o
  • <http://kongo.uits.iupui.edu/blake>    example:fav   <http://kongo.uits.iupui.edu/blake>
  • <http://kongo.uits.iupui.edu/blake>    example:age   "12"
  • <http://kongo.uits.iupui.edu/blake>    example:name  "Blake"
  • <http://kongo.uits.iupui.edu/blake>    rdf:type      example:Person
  • <http://kongo.uits.iupui.edu/jones>    example:fav   <http://kongo.uits.iupui.edu/smith>
  • <http://kongo.uits.iupui.edu/jones>    example:age   "35"
  • <http://kongo.uits.iupui.edu/jones>    example:name  "Jones"
  • <http://kongo.uits.iupui.edu/jones>    rdf:type      example:Person
  • <http://kongo.uits.iupui.edu/george>   example:fav   <http://kongo.uits.iupui.edu/smith>
  • <http://kongo.uits.iupui.edu/george>   example:age   "21"
  • <http://kongo.uits.iupui.edu/george>   example:name  "George"
  • <http://kongo.uits.iupui.edu/george>   rdf:type      example:Person
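The effect of listing several FROM sources can be approximated by taking the set union of the triples found in each file before matching. The sketch below is an assumption-laden illustration, not a description of how ARQ fetches documents: the `person` and `who_favors` helpers are hypothetical, no network access happens, and the Smith entries are filled in by analogy with the other three persons.

```python
BASE = "http://kongo.uits.iupui.edu/"


def person(name, age, fav):
    """Build the four triples that one person's file would contribute."""
    subject = BASE + name
    return {
        (subject, "example:name", name.capitalize()),
        (subject, "example:age", age),
        (subject, "example:fav", BASE + fav),
        (subject, "rdf:type", "example:Person"),
    }


# One set of triples per source document.
SOURCES = {
    "smith": person("smith", "21", "jones"),
    "jones": person("jones", "35", "smith"),
    "george": person("george", "21", "smith"),
    "blake": person("blake", "12", "blake"),
}

# Merging is a set union, so a triple repeated across sources appears once.
merged = set().union(*SOURCES.values())


def who_favors(graph, name):
    """List the subjects whose favorite friend is the named person."""
    target = BASE + name
    return sorted(s for s, p, o in graph if p == "example:fav" and o == target)


if __name__ == "__main__":
    print(len(merged))
    print(who_favors(merged, "smith"))
```

Asking who favors Smith returns nothing against Smith's own file but two persons against the merged graph, which is the practical payoff of querying distributed graphs together.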

37

Here is a portion of the GO is_a DAG (Blake,
2004) for molecular function (example: chromatin
binding is_a DNA binding). ("It
is easy to confuse a gene product name with its
molecular function, and for that reason many GO
molecular functions are appended with the word
'activity'." www.geneontology.org, 2008)

38
  • Here's the first entry (of the 26K) in the GO
    text version (with all three parts intermixed):
  • [Term]
  • id: GO:0000001
  • name: mitochondrion inheritance
  • namespace: biological_process
  • def: "The distribution of mitochondria, including
    the mitochondrial genome, into daughter cells
    after mitosis or meiosis, mediated by
    interactions between mitochondria and the
    cytoskeleton." [GOC:mcc, PMID:10873824,
    PMID:11389764]
  • synonym: "mitochondrial inheritance" EXACT []
  • is_a: GO:0048308 ! organelle inheritance
  • is_a: GO:0048311 ! mitochondrion distribution
  • You can also get the GO as RDF XML, or as a MySQL
    database. A portion of the molecular function
    extract on the previous page is shown in RDF XML
    on the next page.

39
  • <?xml version="1.0" encoding="UTF-8"?>
  • <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:go="http://www.geneontology.org/dtds/go.dtd#">
  •   <go:term rdf:about="http://www.geneontology.org/go#all">
        <!-- Note: "all" is like root. -->
  •     <go:accession>all</go:accession>
  •     <go:name>all</go:name>
  •     <go:definition>This term is the most general
        term possible</go:definition>
  •   </go:term>
  •   <go:term rdf:about="http://www.geneontology.org/go#GO:0003674">
  •     <go:accession>GO:0003674</go:accession>
  •     <go:name>molecular_function</go:name>
  •     <go:synonym>GO:0005554</go:synonym>
  •     <go:synonym>molecular function</go:synonym>
  •     <go:definition>Elemental activities, such as
        catalysis or binding, describing the actions of a
        gene product at the molecular level. A given gene
        product may exhibit one or more molecular
        functions.</go:definition>
  •     <go:is_a rdf:resource="http://www.geneontology.org/go#all" />
  •   </go:term>
  • </rdf:RDF>

40
  • Find parents of GO:0004003 in the example GO
    subset
  • PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  • PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  • PREFIX go: <http://www.geneontology.org/dtds/go.dtd#>
  • select *
  • from <http://discern.uits.iu.edu:8421/Some-GO-entries-diddled.rdf>
  • where
  • { <http://www.geneontology.org/go#GO:0004003>
      go:is_a $parent . }
  • Result:
  • C:\Jena-2.5.7\bat> sparql.bat --query
    GO-paths-from-4003.rq
  • -----------------------------------------------
  • parent

41
  • Find all 3-element paths up from GO:0004003
  • PREFIX go: <http://www.geneontology.org/dtds/go.dtd#>
  • select *
  • from
  • <http://discern.uits.iu.edu:8421/Some-GO-entries-diddled.rdf>
  • where
  • { <http://www.geneontology.org/go#GO:0004003>
      go:is_a $a .
  •   $a go:is_a $b .
  •   $b go:is_a $c . }
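The three chained patterns amount to following is_a edges three times, each step joined to the previous one on a shared variable. Here is a small sketch of that chaining; the intermediate node names `A1`, `A2`, and `B1` are hypothetical placeholders, not real GO terms, and only the link from GO:0003674 to `all` comes from the RDF XML extract.

```python
# (child, parent) is_a edges. A1, A2 and B1 are hypothetical placeholder terms.
IS_A = [
    ("GO:0004003", "A1"),
    ("GO:0004003", "A2"),
    ("A1", "B1"),
    ("A2", "B1"),
    ("B1", "GO:0003674"),
    ("GO:0003674", "all"),
]


def parents(term):
    """Return the direct is_a parents of one term."""
    return [parent for child, parent in IS_A if child == term]


def three_step_paths(start):
    """Mirror the query: start is_a $a, $a is_a $b, $b is_a $c."""
    return [
        (a, b, c)
        for a in parents(start)
        for b in parents(a)
        for c in parents(b)
    ]


if __name__ == "__main__":
    for path in three_step_paths("GO:0004003"):
        print(" -> ".join(("GO:0004003",) + path))
```

Because the toy graph is a DAG rather than a tree, two distinct paths reach the same ancestor, and both are reported, just as the query would return one row per path.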

42
  • Find all 3-element paths up from GO:0004003 using
    Twinkle

43
  • Query dbpedia for entries about Goethe
  • (using Virtuoso iSparql text query)
  • Note that the predicate bif:contains is a
    Virtuoso Built-In Function that searches
    back-end text indexes. It might be possible to
    search using a standard SparQL regex FILTER, but
    it would be much slower.

44
  • The same query using the iSparql graphical QBE
    interface
  • Here is the same query in graphical form as
    constructed using the iSparql QBE interface
  • Components can be dragged-and-dropped from the
    menu at the top of the window. The whole
    interactive window is shown on the next page.

45
  • The same query within the whole iSparql QBE window

46
  • Results from the iSparql text and/or QBE queries

47
  • Optional clauses in SparQL queries
  • Permitted within where clauses:
  • optional { triple_pattern } identifies a triple
    that need not appear in an RDF target; its
    absence will not prohibit a pattern match.
  • filter restricts variable matches in the
    preceding triple to specified filter patterns, as
    in
  • $s $p $date FILTER ( $date >
    "2005-01-01T00:00:00Z"^^xsd:dateTime )
  • or
  • $s $p $d FILTER ( xsd:dateTime( $d ) <
    xsd:dateTime( "2005-01-01T00:00:00Z" ) )
  • or
  • ?s ?p ?name FILTER regex( ?name, "smi",
    some_flag )
  • union: where clauses may be constructed as
  • { triple_pattern_1 } UNION { triple_pattern_2 }
  • and any RDF element matching either of these
    triples will be included in the resulting output.
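The behavior of the three clauses can be imitated on a handful of tuples. This is a rough analogy in plain Python, not SPARQL evaluation: the `TRIPLES` data and the helper names are made up for the example, and the case-insensitive flag stands in for the unspecified `some_flag`.

```python
import re

TRIPLES = [
    ("smith", "example:name", "Smith"),
    ("smith", "example:age", "21"),
    ("jones", "example:name", "Jones"),
    ("blake", "example:age", "12"),
]


def objects(subject, predicate):
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def with_predicate(predicate):
    return [(s, o) for s, p, o in TRIPLES if p == predicate]


# filter: keep only name matches that satisfy a regex, ignoring case.
filtered = [
    (s, name)
    for s, name in with_predicate("example:name")
    if re.search("smi", name, re.IGNORECASE)
]

# optional: every subject with a name is kept; age is None when no age triple exists.
optional = [
    (s, name, age)
    for s, name in with_predicate("example:name")
    for age in (objects(s, "example:age") or [None])
]

# union: rows matching either pattern are included.
union = with_predicate("example:name") + with_predicate("example:age")

if __name__ == "__main__":
    print(filtered)
    print(optional)
    print(union)
```

Note how Jones survives the optional step with an empty age while Blake, who has no name triple, is dropped by the required part of the pattern.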

48
  • A relational view of the Semantic Web (Newman,
    2007)
  • Relaxing certain requirements normally imposed
    upon SQL (specifically type constraints on joined
    fields), there are strong similarities among
    operations applied to relational and graph-based
    models. For example:
  • - triple_pattern . triple_pattern
    approximates an untyped join,
  • - filter
    approximates an SQL conditional,
  • - union
    approximates an outer union, and
  • - optional
    approximates a left outer join( R, S ), which
    = join( R, S ) unioned with an anti-join( R, S ),
    where an anti-join
    = difference with a semi-join, and a semi-join
    = join and a projection.
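The decomposition of the left outer join can be checked on two tiny relations. The sketch below is illustrative only: `R` and `S` are invented two-column relations keyed on a person name, and each function follows the definitions in the list above.

```python
R = [("smith", 21), ("jones", 35), ("blake", 12)]  # (person, age)
S = [("smith", "jones"), ("jones", "smith")]       # (person, favorite friend)


def join(r, s):
    """Inner join on the first column."""
    return [(k, a, f) for k, a in r for k2, f in s if k == k2]


def semi_join(r, s):
    """A join followed by a projection back onto the columns of r."""
    projected = {(k, a) for k, a, _ in join(r, s)}
    return [row for row in r if row in projected]


def anti_join(r, s):
    """The difference between r and its semi-join with s."""
    matched = semi_join(r, s)
    return [row for row in r if row not in matched]


def left_outer_join(r, s):
    """The join unioned with the anti-join, padded with None for the missing column."""
    return join(r, s) + [(k, a, None) for k, a in anti_join(r, s)]


if __name__ == "__main__":
    for row in left_outer_join(R, S):
        print(row)
```

Blake has no row in `S`, so he appears only through the anti-join branch with `None` for the favorite, which is the role an optional pattern plays in a query.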

49
  • References
  • Berners-Lee, Tim, Linked Data, 2006.
    http://www.w3.org/DesignIssues/LinkedData.html
  • Blake, Judith, Using the Gene Ontology for Data
    Analysis, 2004.
    http://www.geneontology.org/teaching_resources/presentations/2004-11_dataanalysis_jblake.ppt
  • Bizer, Chris, The D2RQ Plattform - Treating
    Non-RDF Databases as Virtual RDF Graphs.
    http://www4.wiwiss.fu-berlin.de/bizer/d2rq/
  • Bizer, Chris, Richard Cyganiak, Tom Heath, How
    to Publish Linked Data on the Web, 2007.
    http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
  • Dodds, Leigh, Introducing SparQL: Querying the
    Semantic Web, 2005. http://www.xml.com/lpt/a/1628
  • McBride, Brian, An Introduction to RDF and the
    Jena RDF API, 2007.
    http://jena.sourceforge.net/tutorial/RDF_API/index.html
  • McCarthy, Philip, Search RDF data with SPARQL,
    2005. http://www.ibm.com/developerworks/xml/library/j-sparql/
  • McCarthy, Philip, Introduction to Jena, 2004.
    http://www.ibm.com/developerworks/xml/library/j-jena/