Title: OEM and LORE Query Language
1OEM and LORE Query Language
- Sanjay Madria
- Department of Computer Science
- University of Missouri-Rolla
- madrias_at_umr.edu
2- Source http//WWW-DB.Stanford.EDU/lore/
3Semistructured Data (SSD)
- No explicit schema Irregular and incomplete data
- Schema may be hidden or mixed with data
- Examples
- Semi-structured data arises mainly from the
integration of heterogeneous data sources both
structured and non-rigid structured - Information sources change, or new sources added.
- semantic discrepancies among heterogeneous data
sources - Data from the web
- Overall site structure may change often.
- Biological data
4Characteristics of SSD
- Missing or additional attributes
- Multiple attributes
- Different types in different objects
- Heterogeneous collections
- Self-describing
- Irregular, no priori structure
5Object Exchange Model (OEM)
- Motivation
- Self-describing data model
- information exchange and extraction
- Handle incomplete and irregular data
- Why a new data model? it not a new model.
6LORE
- Lore Lightweight Object Repository
- Lightweight because
- Object Model supported is lightweight
- No multiuser or heavyweight DBMS features
7Lore - motivation
- Relational data model has null values, and OO
models have inheritance and complex objects. Both
have difficulties in designing schemas to
incorporate irregular data. - To manage semi-structured data, as in such
environment - Difficult to decide in advance on a single,
correct schema as - structure of the data may evolve rapidly, or
- data elements may change types, or
- data not conforming to previous structure may be
added
8- Thus
- Need for management of semi-structured data!
- Data managed by Lore is not confined to a schema
and it may be irregular or incomplete. - OEM is the Lores data model.
- Lorel is Lores query language.
9Object Exchange Model (OEM)
- Data in this model can be thought of as a labeled
directed graph. - Schema-less and self-describing.
- nodes are objects and
- edges are labeled with attribute names and,
- leaf nodes have atomic values
- Object nesting.
- Vertices in graph are objects.
- Each object has a unique object identifier (oid),
such as 5. - Atomic objects have no outgoing edges and are of
types such as int, real, string, gif, java, etc. - All other objects that have outgoing edges are
called complex objects.
10OEM (Cont.)
- Examples
- Object 3 is complex, and its subobjects are 8,
9, 10, and 11. - Object 7 is atomic and has value Clark.
- DBGroup is a name that denotes object 1.(Names
are entry points into the database). - Type and structure heterogeneity Observe that
members may have 0, 1 or more offices, office is
a string and sometimes, also complex object, a
room may be string and integer - DbGroup.Member denotes all member-labeled
subobjects
11An OEM Database
DBGroup
1
Member
Project
Member
Member
Project
Member
2
3
4
5
6
Name
Project
Name
Office
Project
Age
Name
Age
Office
Office
9
11
8
10
12
13
14
7
15
16
Clark
Smith
46
Gates 252
Lore
Tsimmis
Jones
28
Room
Building
Room
Building
17
18
19
20
CIS
411
CIS
252
12Object Exchange Model - OEM
- Each value exchanged is given an explicit label.
- Object ?temp-in-Fahrenheit, integer, 80?
- temp-in-Fahrenheit is the label.
- Each object is self-describing, with a label,
type and value. - ?set-of-temps, set, cmpnt1, cmpnt2 ?
- cmpnt1 is ?temp-in-Fahrenheit, integer, 80?
- cmpnt2 is ?temp-in-Celsius, integer, 20?
13Labels
- Lables Play two roles
- identifying an object (component)
- identifying the meaning of an object (component)
?person-record, set, cmpnt1, cmpnt2, cmpnt3 ?
cmpnt1 is ?person-name, string, Fred?
cmpnt2 is ?office-num-in-bldg-5, integer, 333?
cmpnt3 is ?department, string, toy?
- Person-name both identifies cmpnt1 and coveys its
meaning.
- In relational data this corresponds to .
14Labels - Issues
- Labels are relative (more specific) to the source
of the data object. - Similar labels from different sources need to be
resolved.
- Labels provide the flexibility in representing
- object structure
15OEM - Specification
- Each object in OEM has the following structure
- Label A variable character string describing
what the object represents. - Type The data type of the objects value. Each
is either an atom type, or type set. - Value A variable-length value of the object.
- Object-ID A unique variable-length identifier
for the object or null.
16OEM - Summary
- OEM is an information exchange model. It does not
specify how objects are stored at source.
- OEM does specify how objects are received at a
client, but after objects are received they can
be stored in any way the client likes.
- Each source has a distinguished object with
lexical identifier root.
17- ltbiblio,set,doc1,doc2,,docngt
- doc1 is ltdoc, set, auths1, topic1, call-no1gt
- auths1 is ltauth-set,set auth11gt
- auth11 is ltauth-ln, string, Ullmangt
- topic1 is lttopic, string,Databasesgt
- call-no1 is ltinternal-call-no, integer, 25gt
- doc2 is ltdoc, set, auths2, topic2, call-no2gt
- auths2 is ltauth-set,set auth21, auth22, auth23gt
- auth21 is ltauth-ln, string, Ahogt
- auth22 is ltauth-ln, string, Hopcroftgt
- auth23 is ltauth-ln, string, Ullmangt
-
Example
- topic2 is lttopic, string,Algorithmsgt
- call-no1 is ltdewey-decimal, string, BR273gt
- docn is ltdoc, set, authsn, topicn, call-nongt
- authsn is ltauth,string,
- Crichtongt
- topic1 is lttopic, string,Dinosaursgt
- call-no1 is ltfictional-call-no, integer, 95gt
- biblio is the root object.
18OEM - QL
- SELECT Fetch-expression
- FROM Object
- WHERE Condition
- The result of this query is itself an object,
with special label answer - ?answer, set, obj1, obj2, , objn ?
- Each returned obji is a component of object
specified in the From clause of the query, where
the component is located by the Fetch-expression
and satisfies the Condition.
19Path
- The notion of path is used in both
Fetch-Expression in the Select clause and the
condition in the Where clause. - Path describes traversals through an object using
subobject structure and labels. - Example biblio.doc.auth
- Paths are used in Fetch-Expression to specify
which components are are returned in the answer
object. - Paths are used in the condition to qualify the
fetched objects or other (related) components in
the same object structure.
20Queries - Simple
- Retrieve the topic of each document for which
Ullman is one of the authors - SELECT biblio.doc.topic
- FROM root
- WHERE biblio.doc.auth-set.auth-ln Ullman
- Intuitively, the querys where clause finds all
paths through subobject structure with the
sequence of labels biblio,doc,auth-set,auth-ln
such that the object at the end of the path has
value Ullman. - ltanswer, set, obj1, obj2gt
- obj1 is lttopic, string, Databasesgt
- obj2 is lttopic, string, Algorithmsgt
21Queries - wild-cards
- Retrieve all documents with internal call number
- SELECT biblio.?.topic
- FROM root
- WHERE biblio.?.internal-call-no
- ? label matches any label. For this query,
the doc labels can be replaced by any other
strings and query would produce the same result.
By convention, two occurrences of ? In the same
query must match the same label unless variables
are used. - ltanswer, set, obj1gt
- obj1 is lttopic, string, Databasesgt
22Queries - wild-paths
- Retrieve all documents with internal call number
- SELECT .topic
- FROM root
- WHERE .internal-call-no
- Symbol matches any path of length one or
more. The use of followed by a single label is
a convenient and common way to locate objects
with a certain label in complex structure.
Similar to ?, two occurrences of in the same
query must match the same sequence of labels,
unless variables are used. - ltanswer, set, obj1gt
- obj1 is lttopic, string, Databasesgt
23Queries - variables
- Retrieve each document for which both
Hopcroft and Aho are co-authors - SELECT biblio.doc
- FROM root
- WHERE biblio.doc.auth-set.auth-lnAho and
- biblio.doc.auth-set.auth-lnHopcr
oft - Here, the query finds all the paths with
structure biblio, doc, auth-set, and with two
distinct path completions with label auth with
values Aho and Hopcroft - ltanswer, set, obj1gt
- obj1 is the complete doc2
24Lorel Query Language
- Need query language that supports path
expressions for traversing graph data and
handling of typeless data. - A simple path expression is a name followed by a
sequence of labels. - DBGroup.Member.Office.
- Set of objects that can be reached starting with
the DBGroup object, following edges labeled as
member and then office.
25Lorel (cont.)
- Example
- select DBGroup.Member.Officewhere
DBGroup.Member.Age lt 30 - Result
- Office Gates 252
- Office
- Building CIS
- Room 411
26Lorel Query Rewrite
- Previous query rewritten to (OQL style)
- select Ofrom DBGroup.Member M, M.Office Owhere
exists y in M.Age y lt 30
27Lorel Query Features
- Explicitly handle coercion.
- Automatic type coercion
- 0.5 lt 0.9 should return true
- Comparison on age transformed to existential
condition. - Since all properties are set-valued in OEM.
- A user can ask DBGroup.Member.Age lt 30 regardless
of whether Age is single valued, set valued, or
unknown.
28- Path expression queries -specification for a set
of possible paths through the graph - Example - is a path expression that matches any
number of labels - Use of Data guides Structural summary of the
database
29Lorel (cont.)
- General path expressions are loosely specified
patterns for labels in the database.(
disjunction, ? label pattern optional) - Example
- select DBGroup.Member.Namewhere
DBGroup.Member.Office(.Room.Cubicle)? like
252 - Result
- Name JonesName Smith
30Lorel Queries - Simple Path Expression
- Retrieve the offices of members with age greater
than 30 years - Query SELECT DBGroup.Member.Office
- WHERE DBGroup.Member.Age gt 30
- Result Office Gates 252
- Office
- Building CIS
- Room 411
31Queries - General Path Expression
- Query SELECT DBGroup.Member.Name
- WHERE DBGroup.Member.Office(.Room.Cubicle)?
- Like 252
- Result Name Jones
- Name Smith
- Room matches all labels starting from Room, like
Room68. stands for disjunction. ? indicates
that the label pattern is optional. like 252
specifies that the data value should end with
string 252.
32Queries - SubQueries
Retrieve Lore project members who work on other
projects Query SELECT M.Name, ( SELECT
M.Project.Title WHERE M.Project.Title !
Lore) FROM DBGroup.Member M WHERE
M.Project.Title Lore Result Member Name
Jones Title Tsimmis
33Lore - Summary
- Lore does facilitate query and updates on
semi-structural databases - There has been more work done on optimization
using data guides (vldb97). - How is this related to WWW?
- XML-QL and related work provides the answer.