Title: Intelligent Querying of Web Documents Using a Deductive XML Repository
1 Intelligent Querying of Web Documents Using a Deductive XML Repository
- Nick Bassiliades, Ioannis Vlahavas
- Dept. of Informatics
- Aristotle University of Thessaloniki
2 Abstract
- X-DEVICE is a deductive OODB system
- It is used for storing XML documents as objects
- X-DEVICE has a powerful rule-based query language for
- intelligently querying stored XML documents
- publishing the results
- The rule language features
- second-order syntax
- generalized path and ordering expressions
- Metadata are used to translate the extended features into first-order rules
3 Object Model of XML Data
- DTD definitions are automatically translated into a class schema
- XML documents are automatically translated into objects
- Generated classes and objects are stored within the underlying OODB ADAM
- ADAM is an OODB built on Prolog (Norman Paton, Peter M.D. Gray, Univ. of Aberdeen)
4 Object Model of XML Data: W3C XQuery TEXT Use Case
    <!ELEMENT company (name, ticker_symbol?, description?, business_code, partners?, competitors?)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT ticker_symbol (#PCDATA)>
    <!ELEMENT description (#PCDATA)>
    <!ELEMENT business_code (#PCDATA)>
    <!ELEMENT partners (partner+)>
    <!ELEMENT partner (#PCDATA)>
    <!ELEMENT competitors (competitor+)>
    <!ELEMENT competitor (#PCDATA)>
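The DTD-to-class translation can be pictured with a minimal Python sketch. This is an illustration only, not X-DEVICE's actual translator: the `dtd_to_schema` function and the dictionary layout are hypothetical, only simple sequence content models are handled, and the `+` on `partner` is assumed from the W3C use case.

```python
import re

# A fragment of the company DTD; the "+" on partner is an assumption.
DTD = """
<!ELEMENT company (name, ticker_symbol?, description?, business_code, partners?, competitors?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT partners (partner+)>
<!ELEMENT partner (#PCDATA)>
"""

# Matches declarations whose content model is a single parenthesised group.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)\s*>")


def dtd_to_schema(dtd):
    """Map each element declaration to a class description (hypothetical layout)."""
    schema = {}
    for name, model in ELEMENT_RE.findall(dtd):
        if model.strip() == "#PCDATA":
            schema[name] = {"kind": "text", "attributes": []}
            continue
        attributes = []
        for part in model.split(","):
            part = part.strip()
            suffix = part[-1] if part[-1] in "?+*" else ""
            attributes.append({
                "name": part.rstrip("?+*"),
                "optional": suffix in ("?", "*"),       # may be absent
                "multi_valued": suffix in ("+", "*"),   # may repeat
            })
        schema[name] = {"kind": "class", "attributes": attributes}
    return schema


schema = dtd_to_schema(DTD)
for attribute in schema["company"]["attributes"]:
    print(attribute)
```

The order of the attributes follows the order of the sub-elements in the declaration, which is the information the later slides read from the `xml_seq` meta-class.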
5 Object Model of XML Data: Alternation
    <!ELEMENT content (par | figure)+ >
6 Deductive XML Query Language
- The X-DEVICE language is an extension of DEVICE, the basic deductive rule language
- N. Bassiliades, I. Vlahavas, A.K. Elmagarmid, "E-DEVICE: An extensible active knowledge base system with multiple rule type support", IEEE TKDE, 12(5), 824-844, 2000.
- X-DEVICE rules are pre-compiled into DEVICE deductive rules
- Deductive rules are compiled into production rules
- ECA rules with one complex event
- Matching through RETE network
7 X-DEVICE Language: Basic first-order deductive rules
    if C@company(name:'XYZ Ltd',
                 partner.partners ∋ P)
    then partner_of_xyz(partner:P)
- Selects company C with name 'XYZ Ltd'
- Iterates over partners P through navigation
- Path notation is inverted: partner.partners, NOT partners.partner
- Defines a new derived class of partners of company XYZ
- Derived objects are materialized
8 X-DEVICE Language: Recursion
    if P@partner_of_xyz(partner:P1) and
       C@company(name:P1,
                 partner.partners ∋ P2)
    then partner_of_xyz(partner:P2)
- Rule processing uses semi-naïve evaluation
- Negation is allowed (safety, stratification)
- Single-valued attributes use : for instantiation
- Multi-valued attributes use ∋ for instantiation
- Prolog lists guarantee correct ordering
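To make the semi-naïve evaluation of the two `partner_of_xyz` rules concrete, here is a minimal Python sketch. It is not how DEVICE evaluates rules internally (that goes through production rules and a RETE network); the `partners_closure` function and the sample company data are hypothetical. The point it shows is that each round joins only the newly derived partners (the delta) against the company objects.

```python
def partners_closure(companies, root):
    """companies: dict mapping a company name to its ordered list of partners."""
    derived = []   # materialized partner_of_xyz objects, in derivation order
    seen = set()

    # Base rule: direct partners of the root company.
    delta = []
    for partner in companies.get(root, []):
        if partner not in seen:
            seen.add(partner)
            derived.append(partner)
            delta.append(partner)

    # Recursive rule, semi-naive: only facts derived in the previous
    # round are joined with the company objects.
    while delta:
        next_delta = []
        for p1 in delta:
            for p2 in companies.get(p1, []):
                if p2 not in seen:
                    seen.add(p2)
                    derived.append(p2)
                    next_delta.append(p2)
        delta = next_delta
    return derived


# Hypothetical data with a cycle (Acme lists XYZ Ltd as a partner).
companies = {
    "XYZ Ltd": ["Acme", "Globex"],
    "Acme": ["Initech", "XYZ Ltd"],
    "Globex": ["Acme"],
    "Initech": [],
}
print(partners_closure(companies, "XYZ Ltd"))
```

The loop stops when a round derives nothing new, which is the fixpoint; the cycle in the sample data does not cause non-termination because already derived partners are skipped.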
9 X-DEVICE Language: Variable-Attribute Expressions
    if C@company(A $ 'XYZ')
    then a_xyz_comp(company:list(C))
- We don't know which attribute of company contains the string 'XYZ'
- A is a second-order variable (meta-variable)
- list is an aggregation function (collects company OIDs in a multi-valued attribute)
- The $ operator performs string search
10 X-DEVICE Language: Translation of Variable-Attributes
    if company@xml_seq(elem_order ∋ A)
    then new_rule('if C@company(A $ 'XYZ')
                   then a_xyz_comp(company:list(C))')
         => deductive_rule
- Iterate over meta-class xml_seq to find all attributes (sub-elements) of class company
- A production rule creates one deductive rule for each instantiation of A
- A is now a first-order variable in the condition and a constant in the action
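The translation step can be sketched in Python as follows. This is a hedged illustration, not X-DEVICE code: the `xml_seq` dictionary stands in for the meta-class of the same name, and `make_rule`, `expand` and `evaluate` are hypothetical helpers. One first-order rule (here a predicate) is generated for each attribute found in the metadata, with the attribute name fixed as a constant inside the generated rule.

```python
# Stand-in for the xml_seq meta-class: class name -> ordered sub-elements.
xml_seq = {"company": ["name", "ticker_symbol", "description", "business_code"]}


def make_rule(attribute, needle):
    """Create one first-order rule: the attribute name is now a constant."""
    def rule(obj):
        return needle in str(obj.get(attribute, ""))
    return rule


def expand(cls, needle):
    """Instantiate the meta-variable over every attribute of the class."""
    return {attribute: make_rule(attribute, needle) for attribute in xml_seq[cls]}


def evaluate(rules, objects):
    """Collect the OIDs of objects satisfied by at least one generated rule."""
    return [oid for oid, obj in objects.items()
            if any(rule(obj) for rule in rules.values())]


# Hypothetical stored objects, keyed by OID.
objects = {
    "c1": {"name": "XYZ Ltd", "description": "Holding company"},
    "c2": {"name": "Acme", "description": "Supplier of XYZ parts"},
    "c3": {"name": "Globex", "description": "Logistics"},
}
rules = expand("company", "XYZ")
print(sorted(rules))             # one generated rule per attribute
print(evaluate(rules, objects))  # OIDs collected, as list(C) would
```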
11 X-DEVICE Language: Generalized Path Expressions
    if C@company(* $ 'XYZ')
    then a_xyz_comp(company:list(C))
- The search for the string 'XYZ' must be performed
- not only on attributes of company
- but also on attributes of objects contained within company
- at all levels of nesting
12 X-DEVICE Language: Translation of Generalized Paths
- Iterate over all immediate elements of class company
- Store them into an auxiliary derived class
    if company@xml_seq(elem_order ∋ X1)
    then tmp_elem1(cnd_elem:X1,
                   path:[X1])
13 X-DEVICE Language: Translation of Generalized Paths
- Recursively iterate over all elements and sub-elements stored in the auxiliary class
- The path-so-far from the root company element is accumulated
    if X1@tmp_elem1(cnd_elem:X2, path:X3)
       and X2@xml_seq(elem_order ∋ X4)
    then tmp_elem1(cnd_elem:X4,
                   path:[X4|X3])
14 X-DEVICE Language: Translation of Generalized Paths
- Terminate the recursion if no more nested elements can be found
- Create one deductive rule for each discovered concrete path
    if X1@tmp_elem1(cnd_elem:X2, path:X3) and
       not X2@xml_seq and
       prolog{create_path(X3,PATH)}
    then new_rule('if C@company(PATH $ 'XYZ')
                   then a_xyz_comp(company:list(C))')
         => deductive_rule
15 X-DEVICE Language: Translation of Generalized Paths
- The following deductive rules are created
    C@company(name $ 'XYZ')
    C@company(ticker_symbol $ 'XYZ')
    C@company(description $ 'XYZ')
    C@company(business_code $ 'XYZ')
    C@company(partner.partners $ 'XYZ')
    C@company(competitor.competitors $ 'XYZ')
- Optimization of multiple rules is achieved through common parts of the RETE network
- The DEVICE system takes care of that
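The three translation steps for generalized paths can be followed in a short Python sketch. It is an illustration under assumptions, not the X-DEVICE implementation: the `xml_seq` dictionary stands in for the meta-class, `concrete_paths` is a hypothetical name, joining the accumulated list with dots is only a guess at what `create_path` produces, and a non-recursive schema is assumed (a recursive DTD would need a cycle check).

```python
# Stand-in for the xml_seq meta-class of the company schema.
xml_seq = {
    "company": ["name", "ticker_symbol", "description",
                "business_code", "partners", "competitors"],
    "partners": ["partner"],
    "competitors": ["competitor"],
}


def concrete_paths(xml_seq, root):
    """Enumerate every path from root down to a leaf element, innermost first."""
    finished = []
    # Step 1: one auxiliary entry per immediate sub-element of the root.
    pending = [[child] for child in xml_seq[root]]
    while pending:
        path = pending.pop(0)
        children = xml_seq.get(path[0], [])
        if children:
            # Step 2: extend the path-so-far by prepending the nested element.
            pending.extend([child] + path for child in children)
        else:
            # Step 3: no nested elements left, so the path is concrete.
            finished.append(".".join(path))
    return finished


for path in concrete_paths(xml_seq, "company"):
    print(path)
```

Because each nested element is prepended to the path-so-far, the result comes out in the inverted notation used by the rule language, for example `partner.partners` rather than `partners.partner`.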
16 X-DEVICE Language: Ordering Expressions
- W3C TEXT Use Case, Query 5
- For each news item that is relevant to the Gorilla Corp, create an item summary element.
- The content of the item summary is the content of the title, date, and first paragraph of the news item
    if N@news_item(*.content $ 'Gorilla Corp',
                   par.content ∋1 PAR,
                   title:T, date:D)
    then item_summary(title:T, date:D,
                      par:PAR)
17 X-DEVICE Language: Translation of Ordering
- Collect all the paragraphs that satisfy the condition
- Store them in a list of an auxiliary derived class
    if N@news_item(*.content $ 'Gorilla Corp',
                   par.content ∋ X1,
                   title:T, date:D)
    then tmp_elem1(tmp_var1:T, tmp_var2:D,
                   tmp_obj:list(X1))
18 X-DEVICE Language: Translation of Ordering
- Isolate a sub-list of all the paragraphs that satisfy the ordering expression ∋1
- There is one Prolog goal for each ordering expression
    if X3@tmp_elem1(tmp_var1:T, tmp_var2:D,
                    tmp_obj:X1) and
       prolog{length(X2,1), append(X2,_,X1)}
    then tmp_elem2(tmp_var1:T, tmp_var2:D,
                   tmp_obj:X2)
19 X-DEVICE Language: Translation of Ordering
- Iterate over all qualifying results and return them into the target element
    if X1@tmp_elem2(tmp_var1:T, tmp_var2:D,
                    tmp_obj ∋ PAR)
    then item_summary(title:T, date:D,
                      par:PAR)
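The three-step translation of the ordering expression can be mimicked in Python. This is a sketch under simplifying assumptions, not the generated DEVICE rules: `item_summaries` and the sample data are hypothetical, a news item's content is reduced to a flat list of paragraph strings, and relevance is tested on paragraphs only. The slice with the length check plays the role of the Prolog goals `length(X2,1), append(X2,_,X1)`, which take a prefix of length 1 and fail on shorter lists.

```python
def item_summaries(news_items, needle, n=1):
    # Step 1 (tmp_elem1): keep relevant items and collect all their
    # paragraphs, in document order, into one list per item.
    tmp_elem1 = [(item["title"], item["date"], list(item["pars"]))
                 for item in news_items
                 if any(needle in par for par in item["pars"])]

    # Step 2 (tmp_elem2): isolate the prefix of length n; items with
    # fewer than n paragraphs are dropped, as the Prolog goals would fail.
    tmp_elem2 = [(title, date, pars[:n])
                 for title, date, pars in tmp_elem1 if len(pars) >= n]

    # Step 3 (item_summary): iterate over the isolated paragraphs.
    return [{"title": title, "date": date, "par": par}
            for title, date, pars in tmp_elem2 for par in pars]


# Hypothetical news items.
news = [
    {"title": "Markets", "date": "1-20-2000",
     "pars": ["Markets rallied today.", "Gorilla Corp led the gains."]},
    {"title": "Weather", "date": "1-21-2000", "pars": ["Sunny."]},
]
print(item_summaries(news, "Gorilla Corp"))
```

Note that the summary carries the first paragraph of the relevant item, not the paragraph that mentions the company.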
20 X-DEVICE Language: Building Result Documents
- The top-level element of the XML result document is identified with the keyword xml_result
- The DTD of the result document is identified through object references
- W3C TEXT Use Case, Query 2
- Find news items where the Foo Corp company and one or more of its partners are mentioned in the same paragraph and/or title
- List each news item by its title and date
21 X-DEVICE Language: Building Result Documents
- Find the Foo company and iterate over its partners
- For each partner, iterate over news items and search for Foo and its partner inside the title of the same news item
    if C@company(name:'Foo Corp',
                 partner.partners ∋ P) and
       N@news_item(title:T $ 'Foo Corp' & P,
                   date:D)
    then xml_result(news_item1(title:T,
                               date:D))
22 X-DEVICE Language: Building Result Documents
- Find the Foo company and iterate over its partners
- For each partner, iterate over news items and search for Foo and its partner inside the nested paragraphs of the same item
    if C@company(name:'Foo Corp',
                 partner.partners ∋ P) and
       N@news_item(*.par.content $ 'Foo Corp' & P,
                   title:T, date:D)
    then news_item1(title:T, date:D)
23 X-DEVICE Language: Building Result Documents
    <!DOCTYPE news_item1 [
    <!ELEMENT news_item1 (title, date)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT date (#PCDATA)>
    ]>
- The structure of the title and date elements is automatically determined by the type of the corresponding rule variables
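As a rough picture of Query 2 and of how derived results become XML, the following Python sketch joins a company's partners with news items and serializes each hit as a `news_item1` element. It is an assumption-laden illustration rather than X-DEVICE behaviour: `query2`, `to_xml` and the sample data are hypothetical, plain substring matching stands in for the string search, and the title rule and the paragraph rule are merged into one loop.

```python
import xml.etree.ElementTree as ET


def query2(companies, news_items, company_name):
    """Return (title, date) pairs of items mentioning the company and a partner."""
    results = []
    for partner in companies.get(company_name, []):
        for item in news_items:
            # Title and paragraphs are each checked as a single unit.
            texts = [item["title"]] + item["pars"]
            hit = any(company_name in text and partner in text for text in texts)
            row = (item["title"], item["date"])
            if hit and row not in results:
                results.append(row)
    return results


def to_xml(title, date):
    """Serialize one derived object using the news_item1 (title, date) layout."""
    element = ET.Element("news_item1")
    ET.SubElement(element, "title").text = title
    ET.SubElement(element, "date").text = date
    return ET.tostring(element, encoding="unicode")


# Hypothetical data.
companies = {"Foo Corp": ["Bar Inc", "Baz Ltd"]}
news = [
    {"title": "Foo Corp and Bar Inc sign deal", "date": "1-20-2000", "pars": []},
    {"title": "Quarterly report", "date": "1-21-2000",
     "pars": ["Foo Corp grew.", "Baz Ltd shrank."]},
]
for title, date in query2(companies, news, "Foo Corp"):
    print(to_xml(title, date))
```

The second news item is not returned because the company and its partner appear in different paragraphs.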
24 Advantages of X-DEVICE
- Logic-based query languages have
- well-understood mathematical properties
- declarative nature
- advanced optimization techniques (magic-sets)
- X-DEVICE compared to XQuery (functional)
- more high-level, declarative syntax
- more compact and comprehensible
- general path expressions
- due to fixpoint semantics and second-order
variables
25 Advantages of X-DEVICE
- Users can express complex XML document views
- Information customization for e-commerce, e-learning, etc.
- X-DEVICE offers multiple knowledge representation formalisms
- Deductive, Production, and Active rules
- Structured objects
- Production and Active rules can be used to update XML documents
- All the above can play an important role as an infrastructure for the Semantic Web
26 Intelligent Querying of Web Documents Using a Deductive XML Repository
- X-DEVICE site: www.csd.auth.gr/lpis/systems/x-device.html