Query Processing in a Mediator System for Data and Multimedia - PowerPoint PPT Presentation

Provided by: claudio61
1
Query Processing in a Mediator System for Data
and Multimedia
  • D. Beneventano (1), C. Gennaro (2), M. Mordacchini (2),
    R. Carlos Nana Mbinkeu (1)
  • (1) DII - Università di Modena e Reggio Emilia,
    via Vignolese 905, Modena, Italy
  • (2) ISTI CNR, via Moruzzi 1, Pisa, Italy

2
Outline
  • Motivation
  • The system and scenario overview
  • Querying an ontology of data and multimedia
    sources
  • Mapping
  • Query unfolding for multimedia conditions
  • Ranking
  • Conclusion and future work

3
Motivation
  • We proposed a method for building a populated
    domain ontology representative of a set of web
    data sources.
  • The method exploits the capabilities of a
    mediator system (MOMIS) to create an integrated
    view of a set of data sources, i.e., a domain
    ontology schema, and a set of annotations linking
    data to the integrated view.
  • We extend that approach to multimedia sources,
    thus obtaining a methodology for building and
    querying an ontology representing both data and
    multimedia sources.
  • There are several use cases where applications
    interact with ontologies of data and multimedia
    sources.
  • Data and multimedia sources are usually
    represented with different models: no standard
    for representing data and multimedia sources
    together has been adopted by large communities.
  • Different languages and interfaces have been
    developed for querying traditional and multimedia
    data sources. The former rely on expressive
    languages that allow selection clauses; the
    latter typically implement similarity search
    techniques for retrieving multimedia documents
    similar to the ones provided by the user.

4
Managing a Semantic Peer: MOMIS and MILOS
A semantic peer provides unified access to
different data sources referring to the same domain
by means of a Semantic Peer Data Ontology (SPDO),
i.e., a common representation of all the data
sources belonging to the peer.
  • MOMIS (Mediator envirOnment for Multiple
    Information Sources) is a framework to perform
    information extraction and integration of
    heterogeneous, structured and semistructured,
    data sources.
  • MILOS is a general-purpose Multimedia Content
    Management System that:
  • manages and serves any multimedia document
  • manages any metadata of documents

NeP4B Semantic Peer
5
Data and Multimedia Sources (DMSs)
  • A Data and Multimedia Source (DMS) is an
    object-oriented database of metadata objects
    describing a collection of multimedia documents
    (such as images, videos, etc.), represented with
    a schema defined in ODLI3.
  • In general, the DMS schema includes a set of
    standard attributes, declared using standard
    predefined ODLI3 types such as string, double,
    integer, etc., which support the selection
    predicates typical of structured and
    semi-structured data, such as =, <, >, ...
  • It also includes a set of multimedia attributes:
    special attributes, declared by means of special
    predefined ODLI3 classes, which support
    similarity-based searches (full-text search,
    image similarity, geographical search, etc.).

6
A sample scenario
7
A sample scenario
8
A sample scenario
9
Querying DMSs
  • A DMS Mi can be queried using an extension of
    the standard SQL-like SELECT syntax. The WHERE
    clause is a conjunction of predicates on the
    standard attributes of Mi, as in the following
    query.
  • The ORDER BY ... LIMIT K clauses specify, in
    practice, a top-k similarity query.

SELECT Mi.Ak, ..., Mi.Sl
FROM Mi
WHERE Mi.Ax op1 val1 AND Mi.Ay op2 val2 AND ...
ORDER BY Mi.Sw(Q1), ..., Mi.Sz(Q2)
LIMIT K
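The query shape above can be sketched as a small Python builder. This is a hypothetical helper, not part of MOMIS or MILOS: the attribute sets borrow the names of the city interface on the next slide, and the function only enforces the rules stated here (WHERE predicates on standard attributes, ORDER BY on similarity attributes, LIMIT K closing the top-k query).

```python
from typing import Sequence, Tuple

# Hypothetical attribute sets, borrowed from the city interface example.
STANDARD_ATTRS = {"Name", "Zip", "Country", "Surface", "Population"}
SIMILARITY_ATTRS = {"Photo", "Description", "GeoPosition"}


def build_topk_query(source: str,
                     select: Sequence[str],
                     where: Sequence[Tuple[str, str, str]],
                     order_by: Sequence[Tuple[str, str]],
                     k: int) -> str:
    """Render a DMS top-k query as a string.

    where:    (attribute, operator, literal) triples, combined with AND
    order_by: (similarity attribute, query object) pairs
    """
    for attr, _, _ in where:
        if attr not in STANDARD_ATTRS:
            raise ValueError(f"WHERE accepts only standard attributes: {attr}")
    for attr, _ in order_by:
        if attr not in SIMILARITY_ATTRS:
            raise ValueError(f"ORDER BY accepts only similarity attributes: {attr}")
    query = f"SELECT {', '.join(select)} FROM {source}"
    if where:
        query += " WHERE " + " AND ".join(f"{a} {op} {v}" for a, op, v in where)
    if order_by:
        query += " ORDER BY " + ", ".join(f"{a}({q})" for a, q in order_by)
    return query + f" LIMIT {k}"


print(build_topk_query("city", ["Name"], [("Country", "=", '"Italy"')],
                       [("Photo", '"example.jpg"')], 10))
# SELECT Name FROM city WHERE Country = "Italy" ORDER BY Photo("example.jpg") LIMIT 10
```

Rejecting a similarity attribute in WHERE (or a standard one in ORDER BY) mirrors the split between selection predicates and similarity search described on the previous slide.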
10
Querying DMSs
interface city()
  // standard attributes
  attribute string Name
  attribute string Zip
  attribute string Country
  attribute integer Surface
  attribute integer Population
  // similarity attributes
  attribute Image Photo
  attribute Text Description
  attribute GeoCoord GeoPosition

// query example
SELECT Name
FROM city
WHERE Country = "Italy"
ORDER BY Photo("http://www.flickr.com/32e324e.jpg"), ...

This query tries to find, among all Italian cities,
the ones that best match the example image and the
textual description and that are as near as
possible to the geographical point 40.25N, 14.32E.
11
DMS Assumptions
  • Since we would like to build a general-purpose
    framework, we make the following assumptions:
  • The way in which the returned objects are
    ordered is not known (black box).
  • The DMS does not return scores indicating the
    relevance of the objects with respect to the
    query.
  • If no ORDER BY clause is specified, the DMS
    returns the records in random order.
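To make the three assumptions concrete, here is a minimal Python stand-in for a DMS. The class name, the record layout and the `similarity` callable are hypothetical; the point is only the contract: the ordering logic is hidden, only identifiers come back, and without an ORDER BY the order is random.

```python
import random
from typing import Callable, Optional


class BlackBoxDMS:
    """Hypothetical stand-in for a DMS that follows the assumptions above."""

    def __init__(self, records: dict, seed: int = 0):
        self._records = records            # id -> metadata record
        self._rng = random.Random(seed)

    def query(self, where: Callable[[dict], bool],
              similarity: Optional[Callable[[dict], float]] = None,
              k: int = 10) -> list:
        ids = [rid for rid, rec in self._records.items() if where(rec)]
        if similarity is None:
            # No ORDER BY clause: the records come back in random order.
            self._rng.shuffle(ids)
        else:
            # Internal, undisclosed scoring decides the order...
            ids.sort(key=lambda rid: similarity(self._records[rid]), reverse=True)
        # ...but only identifiers are returned, never the scores.
        return ids[:k]


dms = BlackBoxDMS({"a": {"country": "IT", "s": 0.2},
                   "b": {"country": "IT", "s": 0.9},
                   "c": {"country": "FR", "s": 0.5}})
print(dms.query(lambda r: r["country"] == "IT", lambda r: r["s"], k=2))  # ['b', 'a']
```

A mediator talking to such a source can rely on the position of each identifier in the answer and on nothing else, which is why the later slides fall back on rank-based fusion.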

12
Representing the SPDO
  • We build a conceptualization of a set of DMSs,
    composed of global classes and global attributes,
    together with mappings between the SPDO and the
    DMS schemata.

13
Mapping
  • The mapping is defined in a semi-automatic way as
    follows:
  • A Mapping Table (MT) is specified for each global
    class G, whose columns represent the n local
    classes M1, ..., Mn belonging to G and whose rows
    represent the h global attributes of G.
    Multimedia attributes can be mapped only onto
    global multimedia attributes of the same type.
  • Join Conditions are defined between pairs of
    local classes belonging to G and allow the system
    to identify instances of the same real-world
    object in different sources.
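A Mapping Table and a join condition can be sketched in Python as plain dictionaries. This is an illustrative encoding, not the MOMIS data structure: the local classes resort and hotel and the join on Name = denomination come from the Hotel example later in the deck, while the photo row, the Picture attribute and the type tables are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

MULTIMEDIA_TYPES = {"Image", "Text", "GeoCoord"}

# Hypothetical Mapping Table for a global class Hotel: one row per global
# attribute, one column per local class; None means "no local attribute".
MAPPING_TABLE: Dict[str, Dict[str, Optional[str]]] = {
    "name":  {"resort": "Name",  "hotel": "denomination"},
    "stars": {"resort": "Stars", "hotel": None},
    "photo": {"resort": "Photo", "hotel": "Picture"},
}
GLOBAL_TYPES = {"name": "string", "stars": "integer", "photo": "Image"}
LOCAL_TYPES: Dict[Tuple[str, str], str] = {
    ("resort", "Name"): "string", ("resort", "Stars"): "integer",
    ("resort", "Photo"): "Image",
    ("hotel", "denomination"): "string", ("hotel", "Picture"): "Image",
}


def check_multimedia_mappings() -> List[str]:
    """List mappings that break the rule: a multimedia attribute may be
    mapped only onto a multimedia attribute of the same type."""
    errors = []
    for g_attr, row in MAPPING_TABLE.items():
        g_type = GLOBAL_TYPES[g_attr]
        for local_class, l_attr in row.items():
            if l_attr is None:
                continue
            l_type = LOCAL_TYPES[(local_class, l_attr)]
            multimedia = g_type in MULTIMEDIA_TYPES or l_type in MULTIMEDIA_TYPES
            if multimedia and l_type != g_type:
                errors.append(f"{g_attr} -> {local_class}.{l_attr}: {l_type} vs {g_type}")
    return errors


def same_object(resort_rec: dict, hotel_rec: dict) -> bool:
    """Join condition resort.Name = hotel.denomination: two local records
    describe the same real-world hotel when the names coincide."""
    return resort_rec["Name"] == hotel_rec["denomination"]


print(check_multimedia_mappings())  # []
```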

14
Example of mapping
15
Mapping
  • Resolution Functions are introduced to solve data
    conflicts among local attribute values associated
    with the same real-world object. In our framework
    we consider and implement some of these
    resolution functions, in particular the PREFERRED
    function, which takes the value of a preferred
    source, and the RANDOM function, which takes a
    random value.
  • As for multimedia attributes, we introduce a new
    resolution function, called MOST_SIMILAR, which
    returns the multimedia object most similar to the
    one expressed in the query (if any).
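The two standard resolution functions can be sketched in a few lines of Python. The function names and the fallback to the next source in PREFERRED are assumptions made for illustration; the slide only states that PREFERRED takes the value of a preferred source and RANDOM takes a random value.

```python
import random
from typing import Optional, Sequence


def preferred(values: dict, preference: Sequence[str]) -> Optional[object]:
    """PREFERRED: take the value of the preferred source.
    Assumption: if that source has no value, fall back to the next one."""
    for source in preference:
        if values.get(source) is not None:
            return values[source]
    return None


def random_value(values: dict, rng: random.Random) -> Optional[object]:
    """RANDOM: take a random value among the non-null local values."""
    candidates = [v for v in values.values() if v is not None]
    return rng.choice(candidates) if candidates else None


# Two local values of the same real-world hotel's price, keyed by local class.
prices = {"resort": None, "hotel": 120}
print(preferred(prices, ["resort", "hotel"]))   # 120
print(random_value(prices, random.Random(0)))   # 120 (the only candidate)
```

MOST_SIMILAR does not fit this value-only signature, because it needs some notion of how close each value is to the query object; the deck returns to that problem under "Rank Based Similarity".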

16
Query the SPDO
  • Given a global class G with m attributes, of
    which k are multimedia attributes, denoted by
    G.S1, ..., G.Sk (such as photo and description in
    the class Hotel), and h are standard attributes,
    denoted by G.A1, ..., G.Ah, a query on G (global
    query) is a conjunctive query expressed in a
    simple abstract SQL-like syntax as:

SELECT G.Al, ..., G.Sj
FROM G
WHERE G.Ax op1 val1
AND G.Ay op2 val2
...
ORDER BY G.Sw(Q1), ..., G.Sz(Q2)
LIMIT K

17
Query unfolding
  • To answer a global query on G, the query must be
    rewritten as an equivalent set of queries (local
    queries) expressed on the local classes L(G)
    belonging to G.
  • The query rewriting is performed by means of
    query unfolding, which consists of the following
    four steps:
  • Computation of Local Query conditions
  • Computation of Residual Conditions
  • Fusion of local answers
  • Application of the Residual Condition

18
Query Fusion Ranking
  • Why?
  • Modern multimedia content managers typically
    return multimedia objects (i.e., objects that
    support similarity search) in decreasing order of
    relevance, so that the best answers are at the
    top.
  • We want to preserve this knowledge at the global
    level.
  • However, since we cannot exploit scores, we use
    the rank as an indicator of the relevance of each
    returned record.

19
Ranking the results
  • Our problem falls into the category of partial
    rank aggregation problems, in which we merge
    top-k lists rather than fully ranked lists.
  • We use a simple yet effective aggregation
    function for ordinal ranks: the median function.
  • The score of an object is its median position in
    all the returned lists.
  • Fagin et al. showed the median function to be
    near-optimal, even for top-k or partial lists.
  • The MEDRANK algorithm is based on median rank
    aggregation.
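Median rank aggregation over top-k lists can be sketched directly in Python. How an object missing from a list is positioned is not specified on the slide; here it is assumed to sit just past the end of that list (position k + 1), and ties are broken by identifier.

```python
from statistics import median


def median_rank_aggregate(rankings: list) -> list:
    """Order objects by their median position across top-k lists.
    Assumption: an object absent from a list of length k counts as position k + 1."""
    objects = {obj for ranking in rankings for obj in ranking}
    score = {}
    for obj in objects:
        positions = [r.index(obj) + 1 if obj in r else len(r) + 1 for r in rankings]
        score[obj] = median(positions)
    # Lower median position = more relevant; ties broken by identifier.
    return sorted(objects, key=lambda o: (score[o], o))


print(median_rank_aggregate([["x", "y"], ["y", "z"]]))  # ['y', 'x', 'z']
```

Here y has positions (2, 1), median 1.5; x has (1, 3), median 2; z has (3, 2), median 2.5.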

20
The MedRank algorithm
  • Access the rankings sequentially.
  • When an element has appeared in more than half of
    the rankings, output it in the aggregated ranking.
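The two steps of MEDRANK can be sketched as follows. This sketch reads all lists in lockstep, one depth at a time, and stops once k elements have been emitted; elements crossing the threshold at the same depth are emitted in the order they are met, which is an arbitrary tie-break chosen for the example.

```python
from collections import defaultdict


def medrank(rankings: list, k: int) -> list:
    """Sketch of MEDRANK: scan all rankings in lockstep, one depth at a time,
    and emit an element once it has been seen in more than half of them."""
    m = len(rankings)
    seen = defaultdict(int)          # element -> number of lists where it was seen
    emitted, result = set(), []
    max_depth = max((len(r) for r in rankings), default=0)
    for depth in range(max_depth):
        for ranking in rankings:
            if depth >= len(ranking):
                continue
            element = ranking[depth]
            seen[element] += 1
            if 2 * seen[element] > m and element not in emitted:
                emitted.add(element)
                result.append(element)
        if len(result) >= k:
            break                    # enough elements: stop reading the lists
    return result[:k]


lists = [["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]]
print(medrank(lists, 2))  # ['b', 'a']
```

The early stop is what makes the algorithm attractive for top-k answers: with three lists, b is emitted after reading only the first position of each list.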

25
Example
  • We would like to find images of the Arch of
    Triumph of Rome by night,
  • and we assume we have two DMSs containing images
    of monuments around the world: the first, DMS1,
    with geographical-coordinate search capabilities,
    and the second, DMS2, with image similarity search
    capabilities.

26
Example
27
SELECT * FROM DMS1 WHERE subject = "Monument"
ORDER BY GeoCoord(41°53'43.68"N, 12°28'56.34"E)
STOP AFTER 5
Unfortunately, if I search only by geographical
coordinates, giving the coordinates of Rome as
input, I find a lot of images of the Colosseum.
[Figure: top-5 results from DMS1, at distances of 1 km, 1 km, 1 km, 1 km and 2 km; caption "Roma. Palazzo della Civiltà del Lavoro. EUR".]
28
SELECT * FROM DMS2 WHERE type = "Monument"
ORDER BY Img(URL) STOP AFTER 5
And if I search only by similarity to an image of
the Arch of Triumph of Rome by night, I find a lot
of images of the Arch of Triumph of Paris, which is
very similar but more famous.
[Figure: top-5 results from DMS2; caption "Roma. Palazzo della Civiltà del Lavoro. EUR".]
29
SELECT * FROM WorldMonuments WHERE Subject = "Monument"
ORDER BY Img(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E)
STOP AFTER 5
[Figure: fused results from MS1 and MS2, with the first element retrieved marked; distance labels 1 km, 1 km, 1 km, 1 km and 2 km; caption "Roma. Palazzo della Civiltà del Lavoro. EUR".]
30
Conclusion and future work
  • We presented a methodology, implemented in a
    tool, that allows a user to create and query an
    integrated view of data and multimedia sources.
  • Future work will be devoted to experimenting with
    the tool in real scenarios. In particular, our
    tool will be exploited for integrating business
    catalogs related to tiles.
  • We think that such data may provide useful test
    cases because of the need to connect data about
    the features of the tiles with their images.

31
The end
32
Building the Data Ontology MOMIS
  • MOMIS (Mediator envirOnment for Multiple
    Information Sources) is a framework to perform
    information extraction and integration of
    heterogeneous, structured and semistructured,
    data sources
  • Semantic Integration of Information
  • A common data model ODLI3 (derived from ODL-ODMG
    and I3) mapped into OLCD description logics
  • Tool-supported techniques to construct the Global
    Virtual View (GVV)
  • Local sources wrapping
  • Local Schema Annotation w.r.t. a common lexical
    ontology (WordNet)
  • Semi-automatic discovery of relationships between
    local schemata
  • Clustering techniques to build the GVV
  • Mappings between the GVV and local schemata
    (Mapping Table)
  • Automatic GVV annotation w.r.t. a common lexical
    ontology
  • OWL exportation
  • Global Query Management
  • Including services and multimedia data sources


D. Beneventano, S. Bergamaschi, F. Guerra, M. Vincini,
"Synthesizing an Integrated Ontology", IEEE Internet
Computing Magazine, September-October 2003, pp. 42-51.
S. Bergamaschi, S. Castano, M. Vincini, "Semantic
Integration of Semistructured and Structured Data
Sources", SIGMOD Record, Special Issue on Semantic
Interoperability in Global Information, Vol. 28,
No. 1, March 1999.
33
MOMIS architecture
[Figure: MOMIS architecture, with manual annotation and semi-automatic annotation steps.]
34
Mapping definition in MOMIS
  • Mappings between a Global Class G of the GVV and
    its local classes are represented by a Mapping
    Table.
  • Global-as-View (GAV) mappings: for each global
    class G, a view VG over the local classes of G is
    defined by means of a Full-Join Merge Operator:
  • Outer join, to include in the result all tuples
    of all local sources;
  • Merge, to perform data reconciliation
    (Resolution functions).
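The Full-Join Merge Operator can be sketched in Python over two local answers that are assumed to be already translated into global attribute names (the translation itself is the Mapping Table's job). The join key, the record layout and the "left value wins" default for attributes without a resolution function are assumptions of this sketch; the averaged price mirrors the avg(...) in the Hotel example on the next slide.

```python
from typing import Callable, Dict, List


def full_join_merge(left: List[dict], right: List[dict], key: str,
                    resolve: Dict[str, Callable[[object, object], object]]) -> List[dict]:
    """Sketch of the Full-Join Merge of two local answers: full outer join on
    `key`, then a resolution function for each attribute listed in `resolve`."""
    def merge(lrec: dict, rrec: dict) -> dict:
        out = {}
        for attr in lrec.keys() | rrec.keys():
            lv, rv = lrec.get(attr), rrec.get(attr)
            if lv is not None and rv is not None and attr in resolve:
                out[attr] = resolve[attr](lv, rv)        # data reconciliation
            else:
                out[attr] = lv if lv is not None else rv  # assumption: left wins
        return out

    right_by_key = {rrec[key]: rrec for rrec in right}
    merged, matched = [], set()
    for lrec in left:
        rrec = right_by_key.get(lrec[key], {})
        if rrec:
            matched.add(lrec[key])
        merged.append(merge(lrec, rrec))
    # Outer join: keep right-side tuples with no partner on the left.
    merged += [merge({}, rrec) for k, rrec in right_by_key.items() if k not in matched]
    return merged


L1 = [{"name": "Roma", "price": 100, "stars": 4}, {"name": "Pisa", "price": 80, "stars": 3}]
L2 = [{"name": "Roma", "price": 120}, {"name": "Modena", "price": 90}]
result = full_join_merge(L1, L2, "name", {"price": lambda a, b: (a + b) / 2})
```

Roma appears in both sources and gets the averaged price 110.0; Pisa and Modena survive the outer join with their single local values.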

35
Building the Mappings: an example
Mapping Table of the global class Hotel (local classes
L1.resort and L2.hotel)
Data conversion function: DollarEuro(mean_price)
Full join:
SELECT name, avg(T_L1.price_avg, T_L2.mean_price) AS price,
       T_L1.Stars, ...
FROM T_L1 FULL OUTER JOIN T_L2
ON (T_L1.Name = T_L2.denomination)
36
Global Query Management
  • The querying problem: how to answer queries
    expressed on the GS (global queries)?
  • In a virtual data integration system, data reside
    at the sources, so query processing is based on
    query rewriting: a global query is rewritten as an
    equivalent set of queries expressed on the local
    schemata of the data sources (local queries).
  • In the GAV approach, query rewriting is performed
    by unfolding, i.e., by expanding a global query on
    G according to the view associated with G.
  • Query optimization techniques for the Full-Join
    Merge Operator
  • Motivation:
  • full outer join queries are very expensive,
    especially in a distributed environment;
  • only limited optimization is performed on full
    outer joins.

37
An example of Full-Join Merge Optimization
SELECT * FROM G WHERE city LIKE "Modena"
AND price < 200
AND stars = 4
AND free_wifi = true
[Diagram: the conditions free_wifi = true and stars = 4 are repeated elsewhere in the figure.]
38
MILOS
  • XML Search Engine: structure search, fielded
    search, full-text search, multimedia search,
    schema-independent XQuery support (SOAP Web
    Service)
  • Metadata Editor: Visual Basic (SOAP comm.)
  • MultiMedia document server: allows homogeneous
    access to heterogeneous media (SOAP Web Service)
  • Retrieval Interface: JSP (SOAP comm.)
  • Metadata independence: the schema seen in the
    interface logic can be different from the one(s)
    used in the repository
  • Repository Metadata Integrator: access to
    documents, access to metadata, metadata
    independence (SOAP Web Service)
39
MILOS (2)
  • The MILOS system is based on a three-tier
    distributed architecture:
  • Client tier: the topmost level of the system. It
    contains the client applications that interact
    with MILOS and display results to users.
  • Business logic: manages query processing by
    integrating and aligning information stored in
    the databases. It reconciles retrieved data by
    managing ranking.
  • Data tier: composed of the Large Object Database,
    which physically stores the multimedia documents
    managed by the system, and the metadata database,
    where all metadata associated with the multimedia
    items are stored.
  • Multimedia metadata are represented in the data
    tier in XML formats. MILOS adopts a native XML
    database, which supports XML query language
    standards and offers advanced search and indexing
    functionality on arbitrary XML documents.
  • The MILOS XML database provides full-text search,
    automatic classification, and feature similarity
    search functionalities.
  • The Large Object Database allows clients of MILOS
    to deal with multimedia in a uniform way.

40
The MedRank algorithm
  • When there are multiple multimedia attributes,
    strange side effects can affect the precision of
    the answer.
  • Example:
  • Suppose we have two image databases of monument
    images.
  • MS1 provides image similarity and geographic
    coordinate search.
  • MS2 provides only image similarity.
  • The query consists of a sample image and the
    coordinates of a point.

41
SELECT * FROM WorldMonuments
ORDER BY Image(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E)
STOP AFTER 5
[Figure: fused results from MS1 and MS2, with the first element retrieved marked; distance labels 1 km, 1 km, 1 km, 1 km and 2 km; caption "Roma. Palazzo della Civiltà del Lavoro. EUR".]
42
DMS Assumptions
  • The rationale of the above assumptions is that
    our aim is to work in a general environment with
    heterogeneous DMSs for which we do not have any
    knowledge of their scoring functions.
  • The motivation is that the final scores
    themselves are often the result of the
    contributions of the scores of each attribute. A
    scoring function is therefore usually defined as
    an aggregation over partial heterogeneous scores
    (e.g., the relevance for text-based IR with
    keyword queries, or similarity degrees for color
    and texture of images in a multimedia database).
  • Even in the simpler case of a single multimedia
    attribute, the scores become meaningless outside
    the context in which they are evaluated. As an
    example, consider the TF-IDF scoring function used
    by ordinary text search engines: the score of a
    document depends on the collection statistics,
    and search engines may use different scoring
    algorithms.
  • However, the above assumptions of considering a
    local DMS as a black box that does not return any
    score associated with result elements do not
    presume that local DMSs do not use scoring
    functions internally for combining different
    multimedia attributes.
  • Typically, modern multimedia systems use fuzzy
    logic to aggregate the scores of different
    multimedia attributes, graded in the interval
    [0, 1]. Classical examples of these functions are
    the min and mean functions.
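A short Python sketch shows why the choice of internal aggregation matters to the mediator. The object names and grades are made up; the two aggregations are the min and mean functions named above.

```python
from statistics import mean


def rank_by_fuzzy_aggregation(grades: dict, how: str = "min") -> list:
    """Order objects by aggregating per-attribute grades in [0, 1]
    with either the min function or the mean function."""
    aggregate = min if how == "min" else mean
    overall = {obj: aggregate(list(g.values())) for obj, g in grades.items()}
    return sorted(overall, key=overall.get, reverse=True)


grades = {"p1": {"image": 0.9, "geo": 0.5}, "p2": {"image": 0.6, "geo": 0.6}}
print(rank_by_fuzzy_aggregation(grades, "min"))   # ['p2', 'p1']
print(rank_by_fuzzy_aggregation(grades, "mean"))  # ['p1', 'p2']
```

The same partial grades produce opposite rankings under min and mean. Two sources that differ only in this hidden choice already disagree, which supports treating each DMS as a black box and fusing ranks rather than scores.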

43
Computation of Local Query conditions
  • Each atomic predicate Pi and each similarity
    predicate in the global query is rewritten into
    the corresponding constraints supported by the
    local classes.
  • For example, the constraint stars = 3 is
    translated into the constraint Stars = 3 for the
    local class resort, and is not translated into
    any constraint for the local class hotel.
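The translation step can be sketched with a fragment of a Mapping Table. The stars row reproduces the example above (mapped in resort, absent in hotel); the city row and its local names City and town are hypothetical, added only to show a predicate that reaches both sources.

```python
from typing import List, Tuple

# Hypothetical Mapping Table fragment for the global class Hotel: global
# attribute -> local attribute per local class (None = not mapped).
MAPPING = {
    "stars": {"resort": "Stars", "hotel": None},
    "city": {"resort": "City", "hotel": "town"},
}

Condition = Tuple[str, str, object]   # (attribute, operator, value)


def local_conditions(global_conditions: List[Condition],
                     local_class: str) -> List[Condition]:
    """Translate each global atomic predicate into the local vocabulary;
    predicates on attributes the local class lacks are not translated."""
    translated = []
    for attr, op, value in global_conditions:
        local_attr = MAPPING.get(attr, {}).get(local_class)
        if local_attr is not None:
            translated.append((local_attr, op, value))
    return translated


conds = [("stars", "=", 3), ("city", "=", "Modena")]
print(local_conditions(conds, "resort"))  # [('Stars', '=', 3), ('City', '=', 'Modena')]
print(local_conditions(conds, "hotel"))   # [('town', '=', 'Modena')]
```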

44
Computation of Residual Conditions
  • Conditions on non-homogeneous standard attributes
    cannot be translated into local conditions: they
    are considered residual and have to be solved at
    the global level.
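A minimal sketch of residual handling, assuming the set of homogeneous attributes is already known. Price is a natural non-homogeneous example: in the Hotel mapping the global price is the average of two local prices, so a local test such as price < 200 could keep or drop the wrong hotels and must wait until the values have been fused.

```python
import operator
from typing import List, Set, Tuple

Condition = Tuple[str, str, object]
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def split_conditions(conditions: List[Condition], homogeneous: Set[str]):
    """Conditions on homogeneous attributes can be pushed to the sources;
    the others are residual and must be checked at the global level."""
    pushed = [c for c in conditions if c[0] in homogeneous]
    residual = [c for c in conditions if c[0] not in homogeneous]
    return pushed, residual


def apply_residual(fused: List[dict], residual: List[Condition]) -> List[dict]:
    """Filter the fused (already resolved) global records with the residual conditions."""
    return [rec for rec in fused
            if all(OPS[op](rec[attr], value) for attr, op, value in residual)]


pushed, residual = split_conditions([("city", "=", "Modena"), ("price", "<", 200)], {"city"})
fused = [{"city": "Modena", "price": 150}, {"city": "Modena", "price": 250}]
print(apply_residual(fused, residual))  # [{'city': 'Modena', 'price': 150}]
```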

45
Computation of Residual Conditions
  • For multimedia attributes we use the MOST_SIMILAR
    resolution function. For example, suppose we are
    searching for images similar to one specified in
    the ORDER BY clause of the query. If we retrieve
    two or more records of the same object, each with
    its own image, MOST_SIMILAR simply selects the
    image that is most similar to the query image.
  • However, since we do not know the scores, how do
    we evaluate similarity?

46
Computation of Residual Conditions
  • Rank-based similarity
  • We simply exploit the rank of the objects in the
    returned lists as an indicator of similarity
    between the attribute values belonging to the
    objects.
  • This aspect is related to the problem of fusion.
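Rank-based MOST_SIMILAR can be sketched as follows. The input layout (source mapped to a pair of value and position in that source's answer) and the file names are hypothetical; comparing raw positions across sources is the simplifying assumption that replaces the missing scores.

```python
from typing import Dict, Optional, Tuple


def most_similar_by_rank(candidates: Dict[str, Tuple[object, int]]) -> Optional[object]:
    """Rank-based MOST_SIMILAR: among the values returned by different sources
    for the same object, keep the one whose record had the best (lowest)
    position in its source's answer to the similarity query."""
    if not candidates:
        return None
    best_source = min(candidates, key=lambda src: candidates[src][1])
    return candidates[best_source][0]


photos = {"DMS1": ("arch_1.jpg", 3), "DMS2": ("arch_2.jpg", 1)}
print(most_similar_by_rank(photos))  # arch_2.jpg
```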

47
Fusion of local answers
  • For each local source involved in the global
    query, a local query is generated and executed on
    that source. The local answers are fused into the
    global answer on the basis of the mapping query qG
    defined for G, i.e., by using the Full
    Outerjoin-merge (FOJ) operation:
  • Computation of the full outer join of local
    answers (FOJ). The result of this operation is
    ordered on the basis of the multimedia attributes
    specified in the query; this aspect is examined in
    detail in the next slide.
  • Application of the Resolution Functions: for each
    attribute GA of the global query, the related
    Resolution Function is applied to FOJ.

48
Ranking the results
  • In principle, if we had ALL the (fused) records
    of the result set, we could exploit an optimal
    rank aggregation method based on a distance
    measure that quantifies the disagreements among
    different rankings.
  • In this respect, the overall ranking is the one
    that has minimum distance to the different
    rankings obtained from the different sources.
  • Several distance measures are available in the
    literature. However, the difficulty of
    distance-based rank aggregation lies in the choice
    of the distance measure and its corresponding
    complexity, which can even be NP-hard in some
    cases (e.g., the Kendall distance).
  • Fortunately, our case falls into the category of
    partial rank aggregation problems, in which we
    measure the distance only between top-k lists
    rather than fully ranked lists.
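Two classic distances between full rankings can be sketched in Python: the Kendall distance, which counts pairwise disagreements, and the Spearman footrule, which sums position displacements. Choosing the footrule here is our illustration, not the authors' choice; it is mentioned because aggregation that minimizes total footrule distance is known to be solvable in polynomial time, whereas minimizing total Kendall distance is NP-hard in general.

```python
from itertools import combinations


def kendall_distance(r1: list, r2: list) -> int:
    """Number of item pairs that the two full rankings order differently."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(1 for a, b in combinations(r1, 2)
               if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0)


def footrule_distance(r1: list, r2: list) -> int:
    """Spearman footrule: total displacement of each item between the two rankings."""
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(abs(i - pos2[item]) for i, item in enumerate(r1))


print(kendall_distance(["a", "b", "c"], ["b", "a", "c"]))   # 1
print(footrule_distance(["a", "b", "c"], ["b", "a", "c"]))  # 2
```

Both functions assume full rankings over the same items; handling top-k lists requires extra conventions for missing items, which is part of what makes the partial case distinct.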

49
Example (1)
A: (1, 2, 3)   B: (1, 1, 2)   C: (3, 3, 4)   D: (3, 4, 4)
(1) http://www.cs.helsinki.fi/u/tsaparas/Information
Networks/lectures/lecture10.ppt
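Reading each tuple as the positions an element occupies in three input rankings (our interpretation; the slide gives only the numbers), the median rule works out as follows:

```latex
\begin{aligned}
\operatorname{med}(B) &= \operatorname{med}(1,1,2) = 1\\
\operatorname{med}(A) &= \operatorname{med}(1,2,3) = 2\\
\operatorname{med}(C) &= \operatorname{med}(3,3,4) = 3\\
\operatorname{med}(D) &= \operatorname{med}(3,4,4) = 4
\end{aligned}
\qquad\Longrightarrow\qquad B,\ A,\ C,\ D
```

so the aggregated ranking is B, A, C, D.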
50
Combining rankings
  • In many cases the scores are not known:
  • e.g., for meta-search engines, scores are
    proprietary information;
  • or we do not know how they were obtained: one
    search engine returns score 10, the other 100.
    What does this mean?
  • or the scores are incompatible: apples and
    oranges. Does it make sense to combine price with
    distance?
  • In these cases we can only work with the rankings.