Web services and data integration

1
Web services and data integration
  • S. Abiteboul (INRIA and Xyleme), Omar Benjelloun (INRIA), Tova Milo (INRIA and Tel Aviv)
  • Serge.Abiteboul@inria.fr
  • Singapore, December 2002

2
Organization
  • The context
  • Accessing information on the Web
  • Web services
  • SOAP
  • WSDL
  • UDDI
  • Active XML
  • AXML documents
  • AXML services
  • Architecture and implementation
  • Applications
  • Conclusion

3
The context
  • The Web and XML are dramatically changing the
    management of distributed information

4
Distributed data management
  • Warehousing
  • Mediation
  • Management of data in cooperative work
  • Management of data in distributed scientific
    applications
  • Mobile data management
  • Document management
  • Web sites
  • Portals, etc.
  • Information used to live in islands and this is
    changing

5
The Web of yesterday
  • Protocol: HTTP
  • Documents: HTML
  • Millions of independent Web sites and billions of
    documents
  • Browsing and full-text indexing
  • Publication of databases using forms
  • Data management with the Web
  • HTML is primarily to be read by humans
  • Data management applications over Web data
  • Based on hand-made wrappers
  • Expensive, incomplete, short-lived, not adapted
    to the Web's constant change
  • No real support for distributed data management!

6
Information used to live in islands but it is
changing
  • Different formats: relational, metadata,
    documents, text, DXF
  • A Web standard for data exchange, XML, is fixing
    it
  • XML captures all kinds of information over a wide
    spectrum
  • XML comes with a family of emerging standards:
    XML Schema, XSL/T, XQuery, domain-specific
    schemas
  • Different computers, platforms, languages,
    applications
  • A standard for Web services, SOAP, is fixing it
  • SOAP allows ubiquitous computing on the Internet
  • SOAP comes with a family of emerging standards:
    WSDL, UDDI
  • This provides uniform access to information:
  • the dream for distributed data management

7
The information spectrum
[Figure: the information spectrum covered by semi-structured data and XML, with the labels structured data, meta data and hierarchy. Examples placed on the spectrum: books, contracts, catalogs, bank accounts, emails, financial reports, insurance policies, economic analysis, derivatives, inventory, political analysis, insurance claims, financial news, sports news, resumes]

8
What can be captured with XML?
  • Very structured information such as databases
    and knowledge bases
  • Most DBMS now export in XML
  • Semi-structured data such as data exchange
    formats (ASN.1, SGML), e.g., technical
    documentation
  • Less structured data: documents
  • Meta-data: author, date, status
  • Existing structure in them: chapters, sections,
    table of contents and index
  • Possibly tagging of elements in them (citations,
    lists)
  • Links to other documents
  • Plain text
  • Meta data for unstructured data such as images
    and sound

9
A standard for information: XML
  • Labeled ordered trees where leaves are text
  • Marriage of document and database worlds
  • Marriage of full text indexing and structure
    indexing
  • Is it the ultimate data model? No
  • Purely syntax: more semantics needed
  • Is it OK for now? Definitely yes (because it is a
    standard)

10
The main asset of XML: typing
  • Applications need typing and XML data can be
    typed if needed (DTD and XML schema)
  • Trees
  • Logical granularity: neither page nor document
    level, but the piece of information that is
    needed
  • Semantics and structure are in tags and paths
  • product-table/product/reference
  • product-table/product/price
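
To illustrate how tags and paths carry structure, here is a minimal sketch that reads the two paths above from a small document. The catalog content and the helper name `values_at` are assumptions made for the example; only the element names come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; only the element names follow the slide's example paths.
CATALOG = """
<product-table>
  <product><reference>G-100</reference><price>12.50</price></product>
  <product><reference>G-200</reference><price>8.00</price></product>
</product-table>
"""

def values_at(xml_text, path):
    """Return the text of every element reached by a path relative to the root."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall(path)]

# product-table/product/reference and product-table/product/price,
# written relative to the <product-table> root element.
references = values_at(CATALOG, "product/reference")
prices = [float(p) for p in values_at(CATALOG, "product/price")]
```

The granularity here is the individual reference or price, not the page or the whole document.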

11
A standard for distributed computing: Web
services
  • Possibility to activate a method on some remote
    Web server
  • Exchange information in XML: input and result are
    in XML
  • Ubiquitous XML distributed computing
    infrastructure
  • 2 main applications
  • E-commerce
  • Access to remote data
  • With XML and Web services, it is possible
  • To get information from virtually anywhere
  • To provide information to virtually anywhere
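
As a rough sketch of "input and result are in XML", the following wraps a method call in a SOAP 1.1 envelope and unwraps the first element of a response body. The method name `GetPrice`, the parameter `reference` and the namespace `urn:example:catalog` are hypothetical; no network call is made.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

def build_request(method, params, service_ns):
    """Wrap a method call and its parameters in a SOAP envelope (as a string)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

def read_result(message_text):
    """Return the first element found inside the SOAP Body of a message."""
    envelope = ET.fromstring(message_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    return body[0]

# Hypothetical method, parameter and service namespace.
request = build_request("GetPrice", {"reference": "G-100"}, "urn:example:catalog")
```

In a real exchange the request string would be posted over HTTP to the service endpoint and `read_result` applied to the reply.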

12
The basic picture
[Figure: a Web client and a SOAP service, a black box exposing a method m( ), exchange SOAP messages over the Internet; the query and the answer are both in XML]
13
Accessing and integrating information
14
Accessing remote information
[Figure: an application using gene banks queries some data services that provide candidate genes, then uses some processing services; the gene banks are reached through multiple formats and multiple protocols]
15
Same with Web services
[Figure: the same application using gene banks queries some data services that provide candidate genes, then uses some processing services; everything is reached through the Web]
16
The big picture: peer-to-peer
[Figure: Web services and DB Web services exchange queries over the Web; the peers include data warehouses, databases, Web pages, PCs, PDAs and cell phones]
17
The main roles
[Figure: the service provider publishes to the service registry; the client looks up the service registry, then binds to the service provider]
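
The three roles (publish, look up, bind) can be sketched in a few lines. This is an in-memory stand-in, not UDDI: the class `ServiceRegistry`, the topic name and the provider function are all hypothetical.

```python
class ServiceRegistry:
    """In-memory stand-in for a service registry (not a UDDI client)."""

    def __init__(self):
        self._providers = {}  # topic -> list of callable services

    def publish(self, topic, service):
        """Called by a service provider to advertise a service."""
        self._providers.setdefault(topic, []).append(service)

    def look_up(self, topic):
        """Called by a client to find the services known for a topic."""
        return list(self._providers.get(topic, []))

def gismo_catalog(query):
    """Hypothetical provider: answers a query with a small XML string."""
    return f"<answer query='{query}'>3 gismos in stock</answer>"

registry = ServiceRegistry()
registry.publish("gismos", gismo_catalog)   # provider publishes
service = registry.look_up("gismos")[0]     # client looks up
answer = service("prices")                  # client binds and calls
```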
18
Simple view: looking for information about Gismos
  • Query some yellow-pages
  • Who knows about Gismos?
  • Negotiate with Gismo specialists
  • Nature of the service
  • Quality, cost
  • Get the information
  • Order, payment, delivery
  • Integration in my information system
  • Eventually publish information
  • and all this automatically

19
Data integration: logical view
[Figure: a mediator or warehouse gets service descriptions from service directories and reaches source1, source2 and source3 through wrapper1, wrapper2 and wrapper3]
20
The Web service solution
[Figure: the Web service stack over the Web: data and service repository (UDDI); data and service description (WSDL); data and service semantics (RDF); workflow (WSFL); XML and SOAP]
21
Mediation with Web services
[Figure: over the Web, a mediator consults service directories and service descriptions and reaches source1, source2 and source3 through wrapper1, wrapper2 and wrapper3]
  • Web services
  • Service directories
  • Service descriptions
  • Wrappers
  • Sources
  • Mediators/warehouses

22
Advantages for data integration
  • A universal model for data integration: XML
  • Solves the heterogeneity issue
  • A universal protocol for distribution: SOAP
  • Simple Object Access Protocol (something like
    CORBA)
  • A language for describing the interface of data
    sources: WSDL
  • Web Services Description Language (something like
    IDL)
  • Solves the interoperability issue
  • A standard for publication and discovery of
    information: UDDI
  • Universal Description, Discovery and Integration
  • A standard for describing the semantics of
    sources: RDF
  • Resource Description Framework

23
Advantages, continued: the goal
  • The system can find a new source of information
    using UDDI
  • Understand its syntax using WSDL
  • Understand its semantics using RDF
  • Get it using SOAP
  • The information is in XML, can be restructured
    and integrated automatically
  • Not yet. But soon?

24
Jargon
Help! WSFL, XHTML, .NET, XML, DTD, RDF, RosettaNet, XSL-FO, Xschema, namespace, XSL, ebXML, XSLT, HTTPS, SOAP, HTTP, OASIS, OAGIS, ICE, MIME, WSDL, UDDI, RSS
25
Active XML
  • Joint work with Bernd Amann, Jérôme Baumgarten,
    Angela Bonifati, Ioana Manolescu, Frederic Ngoc
    and others

26
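AXML: XML with embedded SOAP calls
[Figure: an AXML peer is both client and server. As a Web client it sends SOAP messages m( ) over the Internet to a Web server and receives answers; as a server it receives queries q1(1,2), Q2, Q3 (XPath, XQuery) over the Internet and returns answers; the data exchanged is AXML]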
27
Active XML
AXML peer
  • Peer-to-peer architecture
  • Each Active XML peer
  • Repository: manages Active XML data with
    embedded Web service calls
  • Web client: activates calls in the documents
  • Web server: provides Web services defined as
    (parameterized) queries over the repository

28
Build on existing standards
  • Tree data: XML
  • internal data representation and
  • data exchange

[Figure: AXML builds on XML, Web services (SOAP, WSDL) and query languages (XQuery/XPath)]
29
AXML peer: repository of AXML documents
  • <directory>
  •   <dep name="Toy">
  •     <sc>toy.xyz.com/GetToyPersonel()</sc>
  •   </dep>
  •   <dep name="DVD">
  •     <sc>dvd2000.com/GetDVDPersonnel()</sc>
  •   </dep>
  • </directory>

Service calls: a document may contain calls to any SOAP Web service
(e-bay.net, google.com, etc.) or to any AXML Web
service
30
AXML peer: Web client
  • <directory>
  •   <dep name="Toy">
  •     <person pname="Smith">
  •       <phone>01</phone>
  •       <pda>
  •         <sc>toy.xyz.com/GetPDA(../../@pname)</sc>
  •       </pda>
  •     </person>
  •     <sc>toy.xyz.com/GetToyPersonel()</sc>
  •   </dep>
  •   <dep name="DVD">
  •     <sc>dvd2000.com/GetDVDPersonnel()</sc>
  •   </dep>
  • </directory>

Result: the <person> element is the data returned by the call that follows it
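
A minimal sketch of what activating a call could look like, assuming the result is inserted as siblings just before the `<sc>` element and the call itself is kept. The function `activate_calls` and the stand-in service are hypothetical; this is not the AXML implementation, and nested calls inside results are not followed.

```python
import xml.etree.ElementTree as ET

DOC = """
<directory>
  <dep name="Toy"><sc>toy.xyz.com/GetToyPersonel()</sc></dep>
  <dep name="DVD"><sc>dvd2000.com/GetDVDPersonnel()</sc></dep>
</directory>
"""

def fake_toy_service():
    """Stand-in for the remote SOAP service; returns a list of result elements."""
    return [ET.fromstring('<person pname="Smith"><phone>01</phone></person>')]

# Only the Toy service is reachable in this sketch.
SERVICES = {"toy.xyz.com/GetToyPersonel()": fake_toy_service}

def activate_calls(root, services):
    """Insert each known call's results just before its <sc>; keep the call."""
    for parent in list(root.iter()):          # snapshot: the tree is modified below
        # Walk children right to left so earlier indexes stay valid after inserts.
        for index, child in reversed(list(enumerate(parent))):
            if child.tag == "sc" and child.text in services:
                for offset, result in enumerate(services[child.text]()):
                    parent.insert(index + offset, result)
    return root

directory = activate_calls(ET.fromstring(DOC), SERVICES)
```

Keeping the `<sc>` element in place is what lets the call be activated again later, for instance when the data expires.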
31
Controlling the evaluation
  • Activation of calls and data lifespan are
    controlled
  • frequency: when is the service called? (e.g., call
    each day)
  • validity: how long is the retrieved data valid?
  • mode: immediate or lazy?

32
Example: control attributes
  • <directory>
  •   <dep name="Toy">
  •     <sc valid="1 week" mode="immediate">
  •       toy.xyz.com/GetToyPersonel()
  •     </sc>
  •   </dep>
  •   <dep name="DVD">
  •     <sc valid="0" mode="lazy">
  •       dvd2000.com/GetDVDPersonnel()
  •     </sc>
  •   </dep>
  • </directory>
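
One possible reading of the `valid` and `mode` attributes, sketched as a decision function. The semantics assumed here (a lazy call fires only when its data is requested; a validity of 0 means the result is never reused) are an interpretation of the slides, and `CallState` and `needs_call` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CallState:
    mode: str                            # "immediate" or "lazy"
    validity_seconds: float              # how long a result stays valid
    last_called: Optional[float] = None  # time of the last activation, if any

def needs_call(state, now, data_requested):
    """Decide whether the embedded call should be activated at time `now`."""
    if state.mode == "lazy" and not data_requested:
        return False                     # lazy: wait until someone asks for the data
    if state.last_called is None:
        return True                      # never called yet
    return now - state.last_called >= state.validity_seconds  # result expired

WEEK = 7 * 24 * 3600
toy = CallState(mode="immediate", validity_seconds=WEEK)
dvd = CallState(mode="lazy", validity_seconds=0)
```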

33
AXML peer: Web server
  • AXML Web services defined using XQuery over AXML
    documents

let service Get-Toy-Personnel( ) be
  for $a in document("toy.xyz.com/members.axml")/member,
      $b in $a//name,
      $c in $a//phone,
      $d in $a//pda
  return <person pname={$b/text()}> {$c} {$d} </person>
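
For readers less familiar with XQuery, the same service can be approximated in Python. This is a sketch under assumptions: the sample members document and its `<members>` root are invented, and the nested loops mimic the for-clause, so a member lacking a name, phone or pda yields no `<person>`.

```python
import copy
import itertools
import xml.etree.ElementTree as ET

# Hypothetical content for members.axml.
MEMBERS = """
<members>
  <member><name>Smith</name><phone>01</phone><pda>smith-pda</pda></member>
  <member><name>Jones</name><phone>02</phone></member>
</members>
"""

def get_toy_personnel(members_xml):
    """Build one <person> per (name, phone, pda) combination of each member."""
    root = ET.fromstring(members_xml)
    people = []
    for member in root.findall("member"):
        names = member.findall(".//name")
        phones = member.findall(".//phone")
        pdas = member.findall(".//pda")
        for name, phone, pda in itertools.product(names, phones, pdas):
            person = ET.Element("person", pname=name.text)
            person.append(copy.deepcopy(phone))  # copy so the source tree is untouched
            person.append(copy.deepcopy(pda))
            people.append(person)
    return people

people = get_toy_personnel(MEMBERS)
```

Jones has no `<pda>`, so only Smith appears in the result, which matches the join-like behavior of the for-clause.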
34
The crux: the exchange of AXML data
  • Arguments and results of calls are AXML
  • Data is thus intensional and dynamic
  • Distributed computing: by sending data containing
    service calls, one can delegate some work to
    other peers
  • Partial computations: by returning data
    containing service calls, one can give the
    receiver control of these calls
  • All this can be controlled

35
Example: tourist guide
  • <sc>yahoo.com/Temp(Paris)</sc>
  • I need to evaluate the temperature of Paris
  • I call Yahoo: <sc>meteoF.com/t(Paris)</sc>
  • I call meteoF: <t type="celsius">0</t>
  • I am asked what the temperature of Paris is
  • <t type="celsius">0</t>
  • <sc>meteoF.com/t(Paris)</sc>
  • <sc>yahoo.com/Temp(Paris)</sc>
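
The chain of answers in this example can be sketched as a loop that keeps following returned service calls until plain data comes back. The dictionary `PEERS` stands in for the remote services, and `evaluate` and `max_hops` are hypothetical names; a real peer could also stop early and hand the remaining call to its own caller.

```python
import xml.etree.ElementTree as ET

# Stand-ins for remote peers: each call string maps to the XML it returns.
PEERS = {
    "yahoo.com/Temp(Paris)": "<sc>meteoF.com/t(Paris)</sc>",
    "meteoF.com/t(Paris)": '<t type="celsius">0</t>',
}

def evaluate(call, peers, max_hops=10):
    """Follow service calls until an answer without a call is returned."""
    trace = []
    for _ in range(max_hops):
        trace.append(call)
        answer = ET.fromstring(peers[call])
        if answer.tag != "sc":
            return answer, trace        # extensional data: done
        call = answer.text              # intensional answer: follow the new call
    raise RuntimeError("call chain did not terminate")

temperature, trace = evaluate("yahoo.com/Temp(Paris)", PEERS)
```

The `max_hops` bound is a crude guard against chains of calls that never produce data.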

36
Continuous services
  • Inside the tourist guide: new events
  • Pull mode: standard SOAP query
  • Ask once a week
  • Push mode: subscription to a continuous service
  • When new events are announced, they are pushed to
    the AXML document
  • Possibility to define AXML continuous services
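
The difference between the two modes can be sketched with an in-memory feed. `EventFeed` is a hypothetical stand-in for the remote events service; in the pull case the client decides when to ask, in the push case the service delivers each new event to its subscribers.

```python
class EventFeed:
    """Stand-in for a remote events service offering both modes."""

    def __init__(self):
        self._events = []
        self._subscribers = []

    def query(self):
        """Pull mode: return a snapshot of the events known so far."""
        return list(self._events)

    def subscribe(self, callback):
        """Push mode: register a callback that receives each new event."""
        self._subscribers.append(callback)

    def announce(self, event):
        """A new event is stored and pushed to every subscriber."""
        self._events.append(event)
        for callback in self._subscribers:
            callback(event)

feed = EventFeed()
pushed = []                      # plays the role of the AXML document being updated
feed.subscribe(pushed.append)
feed.announce("Jazz festival")
pulled = feed.query()            # snapshot taken once; later events are not in it
feed.announce("Book fair")
```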

37
Architecture and implementation
38
Global architecture
[Figure: global architecture. AXML peers S1, S2 and S3 exchange queries and AXML data over SOAP. Components shown in a peer: XQuery processor, evaluator, AXML document store (read, update), SOAP wrapper, SOAP client (service call, service result) and the service descriptions it consults; an external SOAP service returns XML]
39
Implementation
  • Sun's Java SDK 1.4 (includes XML parser, XPath
    processor, XSLT engine)
  • Apache Tomcat 4.0 servlet engine
  • Apache Axis SOAP toolkit 1.0 beta 3
  • X-OQL query processor, persistent DOM repository
  • JSP-based user interface, using JSTL 1.0
    standard tag library
  • First prototype
  • No lazy evaluation
  • No continuous services
  • Ongoing work on typing, security, replication
  • Demo for VLDB 2002
  • P2P auctioning system

40
Illustration: 3 applications
41
Application 1: Warehousing
  • Construction of warehouses with Web data
  • Monitoring of changes on the Web
  • Kind of services that are used
  • Google search engine
  • wget
  • Classification
  • XML Diff and site changes
  • Page monitoring system
  • etc.

42
Application 2: Mobile data
  • AXML peers as mobile entities
  • Active data store with query capabilities
  • Metadata and object profiles
  • Issues
  • Storage services for mobile objects
  • Processing services for mobile objects
  • Use proxies for that
  • European Project DBGlobe

43
Application 2: Mobile data
  • Light-weight AXML peers
  • PDA, cellular phone, laptop
  • Limited storage, network bandwidth
  • Sometimes disconnected
  • Limited functionalities
  • E.g., support for continuous services based on a
    mail server and SMTP

44
Application 2: context awareness
  • Where am I? (geographical position)
  • Where is the nearest AXML proxy? (network
    position)
  • Active use of this information
  • For providing context dependent data (e.g., time,
    temperature, nearest restaurants, etc.)
  • For selecting services (e.g., choose a nearby
    proxy for caching)

45
Application 3: P2P Auction
  • Each peer proposes some auctions
  • The document records the peer's items and the
    bids
  • Each peer knows about some auctions of other
    peers
  • Each peer can bid on any auction
  • The peer remembers the bids she has placed
  • When an auction closes, the winner is notified
  • No centralization

46
Conclusion and on-going work
47
AXML services
  • A simple, declarative way to create Web services
    compatible with current standards for Web
    services invocation
  • AXML services are powerful tools for data
    integration
  • They allow for new, powerful features
  • Intensional parameters and results: AXML
    documents (containing service calls) are
    exchanged
  • Continuous services: send back a stream of
    answers (SOAP messages) to the caller

48
Many issues
  • Security
  • Typing of parameters
  • Lazy evaluation and optimization
  • Replication
  • Mobility: DBGlobe project
  • Termination
  • Implementation
  • Foundations
  • And more

49
Security
  • Peers exchange AXML documents containing service
    calls
  • A server (resp. client) might ask the client
    (resp. server) to do something "bad"
  • <sc>qod.com/QuoteOfDay</sc>
  • <quote date="july 8th 2002">
  •   My heart was bumping <context>Tskitishvili,
      picked 5th in the NBA draft by the Denver
      Nuggets</context>
  •   <sc>buy.com/BuyCar("BMW Z3")</sc>
  • </quote>

50
Using type to control the use of services
[Figure: Peer1 sends data containing calls to f and g to Peer2; Peer2 accepts f, so f is passed along, while g is evaluated before sending the data]
Peer1 tells which kind of data it exports and
Peer2 which kind it accepts
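
A minimal sketch of the idea, assuming the receiver's accepted type boils down to a set of call strings it is willing to receive. The sender replaces every other embedded call by its evaluated result before sending. The function `prepare_for_receiver` and the stand-in evaluator are hypothetical, and a call sitting at the document root is not handled.

```python
import xml.etree.ElementTree as ET

def prepare_for_receiver(root, accepted_calls, evaluate):
    """Replace each <sc> the receiver does not accept by its evaluated result."""
    for parent in list(root.iter()):          # snapshot: the tree is modified below
        # Walk children right to left so indexes stay valid while replacing.
        for index, child in reversed(list(enumerate(parent))):
            if child.tag == "sc" and child.text not in accepted_calls:
                parent.remove(child)
                parent.insert(index, evaluate(child.text))
    return root

def fake_evaluate(call):
    """Stand-in for actually invoking the service named by `call`."""
    return ET.fromstring("<value>42</value>")

data = ET.fromstring("<data><sc>f()</sc><sc>g()</sc></data>")
to_send = prepare_for_receiver(data, accepted_calls={"f()"}, evaluate=fake_evaluate)
```

Here the receiver accepts `f()` only, so `g()` is evaluated by the sender and its result travels in place of the call.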
51
Distribution and replication
  • Motivated by mobile devices with limited
    resources
  • Allows distributing one XML document over several
    peers
  • Allows replicating an XML subtree on several
    peers
  • Query optimization

52
Thanks! More questions: Serge.Abiteboul@inria.fr