Internet Databases PowerPoint PPT Presentation


Title: Internet Databases


1
Internet Databases
  • Hicham Elmongui

2
World Wide Web
  • Web browser
  • URL: Uniform Resource Locator
  • Web site
  • Web server
  • HTML: HyperText Markup Language

3
HTML
  • Markup language, much simpler than SGML (Standard
    Generalized Markup Language)
  • Text is annotated with commands (tags), usually
    consisting of a start tag and an end tag

4
HTML example
  • <HTML>
  • <BODY>
  • Fiction:
  • <UL>
  • <LI>Author: Milan Kundera</LI>
  • <LI>Title: Identity</LI>
  • <LI>Published: 1998</LI>
  • </UL>
  • Science:
  • <UL>
  • <LI>Author: Richard Feynman</LI>
  • <LI>Title: The Character of Physical Law</LI>
  • <LI>Hardcover</LI>
  • </UL>
  • </BODY>
  • </HTML>

5
HTML viewed on a browser
6
Databases and the Web
7
How to generate query results?
  • Web server creates a new process for a program
    that interacts with the database (very
    inefficient)
  • Web server communicates with this program via CGI
    (Common Gateway Interface)
  • Program generates result page with content from
    the database
  • Dynamic HTML pages
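
The CGI flow above can be sketched in Python. This is a minimal illustration, not how any particular web server is configured: the `books` table, its columns, and the in-memory SQLite database are assumptions standing in for the real DBMS.

```python
import html
import sqlite3


def render_books_page(conn: sqlite3.Connection) -> str:
    """Build an HTML page from the rows of a hypothetical books table."""
    rows = conn.execute(
        "SELECT author, title FROM books ORDER BY title"
    ).fetchall()
    items = "\n".join(
        f"<LI>{html.escape(author)}: {html.escape(title)}</LI>"
        for author, title in rows
    )
    return f"<HTML>\n<BODY>\n<UL>\n{items}\n</UL>\n</BODY>\n</HTML>"


def cgi_response(conn: sqlite3.Connection) -> str:
    """A CGI program emits a header, a blank line, then the page body."""
    return "Content-Type: text/html\r\n\r\n" + render_books_page(conn)


# In-memory database standing in for the real DBMS (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (author TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [
        ("Milan Kundera", "Identity"),
        ("Richard Feynman", "The Character of Physical Law"),
    ],
)
response = cgi_response(conn)
print(response)
```

Under plain CGI the web server would start this program as a fresh process for every request, which is the inefficiency the slide points out.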

8
Application servers
  • Piece of software between the web server and the
    applications
  • Functionality
  • Hold a set of pre-forked threads for performance
  • Integration of heterogeneous data sources
  • Transactions involving several data sources
  • Session management

9
Server-side processing: Java Servlet
  • Extension of the functionality of the web server
  • Java programs that run on the server and interact
    with the server through a well-defined API
  • A Servlet consists of mostly business logic and
    routines to format small datasets into HTML

10
Server-side processing: JavaBeans
  • Reusable software components written in Java
  • Can be assembled to create larger applications
  • Can be easily manipulated using visual tools

11
Server-side processing: JSP, ASP
  • Code inside a web page that is interpreted by the
    web server
  • Separates application logic from the appearance
    of the Web page (unlike servlets)

12
Beyond HTML
  • HTML adequate to represent the structure of
    documents for display purposes
  • Features of HTML are not sufficient to represent
    the structure of data within a document for
    general applications

13
Beyond HTML: XML
  • Extensible Markup Language (XML): "Extensible
    HTML"
  • Confluence of SGML and HTML: the power of SGML
    with the simplicity of HTML
  • Allows definition of new markup languages through
    document type declarations (DTDs)

14
Design goals of XML
  • Compatible with SGML
  • Programs that process XML documents should be
    easy to write
  • Design should be formal and concise

15
XML Constructs
  • Elements
  • Main structural building blocks of XML
  • Start and end tag
  • Must be properly nested
  • Element can have attributes that provide
    additional information about the element

16
XML Constructs (cont'd)
  • Entities: like macros, they represent common text
  • Comments
  • Document type declarations (DTDs)

17
Booklist Example
  • <?xml version="1.0" standalone="yes"?>
  • <!DOCTYPE BOOKLIST SYSTEM "booklist.dtd">
  • <BOOKLIST>
  • <BOOK genre="Fiction">
  • <AUTHOR>
  • <FIRST>Milan</FIRST>
  • <LAST>Kundera</LAST>
  • </AUTHOR>
  • <TITLE>Identity</TITLE>
  • <PUBLISHED>1998</PUBLISHED>
  • </BOOK>
  • <BOOK genre="Science" format="Hardcover">
  • <AUTHOR>
  • <FIRST>Richard</FIRST>
  • <LAST>Feynman</LAST>
  • </AUTHOR>
  • <TITLE>The Character of Physical Law</TITLE>
  • </BOOK>
  • </BOOKLIST>
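
As an illustration of how a program might read this document, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The DOCTYPE line is left out of the embedded string, and the `Paperback` fallback is supplied by hand because this parser does not apply defaults from a DTD; the dictionary keys are names chosen for the example.

```python
import xml.etree.ElementTree as ET

BOOKLIST_XML = """<BOOKLIST>
  <BOOK genre="Fiction">
    <AUTHOR><FIRST>Milan</FIRST><LAST>Kundera</LAST></AUTHOR>
    <TITLE>Identity</TITLE>
    <PUBLISHED>1998</PUBLISHED>
  </BOOK>
  <BOOK genre="Science" format="Hardcover">
    <AUTHOR><FIRST>Richard</FIRST><LAST>Feynman</LAST></AUTHOR>
    <TITLE>The Character of Physical Law</TITLE>
  </BOOK>
</BOOKLIST>"""

root = ET.fromstring(BOOKLIST_XML)
books = []
for book in root.findall("BOOK"):
    books.append({
        "genre": book.get("genre"),
        # ElementTree ignores DTD defaults, so the default is given here.
        "format": book.get("format", "Paperback"),
        "last": book.findtext("AUTHOR/LAST"),
        "title": book.findtext("TITLE"),
        # None when the optional PUBLISHED element is absent.
        "published": book.findtext("PUBLISHED"),
    })

for entry in books:
    print(entry)
```

Elements carry the nested content (author, title), while attributes such as `genre` carry additional information about the element itself.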

18
Booklist Example
19
XML DTDs
  • A DTD is a set of rules that defines the
    elements, attributes, and entities that are
    allowed in the document.
  • An XML document is well-formed if it is properly
    nested, even if it has no associated DTD.
  • An XML document is valid if it has a DTD and the
    document follows the rules in the DTD.
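
A well-formedness check can be sketched with Python's standard parser, which rejects improperly nested tags. Note that `xml.etree.ElementTree` is a non-validating parser, so this sketch covers only well-formedness; checking validity against a DTD would need a validating parser, which is not shown here. The function name `is_well_formed` is chosen for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, i.e. tags nest properly."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<BOOK><TITLE>Identity</TITLE></BOOK>"
bad = "<BOOK><TITLE>Identity</BOOK></TITLE>"  # end tags in the wrong order

print(is_well_formed(good))
print(is_well_formed(bad))
```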

20
Example DTD
  • <!DOCTYPE BOOKLIST [
  • <!ELEMENT BOOKLIST (BOOK)*>
  • <!ELEMENT BOOK (AUTHOR, TITLE, PUBLISHED?)>
  • <!ELEMENT AUTHOR (FIRST, LAST)>
  • <!ELEMENT FIRST (#PCDATA)>
  • <!ELEMENT LAST (#PCDATA)>
  • <!ELEMENT TITLE (#PCDATA)>
  • <!ELEMENT PUBLISHED (#PCDATA)>
  • <!ATTLIST BOOK genre (Science|Fiction) #REQUIRED>
  • <!ATTLIST BOOK format (Paperback|Hardcover)
    "Paperback">
  • ]>
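
To make the element rules concrete, the sketch below hand-translates the content models of this DTD into regular expressions over child tag names and checks a parsed document against them. It is an illustration of what the rules mean, not a DTD validator: the translation is done by hand, attributes are not checked, and the names `CONTENT_MODELS`, `TEXT_ONLY`, and `follows_content_model` are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the example DTD, rewritten by hand as regular
# expressions over the comma-joined names of an element's children.
CONTENT_MODELS = {
    "BOOKLIST": r"(BOOK(,BOOK)*)?",       # (BOOK)*
    "BOOK": r"AUTHOR,TITLE(,PUBLISHED)?",  # (AUTHOR, TITLE, PUBLISHED?)
    "AUTHOR": r"FIRST,LAST",               # (FIRST, LAST)
}
# Elements declared as character data only.
TEXT_ONLY = {"FIRST", "LAST", "TITLE", "PUBLISHED"}


def follows_content_model(element: ET.Element) -> bool:
    """Recursively check child order against the hand-written models."""
    if element.tag in TEXT_ONLY:
        return len(element) == 0           # no child elements allowed
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return False                       # element not declared at all
    children = ",".join(child.tag for child in element)
    if re.fullmatch(model, children) is None:
        return False
    return all(follows_content_model(child) for child in element)


doc = ET.fromstring(
    "<BOOKLIST><BOOK genre='Fiction'>"
    "<AUTHOR><FIRST>Milan</FIRST><LAST>Kundera</LAST></AUTHOR>"
    "<TITLE>Identity</TITLE><PUBLISHED>1998</PUBLISHED>"
    "</BOOK></BOOKLIST>"
)
print(follows_content_model(doc))
```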

21
Domain-Specific DTDs
  • Development of standardized DTDs for specialized
    domains enables data exchange between
    heterogeneous sources
  • Example: Mathematical Markup Language (MathML)
  • Encodes mathematical material on the web
  • In HTML: <IMG SRC="xysq.gif" ALT="x^2-4x-32=0">
  • In MathML: ???

22
MathML presentation elements
  • x^2 - 4x - 32 = 0
  • <mrow>
  • <mrow> <msup> <mi>x</mi><mn>2</mn> </msup>
  • <mo>-</mo>
  • <mrow><mn>4</mn><mo>&InvisibleTimes;</mo><mi>x</mi></mrow>
  • <mo>-</mo><mn>32</mn>
  • </mrow> <mo>=</mo><mn>0</mn>
  • </mrow>

23
MathML content elements
  • x^2 - 4x - 32 = 0
  • <reln><eq/>
  • <apply>
  • <minus/>
  • <apply> <power/> <ci>x</ci> <cn>2</cn> </apply>
  • <apply> <times/> <cn>4</cn> <ci>x</ci> </apply>
  • <cn>32</cn>
  • </apply> <cn>0</cn>
  • </reln>

24
XML-QL: Querying XML Data
  • Goal: High-level, declarative language that
    allows manipulation of XML documents
  • No standard yet
  • Example query in XML-QL:
  • WHERE
  • <BOOK>
  • <NAME><LAST>$l</LAST></NAME>
  • </BOOK> IN "www.booklist.com/books.xml"
  • CONSTRUCT <RESULT> $l </RESULT>
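
For readers who do not know XML-QL, the effect of this query can be approximated in Python: find every BOOK that contains a NAME with a LAST child, and wrap each last name in a RESULT element. The document below is a hypothetical stand-in for the contents of www.booklist.com/books.xml, and the sketch illustrates the pattern matching only, not an XML-QL engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of www.booklist.com/books.xml.
BOOKS_XML = """<BOOKLIST>
  <BOOK><NAME><FIRST>Milan</FIRST><LAST>Kundera</LAST></NAME></BOOK>
  <BOOK><NAME><FIRST>Richard</FIRST><LAST>Feynman</LAST></NAME></BOOK>
  <BOOK><TITLE>No author recorded</TITLE></BOOK>
</BOOKLIST>"""


def last_names(xml_text: str) -> list:
    """Return one <RESULT> string per LAST found under BOOK/NAME."""
    root = ET.fromstring(xml_text)
    results = []
    for book in root.iter("BOOK"):              # the WHERE pattern
        for last in book.findall("NAME/LAST"):  # binds the last name
            results.append(f"<RESULT>{last.text}</RESULT>")  # CONSTRUCT
    return results


for line in last_names(BOOKS_XML):
    print(line)
```

Books that do not match the pattern, such as the third one here, contribute nothing to the result.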

25
XML-QL
  • A more complicated example:
  • WHERE
  • <BOOK> $b </BOOK> IN "www.booklist.com/books.xml",
  • <AUTHOR> $n </AUTHOR>,
  • <PUBLISHED> $p </PUBLISHED> IN $b
  • CONSTRUCT
  • <RESULT>
  • <PUBLISHED> $p </PUBLISHED>
  • WHERE <LAST> $l </LAST> IN $n
  • CONSTRUCT <LAST> $l </LAST>
  • </RESULT>

26
Query output
  • <RESULT>
  • <PUBLISHED>1980</PUBLISHED>
  • <LAST>Feynman</LAST>
  • <LAST>Narayan</LAST>
  • </RESULT>
  • <RESULT>
  • <PUBLISHED>1981</PUBLISHED>
  • <LAST>Narayan</LAST>
  • </RESULT>
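
The shape of this output, author last names grouped under each publication year, can be reproduced with a short Python sketch. The input document is hypothetical and was chosen so that the grouping matches the years and names shown above; the sketch illustrates the grouping only and makes no claim about XML-QL's exact evaluation semantics.

```python
import xml.etree.ElementTree as ET

# Hypothetical input, chosen to reproduce the years and names above.
BOOKS_XML = """<BOOKLIST>
  <BOOK><AUTHOR><LAST>Feynman</LAST></AUTHOR><PUBLISHED>1980</PUBLISHED></BOOK>
  <BOOK><AUTHOR><LAST>Narayan</LAST></AUTHOR><PUBLISHED>1980</PUBLISHED></BOOK>
  <BOOK><AUTHOR><LAST>Narayan</LAST></AUTHOR><PUBLISHED>1981</PUBLISHED></BOOK>
</BOOKLIST>"""


def last_names_by_year(xml_text: str) -> dict:
    """Map each publication year to the distinct author last names."""
    by_year = {}
    for book in ET.fromstring(xml_text).iter("BOOK"):
        year = book.findtext("PUBLISHED")
        if year is None:
            continue  # the pattern requires a PUBLISHED element
        names = by_year.setdefault(year, [])
        for last in book.findall("AUTHOR/LAST"):
            if last.text not in names:
                names.append(last.text)
    return by_year


grouped = last_names_by_year(BOOKS_XML)
for year, names in grouped.items():
    print(year, names)
```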

27
Semi-structured data
  • Data with partial structure
  • All data models for semi-structured data use some
    type of labeled graph
  • We introduce the object exchange model (OEM)
  • An object is a triple (label, type, value)
  • Complex objects are decomposed hierarchically
    into smaller objects

28
OEM
  • <lastname, string, Feynman>
  • <authorname, set, {firstname, lastname}>
  • <firstname, string, Richard>
  • <lastname, string, Feynman>
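
One way to picture these triples in code is a small recursive data structure. The class name `OEMObject` and the helper `atomic_values` are invented for this sketch; a complex object of type `set` simply holds its component objects as its value.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class OEMObject:
    """An OEM object: a (label, type, value) triple."""
    label: str
    type: str
    # Atomic objects hold a string; "set" objects hold other objects.
    value: Union[str, Tuple["OEMObject", ...]]


firstname = OEMObject("firstname", "string", "Richard")
lastname = OEMObject("lastname", "string", "Feynman")
authorname = OEMObject("authorname", "set", (firstname, lastname))


def atomic_values(obj: OEMObject):
    """Walk the object hierarchy and yield (label, value) for the leaves."""
    if obj.type == "set":
        for child in obj.value:
            yield from atomic_values(child)
    else:
        yield (obj.label, obj.value)


print(list(atomic_values(authorname)))
```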

29
Booklist Data in OEM
30
Indexing for Text Search
  • Text database: collection of text documents
  • Important class of queries: keyword searches
  • Boolean queries: query terms connected with AND,
    OR and NOT. Result is the list of documents that
    satisfy the boolean expression.
  • Ranked queries: result is a list of documents
    ranked by their relevance.
  • IR measures: precision (percentage of retrieved
    documents that are relevant) and recall
    (percentage of relevant documents that are
    retrieved)
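
The two measures can be computed directly from sets of document identifiers. In this small sketch the identifiers are made up for illustration, and returning 0.0 for an empty set is a convention chosen here, not something the slides specify.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) as fractions between 0 and 1."""
    hits = len(retrieved & relevant)  # retrieved documents that are relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up document identifiers for illustration.
retrieved = {1, 2, 3, 4}
relevant = {2, 4, 5}
p, r = precision_recall(retrieved, relevant)
print(p, r)
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were retrieved (recall of two thirds).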

31
Inverted files
  • For each possible query term, store an ordered
    list (the inverted list) of document identifiers
    that contain the term.
  • Query evaluation: intersection or union of
    inverted lists.

32
Inverted files: query example
  • Example: Agent AND James
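
A minimal sketch of an inverted file and the evaluation of a query such as Agent AND James follows. The four documents are hypothetical (the slide's figure is not part of this transcript), terms are lowercased, and tokenization is a plain whitespace split.

```python
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Map each term to an ordered list of identifiers of documents containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].lower().split()):
            index[term].append(doc_id)
    return index


def intersect(a: list, b: list) -> list:
    """AND: merge two ordered inverted lists, keeping common identifiers."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a: list, b: list) -> list:
    """OR: all identifiers that appear in either inverted list."""
    return sorted(set(a) | set(b))


# Hypothetical documents for illustration.
docs = {
    1: "agent James Bond",
    2: "agent mobile computer",
    3: "James Madison movie",
    4: "James Bond movie",
}
index = build_inverted_index(docs)
print(intersect(index["agent"], index["james"]))
print(union(index["agent"], index["james"]))
```

Keeping each inverted list ordered is what lets the AND be computed with a single merge pass over the two lists.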

33
Signature files
  • Index structure (the signature file) with one
    data entry for each document
  • Hash function hashes words to bit-vector.
  • Data entry for a document (the signature of the
    document) is the OR of all hashed words.
  • Signature S1 matches signature S2 if S2 & S1 = S2
    (every bit set in S2 is also set in S1)
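
The construction can be sketched as follows. The signature width of 16 bits and the use of SHA-256 to pick one bit per word are assumptions made for this example; the slides do not say which hash function or width to use.

```python
import hashlib

SIGNATURE_BITS = 16  # width chosen for illustration only


def word_signature(word: str) -> int:
    """Hash a word to a bit vector with a single bit set."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return 1 << (digest[0] % SIGNATURE_BITS)


def document_signature(text: str) -> int:
    """The signature of a document is the OR of its hashed words."""
    signature = 0
    for word in text.split():
        signature |= word_signature(word)
    return signature


def matches(s1: int, s2: int) -> bool:
    """S1 matches S2 when every bit set in S2 is also set in S1."""
    return (s2 & s1) == s2


doc_sig = document_signature("agent James Bond")
print(format(doc_sig, f"0{SIGNATURE_BITS}b"))
print(matches(doc_sig, word_signature("james")))
```

A document always matches the signature of any word it contains. The converse does not hold: two different words can hash to the same bit, which is why matches must be checked for false positives.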

34
Signature files: query evaluation
  • Boolean query consisting of a conjunction of
    words:
  • Generate query signature Sq
  • Scan signatures of all documents.
  • If signature S matches Sq, then retrieve the
    document and check for false positives.
  • Boolean query consisting of a disjunction of k
    words:
  • Generate k query signatures S1, ..., Sk
  • Scan the signature file to find documents whose
    signature matches any of S1, ..., Sk, then check
    for false positives.
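
Both evaluation strategies can be sketched in a few lines. As before, the 16-bit width, the SHA-256 based hash, and the four documents are assumptions for illustration; the `signature_file` dictionary stands in for the on-disk signature file.

```python
import hashlib

SIGNATURE_BITS = 16  # width chosen for illustration only


def signature(words) -> int:
    """OR together the one-bit hashes of the given words."""
    sig = 0
    for word in words:
        digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
        sig |= 1 << (digest[0] % SIGNATURE_BITS)
    return sig


# Hypothetical documents and their precomputed signature file.
docs = {
    1: "agent James Bond",
    2: "agent mobile computer",
    3: "James Madison movie",
    4: "James Bond movie",
}
signature_file = {d: signature(text.split()) for d, text in docs.items()}


def contains(doc_id: int, word: str) -> bool:
    """Retrieve the document and test for the word (false-positive check)."""
    return word.lower() in docs[doc_id].lower().split()


def conjunctive_query(words: list) -> list:
    sq = signature(words)  # one query signature for all words
    return [
        d for d, s in signature_file.items()
        if (s & sq) == sq and all(contains(d, w) for w in words)
    ]


def disjunctive_query(words: list) -> list:
    sigs = [signature([w]) for w in words]  # k query signatures
    return [
        d for d, s in signature_file.items()
        if any((s & q) == q for q in sigs)
        and any(contains(d, w) for w in words)
    ]


print(conjunctive_query(["agent", "james"]))
print(disjunctive_query(["agent", "madison"]))
```

Because every candidate is re-checked against the document text, hash collisions can cost extra retrievals but never produce wrong answers.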

35
Signature files: Example
36
Summary
  • Publishing databases on the web requires
    server-side processing such as CGI scripts,
    Servlets, ASP, or JSP
  • XML is an emerging document description standard
    that allows the definition of new DTDs. Query
    languages for XML documents, such as XML-QL, are
    emerging.

37
Summary
  • Text databases have gained importance with the
    proliferation of text data on the web. Boolean
    queries can be efficiently evaluated using an
    inverted index or a signature file. Evaluation of
    ranked queries is a more difficult problem.

38
References
  • Raghu Ramakrishnan, Database Management Systems,
    3rd edition, 2003
  • http://www.w3.org/XML/