Distributed Databases - PowerPoint PPT Presentation

1
Distributed Databases
  • David Nelson
  • CAT
  • May 2006

2
Contents
  • Internet Databases
  • Client/Server Architectures, advantages and
    disadvantages
  • Web Database Approaches
  • XML
  • Semi-structured data
  • Distributed Database Systems
  • Definitions
  • Homogeneous and Heterogeneous Systems
  • Federated DBMS Systems
  • Interoperability
  • The Grid

3
Traditional Architecture
  • Traditional Database Systems are based on a
    two-tier client-server architecture
Client
  • User interface
  • Main business and data processing logic

Database Server
  • Server-side validation
  • Database access

4
Web Architecture
  • Need for enterprise scalability causes problems
    which can be solved by a three-tier architecture
  • Generalised to n-tier
Client
  • User interface

Application Server
  • Business logic
  • Data processing logic

Database Server
  • Server-side validation
  • Database access

5
Web as a Database Platform
  • Advantages
  • DBMS advantages
  • Simplicity
  • Platform independence
  • Graphical User Interface
  • Standardization
  • Cross-platform support
  • Transparent network access
  • Scalable deployment
  • Innovation

6
Web as a Database Platform
  • Disadvantages
  • Reliability
  • Security
  • Cost
  • Scalability
  • Limited HTML Functionality
  • Statelessness
  • Bandwidth
  • Performance
  • Immaturity of development tools

7
Approaches
  • CGI
  • Server Side Includes
  • HTTP Cookies
  • API (non-CGI gateways)
  • ODBC
  • Java (JDBC, JSQL, JRB)
  • JavaScript, JScript
  • Microsoft Active Platform (ASP, ADO, ActiveX)
  • PHP (Hypertext Preprocessor)
  • XML

8
Extensible Markup Language (XML)
  • A simplified version of SGML, designed
    specifically for Web documents
  • a meta-language to create customised tags which
    provide functionality not available in HTML
  • links can point to multiple documents
  • links can be bi-directional
  • links to relative objects
  • broken into
  • document data
  • document type definition (DTD) for valid
    documents
  • stylesheet (XSL standard)

9
Semi-structured Data
  • Typical data models (e.g. relational) are
    structured
  • i.e. the data has a separate schema
  • Semi-structured data is self describing
  • aka schemaless
  • no separate description of the type/structure of
    data
  • e.g. XML

10
Sample XML Database
unicode
  • <?xml version="1.0" encoding="UTF-8"
    standalone="yes"?>
  • <?xml:stylesheet type="text/xsl"
    href="staff_list.xsl"?>
  • <!DOCTYPE STAFFLIST SYSTEM "staff_list.dtd">
  • <STAFFLIST>
  • <STAFF branchNo="B005">
  • <STAFFNO>SL21</STAFFNO>
  • <NAME>
  • <FNAME>John</FNAME><LNAME>White</LNAME>
  • </NAME>
  • <POSITION>Manager</POSITION>
  • </STAFF>
  • <STAFF branchNo="B003"> ... </STAFF>
  • </STAFFLIST>

  • root element (STAFFLIST): only one per document
  • attribute (branchNo)
  • elements are ordered, attributes are unordered
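
The points about the root element, attributes and element order can be checked with a short Python sketch. It uses only the standard library; the document text is a cut-down copy of the slide's first staff record (the second record is elided on the slide and is not reproduced), and the function name `describe` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the slide's staff list: one STAFF record only.
STAFF_XML = """<?xml version="1.0" encoding="UTF-8"?>
<STAFFLIST>
  <STAFF branchNo="B005">
    <STAFFNO>SL21</STAFFNO>
    <NAME><FNAME>John</FNAME><LNAME>White</LNAME></NAME>
    <POSITION>Manager</POSITION>
  </STAFF>
</STAFFLIST>"""


def describe(xml_text):
    """Return the root tag, the branchNo attribute and the ordered child tags."""
    # Bytes are passed because the text carries an encoding declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    staff = root.find("STAFF")
    return {
        "root": root.tag,                           # the single root element
        "branchNo": staff.get("branchNo"),          # attribute lookup by name
        "children": [child.tag for child in staff]  # document order is kept
    }


summary = describe(STAFF_XML)
```

Child elements come back in document order, whereas the attribute is fetched by name, which mirrors the ordered/unordered distinction on the slide.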
11
Sample DTD
  • <!ELEMENT STAFFLIST (STAFF)*>
  • <!ELEMENT STAFF (NAME, POSITION, DOB?, SALARY)>
  • <!ELEMENT NAME (FNAME, LNAME)>
  • <!ELEMENT FNAME (#PCDATA)>
  • <!ELEMENT LNAME (#PCDATA)>
  • <!ELEMENT POSITION (#PCDATA)>
  • <!ATTLIST STAFF branchNo CDATA #IMPLIED>

12
Sample StyleSheet
  • <?xml version="1.0"?>
  • <xsl:stylesheet
    xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  • <xsl:template match="/">
  • <html><body>
  • <center><h2>DreamHome Estate agents</h2></center>
  • <table border="1" bgcolor="#ffffff">
  • <tr>
  • <th>staffNo</th>
  • --- repeat for other column headings
  • </tr>
  • <xsl:for-each select="STAFFLIST/STAFF">
  • <tr><td><xsl:value-of select="STAFFNO"/></td>
  • <td><xsl:value-of select="NAME/FNAME"/></td></tr>
  • </xsl:for-each></table></body></html>
  • </xsl:template>
  • </xsl:stylesheet>

13
Benefits of XML
  • Simplicity
  • Open standard and platform/vendor-independent
  • Extensibility
  • Reuse
  • Separation of content and presentation
  • Improved load balancing

14
Benefits of XML
  • Support for integration of data from multiple
    sources
  • Ability to describe data from a wide variety of
    applications
  • More advanced search engines
  • XQuery

15
XML Schema
  • <xsd:group name="STAFFTYPE">
  • <xsd:element name="STAFF">
  • <xsd:complexType>
  • <xsd:sequence>
  • <xsd:element name="STAFFNO"
    type="STAFFNOTYPE"/>
  • <xsd:element name="NAME">
  • <xsd:complexType>
  • <xsd:sequence>
  • <xsd:element name="FNAME"
    type="xsd:string"/>
  • <xsd:element name="LNAME"
    type="xsd:string"/>
  • </xsd:sequence>
  • </xsd:complexType>
  • ...

16
Querying
  • W3C working group has produced
  • XML Query Requirements
  • XML Query Data Model
  • XML Query Algebra
  • projection, iteration, selection, join, sorting,
    aggregation
  • XQuery - a query language for XML

17
XQuery Queries
  • List the staff at branch B005 with a salary
    greater than 15000
  • FOR $S IN document("staff_list.xml")//STAFF
  • WHERE $S/SALARY > 15000 AND
  • $S/@branchNo = "B005"
  • RETURN $S/STAFFNO
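
For comparison, the same selection can be written by hand in Python with the standard library. This is only a sketch of what the query asks for, not an XQuery engine; the staff numbers other than SL21, all salary figures and the function name `well_paid_staff` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up data: only SL21 / B005 appears on the slides, salaries are invented.
DOC = """<STAFFLIST>
  <STAFF branchNo="B005"><STAFFNO>SL21</STAFFNO><SALARY>30000</SALARY></STAFF>
  <STAFF branchNo="B005"><STAFFNO>SL99</STAFFNO><SALARY>12000</SALARY></STAFF>
  <STAFF branchNo="B003"><STAFFNO>SG37</STAFFNO><SALARY>18000</SALARY></STAFF>
</STAFFLIST>"""


def well_paid_staff(xml_text, branch, min_salary):
    """Staff numbers at `branch` whose salary exceeds `min_salary`."""
    root = ET.fromstring(xml_text)
    result = []
    for staff in root.iter("STAFF"):                    # FOR ... //STAFF
        salary = float(staff.findtext("SALARY", default="0"))
        if salary > min_salary and staff.get("branchNo") == branch:  # WHERE
            result.append(staff.findtext("STAFFNO"))    # RETURN
    return result


matches = well_paid_staff(DOC, "B005", 15000)
```

The loop, the filter and the projection each correspond to one clause of the query, which is the work a query language takes off the programmer.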

18
XML Databases
  • Native
  • XML is the primary data store of the DBMS
  • Semi-structured databases
  • e.g. Lore
  • XML Enabled
  • Traditional RDBMS provides mappings between XML
    and data store
  • Can be stored
  • Coarse grained
  • Medium grained
  • Fine grained
  • E.g. Oracle, SQL Server, SQL:2003

19
Fine Grained Approach
  • Good for queries which need to inspect/manipulate
    specific elements in the XML document
  • Not good for queries which manipulate (e.g.
    retrieve/store) the entire document

[Diagram: tables used by the fine grained approach: Document, Element (with parent), Child, Attribute, CharData]
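
A minimal sketch of fine grained storage, assuming an SQLite database and a simplified two-table layout (`element` and `attribute`) rather than the exact tables in the slide's diagram. Every element becomes its own row, so a question about specific elements is a plain SQL query. All table, column and function names here are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(conn, doc_id, xml_text):
    """Store every element and attribute of the document as its own row."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS element (
            id INTEGER PRIMARY KEY, doc_id TEXT, parent_id INTEGER,
            tag TEXT, chardata TEXT);
        CREATE TABLE IF NOT EXISTS attribute (
            element_id INTEGER, name TEXT, value TEXT);
    """)

    def visit(node, parent_id):
        cur = conn.execute(
            "INSERT INTO element (doc_id, parent_id, tag, chardata) "
            "VALUES (?, ?, ?, ?)",
            (doc_id, parent_id, node.tag, (node.text or "").strip()))
        node_id = cur.lastrowid
        for name, value in node.attrib.items():
            conn.execute("INSERT INTO attribute VALUES (?, ?, ?)",
                         (node_id, name, value))
        for child in node:              # recurse in document order
            visit(child, node_id)

    visit(ET.fromstring(xml_text), None)


def children_of(conn, tag):
    """Tags of all children of elements named `tag`, via a self-join."""
    rows = conn.execute(
        "SELECT c.tag FROM element AS c "
        "JOIN element AS p ON c.parent_id = p.id "
        "WHERE p.tag = ? ORDER BY c.id", (tag,))
    return [row[0] for row in rows]
```

Rebuilding the whole document from these rows would need a recursive walk over `parent_id`, which is why this layout suits element queries better than whole-document retrieval.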
20
Coarse Grained Approach
  • One table
  • Best for queries which manipulate whole document
  • e.g. retrieve/store a document
  • Worst for queries which manipulate elements
  • e.g. retrieve children of a tag
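
A minimal sketch of the one-table idea, assuming SQLite and hypothetical names (`document`, `store_document`, `fetch_document`, `children_of_tag`). The whole document is stored as a single text value, so storing and retrieving it is one statement, while any question about its elements means fetching and parsing the full text first.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_document(conn, doc_id, xml_text):
    """Store the whole document as one row in one table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS document (id TEXT PRIMARY KEY, body TEXT)")
    conn.execute("INSERT OR REPLACE INTO document VALUES (?, ?)",
                 (doc_id, xml_text))


def fetch_document(conn, doc_id):
    """Whole-document retrieval: a single keyed lookup."""
    row = conn.execute(
        "SELECT body FROM document WHERE id = ?", (doc_id,)).fetchone()
    return row[0] if row else None


def children_of_tag(conn, doc_id, tag):
    """Element-level query: the database cannot help, so parse everything."""
    root = ET.fromstring(fetch_document(conn, doc_id))
    return [child.tag for parent in root.iter(tag) for child in parent]
```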

21
Medium Grained Approach
  • A compromise between fine and coarse grained
  • Slice document tree up into sections
  • Store sub-sections using a coarse grained
    approach
  • Good for both types of queries

22
Distributed Db Definitions
  • Distributed Database System
  • the ability of the DDBS users to run applications
    at each node
  • Federated Database System
  • a DDBS is usually a single application
    distributed over various sites
  • an FDBS is a set of multiple cooperating systems
  • a simple solution for interoperability (as
    discussed later)

23
Distributed Database Systems
  • System needs facilities to be able to
  • perform distributed query optimization
  • manage distributed transactions
  • manage data replication
  • Homogeneous DDBS simplest case
  • several sites, each running their own
    applications on same DBMS with same schema and
    transactions
  • location transparency
  • can communicate over large distances, and are
    autonomous

24
Heterogeneous DDBMS
  • Several existing databases (using different
    DBMSs) linked into a single system
  • Problems
  • variation in costs of operation between sites
  • some operations may not be available at some
    sites
  • some DBMSs cannot read records of others
  • varying base types
  • Requesting site must
  • have detailed knowledge of operation of remote
    system
  • assume remote system has only rudimentary
    functionality
  • make programmer do query composition by hand

25
Federated DBMS
  • A collection of independently managed,
    heterogeneous database systems
  • allow partial and controlled sharing of data
    without affecting existing applications

[Diagram: each federated schema is connected to a local schema through a federated to local schema mapping]
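
One way to picture a federated to local schema mapping is as a lookup table that rewrites a request phrased in federated names into each site's own names. The sketch below is an illustration under invented assumptions: the sites, relations and column names are hypothetical, and a real federated DBMS would do far more (type conversion, access control, query optimisation).

```python
# Hypothetical mapping: federated column name -> local column name, per site.
FEDERATED_TO_LOCAL = {
    "site_a": {"relation": "staff",
               "columns": {"staff_no": "staffNo", "surname": "lName"}},
    "site_b": {"relation": "EMPLOYEE",
               "columns": {"staff_no": "EMP_ID", "surname": "LAST_NAME"}},
}


def to_local_query(site, federated_columns):
    """Rewrite a projection over federated columns into the site's own SQL."""
    mapping = FEDERATED_TO_LOCAL[site]
    cols = ", ".join(
        f'{mapping["columns"][name]} AS {name}' for name in federated_columns)
    return f'SELECT {cols} FROM {mapping["relation"]}'


query_a = to_local_query("site_a", ["staff_no", "surname"])
query_b = to_local_query("site_b", ["staff_no", "surname"])
```

Only columns listed in a site's mapping can be requested, which is one simple way to model partial and controlled sharing: anything a site does not map stays private, and its existing applications keep using the local schema unchanged.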
26
Interoperability
  • The web is the ultimate interoperable database
    platform
  • Need to be able to query using various sources on
    the web
  • Without duplication of data as in a data warehouse

27
Interoperability
  • IEEE (1990) Definition
  • the ability of two or more systems or components
    to exchange information and to use the
    information that has been exchanged
  • IEEE Standard Computer Dictionary: A Compilation
    of IEEE Standard Computer Glossaries
  • Current simple solutions
  • transformation
  • mediation

28
Interoperability Definition 2
  • The ability to request and receive services
    between various systems and use their
    functionality
  • More than data exchange
  • Implies a close integration

29
Interoperability Definitions 3
  • Semantic Interoperability
  • agreements about content description standards
  • Ontologies
  • Structural Interoperability
  • Specifying semantic schemas such that they can be
    shared, e.g. RDF
  • Syntactic Interoperability
  • How to tag and mark data to facilitate exchange
  • E.g. XML

30
Features
  • Exchange of messages and requests
  • Use of each other's functionality
  • Client-server abilities
  • Distribution
  • Operate multiple systems as single unit
  • Communication despite incompatibilities
  • Extensibility and evolution

31
The Problems and Difficulties
  • Different data models
  • There can be major semantic differences even
    within the same data model
  • Properties may be called by different names
  • Different data types may be used
  • What about recreating locally defined functions?
  • All this implies we know where they are and we
    have a physical means of getting to them

32
The Problems and Difficulties
  • Databases are by their nature protectors of
    data, they do not share easily
  • Many (particularly legacy systems) do not have
    any form of web interface
  • Most databases are security protected
  • Databases do not advertise their services to the
    web
  • Even client/server databases operate within a
    cocoon of silence

33
EBCDIC
  • EBCDIC /eb's-dik/, /eb'seedik/, or
    /eb'k-dik/ n.
  • abbreviation, Extended Binary Coded Decimal
    Interchange Code
  • A character set used on early IBM computers. It
    exists in at least six mutually incompatible
    versions, all featuring such delights as
    non-contiguous letter sequences and the absence
    of several punctuation characters fairly
    important for modern computer languages (exactly
    which characters are absent varies according to
    which version of EBCDIC you're looking at). IBM
    adapted EBCDIC from punched card code in the
    early 1960s and promulgated it as a
    customer-control tactic, spurning the already
    established ASCII standard. Today, IBM claims to
    be an open-systems company, but IBM's own
    description of the EBCDIC variants and how to
    convert between them is still an internally
    classified top-secret.
  • EBCDIC is the most common alternate character
    code but there are others.
  • http://www.cheverus.org/advanced/data/EBCDIC.html
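
Two of the claims in the definition can be observed from Python, which ships codecs for several EBCDIC code pages. The sketch below uses `cp037` and `cp500` as two example variants; that choice, and the helper name `ebcdic_code`, are assumptions made for illustration.

```python
def ebcdic_code(ch, codec="cp037"):
    """Numeric code of a single character in the given EBCDIC code page."""
    return ch.encode(codec)[0]


# In ASCII the letters are contiguous: 'j' directly follows 'i'.
gap_ascii = ord("j") - ord("i")

# In EBCDIC there is a gap between the a-i and j-r blocks.
gap_ebcdic = ebcdic_code("j") - ebcdic_code("i")

# Two EBCDIC variants can disagree on where a punctuation character lives.
bracket_differs = "[".encode("cp037") != "[".encode("cp500")

# Converting between the systems therefore needs an explicit decode step.
round_trip = "SL21".encode("cp037").decode("cp037")
```

Any code that assumes `chr(ord(c) + 1)` yields the next letter, or that a byte value means the same character everywhere, breaks when data crosses between such systems.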

34
Some Simple Integration Problems 1
  • Differing schema
  • Schema 1: author char(50), title varchar(300),
    keyword set(char(30))
  • Schema 2: author_surname char(50),
    author_inits char(10), title varchar(200),
    keywd array(8) (char(30))
  • both are valid schemas in SQL3
  • values also differ in format: A.N.Other,
    A N Other, Other N A, ...
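
To make the name-format problem concrete, here is a rough sketch that maps a single `author` string onto the `author_surname` / `author_inits` pair. It rests on a loud assumption: single-letter tokens are initials and anything longer is the surname, which fails for many real names. The function names are hypothetical.

```python
import re


def split_author(raw):
    """Split a free-text author into (surname, initials as written)."""
    tokens = [t for t in re.split(r"[.\s,]+", raw) if t]
    initials = "".join(t for t in tokens if len(t) == 1).upper()
    surname = " ".join(t for t in tokens if len(t) > 1)
    return surname, initials


def may_be_same_author(a, b):
    """Loose match: same surname and same initials in any order."""
    surname_a, inits_a = split_author(a)
    surname_b, inits_b = split_author(b)
    return (surname_a.casefold() == surname_b.casefold()
            and sorted(inits_a) == sorted(inits_b))
```

Note that "Other N A" yields the initials in a different order from "A.N.Other", so the two can only be matched loosely. Deciding that they really are the same person needs knowledge the schemas do not carry.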

35
Some Simple Problems 2
  • Homogeneous Models
  • the same information may be held as attribute
    name, relation name or a value in different
    databases
  • e.g. library fines
  • as a dedicated relation: Fine(amount, borrowed_id)
  • as an attribute: Loan(id, isbn, date_out, fine)
  • or as a value: Charge(1.25, fine)
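
The three representations can be compared side by side in a small sketch. Each function extracts the fine amounts from rows shaped like one of the slide's relations; the row contents (ids, ISBN, dates) are invented, and the second column of `Charge` is assumed to name the kind of charge.

```python
def fines_from_relation(fine_rows):
    """Fine(amount, borrowed_id): the relation name carries the meaning."""
    return [amount for amount, _borrowed_id in fine_rows]


def fines_from_attribute(loan_rows):
    """Loan(id, isbn, date_out, fine): the attribute name carries it."""
    return [fine for _id, _isbn, _date_out, fine in loan_rows if fine]


def fines_from_value(charge_rows):
    """Charge(amount, kind): a data value carries it."""
    return [amount for amount, kind in charge_rows if kind == "fine"]


# Invented sample rows describing the same 1.25 fine in three databases.
fine_rows = [(1.25, 7)]
loan_rows = [(7, "0-000-00000-0", "2006-05-01", 1.25),
             (8, "0-000-00000-1", "2006-05-02", None)]
charge_rows = [(1.25, "fine"), (3.00, "reservation")]
```

An integration layer needs one such extractor per source, because the same fact lives in schema metadata in two cases and in the data itself in the third.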

36
Complex Problems
  • Heterogeneous models
  • Need to relate model constructions to one
    another, for example
  • relate classes in object-oriented to user-defined
    types in object-relational
  • All problems are magnified at this level!

37
Data Models
  • We are only touching the surface in repositories
    and data warehouses

38
XML and RDF
  • Resource Description Framework
  • XML Schema defines a grammar
  • therefore we have all the problems shown
    previously (e.g. names)
  • RDF provides a way to encode domain models
  • an infrastructure that enables the encoding,
    exchange and reuse of structured meta-data (W3C)
  • this is what we need for interoperable systems

39
RDF Data Model
  • RDF Data Model consists of three objects
  • Resource
  • anything that can have a URI
  • Property
  • a specific attribute which is used to describe a
    resource
  • Statement
  • a combination of a resource, a property and a
    value
  • known as the subject, predicate and object
  • e.g. The author of
    http://www.myhome.net/staff_list.xml is Fred Smith

40
RDF Example
  • The statement would be defined in RDF
    (simplified) as
  • <?xml version="1.0"?>
  • <RDF>
  • <Description
    about="http://www.myhome.net/staff_list.xml">
  • <author>Fred Smith</author>
  • <created>25 May 2006</created>
  • </Description>
  • </RDF>
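
The simplified example can be turned back into subject, predicate and object statements with a few lines of Python. This is a sketch for the slide's simplified, namespace-free markup only; real RDF/XML uses namespaces and needs a proper RDF parser. The function name `triples` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_XML = """<?xml version="1.0"?>
<RDF>
  <Description about="http://www.myhome.net/staff_list.xml">
    <author>Fred Smith</author>
    <created>25 May 2006</created>
  </Description>
</RDF>"""


def triples(xml_text):
    """List (subject, predicate, object) for each property of each resource."""
    root = ET.fromstring(xml_text)
    return [
        (description.get("about"), prop.tag, prop.text)
        for description in root.iter("Description")
        for prop in description
    ]


statements = triples(RDF_XML)
```

Each child of `Description` yields one statement about the same resource, so the example encodes two statements: who the author is and when the document was created.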

41
The Grid
  • Original Motivation
  • the need for a distributed computing
    infrastructure for advanced science and
    engineering (Walker)
  • Used originally in science for large number
    crunching applications
  • but now finding larger appeal
  • Compare to the national power grid
  • Interoperability is a key issue

42
Examples of Computational and Information Grids
  • NASA Information Power Grid
  • access to large-scale computing resources, large
    databases, and high-end instruments
  • dynamically co-allocated resources (e.g.
    supercomputers)
  • AstroGrid
  • a virtual observatory
  • European Data Grid
  • high energy physics, biology and Earth
    observation
  • distributed, large-scale data intensive computing

43
Summary
  • Distributed and web-databases are increasingly
    important areas
  • XML is being increasingly used in data models,
    data transmission and data integration
  • Interoperability is the key issue and the major
    research area in database systems
  • XML and RDF have potential as stepping stones
    towards achieving this
  • The Grid is an example of a system which could
    require interoperability to integrate database
    systems

44
Further Reading
  • Connolly and Begg, Database Systems, chapters
    22, 23, 29 and 30
  • Ozsu and Valduriez, Principles of Distributed
    Database Systems, 2nd edition
  • everything you ever wanted to know about
    distributed database systems
  • Chaudhri and Zicari, Succeeding with Object
    Databases, 2001
  • D Walker, Emerging Distributed Computing
    Technologies, Cardiff University
  • http://www.cs.cf.ac.uk/User/David.W.Walker/IGDS/GridCourse.htm
  • an introduction to the Grid
  • XML and RDF
  • www.w3schools.com