File systems, centralized DBS, distributed DBS

Provided by: CIT788
1
Introduction
  • File systems -> centralized DBS -> distributed
    DBS
  • Review of DB technologies
  • Why DDBS? Pros and cons
  • What is a DDBS?
  • What does a DDBS provide?
  • What are the main design issues (components) of a
    DDBS?

2
From File Systems to DBS
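  • File systems
  • What is a file system?
  • A file contains data for a program
  • When do I need a file?
  • I want to store some data for later use by a
    program, for example:

    #include <fstream>
    int main() {
        const int DATA_SIZE = 100;
        int data_array[DATA_SIZE];
        std::ifstream data_file("numbers.dat");  // data saved by an earlier run
        int i = 0;
        // Assumed completion: the slide's listing stops after the declarations.
        while (i < DATA_SIZE && data_file >> data_array[i]) ++i;
    }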

3
Database Applications
  • Why do you use a database for your application?
  • A database system provides facilities for better
    management and control of data
  • It is easier to access the data than in a file
    system
  • In the development of a database application, you
    start with data modeling first (define the
    databases)
  • Note: with file systems, you may start with the
    program first

4
From File Systems to DBS
  • Each application (program) has its own data
    descriptions (files)
  • Why do you design a new file? Because you need to
    write a program
  • New applications mean new files may have to be
    created
  • Files for different applications may have
    different formats, and the programs may be written
    in different languages
  • Disadvantages
  • Low degree of data sharing
  • High degree of data redundancy: the same (or
    similar) data may be stored in different files
  • Data isolation: (replicated) data is scattered
    across various files in different formats, so it
    is difficult to write programs that retrieve the
    data, and updates cause data consistency problems
  • Security problems (lack of a centralized
    controller)

5
File Systems
6
Database Applications
  • Database applications
  • contain a collection of interrelated data and a
    set of programs that allow users to access and
    modify the data
  • provide an abstract view of the data (data
    abstraction): hide the details of how the data
    are stored and maintained
  • support sharing of data and modeling of
    real-world information (entities and entity
    relationships)
  • Why do you need to design a database? You want to
    model the information about the application
  • The system provides
  • the definitions of the structures of the data
  • mechanisms for manipulation of data (information)
  • safety information for recovery and security

7
Database Management
8
Database application example
  Begin
    input(flight_no, date, customer_name)
    EXEC SQL SELECT STSOLD, CAP
             INTO temp1, temp2
             FROM FLIGHT
             WHERE FNO = flight_no AND
                   DATE = date
  End
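
The embedded-SQL fragment on this slide can be tried out with a small sketch. It assumes Python's built-in sqlite3 module in place of the slide's embedded SQL, an in-memory FLIGHT table with invented rows, and a hypothetical helper named seats_for. Only the table name and the columns FNO, DATE, STSOLD and CAP come from the slide.

```python
import sqlite3

# In-memory stand-in for the FLIGHT table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FLIGHT (FNO TEXT, DATE TEXT, STSOLD INTEGER, CAP INTEGER)")
conn.executemany(
    "INSERT INTO FLIGHT VALUES (?, ?, ?, ?)",
    [("CX100", "2024-05-01", 98, 100), ("CX200", "2024-05-01", 150, 150)],
)

def seats_for(flight_no, date):
    """Return (seats sold, capacity) for one flight, or None if no row matches."""
    # The ? placeholders play the role of the host variables flight_no and date.
    return conn.execute(
        "SELECT STSOLD, CAP FROM FLIGHT WHERE FNO = ? AND DATE = ?",
        (flight_no, date),
    ).fetchone()

temp1, temp2 = seats_for("CX100", "2024-05-01")
seats_left = temp2 - temp1  # capacity minus seats sold
```

The program names the data it wants and never says how FLIGHT is stored on disk, which is the point the next slide makes about data abstraction.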

9
Data Abstraction
  • Separate the application from the data
    (information)
  • Three levels hide from applications how the
    information is stored and manipulated
  • Physical level (lower level)
  • describes how the data are actually stored on
    disk (physical format)
  • Conceptual level
  • describes what data are actually stored and the
    relationships that exist among the data
    (relations)
  • View level
  • describes only the part of the entire database
    in which the user is interested. Different users
    may have different views (created by applications)

10
ANSI/SPARC Architecture
11
File Systems? Databases Systems?
  • Which one do you prefer to use for your
    applications?

Currently, database systems are widely used for
various computer applications. The reasons are
better organization of real-world information
(data modeling of real-world entities), better
definition, control and manipulation of data, and
sharing of data. The choice is a tradeoff between
processing cost and efficiency in information
management.
12
Relational Database
  • Data modeling techniques: relational database,
    object-oriented database, hierarchical DB, etc.
  • Relational DB
  • databases are tables (regular shape, fixed no. of
    attributes)
  • a relation R defined over n sets D1, D2, ..., Dn
    is a set of n-tuples <d1, d2, ..., dn> such that
    d1 ∈ D1, d2 ∈ D2, ..., dn ∈ Dn
  • a table consists of rows and columns
  • a row is a record (tuple) and a column is an
    attribute (field)
  • a table may be defined with a key (or keys)
  • a key uniquely identifies a tuple in a relation
  • the records in a table are unordered (why?)
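
The set-based definition of a relation can be made concrete with a short sketch. The domains, the EMP rows and the helper is_key are all invented for illustration and are not taken from the slides.

```python
from itertools import product

# Hypothetical domains D1 (employee numbers) and D2 (names).
D1 = {"E1", "E2", "E3"}
D2 = {"Ann", "Bob"}

# A relation over D1 and D2 is any subset of the Cartesian product D1 x D2.
EMP = {("E1", "Ann"), ("E2", "Bob"), ("E3", "Ann")}
assert EMP <= set(product(D1, D2))

def is_key(relation, positions):
    """True if the attributes at `positions` identify every tuple uniquely."""
    seen = {tuple(row[p] for p in positions) for row in relation}
    return len(seen) == len(relation)

# A set has no order and no duplicates, which is why rows are unordered.
same_relation = EMP == {("E3", "Ann"), ("E1", "Ann"), ("E2", "Bob")}
```

Here the employee number (position 0) is a key, while the name (position 1) is not, because two tuples share the name "Ann".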

13
Example
14
Relational DBS
  • Normalization
  • break a large "bad" table into several smaller
    "good" tables
  • what is a good table (relation)? (from the
    transaction management viewpoint)
  • No repetition, and no update, insertion or
    deletion anomalies
  • a step-by-step reversible process of replacing a
    given collection of relations by successive
    collections in which relations have a
    progressively simpler and more regular structure
  • Integrity rules
  • are constraints that define consistent states of
    the database, e.g., referential integrity
    (relationships between records)
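
The "reversible" part of normalization can be illustrated with a minimal sketch: a table with repeated project names is split by projection, and a natural join gives the original back. The table ASG_PROJ and its rows are assumptions made for this example.

```python
# A "bad" table: the project name is repeated for every assignment.
# Columns: (ENO, PNO, PNAME); the rows are invented for illustration.
ASG_PROJ = {
    ("E1", "P1", "CAD/CAM"),
    ("E2", "P1", "CAD/CAM"),
    ("E2", "P2", "Database"),
}

# Decompose by projection into two smaller tables.
ASG = {(eno, pno) for eno, pno, _ in ASG_PROJ}        # who works on what
PROJ = {(pno, pname) for _, pno, pname in ASG_PROJ}   # each name stored once

# Reversibility: the natural join on PNO gives back the original table.
rejoined = {
    (eno, pno, pname)
    for eno, pno in ASG
    for pno2, pname in PROJ
    if pno == pno2
}
```

After the split, renaming a project touches one row in PROJ instead of every matching row in the large table, which removes the update anomaly.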

15
Referential Integrity
Customer Table
Sale Table
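
Referential integrity between a customer table and a sale table might look like the following sketch. The slide's table contents were not preserved, so the column names cust_id, sale_id, name and amount are hypothetical, and SQLite (through Python's sqlite3 module) stands in for whatever DBMS the slide had in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE Customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE Sale ("
    " sale_id INTEGER PRIMARY KEY,"
    " cust_id INTEGER NOT NULL REFERENCES Customer(cust_id),"
    " amount REAL)"
)
conn.execute("INSERT INTO Customer VALUES (1, 'Ann')")
conn.execute("INSERT INTO Sale VALUES (10, 1, 99.5)")  # accepted: customer 1 exists

try:
    conn.execute("INSERT INTO Sale VALUES (11, 2, 20.0)")  # there is no customer 2
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The second insert is refused because it would leave a sale pointing at a customer that does not exist, so the database stays in a consistent state.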
16
Relational Data Language
  • Manipulation of data
  • Relational algebra and relational calculus
  • Relational Algebra
  • consists of a set of operators that operate on
    relations
  • each operator takes one or two relations as
    operands and produces a result relation
  • operators: select, project, Cartesian product,
    union and set difference (also join, semi-join,
    natural join, ...)
  • Relational Calculus
  • specifies a formal description of the result
    without specifying how to obtain it, e.g.,
    using SQL
  • specifies a condition or an action on a domain
    (data set)
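
A few of the algebra operators can be sketched over lists of dictionaries. This is an illustrative toy, not how a DBMS implements them; the function names and the sample rows are assumptions.

```python
def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Projection: keep only the named columns and drop duplicate rows."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result

def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attribute names."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined

# Invented sample data.
EMP = [{"ENO": "E1", "ENAME": "Ann"}, {"ENO": "E2", "ENAME": "Bob"}]
ASG = [{"ENO": "E1", "PNO": "P1"}, {"ENO": "E2", "PNO": "P2"}]

# Each operator returns a relation, so operators compose.
names_on_p1 = project(
    select(natural_join(EMP, ASG), lambda row: row["PNO"] == "P1"), ["ENAME"]
)
```

The algebra expression spells out the steps (join, then select, then project), whereas a calculus-style query states only the condition the result must satisfy.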

17
SAMPLE SQL
  SELECT EMP.ENAME
  FROM   EMP, ASG, PROJ
  WHERE  EMP.ENO = ASG.ENO
  AND    ASG.PNO = PROJ.PNO
  AND    PROJ.PNAME = 'CAD/CAM'

  UPDATE PAY
  SET    SAL = 25000
  WHERE  PAY.TITLE = 'Programmer'
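
The two sample statements can be run against a toy database as a check. The sketch assumes SQLite through Python's sqlite3 module, invented rows, and that the join column in PROJ is PNO; the table and column names otherwise follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP  (ENO TEXT, ENAME TEXT, TITLE TEXT);
    CREATE TABLE ASG  (ENO TEXT, PNO TEXT);
    CREATE TABLE PROJ (PNO TEXT, PNAME TEXT);
    CREATE TABLE PAY  (TITLE TEXT, SAL INTEGER);
    INSERT INTO EMP  VALUES ('E1', 'Ann', 'Programmer'), ('E2', 'Bob', 'Analyst');
    INSERT INTO ASG  VALUES ('E1', 'P1'), ('E2', 'P2');
    INSERT INTO PROJ VALUES ('P1', 'CAD/CAM'), ('P2', 'Database');
    INSERT INTO PAY  VALUES ('Programmer', 20000), ('Analyst', 30000);
""")

# The slide's join: names of employees assigned to the CAD/CAM project.
cad_cam_staff = conn.execute(
    "SELECT EMP.ENAME FROM EMP, ASG, PROJ "
    "WHERE EMP.ENO = ASG.ENO AND ASG.PNO = PROJ.PNO AND PROJ.PNAME = 'CAD/CAM'"
).fetchall()

# The slide's update: set the salary for the title 'Programmer'.
conn.execute("UPDATE PAY SET SAL = 25000 WHERE PAY.TITLE = 'Programmer'")
programmer_salary = conn.execute(
    "SELECT SAL FROM PAY WHERE TITLE = 'Programmer'"
).fetchone()[0]
```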

18
Distributed Computing
  • Non-centralized computing (not parallel
    computing)
  • A number of (autonomous) processing elements (not
    necessarily homogeneous) that are interconnected
    by a computer network (which may be a mobile
    network)
  • The processes cooperate with each other to
    perform their assigned tasks
  • Requires the support of a network (delay in
    communication)

Example:
  Begin
    Read A at site 1
    Read B at site 2
    C = A + B
    Write C at site 3
  End
[Figure: three sites; A is held at site 1, B at site 2 and C at site 3]
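
The three-site example can be simulated in a few lines. This is a sketch under stated assumptions: the Site and Network classes are hypothetical, messages are merely counted as a stand-in for communication delay, the data values are invented, and the operation combining A and B is taken to be addition.

```python
class Site:
    """A hypothetical site that stores named data items locally."""

    def __init__(self, name):
        self.name = name
        self.items = {}

class Network:
    """Counts the messages exchanged, as a stand-in for communication delay."""

    def __init__(self):
        self.messages = 0

    def read(self, site, key):
        self.messages += 2  # one request message and one reply message
        return site.items[key]

    def write(self, site, key, value):
        self.messages += 2  # one request message and one acknowledgement
        site.items[key] = value

site1, site2, site3 = Site("site 1"), Site("site 2"), Site("site 3")
site1.items["A"] = 5  # invented values
site2.items["B"] = 7

net = Network()
a = net.read(site1, "A")      # Read A at site 1
b = net.read(site2, "B")      # Read B at site 2
net.write(site3, "C", a + b)  # C = A + B, written at site 3
```

Even this three-step task costs six messages, which is why communication delay is a central concern in distributed computing.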
19
What are the main differences between DS and
DDBS?
  • Distributed database systems are one of the
    topics under distributed systems
  • Distributed systems normally deal with the
    lower-level communication between processes
  • Distributed database systems provide procedures
    and facilities to support higher-level database
    applications
  • Processes -> transactions
  • Concentrate on the upper middleware layer and the
    interactions with applications (not on
    communication)

20
DS Vs DDBS
  • A distributed system consists of four levels:
    operating systems, network, middleware and
    applications

[Figure: layer stack, top to bottom: Applications, Middleware, Operating System, Network (physical). Brackets beside the stack are labeled "Distributed Database Systems" and "Distributed Systems".]
21
Motivation of Distributed Database Systems
22
Motivation of Distributed Database Systems
  • Centralization
  • put the components of a system at a centralized
    site
  • Integration
  • How the different system components (or
    processing elements) are joined together. They
    need not reside at the same site; e.g., in a
    DDBS the system components are distributed over
    several sites
  • They need to work together (and depend on each
    other) to support the applications
  • Degree of integration
  • Coupling: how closely the components (of a DDBS)
    are related
  • weak and strong coupling
  • Synchronization: how the actions of the
    components are related
  • synchronous (mostly strong coupling) vs.
    asynchronous

23
Motivation of Distributed Database Systems
[Figure: Sites 1 to 4, layered as Application; Middleware (Oracle, Sybase, DB2, etc.); Network and operating systems]
24
Motivation of Distributed Database Systems
[Figure: Sites 1 to 4, layered as Application; Middleware; Network and operating systems]
25
From Centralized DBS to DDBS
  • What is distributed in a DDBS?
  • Processing Logic and functions
  • Data
  • Control
  • Distribution of data: partition a database into
    fragments and distribute the fragments over
    several sites (nodes)
  • Distribution of processing: a transaction may
    access data items located at several sites (a
    transaction may have several processes at several
    sites)
  • Distribution of control: several sites work
    together to manage a transaction and control
    the data

26
Why Distributed Database Systems
  • Organizational reasons
  • many applications are distributed in nature,
    e.g., banking systems, airline booking
    systems, and Internet applications
  • Interconnection of existing databases (degree of
    sharing)
  • integration of the databases of different
    applications over a network
  • Incremental growth (expandability)
  • the growth of an application may require
    creating sub-systems at other sites
  • Autonomy
  • each site may have its own database for its own
    applications (transactions)

27
Benefits of Distributed Database Systems
  • Performance considerations (in addition to data
    correctness)
  • distribution of workload to several sites (load
    balancing)
  • reduce response time (the time between the
    submission and the completion of a request)
  • processing delay + communication delays
  • attempt to reduce the access delay and access
    cost for data items at remote sites
  • Process the transaction (request) at the local
    site
  • Reliability and availability
  • duplication of information at several sites
  • higher reliability even with site failures

28
Data Locality
[Figure: a local site and a remote site holding data items A and B; Application 01 directs 90% of its accesses to A and 10% to B]
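
The effect of data placement on access delay can be worked through with the slide's 90/10 access mix. The two delay values below are assumptions chosen only to make the arithmetic visible, and expected_delay is a hypothetical helper.

```python
# Assumed delays in milliseconds; the slide gives only the 90/10 access mix.
LOCAL_DELAY_MS = 1.0
REMOTE_DELAY_MS = 50.0

ACCESS_MIX = {"A": 0.9, "B": 0.1}  # fraction of Application 01's accesses

def expected_delay(local_items):
    """Average delay per access when `local_items` are stored at the local site."""
    return sum(
        share * (LOCAL_DELAY_MS if item in local_items else REMOTE_DELAY_MS)
        for item, share in ACCESS_MIX.items()
    )

a_local = expected_delay({"A"})  # 0.9 * 1 + 0.1 * 50
b_local = expected_delay({"B"})  # 0.9 * 50 + 0.1 * 1
```

Under these assumed delays, keeping the frequently used item A local gives an average of about 5.9 ms per access, against about 45.1 ms if B is the local item.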
29
Cost of Distributed Database Systems
  • Greater complexity
  • Management of distributed data and higher cost to
    maintain data integrity (data correctness)
  • database partition, consistency and mutual
    consistency (the replicated data items have the
    same value)
  • Management of distributed transactions
  • processing of transactions in a distributed
    environment
  • atomicity of distributed transactions
    (all-or-none)
  • Impact of network
  • message loss and errors, disconnection and
    partition
  • Security
  • security measures become more complicated and
    have to be added at each site
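
The all-or-none requirement for distributed transactions is commonly met with a commit protocol. The sketch below uses two-phase commit, which this slide does not name, and ignores timeouts, message loss and logging; the Participant class and its can_commit flag are hypothetical.

```python
class Participant:
    """A hypothetical site taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local part of the work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Commit at every site or at none; returns the global decision."""
    votes = [p.prepare() for p in participants]  # phase 1: collect all votes
    decision = "commit" if all(votes) else "abort"
    for p in participants:                       # phase 2: broadcast the decision
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision

ok = two_phase_commit([Participant("site 1"), Participant("site 2")])
failed_sites = [Participant("site 1"), Participant("site 2", can_commit=False)]
failed = two_phase_commit(failed_sites)
```

A single "no" vote aborts the transaction at every site, including the sites that were ready to commit.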

30
What is a Distributed Database System?
  • An informal definition
  • A distributed database (DDB) is a collection of
    data items which belong logically to the same
    system but are distributed over several sites of
    a computer network
  • A distributed database system (DDBS) is
  • the software (or system) that manages the DDB
  • It provides an access mechanism that makes this
    distribution transparent to users (with different
    degrees of transparency)

31
Implicit Assumptions
  • Data
  • stored at multiple sites
  • Distributed database
  • is a database, not a collection of files
  • data items are logically related, as exhibited in
    the users' access patterns
  • Processors
  • at different sites are interconnected by a
    computer network, not a multiprocessor (what are
    the differences? Communication delays)
  • DDBS
  • Each site is a full-fledged DBS (each site shares
    the responsibility for data management and
    processing applications)
  • not a remote file system

32
What are NOT DDBSs?
  • Not just a timesharing computer system
  • Not a loosely or tightly coupled multiprocessor
    system
  • Not a database system that resides at one of the
    nodes of a network of computers: this is a
    centralized database on a network node

33
Centralized DBS on a Network
34
Distributed DBS Environment
35
Applications (conventional)
  • Manufacturing, especially multi-plant
    manufacturing
  • Banking systems
  • Corporate management information system (MIS)
  • Airline reservation systems
  • Hotel chains
  • Any organization which has a decentralized
    organization structure

36
New database applications
  • CAD/CAM design (object based database)
  • Project management - workflow (long transactions)
  • Knowledge discovery (data mining)
  • Real-time applications (flight navigation and
    Military command and control)
  • telephone management systems (location
    management)
  • System monitoring (stock trading systems)
  • E-commerce and Internet applications
  • Sensor systems

37
Distributed DBS Promises
  • Transparent management of distributed,
    fragmented, and replicated data (fragmentation
    divides a large table into several smaller
    tables)
  • Transparency
  • The separation of the higher-level semantics of a
    system from the lower-level implementation issues
  • Data transparency
  • The applications do not know how the data are
    represented physically
  • Location transparency
  • The applications do not know where the required
    data are located
  • Replication transparency
  • The applications do not know whether the required
    data are replicated or not

38
Distributed DBS Promises
  • Fragmentation transparency
  • The applications do not know whether the required
    data are fragmented or not
  • Horizontal fragmentation: selection
  • Vertical fragmentation: projection
  • Hybrid (both horizontal and vertical)
  • Improved reliability/availability through
    distributed transactions and data replication
  • Improved performance (locality) (get the data at
    its local site)
  • Easier and more economical system expansion
    (distribution)
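
Horizontal and vertical fragmentation, and how the original table is rebuilt from the fragments, can be sketched as follows. The EMP rows, the column names and the choice of CITY as the fragmentation attribute are assumptions for illustration.

```python
# Invented EMP rows with columns ENO, ENAME, CITY, SAL.
EMP = [
    {"ENO": "E1", "ENAME": "Ann", "CITY": "Hong Kong", "SAL": 30000},
    {"ENO": "E2", "ENAME": "Bob", "CITY": "London", "SAL": 25000},
    {"ENO": "E3", "ENAME": "Cat", "CITY": "London", "SAL": 28000},
]

# Horizontal fragmentation = selection: each fragment holds whole rows.
emp_hk = [r for r in EMP if r["CITY"] == "Hong Kong"]
emp_other = [r for r in EMP if r["CITY"] != "Hong Kong"]

# Vertical fragmentation = projection: each fragment holds some columns and
# repeats the key ENO so that rows can be matched up again.
emp_public = [{"ENO": r["ENO"], "ENAME": r["ENAME"], "CITY": r["CITY"]} for r in EMP]
emp_pay = [{"ENO": r["ENO"], "SAL": r["SAL"]} for r in EMP]

# Reconstruction: union for horizontal fragments, join on the key for vertical ones.
from_horizontal = emp_hk + emp_other
pay_by_eno = {r["ENO"]: r for r in emp_pay}
from_vertical = [{**r, **pay_by_eno[r["ENO"]]} for r in emp_public]
```

With fragmentation transparency, an application would query EMP as a whole and the system would perform the union or join behind the scenes.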

39
Transparent Access
40
Distributed Database User View
What are the advantages of defining a DDBS in
this way?
41
Distributed DBS - Reality
42
Distributed DBS Issues
  • Distributed Database Design
  • How to distribute the database (fragments)
  • Replicated vs. non-replicated database
    distribution
  • A related problem in directory management
    (central vs. distributed)
  • Query Processing
  • Convert transactions to data manipulation
    instructions
  • Optimization problem (query optimization)
  • min{cost = data transmission + local processing}
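
The optimization goal, minimizing data transmission plus local processing, can be illustrated by costing a few ways to join two relations held at different sites. The per-tuple costs, the relation sizes and the site assignments are all assumed values, and strategy_cost is a hypothetical helper with a deliberately simple cost model.

```python
# Assumed cost model: cost = data transmission + local processing.
COST_PER_TUPLE_SENT = 10.0      # assumed units per tuple shipped over the network
COST_PER_TUPLE_PROCESSED = 1.0  # assumed units per tuple handled locally

# Hypothetical relation sizes in tuples: EMP at site 1, ASG at site 2.
SIZES = {"EMP": 400, "ASG": 1000}

def strategy_cost(shipped):
    """Cost of joining EMP and ASG after shipping the relations in `shipped`."""
    transmission = COST_PER_TUPLE_SENT * sum(SIZES[r] for r in shipped)
    local_processing = COST_PER_TUPLE_PROCESSED * sum(SIZES.values())
    return transmission + local_processing

strategies = {
    "ship EMP to site 2": strategy_cost(["EMP"]),
    "ship ASG to site 1": strategy_cost(["ASG"]),
    "ship both to site 3": strategy_cost(["EMP", "ASG"]),
}
best = min(strategies, key=strategies.get)
```

Because shipping dominates the cost in this model, the cheapest plan moves the smaller relation to the site of the larger one.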

43
Distributed DBS Issues
  • Concurrency Control and Deadlock Resolution
  • Synchronization of concurrent accesses
  • Consistency and isolation of transactions'
    effects
  • Deadlock management (e.g., in two-phase locking)
  • Reliability and Recovery (commitment, logging and
    checkpointing)
  • How to make the system resilient to failures
  • Atomicity and durability
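
One common way to detect deadlock among lock-based transactions is to look for a cycle in a wait-for graph. The slides do not prescribe this method; the sketch below is a plain depth-first search over an assumed dictionary representation, with hypothetical transaction names.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle.

    `wait_for` maps each transaction to the transactions whose locks it waits for.
    """
    visiting, done = set(), set()

    def visit(txn):
        if txn in done:
            return False
        if txn in visiting:
            return True  # reached a transaction already on the current path
        visiting.add(txn)
        cyclic = any(visit(nxt) for nxt in wait_for.get(txn, ()))
        visiting.discard(txn)
        done.add(txn)
        return cyclic

    return any(visit(txn) for txn in wait_for)

# T1 holds a lock T2 needs and T2 holds a lock T1 needs: neither can proceed.
deadlocked = has_deadlock({"T1": ["T2"], "T2": ["T1"]})
# T1 waits for T2, which waits for nobody: T2 can finish and release its locks.
fine = has_deadlock({"T1": ["T2"], "T2": []})
```

In a distributed system the edges of this graph are spread over several sites, which is what makes deadlock management harder than in a centralized DBS.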

44
Relationship Between Issues
Transaction Processing
45
Related Issues
  • Operating System Support
  • Operating system with proper support for database
    operations
  • Open Systems and Interoperability
  • Distributed Multi-database Systems
  • More portable scenario

46
Database Technology Timeline
[Figure: timeline running from "Simple Data Management" to "Global Enterprise Management".
Periods: Early 80s; Late 80s; Early - Mid 90s; Late 90s - 21st C.
Database generations: Pre-relational; Early Relational; Client-server Relational; Enterprise-capable Relational; Internet Computing.
Application labels: Simple OLTP; Packaged Vertical Applications; Data Warehouse, Hi-end OLTP; Active Database.
Feature labels: simple transactions, on-line backup and recovery; stored procedures, triggers; scalable OLTP, parallel query, partitioning, cluster support, row-level locking, high availability; support for all types of data, extensibility, objects; middleware (messaging, queues, events), Java, CORBA, Web interfaces.]
47
Current State of DDBSs
  • These applications require
  • Large numbers of users/transactions
  • High performance
  • High availability (7x24 operations)
  • Scalability
  • High levels of security
  • Administrative support
  • Good utilities

48
References
  • Ozsu Ch1, Ch2 (overview)
  • Ceri Ch1