An Overview of a Scalable Distributed Database System: SDSQL Server - PowerPoint PPT Presentation

1 / 54
About This Presentation
Title:

An Overview of a Scalable Distributed Database System: SDSQL Server

Description:

Witold LITWIN, Soror SAHRI & Thomas SCHWARZ ... Most DBSs have distributed/parallel ... An SD-DBS reuses SDDS design principles. With DB management specificity ... – PowerPoint PPT presentation

Number of Views:108
Avg rating:3.0/5.0
Slides: 55
Provided by: irc72
Category:

less

Transcript and Presenter's Notes

Title: An Overview of a Scalable Distributed Database System: SDSQL Server


1
An Overview of a Scalable Distributed Database
System SD-SQL Server
Witold LITWIN, Soror SAHRI Thomas SCHWARZ
Witold.litwin_at_dauphine.fr Soror.sahri_at_dauphine
.fr tjschwarz_at_scu.edu
Ceria Laboratory
Comp. Eng. Dep.
Paris-Dauphine University
Santa Clara U. BNCOD 2006
2
Overview
  • Introduction
  • Architecture
  • Command Interface
  • Processing
  • Performance
  • Conclusion Future Work

3
Partitioned Tables
  • Most DBSs have distributed/parallel versions with
    partitioned tables
  • SQL Server, Oracle, DB2, MySQL, Postgres

4
BENEFITS OF PARTITIONING
BENEFITS OF PARTITIONING Partitioning can
provide tremendous benefits to a wide variety of
applications by improving manageability,
performance, and availability. It is not unusual
for partitioning to improve the performance of
certain queries or maintenance operations by an
order of magnitude. Moreover, partitioning can
greatly simplify common administration
tasks. Partitioning also enables database
designers and administrators to tackle some
of the toughest problems posed by cutting-edge
applications. Partitioning is a key tool for
building multi-terabyte systems or systems with
extremely high availability requirements. Partitio
ning in Oracle Database 10g Release 2 An Oracle
White Paper May 2005
5
Hassle of Partitioning
  • DBSs require manual partitioning
  • And manual repartitioning when tables scale-up
  • DBSs do not provide dynamically scalable tables

6
Facts
  • http//ceria.dauphine.fr/CERIA-publications.html
  • Research Report, December 2005
  • Oracle Database 10g

7
SD-SQL Server Goal
Scalable Distributed Partitioning of
Relational Tables
Scalable Distributed Database System
SD-DBS
8
Role Model an SDDS
  • A scalable distributed data structure
  • Specifically designed for possibly very large
    data on multi-computers or networks of WSs
  • P2P Grids in modern vocabulary
  • Why SDDS Role Model?
  • Several SDDS schemes are well-known by now
  • LH, RP, k-RP, LHRSChord, VBI most of P2P
    schemes
  • The domain has over 20.000 references on Google
  • An SD-DBS reuses SDDS design principles
  • With DB management specificity

9
SD-SQL Server
  • The first and yet the only SD-DBS
  • Implements the SD-DBS architecture
  • Litwin, Schwartz Risch (2002)
  • Runs on Microsoft SQL Server 2000
  • Shared Nothing Architecture
  • Up to 250 nodes at present

10
Gross Architecture
D1_T
11
Nodes, SDBs NDBs
12
NDB Types
  • Client NDB
  • Interfaces applications users
  • Carries only images
  • No actual tables with application data
  • Server NDB
  • Carries actual tables
  • segments
  • Peer NDB
  • Both functions
  • Primary NDB
  • First for an SDB

13
Scalable (Distributed) Table
  • For the application a table of an SDB
  • Internally a collection of segments behind
    client images
  • A segment is an SQL table
  • One per NDB of the SDB
  • Sharing the scalable table scheme
  • Except its check constraint
  • Min and Max value of the partition key
  • With size limit
  • Splitting when overflows occur
  • The check constraints partition the partition key
    space

14
Scalable (Distributed) Table
  • The primary segment
  • First allocated for a new table
  • At some server or peer NDB of SDB
  • The peer creating the table
  • The primary server of the client creating the
    table

15
Gross Architecture
D1_T
16
Scalable (Distributed) Table
  • For the application, it the client image is the
    table
  • The image name is the table name
  • Primary image
  • Created during the scalable table creation
  • at the client or peer NDB creating the table
  • Secondary images
  • Created later on
  • On other NDBs of the SDB
  • For local applications
  • By a dedicated command
  • sd_Create Image.

17
Scalable (Distributed) Table
  • Internally, every image is a specific SQL Server
    view of the segments
  • Distributed partitioned union view
  • CREATE VIEW T AS SELECT FROM N2.DB1.SD._N1_T
    UNION ALL SELECT FROM N3.DB1.SD._N1_T
  • UNION ALL SELECT FROM N4.DB1.SD._N1_T
  • Updatable
  • Through the check constraints
  • With or without Lazy Schema Validation

18
Gross Architecture
D1_T
19
SD SQL Server Meta-Tables
  • Store various SD-SQL Server meta-data
  • In particular about each scalable table
  • At each server or peer NDB
  • SD.Size meta-table
  • Segment capacity
  • The number of stored tuples triggering a split
  • Same for every segment at present
  • SD.RP meta-table
  • the actual partitioning of the scalable table
  • The location of each segment
  • SD.Primary table
  • The location of the SD.RP table for each segment
    in the NDB

20
SD SQL Server Meta-Tables
  • At every client or peer NDB
  • In SD.Image table
  • All the local images
  • The name of the image
  • The type
  • Primary or secondary
  • The number of segments
  • As seen by an image
  • Not necessarily the actual one

21
SD SQL Server Meta-Tables
  • At every NDB
  • SD.SDBNode points towards the primary NDB
  • SD.MDBNode points towards the MDB
  • At MDB
  • SD.Nodes indicates all the available SD-SQL
    Server nodes
  • Over linked SQL Server nodes
  • SD.SDB describes all the SDBs
  • At every primary NDB
  • SD.NDB points to every NDB of the SDB

22
Scalable Tables Meta-data
T Scalable Table
SDBNode
Meta-Tables
23
Scalable Table Expansion
  • The number of segments in a scalable table may
    grow
  • An overflowing segment splits
  • Creating one or more new segments
  • A split occurs when an insert overflows the
    segment capacity
  • The trigger launches the split as an asynchronous
    job called splitter
  • To avoid the application level timeout

24
Gross Architecture
D1_T
25
Scalable Table Expansion
  • Every new segment
  • Is basically created at an existing NDB that does
    not yet have any segments of the expanding table
  • provided there is any
  • Otherwise a new NDB is first appended to SDB
  • Provided there is an available SD SQL Server node
  • Inherits the fathers schema
  • Gets its new check constraint
  • Gets indexes as defined at the father

26
Single Segment Split Single Tuple Insert
Check Constraint?
b1
b
SELECT TOP Pi INTO Ni.Si FROM S ORDER BY C ASC
SELECT TOP Pi WITH TIES INTO Ni.S1 FROM S ORDER
BY C ASC
27
Single Segment Split Bulk Insert
Single segment split
28
Multi-Segment Split Bulk Insert
Multi-segment split
29
Split with SDB Expansion
sd_create_node
sd_create_node_database
.
30
Images
  • An Image hides the scalable table partitioning
  • An image is an SQL Server distributed updateable
    partitioned view of the table
  • An SQL Server Union-all view with check
    constraints
  • An image resides on client or peer NDBs
  • All meta-data of an image are stored in the Image
    meta-table

31
Image Types
  • Primary image
  • Resides at the creation node
  • Has the name of the scalable table
  • Secondary images
  • Reside at other client or peer NDBs of the SDB
  • Have a specific name, other than that of the
    table
  • To avoid name conflict

32
Image Adjustment
  • The splits do not modify synchronously the images
  • Any split makes every image outdated
  • The client or peer verifies every image
    dynamically when a query to the image comes in
  • Image checking
  • Image adjustment if necessary

33
Image Adjustment
  • Get the number of segments presented in the
    image, N1
  • Get the number of segments of the scalable table,
    N2
  • Compare N1 and N2
  • If N1ltN2 then Image Adjustment
  • Alter the partitioned view definition

34
Image Example
DB1 SDB
T Image
T Scalable Table
CREATE VIEW T AS SELECT FROM N2.DB1.SD._N1_T
CREATE VIEW T AS SELECT FROM N2.DB1.SD._N1_T
UNION ALL SELECT FROM N3.DB1.SD._N1_T
UNION ALL SELECT FROM N4.DB1.SD._N1_T
35
Application Interface
  • The application interface manipulates scalable
    tables through SD-SQL Server commands
  • The SD-SQL Server commands start with sd_ to
    distinguish from SQL Server commands for static
    tables

36
Nodes Management
  • Node Creation
  • sd_create_node Dell1 / Server by default /
  • sd_create_node Ceria, client
  • Node Alteration
  • sd_alter_node Ceria, ADD server / Becomes
    peer/
  • Node Removal
  • sd_drop_node Ceria

37
SDB NDB Management
  • SDB Creation
  • sd_create_scalable_database
  • SkyServer, Dell1, Server, 2
  • / Creates the primary SkyServer NDB as well at
    Dell1/
  • SDB Alteration
  • sd_create_node_database SkyServer, Ceria,
    Client
  • SDB Removal
  • sd_drop_scalable_database SkyServer

38
Scalable Tables Management
  • Scalable Table Creation
  • sd_create_table PhotoObj (objid BIGINT PRIMARY
    KEY), 10000
  • No foreign keys yet
  • Scalable Table Alteration
  • sd_alter_table PhotoObj ADD t INT, 1000
  • sd_create_index run_index ON Photoobj (run)
  • sd_drop_index PhotoObj.run_index
  • Scalable Table Removal
  • sd_drop_table PhotoObj

39
Image Adjustment
  • Secondary Image Creation
  • sd_create_image Ceria, PhotoObj
  • sd_create_image Dell2, PhotoObj
  • Secondary Image Removal
  • sd_drop_image 'PhotoObj

40
Scalable View
  • A view of an image
  • Involving perhaps static tables
  • And perhaps static views
  • Declared under SD-SQL Server by the SQL Server
    CREATE VIEW command

41
Scalable Queries Management
  • USE SkyServer / SQL Server command /
  • Scalable Update Queries
  • sd_insert INTO PhotoObj SELECT FROM
    Ceria5.Skyserver-S.PhotoObj
  • Scalable Search Queries
  • sd_select FROM PhotoObj
  • sd_select TOP 5000 INTO PhotoObj1 FROM
    PhotoObj, 500

42
Command Processing
  • Let Q a scalable query using the PhotoObj image
  • sd_select COUNT () FROM PhotoObj

Find Images in Q
Check PhotoObj Image for Correctness
Adjust PhotoObj Image if needed
Send Q to SQL Server for Execution
43
Concurrency
  • SD-SQL Server processes every command as SQL
    distributed transaction at Repeatable Read
    isolation level
  • Tuple level locks
  • Shared locks
  • Exclusive 2PL locks
  • Much less blocking than the Serializable Level

44
Concurrency
  • Splits use exclusive locks on segments and tuples
    in RP meta-table.
  • Shared locks on other meta-tables Primary, NDB
    meta-tables
  • Scalable queries use basically shared locks on
    meta-tables and any other table involved
  • All the conccurent executions can be shown
    serializable

45
Concurrency Example
X
Exclusive Lock
Waiting
Shared Lock
X
Exclusive Lock
Exclusive Lock
46
Experimental Environment
  • 6 Machines Pentium IV 1.7 GHz
  • RAM 780 Mb 1 Gb
  • Operating System Windows 2K Server
  • Ethernet Network max bandwidth of 1 Gb/s
  • Use of SQL Analyzer for editing queries
  • Use of SQL Profiler to take measurements

47
The SkyServer Benchmark
  • We use SkyServer database as benchmark
  • Provided and installed at Ceria by Dr. Gray
  • SkyServer brings the entire database of the Sloan
    Digital Sky Survey, SDSS
  • We use of the PhotoObj table as an example
    scalable table
  • In our experiments, PhotoObj has almost 159 K
    tuples (about 260 MB)
  • Originally, it has 14 M tuples

48
Split Time
Splitting PhotoObj with 160 k tuples into 25
segments, according to segment capacity
49
Split Time
Splitting PhotoObj with 160 k tuples and indexes
into 2 5 segments according to segment capacity
50
Split Time Analysis
  • Longer split time may timeout a query put on wait
  • Future solution Incremental Splitting
  • The splitter moves tuples by an increment at a
    time
  • Let us say 1000 tuples
  • Then ends up by calling upon itself
  • The query may proceed as the splitter releases
    the exclusive lock on the RP tuple
  • The process continues for next increment etc as
    long as there are tuples to move

51
Image Adjustment
(Q) sd_select COUNT () FROM PhotoObj
Query (Q1) execution time
52
Scalable View Processing
(Q) sd_select COUNT () FROM Ti
53
SD-SQL Server / SQL Server
  • (Q) sd_select COUNT () FROM PhotoObj

Execution time of (Q) on SQL Server and SD-SQL
Server
54
Conclusion
  • Scalable tables are now a reality
  • with SD-SQL Server
  • No more manual repartitioning
  • Unlike in any other DBS we know about
  • Performance analysis proves
  • Efficiency of our design
  • Immediate utility of SD-SQL Server

55
Future Works
  • SQL Server 2005 portage
  • Incremental splits
  • Virtual repository of eGov documents
  • SQL Server XML View
  • Foreign keys for scalable tables
  • More performance measurements
  • Skyserver other benchmarks
  • Error processing
  • High availability
  • Parity segments
  • Application to other DBMSs
  • Oracle, DB2, etc.

56
Thank you for your attentionWork performed
between 2003 -2006 Partly founded
byMsResearchEEC Icons ProjectEEC E-Gov Project
Write a Comment
User Comments (0)
About PowerShow.com