Title: An Overview of a Scalable Distributed Database System: SDSQL Server
1An Overview of a Scalable Distributed Database
System SD-SQL Server
Witold LITWIN, Soror SAHRI Thomas SCHWARZ
Witold.litwin_at_dauphine.fr Soror.sahri_at_dauphine
.fr tjschwarz_at_scu.edu
Ceria Laboratory
Comp. Eng. Dep.
Paris-Dauphine University
Santa Clara U. BNCOD 2006
2Overview
- Introduction
- Architecture
- Command Interface
- Processing
- Performance
- Conclusion Future Work
3Partitioned Tables
- Most DBSs have distributed/parallel versions with
partitioned tables - SQL Server, Oracle, DB2, MySQL, Postgres
4BENEFITS OF PARTITIONING
BENEFITS OF PARTITIONING Partitioning can
provide tremendous benefits to a wide variety of
applications by improving manageability,
performance, and availability. It is not unusual
for partitioning to improve the performance of
certain queries or maintenance operations by an
order of magnitude. Moreover, partitioning can
greatly simplify common administration
tasks. Partitioning also enables database
designers and administrators to tackle some
of the toughest problems posed by cutting-edge
applications. Partitioning is a key tool for
building multi-terabyte systems or systems with
extremely high availability requirements. Partitio
ning in Oracle Database 10g Release 2 An Oracle
White Paper May 2005
5Hassle of Partitioning
- DBSs require manual partitioning
- And manual repartitioning when tables scale-up
- DBSs do not provide dynamically scalable tables
6Facts
- http//ceria.dauphine.fr/CERIA-publications.html
- Research Report, December 2005
- Oracle Database 10g
7SD-SQL Server Goal
Scalable Distributed Partitioning of
Relational Tables
Scalable Distributed Database System
SD-DBS
8Role Model an SDDS
- A scalable distributed data structure
- Specifically designed for possibly very large
data on multi-computers or networks of WSs - P2P Grids in modern vocabulary
- Several SDDS schemes are well-known by now
- LH, RP, k-RP, LHRSChord, VBI most of P2P
schemes - The domain has over 20.000 references on Google
- An SD-DBS reuses SDDS design principles
- With DB management specificity
9SD-SQL Server
- The first and yet the only SD-DBS
- Implements the SD-DBS architecture
- Litwin, Schwartz Risch (2002)
- Runs on Microsoft SQL Server 2000
- Shared Nothing Architecture
- Up to 250 nodes at present
10Gross Architecture
D1_T
11Nodes, SDBs NDBs
12NDB Types
- Client NDB
- Interfaces applications users
- Carries only images
- No actual tables with application data
- Server NDB
- Carries actual tables
- segments
- Peer NDB
- Both functions
- Primary NDB
- First for an SDB
13Scalable (Distributed) Table
- For the application a table of an SDB
- Internally a collection of segments behind
client images - A segment is an SQL table
- One per NDB of the SDB
- Sharing the scalable table scheme
- Except its check constraint
- Min and Max value of the partition key
- With size limit
- Splitting when overflows occur
- The check constraints partition the partition key
space
14Scalable (Distributed) Table
- The primary segment
- First allocated for a new table
- At some server or peer NDB of SDB
- The peer creating the table
- The primary server of the client creating the
table
15Gross Architecture
D1_T
16Scalable (Distributed) Table
- For the application, it the client image is the
table - The image name is the table name
- Primary image
- Created during the scalable table creation
- at the client or peer NDB creating the table
- Secondary images
- Created later on
- On other NDBs of the SDB
- For local applications
- By a dedicated command
- sd_Create Image.
17Scalable (Distributed) Table
- Internally, every image is a specific SQL Server
view of the segments - Distributed partitioned union view
- CREATE VIEW T AS SELECT FROM N2.DB1.SD._N1_T
UNION ALL SELECT FROM N3.DB1.SD._N1_T - UNION ALL SELECT FROM N4.DB1.SD._N1_T
- Updatable
- Through the check constraints
- With or without Lazy Schema Validation
18Gross Architecture
D1_T
19SD SQL Server Meta-Tables
- Store various SD-SQL Server meta-data
- In particular about each scalable table
- At each server or peer NDB
- SD.Size meta-table
- Segment capacity
- The number of stored tuples triggering a split
- Same for every segment at present
- SD.RP meta-table
- the actual partitioning of the scalable table
- The location of each segment
- SD.Primary table
- The location of the SD.RP table for each segment
in the NDB
20SD SQL Server Meta-Tables
- At every client or peer NDB
- In SD.Image table
- All the local images
- The name of the image
- The type
- Primary or secondary
- The number of segments
- As seen by an image
- Not necessarily the actual one
21SD SQL Server Meta-Tables
- At every NDB
- SD.SDBNode points towards the primary NDB
- SD.MDBNode points towards the MDB
- At MDB
- SD.Nodes indicates all the available SD-SQL
Server nodes - Over linked SQL Server nodes
- SD.SDB describes all the SDBs
- At every primary NDB
- SD.NDB points to every NDB of the SDB
22Scalable Tables Meta-data
T Scalable Table
SDBNode
Meta-Tables
23Scalable Table Expansion
- The number of segments in a scalable table may
grow - An overflowing segment splits
- Creating one or more new segments
- A split occurs when an insert overflows the
segment capacity - The trigger launches the split as an asynchronous
job called splitter - To avoid the application level timeout
24Gross Architecture
D1_T
25Scalable Table Expansion
- Every new segment
- Is basically created at an existing NDB that does
not yet have any segments of the expanding table - provided there is any
- Otherwise a new NDB is first appended to SDB
- Provided there is an available SD SQL Server node
- Inherits the fathers schema
- Gets its new check constraint
- Gets indexes as defined at the father
26Single Segment Split Single Tuple Insert
Check Constraint?
b1
b
SELECT TOP Pi INTO Ni.Si FROM S ORDER BY C ASC
SELECT TOP Pi WITH TIES INTO Ni.S1 FROM S ORDER
BY C ASC
27Single Segment Split Bulk Insert
Single segment split
28Multi-Segment Split Bulk Insert
Multi-segment split
29Split with SDB Expansion
sd_create_node
sd_create_node_database
.
30Images
- An Image hides the scalable table partitioning
- An image is an SQL Server distributed updateable
partitioned view of the table - An SQL Server Union-all view with check
constraints - An image resides on client or peer NDBs
- All meta-data of an image are stored in the Image
meta-table
31Image Types
- Primary image
- Resides at the creation node
- Has the name of the scalable table
- Secondary images
- Reside at other client or peer NDBs of the SDB
- Have a specific name, other than that of the
table - To avoid name conflict
32Image Adjustment
- The splits do not modify synchronously the images
- Any split makes every image outdated
- The client or peer verifies every image
dynamically when a query to the image comes in - Image checking
- Image adjustment if necessary
33Image Adjustment
- Get the number of segments presented in the
image, N1 - Get the number of segments of the scalable table,
N2 - Compare N1 and N2
- If N1ltN2 then Image Adjustment
- Alter the partitioned view definition
34Image Example
DB1 SDB
T Image
T Scalable Table
CREATE VIEW T AS SELECT FROM N2.DB1.SD._N1_T
CREATE VIEW T AS SELECT FROM N2.DB1.SD._N1_T
UNION ALL SELECT FROM N3.DB1.SD._N1_T
UNION ALL SELECT FROM N4.DB1.SD._N1_T
35Application Interface
- The application interface manipulates scalable
tables through SD-SQL Server commands - The SD-SQL Server commands start with sd_ to
distinguish from SQL Server commands for static
tables -
36Nodes Management
- Node Creation
- sd_create_node Dell1 / Server by default /
- sd_create_node Ceria, client
- Node Alteration
- sd_alter_node Ceria, ADD server / Becomes
peer/ - Node Removal
- sd_drop_node Ceria
37SDB NDB Management
- SDB Creation
- sd_create_scalable_database
- SkyServer, Dell1, Server, 2
- / Creates the primary SkyServer NDB as well at
Dell1/ - SDB Alteration
- sd_create_node_database SkyServer, Ceria,
Client - SDB Removal
- sd_drop_scalable_database SkyServer
38Scalable Tables Management
- Scalable Table Creation
- sd_create_table PhotoObj (objid BIGINT PRIMARY
KEY), 10000 - No foreign keys yet
- Scalable Table Alteration
- sd_alter_table PhotoObj ADD t INT, 1000
- sd_create_index run_index ON Photoobj (run)
- sd_drop_index PhotoObj.run_index
- Scalable Table Removal
- sd_drop_table PhotoObj
39Image Adjustment
- Secondary Image Creation
- sd_create_image Ceria, PhotoObj
- sd_create_image Dell2, PhotoObj
- Secondary Image Removal
- sd_drop_image 'PhotoObj
40Scalable View
- A view of an image
- Involving perhaps static tables
- And perhaps static views
-
- Declared under SD-SQL Server by the SQL Server
CREATE VIEW command
41Scalable Queries Management
- USE SkyServer / SQL Server command /
- Scalable Update Queries
- sd_insert INTO PhotoObj SELECT FROM
Ceria5.Skyserver-S.PhotoObj - Scalable Search Queries
- sd_select FROM PhotoObj
- sd_select TOP 5000 INTO PhotoObj1 FROM
PhotoObj, 500
42Command Processing
- Let Q a scalable query using the PhotoObj image
- sd_select COUNT () FROM PhotoObj
Find Images in Q
Check PhotoObj Image for Correctness
Adjust PhotoObj Image if needed
Send Q to SQL Server for Execution
43Concurrency
- SD-SQL Server processes every command as SQL
distributed transaction at Repeatable Read
isolation level - Tuple level locks
- Shared locks
- Exclusive 2PL locks
- Much less blocking than the Serializable Level
44Concurrency
- Splits use exclusive locks on segments and tuples
in RP meta-table. - Shared locks on other meta-tables Primary, NDB
meta-tables - Scalable queries use basically shared locks on
meta-tables and any other table involved - All the conccurent executions can be shown
serializable
45Concurrency Example
X
Exclusive Lock
Waiting
Shared Lock
X
Exclusive Lock
Exclusive Lock
46Experimental Environment
- 6 Machines Pentium IV 1.7 GHz
- RAM 780 Mb 1 Gb
- Operating System Windows 2K Server
- Ethernet Network max bandwidth of 1 Gb/s
- Use of SQL Analyzer for editing queries
- Use of SQL Profiler to take measurements
47The SkyServer Benchmark
- We use SkyServer database as benchmark
- Provided and installed at Ceria by Dr. Gray
- SkyServer brings the entire database of the Sloan
Digital Sky Survey, SDSS - We use of the PhotoObj table as an example
scalable table - In our experiments, PhotoObj has almost 159 K
tuples (about 260 MB) - Originally, it has 14 M tuples
48Split Time
Splitting PhotoObj with 160 k tuples into 25
segments, according to segment capacity
49Split Time
Splitting PhotoObj with 160 k tuples and indexes
into 2 5 segments according to segment capacity
50Split Time Analysis
- Longer split time may timeout a query put on wait
- Future solution Incremental Splitting
- The splitter moves tuples by an increment at a
time - Let us say 1000 tuples
- Then ends up by calling upon itself
- The query may proceed as the splitter releases
the exclusive lock on the RP tuple - The process continues for next increment etc as
long as there are tuples to move
51Image Adjustment
(Q) sd_select COUNT () FROM PhotoObj
Query (Q1) execution time
52Scalable View Processing
(Q) sd_select COUNT () FROM Ti
53SD-SQL Server / SQL Server
- (Q) sd_select COUNT () FROM PhotoObj
-
Execution time of (Q) on SQL Server and SD-SQL
Server
54Conclusion
- Scalable tables are now a reality
- with SD-SQL Server
- No more manual repartitioning
- Unlike in any other DBS we know about
- Performance analysis proves
- Efficiency of our design
- Immediate utility of SD-SQL Server
55Future Works
- SQL Server 2005 portage
- Incremental splits
- Virtual repository of eGov documents
- SQL Server XML View
- Foreign keys for scalable tables
- More performance measurements
- Skyserver other benchmarks
- Error processing
- High availability
- Parity segments
- Application to other DBMSs
- Oracle, DB2, etc.
56Thank you for your attentionWork performed
between 2003 -2006 Partly founded
byMsResearchEEC Icons ProjectEEC E-Gov Project