Prototyping SDSQL Server: a Scalable Distributed Database System - PowerPoint PPT Presentation

1 / 43
About This Presentation
Title:

Prototyping SDSQL Server: a Scalable Distributed Database System

Description:

WDAS Workshop, Santa Clara, CA, January 5th. 3. Most of DBSs have distributed/parallel versions ... WDAS Workshop, Santa Clara, CA, January 5th. 10. An SD-SQL ... – PowerPoint PPT presentation

Number of Views:68
Avg rating:3.0/5.0
Slides: 44
Provided by: rim74
Category:

less

Transcript and Presenter's Notes

Title: Prototyping SDSQL Server: a Scalable Distributed Database System


1
Prototyping SD-SQL Server a Scalable
Distributed Database System
Soror SAHRI Witold LITWIN
Thomas Schwarz Soror.sahri_at_dauphine.fr
Witold.litwin_at_dauphine.fr tjschwarz_at_scu.edu
Ceria Laboratory
Comp. Eng. Dep.
Santa Clara U.
2
Overview
  • Introduction
  • Overall Architecture
  • Application Interface
  • Implementation
  • Performance
  • Conclusion

3
Introduction Architecture
Issue Solution
  • Most of DBSs have distributed/parallel versions
  • SQL Server, Oracle, DB2
  • DBSs do not provide dynamically scalable tables.
  • All require manual repartitioning when tables
    scale-up.

A Scalable Distributed Database System SD-DBS
4
Introduction Architecture
Issue SolutionSolution
  • Applies SDDS technology to DBSs
  • Provide many scalable distributed partitioning
    schemes.
  • LH, RP, k-RP, LHRS
  • These schemes can serve as the basis for SD-DBS
    architecture

5
SDDS Technology for DBSsBasic Design Constraints
  • SDDS
  • Key tuple-at-the-time based access
  • Search and insert
  • Forwarding
  • Access to system internals
  • SD-DBS
  • Rich assertional SQL queries
  • No access under the cover
  • Application interface has to be used
  • No forwarding
  • Not built-in in any major DBMS

6
SD-SQL Server
  • A prototype SD-DBS
  • Runs on SQL Server
  • Uses linked SQL Server nodes
  • Shared Nothing Architecture
  • Client, Server Peer SDDS nodes
  • Up to 250 nodes at present
  • Uses updatable distributed partitioned views
  • SDDS client image
  • Uses AFTER triggers
  • To monitor local tables
  • To split locally overflowing ones

7
SD-SQL Server Architecture
8
SD-SQL Server ArchitectureNodes, SDBs, NDBs
  • SD-SQL Server is a collection of distributed
    SD-SQL Server nodes.
  • Linked SQL Server Nodes
  • An SD-SQL Server Node carries node databases
    (NDBs)
  • SQL Server DBs
  • Elements of some SDBs
  • A dynamic collection of NDBs with the same name
    forms a Scalable (Distributed) Database (SDB)
  • Created at some node with one local NDB
  • Becoming primary NDB and node for the SDB

9
SD-SQL Server Architecture
10
SD-SQL Server ArchitectureNodes, SDBs, NDBs
  • An SD-SQL Server NDB is
  • Client NDB
  • Carries only images
  • Interfaces applications
  • Server NDB
  • Carries only the segments
  • Peer NDB
  • Both functions
  • Primary NDB
  • First created for an SDB
  • Carries SDB meta-data
  • Can be server or peer NDB only

11
SD-SQL Server ArchitectureNodes, SDBs, NDBs
  • An SD-SQL Server node is
  • Peer Node
  • Carries any NDBs
  • Client Node
  • Carries only Client NDBs
  • Server Node
  • Carries only Server NDBs
  • No application interface
  • Primary Node
  • First ever created
  • By a script
  • Can only be server or peer node
  • Carries the meta-DB (MDB)

12
SD-SQL Server ArchitectureNodes, SDBs, NDBs
  • sd_create_node Dell1 / Primary node
    created by script /
  • sd_create_node Dell2 / Server by default /
  • sd_create_node Dell3, client
  • sd_create_node Ceria1,peer
  • sd_alter_ node Dell3, ADD server /
    Becomes peer/
  • sd_create_scalable_database SkyServer, Dell1
    / Creates the primary SkyServer NDB as well
    at Dell1/
  • sd_create_node_database SkyServer, Dell3,
    client

13
SD-SQL Server ArchitectureScalable Table
Creation
  • An SDB contains scalable (distributed) tables
  • Created by the sd_create_table command
  • Issued to client or peer NDB
  • sd_create_table PhotoObj (objid BIGINT PRIMARY
    KEY), 10000

14
SD-SQL Server Architecture Scalable Table
Images
  • A scalable (distributed) table is a collection of
    segments hidden behind images
  • Scalable SQL Server distributed updatable
    partitioned views of the segments
  • Union-all views
  • Using Lazy Schema Validation option
  • Primary image
  • Created by SD-SQL Server at the table creation
  • Resides at the creation node
  • Client or peer NDB where the command was issued
  • In the current prototype
  • Has the name of the scalable table
  • Secondary images
  • Created later
  • by sd_create_image command
  • Reside at other client or peer NDBs of the SDB
  • Have a specific name, other than that of the
    table
  • To avoid name conflict

15
Images
CREATE VIEW PhotoObj AS SELECT FROM
N1.DB1.PhotoObj UNION ALL SELECT FROM
N2.DB1.PhotoObj UNION ALL SELECT FROM
N3.DB1.PhotoObj
16
SD-SQL Server ArchitectureScalable Table
Segments
  • Segments are SQL tables
  • Initially, the table has only one primary segment
  • At some server or peer node
  • Peer node could be the table creation node
  • Splits produce the other segments
  • Each is located at a different NDB
  • Within the SDB
  • If there is not enough NDBs, splits dynamically
    append new ones
  • A split occurs when an insert overflows the
    segment capacity
  • Measured in of tuples
  • At present all segments of a table have the same
    capacity
  • Segments may be indexed
  • By segments of SD-SQL Server scalable indexes

17
SD-SQL Server ArchitectureScalable Table
Split
  • A single insert may overflow
  • One segment by one tuple
  • Tuple insert split
  • Produces half-half split appending a single new
    segment
  • One segment by any number of tuples
  • Bulk insert single segment split
  • Appends one or several new segments
  • Each new segment is 50 loaded
  • Splitting segment is at least 50 - 100 loaded
  • Several segments, each by any number of tuples
  • Multiple segment split
  • Appends one or several new segments
  • Each new segment is 50 loaded
  • Each splitting segment is at least 50 - 100
    loaded

18
SD-SQL Server Architecture Scalable Table
Split
  • Splits are range partitioned
  • With respect to the partition key
  • Must be a key attribute (SQL Server restriction)
  • 1st key attribute (SD-SQL Server default)
  • Any other key attribute (user defined in
    sd_create_table command)
  • E.g., foreign key
  • Split generate SQL Server check constraints
  • Whenever the table has several segments
  • The constraints fix the range of key for each
    segment

19
Tuple insert split
Check Constraint?
b1
b
20
Bulk Insert Single Segment Split
(a) Initially (b) After the insert (c) After
the split
21
Multiple Segment Split
22
SD-SQL Server Architecture Scalable Table
Image Adjustment
  • Splits do not manipulate images
  • A split makes all existing images outdated
  • The existing distributed partitioned views do not
    address any new segments
  • Image correctness is checked when a query
    addressing the image comes in
  • Before SD-SQL Server executes the query
  • Image is adjusted if needed
  • New view is produced

23
SD-SQL Server Command Interface
  • The application manipulates scalable tables
    through SD-SQL Server commands.
  • These start with sd_.... to distinguish from SQL
    Server commands for static tables
  • Command types
  • Creation sd_create_node SDB, NDB, table,
    image, index
  • Alteration sd_alter_node, sd_alter_table
  • Removal sd_drop_node
  • Search queries sd_select
  • includes sd_select into
  • Creating a scalable table
  • Update queries sd_insert, sd_update, sd_ delete

24
SD-SQL Server Command Interface
  • Every command is implemented as SQL Server stored
    procedure
  • Initially in MDB
  • Every standard SQL command has SD-SQL Server
    counterpart
  • With slightly different syntax, besides the sd_
    prefix
  • Brackets around standard SQL clauses, SD-SQL
    Server specific clauses
  • Performing some SD-SQL Server specific processing
  • Generating some SQL command
  • To image(s) or every segment
  • SD-SQL Server commands do not support some SQL
    Server specific clauses
  • Case Of for instance
  • SQL Server create view command does not have
    SD-SQL Server counterpart

25
SD-SQL Server Command Interface
  • sd_create_table Neighbors (htmid BIGINT, objid
    BIGINT, Neighborobjid BIGINT) ON PRIMARY KEY),
    500, objid
  • sd_alter_table PhotoObj ADD t INT, 1000
  • sd_create_index run_index ON Photoobj (run)
  • sd_create_image Ceria1, PhotoObj
  • sd_drop_image 'SD.Dell3_Photoobj
  • USE Skyserver / SQL Server command /
  • sd_insert INTO PhotoObj SELECT FROM
    Ceria5.Skyserver-S.PhotoObj
  • sd_select FROM PhotoObj
  • sd_select TOP 5000 INTO PhotoObj1 FROM
    PhotoObj, 500

26
SD-SQL Server Naming Rules
  • At each NDB, including MDB, SD-SQL Server has its
    own account named SD
  • For secondary images and segments
  • For the meta-tables
  • Otherwise SD-SQL Server uses the SQL Server
    public dbo account
  • For any primary image
  • Hence for every scalable table, for the
    applications
  • For SD-SQL Server stored procedures
  • commands etc.
  • SD-SQL Server does let scalable tables to be
    under user accounts
  • At present
  • E.g., table dell1.Skyserver.soror.photoObj can
    only be a static table

27
SD-SQL Server Naming Rules
  • Primary image of scalable table T has SQL Server
    name dbo.T in its NDB
  • Users at different NDBs may create different
    scalable tables T
  • Not at the same NDB
  • At every NDB, segment of T created at node N of
    the SDB bears the name SD._N_T.
  • At every NDB, secondary image of T created at
    node N of the SDB bears the name SD.N_T.
  • The rules avoid the name conflict
  • Between primary and secondary images and the
    segments of different scalable tables named T at
    their NDBs
  • Between SD-SQL Server objects and other SQL
    Server objects
  • Static tables and views

28
SD-SQL Server Meta-Tables
  • Contain various SD-SQL Server specific data in
    every NDB
  • Every server NDB (S-catalog)
  • SD.RP (SgmNd, CreatNd, Table)
  • Describes the actual partitioning of every
    scalable table with the primary segment at the
    NDB
  • SD.Size (CreatNd, Table, Size)
  • Contains the segment size for every scalable
    table at the NDB
  • SD.Primary (PrimNd, CreatNd, Table).
  • For every segment at the NDB, a tuple points
    towards the primary segment of the table the
    segment belongs to
  • SD.SDBNode (Node)
  • Points towards the primary NDB of the SDB.
  • SD.MDBNode (Node).
  • Points towards the primary node.

29
Scalable Tables
30
SD-SQL Client Meta-Tables
  • Client NDB (C-catalog)
  • SD.Image (Name, Type, PrimNd,Size)
  • registers all the local images
  • SD.Server (Node)
  • provides the server (peer) node(s) available for
    the primary segment of a table to create.
  • Contains only one tuple at present
  • May contain more
  • e.g., for the fault tolerance or load balancing.
  • SD.SDBNode (Node)
  • Points towards the primary NDB of the SDB.
  • SD.MDBNode (Node).
  • Points towards the primary node.

31
SD-SQL Peer al. Meta-Tables
  • Peer NDB (P-catalog)
  • C-catalog UNION S-catalog
  • MDB
  • SD.Nodes (Node, Type)
  • Each tuple registers an SD-SQL Server node
    currently forming the SD-SQL configuration.
  • Each primary NDB
  • SD.NDB (Node, NDBType).
  • Registers all the NDBs currently composing the
    SDB.
  • NDBType indicates whether the NDB is a peer,
    server or client.

32
SD-SQL Server Table Evolution
  • The split leaves the overflowing segment at
    least half full.
  • Every new segment ends up half full
  • To attain the typical load factor of almost 70 
  • Split processing tries to do not delay the commit
    of the insert triggering it
  • Splitting may be a relatively long operation
  • AFTER trigger tests the overflow
  • Asynchronous SQL Server job termed Splitter
    performs the split

33
SD-SQL Server Table Evolution
  • The allocation of nodes to new segments of a
    scalable table tries to randomly balance node
    loads among the clients and /or peers.
  • The splitting algorithm allocates nevertheless
    the same nodes to the successive segments of
    different scalable tables of the same client.
  • All this, to reduce query execution time
  • Usually the queries tend to address the tables of
    the same client

34
SD-SQL Server Table Evolution
  • Concurrent execution of the split and of the
    scalable queries is efficient and serializable.
  • A concurrent scalable query that addresses the
    tuples in an overflowing segment
  • either manipulates them before the split migrates
    out any of them
  • or manipulates them only when the split is over

35
SD-SQL Server Table Evolution
  • SD SQL Server processes every command as a
    distributed transaction at Repeatable Read
    isolation level
  • Splits use exclusive locks on RP and segments
  • Shared locks on other meta-tables
  • E.g. SD.Primary
  • Scalable queries use basically shared locks on
    RP, Image and any other table involved
  • See details in the paper
  • Creation of new segment scheme
  • Keys
  • Check Constraint calculus
  • Indexing new segments
  • Serializability analysis
  • Deadlocks
  • Etc.

36
Concurrent Split Processing
X
Exclusive Lock
attente
Shared Lock
X
Exclusive Lock
Exclusive Lock
37
SD-SQL Server Image Processing
  • Image Checking Adjustment
  • Compares Image meta-table and RP
  • Expected / Actual of segments of the table
  • Recreates the distributed partitioned view if
    needed
  • Updates Image
  • Image Binding
  • Finds whether a name in FROM clause depends on a
    scalable table
  • The name can be a view name or a table name
  • A view may depend on a view etc.
  • Processing parses the query and goes recursively
    through
  • Image table
  • SQL Server system tables
  • sysobjects and sysdepends
  • Et the end, it determines all the image names
    involved and checks upon each of them

38
Experimental Performance Analysis
  • To determine the SD-SQL Server processing
    efficiency
  • On P4 1.8 GHz PCs with 1 Gbs local net.
  • Use of the SkyServer BD as benchmark
    http//research.microsoft.com/gray/SDSS
  • Use of the PhotoObj table as a scalable table.
  • PhotoObj has 158,426 tuples (about 260 MB)

39
Experiments
(Q) SELECT COUNT () FROM PhotoObj
Execution time of (Q) on SQL Server and SD-SQL
Server
40
Conclusion
  • Scalable distributed databases with scalable
    tables are now a reality with SD-SQL Server
  • No more manual repartitioning
  • Unlike in any other DBS we know about
  • See the Related Work in the paper
  • The performance analysis proves
  • Efficiency of our design
  • Immediate utility of SD-SQL Server
  • Future Work
  • Quite a lot
  • Our system is only the proof-of-the-concept
  • See the paper

41
Thank You
42
D1_T
43
(No Transcript)
Write a Comment
User Comments (0)
About PowerShow.com