C-Store: Class Overview Spring, 2009


Number of Views:28
Avg rating:3.0/5.0
Slides: 27
Provided by: ssSysuE
Learn more at: http://ss.sysu.edu.cn


1
C-Store: Class Overview Spring, 2009
  • Jianlin Feng
  • School of Software
  • SUN YAT-SEN UNIVERSITY
  • Feb 27, 2009

2
C-Store: A Column-Oriented DBMS
  • Instructor: Jianlin Feng
  • Office: Lab Center B111
  • Teaching: Friday (2-3 and 4-5), D202
  • Teaching Style:
  • Try to present the Basic Ideas in a clear and
    unified manner
  • Be your guide if you like
  • Email: fengjl9_at_gmail.com

3
C-Store: Class Motivation
  • We are doing Software!!!
  • A database management system (DBMS) is computer
    software that manages databases.
  • 3 Turing Award Winners since 1966
  • Oracle, DB2, SQL Server
  • Wanna be a Software Architect?
  • Not a Naïve Coder
  • Learning from top software developers
  • Learning from open source code
  • Understanding System Design and Implementation
    Better

4
C-Store's Father: Michael Stonebraker
  • A former Professor at Berkeley,
  • an Adjunct Professor at M.I.T.
  • ACM Software System Award, 1988
  • INGRES, developed by undergraduates
  • POSTGRES, Mariposa, C-Store
  • ACM SIGMOD Innovation Award, 1994
  • National Academy of Engineering, 1998

5
(No Transcript)
6
C-Store: The Home Page http://db.lcs.mit.edu/projects/cstore/
  • C-Store: A Column-Oriented DBMS
  • download: Source code
  • overview: Project description
  • papers: Publications
  • people: Who are we?
  • The C-Store project is a collaboration between
    MIT, Yale, Brandeis University, Brown University,
    and UMass Boston.
  • Commercialized C-Store: Vertica

7
Course Work: Assignments and Course Project
  • Reading papers
  • Each student will be individually responsible for
    writing up a short summary of every paper.
  • Reading source code
  • Team work
  • 5 students
  • Some related project as you like,
  • Or specified by Instructor
  • Doing presentation

8
An example summary
  • LRVM (Satyanarayanan, et al.)
  • Good points
  • 1) Providing an abstraction of a greatly needed
    behavior (transactions) makes system code
    implementation much easier; this stuff is useful.
  • 2) Returns to UNIX mentality of small and simple
    building blocks.
  • 3) Performance analysis (Rmem/Pmem) very
    applicable to stated domain (fs metadata).
  • Bad points
  • 1) It would have been nice if they had explicitly
    stated that set-range can be called multiple
    times within a transaction; they only comment on
    it in 5.2 when discussing optimizations (for
    overlapping region specification).
  • 2) It's unclear why the throughputs are almost
    equivalent for sequential access even though
    their CPU utilization is much different. This
    seems to contradict their scalability concern, as
    it would seem both systems are IO bound as
    opposed to CPU bound; given the rate of CPU
    improvement, IO would seem to be the greater
    concern. Of course, it's still good that the very
    simple RVM performs better.

9
The Starting Point
  • C-Store: A Column-Oriented DBMS
  • Mike Stonebraker, Daniel Abadi, Adam Batkin,
    Xuedong Chen, Mitch Cherniack, Miguel Ferreira,
    Edmond Lau, Amerson Lin, Sam Madden, Elizabeth
    O'Neil, Pat O'Neil, Alex Rasin, Nga Tran and Stan
    Zdonik.
  • VLDB, pages 553-564, 2005.

10
C-Store: the Column Store Project
  • Row Store or Column Store?

[Figure: a relation (table) drawn as a grid, with Column 1, Column 2, Column 3 across and Record 1, Record 2, Record 3 down.]

11
Example of a Relation
12
The History: Relational Model
  • Codd, E.F. (1970). "A Relational Model of Data
    for Large Shared Data Banks". Communications of
    the ACM 13 (6): 377-387.
  • Physical Data Independence
  • Row Store vs. Column Store on the same Conceptual
    Model: Relation

13
Row Store: Why?
  • OLTP (On-Line Transaction Processing)
  • ATM, POS in supermarkets
  • Characteristics of OLTP applications
  • Transactions that involve small numbers of
    records (or tuples)
  • Frequent updates (including queries)
  • Many users
  • Fast response times
  • OLTP Needs Write-Optimized Row Store.
  • Insert and delete a record in one physical write.
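
The last point can be illustrated with a minimal sketch. It assumes a toy in-memory model in which every list append stands in for one physical write; the class names `RowStore` and `ColumnStore` are hypothetical and do not come from the slides or from the C-Store code.

```python
class RowStore:
    """Keeps all fields of a record together, so an insert is one append."""

    def __init__(self):
        self.records = []
        self.writes = 0  # each append is counted as one physical write

    def insert(self, record):
        self.records.append(tuple(record))
        self.writes += 1


class ColumnStore:
    """Keeps each column separately, so an insert touches every column."""

    def __init__(self, num_columns):
        self.columns = [[] for _ in range(num_columns)]
        self.writes = 0

    def insert(self, record):
        for column, value in zip(self.columns, record):
            column.append(value)
            self.writes += 1  # one write per column touched


row_store = RowStore()
column_store = ColumnStore(num_columns=3)
for record in [(1, "alice", 100), (2, "bob", 250)]:
    row_store.insert(record)
    column_store.insert(record)

print(row_store.writes, column_store.writes)  # 2 6
```

Under this simplified cost model, two inserts cost two writes in the row layout and six in the three-column layout, which is why update-heavy OLTP workloads favor storing a record's fields together.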

14
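Row Store: Columns Stored Together
[Figure: slotted page i. Records with Rid (i,1), (i,2), ..., (i,N) sit in the data area; a slot directory at the end of the page holds the slot array (entries shown: 20, 16, 24), the number of slots N, and a pointer to the start of free space.]
  • Record id = <page id, slot>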
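
A minimal sketch of the slotted-page idea, assuming a simplified in-memory page: record bytes are written into a fixed-size buffer, and a slot directory maps each slot number to an offset and length. The class `SlottedPage` and its methods are hypothetical names for illustration, slots are numbered from 0 here rather than from 1, and the directory is kept as a Python list instead of being packed into the page itself.

```python
class SlottedPage:
    """Toy slotted page: record bytes in a buffer, plus a slot directory."""

    def __init__(self, page_id, size=4096):
        self.page_id = page_id
        self.data = bytearray(size)
        self.free_start = 0   # pointer to start of free space
        self.slots = []       # slot number -> (offset, length), None if deleted

    def insert(self, record: bytes):
        # Only record bytes are charged against the page size in this sketch.
        if self.free_start + len(record) > len(self.data):
            raise ValueError("page full")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return (self.page_id, len(self.slots) - 1)  # record id = <page id, slot>

    def read(self, rid):
        page_id, slot = rid
        if page_id != self.page_id or self.slots[slot] is None:
            raise KeyError(rid)
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def delete(self, rid):
        # Clearing the slot keeps all other record ids valid; no compaction.
        self.slots[rid[1]] = None


page = SlottedPage(page_id=7)
rid_a = page.insert(b"alice,100")
rid_b = page.insert(b"bob,250")
page.delete(rid_a)
print(rid_b, page.read(rid_b))  # (7, 1) b'bob,250'
```

The indirection through the slot directory is the point of the design: a record can be deleted (or, in a fuller implementation, moved within the page) without changing the record ids that indexes hold for its neighbors.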

15
Current DBMS Gold Standard
  • Store the columns of one record contiguously on disk
  • Use B-tree indexing
  • Use small (e.g. 4K) disk blocks
  • Align fields on byte or word boundaries
  • Conventional (row-oriented) query optimizer and
    executor (technology from 1979)
  • ARIES-style transactions

16
From OLTP to OLAP and Data Warehouse
  • OLAP (On-Line Analytical Processing, Codd, 1993)
  • Flexible Reporting for Business Intelligence
  • Characteristics of OLAP applications
  • Transactions that involve large numbers of
    records
  • Frequent Ad-hoc queries and Infrequent updates
  • A few decision-making users
  • Fast response times
  • Data warehouses are designed to facilitate
    reporting and analysis.
  • Read-Mostly

17
A Flavor of OLAP: Data Cube (Jim Gray, 1996)
18
Data Cube vs. Star Schema
19
Data Warehouse Architecture
20
Other Read-Mostly Applications
  • CRM (Customer Relationship Management)
  • Siebel (Oracle)
  • Catalog Search in Electronic Commerce
  • Amazon.com
  • Shopping.com

21
Column Store: Why?
  • The Intuition: Only read relevant columns
  • Say, ad-hoc queries read 2 columns out of 20
  • Column Store is not a new idea
  • Sybase IQ (early 90s, bitmap index)
  • Addamark (i.e., SenSage, for Event Log data
    warehouse)
  • MonetDB (Hyper-Pipelining Query Execution,
    CIDR 2005)
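
The "2 columns out of 20" intuition can be sketched as a byte count. This is a toy model under stated assumptions: values are fixed-width (`VALUE_BYTES` is an assumed 4 bytes), the table has 1,000 rows, and the function names are hypothetical rather than part of any real system.

```python
NUM_COLUMNS = 20
NUM_ROWS = 1_000
VALUE_BYTES = 4  # assumed fixed-width values

# Row layout: one list per record. Column layout: one list per column.
rows = [[r * NUM_COLUMNS + c for c in range(NUM_COLUMNS)] for r in range(NUM_ROWS)]
columns = [list(col) for col in zip(*rows)]


def scan_row_store(rows, wanted):
    """Read every record in full, then pick out the wanted fields."""
    bytes_read = 0
    result = []
    for record in rows:
        bytes_read += len(record) * VALUE_BYTES
        result.append(tuple(record[c] for c in wanted))
    return result, bytes_read


def scan_column_store(columns, wanted):
    """Read only the wanted columns, then stitch values back into tuples."""
    bytes_read = sum(len(columns[c]) * VALUE_BYTES for c in wanted)
    result = list(zip(*(columns[c] for c in wanted)))
    return result, bytes_read


wanted = [3, 17]
row_result, row_bytes = scan_row_store(rows, wanted)
col_result, col_bytes = scan_column_store(columns, wanted)
print(row_result == col_result)  # True
print(row_bytes, col_bytes)      # 80000 8000
```

Both scans return the same tuples, but in this model the column layout reads one tenth of the bytes, matching the ratio of 2 columns to 20.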

22
C-Store: Technical Ideas
  • Logical Data Model: Relational Model
  • Column Store
  • Only Materialized Views on Each Relation (perhaps
    many)
  • Active Data Compression
  • Column-Oriented Query Executor and Optimizer
  • Shared Nothing Architecture
  • Replication-Based Concurrency Control and Recovery
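
To make the compression item concrete, here is a minimal sketch of run-length encoding applied to a sorted column. The slide only names "Active Data Compression", so the choice of run-length encoding is an assumption made for illustration, not necessarily the exact scheme C-Store uses; the function names are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse each run of equal adjacent values into (value, run_length)."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


def count_equal(runs, target):
    """Answer an equality count directly on the compressed runs."""
    return sum(count for value, count in runs if value == target)


# Sorting the column first makes equal values adjacent, so runs are long.
region = sorted(["east", "west", "east", "north", "west", "east"])
runs = rle_encode(region)
print(runs)                       # [('east', 3), ('north', 1), ('west', 2)]
print(count_equal(runs, "west"))  # 2
```

The sketch also hints at why compression and a column-oriented executor go together: a query such as the count above can be answered from the runs without decompressing the column.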

23
How to Evaluate the C-Store Paper
  • None of the ideas in isolation merits publication
  • Judge the complete system by its (hopefully
    intelligent) choice of
  • Small collection of inter-related powerful ideas
  • That together put performance in a new sandbox

24
Architecture of C-Store (Vertica) on a Single Node
25
C-Store code base: version 0.2
  • http://db.lcs.mit.edu/projects/cstore/cstore0.2.tar.gz
  • Runs on Linux x86 computers
  • Tested on RedHat Linux
  • This code compiles on old versions of BerkeleyDB and
    gcc.
  • BerkeleyDB 4.2
  • LZO version 1 (http://www.oberhumer.com/opensource/lzo/)

26
References
  • Mike Stonebraker, Daniel Abadi, Adam Batkin,
    Xuedong Chen, Mitch Cherniack, Miguel Ferreira,
    Edmond Lau, Amerson Lin, Sam Madden, Elizabeth
    O'Neil, Pat O'Neil, Alex Rasin, Nga Tran and Stan
    Zdonik. C-Store: A Column-Oriented DBMS. VLDB,
    pages 553-564, 2005.
  • VERTICA DATABASE TECHNICAL OVERVIEW WHITE PAPER.
    http://www.vertica.com/php/pdfgateway?fileVerticaArchitectureWhitePaper.pdf
  • http://www.sensage.com/English/Products/Event_Data_Warehouse.html