Hypertable

About This Presentation

Title:

Hypertable

Description:

High random insert, update, and delete rate. hypertable.org. Data Model ... Deletes are carried out by inserting 'delete' records. CellStore ... – PowerPoint PPT presentation

Slides: 38

Provided by: dou144

Transcript and Presenter's Notes

Title: Hypertable

1
Hypertable

Doug Judd
Zvents, Inc.

2
Background
3
Web 2.0 Data Explosion
[Figure: Web 1.0 vs. Web 2.0]
4
Traditional Tools Don't Scale Well

Designed for a single machine
Typical scaling solutions
ad-hoc
manual/static resource allocation

5
The Google Stack

Google File System (GFS)
MapReduce
Bigtable

6
Architectural Overview
7
What is Hypertable?

An open-source, high-performance, scalable
database modelled after Google's Bigtable
Not relational
Does not support transactions

8
Hypertable Improvements Over Traditional RDBMS

Scalable
High random insert, update, and delete rate

9
Data Model

Sparse, two-dimensional table with cell versions
Cells are identified by a 4-part key
Row
Column Family
Column Qualifier
Timestamp
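
To make the 4-part key concrete, here is a minimal in-memory sketch. It assumes column families are identified by small integers and that versions of the same cell are ordered newest first; the struct and function names are hypothetical, not Hypertable's. Only cells that were actually written take up space, which is what makes the table sparse.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <tuple>

// Hypothetical cell key: row, column family, column qualifier, timestamp.
struct CellKey {
  std::string row;
  uint8_t column_family;  // assumption: families are numbered
  std::string qualifier;
  int64_t timestamp;

  // Order by row, family, qualifier, then timestamp descending (newest first).
  // The timestamps are deliberately swapped between the two tuples.
  bool operator<(const CellKey &o) const {
    return std::tie(row, column_family, qualifier, o.timestamp) <
           std::tie(o.row, o.column_family, o.qualifier, timestamp);
  }
};

// A sparse table: absent cells simply have no entry.
using SparseTable = std::map<CellKey, std::string>;

// Return the newest version of a cell, or nothing if the cell was never written.
std::optional<std::string> get_latest(const SparseTable &t, const std::string &row,
                                      uint8_t family, const std::string &qualifier) {
  // The largest timestamp sorts first, so this lands on the newest version.
  auto it = t.lower_bound(CellKey{row, family, qualifier, INT64_MAX});
  if (it == t.end() || it->first.row != row || it->first.column_family != family ||
      it->first.qualifier != qualifier)
    return std::nullopt;
  return it->second;
}
```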

10
Table Visual Representation
11
Table Actual Representation
12
Anatomy of a Key

Row key is \0 terminated
Column Family is represented with 1 byte
Column qualifier is \0 terminated
Timestamp is stored big-endian, ones' complement
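
The layout above can be sketched as a small encoder. Inverting the timestamp (ones' complement) and writing it big-endian means a plain bytewise comparison of encoded keys puts newer versions of a cell ahead of older ones; that is our reading of why the format was chosen. The function name and the exact byte sequence are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical encoder for the key layout on the slide.
// Assumes row keys and qualifiers contain no '\0' bytes.
std::string encode_key(const std::string &row, uint8_t column_family,
                       const std::string &qualifier, uint64_t timestamp) {
  std::string out;
  out.append(row);
  out.push_back('\0');                              // row key is \0 terminated
  out.push_back(static_cast<char>(column_family));  // 1-byte column family
  out.append(qualifier);
  out.push_back('\0');                              // qualifier is \0 terminated
  uint64_t inverted = ~timestamp;                   // ones' complement
  for (int shift = 56; shift >= 0; shift -= 8)      // most significant byte first
    out.push_back(static_cast<char>((inverted >> shift) & 0xFF));
  return out;
}

// std::string comparison is bytewise with unsigned-char semantics
// (std::char_traits<char>), so encoded keys can be compared directly.
```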

13
Concurrency

Bigtable uses copy-on-write
Hypertable uses a form of MVCC (multi-version
concurrency control)
Deletes are carried out by inserting delete
records
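
A toy illustration of deletes as inserted records: nothing is overwritten or removed in place. A reader at a given timestamp sees the newest record that is not newer than that timestamp, and a delete record hides everything older. This is a simplified model of the idea on the slide, not Hypertable's concurrency code; `VersionedStore` and the read-timestamp rule are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct Record {
  std::string key;  // simplified: one string stands in for the full cell key
  int64_t timestamp;
  bool is_delete;   // true for a "delete" record
  std::string value;
};

class VersionedStore {
 public:
  void insert(const std::string &key, int64_t ts, const std::string &value) {
    log_.push_back({key, ts, false, value});
  }
  // Deleting appends a delete record; earlier records are left untouched.
  void erase(const std::string &key, int64_t ts) {
    log_.push_back({key, ts, true, ""});
  }
  // Snapshot read: ignore records newer than read_ts and take the newest
  // remaining one. On a timestamp tie the delete record wins (a design choice).
  std::optional<std::string> get(const std::string &key, int64_t read_ts) const {
    const Record *best = nullptr;
    for (const Record &r : log_) {
      if (r.key != key || r.timestamp > read_ts) continue;
      if (!best || r.timestamp > best->timestamp ||
          (r.timestamp == best->timestamp && r.is_delete))
        best = &r;
    }
    if (!best || best->is_delete) return std::nullopt;
    return best->value;
  }

 private:
  std::vector<Record> log_;
};
```

Because older versions stay in place, a reader holding an earlier timestamp still sees a consistent snapshot after a later delete.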

14
CellStore

Sequence of 65 KB blocks of compressed key/value
pairs
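
The block structure can be sketched as follows, assuming an index that records the last key of each block so that a lookup touches only one block. Compression is left out so the example has no external dependencies, and the block size is a parameter rather than the 65 KB figure; `CellStoreSketch` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, std::string>;

// Hypothetical CellStore-like layout: sorted pairs cut into ~block_size blocks.
struct CellStoreSketch {
  std::vector<std::vector<KV>> blocks;  // a real store would compress each block
  std::vector<std::string> last_keys;   // index: last key of each block

  CellStoreSketch(const std::map<std::string, std::string> &sorted, std::size_t block_size) {
    std::vector<KV> current;
    std::size_t bytes = 0;
    for (const auto &kv : sorted) {
      current.push_back(kv);
      bytes += kv.first.size() + kv.second.size();
      if (bytes >= block_size) {  // block is full: close it
        last_keys.push_back(current.back().first);
        blocks.push_back(std::move(current));
        current.clear();
        bytes = 0;
      }
    }
    if (!current.empty()) {
      last_keys.push_back(current.back().first);
      blocks.push_back(std::move(current));
    }
  }

  // Binary-search the index for the only block that can hold `key`, then scan it.
  const std::string *find(const std::string &key) const {
    auto it = std::lower_bound(last_keys.begin(), last_keys.end(), key);
    if (it == last_keys.end()) return nullptr;
    for (const KV &kv : blocks[it - last_keys.begin()])
      if (kv.first == key) return &kv.second;
    return nullptr;
  }
};
```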

15
System Overview
16
Range Server

Manages ranges of table data
Caches updates in memory (CellCache)
Periodically spills (compacts) cached updates to
disk (CellStore)
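
A minimal model of this write path, assuming the spill is triggered by a simple entry-count threshold (a real server would more likely track memory use). Each spilled "CellStore" here is just a sorted in-memory map standing in for a file on the DFS; `RangeSketch` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

class RangeSketch {
 public:
  explicit RangeSketch(std::size_t spill_threshold) : threshold_(spill_threshold) {}

  // Updates land in the in-memory CellCache; spill once it is "full".
  void update(const std::string &key, const std::string &value) {
    cache_[key] = value;
    if (cache_.size() >= threshold_) spill();
  }

  // Reads check the cache first (newest data), then CellStores newest to oldest.
  std::optional<std::string> get(const std::string &key) const {
    if (auto it = cache_.find(key); it != cache_.end()) return it->second;
    for (auto s = stores_.rbegin(); s != stores_.rend(); ++s)
      if (auto it = s->find(key); it != s->end()) return it->second;
    return std::nullopt;
  }

  std::size_t num_cellstores() const { return stores_.size(); }

 private:
  void spill() {
    stores_.push_back(std::move(cache_));  // stands in for writing a CellStore file
    cache_.clear();
  }
  std::size_t threshold_;
  std::map<std::string, std::string> cache_;                // CellCache
  std::vector<std::map<std::string, std::string>> stores_;  // CellStores, oldest first
};
```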

17
Client API
class Client {
  void create_table(const String &name, const String &schema);
  Table *open_table(const String &name);
  String get_schema(const String &name);
  void get_tables(vector<String> &tables);
  void drop_table(const String &name, bool if_exists);
};
18
Client API (cont.)
class Table {
  TableMutator *create_mutator();
  TableScanner *create_scanner(ScanSpec &scan_spec);
};
class TableMutator {
  void set(KeySpec &key, const void *value, int value_len);
  void set_delete(KeySpec &key);
  void flush();
};
class TableScanner {
  bool next(CellT &cell);
};
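
The call pattern implied by the two Client API slides (buffer mutations, `flush()`, then loop on `next()` until it returns false) can be tried out against toy in-memory stand-ins. These classes only mirror the method names on the slides; they are not Hypertable's client library. `KeySpec`, `Cell`, and `ScanSpec` are simplified, hypothetical versions, and the toy store simply erases on delete instead of recording a delete.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct KeySpec { std::string row, column; };
struct Cell { std::string row, column, value; };
struct ScanSpec { std::string start_row, end_row; };  // half-open [start, end)
using Store = std::map<std::pair<std::string, std::string>, std::string>;

class TableMutator {
 public:
  explicit TableMutator(Store &store) : store_(store) {}
  void set(const KeySpec &key, const void *value, int value_len) {
    pending_.push_back({key, std::string(static_cast<const char *>(value), value_len), false});
  }
  void set_delete(const KeySpec &key) { pending_.push_back({key, "", true}); }
  // Buffered mutations become visible only after flush().
  void flush() {
    for (const auto &m : pending_) {
      auto k = std::make_pair(m.key.row, m.key.column);
      if (m.is_delete) store_.erase(k); else store_[k] = m.value;
    }
    pending_.clear();
  }

 private:
  struct Mutation { KeySpec key; std::string value; bool is_delete; };
  Store &store_;
  std::vector<Mutation> pending_;
};

class TableScanner {
 public:
  TableScanner(const Store &store, const ScanSpec &spec) {
    for (const auto &[k, v] : store)
      if (k.first >= spec.start_row && k.first < spec.end_row)
        cells_.push_back({k.first, k.second, v});
  }
  bool next(Cell &cell) {  // returns false when the scan is exhausted
    if (pos_ == cells_.size()) return false;
    cell = cells_[pos_++];
    return true;
  }

 private:
  std::vector<Cell> cells_;
  std::size_t pos_ = 0;
};

class Table {
 public:
  std::unique_ptr<TableMutator> create_mutator() { return std::make_unique<TableMutator>(store_); }
  std::unique_ptr<TableScanner> create_scanner(const ScanSpec &spec) {
    return std::make_unique<TableScanner>(store_, spec);
  }

 private:
  Store store_;
};
```

Typical use: obtain a mutator, call `set`/`set_delete`, then `flush()`; obtain a scanner and loop `while (scanner->next(cell))`.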
19
Language Bindings

Currently C++ only
Thrift Broker

20
Write Ahead Commit Log

Persists all modifications (inserts and deletes)
Written into underlying DFS
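
The write-ahead idea can be sketched as follows: each modification is appended to the log before it is applied to in-memory state, so that state can be rebuilt by replaying the log after a crash. The log here is a vector standing in for a file in the DFS, and `WalStore` is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct LogEntry {
  enum class Op { Insert, Delete } op;
  std::string key, value;
};

class WalStore {
 public:
  explicit WalStore(std::vector<LogEntry> &log) : log_(log) {}
  void insert(const std::string &k, const std::string &v) { apply_logged({LogEntry::Op::Insert, k, v}); }
  void remove(const std::string &k) { apply_logged({LogEntry::Op::Delete, k, ""}); }
  const std::map<std::string, std::string> &state() const { return mem_; }

  // Recovery: rebuild the in-memory state purely from the durable log.
  static std::map<std::string, std::string> replay(const std::vector<LogEntry> &log) {
    std::map<std::string, std::string> m;
    for (const auto &e : log) apply(m, e);
    return m;
  }

 private:
  static void apply(std::map<std::string, std::string> &m, const LogEntry &e) {
    if (e.op == LogEntry::Op::Insert) m[e.key] = e.value; else m.erase(e.key);
  }
  void apply_logged(const LogEntry &e) {
    log_.push_back(e);  // 1. persist to the commit log first
    apply(mem_, e);     // 2. then update memory
  }
  std::vector<LogEntry> &log_;
  std::map<std::string, std::string> mem_;
};
```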

21
Range Meta-Operation Log

Facilitates Range meta operations
Loads
Splits
Moves
Part of Master and RangeServer
Ensures Range state and location consistency
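
One way such a log can keep range state consistent is sketched below, assuming each load, split, or move writes a start record and a completion record. After a crash, any operation that has a start record but no completion record is known to be unfinished and can be redone or rolled back. The record format and names are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

enum class MetaOp { Load, Split, Move };

struct MetaLogEntry {
  int op_id;
  MetaOp op;
  std::string range;  // descriptive label for the affected range
  bool done;          // false = start record, true = completion record
};

// Operation ids that were started but never completed.
std::vector<int> unfinished_ops(const std::vector<MetaLogEntry> &log) {
  std::set<int> started, finished;
  for (const auto &e : log) (e.done ? finished : started).insert(e.op_id);
  std::vector<int> pending;
  for (int id : started)
    if (!finished.count(id)) pending.push_back(id);
  return pending;
}
```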

22
Compression

CellStores store compressed blocks of key/value
pairs
Commit Log stores compressed blocks of updates
Supported Compression Schemes
zlib (--best and --fast)
lzo
quicklz
bmz
none

23
Caching

Block Cache
Caches CellStore blocks
Blocks are cached uncompressed
Query Cache
Caches query results
TBD
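
A block cache along these lines can be sketched as a map from (CellStore file, block offset) to the decompressed block, so a hit skips both the disk read and the decompression. LRU eviction and a capacity counted in blocks are assumptions; the slide does not say which policy Hypertable uses. `BlockCache` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

class BlockCache {
 public:
  explicit BlockCache(std::size_t capacity) : capacity_(capacity) {}

  std::optional<std::string> get(const std::string &file, std::size_t offset) {
    auto it = index_.find(make_key(file, offset));
    if (it == index_.end()) return std::nullopt;
    lru_.splice(lru_.begin(), lru_, it->second);  // mark as most recently used
    return it->second->second;
  }

  void put(const std::string &file, std::size_t offset, std::string uncompressed) {
    std::string key = make_key(file, offset);
    if (auto it = index_.find(key); it != index_.end()) {
      it->second->second = std::move(uncompressed);
      lru_.splice(lru_.begin(), lru_, it->second);
      return;
    }
    lru_.emplace_front(key, std::move(uncompressed));
    index_[key] = lru_.begin();
    if (lru_.size() > capacity_) {  // evict the least recently used block
      index_.erase(lru_.back().first);
      lru_.pop_back();
    }
  }

 private:
  using Entry = std::pair<std::string, std::string>;  // (key, uncompressed block)
  static std::string make_key(const std::string &file, std::size_t offset) {
    return file + '#' + std::to_string(offset);
  }
  std::size_t capacity_;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```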

24
Bloom Filter

Negative Cache
Probabilistic data structure
Indicates when a key is definitely not present (false positives are possible)
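
A minimal Bloom filter sketch follows. Each key sets k bit positions derived from two `std::hash` values (double hashing; the second hash is an assumption made for this example). If `may_contain` returns false, the key is definitely absent and the CellStore need not be read. If it returns true, the key is only possibly present.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

class BloomFilter {
 public:
  BloomFilter(std::size_t num_bits, std::size_t num_hashes)
      : bits_(num_bits, false), k_(num_hashes) {}

  void insert(const std::string &key) {
    for (std::size_t i = 0; i < k_; ++i) bits_[index(key, i)] = true;
  }

  // false: definitely not present; true: possibly present (no false negatives).
  bool may_contain(const std::string &key) const {
    for (std::size_t i = 0; i < k_; ++i)
      if (!bits_[index(key, i)]) return false;
    return true;
  }

 private:
  // Bit position i for key: h1 + i*h2 (mod size); h2 forced odd to vary the stride.
  std::size_t index(const std::string &key, std::size_t i) const {
    std::size_t h1 = std::hash<std::string>{}(key);
    std::size_t h2 = std::hash<std::string>{}(key + '\x01') | 1;
    return (h1 + i * h2) % bits_.size();
  }
  std::vector<bool> bits_;
  std::size_t k_;
};
```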

25
Scaling (part I)
26
Scaling (part II)
27
Scaling (part III)
28
Access Groups

Provides control of physical data layout: hybrid row/column oriented
Improves performance by minimizing I/O
CREATE TABLE crawldb (
  Title MAX_VERSIONS=3,
  Content MAX_VERSIONS=3,
  PageRank MAX_VERSIONS=10,
  ClickRank MAX_VERSIONS=10,
  ACCESS GROUP default (Title, Content),
  ACCESS GROUP ranking (PageRank, ClickRank)
);
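
The I/O saving can be illustrated with the crawldb schema on this slide. Each column family belongs to one access group, and each group is assumed to be stored separately, so a query that touches only some families reads only the groups that contain them. The function below is a hypothetical sketch of that lookup, not Hypertable code.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Family-to-group mapping taken from the crawldb example.
const std::map<std::string, std::string> kFamilyToGroup = {
    {"Title", "default"}, {"Content", "default"},
    {"PageRank", "ranking"}, {"ClickRank", "ranking"}};

// Access groups (and therefore separate on-disk stores) a query must read.
std::set<std::string> groups_to_read(const std::vector<std::string> &families) {
  std::set<std::string> groups;
  for (const auto &f : families) groups.insert(kFamilyToGroup.at(f));
  return groups;
}
```

A ranking job that reads only PageRank and ClickRank never touches the large Title and Content data.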

29
Filesystem Broker Architecture

Hypertable can run on top of any distributed
filesystem that has a broker (e.g. Hadoop DFS, KFS)

30
Keys To Performance

C++
Asynchronous communication

31
C++ vs. Java

Hypertable is CPU intensive
Manages large in-memory key/value map
Alternate compression codecs (e.g. BMZ)
Hypertable is memory intensive
Java uses 2-3 times as much memory to
manage a large in-memory map (e.g. TreeMap)
Poor processor cache performance

32
Performance Test (AOL Query Logs)

75,274,825 inserted cells
8 node cluster
1 x 1.8 GHz dual-core Opteron
4 GB RAM
3 x 7200 RPM SATA drives
Average row key 7 bytes
Average value 15 bytes
Replication factor 3
4 simultaneous insert clients
500K random inserts/s
680K scanned cells/s

33
Performance Test II

Simulated AOL query log data
1TB data
9 node cluster
1 x 2.33 GHz quad-core Intel
16 GB RAM
3 x 7200 RPM SATA drives
Average row key 9 bytes
Average value 18 bytes
Replication factor 3
4 simultaneous insert clients
Over 1M random inserts/s (sustained)

34
Weaknesses

Range data managed by a single range server
No data is lost, but a range server failure can
cause periods of unavailability
Can be mitigated with client-side cache or
memcached

35
Project Status

Currently in alpha
Just released version 0.9.0.7
Will release beta version end of August
Waiting on Hadoop JIRA 1700

36
License

GPL 2.0
Why not Apache?

37
Questions?

www.hypertable.org
