BEST USE CASE FOR USING CASSANDRA FOR EVENT LOGGING

1
CASSANDRA for Big Data Event Logging
Ramesh Veeramani
2
PROBLEM SCENARIO
  • Storage, retrieval, and maintenance of event logs and
    permission logs suffer from poor performance and
    scalability issues.
  • HBase and Oracle DB are currently used.
  • Better storage and query capability is needed.
  • Data are immutable.
  • Eventual consistency is acceptable.
  • New infrastructure must scale to 1 PB of data.

3
Option: Cassandra
  • Why Cassandra ?
  • How with Cassandra?
  • Internal working
  • Setup
  • Tools
  • Maintenance
  • Cassandra constraints
  • Comparison

4
Why Cassandra
  • Scales Incrementally
  • Highly Available
  • Uses a peer-to-peer topology instead of a naive
    master-slave one.
  • Good for OLTP and for data changes.
  • Availability is a high priority.
  • Write throughput is higher than read throughput.
  • The data model looks a lot like the relational model, so it is
    easy to develop against.
  • Well established since 2008

5
CAP THEOREM
6
Cassandra - Internal
  • Tokens and Hashing
  • Virtual nodes
  • Virtual nodes allow for even CPU utilization across all
    servers when nodes are added or removed.
  • Token assignment is automatic.
  • Configured in cassandra.yaml: num_tokens: 256
  • Replication
  • Gossip / Snitch
  • Read and Write ?
  • Compaction Strategy.
  • Add and Remove nodes
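
The token and virtual-node ideas above can be sketched as a small consistent-hashing ring. This is an illustration under stated assumptions, not Cassandra's implementation: MD5 stands in for the partitioner's hash function, and the names `TokenRing`, `token`, `owner`, `node-a` and so on are hypothetical.

```python
import bisect
import hashlib
from collections import Counter


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is only a stand-in hash here)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class TokenRing:
    """Toy ring in which every physical node owns many virtual-node tokens."""

    def __init__(self, nodes, num_tokens=256):
        # One ring entry per (node, vnode index) pair, sorted by token.
        self._ring = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(num_tokens)
        )
        self._tokens = [t for t, _ in self._ring]

    def owner(self, partition_key: str) -> str:
        # The first token at or after the key's token owns the key;
        # past the last token, wrap around to the start of the ring.
        idx = bisect.bisect_left(self._tokens, token(partition_key))
        return self._ring[idx % len(self._ring)][1]


keys = [f"event-{i}" for i in range(30_000)]
ring = TokenRing(["node-a", "node-b", "node-c"])
load = Counter(ring.owner(k) for k in keys)
print(load)  # roughly one third of the keys per node

# Adding a node only moves keys onto the new node; nothing else is reshuffled.
bigger = TokenRing(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k in keys if ring.owner(k) != bigger.owner(k)]
print(len(moved), "keys moved to the new node")
```

With many tokens per node, each node ends up with a similar share of the keys, and adding or removing a node shifts only the ranges that node owns.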

7
TOKEN AND HASHING
8
REPLICATION
9
Constraints with Replication
  • Consistency is an issue.
  • Cassandra aims for eventual consistency.
  • Reads and writes with strict consistency are possible.
  • Consistency is configurable to different levels.
  • Consistency levels: ALL, QUORUM, ONE
  • The coordinator node is responsible for enforcing
    replication and the consistency level (CL).
  • Stale data is possible in production.
  • Operational effort is needed to synchronize the data
    (nodetool repair <node>).
  • Synchronizing the data on each node is an operation
    that must be done in a timely manner.

10
Gossip /Snitch
  • Cassandra uses a gossip protocol instead of a naïve
    ping or other communication protocol.
  • Gossip is an epidemic, probabilistic protocol.
  • Gossip is not deterministic.
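
The epidemic behaviour can be illustrated with a toy push-gossip simulation. This is a sketch of the general technique only, not Cassandra's gossip implementation; `gossip_rounds`, `fanout`, and `seed` are hypothetical names.

```python
import random


def gossip_rounds(num_nodes: int, fanout: int = 1, seed: int = 0) -> int:
    """Count rounds until a rumour that starts on node 0 reaches every node.

    In each round, every informed node pushes the rumour to `fanout`
    peers chosen at random (a node may happen to pick itself).
    """
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < num_nodes:
        rounds += 1
        # Iterate over a snapshot so nodes informed this round start next round.
        for _ in list(informed):
            informed.update(rng.sample(range(num_nodes), fanout))
    return rounds


for seed in range(3):
    print("seed", seed, "rounds", gossip_rounds(64, seed=seed))
```

Because peers are picked at random, the number of rounds varies from run to run, which is what "not deterministic" means here. With a fanout of 1 the informed set can at most double per round, so 64 nodes need at least 6 rounds.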

11
Why writes are faster
12
Why reads are relatively slower
  • Rows and columns must be retrieved from the
    datastore.
  • If all columns are present in the MemTable, the
    results are returned.
  • If the data is not found there, the search passes through
    the SSTables in order of entry.
  • A Bloom filter expedites the search in the SSTables.
  • A Bloom filter (BF) is a hash-based, probabilistic data
    structure that indicates whether a key may be present in
    an SSTable.
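
The read path described above can be sketched as follows. It follows the slide's simplified description (MemTable first, then SSTables one by one); as far as the real system goes, Cassandra reconciles versions by write timestamp, which this sketch does not model. `BloomFilter`, `SSTable`, and `read` are hypothetical names, and the filter size and hash count are arbitrary.

```python
import hashlib


class BloomFilter:
    """Bit array plus k hashes: answers 'definitely absent' or 'maybe present'."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Immutable table on disk (a dict here) with its own Bloom filter."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


def read(key, memtable, sstables):
    """Return (value, number of SSTables actually searched)."""
    if key in memtable:
        return memtable[key], 0
    searched = 0
    for sstable in sstables:
        if not sstable.bloom.might_contain(key):
            continue  # the filter rules this SSTable out without a lookup
        searched += 1
        if key in sstable.rows:
            return sstable.rows[key], searched
    return None, searched


memtable = {"a": 1}
sstables = [SSTable({"b": 2}), SSTable({"c": 3})]
print(read("a", memtable, sstables))
print(read("c", memtable, sstables))
```

A Bloom filter never gives a false negative, so skipping an SSTable is always safe; an occasional false positive only costs one unnecessary lookup.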

13
COMPACTION STRATEGY
  • Four compaction strategies
  • Size-tiered
  • Date-tiered
  • Time-window
  • Leveled: recommended for read-intensive workloads
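
Whatever the strategy, the core of compaction is merging several SSTables into one so that a read has fewer files to search. A minimal sketch of that merge step, assuming each SSTable maps a key to a (timestamp, value) pair; it does not model how any particular strategy chooses which SSTables to merge, and `compact` is a hypothetical name.

```python
def compact(sstables):
    """Merge SSTables into one, keeping only the newest write for each key."""
    merged = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    # SSTables are stored sorted by key, so sort the merged result as well.
    return dict(sorted(merged.items()))


older = {"evt-1": (100, "login"), "evt-2": (110, "view")}
newer = {"evt-2": (200, "purchase"), "evt-3": (210, "logout")}
print(compact([older, newer]))
```

After the merge, `evt-2` holds only its newest value and a read touches one table instead of two.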

14
SETUP
  • 2 NODE EACH
  • processor Intel(R) Xeon(R) CPU E5-2683 v4 @
    2.10GHz
  • processor CPU
  • memory 8188MiB System Memory
  • memory 8188MiB DIMM RAM
  • CentOS Linux release 7.5.1804 (Core)
  • sudo vim /etc/yum.repos.d/cassandra.repo
  • [cassandra]
  • name=Apache Cassandra
  • baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
  • gpgcheck=1
  • repo_gpgcheck=1
  • gpgkey=https://www.apache.org/dist/cassandra/KEYS
  • yum -y install cassandra
  • systemctl start cassandra
  • systemctl enable cassandra
  • nodetool status

15
CREATING A KEYSPACE. (A.K.A Db)
  • CREATE KEYSPACE events WITH
  • REPLICATION = {'class': 'SimpleStrategy',
    'replication_factor': 1};

16
CREATE TABLE
  • CREATE TABLE events (
  • ID UUID,
  • USER_TYPE text,
  • ACCOUNT_ID text,
  • CLASS_NAME text,
  • CREATE_DATE timestamp,
  • PRIMARY KEY((ID),ACCOUNT_ID)
  • )
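
In PRIMARY KEY((ID),ACCOUNT_ID), ID is the partition key (it decides which node stores the row) and ACCOUNT_ID is a clustering column (it orders rows inside the partition). A toy in-memory model of that layout, with hypothetical names `EventsTable`, `insert`, and `select`:

```python
import uuid
from collections import defaultdict


class EventsTable:
    """Toy model of a table keyed by PRIMARY KEY ((id), account_id)."""

    def __init__(self):
        # partition key -> {clustering key -> row}
        self.partitions = defaultdict(dict)

    def insert(self, id, account_id, **columns):
        # Writing the same full primary key again overwrites the row (an upsert).
        self.partitions[id][account_id] = {
            "id": id, "account_id": account_id, **columns
        }

    def select(self, id, account_id=None):
        rows = self.partitions.get(id, {})
        if account_id is not None:
            return [rows[account_id]] if account_id in rows else []
        # Rows within a partition come back in clustering-key order.
        return [rows[key] for key in sorted(rows)]


table = EventsTable()
event_id = uuid.uuid4()
table.insert(event_id, "acct-2", user_type="VIP", class_name="Login")
table.insert(event_id, "acct-1", user_type="GUEST", class_name="Login")
table.insert(event_id, "acct-1", user_type="VIP", class_name="Logout")  # upsert

print([row["account_id"] for row in table.select(event_id)])
print(table.select(event_id, "acct-1"))
```

Queries that supply the partition key are cheap because they touch a single partition; this is why the data model has to be designed around the queries the application will run.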

17
CREATING SASI (SECONDARY INDEX)
  • CREATE CUSTOM INDEX class_user ON events
    (class_name,user_type)
  • USING 'org.apache.cassandra.index.sasi.SASIIndex'
  • WITH OPTIONS = {'mode': 'CONTAINS'};
  • CREATE CUSTOM INDEX user_index ON events
  • (user_type)
  • USING 'org.apache.cassandra.index.sasi.SASIIndex'
  • WITH OPTIONS = {'mode': 'CONTAINS'};

18
CREATING MATERIALIZED VIEW
  • CREATE MATERIALIZED VIEW MV AS
  • SELECT account_id, class_name, id, create_date, user_type
    FROM events2 WHERE account_id = 'MYSPACE' AND
    user_type = 'VIP'
  • PRIMARY KEY ((user_type, class_name), account_id);
  • CREATE MATERIALIZED VIEW MV AS
  • SELECT account_id, class_name, id, create_date, user_type
    FROM events2 WHERE account_id = 'MYSPACE' AND
    user_type IS NOT NULL AND class_name IS NOT NULL
    PRIMARY KEY ((user_type, class_name), account_id);

19
Benchmarking
  • One million records in 444 seconds with a
    single sequential thread.
  • One million records in 240 seconds with 2 concurrent
    threads, each sequential.
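
Converted to throughput, the two figures above work out as follows (plain arithmetic on the reported numbers; the variable names are illustrative):

```python
RECORDS = 1_000_000
runs = {"1 thread": 444, "2 threads": 240}  # elapsed seconds per run

# Records per second for each run.
throughput = {name: RECORDS / seconds for name, seconds in runs.items()}
# How much faster the two-thread run finished.
speedup = runs["1 thread"] / runs["2 threads"]

for name, rate in throughput.items():
    print(f"{name}: {rate:.0f} records/s")
print(f"speedup with 2 threads: {speedup:.2f}x")
# 1 thread: 2252 records/s
# 2 threads: 4167 records/s
# speedup with 2 threads: 1.85x
```

The second thread nearly doubles throughput (1.85x), which suggests the cluster was not yet the bottleneck at this load.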

20
Other benchmarking
21
Maintenance tools
  • cqlsh: Python-based tool to query Cassandra
    using CQL (Cassandra's query language)
  • cassandra-stress: benchmarking tool
  • nodetool: command-line administration tool that
    uses JMX to get operational information from
    Cassandra nodes and to kick off administration
    tasks (repair, compaction, cleanup)
  • DSE OpsCenter: DSE visual monitoring with
    Enterprise license
  • Paid monitoring

22
Conclusion
  • A good candidate for a logging system.
  • Easy to scale.
  • Native driver for PHP.
  • Data model design needs a lot of thought.
  • Queries must be known in advance and the data model
    designed around the application.

23
Apache Ignite