1
Distributed Data Storage and Parallel Processing Engine
Sector / Sphere
Yunhong Gu, Univ. of Illinois at Chicago
2
What is Sector/Sphere?
  • Sector: Distributed File System
  • Sphere: Parallel Data Processing Engine (generic
    MapReduce)
  • Open source software, GPL/BSD, written in C++
  • Started in 2006; current version 1.23
  • http://sector.sf.net

3
Overview
  • Motivation
  • Sector
  • Sphere
  • Experimental Results

4
Motivation
Super-computer model: expensive, with a data I/O bottleneck.
Sector/Sphere model: inexpensive, with parallel data I/O and data locality.
5
Motivation
Parallel/distributed programming with MPI, etc.: flexible and powerful, but too complicated.
Sector/Sphere model (cloud model): the clusters appear as a single entity to the developer, with a simplified programming interface; limited to certain data-parallel applications.
6
Motivation
Systems designed for a single data center: require additional effort to locate and move data.
Sector/Sphere model: supports wide-area data collection and distribution.
7
Sector Distributed File System
[Architecture diagram: a Security Server handles user accounts, data protection, and system security; Masters maintain metadata, perform scheduling, and act as service providers; Clients provide system access tools and application programming interfaces. Clients and masters talk to the security server over SSL; data moves between clients and slaves over UDT, with optional encryption. Slaves provide storage and processing.]
8
Sector Distributed File System
  • Sector stores files on the native/local file
    system of each slave node.
  • Sector does not split files into blocks
  • Pro: simple/robust, suitable for wide area, fast
    and flexible data processing
  • Con: users need to handle file sizes properly
  • The master nodes maintain the file system
    metadata. No permanent metadata is needed.
  • Topology aware
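As a toy illustration of the per-file metadata a master might keep (file size plus replica locations), here is a minimal in-memory sketch; the structure name and fields are assumptions for illustration, not the actual Sector master code:

    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    // Hypothetical metadata record: Sector keeps whole files, so one entry per file.
    struct FileMeta {
        long long size;                        // file size in bytes
        std::vector<std::string> replicas;     // slave nodes holding a copy
    };

    int main()
    {
        // In-memory metadata table, keyed by the file's path in the Sector namespace.
        std::map<std::string, FileMeta> metadata;
        metadata["/sdss/run1/frame-0001.fits"] = {52428800LL, {"slave-uic-07", "slave-jhu-03"}};

        // A stat-like lookup: report size and replica locations for each file.
        for (const auto& entry : metadata) {
            std::cout << entry.first << ": " << entry.second.size << " bytes, replicas:";
            for (const auto& r : entry.second.replicas)
                std::cout << " " << r;
            std::cout << "\n";
        }
        return 0;
    }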

9
Sector Performance
  • Data channel is set up directly between a slave
    and a client
  • Multiple active-active masters (load balancing),
    starting from version 1.24
  • UDT is used for high speed data transfer
  • UDT is a high performance UDP-based data transfer
    protocol.
  • Much faster than TCP over wide-area networks

10
UDT UDP-based Data Transfer
  • http://udt.sf.net
  • Open source UDP-based data transfer protocol
  • With reliability control and congestion control
  • Fast, firewall friendly, easy to use
  • Already used in many commercial and research
    software
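As an illustration of UDT's socket-style C++ API, here is a minimal client sketch based on the open source UDT4 library (link against libudt); the server address and port are placeholders and error handling is omitted:

    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <cstring>
    #include <udt.h>

    int main()
    {
        UDT::startup();                                  // initialize the UDT library

        UDTSOCKET client = UDT::socket(AF_INET, SOCK_STREAM, 0);

        sockaddr_in serv_addr;
        memset(&serv_addr, 0, sizeof(serv_addr));
        serv_addr.sin_family = AF_INET;
        serv_addr.sin_port = htons(9000);                // placeholder port
        inet_pton(AF_INET, "192.168.0.1", &serv_addr.sin_addr);   // placeholder server address

        UDT::connect(client, (sockaddr*)&serv_addr, sizeof(serv_addr));

        const char* data = "hello over UDT";
        UDT::send(client, data, strlen(data) + 1, 0);    // reliable, congestion-controlled transfer

        UDT::close(client);
        UDT::cleanup();
        return 0;
    }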

11
Sector Fault Tolerance
  • Sector uses replication for better reliability
    and availability
  • Replicas can be made either at write time
    (instantly) or periodically
  • Sector supports multiple active-active masters
    for high availability

12
Sector Security
  • Sector uses a security server to maintain user
    accounts and IP access control for masters,
    slaves, and clients
  • Control messages are encrypted
    (not completely finished in the current version)
  • Data transfer can be encrypted as an option
  • Data transfer channel is set up by rendezvous, no
    listening server.

13
Sector Tools and API
  • Supported file system operations: ls, stat, mv,
    cp, mkdir, rm, upload, download
  • Wildcard characters supported
  • System monitoring: sysinfo
  • C++ API: list, stat, move, copy, mkdir, remove,
    open, close, read, write, sysinfo (see the sketch
    after this list)
  • FUSE
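To make the listed operations concrete, the sketch below drives them from C++ through a stub wrapper; SectorClient, its method names, and the master address are hypothetical placeholders for illustration only, not the real Sector client API (see the headers at http://sector.sf.net for the actual interface):

    #include <iostream>
    #include <string>

    // Placeholder wrapper: stands in for the real Sector client API for illustration.
    struct SectorClient {
        bool connect(const std::string& master, int port) {
            std::cout << "connect to master " << master << ":" << port << "\n";
            return true;
        }
        bool mkdir(const std::string& path)  { std::cout << "mkdir " << path << "\n"; return true; }
        bool upload(const std::string& local, const std::string& remote) {
            std::cout << "upload " << local << " -> " << remote << "\n"; return true;
        }
        bool list(const std::string& path)   { std::cout << "ls " << path << "\n"; return true; }
        bool remove(const std::string& path) { std::cout << "rm " << path << "\n"; return true; }
    };

    int main()
    {
        SectorClient client;
        client.connect("master.example.org", 6000);   // placeholder master address/port
        client.mkdir("/sdss");
        client.upload("frame-0001.fits", "/sdss/frame-0001.fits");
        client.list("/sdss");
        client.remove("/sdss/frame-0001.fits");
        return 0;
    }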

14
Sphere Simplified Data Processing
  • Data parallel applications
  • Data is processed where it resides, or on the
    nearest possible node (locality)
  • The same user-defined functions (UDFs) are applied
    to all elements (records, blocks, or files)
  • Processing output can be written to Sector files
    or sent back to the client
  • Generalized Map/Reduce

15
Sphere Simplified Data Processing
for each file F in (SDSS datasets)
    for each image I in F
        findBrownDwarf(I, ...)

SphereStream sdss;
sdss.init("sdss files");
SphereProcess myproc;
myproc->run(sdss, "findBrownDwarf", ...);
myproc->read(result);

findBrownDwarf(char* image, int isize, char* result, int rsize)
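Below is a sketch of what a UDF with the signature shown above might look like; the brown-dwarf "analysis" is a placeholder (a simple bright-pixel count), and the convention of writing results into the caller-provided buffer is an assumption based on the slide's simplified signature:

    #include <cstdio>

    // Placeholder UDF following the simplified signature on the slide:
    // it receives one image (record/block/file) and writes its output into `result`.
    int findBrownDwarf(char* image, int isize, char* result, int rsize)
    {
        // Hypothetical "analysis": count bright pixels (bytes above a threshold).
        int bright = 0;
        for (int i = 0; i < isize; ++i)
            if (static_cast<unsigned char>(image[i]) > 200)
                ++bright;

        // Write a small summary record into the output buffer provided by Sphere.
        int written = snprintf(result, rsize, "bright_pixels=%d\n", bright);
        return (written > 0 && written < rsize) ? 0 : -1;   // 0 on success
    }

    int main()
    {
        char image[16] = {0};
        image[3] = static_cast<char>(250);   // one artificial "bright" pixel
        char result[64];
        if (findBrownDwarf(image, sizeof(image), result, sizeof(result)) == 0)
            printf("%s", result);
        return 0;
    }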
16
Sphere Data Movement
  • Slave -> Slave (Local)
  • Slave -> Slaves (Shuffle/Hash)
  • Slave -> Client

17
Sphere/UDF vs. MapReduce
  Sphere/UDF              MapReduce
  Record Offset Index     Parser / Input Reader
  UDF                     Map
  Hashing / Bucket        Partition
  -                       Compare
  UDF                     Reduce
  -                       Output Writer

18
Sphere/UDF vs. MapReduce
  • Sphere is more straightforward and flexible
  • UDF can be applied directly on records, blocks,
    files, and even directories
  • Native binary data support
  • Sorting is required by Reduce, but it is optional
    in Sphere
  • Sphere uses a PUSH model for data movement, which
    is faster than the PULL model used by MapReduce

19
Why Doesn't Sector Split Files?
  • Certain applications need to process a whole file
    or even directory
  • Certain legacy applications need a file or a
    directory as input
  • Certain applications need multiple inputs, e.g.,
    everything in a directory
  • In Hadoop, all blocks would have to be moved to
    one node for processing, hence no data locality
    benefit.

20
Load Balance
  • The number of data segments is much larger than
    the number of SPEs. When an SPE completes a data
    segment, a new segment is assigned to it (see the
    sketch after this list).
  • Data transfer is balanced across the system to
    optimize network bandwidth usage.
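A minimal sketch of this assign-on-completion policy, assuming a simple FIFO queue of segments and instant completion; the real Sphere scheduler also accounts for data locality and network balance:

    #include <iostream>
    #include <queue>
    #include <string>
    #include <vector>

    int main()
    {
        // Many more data segments than Sphere Processing Engines (SPEs).
        std::queue<int> segments;
        for (int s = 0; s < 12; ++s)
            segments.push(s);

        std::vector<std::string> spes = {"spe-0", "spe-1", "spe-2"};

        // Whenever an SPE finishes its segment, hand it the next pending one.
        // (Here every SPE "finishes" instantly; in reality completion events drive this loop.)
        while (!segments.empty()) {
            for (const auto& spe : spes) {
                if (segments.empty())
                    break;
                std::cout << spe << " <- segment " << segments.front() << "\n";
                segments.pop();
            }
        }
        return 0;
    }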

21
Fault Tolerance
  • Map failure is recoverable
  • If one SPE fails, the data segment assigned to it
    will be re-assigned to another SPE and be
    processed again.
  • Reduce failure is unrecoverable
  • In small-to-medium systems, machine failure during
    run time is rare
  • If necessary, developers can split the input into
    multiple sub-tasks to reduce the cost of reduce
    failure.

22
Open Cloud Testbed
  • 4 Racks in Baltimore (JHU), Chicago (StarLight
    and UIC), and San Diego (Calit2)
  • 10Gb/s inter-site connection on CiscoWave
  • 2Gb/s inter-rack connection
  • Each node: two dual-core AMD CPUs, 12GB RAM, a
    single 1TB disk
  • The testbed will be doubled by Sept. 2009

23
Open Cloud Testbed
24
The TeraSort Benchmark
  • Data is split into small files, scattered across
    all slaves
  • Stage 1: On each slave, an SPE scans local files
    and sends each record to a bucket file on a remote
    node according to its key
  • Stage 2: On each destination node, an SPE sorts
    all data inside each bucket

25
TeraSort
[Diagram: each 100-byte record consists of a 10-byte key and a 90-byte value. Stage 1: each record is hashed into one of 1024 bucket files (Bucket-0 to Bucket-1023) on the destination nodes, based on the first 10 bits of its key. Stage 2: each bucket is sorted on its local node.]
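A small sketch of the Stage 1 bucket computation described above: the bucket number (0-1023) is taken from the first 10 bits of the 10-byte key. The record layout follows the diagram; the example key is arbitrary:

    #include <cstdio>

    // Compute the bucket (0-1023) for a TeraSort record from the first 10 bits of its key.
    int bucketOf(const unsigned char key[10])
    {
        // First 8 bits come from key[0], the next 2 bits from the top of key[1].
        return (key[0] << 2) | (key[1] >> 6);
    }

    int main()
    {
        unsigned char key[10] = {0xAB, 0xCD, 0, 0, 0, 0, 0, 0, 0, 0};
        printf("bucket = %d\n", bucketOf(key));   // 10101011 + top bits 11 -> bucket 687
        return 0;
    }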
26
Performance Results TeraSort
Run time (seconds), Sector v1.16 vs. Hadoop 0.17

Location                          Data Size   Sphere   Hadoop (3 replicas)   Hadoop (1 replica)
UIC                               300GB       1265     2889                  2252
UIC + StarLight                   600GB       1361     2896                  2617
UIC + StarLight + Calit2          900GB       1430     4341                  3069
UIC + StarLight + Calit2 + JHU    1.2TB       1526     6675                  3702
27
Performance Results TeraSort
  • Sorting 1.2TB on 120 nodes
  • Sphere: Hash 981 sec + Local Sort 545 sec
  • Hadoop: 3702/6675 seconds (1 replica / 3 replicas)
  • Sphere Hash stage: CPU 130%, MEM 900MB
  • Sphere Local Sort stage: CPU 80%, MEM 1.4GB
  • Hadoop: CPU 150%, MEM 2GB

28
The MalStone Benchmark
  • Drive-by problem: visit a web site and get
    compromised by malware
  • MalStone-A: compute the infection ratio of each
    site
  • MalStone-B: compute the infection ratio of each
    site from the beginning to the end of every week

http://code.google.com/p/malgen/
29
MalStone
[Diagram: each text record contains an Event ID, Timestamp, Site ID, Compromise Flag, and Entity ID. Stage 1: each record is parsed and hashed into one of 1000 bucket files (site-000 to site-999) according to a 3-byte prefix of the Site ID; the transformed record is a key/value pair of Site ID and (Time, Flag). Stage 2: compute the infection rate for each merchant site.]
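A toy sketch of the Stage 2 computation for MalStone-A, where a site's infection ratio is the number of compromised visits divided by its total visits; the in-memory records below stand in for the bucket files produced by Stage 1:

    #include <iostream>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    int main()
    {
        // Hypothetical (site ID, compromise flag) pairs, as produced by Stage 1.
        std::vector<std::pair<std::string, bool>> visits = {
            {"site-042", true}, {"site-042", false}, {"site-042", false},
            {"site-317", true}, {"site-317", true}
        };

        // Tally compromised and total visits per site.
        std::map<std::string, std::pair<long, long>> counts;   // site -> (compromised, total)
        for (const auto& v : visits) {
            auto& c = counts[v.first];
            if (v.second) ++c.first;
            ++c.second;
        }

        // MalStone-A: infection ratio of each site.
        for (const auto& c : counts)
            std::cout << c.first << " infection ratio: "
                      << static_cast<double>(c.second.first) / c.second.second << "\n";
        return 0;
    }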
30
Performance Results MalStone
Processing 10 billion records on 20 OCT nodes (local).

                            MalStone-A    MalStone-B
Hadoop                      454m 13s      840m 50s
Hadoop Streaming/Python     87m 29s       142m 32s
Sector/Sphere               33m 40s       43m 44s
Courtesy of Collin Bennet and Jonathan Seidman
of Open Data Group.
31
System Monitoring (Testbed)
32
System Monitoring (Sector/Sphere)
33
For More Information
  • Sector/Sphere code and docs: http://sector.sf.net
  • Open Cloud Consortium: http://www.opencloudconsortium.org
  • NCDM: http://www.ncdm.uic.edu