Advanced data management - PowerPoint PPT Presentation

Transcript and Presenter's Notes
1
Advanced data management
  • Jiaheng Lu
  • Department of Computer Science
  • Renmin University of China
  • www.jiahenglu.net

2
Cloud computing
3
(No Transcript)
4
Distributed system
5
Outline: Concepts and Terminology
  • What is Distributed?
  • Distributed data objects
  • Distributed execution
  • Three tier architectures
  • Transaction concepts

6
What's a Distributed System?
  • Centralized
  • everything in one place
  • stand-alone PC or Mainframe
  • Distributed
  • some parts remote
  • distributed users
  • distributed execution
  • distributed data

7
Transparency in Distributed Systems
  • Make distributed system as easy to use and manage
    as a centralized system
  • Give a Single-System Image
  • Location transparency
  • hide fact that object is remote
  • hide fact that object has moved
  • hide fact that object is partitioned or
    replicated
  • Name doesn't change if object is replicated,
    partitioned or moved

8
Naming: The Basics
  • Objects have
  • Globally Unique Identifiers (GUIDs)
  • location(s) = address(es)
  • name(s)
  • addresses can change
  • objects can have many names
  • Names are context dependent
  • (Jim @ KGB ≠ Jim @ CIA)
  • Many naming systems
  • UNC \\node\device\dir\dir\dir\object
  • Internet http://node.domain.root/dir/dir/dir/object
  • LDAP ldap://ldap.domain.root/o=org,c=US,cn=dir

[Diagram: the name "James" resolving to a GUID]
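A minimal Python sketch of these naming ideas (all names, GUIDs, and addresses below are invented for illustration): the same label denotes different objects in different contexts, one object can carry several names, and the address behind a GUID can change without breaking any name.

    import uuid

    # Two distinct objects: the same name in different contexts
    # identifies different GUIDs (Jim @ KGB != Jim @ CIA).
    jim_kgb, jim_cia = uuid.uuid4(), uuid.uuid4()

    names = {
        ("KGB", "Jim"): jim_kgb,
        ("CIA", "Jim"): jim_cia,
        ("CIA", "James"): jim_cia,   # one object may have many names
    }

    # Addresses can change without invalidating any name.
    addresses = {jim_kgb: "node17.example.net:9000",
                 jim_cia: "node42.example.net:9000"}
    addresses[jim_cia] = "node99.example.net:9000"  # object moved

    def resolve(context: str, name: str) -> str:
        """Name + context -> GUID -> current address."""
        return addresses[names[(context, name)]]

    print(resolve("CIA", "James"))  # node99.example.net:9000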
9
Name Servers in Distributed Systems
  • Name servers translate name + context to
    address (+ GUID)
  • Name servers are partitioned (subtrees of name
    space)
  • Name servers replicate root of name tree
  • Name servers form a hierarchy
  • Distributed data from hell:
  • high read traffic
  • high reliability & availability
  • autonomy

[Diagram: name tree with a replicated root; the North server holds Northern names, the South server holds Southern names]
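A toy illustration of a partitioned name service, with invented data: each subtree of the name space lives on one server, while the root is replicated on every server so any of them can start a lookup.

    # Hypothetical name tree: top level partitions, leaves are addresses.
    root = {"north": {"alice": "10.0.1.5", "bob": "10.0.1.9"},
            "south": {"carol": "10.0.2.7"}}

    # The root is replicated for availability and read traffic.
    replicas = [dict(root), dict(root)]

    def lookup(path: str, server: dict) -> str:
        """Walk the name tree: 'south/carol' -> address."""
        node = server
        for part in path.split("/"):
            node = node[part]
        return node

    print(lookup("south/carol", replicas[0]))  # 10.0.2.7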
10
Autonomy in Distributed Systems
  • Owner of site (or node, or application, or
    database) wants to control it
  • If my part is working, I must be able to access &
    manage it (reorganize, upgrade, add user, ...)
  • Autonomy is
  • Essential
  • Difficult to implement
  • Conflicts with global consistency
  • examples: naming, authentication, admin

11
Security: The Basics
  • Authentication: subject + authenticator =>
    (Yes + token) or No
  • Security matrix:
  • who can do what to whom
  • Access control list is a column of the matrix
  • "who" is an authenticated ID
  • In a distributed system, who, what, and
    whom are distributed objects

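The security matrix maps naturally onto a dictionary. A hedged sketch with invented subjects, objects, and rights, showing an ACL as one column of the matrix:

    # Rows are authenticated subjects, columns are objects.
    matrix = {
        # (subject, object):      rights
        ("alice", "payroll.db"): {"read"},
        ("bob",   "payroll.db"): {"read", "write"},
        ("alice", "printer"):    {"use"},
    }

    def acl(obj: str) -> dict:
        """The access-control list for one object: a column of the matrix."""
        return {s: r for (s, o), r in matrix.items() if o == obj}

    def can(subject: str, right: str, obj: str) -> bool:
        return right in matrix.get((subject, obj), set())

    print(acl("payroll.db"))                    # alice: read; bob: read, write
    print(can("alice", "write", "payroll.db"))  # False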
12
Security in Distributed Systems
  • Security domain: nodes with a shared security
    server
  • Security domains can have trust relationships
  • A trusts B: A believes B when it says "this is
    Jim @ B"
  • Security domains form a hierarchy
  • Delegation: passing authority to a server. When A
    asks B to do something (e.g. print a file, read a
    database), B may need A's authority
  • Autonomy requires
  • each node is an authenticator
  • each node does its own security checks
  • Internet today:
  • no trust among domains (firewalls, many
    passwords)
  • trust based on digital signatures

13
Clusters: The Ideal Distributed System
  • A cluster is a distributed system BUT with a
    single
  • location
  • manager
  • security policy
  • relatively homogeneous
  • communication that is
  • high bandwidth
  • low latency
  • low error rate
  • Clusters use distributed system techniques for
  • load distribution
  • storage
  • execution
  • growth
  • fault tolerance

14
Cluster: Shared What?
  • Shared Memory Multiprocessor
  • multiple processors, one memory
  • all devices are local
  • DEC, SGI, or Sequent 16x nodes
  • Shared Disk Cluster
  • an array of nodes
  • all share common disks
  • VAXcluster + Oracle
  • Shared Nothing Cluster
  • each device local to a node
  • ownership may change
  • Tandem, SP2, Wolfpack

15
Outline: Concepts and Terminology
  • Why Distribute?
  • Distributed data objects
  • Partitioned
  • Replicated
  • Distributed execution
  • Three tier architectures
  • Transaction concepts

16
Partitioned Data: Break file into disjoint groups
  • Exploit data access locality
  • Put data near consumer
  • Less network traffic
  • Better response time
  • Better availability
  • Owner controls data (autonomy)
  • Spread load
  • data or traffic may exceed a single store

[Diagram: Orders file partitioned into N.A., S.A., Europe, and Asia]
17
How to Partition Data?
  • How to Partition
  • by attribute or
  • random or
  • by source or
  • by use
  • Problem: to find the data you must have
  • a directory (replicated) or
  • an algorithm
  • This encourages attribute-based partitioning (see
    the sketch below)

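A small Python sketch of the two lookup strategies, with invented region names: hashing needs no directory but loses locality, while attribute-based partitioning routes through a (replicated) directory and keeps data near its consumers.

    REGIONS = ["N.A.", "S.A.", "Europe", "Asia"]

    def by_hash(order_id: int) -> str:
        """Algorithmic placement: no lookup state, but no locality."""
        return REGIONS[hash(order_id) % len(REGIONS)]

    # Replicated directory: attribute value -> partition.
    directory = {"US": "N.A.", "BR": "S.A.", "DE": "Europe", "CN": "Asia"}

    def by_attribute(country: str) -> str:
        """Attribute-based placement: data lands near its consumers."""
        return directory[country]

    print(by_hash(12345), by_attribute("DE"))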
26
Replicated Data: Place fragment at many sites
  • Pros
  • Improves availability
  • Disconnected (mobile) operation
  • Distributes load
  • Reads are cheaper
  • Cons
  • N times more updates
  • N times more storage
  • Placement strategies
  • Dynamic: cache on demand
  • Static: place at specific sites

[Diagram: Catalog fragment replicated at many sites]
27
Updating Replicated Data
  • When a replica is updated, how do changes
    propagate?
  • Master copy, many slave copies (SQL Server)
  • always know the correct value (master)
  • change propagation can be
  • transactional
  • as soon as possible
  • periodic
  • on demand
  • Symmetric, anytime updates (Access)
  • allows mobile (disconnected) updates
  • updates propagated ASAP, periodically, or on
    demand
  • non-serializable
  • colliding updates must be reconciled
  • hard to know the real value

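A minimal master/slave propagation sketch in Python (invented classes, not any product's API): the master always knows the correct value, and slaves converge whenever propagation runs, whether that is ASAP, periodic, or on demand.

    class Master:
        def __init__(self):
            self.value, self.slaves, self.pending = None, [], []

        def update(self, v):
            self.value = v          # correct value is always known here
            self.pending.append(v)  # queued for propagation

        def propagate(self):
            """ASAP / periodic / on demand: caller decides when."""
            for v in self.pending:
                for s in self.slaves:
                    s.value = v
            self.pending.clear()

    class Slave:
        def __init__(self, master):
            self.value = None
            master.slaves.append(self)

    m = Master(); s = Slave(m)
    m.update(42)       # slave is stale until propagation runs
    m.propagate()
    print(s.value)     # 42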
28
Outline: Concepts and Terminology
  • Why Distribute?
  • Distributed data objects
  • Partitioned
  • Replicated
  • Distributed execution
  • remote procedure call
  • queues
  • Three tier architectures
  • Transaction concepts

29
Distributed Execution: Threads and Messages
  • A thread is an execution unit (software analog of
    CPU + memory)
  • Threads execute at a node
  • Threads communicate via
  • shared memory (local)
  • messages (local and remote)

30
Peer-to-Peer or Client-Server
  • Peer-to-Peer is symmetric
  • Either side can send
  • Client-server
  • client sends requests
  • server sends responses
  • simple subset of peer-to-peer

31
Connection-less or Connected
  • Connected (sessions)
  • open - request/reply - close
  • client authenticated once
  • Messages arrive in order
  • Can send many replies (e.g. FTP)
  • Server has client context (context sensitive)
  • e.g. Winsock and ODBC
  • HTTP is adding connections
  • Connection-less
  • request contains
  • client id
  • client context
  • work request
  • client authenticated on each message
  • only a single response message
  • e.g. HTTP, NFS v1

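A sketch of what the connection-less style implies for message layout (all field names invented): every request must carry identity, credentials, and context, and the server authenticates each message independently.

    from dataclasses import dataclass

    @dataclass
    class ConnectionlessRequest:
        client_id: str      # authenticated on *every* message
        auth_token: str
        context: dict       # e.g. cursor position, locale
        work: str           # the actual operation

    def handle(req: ConnectionlessRequest) -> str:
        if req.auth_token != "expected-token":  # per-message authentication
            return "DENIED"
        return f"did {req.work} for {req.client_id}"

    print(handle(ConnectionlessRequest("c1", "expected-token", {}, "GET /index")))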
32
Remote Procedure Call: The key to transparency
  • Object may be local or remote
  • Methods on object work wherever it is.
  • Local invocation

33
Remote Procedure Call: The key to transparency
  • Remote invocation

[Diagram: y = pObj->f(x) is marshaled to the remote node, f() executes there, and the returned val is assigned to y]
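A toy RPC stub in Python, with an invented in-process "network" standing in for the transport, showing how a proxy makes the remote call look exactly like the local one:

    import pickle

    class Proxy:
        def __init__(self, transport):
            self.transport = transport  # callable standing in for the network

        def __getattr__(self, method):
            def stub(*args):
                request = pickle.dumps((method, args))   # marshal
                reply = self.transport(request)          # "send" and wait
                return pickle.loads(reply)               # unmarshal
            return stub

    class Server:
        def f(self, x):
            return x * 2

    def fake_network(request):
        method, args = pickle.loads(request)
        return pickle.dumps(getattr(Server(), method)(*args))

    p_obj = Proxy(fake_network)
    y = p_obj.f(21)    # same syntax as a local call: y = pObj->f(x)
    print(y)           # 42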
34
Object Request Broker (ORB)
  • Registers Servers
  • Manages pools of servers
  • Connects clients to servers
  • Does naming, request-level authorization, ...
  • Provides transaction coordination (new feature)
  • Old names:
  • Transaction Processing Monitor,
  • Web server,
  • NetWare

35
Outline: Concepts and Terminology
  • Why Distribute?
  • Distributed data objects
  • Distributed execution
  • remote procedure call
  • queues
  • Three tier architectures
  • what
  • why
  • Transaction concepts

36
Client/Server Interactions: All can be done with
RPC
  • Request-Response: response may be many messages
  • Conversational: server keeps client context
  • Dispatcher: three-tier, complex operation at
    server
  • Queued: decouples client from server, allows
    disconnected operation

37
Queued Request/Response
  • Time-decouples client and server
  • Three transactions
  • Almost real time, ASAP processing
  • Communicate at each other's convenience; allows
    mobile (disconnected) operation
  • Disk queues survive client and server failures

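A minimal sketch of the three-transaction pattern using Python's in-memory queue (a real system would use disk queues so the pattern survives client and server failures):

    import queue

    requests, responses = queue.Queue(), queue.Queue()

    # Transaction 1: client enqueues and goes away (maybe disconnects).
    requests.put({"id": 1, "work": "place order"})

    # Transaction 2: server dequeues at its own convenience, replies.
    req = requests.get()
    responses.put({"id": req["id"], "result": "order placed"})

    # Transaction 3: client picks up the response when it reconnects.
    print(responses.get())   # {'id': 1, 'result': 'order placed'}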
38
Why Queued Processing?
  • Prioritize requests: an ambulance dispatcher
    favors high-priority calls
  • Manage Workflows
  • Deferred processing in mobile apps

39
Google Cloud computing techniques
40
The Google File System
41
The Google File System (GFS)
  • A scalable distributed file system for large
    distributed data intensive applications
  • Multiple GFS clusters are currently deployed.
  • The largest ones have
  • 1000 storage nodes
  • 300 terabytes of disk storage
  • heavily accessed by hundreds of clients on
    distinct machines

42
Introduction
  • Shares many of the same goals as previous
    distributed file systems
  • performance, scalability, reliability, etc.
  • GFS design has been driven by four key
    observations of Google application workloads and
    its technological environment

43
Intro: Observations (1)
  • 1. Component failures are the norm
  • constant monitoring, error detection, fault
    tolerance and automatic recovery are integral to
    the system
  • 2. Huge files (by traditional standards)
  • multi-GB files are common
  • I/O operations and block sizes must be revisited

44
Intro: Observations (2)
  • 3. Most files are mutated by appending new data
  • this is the focus of performance optimization and
    atomicity guarantees
  • 4. Co-designing the applications and the API
    benefits the overall system by increasing
    flexibility

45
The Design
  • A cluster consists of a single master and multiple
    chunkservers, and is accessed by multiple clients

46
The Master
  • Maintains all file system metadata.
  • name space, access control info, file-to-chunk
    mappings, chunk (including replica) locations,
    etc.
  • Periodically communicates with chunkservers in
    HeartBeat messages to give instructions and check
    state

47
The Master
  • Helps make sophisticated chunk placement and
    replication decisions, using global knowledge
  • For reading and writing, client contacts Master
    to get chunk locations, then deals directly with
    chunkservers
  • Master is not a bottleneck for reads/writes

48
Chunkservers
  • Files are broken into chunks. Each chunk has an
    immutable, globally unique 64-bit chunk handle.
  • handle is assigned by the master at chunk
    creation
  • Chunk size is 64 MB
  • Each chunk is replicated on 3 (default) servers

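Putting the last two slides together, a hedged sketch (the metadata below is invented) of what a client computes before and after contacting the master: a byte offset maps to a chunk index, the master returns the chunk handle plus replica locations, and data then flows directly between client and chunkserver.

    CHUNK_SIZE = 64 * 2**20   # 64 MB chunks

    # Invented stand-ins for the master's metadata.
    chunk_handles = {"/logs/web.0": ["h-aaa", "h-bbb", "h-ccc"]}
    chunk_locations = {"h-bbb": ["cs12", "cs31", "cs47"]}   # 3 replicas

    def locate(path: str, offset: int):
        """Byte offset -> chunk index; master returns handle + replicas."""
        index = offset // CHUNK_SIZE
        handle = chunk_handles[path][index]
        return handle, chunk_locations[handle]

    # A read at byte 100,000,000 falls in chunk 1; the data itself
    # never flows through the master.
    print(locate("/logs/web.0", 100_000_000))  # ('h-bbb', ['cs12', 'cs31', 'cs47'])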
49
Clients
  • Linked to apps using the file system API.
  • Communicates with master and chunkservers for
    reading and writing
  • Master interactions only for metadata
  • Chunkserver interactions for data
  • Only caches metadata information
  • Data is too large to cache.

50
Chunk Locations
  • Master does not keep a persistent record of
    locations of chunks and replicas.
  • Instead, the master polls chunkservers at startup
    and whenever chunkservers join or leave the
    cluster
  • Stays up to date by controlling placement of new
    chunks and through HeartBeat messages (when
    monitoring chunkservers)

51
Operation Log
  • Record of all critical metadata changes
  • Stored on Master and replicated on other machines
  • Defines order of concurrent operations
  • Changes are not visible to clients until they
    propagate to all chunk replicas
  • Also used to recover the file system state

52
System Interactions: Leases and Mutation Order
  • Leases maintain a mutation order across all chunk
    replicas
  • Master grants a lease to a replica, called the
    primary
  • The primary chooses the serial mutation order, and
    all replicas follow this order (sketched below)
  • Minimizes management overhead for the Master

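A minimal sketch of lease-based ordering (invented classes, not the GFS implementation): the primary assigns serial numbers and every replica applies mutations in that one order.

    class Replica:
        def __init__(self):
            self.log = []

        def apply(self, serial, mutation):
            self.log.append((serial, mutation))

    class Primary(Replica):
        def __init__(self, secondaries):
            super().__init__()
            self.secondaries, self.next_serial = secondaries, 0

        def mutate(self, mutation):
            serial = self.next_serial          # primary chooses the order
            self.next_serial += 1
            for r in [self] + self.secondaries:
                r.apply(serial, mutation)      # all replicas follow it

    secondaries = [Replica(), Replica()]
    p = Primary(secondaries)
    p.mutate("write A"); p.mutate("write B")
    print(secondaries[0].log == p.log)   # True: identical order everywhere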
53
System Interactions: Leases and Mutation Order
54
Atomic Record Append
  • Client specifies the data to write; GFS chooses
    and returns the offset it writes to, and appends
    the data to each replica at least once
  • Heavily used by Google's distributed
    applications
  • No need for a distributed lock manager
  • GFS chooses the offset, not the client

55
Atomic Record Append How?
  • Follows a control flow similar to other mutations
  • Primary tells secondary replicas to append at the
    same offset as the primary
  • If the append fails at any replica, the client
    retries
  • So replicas of the same chunk may contain
    different data, including duplicates, whole or in
    part, of the same record

56
Atomic Record Append How?
  • GFS does not guarantee that all replicas are
    bitwise identical.
  • Only guarantees that data is written at least
    once in an atomic unit.
  • Data must be written at the same offset for all
    chunk replicas for success to be reported.

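Because the guarantee is at-least-once rather than exactly-once, readers may see duplicate records. A common application-side remedy, sketched here with an invented record layout (this is not part of the GFS API), is to tag records with unique IDs and filter on read:

    records_on_replica = [
        {"id": "r1", "data": "alpha"},
        {"id": "r2", "data": "beta"},
        {"id": "r2", "data": "beta"},    # duplicate from a retried append
    ]

    def read_unique(records):
        seen, out = set(), []
        for rec in records:
            if rec["id"] not in seen:
                seen.add(rec["id"])
                out.append(rec)
        return out

    print([r["data"] for r in read_unique(records_on_replica)])  # ['alpha', 'beta']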
57
Replica Placement
  • Placement policy maximizes data reliability and
    network bandwidth
  • Spread replicas not only across machines, but
    also across racks
  • Guards against machine failures, and racks
    getting damaged or going offline
  • Reads for a chunk exploit aggregate bandwidth of
    multiple racks
  • Writes have to flow through multiple racks
  • a tradeoff made willingly

58
Chunk creation
  • Chunks are created and placed by the master
  • placed on chunkservers with below-average disk
    utilization
  • limit the number of recent creations on a
    chunkserver
  • creations are followed by lots of writes

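A hedged sketch of this placement heuristic with made-up numbers: prefer below-average disk utilization, but skip servers with too many recent creations, since each creation brings a burst of writes.

    servers = {"cs1": {"disk": 0.40, "recent": 1},
               "cs2": {"disk": 0.80, "recent": 0},
               "cs3": {"disk": 0.30, "recent": 9}}

    MAX_RECENT = 5   # invented threshold
    avg = sum(s["disk"] for s in servers.values()) / len(servers)

    eligible = [name for name, s in servers.items()
                if s["disk"] < avg and s["recent"] < MAX_RECENT]
    print(eligible)   # ['cs1'] -- cs3 is under-utilized but too hot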
59
Detecting Stale Replicas
  • Master has a chunk version number to distinguish
    up to date and stale replicas
  • The version is increased when the master grants a
    lease
  • If a replica is not available, its version is not
    increased
  • The master detects stale replicas when
    chunkservers report their chunks and versions
  • Remove stale replicas during garbage collection

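A tiny worked example of version-based staleness detection (numbers invented):

    # Master bumped the version to 7 when granting the current lease.
    master_version = {"h-aaa": 7}
    reported = {"cs12": 7, "cs31": 7, "cs47": 6}   # cs47 missed the bump

    stale = [cs for cs, v in reported.items() if v < master_version["h-aaa"]]
    print(stale)   # ['cs47'] -> removed at the next garbage-collection pass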
60
Garbage collection
  • When a client deletes a file, the master logs it
    like other changes and renames the file to a
    hidden name
  • The master removes files hidden for longer than 3
    days when scanning the file system name space
  • their metadata is also erased
  • During HeartBeat messages, each chunkserver sends
    the master a subset of its chunks, and the master
    tells it which ones have no live file metadata
  • The chunkserver removes those chunks on its own

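A sketch of this delete-then-collect life cycle, with invented structures: deletion renames to a hidden name, and a periodic scan erases metadata once the 3-day window passes.

    import time

    THREE_DAYS = 3 * 24 * 3600
    namespace = {"/logs/old": {"hidden_at": None}}

    def delete(path):
        meta = namespace.pop(path)
        meta["hidden_at"] = time.time()
        namespace[f"/.deleted{path}"] = meta   # still restorable by rename

    def scan(now):
        for path in list(namespace):
            meta = namespace[path]
            if meta["hidden_at"] and now - meta["hidden_at"] > THREE_DAYS:
                del namespace[path]            # metadata erased

    delete("/logs/old")
    scan(time.time() + 4 * 24 * 3600)
    print(namespace)   # {} -- chunkservers learn of orphaned chunks via HeartBeat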
61
Fault Tolerance: High Availability
  • Fast recovery
  • Master and chunkservers can restart in seconds
  • Chunk Replication
  • Master Replication
  • shadow masters provide read-only access when
    primary master is down
  • mutations not done until recorded on all master
    replicas

62
Fault Tolerance: Data Integrity
  • Chunkservers use checksums to detect corrupt data
  • Since replicas are not bitwise identical,
    chunkservers maintain their own checksums
  • For reads, chunkserver verifies checksum before
    sending chunk
  • Update checksums during writes

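A small example of block-level checksumming in the spirit of this slide (GFS keeps a 32-bit checksum per 64 KB block; the code below uses CRC32 as a stand-in):

    import zlib

    BLOCK = 64 * 1024   # checksum granularity

    def checksums(data: bytes):
        return [zlib.crc32(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

    stored = b"x" * (3 * BLOCK)
    sums = checksums(stored)   # kept by each chunkserver independently

    # Verify before serving a read; a mismatch means corrupt data.
    corrupted = stored[:BLOCK] + b"y" + stored[BLOCK + 1:]
    bad = [i for i, (a, b) in enumerate(zip(sums, checksums(corrupted))) if a != b]
    print(bad)   # [1] -> serve from another replica, re-replicate this chunk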
63
Q&A - Thanks!