Distributed Systems

1
Chapter 11: Advanced Distributed Systems
2
P2P Computing
  • Def. 1: A class of applications that take
    advantage of resources (e.g., storage, cycles,
    content) available at the edge of the
    Internet.
  • Edges often turned off, without permanent IP
    addresses, etc.
  • Def. 2: A class of decentralized,
    self-organizing distributed systems, in which
    all or most communication is symmetric.
    (IPTPS '02)
  • Lots of other definitions that fit in between

3
Applications: Computing
  • Examples: SETI@home, United Devices, Genome@home,
    and many others
  • Approach suitable for a particular class of
    problems:
  • Massive parallelism
  • Low bandwidth/computation ratio
  • Error tolerance, independence from solving a
    particular task
  • Problems:
  • Centralized.
  • How to extend the model to problems that are not
    massively parallel?
  • Ability to operate in an environment with limited
    trust and dynamic resources

4
Applications: File sharing
  • The killer application to date
  • Too many to list them all: Napster, FastTrack
    (KaZaA, iMesh), Gnutella (LimeWire, BearShare),
    Overnet, BitTorrent, etc.
  • Decentralized control
  • Building a (relatively) reliable, data-delivery
    service using a large, heterogeneous set of
    unreliable components.

FastTrack (KaZaA), 2003
5
Applications: Content streaming
  • Streaming: the user plays the data as it arrives
  • Examples: PPLive, SplitStream, etc.

6
Many other P2P applications
  • Backup storage (HiveNet, OceanStore)
  • Collaborative environments (Groove Networks)
  • Web serving communities (uServ)
  • Instant messaging (Yahoo, AOL)
  • Anonymous email
  • Censorship-resistant publishing systems
    (Eternity, Freenet)
  • Spam filtering

7
Client/Server vs. P2P
8
Client/Server vs. P2P
9
Overlay Network
10
Overlay Network
  • An abstract layer built on top of the physical
    network
  • Neighbors in the overlay can be several hops away
    in the physical network
  • Why do we need overlays?
  • Flexibility in:
  • Choosing neighbors
  • Forming and customizing topology to fit
    application needs (e.g., short delay,
    reliability, high BW, ...)
  • Designing communication protocols among nodes
  • Get around limitations in legacy networks

11
Abstract P2P overlay architecture
12
Network Communications Layer
  • This layer describes the network characteristics
    of desktop machines connected over the Internet,
    or of small wireless or sensor-based devices
    connected in an ad-hoc manner.

13
Overlay Nodes Management layer
  • The Overlay Nodes Management layer covers the
    management of peers, which includes discovery of
    peers and routing algorithms for optimization.

14
Features Management layer
  • The Features Management layer deals with the
    security, reliability, fault resiliency and
    aggregated resource availability aspects of
    maintaining the robustness of P2P systems.

15
Services-Specific layer
  • The Services-Specific layer supports the
    underlying P2P infrastructure and the
    application-specific components through the
    scheduling of parallel and computation-intensive
    tasks and through content and file management.

16
Application-level layer
  • The Application-level layer is concerned with
    tools, applications and services that are
    implemented with specific functionalities on top
    of the underlying P2P overlay infrastructure.

17
P2P Systems: Simple Model
18
Peer Software Architecture Model
  • P2P Substrate (key component)
  • Overlay management
  • Construction
  • Maintenance (peer join/leave/fail and network
    dynamics)
  • Resource management
  • Allocation (storage)
  • Discovery (routing and lookup)
  • Can be classified according to the flexibility of
    placing objects at peers

19
P2P Substrates: Classification
  • Structured (or tightly controlled, DHT)
  • Objects are rigidly assigned to specific peers
  • Looks like a Distributed Hash Table (DHT)
  • Efficient search: guarantee of finding
  • Lack of partial-name and keyword queries
  • Maintenance overhead
  • Ex: Chord, CAN, Pastry, Tapestry, Kademlia
    (Overnet)
  • Unstructured (or loosely controlled)
  • Objects can be anywhere
  • Support partial-name and keyword queries
  • Inefficient search: no guarantee of finding
  • Some heuristics exist to enhance performance
  • Ex: Gnutella, KaZaA (super-node), GIA

20
Types of P2P Systems
21
Napster (1)
  • Sharing of music files
  • Lists of files are uploaded to Napster server
  • Queries contain various keywords of the required
    file
  • Server returns the IP addresses of user machines
    having the file
  • File transfer is direct (see the sketch below)
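The flow above amounts to a centralized index with direct transfers. The following is a minimal sketch of that idea, assuming a toy in-memory index; class and method names are illustrative, not Napster's actual protocol:

# Hypothetical Napster-style centralized index (illustration only).
class IndexServer:
    def __init__(self):
        self.index = {}                      # filename -> set of peer addresses

    def register(self, peer_addr, filenames):
        for name in filenames:               # peers upload their file lists
            self.index.setdefault(name, set()).add(peer_addr)

    def search(self, keyword):
        # Return addresses of peers sharing any file whose name matches the keyword.
        return {addr for name, addrs in self.index.items()
                if keyword in name for addr in addrs}

server = IndexServer()
server.register("10.0.0.1", ["song.mp3", "mix.mp3"])
server.register("10.0.0.2", ["song.mp3"])
print(server.search("song"))                 # the requester then downloads directly

The server only locates files; downloads happen peer to peer, which is also why the server is the scalability bottleneck and single point of failure noted on the next slide.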

22
Napster (2)
  • Centralised model
  • Napster server ensures correct results
  • Only used for finding the location of the files
  • Scalability bottleneck
  • Single point of failure
  • Denial of Service attacks possible
  • Lawsuits

23
Gnutella (1)
  • Sharing of any type of files
  • Decentralised search
  • Queries are sent to the neighbour nodes
  • Neighbours ask their own neighbours and so on
  • Time To Live (TTL) field on queries
  • File transfer is direct (see the sketch below)
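A minimal sketch of the flooding search described above, assuming hypothetical Node objects and ignoring the real Gnutella message format (descriptor IDs, back-propagated QueryHit messages):

# Illustrative TTL-limited flooding (not the actual Gnutella wire protocol).
from dataclasses import dataclass, field

@dataclass
class Node:
    id: int
    files: set = field(default_factory=set)
    neighbours: list = field(default_factory=list)

def flood_search(node, keyword, ttl, visited=None):
    """Return ids of nodes holding `keyword`, reached within `ttl` hops."""
    if visited is None:
        visited = set()
    if ttl < 0 or node.id in visited:
        return set()
    visited.add(node.id)
    hits = {node.id} if keyword in node.files else set()
    for n in node.neighbours:                # forward the query to every neighbour
        hits |= flood_search(n, keyword, ttl - 1, visited)
    return hits

a, b, c = Node(1), Node(2, {"A"}), Node(3, {"A"})
a.neighbours, b.neighbours = [b, c], [c]
print(flood_search(a, "A", ttl=2))           # {2, 3}

The TTL bounds how far a query spreads; that bound is exactly the scalability trade-off criticized on the Gnutella (2) slide.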

24
Gnutella Network
  • Steps
  • Node 2 initiates search for A
  • Note: node 2 does not know where A is
  • Flooding

(Diagram: Gnutella overlay of nodes 1-7)
25
Gnutella Network
  • Steps
  • Node 2 initiates search for A
  • Sends message to all neighbors

(Diagram: Gnutella overlay of nodes 1-7)
26
Gnutella Network
  • Steps
  • Node 2 initiates search for A
  • Sends message to all neighbors
  • Neighbors forward message

(Diagram: Gnutella overlay of nodes 1-7)
27
Gnutella Network
  • Steps
  • Node 2 initiates search for A
  • Sends message to all neighbors
  • Neighbors forward message
  • Nodes that have A initiate a reply message

(Diagram: Gnutella overlay of nodes 1-7)
28
Gnutella Network
  • Steps
  • Node 2 initiates search for A
  • Sends message to all neighbors
  • Neighbors forward message
  • Nodes that have A initiate a reply message
  • Query reply message is back-propagated

(Diagram: Gnutella overlay of nodes 1-7)
29
Gnutella Network
  • Steps
  • Node 2 initiates search for A
  • Sends message to all neighbors
  • Neighbors forward message
  • Nodes that have A initiate a reply message
  • Query reply message is back-propagated
  • Node 2 gets replies

(Diagram: Gnutella overlay of nodes 1-7)
30
Gnutella Network
  • Steps
  • Node 2 initiates search for A
  • Sends message to all neighbors
  • Neighbors forward message
  • Nodes that have A initiate a reply message
  • Query reply message is back-propagated
  • Node 2 gets replies
  • File download

(Diagram: Gnutella overlay of nodes 1-7; node 2 downloads A directly from a node that has it)
31
Gnutella (2)
  • Decentralised model
  • No single point of failure
  • Less susceptible to denial of service
  • Scalability problem (flooding)
  • Cannot ensure correct results

32
KaZaA
  • Hybrid of Napster and Gnutella
  • Super-peers act as local search hubs
  • Each super-peer is like a constrained Napster
    server
  • Automatically chosen based on capacity and
    availability
  • Lists of files are uploaded to a super-peer
  • Super-peers periodically exchange file lists
  • Queries are sent to super-peers (see the sketch
    below)
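A hedged sketch of the super-peer idea; names are illustrative, and forwarding the query between super-peers stands in for the periodic exchange of file lists mentioned above. This is not the FastTrack protocol:

# Illustrative super-peer index (not the actual FastTrack/KaZaA protocol).
class SuperPeer:
    def __init__(self):
        self.local = {}                      # peer address -> list of filenames
        self.neighbours = []                 # other super-peers

    def register(self, peer_addr, filenames):
        self.local[peer_addr] = filenames    # ordinary peers upload their lists

    def search(self, keyword, ttl=1):
        hits = {addr for addr, names in self.local.items()
                if any(keyword in n for n in names)}
        if ttl > 0:                          # consult neighbouring super-peers
            for sp in self.neighbours:
                hits |= sp.search(keyword, ttl - 1)
        return hits

sp1, sp2 = SuperPeer(), SuperPeer()
sp1.neighbours = [sp2]
sp2.register("10.0.0.7", ["song.mp3"])
print(sp1.search("song"))                    # {'10.0.0.7'}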

33
Freenet
  • Ensures anonymity
  • Decentralised search
  • Queries are sent to the neighbour nodes
  • Neighbours ask their own neighbours and so on
  • The query process is sequential
  • Learning ability

34
Structured P2P
  • Second generation P2P (overlay) networks
  • Self-organizing
  • Load balanced
  • Fault-tolerant
  • Guarantees on the number of hops to answer a query
  • Based on a distributed hash table interface

35
Distributed Hash Tables (DHT)
  • Distributed version of a hash table data
    structure
  • Stores (key, value) pairs
  • The key is like a filename
  • The value can be file contents
  • Goal: Efficiently insert/lookup/delete (key,
    value) pairs
  • Each peer stores a subset of (key, value) pairs
    in the system
  • Core operation: Find the node responsible for a key
  • Map key to node
  • Efficiently route insert/lookup/delete request
    to this node

36
DHT Generic Interface
  • Node id: m-bit identifier (similar to an IP
    address)
  • Key: sequence of bytes
  • Value: sequence of bytes
  • put(key, value)
  • Store (key, value) at the node responsible for
    the key
  • value = get(key)
  • Retrieve the value associated with key (from the
    appropriate node); see the sketch below
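A minimal sketch of this interface, assuming a toy in-memory system in which each key is stored at the node whose hashed identifier is the first one at or after the key's hash; the routing machinery a real DHT adds is omitted:

# Toy DHT illustrating put/get; identifiers live in [0, 2**M).
import hashlib

M = 16

def ident(data: str) -> int:
    """Hash a key or node address into the m-bit identifier space."""
    return int(hashlib.sha1(data.encode()).hexdigest(), 16) % (2 ** M)

class ToyDHT:
    def __init__(self, node_addrs):
        self.nodes = sorted((ident(a), a) for a in node_addrs)
        self.store = {addr: {} for addr in node_addrs}

    def _responsible(self, key):
        k = ident(key)
        for nid, addr in self.nodes:         # first node id >= hash(key) ...
            if nid >= k:
                return addr
        return self.nodes[0][1]              # ... wrapping around the ring

    def put(self, key, value):
        self.store[self._responsible(key)][key] = value

    def get(self, key):
        return self.store[self._responsible(key)].get(key)

dht = ToyDHT(["peer-a", "peer-b", "peer-c"])
dht.put("song.mp3", b"contents")
print(dht.get("song.mp3"))

Here every node knows the full membership list; the schemes on the following slides (Chord in particular) replace that with O(log N) routing state per node.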

37
DHT Applications
  • File sharing
  • Databases
  • Service discovery
  • Chat service
  • Publish/subscribe networks

38
DHT Desirable Properties
  • Keys mapped evenly to all nodes in the network
  • Each node maintains information about only a few
    other nodes
  • Efficient routing of messages to nodes
  • Node insertion/deletion only affects a few nodes

39
Chord API
  • Node id: m-bit identifier (similar to an IP
    address)
  • Key: m-bit identifier (hash of a sequence of
    bytes)
  • Value: sequence of bytes
  • API:
  • insert(key, value)
  • lookup(key)
  • update(key, newval)
  • join(n)
  • leave()

40
Consistent Hashing
41
Chord Operation (1)
  • Nodes form a circle based on node identifiers
  • Each node is responsible for storing a portion
    of the keys
  • The hash function ensures an even distribution of
    keys and nodes on the circle (see the sketch
    below)
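A small sketch of why the circle construction matters, assuming a hypothetical ring of 20 nodes: when one node joins, only the keys that fall between the new node and its predecessor change owner, roughly K/N of them.

# Illustrative consistent hashing: measure how many keys move on a node join.
import hashlib

RING = 2 ** 16

def h(s):
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % RING

def successor(ids, k):
    """Identifier of the first node clockwise from k on the circle."""
    return min((i for i in ids if i >= k), default=min(ids))

keys = [f"key{i}" for i in range(10_000)]
nodes = {h(f"node{i}") for i in range(20)}

before = {k: successor(nodes, h(k)) for k in keys}
nodes.add(h("node-new"))                     # a single node joins the ring
after = {k: successor(nodes, h(k)) for k in keys}

moved = sum(before[k] != after[k] for k in keys)
print(f"{moved} of {len(keys)} keys changed owner")   # on the order of 1/21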

42
Chord Ring definition
  • Finger table: node k stores pointers to the
    successors of k+1, k+2, k+4, ..., k+2^(m-1)
    (mod 2^m)
  • Finds the node responsible for any data item in
    O(log N) steps, with O(log N) storage per node

43
Chord Operation (2)
44
Chord Operation (3)
  • Lookup the furthest node that precedes the key
  • Query reaches target node in O(logN) hops

45
Scalable Lookup Scheme
Finger Table for N8
finger[k] = first node that succeeds (n + 2^(k-1)) mod
2^m, for k = 1, ..., m (worked example below)
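The finger-table figure itself is not reproduced in the transcript. As a reconstruction from the formula, using the node identifiers shown in the lookup figure on the next slide (m = 6, so identifiers range over 0..63), N8's finger table would be:

  k   (8 + 2^(k-1)) mod 64   finger[k]
  1    9                     N14
  2   10                     N14
  3   12                     N14
  4   16                     N21
  5   24                     N32
  6   40                     N42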
46
Lookup Using Finger Table
(Diagram: Chord identifier circle with nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, and N56; node N8 issues lookup(54))
47
Scalable Lookup Scheme
  // ask node n to find the successor of id
  n.find_successor(id)
    if (id in (n, successor])
      return successor
    else
      n' = closest_preceding_node(id)
      return n'.find_successor(id)

  // search the local table for the highest
  // predecessor of id
  n.closest_preceding_node(id)
    for i = m downto 1
      if (finger[i] in (n, id))
        return finger[i]
    return n
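A runnable sketch of the pseudocode above, using the node identifiers from the lookup figure (m = 6); interval tests are circular, and the returned path reproduces the N8, N42, N51 hops of the figure. This is an illustration under those assumptions, not the full Chord protocol (no joins, failures, or stabilization):

# Minimal Chord-style lookup over a static ring (illustration only).
M = 6
RING = 2 ** M
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]

def successor_of(k):
    """First node identifier clockwise from k (wrapping around the ring)."""
    return min((n for n in NODES if n >= k), default=NODES[0])

def in_interval(x, a, b, inclusive_right=False):
    """Membership of x in the circular interval (a, b), optionally (a, b]."""
    if a < b:
        return a < x < b or (inclusive_right and x == b)
    return x > a or x < b or (inclusive_right and x == b)

class ChordNode:
    def __init__(self, ident):
        self.id = ident
        self.successor = successor_of((ident + 1) % RING)
        # finger[i] = successor of (n + 2^(i-1)) mod 2^m, for i = 1..m
        self.finger = [successor_of((ident + 2 ** (i - 1)) % RING)
                       for i in range(1, M + 1)]

    def closest_preceding_node(self, ident):
        for f in reversed(self.finger):      # highest predecessor of ident
            if in_interval(f, self.id, ident):
                return f
        return self.id

    def find_successor(self, ident, path=()):
        if in_interval(ident, self.id, self.successor, inclusive_right=True):
            return self.successor, path + (self.id,)
        nxt = self.closest_preceding_node(ident)
        return nodes[nxt].find_successor(ident, path + (self.id,))

nodes = {n: ChordNode(n) for n in NODES}
print(nodes[8].find_successor(54))           # (56, (8, 42, 51))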

48
Chord Properties
  • In a system with N nodes and K keys
  • Each node manages roughly K/N keys
  • Bounded information stored in every node
  • Lookups resolved with O(log N) hops
  • No delivery guarantees
  • Poor network locality

49
Network Locality
Nodes close on the ring can be far apart in the physical network
50
Grid Computing
  • What is a Grid? An integrated, advanced
    cyberinfrastructure that delivers:
  • Computing capacity
  • Data capacity
  • Communication capacity
  • Analogy to the Electrical Power Grid

51
History
  • For many years, a few wacky computer scientists
    have been trying to help other scientists use
    distributed computing.
  • Interactive simulation (climate modeling)
  • Very large-scale simulation and analysis (galaxy
    formation, gravity waves, battlefield simulation)
  • Engineering (parameter studies, linked component
    models)
  • Experimental data analysis (high-energy physics)
  • Image and sensor analysis (astronomy, climate
    study, ecology)
  • Online instrumentation (microscopes, x-ray
    devices, etc.)
  • Remote visualization (climate studies, biology)
  • Engineering (large-scale structural testing,
    chemical engineering)
  • In these cases, the scientific problems are big
    enough that they require people in several
    organizations to collaborate and share computing
    resources, data, and instruments.

52
Some Core Problems
  • Too hard to keep track of authentication data
    (ID/password) across institutions
  • Too hard to monitor system and application status
    across institutions
  • Too many ways to submit jobs
  • Too many ways to store and access files and data
  • Too many ways to keep track of data
  • Too easy to leave dangling resources lying
    around (robustness)

53
Challenging Applications
  • The applications that Grid technology is aimed at
    are not easy applications!
  • The reason these things haven't been done before
    is that people believed it was too hard to
    bother trying.
  • If you're trying to do these things, you'd better
    be prepared for it to be challenging.
  • Grid technologies are aimed at helping to
    overcome the challenges.
  • They solve some of the most common problems
  • They encourage standard solutions that make
    future interoperability easier
  • They were developed as parts of real projects
  • In many cases, they benefit from years of lessons
    from multiple applications
  • Ever-improving documentation, installation,
    configuration, training

54
Earth System Grid
  • Goal: address technical obstacles to the sharing
    and analysis of high-volume data from advanced
    earth system models

55
Other Examples of Grids
  • TeraGrid: NSF-funded, linking 5 major research
    sites at 40 Gb/s (www.teragrid.org)
  • European Union Data Grid: grid for applications
    in high-energy physics, environmental science,
    and bioinformatics (www.eu-datagrid.org)
  • Access Grid: collaboration systems using
    commodity technologies (www.accessgrid.org)
  • Network for Earthquake Engineering Simulations
    Grid: grid for earthquake engineering
    (www.nees.org)

56
Current Status of the Grid
  • Dozens of Grid projects in scientific and
    technical computing in academic research
    community
  • Consensus on key concepts and technologies (GGF,
    the Global Grid Forum)
  • Open-source Globus Toolkit: a standard for major
    protocols and services
  • Funding agencies funding a lot of grid projects
  • Business interest emerging rapidly
  • Standards still emerging: grid services, Web
    Services Resource Framework (WSRF)
  • Requires significant user training

57
Use Grid Now
  • A lot of work is needed to make applications
    grid-ready:
  • adopt new algorithms for parallel computation
  • change user interface
  • Have to build application on different
    architectures
  • Need to move application and data to different
    computers
  • Security and Licensing issues
  • Requires a lot of system administration expertise
  • Largely UNIX-based

58
Software Layers
  • Web browser or command window (user interface)
  • Globus client on the user's workstation
    (certificates, job submission)
  • Globus server on the master node (job manager)
  • Queue managers and schedulers on the master node
  • Applications running on Grid clusters
59
Developing Grid Standards
(Figure: increasing functionality and standardization over time)
60
Sand Glass (Hourglass) Model
  • Trying to force homogeneity on users is futile.
    Everyone has their own preferences, sometimes
    even dogma.
  • The Internet provides the model

61
Evolution of the Grid
(Figure: timeline showing increased functionality and
standardization over time: custom solutions built on
X.509, LDAP, FTP; then the Globus Toolkit, with de facto
GGF standards such as GridFTP and GSI, leveraging the
IETF; then Web services, with GGF OGSI and WSRF,
leveraging OASIS, W3C, and the IETF, and multiple
implementations including the Globus Toolkit; leading to
the Open Grid Services Architecture and app-specific
services.)
62
Open Grid Services Architecture
  • Define a service-oriented architecture
  • the key to effective virtualization
  • to address vital Grid requirements
  • utility, on-demand, system management,
    collaborative computing, etc.
  • building on Web service standards.
  • extending those standards when needed

63
Grid and Web Services Convergence
  • The definition of WSRF means that the Grid and
    Web services communities can move forward on a
    common base.

64
Who Is the Grid For?
  • Any Grid (distributed/collaborative) application
    or system will involve several classes of
    people.
  • End users (e.g., Scientists, Engineers,
    Customers)
  • Application/Product Developers
  • System Administrators
  • System Architects and Integrators
  • Each user class has unique skills and unique
    requirements.
  • The user class whose needs are met varies from
    tool to tool (even within the Globus Toolkit).

65
What End Users Need
Secure, reliable, on-demand access to
data, software, people, and other
resources (ideally all via a Web Browser!)
66
General Architecture
67
Grid Community Software
68
Social Policies/Procedures
  • How will people use the system?
  • Who will set up access control?
  • Who creates the data?
  • How will computational resources be added to the
    system?
  • How will simulation capabilities be used?
  • What will accounting data be used for?
  • Not all problems are solved by technology!
  • Understanding how the system will be used is
    important for narrowing the requirements.

69
What Is the Globus Toolkit?
  • The Globus Toolkit is a collection of solutions
    to problems that frequently come up when trying
    to build collaborative distributed applications.
  • Heterogeneity
  • To date (v1.0 - v4.0), the Toolkit has focused on
    simplifying heterogeneity for application
    developers.
  • We aspire to include more vertical solutions in
    future versions.
  • Standards
  • Our goal has been to capitalize on and encourage
    use of existing standards (IETF, W3C, OASIS,
    GGF).
  • The Toolkit also includes reference
    implementations of new/proposed standards in
    these organizations.

70
What Does the Globus Toolkit Cover?
71
Globus Toolkit Components
72
Comparisons of P2P and Grid