1
Naming
  • Introduction to Distributed Systems
    CS 457/557, Fall 2008
    Kenneth Chiu

2
Entities, Names, IDs, and Addresses
  • A name is a sequence of bits that can be used to
    refer to an entity.
  • Entities: the things that we want to refer to.
  • What properties are desired in a name?
  • Location independence
  • Easy to remember
  • Suppose I have the name of an entity. Can I now
    operate on it immediately? Suppose I have your
    name and want to access you?
  • To operate on an entity, it is necessary to
    access it via an access point. Names of access
    points are addresses.
  • Can an entity have more than one access point?
  • Can an entity's access point change over time?
  • Are names unique? Permanent?
  • Addresses unique? Permanent?

3
  • IDs are also a kind of name. What properties do
    they have?
  • Refers to at most one entity
  • Each entity has at most one ID
  • ID is permanent
  • Examples?
  • What is your name? What is your address? What is
    your ID?
  • SSN, phone number, passport number, street
    address, e-mail address.
  • Can we substitute one for the other?
  • Use your phone number as your name?
  • Use your SSN as your name?
  • Use your name as your SSN? Phone number?

4
Central Question
  • How to resolve names to addresses?

5
Naming vs. Locating
  • DNS works well for static situations, where
    addresses don't change very often.
  • What if we assume that they do?
  • Suppose we want to change ftp.cs.binghamton.edu
    to ftp.cs.albany.edu. How?
  • Change the IP address for ftp.cs.binghamton.edu.
  • Make a symbolic link from ftp.cs.binghamton.edu
    to ftp.cs.albany.edu. In other words, put an
    entry in DNS that says ftp.cs.binghamton.edu has
    been renamed to ftp.cs.albany.edu.
  • Compare and contrast?
  • If the first is done over long distances, latency
    may be high. It could also be a bottleneck, since
    it is centralized.
  • Adding indirection via a symbolic link can
    create very long chains. Chain management is an
    issue.

6
  • A better solution is to divide this into separate
    naming and location services.
  • How many levels of naming are used in the typical
    Internet?

(Figure: direct, single-level mapping between names
and addresses, vs. two-level mapping using identities
and a separate naming and location service.)
7
Name Spaces
  • The name space is the way that names in a
    particular system are organized. This also
    defines the set of all possible names.
  • Examples?
  • Phone numbers
  • Credit card numbers
  • DNS
  • Human names in the US
  • Files in UNIX, Windows
  • URLs

8
Flat Naming
9
Flat Naming
  • Given an unstructured name (ID), how do we locate
    the access point?
  • Broadcasting
  • Forwarding
  • Home-based
  • Distributed hash tables
  • Hierarchical location service

10
Broadcast/Multicast Location
  • How does Ethernet addressing work?
  • MAC address
  • How does a host learn the IP-to-MAC address
    mapping?
  • ARP: it broadcasts a request and gets an answer.
  • Disadvantage?
  • Could waste bandwidth if network is large.
  • Interrupts hosts to check if they are the one
    being sought.
  • What if multicast instead of broadcast?
  • Can also be used to find best replica.

11
Forwarding
  • Question: How does the post office deal with
    mobility?
  • When entity moves, leave a reference.
  • Disadvantages?
  • Chain too long if lots of movement.
  • All intermediate locations have to maintain the
    forwarding.
  • Vulnerable to failure.
  • Performance is bad.
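The forwarding-pointer scheme on this and the following slides can be sketched as follows; the locations (P3, P4, P5), the object name, and the `resolve` helper are all invented for illustration. The shortcut cache anticipates the redirection idea shown shortly.

```python
# Hypothetical sketch of forwarding pointers: when an entity moves, the old
# location keeps a pointer to the new one; a lookup follows the chain, and a
# client stub can then cache a shortcut past the intermediate hops.

locations = {
    "P3": ("moved", "P4"),       # entity moved from P3 to P4 ...
    "P4": ("moved", "P5"),       # ... and then on to P5
    "P5": ("here", "object-O"),
}
shortcuts = {}                   # client-side cache of resolved locations

def resolve(loc):
    """Follow the chain of forwarding pointers; install a shortcut."""
    start, hops = loc, 0
    loc = shortcuts.get(loc, loc)
    while True:
        kind, value = locations[loc]
        if kind == "here":
            shortcuts[start] = loc        # shortcut for future lookups
            return value, hops
        loc, hops = value, hops + 1

obj, hops = resolve("P3")        # first lookup walks the whole chain
obj2, hops2 = resolve("P3")      # second lookup uses the shortcut
```

The first resolution pays for every intermediate hop; the cached shortcut makes later lookups direct, which is exactly why long chains and chain maintenance are the scheme's weak points.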

12
Forwarding Pointers in SSP
  • Object originally in P3, then moved to P4. P1
    passes object reference to P2.
  • Do we always need to go through the whole chain?

13
Shortcut
  • Redirecting a forwarding pointer, by storing a
    shortcut in a client stub.

How to deal with broken chains?
14
Home-Based Approaches
  • Let a home keep track of where the entity is.
    Example: Mobile IP.
  • All hosts use a fixed IP address (essentially
    functioning as an ID) as a home location (or home
    address). The home location is registered with a
    naming service.
  • A home agent monitors this address.
  • When entity moves, it registers a foreign address
    as the care-of address.
  • Clients send to home location first. When home
    agent receives a packet, it tunnels it to current
    care-of address, and also responds back to client
    with current location.
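A minimal sketch of the home-based scheme described above; the addresses are hypothetical and the tunneling, agent discovery, and registration security of real Mobile IP are omitted.

```python
# Toy model of home-based location: clients always contact the fixed home
# address; the home agent forwards ("tunnels") to the registered care-of
# address and reports the entity's current location back.

care_of = {}                    # home address -> current care-of address

def register(home, foreign):
    """Entity moved: register the foreign address as its care-of address."""
    care_of[home] = foreign

def send(home, packet):
    """Home agent tunnels to the care-of address (or delivers locally)."""
    current = care_of.get(home, home)
    return {"delivered_to": current, "current_location": current,
            "payload": packet}

register("130.37.0.5", "192.31.231.42")     # hypothetical IP addresses
reply = send("130.37.0.5", "hello")
```

The reply carries the current location, so a client can bypass the home on subsequent sends, matching the two-tiered refinement discussed next.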

15
(No Transcript)
16
  • Disadvantages?
  • Home address has to be supported as long as
    entity is alive.
  • The home address is fixed; what happens when the
    move is permanent?
  • What if entity is local, but home is far away?
  • Try a two-tiered scheme, first see if the entity
    is local, then try the home.
  • Solution for moves?
  • Use a naming service to find the home.

17
Distributed Hash Tables
  • Chord
  • Organize all nodes in a ring.
  • Each node is assigned a random m-bit ID.
  • Each entity is assigned a unique m-bit key.
  • The entity with key k is managed by the node with
    the smallest id >= k, called the successor,
    succ(k).
  • Simple solution?
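The "simple solution" hinted at, where every node knows only its successor, can be sketched like this. The ring of node IDs is an assumed example (chosen to match the key values 19, 26, and 12 used on the next slides), and `linear_lookup` is an illustrative name.

```python
# Simple Chord lookup: every node knows only its successor, so a lookup
# walks the ring linearly -- O(N) hops in the worst case.

m = 5
nodes = sorted([1, 4, 9, 11, 14, 18, 20, 21, 28])   # example nodes on a 2^m ring

def succ(k):
    """The node managing key k: smallest node id >= k, wrapping around."""
    for n in nodes:
        if k <= n:
            return n
    return nodes[0]          # wrapped past the largest id

def linear_lookup(start, k):
    """Walk successor pointers until reaching the node managing k."""
    ring = {nodes[i]: nodes[(i + 1) % len(nodes)] for i in range(len(nodes))}
    node, hops = start, 0
    while node != succ(k):
        node = ring[node]
        hops += 1
    return node, hops

assert succ(19) == 20        # node 1 asked for key 19: node 20 is responsible
```

This answers the next slide's question: node 1 simply forwards along the ring until it reaches succ(19) = 20, which motivates finger tables as a shortcut structure.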

18
Node 1 receives request to find key 19. What does
it do?
(Figure: a Chord ring with possible key values and
actual nodes, showing pred(p) and succ(p+1).)
19
  • Finger tables
  • Each node p maintains a finger table FT_p with at
    most m entries, where FT_p[i] = succ(p + 2^(i-1)),
    i.e., entry i points to the first node succeeding
    p by at least 2^(i-1).
  • To look up a key k, node p forwards the request
    to the node with index j satisfying
    FT_p[j] <= k < FT_p[j+1].
  • If p < k < FT_p[1], then the request is also
    forwarded to FT_p[1].
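The finger-table routing rule can be sketched on the same example ring; the node IDs, modular-interval helper, and function names are all illustrative, but the resulting routes match the resolutions shown on the following slides (key 26 from node 1, key 12 from node 28).

```python
# Sketch of Chord finger-table routing on an example ring of 2^5 key values.

m = 5
SIZE = 2 ** m
nodes = sorted([1, 4, 9, 11, 14, 18, 20, 21, 28])

def succ(k):
    """First actual node with id >= k, wrapping around the ring."""
    k %= SIZE
    for n in nodes:
        if n >= k:
            return n
    return nodes[0]

def finger_table(p):
    """FT_p[i] = succ(p + 2^(i-1)), for i = 1..m."""
    return {i: succ(p + 2 ** (i - 1)) for i in range(1, m + 1)}

def in_interval(x, a, b):
    """True if x lies in the half-open ring interval (a, b]."""
    x, a, b = x % SIZE, a % SIZE, b % SIZE
    if a < b:
        return a < x <= b
    return x > a or x <= b

def lookup(p, k):
    """Route a lookup for key k starting at node p; return nodes visited."""
    path = [p]
    while p != succ(k):
        ft = finger_table(p)
        if in_interval(k, p, ft[1]):
            nxt = ft[1]                   # the successor manages k
        else:
            nxt = ft[1]
            for j in range(m, 0, -1):     # closest preceding finger in (p, k)
                if in_interval(ft[j], p, k - 1):
                    nxt = ft[j]
                    break
        path.append(nxt)
        p = nxt
    return path
```

With these tables, resolving key 26 from node 1 visits 1, 18, 20, 21, 28, and resolving key 12 from node 28 visits 28, 4, 9, 11, 14.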

20
(Figure: a Chord ring with possible key values,
actual nodes, and each node's finger table.)
21
  • Resolving key 26 from node 1 and key 12 from node
    28 in a Chord system.

22
Lookup
  • The primary task is, given a key, to look up the
    node responsible for storing the value of that
    key.
  • Typically, the key might be the hash of a file,
    for example.
  • Each node uses its finger table to decide which
    node to forward the request to.

23
(Figure: resolving k = 26 from node 1 and k = 12 from
node 28 using the finger tables; entry i of node p's
table points to succ(p + 2^(i-1)).)
24
Joining
  • How does a node join?

25
(Figure: inserting a new node with id 24.)
1. New node asks any node to look up succ(p+1).
2. New node informs succ(p+1) that it itself is the
   new predecessor.
26
(Figure: inserting a new node with id 24, continued.)
1. New node asks any node to look up succ(p+1).
2. New node informs succ(p+1) that it itself is the
   new predecessor.
3. New node builds its own finger table by doing
   successive lookups.
27
  • Maintaining connectivity
  • Nodes periodically contact succ(p+1) and ask it
    to return pred(succ(p+1)).
  • If that is p itself, all is fine.
  • If different, what happened?
  • If different, some node q must have joined. p
    will set FT_p[1] to q, then contact q to ask for
    its predecessor.
  • Nodes periodically contact pred(p) to see if it
    is still alive.
  • If dead, pred(p) is set to null.
  • If a node discovers that pred(succ(p+1)) is null,
    then it informs succ(p+1) that its predecessor
    is very likely to be p.
  • Maintaining finger tables
  • Nodes periodically look up succ(p + 2^(i-1)).
  • As long as it's not too far out of whack, lookups
    will still succeed efficiently.
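The join-and-repair steps above can be sketched with a toy in-process model; the `Node` class, `between` helper, the example IDs, and the fixed number of stabilization rounds are all illustrative assumptions.

```python
# Sketch of Chord stabilization: each node p asks succ(p) for succ(p)'s
# predecessor; if a newly joined node slipped in between, p adopts it as
# its successor and then notifies that successor.

SIZE = 32

class Node:
    def __init__(self, ident):
        self.id = ident
        self.succ = self       # FT_p[1]
        self.pred = self

def between(x, a, b):
    """True if x lies strictly between a and b going clockwise."""
    x, a, b = x % SIZE, a % SIZE, b % SIZE
    if a < b:
        return a < x < b
    return x > a or x < b

def stabilize(p):
    x = p.succ.pred
    if x is not None and between(x.id, p.id, p.succ.id):
        p.succ = x                         # node x joined between p and succ(p)
    s = p.succ                             # notify succ(p): p may be its pred
    if s.pred is None or between(p.id, s.pred.id, s.id):
        s.pred = p

# A stable ring 1 -> 9 -> 14 -> 1:
n1, n9, n14 = Node(1), Node(9), Node(14)
n1.succ, n9.succ, n14.succ = n9, n14, n1
n1.pred, n9.pred, n14.pred = n14, n1, n9

# Node 4 joins: it only looks up its successor; pointers start inconsistent.
n4 = Node(4)
n4.succ, n4.pred = n9, None

for _ in range(3):                         # periodic stabilization repairs all
    for n in (n1, n4, n9, n14):
        stabilize(n)
```

After a few rounds the pointers converge to 1 -> 4 -> 9 -> 14, without any global coordination.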

28
  • Locality: Chord ignores the network topology.
  • Topology-based assignment: when assigning an ID
    to a node, make sure that nodes close in ID space
    are also close in the network.
  • How do network failures impact this?
  • Proximity routing: maintain more than one
    alternative for each entry in the table.
    Currently, each entry is the first node in the
    range [p + 2^(i-1), p + 2^i). It can point to
    multiple nodes, scattered in network space.
  • Proximity neighbor selection: if there is a
    choice as to neighbors, select the closest one.

29
Hierarchical Approaches
  • Build a large-scale search tree by dividing the
    network into hierarchical domains.
  • Each domain has a directory node.
  • Generalization of
  • First try the local registry, then try the home.
  • Can generalize to multiple tiers.

30
  • Hierarchical organization of a location service
    into domains, each having an associated directory
    node.
  • Is this a hierarchical namespace?
  • Note that namespace is still flat!

31
  • Each entity in domain D has a location record in
    dir(D).
  • The location record in the leaf domain contains
    the current address.
  • The location record in a non-leaf domain contains
    a reference to the directory node of the correct
    next lower domain.

(Figure: the root R and directory nodes N1 and N2 each
hold a location record for E pointing to the next
lower node; the leaf record holds the address of E,
which the client looks up.)
32
  • An example of storing information of an entity
    having two addresses in different leaf domains.

33
  • Looking up a location in a hierarchically
    organized location service.
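The up-then-down lookup can be sketched on a toy domain tree; the domain names, the record contents, and E's address are invented for the example.

```python
# Hierarchical location lookup: the client starts at its local leaf
# directory and walks up until some node has a record for E, then follows
# the downward chain of references to the leaf storing E's address.

parent = {"leaf-A": "N1", "leaf-B": "N2", "N1": "root", "N2": "root"}
# records[d] maps entity -> address (at a leaf) or -> next lower directory
records = {
    "root":   {"E": "N2"},
    "N2":     {"E": "leaf-B"},
    "leaf-B": {"E": "addr-of-E"},
}
leaves = {"leaf-A", "leaf-B"}

def lookup(start, entity):
    node, hops = start, 0
    while entity not in records.get(node, {}):   # walk up toward the root
        node, hops = parent[node], hops + 1
    while node not in leaves:                    # follow references down
        node, hops = records[node][entity], hops + 1
    return records[node][entity], hops

addr, hops = lookup("leaf-A", "E")   # up via N1 to root, down via N2 to leaf-B
```

Because the upward walk stops at the first node that knows E, a lookup for a nearby entity never reaches the root: this is the locality property discussed below.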

34
  • Each entity in domain D has a location record in
    dir(D).
  • The location record in the leaf domain contains
    the current address.
  • The location record in a non-leaf domain contains
    a reference to the directory node of the correct
    next lower domain.

(Figure: the root R and directory nodes N1 and N2 each
hold a location record for E pointing to the next
lower node; the leaf record holds the address of E,
which the client looks up.)
35
  • Inserting a replica: the request is forwarded to
    the first node that knows about entity E.

36
  • A chain of forwarding pointers to the leaf node
    is created.

37
  • Delete operations are analogous.
  • Directory node dir(D) in leaf domain is requested
    to remove entry for E.
  • If dir(D) has no more entries for E (no more
    replicas), it then requests its parent to also
    remove the entry pointing to dir(D).

38
Why Is This Any Better?
  • Let's play Devil's Advocate and compare this to
    straightforward solutions.
  • Exploits locality
  • Search expands in a ring.
  • As an entity moves, it usually is a local
    operation.

39
Pointer Caching
  • How effective is caching addresses?
  • Depends on degree of mobility.
  • If very mobile, what can we do to help?
  • If D is the smallest domain that E moves around
    in, can cache dir(D).
  • Called pointer caching.

40
  • Caching a reference to a directory node of the
    lowest-level domain in which an entity will
    reside most of the time.

41
  • If a replica is inserted locally, caches should
    be updated to point to the local replica.

42
Scalability
  • Where is the bottleneck here?
  • Root has to store everything.
  • How to address?
  • Federate/distribute the root (multiple roots).
  • Each root is responsible for a subset.
  • Cluster solution?
  • Also distribute geographically.

43
Scalability Issues
  • Locating subnodes correctly is a challenge.

44
Structured Names
45
Naming Graph
  • Path, local name, absolute name
  • Should it be a tree, DAG, allow cycles?

46
Name Spaces (2)
  • The general organization of the UNIX file system
    implementation on a logical disk of contiguous
    disk blocks.

47
Name Resolution
  • Looking up a name (finding the value) is called
    name resolution.
  • Closure mechanism (where to start)
  • How to find the root node, for example.
  • Examples: file systems, ZIP codes, DNS.

48
Aliases
  • Aliases
  • Can be hard.
  • Can be soft, like a forwarding address.

49
  • Naming graph for symbolic link.

50
Merging Namespaces
  • How can we merge namespaces? Are there any
    issues?
  • Mounting
  • Can be used to merge namespaces.
  • In the hierarchical case, what is needed is a
    special directory node that jumps to the other
    namespace.

51
Linking and Mounting
  • Consider a collection of hierarchical namespaces
    distributed across different machines.
  • Each namespace implemented by different server.
  • Information required to mount a foreign name
    space in a distributed system
  • The name of an access protocol.
  • The name of the server.
  • The name of the mounting point in the foreign
    name space.

52
  • Mounting remote name spaces through a specific
    process protocol.
  • How do you access steen's mbox from Machine A?

Network
53
Name Space Implementation
  • Name spaces always map names to something.
  • DNS maps what to what?
  • Can be divided into three layers
  • Global layer: doesn't change very often.
  • Administrational layer: a single organization,
    like a department or division.
  • Managerial layer: changes regularly, such as a
    local area network.

54
Global layer
Administrational layer
Managerial layer
55
Name Server Characteristics
  • A comparison between name servers for
    implementing nodes from a large-scale name space
    partitioned into a global layer, an
    administrational layer, and a managerial layer.

56
Name Resolution
  • A name resolver looks up names.
  • How about a simple hash table?
  • Bottleneck if just one.
  • Replicate?
  • How do you update a record? Every single replica?
  • The idea is that you want to distribute the load,
    but do it in the right way.
  • Assume that we use a hierarchical name space.

57
Iterative Name Resolution
  • Consider the name root:<nl, vu, cs, ftp, pub,
    globe, index.txt>.

58
Recursive Name Resolution
  • Which loads root server more?
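Both styles can be modeled on a toy name space; the zone contents and the final address are made up. The contrast makes the answer concrete: iteratively, the client contacts every server itself, while recursively the root (and each server below it) must do forwarding work on the client's behalf.

```python
# Toy name space: each server resolves one label and names the next server.
zones = {
    "root":      {"nl": "nl-server"},
    "nl-server": {"vu": "vu-server"},
    "vu-server": {"cs": "cs-server"},
    "cs-server": {"ftp": "192.31.231.42"},   # illustrative final address
}

def iterative(labels):
    """The client itself contacts every server along the path."""
    server, contacted = "root", []
    for label in labels:
        contacted.append(server)
        server = zones[server][label]
    return server, contacted

def recursive(server, labels):
    """Each server forwards the remaining name on the client's behalf."""
    answer = zones[server][labels[0]]
    if len(labels) == 1:
        return answer
    return recursive(answer, labels[1:])

addr1, contacted = iterative(["nl", "vu", "cs", "ftp"])
addr2 = recursive("root", ["nl", "vu", "cs", "ftp"])
```

Both return the same address, but in the recursive case the client sends a single request and the servers carry the rest of the load.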

59
Recursion and Caching
  • Recursive name resolution of <nl, vu, cs, ftp>.
    Name servers cache intermediate results for
    subsequent lookups.
  • Is iterative or recursive better for caching?

60
Name Resolution Communication Costs
  • The comparison between recursive and iterative
    name resolution with respect to communication
    costs.

61
Name Resolution Communication Costs
  • A comparison of iterative vs. recursive

(Figure: recursive name resolution (R1, R2, R3) vs.
iterative name resolution (I1, I2, I3) between the
client and the name servers for the nl, vu, and cs
nodes; the iterative case repeats the long-distance
communication.)
62
Example DNS
  • The name space is a tree of nodes.
  • A label can be at most 63 characters.
  • The maximum name length is 255 characters.
  • Path names are represented in two ways
  • root:<nl, vu, cs, flits>
  • flits.cs.vu.nl.
  • Subtree is a domain. Path name is a domain name.
  • A node contains a collection of resource records.
  • A zone is the part of the tree that a nameserver
    is responsible for.
  • A domain is made up of one or more zones.

63
Resource Records
64
DNS Implementation
  • Each zone is managed by a name server.

65
Node Contents
  • An excerpt from the DNS database for the zone
    cs.vu.nl.

66
(No Transcript)
67
DNS Subdomains
  • Part of the description for the vu.nl domain
    which contains the cs.vu.nl domain.

68
Decentralized DNS
  • Basic idea: take the DNS name, hash it, and use a
    DHT to find the key.
  • Disadvantage?
  • Pastry
  • Prefixes of keys are used to route to nodes.
  • Each digit is taken from base b.
  • Suppose you have base 4. A node with ID 3210 is
    responsible for all keys with prefix 321. It
    keeps the following table.

69
  • Suppose it receives a lookup request for 3123?
    1000?
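A sketch of prefix routing for these two lookups; the node set and the deterministic tie-breaking rule are made up, and real Pastry additionally uses numeric closeness and leaf sets, which are omitted here.

```python
# Prefix routing sketch, base 4: each hop forwards to some node that shares
# at least one more digit of prefix with the key than the current node does.

nodes = ["3210", "3102", "3001", "2023", "1302", "0123"]   # invented IDs

def shared_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(current, key):
    """Any node whose shared prefix with key is strictly longer."""
    plen = shared_prefix_len(current, key)
    candidates = [n for n in nodes if shared_prefix_len(n, key) > plen]
    return min(candidates, default=None)   # deterministic pick for the sketch

def route(start, key):
    path, cur = [start], start
    while True:
        nxt = next_hop(cur, key)
        if nxt is None:
            return path        # no closer node: cur is responsible for key
        path.append(nxt)
        cur = nxt
```

A lookup for 3123 improves the matched prefix one digit per hop (3..., 31...) until no node matches further; a lookup for 1000 stops after one hop because no node extends the prefix "1".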

70
Replication
  • The main problem with this is that there is going
    to be a lot of hops.
  • Replicate to higher levels. For example, key 3211
    is replicated to all nodes having prefix 321.
  • What happens if you replicate everything?
  • Suppose you want to guarantee that on average, it
    takes C hops? Which keys should be replicated?

71
Distribution
  • How are queries distributed? Are some more common
    than others? What does it look like?
  • The Zipf distribution says that the frequency of
    the n-th ranked item is proportional to
    1/n^alpha, with alpha being close to 1.
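A quick numeric illustration of the skew; the rank cutoff and alpha = 1.0 are arbitrary choices for the example.

```python
# Zipf's law: frequency of the n-th most popular item is ~ 1/n^alpha.
alpha = 1.0
weights = [1 / n ** alpha for n in range(1, 6)]
total = sum(weights)
freqs = [w / total for w in weights]   # normalized query frequencies
# the top-ranked item is requested five times as often as the fifth-ranked
```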

72
Selective Replication
  • Assume a Zipf distribution of queries; then the
    formula above shows the fraction of most popular
    keys that should be replicated at level i. d is
    based on alpha and the base b. N is the total
    number of nodes. C is the desired hop count.

73
Example
  • Example: assume that you want an average of one
    hop, with base b = 4, alpha = 0.9, N = 10,000,
    and 1,000,000 records.
  • 61 most popular should be replicated at level 0.
  • 284 next most popular should be replicated at
    level 1.
  • 1323 next most popular should be replicated at
    level 2.
  • 6177 next most popular should be replicated at
    level 3.
  • 28826 next most popular should be replicated at
    level 4.
  • 134505 next most popular should not be replicated.

74
Attribute-Based Naming
75
Directory Services
  • Sometimes you want to search for things based on
    some kind of description of them.
  • Usually known as directory services.
  • Coming up with attributes can be hard.
  • Resource Description Framework (RDF) is
    specifically designed for this.

76
Example LDAP
  • DNS resolves a name to a node in the namespace
    graph.
  • LDAP is a directory service, which allows more
    general queries.
  • Consists of a set of records.
  • Each record is a list of attribute-value pairs,
    possibly with multiple values per attribute.

77
LDAP Directory Entry
  • A simple example of an LDAP directory entry using
    LDAP naming conventions.
  • /C=NL/O=Vrije Universiteit/OU=Math. & Comp. Sc.

78
  • Collection of all entries is a directory
    information base (DIB).
  • Each naming attribute is a relative distinguished
    name (RDN).
  • The RDNs, in sequence, can be used to form a
    directory information tree (DIT).

79
Hierarchical Implementations LDAP (2)
  • Part of a directory information tree.

80
Children Nodes
  • Two directory entries having Host_Name as RDN.

81
Using DHTs for attribute-value searches
  • So far, we have assumed that the search is
    centralized.

82
Lookups
  • To do a lookup, represent as a path, then hash
    the path.

83
Range Queries
  • Divide key into two parts, name and value.
  • Hash the name. Assume that a group of servers is
    responsible for that.
  • Each server in the group is responsible for a
    range.
  • A resource described by two attribute-value pairs
    must be stored under both of them.
  • Example: movies made after 1980 with a rating of
    four to five stars.
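A sketch of the scheme for the movie example; the server names, attributes, and value ranges are invented, and the hashing of the attribute name to a server group is assumed rather than shown.

```python
# Range queries: the attribute name selects a server group (via hashing in
# a real system); within a group, each server covers a sub-range of values.

groups = {
    "year":   [("s1", 1900, 1970), ("s2", 1970, 2000), ("s3", 2000, 2030)],
    "rating": [("s4", 0, 3), ("s5", 3, 6)],
}

def servers_for(attr, lo, hi):
    """All servers whose value range [a, b) overlaps the query [lo, hi)."""
    return [s for (s, a, b) in groups[attr] if a < hi and lo < b]

# movies made after 1980 with a 4-5 star rating: query each attribute's
# group, then intersect the resource sets the two sides return
year_servers = servers_for("year", 1980, 2030)
rating_servers = servers_for("rating", 4, 6)
```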

84
Semantic Overlay Networks
  • Maintaining a semantic overlay through gossiping.

85
Garbage Collection
86
Garbage Collection
  • How do you handle a server object that is unused?
  • Can it be deleted by the server?
  • More context.

87
Unreferenced Objects
  • An example of a graph representing objects
    containing references to each other.

88
Example
  • class Class2 { };
    class Class1 { public: Class2 *cls2; };
    Class2 *global = new Class2;
    void foo() {
        Class1 *obj = new Class1;
        obj->cls2 = new Class2;
        global = new Class2;
    }

89
Reference Counting
  • How to avoid the double-counting?

90
Passing a Reference
  • Copying a reference to another process and
    incrementing the counter too late
  • A solution.

91
Weighted Reference Counting
  • Can we avoid sending increment messages?

(Figure: plain reference counting: copying a reference
requires an increment message to the skeleton, and
deleting a reference requires a decrement message.)
92
Weighted References
  • Think of each reference as a token. If you give
    each reference multiple tokens, then it can hand
    those out without contacting the skeleton.

Step 1: A reference is created with weight 2.
Step 2: A copy of the reference is made; the weight
is divided by two.
Step 3: A reference is deleted; a "decrement by 1"
message is sent.
93
Weighted References
  • Works correctly in the simple case.

Step 1: A reference is created with weight 2.
Step 2: The reference is deleted; a "decrement by 2"
message is sent.
94
Total and Partial Weights
  • To work in real situation, the skeleton keeps
    track of the initial total weight that is
    available.
  • Each proxy/stub then keeps track of how much
    weight it is carrying (the partial weight).
  • When a proxy is duplicated, the partial weight is
    halved.
  • When a proxy is deleted, a decrement message is
    sent.
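The mechanism just described can be sketched as follows; the class names and the initial total weight of 64 are illustrative choices.

```python
# Weighted reference counting sketch: the skeleton records the total weight
# handed out; proxies split their partial weight when copied and send one
# decrement message when deleted -- no increment messages are ever needed.

class Skeleton:
    def __init__(self, total):
        self.total = total

    def decrement(self, weight):
        self.total -= weight
        return self.total == 0       # True: object can be collected

class Proxy:
    def __init__(self, skeleton, weight):
        self.skeleton, self.weight = skeleton, weight

    def copy(self):
        """Hand half our partial weight to the copy; skeleton not contacted."""
        half = self.weight // 2
        self.weight -= half
        return Proxy(self.skeleton, half)

    def delete(self):
        return self.skeleton.decrement(self.weight)

sk = Skeleton(total=64)
p1 = Proxy(sk, 64)
p2 = p1.copy()               # p1: 32, p2: 32 -- no message sent
p3 = p2.copy()               # p2: 16, p3: 16
```

Deleting p3, then p1, then p2 sends decrements of 16, 32, and 16; only the last one brings the total to zero, so the object is collected exactly when the final reference disappears.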

95
Weighted Reference Counting
  • The initial assignment of weights in weighted
    reference counting
  • Weight assignment when creating a new reference.

96
Passing a Weighted Ref Count
  • Weight assignment when copying a reference.

97
Indirection
  • Creating an indirection when the partial weight
    of a reference has reached 1.

98
Generation Reference Counting
  • Simple reference counting requires a message for
    incrementing and a message for decrementing.
  • Can we somehow combine those into one message?
    Maybe delay the increment somehow?
  • Generation reference counting gets rid of one of
    the messages (the increment message).
  • Basic idea is to try to defer the increment until
    we actually decrement.

99
Delayed Incrementing
1. A proxy (C = 0) and a skeleton with count 1.
2. The proxy creates two more proxies. The ref count
   at the skeleton is not updated, but the first
   proxy keeps track of the fact that it created two
   other proxies (C = 2).
3. The first proxy is deleted. It sends a message
   saying it created two other proxies, so increment
   the ref count by two before decrementing by one.
100
A Problem
1. A proxy (C = 0) and a skeleton with count 1.
2. The proxy creates two more proxies. The ref count
   at the skeleton is not updated, but the first
   proxy keeps track of the fact that it created two
   other proxies (C = 2).
3. The third proxy is deleted. It has not created
   any proxies, so it just sends a message to
   decrement by one. This causes the object to be
   improperly deleted.
101
Generations
  • Proxies have generations, as in humans.

Generation 0: one proxy (G = 0, C = 2).
Generation 1: two proxies (G = 1, C = 2 and
G = 1, C = 1).
Generation 2: three proxies (G = 2, C = 0).
102
Generational Ref Counting
  • Creating and copying a remote reference in
    generation reference counting.

103
Deleting a Proxy
  • The skeleton maintains a table G[i], which counts
    the references for generation i.
  • When a proxy is deleted, a message is sent with
    its generation number k and its number of copies
    c.
  • The skeleton decrements G[k] and increments
    G[k+1] by c.
  • Only when the table is all 0 is the skeleton
    deleted.
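The table update can be sketched as follows; the class and method names are made up. Note how deleting a generation-1 proxy before its generation-0 creator leaves a negative entry, which is exactly what prevents the premature deletion shown two slides earlier.

```python
# Generation reference counting sketch: the skeleton keeps G[i] per
# generation i. Deleting a proxy of generation k that made c copies sends
# one message (k, c): G[k] -= 1 and G[k+1] += c. Collect when all zero.

from collections import defaultdict

class Skeleton:
    def __init__(self):
        self.G = defaultdict(int)
        self.G[0] = 1                  # the initial proxy, generation 0

    def proxy_deleted(self, generation, copies):
        self.G[generation] -= 1
        self.G[generation + 1] += copies
        return all(v == 0 for v in self.G.values())   # True: collect object

sk = Skeleton()
# The generation-0 proxy made two generation-1 copies. One gen-1 copy is
# deleted first (it made no copies of its own): G becomes {0:1, 1:-1, 2:0},
# so the object is correctly kept alive despite the "early" decrement.
r1 = sk.proxy_deleted(1, 0)
r2 = sk.proxy_deleted(0, 2)            # gen-0 proxy deleted, reporting 2 copies
r3 = sk.proxy_deleted(1, 0)            # last gen-1 proxy deleted
```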

104
Deleting a Proxy
1. A proxy (G = 0, C = 0) and a skeleton with
   G[0] = 1.
2. The proxy creates two more proxies (it becomes
   G = 0, C = 2; the copies are G = 1, C = 0). The
   ref count at the skeleton is not updated.
3. The third proxy is deleted. It has not created
   any proxies, so it just sends "decrement G[1] by
   1". Its generation number differs from the first
   proxy's, so the skeleton now has G[0] = 1,
   G[1] = -1 and does not delete the object.
105
Refresh Distributed Garbage Collection
  • The problem: how do you discover when no one
    needs a server object?
  • One solution Develop distributed versions of
    garbage collection algorithms.

106
Reference Listing
  • Distributed reference counting is tricky because
    of failures.
  • If you send an increment, how do you know it
    arrived?
  • What if the ack is lost?
  • If you can design it so that it doesn't matter
    how many times you send a message, then it is
    simpler.
  • This is called idempotency. An idempotent
    operation can be done many times without negative
    effect.
  • Are these idempotent?
  • Withdrawing 50 from your bank account.
  • Cancelling a credit card account.
  • Registering for a course.

107
Idempotent Reference Counting
  • How can we make reference counting idempotent?
  • What turns non-idempotent registration into
    idempotent registration?
  • Keep track of which proxies have been created in
    the skeleton.
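A sketch of the idempotent variant; the class and method names are made up. Keeping a set of proxies instead of a counter means a retransmitted "register" message changes nothing.

```python
# Reference listing sketch: the skeleton tracks the set of proxies, not a
# count. set.add is idempotent, so register messages can be resent safely.

class Skeleton:
    def __init__(self):
        self.proxies = set()

    def register(self, proxy_id):
        self.proxies.add(proxy_id)     # duplicate registrations are harmless

    def unregister(self, proxy_id):
        self.proxies.discard(proxy_id)
        return not self.proxies        # True: no references remain

sk = Skeleton()
sk.register("P1")
sk.register("P1")                      # retransmission: still one entry
sk.register("P2")
```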

108
Reference Listing
(Figure: a skeleton keeping a reference list {P1, P2}
for proxies P1 and P2.)
  • Keep a list of proxies in the skeleton.
  • Failures can be handled with heartbeats, etc.
  • Main issue is scaling, if millions of proxies.

109
Tracing in Groups
  • Garbage collect first within a group of
    distributed processes.
  • An object is not collected if
  • It has a reference from outside the group
  • It has a reference from the root set
  • This is conservative.
  • It is possible that a reference from outside the
    group is not reachable.

110
The Model
  • Proxies (stubs), skeletons, and objects.
  • Only one proxy per object per process.
  • A root set of references. The root set is not
    proxies.

111
Basic Steps
  • Find all skeletons in a group that are reachable
    from outside or from root set.
  • Mark them hard, others are soft.
  • Within a process, proxies that are reachable from
    something hard are marked hard; those reachable
    only from something soft are marked soft. Some
    are marked none.
  • Repeat the above until stable (no change).

112
  • Skeletons
  • Hard: reachable from a proxy outside of the
    group, or from a root object inside the group.
  • Soft: reachable only from proxies inside the
    group.
  • Proxies
  • Hard: reachable from the root set.
  • Soft: reachable from a skeleton marked soft.
  • None: neither.
113
Initial Marking of Skeletons
  • All proxies in group report to skeleton.
  • If the ref count is greater than the number of
    proxies, then there must be external references.

114
Local Propagation
  • Propagate hard/soft marks locally.

115
Iterate Till Convergence
  • Final marking.
  • Anything not hard can be collected.