Title: Introduction to Distributed Computing (Distributed System Support for Pervasive Computing)
1Introduction to Distributed Computing
(Distributed System Support for Pervasive
Computing)
- Wei Li
- liwei_at_dsv.su.se
2Reference Book
- Distributed Systems Principles and Paradigms
- Andrew S. Tanenbaum and Maarten van Steen
- http://www.cs.vu.nl/ast/books/ds1/
- (Examination won't be based on this book!)
3
- Remote communication: protocol layering, RPC, end-to-end args, ...
- Fault tolerance: ACID, two-phase commit, nested transactions, ...
- High availability: replication, rollback recovery, ...
- Remote information access: dist. file systems, dist. databases, caching, ...
- Distributed security: encryption, mutual authentication, ...
- Mobile networking: Mobile IP, ad hoc networks, wireless TCP fixes, ...
- Mobile information access: disconnected operation, weak consistency, ...
- Adaptive applications: proxies, transcoding, agility, ...
- Energy-aware systems: goal-directed adaptation, disk spin-down, ...
- Location sensitivity: GPS, WaveLan triangulation, context-awareness, ...
Source: "Pervasive Computing: Vision and Challenges", M. Satyanarayanan, School of Computer Science, Carnegie Mellon University
4DC -> MC -> PvC/UC
- New problems are encountered and become more complex (more dynamic).
- DS and MC focus more on system interaction, while PvC/UC also involves adaptation to the user and the environment (introducing more HCI concerns about the user's perception during the interaction).
- Two key characteristics of system software in PvC: physical integration and spontaneous interoperation (invisibility and adaptation).
- Communication is the core aspect.
5Definition of a Distributed System
- A distributed system is "a collection of independent computers that appears to its users as a single coherent system." -- Andrew S. Tanenbaum
- Machines are running autonomously
- Software hides the fact that processes and resources are physically distributed across multiple computers over networks
- Goal: users and applications can access remote resources and share them with other users in a controlled way, by interacting with the DS in a consistent and uniform way
6Transparency in a Distributed System
Different forms of transparency in a distributed
system.
7Important Issues (Principles) in Distributed
Systems
- Communication: the basis of a DS is to support access to remote resources, such as computation and data.
- Processes: communication takes place between processes, so how processes are scheduled and managed is crucial for communication (threads, code migration, client/server, software agents).
- Naming: the shared resources in a DS have to be identified uniquely, and each identifier should be resolvable to retrieve the entity it refers to.
- Synchronization: how to protect concurrent access (read/write) against conflicts (process synchronization).
- Consistency and replication: data are replicated to enhance reliability and performance in a DS; keeping replicas consistent (also called data synchronization) is an important issue, e.g., for local caches.
- Fault tolerance: a DS is subject to failures because communication spans multiple computers or even networks; it is important to recover automatically from failures without affecting overall performance.
- Security: one part is to secure communication (secure channel); the other is to provide access protection to prevent malicious access.
8Communication
- Layers, interfaces, and protocols in the OSI (Open Systems Interconnection) reference model.
- Divided into 7 layers; each deals with one specific aspect of the communication
9Positioning Distributed System (Middleware)
- Distributed systems are often organized as a software layer placed between users and applications on one side and the underlying operating system on the other.
- A distributed system is also called middleware, and the middleware layer extends over multiple machines.
- Middleware is an application-layer protocol (the layers above the transport layer are all categorized into the application layer).
10Remote Procedure Call (RPC)
- Extend the procedure call over the network by allowing programs to call procedures located on other machines through stubs:
- Client program calls the client stub to place a remote procedure call
- Client stub builds a request message and sends it to the remote server
- Server stub receives the message, unpacks the parameters, and calls the remote procedure
- Procedure executes and returns the result to the server stub
- Server stub packs it in a message and sends it back to the client stub
- Client stub unpacks the result and returns it to the client program
A typical Client/Server Model
11Client Server Stubs
- The stubs take charge of:
- 1) Building messages (parameters and results), also called marshaling and unmarshaling
- 2) Establishing the connection to transfer messages.
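The call path through the two stubs can be sketched in a few lines of Python. This is only an illustration, not DCE RPC: the procedure `add`, the names `client_stub`, `server_stub`, `transport`, and the JSON message layout are all assumptions made for the example, and the "network" is a plain function call standing in for a socket.

```python
import json

# The remote procedure, living on the "server" machine (hypothetical example).
def add(a, b):
    return a + b

PROCEDURES = {"add": add}  # server-side dispatch table (an assumption)

def server_stub(request_bytes):
    """Unmarshal the request, call the procedure, marshal the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["params"])
    return json.dumps({"result": result}).encode("utf-8")

def transport(request_bytes):
    """Stand-in for the network; a real stub would send the bytes over a socket."""
    return server_stub(request_bytes)

def client_stub(proc, *params):
    """Marshal the call into a message, send it, and unmarshal the reply."""
    request = json.dumps({"proc": proc, "params": list(params)}).encode("utf-8")
    reply = transport(request)
    return json.loads(reply.decode("utf-8"))["result"]

# The client program calls the stub as if it were a local procedure.
total = client_stub("add", 2, 3)
print(total)
```

The client program never sees the message format; only the stubs know how parameters and results are packed.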
12Writing RPC Client Server
1. The Uuidgen program generates a prototype IDL file, in Interface Definition Language format, with a unique identifier.
2. Edit the IDL file, filling in content such as procedure names and parameters.
3. Compile the IDL file into headers and stubs.
4. Implement the client and server code.
2-14
- The steps in writing a client and a server in DCE RPC.
- OSF's Distributed Computing Environment (DCE) standard
13Binding a Client to a Server
2-15
Endpoint Port
(server, endpoint) pairs
- Client-to-server binding in DCE.
14Remote Method Invocation
- Object-oriented technology encapsulates data (state/properties) and the operations (methods) on those data.
- This encapsulation offers better transparency for system design and programming.
- The principle of RPC can be applied equally to objects.
- The client uses a proxy (a local representative of the remote object) to operate on the remote one.
- Proxy/skeleton is analogous to the stubs in RPC; in addition, it presents an object view.
15Passing Object by Value or Reference
Object1.method1 (L1, R1)
L1 R1
- A program on Machine A wants to call a method of Object1 with two parameters (L1, R1)
- L1 is a reference to local object O1, and R1 is a reference (proxy) to a remote object O2 on Machine B
- Object1 is located on Machine A and the method will run on Machine A
Object1.method1 (L1, R1)
- A program on Machine A wants to call a method of Object1 with two parameters (L1, R1), and the method will run on Machine C
- L1 is a reference to object O1 on Machine A, and R1 is a reference to a remote object O2 on Machine B
- Object1 is located on Machine C, but is invoked by the program on Machine A
- The program on Machine A needs to use a proxy that refers to Object1 on Machine C and transfers the parameters to Machine C
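One way to picture the parameter transfer to Machine C is the sketch below: the local object is marshaled by value (a copy travels), while the remote reference is marshaled as "where the object lives". The classes `Point` and `RemoteRef`, the helpers `marshal` and `unmarshal`, and the JSON layout are hypothetical and only serve the illustration; real RMI systems use their own serialization.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Point:
    """An ordinary local object (stands in for O1 on Machine A)."""
    x: int
    y: int

@dataclass(frozen=True)
class RemoteRef:
    """A proxy-like reference: the machine and the object's id on it."""
    machine: str
    object_id: str

def marshal(param):
    """Pack one parameter, tagging it as a value copy or a reference."""
    kind = "ref" if isinstance(param, RemoteRef) else "value"
    return json.dumps({"kind": kind, **asdict(param)})

def unmarshal(message):
    """Rebuild the parameter on the receiving machine."""
    fields = json.loads(message)
    kind = fields.pop("kind")
    return RemoteRef(**fields) if kind == "ref" else Point(**fields)

o1 = Point(1, 2)                    # L1 refers to this object on Machine A
r1 = RemoteRef("machine-b", "O2")   # R1 refers to O2 on Machine B

# On "Machine C": both parameters arrive as messages and are unpacked.
l1_on_c = unmarshal(marshal(o1))
r1_on_c = unmarshal(marshal(r1))

l1_on_c.x = 99   # changes only the copy; O1 on Machine A is untouched
print(o1.x, l1_on_c.x, r1_on_c)
```

The copy and the original can now diverge, while the reference still points at the single object O2 on Machine B.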
16Conclusion (RPC RMI)
- To be able to access a remote object, a stub that refers to the remote object is required.
- This stub appears as a local object, but delivers access to the remote object.
- This stub can be passed (e.g., in Java RMI) to other programs (on remote computers) to share access to the same remote object.
- Another way to access a remote object is to clone a local copy and work on that. This improves performance by removing the call delay over the network, but consistency becomes an issue if the two need to be synchronized, since they are two independent objects (of the same class) in the network.
- Stubs/proxy and skeleton hide the complexity of marshaling, unmarshaling, and network communication, to enhance access transparency for the upper-layer applications.
- Main drawback: RPC and RMI use a transient synchronous communication model. The sender blocks until it receives a reply message from the other side. This model is not suitable for pervasive computing scenarios.
17Message-Oriented Communication
- RPC and RMI are based on messages, but they are only one type of MOC model
- Two orthogonal aspects categorize MOC:
- Synchronous vs. asynchronous: whether or not the client blocks after sending the request message
- A further distinction is whether the sender blocks until the message is received (on the remote side) or until it is replied to
- Receipt-based synchronous communication (TCP)
- Reply-based synchronous communication
- Asynchronous communication (UDP)
- Persistent vs. transient: whether or not the message sent is guaranteed to be received by the receiver
- TCP for persistent
- UDP for transient
- More alternatives (combinations) to choose from to fit different scenarios
- More combinations arise when communication is a chain of parties; different segments can adopt different models.
- Transient asynchronous MOC is commonly used in PvC
- RPC and RMI use transient synchronous MOC
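The difference between blocking for a reply and not blocking can be sketched as follows. The function `remote_service` is a hypothetical stand-in for a remote party, and a thread pool plays the role of the asynchronous channel; no real network is involved.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_service(request):
    """Stand-in for a remote party that needs some time to answer."""
    time.sleep(0.05)
    return request.upper()

# Reply-based synchronous: the caller blocks until the reply is available.
sync_reply = remote_service("ping")

# Asynchronous: the caller hands over the request and continues immediately.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(remote_service, "ping")
    work_done_meanwhile = "client kept working"   # not blocked here
    async_reply = future.result()                 # collect the reply later

print(sync_reply, async_reply, work_done_meanwhile)
```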
18Berkeley Sockets
TCP/UDP network communication works like plug-in sockets
- Connection-oriented communication (TCP) pattern using sockets.
- UDP communication is asynchronous, so it does not have the synchronization point that TCP has
- A UDP server just creates a UDP socket and then receives (blocking); a UDP client has no connect phase to block on, it just sends.
- UDP ports and TCP ports are separate: they may use the same port number without conflict
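A minimal sketch of the UDP pattern with Python's standard socket module, run over the loopback interface in a single process. The message contents and the 2-second timeouts are assumptions made for the demo.

```python
import socket

# Server: create a UDP socket and bind it; there is no listen/accept phase.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
server.settimeout(2.0)            # do not block forever in this demo
server_addr = server.getsockname()

# Client: no connect phase, it just sends a datagram to the server address.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)
client.sendto(b"hello", server_addr)

# Server blocks in recvfrom until a datagram arrives, then answers.
data, client_addr = server.recvfrom(1024)
server.sendto(data.upper(), client_addr)

reply, _ = client.recvfrom(1024)
client.close()
server.close()
print(reply)
```

A TCP version would add `listen`/`accept` on the server and `connect` on the client, which is where the two sides synchronize.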
19Message-Oriented Middleware
- Socket communication gives an easy-to-use abstraction for network programming.
- Sockets are supported by most programming languages and operating systems that support networking.
- To achieve efficiency and simplicity, many middleware systems are implemented in terms of message delivery based on socket communication (but hide it).
- This kind of middleware is called message-oriented middleware (MOM).
- Examples: IBM MQSeries, Tuple Space, JavaSpace.
20General Architecture of a Message-Queuing System
- Messages are delivered in a sorting-storing-forwarding fashion
- Applications are loosely coupled by asynchronous messages (events)
- R1, R2 are message servers in MOM
- In email systems, R1, R2 are email servers
- The general organization of a message-queuing system with routers.
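The store-then-forward behaviour of the message servers can be sketched with two in-memory queues. The class `MessageServer`, its fields, and the destination name "receiver" are hypothetical; a real MOM product adds persistence, acknowledgements, and network transport.

```python
from collections import deque

class MessageServer:
    """A router (R1, R2, ...) that stores messages and forwards them later."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()   # messages stored until they are forwarded
        self.routes = {}       # destination -> next MessageServer
        self.mailboxes = {}    # destination -> messages delivered locally

    def accept(self, destination, body):
        """Store the message; the sender does not wait for delivery."""
        self.queue.append((destination, body))

    def forward_all(self):
        """Deliver locally if possible, otherwise pass on to the next server."""
        while self.queue:
            destination, body = self.queue.popleft()
            if destination in self.mailboxes:
                self.mailboxes[destination].append(body)
            else:
                self.routes[destination].accept(destination, body)

r1, r2 = MessageServer("R1"), MessageServer("R2")
r2.mailboxes["receiver"] = []      # the receiving application sits behind R2
r1.routes["receiver"] = r2         # R1 knows that R2 is the next hop

r1.accept("receiver", "event-1")   # sender returns immediately
pending_at_r1 = len(r1.queue)      # message is stored, not yet delivered
r1.forward_all()                   # R1 forwards to R2
r2.forward_all()                   # R2 delivers to the receiver's queue
delivered = r2.mailboxes["receiver"]
print(pending_at_r1, delivered)
```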
21JavaSpace (Object version of Tuple Space)
- A tuple is an ordered set of values without fixed length
- e.g., a tuple about a person: (17, Male, 1.75, Steven)
- A JavaSpace is a shared data space (tuple storage) that stores tuples representing (a typed set of) references to Java objects.
- Write: puts an object copy into the JavaSpace
- Read: fetches an object matching the template
- Take: removes an object matching the template
- A JavaSpace can be used to snapshot a running system in terms of objects and persist them for uses such as system recovery.
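The three operations can be sketched with a tiny in-memory tuple space. This is not the JavaSpaces API: the class `TupleSpace` is hypothetical, `None` in a template is assumed to mean "match anything", and `read`/`take` return `None` instead of blocking when nothing matches.

```python
class TupleSpace:
    """A tiny in-memory space; None in a template matches any value."""

    def __init__(self):
        self._tuples = []

    def write(self, tup):
        self._tuples.append(tuple(tup))   # store a copy of the entry

    def _find(self, template):
        for tup in self._tuples:
            if len(tup) == len(template) and all(
                want is None or want == have
                for want, have in zip(template, tup)
            ):
                return tup
        return None

    def read(self, template):
        return self._find(template)       # entry stays in the space

    def take(self, template):
        tup = self._find(template)
        if tup is not None:
            self._tuples.remove(tup)      # entry is removed from the space
        return tup

space = TupleSpace()
space.write((17, "Male", 1.75, "Steven"))
found = space.read((None, None, None, "Steven"))
taken = space.take((17, None, None, None))
after_take = space.read((None, None, None, "Steven"))
print(found, taken, after_take)
```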
22Overlay Network Based on MOC Socket
- Sockets give a simple abstraction for message transfer over the network. Based on it, one can further construct a new network over the underlying IP network, with two prerequisites:
- A new address schema (naming), and its
- Name resolution mechanism (routing)
- Examples:
- JXTA and other P2P networks, JINI, etc.
- Naming: compiling, registering
- Routing: JINI Lookup Service, JXTA Rendezvous, etc.
23Web naming scheme
- Uniform Resource Identifiers (URI) come in two forms:
- URL: Uniform Resource Locator
- URN: Uniform Resource Name
24Uniform Resource Locators
Other common schemes
- Often-used structures for URLs.
- Using only a DNS name.
- Combining a DNS name with a port number.
- Combining an IP address with a port number.
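The three URL structures can be taken apart with Python's standard `urllib.parse`. The URLs below are made-up examples (example.org and a documentation IP range), not addresses from the slides.

```python
from urllib.parse import urlsplit

examples = [
    "http://www.example.org/index.html",        # DNS name only
    "http://www.example.org:8080/index.html",   # DNS name plus port number
    "http://192.0.2.10:8080/index.html",        # IP address plus port number
]

parts = [urlsplit(url) for url in examples]
for p in parts:
    # port is None when the URL relies on the scheme's default port
    print(p.scheme, p.hostname, p.port, p.path)
```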
25Uniform Resource Names
urn:isbn:0-13-349945-6
- Consists of 3 parts: scheme (urn), name space identifier, name of resource
- The name space identifier determines the syntactic rules for the third part, i.e., the third part may have a different structure depending on the name space identifier, so a URN is not publicly resolvable.
- In contrast to URLs, URNs are location-independent, which means a URN usually is not tied to any specific location of the entity (it is often used only as a namespace identifier).
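Splitting a URN into its three parts is a one-liner once the colon separators are written out. The helper `parse_urn` is a hypothetical name, and the sketch does no validation of the name-space-specific part, since its syntax depends on the name space identifier.

```python
def parse_urn(urn):
    """Split 'urn:<name space id>:<resource name>' into its last two parts."""
    scheme, namespace_id, resource_name = urn.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError("not a URN: " + urn)
    return namespace_id, resource_name

nid, name = parse_urn("urn:isbn:0-13-349945-6")
print(nid, name)
```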
26Locating URL (Name Resolving)
- Domain Name System (HostName -> IP)
- Each computer has to be assigned an IP address and a DNS server IP address, either manually or through a DHCP server.
- A DNS request (nslookup) sends the hostname to the specified DNS server (root).
- The DNS server returns the IP address if it knows it; otherwise, the request is forwarded to an upper-layer DNS server.
Record of Es IP
User host
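The "answer if known, otherwise forward upward" rule can be sketched with a toy resolver. The class `DnsServer`, the host names, and the addresses are all made up for the illustration; this is a simplification and not the actual DNS protocol. For a real lookup, Python's `socket.gethostbyname` asks the system resolver.

```python
class DnsServer:
    """Toy name server: answers from its own records or asks the upper server."""

    def __init__(self, name, records, upper=None):
        self.name = name
        self.records = records   # hostname -> IP address
        self.upper = upper       # upper-layer server, or None at the top

    def resolve(self, hostname):
        if hostname in self.records:
            return self.records[hostname]          # known here
        if self.upper is not None:
            return self.upper.resolve(hostname)    # forward the request
        raise KeyError(hostname)                   # nobody knows the name

upper = DnsServer("upper", {"www.example.org": "192.0.2.10"})
local = DnsServer("local", {"printer.lan": "10.0.0.7"}, upper=upper)

local_answer = local.resolve("printer.lan")
forwarded_answer = local.resolve("www.example.org")
print(local_answer, forwarded_answer)
```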
27Overlay Network
- In a DS, a client has to get the server location through some discovery mechanism (a lookup server)
- Extend the DS to be a new network
- One can use socket messages to construct a network on top of the underlying TCP/IP network infrastructure.
- In an overlay network, any request to the network can be routed (through several nodes) to the server that can answer it, without the client specifically knowing the location of that server.
- Examples: content-oriented networks, P2P file sharing
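A sketch of request routing in an overlay: each node knows only its neighbours, and a lookup is passed along until it reaches a node holding the content. The class `OverlayNode`, the node names, and the simple depth-first forwarding are assumptions for the example; real P2P systems use more refined routing.

```python
class OverlayNode:
    """A node that knows its neighbours and the content it stores itself."""

    def __init__(self, name, content=()):
        self.name = name
        self.content = set(content)
        self.neighbors = []

    def lookup(self, key, visited=None):
        """Return the list of node names on the way to a holder of key."""
        visited = set() if visited is None else visited
        visited.add(self.name)
        if key in self.content:
            return [self.name]
        for neighbor in self.neighbors:
            if neighbor.name not in visited:       # avoid routing loops
                path = neighbor.lookup(key, visited)
                if path is not None:
                    return [self.name] + path
        return None                                # no reachable holder

a = OverlayNode("A")
b = OverlayNode("B")
c = OverlayNode("C", {"song.mp3"})
a.neighbors = [b]
b.neighbors = [a, c]
c.neighbors = [b]

path = a.lookup("song.mp3")     # A does not know that C holds the file
missing = a.lookup("other.mp3")
print(path, missing)
```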
28Middleware and Openness
1.23
- In an open middleware-based distributed
system, the protocols used by each middleware
layer should be the same, as well as the
interfaces they offer to applications.
29Data Representation / XML (1)
- Extensible Markup Language (XML) is a standard format for interchanging structured documents.
- XML is a complement to HTML; however, it was designed to describe data, while HTML was designed to display data, with a concern for how the data look.
- Anyone can use XML to define data in any arbitrary structure (tree).
- To distinguish different structures, XML namespaces are used so that differently structured data can co-exist in one document.
<?xml version="1.0" ?>
<note xmlns:note="http://tove.com/note_structure.xml">
<note:to>Tove</note:to>
<note:from>Jani</note:from>
<note:heading>Reminder</note:heading>
<note:body>Don't forget me this weekend!</note:body>
</note>
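A reader application has to know the namespace to find the elements. The sketch below parses the note example, written out with its `note:` prefix, using Python's standard `xml.etree.ElementTree`; the prefix-to-URI mapping `ns` is supplied by the reading program.

```python
import xml.etree.ElementTree as ET

document = (
    '<?xml version="1.0" ?>'
    '<note xmlns:note="http://tove.com/note_structure.xml">'
    "<note:to>Tove</note:to>"
    "<note:from>Jani</note:from>"
    "<note:heading>Reminder</note:heading>"
    "<note:body>Don't forget me this weekend!</note:body>"
    "</note>"
)

# The reader maps a prefix of its own choosing to the namespace URI.
ns = {"note": "http://tove.com/note_structure.xml"}

root = ET.fromstring(document)
recipient = root.find("note:to", ns).text
heading = root.find("note:heading", ns).text
print(recipient, heading)
```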
30Data Representation /XML (2)
- It is the reader application's responsibility to understand (parse) the different elements in the XML document.
- Extensible Stylesheet Language (XSL) is a stylesheet language that can be used to convert an XML document from one structure to another. This transformation is also called XSLT. XSLT is one way to help interoperation between distributed systems that use different standards.
- XML is a neutral way to define data; it has been widely used not only for describing data but also for describing communication, such as addresses and routing (e.g., in JXTA, every message is in XML format).
- Other examples:
- XML-RPC
- Web Services, SOAP
31Security
- Goals
- Secure communication channel: integrity, privacy
- Authentication: prevent undesirable access.
- Cryptography technology
- Symmetric cryptographic algorithms
- Asymmetric cryptographic algorithms (public/private key pairs)
- Digital digest (hash, MAC)
- Digital signature
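Digest and MAC can be tried out with Python's standard `hashlib` and `hmac` modules. The message, the key, and the choice of SHA-256 are assumptions for the demo; the key shown is a placeholder and must not be used for anything real.

```python
import hashlib
import hmac

message = b"Don't forget me this weekend!"
shared_key = b"demo-key-not-for-real-use"   # placeholder shared secret

# Digital digest: anyone can recompute it, so it only detects accidental change.
digest = hashlib.sha256(message).hexdigest()

# MAC: computing it requires the shared key, so it also protects integrity
# against someone who can modify the message in transit.
tag = hmac.new(shared_key, message, hashlib.sha256).digest()

def verify(key, msg, received_tag):
    """Recompute the MAC and compare it in constant time."""
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    return hmac.compare_digest(expected, received_tag)

ok = verify(shared_key, message, tag)
tampered_ok = verify(shared_key, message + b"!", tag)
print(digest, ok, tampered_ok)
```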
32More Tricky in PvC
- Permanent keys are not suitable (it is hard to distribute keys in an unexpected place)
- Temporary key (time-limited capability)
- Encryption needs computation
- Privacy is an open problem; low-level communication is prone to exposing the user's persistent identity (e.g., network card MAC address)
33Clients/Servers Architecture
- General interaction between a client and a server.
1.25
34Three-tiered C/S model
- The general organization of an Internet
search engine into three different layers
1-28
35Multitiered Architectures
- Alternative client-server organizations (a)-(e).
1-29
36Conclusion
- DS technologies
- Procedure- and object-based communication
- MOM and sockets
- Naming and resolution
- Overlay network
- XML, security, software architecture
- In general, lightweight asynchronous message-oriented communication is more suitable for PvC
- New challenges are involved in extending distributed systems to support pervasive computing.
- Re-think the distributed system technologies and evaluate their use in concrete scenarios.