Introduction to Distributed Systems - PowerPoint PPT Presentation

Provided by: infm3

1
Introduction to Distributed Systems
  • What is distributed computing?
  • Distribute: to divide among several or many,
    systematically or merely at random.
  • Distributed system: a collection of independent
    computers that appears to the users of the system
    as a single computer.
  • Distributed programming techniques allow software
    to take advantage of resources located on the
    Internet, on corporate and organization
    intranets, and on networks.
  • Distributed programming usually involves network
    programming in one form or another. That is, a
    program on one computer on a network needs some
    hardware or software resource that belongs to
    another computer either on the same network or on
    some remote network.

2
Introduction to Distributed Systems
  • A Distributed System

3
Introduction to Distributed Systems
  • Examples of Distributed Systems
  • Network of workstations (NOW): a group of
    networked personal workstations connected to one
    or more server machines.
  • The Internet
  • An intranet: a network of computers and
    workstations within an organization, segregated
    from the Internet via a protective device (a
    firewall).
  • Actual example of a large-scale distributed
    system: eBay
  • Actual example of a small-scale distributed
    system: a smart home
  • Computers in a distributed system
  • Workstations: computers used by end-users to
    perform computing
  • Server machines: computers which provide
    resources and services
  • Personal Assistance Devices: handheld computers
    connected to the system via a wireless
    communication link.

4
Introduction to Distributed Systems
  • The network really is the computer.
  • Tim O'Reilly, in an address at JavaOne, June 2000
  • By now, it's a truism that the Internet runs on
    open source. Bind, the Berkeley Internet Name
    Daemon, is the single most mission critical
    program on the Internet, followed closely by
    Sendmail and Apache, open source servers for two
    of the Internet's most widely used application
    protocols, SMTP and HTTP.
  • Early killer apps
  • - Usenet: a distributed bulletin board
  • - email
  • - talk
  • Recent killer apps
  • - the web
  • - collaborative computing

5
Introduction to Distributed Systems
Centralized vs. Distributed Computing
6
Introduction to Distributed Systems
  • Monolithic mainframe applications vs. distributed
    applications
  • The monolithic mainframe application
    architecture
  • Separate, single-function applications, such as
    order-entry or billing
  • Applications cannot share data or other resources
  • Developers must create multiple instances of the
    same functionality (service).
  • Proprietary (user) interfaces
  • The distributed application architecture
  • Integrated applications
  • Applications can share resources
  • A single instance of functionality (service) can
    be reused.
  • Common user interfaces

7
Introduction to Distributed Systems
  • Evolution of paradigms
  • Client-server: Socket API, remote method
    invocation
  • Distributed objects
  • Object broker: CORBA
  • Network service: Jini
  • Object space: JavaSpaces
  • Mobile agents
  • Message-oriented middleware (MOM): Java Message
    Service
  • Collaborative applications

8
Introduction to Distributed Systems
  • Cooperative distributed computing projects
  • Cooperative distributed computing projects (also
    called distributed computing in some literature):
    projects that parcel out large-scale computing
    to workstations, often making use of surplus CPU
    cycles.
  • Example: the SETI@home project, which scans data
    retrieved by a radio telescope to search for
    radio signals from another world.
  • Why distributed computing?
  • Economics: distributed systems allow the pooling
    of resources, including CPU cycles, data storage,
    input/output devices, and services.
  • Reliability: a distributed system allows
    replication of resources and/or services, thus
    reducing service outages due to failures.
  • The Internet has become a universal platform for
    distributed computing

9
Introduction to Distributed Systems
  • The Weaknesses and Strengths of Distributed
    Computing
  • In any form of computing, there is always a
    tradeoff in advantages and disadvantages
  • Some of the reasons for the popularity of
    distributed computing
  • The affordability of computers and availability
    of network access
  • Resource sharing
  • Scalability
  • Fault Tolerance
  • The disadvantages of distributed computing
  • Multiple points of failure: the failure of one
    or more participating computers, or one or more
    network links, can spell trouble.
  • Security concerns: in a distributed system, there
    are more opportunities for unauthorized attack.

10
Introduction to Distributed Systems
  • The Architecture of Distributed Applications

11
Introduction to Distributed Systems
  • Network standards and protocols
  • On public networks such as the Internet, it is
    necessary for a common set of rules to be
    specified for the exchange of data.
  • Such rules, called protocols, specify such
    matters as the formatting and semantics of data,
    flow control, and error correction.
  • Software can share data over the network using
    network software which supports a common set of
    protocols.
  • Protocols
  • In the context of communications, a protocol is a
    set of rules that must be observed by the
    participants.
  • In communications involving computers, protocols
    must be formally defined and precisely
    implemented. For each protocol, there must be
    rules that specify the following:
  • How is the data exchanged encoded?
  • How are events (sending, receiving) synchronized
    so that the participants can send and receive in
    a coordinated order?
  • The specification of a protocol does not dictate
    how the rules are to be implemented.
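
As a rough illustration of what a formally defined encoding rule can look like, the sketch below invents a toy protocol (it is not any real standard): every message is UTF-8 text preceded by a 4-byte length in network byte order. The function names and the framing rule are assumptions made for this example only.

```python
import struct

# Assumed rule for this toy protocol: a 4-byte unsigned length in
# network (big-endian) byte order, followed by the UTF-8 payload.
HEADER = struct.Struct("!I")


def encode_message(text: str) -> bytes:
    """Encode text as a length header followed by the UTF-8 payload."""
    payload = text.encode("utf-8")
    return HEADER.pack(len(payload)) + payload


def decode_messages(stream: bytes) -> list:
    """Split a byte stream back into the messages it contains, in order."""
    messages = []
    offset = 0
    while offset < len(stream):
        (length,) = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        messages.append(stream[offset:offset + length].decode("utf-8"))
        offset += length
    return messages


wire = encode_message("hello") + encode_message("world!")
print(decode_messages(wire))
```

Because both sides agree on the header size and byte order, the receiver can tell where one message ends and the next begins, whatever language or library each side is written in.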

12
Introduction to Distributed Systems
  • The network architecture
  • Network hardware transfers electronic signals,
    which represent a bit stream, between two
    devices.
  • Modern day network applications require an
    application programming interface (API) which
    masks the underlying complexities of data
    transmission.
  • A layered network architecture allows the
    functionalities needed to mask the complexities
    to be provided incrementally, layer by layer.
  • Actual implementation of the functionalities may
    not be clearly divided by layer.

13
Introduction to Distributed Systems
  • The OSI seven-layer network architecture

14
Introduction to Distributed Systems
  • Network Architecture
  • The division of the layers is conceptual: the
    implementation of the functionalities need not be
    clearly divided as such in the hardware and
    software that implement the architecture.
  • The conceptual division serves at least two
    useful purposes:
  • Systematic specification of protocols: it allows
    protocols to be specified systematically.
  • Conceptual data flow: it allows programs to be
    written in terms of logical data flow.

15
Introduction to Distributed Systems
  • The TCP/IP Protocol Suite
  • The Transmission Control Protocol/Internet
    Protocol suite is a set of network protocols
    which supports a four-layer network architecture.
  • It is currently the protocol suite employed on
    the Internet.

16
Introduction to Distributed Systems
  • The TCP/IP Protocol Suite -2
  • The Internet layer implements the Internet
    Protocol, which provides the functionalities for
    allowing data to be transmitted between any two
    hosts on the Internet.
  • The Transport layer delivers the transmitted data
    to a specific process running on an Internet
    host.
  • The Application layer supports the programming
    interface used for building a program.

17
Introduction to Distributed Systems
  • Network Resources
  • Network resources are resources available to the
    participants of a distributed computing
    community.
  • Network resources include hardware such as
    computers and equipment, and software such as
    processes, email mailboxes, files, web documents.
  • An important class of network resources is
    network services such as the World Wide Web and
    file transfer (FTP), which are provided by
    specific processes running on computers.
  • One of the key challenges in distributed
    computing is the unique identification of
    resources available on the network, such as
    e-mail mailboxes, and web documents.
  • Addressing an Internet Host
  • Addressing a process running on a host
  • Email Addresses
  • Addressing web contents: URLs

18
Introduction to Distributed Systems
The Internet Topology
19
Introduction to Distributed Systems
  • The Internet Topology
  • The Internet consists of a hierarchy of
    networks, interconnected via a network backbone.
  • Each network has a unique network address.
  • Computers, or hosts, are connected to a network.
    Each host has a unique ID within its network.
  • Each process running on a host is associated with
    zero or more ports. A port is a logical entity
    for data transmission.

20
Introduction to Distributed Systems
  • The Internet addressing scheme
  • In IP version 4, each address is 32 bits long.
  • The address space accommodates 2^32 (about 4.3
    billion) addresses in total.
  • Addresses are divided into 5 classes (A through
    E).

21
Introduction to Distributed Systems
  • The Internet addressing scheme - 2

22
Introduction to Distributed Systems
  • Example
  • Suppose the dotted-decimal notation for a
    particular Internet address is 129.65.24.50. The
    32-bit binary expansion of the notation is as
    follows:
  • 10000001 01000001 00011000 00110010
  • Since the leading bit sequence is 10, the
    address is a Class B address. Within the class,
    the network portion is identified by the
    remaining bits in the first two bytes, that is,
    00000101000001, and the host portion is the
    values in the last two bytes, or
    0001100000110010. For convenience, the binary
    prefix for class identification is often included
    as part of the network portion of the address, so
    that we would say that this particular address is
    at network 129.65 and then at host address 24.50
    on that network.

23
Introduction to Distributed Systems
  • Another example
  • Given the address 224.0.0.1, one can expand it as
    follows:
  • 11100000 00000000 00000000 00000001
  • The binary prefix of 1110 signifies that this is
    a class D, or multicast, address. Data packets
    sent to this address should therefore be
    delivered to the multicast group
    0000000000000000000000000001.
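
The two worked examples can be checked mechanically. The following is a minimal sketch, assuming the classful scheme described on these slides; the helper names `to_binary` and `address_class` are made up for this illustration.

```python
def to_binary(address: str) -> str:
    """Return the 32-bit binary expansion of a dotted-decimal IPv4 address."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-decimal IPv4 address: {address!r}")
    return "".join(f"{o:08b}" for o in octets)


def address_class(address: str) -> str:
    """Classify an address by its leading bits (classes A through E)."""
    bits = to_binary(address)
    for prefix, name in (("0", "A"), ("10", "B"), ("110", "C"), ("1110", "D")):
        if bits.startswith(prefix):
            return name
    return "E"  # the only remaining prefix is 1111


for addr in ("129.65.24.50", "224.0.0.1"):
    bits = to_binary(addr)
    print(addr, bits, "class", address_class(addr))

# For the class B example: 14 network bits after the "10" prefix,
# then 16 host bits.
bits = to_binary("129.65.24.50")
print("network:", bits[2:16], "host:", bits[16:])
```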

24
Introduction to Distributed Systems
  • The Internet addressing scheme - 3
  • For human readability, Internet addresses are
    written in a dotted decimal notation
    nnn.nnn.nnn.nnn, where each nnn group is a
    decimal value in the range of 0 through 255.
  • Internet host table (found in the /etc/hosts file):
  • 127.0.0.1 localhost
  • 129.65.242.5 falcon.csc.calpoly.edu falcon loghost
  • 129.65.241.9 falcon-srv.csc.calpoly.edu falcon-srv
  • 129.65.242.4 hornet.csc.calpoly.edu hornet
  • 129.65.241.8 hornet-srv.csc.calpoly.edu hornet-srv
  • 129.65.54.9 onion.csc.calpoly.edu onion
  • 129.65.241.3 hercules.csc.calpoly.edu hercules
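
A host table of this kind is just a list of lines, each holding an address followed by one or more names. The sketch below shows one way such text could be turned into a lookup table. `parse_hosts` is a hypothetical helper, and the rule that the first entry for a name wins is an assumption of this example, not a statement about any particular resolver.

```python
def parse_hosts(text: str) -> dict:
    """Map every host name and alias in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, address)  # assumption: first entry wins
    return table


# Three entries copied from the host table on this slide.
SAMPLE = """\
127.0.0.1     localhost
129.65.242.5  falcon.csc.calpoly.edu falcon loghost
129.65.242.4  hornet.csc.calpoly.edu hornet
"""

hosts = parse_hosts(SAMPLE)
print(hosts["falcon"])
print(hosts["hornet.csc.calpoly.edu"])
```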

25
Introduction to Distributed Systems
  • IP version 6 Addressing Scheme
  • There are three types of addresses:
  • Unicast: an identifier for a single interface.
  • Anycast: an identifier for a set of interfaces
    (typically belonging to different nodes). A
    packet sent to an anycast address is delivered
    to one of the interfaces identified by that
    address.
  • Multicast: an identifier for a set of interfaces
    (typically belonging to different nodes). A
    packet sent to a multicast address is delivered
    to all interfaces identified by that address.
  • The Domain Name System (DNS)
  • For user friendliness, each Internet address is
    mapped to a symbolic name, using the DNS, in the
    format of
  • <computer-name>.<subdomain hierarchy>.<organization>.<sector name>.<country code>
  • e.g., www.csc.calpoly.edu.us

26
Introduction to Distributed Systems
27
Introduction to Distributed Systems
  • The Domain Name System
  • For network applications, a domain name must be
    mapped to its corresponding Internet address.
  • Processes known as domain name system servers
    provide the mapping service, based on a
    distributed database of the mapping scheme.
  • The mapping service is offered by thousands of
    DNS servers on the Internet, each responsible for
    a portion of the name space, called a zone. A
    server that has access to the DNS information
    (zone file) for a zone is said to have authority
    for that zone.
  • Top-level domain names
  • .com: for commercial entities; anyone in the
    world can register.
  • .net: originally designated for organizations
    directly involved in Internet operations. It is
    increasingly being used by businesses when the
    desired name under "com" is already registered by
    another organization. Today anyone can register a
    name in the net domain.
  • .org: for miscellaneous organizations, including
    non-profits.
  • .edu: for four-year accredited institutions of
    higher learning.
  • .gov: for US federal government entities.
  • .mil: for the US military.
  • Country codes: for individual countries, based on
    the codes of the International Organization for
    Standardization (ISO). For example, ca for Canada
    and jp for Japan.

28
Introduction to Distributed Systems
  • Domain Name Hierarchy

29
Introduction to Distributed Systems
  • Name lookup and resolution
  • If a domain name is used to address a host, its
    corresponding IP address must be obtained for the
    lower-layer network software.
  • The mapping, or name resolution, must be
    maintained in some registry.
  • For runtime name resolution, a network service is
    needed; a protocol must be defined for the naming
    scheme and for the service.
  • Examples:
  • The DNS service supports domain name lookup.
  • The Java RMI registry supports RMI object lookup.
  • JNDI (the Java Naming and Directory Interface) is
    a Java API for looking up naming and directory
    services.
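
From a program's point of view, name resolution is usually a single library call that hands the name to the system resolver. Here is a minimal sketch using Python's standard `socket.getaddrinfo`; the wrapper name `resolve_ipv4` is made up for this example.

```python
import socket


def resolve_ipv4(host: str) -> list:
    """Return the distinct IPv4 addresses the system resolver reports for host."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # sockaddr is (address, port) for AF_INET
        if ip not in addresses:
            addresses.append(ip)
    return addresses


# A numeric address needs no lookup and is returned unchanged.
print(resolve_ipv4("127.0.0.1"))
```

Passing a symbolic name instead, for example `resolve_ipv4("localhost")`, sends the request through whatever the host is configured to use (a hosts file, DNS, or both), so the result depends on the machine it runs on.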

30
Introduction to Distributed Systems
  • Addressing a process running on a host: logical
    ports

31
Introduction to Distributed Systems
  • Well Known Ports
  • Each Internet host has 2^16 (65,536) logical
    ports. Port 0 is reserved, so each usable port is
    identified by a number between 1 and 65535, and
    can be allocated to a particular process.
  • Port numbers between 1 and 1023 are reserved for
    processes which provide well-known services such
    as finger, FTP, HTTP, and email.
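
The port ranges can be illustrated with a few lines of socket code. This is a sketch only: `is_well_known` and `open_listener` are hypothetical helpers, and binding to port 0 (which asks the operating system to pick a free port) is one common way to avoid hard-coding a number.

```python
import socket

WELL_KNOWN_MAX = 1023   # upper end of the range reserved for well-known services
PORT_MAX = 65535        # largest value that fits in a 16-bit port number


def is_well_known(port: int) -> bool:
    """True if the port number is at or below the well-known range limit."""
    if not 0 <= port <= PORT_MAX:
        raise ValueError(f"port out of range: {port}")
    return port <= WELL_KNOWN_MAX


def open_listener(port: int = 0) -> socket.socket:
    """Bind a TCP listener on the loopback interface.

    Port 0 asks the operating system to choose any free port.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", port))
    server.listen()
    return server


with open_listener() as server:
    chosen = server.getsockname()[1]
    print(chosen, is_well_known(chosen))
```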

32
Introduction to Distributed Systems
  • Choosing a port to run your program
  • For our programming exercises, when a port is
    needed, choose a random number above the
    well-known ports: 1,024 to 65,535.
  • If you are providing a network service for the
    community, then arrange to have a port assigned
    to and reserved for your service.
  • The Uniform Resource Identifier (URI)
  • Resources to be shared on a network need to be
    uniquely identifiable.
  • On the Internet, a URI is a character string
    which allows a resource to be located.
  • There are two types of URIs:
  • URL (Uniform Resource Locator): points to a
    specific resource at a specific location.
  • URN (Uniform Resource Name): points to a specific
    resource at a nonspecific location.

33
Introduction to Distributed Systems
  • A URL has the format of
  • protocol://host address:port/directory
    path/file name#section

34
Introduction to Distributed Systems
  • More on URLs
  • The path in a URL is relative to the document
    root of the server. On the CSL systems, a user's
    document root is /www.
  • A URL may appear in a document in a relative
    form:
  • <a href="another.html">
  • and the actual URL referred to will be
    another.html preceded by the protocol, hostname,
    and directory path of the document.
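
The parts of a URL, and the way a relative reference is completed from the containing document's URL, can be seen with Python's standard `urllib.parse` module. The URL below is hypothetical and chosen only to exercise every part of the format.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical URL, used only to show the parts named in the format.
url = "http://www.example.com:8080/docs/guide/index.html#intro"

parts = urlsplit(url)
print(parts.scheme)    # protocol
print(parts.hostname)  # host address
print(parts.port)      # port
print(parts.path)      # directory path and file name
print(parts.fragment)  # section

# A relative reference is resolved against the URL of the document
# that contains it: protocol, host, and directory path are inherited.
print(urljoin(url, "another.html"))
```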

35
Introduction to Distributed Systems
  • Design Issues in Distributed Systems
  • Transparency: the most important issue in truly
    distributed systems is to make a group of
    machines appear as if it were a single old
    timesharing system.
  • Different types of transparency
  • Location transparency: users cannot tell where
    the resources are located (hardware, software
    resources, CPU, printers, files, databases, etc.).
  • Migration transparency: resources must be free
    to move from one machine to another without
    changing their names, e.g. moving the mount
    points of remote file systems, such as /usr/dist
    on the Sun cluster.
  • Replication transparency: users can't tell how
    many copies exist. The system may make multiple
    copies for reliability (a disk failure) or
    improved performance (heavily used files). As
    long as the users don't observe anomalous
    behavior (coherency), it should not matter.

36
Introduction to Distributed Systems
  • Concurrency transparency: multiple users can
    share resources automatically.
  • Multiple readers: OK
  • Multiple writers: provide automatic mechanisms
    to serialize the writes and maintain correctness.
  • Parallelism transparency: activities may happen
    in parallel without the users knowing about it.
    Hard to achieve. Advanced users may want to
    exploit the presence of multiple processors
    themselves, because the state of the art is not
    close to achieving this automatically. The end
    is not in sight!
  • Sometimes users don't want total transparency.
  • Use a special printer
  • Use a special hardware accelerator attached to a
    particular machine.
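
The multiple-writers case under concurrency transparency can be illustrated on a single machine with threads and a lock. This is only an analogy, assuming a shared in-memory counter as the resource; a real distributed system would need a mechanism that works across machines. The class and function names are made up for this sketch.

```python
import threading


class SharedCounter:
    """A shared resource whose writers are serialized by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        with self._lock:  # only one writer at a time may update the value
            self.value += 1


def run_writers(writers: int, updates: int) -> int:
    """Start several writer threads and return the final counter value."""
    counter = SharedCounter()

    def work() -> None:
        for _ in range(updates):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run_writers(4, 1000))
```

The callers never see the lock: they simply call `increment`, which is the sense in which the serialization is automatic.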

37
Introduction to Distributed Systems
  • Reliability
  • One machine goes down -> another one performs the
    computation.
  • User never sees the difference, except perhaps in
    the performance level.
  • E.g. 5 file servers that have duplicate data.
  • Probability of any one failing: 0.05.
  • Probability of all of them failing simultaneously,
    assuming independent failures: 0.05^5, about
    0.0000003, which is practically negligible.
    (Logical OR of the individuals)
  • In practice, distributed systems depend on
    several pieces all working simultaneously for the
    system to work. (Logical AND of the components)
    "A distributed system is one on which I cannot
    get any work done because some machine I have
    never heard of has crashed." (Lamport)
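
The difference between the logical OR and the logical AND cases is easy to quantify. The sketch below assumes that failures are independent and that every machine fails with the same probability; both function names are made up for this example.

```python
def p_all_fail(p_fail: float, replicas: int) -> float:
    """Logical OR: the service is lost only if every replica fails."""
    return p_fail ** replicas


def p_any_fail(p_fail: float, components: int) -> float:
    """Logical AND: the service is lost if at least one component fails."""
    return 1 - (1 - p_fail) ** components


# Five machines, each failing with probability 0.05.
print(f"replicated service lost: {p_all_fail(0.05, 5):.3e}")
print(f"chained service lost:    {p_any_fail(0.05, 5):.3f}")
```

With five replicas the chance of losing the service is roughly 3 in 10 million, whereas a service that needs all five machines at once is lost more than one time in five.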

38
Introduction to Distributed Systems
  • Reliability has several facets
  • Availability
  • Fraction of the time the system is available for
    use.
  • Use as few components that must work
    simultaneously as possible (reduce the logical
    AND).
  • Allow redundancy (increase the logical OR).
    Replicate key pieces of hardware and software.
  • However, one has to worry about consistency as
    the degree of redundancy increases. Tradeoff.
  • Security
  • Also a key issue in reliability
  • Easier to authenticate in centralized systems:
    use a password and OK after that.
  • Distributed systems: messages pass between
    machines. How do you authenticate? Anybody can
    put any kind of message on the network.

39
Introduction to Distributed Systems
  • Fault tolerance
  • How easily and transparently does the system
    recover from a failure of some kind? E.g. a
    machine goes down: what happens to the process
    that was running? Can it be restarted in some
    other place exactly at the point where the
    original process left off?
  • Important in business/banking systems.
  • Performance
  • An important aspect of distributed systems
  • Many different metrics can be used
  • response time/turnaround time
  • system utilization
  • network capacity utilization
  • Performance measurements depend a great deal on
    the type of situation, e.g. a large number of
    compute-bound jobs with little or no I/O vs.
    large database applications.

40
Introduction to Distributed Systems
  • Granularity of computation
  • Fine-grained: e.g. simple operations that can be
    done with a few instructions. Lots of
    interaction, coordination, I/O, etc. Distributing
    them would incur too much overhead.
  • Coarse-grained: long computation times; little
    I/O, coordination, or interaction. Better suited
    for distribution.
  • Scalability
  • Designed for 100s of CPUs. How will it work for
    100,000 CPUs?
  • E.g. French PTT system
  • What principles to use?
  • Avoid centralized components, e.g. single mail
    servers, single file servers.
  • Avoid centralized tables, databases, etc., e.g.
    a telephone directory.
  • Avoid centralized algorithms, e.g. an algorithm
    that first collects information about the whole
    system before computing an optimal route to send
    a message.

41
Introduction to Distributed Systems
  • Characteristics of decentralized algorithms
  • Lack of complete information about the whole
    system
  • Make decisions based on local information
  • If one machine is down, the algorithm should
    still work.
  • No assumption about a global clock
  • General Discussion
  • Distributed: to divide the computation among
    several what?
  • Processors/nodes
  • Processor: CPU (including cache, etc.)
  • Node: single processor or multiprocessor, memory,
    I/O, possibly a network interface
  • Communication: the work has to be divided and
    distributed, so communication is central.
  • Bus (processor) or network (node)
  • The parameters: CPU, memory, I/O, network,
    system software

42
Introduction to Distributed Systems
  • Many types of Systems
  • The differences are difficult to clearly state.
  • Some believe that it is a continuum.
  • Centralized: single system
  • Decentralized: multiple systems, but no
    coordination
  • Distributed: multiple systems with coordination
  • Homogeneous: all systems are the same or similar
  • Heterogeneous: dissimilar nodes in the system
  • Server: a system providing some services, e.g.
    file systems; usually more powerful and complex
    hardware
  • Client: a system with minimal resources; depends
    on servers to get tasks accomplished.

43
Introduction to Distributed Systems
  • Networked System
  • High degree of autonomy of machines
  • Loosely-coupled hardware and loosely coupled
    software
  • Machines run their own OS
  • May have their own local disk
  • Operations have explicit names of the machines
  • rlogin Eagle
  • May have a file server but the view from
    different machines is different.
  • True Distributed System
  • Tightly-coupled software on loosely coupled
    hardware
  • Create an illusion of a single system image
    (virtual uniprocessor)
  • Uniform view of file system, uniform protection
    mechanisms
  • Uniform communication schemes

44
Introduction to Distributed Systems
  • Multiprocessor Systems
  • Tightly coupled software on tightly coupled
    hardware
  • Typically single run queue
  • Shared (logically) memory
  • File system is like the centralized system
  • Cluster Systems
  • Parallel or distributed system
  • Consists of a collection of interconnected whole
    computers
  • Utilized as a single computing resource
  • Peer relationship between the nodes in a cluster
  • Nodes of a cluster do not maintain their internal
    anonymity

45
Introduction to Distributed Systems
Characteristic           Networked       Distributed     Multiprocessor   Cluster
Number of nodes          1000s           1000s           1000s            10s
Performance metric       Response time   Response time   Turnaround time  Turnaround time
Virtual processor view   No              Yes             Yes              Yes
Node individualization   Yes             Yes             No               No
Operating systems        Heterogeneous   Homogeneous     Homogeneous      Homogeneous
Copies of OS             N               N               1                N
Communication            Shared files    Messages        Shared memory    Messages
Network protocol         Required        Required        Not required     Not required
Run queue                No              No              Yes              No
Inter-node security      None            Required        None             Required

46
Introduction to Distributed Systems
  • Summary
  • We discussed the following topics
  • What is meant by distributed computing
  • Rationale for distributed systems
  • Centralised versus distributed systems
  • Basic concepts in data communication in
    distributed systems
  • Network architectures: the OSI model and the
    Internet model
  • Naming schemes for network resources
  • The three-layered architecture of distributed
    applications: presentation layer, application or
    business logic, and the service layer
  • Design issues in distributed systems