SHARCnet

1
Introduction
2
Outline
  • Definitions
  • Examples
  • Hardware concepts
  • Software concepts
  • Readings: Chapter 1
  • Acknowledgements: Grid notes from UCSD

3
Definition of a Distributed System (Tanenbaum
and van Steen)
  • A distributed system is a collection of independent
    computers that appears to its users as a single
    coherent system.
  • Equivalently, it is the piece of software that
    ensures that such a collection of computers appears
    to its users as a single coherent system.

4
Definition of a Distributed System (Coulouris)
  • A distributed system is one in which hardware and
    software components located at networked computers
    communicate and coordinate their actions only by
    passing messages.

5
Definition of a Distributed System (Lamport)
  • A distributed system is one in which I cannot get
    something done because a machine I've never heard
    of is down.

6
Primary Characteristics of a Distributed System
  • Multiple computers
  • Concurrent execution
  • Independent operation and failures
  • Communications
  • Ability to communicate
  • No tight synchronization
  • Relatively easy to expand or scale
  • Transparency

7
Example: A Typical Intranet (Coulouris)
8
Example: A Typical Portion of the Internet (Coulouris)
9
Example: Portable and Handheld Devices in a Distributed
System (Coulouris)
10
Motivation for Building Distributed Systems
  • Economics
  • Share resources
  • Relatively easy to expand or scale
  • Speed: a distributed system may have more total
    computing power than a mainframe.
  • Cost
  • Personalize environments
  • Location Independence
  • People and information are distributed
  • Expandability
  • Availability and Reliability
  • If a machine crashes, the system as a whole can
    survive.

11
Distributed Application Examples
  • Automated banking systems
  • Retail
  • Air-traffic control
  • The World Wide Web
  • Student Record System
  • Distributed Calendar
  • Gnutella and Napster
  • GAUL

12
Examples in More Detail
  • Air-Traffic Control
  • This is not an Internet application.
  • In many countries, airspace is divided into areas
    which in turn may be divided into sectors.
  • Each area is managed by a control center.
  • Control systems communicate with tower control
    and other control systems (to allow a plane to
    cross boundaries).
  • The planes and air-traffic control systems are
    distributed; a single centralized system is not
    feasible.

13
Examples in More Detail
  • World Wide Web
  • Shared resources: documents
  • Unique identification using URLs
  • Users interested in the documents are
    distributed.
  • The documents are also distributed.
  • Banking
  • Clients may access their accounts from ATM
    machines.
  • Multiple clients may attempt to access their
    accounts simultaneously.
  • Multiple copies of the account information allow
    quicker access.

14
Examples in More Detail
  • Retail
  • Stores are located near their customer base.
  • Point of Sale (POS) terminals are used for
    customer interactions, while mobile units are used
    for inventory control.
  • These units talk to a local processor which in
    turn may communicate with remote processors.

15
Examples in More Detail
  • Gnutella and Napster
  • What is shared here is files.
  • GAUL
  • What is shared includes disk space, an e-mail
    server, a web server and software.

16
Key Design Goals
  • Connectivity
  • Transparency
  • Reliability
  • Consistency
  • Security
  • Openness
  • Scalability

17
Connectivity
  • It should be easy for users to access remote
    resources and to share them with other users in a
    controlled fashion.
  • Resources that can be shared include printers,
    storage facilities, data, files, web pages, etc.
  • Why? Economics
  • Connecting users and resources makes
    collaboration and the exchange of information
    easier.
  • Just look at e-mail.

18
Transparency
  • A distributed system that is able to present
    itself to users and applications as if it were
    only a single computer system is said to be
    transparent.
  • Very difficult to make distributed systems
    completely transparent.
  • You may not want to, since transparency often
    comes at the cost of performance.

19
Transparency in a Distributed System
Transparency    Description
Access          Hide differences in data representation and how a resource is accessed
Location        Hide where a resource is located
Migration       Hide that a resource may move to another location
Relocation      Hide that a resource may be moved to another location while in use
Replication     Hide that a resource is replicated
Concurrency     Hide that a resource may be shared by several competitive users
Failure         Hide the failure and recovery of a resource
Persistence     Hide whether a (software) resource is in memory or on disk
Different forms of transparency in a distributed
system.
20
Degree of Transparency
  • The goal of full transparency is not always
    desirable.
  • Users may be located on different continents; the
    distribution is then apparent and not something you
    want to hide.
  • Completely hiding failures of networks and nodes
    is (theoretically and practically) impossible.
  • You cannot distinguish a slow computer from a
    failing one.
  • You can never be sure that a server actually
    performed an operation before a crash.
  • Full transparency comes at a cost in performance.
  • Keeping Web caches exactly up-to-date with the
    master copy
  • Immediately flushing write operations to disk for
    fault tolerance.

21
Openness
  • An open distributed system allows for interaction
    with services from other open systems, irrespective
    of the underlying environment.
  • Systems should conform to well-defined
    interfaces.
  • Systems should support portability of
    applications.
  • Systems should easily interoperate.
    Interoperability is characterized by the extent
    by which two implementations of systems or
    components from different manufacturers can
    co-exist and work together.
  • Example: in computer networks there are rules
    that govern the format, contents and meaning of
    messages sent and received.

22
Scalability
  • There are three dimensions to scalability
  • The number of users and processes (size
    scalability)
  • The maximum distance between nodes (geographical
    scalability)
  • The number of administrative domains
    (administrative scalability)

23
Techniques for Scaling
  • Partition data and computations across multiple
    machines
  • Move computations to clients (Java applets)
  • Decentralized naming services (DNS)
  • Decentralized information systems (WWW)
  • Make copies of data available at different
    machines
  • Replicated file servers (for fault tolerance)
  • Replicated databases
  • Mirrored web sites
  • Allow client processes to access local copies (a
    minimal cache sketch follows this list)
  • Web caches (browser/Web proxy)
  • File caching (at server and client)
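  • The caching bullets above can be made concrete in a
    few lines of code. The following is a minimal sketch,
    not any particular system's API: a client-side cache
    that serves a local copy while it is fresh and goes
    back to the remote resource otherwise (the fetch
    function and the 60-second lifetime are assumptions
    for the example).

    import time

    class TTLCache:
        """Minimal client-side cache: serve the local copy while it
        is fresh, otherwise fetch the remote resource again."""

        def __init__(self, fetch, ttl_seconds=60.0):
            self.fetch = fetch        # function that retrieves the remote copy
            self.ttl = ttl_seconds    # how long a local copy is trusted
            self._store = {}          # key -> (value, time it was fetched)

        def get(self, key):
            entry = self._store.get(key)
            if entry is not None and time.time() - entry[1] < self.ttl:
                return entry[0]       # fresh local copy: no remote access
            value = self.fetch(key)   # stale or missing: contact the origin
            self._store[key] = (value, time.time())
            return value

    # Usage (download_page is a hypothetical fetch function):
    #   cache = TTLCache(fetch=download_page, ttl_seconds=30)
    #   page = cache.get("http://example.org/index.html")

  • The ttl parameter is exactly the trade-off discussed on
    the next slide: a longer lifetime means fewer remote
    accesses but a greater chance of serving a stale copy.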

24
Scaling The problem
  • Applying scaling techniques is easy, except for
    the following
  • Having multiple copies (cached or replicated)
    leads to inconsistencies: modifying one copy makes
    that copy different from the rest.
  • Always keeping copies consistent requires global
    synchronization.
  • Global synchronization is expensive with respect
    to performance.
  • We have learned to tolerate some inconsistencies.

25
Challenges
  • Heterogeneity
  • Networks
  • Hardware
  • Operating systems
  • Programming languages

26
Challenges
  • Failure Handling
  • Partial failures
  • Can non-failed components continue operation?
  • Can the failed components easily recover?
  • Detecting failures
  • Recovery
  • Replication

27
Hardware Concepts
  • Multiprocessors
  • Multicomputers
  • Networks of computers

28
Multiprocessors and Multicomputers
Figure 1.6: Different basic organizations and memories in
distributed computer systems
29
Shared Memory
  • Coherent memory
  • Each CPU write is written through and reflected at
    the other processors immediately
  • Note the use of cache memory for efficiency
  • Limited to a small number of processors

30
Shared Memory
  • All processors share access to a common memory.
  • Each CPU write is written through and reflected at
    the other processors immediately
  • Scaling requires a memory hierarchy of some kind.
    Note the use of cache memory for efficiency.

31
Shared Memory
  • (a) A crossbar switch
  • (b) An omega switching network

32
Shared Memory
  • Shared memory is considered an efficient
    implementation of message passing.
  • Problem: cache consistency is difficult to
    maintain with a large number of processors, since
    the probability of an inconsistency between
    processors increases.
  • Problem: the bus can become a bottleneck.
  • Problem: switch technology requires a lot of
    hardware, which can be expensive.
  • Usually these systems have a relatively small
    number of processors.
  • Example applications: real-time entertainment
    applications, since they are sensitive to image
    quality and performance.
  • Examples: Silicon Graphics Challenge, Sequent
    Symmetry

33
Multicomputer Systems
  • A multicomputer system consists of a number of
    independent machines linked by an interconnection
    network.
  • Each computer executes its own program which may
    access its local memory and may send and receive
    messages over the network.
  • The nature of the interconnection network has
    been a major topic of research for both academia
    and industry.

34
Multicomputer Systems
  • Pipelined architecture
  • Pipelined program divided into a series of tasks
    that have to be completed one after the other.
  • Each task executed by a separate pipeline stage
  • Data streamed from stage to stage to form
    computation

35
Multicomputer Systems
  • Pipelined architecture
  • Computation consists of data streaming through
    pipeline stages

36
Multicomputer Systems
  • Take a list of integers greater than 1 and
    produce a list of primes
  • e.g. For input 2 3 4 5 6 7 8 9 10, output is
    2 3 5 7
  • A pipelined approach
  • Assume that the processors are labeled P2, P3, P4, ...
  • Processor Pi divides each input by i
  • If the input is not divisible by i, it is forwarded
  • The last processor forwards only primes
  • To find all the primes up to N, you only need
    filter stages for divisors up to sqrt(N) (see the
    sketch after this list)
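  • A minimal Python sketch of this pipeline is shown
    below. Each generator plays the role of one pipeline
    processor; it is a single-process illustration, so on
    a real multicomputer each stage would run on its own
    node and forward values as messages. One simplification
    versus the slide: a new filter stage is created only
    for each prime that emerges, rather than one stage per
    integer.

    def source(limit):
        """First stage: stream the candidate integers 2, 3, ..., limit."""
        yield from range(2, limit + 1)

    def filter_stage(stream, divisor):
        """One pipeline stage: forward only values not divisible by divisor."""
        for value in stream:
            if value % divisor != 0:
                yield value

    def pipelined_primes(limit):
        stream = source(limit)
        primes = []
        while True:
            try:
                prime = next(stream)   # the first value to emerge is prime
            except StopIteration:
                return primes
            primes.append(prime)
            stream = filter_stage(stream, prime)   # add a stage behind it

    print(pipelined_primes(10))   # [2, 3, 5, 7], matching the example above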

37
Multicomputer Systems
  • Other example interconnection networks
  • Grid
  • Hypercube

38
Using a Grid (or systolic array)
  • Problem: multiply two n x n matrices A = [aij] and
    B = [bij]. The product matrix will be R = [rij].
  • One solution uses an array with n x n cells.

39
Using a Grid (or systolic array)
  • Let A and B be the 4 x 4 matrices
    A = | a11 a12 a13 a14 |     B = | b11 b12 b13 b14 |
        | a21 a22 a23 a24 |         | b21 b22 b23 b24 |
        | a31 a32 a33 a34 |         | b31 b32 b33 b34 |
        | a41 a42 a43 a44 |         | b41 b42 b43 b44 |
  • The product of A and B is calculated as follows
    r11 = a11*b11 + a12*b21 + a13*b31 + a14*b41
    r12 = a11*b12 + a12*b22 + a13*b32 + a14*b42
    r21 = a21*b11 + a22*b21 + a23*b31 + a24*b41
    r22 = a21*b12 + a22*b22 + a23*b32 + a24*b42
    and so on for the remaining entries rij.

40
Using a Grid
[Diagram: a 4 x 4 grid of cells P11 through P44. The rows
of A (a11, a12, ...) stream in from the left and the
columns of B (b11, b21, ...) stream in from the top, each
row and column staggered by one beat.]
41
Using a Grid
  • Each cell holds a running sum rij, initialized to 0.
  • At each time step (beat) a cell multiplies the a and
    b values currently passing through it, adds the
    product to rij, and forwards the a value to the
    right and the b value downward, as in the simulation
    sketch below.
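  • Below is a small Python simulation of this update
    rule. It assumes the usual systolic-array arrangement
    sketched on the previous slide (rows of A entering
    from the left, columns of B from the top, both
    staggered by one beat per row or column); it is a
    sketch of the idea, not of any particular machine.

    def systolic_matmul(A, B):
        """Simulate an n x n grid of cells computing R = A * B."""
        n = len(A)
        R = [[0] * n for _ in range(n)]        # running sums, initialized to 0
        a_reg = [[0] * n for _ in range(n)]    # a value sitting in each cell
        b_reg = [[0] * n for _ in range(n)]    # b value sitting in each cell
        for t in range(3 * n - 2):             # beats needed for all values to pass
            # values move one cell to the right (a) and one cell down (b)
            for i in range(n):
                for j in range(n - 1, 0, -1):
                    a_reg[i][j] = a_reg[i][j - 1]
            for j in range(n):
                for i in range(n - 1, 0, -1):
                    b_reg[i][j] = b_reg[i - 1][j]
            # feed the staggered input streams at the left and top edges
            for i in range(n):
                a_reg[i][0] = A[i][t - i] if 0 <= t - i < n else 0
            for j in range(n):
                b_reg[0][j] = B[t - j][j] if 0 <= t - j < n else 0
            # every cell multiplies its current pair and accumulates
            for i in range(n):
                for j in range(n):
                    R[i][j] += a_reg[i][j] * b_reg[i][j]
        return R

    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # [[19, 22], [43, 50]]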

42
Using a Grid
  • [Diagram: the grid contents at beats 1 and 2]

43
Using a Grid
  • [Diagram: the grid contents at beats 3 and 4]

44
Using a Grid
  • [Diagram: the grid contents at beats 5 and 6]

45
Multicomputer Systems
  • Hypercube: why?
  • Let's say that you had a hypercube of 8 nodes.
  • Their addresses are 000, 001, 010, 011, 100, 101,
    110, 111.
  • Nodes whose addresses differ in exactly one bit are
    adjacent.
  • Let's say you wanted to route a message from 000
    to 111.
  • This is easily done in three hops.
  • You go from 000 to 001, then 001 to 011 and then
    011 to 111.
  • Routing is simple and fast (certainly simpler than
    on the Internet); see the sketch after this list.
  • BTW, the number of nodes is always a power of 2.
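  • A minimal sketch of this routing rule in Python:
    correct one differing address bit per hop (often
    called dimension-order or e-cube routing). The
    function name and the bit order are choices made for
    the example.

    def hypercube_route(src, dst, dims):
        """Return the nodes visited when routing src -> dst by
        fixing one differing address bit per hop."""
        path = [src]
        node = src
        for bit in range(dims):
            mask = 1 << bit
            if (node ^ dst) & mask:   # this address bit still differs
                node ^= mask          # hop across that dimension
                path.append(node)
        return path

    route = hypercube_route(0b000, 0b111, 3)
    print([format(n, "03b") for n in route])
    # ['000', '001', '011', '111'], the three hops described above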

46
Multicomputer Systems
[Diagram: a 3-dimensional hypercube with the eight nodes
labeled 000 through 111; nodes whose labels differ in a
single bit are joined by an edge.]
47
Multicomputer Systems
  • Hypercube
  • Pipelines and grids can be embedded into a
    hypercube system.
  • Example (a pipeline embedding; see the sketch after
    this list):
    000 -> 001 -> 011 -> 010 -> 110 -> 111 -> 101 -> 100
  • Example (a 2 x 4 grid embedding):
    000 - 001 - 011 - 010
     |     |     |     |
    100 - 101 - 111 - 110
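  • The pipeline ordering above is the 3-bit reflected
    Gray code: consecutive addresses differ in exactly one
    bit, so consecutive pipeline stages land on
    neighbouring hypercube nodes. A short sketch that
    generates this ordering (the function name is just for
    illustration):

    def gray_code(dims):
        """List all 2**dims addresses so that consecutive entries differ
        in exactly one bit (a pipeline embedding in the hypercube)."""
        if dims == 0:
            return [0]
        prev = gray_code(dims - 1)
        # reflect the shorter code and set the new top bit in the reflection
        return prev + [x | (1 << (dims - 1)) for x in reversed(prev)]

    print([format(x, "03b") for x in gray_code(3)])
    # ['000', '001', '011', '010', '110', '111', '101', '100']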

48
Multiprocessor Usage
  • Scientific and engineering applications often
    require loops over large vectors e.g., matrix
    elements or points in a grid or 3D mesh.
    Applications include
  • Computational fluid dynamics
  • Scheduling (airline)
  • Health and biological modeling
  • Economics and financial modelling (e.g., option
    pricing)

49
Multiprocessor Usage
  • It should be noted that people have been
    developing clusters of machines connected by
    Ethernet for parallel applications.
  • The first such cluster (developed by two
    researchers at NASA) had 16 Intel 486 machines
    connected by 10 Mb/s Ethernet.
  • This is known as the Beowulf approach to building a
    parallel computer, and the clusters are sometimes
    called Beowulf clusters.

50
Sharcnet
  • UWO has taken a leading role in North America in
    exploiting the concepts behind the Beowulf
    cluster.
  • High performance clusters: Beowulf on steroids
  • Powerful off-the-shelf computational elements
  • Advanced communications
  • Geographical separation (local use)
  • Clusters connected by emerging optical
    communications
  • This is referred to as the Shared Hierarchical
    Academic Research Computing Network, or SHARCnet

51
Sharcnet
  • One cluster is called Great White
  • Processors
  • 4 Alpha processors at 833 MHz (4p-SMP)
  • 4 GB of memory
  • 38 SMPs, a total of 152 processors
  • Communications
  • 1 Gb/s Ethernet
  • 1.6 Gb/s Quadrics interconnect
  • Ranked 183rd in the world (November 2001)
  • Fastest academic computer in Canada
  • 6th fastest academic computer in North America

52
Sharcnet
Great White (in Western Science Building)
53
Sharcnet
  • Extend the Beowulf approach to clusters of high
    performance clusters
  • Connect clusters: clusters of clusters
  • Build on emerging optical communications
  • The initial configuration used optical equipment
    from the telecommunications industry
  • Collectively a supercomputer!

54
Sharcnet
Clusters across Universities (initial cluster)
55
Sharcnet
  • In 2004, UWO received an investment of 56 million
    dollars from the government and private industry
    (HP) to expand Sharcnet.
  • With the new capabilities, SHARCnet could be among
    the top 100 to 150 supercomputers.
  • It will be the fastest supercomputer of its kind,
    i.e., a distributed system whose nodes are clusters.

56
Sharcnet
57
Sharcnet
  • Applications running on Sharcnet come from all
    sorts of domains including
  • Chemistry
  • Bioinformatics
  • Economics
  • Astrophysics
  • Material Science and Engineering

58
Networks of Computers
  • High degree of node heterogeneity
  • Nodes include PCs, workstations, multimedia
    workstations, palmtops, laptops
  • High degree of network heterogeneity
  • This includes local-area Ethernet, ATM and
    wireless connections.
  • A distributed system should try to hide these
    differences.
  • In this course, the focus is really on networks of
    computers.

59
Software Concepts
  • An overview of three kinds of system software:
  • DOS (Distributed Operating System): a tightly-coupled
    operating system for multiprocessors and homogeneous
    multicomputers; its main goal is to hide and manage
    hardware resources.
  • NOS (Network Operating System): a loosely-coupled
    operating system for heterogeneous multicomputers
    (LAN and WAN); its main goal is to offer local
    services to remote clients.
  • Middleware: an additional layer on top of a NOS
    implementing general-purpose services; its main goal
    is to provide distribution transparency.

60
Distributed Operating System
  • OS on each computer knows about the other
    computers
  • OS on different computers is generally the same
  • Services are generally (transparently)
    distributed across computers.

61
Distributed Operating System
  • This is harder to implement than a traditional
    operating system. Why?
  • Memory is not shared
  • No simple global communication
  • No simple systemwide synchronization mechanisms
  • May require that OS maintain global memory map in
    software.
  • No central point where resource allocation
    decisions can be made.
  • Only a few true multicomputer operating systems
    exist.

62
Network Operating System
  • Each computer has its own operating system with
    networking facilities
  • Computers work independently, i.e., they may even
    have different operating systems
  • Services are tied to individual nodes (ftp,
    telnet, www)
  • Highly file oriented

63
Middleware
  • OS on each computer need not know about the other
    computers
  • OS on different computers may be different
  • Services are generally (transparently)
    distributed across computers.

64
Middleware and Openness
  • In an open middleware-based distributed system,
    the protocols used by each middleware layer
    should be the same, as well as the interfaces
    they offer to applications.

65
Middleware Services
  • Communication Services
  • Hide primitive socket programming (see the sketch
    after this list)
  • Data management in a distributed system
  • Naming services
  • Directory services (e.g., LDAP, search engines)
  • Location services for tracking mobile objects
  • Persistent storage facilities
  • Data caching and replication
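  • As an illustration of hiding socket programming, here
    is a sketch using Python's standard xmlrpc module as a
    stand-in for a communication middleware; the host,
    port and the balance function are made up for the
    example.

    # Server side: expose an ordinary function as a remote service.
    from xmlrpc.server import SimpleXMLRPCServer

    def balance(account_id):
        # placeholder lookup; a real service would consult a database
        return {"acct-001": 125.40}.get(account_id, 0.0)

    server = SimpleXMLRPCServer(("localhost", 8000))
    server.register_function(balance, "balance")
    server.serve_forever()

  • The client then invokes the function as if it were
    local; the middleware takes care of connections,
    marshalling and message formats underneath.

    # Client side (run as a separate program).
    import xmlrpc.client

    proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
    print(proxy.balance("acct-001"))   # prints 125.4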

66
Middleware Services
  • Services giving applications control over when,
    where and how they access data.
  • Distributed transaction processing
  • Code migration
  • Services for securing processing and
    communication
  • Authentication and authorization services
  • Simple encryption services
  • Auditing services
  • There are varying levels of success in being able
    to provide these types of middleware services.

67
Summary
  • Distributed systems consist of autonomous
    computers that work together.
  • When properly designed, distributed systems can
    scale well with respect to the size of the
    underlying network.
  • Sometimes the lines are blurred between a
    distributed system and a system that can support
    parallel processing.