SHARCnet

1
Introduction
2
Outline
  • Definitions
  • Examples
  • Hardware concepts
  • Software concepts
  • Readings: Chapter 1
  • Acknowledgements: Grid notes from UCSD

3
Definition of a Distributed System (Tanenbaum
and van Steen)
  • A distributed system is a collection of independent
    computers that appears to its users as a single
    coherent system.
  • Equivalently, it is the piece of software that
    ensures that such a collection of computers appears
    to its users as a single coherent system.

4
Definition of a Distributed System (Coulouris)
  • A distributed system is one in which hardware and
    software components located at networked computers
    communicate and coordinate their actions only by
    passing messages.

5
Definition of a Distributed System (Lamport)
  • A distributed system is one in which I cannot get
    something done because a machine I've never heard
    of is down.

6
Primary Characteristics of a Distributed System
  • Multiple computers
  • Concurrent execution
  • Independent operation and failures
  • Communications
  • Ability to communicate
  • No tight synchronization
  • Relatively easy to expand or scale
  • Transparency

7
Example: A Typical Intranet (Coulouris)
8
Example: A Typical Portion of the Internet (Coulouris)
9
Example: Portable and Handheld Devices in a Distributed
System (Coulouris)
10
Motivation for Building Distributed Systems
  • Economics
  • Share resources
  • Relatively easy to expand or scale
  • Speed: a distributed system may have more total
    computing power than a mainframe.
  • Cost
  • Personalize environments
  • Location Independence
  • People and information are distributed
  • Expandability
  • Availability and Reliability
  • If a machine crashes, the system as a whole can
    survive.

11
Distributed Application Examples
  • Automated banking systems
  • Retail
  • Air-traffic control
  • The World Wide Web
  • Student Record System
  • Distributed Calendar
  • Gnutella and Napster
  • GAUL

12
Examples in More Detail
  • Air-Traffic Control
  • This is not an Internet application.
  • In many countries, airspace is divided into areas
    which in turn may be divided into sectors.
  • Each area is managed by a control center.
  • Control systems communicate with tower control
    and other control systems (to allow a plane to
    cross boundaries).
  • The planes and air-traffic control systems are
    distributed; a single centralized system is not
    feasible.

13
Examples in More Detail
  • World Wide Web
  • Shared resources: documents
  • Unique identification using URLs
  • Users interested in the documents are
    distributed.
  • The documents are also distributed.
  • Banking
  • Clients may access their accounts from ATM
    machines.
  • Multiple clients may attempt to access their
    accounts simultaneously.
  • Multiple copies of the account information allow
    quicker access.

14
Examples in More Detail
  • Retail
  • Stores are located near their customer base.
  • Point of Sale (POS) terminals are used for
    customer interactions, while mobile units are used
    for inventory control.
  • These units talk to a local processor which in
    turn may communicate with remote processors.

15
Examples in More Detail
  • Gnutella and Napster
  • What is shared here is files.
  • GAUL
  • What is shared includes disk space, an e-mail
    server, a web server and software.

16
Key Design Goals
  • Connectivity
  • Transparency
  • Reliability
  • Consistency
  • Security
  • Openness
  • Scalability

17
Connectivity
  • It should be easy for users to access remote
    resources and to share them with other users in a
    controlled fashion.
  • Resources that can be shared include printers,
    storage facilities, data, files, web pages, etc.
  • Why? Economics
  • Connecting users and resources makes
    collaboration and the exchange of information
    easier.
  • Just look at e-mail.

18
Transparency
  • A distributed system that is able to present
    itself to users and applications as if it were
    only a single computer system is said to be
    transparent.
  • Very difficult to make distributed systems
    completely transparent.
  • You may not want to, since transparency often
    comes at the cost of performance.

19
Transparency in a Distributed System
Transparency    Description
Access          Hide differences in data representation and how a resource is accessed
Location        Hide where a resource is located
Migration       Hide that a resource may move to another location
Relocation      Hide that a resource may be moved to another location while in use
Replication     Hide that a resource is replicated
Concurrency     Hide that a resource may be shared by several competitive users
Failure         Hide the failure and recovery of a resource
Persistence     Hide whether a (software) resource is in memory or on disk
Different forms of transparency in a distributed
system.
20
Degree of Transparency
  • The goal of full transparency is not always
    desirable.
  • Users may be located on different continents; the
    distribution is then apparent and not something you
    want to hide.
  • Completely hiding failures of networks and nodes
    is (theoretically and practically) impossible.
  • You cannot distinguish a slow computer from a
    failing one.
  • You can never be sure that a server actually
    performed an operation before a crash.
  • Full transparency comes at a cost in performance.
  • Keeping Web caches exactly up-to-date with the
    master copy
  • Immediately flushing write operations to disk for
    fault tolerance.

21
Openness
  • An open distributed system allows for interaction
    with services from other open systems, irrespective
    of the underlying environment.
  • Systems should conform to well-defined
    interfaces.
  • Systems should support portability of
    applications.
  • Systems should easily interoperate.
    Interoperability is characterized by the extent
    by which two implementations of systems or
    components from different manufacturers can
    co-exist and work together.
  • Example: in computer networks there are rules
    that govern the format, contents and meaning of
    messages sent and received.

22
Scalability
  • There are three dimensions to scalability
  • The number of users and processes (size
    scalability)
  • The maximum distance between nodes (geographical
    scalability)
  • The number of administrative domains
    (administrative scalability)

23
Techniques for Scaling
  • Partition data and computations across multiple
    machines
  • Move computations to clients (Java applets)
  • Decentralized naming services (DNS)
  • Decentralized information systems (WWW)
  • Make copies of data available at different
    machines
  • Replicated file servers (for fault tolerance)
  • Replicated databases
  • Mirrored web sites
  • Allow client processes to access local copies (a
    minimal cache sketch follows this list)
  • Web caches (browser/Web proxy)
  • File caching (at server and client)
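  • The caching bullets above can be made concrete in a
    few lines of code. The following is a minimal sketch,
    not any particular system's API: a client-side cache
    that serves a local copy while it is fresh and goes
    back to the remote resource otherwise (the fetch
    function and the 60-second lifetime are assumptions
    for the example).

    import time

    class TTLCache:
        """Minimal client-side cache: serve the local copy while it
        is fresh, otherwise fetch the remote resource again."""

        def __init__(self, fetch, ttl_seconds=60.0):
            self.fetch = fetch        # function that retrieves the remote copy
            self.ttl = ttl_seconds    # how long a local copy is trusted
            self._store = {}          # key -> (value, time it was fetched)

        def get(self, key):
            entry = self._store.get(key)
            if entry is not None and time.time() - entry[1] < self.ttl:
                return entry[0]       # fresh local copy: no remote access
            value = self.fetch(key)   # stale or missing: contact the origin
            self._store[key] = (value, time.time())
            return value

    # Usage (download_page is a hypothetical fetch function):
    #   cache = TTLCache(fetch=download_page, ttl_seconds=30)
    #   page = cache.get("http://example.org/index.html")

  • The ttl parameter is exactly the trade-off discussed on
    the next slide: a longer lifetime means fewer remote
    accesses but a greater chance of serving a stale copy.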

24
Scaling The problem
  • Applying scaling techniques is easy, except for
    the following
  • Having multiple copies (cached or replicated)
    leads to inconsistencies: modifying one copy makes
    that copy different from the rest.
  • Always keeping copies consistent requires global
    synchronization.
  • Global synchronization is expensive with respect
    to performance.
  • We have learned to tolerate some inconsistencies.

25
Challenges
  • Heterogeneity
  • Networks
  • Hardware
  • Operating systems
  • Programming languages

26
Challenges
  • Failure Handling
  • Partial failures
  • Can non-failed components continue operation?
  • Can the failed components easily recover?
  • Detecting failures
  • Recovery
  • Replication

27
Hardware Concepts
  • Multiprocessors
  • Multicomputers
  • Networks of computers

28
Multiprocessors and Multicomputers
Figure 1.6: Different basic organizations and memories in
distributed computer systems
29
Shared Memory
  • Coherent memory
  • Each CPU write is written through and reflected at
    the other processors immediately
  • Note the use of cache memory for efficiency
  • Limited to a small number of processors

30
Shared Memory
  • All processors share access to a common memory.
  • Each CPU write is written through and reflected at
    the other processors immediately
  • Scaling requires a memory hierarchy of some kind.
    Note the use of cache memory for efficiency.

31
Shared Memory
  • (a) A crossbar switch
  • (b) An omega switching network

32
Shared Memory
  • Shared memory is considered an efficient
    implementation of message passing.
  • Problem: cache consistency is difficult to
    maintain with a large number of processors, since
    the probability of an inconsistency between
    processors increases.
  • Problem: the bus can become a bottleneck.
  • Problem: switch technology requires a lot of
    hardware, which can be expensive.
  • Usually these systems have a relatively small
    number of processors.
  • Example applications: real-time entertainment
    applications, since they are sensitive to image
    quality and performance.
  • Examples: Silicon Graphics Challenge, Sequent
    Symmetry

33
Multicomputer Systems
  • A multicomputer system consists of a number of
    independent machines linked by an interconnection
    network.
  • Each computer executes its own program which may
    access its local memory and may send and receive
    messages over the network.
  • The nature of the interconnection network has
    been a major topic of research for both academia
    and industry.

34
Multicomputer Systems
  • Pipelined architecture
  • Pipelined program divided into a series of tasks
    that have to be completed one after the other.
  • Each task executed by a separate pipeline stage
  • Data streamed from stage to stage to form
    computation

35
Multicomputer Systems
  • Pipelined architecture
  • Computation consists of data streaming through
    pipeline stages

36
Multicomputer Systems
  • Take a list of integers greater than 1 and
    produce a list of primes
  • e.g. For input 2 3 4 5 6 7 8 9 10, output is
    2 3 5 7
  • A pipelined approach
  • Assume that the processors are labeled P2, P3, P4, ...
  • Processor Pi divides each input by i
  • If the input is not divisible by i, it is forwarded
  • The last processor forwards only primes
  • To find all the primes up to N, you only need
    filter stages for divisors up to sqrt(N) (see the
    sketch after this list)
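  • A minimal Python sketch of this pipeline is shown
    below. Each generator plays the role of one pipeline
    processor; it is a single-process illustration, so on
    a real multicomputer each stage would run on its own
    node and forward values as messages. One simplification
    versus the slide: a new filter stage is created only
    for each prime that emerges, rather than one stage per
    integer.

    def source(limit):
        """First stage: stream the candidate integers 2, 3, ..., limit."""
        yield from range(2, limit + 1)

    def filter_stage(stream, divisor):
        """One pipeline stage: forward only values not divisible by divisor."""
        for value in stream:
            if value % divisor != 0:
                yield value

    def pipelined_primes(limit):
        stream = source(limit)
        primes = []
        while True:
            try:
                prime = next(stream)   # the first value to emerge is prime
            except StopIteration:
                return primes
            primes.append(prime)
            stream = filter_stage(stream, prime)   # add a stage behind it

    print(pipelined_primes(10))   # [2, 3, 5, 7], matching the example above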

37
Multicomputer Systems
  • Other example interconnection networks
  • Grid
  • Hypercube

38
Using a Grid (or systolic array)
  • Problem: multiply two n x n matrices A = [aij] and
    B = [bij]. The product matrix will be R = [rij].
  • One solution uses an array with n x n cells.

39
Using a Grid (or systolic array)
  • Let A and B be the 4 x 4 matrices
    A = | a11 a12 a13 a14 |     B = | b11 b12 b13 b14 |
        | a21 a22 a23 a24 |         | b21 b22 b23 b24 |
        | a31 a32 a33 a34 |         | b31 b32 b33 b34 |
        | a41 a42 a43 a44 |         | b41 b42 b43 b44 |
  • The product of A and B is calculated as follows
    r11 = a11*b11 + a12*b21 + a13*b31 + a14*b41
    r12 = a11*b12 + a12*b22 + a13*b32 + a14*b42
    r21 = a21*b11 + a22*b21 + a23*b31 + a24*b41
    r22 = a21*b12 + a22*b22 + a23*b32 + a24*b42
    and so on for the remaining entries rij.

40
Using a Grid
[Diagram: a 4 x 4 grid of cells P11 through P44. The rows
of A (a11, a12, ...) stream in from the left and the
columns of B (b11, b21, ...) stream in from the top, each
row and column staggered by one beat.]
41
Using a Grid
  • Each cell holds a running sum rij, initialized to 0.
  • At each time step (beat) a cell multiplies the a and
    b values currently passing through it, adds the
    product to rij, and forwards the a value to the
    right and the b value downward, as in the simulation
    sketch below.
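  • Below is a small Python simulation of this update
    rule. It assumes the usual systolic-array arrangement
    sketched on the previous slide (rows of A entering
    from the left, columns of B from the top, both
    staggered by one beat per row or column); it is a
    sketch of the idea, not of any particular machine.

    def systolic_matmul(A, B):
        """Simulate an n x n grid of cells computing R = A * B."""
        n = len(A)
        R = [[0] * n for _ in range(n)]        # running sums, initialized to 0
        a_reg = [[0] * n for _ in range(n)]    # a value sitting in each cell
        b_reg = [[0] * n for _ in range(n)]    # b value sitting in each cell
        for t in range(3 * n - 2):             # beats needed for all values to pass
            # values move one cell to the right (a) and one cell down (b)
            for i in range(n):
                for j in range(n - 1, 0, -1):
                    a_reg[i][j] = a_reg[i][j - 1]
            for j in range(n):
                for i in range(n - 1, 0, -1):
                    b_reg[i][j] = b_reg[i - 1][j]
            # feed the staggered input streams at the left and top edges
            for i in range(n):
                a_reg[i][0] = A[i][t - i] if 0 <= t - i < n else 0
            for j in range(n):
                b_reg[0][j] = B[t - j][j] if 0 <= t - j < n else 0
            # every cell multiplies its current pair and accumulates
            for i in range(n):
                for j in range(n):
                    R[i][j] += a_reg[i][j] * b_reg[i][j]
        return R

    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # [[19, 22], [43, 50]]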

42
Using a Grid
  • [Diagram: the grid contents at beats 1 and 2]

43
Using a Grid
  • [Diagram: the grid contents at beats 3 and 4]

44
Using a Grid
  • [Diagram: the grid contents at beats 5 and 6]

45
Multicomputer Systems
  • Hypercube: why?
  • Let's say that you had a hypercube of 8 nodes.
  • Their addresses are 000, 001, 010, 011, 100, 101,
    110, 111.
  • Nodes whose addresses differ in exactly one bit are
    adjacent.
  • Let's say you wanted to route a message from 000
    to 111.
  • This is easily done in three hops.
  • You go from 000 to 001, then 001 to 011 and then
    011 to 111.
  • Routing is simple and fast (certainly simpler than
    on the Internet); see the sketch after this list.
  • BTW, the number of nodes is always a power of 2.
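  • A minimal sketch of this routing rule in Python:
    correct one differing address bit per hop (often
    called dimension-order or e-cube routing). The
    function name and the bit order are choices made for
    the example.

    def hypercube_route(src, dst, dims):
        """Return the nodes visited when routing src -> dst by
        fixing one differing address bit per hop."""
        path = [src]
        node = src
        for bit in range(dims):
            mask = 1 << bit
            if (node ^ dst) & mask:   # this address bit still differs
                node ^= mask          # hop across that dimension
                path.append(node)
        return path

    route = hypercube_route(0b000, 0b111, 3)
    print([format(n, "03b") for n in route])
    # ['000', '001', '011', '111'], the three hops described above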

46
Multicomputer Systems
[Diagram: a 3-dimensional hypercube with the eight nodes
labeled 000 through 111; nodes whose labels differ in a
single bit are joined by an edge.]
47
Multicomputer Systems
  • Hypercube
  • Pipelines and grids can be embedded into a
    hypercube system.
  • Example (a pipeline embedding; see the sketch after
    this list):
    000 -> 001 -> 011 -> 010 -> 110 -> 111 -> 101 -> 100
  • Example (a 2 x 4 grid embedding):
    000 - 001 - 011 - 010
     |     |     |     |
    100 - 101 - 111 - 110
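  • The pipeline ordering above is the 3-bit reflected
    Gray code: consecutive addresses differ in exactly one
    bit, so consecutive pipeline stages land on
    neighbouring hypercube nodes. A short sketch that
    generates this ordering (the function name is just for
    illustration):

    def gray_code(dims):
        """List all 2**dims addresses so that consecutive entries differ
        in exactly one bit (a pipeline embedding in the hypercube)."""
        if dims == 0:
            return [0]
        prev = gray_code(dims - 1)
        # reflect the shorter code and set the new top bit in the reflection
        return prev + [x | (1 << (dims - 1)) for x in reversed(prev)]

    print([format(x, "03b") for x in gray_code(3)])
    # ['000', '001', '011', '010', '110', '111', '101', '100']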

48
Multiprocessor Usage
  • Scientific and engineering applications often
    require loops over large vectors e.g., matrix
    elements or points in a grid or 3D mesh.
    Applications include
  • Computational fluid dynamics
  • Scheduling (airline)
  • Health and biological modeling
  • Economics and financial modelling (e.g., option
    pricing)

49
Multiprocessor Usage
  • It should be noted that people have been
    developing clusters of machines connected by
    Ethernet for parallel applications.
  • The first such cluster (developed by two
    researchers at NASA) had 16 Intel 486 machines
    connected by 10 Mb/s Ethernet.
  • This is known as the Beowulf approach to building a
    parallel computer, and the clusters are sometimes
    called Beowulf clusters.

50
Sharcnet
  • UWO has taken a leading role in North America in
    exploiting the concepts behind the Beowulf
    cluster.
  • High performance clusters: Beowulf on steroids
  • Powerful off-the-shelf computational elements
  • Advanced communications
  • Geographical separation (local use)
  • Clusters connected by emerging optical
    communications
  • This is referred to as the Shared Hierarchical
    Academic Research Computing Network, or SHARCnet

51
Sharcnet
  • One cluster is called Great White
  • Processors
  • 4 Alpha processors at 833 MHz (4p-SMP)
  • 4 GB of memory
  • 38 SMPs, a total of 152 processors
  • Communications
  • 1 Gb/s Ethernet
  • 1.6 Gb/s Quadrics interconnect
  • Ranked 183rd in the world (November 2001)
  • Fastest academic computer in Canada
  • 6th fastest academic computer in North America

52
Sharcnet
Great White (in Western Science Building)
53
Sharcnet
  • Extend the Beowulf approach to clusters of high
    performance clusters
  • Connect clusters: clusters of clusters
  • Build on emerging optical communications
  • The initial configuration used optical equipment
    from the telecommunications industry
  • Collectively a supercomputer!

54
Sharcnet
Clusters across Universities (initial cluster)
55
Sharcnet
  • In 2004, UWO received an investment of 56 million
    dollars from the government and private industry
    (HP) to expand Sharcnet.
  • With the new capabilities, SHARCnet could be among
    the top 100 to 150 supercomputers.
  • It will be the fastest supercomputer of its kind,
    i.e., a distributed system whose nodes are clusters.

56
Sharcnet
57
Sharcnet
  • Applications running on Sharcnet come from all
    sorts of domains including
  • Chemistry
  • Bioinformatics
  • Economics
  • Astrophysics
  • Material Science and Engineering

58
Networks of Computers
  • High degree of node heterogeneity
  • Nodes include PCs, workstations, multimedia
    workstations, palmtops, laptops
  • High degree of network heterogeneity
  • This includes local-area Ethernet, ATM and
    wireless connections.
  • A distributed system should try to hide these
    differences.
  • In this course, the focus is really on networks of
    computers.

59
Software Concepts
  • An overview of three kinds of system software:
  • DOS (Distributed Operating System): a tightly-coupled
    operating system for multiprocessors and homogeneous
    multicomputers; its main goal is to hide and manage
    hardware resources.
  • NOS (Network Operating System): a loosely-coupled
    operating system for heterogeneous multicomputers
    (LAN and WAN); its main goal is to offer local
    services to remote clients.
  • Middleware: an additional layer on top of a NOS
    implementing general-purpose services; its main goal
    is to provide distribution transparency.

60
Distributed Operating System
  • OS on each computer knows about the other
    computers
  • OS on different computers is generally the same
  • Services are generally (transparently)
    distributed across computers.

61
Distributed Operating System
  • This is harder to implement than a traditional
    operating system. Why?
  • Memory is not shared
  • No simple global communication
  • No simple systemwide synchronization mechanisms
  • May require that OS maintain global memory map in
    software.
  • No central point where resource allocation
    decisions can be made.
  • Only a few true multicomputer operating systems
    exist.

62
Network Operating System
  • Each computer has its own operating system with
    networking facilities
  • Computers work independently, i.e., they may even
    have different operating systems
  • Services are tied to individual nodes (ftp,
    telnet, www)
  • Highly file oriented

63
Middleware
  • OS on each computer need not know about the other
    computers
  • OS on different computers may be different
  • Services are generally (transparently)
    distributed across computers.

64
Middleware and Openness
  • In an open middleware-based distributed system,
    the protocols used by each middleware layer
    should be the same, as well as the interfaces
    they offer to applications.

65
Middleware Services
  • Communication Services
  • Hide primitive socket programming (see the sketch
    after this list)
  • Data management in a distributed system
  • Naming services
  • Directory services (e.g., LDAP, search engines)
  • Location services for tracking mobile objects
  • Persistent storage facilities
  • Data caching and replication
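  • As an illustration of hiding socket programming, here
    is a sketch using Python's standard xmlrpc module as a
    stand-in for a communication middleware; the host,
    port and the balance function are made up for the
    example.

    # Server side: expose an ordinary function as a remote service.
    from xmlrpc.server import SimpleXMLRPCServer

    def balance(account_id):
        # placeholder lookup; a real service would consult a database
        return {"acct-001": 125.40}.get(account_id, 0.0)

    server = SimpleXMLRPCServer(("localhost", 8000))
    server.register_function(balance, "balance")
    server.serve_forever()

  • The client then invokes the function as if it were
    local; the middleware takes care of connections,
    marshalling and message formats underneath.

    # Client side (run as a separate program).
    import xmlrpc.client

    proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
    print(proxy.balance("acct-001"))   # prints 125.4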

66
Middleware Services
  • Services giving applications control over when,
    where and how they access data.
  • Distributed transaction processing
  • Code migration
  • Services for securing processing and
    communication
  • Authentication and authorization services
  • Simple encryption services
  • Auditing services
  • There are varying levels of success in being able
    to provide these types of middleware services.

67
Summary
  • Distributed systems consist of autonomous
    computers that work together.
  • When properly designed, distributed systems can
    scale well with respect to the size of the
    underlying network.
  • Sometimes the lines are blurred between a
    distributed system and a system that can support
    parallel processing.