Processes and Threads
2
Problems with Scheduling
  • Priority systems are ad hoc at best
  • highest priority always wins
  • Fair share is implemented by adjusting priorities
    with a feedback loop
  • a complex mechanism
  • Priority inversion: high-priority jobs can be
    blocked behind low-priority jobs
  • Schedulers are complex and difficult to control
  • What we need:
  • proportional sharing
  • dynamic flexibility
  • simplicity

3
Tickets in Lottery Scheduling
  • Priority is determined by the number of tickets
    each process holds
  • The scheduler picks a winning ticket at random and
    gives its owner the resource
  • Tickets can be used for a wide variety of
    different resources (uniform) and are machine
    independent (abstract)

4
Performance Characteristics
  • If a client has probability p of winning, then the
    expected number of wins is np (n = number of
    lotteries)
  • Variance of the binomial distribution: np(1 - p)
  • Accuracy improves with √n (a worked example
    follows below)
  • need frequent lotteries
  • Big picture: mostly accurate, but short-term
    inaccuracies are possible
  • see Stride scheduling below
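As an illustration (the numbers are assumed, not from the slides): a client holding 25 of 100 tickets has p = 0.25, so over n = 100 lotteries it expects np = 25 wins with standard deviation √(np(1 - p)) ≈ 4.3; the relative error therefore shrinks roughly as 1/√n as more lotteries are held, which is why frequent lotteries are needed.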

5
Ticket Inflation
  • Make up your own tickets (print your own money)
  • Only works among mutually trusting clients
  • Presumably works best if inflation is temporary
  • Allows clients to adjust their priority
    dynamically with zero communication

6
Ticket Transfer
  • Basic idea: if you are blocked on someone else,
    give them your tickets
  • Example: client-server
  • the server has no tickets of its own
  • clients give the server all of their tickets during
    an RPC
  • the server's priority is the sum of the priorities
    of all of its active clients
  • the server can use lottery scheduling to give
    preferential service to high-priority clients
  • Very elegant solution to a long-standing problem

7
Trust Boundaries
  • A group contains mutually trusting clients
  • A unique currency is used inside a group
  • simplifies mini-lotteries (e.g., for a mutex)
    inside a group
  • supports fine-grain allocation decisions
  • An exchange rate is needed between groups
  • the effect of inflation can be localized to a
    group (see the sketch below)
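A minimal sketch (assumed Python; the function and the numbers are illustrative, not taken from the lottery-scheduling system) of how a group currency localizes inflation: tickets issued inside a group are valued relative to the base tickets that fund the group, so printing more internal tickets never increases the group's total claim on the resource.

```python
def to_base(tickets, group_funding, group_issued):
    """Convert group-currency tickets to base-currency tickets.

    group_funding: base tickets backing (funding) the group
    group_issued:  tickets the group has issued internally
    Inflation raises group_issued but not group_funding, so the
    group's aggregate share of the resource is unchanged.
    """
    return tickets * group_funding / group_issued

# A group funded with 100 base tickets has issued 200 internal tickets:
print(to_base(50, 100, 200))   # 25.0 base tickets
# After internal inflation to 400 issued tickets, the same 50 are worth less:
print(to_base(50, 100, 400))   # 12.5 base tickets
```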

8
Compensation tickets
  • What happens if a thread is I/O bound and blocks
    before its quantum expires?
  • the thread gets less than its share of the
    processor.
  • Basic idea: if you complete only a fraction f of
    the quantum, your tickets are inflated by 1/f until
    the next time you win
  • Example: if B on average uses 1/5 of a quantum,
    its tickets will be inflated 5x, so it will win 5
    times as often and get its correct share overall
  • What if B alternates between 1/5 and whole
    quantums?

9
Implementation
  • Frequent lotteries mean that lotteries must be
    efficient
  • a fast random number generator
  • fast selection of ticket based on random number
  • Ticket selection
  • straightforward algorithm: O(n) (sketched below)
  • tree-based implementation: O(log n)
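A minimal sketch (assumed Python, illustrative only) of the straightforward O(n) selection: draw a random winning ticket number and walk the client list, accumulating ticket counts until the winner is passed.

```python
import random

def lottery_pick(clients):
    """O(n) lottery draw; clients is a list of (name, tickets) pairs."""
    total = sum(tickets for _, tickets in clients)
    winner = random.randrange(total)      # winning ticket number
    running = 0
    for name, tickets in clients:
        running += tickets                # cumulative ticket count
        if winner < running:
            return name

# Example: A holds 75 tickets, B holds 25, so A should win about 75% of draws.
wins = {"A": 0, "B": 0}
for _ in range(10_000):
    wins[lottery_pick([("A", 75), ("B", 25)])] += 1
print(wins)
```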

10
Implementation Ticket Object
11
Currency Graph
12
Problems
  • Not as fair as we'd like
  • the mutex test comes out 1.8:1 instead of 2:1
  • possible starvation
  • the multimedia apps come out 1.92:1.50:1 instead
    of 3:2:1
  • possible jitter
  • Every queue is an implicit scheduling decision...
  • Every spinlock ignores priority...
  • Can we force it to be unfair? Is there a way to
    use compensation tickets to get more time, e.g.,
    quit early to get compensation tickets and then
    run for the full time next time?
  • What about kernel cycles? If a process uses a lot
    of cycles indirectly, such as through the
    ethernet driver, does it get higher priority
    implicitly? (probably)

13
Stride Scheduling
  • Basic idea: make a deterministic version to
    reduce short-term variability
  • Mark time virtually using passes as the unit
  • A process has a stride, which is the number of
    passes between executions. Strides are inversely
    proportional to the number of tickets, so high
    priority jobs have low strides and thus run
    often.
  • Very regular: a job with priority p will run
    every 1/p passes

14
Stride Scheduling (contd)
  • Algorithm (roughly): always pick the job with the
    lowest pass number, then update its pass number by
    adding its stride
  • Similar mechanism to compensation tickets: if a
    job uses only a fraction f of its quantum, advance
    its pass number by f × stride instead of the full
    stride
  • Overall result: far more accurate than lottery
    scheduling, and the error can be bounded
    absolutely instead of probabilistically (see the
    sketch below)
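A minimal sketch (assumed Python, not the original implementation) of the stride algorithm as described above: each job's stride is a large constant divided by its tickets, the job with the lowest pass value runs next, and a partial quantum advances the pass by only the fraction used.

```python
STRIDE1 = 1 << 20                 # large constant; stride = STRIDE1 / tickets

class Job:
    def __init__(self, name, tickets):
        self.name = name
        self.stride = STRIDE1 / tickets
        self.passval = 0.0        # virtual time of the job's next run

def schedule(jobs, slices):
    order = []
    for _ in range(slices):
        job = min(jobs, key=lambda j: j.passval)   # lowest pass runs next
        order.append(job.name)
        f = 1.0                   # fraction of the quantum actually used
        job.passval += f * job.stride              # partial use advances less
    return order

# A: 3 tickets, B: 2, C: 1 -> A runs three times as often as C.
print(schedule([Job("A", 3), Job("B", 2), Job("C", 1)], 12))
```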

15
Stride Scheduling Example
16
Distributed System
  • Distributed System (DS)
  • consists of a collection of autonomous computers
    linked by a computer network and equipped with
    distributed system software.
  • DS software
  • enables computers to coordinate their activities
    and to share the resources of the system, i.e.,
    hardware, software and data.
  • Users of a DS should perceive a single,
    integrated computing facility even though it may
    be implemented by many computers in different
    locations.

17
Characteristics of Distributed Systems
  • The following characteristics are primarily
    responsible for the usefulness of distributed
    systems
  • Resource Sharing
  • Openness
  • Concurrency
  • Scalability
  • Fault tolerance
  • Transparency
  • They are not automatic consequences of
    distribution; system and application software
    must be carefully designed

18
DESIGN GOALS
  • Key design goals
  • Performance, Reliability, Consistency,
    Scalability, Security
  • Basic design issues
  • Naming
  • Communication: optimize the implementation while
    retaining a high-level programming model
  • Software structure: structure the system so that
    new services can be introduced and will interwork
    fully with existing services
  • Workload allocation: deploy processing,
    communication, and other resources for optimum
    effect when processing a changing workload
  • Consistency maintenance: maintain consistency at
    reasonable cost

19
Naming
  • Distributed systems are based on the sharing of
    resources and on the transparency of resource
    distribution
  • Names assigned to resources must
  • have global meanings that are independent of
    location
  • be supported by a name interpretation system that
    can translate names to enable programs to access
    the resources
  • Design issue
  • design a naming scheme that will scale, and
    translate names efficiently to meet appropriate
    performance goals

20
Communication
  • Communication between a pair of processes
    involves
  • transfer of data from the sending process to the
    receiving process
  • synchronization of the receiving process with the
    sending process may be required
  • Programming Primitives
  • Communication Structure
  • Client-Server
  • Group Communication

21
Software Structure
  • Addition of new services should be easy

The main categories of software in a distributed
system (layered, top to bottom): Applications; Open
services; Distributed programming support; Operating
system kernel services; Computer and network hardware
22
Workload Allocation
  • How is work allocated amongst resources in a DS ?
  • Workstation-Server Model
  • putting the processor cycles near the user is
    good for interactive applications
  • the capacity of the workstation determines the
    size of the largest task that can be performed on
    behalf of the user
  • does not optimize the use of processing and
    memory resources
  • a single user with a large computing task is not
    able to obtain additional resources
  • Some modifications of the workstation-server
    model
  • processor pool model, shared memory multiprocessor

23
Processor Pool Model
  • Processor pool model
  • allocate processors dynamically to users
  • a processor pool usually consists of a collection
    of low-cost computers
  • each processor in a pool has an independent
    network connection
  • processors do not have to be homogeneous
  • processors are allocated to processes for their
    lifetime
  • Users
  • use a simple computer or X-terminal
  • a user's work can be performed partly or entirely
    on the pool processors
  • examples: Amoeba, Clouds, Plan 9

24
Use of Idle Workstations
  • A significant proportion of workstations on a
    network may be unused or used only for lightweight
    activities (at some times, especially overnight)
  • The idle workstations can be used to run jobs for
    users who are logged on at other stations and do
    not have sufficient capacity at their own machine
  • In the Sprite OS
  • the target workstation is chosen transparently by
    the system
  • it includes a facility for process migration
  • NOW (Networks of Workstations)
  • MPPs are expensive and workstations are NOT
  • the network is getting faster than any other
    component
  • for what?
  • network RAM, cooperative file caching, software
    RAID, parallel computing, etc.

25
Consistency Maintenance
  • Update Consistency
  • Arises when several processes access and update
    data concurrently
  • changing a data value cannot be performed
    instantaneously
  • desired effect
  • the update looks atomic: a related set of changes
    made by a given process should appear to all other
    processes as if it were done instantaneously
  • Significant because
  • many processes share data
  • the operation of the system itself depends on the
    consistency of file directories managed by file
    services, naming databases, etc.

26
Consistency Maintenance (contd)
  • Replication Consistency
  • motivations of data replication
  • increased availability and performance
  • if data have been copied to several computers and
    subsequently modified at one or more of them,
  • the possibility of inconsistencies arises between
    the values of data items at different computers

27
Consistency Maintenance (contd)
  • Cache Consistency
  • caching vs. replication
  • same consistency problem as replication
  • examples
  • multiprocessor caches
  • file caches
  • cluster web server

28
User Requirements
  • Functionality
  • What the system should do for users
  • Quality of Service
  • issues of performance, reliability and security
  • Reconfigurability
  • accommodate changes without causing disruption to
    existing services

29
Distributed File System
  • Introduction
  • The SUN Network File System
  • The Andrew File System
  • The Coda File System
  • The xFS

30
Introduction
  • Three practical implementations.
  • Sun Network File System
  • Andrew File System
  • Coda File System
  • These systems aim to emulate the UNIX file system
    interface
  • Emulation of a UNIX file system interface
  • caching of file data in client computers is an
    essential design feature, but the conventional
    UNIX file system offers one-copy update semantics
  • one-copy update semantics: the file contents seen
    by all concurrent processes are those they would
    see if only a single copy of the file contents
    existed
  • These three implementations allow some deviation
    from one-copy semantics
  • the one-copy model is not strictly adhered to

31
Server Structure
  • Connectionless
  • Connection-Oriented
  • Iterative Server
  • Concurrent Server

32
Stateful Server
[Diagram: client A issues fopen(...) and then
fread(fp, nbytes); the server keeps a file descriptor
for client A and returns data from the file system,
and the file position is updated here, at the server.]
33
Stateless Server
[Diagram: client A issues fopen(fp, read),
fread(..., position, ...), and fclose(fp); the file
descriptor for client A is kept at the client, the
server simply returns data from the file system, and
the file position is updated here, at the client.]
34
The Sun NFS
  • provide transparent access to remote files for
    client programs
  • each computer has client and server modules in
    its kernel
  • the client and server relationship is symmetric
  • each computer in an NFS can act as both a client
    and a server
  • larger installations may be configured as
    dedicated servers
  • available for almost every major system

35
The Sun NFS (contd)
  • Design goals with respect to transparency
  • Access transparency
  • The API is identical to the local OS's interface.
    Thus, in a UNIX client, no modifications to
    existing programs are required for access to
    remote files.
  • Location transparency
  • each client establishes its own file name space by
    mounting remote file systems into its local name
    space
  • NFS does not enforce a single network-wide file
    name space
  • each client may therefore see a different name
    space

36
The Sun NFS (contd)
  • Failure transparency
  • NFS server is stateless and most file access
    operations are idempotent
  • UNIX file operations are translated to NFS
    operations by an NFS client module
  • Stateless and idempotent nature of NFS ensures
    that failure semantics for remote file access are
    similar to those for local file access
  • Performance transparency
  • both the client and server employ caching to
    achieve satisfactory performance
  • For clients, the maintenance of cache coherence
    is somewhat complex, because several clients may
    be using and updating the same file

37
The Sun NFS (contd)
  • Migration transparency
  • Mount service
  • establish the file name space in client computers
  • file systems may be moved between servers, but
    the remote mount tables in each client must then
    be separately updated to enable the clients to
    access the file system in its new location
  • migration transparency is not fully achieved by
    NFS
  • Automounter
  • runs in each NFS client and enables pathnames to
    be used that refer to unmounted file systems

38
The Sun NFS (contd)
  • Replication transparency
  • NFS does not support file replication in a
    general sense
  • Concurrency transparency
  • UNIX supports only rudimentary locking facilities
    for concurrency control
  • NFS does not aim to improve upon the UNIX
    approach to the control of concurrent updates to
    files

39
The Sun NFS (contd)
  • Scalability
  • The scalability of NFS is limited
  • due to the lack of replication
  • the number of clients that can simultaneously
    access a shared file is restricted by the
    performance of the server that holds the file
  • that server can become a system-wide performance
    bottleneck for heavily used files

40
Implementation of NFS
  • User-level client process: a process using NFS
  • NFS client and server modules communicate using
    remote procedure calls (RPC)

41
The Andrew File System
  • Andrew
  • a distributed computing environment developed at
    CMU
  • Andrew File System (AFS)
  • reflects an intention to support
    information-sharing on a large scale
  • provides transparent access to remote shared
    files for UNIX programs
  • scalability is the most important design goal
  • implemented on workstations and servers running
    BSD4.3 UNIX or Mach

42
The Andrew File System (contd)
  • Two unusual design characteristics
  • whole-file serving
  • the entire contents of files are transmitted to
    client computers by AFS servers.
  • whole-file caching
  • a copy of a file is stored in a cache on the
    client's local disk.
  • the cache is permanent, surviving reboots of the
    client computer.

43
The Andrew File System (contd)
  • The design strategy is based on some assumptions
  • files are small
  • reads are much more common than writes (by about
    a factor of 6)
  • sequential access is common and random access is
    rare
  • most files are read and written by only one user
  • temporal locality of reference for files is high
  • Databases do not fit the design assumptions of
    AFS
  • typically shared by many users and are often
    updated quite frequently
  • databases are handled by their own storage
    mechanisms anyway

44
Implementation
  • Some questions about the implementation of AFS
  • How does AFS gain control when an open or close
    system call referring to a file in the shared
    file space is issued by a client?
  • How is the server holding the required file
    located?
  • What space is allocated to cached files in
    workstations?
  • How does AFS ensure that the cached copies of
    files are up-to-date when files may be updated by
    several clients?

45
Implementation (contd)
  • Vice: the name given to the server software that
    runs as a user-level UNIX process in each server
    computer
  • Venus: a user-level process that runs in each
    client computer

46
Cache coherence
  • Callback promise
  • a mechanism for ensuring that cached copies of
    files are updated when another client closes the
    same file after updating it
  • Vice supplies a copy of a file to Venus together
    with a callback promise
  • callback promises are stored with the cached
    files
  • state of a callback promise: either valid or
    cancelled
  • When Vice updates a file, it notifies all of the
    Venus processes to which it has issued callback
    promises by sending a callback
  • a callback is an RPC from a server to a client
    (i.e., Venus)
  • When Venus receives a callback, it sets the
    callback promise token for the relevant file to
    cancelled

47
Cache coherence (contd)
  • Handling open in Venus
  • If the required file is found in the cache, then
    its token is checked.
  • If its value is cancelled, then get a new copy
  • If valid, then use it
  • Restart of a client computer after a failure
  • some callbacks may have been missed
  • for each file with a valid token, Venus sends a
    timestamp to the server
  • If the timestamp is current, the server responds
    with valid; otherwise it responds with cancelled
    (a sketch of this open/revalidation logic follows)
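A minimal sketch (assumed Python; venus_open, server.fetch, and server.is_current are illustrative names, not the actual AFS/Venus code) of the behaviour described above: the cached copy is used only while its callback promise token is valid, and after a client restart every valid token is revalidated against the server by timestamp.

```python
VALID, CANCELLED = "valid", "cancelled"

class CachedFile:
    def __init__(self, fid, data, timestamp):
        self.fid = fid
        self.data = data
        self.timestamp = timestamp     # version of the copy held by the client
        self.promise = VALID           # callback promise token

def venus_open(cache, server, fid):
    entry = cache.get(fid)
    if entry is None or entry.promise == CANCELLED:
        entry = server.fetch(fid)      # fetch a fresh copy plus a new promise
        cache[fid] = entry
    return entry                       # token valid: use the cached copy

def revalidate_after_restart(cache, server):
    # Some callbacks may have been missed while the client was down.
    for entry in cache.values():
        if entry.promise == VALID and not server.is_current(entry.fid,
                                                            entry.timestamp):
            entry.promise = CANCELLED
```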

48
Cache coherence (contd)
  • Callback promise renewal interval
  • Callback promises must be renewed before an open
    if a time T (say, 10 minutes) has elapsed without
    communication from the server for a cached file
  • deals with communication failure

49
Update semantics
  • For a client C operating on a file F on a server
    S, the following are guaranteed
  • Update semantics for AFS-1
  • after a successful open: latest(F, S)
  • after a failed open: failure(S)
  • after a successful close: updated(F, S)
  • after a failed close: failure(S)
  • latest(F, S): the current value of F at C is the
    same as the value at S
  • failure(S): the open or close has not been
    performed at S
  • updated(F, S): C's value of F has been
    successfully propagated to S

50
Update semantics (2)
  • Update semantics for AFS-2
  • the currency guarantee for open is slightly weaker
  • after a successful open:
  • latest(F, S, 0) or (lostCallback(S, T) and
    inCache(F) and latest(F, S, T))
  • latest(F, S, T): the copy of F seen by the client
    is no more than T out of date
  • lostCallback(S, T): a callback message from S to C
    has been lost at some time during the last T time
    units
  • inCache(F): F was in the cache at C before the
    open was attempted

51
Update semantics (3)
  • AFS does not provide any further concurrency
    control mechanism
  • If clients in different workstations open, write
    and close the same file concurrently,
  • only the updates from the last close remain and
    all others will be silently lost (no error
    report)
  • clients must implement concurrency control
    independently if they require it
  • When two client processes in the same workstation
    open a file,
  • they share the same cached copy, and updates are
    performed in the normal UNIX fashion, block by
    block

52
The Coda File System
  • Coda File System
  • a descendant of AFS, developed at CMU, that
    addresses several new requirements
  • replication for a large-scale system
  • improvement in fault tolerance
  • mobile use of portable computers
  • Goal
  • constant data availability
  • provide users with the benefits of a shared file
    repository, but allow them to rely entirely on
    local resources when the repository is partially
    or totally inaccessible
  • retain the original goals of AFS with regard to
    scalability and the emulation of UNIX

53
The Coda File System (contd)
  • read-write volumes
  • can be stored on several servers
  • higher throughput of file accesses and a greater
    degree of fault tolerance
  • Support of disconnected operation
  • an extension of the mechanism in AFS for caching
    copies of files at workstations
  • enable workstations to operate when disconnected
    from the network

54
The Coda File System (contd)
  • Volume storage group (VSG)
  • set of servers holding replicas of a file volume
  • Available volume storage group (AVSG)
  • the subset of the VSG that is currently accessible
    to a client wishing to open a file
  • Callback promise mechanism
  • Clients are notified of a change, as in AFS
  • Updates instead of invalidations

55
The Coda File System (contd)
  • Coda version vector (CVV)
  • attached to each version of a file
  • vector of integers with one element for each
    server in VSG
  • server-i1, server-i2, ..., server-ik
  • each element of CVV denotes the number of
    modifications on the version of the file held at
    the corresponding server
  • Provide information about the update history of
    each file version to enable inconsistencies to be
    detected and corrected automatically if updates
    do not conflict, or with manual intervention if
    they do

56
The Coda File System (contd)
  • Repair of inconsistency
  • if every element of the CVV at one site is ≥ the
    corresponding element at all other sites
  • the inconsistency can be repaired automatically
  • otherwise, the conflict cannot in general be
    resolved automatically
  • the file is marked as inoperable, and the owner
    of the file is informed of the conflict
  • manual intervention is needed (a dominance check
    is sketched below)
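A minimal sketch (assumed Python) of the dominance test just described: a version can be installed everywhere automatically only if its CVV is greater than or equal to every other CVV element-wise; otherwise the versions are in conflict.

```python
def dominates(a, b):
    """True if CVV a is >= CVV b in every element."""
    return all(x >= y for x, y in zip(a, b))

def dominant_cvv(cvvs):
    """Return a CVV that dominates all others, or None if they conflict."""
    for candidate in cvvs:
        if all(dominates(candidate, other) for other in cvvs):
            return candidate
    return None

# Matches the example on the following slides:
print(dominant_cvv([[2, 2, 1], [1, 1, 1]]))   # [2, 2, 1] -> automatic repair
print(dominant_cvv([[2, 2, 1], [1, 1, 2]]))   # None -> manual intervention
```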

57
The Coda File System (contd)
  • Scenario
  • when a modified file is closed, Venus sends an
    update message (the new contents of the file and
    its CVV) to each site in the AVSG
  • Vice at each site checks the CVV
  • if it is consistent, the site stores the new
    contents and returns an ACK
  • Venus increments the CVV elements for the servers
    that responded positively to the update message
    and distributes the new CVV to the members of the
    AVSG
58
The Coda File System Example
  • F is a file in a volume replicated at servers S1,
    S2 and S3
  • C1 and C2: clients
  • VSG for F: {S1, S2, S3}
  • AVSG for C1: {S1, S2}; AVSG for C2: {S3}
  • Initially, the CVVs for F at all three servers are
    [1, 1, 1]
  • C1 modifies F
  • the CVVs for F at S1 and S2 become [2, 2, 1]
  • C2 modifies F
  • the CVV for F at S3 becomes [1, 1, 2]
  • No CVV dominates all the others
  • a conflict requiring manual intervention
  • Suppose instead that C2 does not modify F in the
    step above. Then [2, 2, 1] dominates [1, 1, 1], so
    the version of the file at S1 or S2 should replace
    that at S3

59
Update semantics
  • The currency guarantees offered by Coda when a
    file is opened at a client are weaker than those
    of AFS
  • The guarantees offered
  • successful open
  • provides the most recent copy of the file from the
    current AVSG
  • if no server is accessible, a locally cached copy
    of the file is used if available
  • successful close
  • the file has been propagated to the currently
    accessible set of servers
  • if no server is available, the file is marked for
    propagation at the earliest opportunity

60
Update semantics (contd)
  • S: a server; S̄: the set of servers holding the
    file (the file's VSG)
  • s: the AVSG for the file as seen by a client C
  • after a successful open: s ≠ ∅ and
    (latest(F, s, 0) or
    (latest(F, s, T) and lostCallback(s, T) and
    inCache(F)))
    or (s = ∅ and inCache(F))
  • after a failed open: (s ≠ ∅ and conflict(F, s))
    or (s = ∅ and ¬ inCache(F))
  • after a successful close: (s ≠ ∅ and
    updated(F, s)) or (s = ∅)
  • after a failed close: s ≠ ∅ and conflict(F, s)
  • conflict(F, s) means that the values of F at some
    servers in s are currently in conflict

61
Cache coherence
  • Venus at each client must detect the following
    events within T seconds
  • enlargement of AVSG
  • due to accessibility of a previously inaccessible
    server
  • shrinking of an AVSG
  • due to a server becoming inaccessible
  • a lost callback
  • Multicast messages to VSG

62
xFS
  • xFS: the Serverless Network File System
  • described in the papers "A Case for NOW" and
    "Experience with a ..."
  • idea
  • the file system as a parallel program
  • exploit fast LANs
  • Cooperative Caching
  • use remote memory to avoid going to disk
  • manage client memory as a global resource
  • much of client memory is not used
  • the server gets a file from a client's memory
    instead of from disk
  • better to send a replaced file copy to an idle
    client than to discard it

63
xFS Cache Coherence
  • Write-Ownership Cache Coherence
  • each node can own a file
  • the owner has the most up-to-date copy
  • the server just keeps track of who "owns" the file
  • any request for the file is forwarded to the owner
  • a file is either
  • owned: only one copy exists
  • read-only: multiple copies may exist
  • to modify a file
  • secure the file as owned
  • modify it as many times as you want
  • if someone else reads the file, send them the
    up-to-date version and mark the file as read-only

64
xFS Cache Coherence
[State diagram: states invalid, read-only, owned.
Transitions: a read takes invalid to read-only; a
write takes invalid or read-only to owned; a read by
another node takes owned to read-only; a write by
another node takes owned or read-only to invalid.]
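A minimal sketch (assumed Python, not the xFS implementation) of the three-state protocol in the diagram above, from the point of view of one node's cached copy.

```python
INVALID, READ_ONLY, OWNED = "invalid", "read-only", "owned"

def next_state(state, event):
    """event: 'read', 'write', 'read_by_other', or 'write_by_other'."""
    if event == "write":
        return OWNED                    # this node secures ownership
    if event == "read":
        return state if state == OWNED else READ_ONLY
    if event == "read_by_other":
        return READ_ONLY if state == OWNED else state
    if event == "write_by_other":
        return INVALID                  # another node now owns the file
    raise ValueError(event)

print(next_state(INVALID, "write"))             # owned
print(next_state(OWNED, "read_by_other"))       # read-only
print(next_state(READ_ONLY, "write_by_other"))  # invalid
```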
65
xFS Software RAID
  • Cooperative caching makes availability a nightmare
  • any crash will damage a part of the file system
  • stripe data redundantly over multiple disks
  • software RAID
  • reconstruct the missing part from the remaining
    parts (see the parity sketch below)
  • logging makes reconstruction easy
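A minimal sketch (assumed Python) of the redundancy idea: data blocks striped across servers are protected by an XOR parity block, and any single missing block can be rebuilt from the survivors.

```python
from functools import reduce

def parity(blocks):
    """XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rebuild(surviving_blocks, parity_block):
    """Reconstruct the single missing data block."""
    return parity(surviving_blocks + [parity_block])

data = [b"aaaa", b"bbbb", b"cccc"]       # stripe units on three servers
p = parity(data)                          # parity block on a fourth server
print(rebuild([data[0], data[2]], p))     # b'bbbb': the lost block recovered
```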

66
xFS Software RAID
  • Motivations
  • high bandwidth requirements from
  • multimedia
  • parallel computing
  • economical workstations
  • high-speed networks
  • let's learn from RAID
  • parallel I/O from inexpensive hard disks
  • fault management
  • limitations
  • single server
  • small-write problem

67
xFS Software RAID
  • Approaches
  • stripe each file across multiple file servers
  • small-file problems
  • when the striping unit is too small
  • the ideal size is tens of kilobytes
  • a small write needs two reads and two writes
    (to check and rebuild the parity)
  • when a whole file fits in one striping unit
  • the parity consumes as much space as the data
  • the load cannot be spread across servers

68
xFS Experiences
  • Need for a formal method for cache coherence
  • it is much more complicated than it looks
  • lots of transient states
  • 3 formal states → 22 implementation states
  • ad hoc test-and-retry leaves unknown errors in
    permanently
  • no one is sure about the correctness
  • software portability is poor

69
xFS Experiences
  • Threads in a server
  • a nice concept, but
  • it incurs too much concurrency
  • too many data races
  • the most difficult thing to understand in the
    world
  • difficult to debug
  • solution: an iterative server
  • difficult to design but simple to debug
  • less error-prone
  • efficient
  • RPC
  • not suitable for multi-party communication
  • need to gather/scatter across RPC servers