Processes and Threads
2
Problems with Scheduling
  • Priority systems are ad hoc at best
  • highest priority always wins
  • Fair share is implemented by adjusting priorities
    with a feedback loop
  • a complex mechanism
  • Priority inversion: high-priority jobs can be
    blocked behind low-priority jobs
  • Schedulers are complex and difficult to control
  • What we need:
  • proportional sharing
  • dynamic flexibility
  • simplicity

3
Tickets in Lottery Scheduling
  • Priority is determined by the number of tickets
    each process holds
  • The scheduler picks a winning ticket at random and
    gives its owner the resource
  • Tickets can be used for a wide variety of
    different resources (uniform) and are machine
    independent (abstract)

4
Performance Characteristics
  • If a client has probability p of winning, then the
    expected number of wins is np (n = number of
    lotteries)
  • Variance of the binomial distribution: np(1 - p)
  • Accuracy improves with √n (a worked example
    follows below)
  • need frequent lotteries
  • Big picture: mostly accurate, but short-term
    inaccuracies are possible
  • see Stride scheduling below
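As an illustration (the numbers are assumed, not from the slides): a client holding 25 of 100 tickets has p = 0.25, so over n = 100 lotteries it expects np = 25 wins with standard deviation √(np(1 - p)) ≈ 4.3; the relative error therefore shrinks roughly as 1/√n as more lotteries are held, which is why frequent lotteries are needed.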

5
Ticket Inflation
  • Make up your own tickets (print your own money)
  • Only works among mutually trusting clients
  • Presumably works best if inflation is temporary
  • Allows clients to adjust their priority
    dynamically with zero communication

6
Ticket Transfer
  • Basic idea: if you are blocked on someone else,
    give them your tickets
  • Example: client-server
  • the server has no tickets of its own
  • clients give the server all of their tickets during
    an RPC
  • the server's priority is the sum of the priorities
    of all of its active clients
  • the server can use lottery scheduling to give
    preferential service to high-priority clients
  • Very elegant solution to a long-standing problem

7
Trust Boundaries
  • A group contains mutually trusting clients
  • A unique currency is used inside a group
  • simplifies mini-lotteries (e.g., for a mutex)
    inside a group
  • supports fine-grain allocation decisions
  • An exchange rate is needed between groups
  • the effect of inflation can be localized to a
    group (see the sketch below)
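A minimal sketch (assumed Python; the function and the numbers are illustrative, not taken from the lottery-scheduling system) of how a group currency localizes inflation: tickets issued inside a group are valued relative to the base tickets that fund the group, so printing more internal tickets never increases the group's total claim on the resource.

```python
def to_base(tickets, group_funding, group_issued):
    """Convert group-currency tickets to base-currency tickets.

    group_funding: base tickets backing (funding) the group
    group_issued:  tickets the group has issued internally
    Inflation raises group_issued but not group_funding, so the
    group's aggregate share of the resource is unchanged.
    """
    return tickets * group_funding / group_issued

# A group funded with 100 base tickets has issued 200 internal tickets:
print(to_base(50, 100, 200))   # 25.0 base tickets
# After internal inflation to 400 issued tickets, the same 50 are worth less:
print(to_base(50, 100, 400))   # 12.5 base tickets
```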

8
Compensation tickets
  • What happens if a thread is I/O bound and blocks
    before its quantum expires?
  • the thread gets less than its share of the
    processor.
  • Basic idea: if you complete only a fraction f of
    the quantum, your tickets are inflated by 1/f until
    the next time you win
  • Example: if B on average uses 1/5 of a quantum,
    its tickets will be inflated 5x, so it will win 5
    times as often and get its correct share overall
  • What if B alternates between 1/5 and whole
    quantums?

9
Implementation
  • Frequent lotteries mean that lotteries must be
    efficient
  • a fast random number generator
  • fast selection of ticket based on random number
  • Ticket selection
  • straightforward algorithm: O(n) (sketched below)
  • tree-based implementation: O(log n)
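A minimal sketch (assumed Python, illustrative only) of the straightforward O(n) selection: draw a random winning ticket number and walk the client list, accumulating ticket counts until the winner is passed.

```python
import random

def lottery_pick(clients):
    """O(n) lottery draw; clients is a list of (name, tickets) pairs."""
    total = sum(tickets for _, tickets in clients)
    winner = random.randrange(total)      # winning ticket number
    running = 0
    for name, tickets in clients:
        running += tickets                # cumulative ticket count
        if winner < running:
            return name

# Example: A holds 75 tickets, B holds 25, so A should win about 75% of draws.
wins = {"A": 0, "B": 0}
for _ in range(10_000):
    wins[lottery_pick([("A", 75), ("B", 25)])] += 1
print(wins)
```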

10
Implementation Ticket Object
11
Currency Graph
12
Problems
  • Not as fair as we'd like
  • the mutex test comes out 1.8:1 instead of 2:1
  • possible starvation
  • the multimedia apps come out 1.92:1.50:1 instead
    of 3:2:1
  • possible jitter
  • Every queue is an implicit scheduling decision...
  • Every spinlock ignores priority...
  • Can we force it to be unfair? Is there a way to
    use compensation tickets to get more time, e.g.,
    quit early to get compensation tickets and then
    run for the full time next time?
  • What about kernel cycles? If a process uses a lot
    of cycles indirectly, such as through the
    ethernet driver, does it get higher priority
    implicitly? (probably)

13
Stride Scheduling
  • Basic idea: make a deterministic version to
    reduce short-term variability
  • Mark time virtually using passes as the unit
  • A process has a stride, which is the number of
    passes between executions. Strides are inversely
    proportional to the number of tickets, so high
    priority jobs have low strides and thus run
    often.
  • Very regular: a job with priority p will run
    every 1/p passes

14
Stride Scheduling (contd)
  • Algorithm (roughly): always pick the job with the
    lowest pass number, then update its pass number by
    adding its stride
  • Similar mechanism to compensation tickets: if a
    job uses only a fraction f of its quantum, advance
    its pass number by f × stride instead of the full
    stride
  • Overall result: far more accurate than lottery
    scheduling, and the error can be bounded
    absolutely instead of probabilistically (see the
    sketch below)
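A minimal sketch (assumed Python, not the original implementation) of the stride algorithm as described above: each job's stride is a large constant divided by its tickets, the job with the lowest pass value runs next, and a partial quantum advances the pass by only the fraction used.

```python
STRIDE1 = 1 << 20                 # large constant; stride = STRIDE1 / tickets

class Job:
    def __init__(self, name, tickets):
        self.name = name
        self.stride = STRIDE1 / tickets
        self.passval = 0.0        # virtual time of the job's next run

def schedule(jobs, slices):
    order = []
    for _ in range(slices):
        job = min(jobs, key=lambda j: j.passval)   # lowest pass runs next
        order.append(job.name)
        f = 1.0                   # fraction of the quantum actually used
        job.passval += f * job.stride              # partial use advances less
    return order

# A: 3 tickets, B: 2, C: 1 -> A runs three times as often as C.
print(schedule([Job("A", 3), Job("B", 2), Job("C", 1)], 12))
```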

15
Stride Scheduling Example
16
Distributed System
  • Distributed System (DS)
  • consists of a collection of autonomous computers
    linked by a computer network and equipped with
    distributed system software.
  • DS software
  • enables computers to coordinate their activities
    and to share the resources of the system, i.e.,
    hardware, software and data.
  • Users of a DS should perceive a single,
    integrated computing facility even though it may
    be implemented by many computers in different
    locations.

17
Characteristics of Distributed Systems
  • The following characteristics are primarily
    responsible for the usefulness of distributed
    systems
  • Resource Sharing
  • Openness
  • Concurrency
  • Scalability
  • Fault tolerance
  • Transparency
  • They are not automatic consequences of
    distribution; system and application software
    must be carefully designed

18
DESIGN GOALS
  • Key design goals
  • Performance, Reliability, Consistency,
    Scalability, Security
  • Basic design issues
  • Naming
  • Communication: optimize the implementation while
    retaining a high-level programming model
  • Software structure: structure the system so that
    new services can be introduced and will interwork
    fully with existing services
  • Workload allocation: deploy processing,
    communication, and other resources for optimum
    effect when processing a changing workload
  • Consistency maintenance: maintain consistency at
    reasonable cost

19
Naming
  • Distributed systems are based on the sharing of
    resources and on the transparency of resource
    distribution
  • Names assigned to resources must
  • have global meanings that are independent of
    location
  • be supported by a name interpretation system that
    can translate names to enable programs to access
    the resources
  • Design issue
  • design a naming scheme that will scale, and
    translate names efficiently to meet appropriate
    performance goals

20
Communication
  • Communication between a pair of processes
    involves
  • transfer of data from the sending process to the
    receiving process
  • synchronization of the receiving process with the
    sending process may be required
  • Programming Primitives
  • Communication Structure
  • Client-Server
  • Group Communication

21
Software Structure
  • Addition of new services should be easy

The main categories of software in a distributed
system (layered, top to bottom): Applications; Open
services; Distributed programming support; Operating
system kernel services; Computer and network hardware
22
Workload Allocation
  • How is work allocated amongst resources in a DS ?
  • Workstation-Server Model
  • putting the processor cycles near the user is
    good for interactive applications
  • the capacity of the workstation determines the
    size of the largest task that can be performed on
    behalf of the user
  • does not optimize the use of processing and
    memory resources
  • a single user with a large computing task is not
    able to obtain additional resources
  • Some modifications of the workstation-server
    model
  • processor pool model, shared memory multiprocessor

23
Processor Pool Model
  • Processor pool model
  • allocate processors dynamically to users
  • a processor pool usually consists of a collection
    of low-cost computers
  • each processor in a pool has an independent
    network connection
  • processors do not have to be homogeneous
  • processors are allocated to processes for their
    lifetime
  • Users
  • use a simple computer or X-terminal
  • a user's work can be performed partly or entirely
    on the pool processors
  • examples: Amoeba, Clouds, Plan 9

24
Use of Idle Workstations
  • A significant proportion of workstations on a
    network may be unused or used only for lightweight
    activities (at some times, especially overnight)
  • The idle workstations can be used to run jobs for
    users who are logged on at other stations and do
    not have sufficient capacity at their own machine
  • In the Sprite OS
  • the target workstation is chosen transparently by
    the system
  • it includes a facility for process migration
  • NOW (Networks of Workstations)
  • MPPs are expensive and workstations are NOT
  • the network is getting faster than any other
    component
  • for what?
  • network RAM, cooperative file caching, software
    RAID, parallel computing, etc.

25
Consistency Maintenance
  • Update Consistency
  • Arises when several processes access and update
    data concurrently
  • changing a data value cannot be performed
    instantaneously
  • desired effect
  • the update looks atomic: a related set of changes
    made by a given process should appear to all other
    processes as if it were done instantaneously
  • Significant because
  • many processes share data
  • the operation of the system itself depends on the
    consistency of file directories managed by file
    services, naming databases, etc.

26
Consistency Maintenance (contd)
  • Replication Consistency
  • motivations of data replication
  • increased availability and performance
  • if data have been copied to several computers and
    subsequently modified at one or more of them,
  • the possibility of inconsistencies arises between
    the values of data items at different computers

27
Consistency Maintenance (contd)
  • Cache Consistency
  • caching vs. replication
  • same consistency problem as replication
  • examples
  • multiprocessor caches
  • file caches
  • cluster web server

28
User Requirements
  • Functionality
  • What the system should do for users
  • Quality of Service
  • issues of performance, reliability and security
  • Reconfigurability
  • accommodate changes without causing disruption to
    existing services

29
Distributed File System
  • Introduction
  • The SUN Network File System
  • The Andrew File System
  • The Coda File System
  • The xFS

30
Introduction
  • Three practical implementations.
  • Sun Network File System
  • Andrew File System
  • Coda File System
  • These systems aim to emulate the UNIX file system
    interface
  • Emulation of a UNIX file system interface
  • caching of file data in client computers is an
    essential design feature, but the conventional
    UNIX file system offers one-copy update semantics
  • one-copy update semantics: the file contents seen
    by all concurrent processes are those they would
    see if only a single copy of the file contents
    existed
  • These three implementations allow some deviation
    from one-copy semantics
  • the one-copy model is not strictly adhered to

31
Server Structure
  • Connectionless
  • Connection-Oriented
  • Iterative Server
  • Concurrent Server

32
Stateful Server
[Diagram: client A issues fopen(...) and then
fread(fp, nbytes); the server keeps a file descriptor
for client A and returns data from the file system,
and the file position is updated here, at the server.]
33
Stateless Server
[Diagram: client A issues fopen(fp, read),
fread(..., position, ...), and fclose(fp); the file
descriptor for client A is kept at the client, the
server simply returns data from the file system, and
the file position is updated here, at the client.]
34
The Sun NFS
  • provide transparent access to remote files for
    client programs
  • each computer has client and server modules in
    its kernel
  • the client and server relationship is symmetric
  • each computer in an NFS can act as both a client
    and a server
  • larger installations may be configured as
    dedicated servers
  • available for almost every major system

35
The Sun NFS (contd)
  • Design goals with respect to transparency
  • Access transparency
  • The API is identical to the local OS's interface.
    Thus, in a UNIX client, no modifications to
    existing programs are required for access to
    remote files.
  • Location transparency
  • each client establishes its own file name space by
    mounting remote file systems into its local name
    space
  • NFS does not enforce a single network-wide file
    name space
  • each client may therefore see a different name
    space

36
The Sun NFS (contd)
  • Failure transparency
  • NFS server is stateless and most file access
    operations are idempotent
  • UNIX file operations are translated to NFS
    operations by an NFS client module
  • Stateless and idempotent nature of NFS ensures
    that failure semantics for remote file access are
    similar to those for local file access
  • Performance transparency
  • both the client and server employ caching to
    achieve satisfactory performance
  • For clients, the maintenance of cache coherence
    is somewhat complex, because several clients may
    be using and updating the same file

37
The Sun NFS (contd)
  • Migration transparency
  • Mount service
  • establish the file name space in client computers
  • file systems may be moved between servers, but
    the remote mount tables in each client must then
    be separately updated to enable the clients to
    access the file system in its new location
  • migration transparency is not fully achieved by
    NFS
  • Automounter
  • runs in each NFS client and enables pathnames to
    be used that refer to unmounted file systems

38
The Sun NFS (contd)
  • Replication transparency
  • NFS does not support file replication in a
    general sense
  • Concurrency transparency
  • UNIX supports only rudimentary locking facilities
    for concurrency control
  • NFS does not aim to improve upon the UNIX
    approach to the control of concurrent updates to
    files

39
The Sun NFS (contd)
  • Scalability
  • The scalability of NFS is limited
  • due to the lack of replication
  • the number of clients that can simultaneously
    access a shared file is restricted by the
    performance of the server that holds the file
  • that server can become a system-wide performance
    bottleneck for heavily used files

40
Implementation of NFS
  • User-level client process: a process using NFS
  • NFS client and server modules communicate using
    remote procedure calls (RPC)

41
The Andrew File System
  • Andrew
  • a distributed computing environment developed at
    CMU
  • Andrew File System (AFS)
  • reflects an intention to support
    information-sharing on a large scale
  • provides transparent access to remote shared
    files for UNIX programs
  • scalability is the most important design goal
  • implemented on workstations and servers running
    BSD4.3 UNIX or Mach

42
The Andrew File System (contd)
  • Two unusual design characteristics
  • whole-file serving
  • the entire contents of files are transmitted to
    client computers by AFS servers.
  • whole-file caching
  • a copy of a file is stored in a cache on the
    client's local disk.
  • the cache is permanent, surviving reboots of the
    client computer.

43
The Andrew File System (contd)
  • The design strategy is based on some assumptions
  • files are small
  • reads are much more common than writes (by about
    a factor of 6)
  • sequential access is common and random access is
    rare
  • most files are read and written by only one user
  • temporal locality of reference for files is high
  • Databases do not fit the design assumptions of
    AFS
  • typically shared by many users and are often
    updated quite frequently
  • databases are handled by their own storage
    mechanisms anyway

44
Implementation
  • Some questions about the implementation of AFS
  • How does AFS gain control when an open or close
    system call referring to a file in the shared
    file space is issued by a client?
  • How is the server holding the required file
    located?
  • What space is allocated to cached files in
    workstations?
  • How does AFS ensure that the cached copies of
    files are up-to-date when files may be updated by
    several clients?

45
Implementation (contd)
  • Vice: the name given to the server software that
    runs as a user-level UNIX process in each server
    computer
  • Venus: a user-level process that runs in each
    client computer

46
Cache coherence
  • Callback promise
  • a mechanism for ensuring that cached copies of
    files are updated when another client closes the
    same file after updating it
  • Vice supplies a copy of a file to Venus together
    with a callback promise
  • callback promises are stored with the cached
    files
  • state of a callback promise: either valid or
    cancelled
  • When Vice updates a file, it notifies all of the
    Venus processes to which it has issued callback
    promises by sending a callback
  • a callback is an RPC from a server to a client
    (i.e., Venus)
  • When Venus receives a callback, it sets the
    callback promise token for the relevant file to
    cancelled

47
Cache coherence (contd)
  • Handling open in Venus
  • If the required file is found in the cache, then
    its token is checked.
  • If its value is cancelled, then get a new copy
  • If valid, then use it
  • Restart of a client computer after a failure
  • some callbacks may have been missed
  • for each file with a valid token, Venus sends a
    timestamp to the server
  • If the timestamp is current, the server responds
    with valid; otherwise it responds with cancelled
    (a sketch of this open/revalidation logic follows)
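A minimal sketch (assumed Python; venus_open, server.fetch, and server.is_current are illustrative names, not the actual AFS/Venus code) of the behaviour described above: the cached copy is used only while its callback promise token is valid, and after a client restart every valid token is revalidated against the server by timestamp.

```python
VALID, CANCELLED = "valid", "cancelled"

class CachedFile:
    def __init__(self, fid, data, timestamp):
        self.fid = fid
        self.data = data
        self.timestamp = timestamp     # version of the copy held by the client
        self.promise = VALID           # callback promise token

def venus_open(cache, server, fid):
    entry = cache.get(fid)
    if entry is None or entry.promise == CANCELLED:
        entry = server.fetch(fid)      # fetch a fresh copy plus a new promise
        cache[fid] = entry
    return entry                       # token valid: use the cached copy

def revalidate_after_restart(cache, server):
    # Some callbacks may have been missed while the client was down.
    for entry in cache.values():
        if entry.promise == VALID and not server.is_current(entry.fid,
                                                            entry.timestamp):
            entry.promise = CANCELLED
```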

48
Cache coherence (contd)
  • Callback promise renewal interval
  • Callback promises must be renewed before an open
    if a time T (say, 10 minutes) has elapsed without
    communication from the server for a cached file
  • deals with communication failure

49
Update semantics
  • For a client C operating on a file F on a server
    S, the following are guaranteed
  • Update semantics for AFS-1
  • after a successful open: latest(F, S)
  • after a failed open: failure(S)
  • after a successful close: updated(F, S)
  • after a failed close: failure(S)
  • latest(F, S): the current value of F at C is the
    same as the value at S
  • failure(S): the open or close has not been
    performed at S
  • updated(F, S): C's value of F has been
    successfully propagated to S

50
Update semantics (2)
  • Update semantics for AFS-2
  • the currency guarantee for open is slightly weaker
  • after a successful open:
  • latest(F, S, 0) or (lostCallback(S, T) and
    inCache(F) and latest(F, S, T))
  • latest(F, S, T): the copy of F seen by the client
    is no more than T out of date
  • lostCallback(S, T): a callback message from S to C
    has been lost at some time during the last T time
    units
  • inCache(F): F was in the cache at C before the
    open was attempted

51
Update semantics (3)
  • AFS does not provide any further concurrency
    control mechanism
  • If clients in different workstations open, write
    and close the same file concurrently,
  • only the updates from the last close remain and
    all others will be silently lost (no error
    report)
  • clients must implement concurrency control
    independently if they require it
  • When two client processes in the same workstation
    open a file,
  • they share the same cached copy, and updates are
    performed in the normal UNIX fashion, block by
    block

52
The Coda File System
  • Coda File System
  • a descendant of AFS, developed at CMU, that
    addresses several new requirements
  • replication for a large-scale system
  • improvement in fault tolerance
  • mobile use of portable computers
  • Goal
  • constant data availability
  • provide users with the benefits of a shared file
    repository, but allow them to rely entirely on
    local resources when the repository is partially
    or totally inaccessible
  • retain the original goals of AFS with regard to
    scalability and the emulation of UNIX

53
The Coda File System (contd)
  • read-write volumes
  • can be stored on several servers
  • higher throughput of file accesses and a greater
    degree of fault tolerance
  • Support of disconnected operation
  • an extension of the mechanism in AFS for caching
    copies of files at workstations
  • enable workstations to operate when disconnected
    from the network

54
The Coda File System (contd)
  • Volume storage group (VSG)
  • set of servers holding replicas of a file volume
  • Available volume storage group (AVSG)
  • the subset of the VSG that is currently accessible
    to a client wishing to open a file
  • Callback promise mechanism
  • Clients are notified of a change, as in AFS
  • Updates instead of invalidations

55
The Coda File System (contd)
  • Coda version vector (CVV)
  • attached to each version of a file
  • vector of integers with one element for each
    server in VSG
  • server-i1, server-i2, ..., server-ik
  • each element of CVV denotes the number of
    modifications on the version of the file held at
    the corresponding server
  • Provide information about the update history of
    each file version to enable inconsistencies to be
    detected and corrected automatically if updates
    do not conflict, or with manual intervention if
    they do

56
The Coda File System (contd)
  • Repair of inconsistency
  • if every element of the CVV at one site is ≥ the
    corresponding element at all other sites
  • the inconsistency can be repaired automatically
  • otherwise, the conflict cannot in general be
    resolved automatically
  • the file is marked as inoperable, and the owner
    of the file is informed of the conflict
  • manual intervention is needed (a dominance check
    is sketched below)
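A minimal sketch (assumed Python) of the dominance test just described: a version can be installed everywhere automatically only if its CVV is greater than or equal to every other CVV element-wise; otherwise the versions are in conflict.

```python
def dominates(a, b):
    """True if CVV a is >= CVV b in every element."""
    return all(x >= y for x, y in zip(a, b))

def dominant_cvv(cvvs):
    """Return a CVV that dominates all others, or None if they conflict."""
    for candidate in cvvs:
        if all(dominates(candidate, other) for other in cvvs):
            return candidate
    return None

# Matches the example on the following slides:
print(dominant_cvv([[2, 2, 1], [1, 1, 1]]))   # [2, 2, 1] -> automatic repair
print(dominant_cvv([[2, 2, 1], [1, 1, 2]]))   # None -> manual intervention
```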

57
The Coda File System (contd)
  • Scenario
  • when a modified file is closed, Venus sends an
    update message (the new contents of the file and
    its CVV) to each site in the AVSG
  • Vice at each site checks the CVV
  • if it is consistent, the site stores the new
    contents and returns an ACK
  • Venus increments the CVV elements for the servers
    that responded positively to the update message
    and distributes the new CVV to the members of the
    AVSG
58
The Coda File System Example
  • F is a file in a volume replicated at servers S1,
    S2 and S3
  • C1 and C2: clients
  • VSG for F: {S1, S2, S3}
  • AVSG for C1: {S1, S2}; AVSG for C2: {S3}
  • Initially, the CVVs for F at all three servers are
    [1, 1, 1]
  • C1 modifies F
  • the CVVs for F at S1 and S2 become [2, 2, 1]
  • C2 modifies F
  • the CVV for F at S3 becomes [1, 1, 2]
  • No CVV dominates all the others
  • a conflict requiring manual intervention
  • Suppose instead that C2 does not modify F in the
    step above. Then [2, 2, 1] dominates [1, 1, 1], so
    the version of the file at S1 or S2 should replace
    that at S3

59
Update semantics
  • The currency guarantees offered by Coda when a
    file is opened at a client are weaker than those
    of AFS
  • The guarantees offered
  • successful open
  • provides the most recent copy of the file from the
    current AVSG
  • if no server is accessible, a locally cached copy
    of the file is used if available
  • successful close
  • the file has been propagated to the currently
    accessible set of servers
  • if no server is available, the file is marked for
    propagation at the earliest opportunity

60
Update semantics (contd)
  • S: a server; S̄: the set of servers holding the
    file (the file's VSG)
  • s: the AVSG for the file as seen by a client C
  • after a successful open: s ≠ ∅ and
    (latest(F, s, 0) or
    (latest(F, s, T) and lostCallback(s, T) and
    inCache(F)))
    or (s = ∅ and inCache(F))
  • after a failed open: (s ≠ ∅ and conflict(F, s))
    or (s = ∅ and ¬ inCache(F))
  • after a successful close: (s ≠ ∅ and
    updated(F, s)) or (s = ∅)
  • after a failed close: s ≠ ∅ and conflict(F, s)
  • conflict(F, s) means that the values of F at some
    servers in s are currently in conflict

61
Cache coherence
  • Venus at each client must detect the following
    events within T seconds
  • enlargement of AVSG
  • due to accessibility of a previously inaccessible
    server
  • shrinking of an AVSG
  • due to a server becoming inaccessible
  • a lost callback
  • Multicast messages to VSG

62
xFS
  • xFS: the Serverless Network File System
  • described in the papers "A Case for NOW" and
    "Experience with a ..."
  • idea
  • the file system as a parallel program
  • exploit fast LANs
  • Cooperative Caching
  • use remote memory to avoid going to disk
  • manage client memory as a global resource
  • much of client memory is not used
  • the server gets a file from a client's memory
    instead of from disk
  • better to send a replaced file copy to an idle
    client than to discard it

63
xFS Cache Coherence
  • Write-Ownership Cache Coherence
  • each node can own a file
  • the owner has the most up-to-date copy
  • the server just keeps track of who "owns" the file
  • any request for the file is forwarded to the owner
  • a file is either
  • owned: only one copy exists
  • read-only: multiple copies may exist
  • to modify a file
  • secure the file as owned
  • modify it as many times as you want
  • if someone else reads the file, send them the
    up-to-date version and mark the file as read-only

64
xFS Cache Coherence
[State diagram: states invalid, read-only, owned.
Transitions: a read takes invalid to read-only; a
write takes invalid or read-only to owned; a read by
another node takes owned to read-only; a write by
another node takes owned or read-only to invalid.]
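A minimal sketch (assumed Python, not the xFS implementation) of the three-state protocol in the diagram above, from the point of view of one node's cached copy.

```python
INVALID, READ_ONLY, OWNED = "invalid", "read-only", "owned"

def next_state(state, event):
    """event: 'read', 'write', 'read_by_other', or 'write_by_other'."""
    if event == "write":
        return OWNED                    # this node secures ownership
    if event == "read":
        return state if state == OWNED else READ_ONLY
    if event == "read_by_other":
        return READ_ONLY if state == OWNED else state
    if event == "write_by_other":
        return INVALID                  # another node now owns the file
    raise ValueError(event)

print(next_state(INVALID, "write"))             # owned
print(next_state(OWNED, "read_by_other"))       # read-only
print(next_state(READ_ONLY, "write_by_other"))  # invalid
```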
65
xFS Software RAID
  • Cooperative caching makes availability a nightmare
  • any crash will damage a part of the file system
  • stripe data redundantly over multiple disks
  • software RAID
  • reconstruct the missing part from the remaining
    parts (see the parity sketch below)
  • logging makes reconstruction easy
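A minimal sketch (assumed Python) of the redundancy idea: data blocks striped across servers are protected by an XOR parity block, and any single missing block can be rebuilt from the survivors.

```python
from functools import reduce

def parity(blocks):
    """XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rebuild(surviving_blocks, parity_block):
    """Reconstruct the single missing data block."""
    return parity(surviving_blocks + [parity_block])

data = [b"aaaa", b"bbbb", b"cccc"]       # stripe units on three servers
p = parity(data)                          # parity block on a fourth server
print(rebuild([data[0], data[2]], p))     # b'bbbb': the lost block recovered
```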

66
xFS Software RAID
  • Motivations
  • high bandwidth requirements from
  • multimedia
  • parallel computing
  • economical workstations
  • high-speed networks
  • let's learn from RAID
  • parallel I/O from inexpensive hard disks
  • fault management
  • limitations
  • single server
  • small-write problem

67
xFS Software RAID
  • Approaches
  • stripe each file across multiple file servers
  • small-file problems
  • when the striping unit is too small
  • the ideal size is tens of kilobytes
  • a small write needs two reads and two writes
    (to check and rebuild the parity)
  • when a whole file fits in one striping unit
  • the parity consumes as much space as the data
  • the load cannot be spread across servers

68
xFS Experiences
  • Need for a formal method for cache coherence
  • it is much more complicated than it looks
  • lots of transient states
  • 3 formal states → 22 implementation states
  • ad hoc test-and-retry leaves unknown errors in
    permanently
  • no one is sure about the correctness
  • software portability is poor

69
xFS Experiences
  • Threads in a server
  • a nice concept, but
  • it incurs too much concurrency
  • too many data races
  • the most difficult thing to understand in the
    world
  • difficult to debug
  • solution: an iterative server
  • difficult to design but simple to debug
  • less error-prone
  • efficient
  • RPC
  • not suitable for multi-party communication
  • need to gather/scatter across RPC servers