1
Advanced Operating Systems
Lecture 7 Concurrency
  • University of Tehran
  • Dept. of EE and Computer Engineering
  • By
  • Dr. Nasser Yazdani

2
How to Use Shared Resources
  • Some general problems and solutions.
  • References
  • Fast Mutual Exclusion for Uniprocessors
  • On Optimistic Methods for Concurrency Control

3
Outline
  • Introduction
  • Motivation
  • Implementing mutual exclusion
  • Implementing restartable atomic sequence
  • Kernel design considerations
  • The performance of three software techniques
  • Conclusions

4
Why Coordinate?
  • Critical section
  • Must execute atomically, without interruption.
  • Atomicity usually only w.r.t. other operations on
    the same data structures.
  • What are sources of interruption?
  • Hardware interrupts, UNIX signals.
  • Thread pre-emption.
  • Interleaving of multiple CPUs.

5
Spooling Example: Correct
(Figure: a shared spooler directory in shared memory. Slots 4, 5, and 6 hold
abc, Prog.c, and Prog.n; out = 4 points at the next file to print, in = 7 at
the next free slot. Each process copies in into a local next_free, stores its
file name at slot next_free, then sets in = next_free + 1.)
  • Process 1 reads in (next_free = 7), stores F1 in slot 7, and sets in = 8.
  • Process 2 then reads in (next_free = 8), stores F2 in slot 8, and sets
    in = 9.
  • No unlucky interleaving: both files are queued correctly.

6
Spooling Example: Races
(Same figure, but with an unlucky interleaving.)
  • Process 2 reads in (next_free = 7, value 7) and is then preempted.
  • Process 1 runs: it also reads in (next_free = 7), stores F1 in slot 7,
    and sets in = next_free + 1 = 8.
  • Process 2 resumes: it stores F2 in slot 7, overwriting F1, and sets
    in = next_free + 1 = 8 again.
  • F1 is never printed: a race condition.
7
Critical Section Problem
  • N threads all compete to use the same shared
    data.
  • This can result in a race condition.
  • Each thread has a code segment, called a critical
    section, in which the shared data is accessed.
  • We need to ensure that when one thread is
    executing in its critical section, no other
    thread is allowed to execute in its critical
    section.

8
Critical Region (Critical Section)
  • Process:
  • while (true) {
  •   ENTER CRITICAL SECTION
  •   Access shared variables   // Critical Section
  •   LEAVE CRITICAL SECTION
  •   Do other work
  • }

9
Critical Region Requirement
  • Mutual Exclusion
  • At most one process may execute within the
    critical section at a time.
  • Progress
  • If no process is executing in its critical
    section, a process requesting entry to its
    critical section cannot be postponed indefinitely.
  • No process running outside its critical region
    may block other processes
  • Bounded Wait
  • A process requesting entry to a critical section
    should only have to wait for a bounded number of
    other processes to enter and leave the critical
    section.
  • No process should have to wait forever to enter
    its critical region
  • Speed and Number of CPUs
  • No assumption may be made about speeds or number
    of CPUs.

10
Critical Regions (2)
  • Mutual exclusion using critical regions

11
Synchronization approaches
  • Disabling Interrupts
  • Lock Variables
  • Strict Alternation
  • Peterson's solution
  • TSL
  • Sleep and Wakeup
  • Message sending

12
Disabling Interrupts
  • How does it work?
  • Disable all interrupts just after entering a
    critical section and re-enable them just before
    leaving it.
  • Why does it work?
  • With interrupts disabled, no clock interrupt can
    occur, so no context switch can occur.
  • Problems
  • What if the process forgets to re-enable the
    interrupts?
  • Multiprocessor? (disabling interrupts only
    affects one CPU)
  • Only used inside the OS

13
Lock Variables
  • int lock = 0;
  • while (lock)
  •   ;                       // busy wait
  • lock = 1;                 // EnterCriticalSection
  • access shared variable    // critical section
  • lock = 0;                 // LeaveCriticalSection
  • Does the above code work?
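The answer is no: checking `lock` and setting `lock = 1` are two separate steps, so both threads can observe `lock == 0` before either sets it. A minimal Python sketch (names are illustrative; two events force the unlucky interleaving deterministically in place of a real preemption):

```python
import threading

lock = 0                       # the naive lock variable
in_critical = []               # records who entered the critical section
a_checked = threading.Event()  # forces the bad interleaving
b_entered = threading.Event()

def thread_a():
    global lock
    while lock:                # A observes lock == 0 ...
        pass
    a_checked.set()            # ... and is "preempted" right here
    b_entered.wait()
    lock = 1                   # A sets the lock too late
    in_critical.append("A")

def thread_b():
    global lock
    a_checked.wait()           # B runs in the window after A's check
    while lock:                # B also observes lock == 0
        pass
    lock = 1
    in_critical.append("B")
    b_entered.set()

ta = threading.Thread(target=thread_a)
tb = threading.Thread(target=thread_b)
ta.start(); tb.start()
ta.join(); tb.join()
print(in_critical)  # both threads entered: mutual exclusion violated
```

The events stand in for an ill-timed context switch between the check and the store; with real scheduling the same window exists, just less predictably.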

14
Strict Alternation
  • Thread Me:  /* for two threads */
  • while (true) {
  •   while (turn != my_thread_id)
  •     ;                      /* busy waiting */
  •   Access shared variables  // Critical Section
  •   turn = other_thread_id;
  •   Do other work
  • }
  • Satisfies mutual exclusion but not progress.
  • Why?
  • Notes
  • while (turn != my_thread_id) ;  /* busy waiting */
  • A lock (the turn variable) that uses busy waiting
    is called a spin lock.

15
Using Flags
  • int flag[2] = {false, false};
  • Thread Me:
  • while (true) {
  •   flag[my_thread_id] = true;
  •   while (flag[other_thread_id])
  •     ;
  •   Access shared variables  // Critical Section
  •   flag[my_thread_id] = false;
  •   Do other work
  • }
  • Can block indefinitely
  • Why? ("You go ahead!")

16
Test Set (TSL)
  • Requires hardware support
  • Does test and set atomically
  • char Test_and_Set(char *target)
  • {  // all done atomically
  •   char temp = *target;
  •   *target = true;
  •   return temp;
  • }

17
Problems with TSL
  • Operates at motherboard speeds, not CPU.
  • Much slower than cached load or store.
  • Prevents other use of the memory system.
  • Interferes with other CPUs and DMA.
  • Silly to spin in TSL on a uniprocessor.
  • Add a thread_yield() after every TSL.

18
Other Similar Hardware Instructions
  • Swap (similar to TSL)
  • void Swap(char *x, char *y)
  • {  // all done atomically
  •   char temp = *x;
  •   *x = *y;
  •   *y = temp;
  • }

19
Peterson's Solution
  • int flag[2] = {false, false};
  • int turn;
  • Thread Me:
  • while (true) {
  •   flag[my_thread_id] = true;
  •   turn = other_thread_id;
  •   while (flag[other_thread_id]
  •          && turn == other_thread_id)
  •     ;
  •   Access shared variables  // Critical Section
  •   flag[my_thread_id] = false;
  •   Do other work
  • }
  • It works!!!
  • Why?
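A runnable sketch of Peterson's algorithm for two threads. This relies on CPython's GIL giving sequentially consistent memory; on real hardware with a relaxed memory model, memory fences would also be required:

```python
import sys
import threading

sys.setswitchinterval(0.0005)  # shorten spin episodes under the GIL

flag = [False, False]
turn = 0
counter = 0

def worker(me):
    global turn, counter
    other = 1 - me
    for _ in range(200):
        flag[me] = True      # announce intent to enter
        turn = other         # give the other thread priority
        while flag[other] and turn == other:
            pass             # busy wait (spin)
        counter += 1         # critical section
        flag[me] = False     # leave the critical section

t0 = threading.Thread(target=worker, args=(0,))
t1 = threading.Thread(target=worker, args=(1,))
t0.start(); t1.start()
t0.join(); t1.join()
print(counter)  # 400: mutual exclusion held for every increment
```

Setting `turn = other` before spinning is what breaks the tie: if both threads arrive together, whichever wrote `turn` last waits, so exactly one proceeds.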

20
Sleep and Wakeup
  • Problem with previous solutions
  • Busy waiting
  • Wasting CPU
  • Priority Inversion
  • A high-priority thread busy-waits for a
    low-priority thread to leave the critical section,
  • but the low-priority thread can never execute,
    since the high-priority thread is never blocked.
  • Solution: sleep and wakeup
  • When blocked, go to sleep.
  • Wake up when it is OK to retry entering the
    critical section.
  • Semaphores provide operations that perform sleep
    and wakeup.

21
Semaphores
  • A semaphore with value count represents count
    abstract resources.
  • A new variable type with two operations:
  • The Down (P) operation acquires a resource and
    decrements count.
  • The Up (V) operation releases a resource and
    increments count.
  • Any semaphore operation is indivisible (atomic).
  • Semaphores solve the lost-wakeup (wakeup-bit)
    problem.

22
What's Up? What's Down?
  • Definitions of P and V:
  • Down(S):
  •   while (S <= 0)
  •     ;  // no-op
  •   S = S - 1;
  • Up(S):
  •   S = S + 1;
  • Counting semaphores: 0..N
  • Binary semaphores: 0, 1
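Python's `threading.Semaphore` provides exactly these operations, with `acquire` as Down (it sleeps instead of busy-waiting) and `release` as Up. A binary-semaphore sketch for mutual exclusion:

```python
import threading

mutex = threading.Semaphore(1)  # binary semaphore: count starts at 1
balance = 0

def deposit(amount, times):
    global balance
    for _ in range(times):
        mutex.acquire()    # Down (P): sleeps if count is 0
        balance += amount  # critical section
        mutex.release()    # Up (V): increments count, wakes a sleeper

threads = [threading.Thread(target=deposit, args=(1, 10000)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(balance)  # 40000: no updates were lost
```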

23
Possible Deadlocks with Semaphores
  • Example: P0 and P1 share two semaphores S and Q,
    with S = 1 and Q = 1.
  •   P0                          P1
  •   Down(S)  // S = 0           Down(Q)  // Q = 0
  •   Down(Q)  // Q = -1, blocks  Down(S)  // S = -1, blocks
  •   // P0 blocked               // P1 blocked
  •   DEADLOCK
  •   Up(S)                       Up(Q)
  •   Up(Q)                       Up(S)
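The interleaving above can be reproduced deterministically in Python: each thread acquires its first semaphore, a barrier guarantees both hold one, and timed acquires stand in for the infinite blocking (a hypothetical sketch, not the slides' own code):

```python
import threading

S = threading.Semaphore(1)
Q = threading.Semaphore(1)
barrier = threading.Barrier(2)
results = {}

def p0():
    S.acquire()                             # Down(S): S = 0
    barrier.wait()                          # wait until P1 holds Q
    results["P0"] = Q.acquire(timeout=0.5)  # Down(Q): can never succeed
    barrier.wait()                          # record both timeouts first
    S.release()

def p1():
    Q.acquire()                             # Down(Q): Q = 0
    barrier.wait()                          # wait until P0 holds S
    results["P1"] = S.acquire(timeout=0.5)  # Down(S): can never succeed
    barrier.wait()
    Q.release()

t0 = threading.Thread(target=p0)
t1 = threading.Thread(target=p1)
t0.start(); t1.start()
t0.join(); t1.join()
print(results)  # both timed acquires fail: the classic deadlock cycle
```

Acquiring S and Q in the same global order in both threads removes the cycle, and with it the deadlock.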

24
Monitor
  • A simpler way to synchronize
  • A set of programmer defined operators
  • monitor monitor-name
  •   // variable declarations
  •   public entry P1(..)
  •     ...
  •   ......
  •   public entry Pn(..)
  •     ...
  •   begin
  •     initialization code
  •   end
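Python has no built-in monitor construct, but the discipline can be sketched by hand: a class whose every entry procedure runs under a single lock, so at most one thread is active inside at a time (class and names are illustrative):

```python
import threading

class BoundedCounter:
    """Monitor-style class: one lock serializes every entry procedure."""
    def __init__(self, limit):
        self._lock = threading.Lock()  # the monitor's implicit lock
        self._count = 0
        self._limit = limit

    def increment(self):
        with self._lock:               # only one thread active inside
            if self._count < self._limit:
                self._count += 1
                return True
            return False

    def value(self):
        with self._lock:
            return self._count

c = BoundedCounter(limit=150)
threads = [threading.Thread(target=lambda: [c.increment() for _ in range(100)])
           for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(c.value())  # 150: the bound is never exceeded (200 attempts were made)
```

Real monitors also offer condition variables for waiting inside the monitor; Python's `threading.Condition` plays that role.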

25
Monitor Properties
  • The internal implementation of a monitor type
    cannot be accessed directly by the various
    threads.
  • The encapsulation provided by the monitor type
    limits access to the local variables only by the
    local procedures.
  • Monitor construct does not allow concurrent
    access to all procedures defined within the
    monitor.
  • Only one thread/process can be active within the
    monitor at a time.
  • Synchronization is built in.

26
Cooperating Processes via Message Passing
  • IPC is best provided by a messaging system.
  • Messaging systems and shared memory are not
    mutually exclusive; they can be used
    simultaneously within a single OS or a single
    process.
  • Two basic operations
  • Send (destination, message)
  • Receive (source, message)
  • Message size: fixed or variable.
  • Real-life analogy: conversation.

27
Message Passing
28
Direct Communication
  • Binds the algorithm to process names.
  • The sender explicitly names the receiver, or the
    receiver explicitly names the sender.
  • Send(P, message)
  • Receive(Q, message)
  • A link is established automatically between every
    pair of processes that want to communicate.
  • Processes must know each other's identity.
  • One link per pair of processes.

29
Indirect Communication
  • send(A, message)     /* send a message to mailbox A */
  • receive(A, message)  /* receive a message from mailbox A */
  • A mailbox is an abstract object into which
    messages can be placed and from which they can be
    removed.
  • A mailbox is owned either by a process or by the
    system.
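Indirect communication maps naturally onto a shared queue acting as the mailbox; a sketch using Python's standard `queue` module (the name `mailbox_A` is illustrative):

```python
import queue
import threading

mailbox_A = queue.Queue()  # the mailbox: named independently of the processes

def producer():
    for i in range(3):
        mailbox_A.put(f"msg-{i}")         # send(A, message)

received = []

def consumer():
    for _ in range(3):
        received.append(mailbox_A.get())  # receive(A, message): blocks if empty

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(received)  # ['msg-0', 'msg-1', 'msg-2']
```

Because sender and consumer name the mailbox rather than each other, either side can be replaced without the other noticing.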

30
Fast Mutual Exclusion for Uniprocessors
  • Describes restartable atomic sequences (an
    optimistic mechanism for implementing atomic
    operations on a uniprocessor).
  • Assumes that short atomic sequences are rarely
    interrupted.
  • Relies on a recovery mechanism.
  • Performance improvements.

31
Motivation for efficient mutual exclusion
  • Modern applications use multiple threads
  • As a program structuring device
  • As a mechanism for portability to multiprocessors
  • As a way to manage I/O and server concurrency
  • Many OSs are built on top of a microkernel
  • Many services are implemented as multithreaded
    user-level applications
  • Even single threaded programs rely on basic OS
    services that are implemented outside the kernel

32
Implementing mutual exclusion on a uniprocessor
  • Pessimistic methods
  • Memory-interlocked instruction
  • Software reservation
  • Kernel emulation
  • Restartable atomic sequences

33
Memory-interlocked instruction
  • Implicitly delays interrupts until the
    instruction completes.
  • Require special hardware support from the
    processor and bus.
  • The cycle time for an interlocked access is
    several times greater than that for a
    non-interlocked access.

34
Software reservation
  • Explicitly guards against arbitrary interleaving.
  • A thread must register its intent to perform an
    atomic operation, and then wait.
  • Examples
  • Dekker's algorithm
  • Lamport's algorithm
  • Peterson's algorithm

35
Kernel emulation
  • A strictly uniprocessor solution
  • Explicitly disables interrupts during operations
    that must execute atomically.
  • Although it requires no special hardware, its
    runtime cost is high.
  • The kernel must be invoked on every
    synchronization operation

36
Restartable atomic sequence
  • Instead of using a mechanism that guards against
    interrupts, we can instead recognize when an
    interrupt occurs and recover.
  • The recovery process restarts the sequence.
  • Restartable atomic sequences are attractive
    because they:
  • Do not require hardware support.
  • Have a short code path, with one load and one
    store per atomic read-modify-write.
  • Do not involve the kernel on every atomic
    operation.
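The restart behavior can be illustrated with a single-threaded simulation (entirely hypothetical: the `interrupted` callback stands in for the kernel noticing a preemption inside the registered PC range and rolling the PC back to the start of the sequence):

```python
shared = {"counter": 0}

def restartable_increment(interrupted):
    """One load and one store; if 'preempted' before the store, restart."""
    while True:
        value = shared["counter"]      # load: start of the atomic sequence
        if interrupted():              # simulated preemption point
            continue                   # the kernel restarts the sequence
        shared["counter"] = value + 1  # store: end of the atomic sequence
        return

# The simulated interrupt fires on the first two attempts, then stops.
fires = iter([True, True, False])
restartable_increment(lambda: next(fires))
print(shared["counter"])  # 1: exactly one increment despite two restarts
```

The key property is that everything before the final store is idempotent, so re-executing the sequence from the top after an interrupt is always safe.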

37
Implementing restartable atomic sequences
  • Require kernel support to ensure that a suspended
    thread is resumed at the beginning of the
    sequence.
  • Strategies for implementing kernel support:
  • Explicit registration in Mach
  • Designated sequences in Taos

38
Explicit registration in Mach
  • The kernel keeps track of each address space's
    restartable atomic sequences.
  • An application registers the starting address and
    length of the sequence with the kernel.
  • If registration fails, the application falls back
    to code that uses conventional mechanisms.

39
Costs of explicit registration
  • Cost of subroutine linkage
  • Because the kernel identifies restartable atomic
    sequences by a single PC range per address space,
    they cannot be inlined.
  • Cost of checking the return PC
  • The kernel must check the return PC whenever a
    thread is suspended.
  • This adds overhead to every thread switch.

40
Designated sequences in Taos
  • The kernel must recognize every interrupted
    sequence.
  • A two-stage check recognizes atomic sequences:
  • The first stage rejects most interrupted code
    sequences that are not restartable
  • (the opcode of the suspended instruction is used
    as an index into a hash table of instructions
    eligible to appear in a restartable atomic
    sequence).
  • The second stage uses another table (indexed by
    opcode).

41
Kernel design considerations
  • Cost of the two-stage check on every thread
    switch
  • Placement of the PC check
  • Mutual exclusion in the kernel

42
Placement of the PC check
  • When should the kernel check/adjust the PC of a
    suspended thread?
  • When it is first suspended.
  • When it is about to be resumed.
  • Detection at user level
  • Whenever a suspended thread is resumed by the
    kernel, it returns to a fixed user-level
    sequence.
  • Determine if the thread was suspended within a
    restartable atomic sequence.
  • (complexity and overhead: the return address must
    be saved to the user-level stack at each
    suspension)

43
Mutual exclusion in the kernel
  • The kernel is itself a client of thread
    management facilities.
  • Two events can trigger a thread switch:
  • Page fault
  • Thread preemption
  • Careless ordering of the PC check could lead to
    mutual recursion between the thread scheduler and
    the virtual memory system.

44
The performance
  • R.A.S. vs. kernel emulation vs. software
    reservation.
  • Performance is discussed at three levels:
  • Basic overhead of the various mechanisms.
  • Effect on the performance of common thread
    management operations.
  • Effect of mutual exclusion overhead on the
    performance of several applications.

45
Microbenchmarks
  • Performance is measured by a test that enters and
    leaves a critical section (e.g., via TSL) in a
    loop, one million times.
  • Two versions of Lamport's algorithm (fast and
    meta).

46
Thread management overhead
  • Different thread management packages

Two threads alternately using a mutex and a condition
variable
47
Application performance
  • afs-bench: file-system intensive (cp-like
    operations).
  • Parthenon-n: theorem prover with n threads.
  • Procon-64: producer-consumer.
  • Fraction of thread suspensions that fall within a
    R.A.S., and the time to check for them.

48
Conclusions
  • R.A.S. represent a "common case" approach to
    mutual exclusion on a uniprocessor.
  • R.A.S. are appropriate for uniprocessors that do
    not support memory-interlocked atomic
    instructions.
  • Even on processors that do have hardware support
    for synchronization, better performance may be
    possible.

49
Next Lecture
  • Distributed systems
  • References
  • Read the first chapter of the book
  • Read "The Anatomy of the Grid: Enabling Scalable
    Virtual Organizations"