Title: Distributed Systems
- Session 8 Concurrency Control
- Christos Kloukinas
- Dept. of Computing
- City University London
Last session
- 1 Location Transparency
  - Hard-coding location information in components is not a good idea, because it makes migration difficult
- 2 Naming
  - Associating external names with references
- 3 Trading
  - Looking up servers by the services they offer
0.1 Naming
- 1 Naming Service Examples
  - e.g. NFS, X.500, DNS
- 2 Common Characteristics
  - External names, hierarchies, contexts, persistence of bindings, resolve and bind operations
- 3 CORBA Naming Service
  - interface NamingContext
- 4 Limitations
  - We do not always know the names
0.2 Java Example: Client Finding Objects
- ORB orb = ORB.init(args, null);
- 1. org.omg.CORBA.Object objRef = orb.resolve_initial_references("NameService");
- CosNaming.NamingContext root = CosNaming.NamingContextHelper.narrow(objRef);
- 2. CosNaming.NameComponent[] name = {
-   new NameComponent("UEFA", "ORG"),
-   new NameComponent("England", "Country"),
-   new NameComponent("Premier", "League"),
-   new NameComponent("Arsenal", "Club") };
- 3. Team t = TeamHelper.narrow(root.resolve(name));
- 4. t.print();
Callouts: step 1 transparently gets the naming service; the narrow() calls do the casting.
0.3 Trading
- Characteristics
  - Need a trader (mediator), quality of service, and a language to express quality of service
  - Quality of service can be expressed statically (e.g. privacy, precision) or dynamically (e.g. performance)
  - Service matching and service shopping
- Example: Video on Demand
- OMG/CORBA Trading Service
Session 8 - Outline
- 1 Motivation
- 2 Concurrency Control Techniques
- 3 CORBA Concurrency Control Service
- 4 Summary
1 Motivation
- How can multiple components in a distributed system use a shared component concurrently without violating the integrity of that component?
- This question is of fundamental importance, as there are very few distributed systems in which every component is used by only a single component at a time.
1 Motivation (ctd.)
- Resources accessed concurrently may be hardware components (e.g. a printer), operating system resources (e.g. files or sockets), databases (e.g. the bank accounts kept by different banks) or CORBA objects.
- For some types of access, resources may have to be accessed in mutual exclusion:
  - It does not make sense for print jobs of different users to be printed in an interleaved way.
  - Only one user should be editing a file at a time; otherwise the changes made by other users are overwritten when the last user saves the file.
  - The integrity of databases or CORBA objects may be lost through concurrent updates.
- Hence the need arises to restrict the concurrent access of multiple components to a shared resource in a sensible way.
1 Motivation (ctd.)
- Concurrent access to and updates of resources that maintain state information may lead to
  - lost updates
  - inconsistent analysis
- Motivating example for lost updates
  - Cash withdrawal from an ATM and a concurrent
  - credit of a cheque
- Motivating example for inconsistent analysis
  - Funds transfer between accounts of one customer
  - Sum of account balances (report for the Inland Revenue)
1 Motivating Examples
- class Account {
-   protected float balance;
-   public float get_balance() { return balance; }
-   void debit(float amount) {
-     float newBalance = balance - amount;  // compute the new value first
-     balance = newBalance;                 // then write it back
-   }
-   void credit(float amount) {
-     float newBalance = balance + amount;
-     balance = newBalance;
-   }
- }
The object stores the balance in the instance variable balance and returns the current balance through the operation get_balance().
The debit() operation subtracts the amount passed as a parameter from the balance, and the credit() operation adds it.
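1 Lost Updates
Balance of account anAcc at t0 is 75

Time      Customer_at_ATM (WRITER)      Clerk_at_Counter (WRITER)
t0..t6    anAcc.debit(50):              anAcc.credit(50):
            newBalance = 25               newBalance = 125
            balance = 25                  balance = 125

Both operations compute their new balance from the original 75, so whichever write happens last overwrites the other and one update is lost. Executed one after the other, they would leave a balance of 75.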
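The lost update can be replayed deterministically. The sketch below is only an illustration: it assumes one particular order of the four steps (both reads before either write), and the class and variable names are hypothetical, not part of the slides.

```java
// Replays one assumed lost-update interleaving step by step (no real threads).
class LostUpdateReplay {
    // Returns the final balance of anAcc after the interleaved debit and credit.
    static float replay() {
        float balance = 75f;                 // anAcc.balance at t0
        float customerNew = balance - 50f;   // debit reads 75 and computes 25
        float clerkNew = balance + 50f;      // credit also reads 75 and computes 125
        balance = customerNew;               // debit writes 25
        balance = clerkNew;                  // credit writes 125 and overwrites the debit
        return balance;
    }

    public static void main(String[] args) {
        System.out.println("interleaved result: " + replay());
        System.out.println("serial result:      " + (75f - 50f + 50f));
    }
}
```

If the two writes happened in the opposite order, the credit would be lost instead of the debit; in both cases the result differs from the serial one.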
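1 Inconsistent Analysis
Balances at t0: Acc1 = 7500, Acc2 = 0

Time      Funds transfer (WRITER)       Inland Revenue Report (READER)
t0..t7    Acc1.debit(7500):             float sum = 0;
            Acc1.newBalance = 0         sum += Acc2.get_bal();  // sum = 0
            Acc1.balance = 0            sum += Acc1.get_bal();  // sum = 0
          Acc2.credit(7500):
            Acc2.newBalance = 7500
            Acc2.balance = 7500

The report reads Acc2 before the credit and Acc1 after the debit, so it computes a sum of 0 although the customer holds 7500 throughout.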
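The same kind of replay works for the inconsistent analysis. This is a sketch under the assumption that the report reads Acc2 before the credit and Acc1 after the debit, which is what the "sum = 0" comments imply; all names are hypothetical.

```java
// Replays one assumed interleaving of the transfer (writer) and the report (reader).
class InconsistentAnalysisReplay {
    // Returns { sum seen by the report, true total once the transfer is complete }.
    static float[] replay() {
        float acc1 = 7500f, acc2 = 0f;   // balances at t0
        float sum = 0f;
        sum += acc2;                     // report reads Acc2 before the credit: 0
        acc1 -= 7500f;                   // transfer: Acc1.debit(7500)
        sum += acc1;                     // report reads Acc1 after the debit: 0
        acc2 += 7500f;                   // transfer: Acc2.credit(7500)
        return new float[] { sum, acc1 + acc2 };
    }

    public static void main(String[] args) {
        float[] r = replay();
        System.out.println("reported sum: " + r[0] + ", actual total: " + r[1]);
    }
}
```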
2 Concurrency Control Techniques
- 1 Assessment Criteria
- 2 Pessimistic Concurrency Control
- e.g. Two Phase Locking (2PL)
- 3 Optimistic Concurrency Control
- 4 Comparison
Concurrency Control Techniques
- Ensure the integrity of a shared resource amidst concurrent access
- e.g. in a database, prevent users from editing the same record at the same time
- Concerned with serialising transactions, ensuring safe execution
- Resolving conflicts and deadlocks
- Ensuring fairness among concurrent processes
- Restoring component integrity
2.1 Assessment Criteria
- Serialisability: Concurrent threads are serialisable if their concurrent execution has the same effect on shared resources as executing them one after another. It can be proven that serialisable threads do not lead to lost updates or inconsistent analysis.
- Deadlock freedom: Concurrency control techniques that use locking may force threads to wait for other threads to release a lock before they can access a resource. This may lead to situations where the wait-for relationship is cyclic and the threads are deadlocked.
- Fairness: Whether all threads have the same chance of getting access to resources.
- Complexity: Computing precisely those, and only those, schedules that are serialisable may be very complex. We are interested in the complexity of a concurrency control scheme in order to estimate its performance overhead.
- Concurrency: We are also interested in the degree of concurrency that a control scheme allows threads to achieve. It is obviously undesirable to restrict schedules that do not cause serialisability problems.
Concurrency Control Techniques: Families
- Pessimistic
  - Assumes that collisions are likely to occur, so locks are used.
  - Changes are consistent and safe.
  - Does not scale well.
- Optimistic
  - Assumes that collisions occur infrequently: instead of trying to prevent them, you detect them and resolve each collision when it does occur.
  - Uses timestamps, and actions can be rolled back.
2.2 Two Phase Locking (2PL)
- The most popular concurrency control technique. Used in
  - RDBMSs (Oracle, Ingres, Sybase, DB/2, etc.)
  - ODBMSs (O2, ObjectStore, Versant, etc.)
  - Transaction monitors (CICS, etc.)
- The principal component that implements 2PL is a lock manager, from which concurrent processes or threads acquire locks on every shared resource they access.
- The lock manager examines each request and compares it with the locks already granted on the resource.
- If the requested lock does not conflict with an already granted lock, the lock manager grants the lock and notes that the requester is now using the resource.
Terminology
- Locks and Locksets
- Locking
- Lock Compatibility
- Locking Conflict
- Deadlocks
- Waiting graph
- Locking granularity
- Hierarchical Locking
- Locking transparency
2.2 Locks
- A lock is a token that indicates that a process accesses a resource in a particular mode.
- Minimal lock modes: read and write.
- Locks are used to indicate to concurrent processes or threads the way in which a resource is used.
- The lock manager therefore maintains a set of locks for each resource, i.e. it associates a lockset with every shared object.
2.2 Locking
- Processes acquire locks before they access shared resources and release locks afterwards.
- 2PL: Processes do not acquire locks once they have released a lock.
- Typical 2PL locking profile of a process:
[Figure: number of locks held, plotted against time]
2.2 Locking
- 2PL is based on the assumption that processes or threads always acquire locks before they access a shared resource and that they release a lock when they no longer need the resource.
- In 2PL, processes do not acquire locks once they have released a lock.
- This means that threads operate in cycles, where each cycle has a lock acquisition phase followed by a lock release phase.
- 2PL takes its name from these two phases.
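The two-phase rule can be checked mechanically. The following is a minimal sketch, not a lock manager: it only tracks which locks one process holds and rejects an acquisition that comes after a release within the same cycle. The class name and the decision to start a new cycle once all locks are released are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Tracks the locks of one process and enforces the two-phase rule.
class TwoPhaseLockTracker {
    private final Set<String> held = new HashSet<>();
    private boolean releasing = false; // true once the release phase has begun

    void acquire(String resource) {
        if (releasing) {
            throw new IllegalStateException("2PL violated: acquire after release");
        }
        held.add(resource);
    }

    void release(String resource) {
        if (held.remove(resource)) {
            // The release phase lasts until every lock is given back;
            // then a new cycle (with a new acquisition phase) may start.
            releasing = !held.isEmpty();
        }
    }

    int locksHeld() {
        return held.size();
    }
}
```

Acquiring "a" and "b", releasing "a" and then asking for "c" throws; after "b" is released as well, a new cycle starts and "c" may be acquired.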
2.2 Lock Compatibility
- The lock manager grants locks to requesting processes or threads on the basis of the locks already granted and their compatibility with the requested lock.
- The very core of any pessimistic concurrency control technique based on locking is the definition of a lock compatibility matrix. It defines the different lock modes and the compatibility between them.
- Minimal lock compatibility matrix (+ compatible, - conflict):

            read   write
  read       +      -
  write      -      -
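For the two minimal modes, read and write, only read/read is compatible. A lock manager's grant decision for one resource can be sketched as follows; this is a simplified, single-threaded illustration with hypothetical names, and it neither queues waiters nor records who holds a lock.

```java
import java.util.ArrayList;
import java.util.List;

// Grant decision of a lock manager for one resource, using the minimal matrix.
class MinimalLockManager {
    enum Mode { READ, WRITE }

    // COMPATIBLE[granted][requested]: only read/read is compatible.
    private static final boolean[][] COMPATIBLE = {
        /* granted READ  */ { true,  false },
        /* granted WRITE */ { false, false },
    };

    private final List<Mode> granted = new ArrayList<>(); // lockset of the resource

    // Grants the lock only if it is compatible with every lock already granted.
    boolean request(Mode requested) {
        for (Mode g : granted) {
            if (!COMPATIBLE[g.ordinal()][requested.ordinal()]) {
                return false; // locking conflict
            }
        }
        granted.add(requested);
        return true;
    }

    void release(Mode mode) {
        granted.remove(mode); // removes one granted lock of that mode
    }
}
```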
2.2 Locking Conflicts
- Locking conflict: access cannot be granted because the requested lock is incompatible with a previously granted lock.
- When a locking conflict occurs, the requester cannot use the resource until the conflicting lock has been released.
- There are two approaches to handling locking conflicts:
  - The requesting process can be forced to wait until the conflicting locks are released. This may, however, be too restrictive, since the process or thread may well be able to do other computations in the meantime.
  - The process or thread can be alerted that the lock cannot be granted. It can then continue with other processing until the point at which it definitely needs access to the resource.
- Several 2PL implementations provide two locking operations, a blocking and a non-blocking one, so the requester can decide.
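Java's standard library offers the same choice. The sketch below uses java.util.concurrent.locks.ReentrantReadWriteLock as a stand-in for a lock manager: lock() is the blocking operation and tryLock() the non-blocking one. The surrounding class and method names are hypothetical.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class BlockingVsNonBlocking {
    // Returns true if a second thread's tryLock() is refused while
    // the calling thread holds the write lock.
    static boolean secondWriterIsRefused() {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        rw.writeLock().lock();                 // blocking: waits until the lock is granted
        try {
            boolean[] granted = new boolean[1];
            Thread other = new Thread(() -> {
                // Non-blocking: returns immediately, false on a locking conflict.
                granted[0] = rw.writeLock().tryLock();
                if (granted[0]) {
                    rw.writeLock().unlock();
                }
            });
            other.start();
            try {
                other.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return !granted[0];
        } finally {
            rw.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("second writer refused: " + secondWriterIsRefused());
    }
}
```

A thread whose tryLock() is refused can carry on with other work and ask again later, which is the second approach described above.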
2.2 Example (Avoiding Lost Updates)
Balance of account anAcc at t0 is 75

Time      Customer_at_ATM               Clerk_at_Counter
t0..t6    anAcc.debit(50):              anAcc.credit(50):
            anAcc.lock(write)             anAcc.lock(write)   (waits until the customer unlocks)
            newBalance = 75 - 50 = 25     newBalance = 25 + 50 = 75
            balance = 25                  balance = 75
            anAcc.unlock(write)           anAcc.unlock(write)
2.2 Example (Avoiding Lost Updates)
- Before the account object is changed, the debit and credit operations each request a write lock on it from the lock manager.
- The lock manager detects a write/write locking conflict and forces the second process to wait until the first process has released its lock. The second process then reads the up-to-date balance of the account and modifies it without losing the update of the first process.
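A minimal sketch of the locked account, assuming a java.util.concurrent.locks.ReentrantLock plays the role of the write lock obtained from the lock manager; the class and method names are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockedAccount {
    private final ReentrantLock writeLock = new ReentrantLock(); // stands in for lock(write)
    private float balance;

    LockedAccount(float initial) {
        balance = initial;
    }

    void debit(float amount) {
        writeLock.lock();                    // a second writer waits here
        try {
            float newBalance = balance - amount;
            balance = newBalance;
        } finally {
            writeLock.unlock();
        }
    }

    void credit(float amount) {
        writeLock.lock();
        try {
            float newBalance = balance + amount;
            balance = newBalance;
        } finally {
            writeLock.unlock();
        }
    }

    float getBalance() {
        writeLock.lock();
        try {
            return balance;
        } finally {
            writeLock.unlock();
        }
    }

    // Runs the customer's debit and the clerk's credit concurrently.
    static float runConcurrently() {
        LockedAccount anAcc = new LockedAccount(75f);
        Thread customer = new Thread(() -> anAcc.debit(50f));
        Thread clerk = new Thread(() -> anAcc.credit(50f));
        customer.start();
        clerk.start();
        try {
            customer.join();
            clerk.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return anAcc.getBalance();
    }
}
```

Whichever thread gets the lock first, the other one reads the balance only after the first has written it, so the final balance is 75 in both orders.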
2.2 Deadlocks
- Recall that the lock manager may force processes or threads to wait for other processes to release locks.
- This solves the problems of lost updates and inconsistent analysis.
- However, processes may request locks for more than one object.
- Situations may then arise where two or more processes or threads are mutually waiting for each other to release their locks.
- These situations are called deadlocks. They are very undesirable, as they block threads and prevent them from finishing their jobs.
- Hence 2PL is NOT deadlock-free.
Waiting Graph
[Figure: waiting graph over the processes P1 to P9]
In this process waiting graph, the four processes P1, P2, P3 and P7 are in a deadlock.
2.2.1 Deadlock Detection and Resolution
- Deadlocks are resolved by lock managers.
- The manager maintains an up-to-date representation of the waiting graph.
  - It records every locking conflict by inserting a graph edge.
  - When a conflict is resolved by the release of a conflicting lock, the respective edge has to be deleted.
- The manager uses the waiting graph to detect deadlocks, which show up as cycles.
- Resolution: break cycles, i.e. select one process or thread that participates in such a cycle and abort it.
- Select a node with the maximum number of incoming or outgoing edges, to reduce the chance of further deadlocks.
- Aborting a process requires undoing all actions that the process has performed and releasing all locks that the process held.
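Deadlock detection on the waiting graph amounts to finding a cycle. The sketch below uses a depth-first search; the class and method names are hypothetical, and the edges in the usage note are invented for illustration, since the edges of the figure are not given in the text.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Waiting graph: an edge p -> q means "process p waits for a lock held by q".
class WaitForGraph {
    private final Map<String, Set<String>> waitsFor = new HashMap<>();

    // Called when a locking conflict forces 'waiter' to wait for 'holder'.
    void addEdge(String waiter, String holder) {
        waitsFor.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    // Called when the conflicting lock has been released.
    void removeEdge(String waiter, String holder) {
        Set<String> targets = waitsFor.get(waiter);
        if (targets != null) {
            targets.remove(holder);
        }
    }

    // A deadlock exists iff the graph contains a cycle.
    boolean hasDeadlock() {
        Set<String> done = new HashSet<>();
        for (String p : waitsFor.keySet()) {
            if (onCycle(p, new HashSet<>(), done)) {
                return true;
            }
        }
        return false;
    }

    // Depth-first search; 'path' holds the processes on the current chain.
    private boolean onCycle(String p, Set<String> path, Set<String> done) {
        if (path.contains(p)) return true;   // reached a process already on the chain
        if (done.contains(p)) return false;  // fully explored earlier, no cycle through it
        path.add(p);
        for (String q : waitsFor.getOrDefault(p, Collections.emptySet())) {
            if (onCycle(q, path, done)) return true;
        }
        path.remove(p);
        done.add(p);
        return false;
    }
}
```

With the invented edges P1 -> P2, P2 -> P3, P3 -> P7 and P7 -> P1, hasDeadlock() reports a deadlock; removing any one of these edges (for example because the process was aborted and its locks released) breaks the cycle.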
2.2 Locking Granularity
- Observation: objects that are accessed concurrently are often contained in more coarse-grained composite objects, e.g.
  - Directories can contain other directories, files are contained in directories, and files have records
  - Relational databases contain a set of tables, which contain a set of tuples, which contain attributes
  - Distributed composite objects may act as containers for component objects, which may in turn contain other objects
- A common access pattern is to visit all, or a large subset, of the objects that are contained.
- The concurrency control manager can save effort by exploiting containment hierarchies.
2.2.1 Locking Granularity
- Two phase locking is applicable to resources of any granularity.
- It works for CORBA objects as well as for files and directories or even complete databases.
- However, the degree of concurrency achieved with 2PL depends on the granularity that is used for locking.
- A high degree of concurrency is achieved with small locking granules.
- The disadvantage of choosing a small locking granularity is that a huge number of locks has to be acquired if bigger granules have to be locked.
- Trade-off: degree of concurrency vs. locking overhead.
- If we make the granules smaller, more processes can run concurrently, but we have to be prepared to pay higher costs for the management of locks.
- The dilemma can be resolved using an optimisation: hierarchical locking.
2.2.2 Containment Hierarchy
[Figure: containment hierarchy of account objects. Levels from top to bottom: Bank; Groups of Branches (G1, G2, ..., Gn); Branches (B1, B2, ..., Bn); Accounts]
2.3 Hierarchical Locking
- Allows locking of all objects contained in a composite object (container).
- BUT also allows a process to indicate, at container level, the sub-resources that it intends to use in a particular mode.
- Hierarchical locking schemes therefore introduce intention locks, such as intention read and intention write locks.
- That is, intention locks are acquired on a composite object before a process requests a real lock on an object contained in that composite object.
- Intention locks signal to processes that wish to lock the entire composite object that some other process currently holds locks on objects contained in it.
2.3.1 Hierarchical Locking
- Intention read: indicates that some process has acquired, or is about to acquire, read locks on objects inside a composite object.
- Intention write: indicates that some process has acquired, or is about to acquire, write locks on objects inside a composite object.
- A process that wants to lock a certain resource first acquires intention locks on the container of that resource and on all of its containers.
- The lock compatibility matrix is defined in such a way that a locking conflict arises if a container is already locked in a conflicting real mode: a write lock conflicts with an intention read, and a read or write lock conflicts with an intention write.
2.3.2 Hierarchical Locking
- NB: Intention read and intention write are compatible with each other because they do not actually correspond to any locks.
- Other modes:
  - An IR lock is compatible with an R lock because accessing an object for reading does not change values.
  - An IR lock is incompatible with a W lock because it is not possible to modify every element of the composite object while some other process is reading the state of an object of the composite.
  - etc.
- Hence the advantage of hierarchical locking is that it enables different lock granularities to be used at the same time.
- The overhead is that, for every individual object, intention locks have to be acquired on every composite object in which the object is contained (an object may be contained in more than one container).
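The interplay of intention locks and real locks can be sketched as follows. The matrix over IR, R, IW and W is the usual one and agrees with the cases listed above; the class, the string-based resource names and the non-blocking tryLock are assumptions made for this single-threaded illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HierarchicalLocks {
    enum Mode { IR, R, IW, W }

    // COMPATIBLE[granted][requested], in the order IR, R, IW, W.
    private static final boolean[][] COMPATIBLE = {
        /* IR */ { true,  true,  true,  false },
        /* R  */ { true,  true,  false, false },
        /* IW */ { true,  false, true,  false },
        /* W  */ { false, false, false, false },
    };

    // One granted lock: who holds it and in which mode.
    private static final class Grant {
        final String owner;
        final Mode mode;
        Grant(String owner, Mode mode) { this.owner = owner; this.mode = mode; }
    }

    private final Map<String, List<Grant>> locksets = new HashMap<>();

    private boolean conflicts(String owner, String resource, Mode requested) {
        for (Grant g : locksets.getOrDefault(resource, new ArrayList<>())) {
            boolean ok = COMPATIBLE[g.mode.ordinal()][requested.ordinal()];
            if (!g.owner.equals(owner) && !ok) return true;
        }
        return false;
    }

    // 'path' lists the containers from the root down to the resource (last element).
    // 'mode' must be R or W. Returns false, changing nothing, if any lock would conflict.
    boolean tryLock(String owner, List<String> path, Mode mode) {
        Mode intention = (mode == Mode.R) ? Mode.IR : Mode.IW;
        int last = path.size() - 1;
        for (int i = 0; i <= last; i++) {
            if (conflicts(owner, path.get(i), i == last ? mode : intention)) return false;
        }
        for (int i = 0; i <= last; i++) {
            locksets.computeIfAbsent(path.get(i), k -> new ArrayList<>())
                    .add(new Grant(owner, i == last ? mode : intention));
        }
        return true;
    }
}
```

A process that write-locks one account places IW locks on the bank, the group and the branch. Another process can still read a different account in the same branch (IR is compatible with IW), but an attempt to read-lock the whole branch is refused, without the branch lock ever having to inspect the individual accounts.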
2.4 Transparency of Locking
- The last question that we have to discuss is WHO acquires the locks, i.e. who invokes the lock operation for a resource. The options are
  - the concurrency control infrastructure, such as the concurrency control manager of a database management system,
  - the implementation of components, or
  - the clients of the components.
- The first option is very desirable, as concurrency control would then be transparent to the application programmers of both the component and its clients.
- Unfortunately this is only possible on limited occasions (e.g. in a database system), because the concurrency control manager would have to manage all resources and would have to be informed about every single resource access.
- The last option is very undesirable, and it is in fact always avoidable. Hence distributed components should be designed so that concurrency control is hidden within their implementation, not exposed at their interface, and transparent to the designers of CLIENTS.
2.3 Optimistic Concurrency Control
- In general, the complexity of two phase locking is linear in the number of accessed resources. With hierarchical locking it is even slightly more complex, as the containers of resources also have to be locked in intention mode.
- This overhead, however, is unreasonable if the probability of a locking conflict is very low.
- Given the motivating examples we discussed earlier, it is quite unlikely that you withdraw cash from an ATM in the very millisecond in which a clerk credits a cheque.
- This is where optimistic concurrency control comes in.
- It follows a laissez-faire approach and works as a watchdog that detects conflicts only when they really happen.
2.3 Optimistic Concurrency Control (ctd.)
- Every thread or process works on its private logical copy of the set of shared resources.
- While a process or thread accesses resources, the concurrency control manager keeps a log of the accesses.
- Timestamps are required.
- At a certain point in time, the access patterns are validated against conflicts with concurrent processes or threads.
- If no conflicts occurred, the changes can be made known to the global set of resources.
- If conflicts occurred, the process has to discard its logical copy and start over again on an up-to-date copy of the resources.
Phases
- 1. Read
  - The process/transaction executes, reading values and writing to a private copy.
- 2. Validation
  - When the process completes, the manager checks whether the process could possibly have conflicted with any other concurrent process. If there is a possibility, the process aborts and restarts.
- 3. Write
  - If there is no possibility of conflict, the transaction commits.
- If there are few conflicts, validation can be done efficiently and leads to better performance than other concurrency control methods. Unfortunately, if there are many conflicts, the cost of repeatedly restarting operations hurts performance significantly.
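The three phases can be illustrated on a single account. Note the simplification: instead of timestamps and read/write sets, this sketch validates with a version counter, which is a swapped-in technique and not the scheme of these slides. All names are hypothetical.

```java
// Read / validate / write on one resource, validated with a version counter.
class OptimisticAccount {
    // Immutable private copy handed out in the read phase.
    static final class Snapshot {
        final float balance;
        final long version;
        Snapshot(float balance, long version) {
            this.balance = balance;
            this.version = version;
        }
    }

    private Snapshot current;

    OptimisticAccount(float initial) {
        current = new Snapshot(initial, 0);
    }

    // 1. Read phase: take a private copy.
    synchronized Snapshot read() {
        return current;
    }

    // 2. Validation and 3. write: commit only if nobody committed since 'seen' was read.
    synchronized boolean commit(Snapshot seen, float newBalance) {
        if (current.version != seen.version) {
            return false;               // conflict detected: the caller must start over
        }
        current = new Snapshot(newBalance, seen.version + 1);
        return true;
    }

    // Restarts on a fresh copy until the commit succeeds; returns the number of attempts.
    int update(float delta) {
        int attempts = 0;
        while (true) {
            attempts++;
            Snapshot s = read();
            float newBalance = s.balance + delta;   // work on the private copy
            if (commit(s, newBalance)) {
                return attempts;
            }
        }
    }
}
```

The retry loop in update() is where the cost of frequent conflicts shows up: every failed validation throws away the work done on the private copy.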
2.3 Validation Prerequisites
- As a prerequisite for optimistic concurrency control, the overall sequence of operations a process performs has to be separated into distinguishable units. The access pattern of a unit is then validated during a validation phase at the end of each unit.
- For each unit U the following information has to be gathered:
  - Starting time of the unit, st(U).
  - Time stamp for the start of validation, TS(U).
  - Ending time of the unit, E(U).
  - Read and write sets, RS(U) and WS(U) (the sets of resources that U has accessed in read and write mode).
- Needs precise time information!
- Requires synchronisation of the local clocks (of the resources, e.g. CORBA objects)!
2.3 Validation Set
- The validation of a unit has to be done against all concurrent units that have already been successfully validated. We therefore denote the set of those units as the validation set VU(u).
- VU(u) is formally defined as
  - $VU(u) = \{\, x \mid st(u) < E(x) \text{ and } x \text{ has been validated} \,\}$
- i.e. VU(u) contains the units x that were active concurrently with u but were validated before it.
2.3 Conflict Detection
- During the validation phase, the concurrency control manager has to look for two types of conflicts: read/write and write/write conflicts.
- A read/write conflict occurred during the course of a unit u iff
  - $\exists\, u' \in VU(u) : WS(u) \cap RS(u') \neq \emptyset \;\lor\; RS(u) \cap WS(u') \neq \emptyset$
  - i.e. u has written a resource that this other unit u' has read, or vice versa.
- A write/write conflict occurred during the course of a unit u iff
  - $\exists\, u' \in VU(u) : WS(u) \cap WS(u') \neq \emptyset$
  - i.e. u has modified a resource that this other unit u' has modified as well.
- In both cases the unit cannot be completed but has to be undone.
Optimistic Conc. Control Example (1/3)
- Assume that you have the following optimistic units:

  Unit   Start time   End time   Read set   Write set
  1      1            5          {1,3,5}    {2,4}
  2      3            7          {2,3,5}    {6,4}
  3      5            9          {2,3,5}    {7,8}
  4      10           15         {7,3,5}    {7,8}

- What is the validation set (VU) of each one of them?
- Which ones have a conflict (read/write or write/write) and where exactly does the conflict appear?
- Which of the transactions in the table above will get validated?
Optimistic Conc. Control Example (2/3)
- VU(1) = {}
  - Why? Because when it finishes, no other unit has finished yet.
  - So, unit 1 gets validated immediately.
- VU(2) = {1}
  - Why? Because the end time of unit 1 (5) is greater than the starting time of unit 2 (3) and unit 1 has been validated.
  - Unit 2 has a read/write conflict with unit 1 (on resource 2) and a write/write conflict with unit 1 (on resource 4), so unit 2 is not validated.
Optimistic Conc. Control Example (3/3)
- VU(3) = {}
  - Why? Because only unit 2 has an end time greater than the starting time of unit 3, but unit 2 has not been validated (so it is ignored).
  - Therefore, unit 3 gets validated immediately.
- VU(4) = {}
  - Why? Because no unit has an end time greater than the starting time of unit 4.
  - Thus, unit 4 will be validated as well.
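The worked example can be reproduced in code. The sketch below computes VU(u) and checks both conflict types for the four units of the table. It assumes that each unit is validated at its end time, in order of end time, because the table gives no separate validation timestamps; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OptimisticValidation {
    static final class Unit {
        final int id, start, end;
        final Set<Integer> rs, ws;   // read set and write set
        boolean validated;
        Unit(int id, int start, int end, Set<Integer> rs, Set<Integer> ws) {
            this.id = id; this.start = start; this.end = end; this.rs = rs; this.ws = ws;
        }
    }

    static Set<Integer> setOf(Integer... xs) {
        return new HashSet<>(Arrays.asList(xs));
    }

    // VU(u): the validated units x with st(u) < E(x).
    static List<Unit> validationSet(Unit u, List<Unit> all) {
        List<Unit> vu = new ArrayList<>();
        for (Unit x : all) {
            if (x != u && x.validated && u.start < x.end) vu.add(x);
        }
        return vu;
    }

    // True if u has a read/write or write/write conflict with some unit in vu.
    static boolean hasConflict(Unit u, List<Unit> vu) {
        for (Unit x : vu) {
            boolean readWrite = !Collections.disjoint(u.ws, x.rs)
                             || !Collections.disjoint(u.rs, x.ws);
            boolean writeWrite = !Collections.disjoint(u.ws, x.ws);
            if (readWrite || writeWrite) return true;
        }
        return false;
    }

    // Validates the units of the example table; returns the ids that get validated.
    static List<Integer> run() {
        List<Unit> units = new ArrayList<>(Arrays.asList(
            new Unit(1, 1, 5, setOf(1, 3, 5), setOf(2, 4)),
            new Unit(2, 3, 7, setOf(2, 3, 5), setOf(6, 4)),
            new Unit(3, 5, 9, setOf(2, 3, 5), setOf(7, 8)),
            new Unit(4, 10, 15, setOf(7, 3, 5), setOf(7, 8))));
        units.sort(Comparator.comparingInt((Unit u) -> u.end)); // assumed validation order
        List<Integer> ok = new ArrayList<>();
        for (Unit u : units) {
            if (!hasConflict(u, validationSet(u, units))) {
                u.validated = true;
                ok.add(u.id);
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("validated units: " + run());
    }
}
```

Running it validates units 1, 3 and 4 and rejects unit 2, in agreement with the answers above.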
2.4 Comparison
- Both pessimistic and optimistic techniques
  - guarantee serialisability of processes
  - impose serious complexity, in that they need the ability to undo the effects of processes and threads.
- Pessimistic techniques
  - cause a considerable concurrency control overhead through locking, and
  - are not deadlock-free.
  - However, they are sufficiently efficient when conflicts are likely.
- A serious advantage of optimistic techniques is
  - a negligible overhead when conflicts are unlikely.
  - Furthermore, they are deadlock-free.
  - However, the computation of conflict sets is very difficult and complex in a distributed setting. Moreover, optimistic techniques assume the existence of synchronised clocks, which are generally not available in a distributed setting.
2.4 Comparison (ctd.)
- In summary, the disadvantages of optimistic concurrency control outweigh the advantages, and in most distributed systems concurrency is controlled using pessimistic techniques.
3 CORBA Concurrency Control Service
[Figure: Application Objects, CORBAfacilities and CORBAservices around the Object Request Broker; Concurrency Control is one of the CORBAservices]
3 Lock Compatibility
- The Concurrency Control service supports hierarchical locking, as many CORBA objects take the role of container objects.
- As a further optimisation, the service defines a lock type for upgrade locks.
- Upgrade locks are read locks that are not compatible with themselves. They are used when the requester knows that it needs only a read lock to start with but will later have to acquire a write lock on the same resource as well.
- If two processes are in this situation, they would run into a deadlock if they used only read locks. With upgrade locks the deadlock is prevented, as the second process trying to acquire the upgrade lock is delayed at that point.
3 Lock Compatibility (ctd.)
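The matrix of this slide is not reproduced in the text. The sketch below encodes the compatibility matrix for the five lock modes of the Concurrency Control service (intention read, read, upgrade, intention write, write) as it is commonly given; treat the table as an assumption to be checked against the OMG specification. The Java names are hypothetical and are not the generated CORBA stubs.

```java
class CorbaLockModes {
    enum Mode { INTENTION_READ, READ, UPGRADE, INTENTION_WRITE, WRITE }

    // COMPATIBLE[granted][requested], in the order IR, R, U, IW, W.
    private static final boolean[][] COMPATIBLE = {
        /* IR */ { true,  true,  true,  true,  false },
        /* R  */ { true,  true,  true,  false, false },
        /* U  */ { true,  true,  false, false, false },
        /* IW */ { true,  false, false, true,  false },
        /* W  */ { false, false, false, false, false },
    };

    // True if 'requested' can be granted while 'granted' is held by another requester.
    static boolean compatible(Mode granted, Mode requested) {
        return COMPATIBLE[granted.ordinal()][requested.ordinal()];
    }

    public static void main(String[] args) {
        // An upgrade lock behaves like a read lock, except towards another upgrade lock.
        System.out.println("U with R: " + compatible(Mode.UPGRADE, Mode.READ));
        System.out.println("U with U: " + compatible(Mode.UPGRADE, Mode.UPGRADE));
    }
}
```

The U/U entry is what delays the second of two processes that both intend to upgrade, so they cannot both hold a read-like lock and then wait for each other's write lock.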
3 Locksets
- The central object type defined by the Concurrency Control service is the lockset. A lockset is associated with a resource.
- With the Concurrency Control service, concurrency control has to be managed by the implementation of a shared resource. Hence the implementation of a resource would usually have a hidden lockset attribute.
- Operation implementations of that resource acquire locks before they access or modify the resource.
3 The IDL Interfaces
- interface LocksetFactory {
-   Lockset create();
- };
- interface Lockset {
-   void lock(in lock_mode mode);
-   boolean try_lock(in lock_mode mode);
-   void unlock(in lock_mode mode);
-   void change_mode(in lock_mode held,
-                    in lock_mode new);
- };
3 The IDL Interfaces (ctd.)
- A LocksetFactory facilitates the creation of new locksets. The create operation of that interface would usually be executed during the construction of an object that implements a shared resource.
- The Lockset interface provides operations to lock, unlock and upgrade locks. The difference between lock and try_lock is that the former blocks, while the latter returns control to the caller even when the lock has not been granted.
- Locksets are used internally by the servant; clients do not see them.
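How a servant might hide its lockset can be sketched in plain Java. Everything here is a stand-in: the Lockset interface below is a hypothetical local rendering of three of the IDL operations, not the stub generated by an IDL compiler, and ExclusiveLockset simply treats every mode as exclusive instead of applying a compatibility matrix.

```java
import java.util.concurrent.Semaphore;

class HiddenLocksetSketch {
    enum LockMode { READ, WRITE }

    // Hypothetical Java rendering of the Lockset operations used below.
    interface Lockset {
        void lock(LockMode mode);        // blocking
        boolean tryLock(LockMode mode);  // non-blocking
        void unlock(LockMode mode);
    }

    // Stand-in implementation: one permit, so every mode is exclusive.
    static final class ExclusiveLockset implements Lockset {
        private final Semaphore permit = new Semaphore(1);
        public void lock(LockMode mode) { permit.acquireUninterruptibly(); }
        public boolean tryLock(LockMode mode) { return permit.tryAcquire(); }
        public void unlock(LockMode mode) { permit.release(); }
    }

    // The servant: clients call credit/debit/getBalance and never see the lockset.
    static final class AccountServant {
        private final Lockset lockset = new ExclusiveLockset(); // created at construction
        private float balance;

        AccountServant(float initial) { balance = initial; }

        void credit(float amount) {
            lockset.lock(LockMode.WRITE);
            try { balance = balance + amount; } finally { lockset.unlock(LockMode.WRITE); }
        }

        void debit(float amount) {
            lockset.lock(LockMode.WRITE);
            try { balance = balance - amount; } finally { lockset.unlock(LockMode.WRITE); }
        }

        float getBalance() {
            lockset.lock(LockMode.READ);
            try { return balance; } finally { lockset.unlock(LockMode.READ); }
        }
    }
}
```

In a real servant the lockset would come from the factory's create operation, and the lock calls would stay inside the operation implementations exactly as here.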
4 Summary
- 1 Motivation
- 2 Concurrency Control Techniques
- 3 CORBA Concurrency Control Service
4 Summary
- Lost updates and inconsistent analysis.
- Pessimistic vs. optimistic concurrency control
- Pessimistic control
  - higher overhead for locking.
  - works efficiently in cases where conflicts are likely.
- Optimistic control
  - small overhead when conflicts are unlikely.
  - distributed computation of conflict sets is expensive.
  - requires a global clock.
- CORBA uses pessimistic two-phase locking.