Title: Parallel Programming with PThreads
1Parallel Programming with PThreads
2Threads
- Sometimes called a lightweight process
- smaller execution unit than a process
- Consists of
- program counter
- register set
- stack space
- Threads share
- memory space
- code section
- OS resources(open files, signals, etc.)
3Threads
- A process is defined to have at least one thread of execution
- A process may launch other threads which execute concurrently with the process
- Switching between threads is faster than switching between processes
- No memory management issues, etc.
- Sharing memory brings mutual exclusion problems
4Threads
5Why Threads?
- Software Portability
- Latency Hiding
- Scheduling and Load Balancing
- Ease of Programming and Widespread use
6POSIX Threads
- Thread API available on many OSs
- #include <pthread.h>
- cc myprog.c -o myprog -lpthread
- Thread creation
- int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg);
- Thread termination
- void pthread_exit(void *retval);
- Waiting for Threads
- int pthread_join(pthread_t th, void **thread_return);
7
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

void *print_message_function(void *ptr);

int x = 1;   /* value each thread hands back through pthread_join */

int main(void)
{
    pthread_t thread1, thread2;
    void *thread1result, *thread2result;
    char *message1 = "Hello";
    char *message2 = "World";
    pthread_attr_t *pthread_attr_default = NULL;   /* NULL selects the default attributes */

    printf("Begin\n");
    pthread_create(&thread1, pthread_attr_default, print_message_function, (void *)message1);
    pthread_create(&thread2, pthread_attr_default, print_message_function, (void *)message2);

    pthread_join(thread1, &thread1result);
    printf("End thread1 with %d\n", (int)(intptr_t)thread1result);
    pthread_join(thread2, &thread2result);
    printf("End thread2 with %d\n", (int)(intptr_t)thread2result);
    exit(0);
}

void *print_message_function(void *ptr)
{
    char *message = (char *)ptr;

    printf("%s ", message);
    fflush(stdout);
    return (void *)(intptr_t)x;   /* small integer carried in the pointer-sized return value */
}
9Thread Issues
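- A thread function only gets one void * argument and returns one void * value
- Reentrancy: functions called from several threads must not depend on unprotected static or global state
- False Sharing
- Threads update different variables, not the same global variable, but the variables lie within the same cache line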
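Because a thread function receives a single void * argument and returns a single void * value, several inputs are usually bundled in a struct. The following is a minimal sketch of that idiom, not code from the slides: the names range_args, sum_range and parallel_sum are hypothetical, and the result is returned in heap memory that the joining thread frees.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical argument bundle: everything one thread needs, in one struct. */
struct range_args {
    const int *data;
    int begin;
    int end;
};

/* Thread function: sums data[begin..end) and returns a heap-allocated result. */
static void *sum_range(void *arg)
{
    const struct range_args *r = arg;
    long *sum = malloc(sizeof *sum);
    if (sum == NULL)
        return NULL;
    *sum = 0;
    for (int i = r->begin; i < r->end; i++)
        *sum += r->data[i];
    return sum;                      /* collected by pthread_join */
}

/* Sums n ints using two threads and returns the total. */
long parallel_sum(const int *data, int n)
{
    struct range_args args[2] = {
        { data, 0, n / 2 },
        { data, n / 2, n }
    };
    pthread_t tid[2];
    long total = 0;

    for (int t = 0; t < 2; t++) {
        int rc = pthread_create(&tid[t], NULL, sum_range, &args[t]);
        assert(rc == 0);
    }
    for (int t = 0; t < 2; t++) {
        void *ret = NULL;
        int rc = pthread_join(tid[t], &ret);
        assert(rc == 0 && ret != NULL);
        total += *(long *)ret;
        free(ret);                   /* the joiner owns the result */
    }
    return total;
}
```

The args array lives on the stack of the creating function, which is safe here only because that function joins both threads before it returns.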
10Effects of False Sharing
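One common way to avoid false sharing is to give each thread's data its own cache line. The sketch below assumes a 64-byte line (the real size is hardware-dependent) and C11 _Alignas; the names padded_counter, bump and run_padded_counters are hypothetical. It shows only the layout technique and makes no claim about the speedup on any particular machine.

```c
#include <assert.h>
#include <pthread.h>

#define CACHE_LINE 64        /* assumed cache-line size in bytes */
#define ITERS 1000000L

/* Each counter is aligned to its own cache line, so two threads updating
   neighbouring array elements never write to the same line. */
struct padded_counter {
    _Alignas(CACHE_LINE) long value;
};

static struct padded_counter counters[2];

/* Each thread increments only its own counter: no lock is needed. */
static void *bump(void *arg)
{
    struct padded_counter *c = arg;
    for (long i = 0; i < ITERS; i++)
        c->value++;
    return NULL;
}

/* Runs two threads on adjacent counters and returns the combined count. */
long run_padded_counters(void)
{
    pthread_t tid[2];

    counters[0].value = counters[1].value = 0;
    for (int t = 0; t < 2; t++) {
        int rc = pthread_create(&tid[t], NULL, bump, &counters[t]);
        assert(rc == 0);
    }
    for (int t = 0; t < 2; t++) {
        int rc = pthread_join(tid[t], NULL);
        assert(rc == 0);
    }
    return counters[0].value + counters[1].value;
}
```

Without the alignment specifier the two long values would sit 8 bytes apart on most systems, and every increment by one thread would invalidate the line cached by the other.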
11Synchronization Primitives
- int pthread_mutex_init(pthread_mutex_t *mutex_lock, const pthread_mutexattr_t *lock_attr);
- int pthread_mutex_lock(pthread_mutex_t *mutex_lock);
- int pthread_mutex_unlock(pthread_mutex_t *mutex_lock);
- int pthread_mutex_trylock(pthread_mutex_t *mutex_lock);
12
#include <limits.h>
#include <pthread.h>

void *find_min(void *list_ptr);

pthread_mutex_t minimum_value_lock;
int minimum_value, partial_list_size;

int main(void)
{
    /* start from the largest int so that any list element can replace it */
    minimum_value = INT_MAX;
    pthread_mutex_init(&minimum_value_lock, NULL);
    /* initialize lists etc., create and join threads */
}

void *find_min(void *list_ptr)
{
    int *partial_list_ptr, my_min = INT_MAX, i;

    partial_list_ptr = (int *)list_ptr;
    /* scan the private sublist without holding the lock */
    for (i = 0; i < partial_list_size; i++)
        if (partial_list_ptr[i] < my_min)
            my_min = partial_list_ptr[i];

    /* lock only while updating the shared minimum */
    pthread_mutex_lock(&minimum_value_lock);
    if (my_min < minimum_value)
        minimum_value = my_min;
    pthread_mutex_unlock(&minimum_value_lock);
    pthread_exit(0);
}
13Locking Overhead
- Serialization points
- Minimize the size of critical sections
- Be careful
- Rather than wait, check whether the lock is available
- pthread_mutex_trylock
- If the mutex is already locked, it returns EBUSY instead of blocking
- Will require restructuring of code
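A small sketch of the trylock pattern described above: work is accumulated locally and flushed to the shared total only when the lock happens to be free. The names total_lock, shared_total, flush_if_free and read_total are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_total = 0;

/* Adds *pending to shared_total if the lock is free.
   Returns 0 on success (and zeroes *pending), or EBUSY if the lock is
   held; in that case the caller keeps accumulating in *pending. */
int flush_if_free(long *pending)
{
    int status = pthread_mutex_trylock(&total_lock);
    if (status == EBUSY)
        return EBUSY;                /* do not block: retry later */
    assert(status == 0);
    shared_total += *pending;
    *pending = 0;
    pthread_mutex_unlock(&total_lock);
    return 0;
}

/* Reads the shared total under the lock. */
long read_total(void)
{
    pthread_mutex_lock(&total_lock);
    long v = shared_total;
    pthread_mutex_unlock(&total_lock);
    return v;
}
```

A caller that gets EBUSY simply continues working and tries again later, which is the restructuring the slide refers to: the code must be able to make progress without the lock.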
14
/* Finding k matches in a list */
void *find_entries(void *start_pointer)
{
    /* This is the thread function */
    struct database_record *next_record;
    int count;

    current_pointer = start_pointer;
    do {
        next_record = find_next_entry(current_pointer);
        count = output_record(next_record);
    } while (count < requested_number_of_records);
    return NULL;
}

int output_record(struct database_record *record_ptr)
{
    int count;

    /* every match takes the lock, even if only to bump the counter */
    pthread_mutex_lock(&output_count_lock);
    output_count++;
    count = output_count;
    pthread_mutex_unlock(&output_count_lock);

    if (count <= requested_number_of_records)
        print_record(record_ptr);
    return (count);
}
15
/* rewritten output_record function */
int output_record(struct database_record *record_ptr)
{
    int count;
    int lock_status;

    lock_status = pthread_mutex_trylock(&output_count_lock);
    if (lock_status == EBUSY) {
        /* lock is held elsewhere: keep the record locally and go on searching */
        insert_into_local_list(record_ptr);
        return (0);
    } else {
        count = output_count;
        output_count += number_on_local_list + 1;
        pthread_mutex_unlock(&output_count_lock);
        print_records(record_ptr, local_list, requested_number_of_records - count);
        return (count + number_on_local_list + 1);
    }
}
16Controlling Thread and Synchronization Attributes
- The Pthreads API allows a programmer to change the default attributes of entities using attributes objects.
- An attributes object is a data structure that describes entity (thread, mutex, condition variable) properties.
- Once these properties are set, the attributes object can be passed to the method initializing the entity.
- Enhances modularity, readability, and ease of modification.
17Attributes Objects for Threads
- Use pthread_attr_init to create an attributes object.
- Individual properties associated with the attributes object can be changed using the following functions
- pthread_attr_setdetachstate,
- pthread_attr_setguardsize_np,
- pthread_attr_setstacksize,
- pthread_attr_setinheritsched,
- pthread_attr_setschedpolicy, and
- pthread_attr_setschedparam
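As an illustration of the attributes-object workflow, the sketch below creates a joinable thread with an explicit stack size and reads the size back from the attributes object. The 1 MiB figure and the names worker and run_with_attributes are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define WORKER_STACK_BYTES (1024 * 1024)   /* assumed stack size: 1 MiB */

/* Trivial thread body: records that it ran. */
static void *worker(void *arg)
{
    int *flag = arg;
    *flag = 1;
    return NULL;
}

/* Creates one joinable thread with an explicit stack size.
   Returns the stack size stored in the attributes object. */
size_t run_with_attributes(int *ran)
{
    pthread_attr_t attr;
    pthread_t tid;
    size_t stack = 0;
    int rc;

    rc = pthread_attr_init(&attr);
    assert(rc == 0);
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    assert(rc == 0);
    rc = pthread_attr_setstacksize(&attr, WORKER_STACK_BYTES);
    assert(rc == 0);
    rc = pthread_attr_getstacksize(&attr, &stack);
    assert(rc == 0);

    rc = pthread_create(&tid, &attr, worker, ran);
    assert(rc == 0);
    pthread_attr_destroy(&attr);     /* no longer needed once the thread exists */

    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    return stack;
}
```

The same attributes object could be reused for any number of pthread_create calls before it is destroyed.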
18Attributes Objects for Mutexes
- Initialize the attributes object using function pthread_mutexattr_init.
- The function pthread_mutexattr_settype_np can be used for setting the type of mutex specified by the mutex attributes object.
- pthread_mutexattr_settype_np(pthread_mutexattr_t *attr, int type);
- Here, type specifies the type of the mutex and can take one of
- PTHREAD_MUTEX_NORMAL_NP
- PTHREAD_MUTEX_RECURSIVE_NP
- PTHREAD_MUTEX_ERRORCHECK_NP
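The effect of the mutex type can be seen by locking the same mutex twice from one thread. The sketch below assumes a system that offers the standardized names without the _np/_NP suffix (pthread_mutexattr_settype, PTHREAD_MUTEX_RECURSIVE, PTHREAD_MUTEX_ERRORCHECK); the helper names init_typed_mutex and relock_result are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initializes m with the given mutex type; returns 0 on success. */
int init_typed_mutex(pthread_mutex_t *m, int type)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, type);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Locks m twice from the calling thread and returns the result of the
   second lock attempt. Releases every lock it acquired.
   Do not call this on a normal mutex: the second lock would deadlock. */
int relock_result(pthread_mutex_t *m)
{
    int first = pthread_mutex_lock(m);
    assert(first == 0);
    int second = pthread_mutex_lock(m);
    if (second == 0)
        pthread_mutex_unlock(m);     /* undo the nested acquisition */
    pthread_mutex_unlock(m);
    return second;
}
```

A recursive mutex accepts the second lock and must be unlocked as many times as it was locked, whereas an error-checking mutex reports the self-deadlock as EDEADLK.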
19Condition Variables for Synchronization
- A condition variable allows a thread to block itself until specified data reaches a predefined state.
- A condition variable is associated with a predicate.
- When the predicate becomes true, the condition variable is used to signal one or more threads waiting on the condition.
- A single condition variable may be associated with more than one predicate.
- A condition variable always has a mutex associated with it.
- A thread locks this mutex and tests the predicate defined on the shared variable.
- If the predicate is not true, the thread waits on the condition variable associated with the predicate using the function pthread_cond_wait.
20Using Condition Variables
Main Thread
- Declare and initialize global data/variables which require synchronization (such as "count")
- Declare and initialize a condition variable object
- Declare and initialize an associated mutex
- Create threads A and B to do work
Thread A
- Do work up to the point where a certain condition must occur (such as "count" must reach a specified value)
- Lock associated mutex and check value of a global variable
- Call pthread_cond_wait() to perform a blocking wait for signal from Thread-B. Note that a call to pthread_cond_wait() automatically and atomically unlocks the associated mutex variable so that it can be used by Thread-B.
- When signalled, wake up. Mutex is automatically and atomically locked.
- Explicitly unlock mutex
- Continue
Thread B
- Do work
- Lock associated mutex
- Change the value of the global variable that Thread-A is waiting upon.
- Check value of the global Thread-A wait variable. If it fulfills the desired condition, signal Thread-A.
- Unlock mutex.
- Continue
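The steps above can be sketched as follows. This is a minimal illustration rather than the program shown later in the slides: TARGET, waiter, incrementer and run_wait_signal are hypothetical names, and the waiting thread re-tests its predicate in a loop because a return from pthread_cond_wait does not by itself guarantee that the condition holds.

```c
#include <assert.h>
#include <pthread.h>

#define TARGET 5   /* hypothetical threshold for the shared counter */

static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t count_reached = PTHREAD_COND_INITIALIZER;
static int count = 0;

/* Thread A: blocks until count reaches TARGET, then reports what it saw. */
static void *waiter(void *arg)
{
    int *seen = arg;
    pthread_mutex_lock(&count_lock);
    while (count < TARGET)                   /* re-test after every wakeup */
        pthread_cond_wait(&count_reached, &count_lock);
    *seen = count;                           /* mutex is held again here */
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

/* Thread B: changes the shared variable and signals when the condition holds. */
static void *incrementer(void *arg)
{
    (void)arg;
    for (int i = 0; i < TARGET; i++) {
        pthread_mutex_lock(&count_lock);
        count++;
        if (count == TARGET)
            pthread_cond_signal(&count_reached);
        pthread_mutex_unlock(&count_lock);
    }
    return NULL;
}

/* Main thread: creates A and B, joins them, returns the value A observed. */
int run_wait_signal(void)
{
    pthread_t a, b;
    int seen = -1;
    int rc;

    count = 0;
    rc = pthread_create(&a, NULL, waiter, &seen);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, incrementer, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return seen;
}
```

If thread B finishes before thread A even starts, A finds the predicate already true and never waits, so the signal cannot be missed.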
21Condition Variables for Synchronization
- Pthreads provides the following functions for condition variables
- int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
- int pthread_cond_signal(pthread_cond_t *cond);
- int pthread_cond_broadcast(pthread_cond_t *cond);
- int pthread_cond_init(pthread_cond_t *cond, const pthread_condattr_t *attr);
- int pthread_cond_destroy(pthread_cond_t *cond);
22Producer-Consumer Using Locks
pthread_mutex_t task_queue_lock;
int task_available;
/* ... */

main()
{
    /* .... */
    task_available = 0;
    pthread_mutex_init(&task_queue_lock, NULL);
    /* .... */
}

void *producer(void *producer_thread_data)
{
    /* .... */
    while (!done()) {
        inserted = 0;
        create_task(&my_task);
        /* keep polling the lock until the single queue slot is free */
        while (inserted == 0) {
            pthread_mutex_lock(&task_queue_lock);
            if (task_available == 0) {
                insert_into_queue(my_task);
                task_available = 1;
                inserted = 1;
            }
            pthread_mutex_unlock(&task_queue_lock);
        }
    }
}
23Producer-Consumer Using Locks
void *consumer(void *consumer_thread_data)
{
    int extracted;
    struct task my_task;
    /* local data structure declarations */
    while (!done()) {
        extracted = 0;
        /* keep polling the lock until a task shows up */
        while (extracted == 0) {
            pthread_mutex_lock(&task_queue_lock);
            if (task_available == 1) {
                extract_from_queue(&my_task);
                task_available = 0;
                extracted = 1;
            }
            pthread_mutex_unlock(&task_queue_lock);
        }
        process_task(my_task);
    }
}
24Producer-Consumer Using Condition Variables
pthread_cond_t cond_queue_empty, cond_queue_full;
pthread_mutex_t task_queue_cond_lock;
int task_available;
/* other data structures here */

main()
{
    /* declarations and initializations */
    task_available = 0;
    pthread_cond_init(&cond_queue_empty, NULL);
    pthread_cond_init(&cond_queue_full, NULL);
    pthread_mutex_init(&task_queue_cond_lock, NULL);
    /* create and join producer and consumer threads */
}
25Producer-Consumer Using Condition Variables
void *producer(void *producer_thread_data)
{
    int inserted;
    while (!done()) {
        create_task();
        pthread_mutex_lock(&task_queue_cond_lock);
        /* sleep until the consumer has emptied the slot */
        while (task_available == 1)
            pthread_cond_wait(&cond_queue_empty,
                              &task_queue_cond_lock);
        insert_into_queue();
        task_available = 1;
        pthread_cond_signal(&cond_queue_full);
        pthread_mutex_unlock(&task_queue_cond_lock);
    }
}
26Producer-Consumer Using Condition Variables
void *consumer(void *consumer_thread_data)
{
    while (!done()) {
        pthread_mutex_lock(&task_queue_cond_lock);
        /* sleep until the producer has filled the slot */
        while (task_available == 0)
            pthread_cond_wait(&cond_queue_full,
                              &task_queue_cond_lock);
        my_task = extract_from_queue();
        task_available = 0;
        pthread_cond_signal(&cond_queue_empty);
        pthread_mutex_unlock(&task_queue_cond_lock);
        process_task(my_task);
    }
}
27Condition Variables
- Rather than just signaling one blocked thread, we can signal all
- int pthread_cond_broadcast(pthread_cond_t *cond);
- Can also have a timeout
- int pthread_cond_timedwait(pthread_cond_t *cond, pthread_mutex_t *mutex, const struct timespec *abstime);
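The abstime argument is an absolute point in time, not a duration, so a relative timeout has to be added to the current time first. The sketch below assumes the condition variable uses its default clock (CLOCK_REALTIME); the names ready, wait_ready and set_ready are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cv = PTHREAD_COND_INITIALIZER;
static int ready = 0;

/* Waits up to timeout_ms for ready to become nonzero.
   Returns 0 if the flag was set, or the error from the wait
   (ETIMEDOUT when the deadline passed first). */
int wait_ready(long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    /* convert the relative timeout into an absolute deadline */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&ready_lock);
    while (!ready && rc == 0)
        rc = pthread_cond_timedwait(&ready_cv, &ready_lock, &deadline);
    int result = ready ? 0 : rc;
    pthread_mutex_unlock(&ready_lock);
    return result;
}

/* Sets the flag and wakes every waiter. */
void set_ready(void)
{
    pthread_mutex_lock(&ready_lock);
    ready = 1;
    pthread_cond_broadcast(&ready_cv);
    pthread_mutex_unlock(&ready_lock);
}
```

Because the deadline is absolute, the loop can call pthread_cond_timedwait repeatedly after spurious wakeups without extending the total waiting time.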
28
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS 5   /* matches the five threads created and joined below */
#define TCOUNT 10
#define COUNT_LIMIT 12

void *inc_count(void *idp);
void *watch_count(void *idp);

int count = 0;
int thread_ids[5] = {0, 1, 2, 3, 4};
pthread_mutex_t count_mutex;
pthread_cond_t count_threshold_cv;

int main(int argc, char *argv[])
{
    int i;
    pthread_t threads[5];
    pthread_attr_t attr;

    /* Initialize mutex and condition variable objects */
    pthread_mutex_init(&count_mutex, NULL);
    pthread_cond_init(&count_threshold_cv, NULL);

    /* For portability, explicitly create threads in a joinable state */
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    pthread_create(&threads[4], &attr, watch_count, (void *)&thread_ids[4]);
    pthread_create(&threads[3], &attr, watch_count, (void *)&thread_ids[3]);
    pthread_create(&threads[2], &attr, watch_count, (void *)&thread_ids[2]);
    pthread_create(&threads[1], &attr, inc_count, (void *)&thread_ids[1]);
    pthread_create(&threads[0], &attr, inc_count, (void *)&thread_ids[0]);

    /* Wait for all threads to complete */
    for (i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    printf("Main(): Waited on %d threads. Done.\n", NUM_THREADS);

    /* Clean up and exit */
    pthread_attr_destroy(&attr);
    pthread_mutex_destroy(&count_mutex);
    pthread_cond_destroy(&count_threshold_cv);
    pthread_exit(NULL);
}
29
/* Thread functions for the program on the previous slide. They use the
   globals count, thread_ids, count_mutex and count_threshold_cv. */

void *inc_count(void *idp)
{
    int j, i;
    double result = 0.0;
    int my_id = *(int *)idp;

    for (i = 0; i < TCOUNT; i++) {
        pthread_mutex_lock(&count_mutex);
        count++;

        /* Check the value of count and signal waiting threads when the
           condition is reached. Note that this occurs while the mutex
           is locked. */
        if (count == COUNT_LIMIT) {
            pthread_cond_broadcast(&count_threshold_cv);
            printf("inc_count(): thread %d, count = %d  Threshold reached.\n",
                   my_id, count);
        }
        printf("inc_count(): thread %d, count = %d, unlocking mutex\n",
               my_id, count);
        pthread_mutex_unlock(&count_mutex);

        /* Do some work so threads can alternate on mutex lock */
        for (j = 0; j < 1000; j++)
            result = result + (double)random();
    }
    pthread_exit(NULL);
}

void *watch_count(void *idp)
{
    int my_id = *(int *)idp;

    printf("Starting watch_count(): thread %d\n", my_id);

    /* Lock mutex and wait for signal. Note that the pthread_cond_wait
       routine will automatically and atomically unlock mutex while it
       waits. Also, note that if COUNT_LIMIT is reached before this
       routine is run by the waiting thread, the loop will be skipped to
       prevent pthread_cond_wait from never returning. A loop rather than
       a single test also guards against spurious wakeups. */
    pthread_mutex_lock(&count_mutex);
    while (count < COUNT_LIMIT) {
        pthread_cond_wait(&count_threshold_cv, &count_mutex);
        printf("watch_count(): thread %d Condition signal received.\n", my_id);
    }
    pthread_mutex_unlock(&count_mutex);
    pthread_exit(NULL);
}
30Composite Synchronization Constructs
- By design, Pthreads provide support for a basic set of operations.
- Higher level constructs can be built using basic synchronization constructs.
- Consider Read-Write Locks and Barriers
31Read-Write Locks
- In many applications, a data structure is read frequently but written infrequently.
- For such applications, we should use read-write locks.
- A read lock is granted when there are other threads that may already have read locks.
- If there is a write lock on the data (or if there are queued write locks), the thread performs a condition wait.
- If there are multiple threads requesting a write lock, they must perform a condition wait.
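For comparison with the hand-built lock developed on the following slides, POSIX also defines a ready-made pthread_rwlock_t. The sketch below uses that built-in type, not the slides' implementation; table_value, read_value, write_value and run_rwlock_demo are hypothetical names, and whether readers or writers are preferred is left to the system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define NREADERS 4

static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
static int table_value = 0;          /* hypothetical shared datum */

/* Many threads may hold the read lock at the same time. */
int read_value(void)
{
    pthread_rwlock_rdlock(&table_lock);
    int v = table_value;
    pthread_rwlock_unlock(&table_lock);
    return v;
}

/* The write lock is exclusive: no readers, no other writers. */
void write_value(int v)
{
    pthread_rwlock_wrlock(&table_lock);
    table_value = v;
    pthread_rwlock_unlock(&table_lock);
}

static void *reader(void *arg)
{
    int *out = arg;
    *out = read_value();
    return NULL;
}

/* Writes v once, lets NREADERS threads read it, returns the sum they saw. */
int run_rwlock_demo(int v)
{
    pthread_t tid[NREADERS];
    int seen[NREADERS];
    int sum = 0;

    write_value(v);
    for (int t = 0; t < NREADERS; t++) {
        int rc = pthread_create(&tid[t], NULL, reader, &seen[t]);
        assert(rc == 0);
    }
    for (int t = 0; t < NREADERS; t++) {
        pthread_join(tid[t], NULL);
        sum += seen[t];
    }
    return sum;
}
```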
32Read-Write Locks
- The lock data type mylib_rwlock_t holds the following
- a count of the number of readers,
- the writer (a 0/1 integer specifying whether a writer is present),
- a condition variable readers_proceed that is signaled when readers can proceed,
- a condition variable writer_proceed that is signaled when one of the writers can proceed,
- a count pending_writers of pending writers, and
- a mutex read_write_lock associated with the shared data structure
33Read-Write Locks
typedef struct {
    int readers;
    int writer;
    pthread_cond_t readers_proceed;
    pthread_cond_t writer_proceed;
    int pending_writers;
    pthread_mutex_t read_write_lock;
} mylib_rwlock_t;

void mylib_rwlock_init(mylib_rwlock_t *l)
{
    l->readers = l->writer = l->pending_writers = 0;
    pthread_mutex_init(&(l->read_write_lock), NULL);
    pthread_cond_init(&(l->readers_proceed), NULL);
    pthread_cond_init(&(l->writer_proceed), NULL);
}
34Read-Write Locks
void mylib_rwlock_rlock(mylib_rwlock_t *l)
{
    /* if there is a write lock or pending writers, perform condition
       wait.. else increment count of readers and grant read lock */
    pthread_mutex_lock(&(l->read_write_lock));
    while ((l->pending_writers > 0) || (l->writer > 0))
        pthread_cond_wait(&(l->readers_proceed),
                          &(l->read_write_lock));
    l->readers++;
    pthread_mutex_unlock(&(l->read_write_lock));
}
35Read-Write Locks
void mylib_rwlock_wlock(mylib_rwlock_t *l)
{
    /* if there are readers or writers, increment pending writers count
       and wait. On being woken, decrement pending writers count and
       increment writer count */
    pthread_mutex_lock(&(l->read_write_lock));
    /* counted once per caller, so the decrement below always balances it,
       whether the loop runs zero, one or several times */
    l->pending_writers++;
    while ((l->writer > 0) || (l->readers > 0))
        pthread_cond_wait(&(l->writer_proceed),
                          &(l->read_write_lock));
    l->pending_writers--;
    l->writer++;
    pthread_mutex_unlock(&(l->read_write_lock));
}
36Read-Write Locks
void mylib_rwlock_unlock(mylib_rwlock_t *l)
{
    /* if there is a write lock then unlock, else if there are read
       locks, decrement count of read locks. If the count is 0 and there
       is a pending writer, let it through, else if there are pending
       readers, let them all go through */
    pthread_mutex_lock(&(l->read_write_lock));
    if (l->writer > 0)
        l->writer = 0;
    else if (l->readers > 0)
        l->readers--;
    /* decide whom to wake while the counters are still protected */
    if ((l->readers == 0) && (l->pending_writers > 0))
        pthread_cond_signal(&(l->writer_proceed));
    else
        /* also covers a departing writer with no writers queued:
           readers blocked behind it must be released */
        pthread_cond_broadcast(&(l->readers_proceed));
    pthread_mutex_unlock(&(l->read_write_lock));
}
37Barriers
- A barrier holds a thread until all threads participating in the barrier have reached it.
- Barriers can be implemented using a counter, a mutex and a condition variable.
- A single integer is used to keep track of the number of threads that have reached the barrier.
- If the count is less than the total number of threads, the threads execute a condition wait.
- The last thread entering (and setting the count to the number of threads) wakes up all the threads using a condition broadcast.
38Barriers
typedef struct {
    pthread_mutex_t count_lock;
    pthread_cond_t ok_to_proceed;
    int count;
} mylib_barrier_t;

void mylib_init_barrier(mylib_barrier_t *b)
{
    b->count = 0;
    pthread_mutex_init(&(b->count_lock), NULL);
    pthread_cond_init(&(b->ok_to_proceed), NULL);
}
39Barriers
void mylib_barrier(mylib_barrier_t *b, int num_threads)
{
    pthread_mutex_lock(&(b->count_lock));
    b->count++;
    if (b->count == num_threads) {
        /* last thread to arrive: reset the counter and release everyone */
        b->count = 0;
        pthread_cond_broadcast(&(b->ok_to_proceed));
    }
    else
        while (pthread_cond_wait(&(b->ok_to_proceed),
                                 &(b->count_lock)) != 0);
    pthread_mutex_unlock(&(b->count_lock));
}
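The barrier above leaves its wait as soon as pthread_cond_wait returns successfully, so a spurious wakeup could release a thread early. One common remedy, shown here as a sketch and not as the slides' method, is to add a generation number that changes each time the barrier opens and to wait until it changes. All names (gen_barrier_t, gen_barrier_wait, run_barrier_demo, NTHREADS) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t all_arrived;
    int count;              /* threads that have reached the barrier */
    unsigned generation;    /* incremented each time the barrier opens */
} gen_barrier_t;

void gen_barrier_init(gen_barrier_t *b)
{
    b->count = 0;
    b->generation = 0;
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->all_arrived, NULL);
}

void gen_barrier_wait(gen_barrier_t *b, int num_threads)
{
    pthread_mutex_lock(&b->lock);
    unsigned my_gen = b->generation;
    if (++b->count == num_threads) {
        b->count = 0;                 /* ready for the next use */
        b->generation++;              /* this is the predicate waiters test */
        pthread_cond_broadcast(&b->all_arrived);
    } else {
        while (my_gen == b->generation)
            pthread_cond_wait(&b->all_arrived, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

static gen_barrier_t barrier;
static int arrived[NTHREADS];
static int saw_all[NTHREADS];

/* Marks its arrival, passes the barrier, then checks that everyone arrived. */
static void *phase_worker(void *arg)
{
    int id = *(int *)arg;
    int n = 0;

    arrived[id] = 1;
    gen_barrier_wait(&barrier, NTHREADS);
    for (int i = 0; i < NTHREADS; i++)
        n += arrived[i];
    saw_all[id] = (n == NTHREADS);
    return NULL;
}

/* Returns how many threads saw all arrivals after the barrier. */
int run_barrier_demo(void)
{
    pthread_t tid[NTHREADS];
    int ids[NTHREADS];
    int ok = 0;

    gen_barrier_init(&barrier);
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        arrived[i] = saw_all[i] = 0;
    }
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, phase_worker, &ids[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(tid[i], NULL);
        ok += saw_all[i];
    }
    return ok;
}
```

The generation counter also makes the barrier safely reusable: a fast thread that re-enters the barrier for the next phase cannot be confused with a slow thread still leaving the previous one.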
40Barriers
- Linear barrier.
- The trivial lower bound on execution time of this function is O(n) for n threads.
- This implementation of a barrier can be sped up using multiple barrier variables organized in a tree.
- Use n/2 condition variable-mutex pairs for implementing a barrier for n threads.
- At the lowest level, threads are paired up and each pair of threads shares a single condition variable-mutex pair.
- Once both threads arrive, one of the two moves on, the other one waits.
- This process repeats up the tree.
- This is called a log barrier and its runtime grows as O(log n).
41Barrier
- Execution time of 1000 sequential and logarithmic
barriers as a function of number of threads on a
32 processor SGI Origin 2000.
42Semaphores
- Synchronization tool provided by the OS
- Integer variable and 2 operations
- Wait(s): while (s <= 0) do noop /* sleep */; s = s - 1
- Signal(s): s = s + 1
- All modifications to s are atomic
43The critical section problem
Shared semaphore mutex = 1
repeat
    wait(mutex)
    critical section
    signal(mutex)
    remainder section
until false
44Using semaphores
- Two processes P1 and P2
- Statements S1 and S2
- S2 must execute only after S1
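The ordering constraint can be enforced with a semaphore that starts at 0: P1 signals after S1, and P2 waits before S2. The sketch below uses POSIX unnamed semaphores (sem_init, sem_wait, sem_post) and threads in place of processes; the names s1_done, trace and run_ordering are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t s1_done;        /* starts at 0: S2 may not run yet */
static int trace[2];         /* order in which the statements ran */
static int next_slot = 0;

/* P1: executes S1, then signals. */
static void *p1(void *arg)
{
    (void)arg;
    trace[next_slot++] = 1;  /* S1 */
    sem_post(&s1_done);      /* signal(s) */
    return NULL;
}

/* P2: waits for the signal, then executes S2. */
static void *p2(void *arg)
{
    (void)arg;
    sem_wait(&s1_done);      /* wait(s) */
    trace[next_slot++] = 2;  /* S2 */
    return NULL;
}

/* Starts P2 before P1 on purpose; returns 12 when S1 ran before S2. */
int run_ordering(void)
{
    pthread_t t1, t2;
    int rc;

    next_slot = 0;
    rc = sem_init(&s1_done, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, p2, NULL);
    assert(rc == 0);
    rc = pthread_create(&t1, NULL, p1, NULL);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&s1_done);
    return trace[0] * 10 + trace[1];
}
```

Even though P2 is started first, it blocks in sem_wait until P1 has finished S1, so the recorded order is always S1 then S2.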
45Bounded Buffer Solution
Shared semaphore empty = n, full = 0, mutex = 1
Producer:
repeat
    produce an item in nextp
    wait(empty)
    wait(mutex)
    add nextp to the buffer
    signal(mutex)
    signal(full)
until false
Consumer:
repeat
    wait(full)
    wait(mutex)
    remove an item from buffer, place it in nextc
    signal(mutex)
    signal(empty)
    consume the item in nextc
until false
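The bounded-buffer scheme above maps directly onto POSIX semaphores. The sketch below assumes one producer thread and one consumer thread, a buffer of N = 4 slots and 100 items; those numbers and the names run_bounded_buffer, in and out are chosen for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define N 4          /* number of buffer slots */
#define ITEMS 100    /* items produced in this demo */

static int buffer[N];
static int in = 0, out = 0;
static sem_t empty, full, mutex;   /* empty = N, full = 0, mutex = 1 */

static void *producer(void *arg)
{
    (void)arg;
    for (int item = 1; item <= ITEMS; item++) {
        sem_wait(&empty);          /* wait for a free slot */
        sem_wait(&mutex);
        buffer[in] = item;         /* add nextp to the buffer */
        in = (in + 1) % N;
        sem_post(&mutex);
        sem_post(&full);           /* one more filled slot */
    }
    return NULL;
}

static void *consumer(void *arg)
{
    long *sum = arg;
    for (int k = 0; k < ITEMS; k++) {
        sem_wait(&full);           /* wait for a filled slot */
        sem_wait(&mutex);
        int item = buffer[out];    /* remove an item from the buffer */
        out = (out + 1) % N;
        sem_post(&mutex);
        sem_post(&empty);          /* one more free slot */
        *sum += item;              /* consume the item */
    }
    return NULL;
}

/* Runs one producer and one consumer; returns the sum of consumed items. */
long run_bounded_buffer(void)
{
    pthread_t p, c;
    long sum = 0;

    in = out = 0;
    sem_init(&empty, 0, N);
    sem_init(&full, 0, 0);
    sem_init(&mutex, 0, 1);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, &sum);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&empty);
    sem_destroy(&full);
    sem_destroy(&mutex);
    return sum;
}
```

The order of the two waits matters: taking mutex before empty (or full) could leave a thread holding the mutex while it sleeps on a full or empty buffer, blocking the other side forever.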
46Readers - Writers (priority?)
Shared semaphore mutex = 1, wrt = 1
Shared integer readcount = 0
Reader:
    wait(mutex)
    readcount = readcount + 1
    if (readcount == 1) wait(wrt)
    signal(mutex)
    read the data
    wait(mutex)
    readcount = readcount - 1
    if (readcount == 0) signal(wrt)
    signal(mutex)
Writer:
    wait(wrt)
    write to the data object
    signal(wrt)
47Readers Writers (priority?)
outerQ, rsem, rmutex, wmutex, wsem = 1
Reader:
    wait(outerQ)
    wait(rsem)
    wait(rmutex)
    readcnt++
    if (readcnt == 1) wait(wsem)
    signal(rmutex)
    signal(rsem)
    signal(outerQ)
    READ
    wait(rmutex)
    readcnt--
    if (readcnt == 0) signal(wsem)
    signal(rmutex)
Writer:
    wait(wsem)
    writecnt++
    if (writecnt == 1) wait(rsem)
    signal(wsem)
    wait(wmutex)
    WRITE
    signal(wmutex)
    wait(wsem)
    writecnt--
    if (writecnt == 0) signal(rsem)
    signal(wsem)
48Unix Semaphores
- Are a generalization of the counting semaphores (more operations are permitted).
- A semaphore includes
- the current value S of the semaphore
- number of processes waiting for S to increase
- number of processes waiting for S to be 0
- System calls
- semget creates an array of semaphores
- semctl allows for the initialization of semaphores
- semop performs a list of operations, one on each semaphore (atomically)
49Unix Semaphores
- Each operation to be done is specified by a value sop.
- Let S be the semaphore value
- if sop > 0 (signal operation)
- S is incremented by sop and processes waiting for S to increase are awakened
- if sop = 0
- if S = 0, do nothing
- if S != 0, block the current process on the event that S = 0
- if sop < 0 (wait operation)
- if S >= |sop|, then S = S - |sop|; otherwise the process waits until S is large enough