1
Multiprocessors and Threads
  • Lecture 3

2
Motivation for Multiprocessors
  • Enhanced Performance -
  • Concurrent execution of tasks for increased
    throughput (between processes)
  • Exploit Concurrency in Tasks (Parallelism within
    process)
  • Fault Tolerance -
  • Graceful degradation in the face of failures

3
Basic MP Architectures
  • Single Instruction Single Data (SISD) -
    conventional uniprocessor designs.
  • Single Instruction Multiple Data (SIMD) - Vector
    and Array Processors
  • Multiple Instruction Single Data (MISD) - Not
    Implemented.
  • Multiple Instruction Multiple Data (MIMD) -
    conventional MP designs

4
MIMD Classifications
  • Tightly Coupled System - all processors share the
    same global memory and have the same address
    spaces (Typical SMP system).
  • Main memory for IPC and Synchronization.
  • Loosely Coupled System - memory is partitioned
    and attached to each processor. Hypercube,
    Clusters (Multi-Computer).
  • Message passing for IPC and synchronization.

5
MP Block Diagram
6
Memory Access Schemes
  • Uniform Memory Access (UMA)
  • Centrally located
  • All processors are equidistant (access times)
  • Non-Uniform Memory Access (NUMA)
  • physically partitioned but accessible by all
  • processors have the same address space
  • NO Remote Memory Access (NORMA)
  • physically partitioned, not accessible by all
  • processors have own address space

7
Other Details of MP
  • Interconnection technology
  • Bus
  • Cross-Bar switch
  • Multistage Interconnect Network
  • Caching - Cache Coherence Problem!
  • Write-update
  • Write-invalidate
  • bus snooping

8
MP OS Structure - 1
  • Separate Supervisor -
  • all processors have their own copy of the kernel.
  • Some share data for interaction
  • dedicated I/O devices and file systems
  • good fault tolerance
  • bad for concurrency

9
MP OS Structure - 2
  • Master/Slave Configuration
  • master monitors the status and assigns work to
    other processors (slaves)
  • Slaves are a schedulable pool of resources for
    the master
  • master can be bottleneck
  • poor fault tolerance

10
MP OS Structure - 3
  • Symmetric Configuration - Most Flexible.
  • all processors are autonomous, treated equal
  • one copy of the kernel executed concurrently
    across all processors
  • Synchronize access to shared data structures
  • Lock entire OS - Floating Master
  • Mitigated by dividing OS into segments that
    normally have little interaction
  • multithread kernel and control access to
    resources (continuum)

11
MP Overview
MultiProcessor
  • SIMD
  • MIMD
    • Shared Memory (tightly coupled)
      • Symmetric (SMP)
      • Master/Slave
    • Distributed Memory (loosely coupled)
      • Clusters
12
SMP OS Design Issues
  • Threads - effectiveness of parallelism depends on
    performance of primitives used to express and
    control concurrency.
  • Process Synchronization - disabling interrupts is
    not sufficient on a multiprocessor (see the
    spinlock sketch after this list).
  • Process Scheduling - efficient, policy
    controlled, task scheduling (process/threads)
  • global versus per CPU scheduling
  • Task affinity for a particular CPU
  • resource accounting and intra-task thread
    dependencies
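  • A minimal sketch (not from the slides) of the point
    above: on a multiprocessor, disabling interrupts
    only quiets the local CPU, so shared data needs an
    atomic lock. C11 atomics and Pthreads are used here
    purely for illustration.

#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;
static long shared_counter;                     /* data visible to all CPUs */

static void spin_lock(void)
{
    /* busy-wait until the test-and-set wins; excludes threads on every CPU */
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_lock();
        shared_counter++;                       /* protected critical section */
        spin_unlock();
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("%ld\n", shared_counter);            /* 200000 with the lock held */
    return 0;
}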

13
SMP OS design issues - 2
  • Memory Management - complicated since main memory
    is shared by possibly many processors. Each
    processor must maintain its own map tables for
    each process
  • cache coherence
  • memory access synchronization
  • balancing overhead with increased concurrency
  • Reliability and fault Tolerance - degrade
    gracefully in the event of failures

14
Typical SMP System
  • (Diagram: four 500MHz CPUs, each with its own cache
    and MMU, share a common System/Memory Bus; a bridge
    connects this bus to the I/O subsystem - ethernet,
    SCSI, video, interrupt logic (INT) and system
    functions such as timer, BIOS and reset.)
  • Issues
  • Memory contention
  • Limited bus BW
  • I/O contention
  • Cache coherence
  • Typical I/O Bus
  • 33MHz/32bit (132MB/s)
  • 66MHz/64bit (528MB/s)
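  • (These peak rates are just clock rate times bus
    width: 33 MHz x 4 bytes = 132 MB/s and 66 MHz x 8
    bytes = 528 MB/s.)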
15
Some Definitions
  • Parallelism - degree to which a multiprocessor
    application achieves parallel execution
  • Concurrency - maximum parallelism an application
    can achieve with unlimited processors
  • System Concurrency - kernel recognizes multiple
    threads of control in a program
  • User Concurrency - user-space threads (coroutines)
    provide a natural programming model for concurrent
    applications; the concurrency is not supported by
    the system.

16
Process and Threads
  • Process encompasses
  • set of threads (computational entities)
  • collection of resources
  • Thread - dynamic object representing an execution
    path and computational state.
  • threads have their own computational state: PC,
    stack, user registers and private data
  • Remaining resources are shared amongst threads in
    a process (see the illustrative structs below)
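  • An illustrative sketch of that split (the field
    names are hypothetical, not from any real kernel):
    per-thread state is private, while everything
    hanging off the process is shared by all of its
    threads.

/* hypothetical layout, for illustration only */
struct thread_state {
    void *pc;               /* program counter           - private */
    void *stack;            /* user stack                - private */
    long  user_regs[16];    /* saved user registers      - private */
    void *private_data;     /* thread-local data         - private */
};

struct process {
    struct thread_state *threads;   /* the set of threads            */
    int    nthreads;
    void  *address_space;           /* shared by every thread        */
    int   *open_files;              /* shared resources (files, ...) */
};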

17
Threads
  • Effectiveness of parallel computing depends on
    the performance of the primitives used to express
    and control parallelism
  • Threads separate the notion of execution from the
    Process abstraction
  • Useful for expressing the intrinsic concurrency
    of a program regardless of resulting performance
  • Three types: user threads, kernel threads and
    lightweight processes (LWPs)

18
User Level Threads
  • User-level threads - supported by a user-level
    (thread) library (see the ucontext sketch after
    this slide)
  • Benefits
  • no modifications required to kernel
  • flexible and low cost
  • Drawbacks
  • cannot block without blocking the entire process
  • no parallelism (not recognized by kernel)
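  • A sketch of a purely user-level thread package,
    using the (now obsolescent) POSIX ucontext
    routines: two tasks switch between themselves
    entirely in user space, so the kernel sees a single
    thread of control - which is also why one blocking
    system call would stall both tasks.

#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, a_ctx, b_ctx;

static void task_a(void)
{
    for (int i = 0; i < 3; i++) {
        printf("task A, step %d\n", i);
        swapcontext(&a_ctx, &b_ctx);      /* user-space switch to B */
    }
}

static void task_b(void)
{
    for (int i = 0; i < 3; i++) {
        printf("task B, step %d\n", i);
        swapcontext(&b_ctx, &a_ctx);      /* yield back to A */
    }
}

static void make_task(ucontext_t *ctx, ucontext_t *link, void (*fn)(void))
{
    getcontext(ctx);                          /* start from the current state */
    ctx->uc_stack.ss_sp = malloc(STACK_SIZE); /* private user-allocated stack */
    ctx->uc_stack.ss_size = STACK_SIZE;
    ctx->uc_link = link;                      /* where to go when fn returns */
    makecontext(ctx, fn, 0);
}

int main(void)
{
    make_task(&a_ctx, &main_ctx, task_a);
    make_task(&b_ctx, &main_ctx, task_b);
    swapcontext(&main_ctx, &a_ctx);           /* hand control to task A */
    return 0;
}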

19
Kernel Level Threads
  • Kernel level threads - kernel directly supports
    multiple threads of control in a process. Thread
    is the basic scheduling entity
  • Benefits
  • coordination between scheduling and
    synchronization
  • less overhead than a process
  • suitable for parallel application
  • Drawbacks
  • more expensive than user-level threads
  • generality leads to greater overhead

20
Light Weight Processes (LWP)
  • Kernel supported user thread
  • Each LWP is bound to one kernel thread.
  • a kernel thread may not be bound to an LWP
  • LWP is scheduled by kernel
  • User threads scheduled by library onto LWPs
  • Multiple LWPs per process

21
First Class threads (Psyche OS)
  • Thread operations in user space
  • create, destroy, synch, context switch
  • kernel threads implement a virtual processor
  • Coarse-grain in kernel - preemptive scheduling
  • Communication between kernel and threads library
  • shared data structures.
  • Software interrupts (user upcalls or signals).
    Example, for scheduling decisions and preemption
    warnings.
  • Kernel scheduler interface - allows dissimilar
    thread packages to coordinate.

22
Scheduler Activations
  • An activation
  • serves as execution context for running thread
  • notifies thread of kernel events (upcall)
  • space for kernel to save processor context of
    current user thread when stopped by kernel
  • kernel is responsible for processor allocation =>
    preemption by kernel.
  • Thread package responsible for scheduling threads
    on available processors (activations)

23
Support for Threading
  • BSD
  • process model only. 4.4 BSD enhancements.
  • Solaris provides
  • user threads, kernel threads and LWPs
  • Mach supports
  • kernel threads and tasks. Thread libraries
    provide semantics of user threads, LWPs and
    kernel threads.
  • Digital UNIX extends Mach to provide the usual
    UNIX semantics.
  • Pthreads library.

24
Solaris Threads
  • Supports
  • user threads (uthreads) via libthread and
    libpthread
  • LWPs, each acting as a virtual CPU for user threads
  • kernel threads (kthread), every LWP is associated
    with one kthread, however a kthread may not have
    an LWP
  • interrupts as threads

25
Solaris kthreads
  • Fundamental scheduling/dispatching object
  • all kthreads share the same virtual address space
    (the kernel's) - cheap context switch
  • System threads - examples: STREAMS, callout
  • kthread_t, /usr/include/sys/thread.h
  • scheduling info, pointers for scheduler or sleep
    queues, pointer to klwp_t and proc_t

26
Solaris LWP
  • Bound to a kthread
  • LWP specific fields from proc are kept in klwp_t
    (/usr/include/sys/klwp.h)
  • user-level registers, system call params,
    resource usage, pointer to kthread_t and proc_t
  • the klwp_t can be swapped out with its LWP
  • non-swappable LWP info is kept in the kthread_t

27
Solaris LWP (cont)
  • All LWPs in a process share
  • signal handlers
  • Each may have its own
  • signal mask
  • alternate stack for signal handling
  • No global name space for LWPs

28
Solaris User Threads
  • Implemented in user libraries
  • library provides synchronization and scheduling
    facilities
  • threads may be bound to LWPs
  • unbound threads compete for available LWPs
  • Manage thread specific info
  • thread id, saved register state, user stack,
    signal mask, priority, thread local storage
  • Solaris provides two libraries: libthread and
    libpthread (bound vs. unbound creation is sketched
    below).
  • Try man thread or man pthreads
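  • A sketch of bound vs. unbound creation, assuming
    the Solaris libthread interface (thr_create in
    <thread.h>, linked with -lthread); THR_BOUND
    requests a thread permanently bound to its own
    LWP, while the default is an unbound thread
    multiplexed over the process's pool of LWPs.

#include <thread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    printf("running as a %s thread\n", (char *)arg);
    return arg;
}

int main(void)
{
    thread_t bound, unbound;

    /* Bound: gets its own LWP, so the kernel schedules it directly. */
    thr_create(NULL, 0, worker, "bound", THR_BOUND, &bound);

    /* Unbound: the library schedules it onto whichever LWP is free. */
    thr_create(NULL, 0, worker, "unbound", 0, &unbound);

    thr_join(bound, NULL, NULL);
    thr_join(unbound, NULL, NULL);
    return 0;
}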

29
Solaris Thread Data Structures
  • (Diagram: the structures are cross-linked - the
    proc_t reaches its kthreads through p_tlist, the
    kthreads are chained via t_forw, each kthread_t
    points back to its process (t_procp) and to its
    LWP (t_lwp), and the klwp_t points back through
    lwp_thread and lwp_procp; see the sketch below.)
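  • A simplified sketch of the links in the diagram,
    keeping only the fields named above; the real
    definitions in <sys/thread.h>, <sys/klwp.h> and
    <sys/proc.h> contain many more fields.

struct proc;                         /* proc_t  - one per process            */
struct klwp;                         /* klwp_t  - swappable per-LWP state    */

typedef struct kthread {             /* kthread_t - the scheduling entity    */
    struct kthread *t_forw;          /* next kthread on the process's list   */
    struct klwp    *t_lwp;           /* its LWP (NULL for system threads)    */
    struct proc    *t_procp;         /* back pointer to the owning process   */
} kthread_t;

typedef struct klwp {
    struct kthread *lwp_thread;      /* the kthread this LWP is bound to     */
    struct proc    *lwp_procp;       /* back pointer to the owning process   */
} klwp_t;

typedef struct proc {
    kthread_t *p_tlist;              /* head of the process's kthread list   */
} proc_t;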
30
Solaris Processes, Threads and LWPs
  • (Diagram: two processes shown across the user,
    kernel and hardware layers - user threads are
    multiplexed onto LWPs, each LWP is bound to a
    kernel thread, and interrupt kthreads run in the
    kernel without an LWP.)
31
Solaris Interrupts
  • One system wide clock kthread
  • pool of 9 partially initialized kthreads per CPU
    for interrupts
  • interrupt thread can block
  • interrupted thread is pinned to the CPU

32
Solaris Signals and Fork
  • Divided into Traps (synchronous) and interrupts
    (asynchronous)
  • each thread has its own signal mask; the set of
    signal handlers is global to the process (see the
    sketch below)
  • Each LWP can specify an alternate signal stack
  • fork replicates all LWPs
  • fork1 replicates only the invoking LWP/thread
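  • A sketch of the per-thread mask, assuming the
    Solaris thr_sigsetmask interface: the worker blocks
    SIGINT for itself only, while the handler table
    stays global to the process.

#include <thread.h>
#include <signal.h>

static void *worker(void *arg)
{
    (void)arg;
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    /* Affects this thread's mask only; other threads still take SIGINT. */
    thr_sigsetmask(SIG_BLOCK, &set, NULL);
    /* ... work that must not be interrupted by SIGINT ... */
    return NULL;
}

int main(void)
{
    thread_t tid;
    thr_create(NULL, 0, worker, NULL, 0, &tid);
    thr_join(tid, NULL, NULL);
    return 0;
}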

33
Mach
  • Two abstractions
  • Task - static object, address space and system
    resources called port rights.
  • Thread - fundamental execution unit and runs in
    context of a task.
  • Zero or more threads per task,
  • kernel schedulable
  • kernel stack
  • computational state
  • Processor sets - available processors divided
    into non-intersecting sets.
  • permits dedicating processor sets to one or more
    tasks

34
Mach c-thread Implementations
  • Coroutine-based - multiplexes user threads onto a
    single-threaded task
  • Thread-based - one-to-one mapping from c-threads
    to Mach threads. Default.
  • Task-based - One Mach Task per c-thread.

35
Digital UNIX
  • Based on Mach 2.5 kernel
  • Provides the complete UNIX programmer's interface
  • 4.3BSD code and ULTRIX code ported to Mach
  • u-area replaced by utask and uthread
  • proc structure retained

36
Digital UNIX threads
  • Signals divided into synchronous and asynchronous
  • global signal mask
  • each thread can define its own handlers for
    synchronous signals
  • global handlers for asynchronous signals

37
Pthreads library
  • One Mach thread per pthread
  • implements asynchronous I/O: a separate thread is
    created to perform the synchronous I/O and then
    signals the original thread
  • library includes signal handling, scheduling
    functions, and synchronization primitives (basic
    usage sketched below)
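  • A minimal Pthreads example; under this design each
    pthread below maps onto one Mach thread, so both
    workers are kernel-schedulable.

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    printf("worker %ld running\n", (long)arg);
    return NULL;
}

int main(void)
{
    pthread_t tid[2];
    for (long i = 0; i < 2; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    return 0;
}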

38
Mach Continuations
  • Address problem of excessive kernel stack memory
    requirements
  • process model versus interrupt model
  • one per process kernel stack versus a per thread
    kernel stack
  • Thread is first responsible for saving any
    required state (the thread structure allows up to
    28 bytes)
  • indicate a function to be invoked when unblocked
    (the continuation function)
  • Advantage: stacks can be handed off between
    threads, eliminating copy overhead (illustrative
    sketch below).
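  • An illustrative sketch of the continuation style
    (the names and structures here are hypothetical
    stand-ins, not the actual Mach interface): a
    blocking operation saves a few words in the thread
    structure and names a function to resume in, so the
    kernel stack can be discarded or reused while the
    thread sleeps.

#include <stdio.h>

struct thread {
    long saved_state[4];                   /* small scratch area (~28 bytes)  */
    void (*continuation)(struct thread *); /* where to resume, on a new stack */
};

/* Continuation: entered later on a fresh stack when the message arrives. */
static void msg_receive_continue(struct thread *self)
{
    long port = self->saved_state[0];      /* recover the saved argument */
    printf("resumed: message delivered on port %ld\n", port);
}

/* Blocking path: save the needed state and register the continuation. In a
 * real kernel the call to block would never return here; execution resumes
 * only in the continuation function. */
static void msg_receive(struct thread *self, long port)
{
    self->saved_state[0] = port;
    self->continuation = msg_receive_continue;
}

int main(void)
{
    struct thread t = {0};
    msg_receive(&t, 42);                   /* the thread "blocks" */
    t.continuation(&t);                    /* the scheduler resumes it later */
    return 0;
}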

39
Threads in Windows NT
  • Design driven by need to support a variety of OS
    environments
  • NT process implemented as an object
  • an executable process contains one or more threads
  • process and thread objects have built-in
    synchronization capabilities

40
NT Threads
  • Support for kernel (system) threads
  • Threads are scheduled by the kernel and thus are
    similar to UNIX threads bound to an LWP (kernel
    thread)
  • fibers are threads that are not scheduled by the
    kernel and thus are similar to unbound user
    threads (see the sketch below).
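  • A sketch of Win32 fibers: the kernel schedules only
    the carrying thread, and SwitchToFiber is a
    cooperative, user-space context switch chosen by
    the program - much like an unbound user thread.

#include <windows.h>
#include <stdio.h>

static LPVOID main_fiber;

static VOID CALLBACK fiber_proc(LPVOID param)
{
    printf("fiber %s running\n", (const char *)param);
    SwitchToFiber(main_fiber);          /* yield back; fibers never preempt */
}

int main(void)
{
    main_fiber = ConvertThreadToFiber(NULL);   /* this thread becomes a fiber */
    LPVOID worker = CreateFiber(0, fiber_proc, "worker");
    SwitchToFiber(worker);                     /* run the worker fiber */
    DeleteFiber(worker);
    return 0;
}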

41
4.4 BSD UNIX
  • Initial support for threads implemented but not
    enabled in distribution
  • Proc structure and u-area reorganized
  • All threads have a unique ID
  • How are the proc and u areas reorganized to
    support threads?