1
Multiprocessors and Threads
  • Lecture 3

2
Motivation for Multiprocessors
  • Enhanced Performance -
  • Concurrent execution of tasks for increased
    throughput (between processes)
  • Exploit Concurrency in Tasks (Parallelism within
    process)
  • Fault Tolerance -
  • Graceful degradation in the face of failures

3
Basic MP Architectures
  • Single Instruction Single Data (SISD) -
    conventional uniprocessor designs.
  • Single Instruction Multiple Data (SIMD) - Vector
    and Array Processors
  • Multiple Instruction Single Data (MISD) - Not
    Implemented.
  • Multiple Instruction Multiple Data (MIMD) -
    conventional MP designs

4
MIMD Classifications
  • Tightly Coupled System - all processors share the
    same global memory and have the same address
    spaces (Typical SMP system).
  • Main memory for IPC and Synchronization.
  • Loosely Coupled System - memory is partitioned
    and attached to each processor. Hypercube,
    Clusters (Multi-Computer).
  • Message passing for IPC and synchronization.

5
MP Block Diagram
6
Memory Access Schemes
  • Uniform Memory Access (UMA)
  • Centrally located
  • All processors are equidistant (access times)
  • Non-Uniform Memory Access (NUMA)
  • physically partitioned but accessible by all
  • processors have the same address space
  • NO Remote Memory Access (NORMA)
  • physically partitioned, not accessible by all
  • processors have own address space

7
Other Details of MP
  • Interconnection technology
  • Bus
  • Cross-Bar switch
  • Multistage Interconnect Network
  • Caching - Cache Coherence Problem!
  • Write-update
  • Write-invalidate
  • bus snooping

8
MP OS Structure - 1
  • Separate Supervisor -
  • all processors have their own copy of the kernel.
  • Some share data for interaction
  • dedicated I/O devices and file systems
  • good fault tolerance
  • bad for concurrency

9
MP OS Structure - 2
  • Master/Slave Configuration
  • master monitors the status and assigns work to
    other processors (slaves)
  • Slaves are a schedulable pool of resources for
    the master
  • master can be bottleneck
  • poor fault tolerance

10
MP OS Structure - 3
  • Symmetric Configuration - Most Flexible.
  • all processors are autonomous, treated equal
  • one copy of the kernel executed concurrently
    across all processors
  • Synchronize access to shared data structures
  • Lock entire OS - Floating Master
  • Mitigated by dividing OS into segments that
    normally have little interaction
  • multithread kernel and control access to
    resources (continuum)

11
MP Overview
MultiProcessor
  • SIMD
  • MIMD
    • Shared Memory (tightly coupled)
      • Symmetric (SMP)
      • Master/Slave
    • Distributed Memory (loosely coupled)
      • Clusters
12
SMP OS Design Issues
  • Threads - effectiveness of parallelism depends on
    performance of primitives used to express and
    control concurrency.
  • Process Synchronization - disabling interrupts is
    not sufficient on a multiprocessor (see the
    spinlock sketch after this list).
  • Process Scheduling - efficient, policy
    controlled, task scheduling (process/threads)
  • global versus per CPU scheduling
  • Task affinity for a particular CPU
  • resource accounting and intra-task thread
    dependencies
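  • A minimal sketch (not from the slides) of the point
    above: on a multiprocessor, disabling interrupts
    only quiets the local CPU, so shared data needs an
    atomic lock. C11 atomics and Pthreads are used here
    purely for illustration.

#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;
static long shared_counter;                     /* data visible to all CPUs */

static void spin_lock(void)
{
    /* busy-wait until the test-and-set wins; excludes threads on every CPU */
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_lock();
        shared_counter++;                       /* protected critical section */
        spin_unlock();
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("%ld\n", shared_counter);            /* 200000 with the lock held */
    return 0;
}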

13
SMP OS design issues - 2
  • Memory Management - complicated since main memory
    is shared by possibly many processors. Each
    processor must maintain its own map tables for
    each process
  • cache coherence
  • memory access synchronization
  • balancing overhead with increased concurrency
  • Reliability and fault Tolerance - degrade
    gracefully in the event of failures

14
Typical SMP System
  • (Diagram: four 500MHz CPUs, each with its own cache
    and MMU, share a common System/Memory Bus; a bridge
    connects this bus to the I/O subsystem - ethernet,
    SCSI, video, interrupt logic (INT) and system
    functions such as timer, BIOS and reset.)
  • Issues
  • Memory contention
  • Limited bus BW
  • I/O contention
  • Cache coherence
  • Typical I/O Bus
  • 33MHz/32bit (132MB/s)
  • 66MHz/64bit (528MB/s)
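  • (These peak rates are just clock rate times bus
    width: 33 MHz x 4 bytes = 132 MB/s and 66 MHz x 8
    bytes = 528 MB/s.)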
15
Some Definitions
  • Parallelism - degree to which a multiprocessor
    application achieves parallel execution
  • Concurrency - maximum parallelism an application
    can achieve with unlimited processors
  • System Concurrency - kernel recognizes multiple
    threads of control in a program
  • User Concurrency - user-space threads (coroutines)
    provide a natural programming model for concurrent
    applications; the concurrency is not supported by
    the system.

16
Process and Threads
  • Process encompasses
  • set of threads (computational entities)
  • collection of resources
  • Thread - dynamic object representing an execution
    path and computational state.
  • threads have their own computational state: PC,
    stack, user registers and private data
  • Remaining resources are shared amongst threads in
    a process (see the illustrative structs below)
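  • An illustrative sketch of that split (the field
    names are hypothetical, not from any real kernel):
    per-thread state is private, while everything
    hanging off the process is shared by all of its
    threads.

/* hypothetical layout, for illustration only */
struct thread_state {
    void *pc;               /* program counter           - private */
    void *stack;            /* user stack                - private */
    long  user_regs[16];    /* saved user registers      - private */
    void *private_data;     /* thread-local data         - private */
};

struct process {
    struct thread_state *threads;   /* the set of threads            */
    int    nthreads;
    void  *address_space;           /* shared by every thread        */
    int   *open_files;              /* shared resources (files, ...) */
};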

17
Threads
  • Effectiveness of parallel computing depends on
    the performance of the primitives used to express
    and control parallelism
  • Threads separate the notion of execution from the
    Process abstraction
  • Useful for expressing the intrinsic concurrency
    of a program regardless of resulting performance
  • Three types: user threads, kernel threads and
    lightweight processes (LWPs)

18
User Level Threads
  • User-level threads - supported by a user-level
    (thread) library (see the ucontext sketch after
    this slide)
  • Benefits
  • no modifications required to kernel
  • flexible and low cost
  • Drawbacks
  • cannot block without blocking the entire process
  • no parallelism (not recognized by kernel)
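  • A sketch of a purely user-level thread package,
    using the (now obsolescent) POSIX ucontext
    routines: two tasks switch between themselves
    entirely in user space, so the kernel sees a single
    thread of control - which is also why one blocking
    system call would stall both tasks.

#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, a_ctx, b_ctx;

static void task_a(void)
{
    for (int i = 0; i < 3; i++) {
        printf("task A, step %d\n", i);
        swapcontext(&a_ctx, &b_ctx);      /* user-space switch to B */
    }
}

static void task_b(void)
{
    for (int i = 0; i < 3; i++) {
        printf("task B, step %d\n", i);
        swapcontext(&b_ctx, &a_ctx);      /* yield back to A */
    }
}

static void make_task(ucontext_t *ctx, ucontext_t *link, void (*fn)(void))
{
    getcontext(ctx);                          /* start from the current state */
    ctx->uc_stack.ss_sp = malloc(STACK_SIZE); /* private user-allocated stack */
    ctx->uc_stack.ss_size = STACK_SIZE;
    ctx->uc_link = link;                      /* where to go when fn returns */
    makecontext(ctx, fn, 0);
}

int main(void)
{
    make_task(&a_ctx, &main_ctx, task_a);
    make_task(&b_ctx, &main_ctx, task_b);
    swapcontext(&main_ctx, &a_ctx);           /* hand control to task A */
    return 0;
}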

19
Kernel Level Threads
  • Kernel level threads - kernel directly supports
    multiple threads of control in a process. Thread
    is the basic scheduling entity
  • Benefits
  • coordination between scheduling and
    synchronization
  • less overhead than a process
  • suitable for parallel application
  • Drawbacks
  • more expensive than user-level threads
  • generality leads to greater overhead

20
Light Weight Processes (LWP)
  • Kernel supported user thread
  • Each LWP is bound to one kernel thread.
  • a kernel thread may not be bound to an LWP
  • LWP is scheduled by kernel
  • User threads scheduled by library onto LWPs
  • Multiple LWPs per process

21
First Class threads (Psyche OS)
  • Thread operations in user space
  • create, destroy, synch, context switch
  • kernel threads implement a virtual processor
  • Coarse-grain in kernel - preemptive scheduling
  • Communication between kernel and threads library
  • shared data structures.
  • Software interrupts (user upcalls or signals).
    Example, for scheduling decisions and preemption
    warnings.
  • Kernel scheduler interface - allows dissimilar
    thread packages to coordinate.

22
Scheduler Activations
  • An activation
  • serves as execution context for running thread
  • notifies thread of kernel events (upcall)
  • space for kernel to save processor context of
    current user thread when stopped by kernel
  • kernel is responsible for processor allocation =>
    preemption by kernel.
  • Thread package responsible for scheduling threads
    on available processors (activations)

23
Support for Threading
  • BSD
  • process model only. 4.4 BSD enhancements.
  • Solaris provides
  • user threads, kernel threads and LWPs
  • Mach supports
  • kernel threads and tasks. Thread libraries
    provide semantics of user threads, LWPs and
    kernel threads.
  • Digital UNIX extends Mach to provide the usual
    UNIX semantics.
  • Pthreads library.

24
Solaris Threads
  • Supports
  • user threads (uthreads) via libthread and
    libpthread
  • LWPs, each acting as a virtual CPU for user threads
  • kernel threads (kthread), every LWP is associated
    with one kthread, however a kthread may not have
    an LWP
  • interrupts as threads

25
Solaris kthreads
  • Fundamental scheduling/dispatching object
  • all kthreads share the same virtual address space
    (the kernel's) - cheap context switch
  • System threads - examples: STREAMS, callout
  • kthread_t, /usr/include/sys/thread.h
  • scheduling info, pointers for scheduler or sleep
    queues, pointer to klwp_t and proc_t

26
Solaris LWP
  • Bound to a kthread
  • LWP specific fields from proc are kept in klwp_t
    (/usr/include/sys/klwp.h)
  • user-level registers, system call params,
    resource usage, pointer to kthread_t and proc_t
  • the klwp_t can be swapped out with its LWP
  • non-swappable LWP info is kept in the kthread_t

27
Solaris LWP (cont)
  • All LWPs in a process share
  • signal handlers
  • Each may have its own
  • signal mask
  • alternate stack for signal handling
  • No global name space for LWPs

28
Solaris User Threads
  • Implemented in user libraries
  • library provides synchronization and scheduling
    facilities
  • threads may be bound to LWPs
  • unbound threads compete for available LWPs
  • Manage thread specific info
  • thread id, saved register state, user stack,
    signal mask, priority, thread local storage
  • Solaris provides two libraries: libthread and
    libpthread (bound vs. unbound creation is sketched
    below).
  • Try man thread or man pthreads
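  • A sketch of bound vs. unbound creation, assuming
    the Solaris libthread interface (thr_create in
    <thread.h>, linked with -lthread); THR_BOUND
    requests a thread permanently bound to its own
    LWP, while the default is an unbound thread
    multiplexed over the process's pool of LWPs.

#include <thread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    printf("running as a %s thread\n", (char *)arg);
    return arg;
}

int main(void)
{
    thread_t bound, unbound;

    /* Bound: gets its own LWP, so the kernel schedules it directly. */
    thr_create(NULL, 0, worker, "bound", THR_BOUND, &bound);

    /* Unbound: the library schedules it onto whichever LWP is free. */
    thr_create(NULL, 0, worker, "unbound", 0, &unbound);

    thr_join(bound, NULL, NULL);
    thr_join(unbound, NULL, NULL);
    return 0;
}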

29
Solaris Thread Data Structures
  • (Diagram: the structures are cross-linked - the
    proc_t reaches its kthreads through p_tlist, the
    kthreads are chained via t_forw, each kthread_t
    points back to its process (t_procp) and to its
    LWP (t_lwp), and the klwp_t points back through
    lwp_thread and lwp_procp; see the sketch below.)
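  • A simplified sketch of the links in the diagram,
    keeping only the fields named above; the real
    definitions in <sys/thread.h>, <sys/klwp.h> and
    <sys/proc.h> contain many more fields.

struct proc;                         /* proc_t  - one per process            */
struct klwp;                         /* klwp_t  - swappable per-LWP state    */

typedef struct kthread {             /* kthread_t - the scheduling entity    */
    struct kthread *t_forw;          /* next kthread on the process's list   */
    struct klwp    *t_lwp;           /* its LWP (NULL for system threads)    */
    struct proc    *t_procp;         /* back pointer to the owning process   */
} kthread_t;

typedef struct klwp {
    struct kthread *lwp_thread;      /* the kthread this LWP is bound to     */
    struct proc    *lwp_procp;       /* back pointer to the owning process   */
} klwp_t;

typedef struct proc {
    kthread_t *p_tlist;              /* head of the process's kthread list   */
} proc_t;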
30
Solaris Processes, Threads and LWPs
  • (Diagram: two processes shown across the user,
    kernel and hardware layers - user threads are
    multiplexed onto LWPs, each LWP is bound to a
    kernel thread, and interrupt kthreads run in the
    kernel without an LWP.)
31
Solaris Interrupts
  • One system wide clock kthread
  • pool of 9 partially initialized kthreads per CPU
    for interrupts
  • interrupt thread can block
  • interrupted thread is pinned to the CPU

32
Solaris Signals and Fork
  • Divided into Traps (synchronous) and interrupts
    (asynchronous)
  • each thread has its own signal mask; the set of
    signal handlers is global to the process (see the
    sketch below)
  • Each LWP can specify an alternate signal stack
  • fork replicates all LWPs
  • fork1 replicates only the invoking LWP/thread
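  • A sketch of the per-thread mask, assuming the
    Solaris thr_sigsetmask interface: the worker blocks
    SIGINT for itself only, while the handler table
    stays global to the process.

#include <thread.h>
#include <signal.h>

static void *worker(void *arg)
{
    (void)arg;
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    /* Affects this thread's mask only; other threads still take SIGINT. */
    thr_sigsetmask(SIG_BLOCK, &set, NULL);
    /* ... work that must not be interrupted by SIGINT ... */
    return NULL;
}

int main(void)
{
    thread_t tid;
    thr_create(NULL, 0, worker, NULL, 0, &tid);
    thr_join(tid, NULL, NULL);
    return 0;
}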

33
Mach
  • Two abstractions
  • Task - static object, address space and system
    resources called port rights.
  • Thread - fundamental execution unit and runs in
    context of a task.
  • Zero or more threads per task,
  • kernel schedulable
  • kernel stack
  • computational state
  • Processor sets - available processors divided
    into non-intersecting sets.
  • permits dedicating processor sets to one or more
    tasks

34
Mach c-thread Implementations
  • Coroutine-based - multiplexes user threads onto a
    single-threaded task
  • Thread-based - one-to-one mapping from c-threads
    to Mach threads. Default.
  • Task-based - One Mach Task per c-thread.

35
Digital UNIX
  • Based on Mach 2.5 kernel
  • Provides the complete UNIX programmer's interface
  • 4.3BSD code and ULTRIX code ported to Mach
  • u-area replaced by utask and uthread
  • proc structure retained

36
Digital UNIX threads
  • Signals divided into synchronous and asynchronous
  • global signal mask
  • each thread can define its own handlers for
    synchronous signals
  • global handlers for asynchronous signals

37
Pthreads library
  • One Mach thread per pthread
  • implements asynchronous I/O: a separate thread is
    created to perform the synchronous I/O and then
    signals the original thread
  • library includes signal handling, scheduling
    functions, and synchronization primitives (basic
    usage sketched below)
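  • A minimal Pthreads example; under this design each
    pthread below maps onto one Mach thread, so both
    workers are kernel-schedulable.

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    printf("worker %ld running\n", (long)arg);
    return NULL;
}

int main(void)
{
    pthread_t tid[2];
    for (long i = 0; i < 2; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    return 0;
}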

38
Mach Continuations
  • Address problem of excessive kernel stack memory
    requirements
  • process model versus interrupt model
  • one per process kernel stack versus a per thread
    kernel stack
  • Thread is first responsible for saving any
    required state (the thread structure allows up to
    28 bytes)
  • indicate a function to be invoked when unblocked
    (the continuation function)
  • Advantage: stacks can be handed off between
    threads, eliminating copy overhead (illustrative
    sketch below).
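  • An illustrative sketch of the continuation style
    (the names and structures here are hypothetical
    stand-ins, not the actual Mach interface): a
    blocking operation saves a few words in the thread
    structure and names a function to resume in, so the
    kernel stack can be discarded or reused while the
    thread sleeps.

#include <stdio.h>

struct thread {
    long saved_state[4];                   /* small scratch area (~28 bytes)  */
    void (*continuation)(struct thread *); /* where to resume, on a new stack */
};

/* Continuation: entered later on a fresh stack when the message arrives. */
static void msg_receive_continue(struct thread *self)
{
    long port = self->saved_state[0];      /* recover the saved argument */
    printf("resumed: message delivered on port %ld\n", port);
}

/* Blocking path: save the needed state and register the continuation. In a
 * real kernel the call to block would never return here; execution resumes
 * only in the continuation function. */
static void msg_receive(struct thread *self, long port)
{
    self->saved_state[0] = port;
    self->continuation = msg_receive_continue;
}

int main(void)
{
    struct thread t = {0};
    msg_receive(&t, 42);                   /* the thread "blocks" */
    t.continuation(&t);                    /* the scheduler resumes it later */
    return 0;
}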

39
Threads in Windows NT
  • Design driven by need to support a variety of OS
    environments
  • NT process implemented as an object
  • an executable process contains one or more threads
  • process and thread objects have built-in
    synchronization capabilities

40
NT Threads
  • Support for kernel (system) threads
  • Threads are scheduled by the kernel and thus are
    similar to UNIX threads bound to an LWP (kernel
    thread)
  • fibers are threads that are not scheduled by the
    kernel and thus are similar to unbound user
    threads (see the sketch below).
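  • A sketch of Win32 fibers: the kernel schedules only
    the carrying thread, and SwitchToFiber is a
    cooperative, user-space context switch chosen by
    the program - much like an unbound user thread.

#include <windows.h>
#include <stdio.h>

static LPVOID main_fiber;

static VOID CALLBACK fiber_proc(LPVOID param)
{
    printf("fiber %s running\n", (const char *)param);
    SwitchToFiber(main_fiber);          /* yield back; fibers never preempt */
}

int main(void)
{
    main_fiber = ConvertThreadToFiber(NULL);   /* this thread becomes a fiber */
    LPVOID worker = CreateFiber(0, fiber_proc, "worker");
    SwitchToFiber(worker);                     /* run the worker fiber */
    DeleteFiber(worker);
    return 0;
}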

41
4.4 BSD UNIX
  • Initial support for threads implemented but not
    enabled in distribution
  • Proc structure and u-area reorganized
  • All threads have a unique ID
  • How are the proc and u areas reorganized to
    support threads?