Title: Test and Verification Solutions
1. Test and Verification Solutions
The testing challenges associated with distributed processing
Mike Bartley, TVS
2. Agenda
- The types of distributed computing
- The rise of multicore computing
- Where is this stuff used?
- Why is it important?
- The problems
- The challenges for:
  - Test
  - Debug
3. The types of distributed computing
- Distributed CPU and memory
  - Client-server
  - Internet applications
- Multitasking
  - The execution of multiple concurrent software processes in a system, as opposed to a single process at any one instant
- Multi-processing
  - Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system
- Multicore
  - A multi-core processor is composed of two or more independent cores: an integrated circuit containing two or more individual processors (called cores in this sense)
- Multi-threading
  - Threads have to share the resources of a single CPU
4. The rise of multicore parallel computing
- Moore's law
  - The number of transistors doubles roughly every 18 months
- Power issues
  - Continuous frequency scaling through process shrinkage can no longer be sustained within power budgets
- Multicore processing will become the norm

One of the goals of .NET Framework 4 was to make it easier for developers to write parallel programs that target multi-core machines. To achieve that goal, .NET 4 introduces various parallel-programming primitives that abstract away some of the messy details that developers have to deal with when implementing parallel programs from scratch.
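The same idea of hiding the messy details behind a library primitive exists in other ecosystems too. As an illustrative sketch (in Python rather than .NET, so this is an analogue, not the .NET API), a thread pool distributes work across workers and collects results without any explicit thread management:

```python
# Illustrative analogue of a parallel-programming primitive: the pool hides
# thread creation, work distribution, and result collection from the developer.
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() farms the calls out to worker threads but returns results in order
    results = list(pool.map(square, range(8)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The developer never creates, schedules, or joins a thread; the primitive does the bookkeeping.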
5. Example mobile phone chip
6. Example application domains of distributed computing
- Mobile
- Automotive
- Aerospace
- Gaming
  - Physics, rendering, AI, sound, IO
  - Video encode and decode
- Server
  - Handling multiple users accessing the same content
  - Files, applications, or just sharing raw compute power
- Graphic design
  - Photoshop
- Banking
  - Simulations, Monte Carlo methods
7. Parallel processing and multi-core
- Moore's Law: computing speed doubles every 18 months
- Performance can be increased further through parallelisation
  - Multiple processing units can work simultaneously, on a single chip, on different parts of the same problem to reach a solution quicker than if one unit tried to solve it by itself
  - Non-trivial to exploit effectively
- Multi-core means multiple, separate processing units operating in parallel in a single processor
- Multicore has become the dominant processor architecture in a short space of time, and the number of cores in servers and devices is set to grow
8. Why is this important to the UK?
- Multicore devices are becoming all-pervasive, deployed in:
  - Embedded systems in washing machines and cars, mobile telephones, and communications networks including the Internet
  - High-performance systems running applications in sectors including pharmaceutical, energy, aerospace, finance and engineering
- System inefficiencies arising from failing to exploit parallel computing effectively could have a substantial economic impact
- The UK has world-class expertise in this area, stemming in part from the development of the Transputer in the 1980s
- Much of this expertise is now dispersed
9. The challenge
- Our consultations with experts in industry and academia have highlighted the (short-term) need for:
  - Awareness creation: many senior managers do not realise the importance of parallel computing in the context of their businesses
  - Application porting: those who do realise do not know how to migrate existing applications to take their businesses forward
  - Training: it is estimated that fewer than 1% of programmers are parallel-literate and able to properly and quickly exploit parallelisation
  - Multicore ecosystems: while the UK has a base of parallel and multicore programming skills, the community is fragmented
10. The problems with distributed processing
- A sequential program's output depends only on the input
  - Whereas a distributed system's output depends on both the input and the execution order of the interactions between processes
- Non-determinism
  - We cannot guarantee the order of execution
  - Can lead to race conditions
11. Race Condition Examples
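A classic race is the lost update: two threads read a shared counter, then each writes back its own incremented copy. The sketch below (a Python illustration; the `time.sleep` is an artificial delay added to widen the race window so the interleaving is reproducible) shows one update being overwritten:

```python
import threading
import time

counter = 0

def unsafe_increment():
    global counter
    tmp = counter        # read the shared value
    time.sleep(0.1)      # widened window: the other thread reads the same value
    counter = tmp + 1    # write back, overwriting the other thread's update

threads = [threading.Thread(target=unsafe_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 1, not 2: one of the two increments was lost
```

Without the artificial delay the same interleaving can still occur; it is just rarer, which is exactly what makes such bugs hard to test for.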
12. The problems with distributed processing
- Non-determinism
  - We cannot guarantee the order of execution
  - Can lead to race conditions
- Shared resources
13. Sharing resources
- Typical distributed programming paradigms for distributed hardware can be divided into two categories:
- Shared-memory
  - The shared-memory paradigm uses a shared memory space (e.g. shared variables in a common address space) for communication between threads
- Message-passing
  - The implementation of data exchange between threads by passing messages
  - This paradigm is necessary for communication between threads in distributed-memory applications
  - Message Passing Interface (MPI) is the most commonly used library for the message-passing paradigm
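The message-passing style can be sketched without MPI itself. The following is an in-process Python analogue (queues stand in for MPI send/receive channels; the worker and the sentinel value are illustrative choices, not part of any MPI API):

```python
# Message-passing sketch: threads share no variables, only messages on queues.
import threading
import queue

requests = queue.Queue()   # channel: main -> worker
replies = queue.Queue()    # channel: worker -> main

def worker():
    while True:
        msg = requests.get()
        if msg is None:        # sentinel message: shut down
            break
        replies.put(msg * 2)   # process the message and send the result back

t = threading.Thread(target=worker)
t.start()
for n in [1, 2, 3]:
    requests.put(n)            # "send" to the worker
requests.put(None)
t.join()

out = [replies.get() for _ in range(3)]   # "receive" the results
print(out)  # [2, 4, 6]
```

Because all communication goes through the channels, there is no shared state to protect with locks; the trade-off is that correctness now depends on the message protocol.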
14. The problems with distributed processing
- Non-determinism
  - We cannot guarantee the order of execution
  - Can lead to race conditions
- Shared resources
  - Shared memory
    - Use of locks
    - Errors because we forget to release the locks
    - Can lead to deadlocks
15. The use of locks

    class mutex {
        mutex();            // get the appropriate mutex
        ~mutex();           // release it
    private:
        sometype mHandle;
    };

    void foo() {
        mutex guard;        // get the mutex

        if (a) return;      // released here

        if (b) throw oops;  // or here

        return;             // or here
    }
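The guarantee the slide aims at, release on every exit path, has a direct Python counterpart in the `with` statement (the names `foo`, `a`, and `b` below mirror the slide; the function body is an illustrative sketch):

```python
import threading

lock = threading.Lock()

def foo(a, b):
    with lock:                  # lock acquired here
        if a:
            return "early"      # released here
        if b:
            raise ValueError    # or here
        return "normal"         # or here

print(foo(True, False))   # early
print(lock.locked())      # False: released despite the early return
```

As with the C++ destructor, the programmer cannot forget to release the lock; the language construct does it on return, early return, or exception alike.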
16. Deadlock example

    Code for Process P      Code for Process Q
    Lock(M)                 Lock(N)
    Lock(N)                 Lock(M)
    Critical Section        Critical Section
    Unlock(N)               Unlock(M)
    Unlock(M)               Unlock(N)
17. Causes of deadlock
- 1) Tasks claim exclusive control of the resources they require ("mutual exclusion" condition).
- 2) Tasks hold resources already allocated to them while waiting for additional resources ("wait for" condition).
- 3) Resources cannot be forcibly removed from the tasks holding them until the resources are used to completion ("no preemption" condition).
- 4) A circular chain of tasks exists, such that each task holds one or more resources that are being requested by the next task in the chain ("circular wait" condition).
18. The use of message passing
There is still scope for race conditions
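One source of such races is arrival order: even when each sender's own messages stay in sequence, two senders interleave unpredictably on a shared channel. A minimal Python sketch (queues standing in for the message channel, as before):

```python
import threading
import queue

channel = queue.Queue()

def sender(tag):
    # each sender's own messages arrive in order...
    for i in range(3):
        channel.put((tag, i))

a = threading.Thread(target=sender, args=("A",))
b = threading.Thread(target=sender, args=("B",))
a.start(); b.start()
a.join(); b.join()

# ...but the interleaving of A's and B's messages differs from run to run
received = [channel.get() for _ in range(6)]
print(sorted(received))
```

A receiver whose correctness depends on seeing A's messages before B's has a race condition, with no shared memory in sight.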
19. Avoiding deadlock
- Each task must request all its required resources at once and cannot proceed until all have been granted ("wait for" condition denied).
- If a task holding certain resources is denied a further request, that task must release its original resources and, if necessary, request them again together with the additional resources ("no preemption" condition denied).
- The imposition of a linear ordering of resource types on all tasks ("circular wait" condition denied).
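The third rule, a linear ordering, is the easiest to apply in code: if every task acquires locks in the same globally agreed order, no circular wait can form. A Python sketch (task names mirror slide 16's P and Q; the ordering rule itself is the point):

```python
import threading
import time

# Global lock order: M before N, always, for every task.
M, N = threading.Lock(), threading.Lock()
done = []

def task(name):
    with M:                 # both tasks take M first...
        time.sleep(0.05)
        with N:             # ...then N: no task can hold N while waiting for M
            done.append(name)

t1 = threading.Thread(target=task, args=("P",))
t2 = threading.Thread(target=task, args=("Q",))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(done))  # ['P', 'Q']: both complete, no deadlock
```

Compare slide 16: the only change is that Q now acquires M before N, and the circular wait is impossible.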
20. Categorisation of parallel programming bugs
- From [5]: Pedersen, J.B., "Classification of Programming Errors in Parallel Message Passing Systems", Proceedings of Communicating Process Architectures 2006 (CPA'06), IOS Press, 2006, pp. 363-376.
- The research looked into the bugs introduced when students were asked to parallelise sequential programs:
  - Equation Solver
  - Mandelbrot
  - Matrix Multiplication
  - Partial Sum
  - Pipeline Computation
  - Differential Equation Solver
21. Categorisation of parallel programming bugs
- Data decomposition: the root of the bug had to do with the decomposition of the data set from the sequential to the parallel version of the program.
- Functional decomposition: the root of the bug was the decomposition of the functionality when implementing the parallel version of the program.
- API usage: this type of error is associated with the use of the MPI API calls. Typical errors here include passing data of the wrong type or misunderstanding the way the MPI functions work.
- Sequential error: this type of error is the type we know from sequential programs. This includes using = instead of == in tests, etc.
- Message problem: this type covers sending/receiving the wrong data; that is, it is concerned with the content of the messages, not the entire protocol of the system.
- Protocol problem: this error type is concerned with stray/missing messages that violate the overall communication protocol of the parallel system.
- Other: bugs that do not fit any of the above categories are reported as other.
- Let's look at the distribution of errors!
22. Project context
Results of an online error-reporting survey (source: copied from Pedersen)
23. The test and debug challenges
- Are our current test methodologies sufficient?
  - Is our testing thorough enough?
  - How do we know when to stop?
- Debug
  - How much time do we spend in debug?
  - How do we do it?
    - Debuggers
    - Print statements
  - Heisenbugs!
    - These will be more frequent
24. The test challenges
- What are our current models of test completion?
  - Black box?
    - Requirements coverage
    - Test matrix
    - Use-case coverage
  - White box?
    - Code coverage
- These will not be enough!
  - Can we ensure no races?
  - Can we ensure no deadlocks?
  - How much of the message passing have we covered?
- Static analysis will become more important
25. Consider coverage of process communications
26Debug
- If we run the same test twice it is not
guaranteed to produce the same result! - Debuggers show a single CPU only!