Title: Data Flow Implementation, EEE465 1999, Lecture 4

1 Data Flow Implementation
EEE465 1999 Lecture 4
Includes material from a lecture by Prof. Dr. Florian Matthes
- Major Greg Phillips
- Royal Military College of Canada
- Electrical and Computer Engineering
- greg.phillips_at_rmc.ca
- 01-613-541-6000 ext. 6190
2 Batch Sequential Architecture
- Processing steps are independent programs
- Each step runs to completion before the next step starts

[Diagram: files connect the steps: file → Validate → file → Sort → file → Update → file → Report → report]
Implementation involves writing each of the batch programs to read the relevant input file and write the required output file. The application architecture is created in an application scripting language (e.g., JCL or Rexx), which runs each program in the required sequence, making use of the generated files.
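The job-script idea can be sketched in Java: each batch "program" is a method that reads its input file to completion and writes an output file, and main() plays the role of the JCL/Rexx script. The step names and file contents below are invented for illustration; this is a sketch of the style, not a real batch system.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Batch-sequential sketch: each step reads one file completely and
// writes another; main() acts as the job script, running the steps
// strictly in sequence.
public class BatchJob {
    // Hypothetical "Validate" step: drop blank lines.
    static void validate(Path in, Path out) throws IOException {
        List<String> lines = Files.readAllLines(in);
        lines.removeIf(String::isEmpty);
        Files.write(out, lines);
    }

    // Hypothetical "Sort" step: sort the whole file.
    static void sort(Path in, Path out) throws IOException {
        List<String> lines = Files.readAllLines(in);
        Collections.sort(lines);
        Files.write(out, lines);
    }

    public static void main(String[] args) throws IOException {
        Path f0 = Files.createTempFile("input", ".txt");
        Path f1 = Files.createTempFile("valid", ".txt");
        Path f2 = Files.createTempFile("sorted", ".txt");
        Files.write(f0, Arrays.asList("banana", "", "apple", "cherry"));

        validate(f0, f1);   // step 1 runs to completion...
        sort(f1, f2);       // ...before step 2 starts
        System.out.println(Files.readAllLines(f2));
    }
}
```

Each intermediate file exists in full before the next step begins, which is exactly what distinguishes batch sequential from a pipeline.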
3 Pipeline Architecture

[Diagram: filter processes connected by ASCII stream pipes; data flows from process to process]

Implementation involves writing each of the pipe-and-filter processes to read the relevant input stream and write the required output stream. The application architecture is created in an application scripting language (e.g., Unix sh or Java), which runs the filter processes in parallel and establishes the required pipe connections.
4 Recall: Pipes and Filters
- Filter
  - incrementally transform some of the source data to sink data
  - enrich data by computing and adding information
  - refine data by concentrating or extracting information
  - transform data by changing its representation
  - stream-to-stream transformations
  - use little local context in processing streams
  - preserve no state between instantiations
- Pipe
  - moves data from a filter output to a filter input (or file)
  - one-way flow; may be flow control upstream but not data
  - may implement a bounded or unbounded buffer
  - pipes form data transmission graphs
- Overall operation
  - run pipes and filters (non-deterministically) until no more computations are possible
  - action mediated by data delivery
5 Filters
- There are two strategies to construct a filter
  - an active filter drives the data flow on the pipes
  - a passive filter is driven by the data flow on the (input or output) pipes
- In a pipe-and-filter architecture there has to be at least one active filter
  - this active filter can be in the environment of the system (e.g., user input)
- If there is more than one active filter, a buffering and synchronization mechanism is required
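The last point can be illustrated with two active filters, each with its own thread of control, connected by a bounded blocking queue that supplies the needed buffering and synchronization. This is a sketch under assumptions, not the lecture's mechanism; the "DONE" end-of-stream marker is invented for the example.

```java
import java.util.concurrent.*;
import java.util.*;

// Two active filters synchronized by a bounded buffer: put() blocks
// when the buffer is full, take() blocks when it is empty.
public class ActiveFilters {
    public static void main(String[] args) throws Exception {
        BlockingQueue<String> pipe = new ArrayBlockingQueue<>(4);  // bounded buffer

        Thread producer = new Thread(() -> {          // active filter 1
            try {
                for (String s : List.of("a", "b", "c")) pipe.put(s);
                pipe.put("DONE");                     // invented end-of-stream marker
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        List<String> sink = new ArrayList<>();
        Thread consumer = new Thread(() -> {          // active filter 2
            try {
                for (String s; !(s = pipe.take()).equals("DONE"); )
                    sink.add(s.toUpperCase());
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        producer.start(); consumer.start();
        producer.join(); consumer.join();
        System.out.println(sink);                     // prints [A, B, C]
    }
}
```

With only one active filter, no such queue is needed: the single thread of control simply calls the passive filters' read() or write() methods directly.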
6 Pipes
- A pipe is a first-class object
- A pipe transfers data from a data source to a data sink
- A pipe may implement a bounded or unbounded buffer
- Pipes can be between
  - two threads of a single process (e.g., Java IO Streams)
    - stream may contain references to shared language objects
  - two processes on a single host computer (e.g., Unix Named Pipes)
    - stream may contain references to shared operating system objects (e.g., files)
  - two processes in a distributed system (e.g., Internet Sockets)
    - stream contents normally limited to raw bytes
    - protocols implement higher-level abstractions (e.g., pass pipes as references, pass CORBA object references)
7 Pipes and Filters in Unix
- Unix processes that transform stdin to stdout are generally called filters
  - but when they consume all the input before starting the output, they violate the filter assumption (e.g., sort)
- Unix pipes can treat files as well as filters as data sources and sinks
  - but files are passive, so you can't make arbitrary combinations
- Unix assumes that the pipes carry ASCII character streams
  - the good news: anything can connect to anything else
  - the cloud on the good news: filters make assumptions about the syntax of the stream
  - the bad news: everything must be encoded in ASCII, then shipped, then decoded
8 Quick Review Questions
- Question: what is the force that makes data flow?
- Question: why can't you pipe a file to a file in Unix?
9 Unix Pipe and Filter Example
- Task: print a sorted list of all words that appear more than once in a given file

    mknod pipeA p
    mknod pipeB p
    sort pipeA > pipeB &
    cat file1 | tee pipeA | sort -u | comm -13 - pipeB > file2
[Diagram: cat reads file1 and feeds tee; tee feeds both sort -u and, via named pipe pipeA, sort; sort writes to pipeB; comm -13 combines the sort -u output with pipeB and writes file2. Legend: filter, command, pipe, file.]
10 Java IO Streams
- Java provides a set of IO stream classes in its java.io package. These are divided into
  - Readers and Writers, which process streams of Unicode characters, and
  - InputStreams and OutputStreams, which process streams of bytes
- All Java IO streams are passive filters, which must be attached to at least one active filter supplied by the application programmer.
- Synchronization between multiple active filters is provided by the PipedWriter and PipedReader (and corresponding Stream) classes, which together implement (roughly) the equivalent of a Unix pipe.
11 Push vs Pull
- InputStreams and Readers are pull filters
  - they implement a read() method, which is called by the next filter in the chain
- OutputStreams and Writers are push filters
  - they implement a write() method, which is called by the previous filter in the chain

pull filter:
    char read() { char x = myInput.read(); return f(x); }

push filter:
    void write(char x) { myOutput.write(f(x)); }
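The two styles can be made concrete with java.io's FilterReader and FilterWriter subclasses; here the transformation f(x) is arbitrarily chosen to be upper-casing, purely for illustration.

```java
import java.io.*;

// A pull filter wraps its input and transforms data when read() is
// called; a push filter wraps its output and transforms data when
// write() is called.
public class PushPull {
    static class UpperReader extends FilterReader {       // pull filter
        UpperReader(Reader in) { super(in); }
        @Override public int read() throws IOException {
            int x = in.read();                            // pull from upstream
            return x == -1 ? -1 : Character.toUpperCase(x);
        }
    }

    static class UpperWriter extends FilterWriter {       // push filter
        UpperWriter(Writer out) { super(out); }
        @Override public void write(int x) throws IOException {
            out.write(Character.toUpperCase(x));          // push downstream
        }
    }

    public static void main(String[] args) throws IOException {
        Reader pull = new UpperReader(new StringReader("pull"));
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = pull.read()) != -1) sb.append((char) c);

        StringWriter sink = new StringWriter();
        Writer push = new UpperWriter(sink);
        for (char ch : "push".toCharArray()) push.write(ch);
        push.flush();

        System.out.println(sb + " " + sink);              // prints PULL PUSH
    }
}
```

In both cases the filter itself is passive: the thread of control lives in main(), which drives the pull chain by calling read() and the push chain by calling write().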
12 Java IO Streams Example

[Diagram: main() is the single active filter; it pulls from the input file through a FileReader wrapped in a BufferedReader, and pushes to the output file through a BufferedOutputStream, GZIPOutputStream, and FileOutputStream, all passive filters connected by method calls.]

    // ...
    public static void main(String[] args) throws IOException {
        // ...
        BufferedReader in = new BufferedReader(
            new FileReader(args[0]));
        BufferedOutputStream out = new BufferedOutputStream(
            new GZIPOutputStream(new FileOutputStream(args[1])));
        int c;
        while ((c = in.read()) != -1)
            out.write(c);
        in.close();
        out.close();
        // ...
    }

Where are the pipes?
13 Implementation Issues
- Identify the processing steps (re-using existing filters as much as possible)
- Define the data format to be passed along each pipe
- Define the end-of-stream symbol
- Decide how to implement each pipe connection, as active or passive
- Design and implement the necessary filters
- Design error handling
14 Stream Data Formats
- Fundamental tradeoff
  - compatibility and reusability
    - everything is a stream
  - versus type safety
    - stream of Persons, stream of Text, etc.
- Popular solutions
  - raw byte stream
  - stream of ASCII text lines with line separator
  - record stream
    - record attributes are strings, separated by, e.g., commas
  - nested record stream
    - record attribute is in turn a sequence
  - stream representing tree traversal
    - nodes enumerated in some defined order
  - typed stream with header indicating its type
  - event stream
    - event name and event arguments
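The record-stream option can be sketched as one ASCII line per record with comma-separated attributes. The field names and values below are invented for illustration, and a StringWriter stands in for a real pipe.

```java
import java.io.*;
import java.util.*;

// Record stream sketch: each record is one ASCII line; attributes are
// strings separated by commas.
public class RecordStream {
    public static void main(String[] args) throws IOException {
        // Writer side: encode records onto the stream.
        StringWriter pipe = new StringWriter();     // stand-in for a real pipe
        PrintWriter out = new PrintWriter(pipe);
        out.println("Ada,Lovelace,1815");
        out.println("Alan,Turing,1912");
        out.flush();

        // Reader side: decode one record per line. Both sides must
        // share the syntax assumption (comma-separated, one per line).
        BufferedReader in = new BufferedReader(new StringReader(pipe.toString()));
        String line;
        while ((line = in.readLine()) != null) {
            String[] fields = line.split(",");      // breaks if a value contains a comma
            System.out.println(fields[1] + " born " + fields[2]);
        }
    }
}
```

The comment on split() hints at the tradeoff the slide names: the format is maximally reusable, but nothing enforces that the stream really carries well-formed records.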
15 Data Flow Summary
16 An Architectural Comparison
- OO System Architecture
  - objects passed as arguments of messages, by reference
  - data, code, and threads shared
  - large number of object links
  - object creation defined by other objects
  - frequent bi-directional object exchange between objects
  - focus on control flow (sequential)
  - everything is an object
  - small-grain system structure
  - dynamic object links
- Data Flow Architecture
  - data values passed as copies between filters
  - nothing shared
  - very small number of pipes
  - filters and topology defined outside the filters
  - unidirectional data flow
  - focus on data flow (may be highly concurrent)
  - filters can have complex internal structure that cannot be described by pipes and filters
  - large-grain system structure
  - mostly static pipe topology
17 Benefits of Data Flow Architectures
- Intermediate data structures are possible but not necessary (yes in batch sequential, no in pipeline)
- Flexibility through filter exchange
- Flexibility through recombination
- Reuse of filter components
- Rapid prototyping
- Parallel processing in a multiprocessor environment
18 Limitations of Data Flow Architectures
- Sharing state information is expensive or inflexible
- Efficiency loss in a single-processor environment
  - cost of transferring data
  - data dependencies between stream elements (e.g., sorting, tree traversal)
  - cost of context switching (particularly for non-buffered pipes)
- Data transformation overhead
  - data on the stream
  - objects in memory
- Difficulty of coordinated error handling
19 Next Class: Data Store Systems