1
Data Flow Implementation
EEE465 1999 Lecture 4
Includes material from a lecture by Prof. Dr.
Florian Matthes
  • Major Greg Phillips
  • Royal Military College of Canada
  • Electrical and Computer Engineering
  • greg.phillips@rmc.ca
  • 01-613-541-6000 ext. 6190

2
Batch Sequential Architecture
  • Processing steps are independent programs
  • Each step runs to completion before next step
    starts

[Diagram: a chain of batch steps (Validate, Sort, Update, Report) connected by intermediate files, ending in a report]
Implementation involves writing each of the batch
programs to read the relevant input file and
write the required output file. The application
architecture is created in an application
scripting language (e.g., JCL or Rexx) which runs
each program in the required sequence making use
of the generated files.
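As a sketch of this style, the hypothetical driver below plays the role of the JCL or Rexx script, but in Java: each step is an external program run to completion before the next starts, communicating only through intermediate files. The program and file names (validate, sort, update, report, input.dat, and so on) are assumptions for illustration.

import java.io.IOException;

public class BatchDriver {
    // Run one batch step to completion before the next starts.
    static void step(String... cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (p.waitFor() != 0)
            throw new IOException("batch step failed: " + cmd[0]);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical programs; each reads one file and writes the next.
        step("validate", "input.dat", "validated.dat");
        step("sort", "validated.dat", "sorted.dat");
        step("update", "sorted.dat", "master.dat");
        step("report", "master.dat", "report.txt");
    }
}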
3
Pipeline Architecture
[Diagram: filter processes connected by ASCII stream pipes; legend: Process, filter, ASCII stream (pipe), data flow]
Implementation involves writing each of the pipe
and filter processes to read the relevant input
stream and write the required output stream. The
application architecture is created in an
application scripting language (e.g., Unix sh or
Java) which runs the pipe and filter processes in
parallel and establishes the required connections.
4
Recall Pipes and Filters
  • Filter
  • incrementally transform some of the source data
    to sink data
  • enrich data by computing and adding information
  • refine data by concentrating or extracting
    information
  • transform data by changing its representation
  • stream-to-stream transformations
  • use little local context in processing streams
  • preserve no state between instantiations
  • Pipe
  • moves data from a filter output to a filter input
    (or file)
  • one-way flow: flow control may travel upstream,
    but data does not
  • may implement a bounded or unbounded buffer
  • pipes form data transmission graphs
  • Overall operation
  • run pipes and filters (non-deterministically)
    until no more computations are possible
  • action mediated by data delivery

5
Filters
  • There are two strategies to construct a filter
  • an active filter drives the data flow on the
    pipes
  • a passive filter is driven by the data flow
    on the (input or output) pipes
  • In a pipe and filter architecture there has to be
    at least one active filter
  • This active filter can be in the environment of
    the system (e.g., user input)
  • If there is more than one active filter, a
    buffering and synchronization mechanism is
    required
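A minimal sketch of an active filter, assuming Java's Reader and Writer streams as the passive ends it drives; the class name and the per-character transformation are illustrative.

import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// An active filter: a thread whose run loop drives the data flow,
// pulling from its input pipe and pushing to its output pipe.
class UpperCaseFilter extends Thread {
    private final Reader in;   // passive upstream end
    private final Writer out;  // passive downstream end

    UpperCaseFilter(Reader in, Writer out) {
        this.in = in;
        this.out = out;
    }

    @Override
    public void run() {
        try {
            int c;
            while ((c = in.read()) != -1)                   // pull
                out.write(Character.toUpperCase((char) c)); // transform, push
            out.close();                                    // signal end of stream
        } catch (IOException e) {
            e.printStackTrace(); // coordinated error handling is a design issue
        }
    }
}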

6
Pipes
  • A pipe is a first-class object
  • A pipe transfers data from a data source to a
    data sink
  • A pipe may implement a bounded or unbounded
    buffer
  • Pipes can be between
  • two threads of a single process (e.g., Java IO
    Streams)
  • stream may contain references to shared language
    objects
  • two processes on a single host computer (e.g.,
    Unix Named Pipes)
  • stream may contain references to shared operating
    system objects (e.g., files)
  • two processes in a distributed system (e.g.,
    Internet Sockets)
  • stream contents normally limited to raw bytes
  • protocols implement higher level abstractions
    (e.g., pass pipes as references, pass CORBA
    object references)
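As a sketch of the distributed case, the pair of processes below communicates through an Internet socket, where the stream really is limited to raw bytes; the port number and message are arbitrary assumptions.

import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// A "pipe" between two processes in a distributed system: an Internet
// socket. The stream carries raw bytes only; anything richer needs a
// protocol layered on top.
public class SocketPipe {
    public static void main(String[] args) throws Exception {
        if (args.length > 0 && args[0].equals("sink")) {
            try (ServerSocket server = new ServerSocket(9000);
                 Socket s = server.accept();
                 InputStream in = s.getInputStream()) {
                int b;
                while ((b = in.read()) != -1)
                    System.out.write(b);
                System.out.flush();
            }
        } else { // source: write a few raw bytes into the pipe
            try (Socket s = new Socket("localhost", 9000);
                 OutputStream out = s.getOutputStream()) {
                out.write("raw bytes across the wire\n".getBytes());
            }
        }
    }
}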

7
Pipes and Filters in Unix
  • Unix processes that transform stdin to stdout are
    generally called filters
  • but when they consume all the input before
    starting the output, they violate the filter
    assumption (e.g., sort)
  • Unix pipes can treat files as well as filters as
    data sources and sinks
  • but files are passive, so you can't make
    arbitrary combinations
  • Unix assumes that the pipes carry ASCII character
    streams
  • the good news: anything can connect to anything
    else
  • the cloud on the good news: filters make
    assumptions about the syntax of the stream
  • the bad news: everything must be encoded in
    ASCII, then shipped, then decoded

8
Quick Review Questions
  • Question: what is the force that makes data flow?
  • Question: why can't you pipe a file to a file in
    Unix?

9
Unix Pipe and Filter Example
  • Task Print a sorted list of all words that
    appear more than once in a given file
  • mknod pipeA p
  • mknod pipeB p
  • sort pipeA > pipeB &
  • cat file1 | tee pipeA | sort -u | comm -13 - pipeB > file2

(tee copies the word stream both into pipeA, where the background
sort produces the full sorted list on pipeB, and into sort -u, which
produces the sorted list of distinct words; comm -13 keeps only the
lines that are in pipeB but not in the distinct list, i.e., the extra
occurrences of words that appear more than once. The first sort runs
in the background so that both command lines can execute concurrently.)

[Diagram: cat file1 feeds tee; tee writes to pipeA and to sort -u; sort reads pipeA and writes pipeB; comm -13 reads the output of sort -u and pipeB and writes file2. Legend: filter (command), pipe, file]
10
Java IO Streams
  • Java provides a set of IO Stream classes in its
    java.io package. These are divided into
  • Readers and Writers, which process streams of
    Unicode characters, and
  • InputStreams and OutputStreams, which process
    streams of bytes
  • All Java IO streams are passive filters which
    must be attached to at least one active filter
    supplied by the application programmer.
  • Synchronization between multiple active filters
    is provided by PipedWriter and PipedReader
    (Stream) classes, which together implement
    (roughly) the equivalent of a Unix pipe.
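A minimal sketch of that mechanism, with two active filters (threads) connected by a PipedWriter/PipedReader pair; the producer and consumer bodies are illustrative.

import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;

// Two active filters (threads) connected by PipedWriter/PipedReader,
// Java's rough equivalent of a Unix pipe.
public class PipeDemo {
    public static void main(String[] args) throws IOException {
        PipedWriter upstream = new PipedWriter();
        PipedReader downstream = new PipedReader(upstream); // connect the ends

        Thread producer = new Thread(() -> {
            try {
                upstream.write("hello, pipe\n"); // blocks if the buffer is full
                upstream.close();                // signals end of stream
            } catch (IOException e) {
                e.printStackTrace();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                int c;
                while ((c = downstream.read()) != -1) // blocks until data arrives
                    System.out.print((char) c);
            } catch (IOException e) {
                e.printStackTrace();
            }
        });

        producer.start();
        consumer.start();
    }
}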

11
Push vs Pull
  • InputStreams and Readers are pull filters
  • they implement a read() method, which is called
    by the next filter in the chain
  • OutputStreams and Writers are push filters
  • they implement a write() method, which is called
    by the previous filter in the chain

pull filter
char read() {
  char x = myInput.read();
  return f(x);
}

push filter
void write(char x) {
  myOutput.write(f(x));
}
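Fleshed out into compilable form (a sketch; f() stands for any per-character transformation), the two shapes correspond to the FilterReader and FilterWriter hooks that java.io provides for writing passive filters.

import java.io.FilterReader;
import java.io.FilterWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// Pull filter: the next stage in the chain calls read(), which pulls
// from the previous stage. (A production filter would also override
// read(char[], int, int), which java.io uses for bulk transfers.)
class PullFilter extends FilterReader {
    PullFilter(Reader in) { super(in); }

    @Override
    public int read() throws IOException {
        int x = in.read();            // pull from upstream
        return x == -1 ? -1 : f(x);   // propagate end of stream
    }

    private int f(int x) { return Character.toUpperCase(x); } // sample transform
}

// Push filter: the previous stage calls write(), which pushes to the
// next stage.
class PushFilter extends FilterWriter {
    PushFilter(Writer out) { super(out); }

    @Override
    public void write(int x) throws IOException {
        out.write(f(x));              // push downstream
    }

    private int f(int x) { return Character.toUpperCase(x); } // sample transform
}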
12
Java IO Streams Example
[Diagram: main() is the active filter; it pulls from the input file through FileReader and BufferedReader (passive filters) and pushes to the output file through BufferedOutputStream, GZIPOutputStream, and FileOutputStream (passive filters); the connections are method calls]
// ...
public static void main(String[] args) throws IOException {
  BufferedReader in = new BufferedReader(
      new FileReader(args[0]));
  BufferedOutputStream out = new BufferedOutputStream(
      new GZIPOutputStream(new FileOutputStream(args[1])));
  int c;
  while ((c = in.read()) != -1)
    out.write(c);
  in.close();
  out.close();
}
// ...
Where are the pipes?
13
Implementation Issues
  • Identify the processing steps (re-using existing
    filters as much as possible)
  • Define the data format to be passed along each
    pipe
  • Define the end-of-stream symbol
  • Decide how to implement each pipe connection as
    active or passive
  • Design and implement the necessary filters
  • Design error handling

14
Stream Data Formats
  • Fundamental tradeoff
  • compatibility and reusability
  • everything is a stream
  • versus type safety
  • stream of Persons, stream of Text, etc.
  • Popular solutions
  • raw byte stream
  • stream of ASCII text lines with line separator
  • record stream
  • record attributes are strings separated by a
    delimiter (e.g., commas); see the sketch after this list
  • nested record stream
  • record attribute is in turn a sequence
  • stream representing tree traversal
  • nodes enumerated in some defined order
  • typed stream with header indicating its type
  • event stream
  • event name and event arguments
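As an illustration of the record-stream option, the sketch below reads one record per ASCII text line, with comma-separated string attributes; end of input serves as the end-of-stream symbol. The class name and format are assumptions for illustration.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// A record stream layered on a stream of ASCII text lines: one record
// per line, attributes as comma-separated strings.
class RecordStreamReader {
    private final BufferedReader in;

    RecordStreamReader(Reader in) { this.in = new BufferedReader(in); }

    // Returns the next record's attributes, or null at end of stream.
    String[] next() throws IOException {
        String line = in.readLine();    // the line separator delimits records
        if (line == null) return null;  // end of input: end-of-stream symbol
        return line.split(",");         // attributes are strings
    }
}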

15
Data Flow Summary
16
An Architectural Comparison
  • OO System Architecture
  • objects passed as arguments of messages by
    reference
  • data, code, and threads shared
  • large number of object links
  • object creation defined by other objects
  • frequent bi-directional object exchange between
    objects
  • focus on control flow (sequential)
  • everything is an object
  • small-grain system structure
  • dynamic object links
  • Data Flow Architecture
  • data values passed as copies between filters
  • nothing shared
  • very small number of pipes
  • filters and topology defined outside the
    filters
  • unidirectional data flow
  • focus on data flow (may be highly concurrent)
  • filters can have complex internal structure that
    cannot be described by pipes and filters
  • large-grain system structure
  • mostly static pipe topology

17
Benefits of Data Flow Architectures
  • Intermediate data structures are possible but not
    necessary (yes in batch sequential, no in
    pipeline)
  • Flexibility through filter exchange
  • Flexibility through recombination
  • Reuse of filter components
  • Rapid prototyping
  • Parallel processing in a multiprocessor
    environment

18
Limitations of Data Flow Architectures
  • Sharing state information is expensive or
    inflexible
  • Efficiency loss in a single-processor environment
  • cost of transferring data
  • data dependencies between stream elements (e.g.,
    sorting, tree traversal)
  • cost of context switching (particularly for
    non-buffered pipes)
  • Data transformation overhead
  • converting between data on the stream and
    objects in memory
  • Difficulty of coordinated error handling

19
Next Class: Data Store Systems