Title: A Proof of Concept: Provenance in a Service Oriented Architecture
1 A Proof of Concept: Provenance in a Service Oriented Architecture
- Liming Chen, Victor Tan, Fenglian Xu, Alexis Biller, Paul Groth, Simon Miles, John Ibbotson, Michael Luck and Luc Moreau
2 Purpose
- Asking questions about the provenance of something, i.e. the process by which it came to be as it is, is essential in many domains
- We are working with bioinformaticians, medics, aerospace engineers and physicists, and have found a wide range of questions they wish to ask
- A simple example application can
  - Clarify the requirements on software to aid answering those questions
  - Be used to explain the issues involved to non-domain experts
  - Be extended in controlled ways to explore issues that arise in real applications
3 EU Provenance and PASOA
- Recent work of the EU Provenance project
  - Developed a logical architecture for software to aid answering provenance-related questions, along with other research on security, scalability and user tool support
  - Now being applied to two project applications: organ transplant management (UPC, Spain) and aerospace engineering (DLR, Germany)
  - The logical architecture document should be released next week: keep an eye on www.gridprovenance.org
- Recent work of the PASOA project
  - Has focused on e-Science applications and has gathered requirements, developed protocols and software
  - EU Provenance used PASOA software for the work described in this talk
  - PASOA will be discussed in the following two presentations
4 Outline
- The example application
- Asking provenance-related questions
- The example as a service-oriented process
- Recording documentation of a process
- What does the example show us?
- What are the limits of the example?
- Conclusions
5 The example application
6 Baking a Victoria Sponge
- INGREDIENTS
  - 110g (4oz) Butter, 110g (4oz) Caster Sugar, 110g (4oz) Self-raising Flour, 2 Eggs, Vanilla Essence or 1 tsp Grated Lemon Rind
- RECIPE
  - Preheat oven to 190C / 375F / Gas 5. Whisk together the butter and sugar until light and creamy. Add the beaten eggs gradually with a little of the flour. Fold in the remaining sieved flour and add the flavouring. Divide equally between two 15cm (6 inch) sandwich tins. Bake for 20 - 25 minutes. Turn out on to a wire rack to cool.
- This is not so contrived an example!
www.thefoody.com
7 (diagram)
- 20g sugar and 20g butter
- whisk them together
- get mixture 1
8 (diagram)
- 2 eggs
- beat the eggs for 2 minutes
- mix the beaten eggs with mixture 1
- obtain mixture 2
9 (diagram)
- 100g flour together with mixture 2
- fold to obtain mixture 3
10 (diagram)
- set baking temperature to 180C
- set baking time to 30min
- put mixture 3 into oven
- obtain a cake
11 (diagram)
- We then set a time for baking
- cake
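The baking steps on the diagram slides can be read as a chain of calls in which each step consumes earlier outputs. The Python sketch below is only an illustration of that reading: the `Item` class and the function names are hypothetical, and the quantities are the ones shown on the diagrams (not the full recipe).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """An ingredient, mixture or cake, plus the items it was made from."""
    name: str
    made_from: tuple = ()


def whisk(butter, sugar):
    return Item("mixture 1", (butter, sugar))


def beat_mix(mixture1, eggs, beating_minutes):
    # beating_minutes mirrors the slide; this sketch does not model time
    return Item("mixture 2", (mixture1, eggs))


def fold(flour, mixture2):
    return Item("mixture 3", (flour, mixture2))


def oven_bake(mixture3, temperature_c, baking_minutes):
    return Item("cake", (mixture3,))


def bake():
    mixture1 = whisk(Item("20g butter"), Item("20g sugar"))
    mixture2 = beat_mix(mixture1, Item("2 eggs"), beating_minutes=2)
    mixture3 = fold(Item("100g flour"), mixture2)
    return oven_bake(mixture3, temperature_c=180, baking_minutes=30)


def raw_ingredients(item):
    """Walk back through made_from to list the starting ingredients."""
    if not item.made_from:
        return [item.name]
    names = []
    for part in item.made_from:
        names.extend(raw_ingredients(part))
    return names


print(raw_ingredients(bake()))
# prints ['100g flour', '20g butter', '20g sugar', '2 eggs']
```

Because every item keeps a reference to its inputs, the question "what went into this cake?" can be answered by walking backwards from the result.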
12 After Baking
- Some questions can be asked after baking a cake
- Answers to the questions can be found if we record details of the baking process during its execution
- Details of the baking process are what we call the provenance of a cake
13 What went wrong? Questions
- Did we follow the recipe accurately?
- Did we use the correct ingredients at the right time?
- Did we provide the correct quantities? Correct units?
- Did we perform actions for the right duration?
- We need to keep a record of all actions performed with all their parameters (such as the number of eggs used)
- Organ transplant example: Did the medics follow the correct procedure?
- Bioinformatics example: Did I analyse an amino acid sequence using tools that actually only apply to nucleotide sequences?
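If every action is recorded with its parameters, checking whether the recipe was followed becomes a comparison of two lists. The sketch below assumes a hypothetical record format of (action, parameters) pairs, with units encoded in the parameter names; it is not the format used by the talk's software.

```python
# Expected steps, using the quantities shown on the diagram slides.
RECIPE = [
    ("whisk", {"butter_g": 20, "sugar_g": 20}),
    ("beat_mix", {"eggs": 2, "beating_min": 2}),
    ("fold", {"flour_g": 100}),
    ("oven_bake", {"temperature_c": 180, "baking_min": 30}),
]


def check_against_recipe(recorded, recipe=RECIPE):
    """Return a list of human-readable deviations from the recipe."""
    problems = []
    for number, (action, expected) in enumerate(recipe, start=1):
        if number > len(recorded):
            problems.append(f"step {number}: {action} was never recorded")
            continue
        got_action, got_params = recorded[number - 1]
        if got_action != action:
            problems.append(f"step {number}: expected {action}, recorded {got_action}")
            continue
        for key, value in expected.items():
            if key not in got_params:
                # Covers both a missing parameter and one recorded in another unit.
                problems.append(f"step {number}: no {key} recorded for {action}")
            elif got_params[key] != value:
                problems.append(f"step {number}: {key} was {got_params[key]}, recipe says {value}")
    return problems


# A hypothetical faulty run: too many eggs, flour in the wrong unit, no baking.
recorded = [
    ("whisk", {"butter_g": 20, "sugar_g": 20}),
    ("beat_mix", {"eggs": 3, "beating_min": 2}),
    ("fold", {"flour_oz": 100}),
]
for problem in check_against_recipe(recorded):
    print(problem)
```

The check can only find what was recorded, which is why the slide asks for all parameters of all actions to be kept.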
14 What went wrong? Questions
- Other factors can affect the baking process
- Amount of flour required varies with altitude
- Oven is broken and baked at a different temperature
- We need to know the internal state of the different entities participating in the baking process (such as actual oven temperature or oven altitude)
- Organ transplant example: By what criteria did a team decide to accept or reject an organ?
- Bioinformatics example: What script was used by the services to perform each stage of the experiment?
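The difference between what was requested and what an entity actually did can only be detected if the entity records its own internal state alongside the interaction. The following sketch assumes a hypothetical `Oven` class and log format, purely to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Oven:
    actual_temperature_c: float  # internal state, unknown to the caller
    log: list = field(default_factory=list)

    def bake(self, mixture, requested_temperature_c, minutes):
        # What the caller asked for.
        self.log.append({"kind": "interaction", "action": "OvenBake",
                         "mixture": mixture,
                         "requested_temperature_c": requested_temperature_c,
                         "minutes": minutes})
        # What the oven knows about itself while baking.
        self.log.append({"kind": "actor_state", "actor": "oven",
                         "actual_temperature_c": self.actual_temperature_c})
        return "cake"


def temperature_error(log):
    """Actual minus requested temperature, taken from the first matching entries."""
    requested = next(e["requested_temperature_c"] for e in log if e["kind"] == "interaction")
    actual = next(e["actual_temperature_c"] for e in log if e["kind"] == "actor_state")
    return actual - requested


broken_oven = Oven(actual_temperature_c=150)
broken_oven.bake("mixture 3", requested_temperature_c=180, minutes=30)
print(temperature_error(broken_oven.log))  # prints -30
```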
15 Process Analysis Questions
- Did we use the same amount of ingredients for baking cake 1 and cake 2, or the same proportions?
- What was the longest step in the execution of a recipe?
- Why did we not finish the process? Where did we stop?
- The process that led to a given cake should be delimited and analysable
- Organ transplant example: Which patient's death led to the organ now being transplanted?
- Bioinformatics example: What samples led to the final analysis result?
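Once a process is delimited and its steps are recorded, the analysis questions above reduce to simple computations over the record. The sketch below assumes hypothetical step records of the form (name, start minute, end minute) and ingredient amounts in grams; the timings are invented for illustration.

```python
from fractions import Fraction


def longest_step(steps):
    """Name of the step with the greatest duration."""
    return max(steps, key=lambda step: step[2] - step[1])[0]


def first_missing_step(steps, planned):
    """First planned step with no record, or None if the process finished."""
    done = {name for name, _, _ in steps}
    return next((name for name in planned if name not in done), None)


def same_proportions(cake_a, cake_b):
    """True if both cakes used the same ingredients in the same ratios.

    Assumes integer gram amounts and at least one non-zero amount per cake.
    """
    if cake_a.keys() != cake_b.keys():
        return False
    total_a, total_b = sum(cake_a.values()), sum(cake_b.values())
    return all(Fraction(cake_a[k], total_a) == Fraction(cake_b[k], total_b)
               for k in cake_a)


planned = ["whisk", "beat_mix", "fold", "oven_bake"]
steps = [("whisk", 0, 3), ("beat_mix", 3, 5), ("fold", 5, 7), ("oven_bake", 7, 37)]
cake1 = {"butter": 20, "sugar": 20, "flour": 100}
cake2 = {"butter": 40, "sugar": 40, "flour": 200}

print(longest_step(steps))                     # prints oven_bake
print(first_missing_step(steps[:3], planned))  # prints oven_bake
print(cake1 == cake2, same_proportions(cake1, cake2))  # prints False True
```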
16 What Did Parties Do? Questions
- Did the baker follow the user's instructions (regardless of any claim from the baker)?
- Did each step of the baking process follow the user's instructions? Did they receive the correct instructions?
- Did they follow the received instructions?
- All entities should document their view of a process because it may vary
- Organ transplant example: Were there differing opinions on the suitability of an organ for transplant?
- Bioinformatics example: I claim I used a database in my experiments whose license allows me to patent my results: does the database owner confirm this?
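When each party documents its own view, disagreements can be found by comparing the views message by message. The sketch below assumes, hypothetically, that both parties key their records by a shared message identifier; the identifiers and field names are illustrative only.

```python
def discrepancies(sender_view, receiver_view):
    """Compare two independently recorded views of the same messages.

    Each view maps a message id to the content that party says was sent
    or received. Returns (message id, sender content, receiver content)
    for every message on which the views differ or one view is silent.
    """
    found = []
    for message_id in sorted(sender_view.keys() | receiver_view.keys()):
        sent = sender_view.get(message_id)
        received = receiver_view.get(message_id)
        if sent != received:
            found.append((message_id, sent, received))
    return found


user_view = {
    "m1": {"action": "Baker", "temperature_c": 180, "baking_min": 30},
}
baker_view = {
    "m1": {"action": "Baker", "temperature_c": 160, "baking_min": 30},
}

for message_id, sent, received in discrepancies(user_view, baker_view):
    print(message_id, "user says", sent, "baker says", received)
```

A comparison like this reports that the views differ; deciding which party is right needs further evidence.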
17 Implementation
- We implemented the application as a set of Web Services, and then implemented clients that answered the provenance-related questions by querying the provenance store
- This involved mapping the scenario onto a service-oriented architecture
18 Service-Oriented Process
19 Recording
Provenance Store
- Baker (Sugar, Flour, Beating Time, Temperature
- Whisk (Butter, Sugar)
- WhiskReturn (Mixture 1)
- BeatMix (Mixture 1, Eggs, Beating Time)
- BeatMixReturn (Mixture 2)
- Fold (Flour, Mixture 2)
- FoldReturn (Mixture 3)
- OvenBake (Mixture 3, Temperature, Baking Time)
- OvenBakeReturn (Cake)
- BakerReturn (Cake)
After baking, the provenance store contains a trace of the different activities that were involved in the production of a cake.
The provenance of a cake is the documentation of the process that led to that cake.
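The trace above can be reproduced with a small in-memory stand-in for the provenance store. This is a sketch under stated assumptions, not the PASOA software: `ProvenanceStore`, `invoke` and the record fields are hypothetical, the services are stubs, only the caller records each message, and the full argument list of the Baker call is assumed because it is cut off on the slide.

```python
class ProvenanceStore:
    """In-memory stand-in for a provenance store (hypothetical interface)."""

    def __init__(self):
        self.records = []

    def record(self, asserter, message, content):
        self.records.append({"asserter": asserter, "message": message, "content": content})


def invoke(store, caller, name, service, *args):
    """Call a service, recording the request and its return in the store."""
    store.record(caller, name, args)
    result = service(*args)
    store.record(caller, name + "Return", (result,))
    return result


def baker(store, butter, sugar, eggs, flour, beating_time, temperature, baking_time):
    # Each service is a stub that just names its output.
    m1 = invoke(store, "baker", "Whisk", lambda b, s: "Mixture 1", butter, sugar)
    m2 = invoke(store, "baker", "BeatMix", lambda m, e, t: "Mixture 2", m1, eggs, beating_time)
    m3 = invoke(store, "baker", "Fold", lambda f, m: "Mixture 3", flour, m2)
    return invoke(store, "baker", "OvenBake", lambda m, temp, t: "Cake",
                  m3, temperature, baking_time)


store = ProvenanceStore()
invoke(store, "user", "Baker", lambda *args: baker(store, *args),
       "20g butter", "20g sugar", "2 eggs", "100g flour", 2, 180, 30)

for entry in store.records:
    print(entry["asserter"], entry["message"], entry["content"])
```

The messages come out in the same order as the trace listed above, from Baker through to BakerReturn.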
20 What we have learnt
21 Process Documentation and Provenance
- We distinguish
  - process documentation (the documentation recorded into a provenance store about a process)
  - provenance (the information retrieved from a provenance store about a process)
- This is because we have found there to be different requirements on each
(diagram labels: Process documentation, Provenance, Processing)
22 Process documentation
- Should allow questions about the provenance of entities to be answered
- Should follow a consistent, application-independent structure so that independent parties can record documentation that is easily combined
  - e.g. oven may be owned by someone other than the user, but their documentation is combined to answer whether the requested temperature was used
- Should state exactly what those recording it know to have happened, not confuse it with what they guessed or inferred had happened
  - e.g. baker states that it put the cake in the oven, not that the cake was successfully baked, because the oven may have been broken
23 Provenance
- Should give the client asking for the provenance of something control over the scope of the answer
  - e.g. whether the process that produced the flour is included in the provenance of the cake
- Should provide the information relevant to answering a client's/user's questions (not swamp them with detail)
  - e.g. report how much flour was used rather than giving the XML structure sent between application components
- May (in order to achieve the above) include inferred information
  - e.g. infer from baker putting mixture in oven and getting cake out that the cake was successfully baked from the mixture
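Scope control can be illustrated as a backwards walk over the recorded steps that stops at entities the client chooses to exclude. The sketch below is an assumption-laden illustration: the `DOCUMENTATION` mapping, the `exclude` parameter and the "Mill" step that produces the flour are all hypothetical.

```python
# Maps each produced entity to (action that produced it, inputs used).
DOCUMENTATION = {
    "cake": ("OvenBake", ["mixture 3"]),
    "mixture 3": ("Fold", ["flour", "mixture 2"]),
    "mixture 2": ("BeatMix", ["mixture 1", "eggs"]),
    "mixture 1": ("Whisk", ["butter", "sugar"]),
    "flour": ("Mill", ["wheat"]),  # hypothetical upstream process
}


def provenance(entity, documentation, exclude=frozenset()):
    """List the steps that led to entity, not expanding excluded entities."""
    if entity in exclude or entity not in documentation:
        return []
    action, inputs = documentation[entity]
    steps = [f"{entity} <- {action}({', '.join(inputs)})"]
    for item in inputs:
        steps.extend(provenance(item, documentation, exclude))
    return steps


# Full scope: includes how the flour was produced.
for step in provenance("cake", DOCUMENTATION):
    print(step)

# Narrower scope: the client treats flour as a given.
print(len(provenance("cake", DOCUMENTATION, exclude={"flour"})))  # prints 4
```

The same recorded documentation yields both answers; only the client's chosen scope differs.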
24 Provenance architectures
- Should allow different parties to record independent documentation if they want to
  - e.g. user and baker can record independently, allowing discrepancies to be noticed
- Should have no dependence on any one workflow engine/language, and no requirement for (explicit) workflows to be used at all
  - e.g. our example application was written in Java, and baking in reality follows a plan in someone's head
- Should be independent of any one product of a process: it should not be necessary to store process documentation with any one result of a process
  - e.g. the provenance of the cake, the provenance of the ingredients and the provenance of the intermediate mixtures overlap, so the documentation cannot be claimed to belong to any one of them
25 Limitations and Strengths
- The current example has limitations
  - Physical world treated as if it mapped directly to the electronic world: how does a baker record documentation in a provenance store Web Service? Through a GUI? What if the GUI goes wrong or they use the GUI wrongly: do we still have sound process documentation?
  - None of the objects in the process have constituent parts that we may want to independently find the provenance of
  - Assumes a single provenance store that every service happily submits documentation to
- But the strength of the example is that it can be simply extended to remove these limitations
26 Conclusions
- The simple example allows us to determine the requirements on software to record process documentation and make it available to users
- We have used it as a testbed, extending it to explore other aspects of provenance (along with other applications)
- It is rich enough to continue extending to mirror, in a controlled way, issues discovered in the future
27 EU Provenance Partners
- IBM United Kingdom Limited
- University of Southampton
- University of Wales, Cardiff
- Deutsches Zentrum für Luft- und Raumfahrt e.V.
- Universitat Politècnica de Catalunya
- Magyar Tudományos Akadémia Számítástechnikai és Automatizálási Kutató Intézet