Title: Privacy issues in integrating R environment in scientific workflows
1 Privacy issues in integrating R environment in scientific workflows
- Dr. Zhiming Zhao
- University of Amsterdam
- Virtual Laboratory for e-Science
Privacy issues in integrating Legacy Experiment Environment to Scientific Workflows
Zhiming Zhao, Dmitry A. Vasunin, Adianto Wibisono, Adam Belloum, Cees de Laat, Pieter Adriaans, Bob Hertzberger
2 Outline
- Scientific experiments and R
- Problem description
- Optional solutions
- Experimental results
- Summarizing discussion
- Future work
3 Scientific experiments and support systems
- In such scenarios
- Existing experiment environments, such as R, are widely used by domain scientists
- Human-in-the-loop computing is important for testing and validating prototypes
- Scientific workflows are used to manage different processes and the experiment lifecycle
4 R and workflow support in VL-e
- R provides rich functionality for data statistics and visualisation, and has been used as an important experimental environment in bio-sciences.
- R needs scientific workflow support
  - Accessing different e-Science resources
  - Being coordinated with the other components in a large-scale experiment
- E-Science workflows in certain domains also need R
  - Reuse the advanced results from legacy systems
  - Support experiments developed on legacy systems
- Workflow support in VL-e
  - Four systems are recommended
  - Taverna, Kepler and VLAM support R
  - A generic solution is under construction
5 R in scientific workflows: current solutions
- Three types of solutions
  - Local: local installation of R, through the command line interface of R
    - Simple configuration
    - Performance bottleneck
  - Web Service: SOAP to pass R script and objects
    - Standard interface, distributed computing
    - High latency
  - TCP socket: socket interface (RServe)
    - Distributed computing
    - Maintain states
    - Poor security
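As a rough illustration of the "Local" option above, the following Python sketch calls a locally installed R through its command-line front end. It assumes `Rscript` is available on the PATH; the helper names `build_rscript_command` and `run_r_locally` are hypothetical and do not come from any workflow system named in these slides.

```python
import shutil
import subprocess
from typing import List


def build_rscript_command(expression: str) -> List[str]:
    """Build the argument list for evaluating one R expression with Rscript."""
    # `-e` passes an expression to evaluate; `--vanilla` skips saved
    # workspaces and user profiles so that runs do not depend on local state.
    return ["Rscript", "--vanilla", "-e", expression]


def run_r_locally(expression: str, timeout_s: float = 60.0) -> str:
    """Run an R expression in a fresh local R process and return its stdout."""
    if shutil.which("Rscript") is None:
        # Assumption: the "Local" option requires R on the same machine.
        raise RuntimeError("Rscript not found on PATH")
    completed = subprocess.run(
        build_rscript_command(expression),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if R exits with an error
        timeout=timeout_s,
    )
    return completed.stdout


if __name__ == "__main__":
    print(run_r_locally("cat(mean(c(1, 2, 3)))"))
```

In this sketch every call starts a new R process and keeps no state between calls. That start-up cost may be one contributor to the bottleneck listed above, although the slides do not state the cause.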
6 Typical scenario of RServe and requirements on privacy
- Different levels of privacy issues
  - Data level
    - Intermediate results not to be seen by the other users
  - Communication level: graphical display
    - Remote X display and interaction between multiple users
[Figure labels: WF1, WF2, R, Display]
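The data-level requirement on this slide can be pictured with a small Python sketch in which each workflow holds a private token for its own namespace of intermediate results. It illustrates the requirement only; it is not RServe's mechanism or the authors' solution, and `SessionStore` and its methods are hypothetical names.

```python
import secrets
from typing import Any, Dict


class SessionStore:
    """Keeps each workflow's intermediate results in a private namespace."""

    def __init__(self) -> None:
        self._namespaces: Dict[str, Dict[str, Any]] = {}

    def open_session(self) -> str:
        """Create an empty namespace and return the token that unlocks it."""
        token = secrets.token_hex(16)  # unguessable 128-bit token
        self._namespaces[token] = {}
        return token

    def put(self, token: str, name: str, value: Any) -> None:
        self._namespace(token)[name] = value

    def get(self, token: str, name: str) -> Any:
        # Raises KeyError if this session has no result called `name`.
        return self._namespace(token)[name]

    def _namespace(self, token: str) -> Dict[str, Any]:
        try:
            return self._namespaces[token]
        except KeyError:
            raise PermissionError("unknown session token") from None


if __name__ == "__main__":
    store = SessionStore()
    wf1, wf2 = store.open_session(), store.open_session()
    store.put(wf1, "fit", [0.1, 0.2])
    print(store.get(wf1, "fit"))  # WF1 can read its own result
    try:
        store.get(wf2, "fit")     # WF2 has no such result in its namespace
    except KeyError:
        print("WF2 cannot see WF1's intermediate result")
```

The sketch addresses only the data level. The communication level (for example a shared remote display) would need separate protection.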
7 Problem description and desired solution
- Problem description
  - Most legacy experiment environments do not have strong security management
  - Workflow systems provide integration without considering security issues
  - The deployment of remote environments is required to be secure
- Desired solution
  - Using existing technologies
  - Provide solutions to privacy issues at workflow level, preferably in a transparent way
8 Experiments
- Review optional solutions
- Investigate the overhead of security enhancement
on the workflow execution
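One simple way to quantify such overhead is to time the same workflow step with and without the security enhancement and report the relative difference. The Python sketch below assumes this definition of overhead, which the slides do not spell out; `timed` and `relative_overhead` are hypothetical helper names.

```python
import time
from typing import Callable, Tuple


def timed(step: Callable[[], object]) -> Tuple[object, float]:
    """Run `step` once and return (its result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = step()
    return result, time.perf_counter() - start


def relative_overhead(baseline_s: float, secured_s: float) -> float:
    """Extra time of the secured run as a fraction of the baseline run."""
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return (secured_s - baseline_s) / baseline_s


if __name__ == "__main__":
    # Stand-in workloads; a real experiment would time workflow executions.
    _, plain_s = timed(lambda: sum(range(100_000)))
    _, secured_s = timed(lambda: sum(range(120_000)))
    print(f"relative overhead: {relative_overhead(plain_s, secured_s):.1%}")
```

A single run is noisy, so repeating each measurement and comparing medians would give a steadier estimate.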
9 Different configurations and their level of security
10 An experiment: Taverna, RServe and security tunnel
- Experiment
  - Adding security enhancement in Taverna
  - Protect the data channels between Taverna and RServe
- Overhead
  - Setting up security tunnels
  - Runtime data transfer
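The slides do not say which tunnel technology was used. As one plausible reading, the Python sketch below builds an SSH local port forward, so that a workflow engine connects to a local port while the traffic to the remote R server travels encrypted. Port 6311 is the Rserve default; the user and host names are placeholders and `ssh_tunnel_command` is a hypothetical helper.

```python
from typing import List

RSERVE_DEFAULT_PORT = 6311  # default TCP port on which Rserve listens


def ssh_tunnel_command(
    user: str,
    host: str,
    local_port: int = RSERVE_DEFAULT_PORT,
    remote_port: int = RSERVE_DEFAULT_PORT,
) -> List[str]:
    """Build an `ssh` argument list that forwards local_port to the R server."""
    for port in (local_port, remote_port):
        if not 0 < port < 65536:
            raise ValueError(f"invalid TCP port: {port}")
    return [
        "ssh",
        "-N",  # do not run a remote command; only forward ports
        "-L", f"{local_port}:localhost:{remote_port}",  # local port forward
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Placeholder account and host; prints the command instead of running it.
    print(" ".join(ssh_tunnel_command("alice", "r.example.org")))
```

With such a tunnel in place, the client would connect to `localhost` on the local port instead of the remote host. The two overhead items above then correspond to starting the `ssh` process once and to encrypting each transfer at run time.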
11 Summarizing discussion
- Integrating existing experiment environments with workflow systems is important for rapid prototyping
- Privacy protection is demanded by both users and e-Science infrastructure, and can be viewed as a generic issue when integrating a legacy component that supports user interaction into a workflow
- Privacy protection can be achieved at a certain level by customizing the workflow execution
- Enhancing workflow execution does not necessarily impose a high penalty on execution
12 Future work
- In the VL-e project, we are developing a bus-style generic solution for different workflow systems
- Taking data privacy into account when realizing the interoperability between different workflow systems
13 Activities
- International workshop on Workflow systems in e-Science, organized by Zhiming Zhao and Adam Belloum, in the context of ICCS: 2006 Reading University; 2007 Beijing, China.
  - Proceedings are in LNCS, Springer Verlag.
  - A special issue will be published in Scientific Programming Journal.
  - http://staff.science.uva.nl/zhiming/iccs-wses
- Workshop on Scientific workflows and industrial workflow standards in e-Science, organized by Adam Belloum and Zhiming Zhao, in the context of the IEEE e-Science and Grid computing conference in Amsterdam, December 2006.
  - Pegasus, Dr. Ewa Deelman (Department of Computer Science, University of Southern California)
  - BPEL, Dr. Dieter König (IBM Research Germany Development Laboratory)
  - Kepler, Dr. Bertram Ludäscher (Department of Computer Science, University of California, Davis)
  - Taverna, Prof. Peter Rice (European Bioinformatics Institute)
  - WS and Semantic issues, Dr. Steve Ross-Talbot (CEO, and a co-founder, of Pi4 Technologies)
  - Triana, Dr. Ian J. Taylor (Department of Computer Science, Cardiff University)
  - http://staff.science.uva.nl/adam/workshop/VL-e-workshop.htm