Recording and Using Provenance in a Protein Compressibility Experiment

July 27, 2005. High Performance Distributed Computing 05.

1
Recording and Using Provenance in a Protein Compressibility Experiment
  • Paul Groth, Simon Miles, Weijian Fang, Sylvia C. Wong,
    Klaus-Peter Zauner and Luc Moreau
  • University of Southampton

2
Outline
  • Biology
  • The Workflow
  • Use Cases
  • Provenance
  • Implementation
  • Evaluation
  • Conclusion

3
Biology
  • How do protein sequences (chains of amino acids)
    fold into a 3D structure?
  • Which part of DNA translates into one protein
    sequence?
  • The structure of protein sequences may help to
    answer these questions.
  • Structure can be quantified by textual
    compressibility.
  • Goal: determine the amino acid groupings that
    maximize compressibility.

4
The Workflow
  • Get Sequences
  • Make a Sample
  • Recode Sample
  • Compress and Measure
  • Shuffle the sample
  • Compress and Measure each permutation
  • Collate all measures
  • Produce the average compressibility
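The workflow steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `EXAMPLE_GROUPS` is a made-up amino acid grouping, zlib stands in for whatever compressor the experiment actually used, a sample is formed by simple concatenation, and the final ratio is one plausible way to collate the measures, not necessarily the authors' formula.

```python
import random
import zlib

# Hypothetical grouping of amino acid letters into group symbols (illustrative only).
EXAMPLE_GROUPS = {"A": "a", "V": "a", "L": "a", "I": "a",
                  "K": "b", "R": "b", "D": "c", "E": "c"}


def recode(sample, groups):
    """Recode Sample: replace each residue with its group symbol (unknown residues kept)."""
    return "".join(groups.get(residue, residue) for residue in sample)


def compressed_size(text):
    """Compress and Measure: bytes needed by zlib (an assumed stand-in compressor)."""
    return len(zlib.compress(text.encode("ascii"), 9))


def run_workflow(sequences, groups, permutations=10, seed=0):
    rng = random.Random(seed)              # fixed seed so reruns are comparable
    sample = "".join(sequences)            # Make a Sample (assumed: concatenation)
    recoded = recode(sample, groups)       # Recode Sample
    size = compressed_size(recoded)        # Compress and Measure

    shuffled_sizes = []
    symbols = list(recoded)
    for _ in range(permutations):
        rng.shuffle(symbols)               # Shuffle the sample
        shuffled_sizes.append(compressed_size("".join(symbols)))

    # Collate all measures, then summarize them.
    mean_shuffled = sum(shuffled_sizes) / len(shuffled_sizes)
    # Assumed summary: size of the real sample relative to its shuffled versions.
    return {"size": size, "mean_shuffled": mean_shuffled,
            "ratio": size / mean_shuffled}
```

Calling `run_workflow(["AVLIKRDE"] * 50, EXAMPLE_GROUPS)` returns the three measures for a toy input; a ratio below 1 means the ordered sample compresses better than its shuffled permutations.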

5
Use Case (1)
  • A bioinformatician, A, downloads sequence data of
    microbial proteins from the database RefSeq.
  • A runs the compressibility experiment.
  • A later performs the same experiment on the same
    sequence data, again downloaded from RefSeq.
  • A compares the two experiment results and notices
    a difference.
  • A determines whether the difference was caused by
    the algorithms changing.
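One way to picture this use case: if each run's documentation lists the services it invoked together with some identifier of the algorithm version, the two runs can be compared directly. The record layout below (`service` and `version` keys) is a hypothetical simplification, not the format used by the authors' system.

```python
def service_differences(run_a, run_b):
    """Return {service: (version_in_a, version_in_b)} for every service that differs.

    Each run is a list of dicts with hypothetical keys "service" and "version".
    A service missing from one run is reported with None on that side.
    """
    versions_a = {record["service"]: record["version"] for record in run_a}
    versions_b = {record["service"]: record["version"] for record in run_b}
    differences = {}
    for name in sorted(set(versions_a) | set(versions_b)):
        in_a, in_b = versions_a.get(name), versions_b.get(name)
        if in_a != in_b:
            differences[name] = (in_a, in_b)
    return differences
```

An empty result suggests the services were the same in both runs, so the difference in results would have to come from elsewhere, for example the downloaded data.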

6
Use Case (2)
  • A bioinformatician performs an experiment on a
    FASTA sequence encoding a protein.
  • A reviewer later determines whether the sequence
    was in fact processed by a service that
    meaningfully processes only protein sequences.
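A reviewer's check of this kind could be sketched as follows. Both the record layout (`service`, `inputs`) and the `registry` describing which kinds of sequence each service meaningfully processes are assumptions made for illustration.

```python
def non_protein_services(records, sequence_id, registry):
    """List services that consumed `sequence_id` but are not protein-only.

    records:  list of dicts with hypothetical keys "service" and "inputs".
    registry: hypothetical map from service name to the set of sequence
              kinds it meaningfully processes, e.g. {"protein"}.
    """
    offending = []
    for record in records:
        if sequence_id in record["inputs"]:
            accepted = registry.get(record["service"], set())
            if accepted != {"protein"}:    # unknown services are flagged too
                offending.append(record["service"])
    return offending
```

If the returned list is empty, every service recorded as having processed the sequence was a protein-only service.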

7
Provenance
  • The use cases relate to process.
  • Provenance definition: the provenance of a result
    is the process that led to that result.
  • This is a conceptual definition.

8
Documentation of Process
  • Conceive a computer-based representation of
    provenance.
  • We represent the provenance of some data by
    documenting the process that led to the data.
  • Documentation can be complete or partial.
  • It can be accurate or inaccurate.
  • It can present conflicting or consensual views of
    the actors involved.
  • It can provide operational details of execution
    or it can be abstract.

9
Heterogeneity
  • This is a heterogeneous application
  • It combines shell scripts, Java programs, and web
    services
  • Heterogeneity is common in Grid-based applications
  • LCG Atlas - Athena VDT coexist
  • Support for plugging in different execution
    environments

10
Provenance Lifecycle
  • Record: documentation of process
  • Query: retrieve the provenance of a result
  • Provenance Store

11
Use Case 1: Do services differ between experiments?
  • Retrieve documentation of experiments
  • (Diagram: the services recorded for each
    experiment, starting with Service A)
  • Highlight differences in services between
    experiments

12
Implementation
  • Implemented as a VDT workflow
  • Scheduled by Condor
  • Each service, script, and command records process
    documentation into a provenance store.
  • Uses PReServ, a web services implementation of a
    provenance store
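To make the record and query roles concrete, here is a minimal in-memory sketch of a provenance store. It is not PReServ's interface: the class, its method names, and the record fields are all hypothetical, and the query assumes records were appended in execution order.

```python
class ProvenanceStore:
    """Toy in-memory store (hypothetical; not the PReServ API)."""

    def __init__(self):
        self._records = []

    def record(self, actor, inputs, outputs):
        """Document one step: which actor turned which inputs into which outputs."""
        self._records.append(
            {"actor": actor, "inputs": list(inputs), "outputs": list(outputs)}
        )

    def provenance(self, result):
        """Return the recorded steps that led to `result`, in recording order."""
        wanted = {result}
        matched = set()
        # Walk backwards: inputs of a matching step become wanted as well.
        # Assumes each input was recorded as an output of an earlier step.
        for index in range(len(self._records) - 1, -1, -1):
            step = self._records[index]
            if wanted & set(step["outputs"]):
                matched.add(index)
                wanted.update(step["inputs"])
        return [self._records[i] for i in sorted(matched)]
```

Each service, script, or command would call `record` once per step; a later call to `provenance("some_result")` returns only the steps on the path to that result and skips unrelated ones.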

13
PReServ Implementation Diagram
  • (Diagram not preserved in this transcript; its
    labels included WS Calls and Java Calls.)

14
Evaluation: Deployment
  • Runs on VMware (deployment consistency, ease of
    development)
  • Workflow is executed on one machine
  • PReServ runs on another machine

15
Recording Performance
  • (Chart not preserved in this transcript.)

16
Query Performance
  • (Chart not preserved in this transcript.)

17
Conclusion
  • Both recording and query times are linear
  • 10% overhead for asynchronous recording
  • Our provenance concept and system are grounded in
    a number of use cases
  • The experiment is ready to be moved to a cluster
    (the Southampton cluster) or a grid
  • This will allow us to test scalability
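Asynchronous recording, as mentioned in the conclusion, generally means the workflow hands documentation to a background worker rather than waiting for the store to acknowledge each write. The sketch below shows that general pattern with Python's standard `queue` and `threading` modules; the `AsyncRecorder` class and the `store_record` callable are hypothetical and say nothing about how PReServ itself does it.

```python
import queue
import threading


class AsyncRecorder:
    """Queue records and write them from a background thread (illustrative only)."""

    def __init__(self, store_record):
        self._store_record = store_record   # hypothetical callable doing the slow write
        self._queue = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def record(self, item):
        """Return immediately; the write happens later on the worker thread."""
        self._queue.put(item)

    def _drain(self):
        while True:
            item = self._queue.get()
            try:
                self._store_record(item)
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until every queued record has been handed to the store."""
        self._queue.join()
```

A workflow step would call `record(...)` and continue; calling `flush()` at the end of the run makes sure nothing is still queued when the process exits.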

18
Contact Info
  • Paul Groth
  • pg03r_at_ecs.soton.ac.uk
  • www.pasoa.org: use case descriptions, papers,
    PReServ software

19
Configuration
  • Red Hat Linux 9.1 on VMware on Windows XP
  • Pentium 4, 2.8 GHz, 1.5 GB RAM
  • PReServ on another machine
  • Database backend: Berkeley JDB
  • 100 Mb local Ethernet