ES3 | ||||||||
| Line: 54 to 53 | ||||||||
|---|---|---|---|---|---|---|---|---|
| Changed: | ||||||||
| < < |
| |||||||
| > > |
| |||||||
| Line: 67 to 66 | ||||||||
| We wrote a translator to read a foreign provenance data file and translate it to ES3 objects which could then be sent to ES3. ES3ingest.py is the translator script that was created for the translation step. Here is the command syntax and examples for ES3ingest.py: | ||||||||
| Changed: | ||||||||
| < < |
| |||||||
| > > |
| |||||||
| Usage: ES3ingest.py -t <foreign file type> -e <execution log file> <filename> example: | ||||||||
| Line: 75 to 74 | ||||||||
| ES3ingest.py -t PASS challenge-D-mod.xml ES3ingest.py -t VisTrails -e pc_log.xml pc_part3a.xml" | ||||||||
| Changed: | ||||||||
| < < |
||||||||
| > > |
||||||||
Translating PASS Provenance Data | ||||||||
| Line: 94 to 92 | ||||||||
Using ES3 Provenance Data
ES3 provenance data was used for the second portion of the challenge workflow. This data was collected by running the provenance challenge workflow scripts while the probulator was monitoring them. The script run was 'workflow-part2.sh' which executed the command: | ||||||||
| Changed: | ||||||||
| < < |
| |||||||
| > > |
| |||||||
| $AIR_DIR/bin/softmean atlas.hdr y null resliced1.img resliced2.img resliced3.img resliced4.img | ||||||||
| Changed: | ||||||||
| < < |
||||||||
| > > |
||||||||
| The ES3 transmitter was then run, which sent the information captured by the probulator to ES3. | ||||||||
| Line: 204 to 202 | ||||||||
Discussion | ||||||||
| Changed: | ||||||||
| < < |
Our solution to Query 7, while not implemented entirely as an ES3 Core query, is nevertheless responsive to one of the primary classes of user queries that ES3 as whole was designed to support; namely, "what changed?" queries. It's extremely common for scientists developing ad hoc workflows to notice differences in outputs across invocations between which "nothing was changed". Our graph-differencin g approach is designed to answer the "what changed?" query as directly (and visually) as possible, while still allowing subsequent drill-down into the details. | |||||||
| > > |
Our solution to Query 7, while not implemented entirely as an ES3 Core query, is nevertheless responsive to one of the primary classes of user queries that ES3 as whole was designed to support; namely, "what changed?" queries. It's extremely common for scientists developing ad hoc workflows to notice differences in outputs across invocations between which "nothing was changed". Our graph-differencing approach is designed to answer the "what changed?" query as directly (and visually) as possible, while still allowing subsequent drill-down into the details. | |||||||
ES3 | |||||||||||||||||||
| Line: 60 to 60 | |||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Model Integration Results | |||||||||||||||||||
| Changed: | |||||||||||||||||||
| < < |
The second phase of the provenance challenge required each team to ingest provenance data from other systems into their own system. For this task we wrote a translator to read a foreign provenance data file and translate it to ES3 objects which could then be sent to ES3. | ||||||||||||||||||
| > > |
We imported provenance data from the PASS system and VisTrails.
Translation Details
We wrote a translator to read a foreign provenance data file and translate it to ES3 objects which could then be sent to ES3. | ||||||||||||||||||
|
ES3ingest.py is the translator script that was created for the translation step. Here is the command syntax and examples for ES3ingest.py:
| |||||||||||||||||||
| Line: 137 to 141 | |||||||||||||||||||
Benchmarks | |||||||||||||||||||
| Changed: | |||||||||||||||||||
| < < |
The provenance queries used in the First Provenance Challenge were used successfully for this challenge without changes. | ||||||||||||||||||
| > > |
We used the provenance queries from the first challenge as a benchmark, since these queries are well known to every team and results are easily compared between the first and second challenge. The provenance queries used in the First Provenance Challenge were used successfully for this challenge without changes. | ||||||||||||||||||
Provenance Queries | |||||||||||||||||||
| Deleted: | |||||||||||||||||||
| < < |
| ||||||||||||||||||
Query 1 | |||||||||||||||||||
| Line: 206 to 208 | |||||||||||||||||||
| "what changed?" queries. It's extremely common for scientists developing ad hoc workflows to notice differences in outputs across invocations between which "nothing was changed". Our graph-differencin g approach is designed to answer the "what changed?" query as directly (and visually) as possible, while still allowing subsequent drill-down into the details. | |||||||||||||||||||
| Deleted: | |||||||||||||||||||
| < < |
Queries 8 and 9
We did not implement Queries 8 and 9, since the ES3 Core currently doesn't support annotations. (See Further Comments below) | ||||||||||||||||||
Further Comments | |||||||||||||||||||
| Changed: | |||||||||||||||||||
| < < |
ES3's provenance management currently concentrates on the automatic, transparent acquisition of structural provenance; i.e., reverse-engineering workflow. There is nothing that prevents one from storing in ES3 the additional content-based information required to by Queries 5, 8, and 9; however, we have not yet implemented a way to "slipstream" this information into the Probulator logs or Transmitter messages while remaining unobtrusive to the ES3 user. This is definitely within ES3's scope, which is why we've scored these queries , and is the part of ES3 currently being developed.
| ||||||||||||||||||
| > > |
Manually stitching together the provenance data from different systems to make a complete workflow was cumbersome. ES3 can use md5sums to combine workflows, but computing md5sums is often expensive and this data is often not collected. Another method of combining data should be found if combining dissimilar provenance data proves beneficial in the future. | ||||||||||||||||||
| Deleted: | |||||||||||||||||||
| < < |
Further Comments<!-- Provide here further comments. --> TBD | ||||||||||||||||||
Conclusions | |||||||||||||||||||
| Changed: | |||||||||||||||||||
| < < |
<!-- Provide here your conclusions on the challenge, and issues that you like to see discussed at a face to face meeting. --> TBD | ||||||||||||||||||
| > > |
Translating foreign provenance data and importing it into ES3 was fairly straightforward. However, exported data and documentation give only an incomplete picture of another system's data model, which affects the implementation of the translation process. Interoperability would be facilitated by a common set of terms and possibly a common provenance data format. | ||||||||||||||||||
| Changed: | |||||||||||||||||||
| < < |
-- JamesFrew - 22 Feb 2007 | ||||||||||||||||||
| > > |
-- JamesFrew - 25 June 2007 | ||||||||||||||||||
ES3 | ||||||||
| Line: 198 to 198 | ||||||||
|---|---|---|---|---|---|---|---|---|
Query 7 | ||||||||
| Changed: | ||||||||
| < < |
| |||||||
| > > |
Pending | |||||||
Discussion | ||||||||
| Line: 225 to 213 | ||||||||
Further Comments | ||||||||
| Changed: | ||||||||
| < < |
ES3's provenance management currently concentrates on the automatic, transparent acquisition of structural provenance; i.e., reverse-engineering workflow. There is nothing that prevents one from stori
ng in ES3 the additional content-based information required to by Queries 5, 8, and 9; however, we have not yet implemented a way to "slipstream" this information into the Probulator logs or Transmitter
messages while remaining unobtrusive to the ES3 user. This is definitely within ES3's scope, which is why we've scored these queries , and is the part of ES3 currently being developed.
| |||||||
| > > |
ES3's provenance management currently concentrates on the automatic, transparent acquisition of structural provenance; i.e., reverse-engineering workflow. There is nothing that prevents one from storing in ES3 the additional content-based information required to by Queries 5, 8, and 9; however, we have not yet implemented a way to "slipstream" this information into the Probulator logs or Transmitter messages while remaining unobtrusive to the ES3 user. This is definitely within ES3's scope, which is why we've scored these queries , and is the part of ES3 currently being developed.
| |||||||
Further Comments | ||||||||
ES3 | |||||||||||||||||||
| Line: 60 to 60 | |||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Model Integration Results | |||||||||||||||||||
| Changed: | |||||||||||||||||||
| < < |
The second phase of the provenance challenge required each team to ingest provenance data from other systems into their own system. For this task we wrote a translator to read a foreign provenance data file and translate it to ES3 objects which could then be sent to ES3. ES3ingest.py is the translator script that was created for the translation step. Here is the command syntax and examples for ES3ingest.py: | ||||||||||||||||||
| > > |
The second phase of the provenance challenge required each team to ingest provenance data from other systems into their own system. For this task we wrote a translator to read a foreign provenance data file and translate it to ES3 objects which could then be sent to ES3. | ||||||||||||||||||
| Added: | |||||||||||||||||||
| > > |
ES3ingest.py is the translator script that was created for the translation step. Here is the command syntax and examples for ES3ingest.py:
| ||||||||||||||||||
| Usage: ES3ingest.py -t <foreign file type> -e <execution log file> <filename> example: | |||||||||||||||||||
| Line: 75 to 71 | |||||||||||||||||||
| ES3ingest.py -t PASS challenge-D-mod.xml ES3ingest.py -t VisTrails -e pc_log.xml pc_part3a.xml" | |||||||||||||||||||
| Changed: | |||||||||||||||||||
| < < |
|||||||||||||||||||
| > > |
|||||||||||||||||||
Translating PASS Provenance Data | |||||||||||||||||||
| Line: 94 to 90 | |||||||||||||||||||
Using ES3 Provenance Data
ES3 provenance data was used for the second portion of the challenge workflow. This data was collected by running the provenance challenge workflow scripts while the probulator was monitoring them. The script run was 'workflow-part2.sh' which executed the command: | |||||||||||||||||||
| Added: | |||||||||||||||||||
| > > |
| ||||||||||||||||||
| $AIR_DIR/bin/softmean atlas.hdr y null resliced1.img resliced2.img resliced3.img resliced4.img | |||||||||||||||||||
| Added: | |||||||||||||||||||
| > > |
|||||||||||||||||||
| The ES3 transmitter was then run, which sent the information captured by the probulator to ES3. lineageTrace-part2.graphml is the XML returned by an ES3 lineage query that shows the second portion of the challenge workflow. | |||||||||||||||||||
| Line: 118 to 117 | |||||||||||||||||||
Combining parts of the workflow | |||||||||||||||||||
| Changed: | |||||||||||||||||||
| < < |
The usual method for ES3 to combine workflows is via the files that they share. One workflow creates an output file, then subsequent workflows read these files. The md5sum calculated for these files and stored when the file is registered is used to determine which files are common to workflows. Lineage queries will determine these common files and traverse workflows that share them. | ||||||||||||||||||
| > > |
The usual method for ES3 to combine workflows is via the files that they share. One workflow creates an output file, then subsequent workflows read these files. The md5sum calculated for these files and stored when the file is registered is used to determine which files are common to workflows. Lineage queries will determine these common files and traverse workflows that share them. | ||||||||||||||||||
| If the md5sum is not provided however, such as with the provenance data from the Provenance Challenge, then the workflows have to be stitched together manually creating "identity" links between common files. | |||||||||||||||||||
| Line: 140 to 135 | |||||||||||||||||||
| is a graphical representation of a lineage query showing the combined workflow. | |||||||||||||||||||
| Added: | |||||||||||||||||||
| > > |
Benchmarks | ||||||||||||||||||
| Changed: | |||||||||||||||||||
| < < |
Translation Details | ||||||||||||||||||
| > > |
The provenance queries used in the First Provenance Challenge were used successfully for this challenge without changes. | ||||||||||||||||||
| Deleted: | |||||||||||||||||||
| < < |
<!-- Describe details regarding how data models were translated (or otherwise used to answer the query following the team's approach), any data which was absent from a downloaded model, and whether this affected the possibility of translation or successful provenance query, and any data which was excluded in translation from a downloaded model because it was extraneous --> TBD | ||||||||||||||||||
| Changed: | |||||||||||||||||||
| < < |
Benchmarks | ||||||||||||||||||
| > > |
Provenance Queries | ||||||||||||||||||
| Changed: | |||||||||||||||||||
| < < |
<!-- Describe your proposed benchmark queries, how the comparable quantities are determined, and the results of applying the benchmark to your own system --> TBD | ||||||||||||||||||
| > > |
Query 1
Query 2
Query 3
Discussion
The ES3 Core data model doesn't include a concept of workflow "stages". For this query we simply traced back five links (our interpretation of "Stages 3, 4, and 5" in the challenge workflow) from the "Atlas X Graphic" object. The lineage trace query uses a termination condition that states the trace should end after traversing five links from the starting UUID.
Query 4
Discussion
The split score ( ) for this query is due to XQuery's lack of support for queries based on day-of-week.
Query 5
We did not implement Query 5, since the ES3 Probulator currently doesn't examine the contents of the objects it monitors. (See Further Comments below)
Query 6
Query 7
Discussion
Our solution to Query 7, while not implemented entirely as an ES3 Core query, is nevertheless responsive to one of the primary classes of user queries that ES3 as a whole was designed to support; namely, "what changed?" queries. It's extremely common for scientists developing ad hoc workflows to notice differences in outputs across invocations between which "nothing was changed". Our graph-differencing approach is designed to answer the "what changed?" query as directly (and visually) as possible, while still allowing subsequent drill-down into the details.
Queries 8 and 9
We did not implement Queries 8 and 9, since the ES3 Core currently doesn't support annotations. (See Further Comments below)
Further Comments
ES3's provenance management currently concentrates on the automatic, transparent acquisition of structural provenance; i.e., reverse-engineering workflow. There is nothing that prevents one from storing in ES3 the additional content-based information required by Queries 5, 8, and 9; however, we have not yet implemented a way to "slipstream" this information into the Probulator logs or Transmitter messages while remaining unobtrusive to the ES3 user. This is definitely within ES3's scope, which is why we've scored these queries , and is the part of ES3 currently being developed.
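The termination condition described for Query 3 (end the trace after traversing five links from the starting UUID) can be illustrated with a depth-limited backward traversal. This is only a sketch under stated assumptions: the actual query is evaluated inside ES3, whereas here the provenance graph is a plain Python dict mapping each object's UUID to the UUIDs it was derived from, and the function name `trace_back` is hypothetical.

```python
from collections import deque

def trace_back(parents, start, max_links=5):
    """Return {uuid: distance} for every object reachable from `start`
    by following at most `max_links` links backward.

    `parents` is an assumed representation: uuid -> list of uuids that
    the object was directly derived from.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # Termination condition: do not expand past the link limit.
        if distance[current] == max_links:
            continue
        for parent in parents.get(current, ()):
            if parent not in distance:
                distance[parent] = distance[current] + 1
                queue.append(parent)
    return distance
```

Breadth-first order is used so that each object is recorded at its shortest distance from the starting UUID, which keeps the link limit well defined when several paths lead to the same object.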
| ||||||||||||||||||
Further Comments | |||||||||||||||||||
ES3 | ||||||||
| Line: 15 to 15 | ||||||||
|---|---|---|---|---|---|---|---|---|
| ES3 lineage trace schema | ||||||||
| Changed: | ||||||||
| < < |
||||||||
| > > |
||||||||
| Data Model | ||||||||
| Line: 60 to 60 | ||||||||
Model Integration Results | ||||||||
| Changed: | ||||||||
| < < |
<!-- State here which combinations of teams' models you have managed to perform the provenance query over --> TBD | |||||||
| > > |
The second phase of the provenance challenge required each team to ingest provenance data from other systems into their own system.
For this task we wrote a translator to read a foreign provenance data file and translate it to ES3 objects which could then be sent to ES3.
ES3ingest.py is the translator script that was created for the translation step.
Here is the command syntax and examples for ES3ingest.py:
Usage: ES3ingest.py -t <foreign file type> -e <execution log file> <filename>
example:
ES3ingest.py -t PASS challenge-D-mod.xml
ES3ingest.py -t VisTrails -e pc_log.xml pc_part3a.xml
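A command line of this shape could be parsed roughly as follows. This is a minimal sketch, not the actual ES3ingest.py source: the `-t` and `-e` options and the positional file name come from the usage text, while the attribute names (`file_type`, `exec_log`, `filename`) and the treatment of `-e` as optional (suggested by the PASS example, which omits it) are assumptions.

```python
import argparse

def build_parser():
    # Mirrors the usage line: -t <type> -e <execution log> <filename>
    parser = argparse.ArgumentParser(prog="ES3ingest.py")
    parser.add_argument("-t", dest="file_type", required=True,
                        help="foreign file type, e.g. PASS or VisTrails")
    # Assumed optional: the PASS example above does not pass -e.
    parser.add_argument("-e", dest="exec_log", default=None,
                        help="execution log file")
    parser.add_argument("filename", help="foreign provenance data file")
    return parser

# Parse the second example command from the usage text.
args = build_parser().parse_args(
    ["-t", "VisTrails", "-e", "pc_log.xml", "pc_part3a.xml"])
```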
Translating PASS Provenance Data
Provenance data from the PASS system was used for the first portion of the challenge workflow. The data model used by PASS is very similar to the one used in ES3. The translation process involved converting PASS 'PROC' objects into ES3 'transformation' objects and PASS 'FILE' objects into ES3 'file' objects. runCmd is a shell script that runs the translator program for the PASS data. lineageTrace-part1.graphml is the XML returned by an ES3 lineage query that shows the first portion of the challenge workflow. [[http://eil.bren.ucsb.edu/ES3/SecondProvenanceChallenge/PHASE2/Teams/PASS/Results/lineageTrace-part1.png][lineageTrace-part1.png]] is a graphical rendering of an ES3 lineage query that shows the PASS provenance data in ES3.
Using ES3 Provenance Data
ES3 provenance data was used for the second portion of the challenge workflow. This data was collected by running the provenance challenge workflow scripts while the probulator was monitoring them. The script run was 'workflow-part2.sh' which executed the command:
$AIR_DIR/bin/softmean atlas.hdr y null resliced1.img resliced2.img resliced3.img resliced4.img
The ES3 transmitter was then run, which sent the information captured by the probulator to ES3. lineageTrace-part2.graphml is the XML returned by an ES3 lineage query that shows the second portion of the challenge workflow. lineageTrace-part2 is a graphical rendering of an ES3 lineage query that shows the VisTrails provenance data in ES3.
Translating VisTrails Provenance Data
Provenance data from the VisTrails system was used for the third portion of the challenge workflow. lineageTrace-part3.graphml is the XML returned by an ES3 lineage query that shows the third portion of the challenge workflow. lineageTrace-part3.png is a graphical rendering of an ES3 lineage query that shows the VisTrails provenance data in ES3.
lineageTrace-part3-Q7.graphml is the XML returned by an ES3 lineage query that shows the third portion of the challenge workflow. lineageTrace-part3-Q7.png is a graphical rendering of an ES3 lineage query that shows the VisTrails provenance data in ES3.
Combining parts of the workflow
The usual method for ES3 to combine workflows is via the files that they share. One workflow creates an output file, then subsequent workflows read these files. The md5sum calculated for these files and stored when the file is registered is used to determine which files are common to workflows. Lineage queries will determine these common files and traverse workflows that share them. If the md5sum is not provided, however, as with the provenance data from the Provenance Challenge, then the workflows have to be stitched together manually by creating "identity" links between common files. The files demonstrate how this was done to stitch together part1 to part2 and part2 to part3 of the workflow. The file shows a graphical representation of a lineage query over the combined workflow. | |||||||
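The matching of shared files between workflows can be sketched as follows. This is an illustration under stated assumptions, not ES3's implementation: the record layout (dicts with `uuid`, `name`, and an optional `md5`), the function names, and the fallback of pairing files by name when no md5sum is available are all hypothetical. In the challenge itself the "identity" links were created by hand.

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def identity_links(outputs, inputs):
    """Pair files written by one workflow with files read by the next.

    Each file is a dict with 'uuid', 'name', and an optional 'md5'.
    Files match on md5 when both records carry one; otherwise they
    match on name, standing in for the manual stitching step.
    """
    links = []
    for out in outputs:
        for inp in inputs:
            if out.get("md5") and inp.get("md5"):
                same = out["md5"] == inp["md5"]
            else:
                same = out["name"] == inp["name"]
            if same:
                links.append((out["uuid"], inp["uuid"]))
    return links
```

Reading the file in chunks keeps memory use flat, but the whole file is still read once, which is the cost that makes md5sums expensive for large data sets.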
Translation Details | ||||||||
ES3 | ||||||||
| Line: 48 to 48 | ||||||||
|---|---|---|---|---|---|---|---|---|
Provenance Data for Workflow Parts
| ||||||||
| Changed: | ||||||||
| < < |
||||||||
| > > |
||||||||
| ||||||||
| Changed: | ||||||||
| < < |
||||||||
| > > |
||||||||
Model Integration Results | ||||||||
ES3 | ||||||||
| Line: 13 to 13 | ||||||||
|---|---|---|---|---|---|---|---|---|
Differences from First Challenge | ||||||||
| Added: | ||||||||
| > > |
ES3 lineage trace schema | |||||||
| Added: | ||||||||
| > > |
Data Model
The data model for ES3 contains only four types of objects: 1) files, 2) data transformations, 3) links, and 4) workflows.
File objects in ES3 represent files on disk that are read from or written to during the execution of the workflow. File objects may be data files that are manipulated directly by the workflow, or files that are read and written by the executables used by the workflow, including operating system libraries, directories and temporary files. File information may be filtered using a configuration file before being sent to ES3, so that files that are not of interest to the investigator, such as system libraries or temporary files, are ignored.
Data transformation objects are executable scripts or programs that are run during the execution of the workflow.
Link objects represent the connections between ES3 objects, for example between file objects and transformation objects. A link has a single direction, so each link defines a 'source' object and a 'destination' object. Links must be organized as a Directed Acyclic Graph, such that no links point backward in the graph to create a loop.
A workflow object is the container to which all file, transformation and link objects belong. The workflow object represents all objects that are used during an instance of scientific processing that begins when recording for a Unix process begins and ends when that process exits. Workflows can be connected to each other implicitly via files that one workflow writes and another workflow reads. No explicit connection is created between workflows. A workflow may contain another workflow, thereby creating a nested structure. | |||||||
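The four object types and the acyclicity requirement can be sketched in a few lines of Python. This is a minimal illustration, not the ES3 schema: the class and field names are hypothetical, files and transformations are collapsed into one `Node` class with a `kind` field for brevity, and the cycle check uses Kahn's algorithm as one reasonable way to verify that the links form a Directed Acyclic Graph.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    uuid: str
    kind: str  # "file" or "transformation"

@dataclass(frozen=True)
class Link:
    source: str       # uuid of the source object
    destination: str  # uuid of the destination object

@dataclass
class Workflow:
    nodes: dict = field(default_factory=dict)
    links: list = field(default_factory=list)
    children: list = field(default_factory=list)  # nested workflows

    def add_node(self, node):
        self.nodes[node.uuid] = node

    def add_link(self, source, destination):
        # Links may only connect objects that belong to this workflow.
        if source not in self.nodes or destination not in self.nodes:
            raise KeyError("both ends of a link must be registered")
        self.links.append(Link(source, destination))

    def is_acyclic(self):
        """Kahn's algorithm: True when the links form a DAG."""
        indegree = {uuid: 0 for uuid in self.nodes}
        successors = defaultdict(list)
        for link in self.links:
            successors[link.source].append(link.destination)
            indegree[link.destination] += 1
        ready = deque(u for u, d in indegree.items() if d == 0)
        visited = 0
        while ready:
            current = ready.popleft()
            visited += 1
            for nxt in successors[current]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        # Any node never reaching indegree 0 sits on a cycle.
        return visited == len(self.nodes)
```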
Provenance Data for Workflow Parts
| ||||||||
ES3 | ||||||||
| Line: 18 to 18 | ||||||||
|---|---|---|---|---|---|---|---|---|
Provenance Data for Workflow Parts
| ||||||||
| Changed: | ||||||||
| < < |
||||||||
| > > |
| |||||||
Model Integration Results | ||||||||
| Line: 1 to 1 | ||||||||
|---|---|---|---|---|---|---|---|---|
| Added: | ||||||||
| > > |
ES3
Participating Team
Differences from First Challenge
Provenance Data for Workflow Parts
Model Integration Results
<!-- State here which combinations of teams' models you have managed to perform the provenance query over --> TBD
Translation Details
<!-- Describe details regarding how data models were translated (or otherwise used to answer the query following the team's approach), any data which was absent from a downloaded model, and whether this affected the possibility of translation or successful provenance query, and any data which was excluded in translation from a downloaded model because it was extraneous --> TBD
Benchmarks
<!-- Describe your proposed benchmark queries, how the comparable quantities are determined, and the results of applying the benchmark to your own system --> TBD
Further Comments
<!-- Provide here further comments. --> TBD
Conclusions
<!-- Provide here your conclusions on the challenge, and issues that you like to see discussed at a face to face meeting. --> TBD
-- JamesFrew - 22 Feb 2007 | |||||||