<<O>>  Difference Topic SecondProvenanceChallenge (r1.15 - 29 Apr 2007 - SimonMiles)

Second Provenance Challenge

Motivation

Line: 25 to 25

  • November 2006 Draft of challenge presented for discussion
  • Start of December 2006 Challenge starts
  • 20th of February 2007 First phase of challenge ends
Changed:
<
<
  • 25/26 June 2007 Challenge ends, workshop to discuss results
>
>
  • 26th of June 2007 Challenge ends, workshop to discuss results

The workshop will be held in Monterey, California, prior to HPDC 2007, which takes place there. Specific timing and agenda will be made available later.

 <<O>>  Difference Topic SecondProvenanceChallenge (r1.14 - 12 Jan 2007 - SimonMiles)

Second Provenance Challenge

Motivation

Line: 14 to 14

  • Understand where data in one model is translatable to or has no parallel in another model.
  • Understand how the provenance of data can be traced across multiple systems, so adding value to all those systems.
Changed:
<
<
The challenge is split into two phases. Each team should create a TWiki page for their second challenge results, and inform others when they complete each phase.
>
>
The challenge is split into two phases. Each team should create a TWiki page for their second challenge results, linked from ParticipatingTeams, and inform others when they complete each phase.

Once the challenge is complete, we will make all provenance data, translation programs and queries available on our Web site. Our goal is to create a repository with multiple scenarios and query workloads that can serve as benchmarks for provenance systems.
Line: 72 to 72

  • Any data which was excluded in translation from a downloaded model because it was extraneous
  • Benchmark queries and results of applying their own suggested benchmarks to their own system
Changed:
<
<
Each team should, as before, construct a results page on the TWiki and upload the results above onto it. The template to follow for this challenge is given here.
>
>
Each team should, as before, construct a results page on the TWiki, linked from ParticipatingTeams, and upload the results above onto it. The template to follow for this challenge is given here.

Workflow Parts

 <<O>>  Difference Topic SecondProvenanceChallenge (r1.13 - 03 Jan 2007 - SimonMiles)

Second Provenance Challenge

Motivation

Line: 24 to 24

The timeline for the second provenance challenge is as follows:
  • November 2006 Draft of challenge presented for discussion
  • Start of December 2006 Challenge starts
Changed:
<
<
  • End of January 2006 First phase of challenge ends
>
>
  • 20th of February 2007 First phase of challenge ends

  • 25/26 June 2007 Challenge ends, workshop to discuss results

The workshop will be held in Monterey, California, prior to HPDC 2007, which takes place there. Specific timing and agenda will be made available later.

 <<O>>  Difference Topic SecondProvenanceChallenge (r1.12 - 08 Dec 2006 - SimonMiles)

Second Provenance Challenge

Motivation

Line: 76 to 76

Workflow Parts

Changed:
<
<
The details of the workflow are given on the first provenance challenge page. However, the workflow is now split into three parts, depicted graphically here. Click on workflow images for high resolution versions.
>
>
This challenge is based on the same workflow definition as in the first challenge. However, the workflow definition is now split into three parts, depicted graphically here.

As shown below, the workflow comprises procedures exchanging data. We do not want to restrict the execution environment for workflow runs, as the challenge is about provenance, and so define only the essentials of the workflow: the roles of data in the workflow, the types of procedure performed and where the output of one procedure becomes input for another. We do not otherwise prescribe how workflows are instantiated or run. The details of the workflow definition are given on the first provenance challenge page, as are pre-computed data for each of the items in the workflow for those who wish to simulate their workflow runs.

Click on workflow images for high resolution versions.


Part 1

 <<O>>  Difference Topic SecondProvenanceChallenge (r1.11 - 05 Dec 2006 - SimonMiles)

Second Provenance Challenge

Motivation

Changed:
<
<
The first provenance challenge was established following the IPAW 2006 workshop, where a range of papers were presented that described provenance models and systems. The challenge was proposed as a means for the disparate groups to gain a better understanding of the similarities, differences, core concepts and common issues across systems. The challenge consisted of running a workflow for an fMRI application and answering a set of queries over the provenance derived. The participants all executed this same workflow and performed the same set of queries over the data collected regarding the execution. At the conclusion of the challenge, the participants met and discussed their results, the commonality of the problem being a point around which comparison could take place.
>
>
The first provenance challenge was established following the IPAW 2006 workshop, where a range of papers were presented that described provenance models and systems. The challenge was proposed as a means for the disparate groups to gain a better understanding of the similarities, differences, core concepts and common issues across systems. The challenge consisted of running a workflow for an fMRI application and answering a set of queries over the provenance derived. The participants all executed this same workflow and performed the same set of queries over the data collected regarding the execution. At the conclusion of the challenge, the participants met and discussed their results, the commonality of the problem being a point around which comparison could take place.

While the first challenge had a large number of participants and led to valuable discussion about the aspects of provenance which were fundamental to all approaches, the queries and their expected results were weakly specified, and so interpreted differently by different groups. There was, therefore, no systematic way to compare capabilities of systems, including representations of provenance data. It was decided that a second challenge, based on the first, was desirable. With the first challenge showing that there are multiple levels of granularity/types of provenance that may be relevant at different times or in different parts of a process, understanding interoperability becomes a key issue, and so this will be the focus of the second challenge.

Line: 41 to 31

Communicable Data Model

Changed:
<
<
The first phase of the challenge is to make available provenance data/process documentation for a workflow. As the focus of the challenge is to share and combine data models, we need to make it easy for other teams to parse the data. Therefore, the data should be exported in a well documented format, and a reference given to a free parser for that format. For consistency and availability of parsers, we strongly encourage (but do not require) teams to export their data in XML. The schema should be provided and adequately described for others to use, either in the schema document, on the TWiki or in a referenced document.
>
>
The first phase of the challenge is to make available provenance data/process documentation for adequate workflow runs to answer the provenance queries. As the focus of the challenge is to share and combine data models, we need to make it easy for other teams to parse the data. Therefore, the data should be exported in a well documented format, and a reference given to a free parser for that format. For consistency and availability of parsers, we strongly encourage (but do not require) teams to export their data in XML. The schema should be provided and adequately described for others to use, either in the schema document, on the TWiki or in a referenced document.

The second challenge is based on the same workflow as the first. However, it is now divided into three parts:

  • Part 1: align_warp and reslice (stages 1 and 2)
  • Part 2: softmean (stage 3)
  • Part 3: slicer and convert (stages 4 and 5)
Changed:
<
<
Each part is considered a workflow in its own right with regards to provenance data, so there will be three distinct sets of provenance data uploaded to the TWiki for each team. To aid translation, adequate supporting information should be given on how the elements of the workflow are apparent in the provenance, e.g. how the challenge data files are identified in the provenance. Whether gathering this data is best done by running three separate workflows or by running one and splitting the provenance data afterwards is left to each team to decide.
>
>
Each part is considered a workflow in its own right with regards to provenance data, so there will be three distinct sets of provenance data uploaded to the TWiki for each team. To aid translation, adequate supporting information should be given on how the elements of the workflow are apparent in the provenance, e.g. how the challenge data files are identified in the provenance. Whether gathering this data is best done by running three separate workflows or by running one and splitting the provenance data afterwards is left to each team to decide.

The details of the workflow parts are given at the bottom of this page, but do not differ from the first challenge.
Added:
>
>
Specifically, the exported data should contain:
  • Documentation of the three parts of one run of the workflow as shown in the Workflow Parts section below.
  • Documentation of the three parts of one run of the workflow in the adaptation specified by Provenance Query 7, i.e. replacing the single convert procedure with two procedures, pgmtoppm then pnmtojpeg, in workflow Part 3.
  • The following annotations:
    • Anatomy Image 1, as used in the first workflow run, is annotated with key-value pair center=UChicago.
    • Anatomy Image 2, as used in the first workflow run, is annotated with key-value pairs center=southampton and studyModality=speech.

If the output of a team differs from that given above, including omissions of one or other piece of data, please make it clear in your data output.
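
As a minimal sketch of what such an export might look like, the Python below serialises the two annotations above into XML. The element and attribute names (provenance, artifact, annotation) and the artifact identifiers are assumptions made for illustration only; the challenge does not prescribe a schema, and each team documents its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical identifiers for the two annotated anatomy images.
ANNOTATIONS = {
    "anatomy-image-1": {"center": "UChicago"},
    "anatomy-image-2": {"center": "southampton", "studyModality": "speech"},
}


def export_annotations(annotations):
    """Serialise {artifact id: {key: value}} into a small XML document.

    The element names used here are illustrative, not a challenge schema.
    """
    root = ET.Element("provenance")
    for artifact_id, pairs in annotations.items():
        artifact = ET.SubElement(root, "artifact", id=artifact_id)
        for key, value in pairs.items():
            ET.SubElement(artifact, "annotation", key=key, value=value)
    return ET.tostring(root, encoding="unicode")


xml_text = export_annotations(ANNOTATIONS)
print(xml_text)
```

Any free XML parser can read the result back, which is the property the challenge asks for; the layout itself is a team's own choice.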


Cross-Model Provenance Query

The second phase of the challenge is for each team to use their approach to combine provenance data produced in the first phase by multiple other teams and to query over it. Each team should download data for each of the three workflow parts, each part from a different originating team.

Changed:
<
<
They must then perform the queries from the first provenance challenge over the combined three parts, as if they had captured the provenance data themselves. This is likely to involve, for most teams, a first step of translating the downloaded data into their own models. The queries are listed on the first provenance challenge page.
>
>
They must then perform the queries from the first provenance challenge over the combined three parts, as if they had captured the provenance data themselves. This is likely to involve, for most teams, a first step of translating the downloaded data into their own models. The queries are listed in the Provenance Queries section below.

Teams should perform queries over as many combinations as they feel is adequate for fulfilling the challenge goals above, but must query over at least one other team's data to have completed the challenge. Additional credit goes to teams whose data has been successfully imported and queried over by others!

Line: 98 to 96

Third part of workflow
Added:
>
>

Provenance Queries

These are the same queries as in the first challenge, but more tightly specified to remove ambiguities.

  1. Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.
  2. Find the process that led to Atlas X Graphic, excluding everything prior to softmean outputting the Atlas Image, i.e. the inputs, processing and outputs of align_warp and reslice, and the inputs and processing of softmean will be excluded.
  3. Find the Stage 3, 4 and 5 details of the process that led to Atlas X Graphic.
  4. Find all invocations of procedure align_warp that have ever occurred in the system using a twelfth order nonlinear 1365 parameter model (see model menu describing possible values of parameter "-m 12" of align_warp) that ran on a Monday.
  5. Find all Atlas Graphic images outputted from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095. The contents of a header file can be extracted as text using the scanheader AIR utility.
  6. Find all images ever output from softmean where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12."
  7. A user has run the workflow twice on the same input files, in the second instance replacing each convert procedure in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant.
  8. A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago.
  9. A user has annotated some atlas graphics with key-value pair where the key is studyModality. Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files.
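
Query 1 can be read as a reachability question over the combined provenance of the three workflow parts. The sketch below illustrates that reading under strong simplifying assumptions: provenance is reduced to hypothetical "derived from" edges between data items, the item names are invented, and the three parts are assumed to share identifiers already, which in practice is the translation step each team must perform.

```python
from collections import deque

# Hypothetical "derived from" edges, one dict per workflow part.
# Each key is a data item; each value lists the items it was derived from.
PART_1 = {
    "warp-params-1": ["anatomy-image-1", "reference-image"],
    "resliced-1": ["warp-params-1"],
}
PART_2 = {"atlas-image": ["resliced-1"]}
PART_3 = {
    "atlas-x-slice": ["atlas-image"],
    "atlas-x-graphic": ["atlas-x-slice"],
}


def merge_parts(*parts):
    """Combine per-part edge maps into one graph, keeping all edges."""
    merged = {}
    for part in parts:
        for item, sources in part.items():
            merged.setdefault(item, []).extend(sources)
    return merged


def ancestors(graph, start):
    """Return every item that start was directly or indirectly derived from."""
    seen = set()
    queue = deque([start])
    while queue:
        item = queue.popleft()
        for source in graph.get(item, []):
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return seen


combined = merge_parts(PART_1, PART_2, PART_3)
lineage = ancestors(combined, "atlas-x-graphic")
print(sorted(lineage))
```

A real answer to Query 1 would also report the procedures and their parameters; this sketch tracks data items only, to show how a trace crosses the boundaries between parts supplied by different teams.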

META FILEATTACHMENT First.png attr="" comment="First part of workflow" date="1159872076" path="First.png" size="15515" user="SimonMiles" version="1.1"
META FILEATTACHMENT Second.png attr="" comment="Second part of workflow" date="1159872090" path="Second.png" size="6747" user="SimonMiles" version="1.1"
META FILEATTACHMENT Third.png attr="" comment="Third part of workflow" date="1159872103" path="Third.png" size="8500" user="SimonMiles" version="1.1"
 <<O>>  Difference Topic SecondProvenanceChallenge (r1.10 - 05 Dec 2006 - SimonMiles)

Second Provenance Challenge

Motivation

 <<O>>  Difference Topic SecondProvenanceChallenge (r1.9 - 01 Dec 2006 - SimonMiles)

Second Provenance Challenge

Motivation

Line: 55 to 55

Cross-Model Provenance Query

Changed:
<
<
The second phase of the challenge is for each team to use their approach to combine provenance data produced in the first phase by multiple other teams and use their approach to query over it. A team should download data for each of the three workflow parts.
>
>
The second phase of the challenge is for each team to use their approach to combine provenance data produced in the first phase by multiple other teams and use their approach to query over it. Each team should download data for each of the three workflow parts, each part from a different originating team.

They must then perform the queries from the first provenance challenge over the combined three parts, as if they had captured the provenance data themselves. This is likely to involve, for most teams, a first step of translating the downloaded data into their own models. The queries are listed on the first provenance challenge page.

 <<O>>  Difference Topic SecondProvenanceChallenge (r1.8 - 29 Nov 2006 - SimonMiles)

Second Provenance Challenge

Motivation

Line: 74 to 74

  • Any data which was excluded in translation from a downloaded model because it was extraneous
  • Benchmark queries and results of applying their own suggested benchmarks to their own system
Changed:
<
<
Each team should, as before, construct a results page on the TWiki and upload the results above onto it. The template to follow for this challenge is given SecondChallengeTemplate?.
>
>
Each team should, as before, construct a results page on the TWiki and upload the results above onto it. The template to follow for this challenge is given here.

Workflow Parts

 <<O>>  Difference Topic SecondProvenanceChallenge (r1.7 - 27 Nov 2006 - SimonMiles)

Second Provenance Challenge

Motivation

Line: 16 to 16

participants met and discussed their results, the commonality of the problem being a point around which comparison could take place.
Changed:
<
<
While the first challenge had a large number of participants and led to valuable discussion about the aspects of provenance which were fundamental to all approaches, the queries and their expected results were weakly specified, and so interpreted differently by different groups. There was, therefore, no systematic way to compare capabilities of systems, including representations of provenance data. It was decided that a second challenge, based on the first, was desirable. To better understand the commonalities and differences across different approaches to provenance, the focus of second challenge will be on interoperability.
>
>
While the first challenge had a large number of participants and led to valuable discussion about the aspects of provenance which were fundamental to all approaches, the queries and their expected results were weakly specified, and so interpreted differently by different groups. There was, therefore, no systematic way to compare capabilities of systems, including representations of provenance data. It was decided that a second challenge, based on the first, was desirable. With the first challenge showing that there are multiple levels of granularity/types of provenance that may be relevant at different times or in different parts of a process, understanding interoperability becomes a key issue, and so this will be the focus of the second challenge.

Changed:
<
<
One way in which the interoperability of the approaches could be tested would be to compose the workflow execution systems, each system executing a part of the workflow, and run provenance queries over the results. However, this would primarily present a challenge to workflow systems rather than approaches to provenance. Instead, we propose that teams share provenance data produced by their different systems, and then perform provenance queries over compositions of data from other teams, as if it had been produced by their own system.
>
>
One way in which the interoperability of the approaches could be tested would be to compose the workflow execution systems, each system executing a part of the workflow, and run provenance queries over the results. However, this would primarily present a challenge to workflow systems rather than approaches to provenance: the scientific value of provenance will come from tracking provenance across workflow runs, through manual processes, and across workflow/provenance systems. Instead, we propose that teams share provenance data produced by their different systems, and then perform provenance queries over compositions of data from other teams, as if it had been produced by their own system.

Through this approach, the second provenance challenge should encourage systematic conversions of data between systems, and so a reliable basis of comparison. Specifically, we hope to achieve the following goals.

  • Understand where data in one model is translatable to or has no parallel in another model.
Line: 43 to 41

Communicable Data Model

Changed:
<
<
The first phase of the challenge is to make available provenance data/process documentation for a workflow. As the focus of the challenge is share and combine data models, we need to make it easy for other teams to parse the data. Therefore, the data should be exported in XML. The schema should be provided and adequately described for others to use, either in the schema document, on the TWiki or in a referenced document.
>
>
The first phase of the challenge is to make available provenance data/process documentation for a workflow. As the focus of the challenge is to share and combine data models, we need to make it easy for other teams to parse the data. Therefore, the data should be exported in a well documented format, and a reference given to a free parser for that format. For consistency and availability of parsers, we strongly encourage (but do not require) teams to export their data in XML. The schema should be provided and adequately described for others to use, either in the schema document, on the TWiki or in a referenced document.

The second challenge is based on the same workflow as the first. However, it is now divided into three parts:

  • Part 1: align_warp and reslice (stages 1 and 2)
  • Part 2: softmean (stage 3)
  • Part 3: slicer and convert (stages 4 and 5)
Changed:
<
<
Each part is considered a workflow in its own right with regards to provenance data, so there will be three distinct sets of provenance data uploaded to the TWiki for each team. Whether gathering this data is best done by running three separate workflows or by running one and splitting the provenance data afterwards is left to each team to decide.
>
>
Each part is considered a workflow in its own right with regards to provenance data, so there will be three distinct sets of provenance data uploaded to the TWiki for each team. To aid translation, adequate supporting information should be given on how the elements of the workflow are apparent in the provenance, e.g. how the challenge data files are identified in the provenance. Whether gathering this data is best done by running three separate workflows or by running one and splitting the provenance data afterwards is left to each team to decide.

The details of the workflow parts are given at the bottom of this page, but do not differ from the first challenge.

 <<O>>  Difference Topic SecondProvenanceChallenge (r1.6 - 10 Nov 2006 - SimonMiles)

Second Provenance Challenge

Motivation

Changed:
<
<
The first provenance challenge was established following the IPAW 2006 workshop, where a range of papers were presented on the topic of provenance, and systems to support provenance being determined in application environments. The challenge was proposed as a means for the disparate groups to gain a better understanding of the similarities, differences, core concepts and common issues between systems. It took the form of a sample experiment, a workflow for an fMRI application, and a set of queries regarding the provenance of the experiment's results (provenance queries). The particpants all executed this same workflow and performed the same set of queries over the data collected regarding the execution. At the conclusion of the challenge, the participants met and discussed their results, the commonality of the problem being a point around which comparison could take place.

While the first challenge had a large number of participants and led to valuable discussion about the aspects of provenance which were fundamental to all approaches, the queries and their expected results were weakly specified, and so interpreted differently by different groups. There was, therefore, no systematic way to compare capabilities of systems, including representations of provenance data. It was decided that a second challenge, based on the first, was desirable.

>
>
The first provenance challenge was established following the IPAW 2006 workshop, where a range of papers were presented that described provenance models and systems. The challenge was proposed as a means for the disparate groups to gain a better understanding of the similarities, differences, core concepts and common issues across systems. The challenge consisted of running a workflow for an fMRI application and answering a set of queries over the provenance derived. The participants all executed this same workflow and performed the same set of queries over the data collected regarding the execution. At the conclusion of the challenge, the participants met and discussed their results, the commonality of the problem being a point around which comparison could take place.

While the first challenge had a large number of participants and led to valuable discussion about the aspects of provenance which were fundamental to all approaches, the queries and their expected results were weakly specified, and so interpreted differently by different groups. There was, therefore, no systematic way to compare capabilities of systems, including representations of provenance data. It was decided that a second challenge, based on the first, was desirable. To better understand the commonalities and differences across different approaches to provenance, the focus of the second challenge will be on interoperability.


One way in which the interoperability of the approaches could be tested would be to compose the workflow execution systems, each system executing a part of the workflow, and run provenance queries over the results. However, this would primarily present a challenge to workflow systems rather than approaches to provenance. Instead, we propose that teams share provenance data produced by their different systems, and then perform provenance queries over compositions of data from other teams, as if it had been produced by their own system.

Through this approach, the second provenance challenge should encourage systematic conversions of data between systems, and so a reliable basis of comparison. Specifically, we hope to achieve the following goals.

Changed:
<
<
  • Understanding of where data in one model is translatable to or has no parallel in another model.
  • Understanding how the provenance of data can be traced across multiple systems, so adding value to all those systems.
  • Development of queries and other measures suitable for benchmarking provenance systems.
>
>
  • Understand where data in one model is translatable to or has no parallel in another model.
  • Understand how the provenance of data can be traced across multiple systems, so adding value to all those systems.

Changed:
<
<
The challenge is split into three phases, with the last two running in parallel. Each team should create a TWiki page for their second challenge results, and inform others when they complete each phase.
>
>
The challenge is split into two phases. Each team should create a TWiki page for their second challenge results, and inform others when they complete each phase. Once the challenge is complete, we will make all provenance data, translation programs and queries available on our Web site. Our goal is to create a repository with multiple scenarios and query workloads that can serve as benchmarks for provenance systems.

Timetable

The timeline for the second provenance challenge is as follows:

Changed:
<
<
  • October 2006 Draft of challenge presented for discussion
>
>
  • November 2006 Draft of challenge presented for discussion

  • Start of December 2006 Challenge starts
  • End of January 2006 First phase of challenge ends
  • 25/26 June 2007 Challenge ends, workshop to discuss results
Line: 28 to 43

Communicable Data Model

Changed:
<
<
The first phase of the challenge is to make available provenance data/process documentation for a workflow. The format in which it is made available can be whatever is deemed most suitable but it should be possible to upload it to the TWiki and for others to download and parse it. The format should be adequately described for others to use, either on the TWiki or in a referenced document.
>
>
The first phase of the challenge is to make available provenance data/process documentation for a workflow. As the focus of the challenge is share and combine data models, we need to make it easy for other teams to parse the data. Therefore, the data should be exported in XML. The schema should be provided and adequately described for others to use, either in the schema document, on the TWiki or in a referenced document.

The second challenge is based on the same workflow as the first. However, it is now divided into three parts:

  • Part 1: align_warp and reslice (stages 1 and 2)
Line: 43 to 58

The second phase of the challenge is for each team to use their approach to combine provenance data produced in the first phase by multiple other teams and use their approach to query over it. A team should download data for each of the three workflow parts.

Changed:
<
<
They must then perform the queries from the first provenance challenge over the combined three parts, as if they had captured the provenance data themselves. This is likely to involve, for most teams, a first step of translating the downloaded data into their own models. The query is the following:

Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.

>
>
They must then perform the queries from the first provenance challenge over the combined three parts, as if they had captured the provenance data themselves. This is likely to involve, for most teams, a first step of translating the downloaded data into their own models. The queries are listed on the first provenance challenge page.

Teams should perform queries over as many combinations as they feel is adequate for fulfilling the challenge goals above, but must query over at least one other team's data to have completed the challenge. Additional credit goes to teams whose data has been successfully imported and queried over by others!

Benchmarks for Provenance Systems

Changed:
<
<
The third phase of the challenge, running in parallel with the second, is to propose, and demonstrate on your own system, queries with measurable qualities, so that provenance systems can be benchmarked against these queries.
>
>
We aim to build a repository with different scenarios and queries, where each scenario/query combination exercises different features of provenance systems. The goal would be to classify the different systems with respect to their ability to handle the different scenarios. This exercise will be informed by the results of the second provenance challenge, where significant differences in approach should become apparent through attempting to translate data models. The exercise will be discussed at the second challenge workshop, and we hope it will continue from there.

Results

Line: 86 to 99

Third part of workflow
Deleted:
<
<
-- SimonMiles - 26 Oct 2006

 <<O>>  Difference Topic SecondProvenanceChallenge (r1.5 - 27 Oct 2006 - SimonMiles)

Second Provenance Challenge

Motivation

Changed:
<
<
In the first provenance challenge, many participants performed provenance queries over the same workflow, and this led to valuable discussion about the aspects of provenance which were fundamental to all approaches. However, the queries and their expected results were weakly specified, and so interpreted differently by different groups. There was, therefore, no systematic way to compare capabilities of systems, including representations of provenance data.
>
>
The first provenance challenge was established following the IPAW 2006 workshop, where a range of papers were presented on the topic of provenance, and systems to support provenance being determined in application environments. The challenge was proposed as a means for the disparate groups to gain a better understanding of the similarities, differences, core concepts and common issues between systems. It took the form of a sample experiment, a workflow for an fMRI application, and a set of queries regarding the provenance of the experiment's results (provenance queries). The particpants all executed this same workflow and performed the same set of queries over the data collected regarding the execution. At the conclusion of the challenge, the participants met and discussed their results, the commonality of the problem being a point around which comparison could take place.

While the first challenge had a large number of participants and led to valuable discussion about the aspects of provenance which were fundamental to all approaches, the queries and their expected results were weakly specified, and so interpreted differently by different groups. There was, therefore, no systematic way to compare capabilities of systems, including representations of provenance data. It was decided that a second challenge, based on the first, was desirable.


One way in which the interoperability of the approaches could be tested would be to compose the workflow execution systems, each system executing a part of the workflow, and run provenance queries over the results. However, this would primarily present a challenge to workflow systems rather than approaches to provenance. Instead, we propose that teams share provenance data produced by their different systems, and then perform provenance queries over compositions of data from other teams, as if it had been produced by their own system.

 <<O>>  Difference Topic SecondProvenanceChallenge (r1.4 - 26 Oct 2006 - SimonMiles)

Second Provenance Challenge

Changed:
<
<
The second provenance challenge builds on the first and focusses on the interoperability of the myriad existing approaches to provenance. Through exchanging and combining provenance produced by each other's systems, we hope to achieve the following goals.
>
>

Motivation

In the first provenance challenge, many participants performed provenance queries over the same workflow, and this led to valuable discussion about the aspects of provenance which were fundamental to all approaches. However, the queries and their expected results were weakly specified, and so interpreted differently by different groups. There was, therefore, no systematic way to compare capabilities of systems, including representations of provenance data.

One way in which the interoperability of the approaches could be tested would be to compose the workflow execution systems, each system executing a part of the workflow, and run provenance queries over the results. However, this would primarily present a challenge to workflow systems rather than approaches to provenance. Instead, we propose that teams share provenance data produced by their different systems, and then perform provenance queries over compositions of data from other teams, as if it had been produced by their own system.

Through this approach, the second provenance challenge should encourage systematic conversions of data between systems, and so a reliable basis of comparison. Specifically, we hope to achieve the following goals.


  • Understanding of where data in one model is translatable to or has no parallel in another model.
  • Understanding how the provenance of data can be traced across multiple systems, so adding value to all those systems.
Added:
>
>
  • Development of queries and other measures suitable for benchmarking provenance systems.

Changed:
<
<
The challenge is split into two phases. Each team should create a TWiki page for their second challenge results, and inform others when they complete each phase.
>
>
The challenge is split into three phases, with the last two running in parallel. Each team should create a TWiki page for their second challenge results, and inform others when they complete each phase.

Timetable


The timeline for the second provenance challenge is as follows:

  • October 2006 Draft of challenge presented for discussion
Changed:
<
<
  • Start of November 2006 Challenge starts
>
>
  • Start of December 2006 Challenge starts

  • End of January 2007 First phase of challenge ends
Changed:
<
<
  • June 2007 Challenge ends, workshop to discuss results
>
>
  • 25/26 June 2007 Challenge ends, workshop to discuss results

The workshop will be held in Monterey, California prior to HPDC 2007 there. Specific timing and agenda will be made available later.


Communicable Data Model

The first phase of the challenge is to make available provenance data/process documentation for a workflow. The format in which it is made available can be whatever is deemed most suitable but it should be possible to upload it to the TWiki and for others to download and parse it. The format should be adequately described for others to use, either on the TWiki or in a referenced document.

The second challenge is based on the same workflow as the first. However, it is now divided into three parts:

Changed:
<
<
  • align_warp and reslice (stages 1 and 2)
  • softmean (stage 3)
  • slicer and convert (stages 4 and 5)
>
>
  • Part 1: align_warp and reslice (stages 1 and 2)
  • Part 2: softmean (stage 3)
  • Part 3: slicer and convert (stages 4 and 5)

Changed:
<
<
Each part is considered a workflow in its own right with regards to provenance data, so there will be three sets of provenance data uploaded to the TWiki for each team. Whether gathering this data is best done by running three separate workflows or by running one and splitting the provenance afterwards is left to each team to decide.
>
>
Each part is considered a workflow in its own right with regards to provenance data, so there will be three distinct sets of provenance data uploaded to the TWiki for each team. Whether gathering this data is best done by running three separate workflows or by running one and splitting the provenance data afterwards is left to each team to decide.
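
For teams taking the second route (running one workflow and splitting the provenance data afterwards), the split might look something like the minimal Python sketch below. It assumes each provenance record carries a "stage" field naming the stage that produced it; the record layout, the file names and the function name split_by_part are all hypothetical and not part of the challenge specification. Only the stage-to-part mapping comes from the list above.

```python
# Stage-to-part mapping taken from the three-part division of the workflow.
STAGE_TO_PART = {
    "align_warp": 1,
    "reslice": 1,
    "softmean": 2,
    "slicer": 3,
    "convert": 3,
}


def split_by_part(records):
    """Group provenance records by the workflow part their stage belongs to."""
    parts = {1: [], 2: [], 3: []}
    for record in records:
        part = STAGE_TO_PART[record["stage"]]
        parts[part].append(record)
    return parts


# Hypothetical trace from a single run of the whole workflow (assumed layout).
trace = [
    {"stage": "align_warp", "output": "warp1.warp"},
    {"stage": "reslice", "output": "resliced1.img"},
    {"stage": "softmean", "output": "atlas.img"},
    {"stage": "slicer", "output": "atlas-x.pgm"},
    {"stage": "convert", "output": "atlas-x.gif"},
]

parts = split_by_part(trace)
for number, records in sorted(parts.items()):
    print(number, [r["stage"] for r in records])
```

Each of the three resulting lists would then be serialised and uploaded as a separate set of provenance data.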

The details of the workflow parts are given at the bottom of this page, and do not differ from those of the first challenge.

Cross-Model Provenance Query

Changed:
<
<
The second phase of the challenge is for each team to use their approach to combine provenance data produced in the first phase by multiple other teams and use their approach to query over it. A team should download data for each of the three workflow parts, possibly including their own for one or two parts in the first instance.
>
>
The second phase of the challenge is for each team to use their approach to combine provenance data produced in the first phase by multiple other teams and use their approach to query over it. A team should download data for each of the three workflow parts.

Changed:
<
<
They must then perform Query 1 from the first provenance challenge over the combined three parts, as if they had captured the provenance data themselves. This is likely to involve, for most teams, a first step of translating the downloaded data into their own models. The query is the following:
>
>
They must then perform the queries from the first provenance challenge over the combined three parts, as if they had captured the provenance data themselves. This is likely to involve, for most teams, a first step of translating the downloaded data into their own models. The query is the following:

Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.
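
As a rough illustration of what answering this query over combined data can involve, the Python sketch below merges three sets of provenance data into one graph and walks backwards from the final graphic. It is a sketch under strong assumptions: each part is taken to be already translated into a list of (effect, cause) pairs, and identifiers are assumed to match across the three sources. In practice, establishing that correspondence is the translation step described above. All file names and the functions merge_parts and causes_of are hypothetical.

```python
from collections import deque


def merge_parts(*parts):
    """Union several (effect, cause) edge lists into one graph.

    The graph maps each item to the set of its direct causes.
    """
    graph = {}
    for edges in parts:
        for effect, cause in edges:
            graph.setdefault(effect, set()).add(cause)
    return graph


def causes_of(graph, target):
    """Return everything that directly or indirectly caused `target`."""
    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for cause in graph.get(node, ()):
            if cause not in seen:
                seen.add(cause)
                queue.append(cause)
    return seen


# Hypothetical data, one edge list per workflow part (possibly from different teams).
part1 = [
    ("resliced1.img", "reslice"),
    ("reslice", "warp1.warp"),
    ("warp1.warp", "align_warp"),
    ("align_warp", "anatomy1.img"),
]
part2 = [("atlas.img", "softmean"), ("softmean", "resliced1.img")]
part3 = [
    ("atlas-x.gif", "convert"),
    ("convert", "atlas-x.pgm"),
    ("atlas-x.pgm", "slicer"),
    ("slicer", "atlas.img"),
]

graph = merge_parts(part1, part2, part3)
result = causes_of(graph, "atlas-x.gif")
print(sorted(result))
```

The traversal crosses part boundaries only because "resliced1.img" and "atlas.img" appear under the same name in two edge lists; where teams use different identifiers for the same item, a mapping between them would have to be supplied first.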

Changed:
<
<
Teams should perform queries over as many combinations as they feel is adequate for fulfilling the challenge goals above, but must query over at least one other team's data to have completed the challenge.
>
>
Teams should perform queries over as many combinations as they feel is adequate for fulfilling the challenge goals above, but must query over at least one other team's data to have completed the challenge. Additional credit goes to teams whose data has been successfully imported and queried over by others!

Benchmarks for Provenance Systems

The third phase of the challenge, running in parallel with the second, is to propose, and demonstrate on your own system, queries with measurable qualities, so that provenance systems can be benchmarked against these queries.


Results

Line: 43 to 58

  • Details regarding how data models were translated (or otherwise used to answer the query following the team's approach)
  • Any data which was absent from a downloaded model, and whether this affected the possibility of translation or successful provenance query
  • Any data which was excluded in translation from a downloaded model because it was extraneous
Added:
>
>
  • Benchmark queries and results of applying their own suggested benchmarks to their own system

Each team should, as before, construct a results page on the TWiki and upload the results above onto it. The template to follow for this challenge is given at SecondChallengeTemplate.


Workflow Parts

Line: 66 to 84

Third part of workflow
Added:
>
>
-- SimonMiles - 26 Oct 2006

META FILEATTACHMENT First.png attr="" comment="First part of workflow" date="1159872076" path="First.png" size="15515" user="SimonMiles" version="1.1"
META FILEATTACHMENT Second.png attr="" comment="Second part of workflow" date="1159872090" path="Second.png" size="6747" user="SimonMiles" version="1.1"
META FILEATTACHMENT Third.png attr="" comment="Third part of workflow" date="1159872103" path="Third.png" size="8500" user="SimonMiles" version="1.1"
 <<O>>  Difference Topic SecondProvenanceChallenge (r1.3 - 03 Oct 2006 - SimonMiles)

Second Provenance Challenge

Changed:
<
<
The second provenance challenge builds on the first and focusses on the interoperability of the myriad existing approaches to provenance. Through exchanging and combining provenance produced by each other's systems, we hope to achieve the following.
>
>
The second provenance challenge builds on the first and focusses on the interoperability of the myriad existing approaches to provenance. Through exchanging and combining provenance produced by each other's systems, we hope to achieve the following goals.

  • Understanding of where data in one model is translatable to or has no parallel in another model.
  • Understanding how the provenance of data can be traced across multiple systems, so adding value to all those systems.
Changed:
<
<
The challenge is split into two phase. Each team should create a TWiki page for their second challenge results, and inform others when they complete each phase.
>
>
The challenge is split into two phases. Each team should create a TWiki page for their second challenge results, and inform others when they complete each phase.

The timeline for the second provenance challenge is as follows:

  • October 2006 Draft of challenge presented for discussion
  • Start of November 2006 Challenge starts
  • End of January 2007 First phase of challenge ends
  • June 2007 Challenge ends, workshop to discuss results

Communicable Data Model

Line: 18 to 24

Each part is considered a workflow in its own right with regards to provenance data, so there will be three sets of provenance data uploaded to the TWiki for each team. Whether gathering this data is best done by running three separate workflows or by running one and splitting the provenance afterwards is left to each team to decide.

Changed:
<
<

Across-Model Provenance Query

>
>
The details of the workflow parts are given at the bottom of this page, and do not differ from those of the first challenge.

Cross-Model Provenance Query


The second phase of the challenge is for each team to use their approach to combine provenance data produced in the first phase by multiple other teams and use their approach to query over it. A team should download data for each of the three workflow parts, possibly including their own for one or two parts in the first instance.

Changed:
<
<
They must then perform Query 1 from the first provenance challenge over the combined three parts, as if they had captured the provenance data themselves. This is likely to involve, for most teams, a first step of translating the downloaded data into their own models.
>
>
They must then perform Query 1 from the first provenance challenge over the combined three parts, as if they had captured the provenance data themselves. This is likely to involve, for most teams, a first step of translating the downloaded data into their own models. The query is the following:

Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.

Teams should perform queries over as many combinations as they feel is adequate for fulfilling the challenge goals above, but must query over at least one other team's data to have completed the challenge.


Results

Line: 30 to 42

  • Which combinations of models they have managed to perform the provenance query over
  • Details regarding how data models were translated (or otherwise used to answer the query following the team's approach)
  • Any data which was absent from a downloaded model, and whether this affected the possibility of translation or successful provenance query
Changed:
<
<
  • Any data which was excluded in translation from a downloaded model
>
>
  • Any data which was excluded in translation from a downloaded model because it was extraneous

Workflow Parts

The details of the workflow are given on the first provenance challenge page. However, the workflow is now split into three parts, depicted graphically here. Click on workflow images for high resolution versions.

Part 1

This part is the reslicing of images to fit one reference image. It includes two stages: align_warp and reslice.

First part of workflow

Part 2

This part is the averaging of brain images into one. It includes one stage: softmean.

Second part of workflow

Part 3

This part is the conversion of the averaged image into three graphics files showing slices of that brain. It includes two stages: slicer and convert.


Changed:
<
<
-- SimonMiles - 03 Oct 2006
>
>
Third part of workflow

Added:
>
>
META FILEATTACHMENT First.png attr="" comment="First part of workflow" date="1159872076" path="First.png" size="15515" user="SimonMiles" version="1.1"
META FILEATTACHMENT Second.png attr="" comment="Second part of workflow" date="1159872090" path="Second.png" size="6747" user="SimonMiles" version="1.1"
META FILEATTACHMENT Third.png attr="" comment="Third part of workflow" date="1159872103" path="Third.png" size="8500" user="SimonMiles" version="1.1"
META FILEATTACHMENT First.pdf attr="" comment="First part of workflow (hi-res)" date="1159872448" path="First.pdf" size="94227" user="SimonMiles" version="1.1"
META FILEATTACHMENT Second.pdf attr="" comment="Second part of workflow (hi-res)" date="1159872465" path="Second.pdf" size="37064" user="SimonMiles" version="1.1"
META FILEATTACHMENT Third.pdf attr="" comment="Third part of workflow (hi-res)" date="1159872479" path="Third.pdf" size="34439" user="SimonMiles" version="1.1"
 <<O>>  Difference Topic SecondProvenanceChallenge (r1.2 - 03 Oct 2006 - SimonMiles)
Changed:
<
<

Second Provenance Challenge

>
>

Second Provenance Challenge


Added:
>
>
The second provenance challenge builds on the first and focusses on the interoperability of the myriad existing approaches to provenance. Through exchanging and combining provenance produced by each other's systems, we hope to achieve the following.
  • Understanding of where data in one model is translatable to or has no parallel in another model.
  • Understanding how the provenance of data can be traced across multiple systems, so adding value to all those systems.

Changed:
<
<
  • Revise workflow with 3 parts
  • Each team to export provenance information about each workflow part
  • Each team to import some provenance information from other system as if they had captured the information themselves
  • Run queries over aggregate provenance information
>
>
The challenge is split into two phase. Each team should create a TWiki page for their second challenge results, and inform others when they complete each phase.

Changed:
<
<
-- LucMoreau - 14 Sep 2006
>
>

Communicable Data Model

The first phase of the challenge is to make available provenance data/process documentation for a workflow. The format in which it is made available can be whatever is deemed most suitable but it should be possible to upload it to the TWiki and for others to download and parse it. The format should be adequately described for others to use, either on the TWiki or in a referenced document.

The second challenge is based on the same workflow as the first. However, it is now divided into three parts:

  • align_warp and reslice (stages 1 and 2)
  • softmean (stage 3)
  • slicer and convert (stages 4 and 5)

Each part is considered a workflow in its own right with regards to provenance data, so there will be three sets of provenance data uploaded to the TWiki for each team. Whether gathering this data is best done by running three separate workflows or by running one and splitting the provenance afterwards is left to each team to decide.

Across-Model Provenance Query

The second phase of the challenge is for each team to use their approach to combine provenance data produced in the first phase by multiple other teams and use their approach to query over it. A team should download data for each of the three workflow parts, possibly including their own for one or two parts in the first instance.

They must then perform Query 1 from the first provenance challenge over the combined three parts, as if they had captured the provenance data themselves. This is likely to involve, for most teams, a first step of translating the downloaded data into their own models.

Results

Teams should report the following results for the challenge:

  • Which combinations of models they have managed to perform the provenance query over
  • Details regarding how data models were translated (or otherwise used to answer the query following the team's approach)
  • Any data which was absent from a downloaded model, and whether this affected the possibility of translation or successful provenance query
  • Any data which was excluded in translation from a downloaded model

-- SimonMiles - 03 Oct 2006


 <<O>>  Difference Topic SecondProvenanceChallenge (r1.1 - 14 Sep 2006 - LucMoreau)
Line: 1 to 1
Added:
>
>

Second Provenance Challenge

  • Revise workflow with 3 parts
  • Each team to export provenance information about each workflow part
  • Each team to import some provenance information from other system as if they had captured the information themselves
  • Run queries over aggregate provenance information

-- LucMoreau - 14 Sep 2006

Revision r1.1 - 14 Sep 2006 - 16:56 - LucMoreau
Revision r1.15 - 29 Apr 2007 - 19:40 - SimonMiles