|
|
< < |
Provenance Challenge
|
> > |
First Provenance Challenge
|
|
Aims
|
|
- The capabilities of each system in answering provenance-related queries
- What each system considers to be within the scope of provenance (regardless of whether the system can yet address all problems in that scope)
|
< < |
To help achieve the aims, we define a single example workflow that forms the basis of the challenge. It is inspired by a real experiment in the area of Functional Magnetic Resonance Imaging (fMRI). Here, we use the term workflow to denote a series of procedures being performed in a system, each taking some data as input and producing other data as output. We do not assume that these procedures must use some particular form of technology (EXE files, Web Services etc.) or that the workflow is explicitly defined in a workflow technology (BPEL, compiled executable, Scufl, batch file etc.), but individual participants will adopt their technology of choice.
|
> > |
To help achieve the aims, we define a simple example workflow that forms the basis of the challenge. It is inspired by a real experiment in the area of Functional Magnetic Resonance Imaging (fMRI). Here, we use the term workflow to denote a series of procedures being performed in a system, each taking some data as input and producing other data as output. We do not assume that these procedures must use some particular form of technology (EXE files, Web Services etc.) or that the workflow is explicitly defined in a workflow technology (BPEL, compiled executable, Scufl, batch file etc.), but individual participants will adopt their technology of choice.
|
|
|
< < |
Our focus in this challenge is on provenance and not on running the experiment. Hence, to facilitate take-up, while based on a real experiment, the procedures can be implemented as "dummies", i.e. we provide the input, output and intermediate data and participants can use fake procedures that take the right input and produce the right output. Alternatively, if participants wish to enact the real workflow, this is also possible. In addition to this, we define a set of core queries that all participants should show how they address, so we can compare systems.
|
> > |
Our focus in this challenge is on provenance and not on running the experiment. Hence, to facilitate take-up, while based on a real experiment, the procedures can be implemented as "dummies", i.e. we provide the input, output and intermediate data and participants can use fake procedures that take the right input and produce the right output. Alternatively, participants can actually execute the real workflow after installing the necessary libraries. In addition to this, we define a set of core queries that all participants should show how they address, so we can compare systems.
|
|
|
< < |
Each participant in the challenge will have its own page on this TWiki, following the ChallengeTemplate, where they can inform the other participants of their efforts in meeting the challenge. During the provenance challenge, we expect the participants to upload the following to their page, to then allow comparison.
|
> > |
Each participant in the challenge will have their own page on this TWiki, following the ChallengeTemplate, where they can inform the other participants of their efforts in meeting the challenge. During the provenance challenge, we expect the participants to upload the following to their page, to then allow comparison.
- Representations of the workflow in their system
|
|
- Representations of provenance for the example workflow
- Representations of the result of the core (and other) queries
|
< < |
- Contributions to a matrix of queries vs systems, indicating for each that: (1) the query can be answered by the system, (2) the system cannot answer the query now but considers it relevant, (3) the query is not relevant to the project.
|
> > |
- Contributions to a matrix of queries vs systems, indicating for each that: (1) the query can be answered by the system, (2) the system cannot answer the query now but considers it relevant, (3) the query is not relevant to the project.
|
|
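As an illustration of the matrix of queries vs systems described above, the three possible entries can be modelled as an enumeration keyed by (system, query) pairs. This is only a sketch: the system names `SystemA` and `SystemB`, the query labels `Q1` and `Q2`, and the helper `systems_answering` are hypothetical and are not defined by the challenge.

```python
from enum import Enum


class Support(Enum):
    ANSWERED = 1      # (1) the query can be answered by the system
    RELEVANT = 2      # (2) cannot be answered now, but considered relevant
    NOT_RELEVANT = 3  # (3) the query is not relevant to the project


# Hypothetical systems and query labels, for illustration only.
matrix = {
    ("SystemA", "Q1"): Support.ANSWERED,
    ("SystemA", "Q2"): Support.RELEVANT,
    ("SystemB", "Q1"): Support.NOT_RELEVANT,
}


def systems_answering(matrix, query):
    """Return the systems that report they can answer the given query."""
    return sorted(
        system
        for (system, q), support in matrix.items()
        if q == query and support is Support.ANSWERED
    )


print(systems_answering(matrix, "Q1"))
```

Keeping one entry per (system, query) pair makes it straightforward to compare systems query by query once every participant has contributed a row.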
Optionally, the participants may like to contribute the following.
- Additional queries (beyond the core queries) that illustrate the scope of their system
|
< < |
- Representations of the workflow in their system
|
|
- Extensions to the example workflow to best illustrate the unique aspects of their system
- Any categorisation of queries that the project considers to have practical value
Example Workflow
|
< < |
We propose an example workflow for creating population-based "brain atlases" from the fMRI Data Center's archive of high resolution anatomical data. The workflow is shown below (click for larger image).
|
> > |
We propose an example workflow for creating population-based "brain atlases" from the fMRI Data Center's archive of high resolution anatomical data. The workflow is shown below (click for a pdf version of the image).
|
|
|
|
|
< < |
It is comprised of procedures, shown as orange ovals, and data items flowing between them, shown as rectangles. It can be seen as five stages, where each stage is depicted as a horizontal row of the same procedure in the figure (the term stage is introduced only to help description of the workflow, and we do not dictate how it is apparent in a concrete implementation). The procedures employ the AIR (automated image registration) suite to create an averaged brain from a collection of high resolution anatomical data. In addition to the data items shown in the figure, there are other inputs to procedures (constant string options), defined below.
|
> > |
It is comprised of procedures, shown as orange ovals, and data items flowing between them, shown as rectangles. It can be seen as five stages, where each stage is depicted as a horizontal row of the same procedure in the figure. Note that the term stage is introduced only to help description of the workflow, and we do not dictate how it is apparent in a concrete implementation. The procedures employ the AIR (automated image registration) suite to create an averaged brain from a collection of high resolution anatomical data. In addition to the data items shown in the figure, there are other inputs to procedures (constant string options), defined below.
|
|
|
< < |
The inputs to a workflow are a set of new brain images (Anatomy Image 1 to 4) and a single reference brain image (Reference Image). All input images are 3D scans of a brain of varying resolutions, so that different features are evident. For each image, there is the actual image and the metadata information for that image (Anatomy Header 1 to 4). The image data was published with article Accession Number: 2-2004-1168X Frontal-Hippocampal Double Dissociation Between Normal Aging and Alzheimer's Disease by Head, D, Snyder, AZ, Girton, LE, Morris, JC, Buckner, RL in the fMRI Data Center.
|
> > |
The inputs to a workflow are a set of new brain images (Anatomy Image 1 to 4) and a single reference brain image (Reference Image). All input images are 3D scans of a brain of varying resolutions, so that different features are evident. For each image, there is the actual image and the metadata information for that image (Anatomy Header 1 to 4). The image data was published with the article Frontal-Hippocampal Double Dissociation Between Normal Aging and Alzheimer's Disease by Head, D, Snyder, AZ, Girton, LE, Morris, JC, Buckner, RL in the fMRI Data Center (Accession Number: 2-2004-1168X).
|
|
The stages of the workflow are as follows.
- For each new brain image, align_warp compares it with the reference image to determine how the new image should be warped, i.e. how the position and shape of the image should be adjusted, to match the reference brain. The output of each procedure in the stage is a _warp parameter set_ defining the spatial transformation to be performed (Warp Params 1 to 4).
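The first stage can be sketched with "dummy" procedures of the kind the challenge permits: no image processing takes place, and each invocation only records which named data items it consumed and produced. The `Invocation` record and `dummy_align_warp` function are hypothetical names introduced for illustration, and the input list is an assumption taken from the description above rather than from the real align_warp command line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """One recorded procedure call: its name plus the data items in and out."""
    procedure: str
    inputs: tuple
    outputs: tuple


def dummy_align_warp(i):
    """Stand-in for align_warp on new image i; it records names only."""
    return Invocation(
        procedure="align_warp",
        inputs=(f"Anatomy Image {i}", f"Anatomy Header {i}", "Reference Image"),
        outputs=(f"Warp Params {i}",),
    )


# Stage one: the same procedure applied once per new brain image (1 to 4).
stage_one = [dummy_align_warp(i) for i in range(1, 5)]

for invocation in stage_one:
    print(invocation.procedure, invocation.inputs, "->", invocation.outputs)
```

Records of this shape are one possible starting point for a provenance representation; each participating system is free to capture the same information in its own model.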
Core Provenance Queries
|
|
- Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was g
|
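A query of the first kind, asking for everything that caused a data item to be as it is, amounts to a backward traversal of the dependencies between data items. The sketch below assumes provenance is held as a simple mapping from each derived item to the procedure and inputs that produced it. The `derived_from` mapping, the `lineage` function and the `later_stages` placeholder (standing in for the stages after align_warp) are all hypothetical, and only two of the four images are included to keep the example short.

```python
from collections import deque

# Hypothetical provenance store: derived item -> (procedure, inputs).
derived_from = {
    "Warp Params 1": ("align_warp", ["Anatomy Image 1", "Reference Image"]),
    "Warp Params 2": ("align_warp", ["Anatomy Image 2", "Reference Image"]),
    # Placeholder for the remaining stages of the workflow.
    "Atlas X Graphic": ("later_stages", ["Warp Params 1", "Warp Params 2"]),
}


def lineage(item, derived_from):
    """Return (ancestor data items, procedures) that led to the given item."""
    ancestors = set()
    procedures = set()
    queue = deque([item])
    while queue:
        current = queue.popleft()
        if current not in derived_from:
            continue  # an original input: nothing recorded upstream of it
        procedure, inputs = derived_from[current]
        procedures.add(procedure)
        for data_item in inputs:
            if data_item not in ancestors:
                ancestors.add(data_item)
                queue.append(data_item)
    return ancestors, procedures


items, procs = lineage("Atlas X Graphic", derived_from)
print(sorted(items))
print(sorted(procs))
```

The ancestors that never appear as keys of `derived_from` are the original inputs, which is how the new brain images behind the atlas can be identified.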