NCSA Provenance Challenge, CyberIntegrator? | ||||||||||
| Line: 494 to 495 | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||
| Added: | ||||||||||
| > > |
| |||||||||
NCSA Provenance Challenge, CyberIntegrator? | ||||||||
| Line: 12 to 12 | ||||||||
|---|---|---|---|---|---|---|---|---|
| ||||||||
| Added: | ||||||||
| > > |
| |||||||
Workflow implementation and provenance trace
A detailed narrative and provenance trace are attached. In the case of the example workflow, the task was run interactively rather than in batch mode; in other words, there is no trace of the workflow structure other than the execution trace. | ||||||||
| Added: | ||||||||
| > > |
| |||||||
NCSA Provenance Challenge, CyberIntegrator? | ||||||||
| Line: 16 to 16 | ||||||||
|---|---|---|---|---|---|---|---|---|
A detailed narrative and provenance trace are attached. In the case of the example workflow, the task was run interactively rather than in batch mode; in other words there is no trace of the workflow structure other than the execution trace.
How provenance was captured
CyberIntegrator is instrumented to push triples via JDBC to an intermediate Oracle store where they are harvested | ||||||||
| Changed: | ||||||||
| < < |
into multiple Kowari servers. This is a completely different way of getting the triples into Kowari than we employed for the D2K case. | |||||||
| > > |
into multiple Kowari servers. This is a completely different way of getting the triples into Kowari than we employed for the NcsaD2k case. | |||||||
Provenance queries
#1. Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc. | ||||||||
| Line: 362 to 362 | ||||||||
#7. A user has run the workflow twice, in the second instance replacing each procedures (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant. | ||||||||
| Added: | ||||||||
| > > |
Not sure what this means. Graph diffs could be computed over the compute nodes and input/output edges, the distribution of execution times could be profiled across the runs, parameters could be compared, etc. | |||||||
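One of these options, a diff over edges, could be sketched as follows in Python. This is an illustration under stated assumptions, not something we ran against the trace: each edge is labeled by the function names of its two endpoint steps (step ids would differ between runs), and the two edge lists are made up to mirror the challenge description.

```python
def diff_runs(edges_first, edges_second):
    """Return the edges that appear in only one of the two runs."""
    first, second = set(edges_first), set(edges_second)
    return {
        "only_in_first": sorted(first - second),
        "only_in_second": sorted(second - first),
    }


# Hypothetical final stages of the two runs, as (producer, consumer) pairs.
run1 = [("softmean", "slicer"), ("slicer", "convert")]
run2 = [
    ("softmean", "slicer"),
    ("slicer", "pgmtoppm"),
    ("pgmtoppm", "pnmtojpeg"),
]

differences = diff_runs(run1, run2)
print(differences)
```

Edges shared by both runs drop out, so what remains is the replaced final stage. Comparing parameters or execution times would need the same kind of run-independent key.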
#8: A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago.
We can annotate any input or output with a pathname we recognize as one of the anatomy images. To find the inputs/outputs associated with "anatomy1.hdr" or "anatomy3.hdr" (remember, only the headers are represented as inputs), we can do this query: | ||||||||
NCSA Provenance Challenge, CyberIntegrator? | ||||||||
| Line: 13 to 13 | ||||||||
|---|---|---|---|---|---|---|---|---|
Workflow implementation and provenance trace | ||||||||
| Changed: | ||||||||
| < < |
A detailed narrative and provenance trace are attached. | |||||||
| > > |
A detailed narrative and provenance trace are attached. In the case of the example workflow, the task was run interactively rather than in batch mode; in other words there is no trace of the workflow structure other than the execution trace. | |||||||
How provenance was captured
CyberIntegrator is instrumented to push triples via JDBC to an intermediate Oracle store where they are harvested into multiple Kowari servers. This is a completely different way of getting the triples into Kowari than | ||||||||
| Changed: | ||||||||
| < < |
#1. Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc. | |||||||
| > > |
NCSA Provenance Challenge, CyberIntegrator
Participating Team
Workflow implementation and provenance trace
A detailed narrative and provenance trace are attached.
How provenance was captured
CyberIntegrator is instrumented to push triples via JDBC to an intermediate Oracle store where they are harvested into multiple Kowari servers. This is a completely different way of getting the triples into Kowari than we employed for the D2K case.
Provenance queries
#1. Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc. | |||||||
| To do this we need transitive closure on the property of one step having as input the output of another step, which we'll call | ||||||||
| Line: 76 to 96 | ||||||||
|---|---|---|---|---|---|---|---|---|
| $s $p $o)); | ||||||||
| Changed: | ||||||||
| < < |
#2. Find the process that led to Atlas X Graphic, excluding everything prior to the averaging of images with softmean. | |||||||
| > > |
#2. Find the process that led to Atlas X Graphic, excluding everything prior to the averaging of images with softmean. | |||||||
| In the execution trace, steps are annotated with a property indicating what function was run. To find the step that executed softmean, we need to find one having the property "function=Prov3(softMean)". We can do this with the following query: | ||||||||
| Line: 161 to 181 | ||||||||
| $s $p $o)); | ||||||||
| Changed: | ||||||||
| < < |
#3. Find the Stage 3, 4 and 5 details of the process that led to Atlas X Graphic. | |||||||
| > > |
#3. Find the Stage 3, 4 and 5 details of the process that led to Atlas X Graphic. | |||||||
| CyberIntegrator? doesn't have a concept of workflow "stages," so our knowledge of the strategy the author used is external to CyberIntegrator? and we need to add that information as annotations. We can characterize the stages as follows: stage 3 is the softmean stage, stage 4 is the slicer stage, and stage 5 is the convert stage. | ||||||||
| Line: 208 to 228 | ||||||||
| $s $p $o)); | ||||||||
| Changed: | ||||||||
| < < |
#4. Find all invocations of procedure align_warp using a twelfth order nonlinear 1365 parameter model (see model menu describing possible values of parameter "-m 12" of align_warp) that ran on a Monday. | |||||||
| > > |
#4. Find all invocations of procedure align_warp using a twelfth order nonlinear 1365 parameter model (see model menu describing possible values of parameter "-m 12" of align_warp) that ran on a Monday. | |||||||
| We can't quite answer this query from the execution trace data because in that data, the command line arguments are not separated from one another but appear together. But our technique would be no different if they had been split up. So for the purposes of the challenge, we will search for "-m 12 -q" instead of "-m 12". Strangely, in the execution trace the property holding the command line arguments is called "name" and it's the parameter, not the property, that identifies it as a command-line option. | ||||||||
| Line: 231 to 251 | ||||||||
|
$execution | ||||||||
| Changed: | ||||||||
| < < |
#5. Find all Atlas Graphic images outputted from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095. The contents of a header file can be extracted as text using the scanheader AIR utility. | |||||||
| > > |
#5. Find all Atlas Graphic images outputted from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095. The contents of a header file can be extracted as text using the scanheader AIR utility. | |||||||
| The workflow doesn't run scanheader (that's not part of the example workflow, so we didn't add it to our workflow). However it does identify header files as inputs so we can extract the values if we have the file data handy, and add nodes to the execution trace containing header keys and values. | ||||||||
| Line: 315 to 335 | ||||||||
| (which returns all three atlas graphic images). | ||||||||
| Changed: | ||||||||
| < < |
#6. Find all output averaged images of softmean (average) procedures, where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12." | |||||||
| > > |
#6. Find all output averaged images of softmean (average) procedures, where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12." | |||||||
| We can answer this query by combining the conditions in the query with traversing the transitive closure of the precedence predicate (see #1). | ||||||||
| Line: 340 to 360 | ||||||||
|
$prop3 | ||||||||
| Changed: | ||||||||
| < < |
#7. A user has run the workflow twice, in the second instance replacing each procedures (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant. | |||||||
| > > |
#7. A user has run the workflow twice, in the second instance replacing each procedures (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant. | |||||||
| Changed: | ||||||||
| < < |
#8: A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago. | |||||||
| > > |
#8: A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago. | |||||||
| We can annotate any input or output with a pathname we recognize as one of the anatomy images. To find the inputs/outputs associated with "anatomy1.hdr" or "anatomy3.hdr" (remember, only the headers are represented as inputs), we can do this query: | ||||||||
| Line: 383 to 403 | ||||||||
|
$ann | ||||||||
| Changed: | ||||||||
| < < |
#9. A user has annotated some atlas graphics with key-value pair where the key is studyModality. Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files. | |||||||
| > > |
#9. A user has annotated some atlas graphics with key-value pair where the key is studyModality. Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files. | |||||||
| From the workflow, we can infer that if a file is the output of "convert", it's an atlas graphic. That amounts to this query: | ||||||||
| Line: 465 to 485 | ||||||||
| ||||||||
| Added: | ||||||||
| > > |
| |||||||
| Line: 1 to 1 | ||||||||
|---|---|---|---|---|---|---|---|---|
| Added: | ||||||||
| > > |
#1. Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.
To do this we need transitive closure on the property of one step having as input the output of another step, which we'll call "precedence". Kowari can only compute transitive closure per-predicate, so this needs to be collapsed into a single predicate as follows:
insert select $this <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $next
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$this <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$next <http://ecid.ncsa.uiuc.edu/md/mwf#hasInput> $out
into <rmi://badger.ncsa.uiuc.edu/server1#cipc>;
The query for #1 starts with the step that output Atlas X Graphic and finds all preceding modules. Note that because of how CyberIntegrator represents inputs and outputs, we can only match on pathnames, not on true file identity. Looking at the #hasFilename predicate, we see that there's a file named "D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif".
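As an illustration of the precedence construction described above, here is a minimal Python sketch over a handful of toy triples. The step and data ids (warp1, d1, and so on) are hypothetical, and the short predicate names stand in for the full mwf URIs. It mimics what the insert and a trans() walk compute; it does not show how Kowari evaluates them.

```python
from collections import defaultdict

HAS_OUTPUT = "hasOutput"
HAS_INPUT = "hasInput"

# Toy triples with hypothetical ids, not taken from the real trace.
triples = [
    ("warp1", HAS_OUTPUT, "d1"),
    ("reslice1", HAS_INPUT, "d1"),
    ("reslice1", HAS_OUTPUT, "d2"),
    ("softmean", HAS_INPUT, "d2"),
    ("softmean", HAS_OUTPUT, "d3"),
    ("slicer_x", HAS_INPUT, "d3"),
    ("slicer_x", HAS_OUTPUT, "d4"),
    ("convert_x", HAS_INPUT, "d4"),
    ("convert_x", HAS_OUTPUT, "atlas-X.gif"),
    ("slicer_y", HAS_INPUT, "d3"),
]


def derive_precedes(triples):
    """Collapse hasOutput/hasInput pairs into direct (step, next_step) edges."""
    producers = defaultdict(set)
    consumers = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == HAS_OUTPUT:
            producers[obj].add(subject)
        elif predicate == HAS_INPUT:
            consumers[obj].add(subject)
    return {
        (this, nxt)
        for data, steps in producers.items()
        for this in steps
        for nxt in consumers.get(data, ())
    }


def ancestors(precedes, end):
    """All steps that directly or transitively precede `end`."""
    predecessors = defaultdict(set)
    for this, nxt in precedes:
        predecessors[nxt].add(this)
    seen, stack = set(), [end]
    while stack:
        for step in predecessors[stack.pop()]:
            if step not in seen:
                seen.add(step)
                stack.append(step)
    return seen


precedes = derive_precedes(triples)
print(sorted(ancestors(precedes, "convert_x")))
```

The walk starts at the step that wrote atlas-X.gif and never reaches slicer_y, which consumes the softmean output but does not feed the X graphic.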
We can walk the provenance graph back from the atlas-X.gif file to get all the steps that preceded it in the workflow:
select $step
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif') or
($end <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif' and
($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end or
trans ($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end)));
To describe the process, we can return all triples on those steps:
select $step $p $o
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
(($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif') or
($end <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif' and
($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end or
trans ($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end)))) and
$step $p $o;
This is informative, but it's more informative when property key/value pairs are also returned. Unlike D2K, CyberIntegrator groups properties into higher-level structures called parameters, so we have to walk a little further in the graph from the steps that we find. Here we exploit the fact that parameters are not shared between steps to simplify our query.
select $s $p $o
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
(($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif') or
($end <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif' and
($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end or
trans ($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end)))) and
(($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$s <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$s $p $o) or
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $s and
$s $p $o) or
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $s and
$s $p $o));
#2. Find the process that led to Atlas X Graphic, excluding everything prior to the averaging of images with softmean.
In the execution trace, steps are annotated with a property indicating what function was run. To find the step that executed softmean, we need to find one having the property "function=Prov3(softMean)".
We can do this with the following query:
select $step
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov3(softMean)';
Using the transitive closure of the #precedes predicate, we can find all following steps:
select $step
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$softmean <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov3(softMean)' and
$avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $softmean and
($avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $step or
trans ($avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $step));
Now we can constrain it as in query #1, to capture which of those modules contributed to Atlas X Graphic:
select $step
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$softmean <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov3(softMean)' and
$avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $softmean and
($avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $step or
trans ($avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $step)) and
( ($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif') or
($end <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif' and
($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end or
trans ($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end))) );
Now we can get all triples on these steps and their properties, as in #1:
select $s $p $o
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$softmean <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov3(softMean)' and
$avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $softmean and
($avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $step or
trans ($avantSoftmean <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $step)) and
( ($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif') or
($end <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\atlas-X.gif' and
($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end or
trans ($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end))) ) and
(($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param2 and
$s <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param2 and
$s $p $o) or
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $s and
$s $p $o) or
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param3 and
$param3 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $s and
$s $p $o));
#3. Find the Stage 3, 4 and 5 details of the process that led to Atlas X Graphic.
CyberIntegrator doesn't have a concept of workflow "stages," so our knowledge of the strategy the author used is external to CyberIntegrator and we need to add that information as annotations.
We can characterize the stages as follows: stage 3 is the softmean stage, stage 4 is the slicer stage, and stage 5 is the convert stage. The following query adds a predicate to all the stage 3, 4, and 5 steps describing which stage they're in. The query keys on the function property, because in the example workflow the function property is sufficient to identify the steps. This is of course not true in the general case.
insert select $step <http://ecid.ncsa.uiuc.edu/md/mwf#inStage> $stage
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov3(softMean)' and
$stage <http://tucana.org/tucana#is> '3') or
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov4(3DSlice)' and
$stage <http://tucana.org/tucana#is> '4') or
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov5(convert)' and
$stage <http://tucana.org/tucana#is> '5')
into <rmi://badger.ncsa.uiuc.edu/server1#cipc>;
This query retrieves the ids of all the modules in stages 3, 4, and 5, and the statements and properties associated with them:
select $s $p $o
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
($step <http://ecid.ncsa.uiuc.edu/md/mwf#inStage> '3' or
$step <http://ecid.ncsa.uiuc.edu/md/mwf#inStage> '4' or
$step <http://ecid.ncsa.uiuc.edu/md/mwf#inStage> '5') and
(($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$s <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$s $p $o) or
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $s and
$s $p $o) or
($step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param and
$param <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $s and
$s $p $o));
#4. Find all invocations of procedure align_warp using a twelfth order nonlinear 1365 parameter model (see model menu describing possible values of parameter "-m 12" of align_warp) that ran on a Monday.
We can't quite answer this query from the execution trace data because in that data, the command line arguments are not separated from one another but appear together. But our technique would be no different if they had been split up. So for the purposes of the challenge, we will search for "-m 12 -q" instead of "-m 12". Strangely, in the execution trace the property holding the command line arguments is called "name" and it's the parameter, not the property, that identifies it as a command-line option.
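The "ran on a Monday" part of #4 comes down to a weekday test on each execution's start time, which can be applied client-side to timestamps returned by a query. A minimal Python sketch, assuming the timestamps are ISO 8601 strings (the trace's actual timestamp format is not shown here) and using made-up rows:

```python
from datetime import datetime


def ran_on_monday(timestamp):
    """True if an ISO 8601 timestamp falls on a Monday (weekday() == 0)."""
    return datetime.fromisoformat(timestamp).weekday() == 0


# Hypothetical (step, start, end) rows; ids and times are illustrative only.
rows = [
    ("step_a", "2006-09-11T10:15:00", "2006-09-11T10:16:30"),
    ("step_b", "2006-09-12T09:00:00", "2006-09-12T09:01:10"),
]

monday_steps = [step for step, start, _end in rows if ran_on_monday(start)]
print(monday_steps)
```

If the store uses a different timestamp format, only the parsing line needs to change.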
iTQL doesn't support date arithmetic, so here we'll just match the align_warps with the given options and return timestamps along with them:
select $step $start $end
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param1 and
$param1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop1 and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov1(warp)' and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param2 and
$param2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameterName> 'optionString' and
$param2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop2 and
$prop2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'name' and
$prop2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> '-m 12 -q' and
$execution <http://ecid.ncsa.uiuc.edu/md/mwf#executionOf> $step and
$execution <http://ecid.ncsa.uiuc.edu/md/mwf#startOn> $start and
$execution <http://ecid.ncsa.uiuc.edu/md/mwf#endOn> $end;
#5. Find all Atlas Graphic images outputted from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095. The contents of a header file can be extracted as text using the scanheader AIR utility.
The workflow doesn't run scanheader (that's not part of the example workflow, so we didn't add it to our workflow). However, it does identify header files as inputs, so we can extract the values if we have the file data handy, and add nodes to the execution trace containing header keys and values. CyberIntegrator identifies objects that are used as inputs and outputs, and associates those objects with pathnames. This query will get us the pathnames that are associated with inputs to align_warp.
In this implementation of the workflow, only the header files are given as inputs:
select $input $path
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param1 and
$param1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop1 and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov1(warp)' and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasInput> $input and
$input <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> $path;
Given the output of this query, we can scan the header files and produce RDF describing them with the following Perl script. Note that the script has to know a priori how to map local paths to the ones used in the workflow description. A better solution would be if each dataset had a globally unique id independent of where it is physically stored.
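#!/usr/bin/perl
use strict;
use warnings;

my $AIR_BIN = "../AIR/bin";
my $LOCAL_DATA_DIR = "../data";
# In this double-quoted string each run of eight backslashes becomes four,
# which the regex below reads as two literal backslashes.
my $WORKFLOW_DATA_DIR = "D:\\\\\\\\sclee\\\\\\\\Provenance\\\\\\\\output\\\\\\\\prov-chal6\\\\\\\\";
my $ix = 1;
# Each input line is one row of query output: input id, tab, quoted workflow path.
while (<>) {
    chomp;
    my ($input, $workflowFile) = split /\t/;
    # Map the workflow pathname onto the local copy of the file.
    (my $localFile = $workflowFile) =~ s/^"${WORKFLOW_DATA_DIR}(.*)"/${LOCAL_DATA_DIR}\/$1/;
    open(my $scan, "-|", "${AIR_BIN}/scanheader $localFile")
        or die "cannot run scanheader on $localFile: $!";
    while (my $line = <$scan>) {
        chomp $line;
        next if $line =~ /^$/;
        # Split on the first "=" only, so a value containing "=" stays intact.
        my ($name, $value) = split /=/, $line, 2;
        my $header = "http://ecid.ncsa.uiuc.edu/md/mwf#header_${ix}";
        print "<${input}> <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeader> <${header}>\n";
        print "<${header}> <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeaderName> '${name}'\n";
        print "<${header}> <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeaderValue> '${value}'\n";
        $ix++;
    }
    close $scan;
}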
The script generates the following output which can be inserted directly into Kowari (note that this is formatted in iTQL and not a standard RDF serialization):
Now we can do the query ("Find all Atlas Graphic images outputted from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095"). We know that align_warp takes anatomy images as inputs, so we can look at those inputs to see if they have matching header values, and find the associated steps:
select $step
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$header <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeaderName> 'global maximum' and
$header <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeaderValue> '4095' and
$in <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeader> $header and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasInput> $in and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param1 and
$param1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop1 and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov1(warp)';
Now we need to find all the atlas graphic images resulting from any of these modules. We walk from the modules with the files-of-interest as inputs until we hit a "convert" step, which we know has an atlas graphic as an output:
select $pathname
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$header <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeaderName> 'global maximum' and
$header <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeaderValue> '4095' and
$in <http://ecid.ncsa.uiuc.edu/md/mwf#hasHeader> $header and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasInput> $in and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param1 and
$param1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop1 and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov1(warp)' and
trans ($step <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $end) and
$end <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param2 and
$param2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop2 and
$prop2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov5(convert)' and
$end <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> $pathname;
(which returns all three atlas graphic images).
#6. Find all output averaged images of softmean (average) procedures, where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12."
We can answer this query by combining the conditions in the query with traversing the transitive closure of the precedence predicate (see #1).
select $softmean $alignWarp $pathname
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov3(softMean)' and
$param1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop1 and
$softmean <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param1 and
$softmean <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> $pathname and
trans ($alignWarp <http://ecid.ncsa.uiuc.edu/md/mwf#precedes> $softmean) and
$alignWarp <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param2 and
$param2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop2 and
$prop2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov1(warp)' and
$alignWarp <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param3 and
$param3 <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameterName> 'optionString' and
$param3 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop3 and
$prop3 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'name' and
$prop3 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> '-m 12 -q';
#7. A user has run the workflow twice, in the second instance replacing each procedures (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs.
The exact level of detail in the difference that is detected by a system is up to each participant.
#8: A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago.
We can annotate any input or output with a pathname we recognize as one of the anatomy images. To find the inputs/outputs associated with "anatomy1.hdr" or "anatomy3.hdr" (remember, only the headers are represented as inputs), we can do this query:
select $io
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$io <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\anatomy1.hdr' or
$io <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> 'D:\\sclee\\Provenance\\output\\prov-chal6\\anatomy3.hdr';
Given the output of this query, we can insert annotations. For example:
insert
<http://ecid.ncsa.uiuc.edu/md/mwf#data_e74eb542-264d-4a23-8406-0a1aa8a4ef95> <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann1
$ann1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'center'
$ann1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'UChicago'
<http://ecid.ncsa.uiuc.edu/md/mwf#data_2820fa46-d2c7-416e-b01c-df80db3ab63a> <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann2
$ann2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'center'
$ann2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'UChicago'
into <rmi://badger.ncsa.uiuc.edu/server1#cipc>;
Now we can perform the query:
select $out $pathname
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param1 and
$param1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop1 and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov1(warp)' and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> $pathname and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasInput> $in and
$in <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann and
$ann <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'center' and
$ann <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'UChicago';
#9. A user has annotated some atlas graphics with key-value pair where the key is studyModality. Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files.
From the workflow, we can infer that if a file is the output of "convert", it's an atlas graphic. That amounts to this query:
select $out $pathname
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param1 and
$param1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop1 and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov5(convert)' and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasFilename> $pathname;
Now we need to add the annotations, using the same strategy as query #8:
For atlas x:
insert
<http://ecid.ncsa.uiuc.edu/md/mwf#data_7c9e83bf-5bf0-4ec8-8cce-90e0cc9b0aeb> <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann1
$ann1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'studyModality'
$ann1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'speech'
<http://ecid.ncsa.uiuc.edu/md/mwf#data_7c9e83bf-5bf0-4ec8-8cce-90e0cc9b0aeb> <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann2
$ann2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'foo'
$ann2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'bar'
<http://ecid.ncsa.uiuc.edu/md/mwf#data_7c9e83bf-5bf0-4ec8-8cce-90e0cc9b0aeb> <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann3
$ann3 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'foo'
$ann3 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'quux'
into <rmi://badger.ncsa.uiuc.edu/server1#cipc>;
For atlas y:
insert
<http://ecid.ncsa.uiuc.edu/md/mwf#data_282191c5-319a-4eb1-8a6f-d35ff34cc02a> <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann1
$ann1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'studyModality'
$ann1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'tactile'
<http://ecid.ncsa.uiuc.edu/md/mwf#data_282191c5-319a-4eb1-8a6f-d35ff34cc02a> <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann2
$ann2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'foo'
$ann2 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'fnord'
into <rmi://badger.ncsa.uiuc.edu/server1#cipc>;
For atlas z:
insert
<http://ecid.ncsa.uiuc.edu/md/mwf#data_c3cc7405-a148-4c17-93a0-4b2ec1c36c7d> <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann1
$ann1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'studyModality'
$ann1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'visual'
into <rmi://badger.ncsa.uiuc.edu/server1#cipc>;
Now we can perform the query. The subquery produces a nested table which groups the annotation key/value pairs by which output they're associated with.
select $out
subquery(select $name $value
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $otherAnn and
$otherAnn <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> $name and
$otherAnn <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> $value)
from <rmi://badger.ncsa.uiuc.edu/server1#cipc> where
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasParameter> $param1 and
$param1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasProperty> $prop1 and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyName> 'function' and
$prop1 <http://ecid.ncsa.uiuc.edu/md/mwf#hasPropertyValue> 'Prov5(convert)' and
$step <http://ecid.ncsa.uiuc.edu/md/mwf#hasOutput> $out and
$out <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotation> $ann and
$ann <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationName> 'studyModality' and
($ann <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'speech' or
$ann <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'audio' or
$ann <http://ecid.ncsa.uiuc.edu/md/mwf#hasAnnotationValue> 'visual');
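
To make the shape of that nested result concrete, here is a minimal Python sketch that emulates the grouping on an in-memory list instead of a Kowari server. It is an illustration only, not part of the CyberIntegrator implementation. The labels "atlas-x", "atlas-y" and "atlas-z" and the function name group_annotations are hypothetical stand-ins for the real data_... URIs and the iTQL query. The sketch also assumes every listed item is already known to be an output of "convert", so it skips that part of the where clause.

```python
from collections import defaultdict

# Hypothetical stand-in for the annotation triples inserted for query #9.
# Each entry is (output label, annotation name, annotation value); the
# labels replace the real data_... URIs purely for readability.
ANNOTATIONS = [
    ("atlas-x", "studyModality", "speech"),
    ("atlas-x", "foo", "bar"),
    ("atlas-x", "foo", "quux"),
    ("atlas-y", "studyModality", "tactile"),
    ("atlas-y", "foo", "fnord"),
    ("atlas-z", "studyModality", "visual"),
]

# Values of studyModality that query #9 asks for.
WANTED = {"speech", "visual", "audio"}


def group_annotations(annotations, key="studyModality", wanted=WANTED):
    """Return {output: [(name, value), ...]} for every output that has a
    `key` annotation whose value is in `wanted`.

    Like the subquery, the nested list holds all annotations on the
    output, including the `key` annotation itself.
    """
    by_output = defaultdict(list)
    for out, name, value in annotations:
        by_output[out].append((name, value))
    return {
        out: pairs
        for out, pairs in by_output.items()
        if any(n == key and v in wanted for n, v in pairs)
    }


if __name__ == "__main__":
    for out, pairs in group_annotations(ANNOTATIONS).items():
        print(out, pairs)
```

With the annotations listed above, atlas x and atlas z are selected and atlas y is dropped, because its studyModality value is 'tactile'. Note that the nested table includes the studyModality pair itself: the subquery's $otherAnn is not constrained to differ from $ann.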
-- JoeFutrelle - 12 Sep 2006