Provenance Challenge Template | |||||||||||||||||||||||||||||||
| Line: 28 to 29 | |||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Upload a representation of the information you captured when executing the workflow. Explain the structure (provide pointers to documents describing your schemas etc.) | |||||||||||||||||||||||||||||||
| Added: | |||||||||||||||||||||||||||||||
| > > |
A sample log of the provenance activities generated by the workflow/services is shown in notifications.xml. The Karma Service API supports two kinds of provenance retrieval: Data Provenance and Process Provenance. It also supports variations of these that retrieve RecursiveDataProvenance, DataUsage, and WorkflowTrace. Results of these provenance queries on the given workflow are shown here:
| |||||||||||||||||||||||||||||||
| Line: 38 to 43 | |||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||
| Added: | |||||||||||||||||||||||||||||||
| > > |
These query APIs form the building blocks for constructing the different "canonical" provenance queries in the challenge. Karma does not provide extensive support for annotations at the level of data products. We take the approach that the provenance system is not a generic metadata management system and should focus mainly on storing and retrieving provenance. In the LEAD project, where Karma is used, queries over generic data product metadata and provenance are achieved by pushing the provenance into the metadata for the data product and allowing the MyLEAD metadata management system to answer the "join" queries. Limited support for queries over annotations is present and has been used to answer the challenge queries that include annotations (except for #9). Some of these required us to query the provenance service's backend relational database directly, since queries over annotations are not yet supported through the service API.
Provenance Queries | |||||||||||||||||||||||||||||||
| Line: 47 to 55 | |||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||||
| < < |
| ||||||||||||||||||||||||||||||
| > > |
| ||||||||||||||||||||||||||||||
1. Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc. | |||||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||||
| < < |
The getRecursiveDataProvenance API provided by the Karma provenance service allows the retreival of the entire data provenance history of a data product. Invoking that method with the data product ID of Atlas X Graphic (in this case, 'lead:uuid:1157946992-atlas-x.gif') returns the complete process that led to its creation. The result of the provenance query is shown in recursive_data_provenance.xml. | ||||||||||||||||||||||||||||||
| > > |
The getRecursiveDataProvenance API provided by the Karma provenance service allows the retrieval of the entire data provenance history of a data product. Invoking that method with the data product ID of Atlas X Graphic (in this case, 'lead:uuid:1157946992-atlas-x.gif') returns the complete process that led to its creation. The result of the provenance query is shown in recursive_data_provenance.xml. | ||||||||||||||||||||||||||||||
2. Find the process that led to Atlas X Graphic, excluding everything prior to the averaging of images with softmean.
This query is performed by the client, which first invokes the getDataProvenance method on the Karma provenance service to retrieve the immediate data provenance for Atlas X Graphic. The client then repeatedly calls getDataProvenance to move up the provenance tree until the SoftmeanService is encountered in the data provenance results. The pseudo-code for the client looks like this:
| Changed: | |||||||||||||||||||||||||||||||
| < < |
1. let $dataList := ['lead:uuid:1157946992-atlas-x.gif'] | ||||||||||||||||||||||||||||||
| > > |
PrintRecursiveDataProvenanceUntil('lead:uuid:1157946992-atlas-x.gif', 'urn:qname:...:SoftmeanService');
void PrintRecursiveDataProvenanceUntil(DataProductID dataProduct, URI processID)
1. let $dataList := [dataProduct]
2. while ($dataList != empty) do
| |||||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||||
| < < |
c. if ($dataProvenance.getProducedBy() == 'SoftmeanService') break; // found Softmean. Stop. | ||||||||||||||||||||||||||||||
| > > |
c. if ($dataProvenance.getProducedBy() == processID) break; // found Softmean. Stop. | ||||||||||||||||||||||||||||||
d. foreach ($inputData in $dataProvenance.getUsingData()) do
// get input data used by this data product. recurse up the tree using iteration
| |||||||||||||||||||||||||||||||
| Line: 78 to 91 | |||||||||||||||||||||||||||||||
4. Find all invocations of procedure align_warp using a twelfth order nonlinear 1365 parameter model (see model menu describing possible values of parameter "-m 12" of align_warp) that ran on a Monday. | |||||||||||||||||||||||||||||||
| Added: | |||||||||||||||||||||||||||||||
| > > |
The Karma provenance service is primarily intended as a provenance recording and querying system, and has only limited capability for recording generic metadata and annotations. Provenance activities can have annotations, and relevant activities also contain the messages that were exchanged by service and client to perform an operation. These activities are recorded in a relational database, and free-text queries over the annotations are possible using SQL. Direct SQL queries are currently not exposed to the client, but the provenance service has the capability to answer these queries as follows:
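The original SQL statement is not reproduced on this page. As a rough sketch of the idea only, the Python snippet below runs a free-text annotation query against an in-memory SQLite table. The table name `activity`, its columns, the service ID `AlignWarpService`, and the sample rows are all hypothetical and are not Karma's actual schema; SQLite's `strftime('%w', ...)` stands in for the DAYOFWEEK predicate mentioned under query #6.

```python
import sqlite3

# Hypothetical schema: one row per ServiceInvoked activity (not Karma's real tables).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity ("
    " invokee TEXT, service_id TEXT, annotation TEXT, invoked_at TEXT)"
)
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?, ?)",
    [
        # 2006-09-11 was a Monday, 2006-09-12 a Tuesday.
        ("Invokee_0", "AlignWarpService", "align_warp -m 12", "2006-09-11 10:00:00"),
        ("Invokee_1", "AlignWarpService", "align_warp -m 12", "2006-09-12 10:00:00"),
        ("Invokee_2", "AlignWarpService", "align_warp -m 6", "2006-09-11 11:00:00"),
    ],
)


def find_align_warp_m12_on_monday(connection):
    """Return invokee IDs of align_warp runs annotated with '-m 12' on a Monday."""
    rows = connection.execute(
        "SELECT invokee FROM activity"
        " WHERE service_id = 'AlignWarpService'"
        " AND annotation LIKE '%-m 12%'"          # free-text match on the annotation
        " AND strftime('%w', invoked_at) = '1'"   # %w: 0 = Sunday, 1 = Monday
        " ORDER BY invokee"
    ).fetchall()
    return [row[0] for row in rows]


print(find_align_warp_m12_on_monday(conn))
```

A plain `LIKE` match is deliberately naive here: it would also match an annotation such as `-m 120`, so a real query would need a stricter pattern or a structured annotation column.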
| ||||||||||||||||||||||||||||||
5. Find all Atlas Graphic images outputted from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095. The contents of a header file can be extracted as text using the scanheader AIR utility. | |||||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||||
| < < |
6. Find all output averaged images of softmean (average) procedures, where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12." | ||||||||||||||||||||||||||||||
| > > |
In the workflow we execute, the command-line applications are wrapped by shell scripts that can perform pre- and post-processing. We incorporate a call to the scanheader utility within the wrapper for align_warp and have it include the output of scanheader in the ServiceInvoked activity's annotation. The query then becomes similar to the previous case:
PrintRecursiveDataUsageFor(Invokee_0, Invokee_1, 'urn:qname:...:ConvertService');
void PrintRecursiveDataUsageFor(EntityID invoker, EntityID invokee, URI processID)
// get initial process's provenance
1. let $processProv := karma.getProcessProvenance(invoker, invokee)
2. let $processList := [$processProv], $visitedDataList := [], $outputDataList := []
// start recursing down the data usage tree iteratively
3. while ($processList != empty) do
a. foreach ($processProv in $processList) do
// test if any of the processes in the current list is 'ConvertService'. If so, print its output image files.
i. if ($processProv.getInvokee().getServiceID() == processID) Print $processProv.getProducingData()
// add data products that were produced to the list of output to recurse into
ii. Add all $processProv.getProducingData() to $outputDataList
// we're done with these processes
b. $processList := []
c. foreach ($outputData in $outputDataList) do
// get the data usage list for the output data produced
i. let $dataUsage := karma.getDataUsage($outputData)
// get the process provenance for each process that used the output data and add them to process list
ii. foreach ($usedByProcess in $dataUsage.getUsageList())
- let $processProv := karma.getProcessProvenance($usedByProcess.invoker, $usedByProcess.invokee)
- Add $processProv to $processList
// we're done with these data
d. $outputDataList := []
4. End
The results of this operation are shown in query5.txt.
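The downward traversal in the pseudo-code can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionaries `PROCESSES` and `DATA_USAGE` are a hypothetical in-memory stand-in for the results of getProcessProvenance and getDataUsage, and all process names, service IDs, and data IDs are made up for the example.

```python
from collections import deque

# Hypothetical stand-in for provenance records. Each process maps to
# (service ID, data products it produced); each data product maps to the
# processes that used it. Names and IDs are illustrative only.
PROCESSES = {
    "align_warp_1": ("AlignWarpService", ["warp1"]),
    "reslice_1": ("ResliceService", ["resliced1"]),
    "softmean_1": ("SoftmeanService", ["atlas"]),
    "slicer_1": ("SlicerService", ["atlas-x.pgm"]),
    "convert_1": ("ConvertService", ["atlas-x.gif"]),
}
DATA_USAGE = {
    "warp1": ["reslice_1"],
    "resliced1": ["softmean_1"],
    "atlas": ["slicer_1"],
    "atlas-x.pgm": ["convert_1"],
    "atlas-x.gif": [],
}


def outputs_downstream_of(start_process, target_service):
    """Walk the data usage graph breadth-first from start_process and collect
    the outputs of every reachable process whose service ID matches."""
    found = []
    visited = {start_process}
    queue = deque([start_process])
    while queue:
        process = queue.popleft()
        service_id, produced = PROCESSES[process]
        if service_id == target_service:
            found.extend(produced)
        for data in produced:
            for user in DATA_USAGE.get(data, []):
                if user not in visited:  # skip consumers already queued
                    visited.add(user)
                    queue.append(user)
    return found


print(outputs_downstream_of("align_warp_1", "ConvertService"))
```

The `visited` set is an addition for the sketch: it prevents a process that consumes several outputs of the same upstream step from being reported more than once.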
6. Find all output averaged images of softmean (average) procedures, where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12."
This is a variation of query #4 and query #5. The SQL query used to retrieve the align_warp services that had a model menu number of 12 (argument -m 12) is the same as the query in #4, except for the DAYOFWEEK predicate. Similarly, the client's recursive procedure to locate the output of all SoftmeanServices that were preceded by these align_warps is similar to the recursive procedure outlined in query #5, with ConvertService replaced by SoftmeanService. They're reproduced below.
PrintRecursiveDataUsageFor(Invokee_0, Invokee_1, 'urn:qname:...:SoftmeanService'); (See query #5 for the definition.)
The results of this operation are shown in query6.txt.
7. A user has run the workflow twice, in the second instance replacing each procedures (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant. | |||||||||||||||||||||||||||||||
| Added: | |||||||||||||||||||||||||||||||
| > > |
The getWorkflowTrace API of the Karma service returns the complete workflow trace for a workflow as an XML document. Given the workflow traces for two different workflows, it is possible to do a semantic "diff" of the two documents to find the differences in the processes that were invoked and the data products used and produced. The pseudo-code for printing out the differences between two workflow traces is given below:
void PrintWorkflowTraceDiff(WorkflowTrace trace1, WorkflowTrace trace2)
// Workflow trace is an extension of process provenance document
1. let $processProv1 := trace1 as ProcessProvenance
2. let $processProv2 := trace2 as ProcessProvenance
3. PrintProcessProvenanceDiff($processProv1, $processProv2)
// Each step in the workflow trace is a process provenance document
4. foreach ($processProv1, $processProv2 in trace1.getTraceSteps(), trace2.getTraceSteps())
a. PrintProcessProvenanceDiff($processProv1, $processProv2)
5. End
void PrintProcessProvenanceDiff(ProcessProvenance processProv1, ProcessProvenance processProv2)
1. Print "Diff of Processes: ", processProv1.getInvokee(), processProv2.getInvokee()
2. if (processProv1.getInvokee() != processProv2.getInvokee())
a. Print "Invokees Differ: ", processProv1.getInvokee(), processProv2.getInvokee()
3. if (processProv1.getInvoker() != processProv2.getInvoker())
a. Print "Invokers Differ: ", processProv1.getInvoker(), processProv2.getInvoker()
4. if (processProv1.getStatus() != processProv2.getStatus())
a. Print "Process Completion Status Differ: ", processProv1.getStatus(), processProv2.getStatus()
5. if (processProv1.getRequestReceiveTime() != processProv2.getRequestReceiveTime())
a. Print "Invocation Times Differ: ", processProv1.getRequestReceiveTime(), processProv2.getRequestReceiveTime()
6. foreach ($dataProd1, $dataProd2 in processProv1.getUsingData(), processProv2.getUsingData())
a. PrintDataProductDiff($dataProd1, $dataProd2)
7. foreach ($dataProd1, $dataProd2 in processProv1.getProducingData(), processProv2.getProducingData())
a. PrintDataProductDiff($dataProd1, $dataProd2)
8. End
void PrintDataProductDiff(DataProduct dataProd1, DataProduct dataProd2)
1. if (dataProd1.getDataProductID() != dataProd2.getDataProductID()) // trivial. IDs always differ.
a. Print "Produced Data IDs Differ: ", dataProd1.getDataProductID(), dataProd2.getDataProductID()
2. if (dataProd1.getLocation() != dataProd2.getLocation())
a. Print "Produced Data Locations Differ: ", dataProd1.getLocation(), dataProd2.getLocation()
3. if (dataProd1.getTimestamp() != dataProd2.getTimestamp())
a. Print "Produced Data Timestamp Differ: ", dataProd1.getTimestamp(), dataProd2.getTimestamp()
4. End
The second workflow was not run and hence the query results for this are not available.
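Since no results are available for this query, the following Python sketch only illustrates how a step-by-step comparison of two traces could behave. The `ProcessProvenance` dataclass is a hypothetical, heavily simplified stand-in for a trace step, and the service names in the sample runs are invented for the example; none of this is Karma's actual data model.

```python
from dataclasses import dataclass, field
from itertools import zip_longest


@dataclass
class ProcessProvenance:
    """Hypothetical, simplified stand-in for one step of a workflow trace."""
    invokee: str
    status: str = "SUCCESS"
    using_data: list = field(default_factory=list)
    producing_data: list = field(default_factory=list)


def diff_steps(step1, step2):
    """Return human-readable differences between two trace steps."""
    if step1 is None or step2 is None:
        present = step1 or step2
        return [f"Step only in one trace: {present.invokee}"]
    diffs = []
    if step1.invokee != step2.invokee:
        diffs.append(f"Invokees differ: {step1.invokee} vs {step2.invokee}")
    if step1.status != step2.status:
        diffs.append(f"Status differs: {step1.status} vs {step2.status}")
    if len(step1.using_data) != len(step2.using_data):
        diffs.append(f"Input count differs at {step1.invokee}")
    if len(step1.producing_data) != len(step2.producing_data):
        diffs.append(f"Output count differs at {step1.invokee}")
    return diffs


def diff_traces(trace1, trace2):
    """Compare two traces position by position; zip_longest pads the shorter one."""
    diffs = []
    for step1, step2 in zip_longest(trace1, trace2):
        diffs.extend(diff_steps(step1, step2))
    return diffs


run1 = [ProcessProvenance("SlicerService"), ProcessProvenance("ConvertService")]
run2 = [
    ProcessProvenance("SlicerService"),
    ProcessProvenance("PgmToPpmService"),
    ProcessProvenance("PnmToJpegService"),
]
for line in diff_traces(run1, run2):
    print(line)
```

Pairing steps by position, as the pseudo-code does, is simple but fragile: once one run has an extra step, every later pair is misaligned. The sketch at least reports unmatched trailing steps instead of silently dropping them.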
8. A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago. | |||||||||||||||||||||||||||||||
| Added: | |||||||||||||||||||||||||||||||
| > > |
As noted earlier, the Karma service does not support detailed annotations at the file level, deferring to an external metadata management system such as MyLEAD. However, it allows generic annotations to be submitted as part of the provenance activities, and these can be queried. We use this facility to add metadata about the input anatomy images to the provenance activity and query it. This is again similar to queries #4, #5 and #6 in that a SQL query retrieves the invocations and we use the getProcessProvenance API of Karma to retrieve the output data products.
| ||||||||||||||||||||||||||||||
9. A user has annotated some atlas graphics with key-value pair where the key is studyModality. Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files. | |||||||||||||||||||||||||||||||
| Added: | |||||||||||||||||||||||||||||||
| > > |
The Karma service does not support complex queries such as these on the data product annotations. One way to perform this query would have been to retrieve the annotations for atlas graphics with key studyModality having value visual or audio using a query similar to query #8 and then to filter out the keys at the client end. However, we do not expect to answer such queries through the provenance system and these will not be part of the provenance service API. | ||||||||||||||||||||||||||||||
Suggested Workflow Variants
Suggest variants of the workflow that can exhibit capabilities that your system supports.
| Added: | |||||||||||||||||||||||||||||||
| > > |
| ||||||||||||||||||||||||||||||
Suggested Queries
Suggest significant queries that your system can support and are not in the proposed list of queries, and how you have implemented/would implement them. These queries may be with regards to a variant of the workflow suggested above.
| Added: | |||||||||||||||||||||||||||||||
| > > |
| ||||||||||||||||||||||||||||||
Categorisation of queries
According to your provenance approach, you may be able to provide a categorisation of queries. Can you elaborate on the categorisation and its rationale?
| Added: | |||||||||||||||||||||||||||||||
| > > |
| ||||||||||||||||||||||||||||||
Live systems
If your system can be accessed live (through portal, web page, web service, or other), provide relevant information here.
| Line: 116 to 311 | |||||||||||||||||||||||||||||||
| Provide here your conclusions on the challenge, and issues that you like to see discussed at a face to face meeting. | |||||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||||
| < < |
-- YogeshSimmhan - 12 Sep 2006 | ||||||||||||||||||||||||||||||
| > > |
-- YogeshSimmhan - 13 Sep 2006 | ||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||
| Line: 128 to 323 | |||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||
| Added: | |||||||||||||||||||||||||||||||
| > > |
| ||||||||||||||||||||||||||||||
| Changed: | ||||||||
| < < |
| |||||||
| > > |
| |||||||
Provenance Challenge Template
In progress
Provenance Challenge Template | ||||||||||||||||||||||||||||||||
| Line: 28 to 28 | ||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Upload a representation of the information you captured when executing the workflow. Explain the structure (provide pointers to documents describing your schemas etc.) | ||||||||||||||||||||||||||||||||
| Added: | ||||||||||||||||||||||||||||||||
| > > |
| |||||||||||||||||||||||||||||||
Provenance Queries
For each query, if your system can support your query, provide a description of how you implement the query, what result is returned; otherwise, explain whether the query is in the remit of your system. Also, make sure you complete the ProvenanceQueriesMatrix.
| Added: | ||||||||||||||||||||||||||||||||
| > > |
1. Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.
The getRecursiveDataProvenance API provided by the Karma provenance service allows the retrieval of the entire data provenance history of a data product. Invoking that method with the data product ID of Atlas X Graphic (in this case, 'lead:uuid:1157946992-atlas-x.gif') returns the complete process that led to its creation. The result of the provenance query is shown in recursive_data_provenance.xml.
2. Find the process that led to Atlas X Graphic, excluding everything prior to the averaging of images with softmean.
This query is performed by the client, which first invokes the getDataProvenance method on the Karma provenance service to retrieve the immediate data provenance for Atlas X Graphic. The client then repeatedly calls getDataProvenance to move up the provenance tree until the SoftmeanService is encountered in the data provenance results. The pseudo-code for the client looks like this:
1. let $dataList := ['lead:uuid:1157946992-atlas-x.gif']
2. while ($dataList != empty) do
a. $dataProvenance = karma.getDataProvenance($dataList[0]) // get data provenance for this level
b. Print $dataProvenance; $dataList.delete(0) // print process information & remove data from list
c. if ($dataProvenance.getProducedBy() == 'SoftmeanService') break; // found Softmean. Stop.
d. foreach ($inputData in $dataProvenance.getUsingData()) do
// get input data used by this data product. recurse up the tree using iteration
i. $dataList.add($inputData)
3. End
The results of this operation are shown in query2.txt.
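The upward walk described by the pseudo-code can be sketched in Python as follows. The `DATA_PROVENANCE` dictionary is a hypothetical in-memory substitute for calls to getDataProvenance, with shortened, made-up data IDs and service names; it is meant to show the control flow, not the real client.

```python
from collections import deque

# Hypothetical provenance records: data product -> (producing service,
# input data products). IDs are shortened for illustration.
DATA_PROVENANCE = {
    "atlas-x.gif": ("ConvertService", ["atlas-x.pgm"]),
    "atlas-x.pgm": ("SlicerService", ["atlas"]),
    "atlas": ("SoftmeanService", ["resliced1", "resliced2"]),
    "resliced1": ("ResliceService", ["warp1"]),
    "resliced2": ("ResliceService", ["warp2"]),
}


def provenance_until(data_product, stop_service):
    """Walk up the provenance tree from data_product, recording each
    (data product, producing service) pair, and stop at stop_service."""
    trail = []
    pending = deque([data_product])
    while pending:
        current = pending.popleft()
        produced_by, inputs = DATA_PROVENANCE[current]
        trail.append((current, produced_by))
        if produced_by == stop_service:
            break  # found the stopping process; do not go further up
        pending.extend(inputs)
    return trail


for step in provenance_until("atlas-x.gif", "SoftmeanService"):
    print(step)
```

As in the pseudo-code, the loop stops at the first match. That is sufficient here because the path from Atlas X Graphic back to softmean is linear; a branching history would instead need to prune only the matching branch and keep walking the others.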
3. Find the Stage 3, 4 and 5 details of the process that led to Atlas X Graphic.
This query differs from #2 in that the provenance levels are specified relative to the file, instead of being named explicitly as 'Softmean'. The getRecursiveDataProvenance API in the Karma provenance service has an optional parameter to specify the depth of recursion. By passing a recursion level of 3 in addition to the data product ID of Atlas X Graphic (in this case, 'lead:uuid:1157946992-atlas-x.gif'), it is possible to retrieve the data provenance for stages 3, 4, and 5. The result of the provenance query is shown in query3.xml.
4. Find all invocations of procedure align_warp using a twelfth order nonlinear 1365 parameter model (see model menu describing possible values of parameter "-m 12" of align_warp) that ran on a Monday.
5. Find all Atlas Graphic images outputted from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095. The contents of a header file can be extracted as text using the scanheader AIR utility.
6. Find all output averaged images of softmean (average) procedures, where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12."
7. A user has run the workflow twice, in the second instance replacing each procedures (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant.
8. A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago.
9. A user has annotated some atlas graphics with key-value pair where the key is studyModality. Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files.
Suggested Workflow Variants
| Line: 61 to 116 | ||||||||||||||||||||||||||||||||
| Provide here your conclusions on the challenge, and issues that you like to see discussed at a face to face meeting. | ||||||||||||||||||||||||||||||||
| Changed: | ||||||||||||||||||||||||||||||||
| < < |
-- YogeshSimmhan - 11 Sep 2006 | |||||||||||||||||||||||||||||||
| > > |
-- YogeshSimmhan - 12 Sep 2006 | |||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||
| Added: | ||||||||||||||||||||||||||||||||
| > > |
| |||||||||||||||||||||||||||||||
Provenance Challenge Template | ||||||||||
| Line: 20 to 20 | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Provide here a description of how you have encoded the Challenge workflow. | ||||||||||
| Added: | ||||||||||
| > > |
| |||||||||
Provenance Trace
Upload a representation of the information you captured when executing the workflow. Explain the structure (provide pointers to documents describing your schemas etc.)
| Line: 57 to 61 | ||||||||||
| Provide here your conclusions on the challenge, and issues that you like to see discussed at a face to face meeting. | ||||||||||
| Changed: | ||||||||||
| < < |
-- YogeshSimmhan - 24 Jul 2006 | |||||||||
| > > |
-- YogeshSimmhan - 11 Sep 2006 | |||||||||
| Added: | ||||||||||
| > > |
| |||||||||
| Line: 1 to 1 | ||||||||
|---|---|---|---|---|---|---|---|---|
| Added: | ||||||||
| > > |
Provenance Challenge Template
In progress
Participating Team
Workflow Representation
Provide here a description of how you have encoded the Challenge workflow.
Provenance Trace
Upload a representation of the information you captured when executing the workflow. Explain the structure (provide pointers to documents describing your schemas etc.)
Provenance Queries
For each query, if your system can support your query, provide a description of how you implement the query, what result is returned; otherwise, explain whether the query is in the remit of your system. Also, make sure you complete the ProvenanceQueriesMatrix.
Suggested Workflow Variants
Suggest variants of the workflow that can exhibit capabilities that your system supports.
Suggested Queries
Suggest significant queries that your system can support and are not in the proposed list of queries, and how you have implemented/would implement them. These queries may be with regards to a variant of the workflow suggested above.
Categorisation of queries
According to your provenance approach, you may be able to provide a categorisation of queries. Can you elaborate on the categorisation and its rationale?
Live systems
If your system can be accessed live (through portal, web page, web service, or other), provide relevant information here.
Further Comments
Provide here further comments.
Conclusions
Provide here your conclusions on the challenge, and issues that you like to see discussed at a face to face meeting.
-- YogeshSimmhan - 24 Jul 2006