<<O>>  Difference Topic CESNET2 (r1.8 - 25 Jun 2007 - AlesKrenek)

META TOPICPARENT ParticipatingTeams

Second Provenance Challenge -- CESNET

Line: 293 to 293

  1. OK, output
  2. Impossible, missing align_warp parameters
  3. Impossible, missing global maximum parameter
Changed:
<
<
  1. OK, output
>
>
  1. Impossible, missing align_warp parameters

  1. Not addressed in Challenge 2
  2. Out of scope of JP
  3. Impossible, missing studyModality annotation
 <<O>>  Difference Topic CESNET2 (r1.7 - 25 Jun 2007 - JiriSitera)

Line: 497 to 497

  • The global maximum parameter and studyModality annotation are not supported, therefore queries 5 and 9 can't be run.

MyGrid

Changed:
<
<
  • There are numerous "producer" processes which seem to take no input and produce auxiliary input taken by the "regular" processes. We failed to find a sufficiently dicriminating criterion to identify such processes automatically.
  • Both align_warp parameters and global maximum are present in the format, however, their naming is ambiguous (String Value and Ontology) according to our understanding. Therefore we could have not extracted them from the format.
  • Physical filenames are not present
  • In general, the file format is rather difficult to undertand and parse.
>
>
  • As described on the MyGrid team page, each (workflow) input and output file is represented by its own "pseudoprocess" that generates it. The same holds for each file on the boundary of a workflow part. Although a sufficiently discriminating criterion for identifying such processes automatically probably exists (className of the process: BeanShellProcessor versus StringConstantProcessor), we did not implement it.
  • Both the align_warp parameters and the global maximum are present in the format; however, as far as we understand, their naming is ambiguous (the parameter key is String Value, and the global maximum seems to be encoded in Ontology:4095). Therefore we could not extract them from the format.
  • Physical filenames are not present.
  • In general, the file format is rather difficult to understand and parse.

Karma

 <<O>>  Difference Topic CESNET2 (r1.6 - 25 Jun 2007 - JiriSitera)

Line: 244 to 244

Changed:
<
<
Modified workflow -- TODO
>
>
Modified workflow Not addressed in this challenge.

Model Integration Results

Line: 364 to 365

  1. Out of scope of JP
  2. Impossible, missing studyModality annotation
Changed:
<
<

MindSwap?

>
>

MINDSWAP


Changed:
<
<
TODO
>
>
Import graph

Added:
>
>
Provenance Query summary:
  1. OK, output
  2. OK, output
  3. OK, output
  4. OK (wrong parameter format in MINDSWAP), output
  5. ipaw_header missing.
  6. OK, output
  7. Not addressed in Challenge 2
  8. Out of scope of JP
  9. Impossible, missing studyModality annotation

Heterogeneous workflows

Line: 398 to 409

  1. Impossible, studyModality annotation missing in SDG data
Deleted:
<
<

ES3-CESNET-Karma

TODO


ES3-MyGrid-SDG

Changed:
<
<
TODO
>
>
Import graph

Provenance Query summary:

  1. OK, output
  2. OK, output
  3. OK, output
  4. ipaw_param not present in ES3
  5. ipaw_head not present in ES3
  6. ipaw_param not present in ES3
  7. Not addressed in Challenge 2
  8. Out of scope of JP
  9. Impossible, studyModality annotation missing in SDG data

MyGrid-ES3-SDG

Changed:
<
<
TODO
>
>
Import graph

The graph contains a number of "producer" nodes from MyGrid.

Provenance Query summary:

  1. OK. output
  2. OK. output
  3. OK. output
  4. ipaw_param not present in MyGrid
  5. ipaw_head not present in MyGrid
  6. ipaw_param not present in MyGrid
  7. Not addressed in Challenge 2
  8. Out of scope of JP
  9. Impossible, studyModality annotation missing in SDG data

Karma-SDG2-MINDSWAP2

Import graph

Provenance Query summary:

  1. OK. output
  2. OK. output
  3. OK. output
  4. OK. output
  5. ipaw_head not present in Karma
  6. OK. output
  7. Not addressed in Challenge 2
  8. Out of scope of JP
  9. Impossible, studyModality annotation missing in MINDSWAP data

Translation Details

Line: 463 to 509

SDG

Changed:
<
<
TODO
>
>
  • stage is missing; we supply its value as a parameter of the translator.
  • In general, the format is easy to understand.

Changed:
<
<

MindSwap?

>
>

MINDSWAP


Changed:
<
<
TODO
>
>
  • global maximum is missing, which makes query 5 impossible.
  • There is probably a bug in the output/input files between stages 2 and 3. The reslice jobs produce image and header files, but the softmean job imports headers twice (some in the hasInputImage tag and some in the hasInputHeader tag) and no image.
  • Another small bug is in the parameters of the align_warp jobs: "-m 12" is stored as "-m -12".
  • In general, the file format is rather difficult to understand.

Benchmarks

Describe your proposed benchmark queries, how the comparable quantities are determined, and the results of applying the benchmark to your own system

Added:
>
>
On Fri, 22 Jun 2007, Simon Miles wrote: There is nothing particular to prepare for this prior to the workshop, though having thought about possible suitable scenarios or queries that would make suitable benchmarks would be welcome when we come to discuss it.

Further Comments

Provide here further comments.

Line: 519 to 571

META FILEATTACHMENT cks-q4.log attr="" comment="" date="1182517731" path="cks-q4.log" size="13685" user="AlesKrenek" version="1.1"
META FILEATTACHMENT cks-q5.log attr="" comment="" date="1182517751" path="cks-q5.log" size="1405" user="AlesKrenek" version="1.1"
META FILEATTACHMENT cks-q6.log attr="" comment="" date="1182517764" path="cks-q6.log" size="1143" user="AlesKrenek" version="1.1"
Added:
>
>
META FILEATTACHMENT ems-q1.log attr="" comment="es3-mygrid-sdg2 query 1" date="1182774841" path="ems-q1.log" size="9280" user="JiriSitera" version="1.1"
META FILEATTACHMENT ems-q2.log attr="" comment="es3-mygrid-sdg2 query 2" date="1182774981" path="ems-q2.log" size="2053" user="JiriSitera" version="1.1"
META FILEATTACHMENT ems-q3.log attr="" comment="es3-mygrid-sdg2 query 3" date="1182774999" path="ems-q3.log" size="4141" user="JiriSitera" version="1.1"
META FILEATTACHMENT mes-q1.log attr="" comment="mygrid-es3-sdg2 query 1" date="1182775848" path="mes-q1.log" size="13056" user="JiriSitera" version="1.1"
META FILEATTACHMENT mes-q2.log attr="" comment="mygrid-es3-sdg2 query 2" date="1182775871" path="mes-q2.log" size="2058" user="JiriSitera" version="1.1"
META FILEATTACHMENT mes-q3.log attr="" comment="mygrid-es3-sdg2 query 3" date="1182775889" path="mes-q3.log" size="2046" user="JiriSitera" version="1.1"
META FILEATTACHMENT ksm-q1.log attr="" comment="karma-sdg2-mindswap2 query 1" date="1182776690" path="ksm-q1.log" size="7430" user="JiriSitera" version="1.1"
META FILEATTACHMENT ksm-q2.log attr="" comment="karma-sdg2-mindswap2 query 2" date="1182776706" path="ksm-q2.log" size="1935" user="JiriSitera" version="1.1"
META FILEATTACHMENT ksm-q3.log attr="" comment="karma-sdg2-mindswap2 query 3" date="1182776724" path="ksm-q3.log" size="950" user="JiriSitera" version="1.1"
META FILEATTACHMENT ksm-q4.log attr="" comment="karma-sdg2-mindswap2 query 4" date="1182776749" path="ksm-q4.log" size="15761" user="JiriSitera" version="1.1"
META FILEATTACHMENT ksm-q6.log attr="" comment="karma-sdg2-mindswap2 query 6" date="1182776767" path="ksm-q6.log" size="993" user="JiriSitera" version="1.1"
META FILEATTACHMENT ems.ps attr="" comment="" date="1182777031" path="ems.ps" size="22032" user="JiriSitera" version="1.1"
META FILEATTACHMENT mes.ps attr="" comment="" date="1182777043" path="mes.ps" size="32210" user="JiriSitera" version="1.1"
META FILEATTACHMENT ksm.ps attr="" comment="" date="1182777111" path="ksm.ps" size="16174" user="JiriSitera" version="1.1"
META FILEATTACHMENT mindswap-q1.log attr="" comment="" date="1182777271" path="mindswap-q1.log" size="6412" user="JiriSitera" version="1.1"
META FILEATTACHMENT mindswap-q2.log attr="" comment="" date="1182777288" path="mindswap-q2.log" size="1889" user="JiriSitera" version="1.1"
META FILEATTACHMENT mindswap-q3.log attr="" comment="" date="1182777302" path="mindswap-q3.log" size="2942" user="JiriSitera" version="1.1"
META FILEATTACHMENT mindswap-q4.log attr="" comment="" date="1182777315" path="mindswap-q4.log" size="15588" user="JiriSitera" version="1.1"
META FILEATTACHMENT mindswap-q6.log attr="" comment="" date="1182777327" path="mindswap-q6.log" size="2395" user="JiriSitera" version="1.1"
META FILEATTACHMENT mindswap.ps attr="" comment="" date="1182777400" path="mindswap.ps" size="17684" user="JiriSitera" version="1.1"
 <<O>>  Difference Topic CESNET2 (r1.5 - 25 Jun 2007 - JiriSitera)

Line: 51 to 51

Query implementation

Added:
>
>
The query implementation remains unchanged from the first challenge, except for the small adaptations described in the next paragraphs.

Executable naming

The First Challenge query scripts used hardcoded executable names.

 <<O>>  Difference Topic CESNET2 (r1.4 - 22 Jun 2007 - AlesKrenek)

Line: 15 to 15

Note here any changes in your provenance representation, workflow enactment or system since the first challenge. Alternatively, if you did not participate in the first challenge, please provide the same details as were required for those who did (particularly workflow representation and provenance representation).

Added:
>
>

Implicit workflow representation


The CESNET implementation of the First Provenance Challenge relied on an explicit representation of workflow structure that was extracted from the native workflow representation in gLite -- dependencies among
Line: 26 to 28

Instead, dependence between two workflow processes is inherited from data: process A is marked as an ancestor of B (and, vice versa, B as a successor of A) if there is a data file F that is an output of A and an input of B.
Added:
>
>
Logical filenames are considered for this purpose (the name of the file elements in the format definition below), not physical filenames (the content of the url elements).

For the purpose of the challenge we implement this process in

Changed:
<
<
an external ``sew'' script.
>
>
an external "sew" script.

The script is seeded with one or more process identifiers and queries JP recursively; data dependences (common input-output files) are traversed
Line: 44 to 49

The mechanism of generating such notifications is already available in JP. It is used in the communication of JP Primary storage and JP Index server.
Added:
>
>

Query implementation

Executable naming

The First Challenge query scripts used hardcoded executable names. This was not a problem because the names matched exactly the values recorded by our implementation of the workflow.

However, the naming varies among the teams; e.g. it may or may not contain the absolute path to the executable. Therefore the scripts had to be parametrized so that they can be run with the names appropriate for the particular data source.
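
A minimal sketch of this kind of parametrization follows. It is not the actual query script: the table of names, the source labels, and the job records are hypothetical and only show how one script can be run against data sources that record the executable differently.

```python
# Hypothetical per-source executable names; the real values depend on
# what each team's provenance data actually records.
EXECUTABLE_NAMES = {
    "source-with-bare-names": {"align_warp": "align_warp"},
    "source-with-paths": {"align_warp": "/opt/air/bin/align_warp"},
}


def jobs_running(jobs, source, program):
    """Return ids of jobs whose recorded executable matches `program`
    under the naming convention of the given data source."""
    wanted = EXECUTABLE_NAMES[source][program]
    return [job["id"] for job in jobs if job["program"] == wanted]
```

The query logic stays the same for every source; only the lookup table changes.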

Timestamps

JP starts gathering data on a job at virtually the same time the job is submitted to the Grid. Therefore, during the First Challenge, we could use the time of job registration with JP to approximate the job run time quite accurately. (Queries on the exact execution time were not implemented in JP at that time.)

This is no longer true in the Second Challenge. The job is registered with JP when the data are imported, i.e. typically much later than its real execution.

The query scripts were adjusted to use the true execution time.


Provenance Data for Workflow Parts

Give links here to your provenance data files for the workflow parts of the challenge: three parts for the original workflow and three parts for the modified workflow (as per provenance query 7). The data files could be attached to the results page.

Line: 203 to 234

with the exception of IPAW_INPUT and IPAW_OUTPUT which are mapped specifically in this format.
Deleted:
<
<

Process vs. data provenance

TODO

Full workflow data

Original workflow
Line: 253 to 281

ES3

Added:
>
>
Import graph

Provenance Query summary:

  1. OK, output
  2. OK, output
  3. OK, output
  4. Impossible, missing align_warp parameters
  5. Impossible, missing global maximum parameter
  6. OK, output
  7. Not addressed in Challenge 2
  8. Out of scope of JP
  9. Impossible, missing studyModality annotation

TODO:

  • what are the additional three processes (coming from stage 3) in the graph?
  • upload query #6 results

Karma

Added:
>
>
Import graph

More complicated due to duplicated arcs. This is caused by using different logical names for .img and .hdr pairs of files (unlike CESNET format which groups them together under a single logical name). Otherwise the graph matches expectations exactly.
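
As an illustration of why the arcs are duplicated, the sketch below collapses arcs that differ only in the .img/.hdr extension. It is an assumption-laden example, not part of our translator: the arc tuples and function names are hypothetical.

```python
import os


def logical_name(filename):
    """Strip a trailing .img or .hdr so both files of a pair share one name."""
    base, ext = os.path.splitext(filename)
    return base if ext in (".img", ".hdr") else filename


def dedupe_arcs(arcs):
    """Collapse (producer, consumer, filename) arcs that differ only by
    the .img/.hdr extension of the file name."""
    return sorted({(src, dst, logical_name(name)) for src, dst, name in arcs})
```

With one logical name per pair, each producer-consumer dependence appears once in the graph.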

Provenance Query summary:

  1. OK, output
  2. OK, output
  3. OK, output
  4. OK, output
  5. Impossible, missing global maximum parameter
  6. OK, output
  7. Not addressed in Challenge 2
  8. Out of scope of JP
  9. Not implemented. studyModality annotation is present, should be doable

TODO: more comments on Q9


MyGrid

Added:
>
>
Import graph

The graph contains a number of "producer" nodes (see Translation Details below); a manually adjusted version (with these nodes removed) meets the expectation.

Provenance Query summary:

  1. OK, output
  2. OK, output
  3. OK, output
  4. Not implemented, information on align_warp parameters is present but not processed by our translator
  5. Impossible, global maximum parameter may be present in the j.0:global tag, however, the name is not unique, so the translator can't rely on it
  6. Not implemented, information on align_warp parameters is present but not processed by our translator
  7. Not addressed in Challenge 2
  8. Out of scope of JP
  9. Not implemented. studyModality annotation is present, should be doable


SDG

Added:
>
>
Import graph

The graph contains the first row of "producer" jobs, otherwise it matches expectations.

Provenance Query summary:

  1. OK, output
  2. OK, output
  3. OK, output
  4. OK, output
  5. OK, output
  6. OK, output
  7. Not addressed in Challenge 2
  8. Out of scope of JP
  9. Impossible, missing studyModality annotation

MindSwap?

TODO


Heterogeneous workflows

Added:
>
>
Most of the challenge queries are affected by availability of data in a particular part of the workflow. Therefore, in general, the results of heterogeneous queries follow the results of the homogeneous queries on the involved provenance system.

In particular:

  • Q4, Q6: align_warp parameters, follow results of workflow part 1
  • Q5: global maximum parameter, workflow part 1 again
  • Q9: studyModality annotation, part 3
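
The rule above can be sketched in a few lines. This is an illustration only, not part of our tooling: the function name and the form of the result table are assumptions, while the mapping of queries to workflow parts is the one listed above.

```python
# Workflow part whose data decide the outcome of each data-dependent query.
DECIDING_PART = {4: 1, 5: 1, 6: 1, 9: 3}


def heterogeneous_summary(systems, homogeneous):
    """Predict results of the data-dependent queries for a combination.

    `systems` lists the provenance systems supplying parts 1, 2 and 3;
    `homogeneous` maps system name -> {query number: result string}.
    """
    return {
        query: homogeneous[systems[part - 1]][query]
        for query, part in DECIDING_PART.items()
    }
```

For a combination A-B-C, queries 4 to 6 take the result of system A and query 9 takes the result of system C.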


CESNET-Karma-SDG

Added:
>
>
Import graph

Provenance Query summary:

  1. OK, output
  2. OK, output
  3. OK, output
  4. OK, output
  5. OK, output
  6. OK, output
  7. Not addressed in Challenge 2
  8. Out of scope of JP
  9. Impossible, studyModality annotation missing in SDG data

ES3-CESNET-Karma

Added:
>
>
TODO

ES3-MyGrid-SDG

Added:
>
>
TODO

MyGrid-ES3-SDG

Added:
>
>
TODO

Translation Details

Describe details regarding how data models were translated (or otherwise used to answer the query following the team's approach), any data which was absent from a downloaded model, and whether this affected the possibility of translation or successful provenance query, and any data which was excluded in translation from a downloaded model because it was extraneous

Line: 295 to 433

  • one provenance system directories: conversion tools for the particular format, specific parts of the automatic translation and import of homogeneous workflows
  • three provenance systems directories: specific code for translation and import of this particular heterogeneous workflow
Added:
>
>
JP assigns a job owner (X509 certificate subject) to each process. There seems to be no analogue in the other formats, therefore we supplied the value as a parameter of the translators.

Most of the formats don't explicitly include information on the part of the workflow (which matches the notion of stage in our format). This was also supplied as an additional parameter of the translator.


ES3

Added:
>
>
  • Different logical names are used for the .hdr and .img files of a pair (although we understand these files to be tightly coupled). Consequently, duplicate dependences among workflow processes are detected.
  • File names are not consistent across boundaries of the workflow parts (e.g. reslice outputs are not the same as softmean inputs). We believe this to be an artifact of the challenge data rather than a feature of the system, though, and we fixed the problem by manually renaming the files accordingly.
  • Arguments of align_warp seem to be defined according to the Challenge 1 example; however, these data are missing in Challenge 2.
  • The global maximum parameter and studyModality annotation are not supported, therefore queries 5 and 9 can't be run.

MyGrid

Added:
>
>
  • There are numerous "producer" processes which seem to take no input and produce auxiliary input taken by the "regular" processes. We failed to find a sufficiently discriminating criterion to identify such processes automatically.
  • Both the align_warp parameters and the global maximum are present in the format; however, as far as we understand, their naming is ambiguous (String Value and Ontology). Therefore we could not extract them from the format.
  • Physical filenames are not present.
  • In general, the file format is rather difficult to understand and parse.

Karma

Added:
>
>
  • global maximum is missing, making query 5 impossible
  • Explicit identifiers of the process instance were missing. We used concatenation of workflowNodeID and serviceID, believing it to be sufficiently unique.
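
A sketch of the identifier construction, together with a check of the uniqueness assumption, is shown below. The separator and the record layout are hypothetical; only the two field names come from the Karma data.

```python
from collections import Counter


def make_process_id(workflow_node_id, service_id, sep="|"):
    """Concatenate the two Karma fields into one process identifier.
    The separator is an arbitrary choice for this sketch."""
    return f"{workflow_node_id}{sep}{service_id}"


def find_collisions(records):
    """Return identifiers that would be shared by more than one record."""
    counts = Counter(
        make_process_id(r["workflowNodeID"], r["serviceID"]) for r in records
    )
    return sorted(pid for pid, n in counts.items() if n > 1)
```

An empty result from find_collisions on the imported records supports the belief that the concatenation is unique for the data at hand.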

SDG

Added:
>
>
TODO

MindSwap?

Added:
>
>
TODO

Benchmarks

Describe your proposed benchmark queries, how the comparable quantities are determined, and the results of applying the benchmark to your own system

Line: 317 to 478

Provide here your conclusions on the challenge, and issues that you like to see discussed at a face to face meeting.

Added:
>
>
TODO (ljocha)

-- SimonMiles - 26 Oct 2006

-- AlesKrenek - 19 Feb 2007

Line: 324 to 487

META FILEATTACHMENT out1.xml attr="" comment="Original workflow, part1" date="1172007675" path="out1.xml" size="31008" user="AlesKrenek" version="1.1"
META FILEATTACHMENT out2.xml attr="" comment="Original workflow, part2" date="1172007934" path="out2.xml" size="4987" user="AlesKrenek" version="1.1"
META FILEATTACHMENT out3.xml attr="" comment="Original workflow, part3" date="1172007961" path="out3.xml" size="19679" user="AlesKrenek" version="1.1"
Added:
>
>
META FILEATTACHMENT es3.ps attr="" comment="ES3 import graph" date="1182510366" path="es3.ps" size="16510" user="AlesKrenek" version="1.1"
META FILEATTACHMENT es3-q1.log attr="" comment="Query #1 results" date="1182512078" path="es3-q1.log" size="6643" user="AlesKrenek" version="1.1"
META FILEATTACHMENT es3-q2.log attr="" comment="Query #2 results" date="1182512111" path="es3-q2.log" size="2009" user="AlesKrenek" version="1.1"
META FILEATTACHMENT es3-q3.log attr="" comment="Query #3 results" date="1182512151" path="es3-q3.log" size="1987" user="AlesKrenek" version="1.1"
META FILEATTACHMENT karma.ps attr="" comment="Karma import graph" date="1182512585" path="karma.ps" size="20985" user="AlesKrenek" version="1.1"
META FILEATTACHMENT karma-q1.log attr="" comment="" date="1182512650" path="karma-q1.log" size="7188" user="AlesKrenek" version="1.1"
META FILEATTACHMENT karma-q2.log attr="" comment="" date="1182512660" path="karma-q2.log" size="2253" user="AlesKrenek" version="1.1"
META FILEATTACHMENT karma-q3.log attr="" comment="" date="1182512671" path="karma-q3.log" size="2295" user="AlesKrenek" version="1.1"
META FILEATTACHMENT karma-q4.log attr="" comment="" date="1182512682" path="karma-q4.log" size="3126" user="AlesKrenek" version="1.1"
META FILEATTACHMENT karma-q6.log attr="" comment="" date="1182512694" path="karma-q6.log" size="7735" user="AlesKrenek" version="1.1"
META FILEATTACHMENT mygrid.ps attr="" comment="MyGrid import graph" date="1182514080" path="mygrid.ps" size="45018" user="AlesKrenek" version="1.1"
META FILEATTACHMENT mygrid2.ps attr="" comment="" date="1182514817" path="mygrid2.ps" size="21894" user="AlesKrenek" version="1.1"
META FILEATTACHMENT mygrid-q1.log attr="" comment="" date="1182515348" path="mygrid-q1.log" size="16129" user="AlesKrenek" version="1.1"
META FILEATTACHMENT mygrid-q2.log attr="" comment="" date="1182515362" path="mygrid-q2.log" size="3513" user="AlesKrenek" version="1.1"
META FILEATTACHMENT mygrid-q3.log attr="" comment="" date="1182515397" path="mygrid-q3.log" size="5576" user="AlesKrenek" version="1.1"
META FILEATTACHMENT sdg.ps attr="" comment="" date="1182516654" path="sdg.ps" size="29161" user="AlesKrenek" version="1.1"
META FILEATTACHMENT sdg-q1.log attr="" comment="" date="1182517320" path="sdg-q1.log" size="8191" user="AlesKrenek" version="1.1"
META FILEATTACHMENT sdg-q2.log attr="" comment="" date="1182517336" path="sdg-q2.log" size="1336" user="AlesKrenek" version="1.1"
META FILEATTACHMENT sdg-q3.log attr="" comment="" date="1182517351" path="sdg-q3.log" size="1323" user="AlesKrenek" version="1.1"
META FILEATTACHMENT sdg-q4.log attr="" comment="" date="1182517367" path="sdg-q4.log" size="1893" user="AlesKrenek" version="1.1"
META FILEATTACHMENT sdg-q5.log attr="" comment="" date="1182517388" path="sdg-q5.log" size="1310" user="AlesKrenek" version="1.1"
META FILEATTACHMENT sdg-q6.log attr="" comment="" date="1182517403" path="sdg-q6.log" size="605" user="AlesKrenek" version="1.1"
META FILEATTACHMENT cks.ps attr="" comment="CESNET-Karma-SDG import" date="1182517631" path="cks.ps" size="15783" user="AlesKrenek" version="1.1"
META FILEATTACHMENT cks-q1.log attr="" comment="" date="1182517692" path="cks-q1.log" size="5700" user="AlesKrenek" version="1.1"
META FILEATTACHMENT cks-q2.log attr="" comment="" date="1182517706" path="cks-q2.log" size="1952" user="AlesKrenek" version="1.1"
META FILEATTACHMENT cks-q3.log attr="" comment="" date="1182517719" path="cks-q3.log" size="3957" user="AlesKrenek" version="1.1"
META FILEATTACHMENT cks-q4.log attr="" comment="" date="1182517731" path="cks-q4.log" size="13685" user="AlesKrenek" version="1.1"
META FILEATTACHMENT cks-q5.log attr="" comment="" date="1182517751" path="cks-q5.log" size="1405" user="AlesKrenek" version="1.1"
META FILEATTACHMENT cks-q6.log attr="" comment="" date="1182517764" path="cks-q6.log" size="1143" user="AlesKrenek" version="1.1"
 <<O>>  Difference Topic CESNET2 (r1.3 - 21 Jun 2007 - AlesKrenek)

Line: 7 to 7

Participating Team

Changed:
<
<
  • Participant names: Frantisek Dvorak, Ales Krenek, Ludek Matyska, Milos Mulac, Jiri Sitera
>
>
  • Participant names: Frantisek Dvorak, Jiri Filipovic, Ales Krenek, Ludek Matyska, Milos Mulac, Jiri Sitera, Zdenek Sustr

Line: 15 to 15

Note here any changes in your provenance representation, workflow enactment or system since the first challenge. Alternatively, if you did not participate in the first challenge, please provide the same details as were required for those who did (particularly workflow representation and provenance representation).

Added:
>
>
The CESNET implementation of the First Provenance Challenge relied on an explicit representation of workflow structure that was extracted from the native workflow representation in gLite -- dependencies among DAG subjobs specified by the user on submission. These dependencies were decoded and recorded as ancestor and successor attributes of the DAG subjobs and used for query implementation.

This restriction is relaxed in the Second Challenge. Instead, dependence between two workflow processes is inherited from data: process A is marked as an ancestor of B (and, vice versa, B as a successor of A) if there is a data file F that is an output of A and an input of B.

For the purpose of the challenge we implement this process in an external ``sew'' script. The script is seeded with one or more process identifiers and queries JP recursively; data dependences (common input-output files) are traversed in both directions until the complete graph closure is found. The dependences found are recorded with the processes as the ancestor and successor attributes of the first challenge, so the implementation of the challenge queries remains unchanged in this respect.
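
The idea can be sketched as follows. This is not the sew script itself: the real script queries JP, whereas this example assumes all process records are already available in a dictionary, and the function names are hypothetical.

```python
from collections import defaultdict, deque


def direct_links(processes):
    """Derive ancestor/successor maps from shared logical file names.

    `processes` maps a process id to {"inputs": [...], "outputs": [...]}.
    """
    producers = defaultdict(set)  # file name -> processes that output it
    for pid, rec in processes.items():
        for name in rec["outputs"]:
            producers[name].add(pid)
    ancestors, successors = defaultdict(set), defaultdict(set)
    for pid, rec in processes.items():
        for name in rec["inputs"]:
            for producer in producers.get(name, ()):
                if producer != pid:
                    ancestors[pid].add(producer)
                    successors[producer].add(pid)
    return ancestors, successors


def closure(seeds, ancestors, successors):
    """Collect every process reachable from the seeds in either direction."""
    seen, queue = set(seeds), deque(seeds)
    while queue:
        pid = queue.popleft()
        for nxt in ancestors.get(pid, set()) | successors.get(pid, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Seeding the closure with a single process of a workflow is enough to collect the whole connected workflow graph, while unrelated processes stay outside.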

Currently the script is invoked on demand. However, it can be transformed into a part of the JP infrastructure -- an agent which subscribes for receiving notifications on input/output file assignments to processes, and generates the workflow dependencies automatically. The mechanism of generating such notifications is already available in JP. It is used in the communication of JP Primary storage and JP Index server.


Provenance Data for Workflow Parts

Give links here to your provenance data files for the workflow parts of the challenge: three parts for the original workflow and three parts for the modified workflow (as per provenance query 7). The data files could be attached to the results page.

Line: 32 to 61

An export utility used to generate the exchange files with JP queries is available here.
Deleted:
<
<
Currently we are working on implementation of an import plugin, a loadable module that would take this format and let JP understand it directly. See JP references given at the First Challenge page for details.

Commented example

Here we show an example of the data format.

Line: 192 to 217

Model Integration Results

Changed:
<
<
State here which combinations of teams' models you have managed to perform the provenance query over
>
>
In order to get a better understanding of the issues of translation between the provenance data models, we extend the challenge specification into two stages:
  • translation and evaluation of homogeneous workflows (i.e. data recorded in one provenance system only)
  • evaluation of heterogeneous workflows (combining data from multiple systems, as requested by the original specification)
In both stages the available data were translated and imported into JP, and the challenge queries were run. This approach allows us to focus separately on issues specific to the translation of data from a particular system, while independently discussing issues arising intrinsically from the combinations (not many, actually).

The translation and import process

Translation and eventual combination of the provenance data (see Translation tools below) is done in the following steps:
  1. separate translation of parts of the workflow from their native format to our format (as defined above)
  2. unification of the input and output file names of the softmean process (part 2) to match the outputs and inputs of parts 1 and 3
  3. adjustment of all output filenames with a unique suffix
  4. assignment of new unique id's to all the workflow processes
  5. import of the adjusted files into JP (also
  6. run the sew script to determine dependences between processes

Steps 2--4 are rather artificial and serve the purpose of the challenge only.

Unification of the names of softmean inputs/outputs is necessary to trigger the inheritance of dependences. If all the provenance systems gathered data on the same workflow execution, the matching filenames in all parts of the workflow would be the same anyway.

Similarly adding the unique suffix to all filenames allows us to run multiple imports on the same input data without the need to purge the JP database between the attempts. The same holds for assigning the new unique id's to the imported processes in step 4.
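
Steps 3 and 4 can be sketched as below. The record layout, the suffix format, and the id scheme are assumptions made for this example; the actual translation works on the XML files.

```python
import copy
import uuid


def relabel(jobs, suffix):
    """Return a copy of `jobs` with suffixed file names and fresh ids.

    `jobs` is a list of dicts with "id", "inputs" and "outputs" keys
    (a hypothetical in-memory form of the translated files).
    """
    relabelled = copy.deepcopy(jobs)
    for job in relabelled:
        # Every file name gets the same suffix, so names that matched
        # before the adjustment still match afterwards.
        job["inputs"] = [f"{name}.{suffix}" for name in job["inputs"]]
        job["outputs"] = [f"{name}.{suffix}" for name in job["outputs"]]
        # A random UUID stands in for whatever id scheme is really used.
        job["id"] = f"urn:example:job:{uuid.uuid4()}"
    return relabelled
```

Running the import twice with different suffixes yields two disjoint sets of files and processes, so earlier attempts do not interfere with later ones.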

Step 6, as a side effect, produces a graph representation of the imported data. These graphs are shown in the results section below.

Homogeneous workflows

ES3

Karma

MyGrid

SDG

Heterogeneous workflows

CESNET-Karma-SDG

ES3-CESNET-Karma

ES3-MyGrid-SDG

MyGrid-ES3-SDG


Translation Details

Describe details regarding how data models were translated (or otherwise used to answer the query following the team's approach), any data which was absent from a downloaded model, and whether this affected the possibility of translation or successful provenance query, and any data which was excluded in translation from a downloaded model because it was extraneous

Added:
>
>
The sections below briefly describe issues that arose from translating the data of particular provenance systems and importing them into JP. The list does not cover all the participating teams. We were not able to put the necessary effort into evaluating all of them, so we chose a more or less random sample, based on a very subjective and brief look at the provided data. Therefore we are not able to provide any serious assessment of the data formats of systems that are not listed in this section.

Translation tools

For the sake of easy repeatability of the data translation experiments, we implemented fully automated procedures for translating the data formats and importing the results into JP. This is done for both homogeneous and heterogeneous workflows.

Our CVS repository is organized as follows:

  • export/: JP export and import utilities, ``sew'' script for inheriting the dependences, and common code for the automated translations
  • one provenance system directories: conversion tools for the particular format, specific parts of the automatic translation and import of homogeneous workflows
  • three provenance systems directories: specific code for translation and import of this particular heterogeneous workflow

ES3

MyGrid

Karma

SDG

MindSwap?


Benchmarks

Describe your proposed benchmark queries, how the comparable quantities are determined, and the results of applying the benchmark to your own system

 <<O>>  Difference Topic CESNET2 (r1.2 - 20 Feb 2007 - AlesKrenek)

Line: 83 to 83

<!-- user annotations, including Challenge-specific; only the latter are shown -->

Changed:
<
<
1 align_warp -m 12 -q global_maximum=4095
>
>
http://egee.cesnet.cz/en/WSDL/jp-lbtag:IPAW_STAGE 1 http://egee.cesnet.cz/en/WSDL/jp-lbtag:IPAW_PROGRAM align_warp http://egee.cesnet.cz/en/WSDL/jp-lbtag:IPAW_PARAM -m 12 http://egee.cesnet.cz/en/WSDL/jp-lbtag:IPAW_PARAM -q http://egee.cesnet.cz/en/WSDL/jp-lbtag:IPAW_HEADER global_maximum=4095

Line: 167 to 182

TODO

Full workflow data

Added:
>
>
Original workflow

Changed:
<
<
TODO
>
>

Modified workflow -- TODO


Model Integration Results

Line: 194 to 214

-- AlesKrenek - 19 Feb 2007

Added:
>
>
META FILEATTACHMENT out1.xml attr="" comment="Original workflow, part1" date="1172007675" path="out1.xml" size="31008" user="AlesKrenek" version="1.1"
META FILEATTACHMENT out2.xml attr="" comment="Original workflow, part2" date="1172007934" path="out2.xml" size="4987" user="AlesKrenek" version="1.1"
META FILEATTACHMENT out3.xml attr="" comment="Original workflow, part3" date="1172007961" path="out3.xml" size="19679" user="AlesKrenek" version="1.1"
 <<O>>  Difference Topic CESNET2 (r1.1 - 19 Feb 2007 - AlesKrenek)
Line: 1 to 1
Added:
>
>
META TOPICPARENT ParticipatingTeams

Second Provenance Challenge -- CESNET

Participating Team

  • Short team name: CESNET
  • Participant names: Frantisek Dvorak, Ales Krenek, Ludek Matyska, Milos Mulac, Jiri Sitera
  • Project URL: http://egee.cesnet.cz/en/JRA1/
  • Reference to first challenge results (if participated): CESNET

Differences from First Challenge

Note here any changes in your provenance representation, workflow enactment or system since the first challenge. Alternatively, if you did not participate in the first challenge, please provide the same details as were required for those who did (particularly workflow representation and provenance representation).

Provenance Data for Workflow Parts

Give links here to your provenance data files for the workflow parts of the challenge: three parts for the original workflow and three parts for the modified workflow (as per provenance query 7). The data files could be attached to the results page.

Challenge data format

For the purpose of the Challenge, data are exported from Job Provenance in an XML format conforming to a schema available here.

The format was custom-made for the Challenge to facilitate data exchange with other teams. Nevertheless, it is a full-featured export format from Job Provenance:

  • it is generated in an automatic way from data available in JP after running the First Challenge workflow, without any manual intervention,
  • virtually all information in JP is included, even though not all of it may be needed for the Second Challenge,
  • the exported files can be taken "as is" and imported back into JP, resulting in equivalent functionality.

An export utility used to generate the exchange files with JP queries is available here.

We are currently implementing an import plugin, a loadable module that would take this format and let JP understand it directly. See the JP references given at the First Challenge page for details.

Commented example

Here we show an example of the data format. The example was hand-edited for better readability.

<?xml version="1.0"?>

<workflow xmlns="http://egee.cesnet.cz/en/Schema/JP/Challenge2">
   <exportedStages>1 2</exportedStages>

   <job id="https://skurut1.cesnet.cz:9000/yM3sz8v6WCIPgi5-0m8L4w">
      <owner>/DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Ales Krenek</owner>
      <regtime>2006-07-11T12:22:34</regtime>

<!-- input and output files of this job -->
      <inputs>
         <file name="urn:challenge:anatomy1.img">
            <url>gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy1.img</url>
            <url>gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy1.hdr</url>
         </file>
      </inputs>

      <outputs>
         <file name="urn:challenge:anatomy1_yM3sz8v6WCIPgi5-0m8L4w.warp">
            <url>gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy1_yM3sz8v6WCIPgi5-0m8L4w.warp</url>
         </file>
      </outputs>

<!-- workflow structure: jobs that precede and follow this one in the workflow -->

      <ancestors>
<!-- empty for stage 1 -->
      </ancestors>

      <successors>
<!-- note the reference to the other job below -->
         <jobid>https://skurut1.cesnet.cz:9000/wdWQHL0-RXkd3VeNcSrTaw</jobid>
      </successors>

<!-- gLite middleware processing and job execution details -->
      <gliteJobRecord>
<!-- omitted for readability --> 
      </gliteJobRecord>

<!-- user annotations, including Challenge-specific; only the latter are shown -->
      <annotations>
         <annotation name="http://egee.cesnet.cz/en/WSDL/jp-lbtag:IPAW_STAGE">1</annotation>
         <annotation name="http://egee.cesnet.cz/en/WSDL/jp-lbtag:IPAW_PROGRAM">align_warp</annotation>
         <annotation name="http://egee.cesnet.cz/en/WSDL/jp-lbtag:IPAW_PARAM">-m 12</annotation>
         <annotation name="http://egee.cesnet.cz/en/WSDL/jp-lbtag:IPAW_PARAM">-q</annotation>
         <annotation name="http://egee.cesnet.cz/en/WSDL/jp-lbtag:IPAW_HEADER">global_maximum=4095</annotation>
      </annotations>
   </job>

   <job id="https://skurut1.cesnet.cz:9000/wdWQHL0-RXkd3VeNcSrTaw">

<!-- another job in the workflow, omitted -->

   </job>

<!-- further jobs follow -->

</workflow>

The root element of the file is workflow, corresponding to an entire exported workflow or to its parts as given by the Challenge definition. The stages present in this file are listed in exportedStages.

The remaining second-level elements are job elements, each representing an individual process in the workflow. Each job is assigned a unique ID when it is processed by the gLite middleware. Besides general metadata (owner and registration time), the data are organized in the following sections:
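As a rough illustration of the top-level layout, the sketch below reads the exported stages and the job IDs from a trimmed copy of the example above. It assumes the namespace shown in the example and uses only the Python standard library; the helper name summarize is hypothetical and is not part of JP.

```python
import xml.etree.ElementTree as ET

# Namespace as it appears in the commented example.
NS = {"jp": "http://egee.cesnet.cz/en/Schema/JP/Challenge2"}

# Trimmed copy of the commented example; job contents are omitted here.
SAMPLE = """<workflow xmlns="http://egee.cesnet.cz/en/Schema/JP/Challenge2">
   <exportedStages>1 2</exportedStages>
   <job id="https://skurut1.cesnet.cz:9000/yM3sz8v6WCIPgi5-0m8L4w"/>
   <job id="https://skurut1.cesnet.cz:9000/wdWQHL0-RXkd3VeNcSrTaw"/>
</workflow>
"""


def summarize(xml_text):
    """Return (list of exported stage numbers, list of job IDs)."""
    root = ET.fromstring(xml_text)
    # exportedStages is a whitespace-separated list of stage numbers.
    stage_text = root.findtext("jp:exportedStages", default="", namespaces=NS)
    stages = [int(s) for s in stage_text.split()]
    # Every second-level job element carries its unique ID in the id attribute.
    job_ids = [job.get("id") for job in root.findall("jp:job", NS)]
    return stages, job_ids


stages, job_ids = summarize(SAMPLE)
print(stages)
print(job_ids)
```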

Inputs and outputs

file elements refer to concrete inputs and outputs of the job. The attribute name is a URI identifying the particular file uniquely. As we didn't follow any given file naming scheme in Challenge 1, custom urn: identifiers are shown in the example. However, any suitable file identifier can be used instead.

The input file name of the job shown has no suffix, as it is the input of the entire workflow and only a single set of inputs was given. In contrast, the output file name contains a unique suffix, indicating that the file was generated by a particular workflow run.

As some of the files in the Challenge workflow are in fact collections of files (.img and .hdr files), we use nested url elements (which may occur multiple times) to also denote physical file locations.
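The mapping from one logical file name to several physical locations can be sketched as follows. This is only an illustration on a trimmed copy of the example above, using the Python standard library; the helper name files_of is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"jp": "http://egee.cesnet.cz/en/Schema/JP/Challenge2"}

# Trimmed copy of the commented example (inputs and outputs only).
SAMPLE = """<workflow xmlns="http://egee.cesnet.cz/en/Schema/JP/Challenge2">
   <job id="https://skurut1.cesnet.cz:9000/yM3sz8v6WCIPgi5-0m8L4w">
      <inputs>
         <file name="urn:challenge:anatomy1.img">
            <url>gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy1.img</url>
            <url>gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy1.hdr</url>
         </file>
      </inputs>
      <outputs>
         <file name="urn:challenge:anatomy1_yM3sz8v6WCIPgi5-0m8L4w.warp">
            <url>gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy1_yM3sz8v6WCIPgi5-0m8L4w.warp</url>
         </file>
      </outputs>
   </job>
</workflow>
"""


def files_of(job, section):
    """Map each logical file name in 'inputs' or 'outputs' to its list of URLs."""
    result = {}
    for f in job.findall(f"jp:{section}/jp:file", NS):
        # A file may carry several url children, e.g. the .img/.hdr pair.
        result[f.get("name")] = [u.text for u in f.findall("jp:url", NS)]
    return result


job = ET.fromstring(SAMPLE).find("jp:job", NS)
inputs = files_of(job, "inputs")
outputs = files_of(job, "outputs")
print(inputs)
print(outputs)
```

Keeping the logical name as the key and the URLs as a list preserves the distinction between the file identifier and its physical locations.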

Workflow structure

The structure of the workflow is denoted by links between job elements using their unique identifiers, grouped in ancestors and successors. These links are present in the exported format regardless of whether their targets are exported in this part of the workflow.

The links are sufficient to "stitch" together separately exported workflow parts in a unique and reliable way. However, if they are not available explicitly, they can still be reconstructed by searching for matching inputs and outputs of the jobs.
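The fallback described above, reconstructing links by matching outputs to inputs, might look roughly like the sketch below. The two parts, the job IDs and the file name are hypothetical placeholders rather than real exported data, and the approach assumes that each file name is produced by at most one job.

```python
import xml.etree.ElementTree as ET

NS = {"jp": "http://egee.cesnet.cz/en/Schema/JP/Challenge2"}

# Two hypothetical, separately exported parts without explicit links.
PART1 = """<workflow xmlns="http://egee.cesnet.cz/en/Schema/JP/Challenge2">
   <job id="urn:example:job-A">
      <outputs><file name="urn:example:a.warp"/></outputs>
   </job>
</workflow>
"""
PART2 = """<workflow xmlns="http://egee.cesnet.cz/en/Schema/JP/Challenge2">
   <job id="urn:example:job-B">
      <inputs><file name="urn:example:a.warp"/></inputs>
   </job>
</workflow>
"""


def reconstruct_links(parts):
    """Return a set of (producer job ID, consumer job ID) pairs."""
    producers = {}  # file name -> ID of the job that outputs it
    consumers = []  # (job ID, name of a file it reads)
    for text in parts:
        root = ET.fromstring(text)
        for job in root.findall("jp:job", NS):
            job_id = job.get("id")
            for f in job.findall("jp:outputs/jp:file", NS):
                producers[f.get("name")] = job_id
            for f in job.findall("jp:inputs/jp:file", NS):
                consumers.append((job_id, f.get("name")))
    # A link exists wherever one job's output name equals another job's input name.
    return {(producers[name], job_id)
            for job_id, name in consumers if name in producers}


links = reconstruct_links([PART1, PART2])
print(links)
```

Inputs of the entire workflow have no producer and therefore yield no link, which matches the empty ancestors element of a stage 1 job.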

Job processing details

gliteJobRecord contains details on the processing of the job in the gLite middleware. It conforms to the schema originally defined for computing job statistics in the EGEE project.

These data are largely irrelevant for the Challenge and are therefore omitted in this example. However, they are present in the full exported data below.

The contained elements are either described within the schema or they are self-explanatory.

User annotations

JP allows the user to add arbitrary "namespace:name = value" annotations to the job, where "value" can have an arbitrarily complex XML structure. The same "name" can also occur multiple times. The annotations can be added either during job execution (usually via L&B, the gLite service that tracks the job during its active life) or later via the native JP interface.

The annotations of particular interest for the Challenge are shown above. They correspond to tags recorded and described in Challenge 1, with the exception of IPAW_INPUT and IPAW_OUTPUT which are mapped specifically in this format.
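Because the same annotation name may occur several times, a reader of the format has to keep a list of values per name. The sketch below does this for the annotations of the example job; it handles plain-text values only (not nested XML values), and joining the IPAW_PARAM values into a command line is an assumption made for illustration. The helper name annotations_of is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"jp": "http://egee.cesnet.cz/en/Schema/JP/Challenge2"}
# Common prefix of the Challenge-specific annotation names in the example.
TAG = "http://egee.cesnet.cz/en/WSDL/jp-lbtag:"

# Trimmed copy of the commented example (annotations only).
SAMPLE = """<workflow xmlns="http://egee.cesnet.cz/en/Schema/JP/Challenge2">
   <job id="https://skurut1.cesnet.cz:9000/yM3sz8v6WCIPgi5-0m8L4w">
      <annotations>
         <annotation name="http://egee.cesnet.cz/en/WSDL/jp-lbtag:IPAW_STAGE">1</annotation>
         <annotation name="http://egee.cesnet.cz/en/WSDL/jp-lbtag:IPAW_PROGRAM">align_warp</annotation>
         <annotation name="http://egee.cesnet.cz/en/WSDL/jp-lbtag:IPAW_PARAM">-m 12</annotation>
         <annotation name="http://egee.cesnet.cz/en/WSDL/jp-lbtag:IPAW_PARAM">-q</annotation>
         <annotation name="http://egee.cesnet.cz/en/WSDL/jp-lbtag:IPAW_HEADER">global_maximum=4095</annotation>
      </annotations>
   </job>
</workflow>
"""


def annotations_of(job):
    """Collect annotations as name -> list of text values, in document order."""
    values = defaultdict(list)
    for a in job.findall("jp:annotations/jp:annotation", NS):
        # Only the text content is read; structured XML values are ignored here.
        values[a.get("name")].append(a.text)
    return dict(values)


job = ET.fromstring(SAMPLE).find("jp:job", NS)
anns = annotations_of(job)
# Assumption: IPAW_PARAM values are command-line arguments in their original order.
command = " ".join(anns[TAG + "IPAW_PROGRAM"] + anns[TAG + "IPAW_PARAM"])
print(command)
```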

Process vs. data provenance

TODO

Full workflow data

TODO

Model Integration Results

State here which combinations of teams' models you have managed to perform the provenance query over

Translation Details

Describe details regarding how data models were translated (or otherwise used to answer the query following the team's approach), any data which was absent from a downloaded model, and whether this affected the possibility of translation or successful provenance query, and any data which was excluded in translation from a downloaded model because it was extraneous

Benchmarks

Describe your proposed benchmark queries, how the comparable quantities are determined, and the results of applying the benchmark to your own system

Further Comments

Provide here further comments.

Conclusions

Provide here your conclusions on the challenge, and issues that you would like to see discussed at a face to face meeting.

-- SimonMiles - 26 Oct 2006

-- AlesKrenek - 19 Feb 2007

Revision r1.1 - 19 Feb 2007 - 17:11 - AlesKrenek
Revision r1.8 - 25 Jun 2007 - 21:17 - AlesKrenek