CESNET - Provenance Challenge Member Page | |||||||||||||||||||
| Line: 14 to 14 | |||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
all data are organised on a per-job basis.
JP collects data about job life cycle including job inputs and outputs, infrastructure state and user annotations.
| |||||||||||||||||||
| Changed: | |||||||||||||||||||
| < < |
| ||||||||||||||||||
| > > |
| ||||||||||||||||||
| |||||||||||||||||||
| Added: | |||||||||||||||||||
| > > |
| ||||||||||||||||||
| See also references and glossary at the bottom of this page. | |||||||||||||||||||
| Line: 569 to 570 | |||||||||||||||||||
| |||||||||||||||||||
| Added: | |||||||||||||||||||
| > > |
|||||||||||||||||||
| |||||||||||||||||||
| Added: | |||||||||||||||||||
| > > |
| ||||||||||||||||||
CESNET - Provenance Challenge Member Page | ||||||||||
| Line: 360 to 360 | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
Full implementation
Sample output | ||||||||||
| Changed: | ||||||||||
| < < |
TODO | |||||||||
| > > |
Unfortunately, we did not find any Monday jobs in our test database; however, there are some Thursday jobs ;-). | |||||||||
Comments
Job registration time, i.e. the submission time, is only an approximation | ||||||||||
| Line: 399 to 400 | ||||||||||
Query #1 but following the successor attribute of workflow's nodes rather than ancestor.
The output files of nodes having IPAW_STAGE = 5 are gathered and sorted to exclude multiple occurrences.
| ||||||||||
| Added: | ||||||||||
| > > |
Full implementation | |||||||||
Sample output | ||||||||||
| Changed: | ||||||||||
| < < |
TODO | |||||||||
| > > |
Available here | |||||||||
Comments
IPAW_PROGRAM = 'convert' can be used instead of IPAW_STAGE = 5 as a condition identifying the final output files. | ||||||||||
| Line: 419 to 421 | ||||||||||
Outputs
Implementation | ||||||||||
| Added: | ||||||||||
| > > |
JPIS is queried to retrieve IPAW_PROGRAM = 'align_warp' jobs having IPAW_PARAM = '-m 12'.
The result is used to seed graph search, following the successor attribute.
The search is cut at IPAW_PROGRAM = 'softmean', and its outputs are printed.
Full implementation
| |||||||||
Sample output
Comments | ||||||||||
| Added: | ||||||||||
| > > |
The actual implementation of this query takes a more efficient (though less intuitive)
approach: it follows reversed graph edges via the ancestor attribute.
In this way, JPPS queries are avoided completely and the number of JPIS queries is minimised.
| |||||||||
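The reversed traversal can be illustrated with a minimal Python sketch. It is not the Perl implementation linked above: the job table `JOBS`, the `ancestors` key and both helper functions are hypothetical stand-ins for what the JP services would return, and the sketch shows only the direction of the search (from each softmean node back towards align_warp), not how service calls are saved.

```python
# Hypothetical job table standing in for JPIS answers: jobid -> attributes.
# "ancestors" lists the direct predecessors of a node (illustrative key name).
JOBS = {
    "align-a": {"IPAW_PROGRAM": "align_warp", "IPAW_PARAM": ["-m 12", "-q"],
                "IPAW_OUTPUT": ["a.warp"], "ancestors": []},
    "reslice-a": {"IPAW_PROGRAM": "reslice", "IPAW_PARAM": [],
                  "IPAW_OUTPUT": ["a-resliced.img"], "ancestors": ["align-a"]},
    "softmean-a": {"IPAW_PROGRAM": "softmean", "IPAW_PARAM": [],
                   "IPAW_OUTPUT": ["atlas-a.img", "atlas-a.hdr"],
                   "ancestors": ["reslice-a"]},
    "align-b": {"IPAW_PROGRAM": "align_warp", "IPAW_PARAM": ["-m 5"],
                "IPAW_OUTPUT": ["b.warp"], "ancestors": []},
    "reslice-b": {"IPAW_PROGRAM": "reslice", "IPAW_PARAM": [],
                  "IPAW_OUTPUT": ["b-resliced.img"], "ancestors": ["align-b"]},
    "softmean-b": {"IPAW_PROGRAM": "softmean", "IPAW_PARAM": [],
                   "IPAW_OUTPUT": ["atlas-b.img", "atlas-b.hdr"],
                   "ancestors": ["reslice-b"]},
}


def has_matching_ancestor(jobs, jobid, program, param):
    """Walk ancestor edges from jobid; True if any ancestor ran program with param."""
    stack, seen = list(jobs[jobid]["ancestors"]), set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        attrs = jobs[current]
        if attrs["IPAW_PROGRAM"] == program and param in attrs["IPAW_PARAM"]:
            return True
        stack.extend(attrs["ancestors"])
    return False


def softmean_outputs(jobs, param="-m 12"):
    """Outputs of softmean nodes preceded, directly or indirectly, by align_warp with param."""
    outputs = []
    for jobid, attrs in jobs.items():
        if attrs["IPAW_PROGRAM"] == "softmean" and has_matching_ancestor(
                jobs, jobid, "align_warp", param):
            outputs.extend(attrs["IPAW_OUTPUT"])
    return sorted(outputs)
```

Calling `softmean_outputs(JOBS)` on this toy table returns only the outputs of `softmean-a`, because the other workflow used a different `-m` value.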
Query #7
A user has run the workflow twice, in the second instance replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant. | ||||||||||
| Added: | ||||||||||
| > > |
We use Query #1 implementation to show details of the workflows.
Then the differences are apparent -- there is one more stage of the workflow,
and IPAW_PROGRAM attribute values of the two final stages are pgmtoppm and pnmtojpeg respectively.
| |||||||||
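The comparison is done by eye in our case, but it could be automated along the following lines. This is a minimal Python sketch, assuming each run is available as a list of node attribute dictionaries; the sample data, the stage numbers of the second run and the function names are illustrative assumptions, not output of the real query client.

```python
from collections import Counter


def summarize(nodes):
    """Count how many nodes of each (stage, program) pair a run contains."""
    return Counter((n["IPAW_STAGE"], n["IPAW_PROGRAM"]) for n in nodes)


def diff_runs(first, second):
    """Return (only_in_first, only_in_second) as sorted lists of (stage, program)."""
    a, b = summarize(first), summarize(second)
    # Counter subtraction keeps only positive counts, i.e. the surplus on each side.
    return sorted((a - b).elements()), sorted((b - a).elements())


# Hypothetical tails of the two runs (stages 1-3 omitted, they are identical).
RUN_1 = [
    {"IPAW_STAGE": 4, "IPAW_PROGRAM": "slicer"},
    {"IPAW_STAGE": 5, "IPAW_PROGRAM": "convert"},
]
RUN_2 = [
    {"IPAW_STAGE": 4, "IPAW_PROGRAM": "slicer"},
    {"IPAW_STAGE": 5, "IPAW_PROGRAM": "pgmtoppm"},
    {"IPAW_STAGE": 6, "IPAW_PROGRAM": "pnmtojpeg"},
]

removed, added = diff_runs(RUN_1, RUN_2)
```

Comparing (stage, program) pairs rather than job identifiers is a deliberate choice here: identifiers differ between any two runs, so they would hide the structural change.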
Inputs | ||||||||||
| Added: | ||||||||||
| > > |
Atlas graphics file name. | |||||||||
Outputs | ||||||||||
| Added: | ||||||||||
| > > |
Formatted in the same way as for Query #1, while the different workflow nodes are displayed. | |||||||||
Implementation | ||||||||||
| Added: | ||||||||||
| > > |
The workflow is implemented using a modified JDL template. The query client is the same as for #1. | |||||||||
Sample output | ||||||||||
| Added: | ||||||||||
| > > |
TODO | |||||||||
Comments | ||||||||||
| Line: 540 to 561 | ||||||||||
| ||||||||||
| Added: | ||||||||||
| > > |
| |||||||||
| ||||||||||
| Changed: | ||||||||||
| < < |
| |||||||||
| > > |
| |||||||||
CESNET - Provenance Challenge Member Page | ||||||||||||||||||||||
| Line: 148 to 148 | ||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
How to query Job Provenance | ||||||||||||||||||||||
| Added: | ||||||||||||||||||||||
| > > |
Details on JP architecture, its components, dataflows among them,
and reasons that motivated the design are given in the cited references.
To understand our implementation of the challenge queries, one only needs to be aware
that there are two distinct querying endpoints:
| |||||||||||||||||||||
| Changed: | ||||||||||||||||||||||
| < < |
The JP is designed to provide a query interface for end users in two basic steps: | |||||||||||||||||||||
| > > |
Both querying endpoints are exposed as web-service interfaces. | |||||||||||||||||||||
| Changed: | ||||||||||||||||||||||
| < < |
| |||||||||||||||||||||
| > > |
The challenge queries are implemented as Perl scripts which call elementary clients of both the services. | |||||||||||||||||||||
The ProvenanceQueriesMatrix
Our line of the ProvenanceQueriesMatrix is here; the explanation of the query status is part of each query description.
| ||||||||||||||||||||||
| Changed: | ||||||||||||||||||||||
| < < |
| |||||||||||||||||||||
| > > |
| |||||||||||||||||||||
Query #1 | ||||||||||||||||||||||
| Line: 436 to 442 | ||||||||||||||||||||||
Query #8
A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago. | ||||||||||||||||||||||
| Changed: | ||||||||||||||||||||||
| < < |
InputsOutputsImplementationSample outputComments | |||||||||||||||||||||
| > > |
Job Provenance gathers and organises information with the grid job as the primary entity of interest. Although annotations of a job are an intrinsic part of it, direct annotations of data are not. Therefore this kind of query is not supported. Similarly to Query #9, we might introduce dummy "producer jobs" (i.e. jobs having the particular data file assigned as their output) that would carry the annotation. However, we consider this approach too artificial. | |||||||||||||||||||||
Query #9
A user has annotated some atlas graphics with key-value pair where the key is studyModality. Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files. | ||||||||||||||||||||||
| Changed: | ||||||||||||||||||||||
| < < |
||||||||||||||||||||||
| > > |
As mentioned with Query #8, JP does not provide a means of adding annotations to data directly. However, annotations can be added to jobs (via the JPPS interface), and it makes good sense to consider job outputs to be annotated with the job annotations too. | |||||||||||||||||||||
Inputs | ||||||||||||||||||||||
| Added: | ||||||||||||||||||||||
| > > |
Value of the studyModality annotation. | |||||||||||||||||||||
Outputs | ||||||||||||||||||||||
| Added: | ||||||||||||||||||||||
| > > |
List of matching graphics files, together with their additional annotations. | |||||||||||||||||||||
Implementation | ||||||||||||||||||||||
| Added: | ||||||||||||||||||||||
| > > |
We assume the annotations to be assigned to whole workflows (i.e. not to their subjobs) in the form of JP attributes
in a dedicated namespace, e.g. http://twiki.ipaw.info/Challenge/CESNET/Annotations.
Pseudocode:
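A minimal Python sketch of this logic follows. It assumes a hypothetical in-memory table `WORKFLOWS` in place of the JPIS and JPPS answers; the `center` annotation, the `nodes` layout and the function name are illustrative only. Annotation keys use the "namespace:name" form of JP attributes.

```python
# Namespace used for the annotations, as suggested in the text above.
NS = "http://twiki.ipaw.info/Challenge/CESNET/Annotations"

# Hypothetical workflows: annotations live on the workflow, files on its nodes.
WORKFLOWS = {
    "wf-1": {
        "annotations": {NS + ":studyModality": "speech", NS + ":center": "UChicago"},
        "nodes": [
            {"IPAW_STAGE": 4, "IPAW_OUTPUT": ["atlas-x.pgm"]},
            {"IPAW_STAGE": 5, "IPAW_OUTPUT": ["atlas-x.gif"]},
        ],
    },
    "wf-2": {
        "annotations": {NS + ":studyModality": "mri"},
        "nodes": [{"IPAW_STAGE": 5, "IPAW_OUTPUT": ["other-x.gif"]}],
    },
}


def annotated_graphics(workflows, values):
    """Map each final output file to the other annotations of its workflow."""
    key = NS + ":studyModality"
    result = {}
    for workflow in workflows.values():
        annotations = workflow["annotations"]
        if annotations.get(key) not in values:
            continue  # workflow is not annotated with a wanted value
        others = {k: v for k, v in annotations.items() if k != key}
        for node in workflow["nodes"]:
            if node["IPAW_STAGE"] == 5:  # final stage produces the atlas graphics
                for url in node["IPAW_OUTPUT"]:
                    result[url] = others
    return result
```

Note that the sketch needs to enumerate all annotations of a workflow, which is exactly the capability discussed in the comments of this query.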
| |||||||||||||||||||||
Sample output | ||||||||||||||||||||||
| Added: | ||||||||||||||||||||||
| > > |
TODO | |||||||||||||||||||||
Comments | ||||||||||||||||||||||
| Added: | ||||||||||||||||||||||
| > > |
Comment on the IPAW_STAGE = 5 from Query #5 fully applies here too. | |||||||||||||||||||||
| Changed: | ||||||||||||||||||||||
| < < |
||||||||||||||||||||||
| > > |
Currently neither JPPS nor JPIS supports a query "all attributes of this job".
If the annotation names are not known a priori, the following approaches are possible:
| |||||||||||||||||||||
Suggested Workflow Variants | ||||||||||||||||||||||
| Changed: | ||||||||
| < < |
| |||||||
| > > |
| |||||||
CESNET - Provenance Challenge Member Page
Work in progress | ||||||||
CESNET - Provenance Challenge Member Page | ||||||||
| Line: 379 to 379 | ||||||||
|---|---|---|---|---|---|---|---|---|
Query #5 | ||||||||
| Added: | ||||||||
| > > |
Find all Atlas Graphic images outputted from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095. | |||||||
Inputs | ||||||||
| Added: | ||||||||
| > > |
N/A | |||||||
Outputs | ||||||||
| Added: | ||||||||
| > > |
List of Atlas Graphic files matching the query. | |||||||
Implementation | ||||||||
| Added: | ||||||||
| > > |
JPIS is queried for jobs matching IPAW_HEADER = 'global_maximum 4095' (and, optionally, IPAW_PROGRAM = 'align_warp').
The results of the query (JobIds of the matching jobs) are used to seed a graph search similar to
Query #1 but following the successor attribute of the workflow's nodes rather than ancestor.
The output files of nodes having IPAW_STAGE = 5 are gathered and sorted to exclude multiple occurrences.
| |||||||
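The forward search can be sketched in Python as follows. The job table, the `successors` key and the function name are hypothetical stand-ins for the data obtained from JPIS and JPPS; the real query client is a Perl script.

```python
# Hypothetical job table: jobid -> attributes; "successors" gives forward edges.
JOBS = {
    "align-1": {"IPAW_STAGE": 1, "IPAW_HEADER": "global_maximum 4095",
                "IPAW_OUTPUT": ["a1.warp"], "successors": ["reslice-1"]},
    "align-2": {"IPAW_STAGE": 1, "IPAW_HEADER": "global_maximum 4095",
                "IPAW_OUTPUT": ["a2.warp"], "successors": ["reslice-2"]},
    "reslice-1": {"IPAW_STAGE": 2, "IPAW_OUTPUT": ["a1-resliced.img"],
                  "successors": ["softmean"]},
    "reslice-2": {"IPAW_STAGE": 2, "IPAW_OUTPUT": ["a2-resliced.img"],
                  "successors": ["softmean"]},
    "softmean": {"IPAW_STAGE": 3, "IPAW_OUTPUT": ["atlas.img", "atlas.hdr"],
                 "successors": ["slicer-x", "slicer-y"]},
    "slicer-x": {"IPAW_STAGE": 4, "IPAW_OUTPUT": ["atlas-x.pgm"],
                 "successors": ["convert-x"]},
    "slicer-y": {"IPAW_STAGE": 4, "IPAW_OUTPUT": ["atlas-y.pgm"],
                 "successors": ["convert-y"]},
    "convert-x": {"IPAW_STAGE": 5, "IPAW_OUTPUT": ["atlas-x.gif"], "successors": []},
    "convert-y": {"IPAW_STAGE": 5, "IPAW_OUTPUT": ["atlas-y.gif"], "successors": []},
}


def atlas_graphics(jobs, header):
    """Follow successor edges from jobs matching header; collect stage-5 outputs."""
    seeds = [j for j, a in jobs.items() if a.get("IPAW_HEADER") == header]
    seen, stack, found = set(seeds), list(seeds), set()
    while stack:
        attrs = jobs[stack.pop()]
        if attrs["IPAW_STAGE"] == 5:
            found.update(attrs["IPAW_OUTPUT"])
        for nxt in attrs["successors"]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    # A set removes duplicates when several seeds reach the same file.
    return sorted(found)
```

Both seed jobs reach the same two convert nodes here, which is why the results have to be deduplicated before printing.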
Sample output | ||||||||
| Added: | ||||||||
| > > |
TODO | |||||||
Comments | ||||||||
| Added: | ||||||||
| > > |
IPAW_PROGRAM = 'convert' can be used instead of IPAW_STAGE = 5 as a condition identifying the final output files. Alternatively, they can be identified as outputs of nodes which have no successors. The code can be also easily modified to record the graph traversal (details on workflow nodes) leading to a particular file, and display it with the file in a similar way as in previous queries. | |||||||
Query #6 | ||||||||
| Added: | ||||||||
| > > |
Find all output averaged images of softmean (average) procedures, where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12." | |||||||
Inputs | ||||||||
| Line: 405 to 420 | ||||||||
Query #7 | ||||||||
| Added: | ||||||||
| > > |
A user has run the workflow twice, in the second instance replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant. | |||||||
Inputs | ||||||||
| Line: 418 to 434 | ||||||||
Query #8 | ||||||||
| Added: | ||||||||
| > > |
A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago. | |||||||
Inputs | ||||||||
| Line: 431 to 448 | ||||||||
Query #9 | ||||||||
| Added: | ||||||||
| > > |
A user has annotated some atlas graphics with key-value pair where the key is studyModality. Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files. | |||||||
Inputs | ||||||||
CESNET - Provenance Challenge Member Page | ||||||||
| Line: 336 to 336 | ||||||||
|---|---|---|---|---|---|---|---|---|
| of lower stage number, the search could be cut at IPAW_STAGE = 3, similarly to Query #2. | ||||||||
| Added: | ||||||||
| > > |
Query #4
Find all invocations of procedure align_warp using a twelfth order nonlinear 1365 parameter model (see model menu describing possible values of parameter "-m 12" of align_warp) that ran on a Monday.
Inputs
N/A
Outputs
Time, stage, program name, inputs, outputs of the matching workflow nodes
Implementation
JPIS is queried for jobs matching IPAW_PROGRAM = 'align_warp' and IPAW_PARAM = '-m 12'. Among the other attributes, the job registration time is also retrieved, and the output is filtered to jobs that ran on a Monday. Full implementation
Sample output
TODO
Comments
Job registration time, i.e. the submission time, is only an approximation of the running time (the job may have spent a long time in a queue). The actual job run time is available in the LB trace, though the current JP implementation cannot extract it yet. Therefore this is only a technical restriction, not one of principle. The filter "ran on Monday" is quite challenging. Currently, we implement it at the client side, which is not a scalable solution. However, the JP concept foresees a solution of the issue via an already defined interface to type plugins. A plugin, for a concrete type, defines the following methods:
isWeekDay(x) that would be transformed at query time to an expression referring
to the new column.
Therefore the condition would be evaluated at the SQL level, i.e. in the most effective way.
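The client-side filter used at present can be sketched in Python as follows. The record layout, the `registered` field name and the sample dates are assumptions made for illustration; the real client works on attribute values returned by JPIS.

```python
from datetime import datetime

# Hypothetical JPIS results: align_warp jobs with their registration time.
JOBS = [
    {"jobid": "align-mon", "IPAW_PARAM": "-m 12, -q",
     "registered": datetime(2006, 8, 21, 10, 30)},  # a Monday
    {"jobid": "align-thu", "IPAW_PARAM": "-m 12, -q",
     "registered": datetime(2006, 8, 24, 9, 15)},   # a Thursday
]

MONDAY = 0  # datetime.weekday() numbers Monday as 0 and Sunday as 6


def registered_on(jobs, weekday=MONDAY):
    """Keep only the jobs whose registration time falls on the given weekday."""
    return [job["jobid"] for job in jobs if job["registered"].weekday() == weekday]
```

Every candidate job has to be transferred to the client before it can be discarded, which is why this approach does not scale and why evaluating the condition inside the database is preferable.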
Query #5InputsOutputsImplementationSample outputCommentsQuery #6InputsOutputsImplementationSample outputCommentsQuery #7InputsOutputsImplementationSample outputCommentsQuery #8InputsOutputsImplementationSample outputCommentsQuery #9InputsOutputsImplementationSample outputComments | |||||||
Suggested Workflow Variants
Suggest variants of the workflow that can exhibit capabilities that your system supports. | ||||||||
CESNET - Provenance Challenge Member Page | ||||||||||
| Line: 174 to 174 | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
URL of the queried Atlas X Graphic file
Outputs | ||||||||||
| Changed: | ||||||||||
| < < |
| |||||||||
| > > |
List of nodes (subjobs) of the workflow that contributed to the queried file: | |||||||||
| ||||||||||
| Line: 201 to 200 | ||||||||||
Full implementation
Sample output | ||||||||||
| Added: | ||||||||||
| > > |
The output below is truncated and reformatted; the original output is available here. | |||||||||
|
| ||||||||||
| Changed: | ||||||||||
| < < |
TODO | |||||||||
| > > |
$ ./query1.pl gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/blabla-x.gif 2>/dev/null
Results
===
jobid https://skurut1.cesnet.cz:9000/hvkpZCsRsiqrxs5K_bo7Ew:
attr IPAW_STAGE: 5
attr IPAW_PROGRAM: convert
attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/blabla-x.pgm
attr IPAW_OUTPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/blabla-x.gif
attr CE: skurut17.cesnet.cz:2119/jobmanager-lcgpbs-voce
jobid https://skurut1.cesnet.cz:9000/02ZaAADKyebzggYPp4M9tA:
attr IPAW_STAGE: 4
attr IPAW_PROGRAM: slicer
attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/blabla.hdr
gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/blabla.img
attr IPAW_OUTPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/blabla-x.pgm
attr CE: skurut17.cesnet.cz:2119/jobmanager-lcgpbs-voce
jobid https://skurut1.cesnet.cz:9000/wGMnTvCILtiSTi7ZOQwfTQ:
attr IPAW_STAGE: 3
attr IPAW_PROGRAM: softmean
attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy1-resliced.img
...
attr IPAW_OUTPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/blabla.img
gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/blabla.hdr
attr CE: skurut17.cesnet.cz:2119/jobmanager-lcgpbs-voce
jobid https://skurut1.cesnet.cz:9000/9d0XMwfPuefR9woAFkDplQ:
attr IPAW_STAGE: 2
attr IPAW_PROGRAM: reslice
attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy3.warp
...
attr IPAW_OUTPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy3-resliced.img
...
attr CE: skurut17.cesnet.cz:2119/jobmanager-lcgpbs-voce
jobid https://skurut1.cesnet.cz:9000/RglBtUz0IzwSeM32KLnHPg:
attr IPAW_STAGE: 2
attr IPAW_PROGRAM: reslice
attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy4.warp
...
...
jobid https://skurut1.cesnet.cz:9000/wdWQHL0-RXkd3VeNcSrTaw:
attr IPAW_STAGE: 2
attr IPAW_PROGRAM: reslice
attr IPAW_PARAM:
attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy1.warp
...
...
jobid https://skurut1.cesnet.cz:9000/xwIsN2JgGfsRuvYwh0QXsw:
attr IPAW_STAGE: 2
attr IPAW_PROGRAM: reslice
attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy2.warp
...
...
jobid https://skurut1.cesnet.cz:9000/yM3sz8v6WCIPgi5-0m8L4w:
attr IPAW_STAGE: 1
attr IPAW_PROGRAM: align_warp
attr IPAW_PARAM: -m 12, -q
attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy4.img
gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/reference.img
attr IPAW_OUTPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy4.warp
attr CE: skurut17.cesnet.cz:2119/jobmanager-lcgpbs-voce
jobid https://skurut1.cesnet.cz:9000/s47ihjBHQXqPkkNwA2iazg:
attr IPAW_STAGE: 1
attr IPAW_PROGRAM: align_warp
attr IPAW_PARAM: -m 12, -q
attr IPAW_INPUT: gsiftp://umbar.ics.muni.cz:1414/home/mulac/pch06/anatomy2.img
...
...
...
| |||||||||
Comments | ||||||||||
| Line: 212 to 290 | ||||||||||
| Moreover, the queries could be combined together in order to retrieve all attributes of a job in a single hit. | ||||||||||
| Added: | ||||||||||
| > > |
Query #2
Find the process that led to Atlas X Graphic, excluding everything prior to the averaging of images with softmean.
Inputs
URL of the queried Atlas X Graphic file
Outputs
Same as for Query #1
Implementation
Exactly the same as Query #1, with the graph search cut once a node with IPAW_PROGRAM = 'softmean' is found. Full implementation
Sample output
Almost the same as Query #1, with only nodes up to softmean.
Available here.
Query #3
Find the Stage 3, 4 and 5 details of the process that led to Atlas X Graphic.
Inputs
URL of the queried Atlas X Graphic file
Outputs
Same as for Query #1
Implementation
Exactly the same as Query #1, with the final output filtered to contain only jobs having IPAW_STAGE equal to 3, 4 or 5. Full implementation
Sample output
Almost the same as Query #1, with only nodes having IPAW_STAGE equal to 3, 4 or 5. Available here.
Comments
The implementation is not optimal but it is more general: we do not impose any special semantics on the value of the IPAW_STAGE attribute. With the additional knowledge that a node is preceded in the workflow only by nodes of lower stage number, the search could be cut at IPAW_STAGE = 3, similarly to Query #2. | |||||||||
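The two strategies for Query #3 can be compared with a minimal Python sketch. The job table and the function names are hypothetical; the sketch assumes a single linear chain of nodes for brevity.

```python
# Hypothetical linear workflow: jobid -> stage and direct ancestors.
JOBS = {
    "convert": {"IPAW_STAGE": 5, "ancestors": ["slicer"]},
    "slicer": {"IPAW_STAGE": 4, "ancestors": ["softmean"]},
    "softmean": {"IPAW_STAGE": 3, "ancestors": ["reslice"]},
    "reslice": {"IPAW_STAGE": 2, "ancestors": ["align"]},
    "align": {"IPAW_STAGE": 1, "ancestors": []},
}


def walk(jobs, seed, expand=lambda attrs: True):
    """Visit ancestors of seed; expand decides whether to go beyond a node."""
    visited, stack = [], [seed]
    while stack:
        jobid = stack.pop()
        if jobid in visited:
            continue
        visited.append(jobid)
        if expand(jobs[jobid]):
            stack.extend(jobs[jobid]["ancestors"])
    return visited


def filtered(jobs, seed, wanted=(3, 4, 5)):
    """General variant: search everything, filter the final output by stage."""
    return [j for j in walk(jobs, seed) if jobs[j]["IPAW_STAGE"] in wanted]


def with_cut(jobs, seed, lowest=3):
    """Cut variant: valid only if ancestors always have lower stage numbers."""
    return walk(jobs, seed, expand=lambda attrs: attrs["IPAW_STAGE"] > lowest)
```

On this table both variants return the same three nodes, but `with_cut` never visits the stage 1 and 2 jobs, which is the saving mentioned in the comment above.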
Suggested Workflow Variants | ||||||||||
| Line: 256 to 378 | ||||||||||
-- JiriSitera - 22 Aug 2006
| ||||||||||
| Added: | ||||||||||
| > > |
| |||||||||
| ||||||||||
| Added: | ||||||||||
| > > |
| |||||||||
CESNET - Provenance Challenge Member Page | ||||||||||||||||||||||||||||
| Line: 9 to 9 | ||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||||||||||||||||||
| Changed: | ||||||||||||||||||||||||||||
| < < |
| |||||||||||||||||||||||||||
| > > |
| |||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||
| Line: 33 to 33 | ||||||||||||||||||||||||||||
| The job is the only way the user can access computational resources in gLite. Although not strictly limited to them, gLite is designed to support traditional batch, i.e. non-interactive, jobs. | ||||||||||||||||||||||||||||
| Changed: | ||||||||||||||||||||||||||||
| < < |
Upon creation the job is assigned a unique immutable Job Identifier (jobid). The jobid is used to refer to the job all the time during the job life and afterwards. | |||||||||||||||||||||||||||
| > > |
Upon creation the job is assigned a unique immutable Job Identifier (JobId). The JobId is used to refer to the job throughout its life and afterwards. | |||||||||||||||||||||||||||
| The user describes the job (i.e. executable, parameters, input files etc.) in the Job Description Language (JDL), which uses the extensible Classified Advertisement (classad) syntax. | ||||||||||||||||||||||||||||
| Line: 93 to 93 | ||||||||||||||||||||||||||||
| Upload a representation of the information you captured when executing the workflow. Explain the structure (provide pointers to documents describing your schemas etc.) | ||||||||||||||||||||||||||||
| Deleted: | ||||||||||||||||||||||||||||
| < < |
As noted above, when the execution of workflow is finished, the JP service is designed to collect all available traces
of the workflow's life from various Grid subsystems. The end user of JP sees all that available data transformed in form of JP attributes.
Those attributes (key/value pairs) are digested from the collected traces by JP plug-in modules, hiding internal structure, syntax and other implementation details from the JP user (at least in the first approximation; the raw files are still available in case additional processing is needed). So at this level the provenance trace of the executed workflow is represented by JP attributes and their values connected to each subjob (node) of the workflow. The following table summarizes the important attributes and their meaning:
| |||||||||||||||||||||||||||
| Changed: | ||||||||||||||||||||||||||||
| < < |
Each job in a Grid is identified by its JobId. This string is the key to any job-related operation, including getting any information about the job from the LB service. But the JP service is designed to provide tools to find "interesting" jobs based on their attributes (or characteristics). Describe role of JPPS and JPIS and the fact that we are using basic tools to access these services, whereas a more sophisticated user interface is expected to be available for ordinary users in the future. A DAG is a set of jobs. Each DAG node (subjob) has its own JobId and its set of attributes. Desc. PARENT and SUCCESSOR/ANCESTOR. For a more detailed description of the Job Provenance service architecture and its usage please see the JP user's guide. This document also contains a user reference to the command line tools used in the provenance queries described below. | |||||||||||||||||||||||||||
| > > |
As noted above, when the execution of the workflow is finished, the JP service can collect traces
of the workflow's life from various Grid subsystems.
Currently only LB is instrumented to provide the trace; however,
the encompassed data are rich and completely sufficient for the challenge.
The LB trace is uploaded as a raw LB dumpfile; three sample snapshots are available
here (files dump[123]).
JP provides the user with an interface to retrieve such raw files, and their format is public in principle
(NetLogger ULM according to draft-abela-05,
LB specific fields are documented in LB User's Guide).
However, access to the raw files is not supposed to be a typical JP usage.
On the contrary,
the end user of JP sees all the available data transformed into the form of logical JP attributes,
"namespace:name = value" pairs.
Attribute values
are digested from the raw traces by JP plug-in modules, hiding internal structure, syntax, format version, and other implementation
details.
At this level the provenance trace of the executed workflow is represented by a set of JP attributes
and their values assigned to both the workflow and all its subjobs (nodes).
There are the following classes of attributes:
softmean must have been preceded by 4 reslice nodes in the challenge workflow,
there are 4 occurrences of the ancestor attribute of the softmean nodes.
For the specific implementation of the challenge workflow we use LB user tags
to store additional information about the workflow nodes.
JP turns these values into attributes of the 4th kind on the list above.
The following table summarizes their meaning:
| |||||||||||||||||||||||||||
Provenance Queries | ||||||||||||||||||||||||||||
CESNET - Provenance Challenge Member Page | ||||||||||||||
| Line: 145 to 145 | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Query #1 | ||||||||||||||
| Changed: | ||||||||||||||
| < < |
| |||||||||||||
| > > |
Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.
Inputs
URL of the queried Atlas X Graphic file
Outputs
Implementation
The query is implemented as a graph search where the vertices are nodes of the DAG and oriented edges are given by the ANCESTOR attribute. The search is seeded with a JPIS query, retrieving the JobId of the last node of the workflow which produced the queried file directly, i.e. typically the convert utility.
Pseudocode:
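A minimal Python sketch of such a search is given here for illustration. It is not the Perl implementation: the table `JOBS` stands in for the answers of JPIS and JPPS, and the job identifiers, the `ancestors` key and both function names are hypothetical.

```python
from collections import deque

# Hypothetical stand-in for JPIS/JPPS answers: jobid -> attributes.
JOBS = {
    "convert-x": {"IPAW_OUTPUT": ["atlas-x.gif"], "ancestors": ["slicer-x"]},
    "slicer-x": {"IPAW_OUTPUT": ["atlas-x.pgm"], "ancestors": ["softmean"]},
    "softmean": {"IPAW_OUTPUT": ["atlas.img", "atlas.hdr"],
                 "ancestors": ["reslice-1", "reslice-2"]},
    "reslice-1": {"IPAW_OUTPUT": ["a1-resliced.img"], "ancestors": ["align-1"]},
    "reslice-2": {"IPAW_OUTPUT": ["a2-resliced.img"], "ancestors": ["align-2"]},
    "align-1": {"IPAW_OUTPUT": ["a1.warp"], "ancestors": []},
    "align-2": {"IPAW_OUTPUT": ["a2.warp"], "ancestors": []},
}


def find_producer(jobs, url):
    """Seed query: the job that lists url among its outputs, or None."""
    for jobid, attrs in jobs.items():
        if url in attrs["IPAW_OUTPUT"]:
            return jobid
    return None


def provenance(jobs, url):
    """Breadth-first search over ancestor edges, starting at the producer of url."""
    seed = find_producer(jobs, url)
    if seed is None:
        return []
    order, seen, queue = [], {seed}, deque([seed])
    while queue:
        jobid = queue.popleft()
        order.append(jobid)  # the real client prints the job attributes here
        for parent in jobs[jobid]["ancestors"]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return order
```

The breadth-first order lists the nodes from the final stage back to the first one, which matches the ordering of the sample output of this query.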
Sample output
TODO
Comments
In the implementation we trade off performance for readability. Namely, with suitable configuration of JPIS, all the JPPS queries, which may easily become a bottleneck of the whole system, could be avoided. Moreover, the queries could be combined together in order to retrieve all attributes of a job in a single hit. | |||||||||||||
Suggested Workflow Variants | ||||||||||||||
CESNET - Provenance Challenge Member Page | ||||||||
| Line: 60 to 60 | ||||||||
|---|---|---|---|---|---|---|---|---|
| TODO: references JDL, WMS, LB | ||||||||
| Changed: | ||||||||
| < < |
The workflow as a DAG in a Grid | |||||||
| > > |
Challenge workflow
We implement the challenge workflow as a gLite DAG job. The structure of the DAG follows the specified workflow exactly, with the following mapping:
gsiftp:// protocol (solving also access control -- a running gLite job possesses delegated user credentials).
Consequently, the data items are identified with their full URL in our implementation.
We might have used the gLite data services, identifying files with GUIDs or logical file names.
However, this approach would make the implementation more obscure while not exhibiting any
important provenance features.
We provide a template for the workflow JDL.
It contains placeholders for the data files;
details on instantiating and submitting it with gLite command-line tools can be found
at this page.
| |||||||
| Deleted: | ||||||||
| < < |
To show the nature of Job Provenance (JP) we must use the context of Grid.
The reader doesn't need a deep understanding of Grid, but we must introduce
many terms and tools that forms a Grid environment.
First of all, the basic entity of interest of user and designer of a Grid is
a job. A job is an application or task performed on Grid resources, in our case
we are thinking about a batch job, which means jobs which are prepared and
submitted into the Grid to be run outside the user interactive login session.
The workflow is represented as a DAG, sequence of jobs with structure
described in the form of a directed acyclic graph. Such a set of jobs
can be submitted into various Grid implementations. For more information
about DAG see also Condor DAG tutorial.
The Grid environment
We are using a gLite middleware based Grid to run this DAG. The gLite Workload Management System (WMS) runs the DAG nodes (workflow procedures, represented as subjobs) in the proper order at available computing elements, taking care of all job life-cycle related issues (input/output files, security, etc.). The DAG life-cycle is monitored in gLite by the Logging and Bookkeeping service (LB), which collects all available information about the jobs. There is also the Job Provenance (JP) service, whose primary goal is to combine all information about jobs (LB dumps, input/output collections called sandboxes, accounting logs, etc.) and store it for the long term with a query interface available.
The actual workflow implementation
The workflow representation we prepared is fully functional and uses the real data files and binaries to execute the steps of the workflow. To handle input/output files in this experiment we use a GridFTP server and appropriate Grid aware tools. The input/output files are identified in the form of URLs. The DAG control file describing the workflow can be found here. For a detailed description of how to submit and run the workflow on a Grid see this page. | |||||||
Provenance Trace | ||||||||
CESNET - Provenance Challenge Member Page | ||||||||
| Line: 9 to 9 | ||||||||
|---|---|---|---|---|---|---|---|---|
| ||||||||
| Changed: | ||||||||
| < < |
| |||||||
| > > |
| |||||||
| ||||||||
| Line: 20 to 22 | ||||||||
Workflow Representation | ||||||||
| Changed: | ||||||||
| < < |
The workflow as a DAG in a GRID | |||||||
| > > |
Job Provenance was developed as a part of the gLite middleware.
Although its design is more general, capable of handling virtually any Grid jobs,
the current implementation supports only gLite jobs,
and we use gLite to implement the Provenance Challenge workflow.
Therefore we provide a brief overview of the relevant parts of job processing in gLite
before the actual description of the workflow implementation.
gLite job processing in a nutshell
The job is the only way the user can access computational resources in gLite. Although not strictly limited to them, gLite is designed to support traditional batch, i.e. non-interactive, jobs. Upon creation the job is assigned a unique immutable Job Identifier (jobid). The jobid is used to refer to the job throughout its life and afterwards. The user describes the job (i.e. executable, parameters, input files etc.) in the Job Description Language (JDL), which uses the extensible Classified Advertisement (classad) syntax. The description may grow fairly complex, including requirements on the execution environment, proximity of input and output storage etc. Processing of the job can be summarised as follows:
The workflow as a DAG in a Grid
To show the nature of Job Provenance (JP) we must use the context of Grid. The reader doesn't need a deep understanding of Grid, but we must introduce many terms and tools that form a Grid environment. | |||||||
| Changed: | ||||||||
| < < |
To show the nature of Job Provenance (JP) we must use the context of GRID. The reader doesn't need a deep understanding of GRIDs, but we must introduce many terms and tools that forms a GRID environment. First of all, the basic entity of interest of user and designer of a GRID is a job. A job is an application or task performed on GRID resources, in our case | |||||||
| > > |
First of all, the basic entity of interest of user and designer of a Grid is a job. A job is an application or task performed on Grid resources, in our case | |||||||
| we are thinking about a batch job, which means jobs which are prepared and | ||||||||
| Changed: | ||||||||
| < < |
submitted into the GRID to be run outside the user interactive login session. | |||||||
| > > |
submitted into the Grid to be run outside the user interactive login session. | |||||||
| The workflow is represented as a DAG, sequence of jobs with structure described in the form of a directed acyclic graph. Such a set of jobs | ||||||||
| Changed: | ||||||||
| < < |
can be submitted into various GRID implementations. For more information | |||||||
| > > |
can be submitted into various Grid implementations. For more information | |||||||
| about DAG see also Condor DAG tutorial. | ||||||||
| Changed: | ||||||||
| < < |
The GRID environment | |||||||
| > > |
The Grid environment | |||||||
| Changed: | ||||||||
| < < |
We are using gLite middleware based GRID to run this DAG. The gLite | |||||||
| > > |
We are using gLite middleware based Grid to run this DAG. The gLite | |||||||
| Workload Management System (WMS) manage to run the DAG nodes (workflow procedures, represented as subjobs) in proper order at available computing elements, taking care about all job life-cycle related | ||||||||
| Line: 54 to 94 | ||||||||
| The workflow representation we prepared is fully functional and uses the real data files and binaries to execute the steps of the workflow. To handle input/output files in this | ||||||||
| Changed: | ||||||||
| < < |
experiment we use a GridFTP? server and appropriate GRID aware tools. | |||||||
| > > |
experiment we use a GridFTP? server and appropriate Grid aware tools. | |||||||
| The input/output files are identified in the form of URLs. The DAG control file describing the workflow can be found here. | ||||||||
| Changed: | ||||||||
| < < |
For detailed description how to submit and run the workflow on a GRID see this page. | |||||||
| > > |
For detailed description how to submit and run the workflow on a Grid see this page. | |||||||
Provenance Trace
Upload a representation of the information you captured when executing the workflow. Explain the structure (provide pointers to documents describing your schemas etc.) As noted above, when the execution of workflow is finished, the JP service is designed to collect all available traces | ||||||||
| Changed: | ||||||||
| < < |
of the workflow's life from various GRID subsystems. The end user of JP sees all the available data transformed into JP attributes. | |||||||
| > > |
of the workflow's life from various Grid subsystems. The end user of JP sees all the available data transformed into JP attributes. | |||||||
Those attributes (key/value pairs) are digested from the traces collected by JP plug-in modules, hiding internal structure, syntax and other implementation details from the JP user (at least in the first approximation; the raw files are still available in case additional processing is needed). So at this level the provenance trace of the executed workflow is represented by JP attributes and their values connected to each subjob (node) of the workflow. The following table summarizes the important attributes and their meaning:
| ||||||||
| Line: 84 to 124 | ||||||||
| ||||||||
| Changed: | ||||||||
| < < |
Each job in a GRID is identified by its JobId. This string is the key to any job-related operation, including getting any information about the job from the LB service. The JP service, however, is designed to provide tools to find "interesting" jobs based on their attributes (or characteristics). Describe role of JPPS and JPIS and the fact that we are using basic tools to access these services, whereas a more sophisticated user interface is expected to be available for ordinary | |||||||
| > > |
Each job in a Grid is identified by its JobId. This string is the key to any job-related operation, including getting any information about the job from the LB service. The JP service, however, is designed to provide tools to find "interesting" jobs based on their attributes (or characteristics). Describe role of JPPS and JPIS and the fact that we are using basic tools to access these services, whereas a more sophisticated user interface is expected to be available for ordinary | |||||||
| users in the future. A DAG is a set of jobs. Each DAG node (subjob) has its own JobId and its own set of attributes. Desc. PARENT and SUCCESSOR/ANCESTOR. | ||||||||
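The idea of selecting jobs by attributes and then following ancestor links between DAG nodes can be illustrated with a small, purely in-memory sketch. It does not talk to JPIS or JPPS. The JobId strings, the `ancestors` key and both helper functions are hypothetical names chosen for illustration; only the attribute names IPAW_PROGRAM and IPAW_STAGE are taken from the query descriptions on this page.

```python
from collections import deque

# Hypothetical stand-in for JP data: JobId -> attributes (key/value pairs).
# The JobId strings and the "ancestors" key are illustrative assumptions,
# not the real JP attribute names or identifiers.
JOBS = {
    "job-align-1": {"IPAW_PROGRAM": "align_warp", "IPAW_STAGE": 1, "ancestors": []},
    "job-reslice-1": {"IPAW_PROGRAM": "reslice", "IPAW_STAGE": 2, "ancestors": ["job-align-1"]},
    "job-softmean": {"IPAW_PROGRAM": "softmean", "IPAW_STAGE": 3, "ancestors": ["job-reslice-1"]},
    "job-slicer-x": {"IPAW_PROGRAM": "slicer", "IPAW_STAGE": 4, "ancestors": ["job-softmean"]},
    "job-convert-x": {"IPAW_PROGRAM": "convert", "IPAW_STAGE": 5, "ancestors": ["job-slicer-x"]},
}


def find_jobs(jobs, **conditions):
    """Return the sorted JobIds whose attributes match every given condition."""
    return sorted(
        job_id
        for job_id, attrs in jobs.items()
        if all(attrs.get(key) == value for key, value in conditions.items())
    )


def all_ancestors(jobs, job_id):
    """Return the sorted JobIds of all direct and indirect ancestors of job_id."""
    seen = set()
    queue = deque(jobs[job_id]["ancestors"])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a node may be reachable along several paths in a DAG
        seen.add(current)
        queue.extend(jobs[current]["ancestors"])
    return sorted(seen)


if __name__ == "__main__":
    for final_job in find_jobs(JOBS, IPAW_STAGE=5):
        print(final_job, "<-", all_ancestors(JOBS, final_job))
```

Here the "interesting" jobs are those with IPAW_STAGE = 5, and the breadth-first walk collects every job that contributed to them. A real query would obtain the matching JobIds and their attributes from the JP services instead of a dictionary.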
| Line: 156 to 196 | ||||||||
| ||||||||
| Changed: | ||||||||
| < < |
| |||||||
| > > |
| |||||||
| ||||||||
| Changed: | ||||||||
| < < |
| |||||||
| > > |
| |||||||
| -- CESNET JRA1 team | ||||||||
CESNET - Provenance Challenge Member Page | |||||||||||||||||||||||||||||
| Line: 107 to 107 | |||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| |||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||
| < < |
To query the JP for this challenge we use simple command-line tools. We envision two ways of using the JP service. The first is a specialized tool designed by a key user in the appropriate application area. The second is a multipurpose GUI application, equipped with a general query construction interface and some form of visualization tool. We emulate the first way in our provenance challenge query implementation. For each query we have one Perl script, whose purpose is to prepare and perform the query (usually consisting of a few actual queries to JPIS and JPPS) and provide the results as text output. | ||||||||||||||||||||||||||||
| > > |
To query the JP for this challenge we use simple command-line tools. We envision two ways of using the JP service. The first is a specialized tool designed by a key user in the appropriate application area. It will also contain the appropriate output processing based on the needs of the area's members. The second is a multipurpose GUI application, equipped with a general query construction interface and some form of general visualization tool. We emulate the first way in our provenance challenge query implementation. For each query we have one Perl script, whose purpose is to prepare and perform the query (usually consisting of a few actual queries to JPIS and JPPS) and provide the results as text output. The appropriate visualization of results is out of the scope of our work. | ||||||||||||||||||||||||||||
The ProvenanceQueriesMatrix | |||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||
| < < |
|||||||||||||||||||||||||||||
| > > |
Our line of the ProvenanceQueriesMatrix is here; the explanation of the query status is part of each query description.
| ||||||||||||||||||||||||||||