<<O>>  Difference Topic ThirdProvenanceChallenge (r1.21 - 17 Jun 2009 - PaulGroth)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 7 to 7

Current Status

Changed:
<
<
  • Teams should have largely completed the challenge and be working on their slides for the workshop.
>
>
  • The Third Provenance Challenge Workshop was a success. It resulted in several proposals for changes in the OPM specification, additional profiles for OPM, a governance model, a CFP for a journal paper based on PC3 results, and thoughts on a future fourth provenance challenge.

Changed:
<
<

  • We now have 28 confirmed attendees.
>
>

Workshop Details


Added:
>
>

Participating Teams

 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.20 - 03 Jun 2009 - PaulGroth)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 11 to 11

Changed:
<
<
  • We now have 31 confirmed attendees.
>
>
  • We now have 28 confirmed attendees.

Participating Teams

 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.19 - 02 Jun 2009 - PaulGroth)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 7 to 7

Current Status

Changed:
<
<
  • Teams should have exported provenance to OPM. They should now be importing OPM from other teams and running queries over them.
>
>
  • Teams should have largely completed the challenge and be working on their slides for the workshop.

Changed:
<
<
  • Registration for the workshop is open. (See LocalDetailsPC3 for info on how to register.)
>
>
  • We now have 31 confirmed attendees.


Participating Teams

 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.18 - 07 May 2009 - PaulGroth)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 7 to 7

Current Status

Changed:
<
<
Teams should be implementing the workflow below, running the core queries and starting to export OPM formatted data.
>
>
  • Teams should have exported provenance to OPM. They should now be importing OPM from other teams and running queries over them.

Changed:
<
<
Workshop details can be found at LocalDetailsPC3.
>
>

  • Registration for the workshop is open. (See LocalDetailsPC3 for info on how to register.)

Participating Teams

 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.17 - 05 May 2009 - PaulGroth)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 7 to 7

Current Status

Changed:
<
<
Teams should be implementing the workflow below, running the core queries and starting to export OPM formatted data. Workshop details can be found at LocalDetailsPC3.
>
>
Teams should be implementing the workflow below, running the core queries and starting to export OPM formatted data.

Workshop details can be found at LocalDetailsPC3.


Participating Teams

 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.16 - 22 Apr 2009 - PaulGroth)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 38 to 38

4. Export OPM Graphs and import from others [Apr 13 - May 4]

Changed:
<
<
5. Run queries on imported OPM graph [Apr 27 - Jun 1]
>
>
5. Run queries on imported OPM graph [May 4 - Jun 1]

6. Prepare slides for challenge [Jun 1 - Jun 8]

 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.15 - 15 Apr 2009 - PaulGroth)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 7 to 7

Current Status

Changed:
<
<
Teams should be implementing the workflow below, running the core queries and starting to export OPM formatted data.
>
>
Teams should be implementing the workflow below, running the core queries and starting to export OPM formatted data. Workshop details can be found at LocalDetailsPC3.

Participating Teams

 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.14 - 12 Apr 2009 - PaulGroth)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 5 to 5

The top-level page for the third provenance challenge.
Deleted:
<
<


Current Status

Changed:
<
<
The Challenge has started. Teams should be implementing the workflow below. We are still looking for suggested queries. Participating teams should set up a page to document their results as outlined in the next section.
>
>
Teams should be implementing the workflow below, running the core queries and starting to export OPM formatted data.

Participating Teams

Pages for each participating team can be found at the ParticipatingTeams3 page. If you are participating, please create a link to your team's page there. You can use the Test Team page as a template for what should be included in a team page.

Sponsors

Changed:
<
<
Thanks to our sponsor, the Virtual Laboratory for e-Science.
>
>
Thanks to our sponsors, the Virtual Laboratory for e-Science and Microsoft.

Changed:
<
<
vl-e sponsor logo
>
>
vl-e sponsor logo microsoft logo

Schedule

Line: 75 to 81

META FILEATTACHMENT logo.gif attr="h" comment="vl-e sponsor logo" date="1238431089" path="logo.gif" size="3614" user="PaulGroth" version="1.1"
Added:
>
>
META FILEATTACHMENT microsoft.jpg attr="h" comment="Microsoft logo" date="1239500475" path="microsoft.jpg" size="3112" user="PaulGroth" version="1.1"
 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.13 - 04 Apr 2009 - PaulGroth)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 13 to 13

Participating Teams

Changed:
<
<
Pages for each participating team can be found at the ParticipatingTeams3 page. If you are participating, please create a link to your team's page there. You can use the USC/ISI page as a template for what should be included in a team page.
>
>
Pages for each participating team can be found at the ParticipatingTeams3 page. If you are participating, please create a link to your team's page there. You can use the Test Team page as a template for what should be included in a team page.

Sponsors

Thanks to our sponsor, the Virtual Laboratory for e-Science.
 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.12 - 31 Mar 2009 - PaulGroth)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 56 to 56

Provenance Challenge Workflow

Changed:
<
<
Load Workflow (Control Flow) diagram | png format
[PNG] | [PDF] | [PPTX]
>
>

The PC3 workflow and its software implementation in .Net, Java, and shell scripts can be found at the ThirdPCWorkflow page. Below is the background of the workflow.


Background

The Pan-STARRS project is building and operating the next-generation sky survey, with the ability to scan the visible sky continuously once a week and build a time series of data. This helps detect moving objects that could potentially impact Earth, in addition to building a massive catalog of the solar system and 99% of the visible stars in the northern hemisphere. The collaboration is led by the University of Hawai'i, which operates the telescope and image pipeline, while Johns Hopkins University is building the object data management (ODM) framework that is exposed to astronomers. The load workflow used in PC3 sits at the handoff between the image pipeline and the ODM, and uses the Trident workbench to ingest incoming CSV files into SQL Server databases.
Line: 68 to 69

Alex Szalay (Johns Hopkins University)

Deleted:
<
<

Workflow


Changed:
<
<

Activities

Activity | Input | Output
* Pre-Load Section *
IsCSVReadyFileExists: Checks for existence of the CSV Batch root directory and the csv_ready.csv manifest file. | string CSVRootPathInput: Path to the root directory of the CSV Batch. | bool IsCSVReadyFileExistsOutput: Returns true if the given CSV Batch root directory and the csv_ready.csv manifest file within it exist in the file system. False otherwise.
ReadCSVReadyFile: Reads the contents of the csv_ready.csv manifest file and creates a CSVFileEntry to hold metadata for each CSV file listed in the manifest. | string CSVRootPathInput: Path to the root directory of the CSV Batch. | List<CSVFileEntry> ReadCSVReadyFileOutput: List of CSVFileEntry objects read from the csv_ready.csv manifest file in the root directory. Each CSVFileEntry contains the FilePath of a CSV file to be loaded, the HeaderPath to a header file with the list of data columns, the RowCount of rows in the file, the TargetTable name in the database, and the MD5 Checksum for the file. The ColumnsNames are not populated by this activity.
IsMatchCSVFileTables: Checks if all tables to be loaded have corresponding CSV data files listed in the manifest. | List<CSVFileEntry> FileEntriesInput: List of CSVFileEntry objects read from the manifest file. | bool IsMatchCSVFileTablesOutput: Returns true if all tables have matching CSV files. False otherwise.
IsExistsCSVFile: Checks for existence of the CSV data file and Header file listed in the manifest. | CSVFileEntry FileEntryInput: A CSVFileEntry read from the manifest file. | bool IsExistsCSVFileOutput: Returns true if the CSV data file and Header file exist in the file system. False otherwise.
ReadCSVFileColumnNames: Reads the list of column names present in the CSV data file from the Header file. | CSVFileEntry FileEntryInput: A CSVFileEntry read from the manifest file. | CSVFileEntry FileEntryOutput: The input CSVFileEntry updated with the ColumnsNames field populated from the Header file.
IsMatchCSVFileColumnNames: Checks if all columns expected for a target table are present in the corresponding CSV data file. | CSVFileEntry FileEntryInput: A CSVFileEntry read from the manifest file with column names populated. | bool IsMatchCSVFileColumnNamesOutput: Returns true if the column names present in the CSV data file match the expected column names for the target table to load the CSV file into. False otherwise.
* Load Section *
CreateEmptyLoadDB: Creates an empty 'Load' database with a static list of tables to load the CSV Batch into. | string JobID: Unique job (i.e. batch) identifier for the CSV Batch being loaded. | DatabaseEntry CreateEmptyLoadDBOutput: A DatabaseEntry with the DBName, unique DBGuid and ConnectionString for the newly created database instance.
LoadCSVFileIntoTable: Loads a single CSV data file into the corresponding table in the Load database. | DatabaseEntry DBEntry: A DatabaseEntry with the target table to load the CSV data file into; CSVFileEntry FileEntry: A CSVFileEntry to load into the corresponding target table in the database. | bool LoadCSVFileIntoTableOutput: Returns true if the CSV data file was successfully loaded into the target table. False otherwise.
UpdateComputedColumns: Updates the computed columns in the target table that was loaded. These derived columns would have been "empty" (-999) in the CSV data file. | DatabaseEntry DBEntry: A DatabaseEntry with the target table already loaded from the CSV data file; CSVFileEntry FileEntry: A CSVFileEntry containing the name of the target table in the database to update. | bool UpdateComputedColumnsOutput: Returns true if the derived columns were successfully updated from existing columns. False otherwise.
* Post-Load Section *
IsMatchTableRowCount: Checks if the number of rows loaded into the table matches the expected number of rows in the CSV data file. | DatabaseEntry DBEntry: A DatabaseEntry with the target table loaded and updated; CSVFileEntry FileEntry: A CSVFileEntry containing the expected number of rows in the CSV data file and the target table name. | bool IsMatchTableRowCountOutput: Returns true if the number of rows in the target table equals the expected number of rows in the CSV data file.
IsMatchTableColumnRanges: Checks if the data loaded into table columns falls within the range of values expected for each column. | DatabaseEntry DBEntry: A DatabaseEntry with the target table loaded and updated; CSVFileEntry FileEntry: A CSVFileEntry containing the name of the target table in the database whose column ranges are validated. | bool IsMatchTableColumnRangesOutput: Returns true if the data values of columns in the target table fall within the expected range. False otherwise.
CompactDatabase: Shrinks the database after all write operations complete. | DatabaseEntry DBEntry: A DatabaseEntry with all tables loaded and validated. | void

-- YogeshSimmhan - 17 Feb 2009

Provenance Challenge Software Setup

Overview

Flavors

C# .NET + SQL Server Flavor (Windows Platform)

Pre-requisites

  • .Net Framework v3.5 SP1
  • SQL Server 2008 (Express or better)
  • Windows XP/Vista/2003/2008
  • (Optional) Visual Studio 2008

Folder Organization

Setup/Compile/Build/Run

Batch Scripting/Executable Version

Java + Derby Flavor (All Platforms)

Pre-requisites

  • Java Development Kit 1.6
  • Apache Derby 10.4.2.0 (included)
  • Apache Ant 1.7.1

Folder Organization

Setup/Compile/Build/Run

Shell Scripting/Executable Version

Frequently Asked Questions

General

C# .NET

SQL Server

Java

Derby

Downloads

  • PC3.tar.gz: Pan-STARRS Load Workflow Code & Sample Data | tar.gz Format | MD5 38bf4848d591cbac4048b3034c313443
  • PC3.zip: Pan-STARRS Load Workflow Code & Sample Data | ZIP format | MD5 c4eecc8639070a61e9d73993ffb76f97

  • Pan-STARRS Load Workflow Code (C# .Net Flavor Only)
  • Pan-STARRS Load Workflow Code (Java Flavor Only)

  • Sample Data
  • Job J062941 Sample Data (9K)
  • Job J062942 Sample Data (32K)
  • Job J062943 Sample Data (294K)
  • Job J062944 Sample Data (1.86M)
  • Job J062945 Sample Data (9.9M)

-- YogeshSimmhan - 17 Feb 2009

-- YogeshSimmhan - 04 Feb 2009

META FILEATTACHMENT PC3.tar.gz attr="" comment="Pan-STARRS Load Workflow Code & Data tar.gz format MD5 38bf4848d591cbac4048b3034c313443" date="1233718718" path="PC3.tar.gz" size="5752661" user="YogeshSimmhan" version="1.1"
META FILEATTACHMENT PC3.zip attr="" comment="Pan-STARRS Load Workflow Code & Data ZIP format MD5 c4eecc8639070a61e9d73993ffb76f97" date="1233718892" path="PC3.zip" size="5809921" user="YogeshSimmhan" version="1.1"
META FILEATTACHMENT LoadWorkflow.pdf attr="" comment="Load Workflow (Control Flow) diagram pdf format" date="1234891407" path="LoadWorkflow.pdf" size="412271" user="YogeshSimmhan" version="1.1"
META FILEATTACHMENT LoadWorkflow.pptx attr="" comment="Load Workflow (Control Flow) diagram pptx format" date="1234891450" path="LoadWorkflow.pptx" size="56992" user="YogeshSimmhan" version="1.1"
META FILEATTACHMENT LoadWorkflow.png attr="" comment="Load Workflow (Control Flow) diagram png format" date="1234891890" path="LoadWorkflow.png" size="103394" user="YogeshSimmhan" version="1.1"
>
>


META FILEATTACHMENT logo.gif attr="h" comment="vl-e sponsor logo" date="1238431089" path="logo.gif" size="3614" user="PaulGroth" version="1.1"
 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.11 - 30 Mar 2009 - PaulGroth)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 12 to 15

Pages for each participating team can be found at the ParticipatingTeams3 page. If you are participating, please create a link to your team's page there. You can use the USC/ISI page as a template for what should be included in a team page.

Added:
>
>

Sponsors

Thanks to our sponsor, the Virtual Laboratory for e-Science.

vl-e sponsor logo


Schedule

1. Review of code and provenance query proposals (to Feb 27)

Line: 207 to 215

-- YogeshSimmhan - 04 Feb 2009

Added:
>
>


META FILEATTACHMENT PC3.tar.gz attr="" comment="Pan-STARRS Load Workflow Code & Data tar.gz format MD5 38bf4848d591cbac4048b3034c313443" date="1233718718" path="PC3.tar.gz" size="5752661" user="YogeshSimmhan" version="1.1"
META FILEATTACHMENT PC3.zip attr="" comment="Pan-STARRS Load Workflow Code & Data ZIP format MD5 c4eecc8639070a61e9d73993ffb76f97" date="1233718892" path="PC3.zip" size="5809921" user="YogeshSimmhan" version="1.1"
META FILEATTACHMENT LoadWorkflow.pdf attr="" comment="Load Workflow (Control Flow) diagram pdf format" date="1234891407" path="LoadWorkflow.pdf" size="412271" user="YogeshSimmhan" version="1.1"
META FILEATTACHMENT LoadWorkflow.pptx attr="" comment="Load Workflow (Control Flow) diagram pptx format" date="1234891450" path="LoadWorkflow.pptx" size="56992" user="YogeshSimmhan" version="1.1"
META FILEATTACHMENT LoadWorkflow.png attr="" comment="Load Workflow (Control Flow) diagram png format" date="1234891890" path="LoadWorkflow.png" size="103394" user="YogeshSimmhan" version="1.1"
Added:
>
>
META FILEATTACHMENT logo.gif attr="h" comment="vl-e sponsor logo" date="1238431089" path="logo.gif" size="3614" user="PaulGroth" version="1.1"
 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.10 - 20 Mar 2009 - YogeshSimmhan)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 52 to 52

[PNG] | [PDF] | [PPTX]

Background

Added:
>
>
The Pan-STARRS project is building and operating the next-generation sky survey, with the ability to scan the visible sky continuously once a week and build a time series of data. This helps detect moving objects that could potentially impact Earth, in addition to building a massive catalog of the solar system and 99% of the visible stars in the northern hemisphere. The collaboration is led by the University of Hawai'i, which operates the telescope and image pipeline, while Johns Hopkins University is building the object data management (ODM) framework that is exposed to astronomers. The load workflow used in PC3 sits at the handoff between the image pipeline and the ODM, and uses the Trident workbench to ingest incoming CSV files into SQL Server databases.

Acknowledgement

Jim Heasley (University of Hawai'i)

Alex Szalay (Johns Hopkins University)


Workflow

 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.9 - 19 Mar 2009 - LucMoreau)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 44 to 44

Provenance Questions

Changed:
<
<
Please list possible provenance queries for the Challenge here. If the query requires any additions to the workflow please detail them as well.

Suggested Query 1

For a given detection, which CSV files contributed to it?

Basic sample answer: The CSV file containing the Detection table.

Advanced sample answer: The CSV file containing the Detection table, CSV file containing the Image table (as the image is an attribute of the detection), and CSV file containing the FrameMetadata table (as the frame metadata is an attribute of the image).

Suggested Query 2

A CSV or header file is deleted during the workflow's execution. How much time elapsed between a successful IsMatchCSVFileTables test (when the file existed) and an unsuccessful IsExistsCSVFile test (when the file had been deleted)?

Sample answer: 3ms

For testing the above query, it may be simplest to edit the workflow to include deletion of the CSV file as a step.

Suggested Query 3

The user considers a table to contain values they do not expect. Was the range check (IsMatchTableColumnRanges) performed for this table?

Sample answer: Yes

Suggested Query 4

The workflow halts due to failing an IsMatchTableColumnRanges check. How many tables successfully loaded before the workflow halted due to a failed check?

Sample answer: 2

Suggested Query 5

Which operation executions were strictly necessary for the Image table to contain a particular (non-computed) value?

Sample answer: call of ReadCSVReadyFile, call of CreateEmptyLoadDB, 2nd call of ReadCSVFileColumnNames, 2nd call of LoadCSVFileIntoTable (2nd calls because Image is loaded in the 2nd iteration of the for loop; checks are excluded because they do not change anything, UpdateComputedColumns is excluded because the value is non-computed, and CompactDatabase is excluded because it does not affect the value).

Suggested Query 6

Which pairs of procedures in the workflow could be swapped and the same result still be obtained (given the particular data input)?

Sample answer: (I won't enumerate them all, but I think some can be swapped as the checks in particular are not causally dependent, but we cannot swap those inside the loop with those outside).

>
>
Please list possible provenance queries for the Challenge here. If the query requires any additions to the workflow please detail them as well.

Provenance Challenge Workflow

 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.8 - 02 Mar 2009 - PaulGroth)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 6 to 6

Current Status

Changed:
<
<
As a result of a meeting at e-Science 2008, we have selected the Pan-STARRS workflow from the ThirdProvenanceChallengeWorkflowProposals.
>
>
The Challenge has started. Teams should be implementing the workflow below. We are still looking for suggested queries. Participating teams should set up a page to document their results as outlined in the next section.

Changed:
<
<
We are currently finalizing the Pan-STARRS workflow code and documentation for the challenge. You can find the code and documentation below.
>
>

Participating Teams


Changed:
<
<
We are also putting together provenance queries to be used during the challenge. Please put your suggestions below.
>
>
Pages for each participating team can be found at the ParticipatingTeams3 page. If you are participating, please create a link to your team's page there. You can use the USC/ISI page as a template for what should be included in a team page.

Schedule

Line: 30 to 30

PC3 Workshop June 10 - 11 held in Amsterdam

Added:
>
>

Challenge Goals

1. identify weaknesses and strengths of the OPM specification

2. encourage the development of concrete bindings for OPM in a variety of languages

3. determine how well OPM can represent provenance for a variety of technologies (scientific workflow, databases, etc.)

4. demonstrate that a complex data product's provenance can be constructed from provenance documentation produced by multiple combinations of heterogeneous applications

5. bring together the community to further discuss the interoperability of provenance systems.


Provenance Questions

Please list possible provenance queries for the Challenge here. If the query requires any additions to the workflow please detail them as well.

 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.7 - 26 Feb 2009 - SimonMiles)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 34 to 34

Please list possible provenance queries for the Challenge here. If the query requires any additions to the workflow please detail them as well.

Added:
>
>

Suggested Query 1

For a given detection, which CSV files contributed to it?

Basic sample answer: The CSV file containing the Detection table.

Advanced sample answer: The CSV file containing the Detection table, CSV file containing the Image table (as the image is an attribute of the detection), and CSV file containing the FrameMetadata table (as the frame metadata is an attribute of the image).
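
As a rough illustration of how Suggested Query 1 could be answered over an exported graph, the sketch below walks "was derived from" edges transitively from a detection and collects every CSV file it reaches. The dictionary layout, the node names (detection:42, image:7, frame:3) and the ".csv" suffix test are assumptions made for this example only; they are not taken from the PC3 data or from any team's OPM export.

```python
from collections import deque
from typing import Dict, Iterable, Set

# Hypothetical derivation edges: each key was derived from the listed values.
EXAMPLE_DERIVED_FROM: Dict[str, Iterable[str]] = {
    "detection:42": ["Detection.csv", "image:7"],
    "image:7": ["Image.csv", "frame:3"],
    "frame:3": ["FrameMetadata.csv"],
}


def contributing_csv_files(derived_from: Dict[str, Iterable[str]],
                           start: str) -> Set[str]:
    """Return every CSV file reachable from `start` via derivation edges."""
    seen = {start}
    queue = deque([start])
    csv_files: Set[str] = set()
    while queue:
        node = queue.popleft()
        for source in derived_from.get(node, ()):
            if source in seen:
                continue  # already visited; also guards against cycles
            seen.add(source)
            queue.append(source)
            if source.endswith(".csv"):  # assumption: CSV files are named *.csv
                csv_files.add(source)
    return csv_files


if __name__ == "__main__":
    print(sorted(contributing_csv_files(EXAMPLE_DERIVED_FROM, "detection:42")))
```

Following only the direct edges of the detection would correspond to the basic sample answer; the transitive walk corresponds to the advanced one.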

Suggested Query 2

A CSV or header file is deleted during the workflow's execution. How much time elapsed between a successful IsMatchCSVFileTables test (when the file existed) and an unsuccessful IsExistsCSVFile test (when the file had been deleted)?

Sample answer: 3ms

For testing the above query, it may be simplest to edit the workflow to include deletion of the CSV file as a step.
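
One possible way to compute the answer to Suggested Query 2, assuming the provenance store records a completion time and a pass/fail result for each check execution. The CheckRun record and its field names are hypothetical and only serve this sketch; the activity names come from the workflow.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple, Optional


class CheckRun(NamedTuple):
    """Hypothetical record of one executed check activity."""
    activity: str
    finished_at: datetime
    passed: bool


def time_between_checks(runs: Iterable[CheckRun]) -> Optional[timedelta]:
    """Time from the latest successful IsMatchCSVFileTables to the first
    failed IsExistsCSVFile that follows it, or None if no such pair exists."""
    last_success = None
    for run in sorted(runs, key=lambda r: r.finished_at):
        if run.activity == "IsMatchCSVFileTables" and run.passed:
            last_success = run.finished_at
        elif (run.activity == "IsExistsCSVFile" and not run.passed
              and last_success is not None):
            return run.finished_at - last_success
    return None


if __name__ == "__main__":
    t0 = datetime(2009, 3, 2, 12, 0, 0)
    runs = [
        CheckRun("IsMatchCSVFileTables", t0, True),
        CheckRun("IsExistsCSVFile", t0 + timedelta(milliseconds=3), False),
    ]
    print(time_between_checks(runs))
```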

Suggested Query 3

The user considers a table to contain values they do not expect. Was the range check (IsMatchTableColumnRanges) performed for this table?

Sample answer: Yes

Suggested Query 4

The workflow halts due to failing an IsMatchTableColumnRanges check. How many tables successfully loaded before the workflow halted due to a failed check?

Sample answer: 2

Suggested Query 5

Which operation executions were strictly necessary for the Image table to contain a particular (non-computed) value?

Sample answer: call of ReadCSVReadyFile, call of CreateEmptyLoadDB, 2nd call of ReadCSVFileColumnNames, 2nd call of LoadCSVFileIntoTable (2nd calls because Image is loaded in the 2nd iteration of the for loop; checks are excluded because they do not change anything, UpdateComputedColumns is excluded because the value is non-computed, and CompactDatabase is excluded because it does not affect the value).

Suggested Query 6

Which pairs of procedures in the workflow could be swapped and the same result still be obtained (given the particular data input)?

Sample answer: (I won't enumerate them all, but I think some can be swapped as the checks in particular are not causally dependent, but we cannot swap those inside the loop with those outside).


Provenance Challenge Workflow

Load Workflow (Control Flow) diagram | png format
 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.6 - 17 Feb 2009 - YogeshSimmhan)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 36 to 36

Provenance Challenge Workflow

Added:
>
>
Load Workflow (Control Flow) diagram | png format
[PNG] | [PDF] | [PPTX]

Background

Line: 176 to 180

  • Job J062944 Sample Data (1.86M)
  • Job J062945 Sample Data (9.9M)
Added:
>
>

-- YogeshSimmhan - 17 Feb 2009


-- YogeshSimmhan - 04 Feb 2009

META FILEATTACHMENT PC3.tar.gz attr="" comment="Pan-STARRS Load Workflow Code & Data tar.gz format MD5 38bf4848d591cbac4048b3034c313443" date="1233718718" path="PC3.tar.gz" size="5752661" user="YogeshSimmhan" version="1.1"
META FILEATTACHMENT PC3.zip attr="" comment="Pan-STARRS Load Workflow Code & Data ZIP format MD5 c4eecc8639070a61e9d73993ffb76f97" date="1233718892" path="PC3.zip" size="5809921" user="YogeshSimmhan" version="1.1"
Added:
>
>
META FILEATTACHMENT LoadWorkflow.pdf attr="" comment="Load Workflow (Control Flow) diagram pdf format" date="1234891407" path="LoadWorkflow.pdf" size="412271" user="YogeshSimmhan" version="1.1"
META FILEATTACHMENT LoadWorkflow.pptx attr="" comment="Load Workflow (Control Flow) diagram pptx format" date="1234891450" path="LoadWorkflow.pptx" size="56992" user="YogeshSimmhan" version="1.1"
META FILEATTACHMENT LoadWorkflow.png attr="" comment="Load Workflow (Control Flow) diagram png format" date="1234891890" path="LoadWorkflow.png" size="103394" user="YogeshSimmhan" version="1.1"
 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.5 - 17 Feb 2009 - YogeshSimmhan)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 42 to 42

Workflow

Activities

Added:
>
>
Activity | Input | Output
* Pre-Load Section *
IsCSVReadyFileExists: Checks for existence of the CSV Batch root directory and the csv_ready.csv manifest file. | string CSVRootPathInput: Path to the root directory of the CSV Batch. | bool IsCSVReadyFileExistsOutput: Returns true if the given CSV Batch root directory and the csv_ready.csv manifest file within it exist in the file system. False otherwise.
ReadCSVReadyFile: Reads the contents of the csv_ready.csv manifest file and creates a CSVFileEntry to hold metadata for each CSV file listed in the manifest. | string CSVRootPathInput: Path to the root directory of the CSV Batch. | List<CSVFileEntry> ReadCSVReadyFileOutput: List of CSVFileEntry objects read from the csv_ready.csv manifest file in the root directory. Each CSVFileEntry contains the FilePath of a CSV file to be loaded, the HeaderPath to a header file with the list of data columns, the RowCount of rows in the file, the TargetTable name in the database, and the MD5 Checksum for the file. The ColumnsNames are not populated by this activity.
IsMatchCSVFileTables: Checks if all tables to be loaded have corresponding CSV data files listed in the manifest. | List<CSVFileEntry> FileEntriesInput: List of CSVFileEntry objects read from the manifest file. | bool IsMatchCSVFileTablesOutput: Returns true if all tables have matching CSV files. False otherwise.
IsExistsCSVFile: Checks for existence of the CSV data file and Header file listed in the manifest. | CSVFileEntry FileEntryInput: A CSVFileEntry read from the manifest file. | bool IsExistsCSVFileOutput: Returns true if the CSV data file and Header file exist in the file system. False otherwise.
ReadCSVFileColumnNames: Reads the list of column names present in the CSV data file from the Header file. | CSVFileEntry FileEntryInput: A CSVFileEntry read from the manifest file. | CSVFileEntry FileEntryOutput: The input CSVFileEntry updated with the ColumnsNames field populated from the Header file.
IsMatchCSVFileColumnNames: Checks if all columns expected for a target table are present in the corresponding CSV data file. | CSVFileEntry FileEntryInput: A CSVFileEntry read from the manifest file with column names populated. | bool IsMatchCSVFileColumnNamesOutput: Returns true if the column names present in the CSV data file match the expected column names for the target table to load the CSV file into. False otherwise.
* Load Section *
CreateEmptyLoadDB: Creates an empty 'Load' database with a static list of tables to load the CSV Batch into. | string JobID: Unique job (i.e. batch) identifier for the CSV Batch being loaded. | DatabaseEntry CreateEmptyLoadDBOutput: A DatabaseEntry with the DBName, unique DBGuid and ConnectionString for the newly created database instance.
LoadCSVFileIntoTable: Loads a single CSV data file into the corresponding table in the Load database. | DatabaseEntry DBEntry: A DatabaseEntry with the target table to load the CSV data file into; CSVFileEntry FileEntry: A CSVFileEntry to load into the corresponding target table in the database. | bool LoadCSVFileIntoTableOutput: Returns true if the CSV data file was successfully loaded into the target table. False otherwise.
UpdateComputedColumns: Updates the computed columns in the target table that was loaded. These derived columns would have been "empty" (-999) in the CSV data file. | DatabaseEntry DBEntry: A DatabaseEntry with the target table already loaded from the CSV data file; CSVFileEntry FileEntry: A CSVFileEntry containing the name of the target table in the database to update. | bool UpdateComputedColumnsOutput: Returns true if the derived columns were successfully updated from existing columns. False otherwise.
* Post-Load Section *
IsMatchTableRowCount: Checks if the number of rows loaded into the table matches the expected number of rows in the CSV data file. | DatabaseEntry DBEntry: A DatabaseEntry with the target table loaded and updated; CSVFileEntry FileEntry: A CSVFileEntry containing the expected number of rows in the CSV data file and the target table name. | bool IsMatchTableRowCountOutput: Returns true if the number of rows in the target table equals the expected number of rows in the CSV data file.
IsMatchTableColumnRanges: Checks if the data loaded into table columns falls within the range of values expected for each column. | DatabaseEntry DBEntry: A DatabaseEntry with the target table loaded and updated; CSVFileEntry FileEntry: A CSVFileEntry containing the name of the target table in the database whose column ranges are validated. | bool IsMatchTableColumnRangesOutput: Returns true if the data values of columns in the target table fall within the expected range. False otherwise.
CompactDatabase: Shrinks the database after all write operations complete. | DatabaseEntry DBEntry: A DatabaseEntry with all tables loaded and validated. | void
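
To make the order of the activities above easier to follow, here is a minimal driver sketch. It assumes the activities run in the order they are listed, that the per-file activities run inside loops over the manifest entries, and that any check returning false halts the run; the authoritative control flow is the Load Workflow diagram, not this sketch. The function names run_load_workflow and stub_activities and the stub behaviour are hypothetical, and the real implementations are the .NET and Java flavors in the download.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class WorkflowHalted(Exception):
    """Raised internally when a check activity returns False."""


def run_load_workflow(csv_root: str, job_id: str,
                      activities: Dict[str, Callable[..., Any]]
                      ) -> Tuple[bool, List[str]]:
    """Run the activities in listed order; return (completed, execution trace)."""
    trace: List[str] = []

    def call(name: str, *args: Any) -> Any:
        trace.append(name)
        return activities[name](*args)

    def require(name: str, *args: Any) -> None:
        if not call(name, *args):
            raise WorkflowHalted(name)

    try:
        # Pre-load section
        require("IsCSVReadyFileExists", csv_root)
        entries = call("ReadCSVReadyFile", csv_root)
        require("IsMatchCSVFileTables", entries)
        checked = []
        for entry in entries:
            require("IsExistsCSVFile", entry)
            entry = call("ReadCSVFileColumnNames", entry)
            require("IsMatchCSVFileColumnNames", entry)
            checked.append(entry)
        # Load and post-load sections (assumed to share one loop per table)
        db = call("CreateEmptyLoadDB", job_id)
        for entry in checked:
            require("LoadCSVFileIntoTable", db, entry)
            require("UpdateComputedColumns", db, entry)
            require("IsMatchTableRowCount", db, entry)
            require("IsMatchTableColumnRanges", db, entry)
        call("CompactDatabase", db)
    except WorkflowHalted:
        return False, trace
    return True, trace


CHECKS = ("IsCSVReadyFileExists", "IsMatchCSVFileTables", "IsExistsCSVFile",
          "IsMatchCSVFileColumnNames", "LoadCSVFileIntoTable",
          "UpdateComputedColumns", "IsMatchTableRowCount",
          "IsMatchTableColumnRanges")


def stub_activities(entries: List[Any],
                    fail_on: Optional[str] = None) -> Dict[str, Callable[..., Any]]:
    """Hypothetical stand-ins that always succeed, except the `fail_on` activity."""
    acts: Dict[str, Callable[..., Any]] = {n: (lambda *_a: True) for n in CHECKS}
    acts["ReadCSVReadyFile"] = lambda root: list(entries)
    acts["ReadCSVFileColumnNames"] = lambda entry: entry
    acts["CreateEmptyLoadDB"] = lambda job: {"DBName": job}
    acts["CompactDatabase"] = lambda db: None
    if fail_on is not None:
        acts[fail_on] = lambda *_a: False
    return acts


if __name__ == "__main__":
    ok, trace = run_load_workflow("/batch", "J062941",
                                  stub_activities(["Detection", "Image"]))
    print(ok, len(trace))
```

The returned trace is the kind of raw execution record that a provenance system would turn into process and artifact nodes.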

Added:
>
>
-- YogeshSimmhan - 17 Feb 2009

Provenance Challenge Software Setup

Overview

 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.4 - 06 Feb 2009 - PaulGroth)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 8 to 8

As a result of a meeting at e-Science 2008, we have selected the Pan-STARRS workflow from the ThirdProvenanceChallengeWorkflowProposals.

Changed:
<
<
Current Tasks (End of January 2009):
>
>
We are currently finalizing the Pan-STARRS workflow code and documentation for the challenge. You can find the code and documentation below.

Changed:
<
<
  • Devise provenance questions for the challenge and suggest any extensions to the workflow. Please do this below. (Everyone)
  • Finalize the Pan-STARRS workflow and make executable versions available (Yogesh)
  • Select a conference to co-locate the provenance challenge with.
  • Make available libraries and bindings of OPM. Please post these to the OpenProvenanceModelBindings page.
>
>
We are also putting together provenance queries to be used during the challenge. Please put your suggestions below.

Schedule

1. Review of code and provenance query proposals (to Feb 27)

March 2 - PC3 Starts

2. Make the workflow work with individual team's systems [Mar 2 - Mar 30]

3. Generate provenance for the challenge workflow & run queries on it [Mar 30 - Apr 13]

4. Export OPM Graphs and import from others [Apr 13 - May 4]

5. Run queries on imported OPM graph [Apr 27 - Jun 1]

6. Prepare slides for challenge [Jun 1 - Jun 8]

PC3 Workshop June 10 - 11 held in Amsterdam


Provenance Questions

 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.3 - 04 Feb 2009 - YogeshSimmhan)

META TOPICPARENT WebHome

Third Provenance Challenge

Line: 19 to 19

Please list possible provenance queries for the Challenge here. If the query requires any additions to the workflow please detail them as well.

Deleted:
<
<
-- PaulGroth - 15 Dec 2008

Added:
>
>

Provenance Challenge Workflow

Background

Workflow

Activities

Provenance Challenge Software Setup

Overview

Flavors

C# .NET + SQL Server Flavor (Windows Platform)

Pre-requisites

  • .Net Framework v3.5 SP1
  • SQL Server 2008 (Express or better)
  • Windows XP/Vista/2003/2008
  • (Optional) Visual Studio 2008

Folder Organization

Setup/Compile/Build/Run

Batch Scripting/Executable Version

Java + Derby Flavor (All Platforms)

Pre-requisites

  • Java Development Kit 1.6
  • Apache Derby 10.4.2.0 (included)
  • Apache Ant 1.7.1

Folder Organization

Setup/Compile/Build/Run

Shell Scripting/Executable Version

Frequently Asked Questions

General

C# .NET

SQL Server

Java

Derby

Downloads

  • PC3.tar.gz: Pan-STARRS Load Workflow Code & Sample Data | tar.gz Format | MD5 38bf4848d591cbac4048b3034c313443
  • PC3.zip: Pan-STARRS Load Workflow Code & Sample Data | ZIP format | MD5 c4eecc8639070a61e9d73993ffb76f97
  • Pan-STARRS Load Workflow Code (C# .Net Flavor Only)
  • Pan-STARRS Load Workflow Code (Java Flavor Only)
  • Sample Data
  • Job J062941 Sample Data (9K)
  • Job J062942 Sample Data (32K)
  • Job J062943 Sample Data (294K)
  • Job J062944 Sample Data (1.86M)
  • Job J062945 Sample Data (9.9M)
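
Since MD5 checksums are published alongside the archives, a download can be checked before use. The sketch below uses Python's standard hashlib module; the helper names md5_of_file and matches_published_md5 are made up for this example, and the path passed in is wherever the archive was saved locally.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path: str, published_hex: str) -> bool:
    """Compare a file's MD5 with a published value, ignoring case and whitespace."""
    return md5_of_file(path) == published_hex.strip().lower()
```

For example, matches_published_md5("PC3.tar.gz", "38bf4848d591cbac4048b3034c313443") should return True for an intact copy of the tar.gz archive.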

-- YogeshSimmhan - 04 Feb 2009

META FILEATTACHMENT PC3.tar.gz attr="" comment="Pan-STARRS Load Workflow Code & Data tar.gz format MD5 38bf4848d591cbac4048b3034c313443" date="1233718718" path="PC3.tar.gz" size="5752661" user="YogeshSimmhan" version="1.1"
META FILEATTACHMENT PC3.zip attr="" comment="Pan-STARRS Load Workflow Code & Data ZIP format MD5 c4eecc8639070a61e9d73993ffb76f97" date="1233718892" path="PC3.zip" size="5809921" user="YogeshSimmhan" version="1.1"
 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.2 - 15 Dec 2008 - PaulGroth)

META TOPICPARENT WebHome

Third Provenance Challenge

The top-level page for the third provenance challenge.

Changed:
<
<
We are currently in the process of identifying workflows: see ThirdProvenanceChallengeWorkflowProposals.
>
>

Current Status


Added:
>
>
As a result of a meeting at e-Science 2008, we have selected the Pan-STARRS workflow from the ThirdProvenanceChallengeWorkflowProposals.

Changed:
<
<
-- LucMoreau - 25 Nov 2008
>
>
Current Tasks (End of January 2009):

  • Devise provenance questions for the challenge and suggest any extensions to the workflow. Please do this below. (Everyone)
  • Finalize the Pan-STARRS workflow and make executable versions available (Yogesh)
  • Select a conference to co-locate the provenance challenge with.
  • Make available libraries and bindings of OPM. Please post these to the OpenProvenanceModelBindings page.

Provenance Questions

Please list possible provenance queries for the Challenge here. If the query requires any additions to the workflow please detail them as well.

-- PaulGroth - 15 Dec 2008


 <<O>>  Difference Topic ThirdProvenanceChallenge (r1.1 - 25 Nov 2008 - LucMoreau)
Line: 1 to 1
Added:
>
>
META TOPICPARENT WebHome

Third Provenance Challenge

The top-level page for the third provenance challenge.

We are currently in the process of identifying workflows: see ThirdProvenanceChallengeWorkflowProposals.

-- LucMoreau - 25 Nov 2008

Revision r1.1 - 25 Nov 2008 - 13:25 - LucMoreau
Revision r1.21 - 17 Jun 2009 - 18:29 - PaulGroth