| Added: | |||||||||||||||||||||||||||||||||||||||||||||||||
| > > |
| ||||||||||||||||||||||||||||||||||||||||||||||||
The Open Provenance Model (v1.01) | |||||||||||||||||||||||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
| ||||||||||||||||||||||||||||||||||||||||||||||||
| > > |
Contents | ||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||
| Line: 25 to 10 | |||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
|||||||||||||||||||||||||||||||||||||||||||||||||
| > > |
|||||||||||||||||||||||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
| ||||||||||||||||||||||||||||||||||||||||||||||||
| > > |
Notes on the Wiki Version
This is a wiki version of the Open Provenance Model version 1.01. It is based on the authoritative PDF, which can be found at http://eprints.ecs.soton.ac.uk/16148/1/opm-v1.01.pdf . It is designed to help track comments and suggestions for the next revision of the OPM. Each section has its own comment area; please leave your comments there, and add your signature so that we know the provenance of the comments. If you do not want to modify the wiki itself, there is a comment box that you can use. If you really need to leave comments within the text itself, please use another color and try to make your comment stand out from the rest of the text.
Authors
| ||||||||||||||||||||||||||||||||||||||||||||||||
Abstract | |||||||||||||||||||||||||||||||||||||||||||||||||
| Line: 44 to 53 | |||||||||||||||||||||||||||||||||||||||||||||||||
1 Introduction | |||||||||||||||||||||||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
Provenance is well understood in the context of art or digital libaries, where it respectively refers to the documented history of an art object, or the documentation of processes in a digital object's life cycle [5]. Interest for provenance in the "e-science community" [13] is also growing, since provenance is perceived as a crucial component of workflow systems [2]0 that can help scientists ensure reproducibility of their scientific analyses and processes. | ||||||||||||||||||||||||||||||||||||||||||||||||
| > > |
Provenance is well understood in the context of art or digital libraries, where it respectively refers to the documented history of an art object, or the documentation of processes in a digital object's life cycle [5]. Interest in provenance in the "e-science community" [13] is also growing, since provenance is perceived as a crucial component of workflow systems [2] that can help scientists ensure reproducibility of their scientific analyses and processes. | ||||||||||||||||||||||||||||||||||||||||||||||||
| Against this background, the International Provenance and Annotation Workshop (IPAW'06), held on May 3-5, 2006 in Chicago, involved some 50 participants interested in the issues of data provenance, process documentation, data derivation, and data annotation [8, 1]. During a session on provenance standardization, a consensus began to emerge, whereby the provenance research community needed to understand better the capabilities of the different systems, the representations they used for provenance, their similarities, their differences, and the rationale that motivated their designs. | |||||||||||||||||||||||||||||||||||||||||||||||||
| Line: 77 to 86 | |||||||||||||||||||||||||||||||||||||||||||||||||
Comments | |||||||||||||||||||||||||||||||||||||||||||||||||
| Deleted: | |||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
| ||||||||||||||||||||||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
The Open Provenance Model (v1.01) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| > > |
The Open Provenance Model (v1.01) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| > > |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Line: 17 to 17 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
Sections | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| > > |
Sections
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Line: 30 to 44 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 Introduction | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
Provenance is well understood in the context of art or digital libaries, where it respectively refers to the documented history of an art object, or the documentation of processes in a digital object's life cycle~\citPREMIS:200. Interest for provenance in the "e-science community"~\citsimmhan05surve is also growing, since provenance is perceived as a crucial component of workflow systems~\citDeelman-Gil:NSF0 that can help scientists ensure reproducibility of their scientific analyses and processes. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| > > |
Provenance is well understood in the context of art or digital libaries, where it respectively refers to the documented history of an art object, or the documentation of processes in a digital object's life cycle [5]. Interest for provenance in the "e-science community" [13] is also growing, since provenance is perceived as a crucial component of workflow systems [2]0 that can help scientists ensure reproducibility of their scientific analyses and processes. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
Against this background, the International Provenance and Annotation Workshop (IPAW'06), held on May 3-5, 2006 in Chicago, involved some 50 participants interested in the issues of data provenance, process documentation, data derivation, and data annotation~\citMoreau-Foster:IPAW06,Bose-Foster-Moreau:IPAW0. During a session on provenance standardization, a consensus began to emerge, whereby the provenance research community needed to understand better the capabilities of the different systems, the representations they used for provenance, their similarities, their differences, and the rationale that motivated their designs. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| > > |
Against this background, the International Provenance and Annotation Workshop (IPAW'06), held on May 3-5, 2006 in Chicago, involved some 50 participants interested in the issues of data provenance, process documentation, data derivation, and data annotation [8, 1]. During a session on provenance standardization, a consensus began to emerge, whereby the provenance research community needed to understand better the capabilities of the different systems, the representations they used for provenance, their similarities, their differences, and the rationale that motivated their designs. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
Hence, the first Provenance Challenge was born, and from the outset, the challenge was set up to be informative rather than competitive. The first Provenance Challenge was set up in order to provide a forum for the community to understand the capabilities of different provenance systems and the expressiveness of their provenance representations. Participants simulated or ran a Functional Magnetic Resonance Imaging workflow, from which they implemented and executed a pre-identified set of ``provenance queries''. Sixteen teams responded to the challenge, and reported their experience in a journal special issue~\citMoreau-Ludaescher:Challenge0. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| > > |
Hence, the first Provenance Challenge was born, and from the outset, the challenge was set up to be informative rather than competitive. The first Provenance Challenge was set up in order to provide a forum for the community to understand the capabilities of different provenance systems and the expressiveness of their provenance representations. Participants simulated or ran a Functional Magnetic Resonance Imaging workflow, from which they implemented and executed a pre-identified set of "provenance queries". Sixteen teams responded to the challenge, and reported their experience in a journal special issue [10]. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
The first Provenance Challenge was followed by the second Provenance Challenge, aiming at establishing inter-operability of systems, by exchanging provenance information. Thirteen teams~\citSecond:Challenge:0 responded to this second challenge. Discussions indicated that there was substantial agreement on a core representation of provenance. As a result, following a workshop in August 2007, in Salt Lake City, a data model was crafted and released as the Open Provenance Model (v1.00)~\citopm:200. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| > > |
The first Provenance Challenge was followed by the second Provenance Challenge, aiming at establishing inter-operability of systems, by exchanging provenance information. Thirteen teams [12] responded to this second challenge. Discussions indicated that there was substantial agreement on a core representation of provenance. As a result, following a workshop in August 2007, in Salt Lake City, a data model was crafted and released as the Open Provenance Model (v1.00) [9]. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
The starting point of this work is the community agreement summarized by Miles \citMiles:Challenge0. We assume that provenance of objects (whether digital or not) is represented by an annotated causality graph, which is a directed acyclic graph, enriched with annotations capturing further information pertaining to execution. For the purpose of this paper, a provenance graph is defined to be a record of a past execution (or current execution), and not a description of something that could happen in the future. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| > > |
The starting point of this work is the community agreement summarized by Miles [7]. We assume that provenance of objects (whether digital or not) is represented by an annotated causality graph, which is a directed acyclic graph, enriched with annotations capturing further information pertaining to execution. For the purpose of this paper, a provenance graph is defined to be a record of a past execution (or current execution), and not a description of something that could happen in the future. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| The Open Provenance Model (OPM) is a model for provenance that is designed to meet the following requirements: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Line: 57 to 71 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
On June 19th 2008, twenty participants attended the first OPM workshop \citFirst:OPM:Workshop:0 to discuss the version of the specification. Minutes of the workshop and recommendations \citGroth:OPMWorkshopMinute were published, and led to the current version (v1.01) of the Open Provenance Model.
2 Basics
2.1 Entities
Our primary concern is to be able to represent how "things", whether digital data such as simulation results, physical objects such as cars, or immaterial entities such as decisions, came to be in a given state, with a given set of characteristics, at a given moment. It is recognised that many such "things" can be stateful: a car may be at various locations, it can contain different passengers, and it can have a tank full or empty; likewise, a file can contain different data at different moments of its existence. Hence, from the perspective of provenance, we introduce the concept of an artifact as an immutable¹ piece of state; likewise, we introduce the concept of a process as actions resulting in new artifacts.
A process usually takes place in some context, which enables or facilitates its execution: examples of such contexts are varied and include a place where the process executes, an individual controlling the process, or an institution sponsoring the process. These entities are referred to as agents. Agents, as we shall see when we discuss causal dependencies, are a cause (like a catalyst) of a process taking place. The Open Provenance Model is based on these three primary entities, which we define now.
Definition 1 (Artifact) Immutable piece of state, which may have a physical embodiment in a physical object, or a digital representation in a computer system.
Definition 2 (Process) Action or series of actions performed on or caused by artifacts, and resulting in new artifacts.
Definition 3 (Agent) Contextual entity acting as a catalyst of a process, enabling, facilitating, controlling, affecting its execution.
The Open Provenance Model is a model of artifacts in the past, explaining how they were derived. Likewise, as far as processes are concerned, they may also be in the past, i.e. they may have already completed their execution; in addition, processes can still be currently running (i.e., they have not completed their execution yet). In no case is OPM intended to describe the state of future artifacts and the activities of future processes.
We introduce a graphical notation and a formal definition for provenance graphs. Specifically, artifacts are represented by circles, and are denoted by elements of the set Artifact. Likewise, processes are represented graphically by rectangles and denoted by elements of the set Process. Finally, agents are represented by octagons and are elements of the set Agent in the formal notation.
Footnote 1: In the presence of streams, we consider an artifact to be a slice of stream in time, i.e. the stream content at a specific instant in the computation. A future version of OPM will refine the model to accommodate streams fully as they are recognized to be crucial in many applications.
2.2 Dependencies
A provenance graph aims to capture the causal dependencies between the abovementioned entities. Therefore, a provenance graph is defined as a directed graph, whose nodes are artifacts, processes and agents, and whose edges belong to one of the following categories depicted in Figure 1. An edge represents a causal dependency between its source, denoting the effect, and its destination, denoting the cause.
2.3 Roles
A role is an annotation on used, wasGeneratedBy and wasControlledBy edges.
Definition 10 (Role) A role designates an artifact's or agent's function in a process. A role is used to differentiate among several use, generation, or controlling relations.
2.4 Examples
An example illustrating all the concepts and a few of the causal dependencies is displayed in Figure 2. This provenance graph expresses that John baked a cake with ingredients butter, eggs, sugar and flour.
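As a rough illustration (not part of the OPM specification), the cake example can be written down as a plain list of edges, each pointing from an effect to its cause. The node names, the role labels and the `causes_of` helper are assumptions made for this sketch only; Figure 2 remains the authoritative description.

```python
# Each edge is a (kind, effect, cause, role) tuple and points from the effect to its cause.
# Node names and role labels are hypothetical, chosen only for this sketch.
artifacts = {"butter", "eggs", "sugar", "flour", "cake"}
processes = {"bake"}
agents = {"John"}

edges = [
    ("used", "bake", "butter", "ingredient"),
    ("used", "bake", "eggs", "ingredient"),
    ("used", "bake", "sugar", "ingredient"),
    ("used", "bake", "flour", "ingredient"),
    ("wasGeneratedBy", "cake", "bake", "product"),
    ("wasControlledBy", "bake", "John", "baker"),
]


def causes_of(node, kind):
    """Return the sorted direct causes of `node` along edges of the given kind."""
    return sorted(cause for k, effect, cause, _ in edges if k == kind and effect == node)


print(causes_of("bake", "used"))            # artifacts consumed by the baking process
print(causes_of("cake", "wasGeneratedBy"))  # process that produced the cake
print(causes_of("bake", "wasControlledBy"))  # agent that controlled the process
```

Keeping the role on each edge is what would allow two used edges from the same process to be told apart, for instance if the recipe distinguished a "dry ingredient" from a "wet ingredient".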
3 Overlapping and Hierarchical
Figure 4 shows two examples of provenance graphs describing what led the list (3,7) to being as it is. According to the left-hand graph, the list was generated by a process that added one to all constituents of the list (2,6). According to the right-hand graph, the derivation process of (3,7) required the list to be created from values 3 and 7, respectively obtained by adding one to 2 and 6, themselves being the data products obtained by accessing the contents of the original list (2,6).
4 Provenance Graph Definition
The open provenance model is defined according to the following rules, which we formalise in Section 5.
5 Timeless Formal Model
Figure 8 provides a set-theoretic definition \citschmidt86,Coqtutoria of the open provenance model, based on the concepts introduced so far. The model of causality we propose is timeless since time precedence does not imply causality: if a process P1 occurs before a process P2, in general, we cannot infer that P1 caused P2 to happen. However, the converse implication holds assuming time is measured according to a single clock; that is, causality implies time precedence. Even though the provenance model is timeless, we recognize the importance of time, since time is easily observable by computer systems or users. Hence, in Section 7, we examine how the causality graph can be annotated with time. We will also specify constraints that one would expect time annotations to satisfy (in terms of monotonicity with respect to time) in sound causality graphs.
We assume the existence of a few primitive sets: identifiers for processes, artifacts and agents, roles, and accounts. These sets of identifiers provide identities for the corresponding entities within the scope of a given provenance graph. A given serialization will standardize on these sets, and provide concrete representations for them. It is important to stress that the purpose of these identifiers is to define the structure of graphs: they are not meant to define identities that are persistent and reliably resolvable over time.
In the model, processes, artifacts and agents are identified by their IDs, and are associated with a value and zero or more accounts, denoted P(Account) in powerset notation. In the set-theoretic notation, identifiers map to the corresponding value and account membership. In other words, from a database perspective, elements of ProcessId, ArtifactId and AgentId are keys to processes, artifacts and agents, respectively.
The five causality edges can be easily specified by the sets used, wasGeneratedBy, triggeredBy, wasDerivedFrom, and wasControlledBy, making use of identifiers for artifacts, processes or agents, roles, and the associated accounts. Finally, an OPM graph needs to identify explicitly which accounts are overlapping or refinements. For this, we use a set Overlaps enumerating lists of overlapping accounts, and a set Refines enumerating lists of refined accounts.
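To make the "identifiers are keys" reading concrete, the following is a minimal Python sketch of such a structure. It is an illustration under stated assumptions, not the definition of Figure 8: the class names, field names, tuple layout of the edges and the `dangling_edges` check are all hypothetical, and the triggeredBy set is spelled `was_triggered_by` here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    value: object                      # value associated with the identifier
    accounts: frozenset = frozenset()  # zero or more accounts, an element of P(Account)


@dataclass
class OPMGraph:
    # Identifiers act as keys to the corresponding entities.
    artifacts: dict = field(default_factory=dict)  # ArtifactId -> Node
    processes: dict = field(default_factory=dict)  # ProcessId  -> Node
    agents: dict = field(default_factory=dict)     # AgentId    -> Node
    # Each edge is an (effect_id, cause_id, role, accounts) tuple (hypothetical layout).
    used: set = field(default_factory=set)               # process  -> artifact
    was_generated_by: set = field(default_factory=set)   # artifact -> process
    was_controlled_by: set = field(default_factory=set)  # process  -> agent
    was_triggered_by: set = field(default_factory=set)   # process  -> process, role is None
    was_derived_from: set = field(default_factory=set)   # artifact -> artifact, role is None
    overlaps: set = field(default_factory=set)  # tuples of overlapping account names
    refines: set = field(default_factory=set)   # tuples of refined account names

    def dangling_edges(self):
        """List (edge_kind, edge) pairs whose endpoints are not declared with the expected kind."""
        expected = [
            ("used", self.used, self.processes, self.artifacts),
            ("wasGeneratedBy", self.was_generated_by, self.artifacts, self.processes),
            ("wasControlledBy", self.was_controlled_by, self.processes, self.agents),
            ("wasTriggeredBy", self.was_triggered_by, self.processes, self.processes),
            ("wasDerivedFrom", self.was_derived_from, self.artifacts, self.artifacts),
        ]
        return [
            (kind, edge)
            for kind, edge_set, effects, causes in expected
            for edge in sorted(edge_set, key=repr)
            if edge[0] not in effects or edge[1] not in causes
        ]


acc = frozenset({"acc1"})
g = OPMGraph()
g.artifacts["a1"] = Node("flour", acc)
g.artifacts["a2"] = Node("cake", acc)
g.processes["p1"] = Node("bake", acc)
g.used.add(("p1", "a1", "ingredient", acc))
g.was_generated_by.add(("a2", "p1", "product", acc))
g.was_derived_from.add(("a2", "a3", None, acc))  # "a3" is deliberately undeclared
print(g.dangling_edges())
```

The check only verifies that edge endpoints are declared identifiers of the right kind; it says nothing about the topological constraints that distinguish a provenance graph from an OPM graph.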
Concept is currently ill-defined. Definition remaining to be finalised. Can we define refinement just on syntactic properties of the graphs? Hence, the refinement relationship is reflexive, asymmetric and transitive.
6 Inferences
The Open Provenance Model has defined the notion of OPM graph based on a set of syntactic rules, and the notion of Provenance Graph by adding a set of topological constraints. Provenance graphs are aimed at representing causality graphs explaining how processes and artifacts came to be. It is expected that a variety of reasoning algorithms will exploit this data model, in order to provide novel and powerful functionality to users. It is beyond the scope of this document to include an extensive coverage of relevant reasoning algorithms. However, provenance graphs, by means of edges, capture causal dependencies, which can be summarised by means of the transitive closure that we describe in this section.
6.1 One Step Inferences
In Section 2, we have introduced the two causal dependencies triggeredBy and wasDerivedFrom, acting as abbreviations for the causal dependencies used and wasGeneratedBy. Figure 9 shows their exact meaning.
The kind of inferences that can be made about wasDerivedFrom is of a different nature. Indeed, without any internal knowledge of P1 in Figure 9, it is impossible to ascertain whether there is an actual data dependency between A1 and A2. Remark. Concretely, a rule such as the following would lead to incorrect inferences since it allows arbitrary outputs of a process to be inferred to be dependent on arbitrary inputs of the same process.
If ⟨ a1,a2⟩∈WasDerivedFrom, then ⟨ a1,a2⟩∈MayHaveBeenDerivedFrom, but not vice-versa. Hence, Equation (3) states that a mayHaveBeenDerivedFrom edge can be derived from the existence of a succession of wasGeneratedBy and used edges. Equation (4) is to (2) what wasDerivedFrom is to wasTriggeredBy.
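The weaker inference can be sketched as a join over two edge sets: whenever an artifact was generated by a process that used another artifact, the first may have been derived from the second. The function name and the encoding of edges as (effect, cause) pairs are assumptions of this sketch; roles, accounts and time are ignored, so it is an approximation of the rule rather than a restatement of Equation (3).

```python
def may_have_been_derived_from(was_generated_by, used, was_derived_from=frozenset()):
    """Infer (effect_artifact, cause_artifact) pairs.

    was_generated_by: set of (artifact, process) pairs
    used:             set of (process, artifact) pairs
    was_derived_from: set of (artifact, artifact) pairs asserted explicitly
    """
    inferred = {
        (effect, cause)
        for effect, producer in was_generated_by
        for consumer, cause in used
        if producer == consumer
    }
    # Every asserted wasDerivedFrom edge is also a mayHaveBeenDerivedFrom edge,
    # but an inferred edge is never promoted to wasDerivedFrom.
    return inferred | set(was_derived_from)


generated = {("A2", "P1")}
consumed = {("P1", "A1"), ("P2", "A2")}
print(may_have_been_derived_from(generated, consumed))
```

The asymmetry is the point of the design: the result is only a statement of possibility, because nothing in the graph shows that P1 actually relied on A1 when producing A2.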
6.2 Transitive Closure
Users want to find out the causes of an artifact, not due to one process, but potentially, due to an unknown number of them. Hence, for the purpose of expressing queries or expressing inferences about provenance graphs, we introduce four new relationships, which are transitive versions of existing relationships, namely used*, wasGeneratedBy*, wasDerivedFrom* and wasTriggeredBy*. Their definitions are displayed in Figure \retransitive:figur. We note that Figure \retransitive:figur contains definitions (as opposed to inference rules of Figures \reone:step:inferences: and \reone:step:inferences:, which specify which edges can be inferred from which edges). For convenience, we have also introduced a generic causal dependency wasDependentOn* (see equations (\rewas:dependent:on:) to (\rewas:dependent:on:)). Note that similar inference rules can be defined for MayHaveBeenDerivedFrom.
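For a relationship whose source and destination are of the same kind, such as wasDerivedFrom between artifacts, the starred version can be computed as an ordinary transitive closure. The sketch below assumes edges are encoded as (effect, cause) pairs and uses a hypothetical function name; the other starred relationships connect different kinds of nodes and follow the definitions in the figure, which are not reproduced here.

```python
def transitive_closure(edges):
    """Return the transitive closure of a set of (effect, cause) pairs."""
    closure = set(edges)
    while True:
        # Chain every pair (a, b) with every pair (b, d) to obtain (a, d).
        new_pairs = {
            (a, d)
            for a, b in closure
            for c, d in closure
            if b == c
        } - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


was_derived_from = {("a3", "a2"), ("a2", "a1")}
was_derived_from_star = transitive_closure(was_derived_from)
print(sorted(was_derived_from_star))
```

The loop stops as soon as a pass adds no new pair, which always happens on a finite edge set. This naive fixed-point iteration is adequate for small graphs; larger graphs would call for a graph traversal per node.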
7 Formal Model and Time Annotations
The Open Provenance Model allows for causality graphs to be annotated with time annotations. In this model, time is not intended to be used for deriving causality: if causal dependencies exist, they need to be made explicit with the appropriate edges. However, time may have been observed during the course of a process, and we would expect such time information to be compatible with causal dependencies: the time of an effect should be greater than the time of its cause (for the same clock). Hence, time is useful in validating causality claims.
In the Open Provenance Model, time may be associated with instantaneous occurrences in a process. We currently recognize four instantaneous occurrences, which have a reasonable shared understanding in real life and computer systems. Two of them pertain to artifacts, whereas the other two relate to processes. For artifacts, we consider the occurrences of creation and use, whereas for processes, we consider their starting and ending.
The rationale for choosing instant time for the OPM model is the same as for adopting artifacts as immutable pieces of state. At a specific time, an object we consider will be in a specific state, which we refer to as an artifact, and for which we can express the causality path that led to the object being in such a state.
In some scenarios, occurrences of use or creation of objects and occurrences of starting or ending of processes may not be instantaneous. To capture such scenarios, detailed processes and artifacts, and their respective causal dependencies, need to be made explicit, in order to be expressible in the OPM model. For instance, the starting of a nuclear power plant is not usefully modelled as an instantaneous occurrence when one tries to understand failures that occurred during this activity; hence, this whole starting occurrence must be modelled by one process (or possibly several), which in turn has instantaneous beginnings and endings.
In the Open Provenance Model, time information is expected to be obtained by observing a clock when an occurrence occurs. Given that time is observed, time accuracy is limited by the granularity of the clock and the granularity of the observer's activities. Hence, while the notion of time we consider is instantaneous, the model allows for an interval of accuracy to support granularity of clocks and observers. In the OPM model, an instantaneous occurrence happening at time t is annotated by two observation times tm,tM, such that the occurrence is known to have occurred no later than tM and no earlier than tm. Hence, t ∈ [tm,tM].
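A minimal sketch of such an interval annotation follows. The class name, field names and the `compatible` check are assumptions of this sketch rather than part of the model: the check merely asks whether the two intervals leave room for the cause to have occurred strictly before the effect on the same clock, following the "greater than" wording used above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    t_min: float  # the occurrence happened no earlier than this (tm)
    t_max: float  # the occurrence happened no later than this (tM)

    def __post_init__(self):
        if self.t_min > self.t_max:
            raise ValueError("t_min must not exceed t_max")


def compatible(cause: Observation, effect: Observation) -> bool:
    """True if some instant in the cause interval strictly precedes some instant in the effect interval."""
    return cause.t_min < effect.t_max


creation = Observation(t_min=1.0, t_max=3.0)  # hypothetical creation of an artifact
use = Observation(t_min=2.0, t_max=4.0)       # hypothetical later use of the same artifact
print(compatible(creation, use))
```

Overlapping intervals are accepted because the true instants are unknown within their bounds; only a cause whose earliest possible time is not before the latest possible time of its effect is rejected.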
8 Time Constraints and Inferences
The model of causality in OPM is essentially timeless since time precedence does not imply causality: if a process P_1 occurs before a process P_2, in general, we cannot infer that P_1 caused P_2 to happen. However, the converse implication holds assuming time is measured according to a single clock. We therefore expect time annotations to be consistent with causality. To this end, we extend the definition of legal account view, defined as: an acyclic account view, which contains at most one wasGeneratedBy edge per artifact, and in which causation is time-monotonic, as displayed in Figure \retime:monotonicit, and discussed below.
9 Support for Collections
Collections represent groups of objects. Computer programs in general, and workflows in particular, usually offer primitives to manipulate such collections. It is therefore important that OPM offers the means to represent collections and their provenance. Specifically, it is crucial to be able to distinguish the provenance of collections from the provenance of the items contained in them.
Collections are represented by artifacts, and an OPM graph can express that a collection was used or was generated by a process. (Likewise, a summary edge can also express that a collection was derived from another.) At any point in a computation, a collection consists of a group of member artifacts, which can be enumerated by means of a collection accessor, and individually used by processes. Symmetrically, a group of artifacts generated by processes can be grouped into a collection by means of a collection constructor.
Collection types are defined by means of collection accessors and constructors; such operations are expressed by OPM processes, and the algebraic properties of these operations define the properties of collections: e.g. ordered or unordered collections, bags or sets, indexable collections or not.
10 Example of Representation
In this section, we construct an explicit representation of the model for Figure 4. It appears in 17, where we see:
11 Conclusion
The document has introduced the open provenance model, consisting of a technology-independent specification and a graphical notation, to express causality graphs representing past executions. In the future, we will define a serialization format for this model. We will also specify protocols by which provenance of artifacts can be determined, and protocols for applications to record descriptions of their execution. We invite teams that have defined their own provenance model to establish whether their representations can be converted into this model and vice-versa.
Best Practice on the Use of Agents
With the defined notion of account, we now revisit the sky mosaic example. Instead of Figure \repegasus:figur, a different description could encompass the steps the operating system (or the grid) goes through in order to execute a program (as in the PASS and ES3 approaches). Figure \repegasus3b:figur illustrates some possible causal dependencies for a system-level description. Here, we see an explicit reference to the workflow script used by the enactor.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| > > |
On June 19th 2008, twenty participants attended the first OPM workshop [3] to discuss the version of the specification. Minutes of the workshop and recommendations [4] were published, and led to the current version (v1.01) of the Open Provenance Model. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Added: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| > > |
Comments | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Deleted: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Deleted: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Deleted: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Deleted: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| < < |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| The Open Provenance Model (v1.01) | ||||||||
| Added: | ||||||||
| > > |
| |||||||
| ||||||||
| Line: 12 to 15 | ||||||||
|---|---|---|---|---|---|---|---|---|
| ||||||||
| Added: | ||||||||
| > > |
| |||||||
Abstract | ||||||||
| Line: 19 to 28 | ||||||||
| Changed: | ||||||||
| < < |
Introduction | |||||||
| > > |
1 Introduction | |||||||
| Provenance is well understood in the context of art or digital libaries, where it respectively refers to the documented history of an art object, or the documentation of processes in a digital object's life cycle~\citPREMIS:200. Interest for provenance in the "e-science community"~\citsimmhan05surve is also growing, since provenance is perceived as a crucial component of workflow systems~\citDeelman-Gil:NSF0 that can help scientists ensure reproducibility of their scientific analyses and processes. | ||||||||
| Line: 51 to 60 | ||||||||
| On June 19th 2008, twenty participants attended the first OPM workshop \citFirst:OPM:Workshop:0 to discuss the version of the specification. Minutes of the workshop and recommendations \citGroth:OPMWorkshopMinute were published, and led to the current version (v1.01) of the Open Provenance Model. | ||||||||
| Changed: | ||||||||
| < < |
Basics | |||||||
| > > |
2 Basics | |||||||
| Changed: | ||||||||
| < < |
Entities | |||||||
| > > |
2.1 Entities | |||||||
| Changed: | ||||||||
| < < |
Our primary concern is to be able to represent how "things", whether digital data such as simulation results, physical objects such as cars, or immaterial entities such as decisions, came out to be in a given state, with a given set of characteristics, at a given moment. It is recognised that many of such "things" can be stateful: a car may be at various locations, it can contain different passengers, and it can have a tank full or empty; likewise, a file can contain different data at different moments of its existence. Hence, from the perspective of provenance, we introduce the concept of an artifact as an immutable | |||||||
| > > |
Our primary concern is to represent how "things", whether digital data such as simulation results, physical objects such as cars, or immaterial entities such as decisions, came to be in a given state, with a given set of characteristics, at a given moment. Many such "things" are stateful: a car may be at various locations, it can contain different passengers, and its tank can be full or empty; likewise, a file can contain different data at different moments of its existence. Hence, from the perspective of provenance, we introduce the concept of an artifact as an immutable piece of state (see Footnote 1); likewise, we introduce the concept of a process as actions resulting in new artifacts. | |||||||
| A process usually takes place in some context, which enables or facilitates its execution: examples of such contexts are varied and include a place where the process executes, an individual controlling the process, or an institution sponsoring the process. These entities are referred to as agents. Agents, as we shall see when we discuss causal dependencies, are a cause (like a catalyst) of a process taking place. | ||||||||
| Line: 74 to 83 | ||||||||
| Footnote 1: In the presence of streams, we consider an artifact to be a slice of the stream in time, i.e. the stream content at a specific instant in the computation. A future version of OPM will refine the model to accommodate streams fully, as they are recognized to be crucial in many applications. | ||||||||
| Changed: | ||||||||
| < < |
Dependencies | |||||||
| > > |
2.2 Dependencies | |||||||
| Changed: | ||||||||
| < < |
A provenance graph aims to capture the causal dependencies between the abovementioned entities. Therefore, a provenance graph is defined as a directed graph, whose nodes are artifacts, processes and agents, and whose edges belong to one of following categories depicted in Figure~\reedges:figur. An edge represents a causal dependency, between its source, denoting the effect, and its destination, denoting the cause. | |||||||
| > > |
A provenance graph aims to capture the causal dependencies between the above-mentioned entities. Therefore, a provenance graph is defined as a directed graph whose nodes are artifacts, processes and agents, and whose edges belong to one of the categories depicted in Figure 1. An edge represents a causal dependency between its source, denoting the effect, and its destination, denoting the cause. | |||||||
|
| ||||||||
| Figure 1: Edges in the Provenance Model | ||||||||
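The graph definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the OPM specification: the names `NodeKind`, `Node`, `Edge`, `ProvenanceGraph` and `causes_of`, as well as the sample node names, are hypothetical. The sketch shows the one convention that is easy to get backwards: an edge points from the effect (its source) to the cause (its destination).

```python
from dataclasses import dataclass
from enum import Enum


class NodeKind(Enum):
    """The three kinds of node a provenance graph may contain."""
    ARTIFACT = "artifact"
    PROCESS = "process"
    AGENT = "agent"


@dataclass(frozen=True)
class Node:
    kind: NodeKind
    name: str  # hypothetical identifier, chosen for illustration


@dataclass(frozen=True)
class Edge:
    effect: Node  # source of the edge
    cause: Node   # destination of the edge


class ProvenanceGraph:
    """A directed graph whose edges run from effect to cause."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()

    def add_edge(self, effect, cause):
        self.nodes.update((effect, cause))
        self.edges.add(Edge(effect, cause))

    def causes_of(self, node):
        """Return the direct causes of a node (destinations of its outgoing edges)."""
        return {e.cause for e in self.edges if e.effect == node}


graph = ProvenanceGraph()
data = Node(NodeKind.ARTIFACT, "input.csv")
run = Node(NodeKind.PROCESS, "simulation")
result = Node(NodeKind.ARTIFACT, "result.dat")

graph.add_edge(run, data)    # the process depends on the artifact it used
graph.add_edge(result, run)  # the artifact depends on the process that generated it
```

Following edges from `result` therefore walks backwards through its history: first to `run`, then to `data`.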
| Changed: | ||||||||
| < < |
The first two edges express that a process used an artifact and that an artifact was generated by a process. Since a process may have used several artifacts, it is important to identify the roles under which these artifacts were used. (Roles are denoted by the letter R in Figure~\reedges:figur.) Likewise, a process may have generated many artifacts, and each would have a specific role. For instance, the division process uses two numbers, with roles dividend and divisor, and produces two numbers, with roles quotient and remainder. Roles are meaningful only in the context of the process where they are defined. The meaning of roles is not defined by OPM but by application domains; OPM only uses roles syntactically (as "tags") to distinguish the involvement of artifacts in processes. | |||||||
| > > |
The first two edges express that a process used an artifact and that an artifact was generated by a process. Since a process may have used several artifacts, it is important to identify the roles under which these artifacts were used. (Roles are denoted by the letter R in Figure 1.) Likewise, a process may have generated many artifacts, and each would have a specific role. For instance, the division process uses two numbers, with roles dividend and divisor, and produces two numbers, with roles quotient and remainder. Roles are meaningful only in the context of the process where they are defined. The meaning of roles is not defined by OPM but by application domains; OPM only uses roles syntactically (as "tags") to distinguish the involvement of artifacts in processes. | |||||||
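The division example can be written out as a small Python sketch. The tuple types `Used` and `WasGeneratedBy`, the helper `artifact_in_role`, and the process name `"division"` are assumptions made for illustration; OPM does not prescribe this representation. The sketch shows why roles matter: without them, the two numbers used by the process could not be told apart.

```python
from collections import namedtuple

# Hypothetical edge records: each carries a role that is meaningful
# only in the context of the process it is attached to.
Used = namedtuple("Used", "process role artifact")
WasGeneratedBy = namedtuple("WasGeneratedBy", "artifact role process")


def divide_with_provenance(dividend, divisor):
    """Perform integer division and record role-tagged provenance edges."""
    process = "division"
    quotient, remainder = divmod(dividend, divisor)
    return [
        Used(process, "dividend", dividend),
        Used(process, "divisor", divisor),
        WasGeneratedBy(quotient, "quotient", process),
        WasGeneratedBy(remainder, "remainder", process),
    ]


def artifact_in_role(edges, edge_type, role):
    """Find the artifact that played a given role on edges of a given type."""
    return next(
        e.artifact for e in edges
        if isinstance(e, edge_type) and e.role == role
    )


edges = divide_with_provenance(17, 5)
divisor = artifact_in_role(edges, Used, "divisor")
quotient = artifact_in_role(edges, WasGeneratedBy, "quotient")
remainder = artifact_in_role(edges, WasGeneratedBy, "remainder")
```

The roles are treated purely as tags here: the code compares them as strings and attaches no meaning to them, which matches the syntactic use of roles described above.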
| A process is caused by an agent, essentially acting as a catalyst or controller: this causal dependency is expressed by the was controlled by edge. Given that a process may have been catalyzed by several agents, we also identify their roles as catalysts. We note that the dependency between an agent and a process represents a control relationship, and not a data derivation relationship. It is introduced in the model to more easily express how a user (or institution) controlled a process. | ||||||||
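A brief sketch, again in Python, of how control edges might be kept apart from data edges. The record types, the helper functions `controllers` and `data_inputs`, and all sample names (`"simulation"`, `"alice"`, `"example-institute"`, `"mesh.dat"`) are hypothetical and serve only to illustrate the distinction between a control relationship and a data derivation relationship.

```python
from typing import NamedTuple


class Used(NamedTuple):
    process: str
    role: str
    artifact: str


class WasControlledBy(NamedTuple):
    process: str
    role: str
    agent: str


edges = [
    Used("simulation", "input", "mesh.dat"),
    WasControlledBy("simulation", "operator", "alice"),
    WasControlledBy("simulation", "sponsor", "example-institute"),
]


def controllers(edges, process):
    """Map each control role to the agent that controlled the process in that role."""
    return {
        e.role: e.agent for e in edges
        if isinstance(e, WasControlledBy) and e.process == process
    }


def data_inputs(edges, process):
    """Map each usage role to the artifact the process used; agents never appear here."""
    return {
        e.role: e.artifact for e in edges
        if isinstance(e, Used) and e.process == process
    }


who = controllers(edges, "simulation")
what = data_inputs(edges, "simulation")
```

Keeping the two queries separate means that a question about data derivation only ever returns artifacts, while a question about who controlled a process only ever returns agents.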
| Changed: | ||||||||