Models

Provenance has become a necessity in a wide range of social and scientific applications. It matters particularly in news and blog scenarios, where determining the sources of published information, the process that led to a publication, or the references used is key to generating a trust value for that information.
Several provenance models exist, and we have selected the Open Provenance Model (OPM) for three reasons. Firstly, OPM is the result of years of community effort and discussion, and it has already been adopted by many applications (such as OurSpaces, Tupelo, Taverna or eBioFlow). Secondly, OPM has been selected by the W3C Provenance Incubator Group as the reference vocabulary onto which popular existing provenance vocabularies are mapped, a first step towards a definitive standard. Thirdly, the provenance graph produced by the model is easy for users to understand.

The ontology network has been developed following the NeOn methodology, reusing existing ontologies and vocabularies and taking into account the work and requirements of the W3C Provenance Incubator Group for the News Aggregator Scenario.
The network documentation is available here. The network consists of three levels, as shown in the next figure:


The first level (block 1) consists of the OPM Ontology (OPMO), a domain-independent provenance model on top of which we build our profile (block 2), extending and adapting the core to our scenario. Finally, on top of the profile we reuse domain-specific vocabularies (SIOC, MPEG-7, W3C GEO) (block 3) to model the descriptive metadata of the artifacts.
Next, we briefly describe each of the ontologies that compose this network.

Open Provenance Model

OPM proposes a causal graph in which the nodes are either artifacts (immutable pieces of state), processes (an action or series of actions performed on artifacts), or agents (controllers of processes), and the edges represent the causal relationships between the nodes: Used (a process used some artifact), WasControlledBy (an agent controlled some process), WasTriggeredBy (a process activated another process), WasGeneratedBy (a process generated an artifact) and WasDerivedFrom (an artifact was derived from another artifact). It also has the notion of accounts (partial subgraphs of the provenance graph), which are useful for representing multiple views of the same graph from different perspectives, and roles, which allow some of the aforementioned causal relationships to be described in more detail.
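The graph structure described above can be sketched in a few lines of Python. This is only an illustrative data structure, not part of OPM or of the ontology network: the class names Node and Edge, the helper account_view and the blog example (draft, publish, author, post, and the account names "editorial" and "public") are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Node and edge kinds as listed in the OPM description above.
NODE_KINDS = {"Artifact", "Process", "Agent"}
EDGE_KINDS = {"Used", "WasControlledBy", "WasTriggeredBy",
              "WasGeneratedBy", "WasDerivedFrom"}


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # one of NODE_KINDS

    def __post_init__(self):
        if self.kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {self.kind}")


@dataclass(frozen=True)
class Edge:
    kind: str                                # one of EDGE_KINDS
    effect: Node                             # the node that depends on the cause
    cause: Node
    role: Optional[str] = None               # optional qualifier of the relationship
    accounts: FrozenSet[str] = frozenset()   # accounts (views) the edge belongs to

    def __post_init__(self):
        if self.kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {self.kind}")


def account_view(edges: List[Edge], account: str) -> List[Edge]:
    """Return the subgraph (list of edges) that belongs to one account."""
    return [e for e in edges if account in e.accounts]


# Hypothetical blog example: an author publishes a post from a draft.
draft = Node("draft", "Artifact")
post = Node("post", "Artifact")
publish = Node("publish", "Process")
author = Node("author", "Agent")

edges = [
    Edge("Used", effect=publish, cause=draft,
         accounts=frozenset({"editorial"})),
    Edge("WasGeneratedBy", effect=post, cause=publish,
         accounts=frozenset({"editorial", "public"})),
    Edge("WasControlledBy", effect=publish, cause=author, role="writer",
         accounts=frozenset({"editorial"})),
    Edge("WasDerivedFrom", effect=post, cause=draft,
         accounts=frozenset({"public"})),
]

public_edges = [e.kind for e in account_view(edges, "public")]
```

Here the "public" account exposes only how the post came to exist, while the "editorial" account also records who controlled the publishing process and in which role.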

Two ontological approaches exist to model OPM: the OPM Ontology (OPMO) and the OPM Vocabulary (OPMV). The former is more complex and consists of an ontology that models the edges with an n-ary relationship pattern, while the latter is a lightweight ontology for asserting the OPM concepts. OPMO is the one used to model the proposed scenario, because it allows extra metadata to be added to the edges without resorting to reification.
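The difference between the two approaches can be illustrated with plain triples. The sketch below is an assumption-laden simplification: the strings "type", "effect", "cause", "role" and "time", the identifiers "post", "publish" and "edge1", and the timestamp are illustrative placeholders rather than the actual OPMO or OPMV terms.

```python
def binary_edge(effect, relation, cause):
    """Lightweight style: the edge is a single triple, so there is
    no resource to which a role or a timestamp could be attached."""
    return [(effect, relation, cause)]


def nary_edge(edge_id, edge_type, effect, cause, **metadata):
    """N-ary style: the edge is itself a resource with its own
    identifier, so extra metadata becomes further triples about it."""
    triples = [
        (edge_id, "type", edge_type),
        (edge_id, "effect", effect),
        (edge_id, "cause", cause),
    ]
    triples.extend((edge_id, key, value) for key, value in metadata.items())
    return triples


light = binary_edge("post", "wasGeneratedBy", "publish")
rich = nary_edge("edge1", "WasGeneratedBy", "post", "publish",
                 role="output", time="2011-05-01T10:00:00")
```

The n-ary form costs more triples per edge, but the role and the time are stated directly about the edge resource instead of about a reified statement.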
The next figure gives an overview of the OPMO representation. The nodes are represented in the center boxes (Process, Artifact and Agent), while the edges are in the boxes WCB (WasControlledBy), WTB (WasTriggeredBy), WGB (WasGeneratedBy), Used and WDF (WasDerivedFrom). The figure also shows how the boxes are interconnected, according to the OPM specification: WasControlledBy has an Agent as cause and a Process as effect; WasTriggeredBy has Processes as both cause and effect; WasDerivedFrom has Artifacts as both cause and effect; Used has an Artifact as cause and a Process as effect; and WasGeneratedBy has a Process as cause and an Artifact as effect. Optionally, some of the edges can have a Role or occur over an interval of time (startTime, endTime).
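The cause and effect constraints listed above amount to a small lookup table. The following sketch encodes them as such; the names EDGE_SIGNATURES and is_valid_edge are hypothetical and chosen only for this example.

```python
# Edge type -> (kind of the cause node, kind of the effect node),
# transcribed from the constraints described in the text.
EDGE_SIGNATURES = {
    "WasControlledBy": ("Agent", "Process"),
    "WasTriggeredBy": ("Process", "Process"),
    "WasDerivedFrom": ("Artifact", "Artifact"),
    "Used": ("Artifact", "Process"),
    "WasGeneratedBy": ("Process", "Artifact"),
}


def is_valid_edge(edge_type, cause_kind, effect_kind):
    """Check that an edge connects node kinds allowed for its type.
    Unknown edge types are rejected."""
    return EDGE_SIGNATURES.get(edge_type) == (cause_kind, effect_kind)


# A process may generate an artifact, but an agent may not.
ok = is_valid_edge("WasGeneratedBy", "Process", "Artifact")
bad = is_valid_edge("WasGeneratedBy", "Agent", "Artifact")
```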

OPM Overview (taken from the OPMO specification)

SIOC Ontology

SIOC is an ontology designed to describe information from online communities (such as blogs or forums), and it is used together with OPM in the blogging platform. It suits this task well because it was designed for exactly this domain. It provides Containers, Items and Posts to model posting activity in blogs, tracks the followers and subscribers of a user, and can handle different versions of a post (each pointing to its previous version). It also models the comments on a post, RSS feeds and group membership, and links some of these relationships to other popular vocabularies such as FOAF (Friend of a Friend) and DC (Dublin Core). It is a resource-centric vocabulary centered on the domain of online communities.
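The version chain mentioned above, where each version of a post points to the previous one, can be pictured as a linked list. The sketch below is a plain Python analogy, not SIOC itself: the Post class, the version_history helper and the example.org URIs are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Post:
    uri: str
    content: str
    previous_version: Optional["Post"] = None  # link to the older version


def version_history(post: Post) -> List[str]:
    """Walk the chain of previous versions, newest first."""
    history = []
    current = post
    while current is not None:
        history.append(current.uri)
        current = current.previous_version
    return history


# Hypothetical post edited twice.
v1 = Post("http://example.org/post/1?v=1", "First draft")
v2 = Post("http://example.org/post/1?v=2", "Fixed typos", previous_version=v1)
v3 = Post("http://example.org/post/1?v=3", "Added photo", previous_version=v2)

history = version_history(v3)
```

Such a chain lines up naturally with the provenance graph: each newer version is an artifact derived from the previous one.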

MPEG-7 Ontology

The MPEG-7 ontology is a transformation of the MPEG-7 standard into OWL Full. It allows detailed descriptions of an image, video or audio file: size, duration, color, decomposition into segments, etc. In our scenario it is used to annotate the metadata of the user-provided content that consists of images or video.

WGS84 Vocabulary

This simple ontology is used for describing the location of spatial things as coordinates (latitude, longitude and height) and as places (Madrid, Barcelona, Ireland, etc.). According to the specification, a spatial thing is "Anything with spatial extent", so we have included the edges of the OPM graph in that definition, which allows a location to be attached to them.
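Attaching a location to an edge of the OPM graph can be sketched as follows. The Location class, the edge identifier "edge1" and the dictionary edge_locations are hypothetical; the coordinates for Madrid are approximate, and the range checks reflect the usual WGS84 bounds for latitude and longitude in decimal degrees.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    lat: float                    # latitude in decimal degrees
    long: float                   # longitude in decimal degrees
    alt: Optional[float] = None   # height in metres, if known
    place: Optional[str] = None   # human-readable place name

    def __post_init__(self):
        if not -90.0 <= self.lat <= 90.0:
            raise ValueError(f"latitude out of range: {self.lat}")
        if not -180.0 <= self.long <= 180.0:
            raise ValueError(f"longitude out of range: {self.long}")


# Hypothetical: the edge "edge1" (e.g. a post being generated) took place in Madrid.
edge_locations = {
    "edge1": Location(lat=40.4168, long=-3.7038, place="Madrid"),
}
```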

WebN+1 ontology for the tourist domain

This ontology, implemented in OWL, models the tourist domain represented in the WebN+1 platform. It is based on the Infutur ontology, which fulfils most of the domain requirements of the "El Viajero" use case defined in the project. The Infutur ontology in turn reuses several ontologies, such as SIOC (for the specification of users), W3C Geo (for the specification of geographical data), FOAF (for the specification of information about people), RECO (for representing information for recommendations) and Review (for representing reviews and ratings). The main elements of the ontology for the tourist use case of the WebN+1 project include classes and properties for representing tourist attractions (TourismResource), accommodation (Accommodation), multimedia content (Image, Audio), the location of resources (SpatialThing), the users of the system (UserAccount, Profile, Preferences), recommendations (Recommendation) and different organisations.