Introduction to knowledge graphs (section 3.4): Data graphs – Context
This article is section 3.4 of part 3 of the Introduction to knowledge graphs series of articles. Recent research has identified the development of knowledge graphs as an important aspect of artificial intelligence (AI) in knowledge management (KM).
Drawing on Hogan and colleagues’ comprehensive tutorial article [1], this fourth section of the data graphs part of the series describes context. By context, Hogan and colleagues refer to the scope of truth, that is, the context in which some data are held to be true.
Many (arguably all) facts presented in the data graph of Figure 1 can be considered true with respect to a certain context. With respect to temporal context, “Santiago” has existed as a city since 1541, flights from “Arica” to “Santiago” began in 1956, and so on. With respect to provenance, data about “EID15” were taken from – and are thus said to be true with respect to – the Ñam webpage on 11 April 2020. Other forms of context may also be used and combined, such as to indicate that “Arica” is a Chilean city (geographic) since 1883 (temporal) per the Treaty of Ancón (provenance).
The graph of Figure 1 leaves much of its context implicit. However, making context explicit can allow for interpreting the data from different perspectives, such as to understand what held true in 2016, what holds true excluding webpages later found to have spurious data, and so on. Various explicit representations of context are now discussed.
Direct Representation
The first way to represent context is to consider it as data no different from other data. For example, the dates for the event “EID15” in Figure 1 can be seen as directly representing an ad hoc form of temporal context. Alternatively, a number of specifications have been proposed to directly represent context in a more standard way, including the Time Ontology [2] for temporal context, the PROV Data Model [3] for provenance, and so on.
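As a rough illustration of direct representation, the sketch below stores context as ordinary edges alongside the rest of the data, using the Santiago and “EID15” facts mentioned earlier. The predicate names (“city since”, “retrieved from”, “retrieved on”) and the helper functions are assumptions made for this example; they are not taken from Figure 1, the Time Ontology, or the PROV Data Model.

```python
# Direct representation: context is stored as plain edges, like any other data.
# Predicate names are illustrative assumptions, not a standard vocabulary.
graph = {
    ("Santiago", "type", "City"),
    ("Santiago", "city since", 1541),
    ("EID15", "retrieved from", "Ñam webpage"),
    ("EID15", "retrieved on", "2020-04-11"),
}


def objects(graph, subject, predicate):
    """Return the objects of all edges leaving `subject` with label `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def was_city_in(graph, node, year):
    """True if some 'city since' edge on `node` carries a year not after `year`."""
    return any(since <= year for since in objects(graph, node, "city since"))


print(was_city_in(graph, "Santiago", 1600))
print(objects(graph, "EID15", "retrieved on"))
```

Because the context is just data, any interpretation of it (such as the comparison in `was_city_in`) has to be written by the application; the graph itself does not know that “city since” limits when the “type” edge holds.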
Reification
Often, we may wish to directly define the context of edges themselves; for example, we may wish to state that the “flight” edge between “Santiago” and “Arica” is valid from 1956. One option is to use reification, which allows for describing edges themselves in a graph. Figure 9 presents three forms of reification for modelling temporal context: RDF reification, n-ary relations, and singleton properties. Unlike in a direct representation, e is seen as denoting an edge in the graph, not a flight. While n-ary relations and singleton properties are more succinct, and n-ary relations are more compatible with path expressions, the best choice of reification may depend on the system chosen. Other forms of reification have been proposed in the literature, including, for example, NdFluents [4]. In general, a reified edge does not assert the edge it reifies; for example, we may reify an edge to state that it is no longer valid.
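The three forms of reification can be sketched as plain sets of triples. This is a minimal illustration rather than a reproduction of Figure 9: the direction of the flight edge, the node name `e`, and the predicate labels (“subject”, “value”, “singleton”, and so on) are assumptions chosen for readability.

```python
EDGE = ("Santiago", "flight", "Arica")  # edge direction is assumed for illustration
CONTEXT = {"valid from": 1956}


def _context_triples(edge_id, context):
    """Attach each contextual key/value pair to the node that denotes the edge."""
    return {(edge_id, key, value) for key, value in context.items()}


def rdf_reification(edge, edge_id, context):
    """Describe the edge through its subject, predicate and object."""
    s, p, o = edge
    return {
        (edge_id, "subject", s),
        (edge_id, "predicate", p),
        (edge_id, "object", o),
    } | _context_triples(edge_id, context)


def nary_relation(edge, edge_id, context):
    """Route the relation through an intermediate node that denotes the edge."""
    s, p, o = edge
    return {(s, p, edge_id), (edge_id, "value", o)} | _context_triples(edge_id, context)


def singleton_property(edge, edge_id, context):
    """Use a one-off edge label and link it back to the general label."""
    s, p, o = edge
    return {(s, edge_id, o), (edge_id, "singleton", p)} | _context_triples(edge_id, context)


forms = {
    "rdf": rdf_reification(EDGE, "e", CONTEXT),
    "nary": nary_relation(EDGE, "e", CONTEXT),
    "singleton": singleton_property(EDGE, "e", CONTEXT),
}
for name, triples in forms.items():
    print(name, len(triples), EDGE in triples)
```

In this sketch RDF reification needs four triples while the other two forms need three, and none of the resulting graphs contains the original edge itself, which mirrors the point that a reified edge does not assert the edge it reifies.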
Higher-arity Representation
We can also use higher-arity representations, which extend the graph model, for encoding context. Taking again the “flight” edge that is valid from 1956, Figure 10 illustrates three higher-arity representations of temporal context. First, we can use a named graph (Figure 10(a)) to contain the edge and then define the temporal context on the graph name. Second, we can use a property graph (Figure 10(b)) where the temporal context is defined as an attribute on the edge. Third, we can use RDF* [5] (Figure 10(c)): an extension of RDF that allows edges to be defined as nodes. The most flexible of the three is the named graph representation, where we can assign context to multiple edges at once by placing them in one named graph, for example, adding more edges valid from 1956 to the named graph of Figure 10(a). The least flexible option is RDF*, which, without an edge ID, cannot capture different groups of contextual values on an edge; for example, we can add four values to an edge stating that it was valid from 2006 until 2010 and valid from 2014 until 2018, but we cannot pair the values.
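The differences between the three higher-arity representations can be sketched with ordinary Python structures. Everything below is an assumption made for illustration: the graph names (`G1`, `G2`, `G3`), the edge ID `e1`, the “valid until” key, the direction of the flight edge, the second flight edge, and the placeholder edge `X` used for the pairing example.

```python
EDGE = ("Santiago", "flight", "Arica")  # edge direction is assumed for illustration

# (a) Named graph: edges live in a graph, and context hangs off the graph name.
named_graphs = {"G1": {EDGE}}
graph_context = {"G1": {"valid from": 1956}}


def contexts_of(edge):
    """Collect the context of every named graph that contains `edge`."""
    return [
        graph_context[name]
        for name, edges in sorted(named_graphs.items())
        if edge in edges
    ]


# One context can cover many edges: a (hypothetical) second edge added to G1
# picks up "valid from 1956" without any further statements.
OTHER = ("Santiago", "flight", "Punta Arenas")
named_graphs["G1"].add(OTHER)

# (b) Property graph: the edge has an ID and carries its own attributes.
property_edge = {
    "id": "e1",
    "source": "Santiago",
    "label": "flight",
    "target": "Arica",
    "attributes": {"valid from": 1956},
}

# (c) RDF*: the edge itself is used as a node (here, a nested tuple).
rdf_star = {(EDGE, "valid from", 1956)}

# Pairing problem: without an edge ID, four values on one edge form a flat set.
X = ("x", "p", "y")  # placeholder edge
star_values = {
    (X, "valid from", 2006), (X, "valid until", 2010),
    (X, "valid from", 2014), (X, "valid until", 2018),
}
starts = sorted(v for _, key, v in star_values if key == "valid from")
ends = sorted(v for _, key, v in star_values if key == "valid until")

# With named graphs, each interval gets its own graph, so the pairs survive.
named_graphs["G2"], graph_context["G2"] = {X}, {"valid from": 2006, "valid until": 2010}
named_graphs["G3"], graph_context["G3"] = {X}, {"valid from": 2014, "valid until": 2018}
paired = [(c["valid from"], c["valid until"]) for c in contexts_of(X)]

print(starts, ends)
print(paired)
```

The flat RDF*-style set only yields two start years and two end years, with nothing recording which start belongs to which end, whereas the named-graph version returns the intervals as pairs.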
Annotations
While the previous alternatives are concerned with representing context, annotations allow for defining contexts, which enables automated context-aware processing of data. Some annotations model a particular contextual domain; for example, Temporal RDF [6] allows for annotating edges with time intervals, while Fuzzy RDF [7] allows for annotating edges with a degree of truth; for example, a degree of 0.8 on the edge stating that Santiago has a semi-arid climate indicates that this is more or less true.
Other frameworks are domain-independent. Annotated RDF [8] allows for representing various forms of context modelled as semi-rings: algebraic structures consisting of domain values (e.g., temporal intervals, fuzzy values, etc.) and two main operators to combine domain values: meet and join (different from the relational algebra join).
Figure 11 gives an example where G is annotated with integers (1–365) denoting days of the year. An interval notation is used such that {[150, 152]} indicates the set {150, 151, 152}. Query Q asks for flights from Santiago to cities with events and returns the temporal validity of each answer. To derive these answers, the meet operator is first applied – defined here as set intersection – to compute the annotation for which a “flight” and “city” edge match; for example, applying meet on {[150,152]} and {[1,120],[220,365]} for “Punta Arenas” gives the empty time interval {}, and thus it may be omitted from the results (depending on the semantics chosen). However, for “Arica”, there are two non-empty intersections: {[123,125]} for “EID16” and {[276,279]} for “EID17”. Since we are interested in the city, rather than the event, these two annotations for “Arica” are combined using the join operator, returning the annotation in which either result holds true. In this scenario, join is defined as the union of sets, giving {[123,125],[276,279]}.
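The meet and join steps of this example can be sketched with Python sets, where meet is set intersection and join is set union as in the scenario above. The input annotations for the “Arica” edges, the event ID “EID18”, and the assignment of annotations to the flight and city edges for “Punta Arenas” are assumptions chosen so that the intersections match those stated in the text; they are not read from Figure 11.

```python
def expand(intervals):
    """Expand [(lo, hi), ...] into the set of days it denotes (bounds inclusive)."""
    return {day for lo, hi in intervals for day in range(lo, hi + 1)}


def compress(days):
    """Turn a set of days back into sorted, maximal inclusive intervals."""
    out = []
    for day in sorted(days):
        if out and day == out[-1][1] + 1:
            out[-1][1] = day  # extend the current interval
        else:
            out.append([day, day])  # start a new interval
    return [tuple(interval) for interval in out]


def meet(a, b):
    """Annotation under which both edges hold: set intersection."""
    return a & b


def join(a, b):
    """Annotation under which either result holds: set union."""
    return a | b


# Flight edges from Santiago, annotated with the days on which they hold.
flights = {
    "Arica": expand([(1, 125), (200, 365)]),         # assumed annotation
    "Punta Arenas": expand([(1, 120), (220, 365)]),  # assignment to this edge assumed
}
# (event, city, days) for the "city" edges of events.
events = [
    ("EID16", "Arica", expand([(123, 130)])),         # assumed annotation
    ("EID17", "Arica", expand([(276, 279)])),         # assumed annotation
    ("EID18", "Punta Arenas", expand([(150, 152)])),  # hypothetical event ID
]


def answer(flights, events):
    """Cities reachable by flight that host an event, with their temporal validity."""
    result = {}
    for _event, city, event_days in events:
        if city not in flights:
            continue
        days = meet(flights[city], event_days)
        if days:  # omit answers with an empty annotation
            result[city] = join(result.get(city, set()), days)
    return {city: compress(days) for city, days in result.items()}


print(answer(flights, events))
```

Here empty annotations are dropped, which corresponds to one of the possible semantics mentioned above; keeping “Punta Arenas” with an empty interval would only require removing the `if days` check and initialising the entry.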
Other Contextual Frameworks
Other frameworks for modelling and reasoning about context in graphs include contextual knowledge repositories, which assign (sub-)graphs to contexts with one or more partially ordered dimensions (e.g., “2020-03-22” ≼ “2020-03” ≼ “2020”), allowing sub-graphs to be selected at different levels of contextual granularity.
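Selection by contextual granularity can be sketched for a single temporal dimension. This is only a toy model and not the design of any particular contextual knowledge repository: the order ≼ is approximated by a prefix test on date strings, and the edges and context labels are placeholders.

```python
def within(fine, coarse):
    """True if `fine` ≼ `coarse` in a day ≼ month ≼ year hierarchy of date strings."""
    return fine == coarse or fine.startswith(coarse + "-")


# Sub-graphs assigned to contexts (placeholder edges, for illustration only).
contexts = {
    "2020-03-22": {("a", "p", "b")},
    "2020-03": {("c", "p", "d")},
    "2020": {("e", "p", "f")},
    "2021-01-05": {("g", "p", "h")},
}


def select(contexts, coarse):
    """Union of all sub-graphs whose context falls within `coarse`."""
    selected = set()
    for context, edges in contexts.items():
        if within(context, coarse):
            selected |= edges
    return selected


print(sorted(select(contexts, "2020-03")))
print(sorted(select(contexts, "2020")))
```

Selecting at “2020-03” returns the day-level and month-level sub-graphs for March 2020, while selecting at “2020” also includes the year-level sub-graph and excludes the 2021 one.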
Next part (part 4): Deductive knowledge.
Header image source: Crow Intelligence, CC BY-NC-SA 4.0.
References:
1. Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., Melo, G. D., Gutierrez, C., … & Zimmermann, A. (2021). Knowledge graphs. ACM Computing Surveys (CSUR), 54(4), 1-37.
2. Cox, S., & Little, C. (2022). Time Ontology in OWL. W3C Candidate Recommendation.
3. Belhajjame, K., B’Far, R., Cheney, J., Coppens, S., Cresswell, S., Gil, Y., … & Tilmes, C. (2013). PROV-DM: The prov data model. W3C Recommendation 30 April 2013.
4. Giménez-García, J. M., Zimmermann, A., & Maret, P. (2017). NdFluents: An ontology for annotated statements with inference preservation. In The Semantic Web: 14th International Conference, ESWC 2017, Portorož, Slovenia, May 28–June 1, 2017, Proceedings, Part I 14 (pp. 638-654). Springer International Publishing.
5. Schrader, B. (2020, July 24). RDF*: What is it and Why do I Need it? Enterprise Knowledge.
6. Gutierrez, C., Hurtado, C., & Vaisman, A. (2005). Temporal RDF. In The Semantic Web: Research and Applications: Second European Semantic Web Conference, ESWC 2005, Heraklion, Crete, Greece, May 29–June 1, 2005. Proceedings 2 (pp. 93-107). Springer Berlin Heidelberg.
7. Lv, Y., Ma, Z. M., & Yan, L. (2008, June). Fuzzy RDF: A data model to represent fuzzy metadata. In 2008 IEEE international conference on fuzzy systems (IEEE world congress on computational intelligence) (pp. 1439-1445). IEEE.
8. Udrea, O., Recupero, D. R., & Subrahmanian, V. S. (2010). Annotated RDF. ACM Transactions on Computational Logic (TOCL), 11(2), 1-41.