
Introduction to knowledge graphs (section 3.4): Data graphs – Context

Bruce Boyes, 22 Apr 2023


This article is section 3.4 of part 3 of the Introduction to knowledge graphs series of articles. Recent research has identified the development of knowledge graphs as an important aspect of artificial intelligence (AI) in knowledge management (KM).

Drawing on Hogan and colleagues’ comprehensive tutorial article¹, this fourth section of the data graphs part of the series describes context. By context, Hogan and colleagues refer to the scope of truth, that is, the context in which some data are held to be true.

Many (arguably all) facts presented in the data graph of Figure 1 can be considered true with respect to a certain context. With respect to temporal context, “Santiago” has existed as a city since 1541, flights from “Arica” to “Santiago” began in 1956, and so on. With respect to provenance, data about “EID15” were taken from – and are thus said to be true with respect to – the Ñam webpage on 11 April 2020. Other forms of context may also be used and combined, such as to indicate that “Arica” is a Chilean city (geographic) since 1883 (temporal) per the Treaty of Ancón (provenance).

The graph of Figure 1 leaves much of its context implicit. However, making context explicit can allow for interpreting the data from different perspectives, such as to understand what held true in 2016, what holds true excluding webpages later found to have spurious data, and so on. Various explicit representations of context are now discussed.

Direct Representation

The first way to represent context is to consider it as data no different from other data. For example, the dates for the event “EID15” in Figure 1 can be seen as directly representing an ad hoc form of temporal context. Alternatively, a number of specifications have been proposed to directly represent context in a more standard way, including the Time Ontology² for temporal context, the PROV Data Model³ for provenance, and so on.
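As a minimal sketch of direct representation, the snippet below stores start and end dates as ordinary edges and then filters events by date. The property names, the second event, and all date values are illustrative assumptions, not data taken from Figure 1.

```python
from datetime import date

# Edges are (subject, predicate, object) triples; dates are ordinary data.
# Property names and date values are illustrative assumptions.
edges = [
    ("EID15", "type", "Food Festival"),
    ("EID15", "start", date(2020, 3, 22)),
    ("EID15", "end", date(2020, 3, 29)),
    ("EID16", "start", date(2021, 5, 3)),
    ("EID16", "end", date(2021, 5, 5)),
]


def value(node, predicate):
    """Return the first value of `predicate` on `node`, or None."""
    for s, p, o in edges:
        if s == node and p == predicate:
            return o
    return None


def events_on(day):
    """Return events whose [start, end] interval contains `day`."""
    nodes = sorted({s for s, p, _ in edges if p == "start"})
    return [n for n in nodes if value(n, "start") <= day <= value(n, "end")]


print(events_on(date(2020, 3, 25)))  # ['EID15']
```

Because the dates are just data, nothing in the graph model itself knows they express temporal context; the interpretation lives in the query.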

Reification

Often, we may wish to directly define the context of edges themselves; for example, we may wish to state that the “flight” edge between “Arica” and “Santiago” is valid from 1956. One option is to use reification, which allows for describing edges themselves in a graph. Figure 9 presents three forms of reification for modelling temporal context: RDF reification, n-ary relations, and singleton properties. Unlike in a direct representation, e is seen as denoting an edge in the graph, not a flight. While n-ary relations and singleton properties are more succinct, and n-ary relations are more compatible with path expressions, the best choice of reification may depend on the system chosen. Other forms of reification have been proposed in the literature, including, for example, NdFluents⁴. In general, a reified edge does not assert the edge it reifies; for example, we may reify an edge to state that it is no longer valid.

Figure 9. Three representations of temporal context on an edge in a directed-edge labelled graph (source: Hogan et al. 2021).
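The three forms of reification can be sketched as plain sets of triples. The example follows the article's statement that flights from Arica to Santiago began in 1956; the edge direction, the identifier "e", and the property names ("subject", "value", "singleton", "valid from") are assumptions made for illustration and may differ from the labels used in Figure 9.

```python
# Three ways to attach "valid from 1956" to the edge Arica -flight-> Santiago.
# The identifier "e" and all property names are illustrative assumptions.

rdf_reification = {
    ("e", "subject", "Arica"),
    ("e", "predicate", "flight"),
    ("e", "object", "Santiago"),
    ("e", "valid from", 1956),
}

nary_relation = {
    ("Arica", "flight", "e"),
    ("e", "value", "Santiago"),
    ("e", "valid from", 1956),
}

singleton_property = {
    ("Arica", "e", "Santiago"),
    ("e", "singleton", "flight"),
    ("e", "valid from", 1956),
}


def objects(graph, s, p):
    """All objects o such that (s, p, o) is in the graph."""
    return [o for (s2, p2, o) in graph if s2 == s and p2 == p]


def decode_rdf_reification(graph, e):
    """Recover (source, label, target, valid_from) from RDF reification."""
    s = objects(graph, e, "subject")[0]
    p = objects(graph, e, "predicate")[0]
    o = objects(graph, e, "object")[0]
    return (s, p, o, objects(graph, e, "valid from")[0])


def decode_nary(graph, e):
    """Recover the edge where e is an intermediate node on the path."""
    s, p = next((s, p) for (s, p, o) in graph if o == e)
    return (s, p, objects(graph, e, "value")[0], objects(graph, e, "valid from")[0])


def decode_singleton(graph, e):
    """Recover the edge where e is a one-off edge label."""
    s, o = next((s, o) for (s, p, o) in graph if p == e)
    return (s, objects(graph, e, "singleton")[0], o, objects(graph, e, "valid from")[0])


print(decode_rdf_reification(rdf_reification, "e"))
print(decode_nary(nary_relation, "e"))
print(decode_singleton(singleton_property, "e"))
```

All three decoders return the same tuple, which shows that the forms carry the same information. The n-ary and singleton forms need three triples rather than four, and in the n-ary form the source and target are still connected by a path through e.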

Higher-arity Representation

We can also use higher-arity representations – that extend the graph model – for encoding context. Taking again the “flight” edge between “Arica” and “Santiago”, Figure 10 illustrates three higher-arity representations of temporal context. First, we can use a named graph (Figure 10(a)) to contain the edge and then define the temporal context on the graph name. Second, we can use a property graph (Figure 10(b)) where the temporal context is defined as an attribute on the edge. Third, we can use RDF*⁵ (Figure 10(c)): an extension of RDF that allows edges to be defined as nodes. The most flexible of the three is the named graph representation, where we can assign context to multiple edges at once by placing them in one named graph, for example, adding more edges valid from 1956 to the named graph of Figure 10(a). The least flexible option is RDF*, which, without an edge ID, cannot capture different groups of contextual values on an edge; for example, we can add four values to a single edge stating that it was valid from 2006 until 2010 and valid from 2014 until 2018, but we cannot pair the values.

Figure 10. Three higher-arity representations of temporal context on an edge (source: Hogan et al. 2021).
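The difference in flexibility can be sketched with ordinary Python structures. This is only an approximation of the three models: the graph name "G1956", the reverse flight edge, the placeholder edge A -r-> B, and the attribute names are all hypothetical.

```python
from collections import defaultdict

# (a) Named graph: quads (source, label, target, graph name).
# Context is stated once on the graph name and applies to every edge in it.
quads = [
    ("Arica", "flight", "Santiago", "G1956"),
    ("Santiago", "flight", "Arica", "G1956"),  # hypothetical extra edge
]
graph_context = {"G1956": {"valid from": 1956}}


def named_graph_context(s, p, o):
    """Context of every named graph that contains the edge (s, p, o)."""
    return [graph_context[g] for (s2, p2, o2, g) in quads if (s2, p2, o2) == (s, p, o)]


# (b) Property graph: each edge has its own ID and attribute map, so two
# validity periods can be kept apart as two edges between the same nodes.
property_edges = {
    "e1": {"source": "A", "label": "r", "target": "B",
           "attrs": {"valid from": 2006, "valid until": 2010}},
    "e2": {"source": "A", "label": "r", "target": "B",
           "attrs": {"valid from": 2014, "valid until": 2018}},
}
intervals = sorted(
    (e["attrs"]["valid from"], e["attrs"]["valid until"])
    for e in property_edges.values()
)

# (c) RDF*-style: the edge itself acts as the node being described.
# Without an edge ID, all four values attach to the same (A, r, B) key.
rdf_star = defaultdict(set)
for prop, val in [("valid from", 2006), ("valid until", 2010),
                  ("valid from", 2014), ("valid until", 2018)]:
    rdf_star[(("A", "r", "B"), prop)].add(val)

print(named_graph_context("Arica", "flight", "Santiago"))
print(intervals)
print(dict(rdf_star))
```

In (b) the pairs (2006, 2010) and (2014, 2018) survive intact, while in (c) only the two unpaired sets of start and end values remain, so it is no longer possible to tell which start belongs to which end.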

Annotations

While the previous alternatives are concerned with representing context, annotations allow for defining contexts, which enables automated context-aware processing of data. Some annotations model a particular contextual domain; for example, Temporal RDF⁶ allows for annotating edges with time intervals, while Fuzzy RDF⁷ allows for annotating edges with a degree of truth; for example, an edge stating the climate of “Santiago” can be annotated with 0.8, indicating that it is more or less true – with a degree of 0.8 – that Santiago has a semi-arid climate.

Other frameworks are domain-independent. Annotated RDF⁸ allows for representing various forms of context modelled as semi-rings: algebraic structures consisting of domain values (e.g., temporal intervals, fuzzy values, etc.) and two main operators to combine domain values: meet and join (different from the relational algebra join).

Figure 11 gives an example where G is annotated with integers (1–365) denoting days of the year. An interval notation is used such that {[150, 152]} indicates the set {150, 151, 152}. Query Q asks for flights from Santiago to cities with events and returns the temporal validity of each answer. To derive these answers, the meet operator is first applied – defined here as set intersection – to compute the annotation for which a “flight” and “city” edge match; for example, applying meet on {[150,152]} and {[1,120],[220,365]} for “Punta Arenas” gives the empty time interval {}, and thus it may be omitted from the results (depending on the semantics chosen). However, for “Arica”, there are two non-empty intersections: {[123,125]} for “EID16” and {[276,279]} for “EID17”. Since we are interested in the city, rather than the event, these two annotations for “Arica” are combined using the join operator, returning the annotation in which either result holds true. In this scenario, join is defined as the union of sets, giving {[123,125],[276,279]}.

Figure 11. Example query on a temporally annotated graph (source: Hogan et al. 2021).
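The meet and join steps described above can be sketched as set intersection and set union over day numbers. The Punta Arenas annotations and the resulting Arica intersections come from the text; the year-round annotation on the Arica flight edge and the event ID "EID18" are assumptions chosen to reproduce those intersections, since the full graph of Figure 11 is not reproduced here.

```python
def days(intervals):
    """Expand inclusive intervals [(lo, hi), ...] into a set of day numbers."""
    return {d for lo, hi in intervals for d in range(lo, hi + 1)}


def to_intervals(day_set):
    """Collapse a set of day numbers into sorted inclusive intervals."""
    out = []
    for d in sorted(day_set):
        if out and d == out[-1][1] + 1:
            out[-1] = (out[-1][0], d)  # extend the current interval
        else:
            out.append((d, d))  # start a new interval
    return out


def meet(a, b):
    """Days on which both annotations hold (set intersection)."""
    return a & b


def join(a, b):
    """Days on which either annotation holds (set union)."""
    return a | b


# "flight" edges from Santiago, keyed by destination city.
flight_edges = {
    "Arica": days([(1, 365)]),  # assumed annotation
    "Punta Arenas": days([(1, 120), (220, 365)]),
}

# "city" edges: (event, city, annotation). "EID18" is a hypothetical ID.
city_edges = [
    ("EID16", "Arica", days([(123, 125)])),
    ("EID17", "Arica", days([(276, 279)])),
    ("EID18", "Punta Arenas", days([(150, 152)])),
]

answers = {}
for event, city, when in city_edges:
    both = meet(flight_edges[city], when)
    if not both:
        continue  # empty annotation: omit the answer
    answers[city] = join(answers.get(city, set()), both)

result = {city: to_intervals(a) for city, a in answers.items()}
print(result)  # {'Arica': [(123, 125), (276, 279)]}
```

Expanding intervals into day sets keeps the sketch short; a real system would more likely operate on the intervals directly.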

Other Contextual Frameworks

Other frameworks for modelling and reasoning about context in graphs include that of contextual knowledge repositories, which assign (sub-)graphs to contexts with one or more partially ordered dimensions (e.g., “2020-03-22” ≼ “2020-03” ≼ “2020”), allowing sub-graphs to be selected at different levels of contextual granularity.
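As a rough illustration only, and not the formal framework itself, the ordering “2020-03-22” ≼ “2020-03” ≼ “2020” can be approximated with string prefixes on a single temporal dimension. The function names and the edges assigned to each context are hypothetical.

```python
def covers(coarse, fine):
    """True if `fine` is at or below `coarse` in the order, e.g.
    "2020-03-22" is covered by "2020-03", which is covered by "2020"."""
    return fine == coarse or fine.startswith(coarse + "-")


# Sub-graphs (lists of edges) assigned to contexts; contents are hypothetical.
contexts = {
    "2020-03-22": [("EID15", "start", "2020-03-22")],
    "2020-03": [("EID15", "type", "Food Festival")],
    "2020": [("Arica", "flight", "Santiago")],
    "2021": [("EID16", "city", "Arica")],
}


def select(level):
    """Collect the edges of every context covered by `level`."""
    return [
        edge
        for ctx in sorted(contexts)
        if covers(level, ctx)
        for edge in contexts[ctx]
    ]


print(select("2020-03"))  # edges from "2020-03" and "2020-03-22"
print(select("2020"))     # edges from all three 2020 contexts
```

Selecting at “2020” returns the union of the day-level, month-level, and year-level sub-graphs, while “2021” remains separate because the two years are not comparable in the order.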

Next part (part 4): Deductive knowledge.

Header image source: Crow Intelligence, CC BY-NC-SA 4.0.

References:

1. Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., Melo, G. D., Gutierrez, C., … & Zimmermann, A. (2021). Knowledge graphs. ACM Computing Surveys (CSUR), 54(4), 1-37.
2. Cox, S., & Little, C. (2022). Time Ontology in OWL. W3C Candidate Recommendation.
3. Belhajjame, K., B’Far, R., Cheney, J., Coppens, S., Cresswell, S., Gil, Y., … & Tilmes, C. (2013). PROV-DM: The prov data model. W3C Recommendation 30 April 2013.
4. Giménez-García, J. M., Zimmermann, A., & Maret, P. (2017). NdFluents: An ontology for annotated statements with inference preservation. In The Semantic Web: 14th International Conference, ESWC 2017, Portorož, Slovenia, May 28–June 1, 2017, Proceedings, Part I 14 (pp. 638-654). Springer International Publishing.
5. Schrader, B. (2020, July 24). RDF*: What is it and Why do I Need it? Enterprise Knowledge.
6. Gutierrez, C., Hurtado, C., & Vaisman, A. (2005). Temporal RDF. In The Semantic Web: Research and Applications: Second European Semantic Web Conference, ESWC 2005, Heraklion, Crete, Greece, May 29–June 1, 2005. Proceedings 2 (pp. 93-107). Springer Berlin Heidelberg.
7. Lv, Y., Ma, Z. M., & Yan, L. (2008, June). Fuzzy RDF: A data model to represent fuzzy metadata. In 2008 IEEE international conference on fuzzy systems (IEEE world congress on computational intelligence) (pp. 1439-1445). IEEE.
8. Udrea, O., Recupero, D. R., & Subrahmanian, V. S. (2010). Annotated RDF. ACM Transactions on Computational Logic (TOCL), 11(2), 1-41.

