RDF Messages

Living Document

This version:
https://www.pieter.pm/rdf-messages
Issue Tracking:
GitHub
Inline In Spec
Editors:
Pieter Colpaert
Piotr Sowiński

Abstract

Concepts and abstract data model for RDF Messages

1. Introduction

An RDF Message is an RDF Dataset that is intended to be interpreted atomically as a single communicative act. The dataset of the message can be empty.

Note: While no formal restriction on the size of an RDF Message is defined, messages are intended to be kept small and actionable.

PREFIX as: <https://www.w3.org/ns/activitystreams#>
PREFIX ex: <http://example.org/>

ex:like-1 a as:Like ;
  as:object ex:blogpost-1 ;
  as:actor <https://pietercolpaert.be/#me> .
Example of a message using the Activity Vocabulary [activitystreams-vocabulary].

Note: A specific RDF Message cannot be referred to; a message only conveys that its quads belong together. You can, however, refer to resources described within the message, such as ex:like-1 in the example above.

An RDF Message Stream instance carries RDF Messages from one specific producer to one specific consumer.

Note: This concept is different from an RDF quad stream, which carries individual quads.

A stream consumer listens in on the stream using a stream protocol.

A stream producer makes a stream available using a stream protocol.

Note: The underlying stream protocol is out of scope of this specification. It can be, for example, [WebSockets], [LDN], [EventSource], Linked Data Event Streams, Jelly gRPC, MQTT, or a programming-language-specific stream interface that carries RDF Datasets, or collections or streams of RDF quads.

An RDF Message Log is an ordered collection of RDF Messages. The log can be serialized from an RDF Message Stream, and/or deserialized into an RDF Message Stream.

PREFIX ex: <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sosa: <http://www.w3.org/ns/sosa/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# a message defining the context
ex:Stream1 a ex:Dataset ;
    rdfs:comment "A log of messages that appeared on a stream" .
# @message the next message is an observation in the stream
ex:Observation1
    a sosa:Observation ;
    sosa:resultTime "2026-01-01T00:00:00Z"^^xsd:dateTime ;
    sosa:hasSimpleResult "..." .
# @message an empty message
# @message another observation
ex:Observation2
    a sosa:Observation ;
    sosa:resultTime "2026-01-01T00:10:00Z"^^xsd:dateTime ;
    sosa:hasSimpleResult "..." .
Example of an RDF Message Log publishing the RDF Messages that appeared in a stream so far.

Note: A producer may want to indicate which property holds the timestamp at which a message was created. This can be done, for example, using ldes:timestampPath from Linked Data Event Streams. Alternatively, when vocabularies such as ActivityStreams, SSN/SOSA, or PROV-O are used, consumers can assume that the respective properties as:published, sosa:resultTime, or prov:generatedAtTime serve this purpose.
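The following Python sketch illustrates the note above by picking a message's timestamp. It assumes that messages are lists of (subject, predicate, object) string tuples with full IRIs. It also assumes that an explicitly indicated timestamp property is a single predicate IRI. This is a simplification, since ldes:timestampPath takes a property path. The function name message_timestamp and the fallback order are hypothetical.

```python
from datetime import datetime
from typing import Optional

# Full IRIs of the properties mentioned in the note above.
AS_PUBLISHED = "https://www.w3.org/ns/activitystreams#published"
SOSA_RESULT_TIME = "http://www.w3.org/ns/sosa/resultTime"
PROV_GENERATED_AT_TIME = "http://www.w3.org/ns/prov#generatedAtTime"

# Fallback order when the producer did not indicate a property (an assumption).
DEFAULT_PROPERTIES = (AS_PUBLISHED, SOSA_RESULT_TIME, PROV_GENERATED_AT_TIME)

Triple = tuple[str, str, str]  # (subject IRI, predicate IRI, object lexical form)


def message_timestamp(message: list[Triple],
                      timestamp_property: Optional[str] = None) -> Optional[datetime]:
    """Return the first timestamp found in a message, or None (hypothetical helper)."""
    properties = (timestamp_property,) if timestamp_property else DEFAULT_PROPERTIES
    for prop in properties:
        for _subject, predicate, obj in message:
            if predicate == prop:
                # fromisoformat in older Python versions does not accept a trailing "Z".
                return datetime.fromisoformat(obj.replace("Z", "+00:00"))
    return None
```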

Note: The scope of blank nodes is defined by the underlying protocol. For example, for a document containing multiple RDF Messages, the scope remains the document itself.

If blank nodes in an N-Quads Message Log are scoped to the entire document, then to serialize a long stream and avoid blank node collisions, the producer would need to store in memory all blank nodes ever used or rely on UUIDs. Is this feasible? See also: Issue #3
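One possible answer to this issue is for the producer to prefix every blank node label with the position of its message in the log. This is offered only as an assumption and is not decided by this specification. The Python sketch below assumes that quads are already split into N-Quads term strings, so literals that happen to contain _: are left alone. The helper relabel_blank_nodes is hypothetical.

```python
Quad = tuple[str, ...]  # N-Quads terms as strings, e.g. ("_:b0", "<http://example.org/p>", '"v"')


def relabel_blank_nodes(message: list[Quad], message_index: int) -> list[Quad]:
    """Prefix every blank node label with the message's position in the log (hypothetical)."""
    # Message indexes are digits only, so the index ends at the first "_" after "m":
    # labels from different messages cannot collide, and the producer only needs a
    # counter instead of a set of every label it has ever written.
    prefix = f"_:m{message_index}_"
    return [
        tuple(prefix + term[2:] if term.startswith("_:") else term for term in quad)
        for quad in message
    ]
```

With this approach, the producer's memory requirement drops from all labels ever used to a single message counter.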

2. RDF Message Streams

A stream consumer has functionality to create and access a new RDF Message Stream instance. An instance therefore only exists while it is being consumed.

When the stream producer sends a new RDF Message on the RDF Message Stream, a function is called on the stream consumer, as specified by the underlying protocol.

Note: This is an abstraction over the underlying protocol or API.

A stream producer MAY provide a mechanism to write only when a stream consumer is ready to process the next message.
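The callback and the optional write-when-ready mechanism can be sketched with Python's asyncio. This sketch assumes an in-process stream in which a bounded asyncio.Queue stands in for the stream protocol. The names produce, consume and run, and the use of None as an end-of-stream marker, are hypothetical. Messages are opaque Python values here.

```python
import asyncio


async def produce(queue: asyncio.Queue, messages: list) -> None:
    """Stream producer: writes each message, waiting while the consumer is busy."""
    for message in messages:
        await queue.put(message)  # suspends while the queue is full
    await queue.put(None)  # hypothetical end-of-stream marker


async def consume(queue: asyncio.Queue, on_message) -> None:
    """Stream consumer: on_message is called once per received RDF Message."""
    while (message := await queue.get()) is not None:
        on_message(message)


async def run(messages: list) -> list:
    received: list = []
    queue: asyncio.Queue = asyncio.Queue(maxsize=1)  # at most one message waiting
    await asyncio.gather(produce(queue, messages), consume(queue, received.append))
    return received
```

With maxsize=1, await queue.put(...) suspends the producer until the consumer has taken the previous message. This is one way to write only when the consumer is ready. Empty messages pass through like any other message.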

Find out and document the similarities/differences to the RDF-JS Stream interface

3. Serializing and parsing RDF Message Logs

In this specification, we propose that all RDF serializations MUST implement a way to group quads into RDF Messages. This way, a stream consumer can write a stream into an RDF Message Log, which a stream producer can later read back into an RDF Message Stream.
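The following is a minimal Python sketch of the writing side for N-Quads. It assumes that each message is given as a list of already serialized N-Quads lines. The name write_message_log is hypothetical. The sketch writes a # @message delimiter comment before every message, including the first. This relies on the rule in Section 3.1 that a leading delimiter does not produce an empty message, so an empty first message is still preserved.

```python
DELIMITER = "# @message"


def write_message_log(messages: list[list[str]]) -> str:
    """Serialize messages (each a list of N-Quads lines) into one message log."""
    out: list[str] = []
    for message in messages:
        # Writing the delimiter before every message means a leading delimiter
        # opens the first message, and consecutive delimiters encode empty messages.
        out.append(DELIMITER)
        out.extend(message)
    return "".join(line + "\n" for line in out)
```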

3.1. N-Triples, N-Quads, Turtle and TriG

The RDF serializations are being revised anyway in the upcoming RDF 1.2 specification, which proposes version labels. We propose to the working group to include the concept of RDF Messages as an additional content-type parameter, as follows: Content-Type: application/trig; version=1.2; messages=rdfm. This parameter indicates that the messages in this HTTP response follow this specification. Clients that do not rely on RDF Messages can still interpret the response as regular RDF 1.2 data.

When the content type signals support for messages and a parser is in message mode, it MUST:

  1. Consider every triple or quad in the document as part of an RDF Message. The document does not need to start with a delimiter. If it does, the content after that delimiter is part of the first message; the leading delimiter does not produce an empty message before it.

  2. Add triples or quads to the current message until a delimiter or the end of the document is encountered.

  3. When a delimiter is encountered, finalize the current RDF Message and open the next one. At the end of the document, finalize the current RDF Message.

The delimiter is a comment in the document whose text (the content following the #) matches the regular expression /^\s*@message/.
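The three parsing steps above can be sketched in Python for N-Triples and N-Quads, where every statement sits on its own line. This sketch makes several assumptions:

- The regular expression is matched against the comment text after #.
- Only whole-line comments can act as delimiters.
- An empty document contains no messages.

Statements are kept as raw lines rather than parsed into terms. The name parse_message_log is hypothetical.

```python
import re

# Matched against the comment text that follows "#" (an assumption).
DELIMITER = re.compile(r"^\s*@message")


def parse_message_log(text: str) -> list[list[str]]:
    """Split an N-Triples/N-Quads document in message mode into RDF Messages."""
    messages: list[list[str]] = []
    current: list[str] = []
    opened = False  # becomes True once a statement or a delimiter has been seen
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            if DELIMITER.match(line[1:]):
                if opened:
                    messages.append(current)  # step 3: finalize, then open the next
                    current = []
                opened = True  # a leading delimiter only opens the first message
            continue  # other comments are ignored
        current.append(line)  # step 2: the statement joins the current message
        opened = True
    if opened:
        messages.append(current)  # end of document finalizes the last message
    return messages
```

Applied to the N-Triples equivalent of the log example in Section 1, this yields four messages, the third of which is empty.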

3.2. JSON-LD

Discussion of this topic is preliminary but ongoing in the JSON-LD group itself, with a proposal called newline-delimited JSON-LD. We propose that newline-delimited JSON-LD becomes the preferred way of adding message support to JSON-LD.

Instead of a newline-delimited format, other ways of delimiting RDF Messages can be imagined. For example, an array of elements could be linked from an @message directive, indicating to the parser that the JSON-LD object that follows is to be interpreted as an RDF Message.
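This Python sketch assumes the simplest reading of newline-delimited JSON-LD, since the proposal is still under discussion: every non-empty line is one complete JSON-LD document and thus one RDF Message. The sketch only splits and parses the JSON. Turning each document into a dataset would still require a JSON-LD processor. The name parse_ndjsonld is hypothetical.

```python
import json


def parse_ndjsonld(text: str) -> list:
    """Return one parsed JSON-LD document per non-empty line (one RDF Message each)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

An empty object {} expands to no quads, so it can represent an empty message.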

3.3. RDF/XML

Each message is a separate XML document, written on a new line.

This is made discoverable through a new media type: Content-Type: application/rdfm+xml
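The following Python sketch reads such a log. It assumes that each XML document is serialized on a single line. The name split_rdfxml_log is hypothetical. The sketch uses only xml.etree.ElementTree to separate the documents. It does not interpret the RDF/XML itself, which would require an RDF/XML parser.

```python
import xml.etree.ElementTree as ET


def split_rdfxml_log(text: str) -> list[ET.Element]:
    """Parse every non-empty line as one standalone XML document (one message)."""
    # The rdf:RDF wrapper is optional in RDF/XML, so the root element is not checked.
    return [ET.fromstring(line) for line in text.splitlines() if line.strip()]
```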

4. Examples and use cases

4.1. An archive of an RDF Stream

When an RDF Message Log is written to a file, all RDF Messages are preserved when the file is deserialized again. The messages are streamed out in the same order in which they were written to the file.

Without the semantics of an RDF Message, and without a syntax for it, reconstructing the intended messages is slow and requires sub-optimal heuristics. It is slow because quads cannot be relied upon to be grouped together, so another quad belonging to a message could always appear at the end of the file. It requires heuristics because one can only guess which grouping is intended, for example subject-based star patterns, a [CBD], or a named graph. This is the approach taken by the “member extraction” step of Linked Data Event Streams.
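The following Python sketch implements the subject-based star-pattern heuristic to illustrate both problems. Quads are (subject, predicate, object, graph) tuples of opaque strings. The name group_by_subject is hypothetical. No group can be emitted before the input ends. In addition, a message that describes two subjects, such as the officegraph message in Section 4.3, is split into two groups.

```python
from collections import defaultdict

Quad = tuple[str, str, str, str]  # (subject, predicate, object, graph)


def group_by_subject(quads) -> dict[str, list[Quad]]:
    """Heuristic member extraction: one group per subject (a star pattern)."""
    groups: dict[str, list[Quad]] = defaultdict(list)
    # The whole input must be read: a later quad may still belong to any group.
    for quad in quads:
        groups[quad[0]].append(quad)
    return dict(groups)
```

For example, when the officegraph message from Section 4.3 is abbreviated to three quads, this heuristic returns two groups instead of the one intended message.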

4.2. SPARQL CONSTRUCT results

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
CONSTRUCT {
  ?company a dbo:Company ;
       dbo:location ?location .
  
  ?location rdf:type dbo:Country ;
            rdfs:label ?lname ;
            dbp:populationCensus ?pop .
} WHERE {
    ?company dbo:location | dbo:locationCountry ?location .
    
    ?location rdf:type dbo:Country ;
              rdfs:label ?lname ;
              dbp:populationCensus | dbo:populationTotal ?pop .
    
    FILTER (LANG(?lname) = "en")
} ORDER BY ?location LIMIT 10000
If the query engine supported RDF Message Logs to indicate which groups of triples belong to a certain result, clients that want to use each message as a meaningful unit would be faster.

The example (test it using the DBpedia SPARQL endpoint or Comunica) returns up to 10000 solutions, each pairing a company with the country it is located in and that country's population. Now imagine that a consumer wants to process the results of this SPARQL query, where each CONSTRUCT result is an RDF Message. The server could have grouped the quads for the consumer. Instead, the consumer has to match the pattern of the CONSTRUCT template again on the client before it can proceed. The obvious solution here is to use an RDF Message Stream.
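The following Python sketch shows how a message-aware engine could emit one RDF Message per solution. It does this by instantiating the CONSTRUCT template once for every solution, instead of merging all results into one graph. Terms are opaque strings, variables start with ?, and solutions are dicts from variable names to terms. The names construct_messages and instantiate are hypothetical and do not refer to the API of any query engine.

```python
Term = str
Triple = tuple[Term, Term, Term]


def instantiate(template: list[Triple], solution: dict[str, Term]) -> list[Triple]:
    """Instantiate a CONSTRUCT template with one solution (hypothetical helper)."""
    triples = []
    for pattern in template:
        terms = tuple(solution.get(t[1:]) if t.startswith("?") else t for t in pattern)
        if None not in terms:  # as in SPARQL, triples with unbound variables are dropped
            triples.append(terms)
    return triples


def construct_messages(template: list[Triple], solutions):
    """Yield one RDF Message per solution instead of one merged graph."""
    for solution in solutions:
        yield instantiate(template, solution)
```

Unlike an ordinary CONSTRUCT result, where shared triples are merged into one graph, the country triples are repeated in every message that mentions that country. As a result, each message can be processed on its own.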

4.3. RiverBench dataset distributions

Datasets in RiverBench are streams of RDF datasets that can be processed individually as RDF Messages. They represent real-world use cases of streaming RDF data. For example, the officegraph dataset consists of almost 15 million RDF graphs with IoT measurements (see example below).

PREFIX ic:    <https://interconnectproject.eu/example/>
PREFIX om:    <http://www.wurvoc.org/vocabularies/om-1.8/>
PREFIX saref: <https://saref.etsi.org/core/>
PREFIX xsd:   <http://www.w3.org/2001/XMLSchema#>

ic:property_R5_56__co2_
        a       ic:CO2Level .

ic:measurement_R5_56__co2__0
        a                        saref:Measurement;
        saref:hasTimestamp       "2022-02-28T23:59:00"^^xsd:dateTime;
        saref:hasValue           "504"^^xsd:float;
        saref:isMeasuredIn       om:partsPerMillion;
        saref:relatesToProperty  ic:property_R5_56__co2_ .
Example of an RDF Message in the officegraph dataset.

To distribute this stream, a TAR archive is used, in which each file is one RDF Message of the stream. This could be greatly improved by using an RDF Message Log serialization instead. That would allow the entire stream to be saved in a single file while the individual messages can still be reconstructed.
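The following Python sketch performs such a conversion using the standard tarfile module. It makes two hypothetical assumptions:

- Every file in the archive is an N-Triples document without blank nodes. Otherwise, labels would have to be made unique per message.
- Archive order is message order.

The name tar_to_message_log is hypothetical. Each file becomes one message, preceded by a # @message delimiter.

```python
import io
import tarfile


def tar_to_message_log(tar_bytes: bytes) -> str:
    """Concatenate the files of a TAR archive into one N-Triples message log."""
    out: list[str] = []
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as archive:
        for member in archive.getmembers():  # archive order is taken as message order
            if not member.isfile():
                continue
            data = archive.extractfile(member).read().decode("utf-8")
            out.append("# @message\n")  # one delimiter per file, i.e. per message
            out.extend(line + "\n" for line in data.splitlines() if line.strip())
    return "".join(out)
```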

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[ACTIVITYSTREAMS-VOCABULARY]
James Snell; Evan Prodromou. Activity Vocabulary. URL: https://w3c.github.io/activitystreams/vocabulary/
[CBD]
Patrick Stickler, Nokia. CBD - Concise Bounded Description. 3 June 2005. W3C Member Submission. URL: https://www.w3.org/Submission/CBD/
[EventSource]
Ian Hickson. Server-Sent Events. URL: https://html.spec.whatwg.org/multipage/server-sent-events.html
[LDN]
Sarven Capadisli; Amy Guy. Linked Data Notifications. URL: https://linkedresearch.org/ldn/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[WebSockets]
Adam Rice. WebSockets Standard. Living Standard. URL: https://websockets.spec.whatwg.org/
