1. Introduction
An RDF Message is an RDF Dataset that is intended to be interpreted atomically as a single communicative act. The dataset of the message can be empty.
Note: While no formal restriction on the size of an RDF Message is defined, messages are intended to be kept small and actionable.
PREFIX as: <https://www.w3.org/ns/activitystreams#>
PREFIX ex: <http://example.org/>

ex:like-1 a as:Like ;
  as:object ex:blogpost-1 ;
  as:actor <https://pietercolpaert.be/#me> .
Note: You cannot refer to a specific RDF Message; you can only understand that its quads belong together. You can, however, refer to resources defined within the message, such as ex:like-1 in the example above.
An RDF Message Stream instance carries RDF Messages from one specific producer to one specific consumer.
Note: This concept is different from an RDF quad stream, which carries individual quads.
A stream consumer listens in on the stream using a stream protocol.
A stream producer makes available a stream using a stream protocol.
Note: The underlying stream protocol is out of scope of this specification. It can be for example [WebSockets], [LDN], [EventSource], Linked Data Event Streams, Jelly gRPC, MQTT, or a programming language-specific stream interface that carries RDF Datasets, or a collection or stream of RDF Quads.
An RDF Message Log is an ordered collection of RDF Messages. The log can be serialized from an RDF Message Stream, and/or deserialized into an RDF Message Stream.
# a message defining the context
ex:Stream1 a ex:Dataset ;
  rdfs:comment "A log of messages that appeared on a stream" .
# @message a next message is an observation in the stream
ex:Observation1 a sosa:Observation ;
  sosa:resultTime "2026-01-01T00:00:00Z"^^xsd:dateTime ;
  sosa:hasSimpleResult "..." .
# @message an empty message
# @message another observation
ex:Observation2 a sosa:Observation ;
  sosa:resultTime "2026-01-01T00:10:00Z"^^xsd:dateTime ;
  sosa:hasSimpleResult "..." .
Note: A producer may want to indicate which property carries the timestamp of when a message was created. This can be done, for example, using ldes:timestampPath from Linked Data Event Streams. Alternatively, when vocabularies such as ActivityStreams, SSN/SOSA, or PROV-O are used, one can assume the respective properties as:published, sosa:resultTime, or prov:generatedAtTime serve this purpose.
Note: The scope of blank nodes is defined by the underlying protocol. For example, in a document containing multiple RDF Messages, the blank node scope remains the document itself.
If blank nodes in an N-Quads Message Log are scoped to the entire document, then, to serialize a long stream while avoiding blank node collisions, the producer would need to keep every blank node label ever used in memory, or rely on UUIDs. Is this feasible? See also: Issue #3
2. RDF Message Streams
A stream consumer has functionality to create and access a new RDF Message Stream instance. An instance thus only exists when it is being consumed.
A function is called on the stream consumer, as specified by the underlying protocol, when the stream producer sends a new RDF Message on the RDF Message Stream.
Note: This is an abstraction over the underlying protocol or API.
A stream producer MAY provide a mechanism to write only when a stream consumer is ready to process the next message.
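As an illustration of this abstraction, a minimal sketch in Python follows. The class names, the quad-as-tuple model, and the `ready` flag are assumptions of this sketch, not defined by this specification or by any underlying protocol:

```python
from typing import Callable, List, Tuple

# Hypothetical modelling: a quad as a (subject, predicate, object, graph)
# tuple and an RDF Message as a list of quads.
Quad = Tuple[str, str, str, str]
Message = List[Quad]

class MessageStreamConsumer:
    """Listens in on a stream; on_message is called once per RDF Message."""
    def __init__(self, on_message: Callable[[Message], None]) -> None:
        self.on_message = on_message
        self.ready = True  # optional: signals the producer it may send the next message

class MessageStreamProducer:
    """Makes a stream available to exactly one consumer."""
    def __init__(self, consumer: MessageStreamConsumer) -> None:
        self.consumer = consumer

    def send(self, message: Message) -> bool:
        # A producer MAY write only when the consumer is ready (backpressure).
        if not self.consumer.ready:
            return False
        self.consumer.on_message(message)
        return True
```

In this sketch, the `send` method plays the role of the protocol-specified function that delivers each new RDF Message to the consumer.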
Find out and document the similarities/differences to the RDF-JS Stream interface
3. Serializing and parsing RDF Message Logs
In this specification we propose that all RDF serializations MUST implement a way to group quads into RDF Messages. This way, a stream consumer can write the stream into an RDF Message Log that can be read again by a stream producer into an RDF Message Stream.
3.1. N-Triples, N-Quads, Turtle and TRiG
These RDF serializations are in any case being revised for the upcoming RDF 1.2 specification, in which version labels are proposed.
We propose to the working group to support this concept through an additional content-type directive, as follows: Content-Type: application/trig; version=1.2; messages=rdfm.
This indicates that the messages in this HTTP response follow this specification. Clients that do not rely on RDF Messages can still interpret the response as regular RDF 1.2 data.
When the content type flags support for messages and a parser is in message mode, it MUST:
- Consider every triple in the document as part of an RDF Message. The document does not need to start with a delimiter; if it does, the content after the delimiter is part of the first message and the document does not start with an empty message.
- Add triples to the current message until a delimiter or EOF is encountered.
- When a delimiter is encountered, finalize the current RDF Message and open the next one.
The delimiter is a comment in the document that matches this regex: /^\s*@message/.
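The rules above can be sketched as a line-based splitter in Python. This is a sketch under the assumption that delimiter comments always sit on their own line; a real parser would operate on the token stream instead, since a `#` inside an IRI or literal does not start a comment:

```python
import re
from typing import List

# The delimiter is a comment whose body matches /^\s*@message/.
DELIMITER = re.compile(r"^\s*@message")

def split_messages(document: str) -> List[List[str]]:
    """Split a Turtle/TriG document into one chunk of lines per RDF Message."""
    messages: List[List[str]] = [[]]
    leading = True  # no content or delimiter seen yet
    for line in document.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#") and DELIMITER.match(stripped[1:]):
            if leading:
                # A delimiter at the very start does not open an empty first message.
                leading = False
                continue
            messages.append([])  # finalize the current message, open the next
        else:
            if stripped:
                leading = False
            messages[-1].append(line)
    return messages
```

Each returned chunk would then be handed to a regular Turtle/TriG parser to obtain the quads of that message.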
3.2. JSON-LD
A preliminary yet ongoing discussion in the JSON-LD group itself concerns a proposal called newline delimited JSON-LD. We propose that that specification become the preferred way of adding message support to JSON-LD.
Instead of a newline delimited format, other types of RDF Message delimiting can also be imagined, for example an array of elements linked from an @message directive, indicating to the parser that the JSON-LD object that follows is to be interpreted as an RDF Message.
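A sketch of consuming the newline-delimited variant, assuming each JSON-LD object is serialized onto a single line and empty lines are ignored (the function name is ours):

```python
import json
from typing import Iterator

def iter_jsonld_messages(payload: str) -> Iterator[dict]:
    """Yield one JSON-LD object per non-empty line; each object is one RDF Message."""
    for line in payload.splitlines():
        if line.strip():
            yield json.loads(line)
```

Each yielded object would then be expanded and converted to quads by a regular JSON-LD processor.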
3.3. RDF/XML
Each message is a new XML document on a new line.
This is made discoverable with a new content-type: Content-Type: application/rdfm+xml
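Reading such a payload can be sketched as follows, assuming each RDF/XML document is serialized without literal newlines; only the XML is parsed here, not the RDF triples inside it:

```python
import xml.etree.ElementTree as ET
from typing import Iterator

def iter_rdfxml_messages(payload: str) -> Iterator[ET.Element]:
    """Parse one XML document per non-empty line; each document is one RDF Message."""
    for line in payload.splitlines():
        if line.strip():
            yield ET.fromstring(line)
```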
4. Examples and use cases
4.1. An archive of an RDF Stream
When you write an RDF Message Log out to a file, all RDF Messages are preserved when deserializing it again: they are streamed out in the same order as they were written to the file.
Without the semantics of an RDF Message, and without syntax for it, reconstructing the intended messages becomes slow and cannot be solved without sub-optimal heuristics. The performance loss stems from the fact that there could always be another quad at the end of the file that still needs to be considered for the current message, as you cannot rely on the quads being grouped together. A heuristic is needed because you can only guess whether, e.g., subject-based star patterns, a [CBD], or a named graph is used to group the quads. This is what the Linked Data Event Streams “member extraction” step does today.
4.2. SPARQL CONSTRUCT results
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
CONSTRUCT {
  ?company a dbo:Company ;
    dbo:location ?location .
  ?location rdf:type dbo:Country ;
    rdfs:label ?lname ;
    dbp:populationCensus ?pop .
} WHERE {
  ?company dbo:location|dbo:locationCountry ?location .
  ?location rdf:type dbo:Country ;
    rdfs:label ?lname ;
    dbp:populationCensus|dbo:populationTotal ?pop .
  FILTER (LANG(?lname) = "en")
}
ORDER BY ?location
LIMIT 10000
The example (test it using the DBpedia SPARQL endpoint or using Comunica) generates 10000 companies located in countries and lists the population number of each country. Now imagine a consumer wants to process the results of this SPARQL query, where each CONSTRUCT result is an RDF Message. While the server could have grouped the quads for the consumer, the consumer has to re-match the BGP of the CONSTRUCT clause on the client before it can proceed. The obvious solution here is to use an RDF Message Stream.
4.3. RiverBench dataset distributions
Datasets in RiverBench are streams of RDF datasets that can be processed individually as RDF Messages. They represent real-world use cases of streaming RDF data. For example, the officegraph dataset consists of almost 15 million RDF graphs with IoT measurements (see example below).
PREFIX ic: <https://interconnectproject.eu/example/>
PREFIX om: <http://www.wurvoc.org/vocabularies/om-1.8/>
PREFIX saref: <https://saref.etsi.org/core/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

ic:property_R5_56__co2_ a ic:CO2Level .
ic:measurement_R5_56__co2__0 a saref:Measurement ;
  saref:hasTimestamp "2022-02-28T23:59:00"^^xsd:dateTime ;
  saref:hasValue "504"^^xsd:float ;
  saref:isMeasuredIn om:partsPerMillion ;
  saref:relatesToProperty ic:property_R5_56__co2_ .
To distribute this stream, a TAR archive is used, where each file in the archive is an RDF Message in the stream. This could be greatly improved by using an RDF Message Log serialization instead, as this would allow saving the entire stream into a single file while still being able to reconstruct the individual messages.
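For comparison, reading the current TAR-based distribution could be sketched as follows, assuming the member order inside the archive is the stream order (the function name is ours):

```python
import tarfile
from typing import Iterator

def iter_tar_messages(path: str) -> Iterator[bytes]:
    """Yield the raw serialized bytes of each RDF Message, one archive member per message."""
    with tarfile.open(path) as archive:
        for member in archive:
            if member.isfile():
                handle = archive.extractfile(member)
                if handle is not None:
                    yield handle.read()
```

With an RDF Message Log serialization, this per-file bookkeeping disappears: the same stream becomes one delimited document.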