rdf

A DuckDB extension to read and write RDF

Maintainer(s): nonodename

Installing and Loading

INSTALL rdf FROM community;
LOAD rdf;

Example

-- 0. Assuming the extension is already installed and loaded

-- 1. Count the triples in all N-Triples files in a directory
SELECT COUNT(*) FROM read_rdf('data/shards/*.nt');

-- 2. Get subjects and predicates of a turtle file
SELECT subject, predicate FROM read_rdf('test/rdf/tests.ttl');

-- 3. Write a query result to RDF (N-Triples by default) using an R2RML mapping
COPY (SELECT empno, ename, deptno FROM emp)
TO 'output.nt'
(FORMAT r2rml, mapping 'mapping.ttl');

-- 4. Execute a full R2RML mapping (with embedded queries) to write RDF
COPY (SELECT 1) TO 'output.nt' (FORMAT r2rml, mapping 'mapping.ttl');

-- 5. Check if an R2RML mapping is valid
SELECT is_valid_r2rml('mapping.ttl');

-- 6. Pivot RDF to a wide table
SELECT * FROM pivot_rdf('data.ttl');

-- 7. Read a SPARQL endpoint
SELECT * FROM read_sparql(
         'https://query.wikidata.org/sparql',
         'SELECT (COUNT(*) AS ?count) WHERE { ?item wdt:P31 wd:Q5 .}'
     );

About rdf

The duck_rdf extension enables DuckDB to read and write RDF (Resource Description Framework) data directly, using the SERD library for parsing and serialization.

Supported Formats

Read: Turtle (.ttl), NTriples (.nt), NQuads (.nq), TriG (.trig), and RDF/XML (.rdf/.xml, experimental — read-only).

Write: NTriples, Turtle, NQuads (via R2RML mapping).

Reading RDF

read_rdf() returns six columns: subject, predicate, and object (always populated), plus graph, language_tag, and datatype (nullable). It accepts a file path or glob pattern; when a glob matches multiple files, they are scanned in parallel. .gz and .zst compressed files are supported; for .zst, load the parquet extension first so that the libzstd library is available. The RDF format is auto-detected from the file extension and can be overridden with the file_type parameter.

SELECT subject, predicate FROM read_rdf('data.ttl');
SELECT COUNT(*) FROM read_rdf('data/shards/*.nt');
SELECT * FROM read_rdf('data/*.dat', file_type = 'ttl', strict_parsing = false);

Optional parameters:

| Parameter | Default | Description |
|---|---|---|
| strict_parsing | true | Set to false to allow malformed URIs |
| prefix_expansion | false | Expand CURIE-form URIs to full URIs (Turtle/TriG only) |
| file_type | auto-detected | Override format: ttl, nt, nq, trig, rdf/xml |
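
As a sketch of how these parameters combine with ordinary SQL, the queries below aggregate triples by predicate and read a Turtle file with prefix expansion enabled. The file paths are placeholders reused from the examples above, and only the subject, predicate, and object columns are referenced.

```sql
-- Count triples per predicate across all matched N-Triples shards
SELECT predicate, COUNT(*) AS triple_count
FROM read_rdf('data/shards/*.nt')
GROUP BY predicate
ORDER BY triple_count DESC;

-- Read a Turtle file, expanding CURIE-form URIs to full URIs
SELECT subject, predicate, object
FROM read_rdf('data.ttl', prefix_expansion = true);
```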

pivot_rdf() takes the same path/glob argument as read_rdf() and returns a pivoted table with one column per predicate and at least one row per subject. So that it can handle files of arbitrary size, a subject may appear in more than one row if its triples are encountered out of sequence. An equivalent pivot can be written in plain SQL, but it is subject to memory limits; pivot_rdf() aims to avoid them by making two passes over the RDF, the first of which profiles the shape of the data using profile_rdf().
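
For comparison, here is a minimal sketch of the SQL-domain pivot mentioned above, next to the pivot_rdf() call. The temporary table name triples is hypothetical, and first() is just one way to collapse multi-valued predicates into a single cell. The SQL-domain version assumes the triple set fits within DuckDB's memory limits, which is the constraint pivot_rdf() is meant to sidestep.

```sql
-- SQL-domain pivot: materialize the triples, then pivot on predicate
CREATE TEMP TABLE triples AS
    SELECT subject, predicate, object
    FROM read_rdf('data.ttl');

PIVOT triples
ON predicate
USING first(object)
GROUP BY subject;

-- Extension-side pivot of the same file
SELECT * FROM pivot_rdf('data.ttl');

-- Inspect the predicate-level profile of the file
SELECT * FROM profile_rdf('data.ttl');
```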

The experimental read_sparql(endpoint, query) sends a SPARQL SELECT query to a remote endpoint and returns the result set as a DuckDB table. Column names are derived from the SPARQL variable names; all columns are VARCHAR. Unbound variables are returned as empty strings.

-- Count number of humans in wikidata
SELECT * FROM read_sparql(
            'https://query.wikidata.org/sparql',
            'SELECT (COUNT(*) AS ?count) WHERE {   ?item wdt:P31 wd:Q5 .}'
        );
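
Because every column is VARCHAR and unbound variables arrive as empty strings, results usually need a cast or a NULLIF before further use. The sketch below assumes that the result columns are named exactly after the SPARQL variables (count, s, label); that naming is an assumption, not something confirmed here. The DBpedia endpoint is the one used in the function examples, and the query against it is illustrative only.

```sql
-- Cast the VARCHAR count to an integer (column name "count" is an assumption)
SELECT CAST("count" AS BIGINT) AS human_count
FROM read_sparql(
    'https://query.wikidata.org/sparql',
    'SELECT (COUNT(*) AS ?count) WHERE { ?item wdt:P31 wd:Q5 . }'
);

-- Turn empty strings from unbound OPTIONAL variables back into NULLs
SELECT s, NULLIF(label, '') AS label
FROM read_sparql(
    'https://dbpedia.org/sparql',
    'SELECT ?s ?label WHERE { ?s a <http://dbpedia.org/ontology/City> . OPTIONAL { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label } } LIMIT 10'
);
```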

Writing RDF (R2RML)

Write RDF using R2RML mapping files with DuckDB's COPY TO syntax. Two modes are supported:

Inside-out mode — DuckDB drives the query; the mapping has no rr:logicalTable:

COPY (SELECT empno, ename, deptno FROM emp)
TO 'output.nt' (FORMAT r2rml, mapping 'mapping.ttl');

Full R2RML mode — the mapping defines its own queries:

COPY (SELECT 1) TO 'output.nt' (FORMAT r2rml, mapping 'mapping.ttl');

Write options:

| Option | Required | Default | Description |
|---|---|---|---|
| mapping | Yes | | Path to R2RML mapping file (.ttl) |
| rdf_format | No | ntriples | Output format: ntriples, turtle, or nquads |
| ignore_non_fatal_errors | No | true | Raise an exception on the first parse error when false |
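
The options above can be combined in a single COPY statement. This is a minimal sketch, assuming the same emp table and mapping.ttl used earlier; the output file name output.ttl is a placeholder.

```sql
-- Write Turtle instead of the default N-Triples,
-- and raise an exception on the first parse error
COPY (SELECT empno, ename, deptno FROM emp)
TO 'output.ttl'
(FORMAT r2rml, mapping 'mapping.ttl', rdf_format 'turtle', ignore_non_fatal_errors false);

-- Read the result back to check how many triples were written
SELECT COUNT(*) FROM read_rdf('output.ttl');
```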

Validation Helpers

SELECT is_valid_r2rml('mapping.ttl');      -- validate an R2RML mapping file
SELECT can_call_inside_out('mapping.ttl'); -- check if inside-out mode is supported

Added Functions

| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| can_call_inside_out | scalar | Return true if the given R2RML mapping file can be executed in inside-out mode, where DuckDB runs the SQL query and the extension maps each output row to RDF triples. | NULL | [SELECT can_call_inside_out('mapping.ttl')] |
| is_valid_r2rml | scalar | Return true if the given file is a syntactically valid R2RML mapping document. | NULL | [SELECT is_valid_r2rml('mapping.ttl')] |
| pivot_rdf | table | Read RDF triples and pivot them into a wide table where each distinct predicate becomes a column and subjects become row identifiers. | NULL | [SELECT * FROM pivot_rdf('data.ttl'), SELECT * FROM pivot_rdf('data.nt', prefix_expansion=true)] |
| profile_rdf | table | Profile one or more RDF files and return a predicate-level statistical summary including value counts, datatypes, and cardinalities. | NULL | [SELECT * FROM profile_rdf('data.nt'), SELECT * FROM profile_rdf('data/*.ttl', strict_parsing=false)] |
| read_rdf | table | Read RDF triples from one or more files (Turtle, NTriples, NQuads, TriG, or RDF/XML) into a table with columns graph, subject, predicate, object, object_datatype, and object_lang. Glob patterns are supported. | NULL | [SELECT * FROM read_rdf('data.nt'), SELECT subject, predicate, object FROM read_rdf('*.ttl'), SELECT * FROM read_rdf('data.rdf', file_type='rdf', strict_parsing=false)] |
| read_rdf_prefixes | table | Read the namespace prefix declarations from one or more RDF files. Returns prefix (local name), uri (namespace URI), and is_base (true for @base declarations) columns. | NULL | [SELECT * FROM read_rdf_prefixes('data.ttl'), SELECT prefix, uri FROM read_rdf_prefixes('*.ttl')] |
| read_sparql | table | Execute a SPARQL SELECT query against a remote endpoint and return the results as a table. Each SPARQL variable becomes a VARCHAR column. Not available in WebAssembly builds. | NULL | [SELECT * FROM read_sparql('https://dbpedia.org/sparql', 'SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10')] |

Overloaded Functions

This extension does not add any function overloads.

Added Types

This extension does not add any types.

Added Settings

This extension does not add any settings.