Adding Provenance to a Table

Before ProvSQL can track provenance, the extension must be loaded and provenance must be enabled on each relevant table. ProvSQL represents provenance as a circuit of gates following the model of [Deutch et al., 2014].

Loading the Extension

In every database where you want provenance support:

CREATE EXTENSION provsql CASCADE;

The CASCADE option installs the required uuid-ossp dependency automatically.

To call ProvSQL functions without the provsql. prefix, add provsql to the search path at the start of each session:

SET search_path TO public, provsql;

To make this permanent for a specific database:

ALTER DATABASE mydb SET search_path TO public, provsql;

Most examples in this documentation omit the provsql. prefix, assuming provsql is in the search path.

Disabling Provenance Temporarily

Setting provsql.active to off makes ProvSQL silently drop all provenance annotations for the current session, as if the extension were not loaded:

SET provsql.active = off;

This is useful for running queries without provenance overhead while keeping the extension installed. See Configuration Reference for all configuration variables.

Enabling Provenance on a Table

Use add_provenance to add provenance tracking to an existing table:

SELECT provsql.add_provenance('mytable');

This adds a hidden provsql column of type uuid to the table. Each row receives a freshly generated UUID that identifies a leaf (input) gate in the provenance circuit.

After enabling provenance, every query that reads from mytable will automatically carry provenance annotations in its result set.

The table is recorded as TID (tuple-independent) in ProvSQL’s per-database metadata store. repair_key (see Probabilities) re-registers it as BID with the chosen block-key columns. This classification is consulted by the safe-query rewriter (the provsql.boolean_provenance opt-in optimisation, see Probabilities) to verify that any projection it introduces preserves the table’s block-key alignment.

Note

add_provenance must be called on the base table, not on a view.

Accessing the Provenance Token

The provsql column is intentionally opaque – it is silently removed from WHERE or ORDER BY clauses. To refer to the current row’s provenance token, use the provenance() function:

SELECT name, provenance() FROM mytable;

Within a query result, the provsql attribute carries a UUID value that represents the provenance circuit gate for that tuple.

Removing Provenance

To stop tracking provenance for a table (and drop the provsql column), use remove_provenance:

SELECT provsql.remove_provenance('mytable');

Provenance Mappings

A provenance mapping associates provenance tokens with values from a table column. Mappings are the bridge between abstract circuit tokens and domain-meaningful labels used by semiring evaluation functions. Use create_provenance_mapping to create one:

SELECT create_provenance_mapping('my_mapping', 'mytable', 'column_name');

The mapping is stored as an ordinary PostgreSQL table called my_mapping with two columns: token (uuid) and value (text or numeric, depending on the source column type).

Alternatively, create_provenance_mapping_view creates a view instead of a table. The view always reflects the current state of the source table, which is useful when the table is frequently updated:

SELECT create_provenance_mapping_view('my_mapping_view', 'mytable', 'column_name');

The view can be used anywhere a table-based mapping is expected (e.g., as the second argument to semiring evaluation functions).

ProvSQL Studio

ProvSQL Studio’s schema panel is an interactive surface for the operations above:

  • it lists every SELECT-able relation, with a purple prov pill on tables whose provsql column is injected by the planner (provenance tracking is active) and a gold mapping pill on relations shaped (value <T>, provenance uuid);

  • + prov and − prov action chips on provenance-eligible plain tables prefill the corresponding SELECT add_provenance(...) / SELECT remove_provenance(...) call into the query box;

  • clicking a column on a tracked table prefills a SELECT create_provenance_mapping('<table>_<col>_mapping', '<schema>.<table>', '<col>'); call, so a fresh provenance mapping is two clicks away.

Inspecting the Circuit

ProvSQL represents provenance as a circuit: a directed acyclic graph (DAG) of gates. Each tuple in a provenance-tracked table is associated with an input gate, created lazily the first time that tuple appears in a query result. Tuples may also carry more complex provenance – for instance, rows created by INSERT ... SELECT or CREATE TABLE AS inherit the provenance expression of the source query.

As queries combine tuples, internal gates record the semiring operations that were applied:

  • plus (⊕): alternative derivations (UNION, DISTINCT)

  • times (⊗): combined use (JOIN, cross product)

  • monus (⊖): difference (EXCEPT)

  • delta (δ): aggregation boundary (GROUP BY)

  • agg, semimod: aggregate provenance

  • project, eq: where-provenance (column tracking, equijoin)

  • cmp: HAVING comparisons (and the filter-on-RV comparator lift, see Continuous Distributions)

  • mulinput: multivalued input (one alternative of a block-independent input; see Probabilities)

Two constant gates represent the semiring identity elements: gate_zero (additive identity, 𝟘) and gate_one (multiplicative identity, 𝟙).

Three additional gate types support continuous random variables (see Continuous Distributions):

  • rv: random-variable leaf (Normal / Uniform / Exponential / Erlang)

  • arith: N-ary arithmetic over scalar children (+ - * /, unary -)

  • mixture: Bernoulli or categorical mixture of scalar random-variable roots

The following functions let you navigate and inspect the circuit:

  • get_gate_type – returns the type of a gate.

  • get_children – returns the child tokens of a gate.

  • identify_token – given a provenance token, returns the source table and row it originates from.

  • get_infos – returns the integer metadata attached to a gate (e.g., aggregate function OID, comparison operator OID).

  • get_extra – returns the text metadata attached to a gate (e.g., aggregate value, column positions for where-provenance).

  • get_nb_gates – returns the total number of gates in the circuit, useful for diagnosing circuit size and performance.

SELECT provsql.get_gate_type(provenance()) FROM mytable;
SELECT provsql.get_children(provenance()) FROM mytable;