Semiring Evaluation

ProvSQL represents query provenance as a circuit of gates. A semiring evaluation [Green et al., 2007] maps this circuit to a value in a chosen semiring by:

  1. Assigning a leaf value to each input gate via a provenance mapping.

  2. Propagating these values through plus and times gates (and monus gates for m-semirings [Geerts and Poggi, 2010]) according to the semiring operations.

The built-in evaluation functions all follow the same calling convention:

sr_<name>(provenance(), 'mapping_name')

where provenance() returns the token for the current output row and mapping_name is the name of a provenance mapping table. Semirings are also extended with a monus operation allowing to represent the output of non-monotone queries.

ProvSQL Studio’s evaluation strip drives every compiled and custom semiring interactively: pick a semiring and a provenance mapping in the dropdown, click Run, and the result lands inline. The rest of this chapter is the SQL reference for the same operations.

Boolean Semiring

sr_boolean evaluates the provenance in the Boolean semiring ({true, false}, ∨, ∧, false, true), answering the question “was this result derivable at all, depending on the Boolean mapping of input provenance tokens?”:

SELECT name, sr_boolean(provenance(), 'my_mapping')
FROM mytable;

Boolean-Expression Semiring

sr_boolexpr evaluates the provenance in the Boolean-expression semiring, returning a human-readable propositional formula. The mapping argument is optional: with one, leaves are labelled by the mapping’s value column; without one, leaves are rendered as bare x<id> placeholders.

SELECT name, sr_boolexpr(provenance(), 'my_mapping')
FROM mytable;

SELECT name, sr_boolexpr(provenance())
FROM mytable;

This is used as the basis for probability computation.

Symbolic Representation (as a Formula)

sr_formula returns a symbolic representation of the provenance as a human-readable formula using symbols and :

SELECT name, sr_formula(provenance(), 'witness_mapping')
FROM suspects;

For example, a result supported by tuples labelled a and b through a join would show as a⊗b; one that could come from either a or b would show as a⊕b.

Counting Semiring (m-semiring)

sr_counting evaluates the provenance in the counting m-semiring, counting the number of distinct supporting inputs:

SELECT name, sr_counting(provenance(), 'count_mapping')
FROM suspects;

The mapping should assign integer values (typically 1) to leaf tokens.

Why-Provenance

sr_why returns the why-provenance of a result – the set of minimal witnesses (sets of input tuples) that support the result:

SELECT name, sr_why(provenance(), 'my_mapping')
FROM mytable;

Leaf values may be bare labels (e.g. Alice, treated as the singleton witness {{Alice}}) or already-structured why-provenance values ({} for zero, {{}} for one, {{a},{b,c}} for a multi-witness set), which lets the output of one sr_why query be reused as input to another.

How-Provenance

sr_how returns the how-provenance of a result – the canonical polynomial in \mathbb{N}[X] over the input-tuple labels [Green et al., 2007]. Each derivation contributes a monomial; coefficients count distinct derivations of the same monomial:

SELECT name, sr_how(provenance(), 'my_mapping')
FROM mytable;

The result is rendered in canonical sum-of-products form, e.g. 2⋅Alice⋅Bob + Alice^2 + Bob^2. Multiplication is the dot ; exponents use ^k; 0 and 1 denote the additive and multiplicative identities. Because the form is canonical, two semantically-equivalent provenance circuits collapse to identical strings, making sr_how suitable for provenance-aware query equivalence (e.g. checking that two ETL pipelines produce the same provenance, not just the same tuples). The how-semiring is \mathbb{N}[X], the universal commutative semiring for provenance.

Leaf values may be bare labels (e.g. Alice, treated as the monomial Alice), the literal 0, or already-structured polynomials following the same canonical syntax as the output (e.g. 2⋅Alice⋅Bob^2 + 3⋅Charlie), so the output of one sr_how query can be reused as input to another.

Which-Provenance (Lineage)

sr_which returns the which-provenance (also known as lineage) of a result – a single set of input labels that contributed to it, namely the union of all witnesses:

SELECT name, sr_which(provenance(), 'my_mapping')
FROM mytable;

The result is rendered as {a,b,c} for a non-empty derivation, or if no derivation exists. Compared to sr_why, which-provenance is more compact (a flat set rather than a set of sets) but loses the breakdown into individual derivations.

Leaf values may be bare labels (e.g. Alice, treated as the singleton {Alice}), the literal (the additive zero), or already-structured sets ({} for the multiplicative identity, {a,b,c} for a non-empty set), so the output of one sr_which query can be reused as input to another.

Tropical Semiring (m-semiring)

sr_tropical evaluates the provenance in the tropical (min-plus) m-semiring (ℝ {+∞}, min, +, +∞, 0), returning the cost of the cheapest derivation:

SELECT name, sr_tropical(provenance(), 'cost_mapping')
FROM mytable;

The mapping should assign float8 cost values to leaf tokens; use 'Infinity'::float8 to encode the additive identity. This is useful for shortest-path or least-cost provenance, where plus selects the cheaper alternative and times accumulates cost along a derivation.

Viterbi Semiring (m-semiring)

sr_viterbi evaluates the provenance in the Viterbi (max-times) m-semiring ([0,1], max, ×, 0, 1), returning the probability of the most likely derivation:

SELECT name, sr_viterbi(provenance(), 'prob_mapping')
FROM mytable;

The mapping should assign float8 probability values in [0,1] to leaf tokens. Unlike probability_evaluate, which marginalises over derivations, the Viterbi semiring keeps only the single most likely derivation, making it suitable for most-probable-explanation style queries.

Łukasiewicz Fuzzy Semiring (m-semiring)

sr_lukasiewicz evaluates the provenance in the Łukasiewicz fuzzy m-semiring ([0,1], max, ⊗_Ł, 0, 1), where the multiplicative operation is the Łukasiewicz t-norm a \otimes_{\text{Ł}} b = \max(a + b - 1, 0):

SELECT name, sr_lukasiewicz(provenance(), 'evidence_mapping')
FROM mytable;

The mapping should assign float8 graded-truth values in [0,1] to leaf tokens. Compared to sr_viterbi (which multiplies probabilities), the Łukasiewicz t-norm preserves crisp truth (0.7 \otimes_{\text{Ł}} 1 = 0.7) and avoids the near-zero collapse of long product chains. This makes it the standard choice for fuzzy graded conjunctions where inputs are degrees of evidence rather than independent probabilities.

Temporal (Interval-Union) Semiring (m-semiring)

sr_temporal evaluates the provenance in the temporal (interval-union) m-semiring over PostgreSQL tstzmultirange values. Addition is multirange union, multiplication is intersection, monus is set difference; the additive identity is '{}' and the multiplicative identity is '{(,)}' (the universal range):

SELECT entity_id, sr_temporal(provenance(), 'validity_mapping')
FROM mytable;

This is the compiled counterpart of union_tstzintervals. The two compute the same quantity for plain SELECT-FROM-WHERE-GROUP BY queries, but sr_temporal additionally supports HAVING clauses, aggregation, and where-provenance, which the PL/pgSQL evaluator skips.

Requires PostgreSQL ≥ 14 (for tstzmultirange).

Numeric and Integer Interval-Union Semirings (m-semirings)

sr_interval_num and sr_interval_int are the numeric and integer counterparts of sr_temporal: the same interval-union algebra (union for ⊕, intersection for ⊗, set difference for monus) over nummultirange and int4multirange respectively.

SELECT name, sr_interval_num(provenance(), 'sensor_validity_mapping')
FROM mytable;

SELECT cite, sr_interval_int(provenance(), 'page_range_mapping')
FROM mytable;

Typical use cases:

  • sr_interval_num: measurement-validity provenance in scientific data integration (a sensor reading or model coefficient is valid only over a parameter range; joins compute jointly-valid ranges).

  • sr_interval_int: page-range or line-range provenance in scholarly / source-code corpora (“supported by pages [12,18] of doc A and pages [3,5] ∪ [40,42] of doc B”).

Both require PostgreSQL ≥ 14.

Min-Max and Max-Min Semirings (m-semirings)

sr_minmax and sr_maxmin evaluate the provenance in the min-max and max-min m-semirings over an arbitrary user-defined PostgreSQL ENUM type. The carrier order comes from pg_enum.enumsortorder; bottom and top are derived automatically.

  • sr_minmax: = min, = max, zero is the top of the enum, one is the bottom. The security shape: alternative derivations combine to the least sensitive label, joins combine to the most sensitive label.

  • sr_maxmin: dual = max, = min. The fuzzy / availability / trust shape: alternatives combine to the most permissive label, joins combine to the strictest label.

The third argument to both functions is a sample value of the carrier enum, used only for type inference; its value is ignored. Given a classification_level enum ordered from unclassified to not_available, where not_available is the top of the enum and plays the role of the semiring 𝟘 (no derivation possible):

SELECT create_provenance_mapping('personnel_level', 'personnel', 'classification');

SELECT city, sr_minmax(provenance(), 'personnel_level',
                       'unclassified'::classification_level) AS clearance
FROM (SELECT DISTINCT city FROM personnel) t;

This is the compiled replacement for the hand-rolled access-control semiring previously documented as the security semiring: a single implementation covers any user enum (security lattices, fuzzy-discrete trust levels, three-valued logic, project-specific orderings).

Custom Semirings with provenance_evaluate

Advanced users can define custom semirings in SQL and evaluate them using provenance_evaluate. The function takes a provenance token, a mapping table, a one element, and the names of aggregate functions implementing the semiring operations:

provenance_evaluate(
    token UUID,           -- provenance token to evaluate
    token2value regclass, -- mapping table (token → value)
    element_one,          -- identity element (type determines the semiring value type)
    plus_function,        -- name of the ⊕ aggregate
    times_function,       -- name of the ⊗ aggregate
    monus_function,       -- name of the ⊖ function (optional, for m-semirings)
    delta_function        -- name of the δ function (optional, for δ-semirings)
)

The plus and times operations must be defined as PostgreSQL aggregate functions with a two-argument state transition function. The monus and delta operations are plain functions.

As a worked example, consider the capability semiring over the bit(2) carrier interpreted as the diamond lattice \{00, 01, 10, 11\} (e.g., (can_read, can_write)). = | (bitwise OR) combines alternative derivations permissively; = & (bitwise AND) combines joins restrictively; = a & ~b (bitwise AND-NOT) is the Boolean difference monus; δ is the identity, so an aggregated group carries the OR of the capabilities of its supporting rows (e.g., a Paris group built from a read-only and a write-only row is annotated B'11', while a group built from two read-only rows stays B'01'). Zero is B'00', one is B'11':

CREATE FUNCTION cap_or  (state bit(2), v bit(2)) RETURNS bit(2) IMMUTABLE
  LANGUAGE SQL AS $$ SELECT state | v $$;
CREATE FUNCTION cap_and (state bit(2), v bit(2)) RETURNS bit(2) IMMUTABLE
  LANGUAGE SQL AS $$ SELECT state & v $$;
CREATE FUNCTION cap_minus(a bit(2), b bit(2))    RETURNS bit(2) IMMUTABLE
  LANGUAGE SQL AS $$ SELECT a & ~b $$;
CREATE FUNCTION cap_delta(v bit(2)) RETURNS bit(2) IMMUTABLE
  LANGUAGE SQL AS $$ SELECT v $$;

CREATE AGGREGATE cap_plus (bit(2)) (
  sfunc=cap_or,  stype=bit(2), initcond='00');
CREATE AGGREGATE cap_times(bit(2)) (
  sfunc=cap_and, stype=bit(2), initcond='11');

SELECT name,
       provenance_evaluate(provenance(), 'capability_mapping',
                           B'11'::bit(2),
                           'cap_plus', 'cap_times', 'cap_minus', 'cap_delta')
FROM mytable;

This is a commutative m-semiring on the four-element Boolean lattice B^2; the lattice is partial (the two middle elements are incomparable), so it is not subsumed by sr_minmax or sr_maxmin. Additional examples can be found in the test suite: test/sql/capability.sql (the capability semiring above) and test/sql/formula.sql (symbolic representation as a formula).

For queries involving aggregation (GROUP BY), use aggregation_evaluate instead, which additionally takes the names of an aggregate finalization function and a semimodule operation:

aggregation_evaluate(
    token,                -- aggregate result (agg_token)
    token2value,          -- mapping table
    agg_final_function,   -- finalization function for the aggregate
    agg_function,         -- aggregate function for group values
    semimod_function,     -- semimodule scalar multiplication
    element_one,          -- identity element
    plus_function,
    times_function,
    monus_function,       -- optional
    delta_function        -- optional
)

See test/sql/aggregation.sql for a complete example of aggregation_evaluate usage.

Note

provenance_evaluate and aggregation_evaluate are PL/pgSQL functions that traverse the provenance circuit recursively. They do not support cmp gates introduced by HAVING clauses; queries with HAVING will produce an error. The built-in compiled semirings (sr_formula, sr_counting, etc.) are implemented in C, support all gate types including HAVING, and are significantly faster. Prefer compiled semirings when available; use provenance_evaluate for semirings not covered by the built-in set.

Provenance Mappings

All semiring functions take a mapping name as their second argument. A mapping is created with create_provenance_mapping:

SELECT create_provenance_mapping('mapping_name', 'table_name', 'column_name');

The mapping table has columns token (uuid) and value. You can also populate it manually for custom scenarios:

INSERT INTO mapping_name(token, value)
SELECT provenance(), my_label FROM mytable;

Compatibility with Boolean-Provenance Rewriting

When provsql.boolean_provenance is on (see Probabilities), the planner rewrites hierarchical CQs to a read-once form whose probability is computable in linear time, and tags the root of the rewritten circuit so that semirings whose algebra is not Boolean-faithful refuse to evaluate it. The refusal raises an explicit error rather than silently producing a wrong value:

ERROR: ProvSQL: semiring 'sr_counting' is not compatible with
       boolean-provenance rewriting; re-run with
       provsql.boolean_provenance = off

The following compiled semirings are Boolean-faithful and run on any circuit, including rewritten ones: sr_boolean, sr_boolexpr, sr_formula, sr_temporal, sr_interval_num, sr_interval_int. The following are not, and refuse on rewritten circuits: sr_counting, sr_how, sr_why, sr_which, sr_tropical, sr_viterbi, sr_lukasiewicz, sr_minmax, sr_maxmin.

To run a refused semiring, switch provsql.boolean_provenance off (in the current session, or for the role / database) and re-evaluate; the next provenance circuit built for the query is unrewritten and accepts any semiring.

ProvSQL Studio’s evaluation strip filters the semiring dropdown to hide the refusing semirings whenever the root carries the rewrite tag.