Semiring Evaluation
ProvSQL represents query provenance as a circuit of gates. A semiring evaluation [Green et al., 2007] maps this circuit to a value in a chosen semiring by:
Assigning a leaf value to each input gate via a provenance mapping.
Propagating these values through
plusandtimesgates (andmonusgates for m-semirings [Geerts and Poggi, 2010]) according to the semiring operations.
The built-in evaluation functions all follow the same calling convention:
sr_<name>(provenance(), 'mapping_name')
where provenance() returns the token for the current output row and
mapping_name is the name of a provenance mapping table.
Semirings are also extended with a monus operation allowing to
represent the output of non-monotone queries.
Boolean Semiring
sr_boolean evaluates the provenance in the Boolean semiring
({true, false}, ∨, ∧, false, true), answering the question “was this
result derivable at all, depending on the Boolean mapping of input
provenance tokens?”:
SELECT name, sr_boolean(provenance(), 'my_mapping')
FROM mytable;
Boolean-Expression Semiring
sr_boolexpr evaluates the provenance in the Boolean-expression
semiring, returning a human-readable propositional formula:
SELECT name, sr_boolexpr(provenance(), 'my_mapping')
FROM mytable;
This is used as the basis for probability computation.
Formula Semiring
sr_formula returns the provenance as an algebraic formula using
a pseudo-semiring using symbols ⊕ and ⊗:
SELECT name, sr_formula(provenance(), 'witness_mapping')
FROM suspects;
For example, a result supported by tuples labelled a and b through
a join would show as a⊗b; one that could come from either a or
b would show as a⊕b.
Counting Semiring (m-semiring)
sr_counting evaluates the provenance in the counting m-semiring,
counting the number of distinct supporting inputs:
SELECT name, sr_counting(provenance(), 'count_mapping')
FROM suspects;
The mapping should assign integer values (typically 1) to leaf tokens.
Why-Provenance
sr_why returns the why-provenance of a result – the set of
minimal witnesses (sets of input tuples) that support the result:
SELECT name, sr_why(provenance(), 'my_mapping')
FROM mytable;
Security Semiring
The security semiring assigns security-level labels to tuples and
propagates them through queries according to a lattice. It is implemented
using provenance_evaluate with custom aggregates for the
semiring plus and times operations. For example, given a type
classification_level (an enum ordered from unclassified to
top_secret):
-- Define the semiring operations
CREATE FUNCTION security_plus_state(state classification_level,
level classification_level)
RETURNS classification_level LANGUAGE SQL IMMUTABLE AS $$
SELECT LEAST(state, level)
$$;
CREATE FUNCTION security_times_state(state classification_level,
level classification_level)
RETURNS classification_level LANGUAGE SQL IMMUTABLE AS $$
SELECT GREATEST(state, level)
$$;
CREATE AGGREGATE security_plus(classification_level) (
sfunc = security_plus_state, stype = classification_level,
initcond = 'top_secret'
);
CREATE AGGREGATE security_times(classification_level) (
sfunc = security_times_state, stype = classification_level,
initcond = 'unclassified'
);
-- Evaluate the security level of a query result
SELECT create_provenance_mapping('personnel_level', 'personnel', 'classification');
SELECT city, provenance_evaluate(provenance(), 'personnel_level',
'unclassified'::classification_level,
'security_plus', 'security_times')
FROM (SELECT DISTINCT city FROM personnel) t;
Custom Semirings
Advanced users can define custom semirings in SQL and pass them to
provenance_evaluate directly, as shown in the security semiring
example above.
Provenance Mappings
All semiring functions take a mapping name as their second argument.
A mapping is created with create_provenance_mapping:
SELECT create_provenance_mapping('mapping_name', 'table_name', 'column_name');
The mapping table has columns token (uuid) and value. You can
also populate it manually for custom scenarios:
INSERT INTO mapping_name(token, value)
SELECT provenance(), my_label FROM mytable;