Provenance Export
ProvSQL can export provenance circuits in standard or visual formats for use in external tools or for inspection.
Symbolic Representation
sr_formula returns a symbolic representation of the provenance
as a human-readable formula. Each leaf token is replaced by its
mapped value, and the circuit operations are rendered using ⊕
and ⊗:
SELECT name, sr_formula(provenance(), 'witness_mapping')
FROM suspects;
This is the simplest way to inspect provenance interactively. See Semiring Evaluation for the full description of semiring evaluation functions.
PROV-XML Export
PROV-XML is a W3C standard for
representing provenance information. ProvSQL can serialise provenance
circuits to this format using to_provxml:
SELECT provsql.to_provxml(provenance())
FROM mytable;
The function returns an XML document conforming to the PROV-XML schema, representing the provenance circuit rooted at the given token.
Example output (abbreviated):
<prov:document
xmlns:prov="http://www.w3.org/ns/prov#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<prov:entity prov:id="..."/>
<prov:wasDerivedFrom>
<prov:generatedEntity prov:id="..."/>
<prov:usedEntity prov:id="..."/>
</prov:wasDerivedFrom>
</prov:document>
The same export is reachable from Studio’s evaluation strip: it lists PROV-XML under the Other optgroup and serialises the currently rendered circuit on demand.
Circuit Visualisation
view_circuit renders a provenance circuit as an ASCII box-art
diagram. Internally it writes the circuit in GraphViz DOT format to a
temporary file and runs graph-easy --as=boxart on it, returning the
result as a text value. The graph-easy executable must be installed and
accessible in the PostgreSQL server’s PATH (or in a directory listed in the
provsql.tool_search_path GUC; see Configuration Reference).
SELECT provsql.view_circuit(provenance(), 'my_mapping')
FROM mytable
LIMIT 1;
To visualise the circuits for multiple rows simultaneously:
SELECT city, view_circuit(provenance(), 'my_mapping')
FROM (SELECT DISTINCT city FROM personnel) t;
For an interactive alternative, see Studio’s
Circuit mode. It renders the same DAG
in the browser: click a UUID cell in a query result to display its
circuit, hover to highlight a subtree, expand the frontier on
demand, and read every gate’s metadata (including stored
probability) in the side inspector. It does not require
graph-easy and is unaffected by provsql.tool_search_path.
Verbosity
The provsql.verbose_level GUC variable controls diagnostic output.
Setting it to 20 is useful when debugging circuit export: intermediate
DOT and circuit files are kept on disk instead of being deleted, and the
d-DNNF method and gate count are reported. For example:
SET provsql.verbose_level = 20;
The default is 0 (silent). See Configuration Reference for the full
description of all thresholds.
Subcircuit Introspection
For programmatic exploration of a circuit (rather than a flat formula or
ASCII diagram), circuit_subgraph returns a BFS expansion of
the DAG rooted at a token, capped at a configurable depth. Each row
describes one (parent, node) edge — gate type, info1 / info2
payload and BFS depth come along on the same row. The root is reported
once with parent and child_pos NULL; a node with several
parents within the bound is reported once per incoming edge (callers
that need a one-row-per-node view should deduplicate on node):
SELECT * FROM provsql.circuit_subgraph(
(SELECT provenance() FROM personnel WHERE name = 'John'),
4);
depth is the node’s shortest-path distance from the root, so an edge
(parent, child) always satisfies parent.depth + 1 >= child.depth;
equality holds on shortest-path edges, strict inequality on
“shortcut” edges into a multi-parent child. For a node at
depth = max_depth, the caller can compare get_children
against the edges reported to detect a frontier node and request
another layer.
To resolve an input-gate UUID back to the row that produced it,
resolve_input searches every provenance-tracked relation and
returns the row encoded as JSONB:
SELECT relation, row_data
FROM provsql.resolve_input(
(SELECT provsql FROM personnel WHERE name = 'John'));
It returns zero rows for non-input gates and for tokens that don’t match any tracked row, without raising.
Circuit Structure
All the representations above describe the same underlying DAG. Gate types that appear:
input– input (variable) gate; corresponds to a base tupleplus– semiring addition; generated by UNION and disjunctive conditionstimes– semiring multiplication; generated by joins and conjunctive conditionsmonus– m-semiring monus; generated by EXCEPTzero,one– semiring additive and multiplicative identity elementsproject,eq– where-provenance gates (column projection and equijoin)agg– aggregation gate (for aggregate provenance)semimod– semimodule scalar multiplication (for aggregate provenance)value– scalar value (for aggregate provenance, or numeric constant lifted into a continuous random variable)mulinput– multivalued input (for Boolean provenance)cmp– comparison gate (HAVING predicates, and the planner-hook lift of WHERE comparators onrandom_variablecolumns; see Continuous Distributions)delta– δ-semiring operator [Amsterdamer et al., 2011]update– update operation gaterv– continuous random-variable leaf (Normal / Uniform / Exponential / Erlang); see Continuous Distributionsarith– N-ary arithmetic over scalar children (+ - * /, unary-)mixture– Bernoulli or categorical mixture of scalar random-variable roots