Provenance Export

ProvSQL can export provenance circuits in standard or visual formats for use in external tools or for inspection.

Symbolic Representation

sr_formula returns a symbolic representation of the provenance as a human-readable formula. Each leaf token is replaced by its mapped value, and the circuit operations are rendered using and :

SELECT name, sr_formula(provenance(), 'witness_mapping')
FROM suspects;

This is the simplest way to inspect provenance interactively. See Semiring Evaluation for the full description of semiring evaluation functions.

PROV-XML Export

PROV-XML is a W3C standard for representing provenance information. ProvSQL can serialise provenance circuits to this format using to_provxml:

SELECT provsql.to_provxml(provenance())
FROM mytable;

The function returns an XML document conforming to the PROV-XML schema, representing the provenance circuit rooted at the given token.

Example output (abbreviated):

<prov:document
    xmlns:prov="http://www.w3.org/ns/prov#"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <prov:entity prov:id="..."/>
  <prov:wasDerivedFrom>
    <prov:generatedEntity prov:id="..."/>
    <prov:usedEntity prov:id="..."/>
  </prov:wasDerivedFrom>
</prov:document>

The same export is reachable from Studio’s evaluation strip: it lists PROV-XML under the Other optgroup and serialises the currently rendered circuit on demand.

Circuit Visualisation

view_circuit renders a provenance circuit as an ASCII box-art diagram. Internally it writes the circuit in GraphViz DOT format to a temporary file and runs graph-easy --as=boxart on it, returning the result as a text value. The graph-easy executable must be installed and accessible in the PostgreSQL server’s PATH (or in a directory listed in the provsql.tool_search_path GUC; see Configuration Reference).

SELECT provsql.view_circuit(provenance(), 'my_mapping')
FROM mytable
LIMIT 1;

To visualise the circuits for multiple rows simultaneously:

SELECT city, view_circuit(provenance(), 'my_mapping')
FROM (SELECT DISTINCT city FROM personnel) t;

For an interactive alternative, see Studio’s Circuit mode. It renders the same DAG in the browser: click a UUID cell in a query result to display its circuit, hover to highlight a subtree, expand the frontier on demand, and read every gate’s metadata (including stored probability) in the side inspector. It does not require graph-easy and is unaffected by provsql.tool_search_path.

Verbosity

The provsql.verbose_level GUC variable controls diagnostic output. Setting it to 20 is useful when debugging circuit export: intermediate DOT and circuit files are kept on disk instead of being deleted, and the d-DNNF method and gate count are reported. For example:

SET provsql.verbose_level = 20;

The default is 0 (silent). See Configuration Reference for the full description of all thresholds.

Subcircuit Introspection

For programmatic exploration of a circuit (rather than a flat formula or ASCII diagram), circuit_subgraph returns a BFS expansion of the DAG rooted at a token, capped at a configurable depth. Each row describes one (parent, node) edge — gate type, info1 / info2 payload and BFS depth come along on the same row. The root is reported once with parent and child_pos NULL; a node with several parents within the bound is reported once per incoming edge (callers that need a one-row-per-node view should deduplicate on node):

SELECT * FROM provsql.circuit_subgraph(
  (SELECT provenance() FROM personnel WHERE name = 'John'),
  4);

depth is the node’s shortest-path distance from the root, so an edge (parent, child) always satisfies parent.depth + 1 >= child.depth; equality holds on shortest-path edges, strict inequality on “shortcut” edges into a multi-parent child. For a node at depth = max_depth, the caller can compare get_children against the edges reported to detect a frontier node and request another layer.

To resolve an input-gate UUID back to the row that produced it, resolve_input searches every provenance-tracked relation and returns the row encoded as JSONB:

SELECT relation, row_data
FROM provsql.resolve_input(
  (SELECT provsql FROM personnel WHERE name = 'John'));

It returns zero rows for non-input gates and for tokens that don’t match any tracked row, without raising.

Circuit Structure

All the representations above describe the same underlying DAG. Gate types that appear:

  • input – input (variable) gate; corresponds to a base tuple

  • plus – semiring addition; generated by UNION and disjunctive conditions

  • times – semiring multiplication; generated by joins and conjunctive conditions

  • monus – m-semiring monus; generated by EXCEPT

  • zero, one – semiring additive and multiplicative identity elements

  • project, eq – where-provenance gates (column projection and equijoin)

  • agg – aggregation gate (for aggregate provenance)

  • semimod – semimodule scalar multiplication (for aggregate provenance)

  • value – scalar value (for aggregate provenance, or numeric constant lifted into a continuous random variable)

  • mulinput – multivalued input (for Boolean provenance)

  • cmp – comparison gate (HAVING predicates, and the planner-hook lift of WHERE comparators on random_variable columns; see Continuous Distributions)

  • delta – δ-semiring operator [Amsterdamer et al., 2011]

  • update – update operation gate

  • rv – continuous random-variable leaf (Normal / Uniform / Exponential / Erlang); see Continuous Distributions

  • arith – N-ary arithmetic over scalar children (+ - * /, unary -)

  • mixture – Bernoulli or categorical mixture of scalar random-variable roots