Configuration Reference

ProvSQL is controlled by GUC (Grand Unified Configuration) variables, all settable per session with SET or permanently in postgresql.conf or with ALTER DATABASE / ALTER ROLE.

provsql.active (default: on)

Master switch. When off, ProvSQL drops all provenance annotations silently, as if the extension were not loaded. Useful to temporarily disable provenance tracking without unloading the extension.

provsql.where_provenance (default: off)

Enable where-provenance tracking (see Where-Provenance). Adds project and eq gates to record the source cell of each output value. Disabled by default due to overhead.

provsql.update_provenance (default: off)

Enable provenance tracking for INSERT, UPDATE, and DELETE statements (see Data Modification Tracking). Requires PostgreSQL ≥ 14.

provsql.boolean_provenance (default: off)

Umbrella switch for every optimisation that is sound only when provenance is interpreted in the Boolean semiring. When on, two independent layers activate:

  • Planner-level safe-query rewriting. Self-join-free hierarchical conjunctive queries (and UCQs of such queries) over TID / BID base tables are rewritten with per-atom DISTINCT projections so the resulting provenance circuit is read-once. A read-once circuit is probability-evaluated in linear time by the 'independent' method, replacing the 'tree-decomposition' or 'compilation' fallback that would otherwise be needed. Queries outside the recognised class pass through unchanged.

  • Load-time Boolean-only circuit simplification. When a circuit is loaded from the mmap store, foldBooleanIdentities applies rewrites that hold in the Boolean semiring but not in counting / tropical / Viterbi / …: idempotence (drop repeated child wires of plus / times), plus-with-one absorption (plus(…, one, …) one), and absorption (plus(x, times(x, y, …), …) plus(x, …) and dually for times). This is independent of provsql.simplify_on_load: disabling the universal passes does not disable the Boolean-only ones.

In both cases the rewritten gates are tagged (persistently for the rewriter, in a side-band set for the load-time simplifier) so that semirings whose algebra is not Boolean-faithful refuse to evaluate them; see Probabilities and the compatibility note in Semiring Evaluation. Off by default because the rewrites change the multiset of result rows / the underlying polynomial and are therefore unsound for per-row provenance interrogations and for non-Boolean-faithful semirings.

provsql.classify_top_level (default: off)

Emit a NOTICE for every top-level SELECT reporting the certified kind of the result relation under the provsql_table_kind taxonomy (TID / BID / OPAQUE) and the provenance-tracked base relations it touches:

NOTICE:  ProvSQL: query result is TID (sources: public.personnel)
NOTICE:  ProvSQL: query result is OPAQUE
NOTICE:  ProvSQL: query result is TID (no provenance-tracked sources)

The source list is reported for TID and BID results (including the explicit no provenance-tracked sources marker for the deterministic case) but omitted for OPAQUE results: when the shape gate trips on a sublink, a set operation, a GROUP BY, etc., the rtable walk only reaches the syntactically visible sources, so a printed list would be partial and misleadingly suggest completeness.

The classifier runs on the user’s parsed Query before any rewriting and only on the user’s outermost statement; PL/pgSQL helpers the rewriter calls into (provenance_times, provenance_aggregate, …) do not produce extra notices.

ProvSQL Studio enables this GUC automatically and renders the certified kind on the result-table provenance pill; see ProvSQL Studio.

provsql.verbose_level (default: 0)

Controls the verbosity of ProvSQL diagnostic messages. 0 is silent. The meaningful thresholds are:

  • ≥ 20 – print the rewritten SQL query before and after provenance rewriting (requires PostgreSQL ≥ 15); print the Tseytin circuit and compiled d-DNNF filenames during knowledge compilation; report which d-DNNF method was chosen (direct interpretation, tree decomposition, or external compilation) and its gate count; keep all intermediate temporary files (Tseytin, d-DNNF, DOT) instead of deleting them.

  • ≥ 40 – also print the time spent by the planner on rewriting.

  • ≥ 50 – also print the full internal parse-tree representation of the query before and after rewriting.

provsql.aggtoken_text_as_uuid (default: off)

Controls how an agg_token cell renders as text. By default the output function returns the human-friendly "value (*)" form, where value is the running aggregate state. When set to on, it returns the underlying provenance UUID instead. UI layers (notably ProvSQL Studio) flip this on per session so aggregate cells expose the circuit root UUID for click-through; the user-facing display string is recovered via agg_token_value_text for any such UUID. Has no effect on EXPLAIN output, on the underlying storage, or on numeric / casting behaviour of agg_token.

provsql.monte_carlo_seed (default: -1)

Seed for the Monte Carlo sampler used throughout the probability and continuous-distribution paths. The default -1 seeds from std::random_device for non-deterministic sampling; any other integer value (including 0) is used as a literal seed for std::mt19937_64, making probability_evaluate(..., 'monte-carlo', 'n') reproducible across runs and across the Bernoulli and continuous (gate_rv) sampling paths.

provsql.rv_mc_samples (default: 10000)

Default sample count for the Monte-Carlo fallback inside the analytical evaluators (expected, variance, moment, rv_sample, rv_histogram) when a sub-circuit cannot be decomposed and must be sampled. Set to 0 to disable the fallback entirely: callers raise an exception rather than sampling, which is useful when only analytical answers are acceptable. Unrelated to probability_evaluate(..., 'monte-carlo', 'n') where the sample count is an explicit argument.

provsql.simplify_on_load (default: on)

Apply the universal peephole simplifier (currently the RangeCheck cmp-resolution pass; future passes plug into the same pipeline) when loading a provenance circuit from the mmap store into memory. Every comparator decidable from the propagated support intervals collapses to a Bernoulli gate_input with probability 0 or 1, transparent to every downstream consumer (semiring evaluators, Monte Carlo, view_circuit, PROV-XML export, ProvSQL Studio). Set to off to inspect raw circuit structure (e.g. when debugging gate-creation paths). See Continuous Distributions for the broader hybrid-evaluation context.

provsql.tool_search_path (default: empty)

Colon-separated list of directories prepended to PATH when ProvSQL spawns external command-line tools: the d-DNNF compilers (d4, c2d, minic2d, dsharp), the WeightMC weighted model counter (weightmc), and the GraphViz ASCII renderer (graph-easy). The server’s PATH is searched as a fallback, so an entry here only needs to be set when a tool lives outside the server’s default PATH (e.g. in $HOME/local/bin, a Conda environment, /opt/...). Example:

SET provsql.tool_search_path = '/opt/d4:/home/postgres/bin';

All variables above have user-level scope: any user can change them for their own session without superuser privileges.