Diagram of a query Q rewritten into Q-hat by ProvSQL's planner hook, then evaluated by PostgreSQL with provenance UUIDs in the result.

Transparent Query Rewriting

SQL queries are automatically rewritten to track provenance circuits across joins, aggregation, set operations, CTEs, and recursive queries (WITH RECURSIVE). No changes to your schema or application code are required.

Learn more

Diagram of semiring instantiation: a semiring K is compiled into a C++ class and combined with a provenance mapping to evaluate circuits and produce a result table with explanations in K.

Semiring Provenance

A unified semiring API covering Boolean, counting, why-, how-, and which-provenance, symbolic formulas, and tropical, Viterbi, Łukasiewicz, and min-max / max-min evaluations, all through one compiled-evaluation path.

Semiring docs
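
To illustrate what evaluating one circuit under several semirings means, here is a minimal Python sketch. It is not ProvSQL's API: the `Semiring` class, the `evaluate` function, and the tuple encoding of gates are hypothetical names chosen for this example. The same circuit, (a ⊗ b) ⊕ (a ⊗ c), is interpreted under the Boolean, counting, and tropical semirings simply by swapping the operations and the provenance mapping.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring (hypothetical helper, not a ProvSQL type)."""
    zero: Any                          # neutral element of plus
    one: Any                           # neutral element of times
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]


def evaluate(gate, semiring, mapping):
    """Evaluate a circuit encoded as nested tuples:
    ("input", name), ("plus", child, ...), or ("times", child, ...)."""
    kind = gate[0]
    if kind == "input":
        return mapping[gate[1]]
    values = [evaluate(child, semiring, mapping) for child in gate[1:]]
    if kind == "plus":
        result = semiring.zero
        for value in values:
            result = semiring.plus(result, value)
        return result
    if kind == "times":
        result = semiring.one
        for value in values:
            result = semiring.times(result, value)
        return result
    raise ValueError(f"unknown gate kind: {kind}")


BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
TROPICAL = Semiring(float("inf"), 0.0, min, lambda a, b: a + b)

# (a times b) plus (a times c): two derivations of the same answer tuple.
circuit = (
    "plus",
    ("times", ("input", "a"), ("input", "b")),
    ("times", ("input", "a"), ("input", "c")),
)

print(evaluate(circuit, BOOLEAN, {"a": True, "b": False, "c": True}))
print(evaluate(circuit, COUNTING, {"a": 2, "b": 3, "c": 1}))
print(evaluate(circuit, TROPICAL, {"a": 1.0, "b": 5.0, "c": 2.0}))
```

The circuit is built once and the semiring is a parameter of the evaluation, which is the idea behind a single evaluation path; the actual implementation compiles semirings to C++ classes rather than passing Python callables.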

ProvSQL Studio screenshot: a small provenance circuit with the evaluation strip's Probability-method dropdown open, listing the exact (independent, possible-worlds, tree-decomposition, compilation) and approximate (monte-carlo, weightmc) backends.

Probabilistic Databases

Compute the probabilities of query answers over tuple-independent databases via independent-circuit evaluation, tree decomposition, d-DNNF knowledge compilation, Monte Carlo sampling, or weighted model counting. Inspect every intermediate artifact (CNF, compiled d-DNNF, tree decomposition) or benchmark all methods at once. An opt-in planner-side rewrite recognises hierarchical conjunctive queries (and a family of extensions that exploit functional dependencies) and routes them through the linear-time independent evaluator; queries in the inversion-free class compile directly to a linear-size d-DNNF.

Probability docs
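
As a rough illustration of why independent evaluation is linear-time, the Python sketch below compares it with possible-worlds enumeration on a small circuit. It is a conceptual sketch, not ProvSQL code: the function names and the tuple encoding of gates are assumptions made for this example, and it assumes every input appears at most once in the circuit, so that the children of each gate are independent events.

```python
from itertools import product
from math import isclose, prod


def independent_probability(gate, p):
    """One bottom-up pass; only valid when the children of every gate
    are independent (here: each input occurs at most once)."""
    kind = gate[0]
    if kind == "input":
        return p[gate[1]]
    children = [independent_probability(child, p) for child in gate[1:]]
    if kind == "times":   # conjunction of independent events
        return prod(children)
    if kind == "plus":    # disjunction of independent events
        return 1.0 - prod(1.0 - q for q in children)
    raise ValueError(f"unknown gate kind: {kind}")


def holds(gate, world):
    """Truth value of the circuit in one possible world."""
    kind = gate[0]
    if kind == "input":
        return world[gate[1]]
    results = [holds(child, world) for child in gate[1:]]
    return all(results) if kind == "times" else any(results)


def possible_worlds_probability(gate, p):
    """Exact but exponential: sum the weights of all satisfying worlds."""
    names = sorted(p)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        if holds(gate, world):
            total += prod(p[n] if world[n] else 1.0 - p[n] for n in names)
    return total


# (a AND b) OR c, with each input tuple present independently.
circuit = ("plus", ("times", ("input", "a"), ("input", "b")), ("input", "c"))
probabilities = {"a": 0.5, "b": 0.4, "c": 0.5}

fast = independent_probability(circuit, probabilities)
exact = possible_worlds_probability(circuit, probabilities)
print(fast, exact)
assert isclose(fast, exact)
```

When an input is shared between branches, the independence assumption fails and the single-pass formula can give a wrong answer, which is why other methods such as tree decomposition or d-DNNF compilation are needed for general circuits.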

ProvSQL Studio screenshot: a Normal random_variable gate N(40, 4) above its Distribution-profile panel, conditioned on the threshold event x ≥ 35 (truncated PDF with the analytical curve overlaid).

Continuous Distributions

First-class random-variable columns. Build queries with Normal, Uniform, Exponential, Erlang, Mixture, and Categorical distributions; evaluate expectations and moments analytically or by Monte Carlo; condition on filter predicates inline.

Continuous-distribution docs

Image-retrieval demo screenshot: per-object table showing the probability and Shapley value of each predicted object contributing to a probabilistic count query, with the highest-contribution row highlighted.

Shapley & Banzhaf Values

Quantify each input tuple’s contribution to a query answer through Shapley and Banzhaf values, computed in a single circuit traversal.

Shapley docs
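
For readers unfamiliar with the two measures, the Python sketch below computes them directly from their definitions for a Boolean provenance function. This is a brute-force enumeration over subsets of input tuples, exponential in their number, and not the circuit-based algorithm mentioned above; the function name `shapley_and_banzhaf` and the example lineage are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_and_banzhaf(players, f):
    """Return (shapley, banzhaf) dictionaries for a set function f that maps
    a frozenset of present tuples to True/False (or 1/0)."""
    n = len(players)
    shapley, banzhaf = {}, {}
    for player in players:
        others = [q for q in players if q != player]
        shapley_value = Fraction(0)
        banzhaf_value = Fraction(0)
        for size in range(n):
            # Shapley weight of a coalition of this size: |S|! (n-|S|-1)! / n!
            weight = Fraction(factorial(size) * factorial(n - size - 1),
                              factorial(n))
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                marginal = int(f(coalition | {player})) - int(f(coalition))
                shapley_value += weight * marginal
                # Banzhaf: every coalition has the same weight 1 / 2^(n-1)
                banzhaf_value += Fraction(marginal, 2 ** (n - 1))
        shapley[player] = shapley_value
        banzhaf[player] = banzhaf_value
    return shapley, banzhaf


def lineage(present):
    """Example provenance: the answer exists if (a AND b) OR c."""
    return ("a" in present and "b" in present) or "c" in present


shapley, banzhaf = shapley_and_banzhaf(["a", "b", "c"], lineage)
print(shapley)
print(banzhaf)
```

In this example tuple c alone suffices to produce the answer, so it receives the largest share under both measures, while a and b split the remainder equally; the Shapley values sum to 1, the value of the function when all tuples are present.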

ProvSQL Studio Circuit-mode screenshot showing a provenance DAG with a pinned input gate.

ProvSQL Studio

A web UI for provenance inspection: render the circuit DAG behind any result token, evaluate any compiled semiring on a pinned subnode, and hover output cells to highlight the source rows that produced them. Available on PyPI as provsql-studio.

Studio docs

SQL API

Full SQL-level API for managing provenance tokens and circuit gates.

SQL API docs

C/C++ API

Internal C/C++ API for extending ProvSQL with new semirings and gate types.

C/C++ API docs

Publications

Research papers describing the theory and implementation of ProvSQL.

See publications