Transparent Query Rewriting
SQL queries are automatically rewritten to track provenance circuits – across inner and outer joins, subqueries anywhere (EXISTS, IN, quantified and scalar comparisons), aggregation, set operations, CTEs, and recursive queries (WITH RECURSIVE). No changes to your schema or application code required.
Semiring Provenance
A unified semiring API for Boolean, counting, why-, how-, and which-provenance, symbolic formulas, tropical, Viterbi, Łukasiewicz, and min-max / max-min evaluations – all through one compiled-evaluation path.
Aggregate Provenance
Aggregation results carry provenance too: agg_token values record symbolically how a SUM, COUNT, MIN/MAX, or AVG depends on base tuples, support further arithmetic, evaluate in any m-semiring, and give exact probabilities to HAVING predicates – with SQL-faithful empty-group and NULL semantics.
Probabilistic Databases
Attach probabilities to input tuples and get the probability of every query answer. Ask for a guarantee – exact, or additive / relative (ε, δ) – and a cost-based chooser runs the cheapest method that meets it: independent evaluation, tree decomposition, d-DNNF knowledge compilation, certified-bounds d-trees, sieve, Monte Carlo, Karp-Luby, or weighted model counting. Tractable classes (hierarchical, FD-aware, inversion-free queries) are recognised at the planner and evaluated in linear time.
Knowledge Compilation
Compile a Boolean provenance circuit to a CNF and on to a d-DNNF / d-D for exact probability: Tseytin encoding with a variable-to-source-tuple mapping, an in-process tree-decomposition compiler, and a registry of external compilers and weighted model counters (d4, c2d, dsharp, miniC2D, ganak…), with ddnnf_stats to measure the result.
Network Reliability & Reachability
Recursive WITH RECURSIVE reachability becomes exact network reliability: ProvSQL compiles the query along a tree decomposition of the data graph, evaluating a #P-hard problem in time linear in the number of edges on bounded-treewidth graphs – two-terminal and k-terminal reliability, bounded-hop and min-cost paths, on cyclic data too.
Continuous Distributions
First-class random-variable columns. Build queries with Normal, Uniform, Exponential, Erlang, Mixture, and Categorical distributions; evaluate expectations and moments analytically or by Monte Carlo; condition on filter predicates inline.
Conditioning
Ask conditional questions with one operator: A | B (“A given B”) gives P(A | B), conditional expectations and variances, and truncated distributions – computed exactly and correlation-aware over the shared provenance circuit, uniformly across discrete events, continuous random variables, and probabilistic aggregates.
Shapley & Banzhaf Values
Quantify each input tuple’s contribution to a query answer through Shapley and Banzhaf values, computed in a single circuit traversal.
Where-Provenance
Column-level provenance: track which source cells – not just which rows – each output value was copied or derived from, through projections and equijoins. Studio’s Where mode shows it live: hover an output cell to highlight its origins.
Temporal & Update Provenance
Provenance of data modifications: INSERT / UPDATE / DELETE are tracked as update gates, enabling audit and undo. Combined with the interval-union semiring, validity timestamps turn a provenance-tracked database into a temporal one – time-travel queries included.
Lean Formalization
The algebraic core – m-semirings, annotated databases, relational-algebra and aggregation semantics, query rewriting, circuits and probability – is formally verified in Lean 4, with machine-checked theorems backing ProvSQL’s architecture.
ProvSQL Studio
A web UI for provenance inspection, in three modes: render the circuit DAG behind any result token and evaluate any compiled semiring on a pinned subnode, hover output cells to highlight the source rows that produced them, or work in notebooks. Available on PyPI as provsql-studio.
Notebooks
Jupyter-style notebooks over a ProvSQL database: SQL, Markdown, circuit, and evaluation cells, per-cell provenance schemes, saved and loaded as standard .ipynb files. The tutorial and the case studies ship as runnable example notebooks.
ProvSQL Playground
The whole system in your browser: PostgreSQL with ProvSQL compiled to WebAssembly, Studio running on top – no install, no server, nothing leaves the page. Pre-loaded databases and runnable notebooks for the tutorial and the case studies.
C/C++ API
Internal C/C++ API for extending ProvSQL with new semirings and gate types.