ProvSQL is a PostgreSQL extension that adds semiring provenance and uncertainty management to SQL queries. It is implemented as a PostgreSQL planner hook that transparently rewrites queries – no changes to the application or schema are required.

For a full introduction to the concepts and capabilities, see the Introduction in the user documentation.

A pre-built container image is published on Docker Hub as inriavalda/provsql for a zero-install trial. See the Docker instructions in the installation guide.

Query Rewriting

When a table is registered for provenance tracking via add_provenance(), each tuple gains a provsql UUID column. ProvSQL’s planner hook intercepts every query involving such tables and rewrites it to compute a provenance circuit over those UUIDs, appending the result UUID to the SELECT list.
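
The idea can be sketched with a toy model in Python. This is not ProvSQL's implementation (the real rewriting happens on PostgreSQL query trees inside the planner hook); the relations, the helper names join and project_distinct, and the in-memory circuit dictionary are all assumptions made up for illustration. Only the add_provenance name and the per-tuple UUID come from the description above.

```python
import uuid

# Toy provenance circuit: token -> (gate kind, child tokens).
# The gate kinds used here are illustrative only.
circuit = {}

def new_gate(kind, children=()):
    """Create a gate and return the fresh UUID token that identifies it."""
    token = uuid.uuid4()
    circuit[token] = (kind, tuple(children))
    return token

def add_provenance(rows):
    """Toy analogue of add_provenance(): tag every tuple with an input token."""
    return [row + (new_gate("input"),) for row in rows]

def join(left, right, on):
    """Join two annotated relations; each output tuple gets a 'times' gate."""
    out = []
    for *l_vals, l_tok in left:
        for *r_vals, r_tok in right:
            if on(l_vals, r_vals):
                out.append((*l_vals, *r_vals, new_gate("times", (l_tok, r_tok))))
    return out

def project_distinct(rel, cols):
    """Project with duplicate elimination; merged tuples share a 'plus' gate."""
    groups = {}
    for *vals, tok in rel:
        key = tuple(vals[c] for c in cols)
        groups.setdefault(key, []).append(tok)
    return [key + (new_gate("plus", toks),) for key, toks in groups.items()]

# Hypothetical data: people with their city, and cities with their country.
person = add_provenance([("alice", "paris"), ("bob", "paris"), ("carol", "rome")])
city = add_provenance([("paris", "fr"), ("rome", "it")])

# Roughly: SELECT DISTINCT country FROM person JOIN city USING (city)
joined = join(person, city, on=lambda p, c: p[1] == c[0])
countries = project_distinct(joined, cols=[3])

for country, token in countries:
    kind, children = circuit[token]
    print(country, kind, len(children))
```

Each output row ends with a result token, mirroring the UUID that the rewriter appends to the SELECT list; following the token through the circuit dictionary recovers which input tuples were combined, and how.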

The rewriter handles:

  • SELECT-FROM-WHERE, JOIN, nested subqueries
  • GROUP BY, SELECT DISTINCT
  • UNION / UNION ALL / EXCEPT
  • VALUES
  • UPDATE / INSERT / DELETE (when provsql.update_provenance is enabled)

Semiring evaluations, probability computation, and Shapley/Banzhaf values are described in the user documentation.
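
As a rough illustration of what evaluating a provenance circuit means, the following Python sketch interprets one small circuit in two semirings and computes its probability by brute force. The circuit, the leaf names a, b, c, and the functions evaluate and probability are hypothetical and independent of ProvSQL's actual API; the exhaustive enumeration of possible worlds is exponential in the number of inputs and is shown only because it is easy to check by hand.

```python
import operator
from functools import reduce
from itertools import product

# Hypothetical circuit (a ⊗ c) ⊕ (b ⊗ c), written as nested tuples.
circuit = ("plus", ("times", "a", "c"), ("times", "b", "c"))

def evaluate(node, leaf_value, plus, times):
    """Interpret the circuit with the given semiring operations."""
    if isinstance(node, str):          # leaf: look up its annotation
        return leaf_value[node]
    op, *children = node
    values = [evaluate(ch, leaf_value, plus, times) for ch in children]
    return reduce(plus if op == "plus" else times, values)

# Counting semiring (N, +, x): number of derivations of the result tuple.
count = evaluate(circuit, {"a": 1, "b": 1, "c": 1}, operator.add, operator.mul)

# Boolean semiring ({False, True}, or, and): is the tuple present in a world?
present = evaluate(circuit, {"a": False, "b": True, "c": True},
                   operator.or_, operator.and_)

def probability(node, probs):
    """Sum the weights of all worlds (independent inputs) where node holds."""
    names = sorted(probs)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        if evaluate(node, world, operator.or_, operator.and_):
            weight = 1.0
            for name in names:
                weight *= probs[name] if world[name] else 1.0 - probs[name]
            total += weight
    return total

prob = probability(circuit, {"a": 0.5, "b": 0.5, "c": 0.5})
print(count, present, prob)
```

Because the leaf c is shared by both branches, simply plugging probabilities into the plus and times gates would give a wrong answer here; this is why the sketch enumerates worlds instead of reusing evaluate with arithmetic on probabilities.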

ProvSQL Studio

ProvSQL Studio is a web UI for provenance inspection that pairs with the extension. It runs as a separate Python package (on PyPI as provsql-studio) and connects to any ProvSQL-enabled PostgreSQL database. It offers two complementary modes: a Circuit view, which renders the provenance DAG behind a result token and evaluates semirings on the fly on any pinned subnode, and a Where view, which highlights on hover the source cells that contributed to each output value.

Lean Formalization

Key parts of the algebraic framework underlying ProvSQL – m-semirings, annotated databases, relational algebra semantics, and aggregation – have been formally verified in Lean 4. See the Lean formalization page for details.

Archival and Citation

DOI Archived in Software Heritage

ProvSQL is continuously archived by Software Heritage, the universal software preservation infrastructure. You can browse the archived source tree at archive.softwareheritage.org.

Every tagged release receives a persistent DOI from Zenodo. The concept DOI above resolves to the latest version; a versioned DOI is available for each release from the Zenodo record page.

To cite ProvSQL in academic work, click the Cite this repository button on the GitHub repository page, or read the CITATION.cff file directly. The canonical reference is:

Aryak Sen, Silviu Maniu, Pierre Senellart. ProvSQL: A General System for Keeping Track of the Provenance and Probability of Data. Proc. 42nd IEEE International Conference on Data Engineering (ICDE), Montréal, Canada, May 2026. arXiv:2504.12058

Download BibTeX

See the Publications page for a full list of research papers related to ProvSQL.

Architecture

The diagram below shows the end-to-end flow of a query through ProvSQL (see the architecture chapter in the developer guide for details):

[Figure: ProvSQL dataflow]

Get Started