Overview
ProvSQL is a PostgreSQL extension that adds semiring provenance and uncertainty management to SQL queries. It is implemented as a PostgreSQL planner hook that transparently rewrites queries – no changes to the application or schema are required.
For a full introduction to the concepts and capabilities, see the Introduction in the user documentation.
A pre-built container is also published on Docker Hub as inriavalda/provsql, for a zero-install trial. See the Docker instructions in the installation guide.
Query Rewriting
When a table is registered for provenance tracking via
add_provenance(),
each tuple gains a provsql UUID column. ProvSQL’s planner hook intercepts
every query involving such tables and rewrites it to compute a provenance
circuit over those UUIDs, appending the result UUID to the SELECT list.
The rewriter handles:
- SELECT-FROM-WHERE, JOIN, nested subqueries
- GROUP BY, SELECT DISTINCT
- UNION / UNION ALL / EXCEPT
- VALUES
- UPDATE / INSERT / DELETE (when provsql.update_provenance is enabled)
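The rewriting idea above can be illustrated with a small toy model. The sketch below is not ProvSQL's implementation: it only mimics, in plain Python, the usual semiring-provenance convention in which a join combines source tokens with a "times" gate and duplicate elimination combines them with a "plus" gate. All names (Gate, new_gate, track, join, select_distinct, the personnel and offices tables) are hypothetical and exist only for this illustration.

```python
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class Gate:
    kind: str              # "input", "times" or "plus"
    children: tuple = ()   # tokens (UUIDs) of the child gates

circuit = {}               # token -> Gate; toy stand-in for a circuit store

def new_gate(kind, children=()):
    # Create a gate and return the fresh UUID token that identifies it.
    token = uuid.uuid4()
    circuit[token] = Gate(kind, tuple(children))
    return token

def track(rows):
    # Toy analogue of registering a table: one input token per tuple.
    return [(row, new_gate("input")) for row in rows]

def join(left, right, on):
    # Each joined tuple is annotated with a "times" gate over its two sources.
    return [(l + r, new_gate("times", (lt, rt)))
            for l, lt in left for r, rt in right if on(l, r)]

def select_distinct(rows, cols):
    # Project, then merge duplicates under a "plus" gate over their tokens.
    groups = {}
    for row, token in rows:
        groups.setdefault(tuple(row[i] for i in cols), []).append(token)
    return [(key, new_gate("plus", toks)) for key, toks in groups.items()]

# Hypothetical example data.
personnel = track([("Ada", "Paris"), ("Bob", "Paris"), ("Eve", "Rome")])
offices = track([("Paris", "HQ"), ("Rome", "Branch")])

joined = join(personnel, offices, on=lambda p, o: p[1] == o[0])
result = select_distinct(joined, cols=(3,))   # keep only the office kind

for row, token in result:
    gate = circuit[token]
    print(row, gate.kind, len(gate.children))
```

Each output row carries a single result token, in the same spirit as the result UUID that ProvSQL appends to the SELECT list; the circuit behind that token records how the row was derived from the tracked input tuples.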
Semiring evaluations, probability computation, and Shapley/Banzhaf values are described in the user documentation.
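As a rough intuition for what evaluating a provenance circuit means, the following sketch interprets one small circuit in several ways. It is a generic illustration under stated assumptions, not ProvSQL's API: the circuit, the token names a, b and x, the probabilities, and the helper functions are all hypothetical, and the brute-force enumeration of possible worlds is used only because it is easy to check by hand.

```python
from itertools import product

# Hypothetical circuit for one result tuple: (a * x) + (b * x),
# written as nested tuples; strings are input tokens.
circuit = ("plus", ("times", "a", "x"), ("times", "b", "x"))

def evaluate(node, leaf, plus, times):
    # Interpret the circuit in a semiring given by (leaf, plus, times).
    if isinstance(node, str):
        return leaf(node)
    op, *children = node
    values = [evaluate(c, leaf, plus, times) for c in children]
    combine = plus if op == "plus" else times
    acc = values[0]
    for v in values[1:]:
        acc = combine(acc, v)
    return acc

# Counting semiring (naturals with + and *): number of derivations.
count = evaluate(circuit, lambda t: 1,
                 lambda u, v: u + v, lambda u, v: u * v)

# Symbolic reading: render the circuit as a formula string.
formula = evaluate(circuit, str,
                   lambda u, v: f"({u} + {v})", lambda u, v: f"({u} * {v})")

def probability(circuit, prob):
    # Sum the weights of all possible worlds in which the circuit is true,
    # assuming input tuples are present independently of each other.
    names = sorted(prob)
    total = 0.0
    for world in product([False, True], repeat=len(names)):
        present = dict(zip(names, world))
        holds = evaluate(circuit, present.get,
                         lambda u, v: u or v, lambda u, v: u and v)
        if holds:
            weight = 1.0
            for n in names:
                weight *= prob[n] if present[n] else 1.0 - prob[n]
            total += weight
    return total

p = probability(circuit, {"a": 0.5, "b": 0.5, "x": 0.8})
print(count, formula, round(p, 6))
```

Probability is handled separately from the semiring readings because token x occurs in both branches: multiplying and adding probabilities gate by gate would count it twice, whereas enumerating worlds treats it as one event.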
ProvSQL Studio
ProvSQL Studio is a web UI for provenance
inspection that pairs with the extension. It runs as a separate Python
package (on PyPI as
provsql-studio), connects
to any ProvSQL-enabled PostgreSQL database, and offers two complementary
modes: a Circuit view that renders the provenance DAG behind a
result token with on-the-fly semiring evaluation on any pinned subnode,
and a Where view that highlights, on hover, the source cells that
contributed to each output value.
Lean Formalization
Key parts of the algebraic framework underlying ProvSQL – m-semirings, annotated databases, relational algebra semantics, and aggregation – have been formally verified in Lean 4. See the Lean formalization page for details.
Archival and Citation
ProvSQL is continuously archived by Software Heritage, the universal software preservation infrastructure. You can browse the archived source tree at archive.softwareheritage.org.
Every tagged release receives a persistent DOI from Zenodo. The concept DOI resolves to the latest version; a versioned DOI is available for each release from the Zenodo record page.
To cite ProvSQL in academic work, click the Cite this repository button on the GitHub repository page, or read the CITATION.cff file directly. The canonical reference is:
Aryak Sen, Silviu Maniu, Pierre Senellart. ProvSQL: A General System for Keeping Track of the Provenance and Probability of Data. Proc. 42nd IEEE International Conference on Data Engineering (ICDE), Montréal, Canada, May 2026. arXiv:2504.12058
See the Publications page for a full list of research papers related to ProvSQL.
Architecture
The diagram below shows the end-to-end flow of a query through ProvSQL (see the architecture chapter in the developer guide for details):
- SQL API reference – user-facing SQL functions
- C/C++ API reference – internal implementation
- Source code on GitHub
- Video demonstrations of ProvSQL in action
- Contributors and funding