Overview
ProvSQL is a PostgreSQL extension that adds semiring provenance and uncertainty management to SQL queries. It is implemented as a PostgreSQL planner hook that transparently rewrites queries – no changes to the application or schema are required.
For a full introduction to the concepts and capabilities, see the Introduction in the user documentation.
Query Rewriting
When a table is registered for provenance tracking via
add_provenance(),
each tuple gains a provsql UUID column. ProvSQL’s planner hook intercepts
every query involving such tables and rewrites it to compute a provenance
circuit over those UUIDs, appending the result UUID to the SELECT list.
The rewriter handles:
- SELECT-FROM-WHERE, JOIN, nested subqueries
- GROUP BY, SELECT DISTINCT
- UNION / UNION ALL / EXCEPT
- VALUES
- UPDATE / INSERT / DELETE (when
provsql.update_provenanceis enabled)
Semiring evaluations, probability computation, and Shapley/Banzhaf values are described in the user documentation.
Lean Formalization
Key parts of the algebraic framework underlying ProvSQL – m-semirings, annotated databases, and relational algebra semantics – have been formally verified in Lean 4. See the Lean formalization page for details.
Architecture and More
- SQL API reference – user-facing SQL functions
- C/C++ API reference – internal implementation
- Source code on GitHub
- Video demonstrations of ProvSQL in action
- Contributors and funding