Welcome to ProvSQL’s documentation!
ProvSQL is a PostgreSQL extension that adds semiring provenance and uncertainty management to SQL queries. It transparently rewrites queries to track which input tuples contribute to each result, then evaluates the provenance in any user-chosen semiring – Boolean reachability, counting, probability, Shapley values, and more.
A companion Python web UI, ProvSQL Studio, provides interactive provenance inspection on top of any ProvSQL-enabled database.
This documentation is organized into four parts:
The User Guide explains how to install, configure, and use ProvSQL from the SQL level and through ProvSQL Studio. Start here if you are new to ProvSQL.
The Case Studies present extended worked examples of ProvSQL applied to realistic scenarios, covering a broader range of features than the introductory tutorial.
The Developer Guide describes ProvSQL’s internal architecture and is aimed at contributors. It covers the PostgreSQL extension concepts ProvSQL relies on, the architecture and component map, the query rewriting pipeline, memory management, the where-provenance and data-modification subsystems, aggregation and semiring evaluation, probability computation, coding conventions, testing, debugging, the build system, and ProvSQL Studio’s architecture.
The API Reference provides auto-generated reference documentation for the SQL and C/C++ APIs (via Doxygen).
User Guide
- Introduction
- Getting ProvSQL
- Tutorial: Who Killed Daphine?
- The Scenario
- Setup
- Step 1: Explore the Database
- Step 2: Build a Sightings Table
- Step 3: Enable Provenance
- Step 4: Find Contradictions
- Step 5: Display Provenance Formulas
- Step 6: Build a Consistent Sightings Table
- Step 7: Identify Suspects
- Step 8: Count Confirming Sightings
- Step 9: Assign Reliability Probabilities
- Step 10: Find the Murderer
- Adding Provenance to a Table
- Querying with Provenance
- Aggregation and Grouping
- Semiring Evaluation
- Boolean Semiring
- Boolean-Expression Semiring
- Symbolic Representation (as a Formula)
- Counting Semiring (m-semiring)
- Why-Provenance
- How-Provenance
- Which-Provenance (Lineage)
- Tropical Semiring (m-semiring)
- Viterbi Semiring (m-semiring)
- Łukasiewicz Fuzzy Semiring (m-semiring)
- Temporal (Interval-Union) Semiring (m-semiring)
- Numeric and Integer Interval-Union Semirings (m-semirings)
- Min-Max and Max-Min Semirings (m-semirings)
- Custom Semirings with
provenance_evaluate - Provenance Mappings
- Compatibility with Boolean-Provenance Rewriting
- Probabilities
- Continuous Distributions
- Introduction
- Distribution Constructors
- Arithmetic on Random Variables
- Probabilistic Queries
- Configuration of the Monte Carlo Sampler
- Closed-Form Evaluation
- Moments and Support
- Conditional Inference
- Sampling and Histograms
- Mixtures and Categorical Random Variables
- Aggregation Over Random Variables
- Studio Integration
- Out of Scope / Open Follow-ups
- Shapley and Banzhaf Values
- Where-Provenance
- Data Modification Tracking
- Temporal Features
- Provenance Export
- ProvSQL Studio
- Configuration Reference
Case Studies
- Case Study: The Intelligence Agency
- The Scenario
- Setup
- Step 1: Explore the Database
- Step 2: Enable Provenance and Create a Name Mapping
- Step 3: Cities Shared by Multiple Agents
- Step 4: Minimum Security Clearance (sr_minmax)
- Step 5: Cities with Exactly One Agent (EXCEPT / Monus)
- Step 6: Where-Provenance
- Step 7: Assign Probabilities
- Step 8: Probability – Exact
- Step 9: Probability – Monte Carlo
- Step 10: Probability – Knowledge Compiler
- Step 11: Visualise a Provenance Circuit
- Step 12: Export to XML
- Step 13: Large Circuit Benchmark
- Step 14: The Boolean Expression Behind a Token
- Step 15: Programmatic Circuit Inspection
- Case Study: The Open Science Database
- The Scenario
- Setup
- Step 1: Explore the Database
- Step 2: Enable Provenance and Join with Lookup Tables
- Step 3: Create a Provenance Mapping
- Step 4: Identify Single-Source Claims
- Step 5: Why-Provenance for Replicated Findings
- Step 6: Evidence Grade Semiring
- Step 7: Where-Provenance
- Step 8: Where-Provenance on the Base Table
- Step 9: Assign Probabilities
- Step 10: Build the Replication View
- Step 11: Inspect Replication with
sr_counting - Step 12: Probability of Replication
- Step 13: Shapley Values
- Step 14: Banzhaf Values
- Step 15: Bulk Shapley/Banzhaf with
shapley_all_varsandbanzhaf_all_vars - Step 16: Arithmetic on Aggregate Results
- Case Study: Île-de-France Public Transit
- Case Study: Government Ministers Over Time
- The Scenario
- Setup
- Step 1: Explore the Database
- Step 2: Union of Temporal Intervals
- Step 3: Timeslice – Who Was in Government During Macron’s First Term?
- Step 4: History – All Holders of the Minister of Justice Role
- Step 5: Timetravel – The Socialist Government of July 1981
- Step 6: Data Modification – Replace the Prime Minister
- Step 7: Undo – Reinstate the Original Prime Minister
- Case Study: The Wildlife Photo Archive
- The Scenario
- Setup
- Step 1: Explore the Database
- Step 2: Enable Provenance and Create a Name Mapping
- Step 3: Inline Lookup with
VALUES - Step 4: Conjunctive Query (Naive)
- Step 5: Mutually Exclusive Candidates with
repair_key - Step 6: Assign Probabilities and Verify Mutual Exclusion
- Step 7: Probabilistic Ranking vs. Threshold Filtering
- Step 8: Absence Constraint with
EXCEPT - Step 9: Multi-Condition Query via a CTE
- Step 10: Expected Species Counts with
expected
- Case Study: The City Air-Quality Sensor Network
- The Scenario
- Setup
- Step 1: Inspect a Noisy Reading
- Step 2: A First Probabilistic Threshold
- Step 3: The Simplifier in Action
- Step 4: Calibration via Mixtures
- Step 5: Aggregation Over Random Variables
- Step 6: Conditional Inference
- Step 7: Diagnostic Sampling
- Step 8: Combining Batches via UNION
- Step 9: Filtering Grouped Random Variables by Expected Value
- Step 10: Independent vs Monte Carlo
Developer Guide
- Developer Guide
- PostgreSQL Extension Primer
- Architecture Overview
- Query Rewriting Pipeline
- Memory Management
- Where-Provenance
- Data-Modification Tracking
- Aggregation Provenance
- Semiring Evaluation
- Probability Evaluation
- Continuous Distributions
- Coding Conventions
- Testing
- Debugging
- Build System
- ProvSQL Studio
API Reference
References