Case Studies: Overview
The tutorial is the gentle first contact with ProvSQL: it walks through the core workflow – add provenance to a table, run a query, evaluate the result in a semiring – on a single small example. The case studies that follow are longer, self-contained worked examples, each built around a realistic dataset and centred on a different facet of the system. They go well beyond the tutorial in both the breadth of SQL they exercise and the depth to which they push a particular capability.
Each case study is independent: most ship a single self-contained
setup.sql to download and load; case study 3 instead gives
instructions for fetching the large Île-de-France GTFS dataset (not
bundled, due to its size), and case study 4 loads bundled data files
from the source tree under doc/casestudy4/data/. Each states its
scenario and tasks up front and can be read on its own. Read the one whose theme
matches what you want to learn; the coverage matrix at the end of this page is the quickest way to
find which study demonstrates a given feature.
What each case study covers
- Case study 1 – Intelligence Agency
The broadest single tour of provenance evaluation. A security-classification scenario over seven agents drives a custom min-clearance semiring, where-provenance, circuit export (PROV-XML,
graph-easy), and the full probability-method line-up side by side (possible-worlds, Monte-Carlo, tree-decomposition, and knowledge compilation throughd4/c2d/dsharp/minic2d). Start here for a panoramic view.- Case study 2 – Open Science Database
Evidence synthesis over a (fictional) biomedical literature corpus: single-source vs. replicated claims, contradictory findings, and strength-of-evidence ranking. This is the home of Shapley and Banzhaf values – attributing a result to the studies behind it – alongside counting / why-provenance, a custom evidence-grade semiring,
HAVING, and where-provenance.- Case study 3 – Île-de-France Public Transit
Boolean provenance at real-world scale. On the STIF GTFS dataset (hundreds of routes, tens of thousands of stops) it answers a reachability-and-accessibility question – which stops are reachable from Bagneux, and is the whole journey wheelchair-accessible – where a result token is true iff every record along the path carries the accessibility flag.
- Case study 4 – Government Ministers Over Time
The temporal extension and data-modification tracking. Over a database of French and Singaporean ministers, every fact carries a validity interval; the study demonstrates time-travel, history, timeslice, and an
INSERT/DELETEround-trip that is then rolled back withundo.- Case study 5 – Wildlife Photo Archive
Uncertainty that comes out of a machine-learning detector. Candidate species per bounding box are modelled with
repair_keyand themulinputgate (block-correlated alternatives); the study contrasts probabilistic ranking against naive confidence thresholding, and computesexpectedspecies counts.- Case study 6 – City Air-Quality Sensor Network
The continuous-distribution surface, end to end:
random_variablecolumns (Normal / Uniform / Exponential / Erlang / categorical / mixture), arithmetic and comparison on them, analytic moments and Monte-Carlo fallback, and conditional inference. It is the first study driven primarily through ProvSQL Studio, where PDFs, mixture DAGs, and simplifier before/after views make the machinery visible.- Case study 7 – Peer-Review Assignment and Knowledge Compilation
How the shape of a query, together with the keys the schema declares, fixes the shape of the Boolean provenance circuit and thereby which probability method is cheap. A peer-reviewing scenario opens with one coverage question asked three ways – safe by shape, safe by a key, and genuinely
-hard – exposing the knowledge-compilation pipeline (Tseytin CNF, d-DNNF compilation,
ddnnf_stats, theprovsql.boolean_provenancerewriter, the external-tool registry). It then climbs three further rungs that escape the hard case in different ways: aHAVING count(*)Poisson-binomial shortcut, an inversion-free self-join that is linear-time from a query-derived variable order, and arepair_keytable whose block correlation stays tractable. A closing section turns to recursive reachability as network reliability. Also driven through Studio.
Feature coverage matrix
The tables below cross-reference every user-facing feature documented under the User Guide against the tutorial and the seven case studies.
Columns:
T – Tutorial (Who Killed Daphine?)
1 – Case study 1 (Intelligence Agency)
2 – Case study 2 (Open Science Database)
3 – Case study 3 (Île-de-France Public Transit)
4 – Case study 4 (Government Ministers Over Time)
5 – Case study 5 (Wildlife Photo Archive)
6 – Case study 6 (City Air-Quality Sensor Network)
7 – Case study 7 (Peer-Review Assignment and Knowledge Compilation)
Cells: ✓ the feature is exercised; (✓) it is mentioned in
passing but not actually executed; empty means it is not covered.
Setup and basics
Feature |
T |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
|---|---|---|---|---|---|---|---|---|
|
✓ |
✓ |
✓ |
✓ |
✓ |
✓ |
✓ |
✓ |
|
✓ |
|||||||
|
✓ |
✓ |
✓ |
✓ |
✓ |
✓ |
✓ |
✓ |
|
✓ |
✓ |
✓ |
✓ |
✓ |
|||
|
✓ |
|||||||
Hand-built mapping table |
✓ |
|||||||
|
||||||||
|
Supported SQL constructs
Feature |
T |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
|---|---|---|---|---|---|---|---|---|
SELECT-FROM-WHERE / inner JOIN |
✓ |
✓ |
✓ |
✓ |
✓ |
✓ |
✓ |
✓ |
Self-join |
✓ |
✓ |
✓ |
✓ |
✓ |
|||
Subqueries in FROM / nested |
✓ |
✓ |
✓ |
|||||
GROUP BY |
✓ |
✓ |
✓ |
✓ |
✓ |
✓ |
✓ |
|
SELECT DISTINCT |
✓ |
✓ |
✓ |
✓ |
✓ |
|||
EXCEPT (monus) |
✓ |
✓ |
✓ |
|||||
UNION / UNION ALL |
✓ |
✓ |
||||||
HAVING |
✓ |
✓ |
✓ |
|||||
VALUES |
✓ |
✓ |
||||||
CTE (WITH) |
✓ |
✓ |
||||||
WITH RECURSIVE |
✓ |
|||||||
LATERAL |
✓ |
|||||||
Window functions |
||||||||
FILTER clause on aggregates |
||||||||
CREATE TABLE AS SELECT |
✓ |
✓ |
✓ |
✓ |
||||
Provenance-bearing VIEW |
✓ |
✓ |
||||||
INSERT … SELECT (provenance propagation) |
Aggregation
Feature |
T |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
|---|---|---|---|---|---|---|---|---|
COUNT / SUM / MIN / MAX / AVG |
✓ |
✓ |
✓ |
✓ |
||||
|
✓ |
|||||||
|
||||||||
|
||||||||
Arithmetic / cast on aggregate result |
✓ |
|||||||
|
||||||||
|
Circuit inspection
Feature |
T |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
|---|---|---|---|---|---|---|---|---|
|
✓ |
✓ |
||||||
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
✓ |
||||||
|
(✓) |
✓ |
Knowledge compilation and safe queries
Feature |
T |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
|---|---|---|---|---|---|---|---|---|
|
✓ |
|||||||
|
✓ |
|||||||
Safe-query rewriter (hierarchical / read-once) |
✓ |
|||||||
Tseytin CNF export (DIMACS) |
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
Inversion-free certificate ( |
✓ |
|||||||
External-tool registry ( |
(✓) |
|||||||
|
||||||||
|
||||||||
|
Semiring evaluation
Feature |
T |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
|---|---|---|---|---|---|---|---|---|
|
✓ |
|||||||
|
✓ |
✓ |
||||||
|
✓ |
✓ |
✓ |
✓ |
✓ |
✓ |
||
|
✓ |
✓ |
||||||
|
✓ |
(✓) |
||||||
|
(✓) |
|||||||
|
||||||||
|
||||||||
|
||||||||
|
||||||||
|
✓ |
|||||||
|
✓ |
|||||||
Custom semiring via |
✓ |
|||||||
|
Probabilities
Feature |
T |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
|---|---|---|---|---|---|---|---|---|
|
✓ |
✓ |
✓ |
✓ |
✓ |
|||
|
✓ |
✓ |
||||||
|
✓ |
✓ |
✓ |
✓ |
||||
|
✓ |
✓ |
||||||
|
✓ |
✓ |
||||||
|
(✓) |
✓ |
✓ |
✓ |
||||
|
(✓) |
✓ |
✓ |
✓ |
✓ |
|||
|
(✓) |
✓ |
✓ |
|||||
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
✓ |
||||||
|
✓ |
✓ |
||||||
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
Continuous random variables
Feature |
T |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
|---|---|---|---|---|---|---|---|---|
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
Arithmetic on |
✓ |
|||||||
Comparison |
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
Conditional inference via |
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
Shapley and Banzhaf values
Feature |
T |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
|---|---|---|---|---|---|---|---|---|
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
Where-provenance
Feature |
T |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
|---|---|---|---|---|---|---|---|---|
|
✓ |
✓ |
||||||
|
✓ |
✓ |
Data-modification tracking
Feature |
T |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
|---|---|---|---|---|---|---|---|---|
|
✓ |
|||||||
INSERT / UPDATE / DELETE tracked |
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
Temporal features
Feature |
T |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
|---|---|---|---|---|---|---|---|---|
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
|
✓ |
|||||||
|
Export and visualisation
Feature |
T |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
|---|---|---|---|---|---|---|---|---|
|
✓ |
|||||||
|
✓ |
|||||||
|
(✓) |
|||||||
ProvSQL Studio (Circuit mode + Where mode) |
✓ |
✓ |