Developer Guide
This section is for anyone who wants to understand or contribute to ProvSQL’s internals. It complements the Doxygen API reference with explanations of design decisions, data flow, and extension points that are not apparent from the code alone.
The primary reference for the system design is the ICDE 2026 paper [Sen et al., 2026].
Chapters
- PostgreSQL Extension Primer
The PostgreSQL extension concepts the rest of this guide assumes you know: shared_preload_libraries, planner hooks, the Query node tree, background workers, palloc, ereport, SQL-callable C functions, OIDs, and GUCs. Read this first if you have not written a PostgreSQL extension before.
- Architecture Overview
High-level overview: how the extension is loaded, how its C and C++ components fit together, and how data flows from an SQL query to a provenance evaluation result. Covers the constants_t OID cache and the gate_type enum.
- Query Rewriting Pipeline
Detailed walkthrough of provsql.c – the planner hook, the process_query function, and every phase of query rewriting (CTE inlining, set operations, aggregation, expression building, splicing, HAVING, where-provenance, INSERT ... SELECT).
- Memory Management
How provenance circuits are persisted across transactions via memory-mapped files, the background worker architecture, IPC via anonymous pipes, and the shared-memory coordination layer.
- Where-Provenance
The where-provenance subsystem: how WhereCircuit differs from the semiring world, how project and eq gates are built and interpreted, and how the column map produced by build_column_map ties query rewriting to per-cell locator output.
- Data-Modification Tracking
Tracking the provenance of INSERT/UPDATE/DELETE statements: the provsql.update_provenance GUC, the trigger machinery, the gate_update gate type, the update_provenance housekeeping table, and the temporal features (undo, timetravel, timeslice, history) built on top.
- Aggregation Provenance
The semimodule data model for aggregate provenance, the role of the agg_token type, the agg/semimod/value/delta gates, the aggregation-specific phases of process_query, and a step-by-step guide for adding a new compiled aggregate.
- Semiring Evaluation
The Semiring interface, walkthroughs of the Boolean and Counting semirings, a step-by-step guide for adding a new compiled semiring, and notes on symbolic-representation semirings such as sr_formula.
- Probability Evaluation
The probability-evaluation dispatcher in probability_evaluate.cpp, the d-DNNF data structure, the Tseytin encoding, knowledge compilation through external compilers, weighted model counting, the tree-decomposition path, and a step-by-step guide for plugging in a new method.
- Coding Conventions
Naming, error reporting, memory management, the C/C++ boundary, and the small set of project-specific conventions that reviewers will ask new contributors to follow.
- Testing
How ProvSQL’s pg_regress-based test suite works, how to write new tests, the external-tool skip pattern, and how to read test failures.
- Debugging
Debug builds, the provsql.verbose_level GUC, circuit inspection SQL functions, circuit visualization, and GDB tips.
- Build System
The two-Makefile structure, PGXS integration, PostgreSQL version guards, generated SQL files, make website / make deploy, release.sh, and CI workflows.