Developer Guide

This section is for anyone who wants to understand or contribute to ProvSQL’s internals. It complements the Doxygen API reference with explanations of design decisions, data flow, and extension points that are not apparent from the code alone.

The primary reference for the system design is the ICDE 2026 paper [Sen et al., 2026].

Chapters

PostgreSQL Extension Primer

The PostgreSQL extension concepts the rest of this guide assumes you know: shared_preload_libraries, planner hooks, the Query node tree, background workers, palloc, ereport, SQL-callable C functions, OIDs, and GUCs. Read this first if you have not written a PostgreSQL extension before.

Architecture Overview

High-level overview: how the extension is loaded, how its C and C++ components fit together, and how data flows from an SQL query to a provenance evaluation result. Covers the constants_t OID cache and the gate_type enum.

Query Rewriting Pipeline

Detailed walkthrough of provsql.c – the planner hook, the process_query function, and every phase of query rewriting (CTE inlining, set operations, aggregation, expression building, splicing, HAVING, where-provenance, INSERT ... SELECT).

Memory Management

How provenance circuits are persisted across transactions via memory-mapped files, the background worker architecture, IPC via anonymous pipes, and the shared-memory coordination layer.

Where-Provenance

The where-provenance subsystem: how WhereCircuit differs from the semiring world, how project and eq gates are built and interpreted, and how the column map produced by build_column_map ties query rewriting to per-cell locator output.

Data-Modification Tracking

Tracking the provenance of INSERT / UPDATE / DELETE statements: the provsql.update_provenance GUC, the trigger machinery, the gate_update gate type, the update_provenance housekeeping table, and the temporal features (undo, timetravel, timeslice, history) built on top.

Aggregation Provenance

The semimodule data model for aggregate provenance, the role of the agg_token type, the agg / semimod / value / delta gates, the aggregation-specific phases of process_query, and a step-by-step guide for adding a new compiled aggregate.

Semiring Evaluation

The Semiring interface, walkthroughs of the Boolean and Counting semirings, a step-by-step guide for adding a new compiled semiring, and notes on symbolic-representation semirings such as sr_formula.

Probability Evaluation

The probability-evaluation dispatcher in probability_evaluate.cpp, the d-DNNF data structure, the Tseytin encoding, knowledge compilation through external compilers, weighted model counting, the tree-decomposition path, and a step-by-step guide for plugging in a new method.

Coding Conventions

Naming, error reporting, memory management, the C/C++ boundary, and the small set of project-specific conventions that reviewers will ask new contributors to follow.

Testing

How ProvSQL’s pg_regress-based test suite works, how to write new tests, the external-tool skip pattern, and how to read test failures.

Debugging

Debug builds, the provsql.verbose_level GUC, circuit inspection SQL functions, circuit visualization, and GDB tips.

Build System

The two-Makefile structure, PGXS integration, PostgreSQL version guards, generated SQL files, make website / make deploy, release.sh, and CI workflows.