Coding Conventions

This chapter collects the small style and naming conventions ProvSQL follows. Most are not enforced by tooling, so reviewers rely on contributors to apply them. When in doubt, look at how nearby code is written and match it.

Languages and File Layout

ProvSQL is a mixed C and C++ codebase, with the boundary chosen deliberately:

  • C for everything that touches the PostgreSQL extension API: the planner hook, SQL-callable functions, GUC declarations, the mmap background worker shell, type I/O for agg_token, and cross-version compatibility shims. C is what PostgreSQL itself is written in and what its headers expect.

  • C++17 for everything else: circuit data structures, semiring evaluators, knowledge compilation, tree decomposition, and the standalone tdkc tool. C++ buys us templates, std:: containers, RAII, and exceptions.

When a file needs to call into the other language, the boundary goes through plain C function declarations marked extern "C". See provsql_utils_cpp.h and c_cpp_compatibility.h for the helper headers that mediate this.
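As a minimal sketch of such a boundary, a shared declaration can be guarded so that a C compiler sees a plain prototype while the C++ side implements it with ordinary C++. All names below are hypothetical and are not taken from provsql_utils_cpp.h or c_cpp_compatibility.h.

```cpp
#include <stddef.h>
#include <set>

// Declaration as it might appear in a header shared by C and C++ files.
// Under a C++ compiler the guard requests C linkage (no name mangling);
// a C compiler skips the guard and sees an ordinary prototype.
#ifdef __cplusplus
extern "C" {
#endif

// Hypothetical boundary function: counts distinct gate identifiers.
size_t example_count_distinct_gates(const int *gate_ids, size_t n);

#ifdef __cplusplus
}
#endif

// Definition on the C++ side. It is free to use std:: containers
// internally, but only plain C types appear in its signature.
extern "C" size_t example_count_distinct_gates(const int *gate_ids, size_t n)
{
  std::set<int> distinct(gate_ids, gate_ids + n);
  return distinct.size();
}
```

Keeping the signature restricted to C types is what allows a C file to include the declaration unchanged.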

Files are named after the main type or feature they implement (BooleanCircuit.cpp, provsql_mmap.c). The component map in Architecture Overview lists every source file with a one-line description.

Naming

  • C functions follow snake_case; many of the planner-hook internals additionally start with a verb (make_, replace_, transform_, rewrite_).

  • C++ types use PascalCase (BooleanCircuit, Aggregator).

  • C++ methods use camelCase (getGate, probabilityEvaluation).

  • Macros are UPPER_CASE.

  • Public symbols that are part of ProvSQL’s “API surface” but live outside any particular type get a provsql_ prefix (provsql_error, provsql_planner, provsql_interrupted).

  • Gate-type enum values are gate_xxx (lowercase, see gate_type).
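The following sketch shows the conventions above side by side. Every identifier in it is invented for illustration (including the enum and its values, which only mimic the gate_xxx pattern); none of them is a real ProvSQL symbol.

```cpp
#include <cstddef>
#include <vector>

// Macro: UPPER_CASE.
#define EXAMPLE_MAX_GATES 1024

// Enum values in the lowercase gate_xxx style (illustrative values only).
enum example_gate_kind { gate_demo_leaf, gate_demo_sum };

// C++ type: PascalCase. Methods: camelCase.
class ExampleCircuit {
public:
  void addGate(example_gate_kind kind) { gates.push_back(kind); }
  std::size_t getGateCount() const { return gates.size(); }

private:
  std::vector<example_gate_kind> gates;
};

// Free function in the C style: snake_case, starting with a verb.
// A public symbol living outside any type would also carry the provsql_ prefix.
bool check_gate_limit(std::size_t gate_count)
{
  return gate_count <= EXAMPLE_MAX_GATES;
}
```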

Error Reporting

Use the convenience macros from provsql_error.h (such as provsql_error) rather than calling elog() directly.

The macros require a literal format string (the prefix is concatenated at compile time). If you need to format a runtime string, build it first and pass it as "%s".
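To see why a literal format string is needed, here is a self-contained sketch of a macro that prepends a prefix through compile-time string-literal concatenation. The names example_error and example_report and the prefix text are assumptions made for this illustration. The real macros live in provsql_error.h and forward to PostgreSQL's error reporting, whereas the stand-in below just stores the formatted message.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Stand-in for the reporting backend: formats the message and keeps it.
static std::string example_last_message;

static void example_report(const char *fmt, ...)
{
  char buf[256];
  va_list ap;
  va_start(ap, fmt);
  std::vsnprintf(buf, sizeof buf, fmt, ap);
  va_end(ap);
  example_last_message = buf;
}

// Adjacent string literals are merged by the compiler, so the prefix is
// glued onto the format string only if that format is itself a literal.
#define example_error(...) example_report("Example: " __VA_ARGS__)

std::string example_describe_gate(int gate_id)
{
  example_error("unknown gate %d", gate_id);
  return example_last_message;
}

std::string example_forward_runtime_text(const std::string &text)
{
  // example_error(text.c_str()) would not compile: a literal cannot be
  // concatenated with an expression. Pass runtime text through "%s".
  example_error("%s", text.c_str());
  return example_last_message;
}
```

Passing runtime text through "%s" also prevents stray % characters in that text from being interpreted as format directives.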

Inside C++ code, prefer raising a CircuitException (or a purpose-built subclass such as SemiringException or TreeDecompositionException) and letting the SQL-callable wrapper catch it and call provsql_error. Letting an exception propagate across the C/C++ boundary is undefined behaviour, so the catch must happen on the C++ side.
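The pattern can be sketched as follows. ExampleCircuitException, example_evaluate and example_report_error are hypothetical stand-ins; in particular, the stand-in reporter simply records the message and returns, whereas PostgreSQL error reporting at ERROR level does not return to its caller.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception type standing in for a CircuitException-like class.
class ExampleCircuitException : public std::runtime_error {
public:
  using std::runtime_error::runtime_error;
};

// Internal C++ code: signals failure by throwing.
static int example_evaluate(int gate)
{
  if (gate < 0)
    throw ExampleCircuitException("negative gate id");
  return gate * 2;
}

// Stand-in for the error-reporting call; here it only records the message.
static std::string example_last_error;
static void example_report_error(const char *msg) { example_last_error = msg; }

// Outermost wrapper with C linkage: no exception may escape from it.
extern "C" int example_evaluate_wrapper(int gate)
{
  try {
    return example_evaluate(gate);
  } catch (const std::exception &e) {
    example_report_error(e.what());
  } catch (...) {
    example_report_error("unknown C++ exception");
  }
  return -1; // reached only because the stand-in reporter returns
}
```

The catch-all clause is there so that even exception types unrelated to std::exception cannot leave the wrapper.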

Memory Management

C code uses palloc / pfree and lives inside PostgreSQL’s memory contexts: most allocations get reclaimed automatically when the surrounding query or transaction ends. Do not call malloc from C code.

C++ code uses ordinary heap allocation, std::unique_ptr and std::shared_ptr, and STL containers. These do not interact with PostgreSQL’s memory contexts, so a long-lived C++ object that holds a lot of memory should be released explicitly when no longer needed.
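A minimal sketch of explicit release, assuming a hypothetical long-lived cache object (ExampleCache and the example_ functions are invented for illustration):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical long-lived object holding a large buffer.
struct ExampleCache {
  std::vector<double> values;
};

// Lives until released explicitly: no PostgreSQL memory context will
// reclaim it at the end of a query or transaction.
static std::unique_ptr<ExampleCache> example_cache;

void example_build_cache(std::size_t n)
{
  example_cache = std::make_unique<ExampleCache>();
  example_cache->values.assign(n, 0.0);
}

bool example_cache_is_loaded() { return example_cache != nullptr; }

// Explicit release: destroys the object and frees its heap memory.
void example_release_cache() { example_cache.reset(); }
```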

When a function takes ownership of a pointer, it should say so in its Doxygen comment, and the caller must not free the pointer afterwards. When it only borrows the pointer, the comment should say that too, and the caller remains responsible for freeing it.
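One possible way to phrase such comments is sketched below. ExampleStore and ExampleNode are hypothetical types, and the exact wording of the ownership notes is a suggestion, not an established project template.

```cpp
#include <memory>
#include <vector>

struct ExampleNode {
  int value;
};

class ExampleStore {
public:
  /**
   * @brief Add a node to the store.
   * @param node Heap-allocated node. The store takes ownership;
   *             the caller must not free it.
   */
  void adopt(ExampleNode *node)
  {
    nodes.push_back(std::unique_ptr<ExampleNode>(node));
  }

  /**
   * @brief Sum the stored values together with one extra node.
   * @param extra Borrowed for the duration of the call only; the caller
   *              keeps ownership and is responsible for freeing it.
   * @return The sum of all stored values plus @p extra's value.
   */
  int sumWith(const ExampleNode *extra) const
  {
    int total = extra->value;
    for (const auto &n : nodes)
      total += n->value;
    return total;
  }

private:
  std::vector<std::unique_ptr<ExampleNode>> nodes;
};
```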

Doxygen Comments

ProvSQL’s API reference is auto-generated by Doxygen from comments embedded in the source. Every public (or otherwise non-trivial) function, type, class, SQL function, file, and global variable should carry a documentation comment. The prevailing style is JavaDoc (/** ... */) using the standard Doxygen tags:

  • @brief – one-line summary (required);

  • @param / @return – parameters and return value;

  • @file – at the top of each source and header file;

  • @defgroup / @ingroup – used in provsql.sql to organise SQL functions into logical groups.
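As an illustration of the tags above, here is a small documented function in the JavaDoc style. The file name and function are hypothetical; the existing ProvSQL sources remain the reference for the project's exact layout.

```cpp
/**
 * @file example_probability.h
 * @brief Hypothetical header used to illustrate the comment style.
 */

#include <vector>

/**
 * @brief Probability that at least one of several independent events occurs.
 *
 * @param probabilities Probability of each event, each in [0, 1].
 * @return One minus the product of the complements; 0 for an empty input.
 */
double example_independent_or(const std::vector<double> &probabilities)
{
  double none = 1.0; // probability that no event occurs
  for (double p : probabilities)
    none *= 1.0 - p;
  return 1.0 - none;
}
```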

Existing files provide good templates: provsql.c for C code, BooleanCircuit.h for C++ classes, and provsql.sql for SQL functions. Note that provsql.sql is generated by the Makefile from the sql/provsql.*.sql sources (see Build System) – edit those, not the generated file.

When you cross-reference an SQL function from prose, use the sqlfunc role and add a corresponding entry to _SQL_FUNC_MAP in doc/source/conf.py. The same applies to the cfunc role and _C_FUNC_MAP on the C/C++ side, and to the cfile / sqlfile roles for filenames. The coherence checker (check-doc-links.py), which runs at the end of make docs, fails the build if a referenced function has no map entry, if a map entry is unused, or if a map entry points at a nonexistent Doxygen anchor.

The coherence checker does not enforce that newly added code carries Doxygen comments at all – adding them is a convention the project relies on developers to uphold.

Wiring a New SQL-Callable C Function

Adding a new SQL-callable C function requires touching a handful of files. The standard pattern:

  1. Implement the C function with the Datum function_name(PG_FUNCTION_ARGS) signature, registered via PG_FUNCTION_INFO_V1(function_name). Use the PG_GETARG_* and PG_RETURN_* macros to convert between Datum and native types. The PG_FUNCTION_INFO_V1 macro takes care of exporting the symbol; no extra PGDLLEXPORT is needed.

  2. Declare the SQL function in provsql.sql (i.e., in either sql/provsql.common.sql or sql/provsql.14.sql):

    CREATE OR REPLACE FUNCTION function_name(arg type, ...)
      RETURNS rettype AS
      'MODULE_PATHNAME', 'function_name'
      LANGUAGE C STRICT;
    

    PostgreSQL substitutes the MODULE_PATHNAME placeholder when CREATE EXTENSION runs the script, using the module_pathname value from the extension control file, so that it points at the extension's shared object.

  3. Add a Doxygen comment to the SQL declaration – the SQL layer is also documented through Doxygen via the perl filter in plpgsql_filter.pl. Use @ingroup to put the function in the right SQL API group.

  4. Reference it from the user docs with the sqlfunc role and add the corresponding entry to _SQL_FUNC_MAP in doc/source/conf.py. The coherence checker enforces both sides.

Adding a New Test

See Testing for the full procedure. In short: drop a .sql file under test/sql/ and a matching .out under test/expected/, register it in test/schedule.common (or test/schedule.14), and make sure the test does not depend on random UUIDs or unstable orderings of symbolic representations (Testing has the patterns for normalising both).

Pitfalls to Avoid

A non-exhaustive list of things to avoid because they have caused real bugs in the past:

  • Calling C++ code from C without an extern "C" boundary function. Names get mangled and linking will silently pick the wrong overload (or fail in confusing ways).

  • Throwing C++ exceptions across the C boundary. Always catch in the outermost C++ wrapper and convert to a provsql_error call.

  • Calling palloc from a thread other than the PostgreSQL backend. PostgreSQL’s memory contexts are not thread-safe. Workers and external tools that allocate must use the C++ side.

  • Editing sql/provsql.sql or sql/provsql--<version>.sql directly. Both are generated by the Makefile from sql/provsql.common.sql and sql/provsql.14.sql; edit those instead and rebuild.

  • Editing test/schedule directly. Same story: it is generated from test/schedule.common (and optionally test/schedule.14).