Coding Conventions
This chapter collects the small style and naming conventions ProvSQL follows. Most of them are not enforced by tooling, so reviewers rely on contributors to apply them consistently. When in doubt, look at how nearby code is written and match it.
Languages and File Layout
ProvSQL is a mixed C and C++ codebase, with the boundary chosen deliberately:
- C for everything that touches the PostgreSQL extension API: the planner hook, SQL-callable functions, GUC declarations, the mmap background worker shell, type I/O for agg_token, cross-version compatibility shims. C is what PostgreSQL itself is written in and what its headers expect.
- C++17 for everything else: circuit data structures, semiring evaluators, knowledge compilation, tree decomposition, the standalone tdkc tool. C++ buys us templates, std:: containers, RAII, and exceptions.
When a file needs to call into the other language, the boundary
goes through plain C function declarations marked
extern "C". See provsql_utils_cpp.h and
c_cpp_compatibility.h for the helper headers that
mediate this.
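As a minimal sketch of such a boundary, the following C++ translation unit keeps its class private and exposes one C-linkage entry point. The names GateBag and demo_sum_gates are hypothetical and do not appear in ProvSQL; the real helper headers may be organised differently.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical C++-only type; C callers never see it.
class GateBag {
public:
  void add(int weight) { weights.push_back(weight); }
  int total() const {
    return std::accumulate(weights.begin(), weights.end(), 0);
  }
private:
  std::vector<int> weights;
};

// In a header shared with C code, this declaration would sit inside
//   #ifdef __cplusplus
//   extern "C" {
//   #endif
// so that C and C++ translation units agree on one unmangled name.
extern "C" int demo_sum_gates(const int *weights, std::size_t n);

// Definition on the C++ side: only plain C types cross the boundary.
extern "C" int demo_sum_gates(const int *weights, std::size_t n) {
  GateBag bag;
  for (std::size_t i = 0; i < n; ++i)
    bag.add(weights[i]);
  return bag.total();
}
```

A C caller would only need the declaration; the std::vector and the class stay an implementation detail of the C++ file.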
Files are named after the main type or feature they implement
(BooleanCircuit.cpp, provsql_mmap.c). The
component map in Architecture Overview lists every source file with
a one-line description.
Naming
- C functions follow snake_case; many of the planner-hook internals additionally start with a verb (make_, replace_, transform_, rewrite_).
- C++ types use PascalCase (BooleanCircuit, Aggregator).
- C++ methods use camelCase (getGate, probabilityEvaluation).
- Macros are UPPER_CASE.
- Public symbols that are part of ProvSQL’s “API surface” but live outside any particular type get a provsql_ prefix (provsql_error, provsql_planner, provsql_interrupted).
- Gate-type enum values are gate_xxx (lowercase, see gate_type).
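The conventions above can be seen side by side in a small illustrative sketch. Every identifier in it (DEMO_INITIAL_CAPACITY, demo_gate_type, DemoCircuit, make_pair_of_gates) is made up for the example and is not ProvSQL code.

```cpp
#include <cassert>
#include <vector>

#define DEMO_INITIAL_CAPACITY 16                 // macro: UPPER_CASE

enum demo_gate_type { gate_plus, gate_times };   // enum values: gate_xxx

class DemoCircuit {                              // C++ type: PascalCase
public:
  DemoCircuit() { gates.reserve(DEMO_INITIAL_CAPACITY); }

  int addGate(demo_gate_type type) {             // C++ method: camelCase
    gates.push_back(type);
    return static_cast<int>(gates.size()) - 1;   // id of the new gate
  }

  demo_gate_type getGate(int id) const { return gates.at(id); }

private:
  std::vector<demo_gate_type> gates;
};

// Helper named in the style used for C functions: snake_case, verb first.
int make_pair_of_gates(DemoCircuit &circuit) {
  circuit.addGate(gate_plus);
  return circuit.addGate(gate_times);            // id of the second gate
}
```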
Error Reporting
Use the convenience macros from provsql_error.h rather
than calling elog() directly:
- provsql_error – aborts the current transaction with the message prefixed by ProvSQL:. Never returns.
- provsql_warning, provsql_notice, provsql_log – non-aborting variants for the WARNING, NOTICE, and LOG levels.
The macros require a literal format string (the prefix is
concatenated at compile time). If you need to format a runtime
string, build it first and pass it as "%s".
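To see why the format string has to be a literal, here is a sketch with a hypothetical stand-in macro, demo_format, which writes into a buffer instead of calling into PostgreSQL. It is not the definition used in provsql_error.h; it only assumes the same technique of pasting the prefix onto an adjacent string literal at compile time.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in: the first variadic argument must be a string
// literal, because "ProvSQL: " is joined to it by literal concatenation.
#define demo_format(buf, size, ...) \
  std::snprintf((buf), (size), "ProvSQL: " __VA_ARGS__)

// Formats a message that is only known at run time.
std::string demo_report(const std::string &detail) {
  char buf[128];
  // demo_format(buf, sizeof buf, detail.c_str()) would not compile:
  // a literal cannot be concatenated with an expression.
  demo_format(buf, sizeof buf, "%s", detail.c_str());
  return buf;
}
```

Passing the runtime text through "%s" also keeps any % characters it contains from being interpreted as conversion specifiers.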
Inside C++ code, prefer throwing a CircuitException (or a
purpose-built subclass such as SemiringException or
TreeDecompositionException) and letting the SQL-callable wrapper
catch it and call provsql_error. Letting an exception propagate
across the C/C++ boundary is undefined behaviour, so the catch must
happen on the C++ side.
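The pattern can be sketched without PostgreSQL as follows. DemoCircuitException, demoEvaluate and demo_evaluate are hypothetical names; the status code and error buffer stand in for the provsql_error call that a real wrapper would make in its catch block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical exception type standing in for CircuitException.
class DemoCircuitException : public std::runtime_error {
public:
  using std::runtime_error::runtime_error;
};

// Pure C++ logic: free to throw.
static int demoEvaluate(int gate) {
  if (gate < 0)
    throw DemoCircuitException("negative gate id");
  return gate * 2;
}

// Boundary function with C linkage. No exception may escape it, so every
// failure becomes a status code plus a message for the caller.
extern "C" int demo_evaluate(int gate, int *result,
                             char *err, std::size_t err_size) {
  try {
    *result = demoEvaluate(gate);
    return 0;
  } catch (const std::exception &e) {
    std::snprintf(err, err_size, "%s", e.what());
    return -1;
  } catch (...) {
    std::snprintf(err, err_size, "unknown error");
    return -1;
  }
}
```

The trailing catch (...) is what guarantees that nothing, not even a non-standard exception type, unwinds into C frames.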
Memory Management
C code uses palloc / pfree and lives inside PostgreSQL’s
memory contexts: most allocations get reclaimed automatically when
the surrounding query or transaction ends. Do not call
malloc from C code.
C++ code uses ordinary heap allocation, std::unique_ptr and
std::shared_ptr, and STL containers. These do not interact
with PostgreSQL’s memory contexts, so a long-lived C++ object
that holds a lot of memory should be released explicitly when no
longer needed.
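One way to make such a release explicit is to hold the large allocation behind a std::unique_ptr and reset it as soon as it is no longer needed. The class below is a hypothetical illustration, not a ProvSQL type.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical long-lived object that owns a large table.
class DemoCache {
public:
  // Allocates a table of n zero-initialised entries on the ordinary heap.
  void load(std::size_t n) {
    table = std::make_unique<std::vector<double>>(n, 0.0);
  }

  bool loaded() const { return static_cast<bool>(table); }
  std::size_t size() const { return table ? table->size() : 0; }

  // Frees the table now instead of waiting for the destructor; the end of
  // a PostgreSQL query or transaction will not do this for us.
  void release() { table.reset(); }

private:
  std::unique_ptr<std::vector<double>> table;
};
```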
When a function takes ownership of a pointer, its Doxygen comment should say so, and the caller must not free the pointer afterwards; when it only borrows the pointer, the comment should say that instead, and the caller remains responsible for freeing it.
Doxygen Comments
ProvSQL’s API reference is auto-generated by Doxygen from comments
embedded in the source. Every public (or otherwise non-trivial)
function, type, class, SQL function, file, and global variable
should carry a documentation comment. The prevailing style is
JavaDoc (/** ... */) using the standard Doxygen tags:
- @brief – one-line summary (required);
- @param / @return – parameters and return value;
- @file – at the top of each source and header file;
- @defgroup / @ingroup – used in provsql.sql to organise SQL functions into logical groups.
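As an illustration of these tags, including the ownership statement asked for under Memory Management, a documented function might look like the sketch below. The file name demo_buffer.cpp and the function copyLabel are hypothetical.

```cpp
/**
 * @file demo_buffer.cpp
 * @brief Hypothetical example of the documentation style (not ProvSQL code).
 */
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

/**
 * @brief Copy a gate label into a freshly allocated buffer.
 *
 * Borrows @p label: the caller keeps ownership of it. The returned buffer
 * is owned by the caller and is freed when the unique_ptr goes out of scope.
 *
 * @param label NUL-terminated label to copy (borrowed, must not be null).
 * @return Owning pointer to a NUL-terminated copy of @p label.
 */
std::unique_ptr<char[]> copyLabel(const char *label) {
  const std::size_t n = std::strlen(label) + 1;  // include the terminator
  auto copy = std::make_unique<char[]>(n);
  std::memcpy(copy.get(), label, n);
  return copy;
}
```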
Existing files provide good templates: provsql.c for C
code, BooleanCircuit.h for C++ classes, and
provsql.sql for SQL functions. Note that
provsql.sql is generated by the Makefile from the
sql/provsql.*.sql sources (see Build System) – edit
those, not the generated file.
When you cross-reference an SQL function from prose, use the
sqlfunc role and add a corresponding entry to
_SQL_FUNC_MAP in doc/source/conf.py. Same for the
cfunc role and _C_FUNC_MAP on the C/C++ side, and the
cfile / sqlfile roles for filenames. The coherence
checker (check-doc-links.py) run at the end of make docs
will fail the build if a referenced function does not have a map
entry, if a map entry is unused, or if a map entry points at a
nonexistent Doxygen anchor.
The coherence checker does not enforce that newly added code carries Doxygen comments at all – adding them is a convention the project relies on developers to uphold.
Wiring a New SQL-Callable C Function
Adding a new SQL-callable C function requires touching a handful of files. The standard pattern:
1. Implement the C function with the Datum function_name(PG_FUNCTION_ARGS) signature, registered via PG_FUNCTION_INFO_V1(function_name). Use the PG_GETARG_* and PG_RETURN_* macros to convert between Datum and native types. The PG_FUNCTION_INFO_V1 macro takes care of exporting the symbol; no extra PGDLLEXPORT is needed.
2. Declare the SQL function in provsql.sql (i.e., in either sql/provsql.common.sql or sql/provsql.14.sql):

       CREATE OR REPLACE FUNCTION function_name(arg type, ...)
         RETURNS rettype
         AS 'MODULE_PATHNAME', 'function_name'
         LANGUAGE C STRICT;

   The MODULE_PATHNAME placeholder is rewritten by PostgreSQL at install time to point at the extension shared object.
3. Add a Doxygen comment to the SQL declaration – the SQL layer is also documented through Doxygen via the perl filter in plpgsql_filter.pl. Use @ingroup to put the function in the right SQL API group.
4. Reference it from the user docs with the sqlfunc role and add the corresponding entry to _SQL_FUNC_MAP in doc/source/conf.py. The coherence checker enforces both sides.
Adding a New Test
See Testing for the full procedure. In short: drop a
.sql file under test/sql/ and a matching .out under
test/expected/, register it in test/schedule.common (or
test/schedule.14), and make sure the test does not depend on
random UUIDs or unstable orderings of symbolic representations
(Testing has the patterns for normalising both).
Pitfalls to Avoid
A non-exhaustive list of things to avoid because they have caused real bugs in the past:
- Calling C++ code from C without an extern "C" boundary function. Names get mangled and linking will silently pick the wrong overload (or fail in confusing ways).
- Throwing C++ exceptions across the C boundary. Always catch in the outermost C++ wrapper and convert to a provsql_error call.
- Calling palloc from a thread other than the PostgreSQL backend. PostgreSQL’s memory contexts are not thread-safe. Workers and external tools that allocate must use the C++ side.
- Editing sql/provsql.sql or sql/provsql--<version>.sql directly. Both are generated by the Makefile from sql/provsql.common.sql and sql/provsql.14.sql; edit those instead and rebuild.
- Editing test/schedule directly. Same story: it is generated from test/schedule.common (and optionally test/schedule.14).