ProvSQL C/C++ API
Adding support for provenance and uncertainty management to PostgreSQL databases
Core types, constants, and utilities shared across ProvSQL.
#include "pg_config.h"
#include "c.h"
#include "postgres.h"
#include "utils/uuid.h"
#include "postgres_ext.h"
#include "nodes/pg_list.h"
#include "MMappedTableInfo.h"
#include "provsql_error.h"

Classes | |
| struct | pg_uuid_t |
| UUID structure. | |
| struct | constants_t |
| Structure to store the value of various constants. | |
| struct | database_constants_t |
| Structure to store the value of various constants for a specific database. | |
| struct | ProvenanceRelationKey |
| One PRIMARY-KEY or NOT-NULL-UNIQUE key on a relation. | |
| struct | ProvenanceRelationKeys |
| Per-relation set of PRIMARY-KEY and NOT-NULL-UNIQUE keys. | |
Macros | |
| #define | UUID_LEN 16 |
| Number of bytes in a UUID. | |
| #define | PROVSQL_COLUMN_NAME "provsql" |
| Canonical name of the per-row provenance column installed by add_provenance / repair_key. | |
| #define | PROVSQL_KEY_CACHE_MAX_KEYS 4 |
| Upper bounds for the relation-key cache. | |
| #define | PROVSQL_KEY_CACHE_MAX_KEY_COLS 8 |
Enumerations | |
| enum | gate_type { gate_input , gate_plus , gate_times , gate_monus , gate_project , gate_zero , gate_one , gate_eq , gate_agg , gate_semimod , gate_cmp , gate_delta , gate_value , gate_mulinput , gate_update , gate_rv , gate_arith , gate_mixture , gate_assumed_boolean , gate_annotation , gate_invalid , nb_gate_types } |
| Possible gate types in the provenance circuit. | |
| enum | provsql_arith_op { PROVSQL_ARITH_PLUS = 0 , PROVSQL_ARITH_TIMES = 1 , PROVSQL_ARITH_MINUS = 2 , PROVSQL_ARITH_DIV = 3 , PROVSQL_ARITH_NEG = 4 } |
| Arithmetic operator tags used by gate_arith. | |
Functions | |
| constants_t | get_constants (bool failure_if_not_possible) |
| Retrieve the cached OID constants for the current database. | |
| Oid | find_equality_operator (Oid ltypeId, Oid rtypeId) |
| Find the equality operator OID for two given types. | |
| const char * | provsql_kcmcp_managed_endpoint (void) |
| Read the live endpoint of the managed KCMCP server from shared memory (e.g. "unix:/tmp/..."), or an empty string when none is running. | |
| void | RegisterProvSQLKCMCPWorker (void) |
| Register the supervisor background worker that launches and supervises the managed KCMCP server; called from _PG_init alongside the mmap worker. | |
| bool | provsql_lookup_table_info (Oid relid, ProvenanceTableInfo *out) |
| Look up per-table provenance metadata with a backend-local cache. | |
| bool | provsql_fetch_table_info (Oid relid, ProvenanceTableInfo *out) |
| Raw IPC fetch (no cache). | |
| bool | provsql_lookup_ancestry (Oid relid, uint16 *ancestor_n_out, Oid *ancestors_out) |
| Look up the base-ancestor set of a tracked relation. | |
| bool | provsql_fetch_ancestry (Oid relid, uint16 *ancestor_n_out, Oid *ancestors_out) |
| Raw IPC fetch for the ancestry half (no cache). | |
| bool | provsql_lookup_relation_keys (Oid relid, ProvenanceRelationKeys *out) |
| Look up the PRIMARY-KEY and NOT-NULL-UNIQUE keys of a relation with a backend-local cache. | |
Variables | |
| const char * | gate_type_name [] |
| Names of gate types. | |
| bool | provsql_interrupted |
| Global variable that becomes true if this particular backend received an interrupt signal. | |
| bool | provsql_where_provenance |
| Global variable that indicates if where-provenance support has been activated through the provsql.where_provenance run-time configuration parameter. | |
| int | provsql_verbose |
| Global variable that holds the verbosity level set by the provsql.verbose_level run-time configuration parameter. | |
| bool | provsql_aggtoken_text_as_uuid |
| Global flag controlling agg_token text output: when true, agg_token_out emits the underlying provenance UUID instead of the default "value (*)" display string. | |
| char * | provsql_tool_search_path |
| Colon-separated list of directories prepended to PATH when ProvSQL spawns external tools (d4, c2d, minic2d, dsharp, weightmc, graph-easy), set by the provsql.tool_search_path run-time configuration parameter. | |
| char * | provsql_fallback_compiler |
| Compiler invoked as the final fallback in BooleanCircuit::makeDD when both interpretAsDD() and the in-process tree-decomposition path fail (the latter typically on treewidth blow-up). | |
| char * | provsql_kcmcp_server |
| Launch command for the managed KCMCP knowledge-compiler server, set by the provsql.kcmcp_server run-time configuration parameter (PGC_SIGHUP). | |
| int | provsql_monte_carlo_seed |
| Seed for the Monte Carlo sampler, set by the provsql.monte_carlo_seed run-time configuration parameter. | |
| int | provsql_rv_mc_samples |
| Default sample count for Monte Carlo fallbacks when an analytical evaluator (Expectation, future hybrid evaluator, ...) cannot decompose a sub-circuit structurally. | |
| bool | provsql_simplify_on_load |
| When true (default), every GenericCircuit returned by getGenericCircuit is run through the universal cmp-resolution passes (RangeCheck for now, plus any future passes that decide comparators to certain Boolean values). | |
| bool | provsql_hybrid_evaluation |
| Run the hybrid evaluator (simplifier + per-cmp island decomposer) before dispatching a probability_evaluate query. | |
| bool | provsql_cmp_probability_evaluation |
| Hidden diagnostic flag for the family of closed-form / analytic probability evaluators that resolve gate_cmps inside probability_evaluate; see the provsql.cmp_probability_evaluation GUC. | |
| bool | provsql_inversion_free |
| Kill-switch for the inversion-free structured-d-DNNF probability path; see the provsql.inversion_free GUC. | |
| bool | provsql_boolean_provenance |
| Opt-in safe-query optimisation for hierarchical conjunctive queries; see the provsql.boolean_provenance GUC. | |
Core types, constants, and utilities shared across ProvSQL.
This header is included by virtually every source file in the extension. It provides:
- gate_type enumeration listing all circuit-gate kinds recognised by ProvSQL (input, semiring operations, aggregation, etc.)
- constants_t structure caching PostgreSQL OIDs for the types, functions, and operators that ProvSQL installs, so that OID lookups happen once per session rather than on every query.
- database_constants_t wrapper for per-database OID caches.
- provsql_error.h for the provsql_error / provsql_warning / provsql_notice / provsql_log macros.
Definition in file provsql_utils.h.
| #define PROVSQL_COLUMN_NAME "provsql" |
Canonical name of the per-row provenance column installed by add_provenance / repair_key.
Centralised so the planner-hook entry points (src/provsql.c) and the safe-query detector (src/safe_query.c) agree on the literal; see the provenance_guard trigger in sql/provsql.common.sql.
Definition at line 111 of file provsql_utils.h.
| #define PROVSQL_KEY_CACHE_MAX_KEY_COLS 8 |
Definition at line 454 of file provsql_utils.h.
| #define PROVSQL_KEY_CACHE_MAX_KEYS 4 |
Upper bounds for the relation-key cache.
Each relation contributes at most PROVSQL_KEY_CACHE_MAX_KEYS distinct PRIMARY-KEY / NOT-NULL-UNIQUE column-sets, each over at most PROVSQL_KEY_CACHE_MAX_KEY_COLS columns. These bounds keep the cache entry fixed-size (so the backend-local sorted-array representation can reuse the provsql_lookup_table_info pattern verbatim); relations with more or wider keys silently drop the overflow, treating the missing keys as if they did not exist (over-conservative – the §2 FD-aware detector simply misses an optimisation, never produces an unsound rewrite).
Definition at line 453 of file provsql_utils.h.
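The fixed-size bound and the silent overflow drop described above can be sketched as follows. This is a minimal illustration, not the ProvSQL implementation: the SketchKey / SketchKeys layouts and sketch_add_key are hypothetical names, and the real ProvenanceRelationKey / ProvenanceRelationKeys structures may be laid out differently. Only the two macro values come from this page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PROVSQL_KEY_CACHE_MAX_KEYS 4
#define PROVSQL_KEY_CACHE_MAX_KEY_COLS 8

/* Hypothetical stand-in for one key: a bounded list of column numbers. */
typedef struct {
    uint16_t col_n;
    int16_t  cols[PROVSQL_KEY_CACHE_MAX_KEY_COLS];
} SketchKey;

/* Hypothetical stand-in for a per-relation entry: fixed size by construction. */
typedef struct {
    uint16_t  key_n;
    SketchKey keys[PROVSQL_KEY_CACHE_MAX_KEYS];
} SketchKeys;

/* Record one key. Returns false when the key is dropped because it is too
 * wide or because the entry is already full; the entry is left unchanged. */
bool sketch_add_key(SketchKeys *entry, const int16_t *cols, uint16_t col_n)
{
    assert(entry != NULL && cols != NULL);
    if (col_n == 0 || col_n > PROVSQL_KEY_CACHE_MAX_KEY_COLS)
        return false;                     /* wider than the bound: dropped */
    if (entry->key_n >= PROVSQL_KEY_CACHE_MAX_KEYS)
        return false;                     /* more keys than the bound: dropped */
    SketchKey *key = &entry->keys[entry->key_n++];
    key->col_n = col_n;
    for (uint16_t i = 0; i < col_n; i++)
        key->cols[i] = cols[i];
    return true;
}
```

A caller that only ever reads entry->keys[0 .. key_n-1] sees a dropped key as a key that does not exist, which is the conservative behaviour the text describes.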
| #define UUID_LEN 16 |
Number of bytes in a UUID.
Definition at line 30 of file provsql_utils.h.
| enum gate_type |
Possible gate types in the provenance circuit.
Warning: the integer values of this enum are persisted in the gates.mmap backing file (see MMappedCircuit). Reordering, inserting, or renumbering existing members will silently invalidate every existing installation's persistent circuit. New gate types must be appended at the end, just before gate_invalid. If an existing gate type ever needs to be removed or renumbered, the mmap format must gain a version header and a migration path.
Definition at line 55 of file provsql_utils.h.
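The append-only rule can be illustrated with a miniature enum. This is a sketch under stated assumptions: sketch_gate_type and its members are hypothetical and are not the real gate_type, and sketch_tag_is_valid is an invented helper. It only shows the idea of pinning already-persisted values at compile time while a new member is added just before the sentinel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of a persisted enum (NOT the real gate_type). */
typedef enum {
    sketch_input,     /* 0: already on disk, must never change */
    sketch_plus,      /* 1: already on disk, must never change */
    sketch_times,     /* 2: already on disk, must never change */
    sketch_new_kind,  /* 3: newly appended, just before the sentinel */
    sketch_invalid,   /* sentinel: shifts each time a member is appended */
    sketch_nb_types
} sketch_gate_type;

/* Compile-time guards: an accidental reorder or mid-list insertion fails the build. */
static_assert(sketch_input == 0, "persisted value changed");
static_assert(sketch_plus  == 1, "persisted value changed");
static_assert(sketch_times == 2, "persisted value changed");

/* Accept a raw tag read back from storage only if it names a real member. */
bool sketch_tag_is_valid(unsigned raw)
{
    return raw < (unsigned) sketch_invalid;
}
```

Guards of this kind do not replace a version header, but they turn a silent on-disk incompatibility into a compile error.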
| enum provsql_arith_op |
Arithmetic operator tags used by gate_arith.
Stored in the gate's info1 field. Local enum (not a PostgreSQL operator OID) because arithmetic in the sampler / evaluator is just C++ doubles, with no need to dispatch through the PG catalog.
Warning: like gate_type, these integer values are persisted (in info1). Reordering or renumbering existing tags will silently invalidate every existing installation's persistent circuit. New tags must be appended at the end.
Definition at line 92 of file provsql_utils.h.
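Because the tags are plain integers rather than operator OIDs, evaluating them reduces to a switch over doubles. The sketch below assumes the enum values listed on this page; sketch_apply_arith is a hypothetical helper, and treating every operator as binary except PROVSQL_ARITH_NEG (which ignores its second operand) is an assumption, not a statement about how the ProvSQL evaluator handles arity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values copied from the enumeration listed on this page. */
typedef enum {
    PROVSQL_ARITH_PLUS  = 0,
    PROVSQL_ARITH_TIMES = 1,
    PROVSQL_ARITH_MINUS = 2,
    PROVSQL_ARITH_DIV   = 3,
    PROVSQL_ARITH_NEG   = 4
} provsql_arith_op;

/* Hypothetical helper: apply the operator stored as an integer tag.
 * Returns false for a tag that is not a known operator. */
bool sketch_apply_arith(int tag, double lhs, double rhs, double *out)
{
    assert(out != NULL);
    switch (tag) {
    case PROVSQL_ARITH_PLUS:  *out = lhs + rhs; return true;
    case PROVSQL_ARITH_TIMES: *out = lhs * rhs; return true;
    case PROVSQL_ARITH_MINUS: *out = lhs - rhs; return true;
    case PROVSQL_ARITH_DIV:   *out = lhs / rhs; return true;
    case PROVSQL_ARITH_NEG:   *out = -lhs;      return true; /* rhs unused */
    default:                  return false;     /* unknown persisted tag */
    }
}
```

Rejecting unknown tags explicitly matters here because the value comes from persistent storage and may have been written by a newer build.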
| Oid find_equality_operator (Oid ltypeId, Oid rtypeId) |
Find the equality operator OID for two given types.
Searches pg_operator for the = operator that accepts ltypeId on the left and rtypeId on the right.
| ltypeId | OID of the left operand type. |
| rtypeId | OID of the right operand type. |
Returns the OID of the equality operator, or InvalidOid if none is found.
Definition at line 130 of file provsql_utils.c.


| constants_t get_constants (bool failure_if_not_possible) |
Retrieve the cached OID constants for the current database.
On first call (or after a cache miss) this function looks up the OIDs of all ProvSQL-specific types, functions, and operators in the system catalogs and stores them in a per-database cache. Subsequent calls return the cached values without touching the catalogs.
| failure_if_not_possible | If true, call provsql_error when the ProvSQL schema cannot be found (e.g. the extension is not installed in the current database). If false, return a constants_t with ok==false instead of aborting. |
Returns a constants_t whose ok field is true on success.
Definition at line 544 of file provsql_utils.c.


| extern bool provsql_fetch_ancestry (Oid relid, uint16 *ancestor_n_out, Oid *ancestors_out) |
Raw IPC fetch for the ancestry half (no cache).
Implementation detail of provsql_lookup_ancestry, exposed so the cache layer in provsql_utils.c can reach it. Callers in the planner hot path should go through provsql_lookup_ancestry.
Sends an 'a' message to the background worker and reads back the response. No caching: every call hits the worker. Use provsql_lookup_ancestry for the cached, planner-hot-path variant.
| relid | pg_class OID of the relation to look up. |
| ancestor_n_out | On true return, count of valid entries. |
| ancestors_out | On true return, the ancestor OIDs (caller buffer of PROVSQL_TABLE_INFO_MAX_ANCESTORS). |
Returns true if the worker returned a non-zero ancestor count; false otherwise (no record, or empty ancestor set).
Definition at line 884 of file provsql_mmap.c.


| extern bool provsql_fetch_table_info (Oid relid, ProvenanceTableInfo *out) |
Raw IPC fetch (no cache).
Implementation detail of provsql_lookup_table_info, exposed only so the cache layer in provsql_utils.c can reach it. Callers in the planner hot path should go through provsql_lookup_table_info.
Sends an 's' message to the background worker and reads back the response. No caching: every call hits the worker. Use provsql_lookup_table_info() for the cached, planner-hot-path variant.
| relid | pg_class OID of the relation to look up. |
| out | On success, filled with the stored record. |
Returns true if the worker returned a record, false otherwise.
Definition at line 680 of file provsql_mmap.c.


| const char * provsql_kcmcp_managed_endpoint (void) |
Read the live endpoint of the managed KCMCP server from shared memory (e.g. "unix:/tmp/..."), or an empty string when none is running.
Definition at line 66 of file kcmcp_supervisor.c.


| extern bool provsql_lookup_ancestry (Oid relid, uint16 *ancestor_n_out, Oid *ancestors_out) |
Look up the base-ancestor set of a tracked relation.
The result is cached per backend, with misses served over IPC. Returns true, together with the ancestor set, when relid is tracked and the registry has a non-empty entry for it. Returns false either when relid has no metadata record at all (the relation was never run through add_provenance / repair_key) or when the record exists but ancestor_n == 0 (the CTAS hook hasn't populated the lineage yet, or the registry was explicitly cleared). The two failure modes share the false return because both make the safe-query rewriter take the conservative refuse path; there is no use case for treating them differently.
Backed by the same per-backend cache as provsql_lookup_table_info and invalidated through the same relcache-invalidation callback, so concurrent set_ancestors / add_provenance / repair_key calls in other backends are reflected here without polling.
| relid | pg_class OID of the relation to look up. |
| ancestor_n_out | On true return, count of valid entries in ancestors_out. |
| ancestors_out | On true return, the sorted-deduplicated ancestor OIDs (caller-allocated buffer of PROVSQL_TABLE_INFO_MAX_ANCESTORS Oid). |
Returns true on a non-empty ancestor set; false otherwise.
Definition at line 805 of file provsql_utils.c.


| extern bool provsql_lookup_relation_keys (Oid relid, ProvenanceRelationKeys *out) |
Look up the PRIMARY-KEY and NOT-NULL-UNIQUE keys of a relation with a backend-local cache.
Companion to provsql_lookup_table_info. The cache lives in a separate backing array with its own relcache-invalidation callback so that a future ALTER TABLE that adds / drops a constraint refreshes the next lookup without polling. Returns true when the relation has at least one PRIMARY KEY or NOT-NULL UNIQUE constraint; false otherwise (in which case *out is filled with key_n = 0). Safe to call from the planner hot path.
| relid | pg_class OID of the relation to inspect. |
| out | Filled on return. out->relid is set to relid regardless of return value; out->keys holds up to PROVSQL_KEY_CACHE_MAX_KEYS keys. |
Definition at line 1042 of file provsql_utils.c.


| extern bool provsql_lookup_table_info (Oid relid, ProvenanceTableInfo *out) |
Look up per-table provenance metadata with a backend-local cache.
Resolves to a cached value when the relation's relcache entry has not been invalidated since the last fetch; otherwise issues one 's' IPC to the background worker. The cache is invalidated via CacheRegisterRelcacheCallback, so concurrent add_provenance / repair_key / remove_provenance in other backends are reflected here without polling.
Safe to call from the planner hot path.
| relid | pg_class OID of the relation to look up. |
| out | On true return, filled with the stored record. |
Returns true if a record exists for relid, false otherwise.
Definition at line 677 of file provsql_utils.c.
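The lookup-then-fetch pattern with callback-driven invalidation can be sketched without any PostgreSQL headers. Everything below is hypothetical: SketchInfo stands in for ProvenanceTableInfo, sketch_fake_fetch stands in for the IPC round trip, and sketch_invalidate plays the role of the relcache-invalidation callback. The real cache is described above as a sorted array; a small linear-scan array is used here only to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int SketchOid;
typedef struct { SketchOid relid; int kind; } SketchInfo; /* stand-in record */
typedef bool (*SketchFetch)(SketchOid relid, SketchInfo *out);

#define SKETCH_CACHE_SIZE 16

typedef struct {
    SketchOid  relid;
    bool       valid;    /* false after an invalidation */
    bool       present;  /* cached answer: does a record exist? */
    SketchInfo info;
} SketchSlot;

static SketchSlot sketch_cache[SKETCH_CACHE_SIZE];
static size_t     sketch_cache_n = 0;

/* Stand-in for the uncached worker round trip; counts how often it runs. */
int sketch_fetch_calls = 0;
bool sketch_fake_fetch(SketchOid relid, SketchInfo *out)
{
    sketch_fetch_calls++;
    out->relid = relid;
    out->kind = 1;
    return true;
}

/* Stand-in for the invalidation callback: mark the entry stale. */
void sketch_invalidate(SketchOid relid)
{
    for (size_t i = 0; i < sketch_cache_n; i++)
        if (sketch_cache[i].relid == relid)
            sketch_cache[i].valid = false;
}

/* Cached lookup: at most one fetch per relation between invalidations. */
bool sketch_lookup(SketchOid relid, SketchInfo *out, SketchFetch fetch)
{
    assert(out != NULL && fetch != NULL);
    SketchSlot *slot = NULL;
    for (size_t i = 0; i < sketch_cache_n; i++)
        if (sketch_cache[i].relid == relid) { slot = &sketch_cache[i]; break; }
    if (slot == NULL && sketch_cache_n < SKETCH_CACHE_SIZE) {
        slot = &sketch_cache[sketch_cache_n++];
        slot->relid = relid;
        slot->valid = false;
    }
    if (slot == NULL)
        return fetch(relid, out);          /* cache full: fall back to a raw fetch */
    if (!slot->valid) {
        slot->present = fetch(relid, &slot->info);
        slot->valid = true;
    }
    if (slot->present)
        *out = slot->info;
    return slot->present;
}
```

Caching negative answers as well (present == false) keeps untracked relations from triggering a fetch on every planner call, at the cost of relying on invalidation to notice when they become tracked.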


| void RegisterProvSQLKCMCPWorker (void) |
Register the supervisor background worker that launches and supervises the managed KCMCP server; called from _PG_init alongside the mmap worker.
Definition at line 236 of file kcmcp_supervisor.c.

| extern const char * gate_type_name [] |
Names of gate types.
Definition at line 54 of file provsql_utils.c.
| extern bool provsql_aggtoken_text_as_uuid |
Global flag controlling agg_token text output: when true, agg_token_out emits the underlying provenance UUID instead of the default "value (*)" display string.
Driven by the provsql.aggtoken_text_as_uuid GUC.
| extern bool provsql_boolean_provenance |
Opt-in safe-query optimisation for hierarchical conjunctive queries; see the provsql.boolean_provenance GUC.
When true, the planner is permitted to rewrite self-join-free hierarchical CQs (and independent UCQs) over TID / BID tables to a read-once form whose probability is computable in linear time. The rewriter tags the resulting root gate so that semiring evaluations incompatible with this rewrite refuse to run on the produced circuit.
| extern bool provsql_cmp_probability_evaluation |
Hidden diagnostic flag for the family of closed-form / analytic probability evaluators that resolve gate_cmps inside probability_evaluate; see the provsql.cmp_probability_evaluation GUC.
When on (default), probability_evaluate runs pre-passes that recognise specific gate_cmp shapes and replace each cmp with a Bernoulli gate_input carrying the closed-form probability, bypassing the DNF that provsql_having's enumerate_valid_worlds would otherwise emit. The first implementation in this family is the Poisson-binomial pre-pass for HAVING COUNT(*) op C over distinct gate_input leaves (see CountCmpEvaluator.h); future MIN / MAX / SUM evaluators will gate on the same flag. Off forces every cmp to fall through to the enumeration path. End users have no reason to flip this flag; it exists for developer A/B testing and as a bisection escape valve.
| extern char * provsql_fallback_compiler |
Compiler invoked as the final fallback in BooleanCircuit::makeDD when both interpretAsDD() and the in-process tree-decomposition path fail (the latter typically on treewidth blow-up).
Defaults to "d4"; set by the provsql.fallback_compiler run-time configuration parameter to any compiler accepted by BooleanCircuit::compilation (d4 / d4v2 / c2d / minic2d / dsharp / panini-*).
| extern bool provsql_hybrid_evaluation |
Run the hybrid evaluator (simplifier + per-cmp island decomposer) before dispatching a probability_evaluate query.
Debug-only GUC, hidden from SHOW ALL and from postgresql.conf.sample (registered with GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE). When on (default), probability_evaluate runs the HybridEvaluator simplifier between RangeCheck and AnalyticEvaluator and the per-cmp MC island decomposer after AnalyticEvaluator: gate_arith subtrees are constant-folded and family-closed (normals, Erlang), and residual continuous-island comparators are MC-marginalised into Bernoulli gate_input leaves so the surrounding circuit becomes purely Boolean.
Set to off to bypass both passes: undecidable comparators then fall through to whole-circuit MC (for the monte-carlo method) or raise (for independent / tree-decomposition). End users have no reason to flip this – on is strictly better for them. Exists for developer A/B testing of the analytic path against the raw MC path and as a bisection knob if a closure rule turns out to be unsound on some workload.
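| extern bool provsql_interrupted |
Global variable that becomes true if this particular backend received an interrupt signal.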
| extern bool provsql_inversion_free |
Kill-switch for the inversion-free structured-d-DNNF probability path; see the provsql.inversion_free GUC.
When on (default), probability_evaluate, on a query carrying an inversion-free tractability certificate, tries the structured-d-DNNF builder after independentEvaluation and before tree-decomposition / d4. Off disables only that automatic insertion (for A/B testing); the explicit probability_evaluate(token,'inversion-free') method ignores this flag. The path is self-gating on the certificate, which is attached only to certified queries, so leaving it on is safe.
| extern char * provsql_kcmcp_server |
Launch command for the managed KCMCP knowledge-compiler server, set by the provsql.kcmcp_server run-time configuration parameter (PGC_SIGHUP).
The literal substring "{endpoint}" is replaced by the Unix-socket path the supervisor background worker chooses, e.g. "tdkc --kcmcp unix:{endpoint}". NULL or empty means no managed server is launched.
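The placeholder substitution described above can be sketched in plain C. This is an illustration only: sketch_expand_endpoint is a hypothetical name, replacing just the first occurrence of "{endpoint}" is an assumption (the page does not say how repeated placeholders are handled), and returning NULL for a NULL or empty command mirrors the "no managed server is launched" case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of `command` with the first
 * "{endpoint}" replaced by `endpoint`. Returns NULL when the command is NULL
 * or empty (nothing to launch), when `endpoint` is NULL, or on allocation
 * failure. The caller frees the result. */
char *sketch_expand_endpoint(const char *command, const char *endpoint)
{
    static const char placeholder[] = "{endpoint}";
    const size_t ph_len = sizeof(placeholder) - 1;

    if (command == NULL || command[0] == '\0' || endpoint == NULL)
        return NULL;

    const char *hit = strstr(command, placeholder);
    const size_t cmd_len = strlen(command);
    const size_t ep_len = strlen(endpoint);
    const size_t total = (hit != NULL) ? cmd_len - ph_len + ep_len : cmd_len;

    char *out = malloc(total + 1);
    if (out == NULL)
        return NULL;

    if (hit == NULL) {
        memcpy(out, command, cmd_len + 1);         /* no placeholder: plain copy */
    } else {
        const size_t prefix = (size_t)(hit - command);
        memcpy(out, command, prefix);              /* text before the placeholder */
        memcpy(out + prefix, endpoint, ep_len);    /* the chosen socket path */
        strcpy(out + prefix + ep_len, hit + ph_len); /* text after it */
    }
    assert(strlen(out) == total);
    return out;
}
```

For example, expanding the command "tdkc --kcmcp unix:{endpoint}" from the text with a made-up socket path /tmp/sketch.sock yields "tdkc --kcmcp unix:/tmp/sketch.sock".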
| extern int provsql_monte_carlo_seed |
Seed for the Monte Carlo sampler, set by the provsql.monte_carlo_seed run-time configuration parameter.
-1 (default) means seed from std::random_device for non-deterministic sampling; any other value (including 0) is a literal seed for std::mt19937_64. Used by both the Bernoulli path (BooleanCircuit::monteCarlo) and the continuous path (gate_rv sampling), so a single GUC controls reproducibility end-to-end.
| extern int provsql_rv_mc_samples |
Default sample count for Monte Carlo fallbacks when an analytical evaluator (Expectation, future hybrid evaluator, ...) cannot decompose a sub-circuit structurally.
Unlike probability_evaluate(token, 'monte-carlo', n) where the sample count is an explicit argument, these implicit MC paths have no natural place to take n from.
Set by the provsql.rv_mc_samples run-time configuration parameter; default 10000. Setting it to 0 disables the implicit MC fallback entirely: callers must then raise an exception rather than sampling. Useful for callers that want to guarantee analytical-only evaluation.
| extern bool provsql_simplify_on_load |
When true (default), every GenericCircuit returned by getGenericCircuit is run through the universal cmp-resolution passes (RangeCheck for now, plus any future passes that decide comparators to certain Boolean values).
Decisions become Bernoulli gate_input gates with probability 0 or 1, transparent to every downstream consumer (semiring evaluators, MC, view_circuit, PROV export, etc.). Set provsql.simplify_on_load to off when inspecting a circuit's raw structure (e.g. debugging gate-creation code paths).
| extern char * provsql_tool_search_path |
Colon-separated list of directories prepended to PATH when ProvSQL spawns external tools (d4, c2d, minic2d, dsharp, weightmc, graph-easy), set by the provsql.tool_search_path run-time configuration parameter.
NULL or empty means rely on the server's PATH alone.
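Prepending the configured directories to PATH amounts to a guarded string concatenation. The sketch below assumes nothing about how ProvSQL applies the result when spawning a tool; sketch_build_tool_path is a hypothetical helper, and omitting the separator when either half is empty is a design choice made here so that no stray leading or trailing colon (which a shell would read as the current directory) is produced.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "<tool_search_path>:<server_path>" in a malloc'd
 * buffer. A NULL or empty tool_search_path yields a copy of server_path alone.
 * Returns NULL only on allocation failure. The caller frees the result. */
char *sketch_build_tool_path(const char *tool_search_path, const char *server_path)
{
    const char *dirs = (tool_search_path != NULL) ? tool_search_path : "";
    const char *base = (server_path != NULL) ? server_path : "";
    const size_t dirs_len = strlen(dirs);
    const size_t base_len = strlen(base);
    /* A colon is only needed when both halves are non-empty. */
    const size_t sep_len = (dirs_len > 0 && base_len > 0) ? 1 : 0;
    const size_t total = dirs_len + sep_len + base_len;

    char *out = malloc(total + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, dirs, dirs_len);
    if (sep_len)
        out[dirs_len] = ':';
    memcpy(out + dirs_len + sep_len, base, base_len);
    out[total] = '\0';
    assert(strlen(out) == total);
    return out;
}
```

With the made-up inputs "/opt/d4/bin:/opt/c2d" and "/usr/bin", the helper returns "/opt/d4/bin:/opt/c2d:/usr/bin", so the configured directories are searched first.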
|
extern |