ProvSQL C/C++ API
Adding support for provenance and uncertainty management to PostgreSQL databases
Loading...
Searching...
No Matches
HybridEvaluator.h File Reference

Peephole simplifier for continuous gate_arith sub-circuits. More...

#include "GenericCircuit.h"
Include dependency graph for HybridEvaluator.h:
This graph shows which files directly or indirectly include this file:

Go to the source code of this file.

Namespaces

namespace  provsql

Functions

unsigned provsql::runHybridSimplifier (GenericCircuit &gc)
 Run the peephole simplifier over gc.
unsigned provsql::runConstantFold (GenericCircuit &gc)
 Constant-fold pass over every gate_arith in gc.
unsigned provsql::runHybridDecomposer (GenericCircuit &gc, unsigned samples)
 Marginalise unresolved continuous-island gate_cmp gates into Bernoulli gate_input leaves.

Detailed Description

Peephole simplifier for continuous gate_arith sub-circuits.

The hybrid evaluator owns the simplification + island-decomposition passes invoked between the universal cmp-resolution (RangeCheck) and the probability-specific analytic-CDF resolution (AnalyticEvaluator). This header exposes the simplifier; the island decomposer will be added alongside in the same namespace.

The term peephole comes from compiler engineering (McKeeman, CACM 8(7), 1965): a small sliding window over consecutive instructions / gates, a fixed list of local pattern -> replacement rules, iterated to a fixed point. Each rule below looks at one gate_arith plus its immediate children, never further, which is exactly the peephole scope. Contrast with RangeCheck (global interval- propagation walk) and runHybridDecomposer (union-find over base-RV footprints across the whole circuit).

The simplifier rewrites gate_arith gates in-place using closure rules that preserve the gate's scalar value in every world:

  • Constant folding of gate_arith subtrees over only gate_value leaves — collapses to a single gate_value.
  • Multiplicative zero short-circuit: TIMES with any constant-zero wire becomes gate_value:0, regardless of the other (possibly non-constant) wires.
  • Identity-element drops: remove gate_value:0 wires from a PLUS gate; remove gate_value:1 wires from a TIMES gate.
  • Mixture lift: a PLUS / TIMES over a single gate_mixture child pushes the other wires inside the branches. For the classic 3-wire shape mixture(p, X, Y) each branch becomes a fresh arith over the lifted wires; for the categorical N-wire shape mixture(key, mul_1, ..., mul_n) the constant offset / factor is folded directly into each mulinput's value text (sharing the key keeps the new mixture correlated with the original).
  • PLUS coefficient aggregation: a PLUS whose wires decompose as a_i·Z_i + b_i merges same-Z_i terms by summing coefficients (so X+X folds to 2·X, X-X to 0, etc.) and consolidates the constant offsets.
  • Scalar-times-RV closure: c·X for X a gate_rv whose distribution admits a closed-form scale (Normal, Uniform, Exp, Erlang) folds to a single scaled gate_rv.
  • Normal-family closure: a PLUS over linear combinations of independent normal gate_rv leaves folds to a single normal gate_rv whose mean and variance are the closed-form combinations. The independence test mirrors Expectation's footprint check: each contributing normal must be reached through a disjoint base-RV UUID.
  • Erlang-family closure: a PLUS over k ≥ 2 i.i.d. exponential gate_rv leaves with the same rate λ and pairwise-distinct UUIDs folds to a single Erlang(k, λ) gate_rv.

The rules form a fixed-point loop per gate: after any rewrite succeeds, the gate is re-evaluated under all rules until none fire. Compositions like arith(NEG, arith(PLUS, value, value)) therefore collapse fully in one bottom-up pass.

The simplifier runs in two passes for shared-RV identity preservation: pass 1 applies every rule EXCEPT the scalar-times-RV closure so the aggregator gets to see c·X-shaped wires inside a parent PLUS with their underlying RV identity intact (otherwise a bottom-up fold of the inner TIMES would mint a fresh gate_rv there and decouple it from a sibling reference to the same X). Pass 2 then folds the remaining TIMES wrappers with the scalar-times-RV rule.

Soundness: every rule produces a circuit semantically equivalent to the original in every world (same scalar distribution, same support, same comparator outcomes). The closures merge independent leaves into a new leaf with a fresh distribution; downstream consumers (sampler, RangeCheck, AnalyticEvaluator, Expectation) see the new leaf and use the closed-form distribution machinery as if the user had written it directly.

Operates on the in-memory GenericCircuit only; the persistent mmap store is never mutated. Children that become orphaned by a rewrite are not reaped here.

Gated by the provsql.hybrid_evaluation GUC (default on): probability_evaluate skips the pass entirely when the GUC is off, letting users verify that the analytic path and the whole-circuit MC fallback agree on the same query.

Definition in file HybridEvaluator.h.