ProvSQL C/C++ API
Adding support for provenance and uncertainty management to PostgreSQL databases
Loading...
Searching...
No Matches
HybridEvaluator.h
Go to the documentation of this file.
1/**
2 * @file HybridEvaluator.h
3 * @brief Peephole simplifier for continuous @c gate_arith sub-circuits.
4 *
5 * The hybrid evaluator owns the simplification + island-decomposition
6 * passes invoked between the universal cmp-resolution (@c RangeCheck)
7 * and the probability-specific analytic-CDF resolution
8 * (@c AnalyticEvaluator). This header exposes the simplifier; the
9 * island decomposer will be added alongside in the same namespace.
10 *
11 * The term @e peephole comes from compiler engineering (McKeeman, CACM
12 * 8(7), 1965): a small sliding window over consecutive instructions /
13 * gates, a fixed list of local pattern -> replacement rules, iterated
14 * to a fixed point. Each rule below looks at one @c gate_arith plus
15 * its immediate children, never further, which is exactly the
16 * peephole scope. Contrast with @c RangeCheck (global interval-
17 * propagation walk) and @c runHybridDecomposer (union-find over
18 * base-RV footprints across the whole circuit).
19 *
20 * The simplifier rewrites @c gate_arith gates in-place using closure
21 * rules that preserve the gate's scalar value in every world:
22 * - Constant folding of @c gate_arith subtrees over only @c gate_value
23 * leaves — collapses to a single @c gate_value.
24 * - Multiplicative zero short-circuit: @c TIMES with any constant-zero
25 * wire becomes @c gate_value:0, regardless of the other (possibly
26 * non-constant) wires.
27 * - Identity-element drops: remove @c gate_value:0 wires from a
28 * @c PLUS gate; remove @c gate_value:1 wires from a @c TIMES gate.
29 * - Mixture lift: a @c PLUS / @c TIMES over a single @c gate_mixture
30 * child pushes the other wires inside the branches. For the classic
31 * 3-wire shape <tt>mixture(p, X, Y)</tt> each branch becomes a fresh
32 * arith over the lifted wires; for the categorical N-wire shape
33 * <tt>mixture(key, mul_1, ..., mul_n)</tt> the constant offset / factor
34 * is folded directly into each mulinput's value text (sharing the key
35 * keeps the new mixture correlated with the original).
36 * - PLUS coefficient aggregation: a @c PLUS whose wires decompose as
37 * <tt>a_i·Z_i + b_i</tt> merges same-@c Z_i terms by summing
38 * coefficients (so @c X+X folds to @c 2·X, @c X-X to @c 0, etc.) and
39 * consolidates the constant offsets.
40 * - Scalar-times-RV closure: <tt>c·X</tt> for @c X a @c gate_rv whose
41 * distribution admits a closed-form scale (Normal, Uniform, Exp,
42 * Erlang) folds to a single scaled @c gate_rv.
43 * - Normal-family closure: a @c PLUS over linear combinations of
44 * independent normal @c gate_rv leaves folds to a single normal
45 * @c gate_rv whose mean and variance are the closed-form
46 * combinations. The independence test mirrors @c Expectation's
47 * footprint check: each contributing normal must be reached through
48 * a disjoint base-RV UUID.
49 * - Erlang-family closure: a @c PLUS over k ≥ 2 i.i.d. exponential
50 * @c gate_rv leaves with the same rate λ and pairwise-distinct
51 * UUIDs folds to a single Erlang(k, λ) @c gate_rv.
52 *
53 * The rules form a fixed-point loop per gate: after any rewrite
54 * succeeds, the gate is re-evaluated under all rules until none fire.
55 * Compositions like <tt>arith(NEG, arith(PLUS, value, value))</tt>
56 * therefore collapse fully in one bottom-up pass.
57 *
58 * The simplifier runs in two passes for shared-RV identity preservation:
59 * pass 1 applies every rule EXCEPT the scalar-times-RV closure so the
60 * aggregator gets to see @c c·X-shaped wires inside a parent PLUS with
61 * their underlying RV identity intact (otherwise a bottom-up fold of
62 * the inner TIMES would mint a fresh @c gate_rv there and decouple it
63 * from a sibling reference to the same @c X). Pass 2 then folds the
64 * remaining @c TIMES wrappers with the scalar-times-RV rule.
65 *
66 * Soundness: every rule produces a circuit semantically equivalent to
67 * the original in every world (same scalar distribution, same support,
68 * same comparator outcomes). The closures merge independent leaves
69 * into a new leaf with a fresh distribution; downstream consumers
70 * (sampler, RangeCheck, AnalyticEvaluator, Expectation) see the new
71 * leaf and use the closed-form distribution machinery as if the user
72 * had written it directly.
73 *
74 * Operates on the in-memory @c GenericCircuit only; the persistent
75 * mmap store is never mutated. Children that become orphaned by a
76 * rewrite are not reaped here.
77 *
78 * Gated by the @c provsql.hybrid_evaluation GUC (default on):
79 * @c probability_evaluate skips the pass entirely when the GUC is off,
80 * letting users verify that the analytic path and the whole-circuit
81 * MC fallback agree on the same query.
82 */
83#ifndef PROVSQL_HYBRID_EVALUATOR_H
84#define PROVSQL_HYBRID_EVALUATOR_H
85
86#include "GenericCircuit.h"
87
88namespace provsql {
89
90/**
91 * @brief Run the peephole simplifier over @p gc.
92 *
93 * Visits every gate in post-order and applies the closure rules
94 * described in the header comment until a fixed point is reached.
95 *
96 * @param gc Circuit to mutate in place.
97 * @return Number of gate rewrites performed by this pass.
98 */
99unsigned runHybridSimplifier(GenericCircuit &gc);
100
101/**
102 * @brief Constant-fold pass over every @c gate_arith in @p gc.
103 *
104 * Walks the circuit bottom-up and replaces any @c gate_arith whose
105 * children all evaluate to scalar constants with the equivalent
106 * @c gate_value (e.g. @c arith(NEG, value:2) becomes @c value:-2,
107 * @c arith(PLUS, value:1, value:2) becomes @c value:3).
108 *
109 * Strictly a subset of @c runHybridSimplifier (only @c try_eval_constant
110 * fires; no family closures, identity drops, or mixture lifts), and
111 * therefore safe to run at load time alongside @c runRangeCheck and
112 * @c foldSemiringIdentities: the resulting @c gate_value gates carry
113 * no random identity, so no consumer's shared-RV coupling is broken
114 * by the rewrite. The family closures stay behind the separate
115 * @c hybrid_evaluation GUC because they replace a multi-leaf subtree
116 * with a fresh @c gate_rv UUID and would decouple shared base RVs
117 * that other parts of the circuit reference.
118 *
119 * Lifts the @c -c::random_variable parser quirk (which builds an
120 * @c arith(NEG, value:c) gate rather than @c value:-c) into a clean
121 * @c gate_value before downstream consumers like
122 * @c collectRvConstraints / @c asRvVsConstCmp inspect the circuit.
123 *
124 * @return Number of gates rewritten.
125 */
126unsigned runConstantFold(GenericCircuit &gc);
127
128/**
129 * @brief Marginalise unresolved continuous-island @c gate_cmp gates
130 * into Bernoulli @c gate_input leaves.
131 *
132 * Runs after @c RangeCheck, the simplifier (@c runHybridSimplifier),
133 * and @c AnalyticEvaluator have done what they can. Picks up the
134 * residual comparators whose two sides are an entirely continuous
135 * island (subtree of @c gate_value, @c gate_rv, @c gate_arith with
136 * no Boolean structure underneath) but whose specific shape is not
137 * one the analytic CDF resolver handles &mdash; e.g.
138 * <tt>Normal + Uniform &gt; 0</tt>, heterogeneous-rate sums of
139 * exponentials, or other compositions the simplifier could not fold
140 * to a bare distribution leaf.
141 *
142 * Each qualifying comparator is marginalised by drawing @p samples
143 * worlds and applying the comparator scalar-by-scalar; the empirical
144 * probability replaces the @c gate_cmp via
145 * @c resolveCmpToBernoulli. The circuit downstream becomes purely
146 * Boolean, so the existing @c independent / @c tree-decomposition /
147 * compilation methods become available on circuits that would
148 * otherwise have to fall through to whole-circuit MC.
149 *
150 * **Singleton groups** are marginalised into a single
151 * @c gate_input via @c GenericCircuit::resolveCmpToBernoulli.
152 *
153 * **Multi-cmp shared-island groups** (k comparators sharing one or
154 * more base @c gate_rv leaves, detected via pairwise footprint
155 * overlap with union-find) are resolved by inlining a 2^k joint
156 * distribution table:
157 * - One anonymous @c gate_input acts as the block key.
158 * - One @c gate_mulinput per joint outcome with positive probability,
159 * all sharing the key, carries the joint mass (mutually-exclusive
160 * block).
161 * - Each comparator is rewritten as @c gate_plus over the mulinputs
162 * whose joint outcome word has the comparator's bit set.
163 * The downstream OR over the rewritten comparators thereby observes
164 * the dependent joint distribution: mulinputs across comparators
165 * dedup at OR sites in
166 * @c BooleanCircuit::independentEvaluationInternal (or are
167 * Bayesian-tree-rewritten by @c rewriteMultivaluedGates before
168 * @c tree-decomposition / @c monte-carlo / external compilers).
169 * Groups with k > @c JOINT_TABLE_K_MAX (currently 8, i.e. 256
170 * outcomes) fall through to whole-circuit MC to keep the
171 * materialisation bounded.
172 *
173 * @param gc Circuit to mutate in place.
174 * @param samples Number of MC iterations used per marginalisation.
175 * Callers typically pass @c provsql_rv_mc_samples.
176 * @return Number of comparators resolved by this pass.
177 */
178unsigned runHybridDecomposer(GenericCircuit &gc, unsigned samples);
179
180} // namespace provsql
181
182#endif // PROVSQL_HYBRID_EVALUATOR_H
Semiring-agnostic in-memory provenance circuit.
unsigned runConstantFold(GenericCircuit &gc)
Constant-fold pass over every gate_arith in gc.
unsigned runHybridSimplifier(GenericCircuit &gc)
Run the peephole simplifier over gc.
unsigned runHybridDecomposer(GenericCircuit &gc, unsigned samples)
Marginalise unresolved continuous-island gate_cmp gates into Bernoulli gate_input leaves.