ProvSQL C/C++ API
Adding support for provenance and uncertainty management to PostgreSQL databases
Loading...
Searching...
No Matches
RangeCheck.h
Go to the documentation of this file.
1/**
2 * @file RangeCheck.h
3 * @brief Support-based bound check for continuous-RV comparators.
4 *
5 * Each continuous distribution has a known support: @c uniform(a,b)
6 * on @c [a,b], @c exponential(λ) on @c [0,∞), @c normal on @c ℝ.
7 * Interval arithmetic propagates these supports through @c gate_arith
8 * combinators, giving a per-gate @c [lo,hi] bound on the value the
9 * gate can take in any possible world. When a @c gate_cmp's two
10 * sides have provably-disjoint supports, the comparator's probability
11 * is 0 or 1 exactly – no sampling required.
12 *
13 * The peephole pass walks every @c gate_cmp reachable from a root,
14 * decides probabilities where it can, and replaces decided
15 * comparators by Bernoulli @c gate_input gates via
16 * @c GenericCircuit::resolveCmpToBernoulli. Anything it cannot
17 * decide is left untouched for downstream evaluators (analytic CDFs
18 * where they apply, Monte Carlo otherwise).
19 *
20 * No external dependency.
21 */
22#ifndef PROVSQL_RANGE_CHECK_H
23#define PROVSQL_RANGE_CHECK_H
24
25#include <memory>
26#include <optional>
27#include <utility>
28#include <variant>
29#include <vector>
30
31#include "GenericCircuit.h"
32#include "RandomVariable.h"
33
34namespace provsql {
35
36/**
37 * @brief Compute the @c [lo, hi] support interval of a scalar
38 * sub-circuit rooted at @p root.
39 *
40 * Same interval-arithmetic propagation @c runRangeCheck uses
41 * internally, exposed for the SQL @c support() function:
42 * - @c gate_value: point @c [c, c].
43 * - @c gate_rv: distribution support (uniform exact, exponential
44 * on @c [0, +∞), normal on @c (-∞, +∞)).
45 * - @c gate_arith: propagated through @c +, @c −, @c ×, @c /, unary
46 * @c −.
47 *
48 * Anything else collapses to the conservative all-real interval
49 * @c (-∞, +∞). Never throws on unrecognised gates -- callers receive
50 * the wide interval instead, which is the right semantic for "we
51 * cannot prove a tighter bound".
52 *
53 * When @p event_root is set, the returned interval is the
54 * intersection of the unconditional support with the per-RV
55 * constraints implied by the AND-conjunct chain rooted at the event
56 * (`rv op c` cmps over @p root collected via the same walker
57 * @c runRangeCheck uses for joint feasibility). Constraints we
58 * cannot interpret are silently skipped: the result is then a
59 * conservative superset of the true conditional support, never a
60 * subset.
61 */
62std::pair<double, double>
63compute_support(const GenericCircuit &gc, gate_t root,
64 std::optional<gate_t> event_root = std::nullopt);
65
66/**
67 * @brief Walk @p event_root collecting `rv op c` constraints on @p target_rv.
68 *
69 * Descends through AND-conjunct factors (@c gate_times, @c gate_one,
70 * Boolean leaves whose footprint doesn't include @p target_rv -- these
71 * are independent of the RV and contribute no truncation) collecting
72 * every @c gate_cmp interpretable as `target_rv op c` for a constant
73 * @c c, and intersects them into a running interval seeded with the
74 * unconditional support of @p target_rv.
75 *
76 * Returns the resulting interval as @c (lo, hi), or @c std::nullopt
77 * if the walk found a structure that defeats the recognisers (a
78 * @c gate_plus / @c gate_monus disjunction over the chain, a cmp
79 * shape other than `rv op const`, ...). Callers treat
80 * @c std::nullopt as "fall back to the unconditional case" --
81 * sound for support and MC fallback for moments.
82 *
83 * Constraints on RVs other than @p target_rv are ignored; they affect
84 * @c P(A) but not the truncation of the target's distribution.
85 */
86std::optional<std::pair<double, double>>
87collectRvConstraints(const GenericCircuit &gc, gate_t event_root,
88 gate_t target_rv);
89
90/**
91 * @brief Detection result for a closed-form, optionally-truncated
92 * single-RV shape.
93 *
94 * Carries the parsed distribution and the @c [lo, hi] support after
95 * intersection with the conditioning event (or the natural support
96 * when @c truncated == @c false). Either bound may be infinite when
97 * the RV's natural support or the event leaves the corresponding
98 * side unbounded (e.g. @c Normal | @c X > 0 yields
99 * @c (0, +&infin;)).
100 */
102 DistributionSpec spec; ///< Parsed kind + parameters
103 double lo; ///< Lower bound (-INF if unbounded)
104 double hi; ///< Upper bound (+INF if unbounded)
105 bool truncated; ///< True iff the bounds came from a
106 ///< non-trivial @c event_root
107};
108
109/**
110 * @brief Detect a closed-form, optionally-truncated single-RV shape.
111 *
112 * Common shape-detection helper shared by every closed-form
113 * single-RV consumer:
114 * - @c try_truncated_closed_form (truncated moments,
115 * @c Expectation.cpp);
116 * - @c try_truncated_closed_form_sample (rejection-free sampling,
117 * @c MonteCarloSampler.cpp);
118 * - @c rv_analytical_curves (PDF / CDF overlay,
119 * @c RvAnalyticalCurves.cpp).
120 *
121 * Returns @c std::nullopt when the shape is not tractable:
122 * - @p root is not a bare @c gate_rv;
123 * - the gate's @c extra does not parse as a @c DistributionSpec;
124 * - @p event_root resolves to @c gate_zero (event already decided
125 * infeasible by @c runRangeCheck);
126 * - @c collectRvConstraints fails (incomplete walk);
127 * - the resulting interval is empty or degenerate
128 * (@c lo @>= @c hi).
129 *
130 * When @p event_root is omitted or resolves to @c gate_one, the
131 * returned @c TruncatedSingleRv carries the RV's natural support and
132 * @c truncated = @c false; callers that don't distinguish the
133 * conditional and unconditional cases (e.g. the analytical-curves
134 * x-range chooser) can read uniformly off the result.
135 */
136std::optional<TruncatedSingleRv>
138 std::optional<gate_t> event_root);
139
140/**
141 * @brief True iff the conditioning event is provably infeasible for
142 * a bare @c gate_rv root.
143 *
144 * Distinguishes "event proved infeasible" (event resolves to
145 * @c gate_zero, or @c collectRvConstraints intersects to an empty
146 * interval) from "shape unsupported by @c matchTruncatedSingleRv"
147 * (return @c std::nullopt that just means "fall back to MC").
148 *
149 * Used by the conditional-moment dispatcher to raise an explicit
150 * infeasibility error before falling through to MC rejection — MC
151 * would still detect the same condition by accepting 0 of N samples,
152 * but the closed-form predicate spots it without ten thousand
153 * wasted draws and emits a tighter message.
154 *
155 * Returns @c false for non-@c gate_rv roots and for roots whose
156 * event/support pair is not provably infeasible by this cheap pass
157 * (the caller can still proceed to MC).
158 */
160 std::optional<gate_t> event_root);
161
162/**
163 * @brief Point mass at a finite scalar value (a @c gate_value root, or
164 * an @c as_random(c) leaf surfaced as a @c gate_value).
165 *
166 * Unconditional only; @c matchClosedFormDistribution bails when an
167 * event_root is supplied for a Dirac root. Carries no probability
168 * field: a Dirac root standalone has total mass 1, and inside a
169 * @c BernoulliMixtureShape the parent's @c p / 1-p weight is applied
170 * by the curve renderer.
171 */
173 double value;
174};
175
176/**
177 * @brief Categorical distribution over a finite outcome set.
178 *
179 * Matches a categorical-form @c gate_mixture
180 * (@c isCategoricalMixture): the key @c gate_input is ignored (its
181 * own probability is irrelevant — the mass is on the @c mulinputs),
182 * each remaining wire contributes one @c {value, prob} pair.
183 *
184 * Unconditional only; conditioning on a categorical's value is
185 * handled upstream by @c RangeCheck folding the cmp into a Bernoulli,
186 * leaving no event_root for this matcher to see.
187 */
189 std::vector<std::pair<double, double>> outcomes; ///< (value, mass) pairs
190};
191
192struct BernoulliMixtureShape; // forward (variant cycle)
193
194/**
195 * @brief One of the closed-form shapes the analytical-curves payload
196 * can render: bare RV (continuous PDF/CDF), Dirac (point mass),
197 * categorical (multiple point masses), or Bernoulli mixture
198 * of any two of the above.
199 */
204
205/**
206 * @brief Bernoulli mixture (@c gate_mixture with the
207 * @c [p_token, x_token, y_token] shape).
208 *
209 * @c p is @c getProb(p_token) when @c p_token is a bare
210 * @c gate_input; a compound Boolean @c p_token bails (its probability
211 * would require a recursive @c probability_evaluate call, out of
212 * scope for the static predicate). @c left and @c right recursively
213 * match the two arms; either may itself be a mixture, a bare RV, a
214 * Dirac, or a categorical, but always unconditional (truncation
215 * under a mixture is deferred until a real query needs it).
216 */
218 double p;
219 std::shared_ptr<ClosedFormShape> left;
220 std::shared_ptr<ClosedFormShape> right;
221};
222
223/**
224 * @brief Detect any of the closed-form shapes supported by
225 * @c rv_analytical_curves.
226 *
227 * Generalisation of @c matchTruncatedSingleRv that adds Bernoulli
228 * mixtures, categoricals, and Dirac (scalar @c gate_value) roots.
229 * Conditioning (@p event_root) is honoured for bare RV roots only;
230 * Dirac / categorical / mixture roots bail when the event isn't
231 * @c gate_one (the post-load-simplification "always true" default).
232 *
233 * Returns @c std::nullopt when none of the supported shapes match;
234 * callers fall back to histogram-only rendering.
235 */
236std::optional<ClosedFormShape>
238 std::optional<gate_t> event_root);
239
240/**
241 * @brief Run the support-based pruning pass over @p gc.
242 *
243 * For every @c gate_cmp in the circuit, computes the interval of
244 * @c (lhs - rhs) via interval arithmetic over @c gate_value,
245 * @c gate_rv, and @c gate_arith leaves; when the interval is
246 * provably above, below, or disjoint from zero, replaces the
247 * @c gate_cmp by a Bernoulli @c gate_input carrying the decided
248 * probability (0 or 1).
249 *
250 * Comparators whose interval is inconclusive (overlaps zero) are
251 * left intact for downstream passes.
252 *
253 * Iterates every gate (rather than walking from a specific root) so
254 * that a single sweep at @c getGenericCircuit time benefits every
255 * downstream consumer regardless of which sub-circuit they later
256 * traverse.
257 *
258 * @param gc Circuit to mutate in place.
259 * @return Number of comparators resolved by this pass.
260 */
261unsigned runRangeCheck(GenericCircuit &gc);
262
263} // namespace provsql
264
265#endif // PROVSQL_RANGE_CHECK_H
gate_t
Strongly-typed gate identifier.
Definition Circuit.h:49
Semiring-agnostic in-memory provenance circuit.
Continuous random-variable helpers (distribution parsing, moments).
In-memory provenance circuit with semiring-generic evaluation.
std::pair< double, double > compute_support(const GenericCircuit &gc, gate_t root, std::optional< gate_t > event_root)
Compute the [lo, hi] support interval of a scalar sub-circuit rooted at root.
std::optional< ClosedFormShape > matchClosedFormDistribution(const GenericCircuit &gc, gate_t root, std::optional< gate_t > event_root)
Detect any of the closed-form shapes supported by rv_analytical_curves.
unsigned runRangeCheck(GenericCircuit &gc)
Run the support-based pruning pass over gc.
bool eventIsProvablyInfeasible(const GenericCircuit &gc, gate_t root, std::optional< gate_t > event_root)
True iff the conditioning event is provably infeasible for a bare gate_rv root.
std::optional< std::pair< double, double > > collectRvConstraints(const GenericCircuit &gc, gate_t event_root, gate_t target_rv)
Walk event_root collecting rv op c constraints on target_rv.
std::optional< TruncatedSingleRv > matchTruncatedSingleRv(const GenericCircuit &gc, gate_t root, std::optional< gate_t > event_root)
Detect a closed-form, optionally-truncated single-RV shape.
std::variant< TruncatedSingleRv, DiracShape, CategoricalShape, BernoulliMixtureShape > ClosedFormShape
One of the closed-form shapes the analytical-curves payload can render: bare RV (continuous PDF/CDF),...
Definition RangeCheck.h:200
Bernoulli mixture (gate_mixture with the [p_token, x_token, y_token] shape).
Definition RangeCheck.h:217
std::shared_ptr< ClosedFormShape > right
Definition RangeCheck.h:220
std::shared_ptr< ClosedFormShape > left
Definition RangeCheck.h:219
Categorical distribution over a finite outcome set.
Definition RangeCheck.h:188
std::vector< std::pair< double, double > > outcomes
(value, mass) pairs
Definition RangeCheck.h:189
Point mass at a finite scalar value (a gate_value root, or an as_random(c) leaf surfaced as a gate_va...
Definition RangeCheck.h:172
Parsed distribution spec (kind + up to two parameters).
Detection result for a closed-form, optionally-truncated single-RV shape.
Definition RangeCheck.h:101
double lo
Lower bound (-INF if unbounded).
Definition RangeCheck.h:103
DistributionSpec spec
Parsed kind + parameters.
Definition RangeCheck.h:102
double hi
Upper bound (+INF if unbounded).
Definition RangeCheck.h:104
bool truncated
True iff the bounds came from a non-trivial event_root.
Definition RangeCheck.h:105