ProvSQL C/C++ API
Adding support for provenance and uncertainty management to PostgreSQL databases
Loading...
Searching...
No Matches
CountCmpEvaluator.h
Go to the documentation of this file.
1/**
2 * @file CountCmpEvaluator.h
3 * @brief Closed-form Poisson-binomial CDF resolution for HAVING
4 * @c COUNT(*) @c op @c C @c gate_cmps.
5 *
6 * For comparators that reduce to
7 * @c gate_cmp(gate_agg(COUNT, semimod children), gate_value(C)) where
8 * each semimod child's K side is a distinct single @c gate_input
9 * leaf (the first-slice scope: flat-table HAVING COUNT queries), the
10 * comparator's probability is @c Pr(B op C) with @c B Poisson-binomial
11 * over the children's stored marginal probabilities. A partial DP
12 * computed on the smaller side of @c C (either the lower tail
13 * @c [0, C-1] on the original Bernoullis, or the upper tail computed
14 * via the complementary @c [0, N-C] on the inverted Bernoullis
15 * @c Y_i @c = @c 1-X_i) gives the answer in @c O(N x min(C, N-C))
16 * time per cmp, without ever materialising the @c binom(N, C) DNF
17 * that @c provsql_having's @c enumerate_valid_worlds path would
18 * otherwise emit.
19 *
20 * Soundness condition (checked per cmp gate) :
21 * - shape matches `gate_cmp(gate_agg(COUNT, semimod_i(K_i, 1)*), gate_value(C))`
22 * in either order ;
23 * - every K_i is a single @c gate_input leaf (no @c gate_times,
24 * no @c gate_mulinput, no nested arithmetic) ;
25 * - the K_i leaves are pairwise distinct (no shared input across
26 * children) ;
27 * - none of the K_i leaves is reachable from the rest of the
28 * surrounding circuit, i.e. each input appears exactly once and
29 * only inside this cmp's subtree.
30 *
31 * The first two conditions are cheap to check locally. The third
32 * (no outside reachability) is checked once per pass via a
33 * reference-count walk over the whole circuit.
34 *
35 * After replacement the cmp is a Bernoulli @c gate_input carrying
36 * the numeric probability, semantically meaningful only on the
37 * probability path. Like @c runAnalyticEvaluator, the pass runs
38 * inside @c probability_evaluate.cpp, not at load time.
39 *
40 * Empty-group semantics mirror @c count_enum and SQL : worlds with
41 * zero present children are excluded. In particular the
42 * "all-absent" mass @c dp[N][0] is never included regardless of
43 * operator.
44 */
45#ifndef PROVSQL_COUNT_CMP_EVALUATOR_H
46#define PROVSQL_COUNT_CMP_EVALUATOR_H
47
48#include "GenericCircuit.h"
49
50namespace provsql {
51
52/**
53 * @brief Run the Poisson-binomial pre-pass over @p gc.
54 *
55 * For every @c gate_cmp whose shape matches the first-slice scope
56 * (see file docstring), computes the comparator's probability by
57 * Poisson-binomial CDF and replaces the cmp by a Bernoulli
58 * @c gate_input via @c GenericCircuit::resolveCmpToBernoulli.
59 *
60 * @param gc Circuit to mutate in place.
61 * @return Number of comparators resolved by this pass.
62 */
63unsigned runCountCmpEvaluator(GenericCircuit &gc);
64
65} // namespace provsql
66
67#endif // PROVSQL_COUNT_CMP_EVALUATOR_H
Semiring-agnostic in-memory provenance circuit.
unsigned runCountCmpEvaluator(GenericCircuit &gc)
Run the Poisson-binomial pre-pass over gc.