ProvSQL SQL API
Adding support for provenance and uncertainty management to PostgreSQL databases
Loading...
Searching...
No Matches
Probability and Shapley values

Functions for computing probabilities, expected values, and game-theoretic contribution measures (Shapley/Banzhaf values) from provenance circuits. More...

Functions

DOUBLE PRECISION provsql.probability_evaluate (UUID token, TEXT method=NULL, TEXT arguments=NULL)
 Compute the probability of a provenance token.
DOUBLE PRECISION provsql.expected (ANYELEMENT input, UUID prov=gate_one(), TEXT method=NULL, TEXT arguments=NULL)
 Compute the expected value of a probabilistic scalar.
DOUBLE PRECISION provsql.rv_moment (UUID token, INTEGER k, BOOLEAN central, UUID prov=gate_one())
 Internal: shared C entry point for variance / moment / central_moment.
DOUBLE PRECISION provsql.agg_raw_moment (AGG_TOKEN token, INTEGER k, UUID prov=gate_one(), TEXT method=NULL, TEXT arguments=NULL)
 Compute the raw moment E[X^k | prov] of an AGG_TOKEN aggregate.
DOUBLE PRECISION provsql.variance (ANYELEMENT input, UUID prov=gate_one(), TEXT method=NULL, TEXT arguments=NULL)
 Compute the variance Var[X | prov] of a probabilistic scalar.
DOUBLE PRECISION provsql.moment (ANYELEMENT input, INTEGER k, UUID prov=gate_one(), TEXT method=NULL, TEXT arguments=NULL)
 Compute the raw moment E[X^k | prov] of a probabilistic scalar.
VOID provsql.rv_support (UUID token, UUID prov=gate_one(), float8 &lo, float8 &hi)
 Internal: rv-side support computation.
VOID provsql.support (ANYELEMENT input, UUID prov=gate_one(), TEXT method=NULL, TEXT arguments=NULL, float8 &lo, float8 &hi)
 Compute the support interval [lo, hi] of a probabilistic (or deterministic) scalar.
DOUBLE PRECISION provsql.central_moment (ANYELEMENT input, INTEGER k, UUID prov=gate_one(), TEXT method=NULL, TEXT arguments=NULL)
 Compute the central moment E[(X - E[X|prov])^k | prov].
DOUBLE PRECISION provsql.shapley (UUID token, UUID variable, TEXT method=NULL, TEXT arguments=NULL, BOOLEAN banzhaf='f')
 Compute the Shapley value of an input variable.
SETOF RECORD provsql.shapley_all_vars (UUID token, TEXT method=NULL, TEXT arguments=NULL, BOOLEAN banzhaf='f', OUT UUID variable, OUT DOUBLE PRECISIONvalue)
 Compute Shapley values for all input variables at once.
DOUBLE PRECISION provsql.banzhaf (UUID token, UUID variable, TEXT method=NULL, TEXT arguments=NULL)
 Compute the Banzhaf power index of an input variable.
SETOF RECORD provsql.banzhaf_all_vars (UUID token, TEXT method=NULL, TEXT arguments=NULL, OUT UUID variable, OUT DOUBLE PRECISIONvalue)
 Compute Banzhaf power indices for all input variables at once.

Detailed Description

Functions for computing probabilities, expected values, and game-theoretic contribution measures (Shapley/Banzhaf values) from provenance circuits.

Function Documentation

◆ agg_raw_moment()

DOUBLE PRECISION provsql.agg_raw_moment ( AGG_TOKEN token,
INTEGER k,
UUID prov = gate_one(),
TEXT method = NULL,
TEXT arguments = NULL )

Compute the raw moment E[X^k | prov] of an AGG_TOKEN aggregate.

Sister of expected() for the AGG_TOKEN side of the polymorphic moment / variance / central_moment dispatch. Supports the same aggregation functions as expected: SUM (which COUNT normalises to at the gate level via Aggregation.cpp:322), MIN, and MAX.

Strategy:

  • SUM: with X = Σᵢ Iᵢ·vᵢ (Iᵢ the per-row inclusion indicator, vᵢ the row's value), expanding X^k and taking expectation gives \(E[X^k] = \sum_{(i_1,\ldots,i_k) \in \{1..n\}^k} v_{i_1}\cdots v_{i_k} \cdot P(\bigwedge_{i \in \TEXT{distinct}(i_1..i_k)} I_i)\). We enumerate the \(n^k\) tuples, conjoin the distinct inclusion tokens (and prov when conditioning), and evaluate the probability via probability_evaluate.
  • MIN / MAX: replace v with v^k in the rank-based enumeration that expected already uses; MAX is handled by sign-flipping per the existing trick (negate vs. rerank), with the outer multiplier becoming \((-1)^k\) instead of just \(-1\).

Cost: SUM is \(O(n^k)\) probability evaluations – tractable for small k or small n; for larger sizes, prefer reaching for the sampler. MIN / MAX stay linear in n.

Source code
provsql.sql line 3182

◆ banzhaf()

DOUBLE PRECISION provsql.banzhaf ( UUID token,
UUID variable,
TEXT method = NULL,
TEXT arguments = NULL )

Compute the Banzhaf power index of an input variable.

Source code
provsql.sql line 3689

◆ banzhaf_all_vars()

SETOF RECORD provsql.banzhaf_all_vars ( UUID token,
TEXT method = NULL,
TEXT arguments = NULL,
OUT UUID variable,
OUT DOUBLE PRECISION value )

Compute Banzhaf power indices for all input variables at once.

◆ central_moment()

DOUBLE PRECISION provsql.central_moment ( ANYELEMENT input,
INTEGER k,
UUID prov = gate_one(),
TEXT method = NULL,
TEXT arguments = NULL )

Compute the central moment E[(X - E[X|prov])^k | prov].

k = 0 returns 1; k = 1 returns 0; k = 2 is equivalent to variance(input, prov, ...). Polymorphic dispatcher: routes random_variable through rv_moment, and AGG_TOKEN through the binomial expansion \(E[(X-\mu)^k|A] = \sum_{i=0}^{k} \binom{k}{i} (-\mu)^{k-i} E[X^i|A]\) with \(\mu = E[X|A]\), where each \(E[X^i|A]\) comes from agg_raw_moment.

Source code
provsql.sql line 3592

◆ expected()

DOUBLE PRECISION provsql.expected ( ANYELEMENT input,
UUID prov = gate_one(),
TEXT method = NULL,
TEXT arguments = NULL )

Compute the expected value of a probabilistic scalar.

Computes E[input | prov] for either an AGG_TOKEN (discrete SUM/MIN/MAX aggregation over Boolean-input gate_agg circuits, with prov as the Boolean conditioning event) or a random_variable (continuous distribution, traversed by the analytical / MC evaluator from Expectation.cpp).

Implementation: thin wrapper over moment(input, 1, prov, method, arguments). Both branches converge on the same machinery; the AGG_TOKEN side computes E[X] as the \(k=1\) instance of the \(n^k\)-tuple enumeration in agg_raw_moment, the random_variable side calls compute_expectation through rv_moment.

Parameters
inputaggregate expression or random variable to compute E[·] of
provprovenance condition (defaults to gate_one(), i.e., unconditional)
methodknowledge compilation method (AGG_TOKEN path only)
argumentsadditional arguments for the method (AGG_TOKEN path only)
Source code
provsql.sql line 3131

◆ moment()

DOUBLE PRECISION provsql.moment ( ANYELEMENT input,
INTEGER k,
UUID prov = gate_one(),
TEXT method = NULL,
TEXT arguments = NULL )

Compute the raw moment E[X^k | prov] of a probabilistic scalar.

k must be a non-negative INTEGER. k = 0 returns 1; k = 1 is equivalent to expected(input). Polymorphic dispatcher: routes random_variable through rv_moment (analytical / MC) and AGG_TOKEN through agg_raw_moment (SUM via tuple enumeration, MIN / MAX via rank enumeration).

Source code
provsql.sql line 3395

◆ probability_evaluate()

DOUBLE PRECISION provsql.probability_evaluate ( UUID token,
TEXT method = NULL,
TEXT arguments = NULL )

Compute the probability of a provenance token.

Compiles the provenance circuit to d-DNNF and evaluates the probability. The compilation method can be selected explicitly.

Parameters
tokenprovenance token to evaluate
methodknowledge compilation method (NULL for default)
argumentsadditional arguments for the method
Source code
provsql.sql line 3103

◆ rv_moment()

DOUBLE PRECISION provsql.rv_moment ( UUID token,
INTEGER k,
BOOLEAN central,
UUID prov = gate_one() )

Internal: shared C entry point for variance / moment / central_moment.

The expected() SQL function reaches the Expectation evaluator through provenance_evaluate_compiled(..., 'expectation', ...). The variance / raw-moment / central-moment SQL functions need an extra k INTEGER argument that does not fit that dispatcher's signature, so they go through this dedicated entry point. Returns E[X^k] when central is FALSE, or E[(X - E[X])^k] when TRUE.

Source code
provsql.sql line 3150

◆ rv_support()

VOID provsql.rv_support ( UUID token,
UUID prov = gate_one(),
float8 & lo,
float8 & hi )

Internal: rv-side support computation.

Lifts provsql.compute_support out of RangeCheck.cpp – the same interval-arithmetic propagation runRangeCheck uses to decide gate_cmps. Returns [-Infinity, +Infinity] when the tightest bound is the conservative all-real interval (e.g. for a normal RV, or any sub-circuit that mixes a normal in).

Source code
provsql.sql line 3430

◆ shapley()

DOUBLE PRECISION provsql.shapley ( UUID token,
UUID variable,
TEXT method = NULL,
TEXT arguments = NULL,
BOOLEAN banzhaf = 'f' )

Compute the Shapley value of an input variable.

Measures the contribution of a specific input variable to the truth of a provenance expression, using game-theoretic Shapley values.

Parameters
tokenprovenance token to evaluate
variableUUID of the input variable
methodknowledge compilation method
argumentsadditional arguments for the method
banzhafif true, compute the Banzhaf value instead
Source code
provsql.sql line 3667

◆ shapley_all_vars()

SETOF RECORD provsql.shapley_all_vars ( UUID token,
TEXT method = NULL,
TEXT arguments = NULL,
BOOLEAN banzhaf = 'f',
OUT UUID variable,
OUT DOUBLE PRECISION value )

Compute Shapley values for all input variables at once.

◆ support()

VOID provsql.support ( ANYELEMENT input,
UUID prov = gate_one(),
TEXT method = NULL,
TEXT arguments = NULL,
float8 & lo,
float8 & hi )

Compute the support interval [lo, hi] of a probabilistic (or deterministic) scalar.

Polymorphic dispatcher mirroring expected / variance / moment / central_moment, with two extra "free" branches:

  • Plain NUMERIC (smallint / INTEGER / bigint / NUMERIC / real / double precision): degenerate point support \([c, c]\). Lets callers ask for the support of a literal without round-tripping through as_random.
  • random_variable / bare UUID (any provenance gate token, including the implicit random_variable -> UUID cast): routes to rv_support, which propagates distribution supports (uniform exact, exponential [0,+∞), normal (-∞,+∞)) through gate_arith via interval arithmetic. gate_value gives the same \([c, c]\) point support as the NUMERIC branch; any non-scalar gate (Boolean gates, aggregates, ...) safely falls back to the conservative all-real interval without raising. Conditioning on prov is not yet supported.
  • AGG_TOKEN: closed-form per aggregation function:
    • SUM : \([\sum_i \min(0,v_i), \sum_i \max(0,v_i)]\) (every row is independently in or out of the included set; the extreme SUMs are reached by including only positive or only negative-valued rows).
    • MIN : \([\min_i v_i, \max_i v_i]\) in the non-empty subsets, plus +Infinity if the empty subset has positive probability under prov.
    • MAX : symmetric – -Infinity if empty has positive probability under prov, otherwise min_i v_i; hi is always max_i v_i.

Other aggregation functions raise.

Returns the composite RECORD (lo, hi) via the function's OUT parameters, with -Infinity / +Infinity marking unbounded ends.

Source code
provsql.sql line 3474

◆ variance()

DOUBLE PRECISION provsql.variance ( ANYELEMENT input,
UUID prov = gate_one(),
TEXT method = NULL,
TEXT arguments = NULL )

Compute the variance Var[X | prov] of a probabilistic scalar.

Polymorphic dispatcher that mirrors expected: random_variable inputs go through the analytical / MC evaluator (rv_moment(UUID, 2, true)); AGG_TOKEN inputs go through the agg_raw_moment helper, computing \(\mathrm{Var}[X|A] = E[X^2|A] - E[X|A]^2\). Conditioning on prov is supported for AGG_TOKEN (matching expected) but not yet for random_variable.

Source code
provsql.sql line 3345