Probability distributions over Boolean variables

This file defines the intensional probability semantics underlying ProvSQL's probabilistic query evaluation, following Section IV-D of Sen, Maniu & Senellart, ProvSQL: A General System for Keeping Track of the Provenance and Probability of Data.

Given a finite set X of Boolean variables and an assignment Pr : X → ℚ of probabilities (with values in [0, 1]), we extend Pr to:

a probability distribution over valuations v : X → Bool, assuming the variables are independent: Pr(v) = ∏_{v(x)=⊤} Pr(x) · ∏_{v(x)=⊥} (1 - Pr(x));
a probability of a Boolean function f : BoolFunc X, defined as the sum of Pr(v) over satisfying valuations: Pr(f) = ∑_{v ⊨ f} Pr(v).
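The two definitions above can be checked by brute force on a small example. The following is a minimal Python sketch, not the Lean formalization: the names `val_prob`, `valuations` and `func_prob` are hypothetical stand-ins for `ProbAssignment.valProb` and `ProbAssignment.funcProb`, a valuation is modelled as a dict from variable names to booleans, and a Boolean function as a Python predicate on such dicts. Exact rationals mirror the codomain ℚ.

```python
from fractions import Fraction
from itertools import product

def val_prob(pr, v):
    """Pr(v) under independence: multiply Pr(x) where v(x) is true, 1 - Pr(x) otherwise."""
    p = Fraction(1)
    for x, px in pr.items():
        p *= px if v[x] else 1 - px
    return p

def valuations(variables):
    """Enumerate all 2^|X| valuations as dicts (assumed encoding, for illustration only)."""
    for bits in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, bits))

def func_prob(pr, f):
    """Pr(f): sum of Pr(v) over the valuations v that satisfy f."""
    return sum((val_prob(pr, v) for v in valuations(list(pr)) if f(v)), Fraction(0))

# Two independent variables with Pr(x) = 1/2 and Pr(y) = 1/3.
pr = {"x": Fraction(1, 2), "y": Fraction(1, 3)}
f_or = lambda v: v["x"] or v["y"]

# The valuation probabilities form a distribution: they sum to 1.
total = sum((val_prob(pr, v) for v in valuations(list(pr))), Fraction(0))
p_or = func_prob(pr, f_or)

print(total)  # 1
print(p_or)   # 2/3, i.e. 1 - (1 - 1/2) * (1 - 1/3)
```

The enumeration is exponential in the number of variables, so this only serves to make the definitions concrete on toy inputs.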

This is the foundation for Theorem 12 of the paper (correctness of intensional probabilistic query evaluation), proved below as ProbAssignment.theorem_12: for any non-aggregation query q, any BoolFunc X-instance Î and any tuple t, Pr(t ∈ q(Î)) = Pr(⋁_{(t,α) ∈ ⟪q⟫^Î} α).

Its structural core is randomWorld_evaluateAnnotated, which states that annotated evaluation commutes with random-world projection. This result doubles as the adequacy theorem of the annotated semantics for the full non-aggregation fragment, difference and duplicate elimination included. See the section “The structural commutation theorem” below for how it relates to the ℕ-adequacy theorem of [BCBC+21].
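The statement of Theorem 12 can be illustrated on a toy instance. The sketch below is an assumption-laden Python model, not the formalization: annotations are Python predicates on valuations, `world` is a hypothetical stand-in for random-world projection, and the single hard-coded query is a join followed by a projection (difference and duplicate elimination are not covered). It compares both sides of the equation and checks the commutation property for every valuation.

```python
from fractions import Fraction
from itertools import product

def var(name):
    """Annotation consisting of a single Boolean variable."""
    return lambda v: v[name]

def world(annotated, v):
    """Random-world projection: keep the tuples whose annotation holds under v."""
    return [t for t, alpha in annotated if alpha(v)]

def q_plain(r, s):
    """Projection on the first column of R joined with S, on plain relations."""
    return [(c,) for (c,) in r for (c2, _) in s if c == c2]

def q_annotated(r, s):
    """Same query on annotated relations: the join conjoins the two annotations."""
    return [((c,), (lambda v, a=a, b=b: a(v) and b(v)))
            for (c,), a in r for (c2, _), b in s if c == c2]

def val_prob(pr, v):
    """Pr(v) for independent variables."""
    p = Fraction(1)
    for x, px in pr.items():
        p *= px if v[x] else 1 - px
    return p

def valuations(variables):
    for bits in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, bits))

# Hypothetical instance: every input tuple carries its own variable.
pr = {"x1": Fraction(1, 2), "x2": Fraction(1, 4),
      "y1": Fraction(1, 3), "y2": Fraction(1, 5)}
R = [(("a",), var("x1")), (("b",), var("x2"))]
S = [(("a", 1), var("y1")), (("a", 2), var("y2"))]
t = ("a",)
vals = list(valuations(list(pr)))

# Left-hand side: total probability of the worlds in which t is in the answer.
lhs = sum((val_prob(pr, v) for v in vals
           if t in q_plain(world(R, v), world(S, v))), Fraction(0))

# Right-hand side: probability of the disjunction of the annotations of t.
disj = lambda v: any(alpha(v) for u, alpha in q_annotated(R, S) if u == t)
rhs = sum((val_prob(pr, v) for v in vals if disj(v)), Fraction(0))

# Commutation: projecting the annotated answer equals answering in the projected world.
commutes = all(world(q_annotated(R, S), v) == q_plain(world(R, v), world(S, v))
               for v in vals)

print(lhs, rhs, commutes)  # 7/30 7/30 True
```

Here both sides equal Pr(x1) · (1 − (1 − Pr(y1))(1 − Pr(y2))) = 1/2 · 7/15 = 7/30, since the annotation of t is (x1 ∧ y1) ∨ (x1 ∧ y2).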

Main definitions

ProbAssignment X – a probability assignment to each variable, bundled with 0 ≤ Pr(x) ≤ 1.
ProbAssignment.valProb – Pr(v) for a single valuation v : X → Bool.
ProbAssignment.funcProb – Pr(f) for a Boolean function f : BoolFunc X.

Main results

ProbAssignment.valProb_nonneg, valProb_le_one, sum_valProb_eq_one – basic properties of the valuation distribution.
ProbAssignment.funcProb_zero, funcProb_one, funcProb_nonneg, funcProb_le_one – basic properties of Pr(f).
ProbAssignment.funcProb_congr – pointwise-equal Boolean functions have equal probabilities.
randomWorld_evaluateAnnotated – annotated evaluation commutes with random-world projection (adequacy of the BoolFunc X-annotated semantics).
ProbAssignment.theorem_12, corollary_13 – correctness of intensional probabilistic query evaluation, on the annotated and on the plain rewritten query respectively.

References

Sen, Maniu & Senellart, ProvSQL: A General System for Keeping Track of the Provenance and Probability of Data (Section IV-D)
[BCBC+21] Benzaken, Cohen-Boulakia, Contejean, Keller & Zucchini

Random worlds and the disjunctive tuple annotation

Pointwise meaning of `tupleAnnotation`

Marginal probability and the statement of Theorem 12

Random worlds commute with annotated query evaluation

Helper lemmas: random-world commutes with Multiset operations

Random world commutes with `find`

Diff annotation helper

The structural commutation theorem

Corollary 13: probability via the plain rewritten query
