Algebraic identities behind HAVING (count) provenance #
This file gathers query-free algebraic identities, in an arbitrary commutative
m-semiring, that underpin the correctness of the possible-world semantics for
HAVING (count op C) predicates.
For a finite ambient set U : Finset ι and a family α : ι → K, we define
A α W := ∏ x ∈ W, α x,T α U W := A α W ⊖ ⊕_{x ∈ U \ W} A α (W ∪ {x}),S α U C := ⊕_{W ⊆ U, |W| = C} A α W,F α U C := ⊕_{W ⊆ U, |W| ≥ C} T α U W.
The main results are include/exclude-style recurrences for S and F and a
bounding lemma A_V ≤ ⊕_{V ⊆ W ⊆ U} T_U(W) in an idempotent m-semiring.
T_U(W) = A_W ⊖ ⊕_{x ∈ U \ W} A_{W ∪ {x}}: the “exactly-W” contribution
that removes from A_W all one-step extensions of W inside U.
Instances For
S_C(U) = ⊕_{W ⊆ U, |W| = C} A_W: the JOIN-based provenance for a
HAVING count = C aggregate (up to surface-level reindexing).
Equations
- Having.S α U C = ∑ W ∈ Finset.powersetCard C U, Having.A α W
Instances For
F_C(U) = ⊕_{W ⊆ U, |W| ≥ C} T_U(W): the possible-world provenance for
a HAVING count ≥ C predicate.
Instances For
Alternative form T_U(W) = A_W ⊗ (𝟙 ⊖ ⊕_{x ∈ U \ W} α x). This is the
shape in which T_U(W) first arises from the possible-world semantics; the
definition of T is the rewritten form obtained via distributivity of ⊗
over ⊖ and over ⊕. Holds in any commutative m-semiring with
mul_sub_left_distributive.
Include/exclude recurrence for the JOIN-based provenance S:
S_{C+1}(U) = S_{C+1}(U \ {u}) ⊕ S_C(U \ {u}) ⊗ α u. The proof partitions
(insert u U').powersetCard (C+1) into subsets that do not contain u and
images of C-sized subsets of U' under insert u.
Upward expansion bound: in an idempotent m-semiring, the monomial of any
subset V ⊆ U is bounded above by the sum of T_U(W) over all supersets
W ⊇ V inside U. The proof is by strong induction on (U \ V).card,
using le_plus_monus for the inductive step and the auxiliary
sum_le_of_forall_le to collapse multiplicities by idempotence.
Include/exclude recurrence for the possible-world provenance F:
F_{C+1}(U) = F_{C+1}(U \ {u}) ⊕ F_C(U \ {u}) ⊗ α u, in any idempotent
commutative m-semiring with left-distributivity of ⊗ over ⊖. The proof
splits the powerset of U by whether u ∈ W, simplifies the u ∈ W part
to F_C(U') ⊗ α u, and combines two opposite inequalities using the upward
expansion bound upward_expansion.
Upward-closed family collapse in absorptive m-semirings #
The provenance of a finite family F of subsets, weighted by A, agrees
with the provenance of any subfamily M ⊆ F such that every element of
F contains some element of M. When F is upward-closed under
inclusion, the canonical such M is the set of minimal elements of F.
Upward-closed family collapse: in an absorptive commutative m-semiring,
the A-weighted sum over a finite family F equals the A-weighted sum
over any subfamily M ⊆ F such that every element of F is a superset of
some element of M. Taking M = the minimal elements of F (when F is
upward-closed) is the canonical application: the provenance of an
upward-closed family of worlds collapses to the provenance of its minimal
worlds.
F equals S: algebraic skeleton of HAVING count ≥ C #
The possible-world provenance F_C(U) agrees with the join-based provenance
S_C(U) for all C ≥ 1, in any absorptive commutative m-semiring with
left-distributivity of ⊗ over ⊖. Proof by induction on U.card driven
by the FC_recurrence and SC_recurrence recurrences; the C = 1 base of
the recurrence (where F α U' 0 and S α U' 0 appear on the right-hand
side) is closed by the auxiliary fact F_zero_eq_one.
In an absorptive idempotent m-semiring, F α U 0 = 𝟙: the
unconstrained possible-world provenance collapses to 𝟙. Lower bound
from upward_expansion with V = ∅; upper bound from T α U W ≤ A α W
and A α W ≤ 𝟙 (the latter via prod_le_one_absorptive).
Algebraic skeleton of Theorem 1(i) for HAVING count ≥ C: in an
absorptive commutative m-semiring with mul_sub_left_distributive, the
possible-world provenance F_C(U) equals the join-based provenance
S_C(U) for all C ≥ 1. Absorptive is a strictly stronger hypothesis
than the bare “idempotent + distributive” combination one might wish for,
but it is what makes the C = 1 base of the recurrence-driven induction
go through, via F_zero_eq_one. The absorptive hypothesis is essential:
Provenance.Semirings.Tropical.TropicalR.F_ne_S exhibits a non-absorptive
(but idempotent and distributive) instance – Tropical (WithTop ℝ) – for
which the conclusion fails. The idempotent m-semirings in the library
with mul_sub_left_distributive that are absorptive (Bool, BoolFunc,
IntervalUnion, Tropical (WithTop ℕ), Viterbi, Łukasiewicz) all satisfy
the conclusion.