Query evaluation commutes with m-semiring homomorphisms #
Given an m-semiring homomorphism h : K → K', evaluating a (non-aggregation)
query over an annotated database whose annotations have been pushed forward
along h is the same as evaluating over the original database and then pushing
the resulting relation forward along h:
evaluateAnnotated q (h ⋆ d) = h ⋆ evaluateAnnotated q d
This is the formal counterpart to ProvSQL's “compile once, evaluate many”
architecture: each sr_* semiring evaluator is the application of a (different)
homomorphism to the same persistent provenance representation, and this theorem
says going through the homomorphism on the input side and on the output side
give the same answer.
The result is distinct from How.universal, which only proves existence of
a homomorphism out of ℕ[X] and does not connect it to query evaluation.
(The persistent provenance representation is not literally an element of
ℕ[X]. First, on the algebraic side, ℕ[X] is not the universal m-semiring
(see Geerts & Poggi, Example 10, or How.not_universal_m
in Provenance/Semirings/How.lean), so it could not factor every m-semiring
evaluation even if we restricted to the operators it supports. Second, on the
implementation side, ProvSQL's circuit has gate types that go beyond the
semiring framework altogether: δ (the δ operator for aggregation row
provenance from Amsterdamer, Deutch & Tannen, Provenance for aggregate
queries), cmp (HAVING-style comparison gates),
agg / semimod / value (the K-semimodule construction for aggregate
values), mulinput (block-independent database alternatives), and the
continuous-distribution stack rv / arith / mixture. The commutation
theorem here applies to the RA⁺(\) fragment that is purely m-semiring-
valued; it neither covers nor speaks to those other gate types.)
For the positive fragment (Rel, Proj, Sel, Prod, Sum, Dedup), this
is the content of Green, Karvounarakis & Tannen, Provenance Semirings,
Proposition 3.5: for any commutative-semiring map
h : K → K', the transformation on K-relations commutes with every RA⁺
query of one argument if and only if h is a semiring homomorphism. Note
that Green et al. (and Geerts & Poggi below) work with set semantics, in which
a K-relation is a function U-Tup → K and union is defined pointwise by
(R₁ ∪ R₂)(t) = R₁(t) + R₂(t); deduplication-with-annotation-sum is therefore
automatic. Our library works with Multiset (AnnotatedTuple T K n) instead,
so the deduplication step is exposed as a separate Dedup operator that
explicitly sums duplicate-tuple annotations. The Dedup case below is the
bookkeeping artifact of working in multiset semantics; it requires only that
h respect +, not monus, and so still belongs to Green et al.'s
proposition rather than to the difference extension below.
The genuine extension, covering Diff (the only case that requires monus),
is Geerts & Poggi, On database query languages for K-relations,
Proposition 1: for any m-semiring map h : K → K', the
transformation commutes with every RA⁺(\) query if and only if h is an
m-semiring homomorphism (i.e. it preserves monus in addition to addition
and multiplication). The Diff case of Query.evaluateAnnotated_hom
crucially uses SemiringWithMonusHom's map_sub field; this is the only
case that goes beyond Green et al.'s setting.
Geerts & Poggi also observe (ibid., Example 10) that
ℕ[X] is not the universal m-semiring (the analog of Theorem 4.3 of
Green et al. fails for RA⁺(\)); hence this commutation
theorem genuinely lives at the level of m-homomorphisms, and there is no
single universal factorization through ℕ[X].
Main definitions and results #
SemiringWithMonusHom.mapAnnotatedTuple,mapAnnotatedRelation,mapAnnotatedDatabase: pointwise pushforward along the homomorphism.Query.evaluateAnnotated_hom: the commutation theorem (Green et al., Proposition 3.5, Geerts & Poggi, Proposition 1).
Push an annotated tuple forward along a semiring homomorphism: the data
component is left untouched, the annotation is mapped through h.
Instances For
Push an annotated relation forward along a semiring homomorphism.
Equations
Instances For
Push an annotated database forward along a semiring homomorphism.
Equations
- h.mapAnnotatedDatabase d = List.map (fun (e : String × (n : ℕ) × AnnotatedRelation T K n) => (e.1, ⟨e.2.fst, h.mapAnnotatedRelation e.2.snd⟩)) d
Instances For
The data side of an annotated relation is unaffected by the pushforward.
Mapping the annotation side pushes the homomorphism through.
Filtering by an fst-only predicate commutes with the pushforward.
Annotation-side sum over the entries with a given first component commutes
with the pushforward: it pulls out h.toRingHom.
Filter.eval on the data side commutes with pushforward, in the form needed
to interchange Multiset.filter and mapAnnotatedRelation in the Sel case.
find commutes with the database-level pushforward.
find? on groupByKey, projected to the value side, equals the semiring
sum of the annotations at the chosen key. Bridges the groupByKey-based
aggregation used in evaluateAnnotated's Diff clause with the filter-based
aggregation that sum_filter_map_snd_mapAnnotatedRelation reasons about.
The commutation theorem #
Commutation of query evaluation with m-semiring homomorphisms. For an
m-semiring homomorphism h : K → K', evaluating a (non-aggregation) query on
the pushed-forward annotated database equals pushing forward the result of
evaluating on the original database.
This is the “if” direction of Green, Karvounarakis & Tannen, Proposition
3.5 for the positive cases (Rel, Proj, Sel, Prod,
Sum, Dedup), and of Geerts & Poggi, Proposition 1
for the only case that needs monus (Diff). The Dedup case is the explicit
form, in our multiset semantics, of the annotation-summing that is automatic
in Green et al.'s set semantics; it still requires only that h respect +.
The “only if” direction (that commutation forces h to be an m-hom) is not
formalised here.
Aggregation (Agg) is excluded via the noAgg precondition; see the comment
on Query.evaluateAnnotated for the underlying limitation.