Documentation

Provenance.QueryAnnotatedDatabaseHom

Query evaluation commutes with m-semiring homomorphisms #

Given an m-semiring homomorphism h : K → K', evaluating a (non-aggregation) query over an annotated database whose annotations have been pushed forward along h is the same as evaluating over the original database and then pushing the resulting relation forward along h:

  evaluateAnnotated q (h ⋆ d) = h ⋆ evaluateAnnotated q d

This is the formal counterpart to ProvSQL's “compile once, evaluate many” architecture: each sr_* semiring evaluator is the application of a (different) homomorphism to the same persistent provenance representation, and this theorem says going through the homomorphism on the input side and on the output side give the same answer.

The result is distinct from How.universal, which only proves existence of a homomorphism out of ℕ[X] and does not connect it to query evaluation.

(The persistent provenance representation is not literally an element of ℕ[X]. First, on the algebraic side, ℕ[X] is not the universal m-semiring (see Geerts & Poggi, Example 10, or How.not_universal_m in Provenance/Semirings/How.lean), so it could not factor every m-semiring evaluation even if we restricted to the operators it supports. Second, on the implementation side, ProvSQL's circuit has gate types that go beyond the semiring framework altogether: δ (the δ operator for aggregation row provenance from Amsterdamer, Deutch & Tannen, Provenance for aggregate queries), cmp (HAVING-style comparison gates), agg / semimod / value (the K-semimodule construction for aggregate values), mulinput (block-independent database alternatives), and the continuous-distribution stack rv / arith / mixture. The commutation theorem here applies to the RA⁺(\) fragment that is purely m-semiring- valued; it neither covers nor speaks to those other gate types.)

For the positive fragment (Rel, Proj, Sel, Prod, Sum, Dedup), this is the content of Green, Karvounarakis & Tannen, Provenance Semirings, Proposition 3.5: for any commutative-semiring map h : K → K', the transformation on K-relations commutes with every RA⁺ query of one argument if and only if h is a semiring homomorphism. Note that Green et al. (and Geerts & Poggi below) work with set semantics, in which a K-relation is a function U-Tup → K and union is defined pointwise by (R₁ ∪ R₂)(t) = R₁(t) + R₂(t); deduplication-with-annotation-sum is therefore automatic. Our library works with Multiset (AnnotatedTuple T K n) instead, so the deduplication step is exposed as a separate Dedup operator that explicitly sums duplicate-tuple annotations. The Dedup case below is the bookkeeping artifact of working in multiset semantics; it requires only that h respect +, not monus, and so still belongs to Green et al.'s proposition rather than to the difference extension below.

The genuine extension, covering Diff (the only case that requires monus), is Geerts & Poggi, On database query languages for K-relations, Proposition 1: for any m-semiring map h : K → K', the transformation commutes with every RA⁺(\) query if and only if h is an m-semiring homomorphism (i.e. it preserves monus in addition to addition and multiplication). The Diff case of Query.evaluateAnnotated_hom crucially uses SemiringWithMonusHom's map_sub field; this is the only case that goes beyond Green et al.'s setting.

Geerts & Poggi also observe (ibid., Example 10) that ℕ[X] is not the universal m-semiring (the analog of Theorem 4.3 of Green et al. fails for RA⁺(\)); hence this commutation theorem genuinely lives at the level of m-homomorphisms, and there is no single universal factorization through ℕ[X].

Main definitions and results #

Push an annotated tuple forward along a semiring homomorphism: the data component is left untouched, the annotation is mapped through h.

Equations
Instances For

    Push an annotated relation forward along a semiring homomorphism.

    Equations
    Instances For

      Push an annotated database forward along a semiring homomorphism.

      Equations
      Instances For
        @[simp]

        The data side of an annotated relation is unaffected by the pushforward.

        @[simp]

        Mapping the annotation side pushes the homomorphism through.

        Filtering by an fst-only predicate commutes with the pushforward.

        Annotation-side sum over the entries with a given first component commutes with the pushforward: it pulls out h.toRingHom.

        Filter.eval on the data side commutes with pushforward, in the form needed to interchange Multiset.filter and mapAnnotatedRelation in the Sel case.

        theorem groupByKey_find_eq_filter_sum {T : Type} [ValueType T] [AddCommSemigroup T] [Sub T] [Mul T] {K : Type} [SemiringWithMonus K] [DecidableEq K] {n : } (r : AnnotatedRelation T K n) (u : Tuple T n) :
        (Option.map Prod.snd (List.find? (fun (x : Tuple T n × K) => decide (x.1 = u)) (groupByKey r))).getD 0 = (Multiset.map Prod.snd (Multiset.filter (fun (p : AnnotatedTuple T K n) => p.1 = u) r)).sum

        find? on groupByKey, projected to the value side, equals the semiring sum of the annotations at the chosen key. Bridges the groupByKey-based aggregation used in evaluateAnnotated's Diff clause with the filter-based aggregation that sum_filter_map_snd_mapAnnotatedRelation reasons about.

        The commutation theorem #

        Commutation of query evaluation with m-semiring homomorphisms. For an m-semiring homomorphism h : K → K', evaluating a (non-aggregation) query on the pushed-forward annotated database equals pushing forward the result of evaluating on the original database.

        This is the “if” direction of Green, Karvounarakis & Tannen, Proposition 3.5 for the positive cases (Rel, Proj, Sel, Prod, Sum, Dedup), and of Geerts & Poggi, Proposition 1 for the only case that needs monus (Diff). The Dedup case is the explicit form, in our multiset semantics, of the annotation-summing that is automatic in Green et al.'s set semantics; it still requires only that h respect +. The “only if” direction (that commutation forces h to be an m-hom) is not formalised here.

        Aggregation (Agg) is excluded via the noAgg precondition; see the comment on Query.evaluateAnnotated for the underlying limitation.