Documentation

Provenance.QueryEvaluateInVK

Evaluating rewritten queries in the V_K-lifted semantics #

This file defines Query.evaluateInVK, the V_K interpretation of a rewritten query q̂ : Query (T ⊕ K) n evaluated against the composite encoding Î.toComposite of a K-annotated database Î. It is the “corrected” target of the rewriting rule (R5) of Sen, Maniu & Senellart, ProvSQL, avoiding the information loss that the plain T ⊕ K Mul introduces on mixed operands.

Why a separate evaluator #

The rules (R1)–(R4) of Query.rewriting produce queries that only multiply same-kind values (data × data or annotation × annotation), so evaluating them via the standard Query.evaluate on Î.toComposite yields the right semantics – the mixed Mul rule is never exercised.

The aggregation rule (R5) is different. Its per-column rewritten term is tⱼ * #(k+1), which evaluates Sum.inl v * Sum.inr α on the composite tuple – and the existing ValueType (T ⊕ K) Mul collapses this to Sum.inl v, dropping the K-side α before the aggregator ever sees it. The paper resolves this by interpreting the rewritten aggregation in the K-semimodule V_K, where the product is the K-tensor monomial α ⊗ v.

Query.evaluateInVK realises that interpretation:

Scope #

References #

Term evaluation in V_K #

def Term.evalInVK {T : Type} [ValueType T] {K : Type} [CommSemiringWithMonus K] {n : } :
Term (T K) nTuple (LiftedTK T K) nLiftedTK T K

Evaluate a term Term (T ⊕ K) n in V_K semantics, against a tuple of LiftedTK T K values. The crucial point is that the mul case uses LiftedTK's Mul, which produces a ktensor monomial on mixed data v × ann α operands.

Equations
Instances For

    δ on LiftedTK #

    Apply the K-semiring's δ to the K-side of a LiftedTK value. Identity on data v; applies SemiringWithMonus.delta on ann α; identity on ktensor t (the un-quotiented representation does not support δ on tensors, and the (R5) rewriting does not require it).

    Equations
    Instances For

      Helper lemmas for the (R5) correctness proof #

      These lemmas isolate the small algebraic facts that make the V_K interpretation of the rewritten aggregation match the Definition 7 annotated semantics:

      Together with the rewriting-correctness theorem on the inner query and the bridge toLifted_eq_map_ofSum_toComposite, these reduce the per-key body of the V_K-interpreted rewriting to the Definition 7 form.

      The V_K interpretation of a castToAnnotatedTuple-lifted term on a toLifted tuple is the data-side evaluation: the original term ranges over the first m data columns of p.toLifted, which are exactly the LiftedTK.data wrappings of p.fst.

      The V_K interpretation of the K-side index #(m+1) on a toLifted tuple is the row's LiftedTK.ann annotation.

      theorem LiftedTK.fold_add_ann {T K : Type} [CommSemiringWithMonus K] [AddCommSemigroup T] (s : Multiset K) :
      Multiset.fold (fun (x1 x2 : LiftedTK T K) => x1 + x2) 0 (Multiset.map ann s) = ann s.sum

      Folding (· + ·) over a multiset of LiftedTK.ann cells (starting from the additive identity ann 0) collapses to a single ann wrapping the K-side multiset sum.

      theorem LiftedTK.fold_add_ktensor_nonempty {T K : Type} [CommSemiringWithMonus K] [AddCommSemigroup T] (s : Multiset (KTensor K T)) (h : s 0) :
      Multiset.fold (fun (x1 x2 : LiftedTK T K) => x1 + x2) 0 (Multiset.map ktensor s) = ktensor s.sum

      Folding (· + ·) over a non-empty multiset of LiftedTK.ktensor cells collapses to a single ktensor wrapping the underlying KTensor sum. Non-emptiness is required because the empty fold would return the initial ann 0, which is not of the form ktensor _.

      @[simp]
      theorem Multiset.map_ofSum_toComposite {T K : Type} {m : } (ar : AnnotatedRelation T K m) :
      map (fun (tuple : Tuple (T K) (m + 1)) (k : Fin (m + 1)) => LiftedTK.ofSum (tuple k)) ar.toComposite = ar.toLifted

      A simp-friendly restatement of the toComposite + ofSum-on-tuples ↔ toLifted bridge.

      The lambda's return-type annotation Tuple (LiftedTK T K) (m + 1) is load-bearing: without it, the lambda elaborates with the unfolded function-type codomain (k : Fin (m+1)) → LiftedTK T K, which prevents rw/simp from finding the pattern in Query.evaluateInVK's goal, whose Multiset.map carries the unfolded Tuple-named codomain. With this annotation, simp only [Multiset.map_ofSum_toComposite] fires reliably in the (R5) correctness proof.

      theorem KTensor.sum_map_embed {T K α : Type} (s : Multiset α) (f : αK) (g : αT) :
      (Multiset.map (fun (x : α) => embed (f x) (g x)) s).sum = Multiset.map (fun (x : α) => (f x, g x)) s

      Summing a multiset of single-monomial K-tensors α ⊗ v recovers the underlying multiset of pairs (α, v).

      Query evaluation in V_K #

      noncomputable def Query.evaluateInVK {T : Type} [ValueType T] {K : Type} [HasAltLinearOrder K] [CommSemiringWithMonus K] [DecidableEq K] {n : } :
      Query (T K) nAnnotatedDatabase T KMultiset (Tuple (LiftedTK T K) n)

      V_K interpretation of a rewritten query. See file docstring for the design rationale and the scope (aggregation-at-root only).

      Equations
      Instances For

        Unified annotated semantics #

        Query.evaluateAnnotatedFull is the single annotated semantics that matches the rewriting target. For non-aggregating queries it coincides (via AnnotatedRelation.toLifted) with the existing Query.evaluateAnnotated, and for top-level aggregations it is the Definition 7 semantics of Sen, Maniu & Senellart: each output row carries n₁ grouping data values, n₂ K-tensor annotations (Σ αᵤ ⊗ value_u_k), and one δ-applied K annotation (δ(Σ αᵤ)). The shared output type is Multiset (Tuple (LiftedTK T K) (n + 1)).

        noncomputable def Query.evaluateAnnotatedFull {T : Type} [ValueType T] {K : Type} [CommSemiringWithMonus K] [DecidableEq K] [AddCommSemigroup T] [Zero T] {n : } (q : Query T n) :
        q.wellFormedAnnotatedDatabase T KMultiset (Tuple (LiftedTK T K) (n + 1))

        Unified K-annotated semantics in LiftedTK form. Dispatches on whether the query is a top-level aggregation; otherwise lifts Query.evaluateAnnotated.

        Equations
        Instances For

          Unified rewriting correctness #

          The single theorem Query.rewriting_valid_full packages both the (R1)–(R4) correctness (via Query.rewriting_valid) and the (R5) correctness into a uniform statement: for any well-formed query (no aggregation, or a top-level aggregation with a non-aggregating inner query), the LiftedTK-form annotated semantics agrees with the V_K interpretation of the rewriting.

          Unified rewriting correctness. For any well-formed query q (non-aggregating, or top-level aggregation with non-aggregating inner query), the Def-7-style annotated semantics of q on Î matches the V_K interpretation of the rewritten query on Î.toComposite.

          R1–R4 are proven via the existing Query.rewriting_valid plus the bridge between AnnotatedRelation.toLifted and the composite-then-lift on tuples. R5 is fully proved: the dedup/projection factor via liftData : g k ↦ data (g k), the matching multisets bridge via Multiset.filter_map and LiftedTK.data injectivity, and the per-column body equality follows from Term.castToAnnotatedTuple_evalInVK, Term.evalInVK_index_last, LiftedTK.fold_add_ann, LiftedTK.fold_add_ktensor_nonempty, and KTensor.sum_map_embed.