Evaluating rewritten queries in the V_K-lifted semantics #

This file defines Query.evaluateInVK, the V_K interpretation of a rewritten query q̂ : Query (T ⊕ K) n evaluated against the composite encoding Î.toComposite of a K-annotated database Î. It is the “corrected” target of the rewriting rule (R5) of Sen, Maniu & Senellart, ProvSQL, avoiding the information loss that the plain T ⊕ K Mul introduces on mixed operands.

Why a separate evaluator #

The rules (R1)–(R4) of Query.rewriting produce queries that only multiply same-kind values (data × data or annotation × annotation), so evaluating them via the standard Query.evaluate on Î.toComposite yields the right semantics – the mixed Mul rule is never exercised.

The aggregation rule (R5) is different. Its per-column rewritten term is tⱼ * #(k+1), which evaluates Sum.inl v * Sum.inr α on the composite tuple – and the existing ValueType (T ⊕ K) Mul collapses this to Sum.inl v, dropping the K-side α before the aggregator ever sees it. The paper resolves this by interpreting the rewritten aggregation in the K-semimodule V_K, where the product is the K-tensor monomial α ⊗ v.

Query.evaluateInVK realises that interpretation:

The result type is Multiset (Tuple (LiftedTK T K) n) rather than Multiset (Tuple (T ⊕ K) n).
For all non-Agg constructors, evaluation reduces to Query.evaluate ∘ Î.toComposite followed by a pointwise LiftedTK.ofSum lift. The two evaluators agree there because mixed Mul never fires.
For Agg, the aggregator works directly on LiftedTK values: the per-row term is evaluated via Term.evalInVK (which uses LiftedTK's K-tensor-producing Mul), and the per-column aggregator is interpreted in V_K – AggFunc.sum is multiset union on KTensor K T, and AggFunc.sumDelta is the same followed by a δ application on the K side.

Scope #

Aggregation is assumed to occur at the root only, matching both the ICDE paper's convention and the constraint of the existing Query.evaluateAggSum. Nested aggregations (an Agg inside an Agg) are not exercised here.
Filter predicates (Sel) inside the rewritten query operate on data-side values that were produced by castToAnnotatedTuple, so they never compare ktensor values. The reduction to the plain evaluate on T ⊕ K is safe for these.

References #

Sen, Maniu & Senellart, ProvSQL (Section IV-B, Definition 7, R5)
Amsterdamer, Deutch & Tannen

Term evaluation in V_K #

source

def Term.evalInVK {T : Type} [ValueType T] {K : Type} [CommSemiringWithMonus K] {n : ℕ} :

Term (T ⊕ K) n → Tuple (LiftedTK T K) n → LiftedTK T K

Evaluate a term Term (T ⊕ K) n in V_K semantics, against a tuple of LiftedTK T K values. The crucial point is that the mul case uses LiftedTK's Mul, which produces a ktensor monomial on mixed data v × ann α operands.

Equations

(Term.const x_2).evalInVK x✝ = LiftedTK.ofSum x_2
(#k).evalInVK x✝ = x✝ k
(t₁.add t₂).evalInVK x✝ = t₁.evalInVK x✝ + t₂.evalInVK x✝
(t₁.sub t₂).evalInVK x✝ = t₁.evalInVK x✝ - t₂.evalInVK x✝
(t₁.mul t₂).evalInVK x✝ = t₁.evalInVK x✝ * t₂.evalInVK x✝

Instances For

δ on `LiftedTK` #

source

def LiftedTK.applyDelta {T K : Type} [CommSemiringWithMonus K] :

LiftedTK T K → LiftedTK T K

Apply the K-semiring's δ to the K-side of a LiftedTK value. Identity on data v; applies SemiringWithMonus.delta on ann α; identity on ktensor t (the un-quotiented representation does not support δ on tensors, and the (R5) rewriting does not require it).

Equations

(LiftedTK.data v).applyDelta = LiftedTK.data v
(LiftedTK.ann α).applyDelta = LiftedTK.ann (SemiringWithMonus.delta α)
(LiftedTK.ktensor t).applyDelta = LiftedTK.ktensor t

Instances For

Helper lemmas for the (R5) correctness proof #

These lemmas isolate the small algebraic facts that make the V_K interpretation of the rewritten aggregation match the Definition 7 annotated semantics:

Term.castToAnnotatedTuple_evalInVK / Term.evalInVK_index_last – evaluating the rewritten per-column term tⱼ * #(k+1) on a toLifted tuple sees data on the left and an ann α on the right, so its V_K product produces the K-tensor monomial α ⊗ vⱼ.
fold_add_ann / fold_add_ktensor_nonempty – folding a multiset of LiftedTK.ann (resp. non-empty LiftedTK.ktensor) cells with (· + ·) collapses to a single ann (resp. ktensor) wrapping the underlying carrier's sum.
KTensor.sum_map_embed – summing a multiset of single-monomial K-tensors α ⊗ v recovers the underlying multiset of pairs (α, v).

Together with the rewriting-correctness theorem on the inner query and the bridge toLifted_eq_map_ofSum_toComposite, these reduce the per-key body of the V_K-interpreted rewriting to the Definition 7 form.

source

theorem Term.castToAnnotatedTuple_evalInVK {T : Type} [ValueType T] {K : Type} [CommSemiringWithMonus K] {m : ℕ} (t : Term T m) (p : AnnotatedTuple T K m) :

t.castToAnnotatedTuple.evalInVK p.toLifted = LiftedTK.data (t.eval p.1)

The V_K interpretation of a castToAnnotatedTuple-lifted term on a toLifted tuple is the data-side evaluation: the original term ranges over the first m data columns of p.toLifted, which are exactly the LiftedTK.data wrappings of p.fst.

source

theorem Term.evalInVK_index_last {T : Type} [ValueType T] {K : Type} [CommSemiringWithMonus K] {m : ℕ} (p : AnnotatedTuple T K m) :

(#(Fin.last m)).evalInVK p.toLifted = LiftedTK.ann p.2

The V_K interpretation of the K-side index #(m+1) on a toLifted tuple is the row's LiftedTK.ann annotation.

source

theorem LiftedTK.fold_add_ann {T K : Type} [CommSemiringWithMonus K] [AddCommSemigroup T] (s : Multiset K) :

Multiset.fold (fun (x1 x2 : LiftedTK T K) => x1 + x2) 0 (Multiset.map ann s) = ann s.sum

Folding (· + ·) over a multiset of LiftedTK.ann cells (starting from the additive identity ann 0) collapses to a single ann wrapping the K-side multiset sum.

source

theorem LiftedTK.fold_add_ktensor_nonempty {T K : Type} [CommSemiringWithMonus K] [AddCommSemigroup T] (s : Multiset (KTensor K T)) (h : s ≠ 0) :

Multiset.fold (fun (x1 x2 : LiftedTK T K) => x1 + x2) 0 (Multiset.map ktensor s) = ktensor s.sum

Folding (· + ·) over a non-empty multiset of LiftedTK.ktensor cells collapses to a single ktensor wrapping the underlying KTensor sum. Non-emptiness is required because the empty fold would return the initial ann 0, which is not of the form ktensor _.

source

@[simp]

theorem Multiset.map_ofSum_toComposite {T K : Type} {m : ℕ} (ar : AnnotatedRelation T K m) :

map (fun (tuple : Tuple (T ⊕ K) (m + 1)) (k : Fin (m + 1)) => LiftedTK.ofSum (tuple k)) ar.toComposite = ar.toLifted

A simp-friendly restatement of the toComposite + ofSum-on-tuples ↔ toLifted bridge.

The lambda's return-type annotation Tuple (LiftedTK T K) (m + 1) is load-bearing: without it, the lambda elaborates with the unfolded function-type codomain (k : Fin (m+1)) → LiftedTK T K, which prevents rw/simp from finding the pattern in Query.evaluateInVK's goal, whose Multiset.map carries the unfolded Tuple-named codomain. With this annotation, simp only [Multiset.map_ofSum_toComposite] fires reliably in the (R5) correctness proof.

source

theorem KTensor.sum_map_embed {T K α : Type} (s : Multiset α) (f : α → K) (g : α → T) :

(Multiset.map (fun (x : α) => embed (f x) (g x)) s).sum = Multiset.map (fun (x : α) => (f x, g x)) s

Summing a multiset of single-monomial K-tensors α ⊗ v recovers the underlying multiset of pairs (α, v).

Query evaluation in V_K #

source

noncomputable def Query.evaluateInVK {T : Type} [ValueType T] {K : Type} [HasAltLinearOrder K] [CommSemiringWithMonus K] [DecidableEq K] {n : ℕ} :

Query (T ⊕ K) n → AnnotatedDatabase T K → Multiset (Tuple (LiftedTK T K) n)

V_K interpretation of a rewritten query. See file docstring for the design rationale and the scope (aggregation-at-root only).

Equations

One or more equations did not get rendered due to their size.
x✝¹.evaluateInVK x✝ = Multiset.map (fun (tuple : Tuple (T ⊕ K) x✝²) (k : Fin x✝²) => LiftedTK.ofSum (tuple k)) (x✝¹.evaluate x✝.toComposite)

Instances For

Unified annotated semantics #

Query.evaluateAnnotatedFull is the single annotated semantics that matches the rewriting target. For non-aggregating queries it coincides (via AnnotatedRelation.toLifted) with the existing Query.evaluateAnnotated, and for top-level aggregations it is the Definition 7 semantics of Sen, Maniu & Senellart: each output row carries n₁ grouping data values, n₂ K-tensor annotations (Σ αᵤ ⊗ value_u_k), and one δ-applied K annotation (δ(Σ αᵤ)). The shared output type is Multiset (Tuple (LiftedTK T K) (n + 1)).

source

noncomputable def Query.evaluateAnnotatedFull {T : Type} [ValueType T] {K : Type} [CommSemiringWithMonus K] [DecidableEq K] [AddCommSemigroup T] [Zero T] {n : ℕ} (q : Query T n) :

q.wellFormed → AnnotatedDatabase T K → Multiset (Tuple (LiftedTK T K) (n + 1))

Unified K-annotated semantics in LiftedTK form. Dispatches on whether the query is a top-level aggregation; otherwise lifts Query.evaluateAnnotated.

Equations

One or more equations did not get rendered due to their size.
(Query.Rel x s).evaluateAnnotatedFull h x_1 = ((Query.Rel x s).evaluateAnnotated h x_1).toLifted
(Π ts q').evaluateAnnotatedFull h x_1 = ((Π ts q').evaluateAnnotated h x_1).toLifted
(σ φ q').evaluateAnnotatedFull h x_1 = ((σ φ q').evaluateAnnotated h x_1).toLifted
(q₁ × q₂).evaluateAnnotatedFull h x_1 = ((q₁ × q₂).evaluateAnnotated h x_1).toLifted
(q₁ ⊎ q₂).evaluateAnnotatedFull h x_1 = ((q₁ ⊎ q₂).evaluateAnnotated h x_1).toLifted
(ε q').evaluateAnnotatedFull h x_1 = ((ε q').evaluateAnnotated h x_1).toLifted
(q₁ - q₂).evaluateAnnotatedFull h x_1 = ((q₁ - q₂).evaluateAnnotated h x_1).toLifted

Instances For

Unified rewriting correctness #

The single theorem Query.rewriting_valid_full packages both the (R1)–(R4) correctness (via Query.rewriting_valid) and the (R5) correctness into a uniform statement: for any well-formed query (no aggregation, or a top-level aggregation with a non-aggregating inner query), the LiftedTK-form annotated semantics agrees with the V_K interpretation of the rewriting.

source

theorem Query.rewriting_valid_full {T : Type} [ValueType T] {K : Type} [CommSemiringWithMonus K] [DecidableEq K] [HasAltLinearOrder K] {n : ℕ} (q : Query T n) (hq : q.wellFormed) (Î : AnnotatedDatabase T K) :

q.evaluateAnnotatedFull hq Î = (q.rewritingFull hq).evaluateInVK Î

Unified rewriting correctness. For any well-formed query q (non-aggregating, or top-level aggregation with non-aggregating inner query), the Def-7-style annotated semantics of q on Î matches the V_K interpretation of the rewritten query on Î.toComposite.

R1–R4 are proven via the existing Query.rewriting_valid plus the bridge between AnnotatedRelation.toLifted and the composite-then-lift on tuples. R5 is fully proved: the dedup/projection factor via liftData : g k ↦ data (g k), the matching multisets bridge via Multiset.filter_map and LiftedTK.data injectivity, and the per-column body equality follows from Term.castToAnnotatedTuple_evalInVK, Term.evalInVK_index_last, LiftedTK.fold_add_ann, LiftedTK.fold_add_ktensor_nonempty, and KTensor.sum_map_embed.

Documentation

Provenance.QueryEvaluateInVK

Evaluating rewritten queries in the V_K-lifted semantics #

Why a separate evaluator #

Scope #

References #

Term evaluation in V_K #

δ on `LiftedTK` #

Helper lemmas for the (R5) correctness proof #

Query evaluation in V_K #

Unified annotated semantics #

Unified rewriting correctness #

Evaluating rewritten queries in the V_K-lifted semantics #

Why a separate evaluator #

Scope #

References #

Term evaluation in V_K #

δ on LiftedTK #

Helper lemmas for the (R5) correctness proof #

Query evaluation in V_K #

Unified annotated semantics #

Unified rewriting correctness #

δ on `LiftedTK` #