Evaluating rewritten queries in the V_K-lifted semantics #
This file defines Query.evaluateInVK, the V_K interpretation of a
rewritten query q̂ : Query (T ⊕ K) n evaluated against the composite
encoding Î.toComposite of a K-annotated database Î. It is the
“corrected” target of the rewriting rule (R5) of
Sen, Maniu & Senellart, ProvSQL, avoiding the
information loss that the plain T ⊕ K Mul introduces on mixed
operands.
Why a separate evaluator #
The rules (R1)–(R4) of Query.rewriting produce queries that only
multiply same-kind values (data × data or annotation × annotation), so
evaluating them via the standard Query.evaluate on Î.toComposite
yields the right semantics – the mixed Mul rule is never exercised.
The aggregation rule (R5) is different. Its per-column rewritten term is
tⱼ * #(k+1), which evaluates Sum.inl v * Sum.inr α on the composite
tuple – and the existing ValueType (T ⊕ K) Mul collapses this to
Sum.inl v, dropping the K-side α before the aggregator ever sees it.
The paper resolves this by interpreting the rewritten aggregation in the
K-semimodule V_K, where the product is the K-tensor monomial α ⊗ v.
Query.evaluateInVK realises that interpretation:
- The result type is
Multiset (Tuple (LiftedTK T K) n)rather thanMultiset (Tuple (T ⊕ K) n). - For all non-
Aggconstructors, evaluation reduces toQuery.evaluate ∘ Î.toCompositefollowed by a pointwiseLiftedTK.ofSumlift. The two evaluators agree there because mixedMulnever fires. - For
Agg, the aggregator works directly onLiftedTKvalues: the per-row term is evaluated viaTerm.evalInVK(which usesLiftedTK's K-tensor-producingMul), and the per-column aggregator is interpreted in V_K –AggFunc.sumis multiset union onKTensor K T, andAggFunc.sumDeltais the same followed by a δ application on the K side.
Scope #
- Aggregation is assumed to occur at the root only, matching both the
ICDE paper's convention and the constraint of the existing
Query.evaluateAggSum. Nested aggregations (anAgginside anAgg) are not exercised here. - Filter predicates (
Sel) inside the rewritten query operate on data-side values that were produced bycastToAnnotatedTuple, so they never comparektensorvalues. The reduction to the plainevaluateonT ⊕ Kis safe for these.
References #
- Sen, Maniu & Senellart, ProvSQL (Section IV-B, Definition 7, R5)
- Amsterdamer, Deutch & Tannen
Term evaluation in V_K #
Evaluate a term Term (T ⊕ K) n in V_K semantics, against a tuple of
LiftedTK T K values. The crucial point is that the mul case uses
LiftedTK's Mul, which produces a ktensor monomial on mixed
data v × ann α operands.
Equations
Instances For
Apply the K-semiring's δ to the K-side of a LiftedTK value. Identity
on data v; applies SemiringWithMonus.delta on ann α; identity on
ktensor t (the un-quotiented representation does not support δ on
tensors, and the (R5) rewriting does not require it).
Equations
Instances For
Helper lemmas for the (R5) correctness proof #
These lemmas isolate the small algebraic facts that make the V_K interpretation of the rewritten aggregation match the Definition 7 annotated semantics:
Term.castToAnnotatedTuple_evalInVK/Term.evalInVK_index_last– evaluating the rewritten per-column termtⱼ * #(k+1)on atoLiftedtuple sees data on the left and anann αon the right, so its V_K product produces the K-tensor monomialα ⊗ vⱼ.fold_add_ann/fold_add_ktensor_nonempty– folding a multiset ofLiftedTK.ann(resp. non-emptyLiftedTK.ktensor) cells with(· + ·)collapses to a singleann(resp.ktensor) wrapping the underlying carrier's sum.KTensor.sum_map_embed– summing a multiset of single-monomial K-tensorsα ⊗ vrecovers the underlying multiset of pairs(α, v).
Together with the rewriting-correctness theorem on the inner query and
the bridge toLifted_eq_map_ofSum_toComposite, these reduce the per-key
body of the V_K-interpreted rewriting to the Definition 7 form.
The V_K interpretation of a castToAnnotatedTuple-lifted term on a
toLifted tuple is the data-side evaluation: the original term ranges
over the first m data columns of p.toLifted, which are exactly the
LiftedTK.data wrappings of p.fst.
The V_K interpretation of the K-side index #(m+1) on a toLifted
tuple is the row's LiftedTK.ann annotation.
Folding (· + ·) over a multiset of LiftedTK.ann cells (starting
from the additive identity ann 0) collapses to a single ann wrapping
the K-side multiset sum.
Folding (· + ·) over a non-empty multiset of LiftedTK.ktensor cells
collapses to a single ktensor wrapping the underlying KTensor sum.
Non-emptiness is required because the empty fold would return the
initial ann 0, which is not of the form ktensor _.
A simp-friendly restatement of the
toComposite + ofSum-on-tuples ↔ toLifted bridge.
The lambda's return-type annotation Tuple (LiftedTK T K) (m + 1) is
load-bearing: without it, the lambda elaborates with the unfolded
function-type codomain (k : Fin (m+1)) → LiftedTK T K, which prevents
rw/simp from finding the pattern in Query.evaluateInVK's goal,
whose Multiset.map carries the unfolded Tuple-named codomain. With
this annotation, simp only [Multiset.map_ofSum_toComposite] fires
reliably in the (R5) correctness proof.
Summing a multiset of single-monomial K-tensors α ⊗ v recovers the
underlying multiset of pairs (α, v).
Query evaluation in V_K #
V_K interpretation of a rewritten query. See file docstring for the design rationale and the scope (aggregation-at-root only).
Equations
- One or more equations did not get rendered due to their size.
- x✝¹.evaluateInVK x✝ = Multiset.map (fun (tuple : Tuple (T ⊕ K) x✝²) (k : Fin x✝²) => LiftedTK.ofSum (tuple k)) (x✝¹.evaluate x✝.toComposite)
Instances For
Unified annotated semantics #
Query.evaluateAnnotatedFull is the single annotated semantics that
matches the rewriting target. For non-aggregating queries it coincides
(via AnnotatedRelation.toLifted) with the existing
Query.evaluateAnnotated, and for top-level aggregations it is the
Definition 7 semantics of Sen, Maniu & Senellart:
each output row carries n₁ grouping data values, n₂ K-tensor
annotations (Σ αᵤ ⊗ value_u_k), and one δ-applied K annotation
(δ(Σ αᵤ)). The shared output type is
Multiset (Tuple (LiftedTK T K) (n + 1)).
Unified K-annotated semantics in LiftedTK form. Dispatches on
whether the query is a top-level aggregation; otherwise lifts
Query.evaluateAnnotated.
Equations
- One or more equations did not get rendered due to their size.
- (Query.Rel x s).evaluateAnnotatedFull h x_1 = ((Query.Rel x s).evaluateAnnotated h x_1).toLifted
- (Π ts q').evaluateAnnotatedFull h x_1 = ((Π ts q').evaluateAnnotated h x_1).toLifted
- (σ φ q').evaluateAnnotatedFull h x_1 = ((σ φ q').evaluateAnnotated h x_1).toLifted
- (q₁ × q₂).evaluateAnnotatedFull h x_1 = ((q₁ × q₂).evaluateAnnotated h x_1).toLifted
- (q₁ ⊎ q₂).evaluateAnnotatedFull h x_1 = ((q₁ ⊎ q₂).evaluateAnnotated h x_1).toLifted
- (ε q').evaluateAnnotatedFull h x_1 = ((ε q').evaluateAnnotated h x_1).toLifted
- (q₁ - q₂).evaluateAnnotatedFull h x_1 = ((q₁ - q₂).evaluateAnnotated h x_1).toLifted
Instances For
Unified rewriting correctness #
The single theorem Query.rewriting_valid_full packages both the
(R1)–(R4) correctness (via Query.rewriting_valid) and the (R5)
correctness into a uniform statement: for any well-formed query (no
aggregation, or a top-level aggregation with a non-aggregating inner
query), the LiftedTK-form annotated semantics agrees with the V_K
interpretation of the rewriting.
Unified rewriting correctness. For any well-formed query q
(non-aggregating, or top-level aggregation with non-aggregating inner
query), the Def-7-style annotated semantics of q on Î matches the
V_K interpretation of the rewritten query on Î.toComposite.
R1–R4 are proven via the existing Query.rewriting_valid plus the
bridge between AnnotatedRelation.toLifted and the composite-then-lift
on tuples. R5 is fully proved: the dedup/projection factor via
liftData : g k ↦ data (g k), the matching multisets bridge via
Multiset.filter_map and LiftedTK.data injectivity, and the
per-column body equality follows from Term.castToAnnotatedTuple_evalInVK,
Term.evalInVK_index_last, LiftedTK.fold_add_ann,
LiftedTK.fold_add_ktensor_nonempty, and KTensor.sum_map_embed.