Documentation

Provenance.QueryRewriting

Query evaluation by rewriting #

This file provides an alternative approach to evaluating queries on annotated databases: instead of directly interpreting operators over annotated tuples, a query on T is rewritten into a query on T ⊕ K that operates on plain tuples whose values encode both data and provenance.

The rewriting implemented here realises rules (R1)–(R5) from Sen, Maniu & Senellart, ProvSQL: A General System for Keeping Track of the Provenance and Probability of Data.

A correctness proof that Query.rewriting agrees with Query.evaluateAnnotated is fully formalised for rules (R1)–(R4): each operator is machine-checked end-to-end. The Diff case splits into an unmatched_eq half (proved via the semijoin identity Multiset.semijoin_proj_eq_filter, after bridging the LinearOrder.toDecidableEq vs instDecidableEqSum mismatch on the inner dedup via Query.rewriting_valid_diff_inner_dd_inst) and a matched_eq half (proved via the keyed-projection semijoin Multiset.semijoin_keyed_proj_eq_filter, after substituting the inner aggregation with the closed-form Query.evaluate_agg_rewriting_eq). The (R5) aggregation correctness lives in Provenance.QueryEvaluateInVK as Query.rewriting_valid_full, with its V_K interpretation; the syntactic (R5) rewriting itself is in this file as Query.rewritingAgg.

References #

Sen, Maniu & Senellart, ProvSQL: A General System for Keeping Track of the Provenance and Probability of Data

def Query.rewriting {T : Type} {n : ℕ} {K : Type} [ValueType T] (q : Query T n) (hq : q.noAgg) :

Query (T ⊕ K) (n + 1)

Equations

One or more equations did not get rendered due to their size.
(Query.Rel n s).rewriting hq_2 = Query.Rel (n + 1) s
(Π ts q_2).rewriting hq_2 = (Π fun (k : Fin (n + 1)) => if h : ↑k < n then (ts ⟨↑k, h⟩).castToAnnotatedTuple else #(Fin.last q_2.arity)) (q_2.rewriting ⋯)
(σ φ q_2).rewriting hq_2 = σ φ.castToAnnotatedTuple (q_2.rewriting ⋯)
(q₁ ⊎ q₂).rewriting hq_2 = (q₁.rewriting ⋯ ⊎ q₂.rewriting ⋯)
(ε q_2).rewriting hq_2 = Query.Agg (fun (k : Fin n) => Fin.castLE ⋯ k) ![#(Fin.last n)] ![AggFunc.sum ] (q_2.rewriting ⋯)
(Query.Agg a a_1 a_2 a_3).rewriting hq_2 = False.elim hq_2

Instances For

theorem Query.rewriting_valid_prod_heqn {n₁ n₂ n : ℕ} (hn : n₁ + n₂ = n) :

n₁ + 1 + (n₂ + 1) = n + 2

theorem Query.rewriting_valid_prod0 {K T : Type} [Mul K] {n₁ n₂ n : ℕ} (hn : n₁ + n₂ = n) (heq : (Fin (n₁ + n₂) → T) = (Fin n → T)) (ar₁ : AnnotatedRelation T K n₁) (ar₂ : AnnotatedRelation T K n₂) :

AnnotatedRelation.toComposite (Multiset.map (fun (x : AnnotatedTuple T K n₁ × AnnotatedTuple T K n₂) => (cast heq (Fin.append x.1.1 x.2.1), x.1.2 * x.2.2)) (Multiset.product ar₁ ar₂)) = Relation.cast ⋯ (AnnotatedRelation.toComposite (Multiset.map (fun (x : AnnotatedTuple T K n₁ × AnnotatedTuple T K n₂) => (Fin.append x.1.1 x.2.1, x.1.2 * x.2.2)) (Multiset.product ar₁ ar₂)))

theorem cast_apply {T : Type} {n : ℕ} {α : Sort u_1} {m : ℕ} (f : Tuple T n → α) (t : Tuple T m) (hn : n = m) :

cast ⋯ f t = f (Tuple.cast ⋯ t)

theorem Query.rewriting_valid_prod1 {T K : Type} {n₂ n₁ n : ℕ} [ValueType (T ⊕ K)] (hn : n₁ + 1 + (n₂ + 1) = n + 2) (f : Tuple (T ⊕ K) (n + 2) → Tuple (T ⊕ K) (n + 1)) (r : Relation (T ⊕ K) (n₁ + 1 + (n₂ + 1))) :

Multiset.map f (Relation.cast hn r) = Multiset.map (fun (t : Tuple (T ⊕ K) (n₁ + 1 + (n₂ + 1))) => f (Tuple.cast hn t)) r

theorem Query.rewriting_append_left {T : Type} {n₁ n₂ n : ℕ} (t₁ : Tuple T n₁) (t₂ : Tuple T n₂) (hn : n₁ + n₂ = n) (k : Fin n) (hk : ↑k < n₁) :

Eq.rec (motive := fun (x : ℕ) (h : n₁ + n₂ = x) => Fin x → T) (Fin.append t₁ t₂) hn k = t₁ (k.castLT hk)

theorem Query.rewriting_append_right {T : Type} {n₁ n₂ n : ℕ} (t₁ : Tuple T n₁) (t₂ : Tuple T n₂) (hn : n₁ + n₂ = n) (k : Fin n) (hk : ¬↑k < n₁) :

Eq.rec (motive := fun (x : ℕ) (h : n₁ + n₂ = x) => Fin x → T) (Fin.append t₁ t₂) hn k = t₂ ⟨↑k - n₁, ⋯⟩

theorem Query.tupleCast_append_left {T : Type} {n₁ n₂ n : ℕ} (t₁ : Tuple T n₁) (t₂ : Tuple T n₂) (hn : n₁ + n₂ = n) (k : Fin n) (hk : ↑k < n₁) :

Tuple.cast hn (Fin.append t₁ t₂) k = t₁ (k.castLT hk)

Tuple.cast-flavoured variant of rewriting_append_left. Both Tuple.cast's and ▸'s Eq.rec motives must syntactically agree for rw to fire on Lean v4.29; this version matches the motive produced by Tuple.cast.

theorem Query.tupleCast_append_right {T : Type} {n₁ n₂ n : ℕ} (t₁ : Tuple T n₁) (t₂ : Tuple T n₂) (hn : n₁ + n₂ = n) (k : Fin n) (hk : ¬↑k < n₁) :

Tuple.cast hn (Fin.append t₁ t₂) k = t₂ ⟨↑k - n₁, ⋯⟩

Tuple.cast-flavoured variant of rewriting_append_right.

Helper lemmas for the `Dedup` case of `rewriting_valid` #

theorem Multiset.fold_addFn_map_inr {T K : Type} [ValueType T] [SemiringWithMonus K] [HasAltLinearOrder K] (m : Multiset K) :

fold addFn 0 (map (fun (k : K) => Sum.inr k) m) = Sum.inr m.sum

Folding addFn over a multiset of Sum.inr k values in T⊕K reduces to the Multiset.sum in K, wrapped in Sum.inr.

theorem AnnotatedRelation.toComposite_filter_map_last {T K : Type} [ValueType T] [DecidableEq K] {n : ℕ} (ar : AnnotatedRelation T K n) (v : Tuple T n) :

Multiset.map (fun (u : Tuple (T ⊕ K) (n + 1)) => u (Fin.last n)) (Multiset.filter (fun (u : Tuple (T ⊕ K) (n + 1)) => ∀ (k' : Fin n), u (Fin.castLE ⋯ k') = Sum.inl (v k')) ar.toComposite) = Multiset.map (fun (p : AnnotatedTuple T K n) => Sum.inr p.2) (Multiset.filter (fun (p : AnnotatedTuple T K n) => p.1 = v) ar)

Filtering ar.toComposite by "first-n columns match Sum.inl ∘ v" and projecting to the last column yields the Sum.inr-wrapped annotations of the matching entries of ar.

theorem AnnotatedRelation.dedup_toComposite_proj_first_n {T K : Type} [ValueType T] [DecidableEq K] {n : ℕ} (ar : AnnotatedRelation T K n) (h : n ≤ n + 1) :

(Multiset.map (fun (u : Tuple (T ⊕ K) (n + 1)) (k : Fin n) => u (Fin.castLE h k)) ar.toComposite).dedup = Multiset.map (fun (v : Tuple T n) (k : Fin n) => Sum.inl (v k)) (Multiset.map Prod.fst ar).dedup

The dedup of the first-n projection of ar.toComposite is the Sum.inl-image of the dedup of the first-projection of ar.

theorem groupByKey_key_iff {T K : Type} [ValueType T] [SemiringWithMonus K] [DecidableEq K] {n : ℕ} (ar : AnnotatedRelation T K n) (v : Tuple T n) :

(∃ (w : K), (v, w) ∈ ↑(groupByKey ar)) ↔ v ∈ Multiset.map Prod.fst ar

Auxiliary: key set of groupByKey ar equals first-projection keys of ar.

theorem groupByKey_value {T K : Type} [ValueType T] [SemiringWithMonus K] [DecidableEq K] {n : ℕ} (ar : AnnotatedRelation T K n) (v : Tuple T n) (w : K) :

(v, w) ∈ ↑(groupByKey ar) → w = (Multiset.map Prod.snd (Multiset.filter (fun (p : AnnotatedTuple T K n) => p.1 = v) ar)).sum

Auxiliary: if (v, w) ∈ (groupByKey ar).val, then w is the semiring-sum of annotations of entries in ar with key v.

theorem groupByKey_multiset_eq {T K : Type} [ValueType T] [SemiringWithMonus K] [DecidableEq K] {n : ℕ} (ar : AnnotatedRelation T K n) :

↑↑(groupByKey ar) = Multiset.map (fun (v : Tuple T n) => (v, (Multiset.map Prod.snd (Multiset.filter (fun (p : AnnotatedTuple T K n) => p.1 = v) ar)).sum)) (Multiset.map Prod.fst ar).dedup

groupByKey ar, as a multiset, equals the dedup of the first-projection of ar, with each key paired with the semiring-sum of annotations sharing that key.

Helper lemmas for the `Diff` case of `rewriting_valid` #

theorem Multiset.dedup_eq_of_instances {α : Type u_1} (inst₁ inst₂ : DecidableEq α) (m : Multiset α) :

m.dedup = m.dedup

Multiset.dedup only depends on the DecidableEq instance up to subsingleton equality.

theorem Multiset.filter_eq_of_instances {α : Type u_1} (p : α → Prop) (inst₁ inst₂ : DecidablePred p) (m : Multiset α) :

filter p m = filter p m

Multiset.filter only depends on the DecidablePred instance up to subsingleton equality.

theorem Filter.eval_foldr_and_map {T : Type} [ValueType T] {N : ℕ} {α : Type u_1} (list : List α) (f : α → Filter T N) (t : Tuple T N) :

(List.foldr (fun (t t' : Filter T N) => t.And t') True (List.map f list)).eval t ↔ ∀ x ∈ list, (f x).eval t

Folded Filter.And over a mapped list is equivalent to the universal conjunction.

theorem Query.rewriting_valid_joinCond_eval {T K : Type} [ValueType T] [SemiringWithMonus K] [DecidableEq K] [HasAltLinearOrder K] {N n : ℕ} [NeZero N] (t : Tuple (T ⊕ K) N) :

(List.foldr (fun (t t' : Filter (T ⊕ K) N) => t.And t') Filter.True (List.map (fun (k : ℕ) => Filter.BT ((fun (x y : Term (T ⊕ K) N) => BoolTerm.EQ x y) #(Fin.ofNat N k) #(Fin.ofNat N (k + n + 1)))) (List.range n))).eval t ↔ ∀ (k : Fin n), t (Fin.ofNat N ↑k) = t (Fin.ofNat N (↑k + n + 1))

The folded join condition (#k == #(k+n+1)) for k ∈ List.range n evaluates true iff the tuple's values at indices ofNat k and ofNat (k+n+1) agree for every k < n.

theorem Query.rewriting_valid_find_getD_eq_sum {T K : Type} [ValueType T] [SemiringWithMonus K] [DecidableEq K] {n : ℕ} (ar : AnnotatedRelation T K n) (u : Tuple T n) :

(Option.map Prod.snd (List.find? (fun (x : Tuple T n × K) => decide (x.1 = u)) ↑(groupByKey ar))).getD 0 = (Multiset.map Prod.snd (Multiset.filter (fun (p : AnnotatedTuple T K n) => p.1 = u) ar)).sum

Semiring-sum over the filter, via groupByKey.find?-based lookup.

theorem Query.rewriting_valid_sub_inr {T K : Type} [ValueType T] [HasAltLinearOrder K] [SemiringWithMonus K] (a b : K) :

Sum.inr a - Sum.inr b = Sum.inr (a - b)

Subtraction distributes over Sum.inr in T⊕K.

theorem AnnotatedRelation.toComposite_proj_first_n {T K : Type} [ValueType T] {n : ℕ} (ar : AnnotatedRelation T K n) (h : n ≤ n + 1) :

Multiset.map (fun (u : Tuple (T ⊕ K) (n + 1)) (k : Fin n) => u (Fin.castLE h k)) ar.toComposite = Multiset.map (fun (v : Tuple T n) (k : Fin n) => Sum.inl (v k)) (Multiset.map Prod.fst ar)

Non-dedup form of dedup_toComposite_proj_first_n: the first-n projection of ar.toComposite is the Sum.inl-lift of the first-projection of ar.

theorem Sum.inl_lift_injective {T K : Type} {n : ℕ} :

Function.Injective fun (v : Tuple T n) (k : Fin n) => inl (v k)

Sum.inl-lift of tuples is injective.

theorem Multiset.filter_notMem_map_of_injective {α : Type u_1} {β : Type u_2} [DecidableEq α] [DecidableEq β] {f : α → β} (hf : Function.Injective f) (m s : Multiset α) :

filter (fun (b : β) => b ∉ map f s) (map f m) = map f (filter (fun (a : α) => a ∉ s) m)

Filtering by "not a member of an injective image" pulls through the map.

theorem AnnotatedTuple.toComposite_castLE {T K : Type} [Zero K] {n : ℕ} (p : AnnotatedTuple T K n) (k : Fin n) :

p.toComposite (Fin.castLE ⋯ k) = Sum.inl (p.1 k)

Helper: the data part Tuple.fromComposite and AnnotatedTuple.toComposite agree on data.

theorem AnnotatedTuple.toComposite_last {T K : Type} [Zero K] {n : ℕ} (p : AnnotatedTuple T K n) :

p.toComposite (Fin.last n) = Sum.inr p.2

The annotation part of p.toComposite is Sum.inr p.2.

theorem Tuple.fromComposite_toComposite {T K : Type} [ValueType T] [Zero K] {n : ℕ} (p : AnnotatedTuple T K n) :

fromComposite p.toComposite = p

Roundtrip: Tuple.fromComposite ∘ AnnotatedTuple.toComposite = id. The composite encoding loses no information: peeling the data columns and the annotation column back out reconstructs the original annotated tuple.

theorem AnnotatedRelation.map_fromComposite_toComposite {T K : Type} [ValueType T] [Zero K] {n : ℕ} (r : AnnotatedRelation T K n) :

Multiset.map Tuple.fromComposite r.toComposite = r

Pushforward version of Tuple.fromComposite_toComposite: mapping Tuple.fromComposite over a composite-encoded annotated relation recovers the original annotated relation.

theorem Query.rewriting_valid_diff_inner_dd {T K : Type} [ValueType T] [SemiringWithMonus K] [DecidableEq K] [HasAltLinearOrder K] {n : ℕ} (AR₁ AR₂ : AnnotatedRelation T K n) :

(Multiset.filter (fun (u : Tuple (T ⊕ K) n) => u ∉ Multiset.map (fun (u' : Tuple (T ⊕ K) (n + 1)) (k : Fin n) => u' (Fin.castLE ⋯ k)) AR₂.toComposite) (Multiset.map (fun (u' : Tuple (T ⊕ K) (n + 1)) (k : Fin n) => u' (Fin.castLE ⋯ k)) AR₁.toComposite)).dedup = Multiset.map (fun (v : Tuple T n) (k : Fin n) => Sum.inl (v k)) (Multiset.filter (fun (v : Tuple T n) => v ∉ Multiset.map Prod.fst AR₂) (Multiset.map Prod.fst AR₁)).dedup

Reduction of the inner Dedup ∘ Diff ∘ Proj block of the Diff rewriting: deduping the difference of first-n projections of AR₁.toComposite and AR₂.toComposite yields the Sum.inl-lift of the deduped "unmatched-keys" filter over the data part. Stated using Fin.castLE (function form) and dot notation (.dedup) so the LHS pattern matches what simp only [evaluate] produces in the Diff case of rewriting_valid.

theorem Relation.cast_eq_map {T : Type} {n m : ℕ} (h : n = m) (r : Relation T n) :

cast h r = Multiset.map (Tuple.cast h) r

Relation.cast rewrites to a Multiset.map of Tuple.cast.

theorem proj_outer_cast_append_eq_fst {α : Type} {n : ℕ} (h : n + 1 + n = 2 * n + 1) (p : Tuple α (n + 1)) (q : Tuple α n) :

(fun (k : Fin (n + 1)) => Tuple.cast h (Fin.append p q) (Fin.castLE ⋯ k)) = p

Projecting the first n+1 columns of Tuple.cast h (Fin.append p q) (for p : Tuple α (n+1), q : Tuple α n, h : n+1+n = 2*n+1) returns p.

theorem cast_append_at_ofNat_left {α : Type} {n : ℕ} (h : n + 1 + n = 2 * n + 1) (p : Tuple α (n + 1)) (q : Tuple α n) (k : Fin n) [NeZero (2 * n + 1)] :

Tuple.cast h (Fin.append p q) (Fin.ofNat (2 * n + 1) ↑k) = p (Fin.castLE ⋯ k)

Reading Tuple.cast h (Fin.append p q) at index Fin.ofNat _ k.val (for k : Fin n) returns p k.castSucc.

theorem cast_append_at_ofNat_right {α : Type} {n : ℕ} (h : n + 1 + n = 2 * n + 1) (p : Tuple α (n + 1)) (q : Tuple α n) (k : Fin n) [NeZero (2 * n + 1)] :

Tuple.cast h (Fin.append p q) (Fin.ofNat (2 * n + 1) (↑k + n + 1)) = q k

Reading Tuple.cast h (Fin.append p q) at index Fin.ofNat _ (k.val+n+1) (for k : Fin n) returns q k.

theorem selFilter_cast_append_iff {T K : Type} [ValueType T] [SemiringWithMonus K] [HasAltLinearOrder K] {n : ℕ} (h : n + 1 + n = 2 * n + 1) (p : Tuple (T ⊕ K) (n + 1)) (q : Tuple (T ⊕ K) n) [NeZero (2 * n + 1)] :

(List.foldr (fun (t t' : Filter (T ⊕ K) (2 * n + 1)) => t.And t') Filter.True (List.map (fun (k : ℕ) => Filter.BT ((fun (x y : Term (T ⊕ K) (2 * n + 1)) => BoolTerm.EQ x y) #(Fin.ofNat (2 * n + 1) k) #(Fin.ofNat (2 * n + 1) (k + n + 1)))) (List.range n))).eval (Tuple.cast h (Fin.append p q)) ↔ (fun (k : Fin n) => p (Fin.castLE ⋯ k)) = q

selFilter on Tuple.cast h (Fin.append p q) characterises the first-n projection equality between p and q.

theorem cast_append_2n2_at_ofNat_left {α : Type} {n : ℕ} (h : n + 1 + (n + 1) = 2 * n + 2) (p q : Tuple α (n + 1)) (k : Fin n) [NeZero (2 * n + 2)] :

Tuple.cast h (Fin.append p q) (Fin.ofNat (2 * n + 2) ↑k) = p (Fin.castLE ⋯ k)

Arity-(2n+2) analogue of cast_append_at_ofNat_left: reading Tuple.cast h (Fin.append p q) at index Fin.ofNat _ k.val (for k : Fin n) returns p (k.castLE (Nat.le_succ n)). Here q : Tuple α (n+1) (rather than Tuple α n).

theorem cast_append_2n2_at_ofNat_right {α : Type} {n : ℕ} (h : n + 1 + (n + 1) = 2 * n + 2) (p q : Tuple α (n + 1)) (k : Fin n) [NeZero (2 * n + 2)] :

Tuple.cast h (Fin.append p q) (Fin.ofNat (2 * n + 2) (↑k + n + 1)) = q (Fin.castLE ⋯ k)

Arity-(2n+2) analogue of cast_append_at_ofNat_right: reading Tuple.cast h (Fin.append p q) at index Fin.ofNat _ (k.val+n+1) (for k : Fin n) returns q (k.castLE (Nat.le_succ n)).

theorem cast_append_2n2_at_ofNat_n {α : Type} {n : ℕ} (h : n + 1 + (n + 1) = 2 * n + 2) (p q : Tuple α (n + 1)) [NeZero (2 * n + 2)] :

Tuple.cast h (Fin.append p q) (Fin.ofNat (2 * n + 2) n) = p (Fin.last n)

Arity-(2n+2) helper: reading Tuple.cast h (Fin.append p q) at index Fin.ofNat _ n returns p (Fin.last n).

theorem cast_append_2n2_at_last {α : Type} {n : ℕ} (h : n + 1 + (n + 1) = 2 * n + 2) (p q : Tuple α (n + 1)) :

Tuple.cast h (Fin.append p q) (Fin.last (2 * n + 1)) = q (Fin.last n)

Arity-(2n+2) helper: reading Tuple.cast h (Fin.append p q) at index Fin.last (2*n+1) (the last index of Fin (2*n+2)) returns q (Fin.last n).

theorem proj_outer_2n2_cast_append_eq_fst {α : Type} {n : ℕ} (h : n + 1 + (n + 1) = 2 * n + 2) (p q : Tuple α (n + 1)) (k : Fin (n + 1)) :

Tuple.cast h (Fin.append p q) (Fin.castLE ⋯ k) = p k

Arity-(2n+2) projection helper: reading Tuple.cast h (Fin.append p q) at index k.castLE _ (for k : Fin (n+1)) returns p k. This is the analogue of proj_outer_cast_append_eq_fst for the 2n+2 case (i.e., q : Tuple α (n+1)).

theorem selFilter_cast_append_2n2_iff {T K : Type} [ValueType T] [SemiringWithMonus K] [HasAltLinearOrder K] {n : ℕ} (h : n + 1 + (n + 1) = 2 * n + 2) (p q : Tuple (T ⊕ K) (n + 1)) [NeZero (2 * n + 2)] :

(List.foldr (fun (t t' : Filter (T ⊕ K) (2 * n + 2)) => t.And t') Filter.True (List.map (fun (k : ℕ) => Filter.BT ((fun (x y : Term (T ⊕ K) (2 * n + 2)) => BoolTerm.EQ x y) #(Fin.ofNat (2 * n + 2) k) #(Fin.ofNat (2 * n + 2) (k + n + 1)))) (List.range n))).eval (Tuple.cast h (Fin.append p q)) ↔ (fun (k : Fin n) => p (Fin.castLE ⋯ k)) = fun (k : Fin n) => q (Fin.castLE ⋯ k)

Arity-(2n+2) analogue of selFilter_cast_append_iff: the join condition on Tuple.cast h (Fin.append p q) with q : Tuple (T⊕K) (n+1) characterises equality of the first-n projections of p and q.

theorem AnnotatedRelation.toComposite_filter {T K : Type} [ValueType T] [Zero K] {n : ℕ} (ar : AnnotatedRelation T K n) (pred : AnnotatedTuple T K n → Prop) [DecidablePred pred] :

toComposite (Multiset.filter pred ar) = Multiset.filter (fun (t : Tuple (T ⊕ K) (n + 1)) => pred t.fromComposite) ar.toComposite

Filter pushes through AnnotatedRelation.toComposite via the Tuple.fromComposite ∘ AnnotatedTuple.toComposite = id roundtrip: filtering before taking the composite encoding equals filtering the composite encoding by the same predicate composed with Tuple.fromComposite.

theorem Multiset.semijoin_proj_eq_filter {α : Type u_1} {β : Type u_2} [DecidableEq β] (r : Multiset α) (s : Multiset β) (g : α → β) (hs : s.Nodup) :

map Prod.fst (filter (fun (pair : α × β) => g pair.1 = pair.2) (r.product s)) = filter (fun (a : α) => g a ∈ s) r

Semijoin reduction. Given multisets r : Multiset α and s : Multiset β and a key function g : α → β, with s Nodup, the projection-after-filter of the cartesian product (keeping pairs whose g-image matches) coincides with filtering r to those a whose g a belongs to s. This is the multiset version of the relational semijoin and is the structural identity behind the unmatched_eq half of the Diff-case rewriting correctness.

theorem Multiset.semijoin_keyed_proj_eq_filter {α : Type u_1} {γ : Type u_2} {δ : Type u_3} {β : Type u_4} [DecidableEq β] (r : Multiset α) (S : Multiset β) (val : β → γ) (key_r : α → β) (key_s : γ → β) (mk : α → γ → δ) (hS : S.Nodup) (h_val : ∀ v ∈ S, key_s (val v) = v) :

map (fun (pair : α × γ) => mk pair.1 pair.2) (filter (fun (pair : α × γ) => key_r pair.1 = key_s pair.2) (r.product (map val S))) = map (fun (a : α) => mk a (val (key_r a))) (filter (fun (a : α) => key_r a ∈ S) r)

Keyed-projection semijoin. Generalizes Multiset.semijoin_proj_eq_filter in two directions: the right multiset is the image S.map val of a Nodup keyset S under a value function val : β → γ, and the projection is an arbitrary mk : α → γ → δ rather than Prod.fst. The compatibility hypothesis h_val asserts that key_s ∘ val is the identity on S (i.e., val reconstructs an element whose key_s-image is the original key). The semijoin then reduces to filtering r by key_r a ∈ S and projecting through mk a (val (key_r a)) (the unique matching γ-value). This is the structural identity behind the matched_eq half of the Diff-case rewriting correctness.

theorem Query.evaluate_agg_rewriting_eq {T K : Type} [ValueType T] [SemiringWithMonus K] [DecidableEq K] [HasAltLinearOrder K] {n : ℕ} (q : Query T n) (hq : q.noAgg) (d : AnnotatedDatabase T K) (ih : (q.evaluateAnnotated hq d).toComposite = (q.rewriting hq).evaluate d.toComposite) :

(Agg (fun (k : Fin n) => Fin.castLE ⋯ k) ![#(Fin.last n)] ![AggFunc.sum ] (q.rewriting hq)).evaluate d.toComposite = Multiset.map (fun (v : Tuple T n) => AnnotatedTuple.toComposite (v, (Multiset.map Prod.snd (Multiset.filter (fun (p : AnnotatedTuple T K n) => p.1 = v) (q.evaluateAnnotated hq d))).sum)) (Multiset.map Prod.fst (q.evaluateAnnotated hq d)).dedup

The Agg of q.rewriting (the inner aggregation used in both the Dedup and Diff rewritings) evaluates to a map over the deduped data-projection of the inner annotated relation, with each row paired (via AnnotatedTuple.toComposite) with the semiring sum of the matching annotations.

theorem Query.rewriting_valid_diff_inner_dd_inst {T K : Type} [ValueType T] [SemiringWithMonus K] [DecidableEq K] [HasAltLinearOrder K] {n : ℕ} (AR₁ AR₂ : AnnotatedRelation T K n) (instA : DecidableEq (Tuple (T ⊕ K) n)) (instDP : DecidablePred fun (u : Tuple (T ⊕ K) n) => u ∉ Multiset.map (fun (u' : Tuple (T ⊕ K) (n + 1)) (k : Fin n) => u' (Fin.castLE ⋯ k)) AR₂.toComposite) :

(Multiset.filter (fun (u : Tuple (T ⊕ K) n) => u ∉ Multiset.map (fun (u' : Tuple (T ⊕ K) (n + 1)) (k : Fin n) => u' (Fin.castLE ⋯ k)) AR₂.toComposite) (Multiset.map (fun (u' : Tuple (T ⊕ K) (n + 1)) (k : Fin n) => u' (Fin.castLE ⋯ k)) AR₁.toComposite)).dedup = Multiset.map (fun (v : Tuple T n) (k : Fin n) => Sum.inl (v k)) (Multiset.filter (fun (v : Tuple T n) => v ∉ Multiset.map Prod.fst AR₂) (Multiset.map Prod.fst AR₁)).dedup

Instance-polymorphic restatement of Query.rewriting_valid_diff_inner_dd. Inside the Diff case of rewriting_valid, Lean's instance synthesis picks inconsistent DecidableEq (T⊕K) instances at different positions in the goal: the inner Multiset.dedup is elaborated with LinearOrder.toDecidableEq (via ValueType (T⊕K)), while the surrounding Multiset.filter's decidableMem uses instDecidableEqSum. This wrapper accepts both as explicit parameters and bridges to the canonical helper via Subsingleton.elim.

theorem Query.rewriting_valid {T K : Type} {n : ℕ} [ValueType T] [SemiringWithMonus K] [DecidableEq K] [HasAltLinearOrder K] (q : Query T n) (hq : q.noAgg) (d : AnnotatedDatabase T K) :

(q.evaluateAnnotated hq d).toComposite = (q.rewriting hq).evaluate d.toComposite

(R5) Rewriting of top-level aggregation #

The aggregation rewriting rule (R5) of Sen, Maniu & Senellart, ProvSQL:

γ_{i₁,…,iₘ}[t₁ : f₁, …, tₙ : fₙ](q) is rewritten to γ_{i₁,…,iₘ}[t₁ * #(k+1) : f̂₁, …, tₙ * #(k+1) : f̂ₙ, #(k+1) : δ(⊕)](q).

Unlike (R1)–(R4), which keep the rewriting target in Query (T ⊕ K) and the standard evaluate semantics, (R5)'s rewritten query is interpreted in the K-semimodule V_K – the per-column term t_j * #(k+1) evaluates to a K-tensor monomial α ⊗ v_j, not to a plain T ⊕ K value. The companion evaluator Query.evaluateInVK (in Provenance.QueryEvaluateInVK) carries that interpretation.

Query.rewritingAgg here implements the rewriting syntactically as a Query (T ⊕ K). Its semantic correctness – the analogue of rewriting_valid stating that ⟪Agg ...⟫_Î matches evaluateInVK (rewritingAgg ...) Î.toComposite – is proved as the (R5) case of Query.rewriting_valid_full (in Provenance.QueryEvaluateInVK), packaged together with the (R1)–(R4) correctness. The R4 sorries in Query.rewriting_valid for the diff case are carried over there as the only remaining gap.

def Query.rewritingAgg {T K : Type} [ValueType T] {m n₁ n₂ : ℕ} (is : Tuple (Fin m) n₁) (ts : Tuple (Term T m) n₂) (as : Tuple AggFunc n₂) (q_inner : Query T m) (hq_inner : q_inner.noAgg) :

Query (T ⊕ K) (n₁ + n₂ + 1)

(R5) Top-level aggregation rewriting. Produces a plain Query (T ⊕ K) representing γ_{is}[t_j * #(k+1) : f̂_j, #(k+1) : δ(⊕)](q.rewriting).

The inner query q is required to be noAgg (the ICDE paper imposes aggregation-at-root); q.rewriting handles its (R1)–(R4) rewriting and the resulting query operates on tuples of arity m+1 (the original m data columns plus one annotation column). The output Agg has n₁ grouping columns, n₂+1 aggregated columns (the original n₂ plus the δ(⊕) annotation column at the end), and arity n₁ + n₂ + 1.

Equations

One or more equations did not get rendered due to their size.

Instances For

Unified rewriting for queries with at-most-top-level aggregation #

def Query.wellFormed {T : Type} {n : ℕ} :

Query T n → Prop

A query is well-formed for rewriting if it has no aggregation at all (corresponding to the (R1)–(R4) cases) or if it is a top-level aggregation whose inner query has no aggregation (the (R5) case). This is the structural precondition imposed by the ICDE paper, which restricts the aggregation operator to the root of a query plan.

Equations

(Query.Agg a a_1 a_2 q_inner).wellFormed = q_inner.noAgg
x✝.wellFormed = x✝.noAgg

Instances For

def Query.rewritingFull {T K : Type} [ValueType T] {n : ℕ} (q : Query T n) :

q.wellFormed → Query (T ⊕ K) (n + 1)

Unified rewriting: dispatches between the (R1)–(R4) rewriting for non-aggregating queries and the (R5) rewriting for top-level aggregations. The single function realises the rewriting rules (R1)–(R5) of Sen, Maniu & Senellart together.

Equations

(Query.Agg is ts as q_inner).rewritingFull h = Query.rewritingAgg is ts as q_inner h
(Query.Rel x s).rewritingFull h = (Query.Rel x s).rewriting h
(Π ts q').rewritingFull h = (Π ts q').rewriting h
(σ φ q').rewritingFull h = (σ φ q').rewriting h
(q₁ × q₂).rewritingFull h = (q₁ × q₂).rewriting h
(q₁ ⊎ q₂).rewritingFull h = (q₁ ⊎ q₂).rewriting h
(ε q').rewritingFull h = (ε q').rewriting h
(q₁ - q₂).rewritingFull h = (q₁ - q₂).rewriting h

Instances For