February 11, 2021
If X and Y are equivalent and Y is better,
then replace all Xs with Ys
Today's focus: Provable Equivalence for RA Expressions
We say that Q1≡Q2 if and only if
we can guarantee that the bag of tuples produced by Q1(R,S,T,…)
is the same as the bag of tuples produced by Q2(R,S,T,…)
for any combination of valid inputs R,S,T,….
... that satisfy any necessary properties.
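The definition above compares bags, so duplicates count. As a minimal sketch (the row encoding, the sample data, and the names `bag` and `sameBag` are assumptions made for illustration), bag equality can be checked on one concrete input. A check like this can refute an equivalence, but it cannot prove one, since equivalence must hold for every valid input.

```scala
// A row is modeled as column name -> integer value (an assumption for illustration).
type Row = Map[String, Int]

// A bag is a multiset: each tuple maps to its number of occurrences.
def bag(rows: Seq[Row]): Map[Row, Int] =
  rows.groupBy(identity).map { case (row, copies) => row -> copies.size }

// Two results are bag-equal when every tuple occurs equally often in both.
def sameBag(a: Seq[Row], b: Seq[Row]): Boolean = bag(a) == bag(b)

// Hypothetical sample input containing a duplicate tuple.
val r: Seq[Row] = Seq(
  Map("A" -> 1, "B" -> 10),
  Map("A" -> 5, "B" -> 10),
  Map("A" -> 5, "B" -> 10),
  Map("A" -> 7, "B" -> 20)
)

val c1: Row => Boolean = row => row("A") > 3
val c2: Row => Boolean = row => row("B") == 10

// Q1 = σ_{c1 ∧ c2}(R) and Q2 = σ_{c1}(σ_{c2}(R))
val q1 = r.filter(row => c1(row) && c2(row))
val q2 = r.filter(c2).filter(c1)
```

Here `sameBag(q1, q2)` holds, while `sameBag(q1, q1.distinct)` does not: dropping a duplicate changes the bag even though the set of tuples is unchanged.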
Selection | |
---|---|
σc1∧c2(R)≡σc1(σc2(R)) | (Decomposability) |
Projection | |
πA(R)≡πA(πA∪B(R)) | (Idempotence) |
Cross Product | |
R×(S×T)≡(R×S)×T | (Associativity) |
R×S≡S×R | (Commutativity) |
Union | |
R∪(S∪T)≡(R∪S)∪T | (Associativity) |
R∪S≡S∪R | (Commutativity) |
Show that R×(S×T)≡T×(S×R)
Show that σc1(σc2(R))≡σc2(σc1(R))
Show that R⋈cS≡S⋈cR
Show that σR.B=S.B∧R.A>3(R×S)≡σR.A>3(R⋈BS)
Selection + Projection | |
---|---|
πA(σc(R))≡σc(πA(R)) | (Commutativity) |
... but only if A and c are compatible
A must include all columns referenced by c (cols(c))
Show that πA(σc(R))≡πA(σc(π(A∪cols(c))(R)))
Selection + Cross Product | |
---|---|
σc(R×S)≡(σc(R))×S | (Commutativity) |
... but only if c references only columns of R
cols(c)⊆cols(R)
Show that σR.B=S.B∧R.A>3(R×S)≡(σR.A>3(R))⋈BS
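The selection + cross product rule can be tried out on small in-memory bags. This is a sketch only: the qualified column names, the sample rows, and the helpers `cross` and `bag` are assumptions, not part of any library.

```scala
// Rows are modeled as qualified column name -> value (an assumption for illustration).
type Row = Map[String, Int]

// R × S: pair every row of R with every row of S (column names assumed disjoint).
def cross(r: Seq[Row], s: Seq[Row]): Seq[Row] =
  for (x <- r; y <- s) yield x ++ y

// Bag view of a result: tuple -> multiplicity.
def bag(rows: Seq[Row]): Map[Row, Int] =
  rows.groupBy(identity).map { case (row, copies) => row -> copies.size }

// Hypothetical inputs.
val r: Seq[Row] = Seq(
  Map("R.A" -> 1, "R.B" -> 10),
  Map("R.A" -> 4, "R.B" -> 10),
  Map("R.A" -> 9, "R.B" -> 20)
)
val s: Seq[Row] = Seq(
  Map("S.B" -> 10, "S.C" -> 2),
  Map("S.B" -> 20, "S.C" -> 7)
)

// c references only R's columns, so cols(c) ⊆ cols(R) holds.
val c: Row => Boolean = row => row("R.A") > 3

val filterAfter  = cross(r, s).filter(c) // σ_c(R × S): filters 6 rows
val filterBefore = cross(r.filter(c), s) // (σ_c(R)) × S: filters 3 rows, then pairs
```

Both forms produce the same bag, but the second filters before the cross product and so builds fewer intermediate rows. If `c` referenced a column of S, `r.filter(c)` would fail on the missing key, which is the side condition cols(c)⊆cols(R) showing up in code.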
Projection + Cross Product | |
---|---|
πA(R×S)≡(πAR(R))×(πAS(S)) | (Commutativity) |
... where AR and AS are the columns of A from R and S respectively.
AR=A∩cols(R) AS=A∩cols(S)
Show that πA(R⋈cS)≡(πAR(R))⋈c(πAS(S)), assuming cols(c)⊆A
Intersection | |
---|---|
R∩(S∩T)≡(R∩S)∩T | (Associativity) |
R∩S≡S∩R | (Commutativity) |
Selection + Union / Intersection | |
σc(R∪S)≡(σc(R))∪(σc(S)) | (Commutativity) |
σc(R∩S)≡(σc(R))∩(σc(S)) | (Commutativity) |
Projection + Union | |
πA(R∪S)≡(πA(R))∪(πA(S)) | (Commutativity) |
Cross Product + Union | |
R×(S∪T)≡(R×S)∪(R×T) | (Distributivity) |
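Distributivity of cross product over union can be checked on sample bags. In this sketch, bag union is modeled as list concatenation so that duplicates survive; the data and helper names are assumptions for illustration.

```scala
// Rows are modeled as qualified column name -> value (an assumption for illustration).
type Row = Map[String, Int]

// R × S: pair every row of R with every row of S.
def cross(r: Seq[Row], s: Seq[Row]): Seq[Row] =
  for (x <- r; y <- s) yield x ++ y

// Bag view of a result: tuple -> multiplicity.
def bag(rows: Seq[Row]): Map[Row, Int] =
  rows.groupBy(identity).map { case (row, copies) => row -> copies.size }

// Hypothetical inputs; S and T share a schema and one tuple.
val r: Seq[Row] = Seq(Map("R.A" -> 1), Map("R.A" -> 2))
val s: Seq[Row] = Seq(Map("S.B" -> 10))
val t: Seq[Row] = Seq(Map("S.B" -> 10), Map("S.B" -> 30))

// Bag union keeps duplicates, so it is modeled here as concatenation.
val lhs = cross(r, s ++ t)            // R × (S ∪ T)
val rhs = cross(r, s) ++ cross(r, t)  // (R × S) ∪ (R × T)
```

The tuple (R.A=1, S.B=10) appears twice on both sides, once via S and once via T, so the two bags agree including multiplicities.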
SELECT R.A, T.E
FROM R, S, T
WHERE R.B = S.B
AND S.C < 5
AND S.D = T.D
➔
Input: Dumb translation of SQL to RA
⬇︎
Apply rewrites
⬇︎
Output: Better, but equivalent query
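As a worked illustration, here is one possible rewrite sequence for the SQL query above, using only the rules listed earlier. It assumes `FROM R, S, T` is translated to (R×S)×T and that each qualified column belongs to the relation named in its prefix; the source does not state which output plan is intended, so this is just one candidate.

```latex
\begin{align*}
Q &= \pi_{R.A,\,T.E}\big(\sigma_{R.B=S.B \,\wedge\, S.C<5 \,\wedge\, S.D=T.D}((R \times S) \times T)\big) \\
  &\equiv \pi_{R.A,\,T.E}\big(\sigma_{S.D=T.D}(\sigma_{R.B=S.B}(\sigma_{S.C<5}((R \times S) \times T)))\big)
    && \text{(decomposability of } \sigma\text{)} \\
  &\equiv \pi_{R.A,\,T.E}\big(\sigma_{S.D=T.D}(\sigma_{R.B=S.B}(\sigma_{S.C<5}(R \times S)) \times T)\big)
    && \text{(push } \sigma \text{ through } \times\text{)} \\
  &\equiv \pi_{R.A,\,T.E}\big(\sigma_{S.D=T.D}(\sigma_{R.B=S.B}(R \times \sigma_{S.C<5}(S)) \times T)\big)
    && \text{(commute } \times\text{, push } \sigma\text{)} \\
  &\equiv \pi_{R.A,\,T.E}\big((R \bowtie_{R.B=S.B} \sigma_{S.C<5}(S)) \bowtie_{S.D=T.D} T\big)
    && \text{(}\sigma_c(X \times Y) \equiv X \bowtie_c Y\text{)}
\end{align*}
```

Each step needs its side condition: S.C<5 and R.B=S.B reference only columns of R×S, and S.C<5 references only columns of S.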
Which rewrite rules should we apply?
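Some rewrites are situational... we need more information to decide when to apply them.
(note: when pushing a selection below a projection, σc(πA(R)) → πA(σc(R)), c is always compatible, since c can only reference columns that survive πA)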
plan.transform {
case Filter(condition, Project(columns, child)) =>
Project(columns, Filter(condition, child))
}
match/case lets you find patterns.
transform lets you apply rewrite rules.
(Slight oversimplification since Spark uses extended relational algebra)
What happens if I apply this rewrite to:
Filter(condition, Project(columns1, Project(columns2, child)))
↕
σc(πA1(πA2(R)))
⇓ πA1(σc(πA2(R)))
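var last: LogicalPlan = null  // typed explicitly; a bare null would infer type Null
while (plan != last) {        // plan must be a var; != is null-safe
last = plan
plan = plan.transform { ... }
}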
Repeat until we reach a "fixed-point"
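The fixed-point idea can be sketched without Spark on a toy plan tree. The node classes below are hypothetical stand-ins for Spark's logical operators, and `rewriteOnce` is an assumed single bottom-up pass of the filter/project rule, not Spark's own implementation.

```scala
// Toy plan nodes, hypothetical stand-ins for Spark's logical operators.
sealed trait Plan
case class Table(name: String) extends Plan
case class Project(columns: Seq[String], child: Plan) extends Plan
case class Filter(condition: String, child: Plan) extends Plan

// One bottom-up pass: rewrite the children first, then try the rule on the rebuilt node.
def rewriteOnce(plan: Plan): Plan = {
  val withNewChildren = plan match {
    case Project(cols, child) => Project(cols, rewriteOnce(child))
    case Filter(cond, child)  => Filter(cond, rewriteOnce(child))
    case leaf                 => leaf
  }
  withNewChildren match {
    case Filter(cond, Project(cols, child)) => Project(cols, Filter(cond, child))
    case other                              => other
  }
}

// Re-run the pass until the plan stops changing.
// Case classes compare structurally, so != detects any change.
def fixedPoint(plan: Plan): Plan = {
  var current = plan
  var last: Plan = null
  while (current != last) {
    last = current
    current = rewriteOnce(current)
  }
  current
}

val start   = Filter("c", Project(Seq("A1"), Project(Seq("A2"), Table("R"))))
val onePass = rewriteOnce(start) // πA1(σc(πA2(R)))
val done    = fixedPoint(start)  // πA1(πA2(σc(R)))
```

A single pass moves the filter below only one projection; the loop runs two more passes, one to push it below the second projection and one to confirm nothing changes.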
plan.transformDown {
case Filter(condition, Project(columns, child)) =>
Project(columns, Filter(condition, child))
}
transformUp: applies the rule bottom-up (children are rewritten before their parent).
transformDown: applies the rule top-down (a node is rewritten before its children).
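To see why traversal order matters for this rule, here is a toy re-implementation of the two traversals on a hypothetical plan tree. The functions borrow the names `transformUp` and `transformDown` but are assumptions written for illustration, not Spark's code.

```scala
// Toy plan nodes, hypothetical stand-ins for Spark's logical operators.
sealed trait Plan
case class Table(name: String) extends Plan
case class Project(columns: Seq[String], child: Plan) extends Plan
case class Filter(condition: String, child: Plan) extends Plan

type Rule = PartialFunction[Plan, Plan]

// Rebuild a node with f applied to its child (leaves have none).
def mapChildren(plan: Plan)(f: Plan => Plan): Plan = plan match {
  case Project(cols, child) => Project(cols, f(child))
  case Filter(cond, child)  => Filter(cond, f(child))
  case leaf                 => leaf
}

// Top-down: try the rule on this node, then recurse into the children of the result.
def transformDown(plan: Plan)(rule: Rule): Plan = {
  val rewritten = rule.applyOrElse(plan, (p: Plan) => p)
  mapChildren(rewritten)(child => transformDown(child)(rule))
}

// Bottom-up: recurse into the children first, then try the rule on the rebuilt node.
def transformUp(plan: Plan)(rule: Rule): Plan = {
  val rebuilt = mapChildren(plan)(child => transformUp(child)(rule))
  rule.applyOrElse(rebuilt, (p: Plan) => p)
}

val pushFilter: Rule = {
  case Filter(cond, Project(cols, child)) => Project(cols, Filter(cond, child))
}

val start = Filter("c", Project(Seq("A1"), Project(Seq("A2"), Table("R"))))
val up    = transformUp(start)(pushFilter)   // πA1(σc(πA2(R)))
val down  = transformDown(start)(pushFilter) // πA1(πA2(σc(R)))
```

Top-down works well here because each rewrite pushes the filter one level deeper, exactly where the traversal visits next. Bottom-up has already finished with the lower nodes by the time the rule fires at the root.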
plan.transformDown {
case Filter(condition, Union(children, /* other goop */)) =>
Union(
children.map { child =>
Filter(condition, child)
},
/* other goop */
)
}
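The same rule can be sketched on a toy tree to show the result on a nested union. The `Union` node here is a simplified, hypothetical version that carries only its children and omits the "other goop".

```scala
// Toy plan nodes; Union carries only its children in this simplified sketch.
sealed trait Plan
case class Table(name: String) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
case class Union(children: Seq[Plan]) extends Plan

// Top-down rewrite: σc(R ∪ S ∪ ...) becomes σc(R) ∪ σc(S) ∪ ...
def pushFilterThroughUnion(plan: Plan): Plan = plan match {
  case Filter(cond, Union(children)) =>
    // Copy the filter onto every branch, then keep pushing inside each branch.
    Union(children.map(child => pushFilterThroughUnion(Filter(cond, child))))
  case Filter(cond, child) =>
    Filter(cond, pushFilterThroughUnion(child))
  case Union(children) =>
    Union(children.map(pushFilterThroughUnion))
  case leaf =>
    leaf
}

val nested = Filter("c", Union(Seq(Table("R"), Union(Seq(Table("S"), Table("T"))))))
val pushed = pushFilterThroughUnion(nested)
```

After the rewrite, every base table sits directly under its own copy of the filter, including the tables inside the inner union.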