CSE-4/562 Spring 2021 - Relational Algebra Equivalences

### Relational Algebra Equivalences

#### CSE-4/562 Spring 2021

February 11, 2021

### The running theme

If X and Y are equivalent and Y is better,
then replace all Xs with Ys

Today's focus: Provable Equivalence for RA Expressions

### Equivalence

$$Q_1 = \pi_{A}\left( \sigma_{c}( R ) \right)$$ $$Q_2 = \sigma_{c}\left( \pi_{A}( R ) \right)$$
$$Q_1 \stackrel{?}{\equiv} Q_2$$

### Ground Rules

Only Instances Matter
Obviously $Q_1 \neq Q_2$. What we care about is whether $Q_1(R) = Q_2(R)$...
Data Independent
... for all valid input data $R$.
However, it's fair to talk about equivalence when we know the data has some properties. (more on this later)
Data-Model Dependent
It's important to be clear whether we're talking about sets, bags, or lists.
Attribute Order Doesn't matter
A, B, C is the same as C, B, A

### In summary...

We say that $Q_1 \equiv Q_2$ if and only if
we can guarantee that the bag of tuples produced by $Q_1(R, S, T, \ldots)$
is the same as the bag of tuples produced by $Q_2(R, S, T, \ldots)$
for any combination of valid inputs $R, S, T, \ldots$.

... that satisfy any necessary properties.

### Starting Rules

Selection
$\sigma_{c_1 \wedge c_2}(R) \equiv \sigma_{c_1}(\sigma_{c_2}(R))$ (Decomposability)
Projection
$\pi_{A}(R) \equiv \pi_{A}(\pi_{A \cup B}(R))$ (Idempotence)
Cross Product
$R \times (S \times T) \equiv (R \times S) \times T$ (Associativity)
$R \times S \equiv S \times R$ (Commutativity)
Union
$R \cup (S \cup T) \equiv (R \cup S) \cup T$ (Associativity)
$R \cup S \equiv S \cup R$ (Commutativity)

### Try it!

Show that $$R \times (S \times T) \equiv T \times (S \times R)$$

Show that $$\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$$

Show that $$R \bowtie_{c} S \equiv S \bowtie_{c} R$$

Show that $$\sigma_{R.B = S.B \wedge R.A > 3}(R \times S) \equiv \sigma_{R.A > 3}(R \bowtie_{B} S)$$

### Cross-Operator Rules

Selection + Projection
$\pi_{A}(\sigma_{c}(R)) \equiv \sigma_{c}(\pi_{A}(R))$ (Commutativity)

... but only if $A$ and $c$ are compatible

$A$ must include all columns referenced by $c$ ($cols(c)$)

### Try it!

Show that $$\pi_A(\sigma_c(R)) \equiv \pi_A(\sigma_c(\pi_{(A \cup cols(c))}(R)))$$

Selection + Cross Product
$\sigma_c(R \times S) \equiv (\sigma_{c}(R)) \times S$ (Commutativity)

... but only if $c$ references only columns of $R$

$cols(c) \subseteq cols(R)$

### Try it!

Show that $$\sigma_{R.B = S.B \wedge R.A > 3}(R \times S) \equiv (\sigma_{R.A > 3}(R)) \bowtie_{B} S$$

When is this rewrite a good idea?

Projection + Cross Product
$\pi_A(R \times S) \equiv (\pi_{A_R}(R)) \times (\pi_{A_S}(S))$ (Commutativity)

... where $A_R$ and $A_S$ are the columns of $A$ from $R$ and $S$ respectively.

$A_R = A \cap cols(R)$     $A_S = A \cap cols(S)$

### Try it!

Show that $$\pi_{A}(R \bowtie_c S) \equiv (\pi_{A_R}(R)) \bowtie_c (\pi_{A_S}(S))$$

When does this condition hold?

Intersection
$R \cap (S \cap T) \equiv (R \cap S) \cap T$ (Associativity)
$R \cap S \equiv S \cap R$ (Commutativity)
Selection +
$\sigma_c(R \cup S) \equiv (\sigma_c(R)) \cup (\sigma_c(R))$ (Commutativity)
$\sigma_c(R \cap S) \equiv (\sigma_c(R)) \cap (\sigma_c(R))$ (Commutativity)
Projection + Union
$\pi_A(R \cup S) \equiv (\pi_A(R)) \cup (\pi_A(R))$ (Commutativity)
Cross Product + Union
$R \times (S \cup T) \equiv (R \times S) \cup (R \times T)$ (Distributivity)

### Example


SELECT R.A, T.E

FROM R, S, T

WHERE R.B = S.B
AND S.C < 5
AND S.D = T.D             ### General Query Optimizers

Input: Dumb translation of SQL to RA

⬇︎

Apply rewrites

⬇︎

Output: Better, but equivalent query

Which rewrite rules should we apply?

Selection Pushdown
Always commute Selections as close to the leaves as possible.
Join Construction
Joins are always better than cross-products. (if there's a good join algorithm)
(Optional) Projection Pushdown
Commuting Projections down to the leaves removes redundant columns, and may be beneficial for some systems.
Join Algorithm Selection
Joins can be implemented differently, depending on the join predicate.
Join/Union Ordering
The order in which joins are evaluated may affect query runtimes.
Access Paths
$(\sigma_c(R))$ and $(Q(\ldots) \bowtie_c R)$ are special cases that we can make fast!

Some rewrites are situational... we need more information to decide when to apply them.

### General Query Optimizers

1. Apply blind heuristics (e.g., push down selections)
2. Enumerate all possible execution plans by varying (or for a reasonable subset)
• Join/Union Evaluation Order (commutativity, associativity, distributivity)
• Algorithms for Joins, Aggregates, Sort, Distinct, and others
• Data Access Paths
3. Estimate the cost of each execution plan
4. Pick the execution plan with the lowest cost (or best risk/reward)

## In Spark...

$$\sigma_{c}(\pi_{A}(R)) \Rightarrow \pi_{A}(\sigma_{c}(R))$$

(note: $c$ is always compatible in this direction)


plan.transform {
case Filter(condition, Project(columns, child)) =>
Project(columns, Filter(condition, child))
}


match/case lets you find patterns.

transform lets you apply rewrite rules.

(Slight oversimplification since Spark uses extended relational algebra)

What happens if I apply this rewrite to:


Filter(condition, Project(columns1, Project(columns2, child)))

↕ $$\sigma_c(\pi_{A_1}(\pi_{A_2}(R)))$$

⇓ $$\pi_{A_1}(\sigma_c(\pi_{A_2}(R)))$$

### Naive Solution


var last = null
while( ! plan.equals(last) ){
last = plan
plan = plan.transform { ... }
}


Repeat until we reach a "fixed-point"

### Spark


plan.transformDown {
case Filter(condition, Project(columns, child)) =>
Project(columns, Filter(condition, child))
}


transformUp: Require bottom-up tree traversal.

transformDown: Require top-down tree traversal.

### Spark

$$\sigma_c(R \cup S) \Rightarrow (\sigma_c(R)) \cup (\sigma_c(R))$$

plan.transformDown {
case Filter(condition, Union(children, /* other goop */)) =>
Union(
children.map { child =>
Filter(condition, child)
},
/* other goop */
)
}

Next time on Bat-Database Systems:
Checkpoint 1 overview.