CSE-4/562 Spring 2019 - Relational Algebra Equivalences

February 6, 2019

### The running theme

If X and Y are equivalent and Y is better,
then replace all Xs with Ys

Today's focus: Provable Equivalence for RA Expressions

### Equivalence

$$Q_1 = \pi_{A}\left( \sigma_{c}( R ) \right)$$ $$Q_2 = \sigma_{c}\left( \pi_{A}( R ) \right)$$
$$Q_1 \stackrel{?}{\equiv} Q_2$$

### Ground Rules

Only Relational Values Matter
Obviously $Q_1 \neq Q_2$. What we care about is whether $Q_1(R) = Q_2(R)$...
Data Independent
... for all valid input data $R$.
However, it's fair to talk about equivalence when we know the data has some properties. (more on this later)
Data-Model Dependent
It's important to be clear whether we're talking about sets, bags, or lists.

### In summary...

We say that $Q_1 \equiv Q_2$ if and only if
we can guarantee that the bag of tuples produced by $Q_1(R, S, T, \ldots)$
is the same as the bag of tuples produced by $Q_2(R, S, T, \ldots)$
for any combination of valid inputs $R, S, T, \ldots$.

... that satisfy any necessary properties.

### Starting Rules

Selection
$\sigma_{c_1 \wedge c_2}(R) \equiv \sigma_{c_1}(\sigma_{c_2}(R))$ (Decomposability)
Projection
$\pi_{A}(R) \equiv \pi_{A}(\pi_{A \cup B}(R))$ (Idempotence)
Cross Product
$R \times (S \times T) \equiv (R \times S) \times T$ (Associativity)
$R \times S \equiv S \times R$ (Commutativity)
Union
$R \cup (S \cup T) \equiv (R \cup S) \cup T$ (Associativity)
$R \cup S \equiv S \cup R$ (Commutativity)

### Try it!

Show that $$R \times (S \times T) \equiv T \times (S \times R)$$

Show that $$\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$$

Show that $$R \bowtie_{c} S \equiv S \bowtie_{c} R$$

Show that $$\sigma_{R.B = S.B \wedge R.A > 3}(R \times S) \equiv \sigma_{R.A > 3}(R \bowtie_{B} S)$$

### Rules for Multiple Operators

Selection + Projection
$\pi_{A}(\sigma_{c}(R)) \equiv \sigma_{c}(\pi_{A}(R))$ (Commutativity)

... but only if $A$ and $c$ are compatible

$A$ must include all columns referenced by $c$ ($cols(c)$)

### Try it!

Show that $$\pi_A(\sigma_c(R)) \equiv \pi_A(\sigma_c(\pi_{(A \cup cols(c))}(R)))$$

Selection + Cross Product
$\sigma_c(R \times S) \equiv (\sigma_{c}(R)) \times S$ (Commutativity)

... but only if $c$ references only columns of $R$

$cols(c) \subseteq cols(R)$

### Try it!

Show that $$\sigma_{R.B = S.B \wedge R.A > 3}(R \times S) \equiv (\sigma_{R.A > 3}(R)) \bowtie_{B} S$$

When is this rewrite a good idea?

Projection + Cross Product
$\pi_A(R \times S) \equiv (\pi_{A_R}(R)) \times (\pi_{A_S}(S))$ (Commutativity)

... where $A_R$ and $A_S$ are the columns of $A$ from $R$ and $S$ respectively.

$A_R = A \cap cols(R)$     $A_S = A \cap cols(S)$

### Try it!

Show that $$\pi_{A}(R \bowtie_c S) \equiv (\pi_{A_R}(R)) \bowtie_c (\pi_{A_S}(S))$$

When does this condition hold?

Intersection
$R \cap (S \cap T) \equiv (R \cap S) \cap T$ (Associativity)
$R \cap S \equiv S \cap R$ (Commutativity)
Selection +
$\sigma_c(R \cup S) \equiv (\sigma_c(R)) \cup (\sigma_c(R))$ (Commutativity)
$\sigma_c(R \cap S) \equiv (\sigma_c(R)) \cap (\sigma_c(R))$ (Commutativity)
Projection +
$\pi_A(R \cup S) \equiv (\pi_A(R)) \cup (\pi_A(R))$ (Commutativity)
$\pi_A(R \cap S) \equiv (\pi_A(R)) \cap (\pi_A(R))$ (Commutativity)
Cross Product + Union
$R \times (S \cup T) \equiv (R \times S) \cup (R \times T)$ (Distributivity)

### Example


SELECT R.A, T.E

FROM R, S, T

WHERE R.B = S.B
AND S.C < 5
AND S.D = T.D


### General Query Optimizers

Input: Dumb translation of SQL to RA

⬇︎

Apply rewrites

⬇︎

Output: Better, but equivalent query

Which rewrite rules should we apply?

Selection Pushdown
Always commute Selections as close to the leaves as possible.
Join Construction
Joins are always better than cross-products. (if there's a good join algorithm)
(Optional) Projection Pushdown
Commuting Projections down to the leaves removes redundant columns, and may be beneficial for some systems.
Join Algorithm Selection
Joins can be implemented differently, depending on the join predicate.
Join/Union Ordering
The order in which joins are evaluated may affect query runtimes.
Access Paths
$(\sigma_c(R))$ and $(Q(\ldots) \bowtie_c R)$ are special cases that we can make fast!

Some rewrites are situational... we need more information to decide when to apply them.

### General Query Optimizers

1. Apply blind heuristics (e.g., push down selections)
2. Enumerate all possible execution plans by varying (or for a reasonable subset)
• Join/Union Evaluation Order (commutativity, associativity, distributivity)
• Algorithms for Joins, Aggregates, Sort, Distinct, and others
• Data Access Paths
3. Estimate the cost of each execution plan
4. Pick the execution plan with the lowest cost
Next Class: Extended Relational Algebra and Basic Join Algorithms