- Relational Algebra Equivalences

Relational Algebra Equivalences

February 11, 2021

Garcia-Molina/Ullman/Widom: Ch. 16.2

The running theme

If X and Y are equivalent and Y is better,
then replace all Xs with Ys

Today's focus: Provable Equivalence for RA Expressions

Ground Rules

Only Instances Matter: Obviously $Q_1 \neq Q_2$ . What we care about is whether $Q_1(R) = Q_2(R)$ ...
Data Independent: ... for all valid input data $R$ .; However, it's fair to talk about equivalence when we know the data has some properties. (more on this later)
Data-Model Dependent: It's important to be clear whether we're talking about sets, bags, or lists.
Attribute Order Doesn't matter: A, B, C is the same as C, B, A

Starting Rules

Selection
$\sigma_{c_1 \wedge c_2}(R) \equiv \sigma_{c_1}(\sigma_{c_2}(R))$	(Decomposability)
Projection
$\pi_{A}(R) \equiv \pi_{A}(\pi_{A \cup B}(R))$	(Idempotence)
Cross Product
$R \times (S \times T) \equiv (R \times S) \times T$	(Associativity)
$R \times S \equiv S \times R$	(Commutativity)
Union
$R \cup (S \cup T) \equiv (R \cup S) \cup T$	(Associativity)
$R \cup S \equiv S \cup R$	(Commutativity)

Cross-Operator Rules

Selection + Projection
$\pi_{A}(\sigma_{c}(R)) \equiv \sigma_{c}(\pi_{A}(R))$	(Commutativity)

Selection + Cross Product
$\sigma_c(R \times S) \equiv (\sigma_{c}(R)) \times S$	(Commutativity)

Projection + Cross Product
$\pi_A(R \times S) \equiv (\pi_{A_R}(R)) \times (\pi_{A_S}(S))$	(Commutativity)

Intersection
$R \cap (S \cap T) \equiv (R \cap S) \cap T$	(Associativity)
$R \cap S \equiv S \cap R$	(Commutativity)
Selection +
$\sigma_c(R \cup S) \equiv (\sigma_c(R)) \cup (\sigma_c(R))$	(Commutativity)
$\sigma_c(R \cap S) \equiv (\sigma_c(R)) \cap (\sigma_c(R))$	(Commutativity)
Projection + Union
$\pi_A(R \cup S) \equiv (\pi_A(R)) \cup (\pi_A(R))$	(Commutativity)
Cross Product + Union
$R \times (S \cup T) \equiv (R \times S) \cup (R \times T)$	(Distributivity)

Selection Pushdown: Always commute Selections as close to the leaves as possible.
Join Construction: Joins are always better than cross-products. (if there's a good join algorithm)
(Optional) Projection Pushdown: Commuting Projections down to the leaves removes redundant columns, and may be beneficial for some systems.
Join Algorithm Selection: Joins can be implemented differently, depending on the join predicate.
Join/Union Ordering: The order in which joins are evaluated may affect query runtimes.
Access Paths: $(\sigma_c(R))$ and $(Q(\ldots) \bowtie_c R)$ are special cases that we can make fast!

Some rewrites are situational... we need more information to decide when to apply them.

$\sigma_{c}(\pi_{A}(R)) \Rightarrow \pi_{A}(\sigma_{c}(R))$

(note: $c$ is always compatible in this direction)


  plan.transform {
    case Filter(condition, Project(columns, child)) => 
      Project(columns, Filter(condition, child))
  }

match/case lets you find patterns.

transform lets you apply rewrite rules.

(Slight oversimplification since Spark uses extended relational algebra)

What happens if I apply this rewrite to:


  Filter(condition, Project(columns1, Project(columns2, child)))

↕

$\sigma_c(\pi_{A_1}(\pi_{A_2}(R)))$

⇓ $\pi_{A_1}(\sigma_c(\pi_{A_2}(R)))$

Naive Solution


  var last = null
  while( ! plan.equals(last) ){
    last = plan
    plan = plan.transform { ... }
  }

Repeat until we reach a "fixed-point"

Spark


  plan.transformDown {
    case Filter(condition, Project(columns, child)) => 
      Project(columns, Filter(condition, child))
  }

transformUp: Require bottom-up tree traversal.

transformDown: Require top-down tree traversal.

Spark

$\sigma_c(R \cup S) \Rightarrow (\sigma_c(R)) \cup (\sigma_c(R))$


  plan.transformDown {
    case Filter(condition, Union(children, /* other goop */)) => 
      Union(
        children.map { child => 
          Filter(condition, child)
        },
        /* other goop */
      )
  }

Relational Algebra Equivalences February 11, 2021 Garcia-Molina/Ullman/Widom: Ch. 16.2