Exam Review

Exam Format

You will need:

A smartphone or second device with a camera and Zoom installed.
A quiet, well-lit area.
A camera view that shows your work surface, you, and your computer screen.

Relational Algebra

We start with a database instance with a fixed schema.

Queries are applied to relations: $Q(\textbf{Trees}, \textbf{SpeciesInfo})$

Queries are also relations! $Q_2(\textbf{SpeciesInfo}, Q_1(\textbf{Trees}))$ (Relational algebra is closed: the output of one query can be the input of another.)

Operation	Symbol	Meaning
Selection	$\sigma$	Select a subset of the input rows
Projection	$\pi$	Delete unwanted columns
Cross-product	$\times$	Combine two relations
Set-difference	$-$	Tuples in Rel 1, but not Rel 2
Union	$\cup$	Tuples either in Rel 1 or in Rel 2
Intersection	$\cap$	Tuples in both Rel 1 and Rel 2
Join	$\bowtie$	Pairs of tuples matching a specified condition
Division	$/$	"Inverse" of cross-product
Sort	$\tau_A$	Sort records by attribute(s) $A$
Limit	$\texttt{LIMIT}_N$	Return only the first $N$ records (according to sort order if paired with sort).
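To make closure concrete, here is a minimal Python sketch that models a relation as a list of dicts and implements a few of the operators above as plain functions. The attribute names (`tree_id`, `spc`, `name`, `has_fruit`) and the tuples are hypothetical, chosen only for illustration; only the relation names Trees and SpeciesInfo come from the slides. Because every operator returns a relation, the output of one query can be fed straight into another.

```python
from itertools import product

# A relation is modeled as a list of dicts (one dict per tuple, bag semantics).

def select(rel, pred):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """Projection: keep only the listed attributes (duplicates are kept)."""
    return [{a: t[a] for a in attrs} for t in rel]

def cross(r, s):
    """Cross product: assumes r and s have disjoint attribute names."""
    return [{**a, **b} for a, b in product(r, s)]

def join(r, s, pred):
    """Join: a selection applied to a cross product."""
    return select(cross(r, s), pred)

# Hypothetical instances of the two relations named on the slides.
trees = [
    {"tree_id": 1, "spc": "oak"},
    {"tree_id": 2, "spc": "elm"},
    {"tree_id": 3, "spc": "oak"},
]
species_info = [
    {"name": "oak", "has_fruit": True},
    {"name": "elm", "has_fruit": False},
]

# Q1(Trees): a query over one relation.
q1 = select(trees, lambda t: t["spc"] == "oak")

# Q2(SpeciesInfo, Q1(Trees)): Q1's output is itself a relation, so it can be an input.
q2 = project(
    join(species_info, q1, lambda t: t["name"] == t["spc"]),
    ["tree_id", "has_fruit"],
)
print(q2)
```

Implementing the join as a selection over a cross product is the simplest option to read, not an efficient one.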

Rule	Notes
$\sigma_{C_1\wedge C_2}(R) \equiv \sigma_{C_1}(\sigma_{C_2}(R))$
$\sigma_{C_1\vee C_2}(R) \equiv \sigma_{C_1}(R) \cup \sigma_{C_2}(R)$	Only true for set union, not bag union
$\sigma_C(R \times S) \equiv R \bowtie_C S$
$\sigma_C(R \times S) \equiv \sigma_C(R) \times S$	If $C$ references only $R$'s attributes; also works for joins
$\pi_{A}(\pi_{A \cup B}(R)) \equiv \pi_{A}(R)$
$\sigma_C(\pi_{A}(R)) \equiv \pi_A(\sigma_C(R))$	If $A$ contains all of the attributes referenced by $C$
$\pi_{A\cup B}(R\times S) \equiv \pi_A(R) \times \pi_B(S)$	Where $A$ (resp., $B$) contains attributes in $R$ (resp., $S$)
$R \times (S \times T) \equiv (R \times S) \times T$	Also works for joins
$R \times S \equiv S \times R$	Also works for joins
$R \cup (S \cup T) \equiv (R \cup S) \cup T$	Also works for intersection and bag-union
$R \cup S \equiv S \cup R$	Also works for intersection and bag-union
$\sigma_{C}(R \cup S) \equiv \sigma_{C}(R) \cup \sigma_{C}(S)$	Also works for intersection and bag-union
$\pi_{A}(R \cup S) \equiv \pi_{A}(R) \cup \pi_{A}(S)$	Also works for bag-union (but not, in general, for intersection)
$\sigma_{C}(\gamma_{A, AGG}(R)) \equiv \gamma_{A, AGG}(\sigma_{C}(R))$	If $A$ contains all of the attributes referenced by $C$
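A quick way to build intuition for these rules is to check a few of them on a tiny instance. The sketch below is an illustration, not a proof: it uses list-of-dict relations with bag semantics and made-up relations `R` and `S`. It checks the cascading-selection rule and the selection push-down rule, and shows that the disjunction rule fails when $\cup$ is read as bag union.

```python
from collections import Counter
from itertools import product

def select(rel, pred):
    return [t for t in rel if pred(t)]

def cross(r, s):
    return [{**a, **b} for a, b in product(r, s)]

def same_bag(a, b):
    """True if both relations hold the same tuples with the same multiplicities."""
    freeze = lambda t: tuple(sorted(t.items()))
    return Counter(map(freeze, a)) == Counter(map(freeze, b))

# Hypothetical data; R deliberately contains a duplicate tuple.
R = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 2, "b": 20}]
S = [{"c": 1}, {"c": 2}]

c1 = lambda t: t["a"] == 2   # references only R's attributes
c2 = lambda t: t["b"] > 5

# sigma_{C1 and C2}(R) == sigma_{C1}(sigma_{C2}(R))
cascade_ok = same_bag(
    select(R, lambda t: c1(t) and c2(t)),
    select(select(R, c2), c1),
)

# sigma_C(R x S) == sigma_C(R) x S, since C references only R
pushdown_ok = same_bag(select(cross(R, S), c1), cross(select(R, c1), S))

# sigma_{C1 or C2}(R) vs. bag union of the two selections:
# tuples matching both conditions are counted twice by the bag union.
bag_union = select(R, c1) + select(R, c2)
disjunction_ok_for_bags = same_bag(select(R, lambda t: c1(t) or c2(t)), bag_union)

print(cascade_ok, pushdown_ok, disjunction_ok_for_bags)
```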

Operation	RA	Total IOs (#pages)	Memory (#tuples)
Table Scan	$R$	$\frac{|R|}{\mathcal P}$	$O(1)$
Projection	$\pi(R)$	$\textbf{io}(R)$	$O(1)$
Selection	$\sigma(R)$	$\textbf{io}(R)$	$O(1)$
Union	$R \uplus S$	$\textbf{io}(R) + \textbf{io}(S)$	$O(1)$
Sort (In-Mem)	$\tau(R)$	$\textbf{io}(R)$	$O(|R|)$
Sort (On-Disk)	$\tau(R)$	$\frac{2 \cdot \lfloor \log_{\mathcal B}(|R|) \rfloor}{\mathcal P} + \textbf{io}(R)$	$O(\mathcal B)$
(B+Tree) Index Scan	$Index(R, c)$	$\log_{\mathcal I}(|R|) + \frac{|\sigma_c(R)|}{\mathcal P}$	$O(1)$
(Hash) Index Scan	$Index(R, c)$	$1$	$O(1)$

Tuples per Page ($\mathcal P$) – Normally defined per-schema
Size of $R$ ($|R|$)
Pages of Buffer ($\mathcal B$)
Keys per Index Page ($\mathcal I$)
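As a worked example of reading the table, the sketch below plugs numbers into the Table Scan and (B+Tree) Index Scan rows. The function names and the sizes used (a million tuples, 100 tuples per page, 100 keys per index page, 1,000 matching tuples) are assumptions made up for illustration.

```python
import math

def io_table_scan(size_r, tuples_per_page):
    """Table Scan row: |R| / P pages."""
    return size_r / tuples_per_page

def io_btree_index_scan(size_r, matching, tuples_per_page, keys_per_index_page):
    """(B+Tree) Index Scan row: log_I(|R|) + |sigma_c(R)| / P pages."""
    return math.log(size_r, keys_per_index_page) + matching / tuples_per_page

# Hypothetical numbers, for illustration only.
size_r = 1_000_000   # |R|
P = 100              # tuples per page
I = 100              # keys per index page
matching = 1_000     # |sigma_c(R)|

scan_cost = io_table_scan(size_r, P)
index_cost = io_btree_index_scan(size_r, matching, P, I)
print(f"table scan: {scan_cost:.0f} pages, index scan: {index_cost:.1f} pages")
```

With these numbers the index scan touches about 13 pages against 10,000 for the full scan. The gap shrinks as the condition becomes less selective.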

Operation	RA	Total IOs (#pages)	Memory (#tuples)
Nested Loop Join (Buffer $S$ in mem)	$R \times_{mem} S$	$\textbf{io}(R)+\textbf{io}(S)$	$O(|S|)$
Block NLJ (Buffer $S$ on disk)	$R \times_{disk} S$	$\frac{|R|}{\mathcal B} \cdot \frac{|S|}{\mathcal P} + \textbf{io}(R) + \textbf{io}(S)$	$O(1)$
Block NLJ (Recompute $S$)	$R \times_{redo} S$	$\textbf{io}(R) + \frac{|R|}{\mathcal B} \cdot \textbf{io}(S)$	$O(1)$
1-Pass Hash Join	$R \bowtie_{1PH, c} S$	$\textbf{io}(R) + \textbf{io}(S)$	$O(|S|)$
2-Pass Hash Join	$R \bowtie_{2PH, c} S$	$\frac{2|R| + 2|S|}{\mathcal P} + \textbf{io}(R) + \textbf{io}(S)$	$O(1)$
Sort-Merge Join	$R \bowtie_{SM, c} S$	[Sort]	[Sort]
(Tree) Index NLJ	$R \bowtie_{INL, c} S$	$|R| \cdot (\log_{\mathcal I}(|S|) + \frac{|\sigma_c(S)|}{\mathcal P})$	$O(1)$
(Hash) Index NLJ	$R \bowtie_{INL, c} S$	$|R| \cdot 1$	$O(1)$
(In-Mem) Aggregate	$\gamma_A(R)$	$\textbf{io}(R)$	$adom(A)$
(Sort/Merge) Aggregate	$\gamma_A(R)$	[Sort]	[Sort]

Tuples per Page ($\mathcal P$) – Normally defined per-schema
Size of $R$ ($|R|$)
Pages of Buffer ($\mathcal B$)
Keys per Index Page ($\mathcal I$)
Number of distinct values of $A$ ($adom(A)$)
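The join rows can be compared the same way. The sketch below evaluates three of the formulas from the table, assuming both inputs are base tables so that $\textbf{io}(R) = |R| / \mathcal P$. The function names and the sizes are hypothetical.

```python
def io_scan(size, P):
    """io(R) for a base table: |R| / P pages."""
    return size / P

def io_one_pass_hash(size_r, size_s, P):
    """1-Pass Hash Join row: io(R) + io(S)."""
    return io_scan(size_r, P) + io_scan(size_s, P)

def io_two_pass_hash(size_r, size_s, P):
    """2-Pass Hash Join row: (2|R| + 2|S|) / P + io(R) + io(S)."""
    return (2 * size_r + 2 * size_s) / P + io_scan(size_r, P) + io_scan(size_s, P)

def io_block_nlj_recompute(size_r, size_s, P, B):
    """Block NLJ (Recompute S) row: io(R) + |R| / B * io(S)."""
    return io_scan(size_r, P) + (size_r / B) * io_scan(size_s, P)

# Hypothetical sizes, for illustration only.
size_r, size_s = 10_000, 5_000   # |R|, |S|
P, B = 100, 100                  # tuples per page, pages of buffer

costs = {
    "1-pass hash": io_one_pass_hash(size_r, size_s, P),
    "2-pass hash": io_two_pass_hash(size_r, size_s, P),
    "block NLJ (recompute S)": io_block_nlj_recompute(size_r, size_s, P, B),
}
for name, pages in costs.items():
    print(f"{name}: {pages:.0f} pages")
```

The IO column is only half of the trade-off: the 1-pass hash join is cheapest here, but the table lists its memory as $O(|S|)$, while the other two need only $O(1)$.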

Operator	RA	Estimated Size
Table	$R$	$|R|$
Projection	$\pi(Q)$	$|Q|$
Union	$Q_1 \uplus Q_2$	$|Q_1| + |Q_2|$
Cross Product	$Q_1 \times Q_2$	$|Q_1| \times |Q_2|$
Sort	$\tau(Q)$	$|Q|$
Limit	$\texttt{LIMIT}_N(Q)$	$N$
Selection	$\sigma_c(Q)$	$|Q| \times \texttt{SEL}(c, Q)$
Join	$Q_1 \bowtie_c Q_2$	$|Q_1| \times |Q_2| \times \texttt{SEL}(c, Q_1\times Q_2)$
Distinct	$\delta_A(Q)$	$\texttt{UNIQ}(A, Q)$
Aggregate	$\gamma_{A, B \leftarrow \Sigma}(Q)$	$\texttt{UNIQ}(A, Q)$

$\texttt{SEL}(c, Q)$: Selectivity of $c$ on $Q$, or $\frac{|\sigma_c(Q)|}{|Q|}$
$\texttt{UNIQ}(A, Q)$: Number of distinct values of $A$ in $Q$.
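The size formulas can be exercised with a small sketch. The table does not say how $\texttt{SEL}$ is obtained, so the example below assumes the common uniform-distribution rule for an equality predicate, $\texttt{SEL} \approx 1 / \texttt{UNIQ}(A, Q)$. That rule, the function names, and all the numbers are assumptions made for illustration.

```python
import math

def est_selection(size_q, selectivity):
    """Selection row: |Q| * SEL(c, Q)."""
    return size_q * selectivity

def est_join(size_q1, size_q2, selectivity):
    """Join row: |Q1| * |Q2| * SEL(c, Q1 x Q2)."""
    return size_q1 * size_q2 * selectivity

def uniform_equality_selectivity(uniq):
    """ASSUMPTION (not from the table): values are uniformly distributed,
    so an equality predicate on A keeps about 1 / UNIQ(A, Q) of the tuples."""
    return 1.0 / uniq

# Hypothetical statistics, for illustration only.
trees_size = 1_000      # |Trees|
species_size = 50       # |SpeciesInfo|
uniq_species = 50       # UNIQ(species, Trees)

sel = uniform_equality_selectivity(uniq_species)
est_oaks = est_selection(trees_size, sel)                  # sigma_{species = 'oak'}(Trees)
est_joined = est_join(trees_size, species_size, sel)       # Trees joined with SpeciesInfo on species
print(est_oaks, est_joined)
```

Under these assumptions the selection is estimated at about 20 tuples and the join at about 1,000, one output tuple per tree.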

Flajolet-Martin Sketches ($\approx$ HyperLogLog)

Flips	Score	Probability	E[# Games]
(👽)	0	0.5	2
(🐕)(👽)	1	0.25	4
(🐕)(🐕)(👽)	2	0.125	8
(🐕) $\times N$ (👽)	$N$	$\frac{1}{2^{N+1}}$	$2^{N+1}$

If I told you that in a series of games, my best score was $N$ , you might expect that I played $2^{N+1}$ games.

To estimate how many games I played, I only need to track my top score!
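The coin-flip game can be turned into a distinct-count estimator by letting a hash of each record play the role of the coin flips. The sketch below is one minimal way to do that, under stated assumptions: it uses SHA-256 as the hash, treats each trailing 1 bit as a 🐕 and the first 0 bit as the 👽, and keeps only the top score. The function names are hypothetical. A single estimator like this is very noisy; practical variants combine many of them.

```python
import hashlib

def score(obj):
    """Play one 'game' for obj: count 1 bits (dogs) before the first 0 bit (alien).

    The flips come from a hash of obj, so the same record always gets the same score.
    """
    h = int.from_bytes(hashlib.sha256(str(obj).encode()).digest()[:8], "big")
    n = 0
    while h & 1:
        n += 1
        h >>= 1
    return n

def estimate_distinct(stream):
    """Track only the top score N and estimate 2^(N+1) distinct records."""
    top = -1
    for obj in stream:
        top = max(top, score(obj))
    return 0 if top < 0 else 2 ** (top + 1)

# Duplicates replay the same game, so they cannot raise the top score.
records = ["oak", "elm", "ash", "oak", "oak", "elm"]
print(estimate_distinct(records))
```

Because repeated records hash to the same flips, the estimate depends only on the set of distinct records, which is what makes the top score usable as a count-distinct sketch.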

Count Sketches

Pick a number of "trials" and a number of "bins"
For each record $O_i$:
1. For each "trial" $j$:
  1. Use a hash function $h_j(O_i)$ to pick a bin
  2. Add a $\pm 1$ value determined by hash function $\delta_j(O_i)$ to the bin

For each trial $j$, estimate the count of $O_i$ by the value of bin $h_j(O_i)$, multiplied by the sign $\delta_j(O_i)$.

Take the median of the estimates across all trials.
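The procedure above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not a tuned implementation: the class name, the default numbers of trials and bins, and the use of salted SHA-256 for both $h_j$ and $\delta_j$ are all choices made here. When reading a count back, the bin value is multiplied by the record's own sign $\delta_j(O_i)$, so that the record's $\pm 1$ contributions add up while those of colliding records tend to cancel.

```python
import hashlib
from statistics import median

class CountSketch:
    def __init__(self, trials=5, bins=64):
        self.trials = trials
        self.bins = bins
        self.table = [[0] * bins for _ in range(trials)]

    def _hash(self, salt, j, obj):
        # One salted hash per (purpose, trial) pair stands in for h_j and delta_j.
        data = f"{salt}:{j}:{obj}".encode()
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _bin(self, j, obj):
        return self._hash("bin", j, obj) % self.bins          # h_j(O_i)

    def _sign(self, j, obj):
        return 1 if self._hash("sign", j, obj) & 1 else -1    # delta_j(O_i)

    def add(self, obj):
        for j in range(self.trials):
            self.table[j][self._bin(j, obj)] += self._sign(j, obj)

    def estimate(self, obj):
        # Per-trial estimate: sign times bin value; then the median over trials.
        return median(
            self._sign(j, obj) * self.table[j][self._bin(j, obj)]
            for j in range(self.trials)
        )

sketch = CountSketch()
for record in ["oak"] * 7 + ["elm"] * 3 + ["ash"] * 2:
    sketch.add(record)
print(sketch.estimate("oak"))
```

Collisions can push an individual trial's estimate up or down, which is why the median is used.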

Count-Min Sketches

Pick a number of "trials" and a number of "bins"
For each record $O_i$:
1. For each "trial" $j$:
  1. Use a hash function $h_j(O_i)$ to pick a bin
  2. Add 1 to the bin

For each trial $j$, estimate the count of $O_i$ by the value of bin $h_j(O_i)$.

Take the minimum of the estimates across all trials.
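A Count-Min sketch differs from the Count sketch only in the update (always $+1$) and in how trials are combined (minimum instead of median). The sketch below is a minimal illustration. The class name, the default sizes, and the salted SHA-256 hashing are assumptions made here.

```python
import hashlib

class CountMinSketch:
    def __init__(self, trials=5, bins=64):
        self.trials = trials
        self.bins = bins
        self.table = [[0] * bins for _ in range(trials)]

    def _bin(self, j, obj):
        # A salted hash per trial stands in for h_j(O_i).
        data = f"{j}:{obj}".encode()
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % self.bins

    def add(self, obj):
        for j in range(self.trials):
            self.table[j][self._bin(j, obj)] += 1

    def estimate(self, obj):
        # Every bin holds the true count plus any collisions, so each trial
        # can only overestimate; the minimum is the tightest of the trials.
        return min(self.table[j][self._bin(j, obj)] for j in range(self.trials))

stream = ["oak"] * 7 + ["elm"] * 3 + ["ash"] * 2
sketch = CountMinSketch()
for record in stream:
    sketch.add(record)
print({r: sketch.estimate(r) for r in ("oak", "elm", "ash")})
```

Since collisions only ever add to a bin, the estimate is never below the true count.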

Exam Review

Exam Format

Day-Of

Record Layouts

Record Layout 1: Fixed

Record Layout 2: Delimiters

Record Layout 2: Headers

Relational Algebra

RA Equivs

Algorithms

Nested-Loop Join

Block-Nested Loop Join

Strategies for Implementing $R \bowtie_{R.A = S.A} S$

Sort/Merge Join

1-Pass Hash Join

2-Pass Hash Join

Index Nested Loop Join

Basic Aggregate Pattern

Basic Aggregate Types

Grouping Algorithms

Accounting

(Some) Estimation Techniques

Sketching

Flajolet-Martin Sketches

( $\approx$ HyperLogLog)

Count Sketches

Count-Min Sketches