May 4, 2021
You will need:
SELECT [DISTINCT] targetlist
FROM relationlist
WHERE condition
This is the least efficient strategy to compute a query! A good optimizer will find more efficient strategies to compute the same answer.
We start with a database instance with a fixed schema
Queries are applied to Relations Q(Trees,SpeciesInfo)
Queries are also Relations! Q2(SpeciesInfo,Q1(Trees)) (Relational Algebra is Closed)
Operation | Sym | Meaning |
---|---|---|
Selection | σ | Select a subset of the input rows |
Projection | π | Delete unwanted columns |
Cross-product | × | Combine two relations |
Set-difference | − | Tuples in Rel 1, but not Rel 2 |
Union | ∪ | Tuples either in Rel 1 or in Rel 2 |
Intersection | ∩ | Tuples in both Rel 1 and Rel 2 |
Join | ⋈ | Pairs of tuples matching a specified condition |
Division | / | "Inverse" of cross-product |
Sort | τA | Sort records by attribute(s) A |
Limit | LIMITN | Return only the first N records (according to sort order if paired with sort). |
We say that Q1≡Q2 if and only if
we can guarantee that the bag of tuples produced by Q1(R,S,T,…)
is the same as the bag of tuples produced by Q2(R,S,T,…)
for any combination of valid inputs R,S,T,….
... that satisfy any necessary properties.
Rule | Notes |
---|---|
σC1∧C2(R)≡σC1(σC2(R)) | |
σC1∨C2(R)≡σC1(R)∪σC2(R) | Note, this is only true for set, not bag union |
σC(R×S)≡R⋈CS | |
σC(R×S)≡σC(R)×S | If C references only R's attributes, also works for joins |
πA(πA∪B(R))≡πA(R) | |
σC(πA(R))≡πA(σC(R)) | If A contains all of the attributes referenced by C |
πA∪B(R×S)≡πA(R)×πB(S) | Where A (resp., B) contains attributes in R (resp., S) |
R×(S×T)≡(R×S)×T | Also works for joins |
R×S≡S×R | Also works for joins |
R∪(S∪T)≡(R∪S)∪T | Also works for intersection and bag-union |
R∪S≡S∪R | Also works for intersections and bag-union |
σC(R∪S)≡σC(R)∪σC(S) | Also works for intersections and bag-union |
πA(R∪S)≡πA(R)∪πA(S) | Also works for intersections and bag-union |
σC(γA,AGG(R))≡γA,AGG(σC(R)) | If A contains all of the attributes referenced by C |
Sort/Merge typically expressed as 3 operators
(2xSort + Merge)
Can partition on data-values to support other types of queries.
Grey et. al. "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Cost-Based Optimization
Figure out the cost of each individual operator.
Only count the number of IOs added by each operator.
Operation | RA | Total IOs (#pages) | Memory (#tuples) |
---|---|---|---|
Table Scan | R | |R|P | O(1) |
Projection | π(R) | io(R) | O(1) |
Selection | σ(R) | io(R) | O(1) |
Union | R⊎S | io(R)+io(S) | O(1) |
Sort (In-Mem) | τ(R) | io(R) | O(|R|) |
Sort (On-Disk) | τ(R) | 2⋅⌊logB(|R|)⌋P+io(R) | O(B) |
(B+Tree) Index Scan | Index(R,c) | logI(|R|)+|σc(R)|P | O(1) |
(Hash) Index Scan | Index(R,c) | 1 | O(1) |
Operation | RA | Total IOs (#pages) | Mem (#tuples) |
---|---|---|---|
Nested Loop Join (Buffer S in mem) | R×memS | io(R)+io(S) | O(|S|) |
Block NLJ (Buffer S on disk) | R×diskS | |R|B⋅|S|P+io(R)+io(S) | O(1) |
Block NLJ (Recompute S) | R×redoS | io(R)+|R|B⋅io(S) | O(1) |
1-Pass Hash Join | R⋈1PH,cS | io(R)+io(S) | O(|S|) |
2-Pass Hash Join | R⋈2PH,cS | 2|R|+2|S|P+io(R)+io(S) | O(1) |
Sort-Merge Join | R⋈SM,cS | [Sort] | [Sort] |
(Tree) Index NLJ | R⋈INL,c | |R|⋅(logI(|S|)+|σc(S)|P) | O(1) |
(Hash) Index NLJ | R⋈INL,c | |R|⋅1 | O(1) |
(In-Mem) Aggregate | γA(R) | io(R) | adom(A) |
(Sort/Merge) Aggregate | γA(R) | [Sort] | [Sort] |
Estimating IOs requires Estimating |Q(R)|
Operator | RA | Estimated Size |
---|---|---|
Table | R | |R| |
Projection | π(Q) | |Q| |
Union | Q1⊎Q2 | |Q1|+|Q2| |
Cross Product | Q1×Q2 | |Q1|×|Q2| |
Sort | τ(Q) | |Q| |
Limit | LIMITN(Q) | N |
Selection | σc(Q) | |Q|×SEL(c,Q) |
Join | Q1⋈cQ2 | |Q1|×|Q2|×SEL(c,Q1×Q2) |
Distinct | δA(Q) | UNIQ(A,Q) |
Aggregate | γA,B←Σ(Q) | UNIQ(A,Q) |
Flips | Score | Probability | E[# Games] |
---|---|---|---|
(👽) | 0 | 0.5 | 2 |
(🐕)(👽) | 1 | 0.25 | 4 |
(🐕)(🐕)(👽) | 2 | 0.125 | 8 |
(🐕)×N (👽) | N | 12N+1 | 2N+1 |
If I told you that in a series of games, my best score was N, you might expect that I played 2N+1 games.
To do that, I only need to track my top score!