March 11, 2019
What are Databases?
SELECT [DISTINCT] targetlist
FROM relationlist
WHERE condition
This is the least efficient strategy to compute a query! A good optimizer will find more efficient strategies to compute the same answer.
,
) between fields\r\n
) at record endRelational Algebra
Operation | Sym | Meaning |
---|---|---|
Selection | σ | Select a subset of the input rows |
Projection | π | Delete unwanted columns |
Cross-product | × | Combine two relations |
Set-difference | − | Tuples in Rel 1, but not Rel 2 |
Union | ∪ | Tuples either in Rel 1 or in Rel 2 |
Intersection | ∩ | Tuples in both Rel 1 and Rel 2 |
Join | ⋈ | Pairs of tuples matching a specified condition |
Division | / | "Inverse" of cross-product |
Sort | τA | Sort records by attribute(s) A |
Limit | LIMITN | Return only the first N records (according to sort order if paired with sort). |
Rule | Notes |
---|---|
σC1∧C2(R)≡σC1(σC2(R)) | |
σC1∨C2(R)≡σC1(R)∪σC2(R) | Only true for set, not bag union |
σC(R×S)≡R⋈CS | |
σC(R×S)≡σC(R)×S | If C references only R's attributes, also works for joins |
πA(πA∪B(R))≡πA(R) | |
σC(πA(R))≡πA(σC(R)) | If A contains all of the attributes referenced by C |
πA∪B(R×S)≡πA(R)×πB(S) | Where A (resp., B) contains attributes in R (resp., S) |
R×(S×T)≡(R×S)×T | Also works for joins |
R×S≡S×R | Also works for joins |
R∪(S∪T)≡(R∪S)∪T | Also works for intersection and bag-union |
R∪S≡S∪R | Also works for intersections and bag-union |
σC(R∪S)≡σC(R)∪σC(S) | Also works for intersections and bag-union |
πA(R∪S)≡πA(R)∪πA(S) | Also works for intersections and bag-union |
σC(γA,AGG(R))≡γA,AGG(σC(R)) | If A contains all of the attributes referenced by C |
Sort/Merge typically expressed as 3 operators
(2xSort + Merge)
Can partition on data-values to support other types of queries.
Grey et. al. "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Indexing
Maintain Invariant: All Nodes ≥ 50% Full
(Exception: The Root)
Cost-Based Optimization
Figure out the cost of each individual operator.
Only count the number of IOs added by each operator.
Operation | RA | IOs Added (#pages) | Memory (#tuples) |
---|---|---|---|
Table Scan | R | |R|P | O(1) |
Projection | π(R) | 0 | O(1) |
Selection | σ(R) | 0 | O(1) |
Union | R⊎S | 0 | O(1) |
Sort (In-Mem) | τ(R) | 0 | O(|R|) |
Sort (On-Disk) | τ(R) | 2⋅⌊logB(|R|)⌋P | O(B) |
(B+Tree) Index Scan | Index(R,c) | logI(|R|)+|σc(R)|P | O(1) |
(Hash) Index Scan | Index(R,c) | 1 | O(1) |
Operation | RA | IOs Added (#pages) | Memory (#tuples) |
---|---|---|---|
Nested Loop Join (Buffer S in mem) | R×S | 0 | O(|S|) |
Nested Loop Join (Buffer S on disk) | R×diskS | (1+|R|)⋅|S|P | O(1) |
1-Pass Hash Join | R⋈1PH,cS | 0 | O(|S|) |
2-Pass Hash Join | R⋈2PH,cS | 2|R|+2|S|P | O(1) |
Sort-Merge Join | R⋈SM,cS | [Sort] | [Sort] |
(Tree) Index NLJ | R⋈INL,c | |R|⋅(logI(|S|)+|σc(S)|P) | O(1) |
(Hash) Index NLJ | R⋈INL,c | |R|⋅1 | O(1) |
(In-Mem) Aggregate | γA(R) | 0 | adom(A) |
(Sort/Merge) Aggregate | γA(R) | [Sort] | [Sort] |
Estimating IOs requires Estimating |Q(R)|
Operator | RA | Estimated Size |
---|---|---|
Table | R | |R| |
Projection | π(Q) | |Q| |
Union | Q1⊎Q2 | |Q1|+|Q2| |
Cross Product | Q1×Q2 | |Q1|×|Q2| |
Sort | τ(Q) | |Q| |
Limit | LIMITN(Q) | N |
Selection | σc(Q) | |Q|×SEL(c,Q) |
Join | Q1⋈cQ2 | |Q1|×|Q2|×SEL(c,Q1×Q2) |
Distinct | δA(Q) | UNIQ(A,Q) |
Aggregate | γA,B←Σ(Q) | UNIQ(A,Q) |
Flips | Score | Probability | E[# Games] |
---|---|---|---|
(👽) | 0 | 0.5 | 2 |
(🐕)(👽) | 1 | 0.25 | 4 |
(🐕)(🐕)(👽) | 2 | 0.125 | 8 |
(🐕)×N (👽) | N | 12N+1 | 2N+1 |
If I told you that in a series of games, my best score was N, you might expect that I played 2N+1 games.
To do that, I only need to track my top score!
WINDOW
Operator
SELECT L.state, T.month,
AVG(S.sales) OVER W as movavg
FROM Sales S, Times T, Locations L
WHERE S.timeid = T.timeid
AND S.locid = L.locid
WINDOW W AS (
PARTITION BY L.state
ORDER BY T.month
RANGE BETWEEN INTERVAL ‘1’ MONTH PRECEDING
AND INTERVAL ‘1’ MONTH FOLLOWING
)
WINDOW
joins.