May 1, 2019
SELECT * FROM Posts WHERE image_class = 'Cat';
SELECT COUNT(*) FROM Posts WHERE image_class = 'Cat';
SELECT user_id FROM Posts
WHERE image_class = 'Cat'
GROUP BY user_id HAVING COUNT(*) > 10;
| or |
|
Incomplete Database (D): A set of possible worlds
Possible World (D∈D): One (of many) database instances
(Require all possible worlds to have the same schema)
What does it mean to run a query on an incomplete database?
Q(D)=?
Q(D)={Q(D)|D∈D}
| or |
|
Q1=πName(σstate='NY'(R⋈zipZipLookups))
{ |
| or |
|
} |
| or |
|
Q2=πName(σregion='Northeast'(R⋈zipZipLookups))
{ |
| or |
|
} |
| or |
|
Q2=πName(σregion='Northeast'(R⋈zipZipLookups))
{ |
|
} |
Challenge: There can be lots of possible worlds.
Observation: Possibilities for database creation break down into lots of independent choices.
Factorize the database.
|
| ||||||||||||||||||||||||
|
|
Alice appears in both databases.
The only differences are Bob and Carol's zip codes.
R | Name | ZipCode |
---|---|---|
1 | Alice | 10003 |
2 | Bob | 14260 |
3 | Bob | 19260 |
4 | Carol | 13201 |
5 | Carol | 18201 |
[bob∈{4,9},carol∈{3,8}]
R | Name | ZipCode | |
---|---|---|---|
1 | Alice | 10003 | → always |
2 | Bob | 14260 | → if bob=4 |
3 | Bob | 19260 | → if bob=9 |
4 | Carol | 13201 | → if carol=3 |
5 | Carol | 18201 | → if carol=8 |
[bob∈{4,9},carol∈{3,8}]
R | Name | ZipCode | |
---|---|---|---|
1 | Alice | 10003 | → a |
2 | Bob | 14260 | → b |
3 | Bob | 19260 | → c |
4 | Carol | 13201 | → d |
5 | Carol | 18201 | → e |
Pick one of each: [{a},{b,c},{d,e}]
Set those variables to T and all others to F
R1≡[a→T,b→T,d→T,∗→F]
R | Name | ZipCode | |
---|---|---|---|
1 | Alice | 10003 | → T (a) |
2 | Bob | 14260 | → T (b) |
3 | Bob | 19260 | → F (c) |
4 | Carol | 13201 | → T (d) |
5 | Carol | 18201 | → F (e) |
Use provenance as before...
... but what about aggregates?
SELECT COUNT(*)
FROM R NATURAL JOIN ZipCodeLookup
WHERE State = 'NY'
={1if bob=9∧carol=82if bob=4∧carol=8∨bob=9∧carol=33if bob=4∧carol=3
Problem: A combinatorial explosion of possibilities
Idea: Simplify the problem
Pick your favorite SAT solver, plug in and go
As before, factorize the possible outcomes
1+{1ifbob=4}+{1ifcarol=3}
Not bigger than the aggregate input...
...but at least it only reduces to bin-packing
(or a similarly known NP problem.)