Checkpoint 2

This project follows the same outline as Checkpoint 1. Your code gets SQL queries and is expected to answer them. There are a few key differences:

Sorting and Grouping Data

Sort is a blocking operator. Before it emits even one row, it needs to see the entire dataset. If you have enough memory to hold the entire input to be sorted, then you can just use Java's built-in Collections.sort method. However, for at least a few queries you will likely not have enough memory to keep everything available. In that case, a good option is to use the 2-pass sort algorithm that we discussed in class.

Group-by aggregates are also a blocking operator. If you run out of memory for the groups, you will need to implement a memory-aware grouping operator. One idea is to re-use the sort operator to group values together and use the sorted grouping technique that we discussed in class.

Preprocessing

Your code will be tested in 2 phases. In the first phase, you will have 1GB of memory and 2 minutes with each CREATE TABLE statement. In the second phase, you will have 150MB of memory and 5 minutes with each CREATE TABLE statement. The reference implementation uses this time to build indexes over the data – in-memory and/or on-disk, depending on phase. Students in prior years have come up with other creative ways to use this time.

CREATE TABLE statements will include index suggestions, both via unique PRIMARY KEY attributes and non-unique INDEX fields. You can get access to both through the getIndexes() method of CreateTable

Grading Workflow

As before, all .java files in the src directory at the root of your repository will be compiled (and linked against JSQLParser). Also as before, the class dubstep.Main will be invoked with no arguments, and a stream of semicolon-delimited queries will be printed to System.in (after you print out a prompt)

bash> java -cp build:jsqlparser.jar dubstep.Main -
$> CREATE TABLE R(A int, B int, C int);
$> SELECT B, C FROM R WHERE A = 1;

This page last updated 2024-12-03 16:56:15 -0500