Intro
CSE-4/562 Spring 2019
January 28, 2019
Textbook: Ch. 1, 2.1-2.2
Why Are Databases Awesome?
They're Everywhere
$$$
Rank |
Organization |
Sales (B$) |
FY |
Market cap (B$) |
Headquarters |
1 |  | Microsoft | 86.6 | 2017 | 601 | Redmond, WA, US |
2 |  | Oracle | 37.2 | 2017 | 205 | Redwood City, CA, US |
3 |  | SAP | 23.2 | 2017 | 117 | Walldorf, Germany |
4 |  | Salesforce.com | 8.4 | 2017 | 69 | San Francisco, CA, US |
5 |  | VMware | 6.7 | 2017 | 48 | Palo Alto, CA, US |
6 |  | Fiserv | 5.3 | 2017 | 26 | Brookfield, WI, US |
7 |  | Adobe Systems | 5 | 2017 | 84 | San Jose, CA, US |
8 |  | Symantec | 5.4 | 2017 | 19 | Mountain View, CA, US |
9 |  | Amadeus IT Holdings | 4.3 | 2017 | 25 | Madrid, Spain |
5 of 9 Forbes Top Software Companies
Have a Focus on Data Management Systems
(Source wikipedia.org)
Interesting Problems
How do we ask and answer questions about data?
How do we manipulate and persist data?
Databases
Techniques |
Data Modeling
Cost-Based Optimization
|
Recipes |
Join Algorithms
Index Data Structures
|
Knowledge |
The Memory Hierarchy
Data Consistency
|
Which Tools To Use
And When?
Template for 90% of this class
What is the best, correct technique for task X, when Y is true?
- How do you define Correct and Best?
- What correct alternatives are available?
- How do you find the best available alternative
General Course Information
Expectations
- Algorithms / Data Structures
- O(⋅) Analysis, Sort Algos, Trees, Hash Tables
- How to use a database
- CSE 4/560 (or equivalent)
- Java (or Scala)
- This means actual programming experience
- (C++, C# or similar is usually good enough)
Me
Oliver Kennedy
TAs
Vicky Zheng
Qiuling Suo
Ninjas
William Spoth
Darshana Balakrishnan
Carl Nuessle
|
|
$150 |
$50 |
Index ToC |
No Index ToC Summary |
Course Structure
- Concepts (50% of Grade)
- Homework (10%; ~12 Assignments, Drop any 4)
- March 13: Midterm (20% or 15%)
- May 17?: Comprehensive Final (20% or 25%)
- Practicum (50% of Grade)
- Build a Relational Query Engine
- 3-Person Group Project
- 4 Checkpoints (+ 5 free points for Checkpoint 0)
Embedded Databases
- SQLite (In your browser, computer, phone, fridge...)
- Simple, Easy-To-Use Declarative Data Management
- Critical for future tech: Part of Mobile, IoT, Web
Your Startup: Build the next great Embedded Database
We give you...
Data (CSV Files)
Schema Information (CREATE TABLE)
Questions (SQL Queries)
You give us...
Answers
(really really fast)
Real World Challenge
You get graded on your code's...
- Correctness
- ~1/3 credit for getting the right answer.
- Performance
- ~2/3 credit for getting it reasonably fast.
Checkpoint 0: "Hello World"
5/50 pts
- Form groups
- Submit a simple Java program
- Make sure that the submission workflow works for you.
Checkpoint 1: "Get it Working"
10/50 pts
- Translate SQL to Relational Algebra
- Load CSV Files
- Run Basic Select, Project, Join Queries
Checkpoint 2: "Big Data"
10/50 pts
- Order By
- Limit
- Nested Queries
- Aggregation
- Too much data for memory
Checkpoint 3: "Precomputation"
15/50 pts
- You get a few minutes to pre-compute
- Load data
- Cache views
- Build indexes
Checkpoint 4: "The Real World"
10/50 pts
- We give you a Buffer Manager
- Dynamic, skewed workload
Ways to Fail
- Start your project at the last minute
- Don’t go to office hours
- Don’t ask questions on Piazza
- Wait until the deadline to submit for the first time
- Cheat
Academic Integrity
Cheating is submitting any work that you did not perform by yourself as if you did.
- References (when cited)
- Wikipedia, Wikibooks (or similar): OK
- Public Code
- Stack Exchange (or similar): Not OK
- Discussing concepts/ideas with classmates
- “A hash index has O(1) lookups”: OK (except during exams 😇 )
- Sharing code or answers with anyone
- “Just have a look at how I implemented it”: NOT OK
- For-hire code: NOT OK
MOSS
MOSS
- Zero Tolerance
- If I catch you submitting someone else’s code (including pay-for-code services), you will fail the class.
- Group Responsibility
- If your teammate cheats on a group project, the entire group will be penalized.
- Share Code, Share Blame
- If someone else submits your code as their own, you will be penalized as well.
What does a Data Management System Do?
- Analysis: Answering user-provided questions about data
- What kind of tools can we give end-users?
- Declarative Languages
- Organizational Datastructures (e.g., Indexes)
- Manipulation: Safely persisting and sharing data updates
- What kind of tools can we give end-users?
- Consistency Primitives
- Data Validation Primitives
So let's talk structure...
- Primitive
- Basic building blocks like Int, Float, Char, String
- Tuple
- Several ‘fields’ of different types. (N-Tuple = N fields)
- A Tuple has a ‘schema’ defining each field
- Set
- A collection of unique records, all of the same type
- Bag
- An unordered collection of records, all of the same type
- List
- An ordered collection of records, all of the same type
Your data is currently an Unordered Set
of Tuples with 100 fields each.
Tomorrow, you’ll be repeatedly asked for 1 specific attribute
of 5 specific rows identified by the first attribute
Can you do better?
Better Idea: Rewrite data into a 99-Tuple of Maps keyed on the 1st attribute
This representation is equivalent and better for your needs.
Declarative specifications make it easier to find equivalences.
Intro
CSE-4/562 Spring 2019
January 28, 2019
Textbook: Ch. 1, 2.1-2.2