Don't Wrangle, Guess

One of the biggest costs in analytics is data wrangling: getting your messy, mislabeled, disorganized data together so you can actually ask your questions. All data wrangling tools force you to do all this work upfront, before you even know what you want to do with the data. Mimir lets you at your data sooner by tracking your cleaning to-dos. Ask first, clean later, with Mimir.

Get Mimir

Mimir is about getting you to your analysis as fast as possible. It lets you harness the raw power of SQL, StackOverflow's second-most popular language for 4 years running. Mimir then adds a ton of powerful SQL extensions designed to make dealing with messy data easier:

LOAD

Stop messing with data import and relational schema design. The versatile LOAD command allows you to quickly transform documents into relational tables without the muss and fuss of upfront schema design or defining complex transformation operators.
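To make the idea of loading without upfront schema design concrete, here is a minimal sketch in Python. It is not Mimir's LOAD implementation or syntax; the helpers `guess_type` and `load_csv`, the table name, and the sample data are all hypothetical, chosen only to illustrate guessing column types from the data and creating the table automatically.

```python
import csv
import io
import sqlite3

def guess_type(values):
    """Guess a SQLite column type from sample strings (hypothetical helper)."""
    non_empty = [v for v in values if v != ""]
    for sql_type, cast in (("INTEGER", int), ("REAL", float)):
        try:
            for v in non_empty:
                cast(v)          # raises ValueError if the value does not fit
            return sql_type
        except ValueError:
            continue
    return "TEXT"                # fall back to the most permissive type

def load_csv(conn, table, text):
    """Create `table` from CSV text, guessing each column's type."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    types = [guess_type([r[i] for r in data]) for i in range(len(header))]
    cols = ", ".join(f'"{h}" {t}' for h, t in zip(header, types))
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    # Empty cells become NULL rather than blocking the load.
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({marks})',
        [[v if v != "" else None for v in r] for r in data],
    )
    return dict(zip(header, types))

# Made-up sample data with one missing rating.
SAMPLE = "product,rating,price\nwidget,4,9.99\ngadget,,19.5\n"
conn = sqlite3.connect(":memory:")
schema = load_csv(conn, "ratings", SAMPLE)
print(schema)
```

The missing rating is loaded as NULL instead of forcing a cleaning decision before the first query, which is the spirit of "ask first, clean later".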

PLOT

Stop writing messy scripts to visualize your data. The PLOT command lets you take SQL queries and see them directly: notebook style, PDF/PNG, or JavaScript, take your pick. Mimir even keeps track of unknowns in your data.

ANALYZE

Mimir keeps track of your wrangling to-dos, marking query results that might have errors. When you need to be more precise, the ANALYZE command zeroes in on the specific wrangling you need right now.

Unlike most other SQL-based systems, Mimir lets you make decisions during and after data exploration. All of Mimir's functionality is based on three ideas: (1) Mimir provides sensible best-guess defaults, (2) Mimir warns you when one of its guesses is going to affect what it's telling you, and (3) Mimir lets you easily inspect what it's doing to your data with ANALYZE.
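The three ideas can be illustrated with a small, self-contained Python sketch. This is an assumption-laden toy, not how Mimir works internally: the functions `repair_missing`, `average`, and `explain`, the `_guessed` marker, and the sample rows are all hypothetical. It fills a missing value with a best guess, flags any result that depends on a guess, and lists which rows were guessed.

```python
from statistics import mean

def repair_missing(rows, column):
    """Idea 1: fill missing values with a best guess (here, the mean)."""
    known = [r[column] for r in rows if r[column] is not None]
    guess = mean(known)
    repaired = []
    for r in rows:
        if r[column] is None:
            # Remember that this value is a guess, not observed data.
            repaired.append({**r, column: guess, "_guessed": True})
        else:
            repaired.append({**r, "_guessed": False})
    return repaired

def average(rows, column):
    """Idea 2: return the answer plus a warning flag if a guess affected it."""
    value = mean(r[column] for r in rows)
    uncertain = any(r["_guessed"] for r in rows)
    return value, uncertain

def explain(rows):
    """Idea 3: report which rows carry guessed values."""
    return [r["product"] for r in rows if r["_guessed"]]

# Made-up sample data with one missing rating.
raw = [
    {"product": "widget", "rating": 4},
    {"product": "gadget", "rating": None},
    {"product": "gizmo", "rating": 2},
]
repaired = repair_missing(raw, "rating")
value, uncertain = average(repaired, "rating")
print(value, uncertain, explain(repaired))
```

A query that touches only observed rows would come back with the flag unset, so the warning appears only when a guess could actually change the answer.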

Better still, you don't need any new infrastructure. Mimir attaches to ordinary relational databases through JDBC (we currently support SQLite, with SparkSQL and Oracle support in progress). If you have no preference, Mimir just puts everything in a super-portable SQLite database by default.

Documentation

If you want to use Mimir...

Get Mimir

5 minute overview

Mimir SQL

Mimir's Lenses

If you're having problems...

Issue Tracker

If you want to hack on Mimir...

Setting Up a Dev Environment

Conceptual Introduction to Mimir

Conceptual Introduction to the UI

ScalaDocs

Easy Projects to Start With

Who Are We?

The Team: Mike Brachmann, Oliver Kennedy, Aaron Huber
Research Advisors: Oliver Kennedy, Boris Glavic
Industry Advisors: Ronny Fehling (Airbus), Dieter Gawlick (Oracle), Zhen Hua Liu (Oracle), Beda Hammerschmidt (Oracle)
Alumni: Poonam Kumari, William Spoth, Ting Xie, Gourab Mitra, Vinayak Karuppasamy, Arindam Nandi, Niccolò Meneghetti, Ying Yang, Olivia Alphonce, Sneha Krishnamurthy, Anand Sankar Bhagavandas, Shivang Aggarwal

Mimir is supported by gifts from Oracle, as well as grants from the NSF and the Naval Postgraduate School.

Presentations

Project Overview (2017)

Video Demo (2015)

Project Overview (2015)

Rant: What if Databases Could Answer Incorrectly (2015)

Publications

FastPDB: Towards Bag-Probabilistic Queries at Interactive Speeds

Aaron Huber, Oliver Kennedy, Atri Rudra, Zhuoyue Zhao, Su Feng, Boris Glavic

SIGMOD 2025 [ bibtex ]

@inproceedings{huber:2025:sigmod:fastpdb,
   author = {Huber, Aaron and Kennedy, Oliver and Rudra, Atri and Zhao, Zhuoyue and Feng, Su and Glavic, Boris},
   title = {FastPDB: Towards Bag-Probabilistic Queries at Interactive Speeds},
   booktitle = {SIGMOD},
   year = {2025}
}

Efficient Approximation of Certain and Possible Answers for Ranking and Window Queries over Uncertain Data

Su Feng, Boris Glavic, Oliver Kennedy

pVLDB 2023 [ paper | bibtex ]

@article{feng:2023:pvldb:efficient,
   author = {Feng, Su and Glavic, Boris and Kennedy, Oliver},
   title = {Efficient Approximation of Certain and Possible Answers for Ranking and Window Queries over Uncertain Data},
   journal = {pVLDB},
   year = {2023}
}

The Right Tool for the Job: Data-Centric Workflows in Vizier

Oliver Kennedy, Boris Glavic, Juliana Freire, Michael Brachmann

IEEE-DEB 2022 [ paper | bibtex ]

@article{kennedy:2022:ieee-deb:right,
   author = {Kennedy, Oliver and Glavic, Boris and Freire, Juliana and Brachmann, Michael},
   title = {The Right Tool for the Job: Data-Centric Workflows in Vizier},
   journal = {IEEE-DEB},
   year = {2022}
}

Efficient Uncertainty Tracking for Complex Queries with Attribute-level Bounds

Su Feng, Boris Glavic, Aaron Huber, Oliver Kennedy

SIGMOD 2021 [ paper | bibtex ]

@inproceedings{feng:2021:sigmod:efficient,
   author = {Feng, Su and Glavic, Boris and Huber, Aaron and Kennedy, Oliver},
   title = {Efficient Uncertainty Tracking for Complex Queries with Attribute-level Bounds},
   booktitle = {SIGMOD},
   year = {2021}
}

Make Informed Decisions: Understanding Query Results from Incomplete Databases

Poonam Kumari

pVLDB 2019 (Workshop) [ paper | bibtex ]

@inproceedings{kumari:2019:pvldb:make,
   author = {Kumari, Poonam},
   title = {Make Informed Decisions: Understanding Query Results from Incomplete Databases},
   booktitle = {VLDB-PhD},
   year = {2019}
}

Uncertainty Annotated Databases - A Lightweight Approach for Approximating Certain Answers

Su Feng, Aaron Huber, Boris Glavic, Oliver Kennedy

SIGMOD 2019 [ paper | extended | bibtex ]

@inproceedings{feng:2019:sigmod:uncertainty,
   author = {Feng, Su and Huber, Aaron and Glavic, Boris and Kennedy, Oliver},
   title = {Uncertainty Annotated Databases - A Lightweight Approach for Approximating Certain Answers},
   booktitle = {SIGMOD},
   year = {2019}
}

Learning From Query-Answers: A Scalable Approach to Belief Updating and Parameter Learning

Niccolò Meneghetti, Oliver Kennedy, Wolfgang Gatterbauer

Invited article extending a 'Best-of-SIGMOD' paper from SIGMOD 2017

TODS 2018 [ paper | bibtex ]

@article{meneghetti:2018:tods:learning,
   author = {Meneghetti, Niccolò and Kennedy, Oliver and Gatterbauer, Wolfgang},
   title = {Learning From Query-Answers: A Scalable Approach to Belief Updating and Parameter Learning},
   journal = {TODS},
   year = {2018}
}

SchemaDrill: Interactive Semi-Structured Schema Design

William Spoth, Ting Xie, Oliver Kennedy, Ying Yang, Beda Hammerschmidt, Zhen Hua Liu, Dieter Gawlick

HILDA 2018 (Workshop) [ paper | bibtex ]

@inproceedings{spoth:2018:hilda:schemadrill,
   author = {Spoth, William and Xie, Ting and Kennedy, Oliver and Yang, Ying and Hammerschmidt, Beda and Liu, Zhen Hua and Gawlick, Dieter},
   title = {SchemaDrill: Interactive Semi-Structured Schema Design},
   booktitle = {HILDA},
   year = {2018}
}

The Good and Bad Data

Poonam Kumari, Oliver Kennedy

NEDB 2018 (Workshop) [ abstract | bibtex ]

@inproceedings{kumari:2018:nedb:good,
   author = {Kumari, Poonam and Kennedy, Oliver},
   title = {The Good and Bad Data},
   booktitle = {NEDB},
   year = {2018}
}

Beta Probabilistic Databases: A Scalable Approach to Belief Updating and Parameter Learning

Niccolò Meneghetti, Oliver Kennedy, Wolfgang Gatterbauer

Invited to submit an extended version as a 'Best-of-SIGMOD' paper to ACM-TODS

SIGMOD 2017 [ paper | video | slides | poster | bibtex ]

@inproceedings{meneghetti:2017:sigmod:beta,
   author = {Meneghetti, Niccolò and Kennedy, Oliver and Gatterbauer, Wolfgang},
   title = {Beta Probabilistic Databases: A Scalable Approach to Belief Updating and Parameter Learning},
   booktitle = {SIGMOD},
   year = {2017}
}