Modern compilation pipelines (compilers, optimizers, program analysis tools) consume, transform, and produce abstract syntax trees. If you squint a little, these processes start to look a lot like database queries, updates, and views. The ODIn Lab is looking at ways to leverage expertise from the database community to improve the performance of compiler toolchains, while at the same time making it easier to write compilers and program analyses by decoupling the core specification logic from the logic that makes the compiler fast.
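As a toy illustration of the analogy (not any actual ODIn system), consider constant folding phrased as a query-plus-update over the tree: "select" every addition whose children are literals, and "update" it to a single literal. The `Const`/`Add` node types and `fold_constants` function below are hypothetical names for this sketch.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Const:
    value: int

@dataclass
class Add:
    left: "Expr"
    right: "Expr"

Expr = Union[Const, Add]

def fold_constants(node: Expr) -> Expr:
    """SELECT each Add whose operands are Const ... and rewrite it."""
    if isinstance(node, Add):
        left = fold_constants(node.left)
        right = fold_constants(node.right)
        # The "WHERE" clause of the rewrite: both operands are literals.
        if isinstance(left, Const) and isinstance(right, Const):
            return Const(left.value + right.value)
        return Add(left, right)
    return node

tree = Add(Add(Const(1), Const(2)), Const(4))  # (1 + 2) + 4
print(fold_constants(tree))                    # Const(value=7)
```

Written this way, the optimization reads like a declarative rule, which is exactly the kind of specification a database-style optimizer knows how to execute efficiently.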
Data science is frequently an iterative process, with scientists revisiting past efforts and adapting them to new research goals. A key challenge in this process is ensuring that prior work is re-used safely. Just as compilers for strongly typed languages help users understand the potential implications of changes to their code, the ODIn Lab is looking to develop similar techniques to assist data scientists in developing safe, re-usable, and reproducible data science pipelines.
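To make the compiler analogy concrete, here is a minimal sketch of the kind of check such a technique might automate: a pipeline step declares the schema it was written against, and re-use against changed data fails fast rather than silently. The `expected_schema` and `check_schema` names are hypothetical, not an existing ODIn API.

```python
# Schema this (hypothetical) pipeline step was originally written against.
expected_schema = {"patient_id": int, "age": int, "dose_mg": float}

def check_schema(rows: list) -> None:
    """Fail fast, compiler-style, if reused data no longer matches."""
    for i, row in enumerate(rows):
        for column, expected_type in expected_schema.items():
            if column not in row:
                raise TypeError(f"row {i}: missing column {column!r}")
            if not isinstance(row[column], expected_type):
                raise TypeError(
                    f"row {i}: {column!r} is {type(row[column]).__name__}, "
                    f"expected {expected_type.__name__}"
                )

check_schema([{"patient_id": 1, "age": 42, "dose_mg": 1.5}])     # passes
# check_schema([{"patient_id": 1, "age": "42", "dose_mg": 1.5}]) # raises
```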
Ambiguous, incomplete, conflicting, and otherwise uncertain data is difficult to use safely, often requiring days or even weeks of careful study to understand the uncertainty's implications for downstream analyses. Exploration and re-use of uncertain data pose a particularly significant challenge, as it is easy to lose track of assumptions made during one phase of a data science project. The ODIn Lab is exploring how uncertainty in data can be precisely modeled without impeding a researcher's ability to explore or analyze their data.
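One simple way to picture this goal: attach assumptions ("caveats") to values and propagate them automatically, so no downstream result silently forgets how its inputs were produced. The `Uncertain` class below is a hypothetical sketch of that idea, not the lab's actual model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Uncertain:
    """A value annotated with the assumptions ("caveats") behind it."""
    value: float
    caveats: frozenset = frozenset()

    def __add__(self, other: "Uncertain") -> "Uncertain":
        # Results inherit every assumption made anywhere upstream.
        return Uncertain(self.value + other.value,
                         self.caveats | other.caveats)

# An imputed sensor reading remembers that it was imputed ...
reading = Uncertain(98.6, frozenset({"sensor 7 offline; value imputed"}))
baseline = Uncertain(97.0)

# ... and so does anything computed from it.
total = reading + baseline
print(total.value, sorted(total.caveats))
```

The researcher can keep exploring with ordinary arithmetic, while the caveat set quietly records which results rest on which assumptions.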