The MIT Inference Stack is an open-source AI programming stack. It includes new AI programming languages that make it possible to combine neural, symbolic, and Bayesian approaches to modeling and inference. It also includes spreadsheet interfaces for data analysts, SQL interfaces for data engineers, and Python/Julia libraries for data scientists and AI engineers.
The Inference Stack has been developed by MIT researchers and open-source contributors over the past 10 years, with support from DARPA XDATA, PPAML, SD2, and MCS research contracts and from philanthropic gifts that funded a dedicated engineering team. Additional components have been developed by the VC-backed startup Prior Knowledge Inc. (acquired by Salesforce in 2012), by Empirical Systems Inc. (acquired by Tableau in 2018), and by engineers at Salesforce.
“Democratizing Data Science” (MIT News 2019).
“New AI programming language goes beyond deep learning” (MIT News 2019).
“Tableau Acquires MIT Spinoff Empirical Systems” (Forbes 2019).
Please note: This is research software. It is certain to contain flaws. No warranty of any kind is implied by this release. Use it at your own risk.
The MIT Inference Stack is available as a 511 MB standalone Docker image. It includes:
- BayesDB, a SQL interface to probabilistic programming and automatic data modeling, usable by data scientists and data engineers.
  - CGPM, a Python library that enables data scientists to customize BayesDB models via Python machine learning packages (e.g. scikit-learn) and via probabilistic programming.
- CrossCat, an automatic data modeling technique that builds models for high-dimensional, heterogeneously-typed data tables.
  - Loom, a C++ implementation of CrossCat developed by Prior Knowledge and Salesforce, suitable for data tables with more than 1 billion cells and usable by ML engineers in production.
- Venture, a higher-order probabilistic programming language with programmable inference, suitable for AI researchers.
- Gen, a general-purpose probabilistic programming platform with programmable inference. Gen has been used to implement state-of-the-art techniques in robotics, computer vision, and statistics in under 100 lines of code, and is being adopted by Bayesian statisticians as a more powerful alternative to languages such as Stan. Gen is also the basis of multiple AI moonshot projects at MIT, including an effort to build models of human common sense at the level of an 18-month-old child.
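To make the CrossCat entry a little more concrete: CrossCat is a nonparametric Bayesian model that partitions a table's columns into groups ("views") and, within each view, partitions the rows into categories, using Chinese restaurant processes (CRPs) for both partitions. The sketch below samples only that latent structure from the prior, in plain Python. It is a minimal illustration under stated assumptions, not code from the CrossCat, BayesDB, or Loom codebases: the function names and the fixed concentration parameters `alpha_cols` and `alpha_rows` are hypothetical. A full implementation would also attach per-column component models (for example, Normal for numeric columns and categorical for nominal ones) to each category, infer the hyperparameters, and run MCMC over the structure given the observed data.

```python
import random


def crp_partition(n, alpha, rng):
    """Sample a partition of n items from a Chinese restaurant process.

    Returns a list whose i-th entry is the cluster label of item i.
    Labels are assigned in order of first appearance (0, 1, 2, ...).
    """
    labels = []
    counts = []  # counts[k] = number of items already assigned to cluster k
    for _ in range(n):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def sample_crosscat_structure(n_rows, n_cols, alpha_cols=1.0, alpha_rows=1.0, seed=0):
    """Sample a CrossCat-style latent structure from the prior (no data involved).

    Hypothetical helper: an outer CRP assigns columns to views, and an
    independent inner CRP per view assigns rows to categories.
    """
    rng = random.Random(seed)
    col_to_view = crp_partition(n_cols, alpha_cols, rng)
    n_views = max(col_to_view) + 1
    row_partitions = [crp_partition(n_rows, alpha_rows, rng) for _ in range(n_views)]
    return col_to_view, row_partitions
```

For example, `sample_crosscat_structure(n_rows=8, n_cols=5, seed=42)` returns a view label for each of the 5 columns and, for each view, a category label for each of the 8 rows. Because every view has its own row partition, two rows can fall in the same category with respect to one group of columns and in different categories with respect to another. This is the property that lets CrossCat model heterogeneous, high-dimensional tables without a single global clustering.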