BayesDB

Is it possible to make statistical inference broadly accessible to non-statisticians without sacrificing mathematical rigor or inference quality?

INFER orbit_type FROM satellites
        WITH CONFIDENCE 0.7

BayesDB is a probabilistic programming platform that enables users to query the probable implications of their data as directly as SQL databases enable them to query the data itself.

The default modeling assumptions that BayesDB makes are suitable for a broad class of problems, but statisticians can customize these assumptions when necessary. BayesDB also enables domain experts who lack statistical expertise to perform qualitative model checking and encode simple forms of qualitative prior knowledge.

Bayesian Query Language (BQL)


INFER EXPLICIT
  anticipated_lifetime, perigee_km,
  period_minutes, class_of_orbit,
  PREDICT type_of_orbit AS inferred_orbit_type
  CONFIDENCE inferred_orbit_type_conf
  FROM satellites_cc
  WHERE type_of_orbit IS NULL;
        

The Bayesian Query Language (BQL) allows analysts and domain experts to perform Bayesian data analysis without requiring a detailed understanding of model implementation. That means queries can be articulated before models have been built, and models can be improved and optimized without invalidating existing queries.
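To make the role of the CONFIDENCE column in the query above more concrete, here is a minimal Python sketch of one way a confidence-thresholded inference could work for a categorical column. It is an illustration under stated assumptions, not BayesDB's actual implementation: the function name `infer_with_confidence`, the use of simulated draws as input, and the definition of confidence as the share of draws that agree with the most common value are all hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def infer_with_confidence(
    draws: Sequence[Hashable], threshold: float
) -> Tuple[Optional[Hashable], float]:
    """Return (value, confidence) for a missing categorical cell.

    Assumption: `draws` are values simulated from some model for that cell,
    and confidence is the fraction of draws equal to the most common value.
    The value is reported only if the confidence reaches `threshold`.
    """
    if not draws:
        raise ValueError("at least one draw is required")
    value, count = Counter(draws).most_common(1)[0]
    confidence = count / len(draws)
    if confidence >= threshold:
        return value, confidence
    # Not confident enough: leave the cell unfilled, but report the confidence.
    return None, confidence


# Hypothetical draws for one satellite's missing orbit type.
example_draws = ["LEO"] * 8 + ["GEO"] * 2
filled = infer_with_confidence(example_draws, 0.7)
unfilled = infer_with_confidence(example_draws, 0.9)
```

In this sketch, raising the threshold trades coverage for reliability: fewer cells are filled in, but each filled value is backed by a larger share of the draws.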

What-If Scenarios in BQL


SIMULATE country_of_operator, purpose
  FROM satellites
  GIVEN Class_of_orbit = 'GEO', Dry_mass_kg = 500
  LIMIT 1000;
        

BQL enables users to generate answers to a broad class of "what-if?" scenarios, contingencies, and hypotheticals.

These samples can be used as proxy data in sensitive settings, as the basis for model checking by domain experts, and as the basis for making complex risk-reward tradeoffs requiring full probability distributions on outcomes.
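As a rough illustration of working with a full distribution rather than a single point estimate, the following Python sketch summarizes simulated rows into empirical probabilities for one column. The rows are made-up placeholders rather than BayesDB output, and the helper name `empirical_distribution` is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Mapping, Sequence


def empirical_distribution(
    rows: Sequence[Mapping[str, Hashable]], column: str
) -> Dict[Hashable, float]:
    """Estimate P(column = value) as the relative frequency among the rows."""
    if not rows:
        raise ValueError("at least one simulated row is required")
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


# Placeholder rows standing in for the result of a SIMULATE query.
simulated_rows = [
    {"country_of_operator": "USA", "purpose": "Communications"},
    {"country_of_operator": "USA", "purpose": "Communications"},
    {"country_of_operator": "Japan", "purpose": "Communications"},
    {"country_of_operator": "USA", "purpose": "Earth Observation"},
]
purpose_probs = empirical_distribution(simulated_rows, "purpose")
```

With enough simulated rows, such frequencies can be weighted by the cost or benefit of each outcome to compare alternatives.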

Meta-modeling Language (MML)

Private Alpha

CREATE POPULATION pop WITH DATA 'pop.csv';
GUESS POPULATION SCHEMA pop;
ALTER POPULATION pop
   ALTER COLUMN x SET DATA TYPE CYCLIC;
CREATE DEFAULT METAMODEL ON pop (
   INDEPENDENT(x, y),
   INDEPENDENT(x, v),
   DEPENDENT(z, v, w)
);
            

The Meta-modeling Language (MML) enables machine-assisted modeling for populations based on samples and domain insight.

By specifying population schemas and using MML, domain experts can encode qualitative prior knowledge and control the behavior of BayesDB's built-in model building engine.

Extensible Models

Private Alpha

CREATE METAMODEL sat_keplers ON satellites
  USING composer(
  random_forest (
    Type_of_Orbit (CATEGORICAL)
      GIVEN Apogee_km, Perigee_km,
            Eccentricity, Period_minutes,
            Launch_Mass_kg, Power_watts,
            Anticipated_Lifetime, Class_of_orbit
  ),
  foreign_model (
    source = 'keplers_laws.py',
    Period_Minutes (NUMERICAL)
      GIVEN Perigee_km, Apogee_km
  ),
  default (
    Country_of_Operator CATEGORICAL,
    Operator_Owner CATEGORICAL,
    Users CATEGORICAL, Purpose CATEGORICAL,
    Type_of_Orbit CATEGORICAL,
    Perigee_km NUMERICAL,
    Apogee_km NUMERICAL,
    Eccentricity NUMERICAL,
    Launch_Mass_kg NUMERICAL,
    Dry_Mass_kg NUMERICAL,
    Power_watts NUMERICAL,
    Date_of_Launch NUMERICAL,
    Anticipated_Lifetime NUMERICAL,
    Contractor CATEGORICAL,
    Country_of_Contractor CATEGORICAL,
    Launch_Site CATEGORICAL,
    Launch_Vehicle CATEGORICAL,
    Source_Used_for_Orbital_Data CATEGORICAL,
    longitude_radians_of_geo NUMERICAL,
    Inclination_radians NUMERICAL
  )
);
            

MML includes constructs for integrating arbitrary algorithmic models contained in external software, and for invoking a standard library of custom statistical modeling techniques.
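The example above names a foreign model source, 'keplers_laws.py', whose contents are not shown here. As a sketch of the kind of deterministic relationship such a file could encode, the following Python computes an orbital period from perigee and apogee using Kepler's third law. The function name `period_minutes` is hypothetical, the code says nothing about the interface BayesDB expects from a foreign model, and it assumes that Perigee_km and Apogee_km are altitudes above the Earth's surface.

```python
import math

# Earth's standard gravitational parameter GM, in km^3 / s^2.
EARTH_MU_KM3_S2 = 398600.4418
# Earth's equatorial radius, in km.
EARTH_RADIUS_KM = 6378.137


def period_minutes(perigee_km: float, apogee_km: float) -> float:
    """Orbital period in minutes via Kepler's third law.

    Assumption: both arguments are altitudes above the Earth's surface,
    so the Earth's radius is added to obtain distances from its center.
    """
    if perigee_km < 0 or apogee_km < perigee_km:
        raise ValueError("expected 0 <= perigee_km <= apogee_km")
    # Semi-major axis is the mean of the perigee and apogee radii.
    semi_major_axis_km = EARTH_RADIUS_KM + (perigee_km + apogee_km) / 2.0
    period_seconds = 2.0 * math.pi * math.sqrt(
        semi_major_axis_km ** 3 / EARTH_MU_KM3_S2
    )
    return period_seconds / 60.0


# A circular orbit at geostationary altitude takes about one sidereal day.
geo_period = period_minutes(35786.0, 35786.0)
```

Combining a physical law like this with learned components lets the overall model respect known constraints between columns while still modeling the noise and missing values found in real records.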

Some BayesDB functionality, including machine-assisted modeling, is currently available only through the private alpha. Private alpha users are expected to share more about their data and experience with the probcomp research team.


BayesDB research and development is supported by grants from DARPA (under the PPAML program), IARPA, the Office of Naval Research, and the Army Research Laboratory, with additional support from Google and from the Bill and Melinda Gates Foundation. BayesDB is a part of the Venture platform, also supported by PPAML, and includes components from CrossCat, supported by DARPA (under the XDATA program). Vikash Mansinghka is the Principal Investigator and co-founded the open-source project with Richard Tibbetts.

The bayeslite prototype implementation of BayesDB was originally developed by Taylor Campbell. Additional contributions were made by Feras Saad, Alexey Radul, Baxter Eaves, Jay Baxter, and Pat Shafto.

If you have any comments or questions, please feel free to email us at bayesdb@mit.edu.