Bayeslite API reference

bayeslite: Bayeslite API

Main bayeslite API.

The focus of the bayeslite API is the BayesDB, a handle for a database. To obtain a BayesDB handle, use bayesdb_open():

import bayeslite

bdb = bayeslite.bayesdb_open(pathname='foo.bdb')

When done, close it with the close() method:

bdb.close()

BayesDB handles also serve as context managers, so you can do:

with bayeslite.bayesdb_open(pathname='foo.bdb') as bdb:
    bdb.execute('SELECT 42')
    ...

You can query the probable implications of the data, according to the models stored in the database, by passing BQL queries to the execute() method:

bql = 'ESTIMATE DEPENDENCE PROBABILITY FROM PAIRWISE COLUMNS OF foo'
for x in bdb.execute(bql):
    print(x)

You can also execute normal SQL on a BayesDB handle bdb with the sql_execute() method:

bdb.sql_execute('CREATE TABLE t(x INT, y TEXT, z REAL)')
bdb.sql_execute("INSERT INTO t VALUES(1, 'xyz', 42.5)")
bdb.sql_execute("INSERT INTO t VALUES(1, 'pqr', 83.7)")
bdb.sql_execute("INSERT INTO t VALUES(2, 'xyz', 1000)")

(BQL does not yet support CREATE TABLE and INSERT directly, so you must use sql_execute() for those.)

exception bayeslite.BQLError(bayesdb, *args, **kwargs)

Errors in interpreting or executing BQL on a particular database.

exception bayeslite.BQLParseError(errors)

Errors in parsing BQL.

As many parse errors as can be reasonably detected are listed together.

Variables: errors (list) – list of strings describing parse errors

class bayeslite.BayesDB(cookie, pathname=None, seed=None, version=None, compatible=None)

A handle for a Bayesian database in memory or on disk.

Do not create BayesDB instances directly; use bayesdb_open() instead.

An instance of BayesDB is a context manager that returns itself on entry and closes itself on exit, so you can write:

with bayesdb_open(pathname='foo.bdb') as bdb:
    ...

changes()

Return the number of changes of the last INSERT, DELETE, or UPDATE.

This may return unexpected results after a statement that is not an INSERT, DELETE, or UPDATE.

close()

Close the database. Further use is not allowed.

execute(string, bindings=None)

Execute a BQL query and return a cursor for its results.

The argument string is a string parsed into a single BQL query. It must contain exactly one BQL phrase, optionally terminated by a semicolon.

The argument bindings is a sequence or dictionary of bindings for parameters in the query, or None to supply no bindings.

last_insert_rowid()

Return the rowid of the row most recently inserted.
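SQLite itself provides SQL functions named changes() and last_insert_rowid(). Assuming the BayesDB methods of the same names report the same quantities (an assumption; this reference does not say how they are implemented), the following sketch shows the values to expect. It uses only Python's standard sqlite3 module as a stand-in for the underlying database, not bayeslite itself:

```python
import sqlite3

# Stand-in for the SQLite database underlying a BayesDB handle.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t(x INT, y TEXT, z REAL)')
conn.execute("INSERT INTO t VALUES(1, 'xyz', 42.5)")
conn.execute("INSERT INTO t VALUES(1, 'pqr', 83.7)")
conn.execute("INSERT INTO t VALUES(2, 'xyz', 1000)")

# Rowid assigned to the most recently inserted row.
last_rowid = conn.execute('SELECT last_insert_rowid()').fetchall()[0][0]

# Number of rows modified by the most recent INSERT, UPDATE, or DELETE.
conn.execute("UPDATE t SET z = 0 WHERE y = 'xyz'")
n_changed = conn.execute('SELECT changes()').fetchall()[0][0]
```

Here the third insert receives rowid 3, and the UPDATE touches the two rows whose y is 'xyz'.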

np_prng

A Numpy RandomState object local to this BayesDB instance.

This pseudorandom number generator is deterministically initialized from the seed supplied to bayesdb_open(). Use it to preserve reproducibility of results.

py_prng

A random.Random object local to this BayesDB instance.

This pseudorandom number generator is deterministically initialized from the seed supplied to bayesdb_open(). Use it to preserve reproducibility of results.
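The point of drawing from a handle-local generator is that two runs started from the same seed produce the same sequence. The sketch below illustrates this with plain random.Random objects standing in for py_prng; how bayeslite actually derives the generator state from the 32-byte seed is not shown in this reference, so seeding directly with the bytes is an assumption made for illustration:

```python
import random

seed = b'\x00' * 32  # 32 bytes; all zeros is the documented default seed

# Hypothetical stand-ins for the py_prng of two handles opened with one seed.
prng_a = random.Random(seed)
prng_b = random.Random(seed)

draws_a = [prng_a.random() for _ in range(3)]
draws_b = [prng_b.random() for _ in range(3)]

# A generator seeded differently yields a different sequence.
prng_c = random.Random(b'\x01' * 32)
draws_c = [prng_c.random() for _ in range(3)]
```

Code that instead calls the module-level functions of random shares global state with everything else in the process, so its results would not be tied to the seed of any one handle.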

reconnect()

Reconnect to the underlying database. This may sometimes be necessary, e.g. before a DROP TABLE.

savepoint(**kwds)

Savepoint context. On return, commit; on exception, roll back.

The effects of a savepoint happen durably all at once if committed, or not at all if rolled back.

Savepoints may be nested. Parsed metadata and models are cached in Python during a savepoint.

Example:

with bdb.savepoint():
    bdb.execute('DROP GENERATOR foo')
    try:
        with bdb.savepoint():
            bdb.execute('ALTER TABLE foo RENAME TO bar')
            raise NeverMind
    except NeverMind:
        # No changes will be recorded.
        pass
    bdb.execute('CREATE GENERATOR foo ...')
# foo will have been dropped and re-created.

savepoint_rollback(**kwds)

Auto-rollback savepoint context. Roll back on return or exception.

This may be used to compute hypotheticals – the bdb is guaranteed to remain unmodified afterward.
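The auto-rollback behaviour can be pictured with SQLite's own SAVEPOINT, ROLLBACK TO, and RELEASE statements. The sketch below is not bayeslite's implementation; it defines a hypothetical context manager over a plain sqlite3 connection to show that changes made inside the block are visible within it and gone afterward:

```python
import contextlib
import sqlite3

@contextlib.contextmanager
def savepoint_rollback(conn, name='hypothetical'):
    """Run the body inside a savepoint that is always rolled back."""
    conn.execute('SAVEPOINT %s' % (name,))
    try:
        yield
    finally:
        # Undo everything since the savepoint, then discard the savepoint.
        conn.execute('ROLLBACK TO %s' % (name,))
        conn.execute('RELEASE %s' % (name,))

def count_rows(conn):
    return conn.execute('SELECT COUNT(*) FROM t').fetchall()[0][0]

# isolation_level=None leaves transaction control to the explicit SQL above.
conn = sqlite3.connect(':memory:', isolation_level=None)
conn.execute('CREATE TABLE t(x INT)')
conn.execute('INSERT INTO t VALUES (1)')

with savepoint_rollback(conn):
    conn.execute('INSERT INTO t VALUES (2)')  # a hypothetical observation
    inside = count_rows(conn)

after = count_rows(conn)
```

Inside the block the table has two rows; after it, only the original row remains.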

sql_execute(string, bindings=None)

Execute a SQL query on the underlying SQLite database.

The argument string is a string parsed into a single SQL query. It must contain exactly one SQL phrase, optionally terminated by a semicolon.

The argument bindings is a sequence or dictionary of bindings for parameters in the query, or None to supply no bindings.
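As an illustration of the two binding styles, the sketch below uses Python's standard sqlite3 module directly. It assumes that sql_execute() passes bindings through to SQLite with the usual `?` and `:name` parameter syntax, which this reference does not spell out:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t(x INT, y TEXT, z REAL)')

# A sequence fills positional `?` parameters in order.
conn.execute('INSERT INTO t VALUES(?, ?, ?)', (1, 'xyz', 42.5))

# A dictionary fills named `:name` parameters.
conn.execute('INSERT INTO t VALUES(:x, :y, :z)',
    {'x': 2, 'y': "it's", 'z': 1000})

rows = conn.execute(
    'SELECT x, y, z FROM t WHERE x >= ? ORDER BY x', (1,)).fetchall()
```

Because the values travel separately from the query text, data containing quote characters (such as "it's" above) needs no escaping.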

sql_trace(tracer)

Trace execution of SQL queries.

For simple tracing, pass a function or arbitrary Python callable as the tracer. It will be called at the start of execution of each SQL query, with two arguments: the query to be executed, as a string; and the sequence or dictionary of bindings.

For articulated tracing, pass an instance of IBayesDBTracer, whose methods will be called in the pattern described in its documentation.

Only one tracer can be established at a time. To remove it, use sql_untrace().

sql_untrace(tracer)

Stop tracing execution of SQL queries.

tracer must have been previously established with sql_trace().

Any queries currently in progress will continue to be traced until completion.

trace(tracer)

Trace execution of BQL queries.

For simple tracing, pass a function or arbitrary Python callable as the tracer. It will be called at the start of execution of each BQL query, with two arguments: the query to be executed, as a string; and the sequence or dictionary of bindings.

For articulated tracing, pass an instance of IBayesDBTracer, whose methods will be called in the pattern described in its documentation.

Only one tracer can be established at a time. To remove it, use untrace().
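A simple tracer is just a callable taking the query string and the bindings. The sketch below shows such a callable collecting queries into a list. FakeDB is a hypothetical stand-in written for this example so that it runs without bayeslite; it mimics only the calling convention described above, not the real execution machinery:

```python
class FakeDB(object):
    """Hypothetical stand-in for a BayesDB handle; not part of bayeslite."""

    def __init__(self):
        self._tracer = None

    def trace(self, tracer):
        if self._tracer is not None:
            raise ValueError('only one tracer can be established at a time')
        self._tracer = tracer

    def untrace(self, tracer):
        if self._tracer is not tracer:
            raise ValueError('tracer was not established with trace()')
        self._tracer = None

    def execute(self, string, bindings=None):
        # The tracer is called at the start of execution of each query.
        if self._tracer is not None:
            self._tracer(string, bindings)
        return iter(())

log = []

def tracer(query, bindings):
    log.append((query, bindings))

bdb = FakeDB()
bdb.trace(tracer)
bdb.execute('SELECT ?', (42,))
bdb.untrace(tracer)
bdb.execute('SELECT 43')  # no longer traced
```

With a real handle, the same tracer function would be passed to trace() or sql_trace() unchanged.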

transaction(**kwds)

Transaction context. On return, commit; on exception, roll back.

Transactions may not be nested: use a savepoint if you need nesting. Parsed metadata and models are cached in Python during a transaction.

untrace(tracer)

Stop tracing execution of BQL queries.

tracer must have been previously established with trace().

Any queries currently in progress will continue to be traced until completion.

exception bayeslite.BayesDBException(bayesdb, *args, **kwargs)

Exceptions associated with a BayesDB instance.

Variables: bayesdb (bayeslite.BayesDB) – associated BayesDB instance

exception bayeslite.BayesDBTxnError(bayesdb, *args, **kwargs)

Transaction errors in a BayesDB.

bayeslite.bayesdb_deregister_backend(bdb, backend)

Deregister backend, which must have been registered in bdb.

bayeslite.bayesdb_open(pathname=None, builtin_backends=None, seed=None, version=None, compatible=None)

Open the BayesDB in the file at pathname.

If there is no file at pathname, it is automatically created. If pathname is unspecified or None, a temporary in-memory BayesDB instance is created.

seed is a 32-byte string specifying a pseudorandom number generation seed. If not specified, it defaults to all zeros.

If compatible is None or False and the database already exists, bayesdb_open may have the effect of incompatibly changing the format of the database so that older versions of bayeslite cannot read it. If compatible is True, bayesdb_open will not incompatibly change the format of the database (but some newer bayesdb features may not work).
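Since seed must be exactly 32 bytes, one convenient way to obtain a seed from a human-readable label is to hash the label with SHA-256, whose digest is 32 bytes long. The helper below is hypothetical and not part of bayeslite; its result could be passed as the seed argument:

```python
import hashlib

def seed_from_label(label):
    """Derive a 32-byte seed from a text label (hypothetical helper)."""
    return hashlib.sha256(label.encode('utf-8')).digest()

seed = seed_from_label('experiment-1')
other_seed = seed_from_label('experiment-2')
default_seed = b'\x00' * 32  # the documented default: all zeros
```

The same label always yields the same seed, so recording the label is enough to reproduce a run.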

bayeslite.bayesdb_read_csv(bdb, table, f, header=False, create=False, ifnotexists=False)

Read CSV data from a line iterator into a table.

Parameters:
  • bdb (bayeslite.BayesDB) – BayesDB instance
  • table (str) – name of table
  • f (iterable) – iterator returning lines as str
  • header (bool) – if true, first line specifies column names
  • create (bool) – if true and table does not exist, create it
  • ifnotexists (bool) – if true and table already exists, read the data into it anyway

bayeslite.bayesdb_read_csv_file(bdb, table, pathname, header=False, create=False, ifnotexists=False)

Read CSV data from a file into a table.

Parameters:
  • bdb (bayeslite.BayesDB) – BayesDB instance
  • table (str) – name of table
  • pathname (str) – pathname of CSV file
  • header (bool) – if true, first line specifies column names
  • create (bool) – if true and table does not exist, create it
  • ifnotexists (bool) – if true and table already exists, read the data into it anyway

bayeslite.bayesdb_register_backend(bdb, backend)

Register backend in bdb, creating any necessary tables.

backend must not already be registered in any BayesDB, and no other backend with the same name may already be registered.

bayeslite.bayesdb_upgrade_schema(bdb, version=None)

Upgrade the BayesDB internal database schema.

If version is None, upgrade to the latest database format version supported by bayeslite. Otherwise, it may be a schema version number.

bayeslite.bql_quote_name(name)

Quote name as a BQL identifier, e.g. a table or column name.

Do NOT use this for strings, e.g. inserting data into a table. Use query parameters instead.
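The distinction between quoting names and binding data can be sketched as follows. The quote_name function here is a hypothetical stand-in that follows the standard SQL convention of wrapping the name in double quotes and doubling any embedded double quotes; whether bql_quote_name() is implemented exactly this way is an assumption. The example runs against plain sqlite3:

```python
import sqlite3

def quote_name(name):
    """Hypothetical stand-in: double-quote, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'

conn = sqlite3.connect(':memory:')
table = 'my "odd" table'
quoted = quote_name(table)

# Identifiers are quoted and spliced into the query text ...
conn.execute('CREATE TABLE %s(x TEXT)' % (quoted,))
# ... while data values are supplied as query parameters.
conn.execute('INSERT INTO %s VALUES(?)' % (quoted,), ("O'Brien",))
rows = conn.execute('SELECT x FROM %s' % (quoted,)).fetchall()
```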

class bayeslite.BayesDB_Backend

BayesDB backend interface.

Subclasses of BayesDB_Backend implement the functionality needed by probabilistic BQL queries to sample from and inquire about the posterior distribution of a generative model conditioned on data in a table. Instances of subclasses of BayesDB_Backend contain any in-memory state associated with the backend in the database.

add_column(bdb, generator_id, colno)

Add colno from the population as a variable in the backend.

Used by the MML:

ALTER POPULATION <population> ADD VARIABLE <variable> <stattype>

alter(bdb, generator_id, modelnos, commands)

Modify the generator according to the metamodel-specific commands.

Used by the MML:

ALTER GENERATOR <generator> [MODELS [(<modelnos>)]]
    commands...

analyze_models(bdb, generator_id, modelnos=None, iterations=1, max_seconds=None, ckpt_iterations=None, ckpt_seconds=None, program=None)

Analyze the specified model numbers of a generator.

If none are specified, analyze all of them.

Parameters:
  • iterations (int) – maximum number of iterations of analysis for each model
  • max_seconds (int) – requested maximum number of seconds to analyze
  • ckpt_iterations (int) – number of iterations before committing results of analysis to the database
  • ckpt_seconds (int) – number of seconds before committing results of analysis to the database
  • program (object) – None, or list of tokens of analysis program

column_dependence_probability(bdb, generator_id, modelnos, colno0, colno1)

Compute DEPENDENCE PROBABILITY OF <col0> WITH <col1>.

column_mutual_information(bdb, generator_id, modelnos, colnos0, colnos1, constraints=None, numsamples=100)

Compute MUTUAL INFORMATION OF (<cols0>) WITH (<cols1>).

create_generator(bdb, table, schema, **kwargs)

Create a generator for a table with the given schema.

Called when executing CREATE GENERATOR.

Must parse schema to build the generator.

The generator id and column numbers may be used to create backend-specific records in the database for the generator with foreign keys referring to the bayesdb_generator and bayesdb_variable tables.

schema is a list of schema items corresponding to the comma-separated ‘columns’ from a BQL CREATE GENERATOR command. Each schema item is a list of strings or lists of schema items, corresponding to whitespace-separated tokens and parenthesized lists. Note that within parenthesized lists, commas are not excluded.
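To make the nested shape concrete, the sketch below builds a schema value by hand and walks it recursively. The column names and type tokens are made up for illustration, and the exact tokens bayeslite produces for any given command are an assumption; only the structure (lists of strings and nested lists, with commas kept as tokens inside parenthesized lists) follows the description above:

```python
def render_item(item):
    """Render one schema item back to text, parenthesizing nested lists."""
    parts = []
    for token in item:
        if isinstance(token, list):
            parts.append('(' + render_item(token) + ')')
        else:
            parts.append(token)
    return ' '.join(parts)

def render_schema(schema):
    """Join top-level schema items with the commas that separated them."""
    return ', '.join(render_item(item) for item in schema)

# Hypothetical schema: two comma-separated items, the second with a
# parenthesized list in which the comma survives as a token.
schema = [
    ['height', 'NUMERICAL'],
    ['color', 'CATEGORICAL', ['red', ',', 'blue']],
]
text = render_schema(schema)
```

A backend's create_generator would walk the same structure, interpreting the tokens instead of printing them.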

drop_generator(bdb, generator_id)

Drop any backend-specific records for a generator.

Called when executing DROP GENERATOR.

drop_models(bdb, generator_id, modelnos=None)

Drop the specified model numbers of a generator.

If none are specified, drop all models.

initialize_models(bdb, generator_id, modelnos)

Initialize the specified model numbers for a generator.

logpdf_joint(bdb, generator_id, modelnos, rowid, targets, constraints)

Evaluate the joint probability of targets subject to constraints.

Returns the probability density of the targets (in log domain).

rowid is an integer.

targets is a list of (colno, value) pairs.

constraints is a list of (colno, value) pairs.

modelnos is a list of model numbers, or None, meaning all models.

name()

Return the name of the backend as a str.

predict(bdb, generator_id, modelnos, rowid, colno, threshold, numsamples=None)

Predict a value for a column, if confidence is high enough.

predict_confidence(bdb, generator_id, modelnos, rowid, colno, numsamples=None)

Predict a value for a column and return confidence.

predictive_relevance(bdb, generator_id, modelnos, rowid_target, rowid_query, hypotheticals, colno)

Compute predictive relevance, also known as relevance probability.

rowid_target is an integer.

rowid_query is a list of integers.

hypotheticals is a list of hypothetical observations, where each item is itself a list of (colno, value) pairs.

register(bdb)

Install any state needed for the backend in bdb.

Called by bayeslite.bayesdb_register_backend().

Normally this will create SQL tables if necessary.

rename_column(bdb, generator_id, oldname, newname)

Note that a table column has been renamed.

Not currently used. To be used in the future when executing:

ALTER TABLE <table> RENAME COLUMN <oldname> TO <newname>

row_similarity(bdb, generator_id, modelnos, rowid, target_rowid, colnos)

Compute SIMILARITY TO <target_row> for given rowid.

set_multiprocess(switch)

Switch between multiprocessing and single processing.

The boolean variable switch toggles between single (False) and multi (True) processing, if the choice is available, and otherwise ignores the request.

simulate_joint(bdb, generator_id, modelnos, rowid, targets, constraints, num_samples=1, accuracy=None)

Simulate targets from a generator, subject to constraints.

Returns a list of lists of values for the specified targets.

rowid is an integer.

modelnos may be None, meaning “all models”.

targets is a list of column numbers (colno).

constraints is a list of (colno, value) pairs.

num_samples is the number of results to return.

accuracy is a generic parameter (usually an int) that specifies the desired accuracy, compute time, etc., for backends whose simulations are only approximately distributed according to the true target distribution.

The results are samples from the distribution on targets, conditionally independent given (the latent state of the backend and) the constraints.
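The argument and result shapes of logpdf_joint and simulate_joint can be illustrated with a toy model in which every column is an independent standard normal. ToyBackend below is hypothetical: it does not subclass bayeslite.BayesDB_Backend, it ignores bdb, generator_id, modelnos, and rowid, and it is meant only to show what targets, constraints, and the return values look like:

```python
import math
import random

class ToyBackend(object):
    """Hypothetical backend-like class: independent standard normal columns."""

    def __init__(self, prng):
        self._prng = prng  # e.g. a random.Random, for reproducibility

    def name(self):
        return 'toy'

    def logpdf_joint(self, bdb, generator_id, modelnos, rowid, targets,
            constraints):
        # Columns are independent, so constraints do not affect the targets
        # and the joint log density is a sum over the (colno, value) pairs.
        return sum(-0.5 * value ** 2 - 0.5 * math.log(2 * math.pi)
                   for _colno, value in targets)

    def simulate_joint(self, bdb, generator_id, modelnos, rowid, targets,
            constraints, num_samples=1, accuracy=None):
        # One inner list per sample, one value per target column number.
        return [[self._prng.gauss(0, 1) for _colno in targets]
                for _ in range(num_samples)]

backend = ToyBackend(random.Random(0))
logp = backend.logpdf_joint(
    None, 1, None, 1, [(0, 0.0), (1, 0.0)], [(2, 3.5)])
samples = backend.simulate_joint(
    None, 1, None, 1, [0, 1], [(2, 3.5)], num_samples=3)
```

For two targets at 0.0, the log density is twice that of a standard normal at its mode, that is, -log(2π). The simulation returns three samples of two values each.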

class bayeslite.IBayesDBTracer

BayesDB articulated tracing interface.

If you just want to trace start of queries, pass a function to trace() or sql_trace(). If you want finer-grained event tracing, pass an instance of this interface.

A successful execution of a BayesDB query goes through the following stages:

  0. Not started
  1. Preparing cursor
  2. Result available
  3. All results consumed

Preparing the cursor and consuming results may both be fast or slow, and may succeed or fail, depending on the query. Also, the client may abandon some queries without consuming all the results.

Thus, a query may experience the following transitions:

  • start: 0 -> 1
  • ready: 1 -> 2
  • error: 1 -> 0 or 2 -> 0
  • finished: 2 -> 3
  • abandoned: 2 -> 0 or 3 -> 0

To receive notifications of any of those events for BQL or SQL queries, override the corresponding method(s) of this interface, and install the tracer object using trace() or sql_trace() respectively.

Note 1: The client may run multiple cursors at the same time, so queries in the “Result available” state may overlap.

Note 2: Abandonment of a query is detected when the cursor object is garbage collected, so the timing cannot be relied upon.
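The transition list above can be read as a small state machine per query id. The sketch below encodes it in a tracer-shaped class that records events and rejects transitions not in the list. RecordingTracer is hypothetical and standalone so that it runs without bayeslite; a real tracer would subclass bayeslite.IBayesDBTracer and be installed with trace() or sql_trace(). Here the methods are driven by hand:

```python
class RecordingTracer(object):
    """Hypothetical tracer that tracks each query's state (0 to 3)."""

    # event name -> {state before: state after}
    TRANSITIONS = {
        'start': {0: 1},
        'ready': {1: 2},
        'error': {1: 0, 2: 0},
        'finished': {2: 3},
        'abandoned': {2: 0, 3: 0},
    }

    def __init__(self):
        self.state = {}   # qid -> current state
        self.events = []  # (qid, event name) in order of arrival

    def _step(self, qid, event):
        current = self.state.get(qid, 0)
        allowed = self.TRANSITIONS[event]
        if current not in allowed:
            raise ValueError('%s not allowed in state %d' % (event, current))
        self.state[qid] = allowed[current]
        self.events.append((qid, event))

    def start(self, qid, query, bindings):
        self._step(qid, 'start')

    def ready(self, qid, cursor):
        self._step(qid, 'ready')

    def error(self, qid, e):
        self._step(qid, 'error')

    def finished(self, qid):
        self._step(qid, 'finished')

    def abandoned(self, qid):
        self._step(qid, 'abandoned')

# Two overlapping queries: the first is consumed fully, the second abandoned.
tracer = RecordingTracer()
tracer.start(1, 'SELECT 42', ())
tracer.ready(1, None)
tracer.start(2, 'SELECT 43', ())
tracer.finished(1)
tracer.ready(2, None)
tracer.abandoned(2)
```

Keeping state per query id is what allows overlapping cursors, as described in Note 1, to be followed independently.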

abandoned(qid)

Called when a query is abandoned.

This is detected when the cursor object is garbage collected, so its timing cannot be relied upon.

error(qid, e)

Called when query preparation or result consumption fails.

The arguments are the query id and the exception object.

finished(qid)

Called when all query results are consumed.

ready(qid, cursor)

Called when a query is ready for consumption of results.

The arguments are the query id and the BayesDB cursor.

Note for garbage collector wonks: the passed cursor is the one wrapped in the TracingCursor, not the TracingCursor instance itself, so a tracer retaining a reference to cursor will not create a reference cycle or prevent the abandoned() method from being called.

start(qid, query, bindings)

Called when a query is started.

The arguments are a unique query id, the query string, and the tuple or dictionary of bindings.