Main bayeslite API.
The focus of the bayeslite API is the BayesDB, a handle for a database. To obtain a BayesDB handle, use bayesdb_open():
import bayeslite
bdb = bayeslite.bayesdb_open(pathname='foo.bdb')
When done, close it with the close() method:
bdb.close()
BayesDB handles also serve as context managers, so you can do:
with bayeslite.bayesdb_open(pathname='foo.bdb') as bdb:
    bdb.execute('SELECT 42')
    ...
You can query the probable implications of the data, according to the analyses stored in the database, by passing BQL queries to the execute() method:
bql = 'ESTIMATE DEPENDENCE PROBABILITY FROM PAIRWISE COLUMNS OF foo'
for x in bdb.execute(bql):
    print(x)
You can also execute normal SQL on a BayesDB handle bdb with the sql_execute() method:
bdb.sql_execute('CREATE TABLE t(x INT, y TEXT, z REAL)')
bdb.sql_execute("INSERT INTO t VALUES(1, 'xyz', 42.5)")
bdb.sql_execute("INSERT INTO t VALUES(1, 'pqr', 83.7)")
bdb.sql_execute("INSERT INTO t VALUES(2, 'xyz', 1000)")
(BQL does not yet support CREATE TABLE and INSERT directly, so you must use sql_execute() for those.)
Errors in interpreting or executing BQL on a particular database.
Errors in parsing BQL.
As many parse errors as can be reasonably detected are listed together.
Variables: errors (list) – list of strings describing parse errors
A handle for a Bayesian database in memory or on disk.
Do not create BayesDB instances directly; use bayesdb_open() instead.
An instance of BayesDB is a context manager that returns itself on entry and closes itself on exit, so you can write:
with bayesdb_open(pathname='foo.bdb') as bdb:
    ...
Return the number of changes of the last INSERT, DELETE, or UPDATE.
This may return unexpected results after a statement that is not an INSERT, DELETE, or UPDATE.
Close the database. Further use is not allowed.
Execute a BQL query and return a cursor for its results.
The argument string is a string parsed into a single BQL query. It must contain exactly one BQL phrase, optionally terminated by a semicolon.
The argument bindings is a sequence or dictionary of bindings for parameters in the query, or None to supply no bindings.
Return the rowid of the row most recently inserted.
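The change count and the last inserted rowid described above read like SQLite's own changes() and last_insert_rowid() functions. The sketch below shows those two SQLite functions using only Python's standard sqlite3 module. It is a stand-in under that assumption, not a BayesDB handle, and the table t is illustrative.

```python
import sqlite3

# Stand-in for a BayesDB handle: a plain in-memory SQLite connection.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t(x INT, y TEXT)')
conn.execute("INSERT INTO t VALUES(1, 'xyz')")
conn.execute("INSERT INTO t VALUES(2, 'pqr')")

# Rowid of the row most recently inserted on this connection.
last_rowid = conn.execute('SELECT last_insert_rowid()').fetchall()[0][0]

# Number of rows touched by the most recent INSERT, DELETE, or UPDATE.
conn.execute("UPDATE t SET y = 'changed'")
n_changes = conn.execute('SELECT changes()').fetchall()[0][0]
conn.close()
```

Here the UPDATE touches both rows, so the change count is 2. After a SELECT or CREATE TABLE the count would still reflect the earlier statement, which is why the result can be surprising.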
A Numpy RandomState object local to this BayesDB instance.
This pseudorandom number generator is deterministically initialized from the seed supplied to bayesdb_open(). Use it to preserve reproducibility of results.
A random.Random object local to this BayesDB instance.
This pseudorandom number generator is deterministically initialized from the seed supplied to bayesdb_open(). Use it to preserve reproducibility of results.
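The reproducibility property can be illustrated with the standard random module alone. In the sketch below, make_prng is a hypothetical helper, and seeding random.Random directly with the 32-byte string is an assumption made for illustration. How bayeslite actually derives the generator state from the seed is not shown here.

```python
import random

def make_prng(seed):
    # Hypothetical helper: a private generator, independent of the
    # module-level `random` state, initialized from the given seed.
    return random.Random(seed)

seed = b'\x00' * 32          # 32 bytes, all zeros

a = make_prng(seed)
b = make_prng(seed)

# Touching the global generator does not disturb the private ones.
random.random()

draws_a = [a.random() for _ in range(3)]
draws_b = [b.random() for _ in range(3)]
```

Two generators built from the same seed produce the same sequence. Code that draws from the global random module instead would give results that depend on whatever else has used it.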
Reconnecting may sometimes be necessary, e.g. before a DROP TABLE.
Savepoint context. On return, commit; on exception, roll back.
The effects of a savepoint happen durably all at once if committed, or not at all if rolled back.
Savepoints may be nested. Parsed metadata and models are cached in Python during a savepoint.
Example:
with bdb.savepoint():
    bdb.execute('DROP GENERATOR foo')
    try:
        with bdb.savepoint():
            bdb.execute('ALTER TABLE foo RENAME TO bar')
            raise NeverMind
    except NeverMind:
        # No changes will be recorded.
        pass
    bdb.execute('CREATE GENERATOR foo ...')
# foo will have been dropped and re-created.
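The same nesting behaviour can be tried without bayeslite, using SQLite savepoints through the standard sqlite3 module. The sketch below is not the bayeslite implementation. The savepoint() helper, the savepoint names, and the NeverMind exception are hypothetical and exist only for this example.

```python
import sqlite3
from contextlib import contextmanager

class NeverMind(Exception):
    pass

@contextmanager
def savepoint(conn, name):
    # Hypothetical helper: commit (release) on return, roll back on exception.
    conn.execute('SAVEPOINT ' + name)
    try:
        yield
    except BaseException:
        conn.execute('ROLLBACK TO ' + name)
        conn.execute('RELEASE ' + name)
        raise
    else:
        conn.execute('RELEASE ' + name)

# isolation_level=None stops sqlite3 from opening transactions implicitly.
conn = sqlite3.connect(':memory:', isolation_level=None)
conn.execute('CREATE TABLE foo(x INT)')

with savepoint(conn, 'outer_sp'):
    conn.execute('INSERT INTO foo VALUES(1)')
    try:
        with savepoint(conn, 'inner_sp'):
            conn.execute('INSERT INTO foo VALUES(2)')
            raise NeverMind
    except NeverMind:
        pass  # The inner insert has been rolled back.
    conn.execute('INSERT INTO foo VALUES(3)')

values = [row[0] for row in conn.execute('SELECT x FROM foo ORDER BY x')]
```

Only the rows inserted by the outer savepoint survive. The inner savepoint's work disappears even though the outer one commits.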
Auto-rollback savepoint context. Roll back on return or exception.
This may be used to compute hypotheticals – the bdb is guaranteed to remain unmodified afterward.
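A hypothetical computation of this kind can be sketched with plain SQLite savepoints. The rollback_savepoint() and scalar() helpers below are illustrative names built on the standard sqlite3 module, not bayeslite functions.

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def rollback_savepoint(conn, name='hypothetical'):
    # Hypothetical helper: always roll back, on return or on exception.
    conn.execute('SAVEPOINT ' + name)
    try:
        yield
    finally:
        conn.execute('ROLLBACK TO ' + name)
        conn.execute('RELEASE ' + name)

def scalar(conn, sql):
    # Run a query that yields one value and fully consume its cursor.
    return conn.execute(sql).fetchall()[0][0]

conn = sqlite3.connect(':memory:', isolation_level=None)
conn.execute('CREATE TABLE t(x INT)')
conn.execute('INSERT INTO t VALUES(1)')

with rollback_savepoint(conn):
    # "What would the total be if this row existed?"
    conn.execute('INSERT INTO t VALUES(1000)')
    hypothetical_total = scalar(conn, 'SELECT SUM(x) FROM t')

actual_total = scalar(conn, 'SELECT SUM(x) FROM t')
```

The answer computed inside the block reflects the extra row, while the table afterward is exactly as it was before.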
Execute a SQL query on the underlying SQLite database.
The argument string is a string parsed into a single SQL query. It must contain exactly one SQL phrase, optionally terminated by a semicolon.
The argument bindings is a sequence or dictionary of bindings for parameters in the query, or None to supply no bindings.
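Because sql_execute() runs on the underlying SQLite database, the two forms of bindings can be illustrated with the standard sqlite3 module. The sketch below uses sqlite3 directly as a stand-in, and assumes that a BayesDB handle accepts SQLite's usual ? and :name placeholder syntax.

```python
import sqlite3

# Stand-in for a BayesDB handle: a plain in-memory SQLite connection.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t(x INT, y TEXT, z REAL)')

# Sequence bindings fill positional `?` placeholders in order.
conn.execute('INSERT INTO t VALUES(?, ?, ?)', (1, 'xyz', 42.5))

# Dictionary bindings fill named `:name` placeholders.
conn.execute('INSERT INTO t VALUES(:x, :y, :z)',
             {'x': 2, 'y': 'pqr', 'z': 83.7})

rows = conn.execute('SELECT x, y, z FROM t WHERE x >= ? ORDER BY x',
                    (1,)).fetchall()
conn.close()
```

Passing values as bindings rather than pasting them into the query string avoids quoting mistakes in data.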
Trace execution of SQL queries.
For simple tracing, pass a function or arbitrary Python callable as the tracer. It will be called at the start of execution of each SQL query, with two arguments: the query to be executed, as a string; and the sequence or dictionary of bindings.
For articulated tracing, pass an instance of IBayesDBTracer, whose methods will be called in the pattern described in its documentation.
Only one tracer can be established at a time. To remove it, use sql_untrace().
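A simple tracer is just a callable taking the query string and the bindings. The sketch below defines one and calls it by hand, the way the handle would at the start of each query, so that it runs without bayeslite. The names simple_tracer and trace_log are illustrative.

```python
trace_log = []

def simple_tracer(query, bindings):
    # Called at the start of each query with its text and its bindings.
    trace_log.append((query, bindings))

# With a real handle this would presumably be installed with
# bdb.sql_trace(simple_tracer) and removed with
# bdb.sql_untrace(simple_tracer). Here we invoke it directly.
simple_tracer('SELECT * FROM t WHERE x = ?', (1,))
simple_tracer('SELECT * FROM t WHERE y = :y', {'y': 'xyz'})

queries = [query for query, _bindings in trace_log]
```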
Stop tracing execution of SQL queries.
tracer must have been previously established with sql_trace().
Any queries currently in progress will continue to be traced until completion.
Trace execution of BQL queries.
For simple tracing, pass a function or arbitrary Python callable as the tracer. It will be called at the start of execution of each BQL query, with two arguments: the query to be executed, as a string; and the sequence or dictionary of bindings.
For articulated tracing, pass an instance of IBayesDBTracer, whose methods will be called in the pattern described in its documentation.
Only one tracer can be established at a time. To remove it, use untrace().
Transaction context. On return, commit; on exception, roll back.
Transactions may not be nested: use a savepoint if you need nesting. Parsed metadata and models are cached in Python during a savepoint.
Exceptions associated with a BayesDB instance.
Variables: bayesdb (bayeslite.BayesDB) – associated BayesDB instance
Transaction errors in a BayesDB.
Deregister metamodel, which must have been registered in bdb.
Load a codebook for table from the CSV file at pathname.
Open the BayesDB in the file at pathname.
If there is no file at pathname, it is automatically created. If pathname is unspecified or None, a temporary in-memory BayesDB instance is created.
seed is a 32-byte string specifying a pseudorandom number generation seed. If not specified, it defaults to all zeros.
If compatible is None or False and the database already exists, bayesdb_open may have the effect of incompatibly changing the format of the database so that older versions of bayeslite cannot read it. If compatible is True, bayesdb_open will not incompatibly change the format of the database (but some newer bayesdb features may not work).
Read CSV data from a line iterator into a table.
Read CSV data from a file into a table.
Register metamodel in bdb, creating any necessary tables.
metamodel must not already be registered in any BayesDB, nor may any metamodel by the same name be registered.
Upgrade the BayesDB internal database schema.
If version is None, upgrade to the latest database format version supported by bayeslite. Otherwise, it may be a schema version number.
Quote name as a BQL identifier, e.g. a table or column name.
Do NOT use this for strings, e.g. inserting data into a table. Use query parameters instead.
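The division of labour between identifier quoting and query parameters can be sketched with the standard sqlite3 module. The quote_identifier() function below is a hypothetical stand-in that follows the usual SQL convention of double-quoting and doubling embedded double quotes. The exact output of the real quoting function is not specified here.

```python
import sqlite3

def quote_identifier(name):
    # Hypothetical stand-in: wrap in double quotes, doubling any
    # double quotes that occur inside the name.
    return '"' + name.replace('"', '""') + '"'

conn = sqlite3.connect(':memory:')
table = 'my "odd" table'
qt = quote_identifier(table)

# Identifiers (table and column names) are quoted into the query text.
conn.execute('CREATE TABLE %s (y TEXT)' % (qt,))

# Data values go through query parameters, never through quoting.
conn.execute('INSERT INTO %s VALUES (?)' % (qt,), ("it's data",))

rows = conn.execute('SELECT y FROM %s' % (qt,)).fetchall()
conn.close()
```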
BayesDB metamodel interface.
Subclasses of IBayesDBMetamodel implement the functionality needed by probabilistic BQL queries to sample from and inquire about the posterior distribution of a generative model conditioned on data in a table. Instances of subclasses of IBayesDBMetamodel contain any in-memory state associated with the metamodel in the database.
Add colno from the population as a variable in the metamodel.
Used by the MML:
ALTER POPULATION <population> ADD VARIABLE <variable> <stattype>
Analyze the specified model numbers of a generator.
If none are specified, analyze all of them.
Compute DEPENDENCE PROBABILITY OF <col0> WITH <col1>.
Compute MUTUAL INFORMATION OF (<cols0>) WITH (<cols1>).
Create a generator for a table with the given schema.
Called when executing CREATE GENERATOR.
Must parse schema to build the generator.
The generator id and column numbers may be used to create metamodel-specific records in the database for the generator with foreign keys referring to the bayesdb_generator and bayesdb_generator_column tables.
schema is a list of schema items corresponding to the comma-separated ‘columns’ from a BQL CREATE GENERATOR command. Each schema item is a list of strings or lists of schema items, corresponding to whitespace-separated tokens and parenthesized lists. Note that within parenthesized lists, commas are not excluded.
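To make the nested-list shape concrete, the sketch below hand-writes the value that a hypothetical schema such as x NUMERICAL, y CATEGORICAL(k 5, j 3) might produce under the description above, then renders it back to text. The schema text, the exact token spellings (including the ',' token inside the parenthesized list), and the helper name render_item are all assumptions for illustration.

```python
# Hypothetical schema text:  x NUMERICAL, y CATEGORICAL(k 5, j 3)
# Top-level commas separate schema items; the comma inside the
# parentheses is kept as a token of the nested list.
schema = [
    ['x', 'NUMERICAL'],
    ['y', 'CATEGORICAL', ['k', '5', ',', 'j', '3']],
]

def render_item(item):
    # Turn one schema item back into text, recursing into nested lists.
    parts = []
    for token in item:
        if isinstance(token, list):
            parts.append('(' + render_item(token) + ')')
        else:
            parts.append(token)
    return ' '.join(parts)

rendered = [render_item(item) for item in schema]
column_names = [item[0] for item in schema]
```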
Drop any metamodel-specific records for a generator.
Called when executing DROP GENERATOR.
Drop the specified model numbers of a generator.
If none are specified, drop all models.
Initialize the specified model numbers for a generator.
Evaluate the joint probability of targets subject to constraints.
Returns the probability density of the targets (in log domain).
rowid is an integer.
targets is a list of (colno, value) pairs.
constraints is a list of (colno, value) pairs.
modelno is a model number or None, meaning all models.
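The quantity described here is a conditional log density: log p(targets | constraints) = log p(targets, constraints) - log p(constraints). The sketch below works this out for a toy two-column categorical distribution. JOINT, marginal, and toy_logpdf_joint are hypothetical names, and the sketch ignores rowid and modelno. It is not a metamodel implementation.

```python
import math

# Toy joint distribution over two categorical columns (colno 0 and 1).
JOINT = {
    ('a', 'u'): 0.4,
    ('a', 'v'): 0.1,
    ('b', 'u'): 0.2,
    ('b', 'v'): 0.3,
}

def marginal(assignment):
    # Sum the joint over all cells consistent with {colno: value}.
    return sum(p for cell, p in JOINT.items()
               if all(cell[colno] == value
                      for colno, value in assignment.items()))

def toy_logpdf_joint(targets, constraints):
    # targets and constraints are lists of (colno, value) pairs.
    given = dict(constraints)
    both = dict(constraints)
    both.update(targets)
    return math.log(marginal(both)) - math.log(marginal(given))

# log P(col0 = 'a' | col1 = 'u') = log(0.4 / 0.6)
logp = toy_logpdf_joint([(0, 'a')], [(1, 'u')])
```

With no constraints the second term is log 1 = 0, and the result reduces to the log of the marginal probability of the targets.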
Return the name of the metamodel as a str.
Predict a value for a column, if confidence is high enough.
Predict a value for a column and return confidence.
Compute predictive relevance, also known as relevance probability.
rowid_target is an integer.
rowid_query is a list of integers.
Install any state needed for the metamodel in bdb.
Called by bayeslite.bayesdb_register_metamodel().
Normally this will create SQL tables if necessary.
Note that a table column has been renamed.
Not currently used. To be used in the future when executing:
ALTER TABLE <table> RENAME COLUMN <oldname> TO <newname>
Compute SIMILARITY TO <target_row> for given rowid.
Switch between multiprocessing and single processing.
The boolean variable switch toggles between single (False) and multi (True) processing, if the choice is available, and otherwise ignores the request.
Simulate targets from a generator, subject to constraints.
Returns a list of lists of values for the specified targets.
rowid is an integer.
modelno may be None, meaning “all models”.
targets is a list of column numbers (colno).
constraints is a list of (colno, value) pairs.
num_samples is the number of results to return.
accuracy is a generic parameter (usually an int) that specifies the desired accuracy, compute time, etc., if the simulations are only approximately distributed from the true target.
The results are samples from the distribution on targets, independent conditioned on (the latent state of the metamodel and) the constraints.
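The shape of the return value (one inner list per sample, one value per target column) can be seen in a toy sampler over a small categorical distribution. The names JOINT and toy_simulate_joint are hypothetical, and the sketch ignores rowid, modelno, and accuracy. It samples exactly from the conditional distribution by restricting to the cells that satisfy the constraints.

```python
import random

# Toy joint distribution over two categorical columns (colno 0 and 1).
JOINT = {
    ('a', 'u'): 0.4,
    ('a', 'v'): 0.1,
    ('b', 'u'): 0.2,
    ('b', 'v'): 0.3,
}

def toy_simulate_joint(targets, constraints, num_samples, prng):
    # targets is a list of colno; constraints is a list of (colno, value).
    cells = [cell for cell in JOINT
             if all(cell[colno] == value for colno, value in constraints)]
    weights = [JOINT[cell] for cell in cells]
    samples = []
    for _ in range(num_samples):
        # Draw one consistent cell in proportion to its probability.
        cell = prng.choices(cells, weights=weights)[0]
        samples.append([cell[colno] for colno in targets])
    return samples

prng = random.Random(0)   # private, seeded generator for reproducibility
samples = toy_simulate_joint([0], [(1, 'u')], 5, prng)
pinned = toy_simulate_joint([0, 1], [(0, 'b'), (1, 'v')], 2, prng)
```

When the constraints pin down every target column, as in the second call, every sample is identical.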
BayesDB articulated tracing interface.
If you just want to trace start of queries, pass a function to trace() or sql_trace(). If you want finer-grained event tracing, pass an instance of this interface.
A successful execution of a BayesDB query goes through the following stages:
Preparing the cursor and consuming results may both be fast or slow, and may succeed or fail, depending on the query. Also, the client may abandon some queries without consuming all the results.
Thus, a query may experience the following transitions:
To receive notifications of any of those events for BQL or SQL queries, override the corresponding method(s) of this interface, and install the tracer object using trace() or sql_trace() respectively.
Note 1: The client may run multiple cursors at the same time, so queries in the “Result available” state may overlap.
Note 2: Abandonment of a query is detected when the cursor object is garbage collected, so the timing cannot be relied upon.
Called when a query is abandoned.
This is detected when the cursor object is garbage collected, so its timing cannot be relied upon.
Called when query preparation or result consumption fails.
The arguments are the query id and the exception object.
Called when all query results are consumed.
Called when a query is ready for consumption of results.
The arguments are the query id and the BayesDB cursor.
Note for garbage collector wonks: the passed cursor is the one wrapped in the TracingCursor, not the TracingCursor instance itself, so a tracer retaining a reference to cursor will not create a reference cycle or prevent the abandoned() method from being called.
Called when a query is started.
The arguments are a unique query id, the query string, and the tuple or dictionary of bindings.
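As a rough illustration of articulated tracing, the sketch below records each event in order. The method names start, ready, error, finished, and abandoned are assumptions inferred from the descriptions above. A real tracer would subclass IBayesDBTracer and be installed with trace() or sql_trace(). Here a plain class is driven by hand so that the example runs without bayeslite.

```python
class RecordingTracer(object):
    """Records query lifecycle events in the order they arrive."""

    def __init__(self):
        self.events = []

    def start(self, qid, query, bindings):
        self.events.append(('start', qid, query, bindings))

    def ready(self, qid, cursor):
        # The cursor is deliberately not stored.
        self.events.append(('ready', qid))

    def error(self, qid, e):
        self.events.append(('error', qid, str(e)))

    def finished(self, qid):
        self.events.append(('finished', qid))

    def abandoned(self, qid):
        self.events.append(('abandoned', qid))

tracer = RecordingTracer()

# A query that succeeds and is fully consumed.
tracer.start(1, 'SELECT 42', ())
tracer.ready(1, None)
tracer.finished(1)

# A query that fails during preparation.
tracer.start(2, 'SELECT * FROM missing', ())
tracer.error(2, RuntimeError('no such table'))

event_names = [event[0] for event in tracer.events]
```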