Bayeslite API internals

bayeslite.compiler: BQL-to-SQL query compiler

BQL->SQL compiler.

To compile a parsed BQL query:

  1. Determine the number, names, and values of the parameters.
  2. Create an output accumulator, Output.
  3. Pass the query and accumulator to compile_query.
  4. Use Output.getvalue() to get the compiled SQL text.
  5. Use Output.getbindings() to get bindings for parameters that were actually used in the query.
  6. Use bayesdb_wind() or similar to bracket the execution of the SQL query with wind/unwind commands.

class bayeslite.compiler.Output(n_numpar, nampar_map, bindings)

Compiled SQL output accumulator.

Like a write-only StringIO.StringIO(), but also does bookkeeping for parameters and subqueries.

getbindings()

Return a selection of bindings fit for the accumulated output.

If there were subqueries, or if this is accumulating output for a subquery, this may not use all bindings.

getvalue()

Return the accumulated output.

subquery()

Return an output accumulator for a subquery.

write(text)

Accumulate text in the output of getvalue().

write_nampar(name, n)

Accumulate a reference to the parameter named name, which is numbered n.

write_numpar(n)

Accumulate a reference to the parameter numbered n.
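
The bookkeeping described above can be illustrated with a toy accumulator. This is a standalone sketch, not bayeslite's Output class: the name ToyOutput, its single-argument constructor, and the choice to renumber parameters by order of first use are all assumptions made for illustration.

```python
class ToyOutput(object):
    """Hypothetical stand-in for an output accumulator (not bayeslite's Output)."""

    def __init__(self, bindings):
        self._chunks = []          # pieces of SQL text, in order written
        self._bindings = bindings  # every positional binding the caller supplied
        self._used = []            # 1-based parameter numbers, in order of first use

    def write(self, text):
        # Accumulate literal SQL text.
        self._chunks.append(text)

    def write_numpar(self, n):
        # Accumulate a reference to parameter n, renumbered by first use
        # (an assumption of this sketch).
        if n not in self._used:
            self._used.append(n)
        self._chunks.append('?%d' % (self._used.index(n) + 1,))

    def getvalue(self):
        return ''.join(self._chunks)

    def getbindings(self):
        # Only the bindings that the accumulated text actually references.
        return [self._bindings[n - 1] for n in self._used]


out = ToyOutput(bindings=['unused', 42])
out.write('SELECT * FROM t WHERE x = ')
out.write_numpar(2)
sql = out.getvalue()
selected = out.getbindings()
```

Here only the second binding is referenced, so the selected bindings omit the first one, which mirrors the note that getbindings() may not use all bindings.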

bayeslite.compiler.bayesdb_wind(*args, **kwds)

Perform the queries in winders before, and the queries in unwinders after, the bracketed execution.

Each of winders and unwinders is a list of (<sql>, <bindings>) tuples.
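
A minimal sketch of the wind/unwind pattern using only the standard sqlite3 module. The name toy_wind and its (db, winders, unwinders) signature are hypothetical, and running the unwinders in reverse order is an assumption of this sketch rather than documented behaviour.

```python
import contextlib
import sqlite3


@contextlib.contextmanager
def toy_wind(db, winders, unwinders):
    """Run winders, yield to the caller, then run unwinders (hypothetical helper)."""
    for sql, bindings in winders:
        db.execute(sql, bindings)
    try:
        yield
    finally:
        # Assumption: undo in the opposite order of setup.
        for sql, bindings in reversed(unwinders):
            db.execute(sql, bindings)


db = sqlite3.connect(':memory:')
winders = [
    ('CREATE TEMP TABLE scratch (x INTEGER)', ()),
    ('INSERT INTO scratch VALUES (?)', (7,)),
]
unwinders = [('DROP TABLE scratch', ())]

with toy_wind(db, winders, unwinders):
    # The scratch table exists only while the block is running.
    inside = db.execute('SELECT x FROM scratch').fetchall()

remaining = db.execute(
    "SELECT count(*) FROM sqlite_temp_master WHERE name = 'scratch'"
).fetchone()[0]
```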

bayeslite.compiler.compile_query(bdb, query, out)

Compile query, writing output to out.

Parameters:
  • bdb – database in which to interpret query
  • query – abstract syntax tree of a query
  • out (Output) – output accumulator

bayeslite.bql: BQL query and command execution

BQL execution.

This module implements the main dispatcher for executing different kinds of BQL phrases. Queries, as in SELECT, ESTIMATE, and so on, are compiled into SQL; commands, as in CREATE TABLE, INSERT, and the rest of the DDL/DML (Data Definition/Manipulation Language), are executed directly.

class bayeslite.bql.BayesDBCursor(bdb, cursor)

Cursor for a BQL or SQL query from a BayesDB.

class bayeslite.bql.WoundCursor(bdb, cursor, unwinders)

bayeslite.bql.execute_phrase(bdb, phrase, bindings=())

Execute the BQL AST phrase phrase and return a cursor of results.

bayeslite.core: BayesDB object model

Miscellaneous utilities for managing BayesDB entities.

Tables, generators, and columns are named with strs. Only US-ASCII is allowed, no Unicode.

Each table has a nonempty sequence of named columns. As in sqlite3, tables may be renamed and do not necessarily have numeric ids, so there is no way to have a handle on a table that is persistent outside a savepoint.

Each table may optionally be modeled by any number of generators, each representing a parametrized generative model for the table’s data, according to a named generator.

Each generator models a subset of the columns in its table, which are called the modeled columns of that generator. Each column in a generator has an associated statistical type. Like tables, generators may be renamed. Unlike tables, each generator has a numeric id, which is never reused and therefore persistent across savepoints.

Each generator may have any number of different models, each representing a particular choice of parameters for the parametrized generative model. Models are numbered consecutively for the generator, and may be identified uniquely by (generator_id, modelno) or (generator_name, modelno).

bayeslite.core.bayesdb_add_latent(bdb, population_id, generator_id, var, stattype)

Add a generator’s latent variable to a population.

NOTE: To be used ONLY by a backend’s create_generator method when establishing any latent variables of that generator.

bayeslite.core.bayesdb_add_variable(bdb, population_id, name, stattype)

Add a variable to the population, with colno from the base table.

bayeslite.core.bayesdb_generator_backend(bdb, generator_id)

Return the backend of the generator with given generator_id.

bayeslite.core.bayesdb_generator_has_model(bdb, generator_id, modelno)

True if generator_id has a model numbered modelno.

bayeslite.core.bayesdb_generator_is_implicit(bdb, generator_id)

True if the generator with given generator_id is implicit.

bayeslite.core.bayesdb_generator_modelnos(bdb, generator_id)

Return list of model numbers associated with given generator_id.

bayeslite.core.bayesdb_generator_name(bdb, generator_id)

Return the name of the generator with given generator_id.

bayeslite.core.bayesdb_generator_population(bdb, generator_id)

Return id of population of the generator with given generator_id.

bayeslite.core.bayesdb_generator_table(bdb, generator_id)

Return name of table of the generator with given generator_id.

bayeslite.core.bayesdb_get_generator(bdb, population_id, name)

Return the id of the generator named name in bdb.

The generator id is persistent across savepoints: ids are 64-bit integers that increase monotonically and are never reused.

bdb must have a generator named name. If you’re not sure, call bayesdb_has_generator() first.

bayeslite.core.bayesdb_get_population(bdb, name)

Return the id of the population named name in bdb.

The id is persistent across savepoints: ids are 64-bit integers that increase monotonically and are never reused.

bdb must have a population named name. If you’re not sure, call bayesdb_has_population() first.

bayeslite.core.bayesdb_has_generator(bdb, population_id, name)

True if there is a generator named name in bdb.

If population_id is specified, then the generator with name needs to be defined for that population. Otherwise, when population_id is None, the name may be of any generator.

bayeslite.core.bayesdb_has_latent(bdb, population_id, var)

True if the population has a latent variable by the given name.

bayeslite.core.bayesdb_has_population(bdb, name)

True if there is a population named name in bdb.

bayeslite.core.bayesdb_has_stattype(bdb, stattype)

True if stattype is registered in bdb instance.

bayeslite.core.bayesdb_has_table(bdb, name)

True if there is a table named name in bdb.

The table need not be modeled.

bayeslite.core.bayesdb_has_variable(bdb, population_id, generator_id, name)

True if the population has a given variable.

generator_id is None for manifest variables and the id of a generator for variables that may be latent.

bayeslite.core.bayesdb_population_cell_value(bdb, population_id, rowid, colno)

Return value stored in rowid and colno of given population_id.

bayeslite.core.bayesdb_population_fresh_row_id(bdb, population_id)

Return one plus maximum rowid in base table of given population_id.

bayeslite.core.bayesdb_population_generators(bdb, population_id)

Return list of generators for population_id.

bayeslite.core.bayesdb_population_has_implicit_generator(bdb, population_id)

True if population_id has an implicit generator.

bayeslite.core.bayesdb_population_is_implicit(bdb, population_id)

True if the population with given population_id is implicit.

bayeslite.core.bayesdb_population_name(bdb, population_id)

Return the name of the population with given population_id.

bayeslite.core.bayesdb_population_row_values(bdb, population_id, rowid)

Return values stored in rowid of given population_id.

bayeslite.core.bayesdb_population_table(bdb, population_id)

Return the name of the table of the population with given population_id.

bayeslite.core.bayesdb_rowid_tokens(bdb)

Return list of built-in tokens that identify rowids (e.g. oid).

bayeslite.core.bayesdb_table_column_name(bdb, table, colno)

Return the name of the column numbered colno in table.

bdb must have a table named table. If you’re not sure, call bayesdb_has_table() first.

WARNING: This may modify the database by populating the bayesdb_column table if it has not yet been populated.

bayeslite.core.bayesdb_table_column_names(bdb, table)

Return a list of names of columns in the table named table.

The results are strs and are ordered by column number.

bdb must have a table named table. If you’re not sure, call bayesdb_has_table() first.

WARNING: This may modify the database by populating the bayesdb_column table if it has not yet been populated.

bayeslite.core.bayesdb_table_column_number(bdb, table, name)

Return the number of column named name in table.

bdb must have a table named table. If you’re not sure, call bayesdb_has_table() first.

WARNING: This may modify the database by populating the bayesdb_column table if it has not yet been populated.

bayeslite.core.bayesdb_table_guarantee_columns(bdb, table)

Make sure bayesdb_column is populated with columns of table.

bdb must have a table named table. If you’re not sure, call bayesdb_has_table() first.

bayeslite.core.bayesdb_table_has_column(bdb, table, name)

True if the table named table has a column named name.

bdb must have a table named table. If you’re not sure, call bayesdb_has_table() first.

WARNING: This may modify the database by populating the bayesdb_column table if it has not yet been populated.

bayeslite.core.bayesdb_table_has_implicit_population(bdb, table)

True if the table named table has an implicit population.

bdb must have a table named table. If you are not sure, call bayesdb_has_table() first.

bayeslite.core.bayesdb_table_has_rowid(bdb, table, rowid)

True if the table named table has a record with given rowid.

bdb must have a table named table. If you’re not sure, call bayesdb_has_table() first.

bayeslite.core.bayesdb_table_populations(bdb, table)

Return list of populations for table.

bdb must have a table named table. If you’re not sure, call bayesdb_has_table() first.

bayeslite.core.bayesdb_variable_name(bdb, population_id, generator_id, colno)

Return the name of a population variable.

bayeslite.core.bayesdb_variable_names(bdb, population_id, generator_id)

Return a list of the names of columns modeled in population_id.

bayeslite.core.bayesdb_variable_number(bdb, population_id, generator_id, name)

Return the column number of a population variable.

bayeslite.core.bayesdb_variable_numbers(bdb, population_id, generator_id)

Return a list of the numbers of columns modeled in population_id.

bayeslite.core.bayesdb_variable_stattype(bdb, population_id, generator_id, colno)

Return the statistical type of a population variable.

bayeslite.parse: BQL parser

BQL parser front end.

bayeslite.parse.bql_string_complete_p(string)

True if string has at least one complete BQL phrase or error.

False if empty or if the last BQL phrase is incomplete.

bayeslite.parse.parse_bql_string(string)

Yield each parsed BQL phrase AST in string.

bayeslite.parse.parse_bql_string_pos(string)

Yield (phrase, pos) for each BQL phrase in string.

phrase is the parsed AST. pos is the zero-based index of the code point at which phrase starts.

bayeslite.parse.parse_bql_string_pos_1(string)

Return (phrase, pos) for the first BQL phrase in string.

May not report parse errors afterward.

bayeslite.sqlite3_util: SQLite 3 utilities

SQLite3 utilities.

bayeslite.sqlite3_util.sqlite3_column_affinity(column_type)

Return the sqlite3 column affinity corresponding to a type string.
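
For reference, SQLite derives a column's affinity from its declared type by ordered substring rules. The sketch below follows those rules; the function name column_affinity and the returned strings are assumptions of this sketch and need not match what sqlite3_column_affinity returns.

```python
def column_affinity(column_type):
    """Affinity of a declared column type, per SQLite's ordered substring rules."""
    t = column_type.upper()
    if 'INT' in t:
        return 'INTEGER'
    if 'CHAR' in t or 'CLOB' in t or 'TEXT' in t:
        return 'TEXT'
    if 'BLOB' in t or t == '':
        return 'BLOB'  # called NONE in older SQLite documentation
    if 'REAL' in t or 'FLOA' in t or 'DOUB' in t:
        return 'REAL'
    return 'NUMERIC'


examples = {
    'VARCHAR(255)': column_affinity('VARCHAR(255)'),
    'BIGINT': column_affinity('BIGINT'),
    'DOUBLE': column_affinity('DOUBLE'),
    'DECIMAL(10,5)': column_affinity('DECIMAL(10,5)'),
    # Rule order matters: "POINT" contains "INT", so the first rule wins.
    'FLOATING POINT': column_affinity('FLOATING POINT'),
    '': column_affinity(''),
}
```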

bayeslite.sqlite3_util.sqlite3_connection(*args, **kwds)

SQLite3 connection context manager. On exit, runs close.

bayeslite.sqlite3_util.sqlite3_exec_1(db, query, *args)

Execute a query returning a 1x1 table, and return its one value.

Do not call this if you cannot guarantee the result is a 1x1 table. Beware passing user-controlled input in here.
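
A minimal sketch of the 1x1 contract using the standard sqlite3 module. The helper name exec_1 is hypothetical, and the assertions are one way to enforce the contract, not necessarily how the library does it. Values are passed as bound parameters rather than spliced into the query text.

```python
import sqlite3


def exec_1(db, query, *args):
    """Execute query and return the single value of its 1x1 result (hypothetical helper)."""
    cursor = db.execute(query, args)
    row = cursor.fetchone()
    assert row is not None and len(row) == 1, 'expected exactly one column'
    assert cursor.fetchone() is None, 'expected exactly one row'
    return row[0]


db = sqlite3.connect(':memory:')
value = exec_1(db, 'SELECT ? + 1', 41)
```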

bayeslite.sqlite3_util.sqlite3_quote_name(name)

Quote name as a SQL identifier, e.g. a table or column name.

Do NOT use this for strings, e.g. inserting data into a table. Use query parameters instead.
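
A sketch of the distinction, assuming the usual SQL convention of wrapping an identifier in double quotes and doubling any embedded double quote. The helper name quote_name is hypothetical and the library's exact quoting rules may differ. The identifier is quoted into the query text; the data value goes through a query parameter.

```python
import sqlite3


def quote_name(name):
    """Quote name as an SQL identifier (hypothetical helper, standard "" escaping)."""
    return '"' + name.replace('"', '""') + '"'


db = sqlite3.connect(':memory:')
table = 'my "table"'
quoted = quote_name(table)

# Identifier: quoted and spliced into the SQL text.
db.execute('CREATE TABLE %s (x TEXT)' % (quoted,))
# Data: always passed as a bound parameter, never quoted by hand.
db.execute('INSERT INTO %s VALUES (?)' % (quoted,), ("it's data",))
stored = db.execute('SELECT x FROM %s' % (quoted,)).fetchall()
```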

bayeslite.sqlite3_util.sqlite3_savepoint(*args, **kwds)

Savepoint context manager. On return, commit; on exception, rollback.

Savepoints are like transactions, but they may be nested in transactions or in other savepoints.
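
Nesting can be sketched with plain SQLite SAVEPOINT, RELEASE, and ROLLBACK TO statements. The savepoint helper below is a hypothetical standalone version that takes an explicit savepoint name; it is not the library's implementation. The connection is opened with isolation_level=None so that the sqlite3 module does not open transactions implicitly.

```python
import contextlib
import sqlite3


@contextlib.contextmanager
def savepoint(db, name):
    """Release the savepoint on normal return; roll back to it on exception."""
    db.execute('SAVEPOINT %s' % (name,))
    try:
        yield
    except BaseException:
        db.execute('ROLLBACK TO %s' % (name,))
        db.execute('RELEASE %s' % (name,))
        raise
    else:
        db.execute('RELEASE %s' % (name,))


db = sqlite3.connect(':memory:', isolation_level=None)
db.execute('CREATE TABLE t (x INTEGER)')

with savepoint(db, 'outer_sp'):
    db.execute('INSERT INTO t VALUES (1)')
    try:
        with savepoint(db, 'inner_sp'):
            db.execute('INSERT INTO t VALUES (2)')
            raise ValueError('abort inner savepoint only')
    except ValueError:
        pass

rows = db.execute('SELECT x FROM t').fetchall()
```

The inner insert is rolled back while the outer one survives, which is the behaviour that a non-nestable transaction cannot provide.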

bayeslite.sqlite3_util.sqlite3_savepoint_rollback(*args, **kwds)

Savepoint context manager that always rolls back.

bayeslite.sqlite3_util.sqlite3_transaction(*args, **kwds)

Transaction context manager. On return, commit; on exception, rollback.

Transactions may not be nested. Use savepoints if you want a nestable analogue to transactions.

bayeslite.stats: Statistics utilities

Miscellaneous statistics utilities.

bayeslite.stats.arithmetic_mean(array)

Arithmetic mean of elements of array.

bayeslite.stats.chi2_contingency(contingency)

Pearson chi^2 test of independence statistic on contingency table.

https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test#Test_of_independence
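
A minimal sketch of the statistic, assuming contingency is a list of rows of observed counts with positive row and column totals. The function name chi2_statistic is hypothetical. Each expected count is the product of its row and column totals divided by the grand total.

```python
def chi2_statistic(contingency):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [float(sum(row)) for row in contingency]
    col_totals = [float(sum(col)) for col in zip(*contingency)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(contingency):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


dependent = chi2_statistic([[10, 20], [20, 10]])    # every expected count is 15
independent = chi2_statistic([[10, 20], [20, 40]])  # rows are proportional
```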

bayeslite.stats.chi2_sf(x, df)

Survival function for chi^2 distribution.

bayeslite.stats.f_oneway(groups)

F-test statistic for one-way analysis of variance (ANOVA).

https://en.wikipedia.org/wiki/F-test#Multiple-comparison_ANOVA_problems

groups[i][j] is jth observation in ith group.
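
A minimal sketch of the statistic as the ratio of the between-group mean square to the within-group mean square, using the groups[i][j] layout described above. The function name f_statistic is hypothetical; it assumes at least two groups and nonzero within-group variation.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic: (SS_between / (k - 1)) / (SS_within / (n - k))."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / float(n)
    means = [sum(g) / float(len(g)) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Group means 2, 3, 4: SS_between = 6 (df 2), SS_within = 6 (df 6).
f = f_statistic([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```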

bayeslite.stats.f_sf(x, df_num, df_den)

Approximate survival function for the F distribution.

f_sf(x, df_num, df_den) = P(F_{df_num, df_den} > x)

bayeslite.stats.gauss_suff_stats(data)

Summarize an array of data as (count, mean, standard deviation).

The algorithm is the “Online algorithm” presented in Knuth Volume 2, 3rd ed, p. 232, originally credited to “Note on a Method for Calculating Corrected Sums of Squares and Products” B. P. Welford Technometrics Vol. 4, No. 3 (Aug., 1962), pp. 419-420. This has the advantage over naively accumulating the sum and sum of squares that it is less subject to precision loss through massive cancellation.

This version collected 8/31/15 from https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
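
A sketch of the online update. The function name welford is hypothetical. Dividing by n (population standard deviation) and returning zeros for empty input are assumptions of this sketch; the library may normalize differently.

```python
import math


def welford(data):
    """Return (count, mean, standard deviation) in one pass over data."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / float(n)
        # Uses the deviation from the old mean times the deviation from the new mean.
        m2 += delta * (x - mean)
    if n < 1:
        return (0, 0.0, 0.0)
    return (n, mean, math.sqrt(m2 / float(n)))


count, mean, stddev = welford([2, 4, 4, 4, 5, 5, 7, 9])
```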

bayeslite.stats.pearsonr(a0, a1)

Pearson r, product-moment correlation coefficient, of two samples.

Covariance divided by product of standard deviations.

https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient#For_a_sample
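
A minimal two-pass sketch of the definition above. The function name pearson_r is hypothetical; it assumes equal-length, nonempty samples that are not constant. The normalizing factors of the covariance and the standard deviations cancel, so plain sums suffice.

```python
import math


def pearson_r(a0, a1):
    """Covariance divided by the product of standard deviations."""
    n = len(a0)
    assert n == len(a1) and n > 0
    m0 = sum(a0) / float(n)
    m1 = sum(a1) / float(n)
    cov = sum((x - m0) * (y - m1) for x, y in zip(a0, a1))
    ss0 = sum((x - m0) ** 2 for x in a0)
    ss1 = sum((y - m1) ** 2 for y in a1)
    return cov / math.sqrt(ss0 * ss1)


increasing = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
decreasing = pearson_r([1, 2, 3], [6, 4, 2])
```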

bayeslite.stats.signum(x)

Sign of x: -1 if x<0, 0 if x=0, +1 if x>0.

bayeslite.stats.t_cdf(x, df)

Approximate CDF for Student’s t distribution.

t_cdf(x, df) = P(T_df < x)

bayeslite.math_util: Math utilities

Miscellaneous special math functions and analysis utilities.

This is neither a general-purpose nor a highly optimized math library: it is limited to the purposes of bayeslite for modest data sets, and is written for clarity and maintainability over speed.

bayeslite.math_util.abs_summation(sequence)

Approximate summation of a nonnegative convergent sequence.

The sequence is assumed to converge to zero quickly without oscillating – it is truncated at the first term whose magnitude relative to the partial sum is bounded by the machine epsilon.

bayeslite.math_util.abserr(expected, actual)

Absolute error between expected and actual: abs(a - e).

bayeslite.math_util.continuants(contfrac)

Continuants of a continued fraction.

contfrac must yield an infinite sequence (n0, d0), (n1, d1), (n2, d2), …, representing the continued fraction:

        n0
------------------
          n1
d0 + -------------
             n2
     d1 + --------
          d2 + ...

The kth continuant is the numerator and denominator of the continued fraction truncated at the kth term, i.e. with zero instead of the rest in the ellipsis.

If the numerator or denominator grows large in magnitude, both are multiplied by the machine epsilon, leaving their quotient unchanged.

bayeslite.math_util.convergents(contfrac)

Convergents of a continued fraction.

contfrac must yield an infinite sequence (n0, d0), (n1, d1), (n2, d2), …, representing the continued fraction:

        n0
------------------
          n1
d0 + -------------
             n2
     d1 + --------
          d2 + ...

The kth convergent is the continued fraction truncated at the kth term, i.e. with zero instead of the rest in the ellipsis.
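
A sketch of the standard three-term recurrence for this form of continued fraction. The function name toy_convergents is hypothetical, and unlike the continuants described above this sketch does no rescaling, so it is only suitable for short prefixes. It assumes no truncated denominator is zero.

```python
import itertools


def toy_convergents(contfrac):
    """Yield successive truncations of n0/(d0 + n1/(d1 + n2/(d2 + ...)))."""
    # Numerator and denominator continuants, with the two previous values kept.
    a_prev, a_cur = 1.0, 0.0
    b_prev, b_cur = 0.0, 1.0
    for n, d in contfrac:
        a_prev, a_cur = a_cur, d * a_cur + n * a_prev
        b_prev, b_cur = b_cur, d * b_cur + n * b_prev
        yield a_cur / b_cur


# All n_k = d_k = 1 gives 1/(1 + 1/(1 + ...)), the reciprocal of the golden ratio.
first_five = list(itertools.islice(
    toy_convergents(itertools.repeat((1, 1))), 5))
```

The first five values are the ratios of consecutive Fibonacci numbers 1/1, 1/2, 2/3, 3/5, 5/8.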

bayeslite.math_util.gamma_above(a, x)

Normalized upper incomplete gamma integral.

Equal to:

(1/\Gamma(a)) \int_x^\infty e^{-t} t^{a - 1} dt.

gamma_above is the complement of gamma_below:

gamma_above(a, x) = 1 - gamma_below(a, x).

As x goes to zero, gamma_above(a, x) converges to 1.

For x > max(1, a), this is computed by the continued fraction[1]:

           1
-----------------------,
          1 - a
x + -------------------
               1
    1 + ---------------
               2 - a
        x + -----------
                   2
            1 + -------
                x + ...

which is then multiplied by x^a e^{-x} / \Gamma(a).

For x <= max(1, a), this is computed by 1 - gamma_below(a, x).

[1] Abramowitz & Stegun, p. 263, 6.5.31

bayeslite.math_util.gamma_below(a, x)

Normalized lower incomplete gamma integral.

Equal to:

(1/\Gamma(a)) \int_0^x e^{-t} t^{a - 1} dt.

gamma_below is the complement of gamma_above:

gamma_below(a, x) = 1 - gamma_above(a, x).

As x grows, gamma_below(a, x) converges to 1.

For x <= max(1, a), this is computed by the power series[1]:

 x^a e^-x   /        x           x^2             \
----------- | 1 + ------- + -------------- + ... |.
a \Gamma(a) \     (a + 1)   (a + 1)(a + 2)       /

For x > max(1, a), this is computed by 1 - gamma_above(a, x).

[1] NIST Digital Library of Mathematical Functions, Release 1.0.9 of 2014-08-29, Eq. 8.7.1 <http://dlmf.nist.gov/8.7.E1>.

In the NIST DLMF notation, gamma_below(a, x) is P(a, x), related to \gamma^* by \gamma^*(a, x) = x^{-a} P(a, x).

To derive the power series, multiply (8.7.1) by x^a, expand \Gamma(a + k + 1) into \Gamma(a) a(a + 1)...(a + k), and factor a \Gamma(a) out of the sum.
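
Following that derivation, the kth term of the sum is x^k / ((a + 1)(a + 2)...(a + k)). A minimal sketch of summing it, assuming a > 0 and x > 0; the function name gamma_below_series, the max_terms cap, and the stopping rule are choices of this sketch rather than the library's.

```python
import math
import sys


def gamma_below_series(a, x, max_terms=10000):
    """P(a, x) by power series; intended for 0 < x <= max(1, a)."""
    term = 1.0   # k = 0 term
    total = 1.0
    for k in range(1, max_terms):
        term *= x / (a + k)  # x^k / ((a + 1)...(a + k))
        total += term
        if abs(term) <= sys.float_info.epsilon * abs(total):
            break
    # x^a e^{-x} / (a \Gamma(a)), computed in log space; a \Gamma(a) = \Gamma(a + 1).
    prefactor = math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
    return prefactor * total


p_one = gamma_below_series(1.0, 0.5)    # P(1, x) = 1 - e^{-x}
p_half = gamma_below_series(0.5, 0.25)  # P(1/2, x) = erf(sqrt(x))
```

The two closed forms in the comments give an independent check on the series.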

bayeslite.math_util.limit(sequence)

Approximate limit of a convergent sequence.

The sequence is assumed to converge quickly without oscillating – it is truncated at the first term whose relative error from the previous term is bounded by the machine epsilon.

bayeslite.math_util.partial_sums(sequence)

Sequence of partial sums of a sequence.
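
A sketch of how partial sums and a limit combine to evaluate a series. The names toy_partial_sums, toy_limit, and inverse_factorials are hypothetical, and the stopping rule is one reading of "relative error from the previous term is bounded by the machine epsilon".

```python
import math
import sys


def toy_partial_sums(sequence):
    """Yield s0, s0 + s1, s0 + s1 + s2, ..."""
    total = 0.0
    for x in sequence:
        total += x
        yield total


def toy_limit(sequence):
    """Return the first term within relative machine epsilon of its predecessor."""
    it = iter(sequence)
    prev = next(it)
    for term in it:
        if abs(term - prev) <= sys.float_info.epsilon * abs(term):
            return term
        prev = term
    raise ValueError('sequence ended before converging')


def inverse_factorials():
    """Yield 1/0!, 1/1!, 1/2!, ... whose sum is e."""
    term = 1.0
    k = 0
    while True:
        yield term
        k += 1
        term /= k


e_approx = toy_limit(toy_partial_sums(inverse_factorials()))
```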

bayeslite.math_util.relerr(expected, actual)

Relative error between expected and actual: abs((a - e)/e).

bayeslite.util: Miscellaneous utilities

Miscellaneous utilities.

bayeslite.util.float_sum(iterable)

Return the sum of elements of iterable in floating-point.

This implementation uses Kahan-Babuška summation.
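
A sketch of one common formulation of Kahan-Babuška summation (the Neumaier variant). The function name kahan_babuska_sum is hypothetical, and whether float_sum uses exactly this variant is an assumption.

```python
def kahan_babuska_sum(iterable):
    """Sum floats while carrying a running compensation for lost low-order bits."""
    total = 0.0
    compensation = 0.0
    for x in iterable:
        x = float(x)
        t = total + x
        if abs(total) >= abs(x):
            # Low-order bits of x were lost in the addition.
            compensation += (total - t) + x
        else:
            # Low-order bits of total were lost in the addition.
            compensation += (x - t) + total
        total = t
    return total + compensation


values = [1.0, 1e100, 1.0, -1e100]
naive = sum(values)
compensated = kahan_babuska_sum(values)
```

On this input the naive sum loses both small terms to cancellation, while the compensated sum recovers them.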

bayeslite.util.json_dumps(obj)

Return a JSON string of obj, compactly and deterministically.
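
One way to get compact, deterministic output from the standard json module is to sort keys and drop the whitespace in separators. Whether json_dumps uses exactly these options is an assumption; the helper name compact_json is hypothetical.

```python
import json


def compact_json(obj):
    """Serialize obj with sorted keys and no optional whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(',', ':'))


text = compact_json({'b': 1, 'a': [1, 2]})
```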

bayeslite.util.unique(array)

Return a sorted array of the unique elements in array.

No element may be a floating-point NaN. If your data set includes NaNs, omit them before passing them here.
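
NaN compares unequal to everything, including itself, so sorting and equality-based deduplication are not well defined in its presence. A sketch of filtering NaNs out first; unique_sorted is a hypothetical standalone stand-in, not the library function.

```python
import math


def unique_sorted(values):
    """Sorted list of distinct values; assumes no NaN is present."""
    return sorted(set(values))


data = [3.0, float('nan'), 1.0, 3.0]
# Drop NaNs before deduplicating.
clean = [x for x in data if not math.isnan(x)]
result = unique_sorted(clean)
```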

bayeslite.util.unique_indices(array)

Return an array of the indices of the unique elements in array.

No element may be a floating-point NaN. If your data set includes NaNs, omit them before passing them here.