Bayeslite API internals

bayeslite.compiler: BQL-to-SQL query compiler

BQL->SQL compiler.

To compile a parsed BQL query:

  1. Determine the number, names, and values of the parameters.
  2. Create an output accumulator, Output.
  3. Pass the query and accumulator to compile_query.
  4. Use Output.getvalue() to get the compiled SQL text.
  5. Use Output.getbindings() to get bindings for parameters that were actually used in the query.
  6. Use bayesdb_wind() or similar to bracket the execution of the SQL query with wind/unwind commands.

class bayeslite.compiler.Output(n_numpar, nampar_map, bindings)

Compiled SQL output accumulator.

Like a write-only StringIO.StringIO(), but also does bookkeeping for parameters and subqueries.

getbindings()

Return a selection of bindings fit for the accumulated output.

If there were subqueries, or if this is accumulating output for a subquery, this may not use all bindings.

getvalue()

Return the accumulated output.

subquery()

Return an output accumulator for a subquery.

write(text)

Accumulate text in the output of getvalue().

write_nampar(name, n)

Accumulate a reference to the parameter name numbered n.

write_numpar(n)

Accumulate a reference to the parameter numbered n.

bayeslite.compiler.bayesdb_wind(*args, **kwds)

Perform the queries in winders before, and the queries in unwinders after, the bracketed execution.

Each of winders and unwinders is a list of (<sql>, <bindings>) tuples.
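The bracketing that bayesdb_wind performs can be pictured with a minimal sketch built on Python's standard sqlite3 module. The helper wind_sketch is hypothetical, not bayeslite's code. It assumes the unwinders should run even when the bracketed body raises, and it runs them in list order, which may not match bayeslite's behavior.

```python
import contextlib
import sqlite3


@contextlib.contextmanager
def wind_sketch(db, winders, unwinders):
    """Hypothetical sketch: run winders, yield to the body, then run unwinders.

    Each of winders and unwinders is a list of (sql, bindings) tuples.
    """
    for sql, bindings in winders:
        db.execute(sql, bindings)
    try:
        yield
    finally:
        # Assumption: unwinders run in list order, even if the body raised.
        for sql, bindings in unwinders:
            db.execute(sql, bindings)


# Usage: a temporary table exists only while the bracketed query runs.
db = sqlite3.connect(':memory:', isolation_level=None)
winders = [
    ('CREATE TEMP TABLE scratch (v REAL)', ()),
    ('INSERT INTO scratch VALUES (?)', (2.5,)),
]
unwinders = [('DROP TABLE scratch', ())]
with wind_sketch(db, winders, unwinders):
    total = db.execute('SELECT SUM(v) FROM scratch').fetchone()[0]
remaining = db.execute(
    "SELECT COUNT(*) FROM sqlite_temp_master WHERE name = 'scratch'"
).fetchone()[0]
```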

bayeslite.compiler.compile_query(bdb, query, out)

Compile query, writing output to out.

Parameters:
  • bdb – database in which to interpret query
  • query – abstract syntax tree of a query
  • out (Output) – output accumulator

bayeslite.bql: BQL query and command execution

BQL execution.

This module implements the main dispatcher for executing different kinds of BQL phrases. Queries, as in SELECT, ESTIMATE, and so on, are compiled into SQL; commands, as in CREATE TABLE, INSERT, and the rest of the DDL/DML (Data Definition Language/Data Manipulation Language), are executed directly.

class bayeslite.bql.BayesDBCursor(bdb, cursor)

Cursor for a BQL or SQL query from a BayesDB.

bayeslite.bql.execute_phrase(bdb, phrase, bindings=())

Execute the BQL AST phrase phrase and return a cursor of results.

bayeslite.core: BayesDB object model

Miscellaneous utilities for managing BayesDB entities.

Tables, generators, and columns are named with strs. Only US-ASCII is allowed, no Unicode.

Each table has a nonempty sequence of named columns. As in sqlite3, tables may be renamed and do not necessarily have numeric ids, so there is no way to have a handle on a table that is persistent outside a savepoint.

Each table may optionally be modelled by any number of generators, representing a parametrized generative model for the table’s data, according to a named metamodel.

Each generator models a subset of the columns in its table, which are called the modelled columns of that generator. Each column in a generator has an associated statistical type. Like tables, generators may be renamed. Unlike tables, each generator has a numeric id, which is never reused and therefore persistent across savepoints.

Each generator may have any number of different models, each representing a particular choice of parameters for the parametrized generative model. Models are numbered consecutively for the generator, and may be identified uniquely by (generator_id, modelno) or (generator_name, modelno).

bayeslite.core.bayesdb_add_latent(bdb, population_id, generator_id, var, stattype)

Add a generator’s latent variable to a population.

NOTE: To be used ONLY by a metamodel’s create_generator method when establishing any latent variables of that generator.

bayeslite.core.bayesdb_add_variable(bdb, population_id, name, stattype)

Add a variable to the population, with colno from the base table.

bayeslite.core.bayesdb_generator_column_name(bdb, generator_id, colno)

Return the name of the column numbered colno in generator_id.

bayeslite.core.bayesdb_generator_column_names(bdb, generator_id)

Return a list of names of columns modelled by generator_id.

bayeslite.core.bayesdb_generator_column_number(bdb, generator_id, column_name)

Return the number of the column column_name in generator_id.

bayeslite.core.bayesdb_generator_column_numbers(bdb, generator_id)

Return a list of the numbers of columns modelled in generator_id.

bayeslite.core.bayesdb_generator_column_stattype(bdb, generator_id, colno)

Return the statistical type of the column colno in generator_id.

bayeslite.core.bayesdb_generator_has_column(bdb, generator_id, column_name)

True if generator_id models a column named column_name.

bayeslite.core.bayesdb_generator_has_model(bdb, generator_id, modelno)

True if generator_id has a model numbered modelno.

bayeslite.core.bayesdb_generator_metamodel(bdb, id)

Return the metamodel of the generator with id id.

bayeslite.core.bayesdb_generator_name(bdb, id)

Return the name of the generator with id id.

bayeslite.core.bayesdb_generator_population(bdb, id)

Return the id of the population of the generator with id id.

bayeslite.core.bayesdb_generator_table(bdb, id)

Return the name of the table of the generator with id id.

bayeslite.core.bayesdb_get_generator(bdb, population_id, name)

Return the id of the generator named name in bdb.

The id is persistent across savepoints: ids are 64-bit integers that increase monotonically and are never reused.

bdb must have a generator named name. If you’re not sure, call bayesdb_has_generator() first.

bayeslite.core.bayesdb_get_population(bdb, name)

Return the id of the population named name in bdb.

The id is persistent across savepoints: ids are 64-bit integers that increase monotonically and are never reused.

bdb must have a population named name. If you’re not sure, call bayesdb_has_population() first.

bayeslite.core.bayesdb_has_generator(bdb, population_id, name)

True if there is a generator named name in bdb.

bayeslite.core.bayesdb_has_latent(bdb, population_id, var)

True if the population has a latent variable by the given name.

bayeslite.core.bayesdb_has_population(bdb, name)

True if there is a population named name in bdb.

bayeslite.core.bayesdb_has_table(bdb, name)

True if there is a table named name in bdb.

The table need not be modelled.

bayeslite.core.bayesdb_has_variable(bdb, population_id, generator_id, name)

True if the population has a variable named name.

generator_id is None for manifest variables and the id of a generator for variables that may be latent.

bayeslite.core.bayesdb_population_name(bdb, id)

Return the name of the population with id id.

bayeslite.core.bayesdb_population_table(bdb, id)

Return the name of the table of the population with id id.

bayeslite.core.bayesdb_table_column_name(bdb, table, colno)

Return the name of the column numbered colno in table.

bdb must have a table named table. If you’re not sure, call bayesdb_has_table() first.

WARNING: This may modify the database by populating the bayesdb_column table if it has not yet been populated.

bayeslite.core.bayesdb_table_column_names(bdb, table)

Return a list of names of columns in the table named table.

The results are strs and are ordered by column number.

bdb must have a table named table. If you’re not sure, call bayesdb_has_table() first.

WARNING: This may modify the database by populating the bayesdb_column table if it has not yet been populated.

bayeslite.core.bayesdb_table_column_number(bdb, table, name)

Return the number of the column named name in table.

bdb must have a table named table. If you’re not sure, call bayesdb_has_table() first.

WARNING: This may modify the database by populating the bayesdb_column table if it has not yet been populated.

bayeslite.core.bayesdb_table_guarantee_columns(bdb, table)

Make sure bayesdb_column is populated with columns of table.

bdb must have a table named table. If you’re not sure, call bayesdb_has_table() first.

bayeslite.core.bayesdb_table_has_column(bdb, table, name)

True if the table named table has a column named name.

bdb must have a table named table. If you’re not sure, call bayesdb_has_table() first.

WARNING: This may modify the database by populating the bayesdb_column table if it has not yet been populated.

bayeslite.core.bayesdb_variable_name(bdb, population_id, colno)

Return the name of a population variable.

bayeslite.core.bayesdb_variable_names(bdb, population_id, generator_id)

Return a list of the names of columns modelled in population_id.

bayeslite.core.bayesdb_variable_number(bdb, population_id, generator_id, name)

Return the column number of a population variable.

bayeslite.core.bayesdb_variable_numbers(bdb, population_id, generator_id)

Return a list of the numbers of columns modelled in population_id.

bayeslite.core.bayesdb_variable_stattype(bdb, population_id, colno)

Return the statistical type of a population variable.

bayeslite.parse: BQL parser

BQL parser front end.

bayeslite.parse.bql_string_complete_p(string)

True if string has at least one complete BQL phrase or error.

False if empty or if the last BQL phrase is incomplete.

bayeslite.parse.parse_bql_string(string)

Yield each parsed BQL phrase AST in string.

bayeslite.parse.parse_bql_string_pos(string)

Yield (phrase, pos) for each BQL phrase in string.

phrase is the parsed AST. pos is the zero-based index of the code point at which phrase starts.

bayeslite.parse.parse_bql_string_pos_1(string)

Return (phrase, pos) for the first BQL phrase in string.

Parse errors after the first phrase may not be reported.

bayeslite.sqlite3_util: SQLite 3 utilities

SQLite3 utilities.

bayeslite.sqlite3_util.sqlite3_column_affinity(column_type)

Return the sqlite3 column affinity corresponding to a type string.
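For reference, SQLite determines a column's affinity from its declared type by a fixed sequence of substring rules: INT, then CHAR/CLOB/TEXT, then BLOB or no declared type, then REAL/FLOA/DOUB, and otherwise NUMERIC. The sketch below encodes those rules. The function name and the choice to return affinity names as uppercase strings are assumptions, not bayeslite's API.

```python
def column_affinity_sketch(column_type):
    """Hypothetical sketch of SQLite's declared-type affinity rules, applied in order."""
    t = column_type.upper()
    if 'INT' in t:
        return 'INTEGER'
    if 'CHAR' in t or 'CLOB' in t or 'TEXT' in t:
        return 'TEXT'
    if 'BLOB' in t or t.strip() == '':
        return 'BLOB'  # also called "no affinity" / NONE in some SQLite docs
    if 'REAL' in t or 'FLOA' in t or 'DOUB' in t:
        return 'REAL'
    return 'NUMERIC'


# Usage: map a few declared types to their affinities.
examples = {
    t: column_affinity_sketch(t)
    for t in ['VARCHAR(255)', 'FLOATING POINT', '', 'DOUBLE PRECISION', 'DECIMAL(10,5)']
}
```

Note the classic surprise: 'FLOATING POINT' contains INT (in POINT) and therefore gets INTEGER affinity, because the INT rule is checked first.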

bayeslite.sqlite3_util.sqlite3_connection(*args, **kwds)

SQLite3 connection context manager. On exit, closes the connection.

bayeslite.sqlite3_util.sqlite3_exec_1(db, query, *args)

Execute a query returning a 1x1 table, and return its one value.

Do not call this if you cannot guarantee the result is a 1x1 table. Beware of passing user-controlled input here.
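A minimal sketch of the same idea, assuming a standard sqlite3 connection. exec_1_sketch is a hypothetical name, and unlike the function documented above it checks the 1x1 shape explicitly. Here contextlib.closing plays a role comparable to sqlite3_connection by closing the connection on exit, and user input goes in through query parameters rather than into the SQL text.

```python
import contextlib
import sqlite3


def exec_1_sketch(db, query, *args):
    """Hypothetical sketch: run query and return the single value of its 1x1 result."""
    cursor = db.execute(query, args)
    rows = cursor.fetchmany(2)
    # Unlike the documented function, this sketch verifies the shape.
    if len(rows) != 1 or len(rows[0]) != 1:
        raise ValueError('query did not return a 1x1 table')
    return rows[0][0]


with contextlib.closing(sqlite3.connect(':memory:')) as db:
    db.execute('CREATE TABLE t (x INTEGER)')
    db.executemany('INSERT INTO t VALUES (?)', [(1,), (2,), (3,)])
    # Values go through query parameters, never string formatting.
    n_big = exec_1_sketch(db, 'SELECT COUNT(*) FROM t WHERE x > ?', 1)
```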

bayeslite.sqlite3_util.sqlite3_quote_name(name)

Quote name as a SQL identifier, e.g. a table or column name.

Do NOT use this for strings, e.g. inserting data into a table. Use query parameters instead.
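A minimal sketch of standard SQL identifier quoting, assuming the usual rule of wrapping the name in double quotes and doubling any embedded double quote. quote_name_sketch is hypothetical and may differ from bayeslite's exact output. The usage contrasts quoting an identifier with binding a data value as a query parameter.

```python
import sqlite3


def quote_name_sketch(name):
    """Hypothetical sketch: quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


db = sqlite3.connect(':memory:')
table = quote_name_sketch('my "odd" table')  # identifiers: quote them
db.execute('CREATE TABLE %s (label TEXT)' % (table,))
# Data values: bind them as parameters, never quote them into the SQL text.
db.execute('INSERT INTO %s VALUES (?)' % (table,), ('x"); DROP TABLE t; --',))
stored = db.execute('SELECT label FROM %s' % (table,)).fetchone()[0]
```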

bayeslite.sqlite3_util.sqlite3_savepoint(*args, **kwds)

Savepoint context manager. On return, commit; on exception, roll back.

Savepoints are like transactions, but they may be nested in transactions or in other savepoints.
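A minimal sketch of such a savepoint context manager, written against Python's standard sqlite3 module. savepoint_sketch is hypothetical, not bayeslite's implementation. On normal exit it issues RELEASE; on an exception it issues ROLLBACK TO followed by RELEASE, because ROLLBACK TO alone leaves the savepoint on the stack. The usage shows an inner savepoint being undone while the outer one commits.

```python
import contextlib
import sqlite3


@contextlib.contextmanager
def savepoint_sketch(db, name):
    """Hypothetical sketch: RELEASE on normal exit, ROLLBACK TO then RELEASE on exception.

    name must be a trusted identifier; it is interpolated into the SQL text.
    """
    db.execute('SAVEPOINT %s' % name)
    try:
        yield
    except BaseException:
        db.execute('ROLLBACK TO %s' % name)  # undo changes since the savepoint
        db.execute('RELEASE %s' % name)      # then drop the savepoint itself
        raise
    else:
        db.execute('RELEASE %s' % name)


# isolation_level=None stops Python's sqlite3 module from issuing its own BEGINs.
db = sqlite3.connect(':memory:', isolation_level=None)
db.execute('CREATE TABLE t (x INTEGER)')
with savepoint_sketch(db, 'outer'):
    db.execute('INSERT INTO t VALUES (1)')
    try:
        with savepoint_sketch(db, 'inner'):
            db.execute('INSERT INTO t VALUES (2)')
            raise ValueError('undo only the inner savepoint')
    except ValueError:
        pass
rows = [x for (x,) in db.execute('SELECT x FROM t')]
```

A variant that always rolls back, in the spirit of sqlite3_savepoint_rollback, would take the ROLLBACK TO/RELEASE path on both exits.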

bayeslite.sqlite3_util.sqlite3_savepoint_rollback(*args, **kwds)

Savepoint context manager that always rolls back.

bayeslite.sqlite3_util.sqlite3_transaction(*args, **kwds)

Transaction context manager. On return, commit; on exception, roll back.

Transactions may not be nested. Use savepoints if you want a nestable analogue to transactions.

bayeslite.stats: Statistics utilities

Miscellaneous statistics utilities.

bayeslite.stats.arithmetic_mean(array)

Arithmetic mean of elements of array.

bayeslite.stats.chi2_contingency(contingency)

Pearson chi^2 test statistic for independence on a contingency table.

https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test#Test_of_independence
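A minimal sketch of the statistic, assuming a rectangular table of observed counts given as a list of rows; chi2_statistic_sketch is hypothetical. Each cell contributes (observed - expected)^2 / expected, where expected = row total * column total / grand total. No continuity (Yates) correction is applied, and whether bayeslite applies one is not stated here. The statistic has (rows - 1)(columns - 1) degrees of freedom, which is what a survival function such as chi2_sf would take.

```python
def chi2_statistic_sketch(contingency):
    """Hypothetical sketch: Pearson chi^2 statistic for a table of observed counts."""
    row_totals = [sum(row) for row in contingency]
    col_totals = [sum(col) for col in zip(*contingency)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(contingency):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


stat = chi2_statistic_sketch([[10, 20], [20, 10]])
```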

bayeslite.stats.chi2_sf(x, df)

Survival function for chi^2 distribution.

bayeslite.stats.f_oneway(groups)

F-test statistic for one-way analysis of variance (ANOVA).

https://en.wikipedia.org/wiki/F-test#Multiple-comparison_ANOVA_problems

groups[i][j] is the jth observation in the ith group.
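A minimal sketch of the F statistic in the same groups[i][j] layout, with a hypothetical name: the between-group mean square divided by the within-group mean square. With k groups and n observations in total, the degrees of freedom are k - 1 and n - k, the df_num and df_den one would pass to f_sf.

```python
def f_statistic_sketch(groups):
    """Hypothetical sketch: one-way ANOVA F statistic; groups[i][j] is observation j of group i."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    # Each sum of squares divided by its degrees of freedom.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


f = f_statistic_sketch([[1, 2, 3], [4, 5, 6]])
```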

bayeslite.stats.f_sf(x, df_num, df_den)

Approximate survival function for the F distribution.

f_sf(x, df_num, df_den) = P(F_{df_num, df_den} > x)

bayeslite.stats.gauss_suff_stats(data)

Summarize an array of data as (count, mean, standard deviation).

The algorithm is the “Online algorithm” presented in Knuth Volume 2, 3rd ed., p. 232, originally credited to B. P. Welford, “Note on a Method for Calculating Corrected Sums of Squares and Products”, Technometrics Vol. 4, No. 3 (Aug., 1962), pp. 419-420. Compared with naively accumulating the sum and the sum of squares, it is less subject to precision loss through catastrophic cancellation.

This version was collected 8/31/15 from https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
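A minimal sketch of Welford's update, with a hypothetical name. Each new value moves the mean by delta/count and adds delta*(x - new mean) to the running sum of squared deviations, so no large sum of squares is ever formed. Whether bayeslite reports the population (divide by count) or sample (divide by count - 1) standard deviation is not stated here; the sketch assumes the population form.

```python
import math


def welford_sketch(data):
    """Hypothetical sketch of Welford's online algorithm: (count, mean, standard deviation)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in data:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the updated mean
    if count == 0:
        return (0, float('nan'), float('nan'))
    # Assumption: population standard deviation (divide by count, not count - 1).
    return (count, mean, math.sqrt(m2 / count))


n, mu, sigma = welford_sketch([2, 4, 4, 4, 5, 5, 7, 9])
```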

bayeslite.stats.pearsonr(a0, a1)

Pearson r, product-moment correlation coefficient, of two samples.

Covariance divided by the product of standard deviations.

https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient#For_a_sample
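A minimal sketch with a hypothetical name, assuming two equal-length, non-constant samples. The normalizing factors of the covariance and of the two standard deviations cancel, so only sums of deviations from the means are needed.

```python
import math


def pearson_r_sketch(a0, a1):
    """Hypothetical sketch: covariance over the product of standard deviations."""
    n = len(a0)
    m0 = sum(a0) / n
    m1 = sum(a1) / n
    # The 1/(n - 1) factors in the covariance and both deviations cancel, so omit them.
    cov = sum((x - m0) * (y - m1) for x, y in zip(a0, a1))
    s0 = math.sqrt(sum((x - m0) ** 2 for x in a0))
    s1 = math.sqrt(sum((y - m1) ** 2 for y in a1))
    return cov / (s0 * s1)


r_up = pearson_r_sketch([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r_sketch([1, 2, 3, 4], [8, 6, 4, 2])
```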

bayeslite.stats.signum(x)

Sign of x: -1 if x<0, 0 if x=0, +1 if x>0.

bayeslite.stats.t_cdf(x, df)

Approximate CDF for Student’s t distribution.

t_cdf(x, df) = P(T_df < x)

bayeslite.math_util: Math utilities

Miscellaneous special math functions and analysis utilities.

This is neither a general-purpose nor a highly optimized math library: it is limited to the purposes of bayeslite for modest data sets, written for clarity and maintainability over speed.

bayeslite.math_util.abs_summation(sequence)

Approximate summation of a nonnegative convergent sequence.

The sequence is assumed to converge to zero quickly without oscillating – it is truncated at the first term whose magnitude relative to the partial sum is bounded by the machine epsilon.

bayeslite.math_util.continuants(contfrac)

Continuants of a continued fraction.

contfrac must yield an infinite sequence (n0, d0), (n1, d1), (n2, d2), ..., representing the continued fraction:

        n0
------------------
          n1
d0 + -------------
             n2
     d1 + --------
          d2 + ...

The kth continuant is the numerator and denominator of the continued fraction truncated at the kth term, i.e. with zero in place of the rest of the fraction (the ellipsis).

If the numerator or denominator grows large in magnitude, both are multiplied by the machine epsilon, leaving their quotient unchanged.

bayeslite.math_util.convergents(contfrac)

Convergents of a continued fraction.

contfrac must yield an infinite sequence (n0, d0), (n1, d1), (n2, d2), ..., representing the continued fraction:

        n0
------------------
          n1
d0 + -------------
             n2
     d1 + --------
          d2 + ...

The kth convergent is the value of the continued fraction truncated at the kth term, i.e. with zero in place of the rest of the fraction (the ellipsis).
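Both continuants and convergents follow from the fundamental recurrence for continued fractions: starting from the seed pairs (1, 0) and (0, 1), each term (n_k, d_k) gives p = d_k p' + n_k p'' and q = d_k q' + n_k q'', where p', q' and p'', q'' are the two previous pairs. The sketch below, with hypothetical names, uses that recurrence and applies the machine-epsilon rescaling described for continuants to both consecutive pairs at once, so the recurrence stays valid. The usage checks it on 1/(1 + 1/(1 + ...)), whose convergents are ratios of Fibonacci numbers approaching (sqrt(5) - 1)/2.

```python
import itertools
import sys

EPS = sys.float_info.epsilon


def continuants_sketch(contfrac):
    """Hypothetical sketch: yield (numerator, denominator) of each truncation of
    n0/(d0 + n1/(d1 + n2/(d2 + ...))) via the fundamental recurrence."""
    p_prev, q_prev = 1.0, 0.0  # seed pair for the recurrence
    p, q = 0.0, 1.0            # the empty fraction, 0/1
    for n, d in contfrac:
        p, p_prev = d * p + n * p_prev, p
        q, q_prev = d * q + n * q_prev, q
        if max(abs(p), abs(q)) > 1 / EPS:
            # Rescale both consecutive pairs by the same factor, so every
            # quotient and the recurrence itself are unchanged.
            p, q, p_prev, q_prev = p * EPS, q * EPS, p_prev * EPS, q_prev * EPS
        yield p, q


def convergents_sketch(contfrac):
    """Yield the value of each truncation, i.e. each continuant's quotient."""
    for p, q in continuants_sketch(contfrac):
        yield p / q


# 1/(1 + 1/(1 + ...)) converges to (sqrt(5) - 1)/2.
first = list(itertools.islice(convergents_sketch(itertools.repeat((1.0, 1.0))), 4))
last = list(itertools.islice(convergents_sketch(itertools.repeat((1.0, 1.0))), 80))[-1]
```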

bayeslite.math_util.gamma_above(a, x)

Normalized upper incomplete gamma integral.

Equal to:

(1/\Gamma(a)) \int_x^\infty e^{-t} t^{a - 1} dt.

gamma_above is the complement of gamma_below:

gamma_above(a, x) = 1 - gamma_below(a, x).

As x goes to zero, gamma_above(a, x) converges to 1.

For x > max(1, a), this is computed by the continued fraction[1]:

           1
-----------------------,
          1 - a
x + -------------------
               1
    1 + ---------------
               2 - a
        x + -----------
                   2
            1 + -------
                x + ...

which is then multiplied by x^a e^{-x} / \Gamma(a).

For x <= max(1, a), this is computed by 1 - gamma_below(a, x).

[1] Abramowitz & Stegun, p. 263, 6.5.31

bayeslite.math_util.gamma_below(a, x)

Normalized lower incomplete gamma integral.

Equal to:

(1/\Gamma(a)) \int_0^x e^{-t} t^{a - 1} dt.

gamma_below is the complement of gamma_above:

gamma_below(a, x) = 1 - gamma_above(a, x).

As x grows, gamma_below(a, x) converges to 1.

For x <= max(1, a), this is computed by the power series[1]:

 x^a e^-x   /        x              x^2           \
----------- | 1 + ------- + -------------- + ... |.
a \Gamma(a) \     (a + 1)   (a + 1)(a + 2)       /

For x > max(1, a), this is computed by 1 - gamma_above(a, x).

[1] NIST Digital Library of Mathematical Functions, Release 1.0.9 of 2014-08-29, Eq. 8.7.1 <http://dlmf.nist.gov/8.7.E1>.

In the NIST DLMF notation, gamma_below(a, x) is P(a, x), related to \gamma^* by \gamma^*(a, x) = x^{-a} P(a, x).

To derive the power series, multiply (8.7.1) by x^a, expand \Gamma(a + k + 1) into \Gamma(a) a(a + 1)...(a + k), and factor a \Gamma(a) out of the sum.
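To make the power series concrete, here is a minimal sketch that sums 1 + x/(a + 1) + x^2/((a + 1)(a + 2)) + ... term by term and multiplies by x^a e^{-x} / (a \Gamma(a)) = exp(a log x - x - log \Gamma(a + 1)). The name gamma_below_series is hypothetical, it handles only a > 0 and 0 <= x <= max(1, a), and it is not bayeslite's implementation. It is checked against two closed forms: P(1, x) = 1 - e^{-x} and P(1/2, x) = erf(sqrt(x)).

```python
import math
import sys


def gamma_below_series(a, x):
    """Hypothetical sketch: P(a, x) by the DLMF 8.7.1 power series, for a > 0, 0 <= x <= max(1, a)."""
    if x == 0:
        return 0.0
    term = 1.0
    total = 1.0
    k = 0
    # Each term is the previous one times x/(a + k); stop once terms are negligible.
    while term > sys.float_info.epsilon * total:
        k += 1
        term *= x / (a + k)
        total += term
    # x^a e^{-x} / (a Gamma(a)), computed in log space; a Gamma(a) = Gamma(a + 1).
    return total * math.exp(a * math.log(x) - x - math.lgamma(a + 1))


p1 = gamma_below_series(1.0, 0.5)
p_half = gamma_below_series(0.5, 0.8)
```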

bayeslite.math_util.limit(sequence)

Approximate limit of a convergent sequence.

The sequence is assumed to converge quickly without oscillating – it is truncated at the first term whose relative error from the previous term is bounded by the machine epsilon.

bayeslite.math_util.partial_sums(sequence)

Sequence of partial sums of a sequence.
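The truncation rules of abs_summation and limit, together with partial_sums, can be sketched as below. The names are hypothetical, EPS is Python's double-precision machine epsilon, and the relative comparisons are written as multiplications to avoid dividing by zero. The usage approximates e as the limit of the partial sums of 1/k! and sums the geometric series 1 + 1/2 + 1/4 + ... = 2.

```python
import itertools
import math
import sys

EPS = sys.float_info.epsilon


def partial_sums_sketch(sequence):
    """Yield s0, s0 + s1, s0 + s1 + s2, ..."""
    total = 0
    for term in sequence:
        total += term
        yield total


def limit_sketch(sequence):
    """Stop at the first term whose relative error from the previous term is at most EPS."""
    iterator = iter(sequence)
    previous = next(iterator)
    for current in iterator:
        # relerr(previous, current) <= EPS, written without dividing by previous.
        if abs(current - previous) <= EPS * abs(previous):
            return current
        previous = current
    return previous  # a finite sequence simply ends at its last term


def abs_summation_sketch(sequence):
    """Stop at the first term whose magnitude relative to the partial sum is at most EPS."""
    total = 0.0
    for term in sequence:
        total += term
        if abs(term) <= EPS * abs(total):
            break
    return total


e_approx = limit_sketch(partial_sums_sketch(1 / math.factorial(k) for k in itertools.count()))
two_approx = abs_summation_sketch(0.5 ** k for k in itertools.count())
```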

bayeslite.math_util.relerr(expected, actual)

Relative error between expected and actual: abs((a - e)/e).

bayeslite.util: Miscellaneous utilities

Miscellaneous utilities.

bayeslite.util.float_sum(iterable)

Return the sum of elements of iterable in floating-point.

This implementation uses Kahan-Babuška summation.
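A minimal sketch of compensated summation in the form usually attributed to Kahan, Babuška, and Neumaier: alongside the running total it keeps a correction term for the low-order bits each addition discards. The name is hypothetical and this is not bayeslite's code. The usage compares it with a naive loop on a sum where the naive result loses the 1.0 entirely.

```python
def float_sum_sketch(iterable):
    """Hypothetical sketch of Kahan-Babuska (Neumaier-style) compensated summation."""
    total = 0.0
    compensation = 0.0  # accumulated low-order bits lost from total
    for x in iterable:
        x = float(x)
        t = total + x
        if abs(total) >= abs(x):
            compensation += (total - t) + x  # low-order bits of x were lost
        else:
            compensation += (x - t) + total  # low-order bits of total were lost
        total = t
    return total + compensation


values = [1e100, 1.0, -1e100]
naive = 0.0
for v in values:
    naive += v
compensated = float_sum_sketch(values)
```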

bayeslite.util.json_dumps(obj)

Return a JSON string of obj, compactly and deterministically.
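One way to get compact, deterministic output from Python's standard json module is to sort keys and drop the default spaces after separators. json_dumps_sketch is hypothetical, and bayeslite's exact options may differ.

```python
import json


def json_dumps_sketch(obj):
    """Hypothetical sketch: compact, deterministic JSON (sorted keys, no extra spaces)."""
    return json.dumps(obj, sort_keys=True, separators=(',', ':'))


# Two dicts with the same contents but different insertion order.
a = json_dumps_sketch({'b': 1, 'a': [1, 2]})
b = json_dumps_sketch({'a': [1, 2], 'b': 1})
```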

bayeslite.util.unique(array)

Return a sorted array of the unique elements in array.

No element may be a floating-point NaN. If your data set includes NaNs, remove them before passing the data here.
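A minimal sketch for hashable, mutually comparable elements, with a hypothetical name. The second part shows why NaNs are excluded: NaN compares unequal to everything, including itself, so a set never merges two distinct NaN objects and sorting around a NaN is unreliable.

```python
def unique_sketch(array):
    """Hypothetical sketch: sorted list of the distinct elements of array."""
    return sorted(set(array))


u = unique_sketch([3, 1, 3, 2, 1])

# NaN != NaN, so two separate NaN objects both survive deduplication.
nan_count = len(set([float('nan'), float('nan')]))
```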

bayeslite.util.unique_indices(array)

Return an array of the indices of the unique elements in array.

No element may be a floating-point NaN. If your data set includes NaNs, remove them before passing the data here.