Inference Syntax Reference (VenChurch)

Introduction

The Venture inference language is the language in which the toplevel Venture program, and particularly arguments to the infer instruction, are written. It is actually the same language as the Venture modeling language, except with a few additional predefined procedures and special forms.

Venture inference programs are effectively evaluated in a context where the underlying model trace is available as a reified object, on which built-in inference procedures can operate. [1]

Scopes, Blocks, and the Local Posterior

Venture defines the notion of inference scope to allow the programmer to control the parts of their model on which to apply various inference procedures. The idea is that a scope is some collection of related random choices (for example, the states of a hidden Markov model could be one scope, and the hyperparameters could be another); and each scope is further subdivided into blocks, which are groups of choices that ought to be reproposed together (the name is meant to evoke the idea of block proposals).

Any given random choice in an execution history can exist in an arbitrary number of scopes; but within each scope that contains it, it must belong to exactly one block. As such, a scope-block pair denotes a set of random choices.

Any set of random choices defines a local posterior, which is the posterior on those choices, conditioned on keeping the rest of the execution history fixed. Every inference method accepts a scope id and a block id as its first two arguments, and operates only on those random choices, with respect to that local posterior.
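
For concreteness, here is a hedged sketch of how a model might tag its random choices with scopes and blocks and then target them (this assumes the scope_include modeling form; the scope and block names are illustrative):

[assume theta (scope_include (quote hypers) 0 (gamma 1 1))]
[assume x1 (scope_include (quote states) 1 (normal theta 1))]
[assume x2 (scope_include (quote states) 2 (normal x1 1))]
[infer (mh (quote hypers) 0 20)]  ; repropose only theta, against its local posterior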

Built-in Procedures for Inference

All the procedures available in the modeling language can be used in the inference language, too. In addition, the following inference procedures are available.

mh(scope : object, block : object, transitions : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Run a Metropolis-Hastings kernel, proposing by resimulating the prior.

The transitions argument specifies how many transitions of the chain to run.
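
For example,

(mh default one 100)

runs 100 transitions, each reproposing one uniformly chosen random choice from the default scope; this is the same as (default_markov_chain 100), described below.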

func_mh(scope : object, block : object, transitions : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Like mh, but functional.

To wit, represent the proposal with a new trace (sharing common structure) instead of modifying the existing particle in place.

Up to log factors, there is no asymptotic difference between this and mh, but the distinction is exposed for those who know what they are doing.

gibbs(scope : object, block : object, transitions : int[, in_parallel : bool])
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Run a Gibbs sampler that computes the local posterior by enumeration.

All the random choices identified by the scope-block pair must be discrete.

The transitions argument specifies how many transitions of the chain to run.

The in_parallel argument, if supplied, toggles parallel evaluation of the local posterior. Parallel evaluation is only available in the Puma backend, and is on by default.
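
For example, assuming a model whose choices in a (hypothetical) states scope are all discrete,

(gibbs (quote states) one 10)

would pick one block of that scope uniformly at each of 10 transitions and resample it from its enumerated local posterior.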

emap(scope : object, block : object, transitions : int[, in_parallel : bool])
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Deterministically move to the local posterior maximum (computed by enumeration).

All the random choices identified by the scope-block pair must be discrete.

The transitions argument specifies how many times to do this. Specifying more than one transition is redundant unless the block is one.

The in_parallel argument, if supplied, toggles parallel evaluation of the local posterior. Parallel evaluation is only available in the Puma backend, and is on by default.

func_pgibbs(scope : object, block : object, particles : int, transitions : int[, in_parallel : bool])
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Move to a sample of the local posterior computed by particle Gibbs.

The block must indicate a sequential grouping of the random choices in the scope. This can be done by supplying the keyword ordered as the block, or the value of calling ordered_range.

The particles argument specifies how many particles to use in the particle Gibbs filter.

The transitions argument specifies how many times to do this.

The in_parallel argument, if supplied, toggles per-particle parallelism. Parallel evaluation is only available in the Puma backend, and is on by default.
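
For example, a hedged sketch (the scope name is hypothetical):

(func_pgibbs (quote states) ordered 8 5)

would run 5 particle Gibbs sweeps over all the blocks of the states scope in order of block ID, using 8 particles each.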

pgibbs(scope : object, block : object, particles : int, transitions : int[, in_parallel : bool])
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Like func_pgibbs but reuse a single trace instead of having several.

The performance is asymptotically worse in the sequence length, but does not rely on stochastic procedures being able to functionally clone their auxiliary state.

The only reason to use this is if you know you want to.

func_pmap(scope : object, block : object, particles : int, transitions : int[, in_parallel : bool])
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Like func_pgibbs, but deterministically select the maximum-likelihood particle at the end instead of sampling.

Iterated applications of func_pmap are guaranteed to grow in likelihood (and therefore do not converge to the posterior).

meanfield(scope : object, block : object, training_steps : int, transitions : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Sample from a mean-field variational approximation of the local posterior.

The mean-field approximation is optimized with gradient ascent. The training_steps argument specifies how many steps to take.

The transitions argument specifies how many times to do this.

Note: There is currently no way to save the result of training the variational approximation to be able to sample from it many times.

print_scaffold_stats(scope : object, block : object, transitions : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Print some statistics about the requested scaffold.

This may be useful as a diagnostic.

The transitions argument specifies how many times to do this; this is not redundant if the block argument is one.

nesterov(scope : object, block : object, step_size : number, steps : int, transitions : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Move deterministically toward the maximum of the local posterior by Nesterov-accelerated gradient ascent.

Not available in the Puma backend. Not all the builtin procedures support all the gradient information necessary for this.

The gradient is of the log posterior.

The presence of discrete random choices in the scope-block pair will not prevent this inference strategy, but none of the discrete choices will be moved by the gradient steps.

The step_size argument gives how far to move along the gradient at each point.

The steps argument gives how many steps to take.

The transitions argument specifies how many times to do this.

Note: the Nesterov acceleration is applied across steps within one transition, not across transitions.

map(scope : object, block : object, step_size : number, steps : int, transitions : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Move deterministically toward the maximum of the local posterior by gradient ascent.

Not available in the Puma backend. Not all the builtin procedures support all the gradient information necessary for this.

This is just like nesterov, except without the Nesterov correction.

hmc(scope : object, block : object, step_size : number, steps : int, transitions : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Run a Hamiltonian Monte Carlo transition kernel.

Not available in the Puma backend. Not all the builtin procedures support all the gradient information necessary for this.

The presence of discrete random choices in the scope-block pair will not prevent this inference strategy, but none of the discrete choices will be moved.

The step_size argument gives the step size of the integrator used by HMC.

The steps argument gives how many steps to take in each HMC trajectory.

The transitions argument specifies how many times to do this.
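
For example,

(hmc default all 0.1 10 20)

runs 20 HMC transitions over all the random choices in the default scope, each integrating a trajectory of 10 steps of size 0.1 (any discrete choices are left unmoved, per the note above).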

rejection(scope : object, block : object[, attempt_bound : number[, transitions : int]])
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Sample from the local posterior by rejection sampling.

Not available in the Puma backend. Not all the builtin procedures support all the density bound information necessary for this.

The attempt_bound argument, if supplied, indicates how many attempts to make. If no sample is accepted after that many trials, stop, and leave the local state as it was. Warning: bounded rejection is not a Bayes-sound inference algorithm. If attempt_bound is not given, keep trying until acceptance (possibly leaving the session unresponsive). Note: if three arguments are supplied, the last one is taken to be the number of transitions, not the attempt bound.

The transitions argument specifies how many times to do this. Specifying more than 1 transition is redundant if the block is anything other than one.

bogo_possibilize(scope : object, block : object[, transitions : int])
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Initialize the local inference problem to a possible state.

If the current local likelihood is 0, resimulate the local prior until a non-zero likelihood state is found.

Notes:

  • If the current state is possible, do nothing.

  • This is different from rejection sampling because the distribution on results is not the posterior, but the prior conditioned on the likelihood being non-zero. As such, it is likely to complete faster.

  • This is different from likelihood weighting because a) it keeps trying automatically until it finds a possible state, and b) it does not modify the weight of the particle it is applied to (because if the scope and block are anything other than default all, it is not clear what the weight should become).

  • Does not change the particle weight, because the right one is not obvious for general scaffolds, or for the case where the state was possible to begin with. If you’re using (bogo_possibilize default all) for pure initialization from the prior, consider following it with:

    (do (l <- global_likelihood)
        (set_particle_log_weights l))
    

The transitions argument specifies how many times to do this. Specifying more than 1 transition is redundant if the block is anything other than one.

slice(scope : object, block : object, w : number, m : int, transitions : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Slice sample from the local posterior of the selected random choice.

The scope-block pair must identify a single random choice, which must be continuous and one-dimensional.

This kernel uses the stepping-out procedure to find the slice. The w and m arguments parameterize the slice sampler in the standard way.

The transitions argument specifies how many transitions of the chain to run.

slice_doubling(scope : object, block : object, w : number, p : int, transitions : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Slice sample from the local posterior of the selected random choice.

The scope-block pair must identify a single random choice, which must be continuous and one-dimensional.

This kernel uses the interval-doubling procedure to find the slice. The w and p arguments parameterize the slice sampler in the standard way.

The transitions argument specifies how many transitions of the chain to run.

resample(particles : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Perform an SMC-style resampling step.

The particles argument gives the number of particles to make. Subsequent modeling and inference commands will be applied to each result particle independently.

Future observations will have the effect of weighting the particles relative to each other by the relative likelihoods of observing those values in those particles. The resampling step respects those weights.

The new particles will be handled in series. See the next procedures for alternatives.
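
A hedged sketch of the SMC-style pattern this enables (the model variable x is hypothetical):

[infer (resample 10)]       ; split into 10 independent particles
[observe (normal x 1) 2.3]  ; weights the particles when next incorporated
[infer (resample 10)]       ; incorporates, then resamples by the accumulated weights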

resample_multiprocess(particles : int[, max_processes : int])
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Like resample, but fork multiple OS processes to simulate the resulting particles in parallel.

The max_processes argument, if supplied, puts a cap on the number of processes to make. The particles are distributed evenly among the processes. If no cap is given, fork one process per particle.

Subtlety: Collecting results (and especially performing further resampling steps) requires inter-process communication, and therefore requires serializing and deserializing any state that needs transmitting. resample_multiprocess is therefore not a drop-in replacement for resample: the latter can handle internal states that cannot be serialized, whereas the former cannot.

resample_serializing(particles : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Like resample, but performs serialization the same way resample_multiprocess does.

Use this to debug serialization problems without messing with actually spawning multiple processes.

resample_threaded(particles : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Like resample_multiprocess but uses threads rather than actual processes, and does not serialize, transmitting objects in shared memory instead.

Python’s global interpreter lock is likely to prevent any speed gains this might have produced.

Might be useful for debugging concurrency problems without messing with serialization and multiprocessing, but we expect such problems to be rare.

resample_thread_ser(particles : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Like resample_threaded, but serializes the same way resample_multiprocess does.

Python’s global interpreter lock is likely to prevent any speed gains this might have produced.

Might be useful for debugging concurrency+serialization problems without messing with actual multiprocessing, but then one is messing with multithreading.

likelihood_weight()
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Likelihood-weight the full particle set.

Resample all particles in the current set from the prior and reset their weights to the likelihood.

enumerative_diversify(scope : object, block : object)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Diversify the current particle set to represent the local posterior exactly.

Specifically:

  1. Compute the local posterior by enumeration of all possible values in the given scope and block.
  2. Fork every extant particle as many times as there are values.
  3. Give each new particle a relative weight proportional to the relative weight of its ancestor particle times the posterior probability of the chosen value.

Unlike most inference SPs, this transformation is deterministic.

This is useful together with collapse_equal and collapse_equal_map for implementing certain kinds of dynamic programs in Venture.

collapse_equal(scope : object, block : object)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Collapse the current particle set to represent the local posterior less redundantly.

Specifically:

  1. Bin all extant particles by the (joint) values they exhibit on all random variables in the given scope and block (specify a block of “none” to have one bin)
  2. Resample by relative weight within each bin, retaining one particle
  3. Set the relative weight of the retained particle to the sum of the weights of the particles that formed the bin

Viewed as an operation on only the random variables in the given scope and block, this is deterministic (the randomness only affects other values).

This is useful together with enumerative_diversify for implementing certain kinds of dynamic programs in Venture.
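
A hedged, schematic sketch of one step of such a dynamic program (the scope and block names are hypothetical):

[infer (do (enumerative_diversify (quote state) 2)  ; fork every particle over all values of block 2
           (collapse_equal (quote state) 2))]       ; keep one particle per distinct value, summing weights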

collapse_equal_map(scope : object, block : object)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Like collapse_equal but deterministically retain the max-weight particle.

And leave its weight unaltered, instead of adding in the weights of all the other particles in the bin.

draw_scaffold(scope : object, block : object, transitions : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Draw a visual representation of the scaffold indicated by the given scope and block.

This is useful for debugging. You probably do not want to specify more than 1 transition.

subsampled_mh(scope : object, block : object, Nbatch : int, k0 : int, epsilon : number, useDeltaKernels : bool, deltaKernelArgs : number, updateValues : bool, transitions : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Run a subsampled Metropolis-Hastings kernel per the Austerity MCMC paper.

Note: not all dependency structures that might occur in a scaffold are supported. See subsampled_mh_check_applicability.

Note: the resulting execution history may not actually be possible, so may confuse other transition kernels. See subsampled_mh_make_consistent and *_update.

subsampled_mh_check_applicability(scope : object, block : object, transitions : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Raise a warning if the given scope and block obviously do not admit subsampled MH.

From the source:

# Raise three types of warnings:
# - SubsampledScaffoldNotEffectiveWarning: calling subsampled_mh will be the
#   same as calling mh.
# - SubsampledScaffoldNotApplicableWarning: calling subsampled_mh will cause
#   incorrect behavior.
# - SubsampledScaffoldStaleNodesWarning: stale node will affect the
#   inference of other random variables. This is not a critical
#   problem but requires one to call makeConsistent before other
#   random nodes are selected as principal nodes.
#
# This method cannot check all potential problems caused by stale nodes.

subsampled_mh_make_consistent(scope : object, block : object, useDeltaKernels : bool, deltaKernelArgs : number, updateValues : bool, transitions : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Fix inconsistencies introduced by subsampled MH.

mh_kernel_update(scope : object, block : object, useDeltaKernels : bool, deltaKernelArgs : number, updateValues : bool, transitions : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Run a normal mh kernel, tolerating inconsistencies introduced by previous subsampled MH.

gibbs_update(scope : object, block : object, transitions : int[, in_parallel : bool])
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Run a normal gibbs kernel, tolerating inconsistencies introduced by previous subsampled MH.

pgibbs_update(scope : object, block : object, particles : int, transitions : int[, in_parallel : bool])
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Run a normal pgibbs kernel, tolerating inconsistencies introduced by previous subsampled MH.

incorporate()
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Make the history consistent with observations.

Specifically, modify the execution history so that the values of variables that have been observed since the last incorporate match the given observations. If there are multiple particles, also adjust their relative weights by the relative likelihoods of the observations being incorporated.

This is done automatically at the beginning of every infer command, but is also provided explicitly because it may be appropriate to invoke in the middle of complex inference programs that introduce new observations.
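
For example, a hedged sketch of invoking it mid-program (the model expression is hypothetical):

[infer (do (observe (normal mu 1) 4.2)
           (incorporate)
           (mh default one 10))]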

likelihood_at(scope : object, block : object)
Return type:proc(<foreignblob>) -> <pair <array <number>> <foreignblob>>

Compute and return the value of the local log likelihood at the given scope and block.

If there are stochastic nodes in the conditional regeneration graph, reuses their current values. This could be viewed as a one-sample estimate of the local likelihood.

(likelihood_at default all) is not the same as getGlobalLogScore because it does not count the scores of any nodes that cannot report likelihoods, or whose existence is conditional. likelihood_at also treats exchangeably coupled nodes correctly.

Compare posterior_at.

posterior_at(scope : object, block : object)
Return type:proc(<foreignblob>) -> <pair <array <number>> <foreignblob>>

Compute and return the value of the local log posterior at the given scope and block.

The principal nodes must be able to assess. Otherwise behaves like likelihood_at, except that it includes the log densities of non-observed stochastic nodes.

particle_log_weights(<foreignblob>)
Return type:<pair <array <number>> <foreignblob>>

Return the weights of all extant particles as an array of numbers (in log space).

set_particle_log_weights(<array <number>>)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Set the weights of the particles to the given array. It is an error if the length of the array differs from the number of particles.

load_plugin(filename, ...)
Return type:proc(<foreignblob>) -> <pair <object> <foreignblob>>

Load the plugin located at <filename>.

Any additional arguments to load_plugin are passed to the plugin’s __venture_start__ function, whose result is returned.

XXX: Currently, extra arguments must be VentureSymbols, which are unwrapped to Python strings for the plugin.

_call_back(<object>, ...)
Return type:proc(<foreignblob>) -> <pair <object> <foreignblob>>

A helper function for implementing the eponymous inference macro.

Calling it directly is likely to be difficult and unproductive.

_collect(<object>, ...)
Return type:proc(<foreignblob>) -> <pair <dataset> <foreignblob>>

A helper function for implementing the eponymous inference macro.

Calling it directly is likely to be difficult and unproductive.

printf(<dataset>)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Print model values collected in a dataset.

This is a basic debugging facility.

plot(<spec>, <dataset>)
Return type:()

Plot a data set according to a plot specification.

Example:

[define d (empty)]
[infer (do (assume x (normal 0 1))
           (repeat 1000
                   (do (mh default one 1)
                       (bind (collect x) (curry into d)))))]
(plot d)

will do 1000 iterations of MH collecting some standard data and the value of x, and then show a plot of the x variable (which should be a scalar) against the iteration number (from 1 to 1000), colored according to the global log score. See collect for details on collecting and labeling data to be plotted.

The format specifications are inspired loosely by the classic printf. To wit, each individual plot that appears on a page is specified by some line noise consisting of format characters matching the following regex:

[<geom>]*(<stream>?<scale>?){1,3}

specifying

  • the geometric objects to draw the plot with, and

  • for each dimension (x, y, and color, respectively)
    • the data stream to use
    • the scale

The possible geometric objects are:

  • _p_oint,
  • _l_ine,
  • _b_ar, and
  • _h_istogram

The possible data streams are:

  • _<an integer>_ that column in the data set, 0-indexed,
  • _%_ the next column after the last used one
  • iteration _c_ounter,
  • _t_ime (wall clock, since the beginning of the Venture program),
  • log _s_core, and
  • pa_r_ticle

The possible scales are:

  • _d_irect, and
  • _l_ogarithmic

If one stream is indicated for a 2-D plot (points or lines), the x axis is filled in with the iteration counter. If three streams are indicated, the third is mapped to color.

If the given specification is a list, make all those plots at once.

plotf(<spec>, <dataset>)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Plot a data set according to a plot specification.

Example:

[infer (let ((d (empty)))
         (do (assume x (normal 0 1))
             (repeat 1000
                     (do (mh default one 1)
                         (bind (collect x) (curry into d))))
             (plotf (quote c0s) d)))]

will do 1000 iterations of MH collecting some standard data and the value of x, and then show a plot of the x variable (which should be a scalar) against the iteration number (from 1 to 1000), colored according to the global log score. See collect for details on collecting and labeling data to be plotted.

The format specification language is identical to that of plot, described above.

plot_to_file(<basename>, <spec>, <dataset>)
Return type:()

Save plot(s) to file(s).

Like plot, but save the resulting plot(s) instead of displaying on screen. Just as <spec> may be either a single expression or a list, <basenames> may either be a single symbol or a list of symbols. The number of basenames must be the same as the number of specifications.

Examples:

(plot_to_file (quote basename) (quote spec) <dataset>) saves the plot specified by the spec in the file “basename.png”.

(plot_to_file (quote (basename1 basename2)) (quote (spec1 spec2)) <dataset>) saves the spec1 plot in the file basename1.png, and the spec2 plot in basename2.png.

plotf_to_file(<basename>, <spec>, <dataset>)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Save plot(s) to file(s).

Like plotf, but save the resulting plot(s) instead of displaying on screen. Just as <spec> may be either a single expression or a list, <basenames> may either be a single symbol or a list of symbols. The number of basenames must be the same as the number of specifications.

Examples:

(plotf_to_file (quote basename) (quote spec) <dataset>) saves the plot specified by the spec in the file “basename.png”.

(plotf_to_file (quote (basename1 basename2)) (quote (spec1 spec2)) <dataset>) saves the spec1 plot in the file basename1.png, and the spec2 plot in basename2.png.

sweep(<dataset>)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Print the iteration count.

Extracts the last row of the supplied inference Dataset and prints its iteration count.

Example:

(sweep d)

_assume(<symbol>, <expression>[, <label>])
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

A helper function for implementing the eponymous inference macro.

Calling it directly is likely to be difficult and unproductive.

_observe(<expression>, <object>[, <label>])
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

A helper function for implementing the eponymous inference macro.

Calling it directly is likely to be difficult and unproductive.

_force(<expression>, <object>)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

A helper function for implementing the eponymous inference macro.

Calling it directly is likely to be difficult and unproductive.

_predict(<expression>[, <label>])
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

A helper function for implementing the eponymous inference macro.

Calling it directly is likely to be difficult and unproductive.

_sample(<expression>)
Return type:proc(<foreignblob>) -> <pair <object> <foreignblob>>

A helper function for implementing the eponymous inference macro.

Calling it directly is likely to be difficult and unproductive.

_sample_all(<expression>)
Return type:proc(<foreignblob>) -> <pair <list> <foreignblob>>

A helper function for implementing the eponymous inference macro.

Calling it directly is likely to be difficult and unproductive.

_extract_stats(<expression>)
Return type:proc(<foreignblob>) -> <pair <object> <foreignblob>>

A helper function for implementing the eponymous inference macro.

Calling it directly is likely to be difficult and unproductive.

forget(<label>)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Forget an observation, prediction, or unused assumption.

Removes the directive indicated by the label argument from the model. If an assumption is forgotten, the symbol it binds disappears from scope; the behavior if that symbol was still referenced is unspecified.

freeze(<label>)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Freeze an assumption to its current sample.

Replaces the assumption indicated by the label argument with a constant whose value is that assumption’s current value (which may differ across particles). This has the effect of preventing future inference on that assumption, and decoupling it from its (former) dependencies, as well as reclaiming any memory of random choices that can no longer influence any toplevel value.

Together with forget, freeze makes it possible for particle filters in Venture to use model memory independent of the sequence length.
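
A hedged sketch of that pattern for one step of a particle filter (the labels and model expressions are illustrative):

[infer (do (assume x2 (normal x1 1) (quote step2))
           (observe (normal x2 1) 4.9 (quote obs2))
           (incorporate)
           (resample 10)
           (freeze (quote step2))
           (forget (quote obs2)))]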

empty()
Return type:<dataset>

Create an empty dataset into which the results of collect may be merged.

into(<foreignblob>, <foreignblob>)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Destructively merge the contents of the second argument into the first.

Right now only implemented on datasets created by empty and collect, but in principle generalizable to any monoid.

ordered_range(<object>, ...)
Return type:<list>

Deterministically construct a block descriptor covering a range of blocks; see (ordered_range <block> <block>) under Built-in Helpers below.

assert(<bool>[, message])
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Check the given boolean condition and raise an error if it fails.

print(<object>, ...)
Return type:()

Print the given values to the terminal.

new_model([<symbol>])
Return type:proc(<foreignblob>) -> <pair <model> <foreignblob>>

Create a new empty model.

The symbol, if supplied, gives the name of the backend to use, either puma or lite. If omitted, defaults to the same backend as the current implicit model.

This is an inference action rather than a pure operation due to implementation accidents. [It reads the Engine to determine the default backend to use (TODO could take that as an argument) and for the registry of bound foreign sps (TODO: Implicitly bind extant foreign sps into new models?)]

in_model(<model>, <action>)
Return type:proc(<foreignblob>) -> <pair <pair <object> <model>> <foreignblob>>

Run the given inference action against the given model.

Returns a pair consisting of the result of the action and the model, which is also mutated.

This is itself an inference action rather than a pure operation due to implementation accidents. [It invokes a method on the Engine to actually run the given action].

model_import_foreign(<name>)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Import the named registered foreign SP into the current model.

This is typically only necessary in conjunction with new_model, because foreign SPs are automatically imported into the model that is ambient at the time the foreign SP is bound by the ripl (which is usually the toplevel model).

The name must refer to an SP that was previously registered with Venture via ripl.register_foreign_sp or ripl.bind_foreign_sp. Binds that symbol to that procedure in the current model.

select(scope : object, block : object)
Return type:proc(<foreignblob>) -> <pair subproblem <foreignblob>>

Select the subproblem indicated by the given scope and block from the current model.

Does not interoperate with multiple particles, or with stochastic subproblem selection.

detach(<subproblem>)
Return type:proc(<foreignblob>) -> <pair <pair weight <rhoDB>> <foreignblob>>

Detach the current model along the given subproblem.

Return the current likelihood at the fringe, and a database of the old values that is suitable for restoring the current state (e.g., for rejecting a proposal).

Does not interoperate with multiple particles, or with custom proposals.

regen(<subproblem>)
Return type:proc(<foreignblob>) -> <pair weight <foreignblob>>

Regenerate the current model along the given subproblem.

Return the new likelihood at the fringe.

Does not interoperate with multiple particles, or with custom proposals.

restore(<subproblem>, <rhoDB>)
Return type:proc(<foreignblob>) -> <pair weight <foreignblob>>

Restore a former state of the current model along the given subproblem.

Does not interoperate with multiple particles.

detach_for_proposal(<subproblem>)
Return type:proc(<foreignblob>) -> <pair <pair weight <rhoDB>> <foreignblob>>

Detach the current model along the given subproblem, returning the local posterior.

Differs from detach in that it includes the log densities of the principal nodes in the returned weight, so as to match regen_with_proposal. The principal nodes must be able to assess.

Return the current posterior at the fringe, and a database of the old values for restoring the current state.

Does not interoperate with multiple particles.

regen_with_proposal(<subproblem>, <list>)
Return type:proc(<foreignblob>) -> <pair weight <foreignblob>>

Regenerate the current model along the given subproblem from the given values.

Differs from regen in that it deterministically moves the principal nodes to the given values rather than resimulating them from the prior, and includes the log densities of those nodes in the returned weight. The principal nodes must be able to assess.

Return the new posterior at the fringe.

Does not interoperate with multiple particles.

get_current_values(<subproblem>)
Return type:proc(<foreignblob>) -> <pair <list> <foreignblob>>

Get the current values of the principal nodes of the given subproblem.

Does not interoperate with multiple particles.

draw_subproblem(<subproblem>)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Draw a subproblem by printing out the source code of affected random choices.

pyexec(<code>)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Execute the given string as Python code, via exec.

The code is executed in an environment where the RIPL is accessible via the name ripl. Values from the ambient inference program are not directly accessible. The environment against which pyexec is executed persists across invocations of pyexec and pyeval.

pyeval(<code>)
Return type:proc(<foreignblob>) -> <pair <object> <foreignblob>>

Evaluate the given string as a Python expression, via eval.

The code is executed in an environment where the RIPL is accessible via the name ripl. Values from the ambient inference program are not directly accessible. The environment against which pyeval is evaluated persists across invocations of pyexec and pyeval.
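
For example, a hedged sketch (the Python fragments are illustrative):

[infer (do (pyexec "import math")
           (x <- (pyeval "math.log(2.0)"))
           (return x))]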

iterate(f : <inference action>, iterations : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Repeatedly apply the given action, suppressing the returned values.

repeat(iterations : int, f : <inference action>)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Repeatedly apply the given action, suppressing the returned values. This is the same as iterate, except for taking its arguments in the opposite order, as a convenience.

sequence(ks : list<inference action returning a>)
Return type:proc(<foreignblob>) -> <pair list<a> <foreignblob>>

Apply the given list of actions in sequence, returning the values. This is Haskell’s sequence.

sequence(ks : list<inference action>)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Apply the given list of actions in sequence, discarding the values. This is Haskell’s sequence_.

mapM(act : proc(a) -> <inference action returning b>, objs : list<a>)
Return type:proc(<foreignblob>) -> <pair list<b> <foreignblob>>

Apply the given action function to each given object and perform those actions in order. Return a list of the resulting values. The nomenclature is borrowed from Haskell.

imapM(act : proc(int, a) -> <inference action returning b>, objs : list<a>)
Return type:proc(<foreignblob>) -> <pair list<b> <foreignblob>>

Apply the given action function to each given object and its index in the list and perform those actions in order. Return a list of the resulting values.

for_each(objs : list<a>, act : proc(a) -> <inference action>)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Apply the given action function to each given object and perform those actions in order. Discard the results.

for_each_indexed(objs : list<a>, act : proc(int, a) -> <inference action>)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Apply the given action function to each given object and its index in the list and perform those actions in order. Discard the results.
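
For example, a hedged sketch that observes each datum in a list (the model variable mu is hypothetical):

(for_each (list 1.1 2.2 3.3)
          (lambda (datum)
            (observe (normal mu 1) datum)))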

pass(<foreignblob>)
Return type:<pair () <foreignblob>>

An inference action that does nothing and returns nil. Useful in the same sorts of situations as Python’s pass statement.

bind(<inference action returning a>, proc(a) -> <inference action returning b>)
Return type:proc(<foreignblob>) -> <pair b <foreignblob>>

Chain two inference actions sequentially, passing the value of the first into the procedure computing the second. This is Haskell’s bind, specialized to inference actions.

bind_(<inference action>, proc() -> <inference action returning a>)
Return type:proc(<foreignblob>) -> <pair a <foreignblob>>

Chain two inference actions sequentially, ignoring the value of the first. This is Haskell’s >> operator, specialized to inference actions.

Note that the second argument is a thunk that computes an inference action. This is important, because it defers computing the action to take until it is actually time to take it, preventing infinite loops in, e.g., unrolling the future action spaces of recursive procedures.
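
For illustration, a do block such as

(do (x <- (sample (normal 0 1)))
    (mh default one 1)
    (return x))

is, roughly, sugar for nested applications of bind and bind_:

(bind (sample (normal 0 1))
      (lambda (x)
        (bind_ (mh default one 1)
               (lambda () (return x)))))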

action(<object>)
Return type:proc(<foreignblob>) -> <pair <object> <foreignblob>>

Wrap an object, usually a non-inference function like plotf, as an inference action, so it can be used inside a do(...) block.

return(<object>)
Return type:proc(<foreignblob>) -> <pair <object> <foreignblob>>

An inference action that does nothing and just returns the argument passed to return.

curry(proc(<a>, <b>) -> <c>, <a>)
Return type:proc(<b>) -> <c>

Curry a two-argument function into two one-argument stages. Supports the idiom (bind (collect ...) (curry plotf (quote spec))).

curry3(proc(<a>, <b>, <c>) -> <d>, <a>, <b>)
Return type:proc(<c>) -> <d>

Curry a three-argument function into a two-argument stage and a one-argument stage. Supports the idiom (bind (collect ...) (curry plotf_to_file (quote name) (quote spec))).

global_likelihood(<foreignblob>)
Return type:<pair <number> <foreignblob>>

An inference action that computes and returns the global likelihood (in log space). Cost: O(size of trace).

global_posterior(<foreignblob>)
Return type:<pair <number> <foreignblob>>

An inference action that computes and returns the global posterior (in log space). Cost: O(size of trace).

global_posterior(<foreignblob>)
Return type:<pair () <foreignblob>>

An inference action that sets each particle to an independent sample from the full posterior (with respect to currently incorporated observations).

This is implemented by global rejection sampling (generalized to continuous equality constraints), so may take a while for problems where the posterior is far from the prior in KL divergence.

join_datasets(datasets : list<dataset>)
Return type:proc(<foreignblob>) -> <pair <dataset> <foreignblob>>

Merge all the given datasets into one.

accumulate_dataset(iterations : int, a : <inference action returning a dataset>)
Return type:proc(<foreignblob>) -> <pair <dataset> <foreignblob>>

Run the given inference action the given number of times, accumulating all the returned datasets into one.

For example,

(accumulate_dataset 1000
  (do (mh default one 10)
      (collect x)))

will return a dataset consisting of the values of x that occur at 10-step intervals in the history of a 10000-step default Markov chain on the current model.

reset_to_prior(<foreignblob>)
Return type:<pair () <foreignblob>>

Reset all particles to the prior. Also reset their weights to the likelihood.

This is equivalent to (likelihood_weight).

run(<inference action returning a>)
Return type:a

Run the given inference action and return its value.

default_markov_chain(transitions : int)
Return type:proc(<foreignblob>) -> <pair () <foreignblob>>

Take the requested number of steps of the default Markov chain.

The default Markov chain is single-site resimulation M-H.

(default_markov_chain k)

is equivalent to

(mh default one k)

Built-in Helpers

  • default: The default scope.

    The default scope contains all the random choices, each in its own block.

  • one: Mix over individual blocks in the scope.

    If given as a block keyword, one causes the inference procedure to uniformly choose one of the blocks in the scope on which it is invoked and apply to that.

  • all: Affect all choices in the scope.

    If given as a block keyword, all causes the inference procedure to apply to all random choices in the scope.

  • none: Affect no choices in the scope.

    If given as a block keyword, none causes the inference procedure to apply to no random choices. This is useful only for collapse_equal and collapse_equal_map.

  • ordered: Make particle Gibbs operate on all blocks in order of block ID.

  • (ordered_range <block> <block>): Make particle Gibbs operate on a range of blocks in order of block ID.

    Specifically, all the blocks whose IDs lie between the given lower and upper bounds.

Special Forms

All the macros available in the modeling language can be used in the inference language, too. In addition, the following inference macros are available.

  • (loop <kernel>): Run the given kernel continuously in a background thread.

    Available in Lite and Puma.

    Can only be used as the top level of the infer instruction: [infer (loop something)].

    Execute the [stop_continuous_inference] instruction to stop.

  • (do <stmt> <stmt> ...): Sequence actions that may return results.

    Each <stmt> except the last may either be

    • a kernel, in which case it is performed and any value it returns is dropped, or
    • a binder of the form (<variable> <- <kernel>) in which case the kernel is performed and its value is made available to the remainder of the do form by being bound to the variable.

    The last <stmt> may not be a binder and must be a kernel. The whole do expression is then a single compound heterogeneous kernel, whose value is the value returned by the last <stmt>.

    If you need a kernel that produces a value without doing anything, use (return <value>). If you need a kernel that does nothing and produces no useful value, you can use pass.

    For example, to make a kernel that does inference until some variable in the model becomes “true” (why would anyone want to do that?), you can write:

    1 [define my_strange_kernel (lambda ()
    2   (do
    3     (finish <- (sample something_from_the_model))
    4     (if finish
    5         pass
    6         (do
    7           (mh default one 1)
    8           (my_strange_kernel)))))]
    

    Line 3 is a binder for the do started on line 2, which makes finish a variable usable by the remainder of the procedure. The if starting on line 4 is a kernel, and is the last statement of the outer do. Line 7 is a non-binder statement for the inner do.

    The nomenclature is borrowed from the (in)famous do notation of Haskell. If this helps you think about it, Venture’s do is exactly Haskell do, except there is only one monad, which is essentially State ModelHistory. Randomness and actual i/o are not treated monadically, but just executed, which we can get away with because Venture is strict and doesn’t aspire to complete functional purity.

  • (begin <kernel> ...): Perform the given kernels in sequence.

  • (call_back <name> <model-expression> ...): Invoke a user-defined callback.

    Locate the callback registered under the given name and invoke it with

    • First, the Infer instance in which the present inference program is being run
    • Then, for each expression in the call_back form, a list of values for that expression, represented as stack dicts, sampled across all extant particles. The lists are parallel to each other.

    Return the value returned by the callback, or Nil if the callback returned None.

    To bind a callback, call the bind_callback method on the Ripl object:

    ripl.bind_callback(<name>, <callable>):
    
    Bind the given Python callable as a callback function that can be
    referred to by `call_back` by the given name (which is a string).
    

    There is an example in test/inference_language/test_callback.py.

  • (collect <model-expression> ...): Extract data from the underlying model during inference.

    When a collect inference command is executed, the given expressions are sampled and their values are returned in a Dataset object. This is the way to get data into datasets; see into for accumulating datasets, and printf, plotf, and plotf_to_file for using them.

    Each <model-expression> may optionally be given in the form (labelled <model-expression> <name>), in which case the given name serves as the key in the returned table of data. Otherwise, the key defaults to a string representation of the given expression.

    Note: The <model-expression>s are sampled in the _model_, not the inference program. For example, they may refer to variables assumed in the model, but may not refer to variables defined in the inference program. The <model-expression>s may be constructed programmatically: see unquote.

    collect also automatically collects some standard items: the iteration count (maintained by merging datasets), the particle id, the wall clock time that passed since the Venture program began, the global log score, the particle weights in log space, and the normalized weights of the particles in direct space.

    If you want to do something custom with the data, you will want to use the asPandas() method of the Dataset object from your callback or foreign inference sp.
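
    For example, a hedged sketch using the labelled form (the names, and the dataset d, are illustrative):

    (bind (collect x (labelled (* x x) x_squared))
          (curry into d))

    collects the model value x under its default key and its square under the key x_squared, merging the results into the dataset d.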

  • (assume <symbol> <model-expression> [<label>]): Programmatically add an assumption.

    Extend the underlying model by adding a new generative random variable, like the assume directive. The given model expression may be constructed programmatically – see unquote.

    The <label>, if supplied, may be used to freeze or forget this directive.

  • (observe <model-expression> <value> [<label>]): Programmatically add an observation.

    Condition the underlying model by adding a new observation, like the observe directive. The given model expression may be constructed programmatically – see unquote. The given value is computed in the inference program, and may be stochastic. This corresponds to conditioning a model on randomly chosen data.

    The <label>, if supplied, may be used to forget this observation.

    Note: Observations are buffered by Venture, and do not take effect immediately. Call incorporate when you want them to. incorporate is called automatically before every toplevel infer instruction, but if you are using observe inside a compound inference program, you may not execute another toplevel infer instruction for a while.

  • (predict <model-expression> [<label>]): Programmatically add a prediction.

    Extend the underlying model by adding a new generative random variable, like the predict directive. The given model expression may be constructed programmatically – see unquote.

    The <label>, if supplied, may be used to freeze or forget this directive.

  • (force <model-expression> <value>): Programmatically force the state of the model.

    Force the model to set the requested variable to the given value, without constraining it to stay that way. Implemented as an observe followed by a forget.

  • (sample <model-expression>): Programmatically sample from the model.

    Sample an expression from the underlying model by simulating a new generative random variable without adding it to the model, like the sample directive. If there are multiple particles, refers to the distinguished one.

    The given model expression may be constructed programmatically – see unquote.

  • (sample_all <model-expression>): Programmatically sample from the model in all particles.

    Sample an expression from the underlying model by simulating a new generative random variable without adding it to the model, like the sample directive.

    Unlike the sample directive, interacts with all the particles, and returns values from all of them as a list.

    The given model expression may be constructed programmatically – see unquote.

  • (extract_stats <model-expression>): Extract maintained statistics.

    Specifically, sample the given model expression, like sample, but expect it to return a stochastic procedure and reify and return the statistics about its applications that it has collected.

    The exact Venture-level representation of the returned statistics depends on the procedure in question. If the procedure does not collect statistics, return nil.

    For example:

    (assume coin (make_beta_bernoulli 1 1))
    (observe (coin) true)
    (incorporate)
    (extract_stats coin) --> (list 1 0)
    
  • (unquote <object>): Programmatically construct part of a model expression.

    All the <model-expression>s in the above special forms may be constructed programmatically. An undecorated expression is taken literally, but if (unquote ...) appears in such an expression, the code inside the unquote is executed in the inference program, and its result is spliced into the model program.

    For example, suppose one wanted to observe every value in a data set, but allow the model to know the index of the observation (e.g., to select observation models). For this, every observed model expression needs to be different, but in a programmable manner. Here is a way to do that:

    [define data ...]
    [define observe_after
      (lambda (i)
        (if (< i (length data))
            (begin
              (observe (obs_fun (unquote i)) (lookup data i)) ; (*)
              (observe_after (+ i 1)))
            pass))]
    [infer (observe_after 0)]
    

    Note the use of unquote on the line marked (*) to construct different observe expressions for each data element. To the underlying model, this will look like:

    [observe (obs_fun 0) <val0>]
    [observe (obs_fun 1) <val1>]
    [observe (obs_fun 2) <val2>]
    ...
    

Footnotes

[1]

For the interested, the way this is actually done is that each of the primitives documented here actually returns a procedure that accepts a reified object representing the sampled execution history of the model program, affects it, and returns a pair consisting of whatever value it wishes to return in the first slot, and the reified execution history in the second. This is relevant if you wish to define additional inference abstractions in Venture, or more complex combinations of them than the provided ones.

For those readers for whom this analogy is helpful, the above is exactly the type signature one would get by implementing a State monad on execution histories. This is why the do form is called do. The begin form is a simplification of do that drops all intermediate return values. As of this writing, there are no analogues to get, put, or runState.