Generative Probabilistic Graphics Programming

The idea of computer vision as the Bayesian inverse problem to computer graphics has a long history and an appealing elegance, but it has proved difficult to directly implement. Instead, most vision tasks are approached via complex bottom-up processing pipelines. Here we show that it is possible to write short, simple probabilistic graphics programs that define flexible generative models and to automatically invert them to interpret real-world images. Generative probabilistic graphics programs consist of a stochastic scene generator, a renderer based on graphics software, a stochastic likelihood model linking the renderer’s output and the data, and latent variables that adjust the fidelity of the renderer and the tolerance of the likelihood model. Representations and algorithms from computer graphics, originally designed to produce high-quality images, are instead used as the deterministic backbone for highly approximate and stochastic generative models. This formulation combines probabilistic programming, computer graphics, and approximate Bayesian computation, and depends only on general-purpose, automatic inference techniques. We describe two applications: reading sequences of degraded and adversarially obscured alphanumeric characters, and inferring 3D road models from vehicle-mounted camera images. Each of the probabilistic graphics programs we present relies on under 20 lines of probabilistic code, and supports accurate, approximately Bayesian inferences about ambiguous real-world images.
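The recipe above — a stochastic scene prior, a deterministic renderer, and a stochastic likelihood linking render to data, inverted by general-purpose inference — can be sketched in a few lines. The following is a minimal Python sketch using Metropolis–Hastings over a generic scene representation; the function names, the proposal scheme, and all hyperparameters are illustrative, not the paper's implementation (which relies on automatic inference in a probabilistic programming system).

```python
import numpy as np

def log_likelihood(rendered, observed, sigma):
    """Pixel-wise Gaussian likelihood linking the renderer's output to the data."""
    return -0.5 * np.sum((rendered - observed) ** 2) / sigma ** 2

def metropolis_hastings(observed, sample_prior, render, propose,
                        sigma=0.1, steps=1000, rng=None):
    """Generic inference over scene hypotheses: propose a perturbed scene,
    render it deterministically, and accept or reject by comparing the
    rendering to the observed image."""
    rng = rng or np.random.default_rng(0)
    scene = sample_prior(rng)
    score = log_likelihood(render(scene), observed, sigma)
    for _ in range(steps):
        candidate = propose(scene, rng)
        cand_score = log_likelihood(render(candidate), observed, sigma)
        # Accept with probability min(1, exp(cand_score - score)).
        if np.log(rng.random()) < cand_score - score:
            scene, score = candidate, cand_score
    return scene
```

With a toy one-parameter "renderer" (a uniform brightness field), the same loop recovers the latent scene variable from a noisy image; the real programs simply swap in richer scene priors and a graphics-based renderer.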

Get Paper | Get Supplemental

Decoding simple CAPTCHAs

A generative probabilistic graphics program for reading degraded text. The scene generator chooses letter identity (A-Z and digits 0-9), position, size and rotation at random. These random variables are fed into the renderer, along with the bandwidths of a series of spatial blur kernels (one per letter, another for the overall rendered image from the generative model, and another for the original input image). These blur kernels control the fidelity of the rendered image. The image returned by the renderer is compared to the data via a pixel-wise Gaussian likelihood model, whose variance is itself an unknown random variable.
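The scene prior described above can be sketched as follows. This is an illustrative Python rendition, not the paper's probabilistic code: the prior ranges, the dictionary scene representation, and the helper names (`sample_scene`, `gaussian_blur_1d_kernel`) are assumptions, and the actual glyph rasterizer is omitted.

```python
import numpy as np

# Letter identities: A-Z and digits 0-9.
GLYPHS = [chr(c) for c in range(ord('A'), ord('Z') + 1)] + [str(d) for d in range(10)]

def sample_scene(n_letters, width, height, rng):
    """Stochastic scene generator: identity, position, size and rotation per
    glyph, plus per-letter and global blur bandwidths controlling renderer
    fidelity, and the variance of the Gaussian likelihood. All prior ranges
    here are illustrative guesses, not the paper's hyperparameters."""
    letters = []
    for _ in range(n_letters):
        letters.append({
            "glyph": GLYPHS[rng.integers(len(GLYPHS))],
            "x": rng.uniform(0, width),
            "y": rng.uniform(0, height),
            "size": rng.uniform(10, 30),        # glyph size in pixels
            "angle": rng.uniform(-30, 30),      # rotation in degrees
            "blur": rng.uniform(0.5, 3.0),      # per-letter blur bandwidth
        })
    return {"letters": letters,
            "global_blur": rng.uniform(0.5, 3.0),    # whole-image blur
            "noise_sigma": rng.uniform(0.02, 0.2)}   # likelihood tolerance

def gaussian_blur_1d_kernel(sigma, radius=None):
    """Normalized discrete Gaussian kernel; convolving the rendered image
    with it (separably) softens the rendering before comparison to data."""
    radius = radius or int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()
```

Because the blur bandwidths and likelihood variance are themselves latent, inference can begin with a blurry, tolerant model and sharpen it as the hypothesized letters lock onto the data.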


Simple CAPTCHA examples

Four input images from our CAPTCHA corpus, along with the final results and convergence trajectory of typical inference runs. The first row is a highly cluttered synthetic CAPTCHA exhibiting extreme letter overlap. The second row is a CAPTCHA from TurboTax, the third row is a CAPTCHA from AOL, and the fourth row shows an example where our system makes errors on some runs. Our probabilistic graphics program did not originally support rotation, which was needed for the AOL CAPTCHAs; adding it required only 1 additional line of probabilistic code. See the main text for quantitative details, and supplemental material for the full corpus.

Parsing Road Scenes

A probabilistic program that infers simple 3D road models.


Road Scene Parsing Examples

An illustration of generative probabilistic graphics for 3D road finding. (a) Renderings of random samples from our scene prior, showing the surface-based image segmentation induced by each sample. (b) Representative test frames from the KITTI dataset [3]. (c) Maximum likelihood lane/non-lane classification of the images from (b) based solely on the best-performing single-training-frame appearance model (ignoring latent geometry). Geometric constraints are clearly needed for reliable road finding. (d) Results from [1]. (e) Typical inference results from the proposed generative probabilistic graphics approach on the images from (b). (f) Appearance model histograms (over quantized RGB values) from the best-performing single-training-frame appearance model for all four region types: lane, left offroad, right offroad and road.
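The appearance side of this model — per-region histograms over quantized RGB values, used to score the segmentation that a candidate 3D geometry induces — can be sketched as below. This is an assumed rendition: the quantization scheme, smoothing, and function names (`fit_histograms`, `geometry_log_likelihood`) are illustrative, and only the four region types from the caption are taken from the source.

```python
import numpy as np

# The four region types from the model.
REGIONS = ["lane", "left_offroad", "right_offroad", "road"]

def quantize(image, bins=8):
    """Map float RGB values in [0, 1] to bins**3 discrete colour indices."""
    q = np.minimum((image * bins).astype(int), bins - 1)
    return (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]

def fit_histograms(image, segmentation, bins=8):
    """Per-region appearance model: a smoothed histogram over quantized RGB
    values, estimated here from a single training frame."""
    n_colors = bins ** 3
    q = quantize(image, bins)
    hists = {}
    for r, name in enumerate(REGIONS):
        counts = np.bincount(q[segmentation == r], minlength=n_colors) + 1.0
        hists[name] = counts / counts.sum()   # Laplace-smoothed frequencies
    return hists

def geometry_log_likelihood(image, segmentation, hists, bins=8):
    """Score a candidate road geometry: each pixel's quantized colour is
    evaluated under the histogram of the region that the geometry's induced
    segmentation assigns it to."""
    q = quantize(image, bins)
    ll = 0.0
    for r, name in enumerate(REGIONS):
        ll += np.log(hists[name][q[segmentation == r]]).sum()
    return ll
```

Inference then searches over latent road geometries, preferring those whose induced segmentation explains the image colours well under these histograms, rather than classifying pixels independently.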

CAPTCHA Movie

GPGP for Road Scenes

Caltech Baseline for Road Scenes