This is a lightweight probabilistic programming system embedded in Guile Scheme, based on Oleg Kiselyov's HANSEI.^1
The library is divided into two modules:

(hansei model)
    defines the interface for building probabilistic models.
(hansei infer)
    implements a number of inference procedures for exploring and drawing conclusions from a model.
There are also a few example modules defined in the examples directory, one of which is described below by way of a tutorial.
This is a modified version of the "grass model" from Pearl (1988), based on an example by Kiselyov.^2 There is a matching module file and test script, which output traces of the computation as it unfolds. A memoized version is also included, but is not (yet) documented here.
The challenge is to calculate the most likely explanation for an observation, given an incomplete picture of the world which includes both local and environmental factors.
Our knowledge is given as follows:
Every other day is cloudy.
Rain occurs on one in five clear days, or four in five cloudy days.
One day in seven, the garden is watered with a sprinkler. On the remaining six days the garden is watered only when there are no clouds, and then only half the time.
After a rainy day, the roof of the house is wet seven times out of ten.
The soil in the garden stays wet nine out of ten times, either because it rained or because the sprinkler was run.
Which corresponds to the directed graph (Kiselyov, 2009)^3:

              +--> sprinkler? ---+
    cloudy? --+                  +--> wet garden?
              +--> rain? --------+
                       |
                       +-------------> wet roof?
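As a quick check of the numbers above, the first two facts alone fix the prior probability of rain, before any evidence is seen:

```scheme
;; P(rain) = P(cloudy) * P(rain | cloudy) + P(clear) * P(rain | clear)
(+ (* 1/2 4/5) (* 1/2 1/5))  ; => 1/2
```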
Now, we observe that the garden is wet. What is the likelihood that it rained, given the above?
The problem can be broken down into two steps:
1. Bind each node in the knowledge graph to a variable, using special distribution functions to represent uncertainty or noise. Relations between variables are modeled using ordinary Scheme functions.
2. Assert that all variables must match the observed evidence, tossing out any inconsistent explanations.
Since all of our variables contain boolean values, we only need two distribution functions, imported from the (hansei model) module:

flip
    returns a distribution with two values, representing a coin flip that comes up as #t or #f. The argument to flip defines the likelihood of the coin coming up #t.
fail
    returns a distribution with no values, representing an inconsistent world. Any part of the program which invokes fail is effectively cut off from the rest of the model.
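For intuition only, flip and fail can be pictured as returning weighted lists of outcomes. The toy-flip and toy-fail below are hypothetical stand-ins, not the library's (lazy, tree-based) implementation:

```scheme
;; A toy distribution: a list of (probability . value) pairs.
(define (toy-flip p)
  (list (cons p #t) (cons (- 1 p) #f)))

;; An inconsistent world has no outcomes at all.
(define (toy-fail)
  '())

(display (toy-flip 4/5))  ; => ((4/5 . #t) (1/5 . #f))
```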
(define (garden-model)
  (let* ((cloudy? (flip 1/2))
         (rain? (flip (if cloudy? 4/5 1/5)))
         (sprinkler? (or (flip 1/7)
                         (and (not cloudy?)
                              (flip 1/2))))
         (wet-roof? (and rain? (flip 7/10)))
         (wet-garden? (and (flip 9/10)
                           (or rain? sprinkler?))))
    (if wet-garden? rain?
        (fail))))
We use let* to bind our variables in order, writing flip wherever there is uncertainty. Each variable expresses its relation to those above it using ordinary boolean logic. Then, we make an observation — the garden is wet — and return the state of the variable which concerns us — the rain. We use fail to rule out explanations which are inconsistent with our observation.
Loading this module and running the garden-model procedure produces the following list:
((1/2 . #<<subforest> thunk: ... ()>>)
(1/2 . #<<subforest> thunk: ... ()>>))
There are two values, as we might expect, but they aren't immediately useful. The list represents a search tree, whose inner nodes are weighted according to the probability of a part of the model. They are evaluated lazily, hence they are encoded as thunks. The possible values for the variables are stored in the tree's leaves.
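The value of a leaf is recovered by multiplying the weights along the path leading to it. As a toy sketch (a hypothetical representation, not the library's actual one), suppose a tree is either a tagged leaf or a list of (weight . subtree) branches:

```scheme
;; A toy tree is either (leaf . value) or a list of
;; (weight . subtree) pairs. Flattening multiplies weights
;; down each path to a leaf.
(define (flatten p tree)
  (if (and (pair? tree) (eq? (car tree) 'leaf))
      (list (cons p (cdr tree)))
      (apply append
             (map (lambda (branch)
                    (flatten (* p (car branch)) (cdr branch)))
                  tree))))

(display
 (flatten 1 (list (cons 1/2 (list (cons 4/5 '(leaf . #t))
                                  (cons 1/5 '(leaf . #f))))
                  (cons 1/2 '(leaf . #f)))))
;; => ((2/5 . #t) (1/10 . #f) (1/2 . #f))
```

Note that the two #f leaves above remain separate entries here; the library's traversal additionally merges duplicate values, which would tabulate them as a single entry of weight 3/5.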
We need a way to traverse the trees and collect these values, calculating probabilities based on the weights of the internal nodes. We do so using explore from the (hansei infer) module, which tabulates the posterior odds of each variable's set of unique values (since there may be more than one way to arrive at the same value), and returns the result as a newly flattened tree.

First, however, we have to finalize our model using reify. Reifying a model prevents it from leaking into the rest of the program — in this case, into the call to explore.
(define reified-garden-model (reify garden-model))
(define (explore-garden-model) (explore reified-garden-model))
We can now import this module and run the inference procedure explore-garden-model, yielding:
((153/700 . #(#f)) (9/20 . #(#t)))
Now we have the answers we wanted, but in rational form. We could convert them to floating point using exact->inexact and then compare them, or do some mental math and estimate that rain is more likely by a factor of about two. On the other hand, the exact values preserve a kind of program trace — we can see the effect the weekly sprinkler schedule had on the odds of no rain, for example, because its denominator is a multiple of seven.
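The comparison can be carried out directly at the Guile REPL:

```scheme
(exact->inexact 153/700)  ; posterior weight for no rain, about 0.2186
(exact->inexact 9/20)     ; posterior weight for rain, 0.45
(/ 9/20 153/700)          ; odds ratio => 35/17, about 2.06
```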
The final operation we need in order to compare the results is also exported by the inference module:
(normalize (explore-garden-model))
Running normalize on the explored model yields two values — a number representing the fraction of the search space explored (that is, the portion consistent with our observation), and the normalized answers, exact and over a common denominator:
117/175
((17/52 . #(#f)) (35/52 . #(#t)))
Indeed, the likelihood of rain versus no rain, given the wet garden, is about 2:1.
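As an independent sanity check, the same posterior weights can be computed in plain Guile, without the library, by exhaustively enumerating the model's five coin flips (flip* is a hypothetical helper, not the library's flip):

```scheme
;; Enumerate every outcome of the five coins in the grass model and
;; accumulate the weight of worlds with a wet garden, split by rain.
(define (flip* p) (list (cons p #t) (cons (- 1 p) #f)))

(define (posterior)
  (let ((rain-wet 0) (no-rain-wet 0))
    (for-each
     (lambda (c)                          ; cloudy?
       (for-each
        (lambda (r)                       ; rain?
          (for-each
           (lambda (s1)                   ; weekly sprinkler coin
             (for-each
              (lambda (s2)                ; clear-day sprinkler coin
                (for-each
                 (lambda (g)              ; soil-stays-wet coin
                   (let* ((w (* (car c) (car r) (car s1)
                                (car s2) (car g)))
                          (sprinkler? (or (cdr s1)
                                          (and (not (cdr c)) (cdr s2))))
                          (wet-garden? (and (cdr g)
                                            (or (cdr r) sprinkler?))))
                     (when wet-garden?
                       (if (cdr r)
                           (set! rain-wet (+ rain-wet w))
                           (set! no-rain-wet (+ no-rain-wet w))))))
                 (flip* 9/10)))
              (flip* 1/2)))
           (flip* 1/7)))
        (flip* (if (cdr c) 4/5 1/5))))
     (flip* 1/2))
    (list no-rain-wet rain-wet)))

(display (posterior))  ; => (153/700 9/20)
```

The weights agree with those produced by explore, which lends some confidence that the model encodes the knowledge list faithfully.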