* \section about About libDAI * libDAI is a free/open source C++ library (licensed under GPL 2+) that provides * implementations of various (approximate) inference methods for discrete * graphical models. libDAI supports arbitrary factor graphs with discrete * variables; this includes discrete Markov Random Fields and Bayesian * Networks. * * The library is targeted at researchers. To be able to use the library, a * good understanding of graphical models is needed. * * The best way to use libDAI is by writing C++ code that invokes the library; * in addition, part of the functionality is accessibly by using the * - command line interface * - (limited) MatLab interface * - (experimental) python interface * - (experimental) octave interface. * * libDAI can be used to implement novel (approximate) inference algorithms * and to easily compare the accuracy and performance with existing algorithms * that have been implemented already. * * A solver using libDAI was amongst the three winners of the UAI 2010 Approximate * Inference Challenge (see http://www.cs.huji.ac.il/project/UAI10/ for more * information). The full source code is provided as part of the library. * * \section features Features * Currently, libDAI supports the following (approximate) inference methods: * - Exact inference by brute force enumeration; * - Exact inference by junction-tree methods; * - Mean Field; * - Loopy Belief Propagation [\ref KFL01]; * - Fractional Belief Propagation [\ref WiH03]; * - Tree-Reweighted Belief Propagation [\ref WJW03]; * - Tree Expectation Propagation [\ref MiQ04]; * - Generalized Belief Propagation [\ref YFW05]; * - Double-loop GBP [\ref HAK03]; * - Various variants of Loop Corrected Belief Propagation * [\ref MoK07, \ref MoR05]; * - Gibbs sampler; * - Conditioned Belief Propagation [\ref EaG09]; * - Decimation algorithm. * * These inference methods can be used to calculate partition sums, marginals * over subsets of variables, and MAP states (the joint state of variables that * has maximum probability). * * In addition, libDAI supports parameter learning of conditional probability * tables by Expectation Maximization. * * \section limitations Limitations * libDAI is not intended to be a complete package for approximate inference. * Instead, it should be considered as an "inference engine", providing * various inference methods. In particular, it contains no GUI, currently * only supports its own file format for input and output (although support * for standard file formats may be added later), and provides very limited * visualization functionalities. The only learning method supported currently * is Expectation Maximization (or Maximum Likelihood if no data is missing) * for learning factor parameters. * * \section rationale Rationale * * In my opinion, the lack of open source "reference" implementations hampers * progress in research on approximate inference. Methods differ widely in terms * of quality and performance characteristics, which also depend in different * ways on various properties of the graphical models. Finding the best * approximate inference method for a particular application therefore often * requires empirical comparisons. However, implementing and debugging these * methods takes a lot of time which could otherwise be spent on research. I hope * that this code will aid researchers to be able to easily compare various * (existing as well as new) approximate inference methods, in this way * accelerating research and stimulating real-world applications of approximate * inference. * * \section language Language * Because libDAI is implemented in C++, it is very fast compared with * implementations in MatLab (a factor 1000 faster is not uncommon). * libDAI does provide a (limited) MatLab interface for easy integration with MatLab. * It also provides a command line interface and experimental python and octave * interfaces (thanks to Patrick Pletscher). * * \section compatibility Compatibility * * The code has been developed under Debian GNU/Linux with the GCC compiler suite. * libDAI compiles successfully with g++ versions 3.4 up to 4.4. * * libDAI has also been successfully compiled with MS Visual Studio 2008 under Windows * (but not all build targets are supported yet) and with Cygwin under Windows. * * Finally, libDAI has been compiled successfully on MacOS X. * * \section download Downloading libDAI * The libDAI sources and documentation can be downloaded from the libDAI website: * http://www.libdai.org. * * \section support Mailing list * The Google group "libDAI" (http://groups.google.com/group/libdai) * can be used for getting support and discussing development issues. */ /** \page license License *

* \section license-license License * * libDAI is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * libDAI is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * *

* \section license-gpl GNU General Public License version 2 * * \verbinclude COPYING */ /** \page citations Citing libDAI *

* \section citations-citations Citing libDAI * * If you write a scientific paper describing research that made substantive use * of this program, please cite the software appropriately, by mentioning the * fashion in which this software was used, including the version number. * * An appropriate citation would be:\n * * Joris M. Mooij et al. (2010) "libDAI 0.2.6: A free/open source C++ library for Discrete * Approximate Inference", http://www.libdai.org * * or in BiBTeX format: * *

* \@misc{mooij2010libdai, * author = "Joris M. Mooij et al.", * title = "lib{DAI} 0.2.6: A free/open source {C}++ library for {D}iscrete {A}pproximate {I}nference", * howpublished = "http://www.libdai.org/", * year = 2010 * } ** * Moreover, as a personal note, I would appreciate it to be informed about any * publications using libDAI at joris dot mooij at libdai dot org. */ /** \page authors Authors * \section authors-authors People who contributed to libDAI * * \verbinclude AUTHORS */ /** \page build Building libDAI *

* \section build-unix Building libDAI under UNIX variants (Linux / Cygwin / Mac OS X) * * \subsection build-unix-preparations Preparations * * You need: * - a recent version of gcc (at least version 3.4) * - GNU make * - recent boost C++ libraries (at least version 1.37; however, * version 1.37 shipped with Ubuntu 9.04 is known not to work) * - doxygen (only for building the documentation) * - graphviz (only for using some of the libDAI command line utilities) * - CImg library (only for building the image segmentation example) * * On Debian/Ubuntu, you can easily install the required packages with a single command: *

apt-get install g++ make doxygen graphviz libboost-dev libboost-graph-dev libboost-program-options-dev libboost-test-dev cimg-dev* (root permissions needed). * * On Mac OS X (10.4 is known to work), these packages can be installed easily via MacPorts. * If MacPorts is not already installed, install it according to the instructions at http://www.macports.org/. * Then, a simple *

sudo port install gmake boost doxygen graphviz* should be enough to install everything that is needed. * * On Cygwin, the prebuilt Cygwin package boost-1.33.1-x is known not to work. * You can however obtain the latest boost version (you need at least 1.37.0) * from http://www.boost.org/ and build it as described in the next subsection. * * \subsubsection build-unix-boost Building boost under Cygwin * * - Download the latest boost libraries from http://www.boost.org * - Build the required boost libraries using: *

* ./bootstrap.sh --with-libraries=program_options,math,graph,test --prefix=/boost_root/ * ./bjam* - In order to use dynamic linking, the boost .dll's should be somewhere in the path. * This can be achieved by a command like: *

* export PATH=$PATH:/boost_root/stage/lib* * * \subsection build-unix-libdai Building libDAI * * To build the libDAI source, first copy a template Makefile.* to Makefile.conf * (for example, copy Makefile.LINUX to Makefile.conf if you use GNU/Linux). * Then, edit the Makefile.conf template to adapt it to your local setup. * Especially directories may differ from system to system. Platform independent * build options can be set in Makefile.ALL. Finally, run *

make* The build includes a regression test, which may take a while to complete. * * If the build is successful, you can test the example program: *

examples/example tests/alarm.fg* or the more extensive test program: *

tests/testdai --aliases tests/aliases.conf --filename tests/alarm.fg --methods JTREE_HUGIN BP_SEQMAX* * *

* \section build-windows Building libDAI under Windows * * \subsection build-windows-preparations Preparations * * You need: * - A recent version of MicroSoft Visual Studio (2008 is known to work) * - recent boost C++ libraries (version 1.37 or higher) * - GNU make (can be obtained from http://gnuwin32.sourceforge.net) * - CImg library (only for building the image segmentation example) * * For the regression test, you need: * - GNU diff, GNU sed (can be obtained from http://gnuwin32.sourceforge.net) * * \subsubsection build-windows-boost Building boost under Windows * * Because building boost under Windows is tricky, I provide some guidance here. * * - Download the boost zip file from http://www.boost.org/users/download * and unpack it somewhere. * - Download the bjam executable from http://www.boost.org/users/download * and unpack it somewhere else. * - Download Boost.Build (v2) from http://www.boost.org/docs/tools/build/index.html * and unpack it yet somewhere else. * - Edit the file \c boost-build.jam in the main boost directory to change the * \c BOOST_BUILD directory to the place where you put Boost.Build (use UNIX * / instead of Windows \ in pathnames). * - Copy the \c bjam.exe executable into the main boost directory. * Now if you issue

* bjam --with-graph --with-math --with-program_options --with-test link=static runtime-link=shared* * \subsection build-windows-libdai Building libDAI * * To build the source, copy Makefile.WINDOWS to Makefile.conf. Then, edit * Makefile.conf to adapt it to your local setup. Platform independent * build options can be set in Makefile.ALL. Finally, run (from the command line) *

make* The build includes a regression test, which may take a while to complete. * * If the build is successful, you can test the example program: *

examples\\example tests\\alarm.fg* or the more extensive test program: *

tests\\testdai --aliases tests\\aliases.conf --filename tests\\alarm.fg --methods JTREE_HUGIN BP_SEQMAX* * *

* \section build-matlab Building the libDAI MatLab interface * * You need: * - MatLab * - The platform-dependent requirements described above * * First, you need to build the libDAI source as described above for your * platform. By default, the MatLab interface is disabled, so before compiling the * source, you have to enable it in Makefile.ALL by setting *

WITH_MATLAB=true* Also, you have to configure the MatLab-specific parts of * Makefile.conf to match your system (in particular, the Makefile variables ME, * MATLABDIR and MEX). The MEX file extension depends on your platform; for a * 64-bit linux x86_64 system this would be "ME=.mexa64", for a 32-bit linux x86 * system "ME=.mexglx". If you are unsure about your MEX file * extension: it needs to be the same as what the MatLab command "mexext" returns. * The required MEX files are built by issuing *

make* from the command line. The MatLab interface is much less powerful than using * libDAI from C++. There are two reasons for this: (i) it is boring to write MEX * files; (ii) the large performance penalty paid when large data structures (like * factor graphs) have to be converted between their native C++ data structure to * something that MatLab understands. * * A simple example of how to use the MatLab interface is the following (entered * at the MatLab prompt), which performs exact inference by the junction tree * algorithm and approximate inference by belief propagation on the ALARM network: *

cd path_to_libdai/matlab * [psi] = dai_readfg ('../tests/alarm.fg'); * [logZ,q,md,qv,qf] = dai (psi, 'JTREE', '[updates=HUGIN,verbose=0]') * [logZ,q,md,qv,qf] = dai (psi, 'BP', '[updates=SEQMAX,tol=1e-9,maxiter=10000,logdomain=0]')* where "path_to_libdai" has to be replaced with the directory in which libDAI * was installed. For other algorithms and some default parameters, see the file * tests/aliases.conf. * *

* \section build-doxygen Building the documentation * * Install doxygen, graphviz and a TeX distribution and use *

make doc* to build the documentation. If the documentation is not clear enough, feel free * to send me an email (or even better, to improve the documentation and send a patch!). * The documentation can also be browsed online at http://www.libdai.org. */ /** \page changelog Change Log * \verbinclude ChangeLog */ /** \page terminology Terminology and conventions * * \section terminology-graphicalmodels Graphical models * * Commonly used graphical models are Bayesian networks and Markov random fields. * In libDAI, both types of graphical models are represented by a slightly more * general type of graphical model: a factor graph [\ref KFL01]. * * An example of a Bayesian network is: * \dot * digraph bayesnet { * size="1,1"; * x0 [label="0"]; * x1 [label="1"]; * x2 [label="2"]; * x3 [label="3"]; * x4 [label="4"]; * x0 -> x1; * x0 -> x2; * x1 -> x3; * x1 -> x4; * x2 -> x4; * } * \enddot * The probability distribution of a Bayesian network factorizes as: * \f[ P(\mathbf{x}) = \prod_{i\in\mathcal{V}} P(x_i \,|\, x_{\mathrm{pa}(i)}) \f] * where \f$\mathrm{pa}(i)\f$ are the parents of node \a i in a DAG. * * The same probability distribution can be represented as a Markov random field: * \dot * graph mrf { * size="1.5,1.5"; * x0 [label="0"]; * x1 [label="1"]; * x2 [label="2"]; * x3 [label="3"]; * x4 [label="4"]; * x0 -- x1; * x0 -- x2; * x1 -- x2; * x1 -- x3; * x1 -- x4; * x2 -- x4; * } * \enddot * * The probability distribution of a Markov random field factorizes as: * \f[ P(\mathbf{x}) = \frac{1}{Z} \prod_{C\in\mathcal{C}} \psi_C(x_C) \f] * where \f$ \mathcal{C} \f$ are the cliques of an undirected graph, * \f$ \psi_C(x_C) \f$ are "potentials" or "compatibility functions", and * \f$ Z \f$ is the partition sum which properly normalizes the probability * distribution. * * Finally, the same probability distribution can be represented as a factor graph: * \dot * graph factorgraph { * size="1.8,1"; * x0 [label="0"]; * x1 [label="1"]; * x2 [label="2"]; * x3 [label="3"]; * x4 [label="4"]; * f01 [shape="box",label=""]; * f02 [shape="box",label=""]; * f13 [shape="box",label=""]; * f124 [shape="box",label=""]; * x0 -- f01; * x1 -- f01; * x0 -- f02; * x2 -- f02; * x1 -- f13; * x3 -- f13; * x1 -- f124; * x2 -- f124; * x4 -- f124; * } * \enddot * * The probability distribution of a factor graph factorizes as: * \f[ P(\mathbf{x}) = \frac{1}{Z} \prod_{I\in \mathcal{F}} f_I(x_I) \f] * where \f$ \mathcal{F} \f$ are the factor nodes of a factor graph (a * bipartite graph consisting of variable nodes and factor nodes), * \f$ f_I(x_I) \f$ are the factors, and \f$ Z \f$ is the partition sum * which properly normalizes the probability distribution. * * Looking at the expressions for the joint probability distributions, * it is obvious that Bayesian networks and Markov random fields can * both be easily represented as factor graphs. Factor graphs most * naturally express the factorization structure of a probability * distribution, and hence are a convenient representation for approximate * inference algorithms, which all try to exploit this factorization. * This is why libDAI uses a factor graph as representation of a * graphical model, implemented in the dai::FactorGraph class. * * \section terminology-inference Inference tasks * * Given a factor graph, specified by the variable nodes \f$\{x_i\}_{i\in\mathcal{V}}\f$ * the factor nodes \f$ \mathcal{F} \f$, the graph structure, and the factors * \f$\{f_I(x_I)\}_{I\in\mathcal{F}}\f$, the following tasks are important: * * - Calculating the partition sum: * \f[ Z = \sum_{\mathbf{x}_{\mathcal{V}}} \prod_{I \in \mathcal{F}} f_I(x_I) \f] * - Calculating the marginal distribution of a subset of variables * \f$\{x_i\}_{i\in A}\f$: * \f[ P(\mathbf{x}_{A}) = \frac{1}{Z} \sum_{\mathbf{x}_{\mathcal{V}\setminus A}} \prod_{I \in \mathcal{F}} f_I(x_I) \f] * - Calculating the MAP state which has the maximum probability mass: * \f[ \mathrm{argmax}_{\mathbf{x}}\,\prod_{I\in\mathcal{F}} f_I(x_I) \f] * * libDAI offers several inference algorithms, which solve (a subset of) these tasks either * approximately or exactly, for factor graphs with discrete variables. The following * algorithms are implemented: * * Exact inference: * - Brute force enumeration: dai::ExactInf * - Junction-tree method: dai::JTree * * Approximate inference: * - Mean Field: dai::MF * - (Loopy) Belief Propagation: dai::BP [\ref KFL01] * - Fractional Belief Propagation: dai::FBP [\ref WiH03] * - Tree-Reweighted Belief Propagation: dai::TRWBP [\ref WJW03] * - Tree Expectation Propagation: dai::TreeEP [\ref MiQ04] * - Generalized Belief Propagation: dai::HAK [\ref YFW05] * - Double-loop GBP: dai::HAK [\ref HAK03] * - Loop Corrected Belief Propagation: dai::MR [\ref MoR05] and dai::LC [\ref MoK07] * - Gibbs sampling: dai::Gibbs * - Conditioned Belief Propagation: dai::CBP [\ref EaG09] * - Decimation algorithm: dai::DecMAP * * Not all inference tasks are implemented by each method: calculating MAP states * is only possible with dai::JTree, dai::BP and dai::DECMAP; calculating partition sums is * not possible with dai::MR, dai::LC and dai::Gibbs. * * \section terminology-learning Parameter learning * * In addition, libDAI supports parameter learning of conditional probability * tables by Expectation Maximization (or Maximum Likelihood, if there is no * missing data). This is implemented in dai::EMAlg. * * \section terminology-variables-states Variables and states * * Linear states are a concept that is used often in libDAI, for example for storing * and accessing factors, which are functions mapping from states of a set of variables * to the real numbers. Internally, a factor is stored as an array, and the array index * of an entry corresponds with the linear state of the set of variables. Below we will * define variables, states and linear states of (sets of) variables. * * \subsection terminology-variables Variables * * Each (random) \a variable has a unique identifier, its \a label (which has * a non-negative integer value). If two variables have the same * label, they are considered as identical. A variable can take on a finite * number of different values or \a states. * * We use the following notational conventions. The discrete * random variable with label \f$l\f$ is denoted as \f$x_l\f$, and the number * of possible values of this variable as \f$S_{x_l}\f$ or simply \f$S_l\f$. * The set of possible values of variable \f$x_l\f$ is denoted * \f$X_l := \{0,1,\dots,S_l-1\}\f$ and called its \a state \a space. * * \subsection terminology-variable-sets Sets of variables and the canonical ordering * * Let \f$A := \{x_{l_1},x_{l_2},\dots,x_{l_n}\}\f$ be a set of variables. * * The \a canonical \a ordering of the variables in \a A is induced by their labels. * That is: if \f$l_1 < l_2\f$, then \f$x_{l_1}\f$ occurs before \f$x_{l_2}\f$ in the * canonical ordering. Below, we will assume that \f$(l_i)_{i=1}^n\f$ is * ordered according to the canonical ordering, i.e., \f$l_1 < l_2 < \dots < l_n\f$. * * \subsection terminology-variable-states States and linear states of sets of variables * * A \a state of the variables in \a A refers to a joint assignment of the * variables, or in other words, to an element of the Cartesian product * \f$ \prod_{i=1}^n X_{l_i}\f$ of the state spaces of the variables in \a A. * Note that a state can also be interpreted as a mapping from variables (or * variable labels) to the natural numbers, which assigns to a variable (or its * label) the corresponding state of the variable. * * A state of \a n variables can be represented as an n-tuple of * non-negative integers: \f$(s_1,s_2,\dots,s_n)\f$ corresponds to the * joint assignment \f$x_{l_1} = s_1, \dots, x_{l_n} = s_n\f$. * Alternatively, a state can be represented compactly as one non-negative integer; * this representation is called a \a linear \a state. The linear state * \a s corresponding to the state \f$(s_1,s_2,\dots,s_n)\f$ would be: * \f[ * s := \sum_{i=1}^n s_i \prod_{j=1}^{i-1} S_{l_j} * = s_1 + s_2 S_{l_1} + s_3 S_{l_1} S_{l_2} + \dots + s_n S_{l_1} \cdots S_{l_{n-1}}. * \f] * * Vice versa, given a linear state \a s for the variables \a A, the * corresponding state \f$s_i\f$ of the \a i 'th variable \f$x_{l_i}\f$ (according to * the canonical ordering of the variables in \a A) is given by * \f[ * s_i = \left\lfloor\frac{s \mbox { mod } \prod_{j=1}^i S_{l_j}}{\prod_{j=1}^{i-1} S_{l_j}}\right\rfloor. * \f] * * Finally, the \a number \a of \a states of the set of variables \a A is simply the * number of different joint assignments of the variables, that is, \f$\prod_{i=1}^n S_{l_i}\f$. */ /** \page fileformats libDAI file formats * * \section fileformats-factorgraph Factor graph (.fg) file format * * This section describes the .fg file format used in libDAI to store factor graphs. * Markov Random Fields are special cases of factor graphs, as are Bayesian * networks. A factor graph can be specified as follows: for each factor, one has * to specify which variables occur in the factor, what their respective * cardinalities (i.e., number of possible values) are, and a table listing all * the values of that factor for all possible configurations of these variables. * * A .fg file is not much more than that. It starts with a line containing the * number of factors in that graph, followed by an empty line. Then all factors * are specified, using one block for each factor, where the blocks are seperated * by empty lines. Each variable occurring in the factor graph has a unique * identifier, its label (which should be a nonnegative integer). Comment lines * which start with # are ignored. * * \subsection fileformats-factorgraph-factor Factor block format * * Each block describing a factor starts with a line containing the number of * variables in that factor. The second line contains the labels of these * variables, seperated by spaces (labels are nonnegative integers and to avoid * confusion, it is suggested to start counting at 0). The third line contains * the number of possible values of each of these variables, also seperated by * spaces. Note that there is some redundancy here, since if a variable appears * in more than one factor, the cardinality of that variable appears several * times in the .fg file; obviously, these cardinalities should be consistent. * The fourth line contains the number of nonzero entries * in the factor table. The rest of the lines contain these nonzero entries; * each line consists of a table index, followed by white-space, followed by the * value corresponding to that table index. The most difficult part is getting * the indexing right. The convention that is used is that the left-most * variables cycle through their values the fastest (similar to MatLab indexing * of multidimensional arrays). * * \subsubsection fileformats-factorgraph-factor-example Example * * An example block describing one factor is: * *

* 3 * 4 8 7 * 3 2 2 * 11 * 0 0.1 * 1 3.5 * 2 2.8 * 3 6.3 * 4 8.4 * 6 7.4 * 7 2.4 * 8 8.9 * 9 1.3 * 10 1.6 * 11 2.6 ** * which corresponds to the following factor: * * \f[ * \begin{array}{ccc|c} * x_4 & x_8 & x_7 & \mbox{value}\\ * \hline * 0 & 0 & 0 & 0.1\\ * 1 & 0 & 0 & 3.5\\ * 2 & 0 & 0 & 2.8\\ * 0 & 1 & 0 & 6.3\\ * 1 & 1 & 0 & 8.4\\ * 2 & 1 & 0 & 0.0\\ * 0 & 0 & 1 & 7.4\\ * 1 & 0 & 1 & 2.4\\ * 2 & 0 & 1 & 8.9\\ * 0 & 1 & 1 & 1.3\\ * 1 & 1 & 1 & 1.6\\ * 2 & 1 & 1 & 2.6 * \end{array} * \f] * * Note that the value of \f$x_4\f$ changes fastest, followed by that of \f$x_8\f$, and \f$x_7\f$ * varies the slowest, corresponding to the second line of the block ("4 8 7"). * Further, \f$x_4\f$ can take on three values, and \f$x_8\f$ and \f$x_7\f$ each have two possible * values, as described in the third line of the block ("3 2 2"). The table * contains 11 non-zero entries (all except for the fifth entry). Note that the * eleventh and twelveth entries are interchanged. * * A final note: the internal representation in libDAI of the factor above is * different, because the variables are ordered according to their indices * (i.e., the ordering would be \f$x_4 x_7 x_8\f$) and the values of the table are * stored accordingly, with the variable having the smallest index changing * fastest: * * \f[ * \begin{array}{ccc|c} * x_4 & x_7 & x_8 & \mbox{value}\\ * \hline * 0 & 0 & 0 & 0.1\\ * 1 & 0 & 0 & 3.5\\ * 2 & 0 & 0 & 2.8\\ * 0 & 1 & 0 & 7.4\\ * 1 & 1 & 0 & 2.4\\ * 2 & 1 & 0 & 8.9\\ * 0 & 0 & 1 & 6.3\\ * 1 & 0 & 1 & 8.4\\ * 2 & 0 & 1 & 0.0\\ * 0 & 1 & 1 & 1.3\\ * 1 & 1 & 1 & 1.6\\ * 2 & 1 & 1 & 2.6 * \end{array} * \f] * * * \section fileformats-evidence Evidence (.tab) file format * * This section describes the .tab fileformat used in libDAI to store "evidence", * i.e., a data set consisting of multiple samples, where each sample is the * observed joint state of some variables. * * A .tab file is a tabular data file, consisting of a header line, followed by * an empty line, followed by the data points, with one line for each data point. * Each line (apart from the empty one) should have the same number of columns, * where columns are separated by one tab character. Each column corresponds to * a variable. The header line consists of the variable labels (corresponding to * dai::Var::label()). The other lines are observed joint states of the variables, i.e., * each line corresponds to a joint observation of the variables, and each column * of a line contains the state of the variable associated with that column. * Missing data is handled simply by having two consecutive tab characters, * without any characters in between. * * \subsection fileformats-evidence-example Example * *

* 1 3 2 * * 0 0 1 * 1 0 1 * 1 1 ** * This would correspond to a data set consisting of three observations concerning * the variables with labels 1, 3 and 2; the first observation being * \f$x_1 = 0, x_3 = 0, x_2 = 1\f$, the second observation being * \f$x_1 = 1, x_3 = 0, x_2 = 1\f$, and the third observation being * \f$x_1 = 1, x_2 = 1\f$ (where the state of \f$x_3\f$ is missing). * * \section fileformats-emalg Expectation Maximization (.em) file format * * This section describes the file format of .em files, which are used * to specify a particular EM algorithm. The .em files are complementary * to .fg files; in other words, an .em file without a corresponding .fg * file is useless. Furthermore, one also needs a corresponding .tab file * containing the data used for parameter learning. * * An .em file starts with a line specifying the number of maximization steps, * followed by an empty line. Then, each maximization step is described in a * block, which should satisfy the format described in the next subsection. * * \subsection fileformats-emalg-maximizationstep Maximization Step block format * * A maximization step block of an .em file starts with a single line * describing the number of shared parameters blocks that will follow. * Then, each shared parameters block follows, in the format described in * the next subsection. * * \subsection fileformats-emalg-sharedparameters Shared parameters block format * * A shared parameters block of an .em file starts with a single line * consisting of the name of a ParameterEstimation subclass * and its parameters in the format of a PropertySet. For example: *

CondProbEstimation [target_dim=2,total_dim=4,pseudo_count=1]* The next line contains the number of factors that share their parameters. * Then, each of these factors is specified on separate lines (possibly * seperated by empty lines), where each line consists of several fields * seperated by a space or a tab character. The first field contains * the index of the factor in the factor graph. The following fields should * contain the variable labels of the variables on which that factor depends, * in a specific ordering. This ordering can be different from the canonical * ordering of the variables used internally in libDAI (which would be sorted * ascendingly according to the variable labels). The ordering of the variables * specifies the implicit ordering of the shared parameters: when iterating * over all shared parameters, the corresponding index of the first variable * changes fastest (in the inner loop), and the corresponding index of the * last variable changes slowest (in the outer loop). By choosing the right * ordering, it is possible to let different factors (depending on different * variables) share parameters in parameter learning using EM. This convention * is similar to the convention used in factor blocks in a factor graph .fg * file (see \ref fileformats-factorgraph-factor). * * \section fileformats-aliases Aliases file format * * An aliases file is basically a list of "macros" and the strings that they * should be substituted with. * * Each line of the aliases file can be either empty, contain a comment * (if the first character is a '#') or contain an alias. In the latter case, * the line should contain a colon; the part before the colon contains the * name of the alias, the part after the colon the string that it should be * substituted with. Any whitespace before and after the colon is ignored. * * For example, the following line would define the alias \c BP_SEQFIX * as a shorthand for "BP[updates=SEQFIX,tol=1e-9,maxiter=10000,logdomain=0]": *

* BP_SEQFIX: BP[updates=SEQFIX,tol=1e-9,maxiter=10000,logdomain=0] ** * Aliases files can be used to store default options for algorithms. */ /** \page bibliography Bibliography * \anchor EaG09 \ref EaG09 * F. Eaton and Z. Ghahramani (2009): * "Choosing a Variable to Clamp", *