-IMPORTANT
+TODO:
-- Find out which methods need connected factor graphs (at least TreeEP and JunctionTree).
+- Improve documentation.
-- Forwardport the Bart patch
+- Adapt http://www.boost.org/development/requirements.html#Design_and_Programming
-- Get rid of multiple inheritance
+- Use "gcc -MM" to generate dependencies for targets.
+ Maybe even better: switch to cmake (see http://lwn.net/Articles/188693/)
+ cross-platform build system.
-- Introduce a namespace libDAI
-- Investigate the performance penalty from the use of Properties map
+OPTIMIZATION:
-- Another design question that needs to be answered: should a BP object own a
- FactorGraph, or only store a reference to a FactorGraph (which can optimize
- memory), or should it be a FactorGraph with additional variables and functions?
- Probably, the first or second option would be the best. Since FactorGraphs are
- assumed to be rather static, it would make sense to store *references* to
- FactorGraphs inside AIAlgs. Or maybe some smart_ptr? It would make sense to
- prevent unnecessary copying of FactorGraphs. Also, general marginalization
- functions now copy a complete object. Instead, it would make more sense that
- they construct a new object without copying the FactorGraph. Or they can be made
- simply methods of the general InfAlg class.
+- BipartiteGraph::isConnected should be optimized using boost::graph
+- Replace VarSets by smallSet<size_t> where appropriate, in order to minimize the use of findVar().
-DIFFICULT
+- A DAIAlg<T> should not inherit from a FactorGraph/RegionGraph, but should
+ store a reference to it. This prevents needless copying of (possibly large)
+ data structures. Disadvantage: the caller must not change the data structure
+ between calls. (Maybe a smart_ptr would help here?)
+ Also, general marginalization functions now copy a complete object. Instead, it
+ would make more sense that they construct a new object without copying the
+ FactorGraph. Or they can simply be made methods of the general InfAlg class.
-- Bug: TreeEP sometimes returns NANs.
-- Bug: TreeEP::logZ() seems to be wrong (?).
+IDEAS WORTH EXPLORING:
-- Bug: strange things happen for at least JTree and TreeEP if the factor graph is not connected.
- Should be fixed by finding out which methods demand connected factor graphs and these
- methods should check whether the supplied factor graph is connected.
+- Use a PropertySet as output of a DAIAlg, instead of functions like maxDiff and Iterations().
-- Kees' trick for preventing NANs in GBP updates: if quantity smaller than epsilon, replace by epsilon.
+- Cache second-order neighborhoods (delta's) in BipGraph?
-- Bug: MF_SEQRANDOM does not work on alarm2.fg (normalization error: Z = 0)
+- If you ever do a rewrite, make sure that the graphical properties are not
+ entangled with the probabilistic properties. E.g., a factor graph really
+ should be represented as a bipartite graph, with a separate array of variable
+ labels and dimensions, and a seperate array of (pointers to) factor tables. In
+ this way, each factor could be implemented differently, e.g., we could have
+ some sparse factors, some noisy-OR factors, some dense factors, some arbitrary
+ precision factors, etc. Or we could make more use of templates to have a more
+ generic factor graph.
+- Use Boost::uBLAS framework to deal with matrices, especially, with
+ 2D sparse matrices.
+ See http://www.boost.org/libs/numeric/ublas/doc/matrix_sparse.htm
+ and tests/errorbounds/errorbounds3
-OPTIONAL
+- Introduce naming scheme:
+ all Vars and VarSets should be named v_..., e.g. v_i instead of i
+ all Factors should be named f_..., e.g. f_I instead of I
+ all indices should be named _..., e.g. _k instead of k
+ all iterators should be named i_, e.g. i_i is an iterator to i
+ all const_iterators should be named ci_, e.g. ci_i is an iterator to i
-- Restore brute force EXACT method.
+- Define a better fileformat for .fg files (maybe using XML)?
- Alternative implementation of undo factor changes: the only things that have to be
-undone are setting a factor to all ones and setting a factor to a Kronecker delta. This
-could also be implemented in the factor itself, which could maintain its state
-(one/delta/full) and act accordingly.
+ undone are setting a factor to all ones and setting a factor to a Kronecker delta. This
+ could also be implemented in the factor itself, which could maintain its state
+ (one/delta/full) and act accordingly.
-- Regression test could be enhanced by
- * reporting number of iterations
- * reporting all single node marginal errors
-
-- Think about the _fac2ind variable: a better implementation of this feature is
-probably possible.
-
-- Changed() methods?
-
-- It might be an idea to forbid modification of the structure of a factor graph
- (except in constructors). Then we could do away with Regenerate and similar methods.
-
-- Think about whether using the L-1 norm as a distance measure between new and
- old messages/beliefs would be better, since || P(x_A, x_B) - Q(x_A, xB) ||_1 < epsilon
- implies that || P(x_A) - Q(x_A) ||_1 < epsilon. However, for a product of messages
- we have the weaker || P1(x_A) P2(x_A) - Q1(x_A) Q2(x_A) ||_1 < 2 epsilon if
- || P1(x_A) - Q1(x_A) ||_1 < epsilon AND || P2(x_A) - Q1(x_A) ||_1 < epsilon...
-
-- Implement damping.
+- Think about the RegionGraph::fac2OR variable: a better implementation of this feature is
+ probably possible.
- Optimize GBP with precalculated indices.
+- Optimize all indices as follows: keep a cache of all indices that have been
+ computed (use a hash). Then, instead of computing on the fly, use the precomputed ones.
+
- Variable elimination should be implemented generically using a function
object that tells you which variable to delete.
- Improve general support for graphs and trees.
-- What to do in case of divide-by-zero? For HUGIN, the correct way to handle
- this is to define x/0 = 0. For other methods, this may not be the case...
- Maybe a good solution would be to make different flavours of Prob which differ
- in the way that they handle zeroes (two issues here: normalization of an all-zero
- factor, which appears to be a sign that something is wrong, and division-by-zero
- in operator/, operator/= and inverse(), which is probably harmless and can be fixed
- by defining x/0 = 0. I think it is harmless because usually it occurs by calculating
- some marginal _in the absence of some factor_ as an intermediate step. Afterwards,
- this factor is always multiplied in again, which sets any value on these positions
- to zero. Another difference would be how to deal with underflows: GBP often has
- numbers that converge to zero, but instead this gives an underflow. A solution is
- to set everything smaller than some epsilon to zero. Finally, one could decide to
- add support for sparse potentials. Oh, and whether it can handle negative entries.
+- Add support for sparse potentials.
- Optimize BP_SEQMAX (it should use a better data structure than a vector for the residuals).
- Also it does strange things in case of negative potentials.
-- Investigate the possibility to calculate logZ *during* the inference algorithm
- by keeping track of all normalizations.
-- Write documentation.
+BAD IDEAS:
+
+- A FactorGraph and a RegionGraph are often equipped with
+ additional properties for nodes and edges. The code to initialize those
+ is often quite similar; maybe this can be abstracted to classes
+ like ExtFactorGraph and ExtRegionGraph (Ext stands for Extended), e.g.
+ template <typename NodeProperties, typename EdgeProperties>
+ class ExtFactorGraph : public FactorGraph {
+ public:
+ std::vector<NodeProperties> nodeProps;
+ std::vector<std::vector<EdgeProperties> > edgeProps;
+ // blabla
+ }
+ A disadvantage of this approach may be worse cachability.
+ A problem is if there are nog properties for nodes (type 1), nodes (type 2)
+ or edges. Maybe this can be solved using specializations, or using variadac
+ template arguments or something similar?
+ Idea: you could define a "class Empty {}", and add some code that checks for
+ the typeid, comparing it with Empty, and doing something special in that case
+ (e.g., not allocating memory).
+ The main disadvantage of this approach seems to be that it leads to even more
+ entanglement.
+
+- Instead of polymorphism by inheritance, use polymorphism by template
+ parameterization. For example, the real reason you introduced the complicated
+ inheritance scheme was for functions like
+ Factor calcMarginal( const InferenceAlgorithm & obj, const VarSet & ns, bool reInit );
+ Now instead, you could use a template function:
+ template<typename InferenceAlgorithm>
+ Factor calcMarginal( const InferenceAlgorithm &obj, const VarSet &ns, bool reInit );
+ This would assume that InferenceAlgorithm supports certain methods, like
+ clone(), grm(), ...... Ideally, you would use concepts to define different
+ classes of inference algorithms with different capabilities, for example the
+ ability to calculate logZ, the ability to calculate marginals, the ability to
+ calculate bounds, the ability to calculate MAP states, etcetera. Then, use
+ traits classes in order to be able to query the capabilities of the model. For
+ example, you would be able to query whether the inference algorithm supports
+ calculation of logZ. Unfortunately, this is compile-time polymorphism, whereas
+ tests/test needs runtime polymorphism.