XGitUrl: http://git.tuebingen.mpg.de/?p=libdai.git;a=blobdiff_plain;f=TODO;h=ba34d112decd85e2329e9cb7448b5e9ee8602e10;hp=a66062102f69f516bea7343eb760451c6b67c40a;hb=7f2ae0d584185669fd276f75bf41e46690e0ab51;hpb=ee69bb6ffc4549d9a756fcfea7e4f131269a0c3f
diff git a/TODO b/TODO
index a660621..ba34d11 100644
 a/TODO
+++ b/TODO
@@ 1,15 +1,90 @@
IMPORTANT
 Find out which methods need connected factor graphs (at least TreeEP and JunctionTree).
+ Idea: a FactorGraph and a RegionGraph are often equipped with
+extra properties for nodes and edges. The code to initialize those
+is often quite similar; maybe this can be abstracted to classes
+like ExtFactorGraph and ExtRegionGraph (Ext stands for Extended), e.g.
+template
+class ExtFactorGraph : public FactorGraph {
+ public:
+ std::vector nodeProps;
+ std::vector > edgeProps;
+ // blabla
+}
+A disadvantage of this approach may be worse cachability.
+A problem is if there are nog properties for nodes (type 1), nodes (type 2)
+or edges. Maybe this can be solved using specializations, or using variadac
+template arguments or something similar?
+Idea: you could define a "class Empty {}", and add some code that checks for
+the typeid, comparing it with Empty, and doing something special in that case
+(e.g., not allocating memory).
+The main disadvantage of this approach seems to be that it leads to even more
+entanglement.
+
+ THIS SMELLS LIKE A GOOD IDEA: instead of polymorphism by inheritance,
+use polymorphism by template parameterization. For example, the real reason
+you introduced the complicated inheritance scheme was for functions like
+ Factor calcMarginal( const InferenceAlgorithm & obj, const VarSet & ns, bool reInit );
+Now instead, you could use a template function:
+template
+ Factor calcMarginal( const InferenceAlgorithm &obj, const VarSet &ns, bool reInit );
+This would assume that InferenceAlgorithm supports certain methods, like
+clone(), grm(), ......
+Ideally, you would use concepts to define different classes of inference
+algorithms with different capabilities, for example the ability to calculate logZ,
+the ability to calculate marginals, the ability to calculate bounds, the ability
+to calculate MAP states, etcetera. Then, use traits classes in order to be able to
+query the capabilities of the model. For example, you would be able to query whether
+the inference algorithm supports calculation of logZ.
+
+ If you ever do a rewrite, make sure that the graphical properties are
+not entangled with the probabilistic properties. E.g., a factor graph
+really should be represented as a bipartite graph, with a separate array
+of variable labels and dimensions, and a seperate array of (pointers to)
+factor tables. In this way, each factor could be implemented differently,
+e.g., we could have some sparse factors, some noisyOR factors, some dense
+factors, some arbitrary precision factors, etc. Or we could make more use
+of templates to have a more generic factor graph. Maybe in the end,
+HasA relations are better than IsA relations...
+Also, the current setup is stupid: I wrote a new function that works
+on FactorGraphs, and I had to write boiler plate code for it in graphicalmodel.h
+and in regiongraph.h (which is stupid).
+
+ Clean up
+
+ Use Boost::uBLAS framework to deal with matrices, especially, with
+2D sparse matrices. See http://www.boost.org/libs/numeric/ublas/doc/matrix_sparse.htm
+and tests/errorbounds/errorbounds3
+
+ Introduce naming scheme:
+all Vars and VarSets should be named v_..., e.g. v_i instead of i
+all Factors should be named f_..., e.g. f_I instead of I
+all indices should be named _..., e.g. _k instead of k
+all iterators should be named i_, e.g. i_i is an iterator to i
+all const_iterators should be named ci_, e.g. ci_i is an iterator to i
+
+ Improve weightedgraph or use Boost::Graph
+
+ Iterations and maxDiff are only interesting for iterative inference
+algorithms. Yet, tests/test wants to know these values in a generic
+way. Maybe we have to think of some way (e.g. using a Properties object)
+to return these values from run(). Then we can simply look whether a
+InferenceAlgorithm supports these fields. What other results could
+we return? Time. MaxDiff during initialization for LC methods.
+Or maybe we could use some traits mechanism which we can ask whether the
+object has _iterations and _maxdiff variables.
+
+ Think about whether the cavity initialization belongs to init() or to run().
+
+ Fix LCLin.
+
+ setFactor and setFactors should only change Probs, not Factors.
+
+ Simplify Properties framework: each Property should be a std::string.
+Each DAIAlg should have its own _properties struct and handle conversion.
 Forwardport the Bart patch
 Get rid of multiple inheritance

 Introduce a namespace libDAI

 Investigate the performance penalty from the use of Properties map

 Another design question that needs to be answered: should a BP object own a
FactorGraph, or only store a reference to a FactorGraph (which can optimize
memory), or should it be a FactorGraph with additional variables and functions?
@@ 21,6 +96,12 @@ IMPORTANT
they construct a new object without copying the FactorGraph. Or they can be made
simply methods of the general InfAlg class.
+ Add comments in example.cpp, add documentation.
+
+ Write documentation.
+
+ Improve error handling.
+
DIFFICULT
@@ 28,69 +109,49 @@ DIFFICULT
 Bug: TreeEP::logZ() seems to be wrong (?).
 Bug: strange things happen for at least JTree and TreeEP if the factor graph is not connected.
 Should be fixed by finding out which methods demand connected factor graphs and these
 methods should check whether the supplied factor graph is connected.

 Kees' trick for preventing NANs in GBP updates: if quantity smaller than epsilon, replace by epsilon.
 Bug: MF_SEQRANDOM does not work on alarm2.fg (normalization error: Z = 0)

OPTIONAL
 Restore brute force EXACT method.
+ Use Expat XML parser for IO and define a XML based fileformat for .fg files?
+
+ Port to GNU autotool chain?
+
+ Another design question that needs to be answered: should a BP object own a
+ FactorGraph, or only store a reference to a FactorGraph (which can optimize
+ memory), or should it be a FactorGraph with additional variables and functions?
+ Probably, the first or second option would be the best. Since FactorGraphs are
+ assumed to be rather static, it would make sense to store *references* to
+ FactorGraphs inside AIAlgs. Or maybe some smart_ptr? It would make sense to
+ prevent unnecessary copying of FactorGraphs. Also, general marginalization
+ functions now copy a complete object. Instead, it would make more sense that
+ they construct a new object without copying the FactorGraph. Or they can be made
+ simply methods of the general InfAlg class.
+
+ Forwardport the Bart/Max patch.
 Alternative implementation of undo factor changes: the only things that have to be
undone are setting a factor to all ones and setting a factor to a Kronecker delta. This
could also be implemented in the factor itself, which could maintain its state
(one/delta/full) and act accordingly.
 Regression test could be enhanced by
 * reporting number of iterations
 * reporting all single node marginal errors

 Think about the _fac2ind variable: a better implementation of this feature is
probably possible.
 Changed() methods?
 It might be an idea to forbid modification of the structure of a factor graph
 (except in constructors). Then we could do away with Regenerate and similar methods.

 Think about whether using the L1 norm as a distance measure between new and
 old messages/beliefs would be better, since  P(x_A, x_B)  Q(x_A, xB) _1 < epsilon
 implies that  P(x_A)  Q(x_A) _1 < epsilon. However, for a product of messages
 we have the weaker  P1(x_A) P2(x_A)  Q1(x_A) Q2(x_A) _1 < 2 epsilon if
  P1(x_A)  Q1(x_A) _1 < epsilon AND  P2(x_A)  Q1(x_A) _1 < epsilon...

 Implement damping.

 Optimize GBP with precalculated indices.
+ Optimize all indices as follows: keep a cache of all indices that have been
+computed (use a hash). Then, instead of computing on the fly, use the precomputed
+ones.
+
 Variable elimination should be implemented generically using a function
object that tells you which variable to delete.
 Improve general support for graphs and trees.
 What to do in case of dividebyzero? For HUGIN, the correct way to handle
 this is to define x/0 = 0. For other methods, this may not be the case...
 Maybe a good solution would be to make different flavours of Prob which differ
 in the way that they handle zeroes (two issues here: normalization of an allzero
 factor, which appears to be a sign that something is wrong, and divisionbyzero
 in operator/, operator/= and inverse(), which is probably harmless and can be fixed
 by defining x/0 = 0. I think it is harmless because usually it occurs by calculating
 some marginal _in the absence of some factor_ as an intermediate step. Afterwards,
 this factor is always multiplied in again, which sets any value on these positions
 to zero. Another difference would be how to deal with underflows: GBP often has
 numbers that converge to zero, but instead this gives an underflow. A solution is
 to set everything smaller than some epsilon to zero. Finally, one could decide to
 add support for sparse potentials. Oh, and whether it can handle negative entries.
+ Add support for sparse potentials.
 Optimize BP_SEQMAX (it should use a better data structure than a vector for the residuals).
 Also it does strange things in case of negative potentials.

 Investigate the possibility to calculate logZ *during* the inference algorithm
 by keeping track of all normalizations.

 Write documentation.