Regularization
Ill-defined messages
Propagating a message $\tilde{\mu}_{i\rightarrow j}$ from a cluster $\mathcal{C}_i$ to its neighbor $\mathcal{C}_j$ involves 3 steps:
1. $\mathcal{C}_i$'s belief $\beta_i$ is marginalized over the sepset nodes $\mathcal{S}_{i,j}$:
\[\tilde{\mu}_{i\rightarrow j} = \int\beta_i d(\mathcal{C}_i\setminus \mathcal{S}_{i,j})\]
2. The message $\tilde{\mu}_{i\rightarrow j}$ is divided by the current edge belief $\mu_{i,j}$, and the result is multiplied into $\mathcal{C}_j$'s belief $\beta_j$:
\[\beta_j \leftarrow \beta_j\tilde{\mu}_{i\rightarrow j}/\mu_{i,j}\]
3. The edge belief $\mu_{i,j}$ is updated to the message just passed:
\[\mu_{i,j} \leftarrow \tilde{\mu}_{i\rightarrow j}\]
In the linear Gaussian setting, where each belief has the form $\exp(-x^{\top}\bm{J}x/2 + h^{\top}x + g)$, where $x$ denotes its scope, these steps can be concisely expressed in terms of the $(\bm{J},h,g)$ parameters of the beliefs involved.
Crucially, the precision matrix $\bm{J}$ for $\beta_i$ has to be full-rank / invertible with respect to the nodes to be integrated out.
For example, if $\mathcal{C}_i=\{X_1,X_2,X_3\}$, $\mathcal{S}_{i,j}=\{X_1\}$ and:
\[\beta_i = \exp\left(-\begin{bmatrix}x_1 \\ x_2 \\ x_3\end{bmatrix}^{\top}\bm{J} \begin{bmatrix}x_1 \\ x_2 \\ x_3\end{bmatrix}/2 + h^{\top}\begin{bmatrix}x_1 \\ x_2 \\ x_3\end{bmatrix} + g\right), \text{ where } \bm{J} = \begin{matrix}x_1 \\ x_2 \\ x_3\end{matrix}\!\! \begin{bmatrix}1 & -1/2 & -1/2 \\ -1/2 & 1/4 & 1/4 \\ -1/2 & 1/4 & 1/4\end{bmatrix}\]
then the $2\times 2$ submatrix of $\bm{J}$ for $\mathcal{C}_i\setminus\mathcal{S}_{i,j}=\{X_2,X_3\}$ (annotated above) consists only of $1/4$s, is not full-rank, and thus $\tilde{\mu}_{i\rightarrow j}$ is ill-defined / cannot be computed.
On a clique tree, a schedule of messages that follows a postorder traversal of the tree (from the tip clusters to a root cluster) can always be computed.
On a loopy cluster graph however, it may be unclear how to find a schedule such that each message is well-defined, or if such a schedule even exists. This is a problem since a loopy cluster graph typically requires multiple traversals to reach convergence.
A heuristic
One approach to deal with ill-defined messages is to skip their computation and proceed on with the schedule.
A more robust, yet simple, alternative is to regularize cluster beliefs by increasing some diagonal elements of their precision matrix so that the relevant submatrices are full-rank:
\[ \begin{matrix}x_1 \\ x_2 \\ x_3\end{matrix}\!\! \begin{bmatrix}1 & -1/2 & -1/2 \\ -1/2 & 1/4 & 1/4 \\ -1/2 & 1/4 & 1/4\end{bmatrix} \longrightarrow \begin{bmatrix}1 & -1/2 & -1/2 \\ -1/2 & 1/4\textcolor{red}{+\epsilon} & 1/4 \\ -1/2 & 1/4 & 1/4\textcolor{red}{+\epsilon}\end{bmatrix}\]
To maintain the probability model, the product of all cluster beliefs divided by the product of all edge beliefs must remain equal to the joint distribution $p_\theta$ (this is satisfied after factor assignment, and everytime a message is passed).
Thus, each time a cluster belief is regularized, we "balance" this change by a similar modification to one or more associated edge beliefs. For example, if $\mathcal{C}_i$ above was connected to another sepset $\mathcal{S}_{i,k}=\{X_2,X_3\}$ with $\bm{J}=\bm{0}$ then we might do:
\[ \begin{matrix}x_2 \\ x_3\end{matrix}\!\! \begin{bmatrix}0 & 0 \\ 0 & 0\end{bmatrix} \longrightarrow \begin{bmatrix}0\textcolor{red}{+\epsilon} & 0 \\ 0 & \textcolor{red}{+\epsilon}\end{bmatrix}\]
We provide several options for regularization below. A typical usage of these methods is after the initial assignment of factors.
By cluster
regularizebeliefs_bycluster!
performs regularization separately from message passing.
The algorithm loops over each cluster. For each incident sepset, $\epsilon$ is added to all the diagonal entries of its precision, and to the corresponding diagonal entries of the cluster's precision.
Currently, calibrate_optimize_clustergraph!
calls regularizebeliefs_bycluster!
for each set of candidate parameter values, before calibration is run. Other options are likely to be available in future versions.
Along node subtrees
regularizebeliefs_bynodesubtree!
performs regularization separately from message passing.
A node subtree of a cluster graph is the subtree induced by all clusters that contain that node.
The algorithm loops over each node subtree. For all edges and all but one cluster in a given node subtree, it adds $\epsilon$ to the diagonal entries of the precision matrix that correspond to that node.
On a schedule
regularizebeliefs_onschedule!
interleaves regularization with message passing.
The algorithm loops over each cluster and tracks which messages have been sent:
- Each cluster $\mathcal{C}_i$ is regularized only if it has not received a message from $\ge 1$ of its neighbors.
- Regularization proceeds by adding $\epsilon$ to the diagonal entries of $\mathcal{C}_i$'s precision that correspond to the nodes in $\mathcal{S}_{i,j}$, and to all diagonal entries of $\mathcal{S}_{i,j}$'s precision, if neighbor $\mathcal{C}_j$ has not sent a message to $\mathcal{C}_i$.
- After being regularized, $\mathcal{C}_i$ sends a message to each neighbor for which it has not already done so.
The example below shows how regularization methods can help to minimize ill-defined messages. We use here a network from Lipson et al. (2020, Extended Data Fig. 4) [5], with degree-2 nodes suppressed and any resulting length 0 edges assigned length 1 (see lipson2020b_notes.jl), and follow the steps in Exact likelihood for fixed parameters:
julia> using DataFrames, Tables, PhyloNetworks, PhyloGaussianBeliefProp
julia> const PGBP = PhyloGaussianBeliefProp;
julia> net = readTopology(pkgdir(PGBP, "test/example_networks", "lipson_2020b.phy")); # 54 edges; 44 nodes: 12 tips, 11 hybrid nodes, 21 internal tree nodes.
julia> preorder!(net)
julia> df = DataFrame(taxon=tipLabels(net), # simulated using `simulate(net, ParamsBM(0, 1))` from PhyloNetworks
x=[0.431, 1.606, 0.72, 0.944, 0.647, 1.263, 0.46, 1.079, 0.877, 0.748, 1.529, -0.469]);
julia> m = PGBP.UnivariateBrownianMotion(1, 0); # choose model: σ2 = 1.0, μ = 0.0
julia> fg = PGBP.clustergraph!(net, PGBP.Bethe()); # build factor graph
julia> tbl_x = columntable(select(df, :x)); # trait data as column table
julia> b = PGBP.init_beliefs_allocate(tbl_x, df.taxon, net, fg, m); # allocate memory for beliefs
julia> PGBP.init_beliefs_assignfactors!(b, m, tbl_x, df.taxon, net.nodes_changed); # assign factors based on model
julia> fgb = PGBP.ClusterGraphBelief(b); # wrap beliefs for message passing
julia> sched = PGBP.spanningtrees_clusterlist(fg, net.nodes_changed); # generate schedule
Without regularization, errors indicating ill-defined messages (which are skipped) are returned when we run a single iteration of calibration:
julia> PGBP.calibrate!(fgb, sched); # there are ill-defined messages (which are skipped)
┌ Error: belief H5I5I16, integrating [2, 3]
└ @ PhyloGaussianBeliefProp ...
┌ Error: belief H6I10I15, integrating [2, 3]
└ @ PhyloGaussianBeliefProp ...
However, with regularization, there are no ill-defined messages for a single iteration of calibration:
julia> PGBP.init_beliefs_assignfactors!(b, m, tbl_x, df.taxon, net.nodes_changed); # reset to initial beliefs
julia> PGBP.regularizebeliefs_bynodesubtree!(fgb, fg); # regularize by node subtree
julia> PGBP.calibrate!(fgb, sched); # no ill-defined messages
julia> PGBP.init_beliefs_assignfactors!(b, m, tbl_x, df.taxon, net.nodes_changed); # reset to initial beliefs
julia> PGBP.regularizebeliefs_onschedule!(fgb, fg); # regularize by on schedule
julia> PGBP.calibrate!(fgb, sched); # no ill-defined messages
Note that this does not necessarily guarantee that subsequent iterations avoid ill-defined messages.