PhyloGaussianBeliefProp

PhyloGaussianBeliefProp is a Julia package for the analysis of Gaussian models on phylogenetic networks using belief propagation (aka message passing).

Library

PhyloGaussianBeliefProp.AbstractClusterGraphMethod
PhyloGaussianBeliefProp.BPPosDefException
PhyloGaussianBeliefProp.Bethe
PhyloGaussianBeliefProp.CanonicalBelief
PhyloGaussianBeliefProp.CanonicalBelief
PhyloGaussianBeliefProp.Cliquetree
PhyloGaussianBeliefProp.ClusterFactor
PhyloGaussianBeliefProp.ClusterGraphBelief
PhyloGaussianBeliefProp.ClusterGraphBelief
PhyloGaussianBeliefProp.EvolutionaryModel
PhyloGaussianBeliefProp.GeneralizedBelief
PhyloGaussianBeliefProp.GeneralizedBelief
PhyloGaussianBeliefProp.GeneralizedBelief
PhyloGaussianBeliefProp.HeterogeneousBrownianMotion
PhyloGaussianBeliefProp.HeterogeneousShiftedBrownianMotion
PhyloGaussianBeliefProp.JoinGraphStructuring
PhyloGaussianBeliefProp.LTRIP
PhyloGaussianBeliefProp.MessageResidual
PhyloGaussianBeliefProp.MessageResidual
PhyloGaussianBeliefProp.MvDiagBrownianMotion
PhyloGaussianBeliefProp.MvFullBrownianMotion
PhyloGaussianBeliefProp.PaintedParameter
PhyloGaussianBeliefProp.UnivariateBrownianMotion
PhyloGaussianBeliefProp.UnivariateOrnsteinUhlenbeck
PhyloGaussianBeliefProp.absorbevidence!
PhyloGaussianBeliefProp.absorbleaf!
PhyloGaussianBeliefProp.addtreenode_belowdegeneratehybrid!
PhyloGaussianBeliefProp.allocatebeliefs
PhyloGaussianBeliefProp.assign!
PhyloGaussianBeliefProp.assignfactors!
PhyloGaussianBeliefProp.average_energy
PhyloGaussianBeliefProp.average_energy!
PhyloGaussianBeliefProp.betheclustergraph
PhyloGaussianBeliefProp.branch_actualization
PhyloGaussianBeliefProp.branch_displacement
PhyloGaussianBeliefProp.branch_precision
PhyloGaussianBeliefProp.branch_variance
PhyloGaussianBeliefProp.calibrate!
PhyloGaussianBeliefProp.calibrate!
PhyloGaussianBeliefProp.calibrate_exact_cliquetree!
PhyloGaussianBeliefProp.calibrate_optimize_cliquetree!
PhyloGaussianBeliefProp.calibrate_optimize_clustergraph!
PhyloGaussianBeliefProp.check_runningintersection
PhyloGaussianBeliefProp.cliquetree
PhyloGaussianBeliefProp.clustergraph
PhyloGaussianBeliefProp.clustergraph!
PhyloGaussianBeliefProp.default_rootcluster
PhyloGaussianBeliefProp.default_rootcluster
PhyloGaussianBeliefProp.dimension
PhyloGaussianBeliefProp.divide!
PhyloGaussianBeliefProp.entropy
PhyloGaussianBeliefProp.extend!
PhyloGaussianBeliefProp.factor_hybridnode
PhyloGaussianBeliefProp.factor_root
PhyloGaussianBeliefProp.factor_treeedge
PhyloGaussianBeliefProp.factored_energy
PhyloGaussianBeliefProp.free_energy
PhyloGaussianBeliefProp.getcholesky
PhyloGaussianBeliefProp.getcholesky_μ
PhyloGaussianBeliefProp.getcholesky_μ!
PhyloGaussianBeliefProp.hasdegenerate
PhyloGaussianBeliefProp.hybridnode_displacement
PhyloGaussianBeliefProp.hybridnode_precision
PhyloGaussianBeliefProp.hybridnode_variance
PhyloGaussianBeliefProp.init_beliefs_allocate_atroot!
PhyloGaussianBeliefProp.init_beliefs_reset!
PhyloGaussianBeliefProp.init_beliefs_reset_fromfactors!
PhyloGaussianBeliefProp.init_clustergraph
PhyloGaussianBeliefProp.init_factors_allocate
PhyloGaussianBeliefProp.init_factors_frombeliefs!
PhyloGaussianBeliefProp.init_messagecalibrationflags_reset!
PhyloGaussianBeliefProp.init_messagecalibrationflags_reset!
PhyloGaussianBeliefProp.init_messageresidual_allocate
PhyloGaussianBeliefProp.inscope_onenode
PhyloGaussianBeliefProp.integratebelief!
PhyloGaussianBeliefProp.integratebelief!
PhyloGaussianBeliefProp.iscalibrated_kl!
PhyloGaussianBeliefProp.iscalibrated_residnorm
PhyloGaussianBeliefProp.iscalibrated_residnorm!
PhyloGaussianBeliefProp.isdegenerate
PhyloGaussianBeliefProp.isdegenerate_extendedfamily_covered
PhyloGaussianBeliefProp.isdegenerate_extendedfamily_covered
PhyloGaussianBeliefProp.isfamilypreserving
PhyloGaussianBeliefProp.ishybridsinglepositivechild
PhyloGaussianBeliefProp.joingraph
PhyloGaussianBeliefProp.ltripclustergraph
PhyloGaussianBeliefProp.marginalize
PhyloGaussianBeliefProp.marginalize!
PhyloGaussianBeliefProp.moralize
PhyloGaussianBeliefProp.moralize!
PhyloGaussianBeliefProp.mult!
PhyloGaussianBeliefProp.nodefamilies
PhyloGaussianBeliefProp.nodesubtree
PhyloGaussianBeliefProp.nodesubtree_clusterlist
PhyloGaussianBeliefProp.params
PhyloGaussianBeliefProp.params_optimize
PhyloGaussianBeliefProp.params_original
PhyloGaussianBeliefProp.parentinformation
PhyloGaussianBeliefProp.preprocessnet!
PhyloGaussianBeliefProp.propagate_1traversal_postorder!
PhyloGaussianBeliefProp.propagate_belief!
PhyloGaussianBeliefProp.regularizebeliefs_1clustersepset!
PhyloGaussianBeliefProp.regularizebeliefs_bycluster!
PhyloGaussianBeliefProp.regularizebeliefs_bynodesubtree!
PhyloGaussianBeliefProp.regularizebeliefs_onschedule!
PhyloGaussianBeliefProp.residual_kldiv!
PhyloGaussianBeliefProp.scopeindex
PhyloGaussianBeliefProp.scopeindex
PhyloGaussianBeliefProp.scopeindex
PhyloGaussianBeliefProp.scopeindex
PhyloGaussianBeliefProp.shrinkdegenerate_treeedges
PhyloGaussianBeliefProp.spanningtree_clusterlist
PhyloGaussianBeliefProp.spanningtrees_clusterlist
PhyloGaussianBeliefProp.triangulate_minfill!
PhyloGaussianBeliefProp.unscope

PhyloGaussianBeliefProp.AbstractClusterGraphMethod — Type

AbstractClusterGraphMethod

Abstract type for cluster graph construction algorithms.

source

PhyloGaussianBeliefProp.BPPosDefException — Type

BPPosDefException

Exception thrown when a belief message cannot be computed, that is, when the submatrix of the precision J, subsetted to the variables to be integrated out, is not positive definite. It has a message msg field (string), and an info field (integer) inherited from LinearAlgebra.PosDefException, to indicate the location of (one of) the eigenvalue(s) which is (are) less than/equal to 0.

source

PhyloGaussianBeliefProp.Bethe — Type

Bethe

Subtype of AbstractClusterGraphMethod.

Algorithm

A Bethe cluster graph (also known as factor graph) has:

a factor-cluster {v, parents(v}} for each node-family in the network, that is, for each non-root node v (a family is a child node and all of its parents)
- with one exception: if v's family is included in another family, then no factor-cluster is created for v.
a variable-cluster {v} for each non-leaf node v in the network, or more specifically, for each node v that belongs in more than 1 factor.

Each variable-cluster {v} is joined to the factor-clusters that contain v, by an edge labelled with sepset {v}.

References

D. Koller and N. Friedman. Probabilistic graphical models: principles and techniques. MIT Press, 2009. ISBN 9780262013192.

source

PhyloGaussianBeliefProp.CanonicalBelief — Type

CanonicalBelief{
    T<:Real,
    Vlabel<:AbstractVector,
    P<:AbstractMatrix{T},
    V<:AbstractVector{T},
    M,
} <: AbstractBelief{T}

A canonical belief is an exponential quadratic form with canonical parameters J, h, g (see MvNormalCanon{T,P,V} in Distributions.jl):

C(x | J,h,g) = exp(-(1/2)xᵀJx + hᵀx + g)

If J is positive-definite (i.e. J ≻ 0), then C(x | J,h,g) can be interpreted as a non-degenerate, though possibly unnormalized, distribution density for x.

Fields

μ::V: mean (of x); well-defined only when J ≻ 0; not always updated
h::V: potential; interpretable as Jμ when J ≻ 0
J::P: precision; interpretable as inv(Σ), where Σ is the variance of x, when J ≻ 0
g::MVector{1,T}: g[1] is interpretable as the normalization constant for the belief

when J ≻ 0

The belief is normalized if:

        g = - (1/2) (log(2πΣ) + μ'Jμ)
        = - entropy of normalized distribution + (1/2) dim(μ) - (1/2) μ'Jμ.

Other fields used to track which cluster or edge the belief corresponds to, and which traits of which nodes are in scope:

nodelabel::Vlabel: vector of labels (e.g. preorder indices) for nodes from the phylogeny

that this belief corresponds to (i.e. whose traits distribution it describes)

ntraits::Integer: maximum number of traits per node
inscope::BitArray: matrix of booleans (row i: trait i, column j: node j) indicating

which traits are present for which nodes

type::BeliefType: cluster or sepset
metadata::M: index (in the cluster graph) of the cluster or sepset that this belief

corresponds to (e.g. ::Symbol for a cluster, ::Tuple{Symbol,Symbol} for a sepset)

Methods

nodelabels(b): b.nodelabel
ntraits(b): b.ntraits
inscope(b): b.inscope, a ntraits(b)×nodelabels(b) matrix
nodedimensions(b): vector of integers, with jth value giving the dimension (number of traits in scope) of node j.
dimension(b): total dimension of the belief, that is, total number of traits in scope. Without any missing data, that would be ntraits(b)*nodelabels(b).

source

PhyloGaussianBeliefProp.CanonicalBelief — Method

CanonicalBelief(nodelabels, numtraits, inscope, belieftype, metadata, T=Float64)

Constructor to allocate memory for one cluster, and initialize objects with 0s to initialize the belief with the constant function exp(0)=1.

source

PhyloGaussianBeliefProp.Cliquetree — Type

Cliquetree

Subtype of AbstractClusterGraphMethod.

Algorithm

moralize the network (connect partners that share a child).
triangulate the resulting undirected graph using greedy min-fill, see triangulate_minfill!(graph).
extract the maximal cliques of the resulting chordal graph.
calculate the edge weight between each pair of maximal cliques as the size of their intersection
find a maximum-weight spanning tree
label the retained edges (in the spanning tree) by the intersection of the two cliques they connect.

References

D. Koller and N. Friedman. Probabilistic graphical models: principles and techniques. MIT Press, 2009. ISBN 9780262013192.

source

PhyloGaussianBeliefProp.ClusterFactor — Method

ClusterFactor(belief::AbstractBelief{T}) where T

Constructor to allocate memory for one cluster factor, with canonical parameters and metadata initialized to be a copy of those in belief. ClusterFactors metadata are supposed to be symbols, so this constructor should fail if its input is a sepset belief, whose metadata is a Tuple of Symbols.

source

PhyloGaussianBeliefProp.ClusterGraphBelief — Type

ClusterGraphBelief{B<:AbstractBelief, F<:ClusterFactor, M<:MessageResidual}

Structure to hold a vector of beliefs, with cluster beliefs coming first and sepset beliefs coming last. Fields:

belief: vector of beliefs
factor: vector of initial cluster beliefs after factor assignment
nclusters: number of clusters
cdict: dictionary to get the index of a cluster belief from its node labels
sdict: dictionary to get the index of a sepset belief from the labels of its two incident clusters
messageresidual: dictionary to log information about sepset messages, which can be used to track calibration or help adaptive scheduling with residual BP. See MessageResidual. The keys of messageresidual are tuples of cluster labels, similar to a sepset's metadata. Each edge in the cluster graph has 2 messages corresponding to the 2 directions in which a message can be passed, with keys: (label1, label2) and (label2, label1). The cluster receiving the message is the first label in the tuple, and the sending cluster is the second.

Assumptions:

For a cluster belief, the cluster's nodes are stored in the belief's metadata.
For a sepset belief, its incident clusters' nodes are in the belief's metadata.

source

PhyloGaussianBeliefProp.ClusterGraphBelief — Method

ClusterGraphBelief(belief_vector::Vector{B})

Constructor of a ClusterGraphBelief with belief belief_vector and all other fields constructed accordingly. New memory is allocated for these other fields, e.g. for factors (with data copied from cluster beliefs) and message residuals (with data initialized to 0 but of size matching that from sepset beliefs)

To construct the input vector of beliefs, see allocatebeliefs and assignfactors!

source

PhyloGaussianBeliefProp.EvolutionaryModel — Type

EvolutionaryModel{T}

Evolutionary model type, with T the element type in all parameter vector and matrices. Implemented models include the UnivariateBrownianMotion.

An object of this type must contain at least the following elements:

μ: the mean of the trait at the root.
v: the variance of the trait at the root. Can be zero (fixed root) or infinite.

New evolutionary models must implement the following interfaces:

params(obj::EvolutionaryModel)
params_optimize(obj::EvolutionaryModel)
params_original(obj::EvolutionaryModel, transformedparams::AbstractArray)

source

PhyloGaussianBeliefProp.GeneralizedBelief — Type

GeneralizedBelief{
    T<:Real,
    Vlabel<:AbstractVector,
    P<:AbstractMatrix{T},
    V<:AbstractVector{T},
    M,
} <: AbstractBelief{T}

A generalized belief is the product of an exponential quadratic form and a Dirac measure (generalizing the CanonicalBelief) with generalized parameters Q, R, Λ, h, c, g:

𝒟(x | Q,R,Λ,h,c,g) = exp(-(1/2)xᵀ(QΛQᵀ)x + (Qh)ᵀx + g) • δ(Rᵀx - c)

For m-dimensional x (i.e. x ∈ ℝᵐ), [Q R] is assumed to be m × m orthonormal (i.e. [Q R]ᵀ[Q R] = Iₘ) such that Q and R are orthogonal (i.e. QᵀR = 0).

Generally, Q is m × (m-k) and R is m × k, where 0 ≤ k ≤ m. We refer to R as the constraint matrix (R represents linear dependencies among the variables of x) and to k as the constraint rank. Λ is (m-k) × (m-k) diagonal, h and c are respectively m × 1 and k × 1, and g is scalar.

When k=0, a generalized belief is equivalent to a canonical belief with canonical parameters J₁=QΛQᵀ, h₁=Qh, g₁=g.

Fields

μ::V: m × 1, where μ[1:(m-k)] stores (QΛQᵀ)⁻¹Qh = QΛ⁻¹h, the mean of x; well-defined

only when QΛQᵀ is positive-definite (i.e. Q is square and Λ has no zero-entries on its diagonal); not always updated

h::V: m × 1 vector, where h[1:(m-k)] stores generalized parameter h. Note that is

not the potential for a canonical belief.

Q::P: m × m matrix, where Q[:,1:(m-k)] stores generalized parameter Q
Λ::V: m × 1 vector, where Diagonal(Λ[1:(m-k)]) gives generalized parameter Λ
g::MVector{1,T}: g[1] stores the generalized parameter g
k::MVector{1,Int}: k[1] stores the constraint rank k
R::P: m × m matrix, where R[:,1:k] stores the generalized parameter R (i.e. the

constraint matrix)

c::V: m × 1 matrix, where c[1:k] stores the the generalized parameter c

hmsg, Qmsg, Λmsg, gmsg, kmsg, Rmsg, cmsg are matrices/vectors of the same dimensions as h, Q, Λ, g, k, R, c and are meant to store (1) the outgoing message for a sending cluster, (2) the incoming message for a receiving cluster, or (3) the message quotient (from dividing an outgoing message by a sepset belief) for a mediating sepset. We refer to these as the message fields of the generalized belief.

Other fields (defined identically as for CanonicalBelief) used to track which cluster or edge the belief corresponds to, and which traits of which nodes are in scope:

nodelabel::Vlabel
ntraits::Integer
inscope::BitArray
type::BeliefType
metadata::M

source

PhyloGaussianBeliefProp.GeneralizedBelief — Method

GeneralizedBelief(b::CanonicalBelief)

Constructor from a canonical belief b.

Precision b.J is eigendecomposed into Q Λ transpose(Q), where Q and Λ are square with the same dimensions as b.J, and Λ is positive semidefinite.

source

PhyloGaussianBeliefProp.GeneralizedBelief — Method

GeneralizedBelief(nodelabels, numtraits, inscope, belieftype, metadata, T=Float64)

Constructor to allocate memory for one cluster, and initialize objects with zeros to initialize the belief with the constant function exp(0)=1.

source

PhyloGaussianBeliefProp.HeterogeneousBrownianMotion — Type

HeterogeneousBrownianMotion{T,U,V,W} <: HeterogeneousBM{T}

Type for a heterogeneous Brownian motion model, univariate or multivariate. Along each edge, evolution follows a Brownian motion. Each edge can have its own variance rate. This model has no shifts, and no extra hybrid variance. By default, the root is fixed with prior variance 0.

T is the scalar type, U is the type for the root mean (vector of length d, where d is the trait dimension, even if univariate), V is a matrix type for the root variance, and W the matrix type of each variance rate, one per color. For a univariate BM, we may have W=T and V=Vector{T}. For a multivariate BM, we may have W=V<:Matrix{T}. This is such that each field is mutable, so we can update model parameters in place within the model object, itself immutable.

source

PhyloGaussianBeliefProp.HeterogeneousShiftedBrownianMotion — Type

HeterogeneousShiftedBrownianMotion{T,U,V,W} <: HeterogeneousBM{T}

Type for a heterogeneous Brownian motion model like HeterogeneousBrownianMotion but with a possible shift in the mean along each edge.

source

PhyloGaussianBeliefProp.JoinGraphStructuring — Type

JoinGraphStructuring

Subtype of AbstractClusterGraphMethod.

Fieldnames

maxclustersize: upper limit for cluster size. This value must be at least the size of the largest node family in the input phylogenetic network, normally 3 if the network is bicombining (each hybrid node has 2 parents, never more). See nodefamilies(net).

Constructors

JoinGraphStructuring(maxclustersize, net): checks that the input maxclustersize is valid for net

Algorithm, by Mateescu et al. (2010)

Requires:

a user-specified maximum cluster size
an elimination order for the nodes in the HybridNetwork, hopefully yielding a small induced-width, e.g. from a heuristic such as greedy min-fill (see triangulate_minfill!(graph)).

Each node in net labels a "bucket", and these buckets are ordered according to the elimination order, e.g. the highest-priority bucket is labelled by the first node in the elimination order.
Node families are assigned to buckets based on the highest-priority node they contain. For example, if {1,4,9} forms a node family and if node 4 is higher in the elimination order than nodes 1 or 9, then {1,4,9} gets assigned to bucket 4.
Node families are clusters (of nodes), and we refer to the clusters within a bucket as "minibuckets". Minibuckets within a bucket can be merged as long as the size of their union does not exceed the maximum cluster size allowed. Sometimes, this can be done in multiple ways.
Starting with the highest-priority bucket, we create new minibuckets by "marginalizing out" the bucket label from each existing minibucket (these are left unchanged in the process).
Each new minibucket is joined to its "originator" minibucket by an edge that is labeled by their intersection (i.e. the variables in the new minibucket). Each new minibucket is then reassigned to a new (and necessarily lower-priority) bucket based on its highest priority node. Merging can take place during reassignment as long as the the maximum cluster size is respected. The union of 2 minibuckets retains the edges of each of minibucket.
Steps 4 & 5 are carried out for each bucket in order of priority.
The resulting minibuckets in each bucket are then joined in a chain (there is some degree of freedom for how this can be done), where each edge is labelled by the bucket label.

References

R. Mateescu, K. Kask, V.Gogate, and R. Dechter. Join-graph propagation algorithms. Journal of Artificial Intelligence Research, 37:279-328, 2010 doi: 10.1613/jair.2842.

source

PhyloGaussianBeliefProp.LTRIP — Type

LTRIP{T<:Integer}

Subtype of AbstractClusterGraphMethod. A HybridNetwork and a valid LTRIP are passed to clustergraph! to construct a cluster graph from the user-provided clusters based on the Layered Trees Running Intersection Property algorithm of Streicher & du Preez (2017).

Fieldnames

clusters: vector of clusters, required to be family-preserving with respect to some HybridNetwork – see isfamilypreserving. Within each cluster, nodes (identified by their preorder index in the network) are required to be sorted in decreasing order (for postorder)

Constructors

LTRIP(net): uses nodefamilies(net) as input clusters, which are guaranteed to be family-preserving
LTRIP(clusters, net): checks if that clusters provided are family-preserving, then sorts each cluster in decreasing order (modifying them in place!) before creating the LTRIP object.

They assume, with no check, that net already has a preordering.

Algorithm

An initial graph G is considered, in which each input cluster is a node. An edge (C1,C2) is added if clusters C1 and C2 share at least 1 node (in net). The weight of edge (C1,C2) is defined as the size of the intersection C1 ∩ C2.
For each node n in net, the subgraph of G induced by the clusters containing n, G_n, has its weights adjusted as follows:
- the edges of maximum weight (within G_n) are identified, then
- the weight of each edge is increased by the number of max-weight edges that either of its endpoints adjacent to.
Then, LTRIP finds a maximum-weight spanning tree of G_n. The edges of this tree are all labelled with {n} (or its label or preorder index).
The spanning trees for each node are layered on one another to form a cluster graph. In other words, an edge (C1,C2) is added if it is present is any spanning tree. If so, its sepset is the union of its labels across the different spanning trees.

References

S. Streicher and J. du Preez. Graph Coloring: Comparing Cluster Graphs to Factor Graphs. In Proceedings of the ACM Multimedia 2017 Workshop on South African Academic Participation, pages 35-42, 2017. doi: 10.1145/3132711.3132717.

source

PhyloGaussianBeliefProp.MessageResidual — Type

MessageResidual{T<:Real, P<:AbstractMatrix{T}, V<:AbstractVector{T}} <: AbstractResidual{T}

Structure to store the most recent computation history of a message, in the form of the ratio: sent_message / current_sepset_belief, when a message is sent from one cluster to another along a given sepset. At calibration, this ratio is 1. For Gaussian beliefs, this ratio is an exponential quadratic form, stored using its canonical parametrization, excluding the constant.

Fields:

Δh: canonical parameter vector of the message residual
ΔJ: canonical parameter matrix of the message residual
kldiv: kl divergence between the message that was last sent and the sepset belief before the last update
iscalibrated_resid: true if the last message and prior sepset belief were approximately equal, false otherwise. see iscalibrated_residnorm!
iscalibrated_kl: same, but in terms of the KL divergence, see iscalibrated_kl!.

source

PhyloGaussianBeliefProp.MessageResidual — Method

MessageResidual(J::AbstractMatrix{T}, h::AbstractVector{T})

Constructor to allocate memory for a MessageResidual with canonical parameters (ΔJ, Δh) of the same dimension and type as J and h, initialized to zeros. kldiv is initalized to [-1.0] and the flags iscalibrated_{resid,kl} are initialized to false if the message is of positive dimension. If the message is empty (ΔJ and Δh of dimension 0) then the message is initialized as being calibrated: kldiv is set to 0 and iscalibrated flags set to true.

(ΔJ, Δh) of zero suggest calibration, but the flags iscalibrated_{resid,kl} being false indicate otherwise.

source

PhyloGaussianBeliefProp.MvDiagBrownianMotion — Type

MvDiagBrownianMotion{T,V} <: HomogeneousBrownianMotion{T}

The multivariate Brownian motion with diagonal variance rate matrix, that is, traits with independent evolution. It is homogeneous across the phylogeny. R is the variance rate (stored as a vector of type V), μ is the prior mean at the root and v the prior variance at the root, 0 by default (and both also of type V)

source

PhyloGaussianBeliefProp.MvFullBrownianMotion — Type

MvFullBrownianMotion{T,P1,V,P2} <: HomogeneousBrownianMotion{T}

The full multivariate Brownian motion. It is homogeneous across the phylogeny. R is the variance rate (of matrix type P1), μ is the prior mean at the root (of vector type V) and v the prior variance at the root, 0 by default (of matrix type P2).

source

PhyloGaussianBeliefProp.PaintedParameter — Type

PaintedParameter{T}

Type with 2 fields:

parameter: vector whose elements are of type T
color: DefaultDict dictionary mapping integers to integers, with a default value of 1.

ncolors(pp) returns the number of parameters, that is, the length of pp.parameter This type is meant to store several values for a given evolutionary parameter (say, Brownian motion variance rate), each one being used on some edges or nodes of a phylogenetic network. The default parameter value is the first one.

For an edge number i, color j=pp.color[i] indexes its parameter value, that is, evolution along edge number i should use pp.parameter[j]. This parameter value is obtained with getparameter(pp, j) for the parameter value of color j, or getparameter(pp, edge) for the parameter value of edge.

source

PhyloGaussianBeliefProp.UnivariateBrownianMotion — Type

UnivariateBrownianMotion{T} <: HomogeneousBrownianMotion{T}

The univariate Brownian motion, homogeneous across the phylogeny, that is, with the same variance rate σ2 across all edges. μ is the prior mean at the root. v the prior variance at the root, 0 by default.

source

PhyloGaussianBeliefProp.UnivariateOrnsteinUhlenbeck — Type

UnivariateOrnsteinUhlenbeck{T} <: HomogeneousOrnsteinUhlenbeck{T}

The univariate Ornstein-Uhlenbeck model. It is homogeneous, that is, has the same parameters across all edges in the phylogeny. σ2 is the variance rate, α the selection strength, θ the optimal value, μ the prior mean at the root, and v the prior variance at the root, 0 by default.

source

PhyloGaussianBeliefProp.absorbevidence! — Method

absorbevidence!(h,J,g, dataindex, datavalues)

Absorb evidence, at indices dataindex and using datavalues. Warnings:

a subset of h is modified in place
traits are assumed to come in the same order in dataindex as in datavalues.

source

PhyloGaussianBeliefProp.absorbleaf! — Method

absorbleaf!(h,J,g, rowindex, columntable)

Absorb evidence from a leaf, given in col[rowindex] of each column in the table, then marginalizes out any variable for a missing trait at that leaf. See absorbevidence! and marginalize. Warning: The leaf traits are assumed to correspond to the first variables in h (and J), as is output by factor_treeedge.

source

PhyloGaussianBeliefProp.addtreenode_belowdegeneratehybrid! — Method

addtreenode_belowdegeneratehybrid!(net::HybridNetwork)

If a degenerate hybrid node h1 has 1 child edge of length t>0 to a hybrid child h2: break the edge by adding a tree node at distance t from h1 and 0 from h2. That way, h1 may be removed from scope. This is done iteratively, as h2 may become degenerate after this operation. See shrinkdegenerate_treeedges to remove degenerate internal tree nodes, and hasdegenerate to check if net still has degenerate nodes.

source

PhyloGaussianBeliefProp.allocatebeliefs — Method

allocatebeliefs(tbl, taxa, nodevector_preordered, clustergraph,
                 evolutionarymodel)

Tuple (beliefs, (node2cluster, node2family, node2degen, cluster2nodes)) with:

beliefs: vector of beliefs, canonical or generalized as appropriate, initialized to the constant function exp(0)=1, one for each cluster then one for each sepset in clustergraph.
node2cluster: vector mapping each node to the cluster it is assigned to: node2cluster[i] is the index of the cluster to which was assigned the family for node of preorder index i.
node2family: vector mapping each node to its node family. Each node is represented by its preorder index. A family is represented as a vector of preorder indices: [child, parent1, ...].
node2fixed: vector of booleans, indicating for each node if its value is fixed, either because it's the root and is assumed fixed by the model, or because it's a tip with data (and any missing values removed from scope).
node2degen: vector mapping each node to a boolean indicating if its distribution is deterministic given its parents (e.g. if all of its parent edges 0 length under a BM with no extra variance).
cluster2nodes: vector mapping each cluster to the nodes (represented by their preorder index) whose families are assigned to the cluster.

tbl is used to know which leaf in net has data for which trait, so as to remove from the scope each variable without data below it. taxa should be a vector with taxon names in the same order as they come in the table of data tbl. The root is removed from scope if the evolutionary model has a fixed root: so as to use the model's fixed root value as data if the root as zero prior variance.

Warnings:

This function might need to be re-run to re-do allocation if:
- the data changed: different number of traits, or different pattern of
missing data at the tips
- the model changed: with the root changed from fixed to random, see
init_beliefs_allocate_atroot! in that case

source

PhyloGaussianBeliefProp.assign! — Method

assign!(bucket, new_minibucket, max_minibucket_size)

Merge new_minibucket with one of the minibuckets contained in bucket (in order of decreasing size) subject to the constraint that the resulting minibucket does not exceed max_minibucket_size. If this is not possible, then new_minibucket is added to bucket as a new minibucket. The bucket should be represented as a dictionary: minibucket_size => vector of minibuckets of that size, where each minibucket is itself represented as a vector of indices.

Output:

(resulting_minibucket, minibucket_merged_into) if a successful merge is found
(new_minibucket, []) otherwise

source

PhyloGaussianBeliefProp.assignfactors! — Method

assignfactors!(beliefs,
               evolutionarymodel, columntable, taxa,
               nodevector_preordered, node2cluster, node2family)

Initialize cluster beliefs prior to belief propagation, by assigning each factor to one cluster. Sepset beliefs are reset to 1. There is one factor for each node v: the density of X_v conditional on its parent X_pa(v) if v is not the root, or the prior density for X_root.

For each leaf, the factor is reduced by absorbing the evidence for that leaf,

that is, the data found in columntable, whose rows should be ordered consistent with taxa

For each leaf, missing trait values are removed from scope.
For each internal node, any trait not in scope (e.g. if all descendant leaves

are missing a value for this trait) is marginalized out of the factor.

Assumptions:

In beliefs, cluster beliefs come first and sepset beliefs come last, as when

created by allocatebeliefs

beliefs is modified in place.

source

PhyloGaussianBeliefProp.average_energy — Function

average_energy!(ref::AbstractBelief, target::AbstractFactor)
average_energy!(ref::AbstractBelief, Jₜ, hₜ, gₜ)
average_energy(Jᵣ::Union{LA.Cholesky,PDMat}, μᵣ, Jₜ, hₜ, gₜ)

Average energy (i.e. negative expected log) of a target canonical form with parameters (Jₜ, hₜ, gₜ) with respect to a normalized non-degenerate reference canonical form ref with parameters (Jᵣ, hᵣ). The reference distribution is normalized, so specifying gᵣ is unnecessary. When the target canonical form is also normalized and non-degenerate, this is equal to their cross-entropy:

H(fᵣ, fₜ) = - Eᵣ(log fₜ) = - ∫ fᵣ log fₜ .

ref is assumed to be non-degenerate, that is, Jᵣ should be positive definite.

average_energy! modifies the reference belief by updating ref.μ to J⁻¹h. It calls average_energy after a cholesky decomposition of ref.J, stored in Jᵣ: see getcholesky_μ!.

Calculation:

ref: f(x) = C(x | Jᵣ, hᵣ, _) is the density of 𝒩(μ=Jᵣ⁻¹hᵣ, Σ=Jᵣ⁻¹) target: C(x | Jₜ, hₜ, gₜ) = exp( - (1/2)x'Jₜx + hₜ'x + gₜ )

E[-log C(X | Jₜ, hₜ, gₜ)] where X ∼ C(Jᵣ, hᵣ, _)
= 0.5 (μᵣ'Jₜ μᵣ + tr(Jᵣ⁻¹Jₜ)) - hₜ'μᵣ - gₜ

With empty vectors and matrices (J's of dimension 0×0 and h's of length 0), the result is simply: - gₜ.

source

PhyloGaussianBeliefProp.average_energy! — Method

average_energy!(ref::AbstractBelief, target::AbstractFactor)
average_energy!(ref::AbstractBelief, Jₜ, hₜ, gₜ)
average_energy(Jᵣ::Union{LA.Cholesky,PDMat}, μᵣ, Jₜ, hₜ, gₜ)

H(fᵣ, fₜ) = - Eᵣ(log fₜ) = - ∫ fᵣ log fₜ .

ref is assumed to be non-degenerate, that is, Jᵣ should be positive definite.

average_energy! modifies the reference belief by updating ref.μ to J⁻¹h. It calls average_energy after a cholesky decomposition of ref.J, stored in Jᵣ: see getcholesky_μ!.

Calculation:

ref: f(x) = C(x | Jᵣ, hᵣ, _) is the density of 𝒩(μ=Jᵣ⁻¹hᵣ, Σ=Jᵣ⁻¹) target: C(x | Jₜ, hₜ, gₜ) = exp( - (1/2)x'Jₜx + hₜ'x + gₜ )

E[-log C(X | Jₜ, hₜ, gₜ)] where X ∼ C(Jᵣ, hᵣ, _)
= 0.5 (μᵣ'Jₜ μᵣ + tr(Jᵣ⁻¹Jₜ)) - hₜ'μᵣ - gₜ

With empty vectors and matrices (J's of dimension 0×0 and h's of length 0), the result is simply: - gₜ.

source

PhyloGaussianBeliefProp.betheclustergraph — Method

betheclustergraph(net)

See Bethe

source

PhyloGaussianBeliefProp.branch_actualization — Method

branch_actualization(obj::EvolutionaryModel, edge::PN.Edge) 
branch_displacement(obj::EvolutionaryModel,  edge::PN.Edge)
branch_precision(obj::EvolutionaryModel,     edge::PN.Edge)
branch_variance(obj::EvolutionaryModel,      edge::PN.Edge)
branch_logdet(obj::EvolutionaryModel,        edge::PN.Edge, precision::AbstractMatrix)
branch_transition_qωjg(obj::EvolutionaryModel,    edge)
branch_transition_qωv!(q, obj::EvolutionaryModel, edge)

Under the most general linear Gaussian model, X₀ given X₁ is Gaussian with conditional mean q X₁ + ω and conditional variance Σ independent of X₁. branch_actualization, branch_displacement and branch_variance return, respectively, q, ω and Σ. branch_precision and branch_variance should return a matrix of symmetric type. branch_variance defaults to the inverse of branch_precision. branch_logdet defaults to g = -0.5*log(|2πΣ|), the log normalizing constant of the Gaussian density in the traditional form. branch_transition_* return or modify in place the corresponding transition matrices.

Under a Brownian motion, we have q=I, ω=0, and conditional variance t*R where R is the model's variance rate and t the branch length.

source

PhyloGaussianBeliefProp.branch_displacement — Function

branch_actualization(obj::EvolutionaryModel, edge::PN.Edge) 
branch_displacement(obj::EvolutionaryModel,  edge::PN.Edge)
branch_precision(obj::EvolutionaryModel,     edge::PN.Edge)
branch_variance(obj::EvolutionaryModel,      edge::PN.Edge)
branch_logdet(obj::EvolutionaryModel,        edge::PN.Edge, precision::AbstractMatrix)
branch_transition_qωjg(obj::EvolutionaryModel,    edge)
branch_transition_qωv!(q, obj::EvolutionaryModel, edge)

Under a Brownian motion, we have q=I, ω=0, and conditional variance t*R where R is the model's variance rate and t the branch length.

source

PhyloGaussianBeliefProp.branch_precision — Function

branch_actualization(obj::EvolutionaryModel, edge::PN.Edge) 
branch_displacement(obj::EvolutionaryModel,  edge::PN.Edge)
branch_precision(obj::EvolutionaryModel,     edge::PN.Edge)
branch_variance(obj::EvolutionaryModel,      edge::PN.Edge)
branch_logdet(obj::EvolutionaryModel,        edge::PN.Edge, precision::AbstractMatrix)
branch_transition_qωjg(obj::EvolutionaryModel,    edge)
branch_transition_qωv!(q, obj::EvolutionaryModel, edge)

Under a Brownian motion, we have q=I, ω=0, and conditional variance t*R where R is the model's variance rate and t the branch length.

source

PhyloGaussianBeliefProp.branch_variance — Function

branch_actualization(obj::EvolutionaryModel, edge::PN.Edge) 
branch_displacement(obj::EvolutionaryModel,  edge::PN.Edge)
branch_precision(obj::EvolutionaryModel,     edge::PN.Edge)
branch_variance(obj::EvolutionaryModel,      edge::PN.Edge)
branch_logdet(obj::EvolutionaryModel,        edge::PN.Edge, precision::AbstractMatrix)
branch_transition_qωjg(obj::EvolutionaryModel,    edge)
branch_transition_qωv!(q, obj::EvolutionaryModel, edge)

Under a Brownian motion, we have q=I, ω=0, and conditional variance t*R where R is the model's variance rate and t the branch length.

source

PhyloGaussianBeliefProp.calibrate! — Function

calibrate!(beliefs::ClusterGraphBelief, schedule, niterations=1;
    auto::Bool=false, info::Bool=false,
    verbose::Bool=true,
    update_residualnorm::Bool=true,
    update_residualkldiv::Bool=false)

Propagate messages in postorder then preorder for each tree in the schedule list, for niterations. Each schedule "tree" should be a tuple of 4 vectors as output by spanningtree_clusterlist, where each vector provides the parent/child label/index of an edge along which to pass a message, and where these edges are listed in preorder. For example, the parent of the first edge is taken to be the root of the schedule tree. Calibration is evaluated after each schedule tree is run, and said to be reached if all message residuals have a small norm, based on iscalibrated_residnorm.

Output: true if calibration is reached, false otherwise.

Optional keyword arguments:

auto: If true, then belief updates are stopped after calibration is found to be reached. Otherwise belief updates continue for the full number of iterations.
info: Is true, information is logged with the iteration number and schedule tree index when calibration is reached.
verbose: log error messages about degenerate messages
update_residualnorm
update_residualkldiv

See also: iscalibrated_residnorm and iscalibrated_residnorm! for the tolerance and norm used by default, to declare calibration for a given sepset message (in 1 direction).

source

PhyloGaussianBeliefProp.calibrate! — Function

calibrate!(beliefs::ClusterGraphBelief, scheduletree::Tuple,
    verbose::Bool=true, up_resnorm::Bool=true, up_reskldiv::Bool=false)

Propagate messages along the scheduletree, in postorder then preorder: see propagate_1traversal_postorder!.

Output: true if all message residuals have a small norm, based on iscalibrated_residnorm, false otherwise.

source

PhyloGaussianBeliefProp.calibrate_exact_cliquetree! — Method

calibrate_exact_cliquetree!(beliefs::ClusterGraphBelief,
    schedule,
    nodevector_preordered,
    tbl::Tables.ColumnTable, taxa::AbstractVector,
    evolutionarymodel_name)

For a Brownian Motion with a fixed root, compute the maximum likelihood estimate of the prior mean at the root and the restricted maximum likelihood (REML) estimate of the variance/covariance rate matrix using analytical formulas relying on belief propagation, using the data in tbl at leaves in the network. These estimates are for the model with a prior variance of 0 at the root, that is, a root state equal to the prior mean.

output: (bestmodel, loglikelihood_score) where bestmodel is an evolutionary model created by evolutionarymodel_name, containing the estimated model parameters.

assumptions:

taxa should list the taxon names in the same order in which they come in the rows of tbl.
schedule should provide a schedule to transmit messages between beliefs in beliefs (containing clusters first then sepsets). This schedule is assumed to traverse a clique tree for the input phylogeny, with the root cluster containing the root of the phylogeny in its scope.
nodevector_preordered should list the nodes in this phylogeny, in preorder.
beliefs should be of size and scope consistent with evolutionarymodel_name and data in tbl.
a leaf should either have complete data, or be missing data for all traits.

Steps:

Calibrate beliefs in place according to the schedule, under a model with an infinite prior variance at the root.
Estimate parameters analytically.
Re-calibrate beliefs, to calculate the maximum log-likelihood of the fixed-root model at the estimated optimal parameters, again modifying beliefs in place. (Except that beliefs with the root node in scope are re-assigned to change their scoping dimension.)

Warning: there is no check that the beliefs and schedule are consistent with each other.

source

PhyloGaussianBeliefProp.calibrate_optimize_cliquetree! — Function

calibrate_optimize_cliquetree!(beliefs::ClusterGraphBelief, clustergraph,
    nodevector_preordered, tbl::Tables.ColumnTable, taxa::AbstractVector,
    evolutionarymodel_name, evolutionarymodel_startingparameters)

Optimize model parameters using belief propagation along clustergraph, assumed to be a clique tree for the input network, whose nodes in preorder are nodevector_preordered. Optimization aims to maximize the likelihood of the data in tbl at leaves in the network. The taxon names in taxa should appear in the same order as they come in tbl. The parameters being optimized are the variance rate(s) and prior mean(s) at the root. The prior variance at the root is fixed.

The calibration does a postorder of the clique tree only, to get the likelihood at the root without the conditional distribution at all nodes, modifying beliefs in place. Therefore, if the distribution of ancestral states is sought, an extra preorder calibration would be required. Warning: there is no check that the cluster graph is in fact a clique tree.

source

PhyloGaussianBeliefProp.calibrate_optimize_clustergraph! — Function

calibrate_optimize_clustergraph!(beliefs::ClusterGraphBelief, clustergraph,
    nodevector_preordered, tbl::Tables.ColumnTable, taxa::AbstractVector,
    evolutionarymodel_name, evolutionarymodel_startingparameters,
    max_iterations=100, regularizationfunction=regularizebeliefs_bycluster!)

Same as calibrate_optimize_cliquetree! above, except that the user can supply an arbitrary clustergraph (including a clique tree) for the input network. Optimization aims to maximize the factored_energy approximation to the ELBO for the log-likelihood of the data (which is also the negative Bethe free_energy). When clustergraph is a clique tree, the factored energy approximation is exactly equal to the ELBO and the log-likelihood.

Cluster beliefs are regularized using regularizebeliefs_bycluster! by default (other options include regularizebeliefs_bynodesubtree!, regularizebeliefs_onschedule!) before calibration. The calibration repeatedly loops through a minimal set of spanning trees (see spanningtrees_clusterlist) that covers all edges in the cluster graph, and does a postorder-preorder traversal for each tree. The loop runs till calibration is detected or till max_iterations have passed, whichever occurs first.

source

PhyloGaussianBeliefProp.check_runningintersection — Method

check_runningintersection(clustergraph, net)

Vector of tuples. Each tuple is of the form (nodelabel, istree), where nodelabel::Symbol is the label of a node in net and istree is true (false) if the node's cluster subgraph is (is not) a tree. This "cluster subgraph" for a given node is the subgraph of clustergraph induced by the clusters containing the node and by the edges whose sepset contain the node: see nodesubtree.

clustergraph satisfies the generalized running-intersection property if istree is true for all nodes in net.

Warning:

assumes that net has been preordered, and
does not check if clustergraph has been correctly constructed.

source

PhyloGaussianBeliefProp.cliquetree — Method

cliquetree(chordal_graph)

Clique tree U for an input graph g assumed to be chordal (triangulated), e.g. using triangulate_minfill!. Warning: does not check that the graph is already triangulated.

Each node in U is a maximal clique of g whose data is the tuple of vectors (nodelabels, nodedata) using the labels and data from g, with nodes sorted by decreasing data. If g was originally built from a phylogenetic network using moralize, then the nodes' data are their preorder index, making them sorted in postorder within in each clique. The clique label is the concatenation of the node labels.
For each edge (clique1, clique2) in U, the edge data hold the sepset (separating set) information as a vector of node data, for nodes shared by both clique1 and clique2. In this sepset, nodes are sorted by decreasing data.

Uses maximal_cliques and kruskal_mst (for min/maximum spanning trees) from Graphs.jl.

source

PhyloGaussianBeliefProp.clustergraph — Function

clustergraph!(net, method)
clustergraph(net, method)

Cluster graph U for an input network net and a method of cluster graph construction. The following methods are supported:

Bethe
LTRIP
JoinGraphStructuring
Cliquetree

The first method pre-processes net, which may modify it in place, see preprocessnet!. The second method assumes that net is already pre-processed.

source

PhyloGaussianBeliefProp.clustergraph! — Method

clustergraph!(net, method)
clustergraph(net, method)

Cluster graph U for an input network net and a method of cluster graph construction. The following methods are supported:

Bethe
LTRIP
JoinGraphStructuring
Cliquetree

The first method pre-processes net, which may modify it in place, see preprocessnet!. The second method assumes that net is already pre-processed.

source

PhyloGaussianBeliefProp.default_rootcluster — Method

default_rootcluster(clustergraph, nodevector_preordered)

Index of a cluster that contains the network's root, whose label is assumed to be 1 (preorder index). If multiple clusters contain the network's root, then one is chosen with the smallest number of taxa (leaves in the network).

For cluster with label :lab, its property clustergraph[:lab][2] should list the nodes in the cluster, by the index of each node in nodevector_preordered such that 1 corresponds to the network's root. Typically, this vector is net.vec_node after the network is preordered.

source

PhyloGaussianBeliefProp.default_rootcluster — Method

default_rootcluster(clustergraph)

Index of a cluster that contains the node with the smallest preorder index. If multiple clusters contain that node, then one is chosen that only has that node. If all clusters containing that node have more than 1 node, then a cluster is chosen containing a node with the second-smallest-preorder index.

For cluster with label :lab, its property clustergraph[:lab][2] should list the nodes in the cluster by their preorder index, sorted in decreasingly order (so the smallest is at the end).

source

PhyloGaussianBeliefProp.dimension — Method

dimension(m::EvolutionaryModel)

Number of traits, e.g. 1 for univariate models.

source

PhyloGaussianBeliefProp.divide! — Method

divide!(sepset::GeneralizedBelief, cluster_from::GeneralizedBelief)
divide!(sepset::CanonicalBelief, cluster_from::GeneralizedBelief)
divide!(sepset::CanonicalBelief, h, J, g)

Divide an outgoing message by a sepset belief, return the canonical parameters of the quotient, and update the sepset belief to the message.

For the first two forms, the dividend is the message of cluster_from. For the third form, the canonical parameters of the message are specified directly (it is assumed in this case that the message is unconstrained by any linear dependencies among its variables). For the first form, the message of sepset is updated to the quotient. If the quotient is constrained (i.e. must be expressed as a generalized belief), then the canonical parameters of the unconstrained part (i.e. exponential quadratic part) are returned (i.e. Qh, QΛQᵀ, g).

Assumptions (not checked):

Dividend and divisor are assumed to have matching scopes
The constraint rank for the divisor cannot exceed that of the dividend

(i.e. cluster_from.kmsg[1] ≥ sepset.k[1]) for division to be well-defined

source

PhyloGaussianBeliefProp.entropy — Method

entropy(J::Cholesky)
entropy(J::AbstractMatrix)
entropy(belief::AbstractFactor)

Entropy of a multivariate Gaussian distribution with precision matrix J, assumed to be square and symmetric (not checked). It is 0 if J is empty (of size 0×0). It may be Inf if J is semi-definite. The second version applies the first to the belief precision belief.J.

entropy is defined for discrete distributions in StatsBase.jl and extended to Gaussian distributions in Distributions.jl around here

source

PhyloGaussianBeliefProp.extend! — Method

extend!(cluster_to::GeneralizedBelief, sepset)
extend!(cluster_to::GeneralizedBelief, upind, Δh, ΔJ, Δg)
extend!(cluster_to::GeneralizedBelief, upind, R, c, g)

Extend and match the scope of an incoming message to the scope of generalized belief cluster_to so that multiplication (see mult!) can be applied. The incoming message is extended within the message of cluster_to.

The first form applies when the incoming message is the message of sepset. The second form applies when the incoming message is non-deterministic with canonical parameters Δh, ΔJ, Δg. The third form applies when the incoming message is deterministic with constraint matrix R, offset c, and multiplicative constant exp(g) (i.e. exp(g)⋅δ(Rᵀx - c))

source

PhyloGaussianBeliefProp.factor_hybridnode — Method

factor_hybridnode(evolutionarymodel, ts::AbstractVector, γs)
factor_tree_degeneratehybrid(model,  t0::Real,           γs)

Canonical parameters h,J,g of factor ϕ(X₀, X₁, X₂, ...) from the evolutionary model for a hybrid node: where X₀ is the state at the hybrid node and X₁, X₂, ... the states of the parent nodes. Warning: γs is modified in placed, changed to [1 -γs].

It is assumed that the conditional mean is a simple weighted average:

$E[X_0 | X_1, X_2, ...] = \sum_k \gamma_k X_k = q \mathrm{vec}(X_1,X_2,...) + \omega$

where q has one block for each parent, and each block is diagonal scalar: $\gamma_k I_p$. More complex models could consider adding a shift ω to the conditional mean.

If all the parent hybrid edges have length 0, then it is assumed that the model gives a degenerate distribution, with 0 conditional variance. More complex models could consider adding a hybrid conditional variance Σ.

The first form assumes that at least 1 parent edge length is positive, with conditional variance $\sum_k \gamma_k^2 V_k$ where $V_k$ is the conditional variance from the kth parent edge.
The second form can be used in case all parent edges have 0 length, to integrate out the hybrid node state and the factor ϕ(X₀, X₁, X₂, ...) when X₀ is its child state, along an edge of length t0 between the hybrid node and its child. This second form is appropriate when this hybrid's child is a tree node, and t0>0.`

In h and J, the first p coordinates are for the hybrid (or its child) and the last coordinates for the parents, in the same order in which the edge lengths and γs are given.

source

PhyloGaussianBeliefProp.factor_root — Method

factor_root(m::EvolutionaryModel)

Canonical parameters h,J,g of the prior density at the root, from model m. Assumes that isrootfixed(m) returns false (in which case the root value should be absorbed as evidence and the root removed from scope). More strongly, the root variance is assumed to be invertible, in particular, traits are all non-fixed at the root.

The prior is improper if the prior variance is infinite. In this case this prior is not a distribution (total probability ≠ 1) but is taken as the constant function 1, which corresponds to h,J,g all 0 (and an irrelevant mean).

If the root variance is not invertible (e.g., fixed root), this function fails and should never be called (see isrootfixed)

source

PhyloGaussianBeliefProp.factor_treeedge — Method

factor_treeedge(evolutionarymodel, edge)

Canonical parameters h,J,g of factor ϕ(X0,X1) from the given evolutionary model along one edge, where X₀ is the state of the child node and X₁ the state of the parent node. In h and J, the first p coordinates are for the child and the last p for the parent, where p is the number of traits (determined by the model).

Under the most general linear Gaussian model, X₀ given X₁ is Gaussian with conditional mean q X₁ + ω and conditional variance Σ independent of X₁. The generic fallback method uses functions branch_actualization for q, branch_displacement for ω, branch_precision for Σ⁻¹.

Under a Brownian motion, we have q=I, ω=0, and Σ=tR where R is the model's variance rate and t is the length of the branch. In that case, a specific (more efficient) method is implemented, and the default fallback is not used.

source

PhyloGaussianBeliefProp.factored_energy — Method

factored_energy(beliefs::ClusterGraphBelief)

Factored energy functional for general cluster graphs (Koller & Friedman 2009), which approximates the evidence lower bound (ELBO), a lower bound for the log-likelihood. It is also called the (negative) Bethe free energy in the context of factor graphs It is the sum of the cluster average energies and entropies, minus the sepset entropies. It is assumed but not checked that beliefs are calibrated (neighbor clusters and sepset beliefs are consistent, used as local marginals).

For a calibrated clique tree, the factored energy is equal to the log-likelihood. For a calibrated cluster graph, it can serve as as approximation.

output: tuple of 3 values, the 3rd being the factored energy: (average energy, approximate entropy, factored energy = -energy + entropy).

References

D. Koller and N. Friedman. Probabilistic graphical models: principles and techniques. MIT Press, 2009. ISBN 9780262013192.

D. M. Blei, A. Kucukelbir, and J. D. McAuliffe. Variational inference: A Review for Statisticians, Journal of the American statistical Association, 112:518, 859-877, 2017, doi: 10.1080/01621459.2017.1285773.

source

PhyloGaussianBeliefProp.free_energy — Method

free_energy(beliefs::ClusterGraphBelief)

negative factored_energy to approximate the negative log-likelihood. The approximation is exact on a clique tree after calibration.

source

PhyloGaussianBeliefProp.getcholesky — Method

getcholesky(J::AbstractMatrix)

Cholesky decomposition of J, assumed to be symmetric positive definite, stored as a PDMat object. Warning: PDMat is not a subtype of Cholesky. PDMats.jl is efficient for structured matrices (e.g diagonal or sparse) and has efficient methods for linear algebra, e.g. \, invquad, X_invA_Xt etc.

source

PhyloGaussianBeliefProp.getcholesky_μ! — Function

getcholesky_μ(J::AbstractMatrix, h)
getcholesky_μ!(belief::AbstractFactor)

Tuple (Jchol, μ) where Jchol is a cholesky representation of J or belief.J and μ is J⁻¹h, used to update belief.μ (by the second method).

source

PhyloGaussianBeliefProp.getcholesky_μ — Method

getcholesky_μ(J::AbstractMatrix, h)
getcholesky_μ!(belief::AbstractFactor)

Tuple (Jchol, μ) where Jchol is a cholesky representation of J or belief.J and μ is J⁻¹h, used to update belief.μ (by the second method).

source

PhyloGaussianBeliefProp.hasdegenerate — Method

hasdegenerate(net)

true if degenerate nodes remain in scope, that is, if there exists a tree edge of length 0, or if there exists a hybrid node with all parent edges of length 0 and with 2 or more children edges, or with 1 child edge of length 0.

source

PhyloGaussianBeliefProp.hybridnode_displacement — Method

hybridnode_displacement(obj::EvolutionaryModel, parentedges::AbstractVector{PN.Edge})
hybridnode_precision(obj::EvolutionaryModel,    parentedges::AbstractVector{PN.Edge})
hybridnode_variance(obj::EvolutionaryModel,     parentedges::AbstractVector{PN.Edge})

Under the most general weighted average Gaussian model, X₀ given its parents X₁, X₂, ... is Gaussian with conditional mean the weighted average of the parents plus a displacement vector ω and conditional variance Σ independent of X₁, X₂, ... . The weights are given by the inheritance probabilities contained in the PN.Edge objects. hybridnode_displacement and hybridnode_variance return, respectively, ω and Σ.

hybridnode_variance and hybridnode_precision should return a matrix of symmetric type. hybridnode_precision defaults to the inverse of hybridnode_variance. hybridnode_displacement and hybridnode_variance default to a vector or matrix of zeros.

source

PhyloGaussianBeliefProp.hybridnode_precision — Function

hybridnode_displacement(obj::EvolutionaryModel, parentedges::AbstractVector{PN.Edge})
hybridnode_precision(obj::EvolutionaryModel,    parentedges::AbstractVector{PN.Edge})
hybridnode_variance(obj::EvolutionaryModel,     parentedges::AbstractVector{PN.Edge})

source

PhyloGaussianBeliefProp.hybridnode_variance — Function

hybridnode_displacement(obj::EvolutionaryModel, parentedges::AbstractVector{PN.Edge})
hybridnode_precision(obj::EvolutionaryModel,    parentedges::AbstractVector{PN.Edge})
hybridnode_variance(obj::EvolutionaryModel,     parentedges::AbstractVector{PN.Edge})

source

PhyloGaussianBeliefProp.init_beliefs_allocate_atroot! — Method

init_beliefs_allocate_atroot!(beliefs, factors, messageresiduals, model,
    node2family, cluster2nodes)

Update the scope and re-allocate memory for cluster & sepset beliefs, factors and messageresiduals to include or exclude the root, depending on whether the root variable is random or fixed in model. To change the dimension of canonical parameters μ,h,J, new memory is allocated and initialized to 0. This function can be used to update beliefs when the root model changes from fixed to non-fixed or vice-versa. It re-allocates less memory than allocatebeliefs (which would need to be followed by init_factors_allocate) because clusters and sepsets that do not have the root are not modified.

Assumptions:

all traits at the root have at least one descendant with non-missing data,
beliefs were previously initialized with a model that had the same number of traits as the current model.

source

PhyloGaussianBeliefProp.init_beliefs_reset! — Method

init_beliefs_reset!(beliefs::Vector{<:AbstractBelief})

Reset all beliefs (which can be cluster and/or sepset beliefs) so that they can later be re-initialized for different model parameters and re-calibrated, without re-allocating memory. For CanonicalBeliefs, all entries of h, J, g are zeroed, with μ left unchanged. For GeneralizedBeliefs, all entries of h, hmsg, Λ, Λmsg, g, gmsg, k, kmsg are zeroed. Q and Qmsg are set to the identity. μ, R, Rmsg, c, cmsg are left unchanged.

source

PhyloGaussianBeliefProp.init_beliefs_reset_fromfactors! — Method

init_beliefs_reset_fromfactors!(beliefs::ClusterGraphBelief)

Reset cluster beliefs to existing factors, and sepset beliefs to h=0, J=0, g=0

This is not used so far, as changing model parameters requires a reset of both factors and beliefs, done by assignfactors!.

source

PhyloGaussianBeliefProp.init_clustergraph — Method

init_clustergraph(T::Type{<:Integer}, clustergraph_method::Symbol)

MetaGraph with an empty base graph (0 vertices, 0 edges), meta-graph data clustergraph_method, edge-weight function counting the length of the edge-data vector, and the following types:

vertex indices in the base graph: T
vertex labels: Symbol
vertex data: Tuple{Vector{Symbol}, Vector{T}} to hold information about the variables (nodes in phylogenetic network) in the cluster (vertex in cluster graph): node names as symbols, and node preorder indices
edge data: Vector{T} to hold information about the sepset: preorder index of nodes in the sepset.

See packages MetaGraphsNext and Graphs.

The empty graph above is of type Graphs.SimpleGraphs.SimpleGraph{T}: undirected, with vertex indices of type T. After addition of n vertices, the vertex indices range from 1 to n, technically in Base.OneTo{T}(n).

source

PhyloGaussianBeliefProp.init_factors_allocate — Method

init_factors_allocate(beliefs::AbstractVector{<:AbstractBelief}, nclusters::Integer)

Vector of nclusters factors of type ClusterFactor, whose canonical parameters and metadata are initialized to be a copy of those in beliefs. Assumption: beliefs[1:nclusters] are cluster beliefs, and beliefs[nclusters+1:end] (if any) are sepset beliefs. This is not checked.

source

PhyloGaussianBeliefProp.init_factors_frombeliefs! — Method

init_factors_frombeliefs!(
    factors,
    beliefs::AbstractVector{<:CanonicalBelief},
    checkmetadata::Bool=false
)

Reset all factors by copying h,J,g from beliefs. Assumption: the cluster beliefs match the factors exactly: for a valid factor index i, beliefs[i] is of cluster type and has the same dimension as factors[i].

Set checkmetadata to true to check that beliefs[i] and factors[i] have the same metadata.

source

PhyloGaussianBeliefProp.init_messagecalibrationflags_reset! — Function

init_messagecalibrationflags_reset!(beliefs::ClusterGraphBelief, reset_kl=true)

Reset all non-empty message residuals in beliefs.

source

PhyloGaussianBeliefProp.init_messagecalibrationflags_reset! — Method

init_messagecalibrationflags_reset!(mr::AbstractResidual, reset_kl::Bool)

For a non-empty message residual mr, reset its iscalibrated_* flags to false, and if reset_kl is true, reset its kldiv to -1. Its ΔJ and Δh fields are not reset here, because they are overwritten during a belief propagation step. Nothing is done for empty messages.

source

PhyloGaussianBeliefProp.init_messageresidual_allocate — Method

init_messageresidual_allocate(beliefs::Vector{B}, nclusters)

Dictionary of 2k residuals of type MessageResidual, whose canonical parameters (Δh,ΔJ) are initialized using MessageResidual, to be of the same size as sepsets in beliefs, where k is length(beliefs) - nclusters. Assumption: the first nclusters beliefs are cluster beliefs, and the next k beliefs are sepset beliefs. This is not checked.

The sepset for edge (label1,label2) is associated with 2 messages, for the 2 directions in which beliefs can be propagated along the edge. The keys for these messages are (label1,label2) and (label2,label1).

source

PhyloGaussianBeliefProp.inscope_onenode — Method

inscope_onenode(node_label, b:AbstractBelief)

AbstractVector: view of the row vector in b's inscope matrix corresponding to node node_label, indicating whether a trait at that node is in scope or not.

source

PhyloGaussianBeliefProp.integratebelief! — Method

integratebelief!(belief::CanonicalBelief)
integratebelief!(belief::GeneralizedBelief)
integratebelief(h,J,g)

(μ, norm) from fully integrating the belief. The first two forms update belief.μ.

For a canonical belief, μ = J⁻¹h and norm = g + (log|2πJ⁻¹| + hᵀJ⁻¹h)/2. For a generalized belief, μ = QΛ⁻¹h + Rc and norm = g + (log|2πΛ⁻¹| + hᵀΛ⁻¹h)/2. μ is the mean of x, where x is the inscope of belief. norm is the normalization constant from integrating out x.

source

PhyloGaussianBeliefProp.integratebelief! — Method

integratebelief!(obj::ClusterGraphBelief, beliefindex)
integratebelief!(obj::ClusterGraphBelief)
integratebelief!(obj::ClusterGraphBelief, clustergraph, nodevector_preordered)

(μ,g) from fully integrating the object belief indexed beliefindex. This belief is modified, with its belief.μ's values updated to those in μ.

The second method uses the first sepset containing a single node. This is valid if the beliefs are fully calibrated (including a pre-order traversal), but invalid otherwise. The third method uses the default cluster containing the root, see default_rootcluster. This is valid if the same cluster was used as the root of the cluster graph, if this graph is a clique tree, and after a post-order traversal to start the calibration.

source

PhyloGaussianBeliefProp.iscalibrated_kl! — Method

iscalibrated_kl!(res::AbstractResidual, atol=1e-5)

True if the KL divergence stored in res.kldiv is within atol of 0; false otherwise. res.iscalibrated_kl is modified accordingly. This KL divergence should have been previously calculated: between a sepset belief, equal to the message that was passed most recently, and its belief just prior to passing that message.

source

PhyloGaussianBeliefProp.iscalibrated_residnorm! — Method

iscalibrated_residnorm!(res::AbstractResidual, atol=1e-5, p::Real=Inf)

True if the canonical parameters res.Δh and res.ΔJ of the message residual have p-norm within atol of 0; false otherwise. res.iscalibrated_resid is updated accordingly.

With p infinite, the max norm is used by default, meaning that res.Δh and res.ΔJ should be close to 0 element-wise.

source

PhyloGaussianBeliefProp.iscalibrated_residnorm — Method

iscalibrated_residnorm(beliefs::ClusterGraphBelief)
iscalibrated_kl(beliefs::ClusterGraphBelief)

True if all edges in the cluster graph have calibrated messages in both directions, in that their latest message residuals have norm close to 0 (residnorm) or KL divergence close to 0 between the message received and prior sepset belief. False if not all edges have calibrated messages.

This condition is sufficient but not necessary for calibration.

Calibration was determined for each individual message residual by iscalibrated_residnorm! and iscalibrated_kl! using some tolerance value.

source

PhyloGaussianBeliefProp.isdegenerate — Method

isdegenerate(node)

true if all parent edges of node have length 0, false otherwise. Intended for hybrid nodes, as tree edges of length 0 should be suppressed before trait evolution analysis.

source

PhyloGaussianBeliefProp.isdegenerate_extendedfamily_covered — Method

isdegenerate_extendedfamily_covered(nodeindex, clustermembers,
    node2family, node2degen, node2fixed)

Check for the absence of an intermediate node in a degenerate extended family, for an input node v represented by its preorder index, looking at its ancestors in the input cluster C represented by its members, a vector of node preorder indices.

Output: tuple of 2 booleans (b1,b2) where

b1 is true if v is degenerate conditional on its ancestors that are present in C, false otherwise (e.g. v is not degenerate conditional on its parents, or if none of its parents are in C)
b2 is true if C is a "good cover" for v in the following sense: either v is not generate given its ancestors in C, or C contains all of v's parents.

By v is "degenerate" we mean that its distribution is deterministic, taking as value a linear (or affine) combination of its ancestor(s)' value(s).

source

PhyloGaussianBeliefProp.isdegenerate_extendedfamily_covered — Method

isdegenerate_extendedfamily_covered(clustermembers, args...)
isdegenerate_extendedfamily_covered(cluster2nodes, args...)

Boolean: true if for each node v in the input cluster C, C contains all intermediate nodes in the degenerate extended family of v. false if this property fails for one or more node v in C.

The first method considers a single cluster, containing clustermembers. The second method consider all clusters in a cluster graph, and returns true if all clusters meet the condition.

We say that C contains all intermediate ancestors for v in v's degenerate extended family if:

for any set of ancestors A ⊆ C such that v is degenerate conditional on A,
for any p intermediate between A and v (that is p is a descendant of A and ancestor of v),
then we have that p ∈ C.

We check that this condition holds for all nodes v in C by checking, more simply, that for any v degenerate given its ancestors in C, all of v's parents are in C.

positional arguments args...: node2family, node2degen, node2fixed.

source

PhyloGaussianBeliefProp.isfamilypreserving — Method

isfamilypreserving(clusters, net)

Tuple (ispreserving, isfamily_incluster):

ispreserving: true (false) if clusters is (is not) family-preserving with respect to net, that is: if each node family (a node and all of its parents) in net is contained in at least 1 cluster in clusters. clusters should be a vector, where each element describes one cluster, given as a vector of preorder indices. Index i corresponds to node number i in net according the node pre-ordering: net.vec_node[i].
isfamily_incluster: vector of BitVectors. isfamily_incluster[i][j] is true (false) if the family of node i is (is not) fully contained in cluster [j]. i is taken as the preorder index of a node in net. ispreserving is true if every element (bit vector) in isfamily_incluster contains at least 1 true value.

Warning: assumes that net is preordered, see PhyloNetworks.preorder!).

See also nodefamilies to get node families.

source

PhyloGaussianBeliefProp.ishybridsinglepositivechild — Method

ishybridsinglepositivechild(node)

true if node is a hybrid node with a single child edge of positive length. If it isdegenerate) (all its parent edges have length 0) and if its child is a tree node, then it could be removed from scope: see unscope.

source

PhyloGaussianBeliefProp.joingraph — Method

joingraph(net, method::JoinGraphStructuring)

See JoinGraphStructuring

source

PhyloGaussianBeliefProp.ltripclustergraph — Method

ltripclustergraph(net, method::LTRIP)

See LTRIP

source

PhyloGaussianBeliefProp.marginalize! — Method

marginalize!(cluster_from::GeneralizedBelief, sepset::AbstractBelief)
marginalize!(cluster_from::GeneralizedBelief, keep_index)

Marginalize a generalized belief by integrating out one or more inscope variables of cluster_from. The first form integrates out variables that are not shared by an adjacent (this is not checked!) sepset. The second form integrates out variables at indices keep_index. The resulting marginal is stored in the message of cluster_from.

source

PhyloGaussianBeliefProp.marginalize — Method

marginalize(belief::AbstractFactor, keep_index)
marginalize(h,J,g, keep_index, beliefmetadata)
marginalize(h,J,g, keep_index, integrate_index, beliefmetadata)

Canonical form (h,J,g) of the input belief, after all variables except those at indices keep_index have been integrated out. If we use I and S subscripts to denote subvectors and submatrices at indices to integrate out (I: integrate_index) and indices to keep (S: save for sepset, keep_index) then the returned belief parameters are:

$h_S - J_{S,I} J_I^{-1} h_I$

$J_S - J_{S,I} J_I^{-1} J_{I,S}$

and

$g + (\log|2\pi J_I^{-1}| + h_I^{T} J_I^{-1} h_I)/2 .$

These operations fail if the Cholesky decomposition of $J_I$ fails. In that case, an error of type BPPosDefException is thrown with a message about the beliefmetadata, which can be handled by downstream functions.

source

PhyloGaussianBeliefProp.moralize — Function

moralize!(net::HybridNetwork, prefix="I")
moralize(net)

Undirected graph g of type MetaGraph with the same nodes as in net, labelled by their names in net, with extra edges to moralize the graph, that is, to connect any nodes with a common child. Node data, accessed as g[:nodelabel], is their index in the network's preordering. Edge data, accessed as g[:label1, :label2] is a type to indicate if the edge was an original tree edge or hybrid edge, or added to moralize the graph. Another type, not used here, if for fill edges that may need to be added to triangulate g (make it chordal).

The first version modifies net to name its internal nodes (used as labels in g) and to create or update its node preordering, then calls the second version.

source

PhyloGaussianBeliefProp.moralize! — Function

moralize!(net::HybridNetwork, prefix="I")
moralize(net)

The first version modifies net to name its internal nodes (used as labels in g) and to create or update its node preordering, then calls the second version.

source

PhyloGaussianBeliefProp.mult! — Method

mult!(cluster_to::GeneralizedBelief, sepset::GeneralizedBelief)
mult!(cluster_to::GeneralizedBelief)
mult!(cluster_to::GeneralizedBelief, upind, Δh, ΔJ, Δg)
mult!(cluster_to::GeneralizedBelief, upind, R, c, g)
mult!(cluster_to::CanonicalBelief, upind, Δh, ΔJ, Δg)

Multiply an incoming message into a cluster belief. The incoming message is extended (see extend!) to the scope of the receiving cluster. The cluster belief is updated to the product.

The first form multiplies the message of sepset into cluster_to. The second form multiplies the message of cluster_to into its belief. The third and fourth form respectively multiply a non-deterministic and deterministic factor into cluster_to, by directly specifying canonical parameters or constraint parameters. The fifth form is analogous to the third one.

source

PhyloGaussianBeliefProp.nodefamilies — Method

nodefamilies(net)

Vector v with elements of type Vector{T}. v[i] first lists i, the preorder index of node net.vec_node[i], followed by the preorder index of all of this node's parents in net, sorted in decreasing order. Due to pre-ordering, all of the parents' indices are listed after the node (their child) index. A given node and its parents is called a "node family".

Warning: net is assumed preordered, see preprocessnet! and PhyloNetworks.preorder!).

source

PhyloGaussianBeliefProp.nodesubtree — Method

nodesubtree(clustergraph::MetaGraph, node_symbol)
nodesubtree(clustergraph::MetaGraph, node_symbol, node_preorderindex)

MetaGraph subgraph of clustergraph induced by the clusters and sepsets containing the node labelled node_symbol (of specified preorder index if known, not checked). If clustergraph satisfies the generalized running-intersection property, then this subgraph should be a tree.

source

PhyloGaussianBeliefProp.nodesubtree_clusterlist — Method

nodesubtree_clusterlist(clustergraph::MetaGraph, nodesymbol)

Spanning tree of the subgraph of clustergraph induced by the clusters and sepsets that contain the node labelled nodesymbol, see nodesubtree. If clustergraph satisfies the generalized running-intersection property, then this subgraph should be a tree anyway, but this is not assumed (via extracting a spanning tree).

Output: tuple of 4 vectors describing a depth-first search traversal of the tree, starting from a cluster containing the node of smallest preorder index, as determined by default_rootcluster. Each element in this tuple is a vector: see spanningtree_clusterlist.

source

PhyloGaussianBeliefProp.params — Method

params(m::EvolutionaryModel)

Tuple of parameters, the same that can be used to construct the evolutionary model.

source

PhyloGaussianBeliefProp.params_optimize — Method

params_optimize(m::EvolutionaryModel)

Tuple of transformed parameters for model m, in an unconstrained space that can be used for numerical optimization.

source

PhyloGaussianBeliefProp.params_original — Method

params_original(m::EvolutionaryModel, transformedparams::AbstractArray)

Tuple of parameters for model m in the original space, corresponding to back-transformed parameters of transformedparams, and that can be used as input to the model constructor.

source

PhyloGaussianBeliefProp.parentinformation — Method

parentinformation(node, net)

Tuple of (edge length, edge γ, index of parent node in net.vec_node) for all parent edges of node. Assumes that net has been preordered before.

source

PhyloGaussianBeliefProp.preprocessnet! — Function

preprocessnet!(net::HybridNetwork, prefix="I")

Create or update the pre-ordering of nodes in net using PhyloNetworks.preorder!, then name unnamed internal nodes, with names starting with prefix. Nodes in phylogenetic networks need to have names to build cluster graphs, in which a cluster contains network nodes. Pre-ordering is also used to traverse the network for building cluster graphs.

See clustergraph!.

source

PhyloGaussianBeliefProp.propagate_1traversal_postorder! — Function

propagate_1traversal_postorder!(beliefs::ClusterGraphBelief, spanningtree...)
propagate_1traversal_preorder!(beliefs::ClusterGraphBelief,  spanningtree...)

Messages are propagated along the spanning tree, from the tips to the root by propagate_1traversal_postorder! and from the root to the tips by propagate_1traversal_preorder!.

The "spanning tree" should be a tuple of 4 vectors as output by spanningtree_clusterlist, meant to list edges in preorder. Its nodes (resp. edges) should correspond to clusters (resp. sepsets) in beliefs: labels and indices in the spanning tree information should correspond to indices in beliefs. This condition holds if beliefs are produced on a given cluster graph and if the tree is produced by spanningtree_clusterlist on the same graph.

Optional positional arguments after spanning tree, in this order (default value):

verbose (true): log error messages about degenerate messages that failed to be passed.
update_residualnorm (true): to update each message residual's iscalibrated_resid
update_residualkldiv (false): to update each message residual's field kldiv: KL divergence between the new and old sepset beliefs, normalized to be considered as (conditional) distributions.

source

PhyloGaussianBeliefProp.propagate_belief! — Method

propagate_belief!(cluster_to, sepset, cluster_from, residual)
propagate_belief!(cluster_to, sepset, cluster_from)

Propagate a message from cluster_from to neighbor cluster_to through the connecting sepset, and update the beliefs of sepset and cluster_to. cluster_from's belief is marginalized to the inscope variables of sepset to produce the "outgoing message". The outgoing message is divided by sepset's existing belief, and the quotient is received by cluster_to as the "incoming message". Finally, sepset's belief is updated to the outgoing message, and cluster_to's belief is updated by multiplying in the incoming message.

The incoming message reflects the change in sepset's belief. We quantify this change by its canonical parameters. This is h, J, g if sepset is a canonical belief, or Qh, QΛQᵀ, g if it is a generalized belief (we ignore the constraint part).

The first form stores the potential and precision canonical parameters of the incoming message in residual, but returns nothing. The second form returns the canonical parameters of the incoming message, and dispatches on the types of cluster_from, sepset and cluster_to. There are 4 cases: (1) sepset, cluster_to and cluster_from are all canonical beliefs (2) sepset, cluster_to and cluster_from are all generalized beliefs (3) sepset and cluster_from are canonical beliefs, and cluster_to is a generalized belief (4) sepset and cluster_to are canonical beliefs, and cluster_to is a generalized belief

Degeneracy

Propagating a belief requires the precision of the cluster_from's belief (i.e. J if it is canonical, or QΛQᵀ if it is generalized) to have a non-degenerate submatrix JI for the variables to be integrated out. Problems arise if JI has one or more 0 eigenvalues, or infinite values (see marginalize). If so, a BPPosDefException is returned but not thrown. Downstream functions should try & catch these failures, and decide how to proceed. See regularizebeliefs_bycluster! to reduce the prevalence of degeneracies.

Warnings

the μ parameter is not updated
Does not check that cluster_from and cluster_to are of cluster type, or that sepset is of sepset type, but does check that the labels and scope of sepset are included in each cluster.

source

PhyloGaussianBeliefProp.regularizebeliefs_1clustersepset! — Method

regularizebeliefs_1clustersepset!(cluster::AbstractBelief, sepset::AbstractBelief, ϵ)

Modify beliefs of 1 cluster and 1 of sepset, assumed to be neighbors in a cluster graph (such that the sepset's scope is a subset of the cluster's scope) by adding ϵ to all diagonal elements of the sepset precision matrice J and to the corresponding diagonal elements of the cluster precision, so as to preserve the full graphical model.

Used by regularizebeliefs_bycluster! and regularizebeliefs_onschedule!.

source

PhyloGaussianBeliefProp.regularizebeliefs_bycluster! — Method

regularizebeliefs_bycluster!(beliefs::ClusterGraphBelief, clustergraph)
regularizebeliefs_bycluster!(beliefs::ClusterGraphBelief, clustergraph, cluster_label)

Modify beliefs of cluster graph by adding positive values to some diagonal elements of precision matrices J, while preserving the full graphical model (product of cluster beliefs over product of sepset beliefs, invariant during belief propagation) so that all beliefs are non-degenerate.

This regularization could be done after initialization with assignfactors! for example.

The goal is that at each later step of belief propagation, the sending cluster has a non-degenerate (positive definite) precision matrix for the variables to be integrated, so that the message to be sent is well-defined (i.e. can be computed) and positive semidefinite.

Algorithm

For each cluster Ci (or for only for 1 cluster, labeled cluster_label):

Find a regularization parameter adaptively for that cluster: ϵ = maximum absolute value of all entries in Ci's precision matrix J, and of the machine epsilon. Then loop through its incident edges:
For each neighbor cluster Cj and associated sepset Sij, add ϵ > 0 to the diagonal entries of Ci's precision matrix J corresponding to the traits in Sij.
To preserve the graphical model's joint distribution for the full set of variables (invariant during BP), the same ϵ is added to each diagonal entry of Sij's precision matrix.

source

PhyloGaussianBeliefProp.regularizebeliefs_bynodesubtree! — Method

regularizebeliefs_bynodesubtree!(beliefs::ClusterGraphBelief, clustergraph)

The goal is the same as regularizebeliefs_bycluster! and regularizebeliefs_onschedule!, but the algorithm is different.

Algorithm

For each node (or variable) v:

Consider the subgraph T of clusters & edges that have v. If clustergraph has the generalized running-intersection property, this subgraph is a tree.
Root T at a cluster containing a node with the largest postorder index.
Find a regularization parameter adaptively for that node: ϵ = maximum absolute value of all entries in Ci's precision matrix J, and of the machine epsilon, over clusters Ci in T.
For each trait j, find the subtree Tj of clusters and sepsets that have trait j of node v in their scope.
For each cluster and sepset in Tj, except at its cluster root: add ϵ on the diagonal of their belief precision matrix J corresponding to trait j of node v.
Check that graphical model invariant is preserved, that is: for each trait j, ϵ was added to the same number of clusters as number of sepsets.

source

PhyloGaussianBeliefProp.regularizebeliefs_onschedule! — Method

regularizebeliefs_onschedule!(beliefs::ClusterGraphBelief, clustergraph)

Modify beliefs of the cluster graph so that all beliefs are non-degenerate, by (1) adding positive values to some diagonal elements of precision matrices J while preserving the full graphical model, and (2) propagating messages by marginalizing cluster beliefs.

The goal is the same as regularizebeliefs_bycluster! and regularizebeliefs_bynodesubtree!, but the algorithm is different.

Algorithm

The outcomes depends on a "schedule", that is, on an order in which clusters are considered. Here, the method simply takes the order given by labels(clustergraph). Mark all messages Ci → Cj as un-sent. For each cluster Ci in order:

For each neighbor cluster Cj of Ci, if the message Cj → Ci has not been sent, then:
- add ϵ_i to the diagonal entries of sepset Sij's precision matrix Jij
- add ϵ_i to the diagonal entries of Ci's precision matrix Ji that correspond to the inscope variables of Sij
- mark Cj → Ci as sent.
Notes:
- ϵ_i is taken as the maximum of all absolute values of entries in Ji, and of the square root of machine epsilon.
- This is equivalent to "sending" a default message, that is not computed from Cj's belief, with diagonal precision matrix ϵ⋅I and other canonical parameters h=0, g=0.
For each neighbor cluster Cj of Ci, if Ci → Cj has not been sent then
- send a message from Ci to Cj by marginalizing Ci's belief (that is, using belief propagation).
- mark the message Ci → Cj as sent.

source

PhyloGaussianBeliefProp.residual_kldiv! — Method

residual_kldiv!(residual::AbstractResidual, sepset::AbstractFactor,
    canonicalparams::Tuple)

Update residual.kldiv with the Kullback-Leibler divergence between a message sent through a sepset (normalized to a probability distribution), and the sepset belief before the belief update (also normalized). sepset should contain the updated belief, and residual the difference in the J and h parameters due to the belief update (after - before), such that the previous belief is: sepset belief - residual. As a side product, sepset.μ is updated.

Output: true if the KL divergence is close to 0, false otherwise. See iscalibrated_kl! for the tolerance.

If the current or previous sepset belief is degenerate, in the sense that its precision matrix is not positive definite and the belief cannot be normalized to a proper distribution, then residual and sepset are not updated, and false is returned. No warning and no error is sent, because sepset beliefs are initialized at 0 and this case is expected to be frequent before enough messages are sent.

Calculation:

sepset after belief-update (i.e. message sent): C(x | Jₘ, hₘ, gₘ) ∝ density for X ~ 𝒩(μ=Jₘ⁻¹hₘ, Σ=Jₘ⁻¹) sepset before belief-update: C(x | Jₛ, hₛ, gₛ) residual: ΔJ = Jₘ - Jₛ, Δh = hₘ - hₛ p: dimension of X (number of variables: number of nodes * number of traits). Below, we use the notation Δg for the change in constants to normalize each message, which is not gₘ-gₛ because the stored beliefs are not normalized.

KL(C(Jₘ, hₘ, _) || C(Jₛ, hₛ, _))
= Eₘ[log C(x | Jₘ,hₘ,_)/C(x | Jₛ,hₛ,_)] where x ∼ C(Jₘ,hₘ,_)
= Eₘ[-(1/2) x'ΔJx + Δh'x + Δg)]
= (  tr(JₛJₘ⁻¹) - p + (μₛ-μₘ)'Jₛ(μₛ-μₘ) + log(det(Jₘ)/det(Jₛ))  ) /2

See also: average_energy!, which only requires the sepset belief to be positive definite.

source

PhyloGaussianBeliefProp.scopeindex — Method

scopeindex(node_label, sepset::AbstractBelief, cluster::AbstractBelief)

Tuple of 2 index vectors (ind_in_sepset, ind_in_cluster) in the sepset and in the cluster in-scope variables (that is, in the cluster's μ,h,J vectors and matrices) of the shared in-scope traits for node node_label, such that sepset.μ[ind_in_sepset] correspond to all the node's traits in the sepset scope and cluster.μ[ind_in_cluster] correspond to the same traits in the cluster scope, which may be a subset of all the node's traits in scope for that cluster. If not, an error is thrown.

source

PhyloGaussianBeliefProp.scopeindex — Method

scopeindex(j::Integer, belief::AbstractBelief)

Indices in the belief's μ,h,J vectors and matrices of the traits in scope for node indexed j in nodelabels(belief).

source

PhyloGaussianBeliefProp.scopeindex — Method

scopeindex(sepset::AbstractBelief, cluster::AbstractBelief)

Indices ind, in the cluster in-scope variables (that is, in the cluster's μ,h,J vectors and matrices) of the sepset in-scope variables, such that cluster.μ[ind] correspond to the same variables as sepset.μ, for example. These sepset in-scope variables can be a subset of traits for each node in the sepset, as indicated by inscope(sepset).

Warning: the labels in the sepset are assumed to be ordered as in the cluster.

An error is returned if sepset contains labels not in the cluster, or if a variable in the sepset's scope is not in scope in the cluster.

source

PhyloGaussianBeliefProp.scopeindex — Method

scopeindex(node_labels::Union{Tuple,AbstractVector}, belief::AbstractBelief)

Indices in the belief's μ,h,J vectors and matrices of the variables for nodes labeled node_labels. The belief's inscope matrix of booleans says which node (column) and trait (row) is in the belief's scope. These variables are vectorized by stacking up columns, that is, listing all in-scope traits of the first node, then all in-scope traits of the second node etc.

source

PhyloGaussianBeliefProp.shrinkdegenerate_treeedges — Method

shrinkdegenerate_treeedges(net::HybridNetwork)

Network obtained from net with any non-external tree edge of length 0 suppressed. Returns an error if any edge length is missing or negative, or if any γ is missing or non positive. It is assumed that γs sum to 1 across partner hybrid edges.

source

PhyloGaussianBeliefProp.spanningtree_clusterlist — Method

spanningtree_clusterlist(clustergraph, root_index)
spanningtree_clusterlist(clustergraph, nodevector_preordered)

Build the depth-first search spanning tree of the cluster graph, starting from the cluster indexed root_index in the underlying simple graph; find the associated topological ordering of the clusters (preorder); then return a tuple of these four vectors:

parent_labels: labels of the parents' child clusters. The first one is the root.
child_labels: labels of clusters in pre-order, except for the cluster choosen to be the root.
parent_indices: indices of the parent clusters
child_indices: indices of the child clusters, listed in preorder as before.

In the second version in which root_index is not provided, the root of the spanning tree is chosen to be a cluster that contains the network's root. If multiple clusters contain the network's root, then one is chosen containing the smallest number of taxa: see default_rootcluster.

source

PhyloGaussianBeliefProp.spanningtrees_clusterlist — Method

spanningtrees_clusterlist(clustergraph, nodevector_preordered)

Vector of spanning trees for clustergraph, that together cover all edges. Each spanning tree is specified as a tuple of 4 vectors describing a depth-first search traversal of the tree, starting from a cluster that contains the network's root, as in spanningtree_clusterlist.

The spanning trees are iteratively obtained using Kruskal's minimum-weight spanning tree algorithm, with edge weights defined as the number of previous trees covering the edge.

source

PhyloGaussianBeliefProp.triangulate_minfill! — Method

triangulate_minfill!(graph)

Ordering for node elimination, chosen to greedily minimize the number of fill edges necessary to eliminate the node (to connect all its neighbors with each other). Ties are broken by favoring the post-ordering of nodes. graph is modified with these extra fill edges, making it chordal.

source

PhyloGaussianBeliefProp.unscope — Method

unscope(node)

true if node is a hybrid node with a single child edge of positive length, and if its child node is a tree node. If it isdegenerate) (all its parent edges have length 0) then it could be removed from scope: see addtreenode_belowdegeneratehybrid!.

source

PhyloGaussianBeliefProp

Manual

Library