Internal Documentation
Contents
Index
PhyloNetworks.fAbs
PhyloNetworks.ANode
PhyloNetworks.BM
PhyloNetworks.CacheGammaLiNC
PhyloNetworks.CacheLengthLiNC
PhyloNetworks.CacheLengthLiNC
PhyloNetworks.ContinuousTraitEM
PhyloNetworks.EdgeT
PhyloNetworks.MatrixTopologicalOrder
PhyloNetworks.Node
PhyloNetworks.NucleicAcidSubstitutionModel
PhyloNetworks.OptSummary
PhyloNetworks.PagelLambda
PhyloNetworks.QuartetNetwork
PhyloNetworks.QuartetT
PhyloNetworks.ScalingHybrid
PhyloNetworks.StatisticalSubstitutionModel
PhyloNetworks.StatisticalSubstitutionModel
PhyloNetworks.SubstitutionModel
PhyloNetworks.TopologyConstraint
PhyloNetworks.TopologyConstraint
PhyloNetworks.WithinSpeciesCTM
Base.getindex
Base.getindex
PhyloNetworks.P
PhyloNetworks.P!
PhyloNetworks.addAlternativeHybridizations!
PhyloNetworks.addHybridBetweenClades!
PhyloNetworks.addhybridedge!
PhyloNetworks.addhybridedge!
PhyloNetworks.addhybridedgeLiNC!
PhyloNetworks.addindividuals!
PhyloNetworks.addleaf!
PhyloNetworks.adjacentedges
PhyloNetworks.afterOptBL!
PhyloNetworks.afterOptBLAll!
PhyloNetworks.afterOptBLRepeat!
PhyloNetworks.allowrootbelow!
PhyloNetworks.allowrootbelow!
PhyloNetworks.anova
PhyloNetworks.assignhybridnames!
PhyloNetworks.biconnectedcomponent_entrynodes
PhyloNetworks.biconnectedcomponent_exitnodes
PhyloNetworks.blobInfo
PhyloNetworks.breakedge!
PhyloNetworks.calculateObsCFAll!
PhyloNetworks.checkMapDF
PhyloNetworks.checkNumHybEdges!
PhyloNetworks.check_matchtaxonnames!
PhyloNetworks.check_nonmissing_nonnegative_edgelengths
PhyloNetworks.checknetwork_LiNC!
PhyloNetworks.checkspeciesnetwork!
PhyloNetworks.cleantaxonname
PhyloNetworks.constraintviolated
PhyloNetworks.defaultsubstitutionmodel
PhyloNetworks.deleteEdge!
PhyloNetworks.deleteLeaf!
PhyloNetworks.deleteNode!
PhyloNetworks.deletehybridedge!
PhyloNetworks.deletehybridedgeLiNC!
PhyloNetworks.descendants
PhyloNetworks.directionalconflict
PhyloNetworks.discrete_backwardlikelihood_trait!
PhyloNetworks.discrete_corelikelihood!
PhyloNetworks.discrete_corelikelihood_trait!
PhyloNetworks.displayedNetworks!
PhyloNetworks.edgerelation
PhyloNetworks.fliphybrid!
PhyloNetworks.fliphybrid!
PhyloNetworks.fliphybridedgeLiNC!
PhyloNetworks.fuseedgesat!
PhyloNetworks.gammaZero!
PhyloNetworks.getDataValue!
PhyloNetworks.getGammas
PhyloNetworks.getHeights
PhyloNetworks.getNeighborsTarget
PhyloNetworks.getTipSubmatrix
PhyloNetworks.getlengths
PhyloNetworks.getparameters
PhyloNetworks.getparamindex
PhyloNetworks.hardwiredClusterDistance_unrooted
PhyloNetworks.hashybridladder
PhyloNetworks.hybrid3cycle
PhyloNetworks.hybridEdges
PhyloNetworks.hybridEdges
PhyloNetworks.hybridatnode
PhyloNetworks.inheritanceWeight
PhyloNetworks.initializeWeightsFromLeaves!
PhyloNetworks.initializeWeightsFromLeavesSoftwired!
PhyloNetworks.isconnected
PhyloNetworks.isdescendant
PhyloNetworks.isdescendant_undirected
PhyloNetworks.ladderpartition
PhyloNetworks.lambda
PhyloNetworks.lambda!
PhyloNetworks.learnlabels
PhyloNetworks.leaststableancestor
PhyloNetworks.majoredgelength
PhyloNetworks.majoredgematrix
PhyloNetworks.makemissing!
PhyloNetworks.mapAllelesCFtable!
PhyloNetworks.mapindividuals
PhyloNetworks.maxParsimonyNetRun1
PhyloNetworks.maxParsimonyNetRun1!
PhyloNetworks.minorreticulationgamma
PhyloNetworks.minorreticulationlength
PhyloNetworks.minorreticulationmatrix
PhyloNetworks.moveHybrid!
PhyloNetworks.moveTargetUpdate!
PhyloNetworks.moveroot!
PhyloNetworks.nameinternalnodes!
PhyloNetworks.nchoose1234
PhyloNetworks.nj!
PhyloNetworks.nni_LiNC!
PhyloNetworks.nnimax
PhyloNetworks.norootbelow!
PhyloNetworks.optBL!
PhyloNetworks.optTopLevel!
PhyloNetworks.optTopRun1!
PhyloNetworks.optTopRuns!
PhyloNetworks.optimizeallgammas_LiNC!
PhyloNetworks.optimizealllengths_LiNC!
PhyloNetworks.optimizegamma_LiNC!
PhyloNetworks.optimizelength_LiNC!
PhyloNetworks.optimizelocalBL_LiNC!
PhyloNetworks.optimizelocalgammas_LiNC!
PhyloNetworks.optimizestructure!
PhyloNetworks.pairwiseTaxonDistanceGrad
PhyloNetworks.parseEdgeData!
PhyloNetworks.parseHybridNode!
PhyloNetworks.parseRemainingSubtree!
PhyloNetworks.parseTreeNode!
PhyloNetworks.parsimonyBottomUpFitch!
PhyloNetworks.parsimonyBottomUpGF!
PhyloNetworks.parsimonyBottomUpSoftwired!
PhyloNetworks.parsimonyDiscreteFitch
PhyloNetworks.parsimonySummaryFitch
PhyloNetworks.parsimonyTopDownFitch!
PhyloNetworks.phyLiNC
PhyloNetworks.phyLiNC!
PhyloNetworks.phyLiNCone!
PhyloNetworks.posterior_loghybridweight
PhyloNetworks.posterior_logtreeweight
PhyloNetworks.problem4cycle
PhyloNetworks.proposedTop!
PhyloNetworks.quartetdata_columnnames
PhyloNetworks.quartetrank
PhyloNetworks.readCSVtoArray
PhyloNetworks.readFastaToArray
PhyloNetworks.readInputData
PhyloNetworks.readSubtree!
PhyloNetworks.readnexus_assigngammas!
PhyloNetworks.readnexus_extractgamma
PhyloNetworks.readnexus_translatetable
PhyloNetworks.readnexuscomment
PhyloNetworks.readnodename
PhyloNetworks.recursionPostOrder
PhyloNetworks.recursionPreOrder
PhyloNetworks.recursionPreOrder!
PhyloNetworks.removeHybrid!
PhyloNetworks.resetEdgeNumbers!
PhyloNetworks.resetNodeNumbers!
PhyloNetworks.rezip_canonical!
PhyloNetworks.sameTaxa
PhyloNetworks.sampleBootstrapTrees
PhyloNetworks.sampleCFfromCI
PhyloNetworks.setBLGammaParsimony!
PhyloNetworks.setBranchLength!
PhyloNetworks.setGammaBLfromGammaz!
PhyloNetworks.setGammas!
PhyloNetworks.setNonIdBL!
PhyloNetworks.setalpha!
PhyloNetworks.seteigeninfo!
PhyloNetworks.seteigeninfo!
PhyloNetworks.seteigeninfo!
PhyloNetworks.seteigeninfo!
PhyloNetworks.seteigeninfo!
PhyloNetworks.setlengths!
PhyloNetworks.setparameters!
PhyloNetworks.setpinv!
PhyloNetworks.setpinvalpha!
PhyloNetworks.setrates!
PhyloNetworks.showQ
PhyloNetworks.showdata
PhyloNetworks.shrink2cycleat!
PhyloNetworks.shrink3cycleat!
PhyloNetworks.shrinkedge!
PhyloNetworks.sort_stringasinteger!
PhyloNetworks.startingBL!
PhyloNetworks.startingrate
PhyloNetworks.startree_newick
PhyloNetworks.symmetricnet_newick
PhyloNetworks.symmetricnet_newick
PhyloNetworks.symmetrictree_newick
PhyloNetworks.synchronizePartnersData!
PhyloNetworks.taxadiff
PhyloNetworks.traitlabels2indices
PhyloNetworks.traverseContainRoot!
PhyloNetworks.undoGammaz!
PhyloNetworks.unzip_canonical!
PhyloNetworks.unzipat_canonical!
PhyloNetworks.updateBL!
PhyloNetworks.updateContainRoot!
PhyloNetworks.updatePostOrder!
PhyloNetworks.updatePreOrder!
PhyloNetworks.updateSSM!
PhyloNetworks.updateSSM_root!
PhyloNetworks.update_logtrans
PhyloNetworks.update_logtrans
PhyloNetworks.updatecache_edge!
PhyloNetworks.updateconstraintfields!
PhyloNetworks.updateconstraints!
PhyloNetworks.writeTopologyLevel1
StatsAPI.coeftable
StatsAPI.confint
StatsAPI.deviance
StatsAPI.deviance
StatsAPI.fit
StatsAPI.loglikelihood
StatsAPI.nobs
StatsAPI.nulldeviance
StatsAPI.stderror
StatsAPI.vcov
StatsModels.isnested
types
PhyloNetworks.ANode
— TypeANode
Abstract node. An object of type EdgeT
has a node
attribute, which is an vector of 2 objects of some subtype of ANode
. The concrete type Node
is a subtype of ANode
, and has an edge
attribute, which is vector of Edge
objects (where Edge is an alias for EdgeT{Node}).
PhyloNetworks.BM
— TypeBM(λ)
Brownian Motion, subtype of ContinuousTraitEM
, to model the population mean of a trait (or of the residuals from a linear model). Under the BM model, the population (or species) means have a multivariate normal distribution with covariance matrix = σ²λV, where σ² is the between-species variance-rate (to be estimated), and the matrix V is obtained from sharedPathMatrix
(net)[:Tips].
λ is set to 1 by default, and is immutable. In future versions, λ may be used to control the scale for σ².
On a tree, V is the length of shared ancestry. On a network, the BM model assumes that the trait at a hybrid node is the weighted average of its immediate parents (plus possibly a fixed shift). The weights are the proportion of genes inherited from each parent: the γ parameters of hybrid edges.
PhyloNetworks.CacheGammaLiNC
— TypeCacheGammaLiNC
Type to store intermediate values during γ optimization, to limit time spent on garbage collection.
PhyloNetworks.CacheLengthLiNC
— TypeCacheLengthLiNC
Type to store intermediate values during optimization of one branch length, to limit time spent on garbage collection. Re-used across branches. See constructor below, from a StatisticalSubstitutionModel
.
PhyloNetworks.CacheLengthLiNC
— MethodCacheLengthLiNC(obj::SSM,
ftolRel::Float64, ftolAbs::Float64, xtolRel::Float64, xtolAbs::Float64,
maxeval::Int)
Constructor from a StatisticalSubstitutionModel
object, with tolerance values and parameters controlling the search.
PhyloNetworks.ContinuousTraitEM
— TypeContinuousTraitEM
Abstract type for evolutionary models for continuous traits, using a continuous-time stochastic process on a phylogeny.
For subtypes, see BM
, PagelLambda
, ScalingHybrid
.
Each of these subtypes/models has the field lambda
, whose default value is 1.0. However, the interpretation of this field differs across models.
PhyloNetworks.EdgeT
— TypeEdgeT{node type}
Edge = EdgeT{Node}
Edge(number, length=1.0)
Data structure for an edge and its various attributes. Most notably:
number
(integer): serves as unique identifier; remains unchanged when the network is modified, with a nearest neighbor interchange for examplenode
: vector ofNode
s, normally just 2 of themisChild1
(boolean):true
ifnode[1]
is the child node of the edge, false ifnode[1]
is the parent node of the edgelength
: branch lengthhybrid
(boolean): whether the edge is a tree edge or a hybrid edge (in which caseisChild1
is important, even if the network is semi-directed)gamma
: proportion of genetic material inherited by the child node via the edge; 1.0 for a tree edgeisMajor
(boolean): whether the edge is the major path to the child node;true
for tree edges, since a tree edge is the only path to its child node; normally true ifgamma>0.5
.
and other fields, used very internally
PhyloNetworks.MatrixTopologicalOrder
— TypeMatrixTopologicalOrder
Matrix associated to an HybridNetwork
in which rows/columns correspond to nodes in the network, sorted in topological order.
The following functions and extractors can be applied to it: tipLabels
, obj[:Tips]
, obj[:InternalNodes]
, obj[:TipsNodes]
(see documentation for function getindex(::MatrixTopologicalOrder, ::Symbol)
).
Functions sharedPathMatrix
and simulate
return objects of this type.
The MatrixTopologicalOrder
object has fields: V
, nodeNumbersTopOrder
, internalNodeNumbers
, tipNumbers
, tipNames
, indexation
. Type in "?MatrixTopologicalOrder.field" to get documentation on a specific field.
PhyloNetworks.Node
— TypeNode(number, leaf)
Node(number, leaf, hybrid)
Data structure for a node and its various attributes. Most notably:
number
(integer): serves as unique identifier; remains unchanged when the network is modified, with a nearest neighbor interchange for exampleleaf
(boolean): whether the node is a leaf (with data typically) or an internal node (no data typically)name
(string): taxon name for leaves; internal node may or may not have a nameedge
: vector ofEdge
s that the node is attached to; 1 if the node is a leaf, 2 if the node is the root, 3 otherwise, and potentially more if the node has a polytomyhybrid
(boolean): whether the node is a hybrid node (with 2 or more parents) or a tree node (with a single parent)
Other more internal attributes include:
isBadDiamondI
andisBadDiamondII
(booleans): whether the node is a hybrid node where the reticulation forms a cycle of 4 nodes (diamond), and where both parents of the hybrid nodes are connected to a leaf. In a bad diamond of type I, the hybrid node itself is also connected to a leaf but the common neighbor of the 2 hybrid's parents is not connected to a leaf. In a bad diamond of type II, the hybrid node has an internal node as child, and the common neighbor of the 2 hybrid's parents is connected to a leaf.isBadTriangle
,isVeryBadTriangle
andisExtBadTriangle
(booleans): true if the reticulation forms a cycle of 3 nodes (triangle) and depending on the number of leaves attached these 3 nodes. The triangle means that the 2 parents of the hybrid node are directly related: one is the child of the other.isBadTriangle
is true if the triangle is "good", as per Solís-Lemus & Ané (2016), that is, if all 3 nodes in the cycle are not connected to any leaves (the reticulation is detectable from quartet concordance factors, even though all branch lengths are not identifiable).isVeryBadTriangle
is true if 2 (or all) of the 3 nodes are connected to a leaf, in which case the reticulation is undetectable from unrooted gene tree topologies (thus it's best to exclude these reticulations from a search).isBadTriangle
is true if exactly 1 of the 3 nodes is connected to a leaf.
For details see Solís-Lemus & Ané (2016, doi:10.1371/journal.pgen.1005896)
PhyloNetworks.NucleicAcidSubstitutionModel
— TypeNucleicAcidSubstitutionModel
Adapted from SubstitutionModels.jl in BioJulia. The same Q
and P
function names are used for the transition rates and probabilities.
PhyloNetworks.OptSummary
— TypeOptSummary{T<:AbstractFloat}
Summary of an NLopt
optimization. Idea and code taken from MixedModels
. T
is the type of the function argument(s) and of the function value.
PhyloNetworks.PagelLambda
— TypePagelLambda(λ)
Pagel's λ model, subtype of ContinuousTraitEM
, with covariance matrix σ²V(λ). σ² is the between-species variance-rate (to be estimated), and V(λ) = λV + (1-λ)T, where V is the covariance under a Brownian motion BM
and T is a diagonal matrix containing the total branch length elapsed from the root to each leaf (if the phylogeny is a tree, or more generally if the network is time consistent: the time from the root to a given node does not depend on the path).
λ ∈ [0,1] is mutable and may be optimized. It is a measure of phylogenetic signal, that is, how important the given network is for explaining variation in the response. When λ=1, the PagelLambda
model reduces to the BM
model.
PhyloNetworks.QuartetNetwork
— TypeQuartetNetwork(net::HybridNetwork)
Subtype of Network
abstract type. A QuartetNetwork
object is an internal type used to calculate the expected CFs of quartets on a given network. Attributes of the QuartetNetwork
objects need not be updated at a given time (see below).
The procedure to calculate expected CFs for a given network is as follows:
- A
QuartetNetwork
object is created for eachQuartet
usingextractQuartet!(net,d)
fornet::HybridNetwork
andd::DataCF
- The vector
d.quartet
has all theQuartet
objects, each with aQuartetNetwork
object (q.qnet
). Attibutes inQuartetNetwork
are not updated at this point - Attributes in
QuartetNetwork
are partially updated when calculating the expected CF (calculateExpCFAll!
). To calculate the expected CF for this quartet, we need to update the attributes:which
,typeHyb
,t1
,split
,formula
,expCF
. To do this, we need to modify theQuartetNetwork
object (i.e. merge edges,...). But we do not want to modify it directly because it is connected to the originalnet
via a map of the edges and nodes, so we use a deep copy:qnet=deepcopy(q.qnet)
and thencalculateExpCFAll!(qnet)
. Attributes that are updated on the originalQuartetNetwork
objectq.qnet
are:q.qnet.hasEdge
: array of booleans of length equal tonet.edge
that shows which identifiable edges and gammas ofnet
(net.ht
) are inqnet
(and still identifiable). Note that the first elements of the vector correspond to the gammas.q.qnet.index
: length should match the number of trues inqnet.hasEdge
. It has the indexes inqnet.edge
from the edges inqnet.hasEdge
. Note that the first elements of the vector correspond to the gammas.q.qnet.edge
: list of edges inQuartetNetwork
. Note that external edges innet
are collapsed when they appear inQuartetNetwork
, so only internal edges map directly to edges innet
q.qnet.expCF
: expected CF for thisQuartet
Why not modify the original QuartetNetwork
? We wanted to keep the original QuartetNetwork
stored in DataCF
with all the identifiable edges, to be able to determine if this object had been changed or not after a certain optimization.
The process is:
- Deep copy of full network to create
q.qnet
forQuartet q
. ThisQuartetNetwork
object has only 4 leaves now, but does not have merged edges (the identifiable ones) so that we can correspond to the edges in net. ThisQuartetNetwork
does not have other attributes updated. - For the current set of branch lengths and gammas, we can update the attributes in
q.qnet
to compute the expected CF. The functions that do this will "destroy" theQuartetNetwork
object by merging edges, removing nodes, etc... So, we do this process inqnet=deepcopy(q.qnet)
, and at the end, only updateq.qnet.expCF
. - After we optimize branch lengths in the full network, we want to update the branch lengths in
q.qnet
. The edges need to be there (which is why we do not want to modify thisQuartetNetwork
object by merging edges), and we do not do a deep-copy of the full network again. We only change the values of branch lengths and gammas inq.qnet
, and we can re-calculate the expCF by creating a deep copyqnet=deepcopy(q.qnet)
and run the other functions (which merge edges, etc) to get theexpCF
.
Future work: there are definitely more efficient ways to do this (without the deep copies). In addition, currently edges that are no longer identifiable in QuartetNetwork
do not appear in hasEdge
nor index
. Need to study this.
julia> net0 = readTopology("(s17:13.76,(((s3:10.98,(s4:8.99,s5:8.99)I1:1.99)I2:0.47,(((s6:2.31,s7:2.31)I3:4.02,(s8:4.97,#H24:0.0::0.279)I4:1.36)I5:3.64,((s9:8.29,((s10:2.37,s11:2.37)I6:3.02,(s12:2.67,s13:2.67)I7:2.72)I8:2.89)I9:0.21,((s14:2.83,(s15:1.06,s16:1.06)I10:1.78)I11:2.14)#H24:3.52::0.72)I12:1.47)I13:1.48)I14:1.26,(((s18:5.46,s19:5.46)I15:0.59,(s20:4.72,(s21:2.40,s22:2.40)I16:2.32)I17:1.32)I18:2.68,(s23:8.56,(s1:4.64,s2:4.64)I19:3.92)I20:0.16)I21:3.98)I22:1.05);");
julia> net = readTopologyLevel1(writeTopology(net0)) ## need level1 attributes for functions below
HybridNetwork, Un-rooted Network
46 edges
46 nodes: 23 tips, 1 hybrid nodes, 22 internal tree nodes.
tip labels: s17, s3, s4, s5, ...
(s4:8.99,s5:8.99,(s3:10.0,((((s6:2.31,s7:2.31)I3:4.02,(s8:4.97,#H24:0.0::0.279)I4:1.36)I5:3.64,((s9:8.29,((s10:2.37,s11:2.37)I6:3.02,(s12:2.67,s13:2.67)I7:2.72)I8:2.89)I9:0.21,((s14:2.83,(s15:1.06,s16:1.06)I10:1.78)I11:2.14)#H24:3.52::0.721)I12:1.47)I13:1.48,((((s18:5.46,s19:5.46)I15:0.59,(s20:4.72,(s21:2.4,s22:2.4)I16:2.32)I17:1.32)I18:2.68,(s23:8.56,(s1:4.64,s2:4.64)I19:3.92)I20:0.16)I21:3.98,s17:10.0)I22:1.26)I14:0.47)I2:1.99)I1;
julia> q1 = Quartet(1,["s1", "s16", "s18", "s23"],[0.296,0.306,0.398])
number: 1
taxon names: ["s1", "s16", "s18", "s23"]
observed CF: [0.296, 0.306, 0.398]
pseudo-deviance under last used network: 0.0 (meaningless before estimation)
expected CF under last used network: Float64[] (meaningless before estimation)
julia> qnet = PhyloNetworks.extractQuartet!(net,q1)
taxa: ["s1", "s16", "s18", "s23"]
number of hybrid nodes: 1
julia> sum([e.istIdentifiable for e in net.edge]) ## 23 identifiable edges in net
23
julia> idedges = [ee.number for ee in net.edge[[e.istIdentifiable for e in net.edge]]];
julia> print(idedges)
[5, 6, 9, 11, 12, 13, 17, 20, 21, 22, 26, 27, 28, 29, 30, 31, 34, 38, 39, 40, 44, 45, 46]
julia> length(qnet.hasEdge) ## 24 = 1 gamma + 23 identifiable edges
24
julia> sum(qnet.hasEdge) ## 8 = 1 gamma + 7 identifiable edges in qnet
8
julia> print(idedges[qnet.hasEdge[2:end]]) ## 7 id. edges: [12, 13, 29, 30, 31, 45, 46]
[12, 13, 29, 30, 31, 45, 46]
julia> qnet.edge[qnet.index[1]].number ## 11 = minor hybrid edge
11
PhyloNetworks.QuartetT
— TypeQuartetT{T}
Generic type for 4-taxon sets. Fields:
number
: rank of the 4-taxon settaxonnumber
: static vector of 4 integers, assumed to be distinct and sorteddata
: object of typeT
For easier look-up, a unique mapping is used between the rank (number
) of a 4-taxon set and its 4 taxa (see quartetrank
and nchoose1234
):
rank-1 = (t1-1) choose 1 + (t2-1) choose 2 + (t3-1) choose 3 + (t4-1) choose 4
examples
julia> nCk = PhyloNetworks.nchoose1234(5)
6×4 Matrix{Int64}:
0 0 0 0
1 0 0 0
2 1 0 0
3 3 1 0
4 6 4 1
5 10 10 5
julia> PhyloNetworks.QuartetT(1,3,4,6, [.92,.04,.04, 100], nCk)
4-taxon set number 8; taxon numbers: 1,3,4,6
data: [0.92, 0.04, 0.04, 100.0]
PhyloNetworks.ScalingHybrid
— TypeScalingHybrid(λ)
Scaling Hybrid model, subtype of ContinuousTraitEM
, with covariance matrix σ²V(N(λ)). σ² is the between-species variance-rate (to be estimated), V(N) is the Brownian motion BM
covariance obtained from network N, and N(λ) is a obtained from the input network by rescaling the inheritance parameter γ of all minor edges by the same λ: a minor edge has its original γ changed to λγ, using the same λ at all reticulations. Note that for a major edge with original inheritance γ, the partner minor edge has inheritance γminor = 1-γ, so the major edge's inheritance is changed to 1-λγminor = λγ+1-λ.
For more information: see Bastide (2017) dissertation, section 4.3.2 p.175, available at https://tel.archives-ouvertes.fr/tel-01629648
λ ∈ [0,1] is mutable and may be optimized. It is a measure of how important the reticulations are for explaining variation in the response. When λ=1, the ScalingHybrid
model reduces to the BM
model.
PhyloNetworks.StatisticalSubstitutionModel
— TypeStatisticalSubstitutionModel(model::SubstitutionModel,
ratemodel::RateVariationAcrossSites,
net::HybridNetwork, trait::AbstractVector,
siteweight=nothing::Union{Nothing, Vector{Float64}},
maxhybrid=length(net.hybrid)::Int)
Inner constructor. Makes a deep copy of the input model, rate model. Warning: does not make a deep copy of the network: modification of the object.net
would modify the input net
. Assumes that the network has valid gamma values (to extract displayed trees).
StatisticalSubstitutionModel(net::HybridNetwork, fastafile::String,
modsymbol::Symbol, rvsymbol=:noRV::Symbol,
ratecategories=4::Int;
maxhybrid=length(net.hybrid)::Int)
Constructor from a network and a fasta file. The model symbol should be one of :JC69
, :HKY85
, :ERSM
or :BTSM
. The rvsymbol
should be as required by RateVariationAcrossSites
.
The network's gamma values are modified if they are missing. After that, a deep copy of the network is passed to the inner constructor.
PhyloNetworks.StatisticalSubstitutionModel
— TypeStatisticalSubstitutionModel
Subtype of StatsBase.StatisticalModel
, to fit discrete data to a model of trait substitution along a network. See fitdiscrete
to fit a trait substitution model to discrete data. It returns an object of type StatisticalSubstitutionModel
, to which standard functions can be applied, like loglikelihood(object)
, aic(object)
etc.
PhyloNetworks.SubstitutionModel
— TypeSubstitutionModel
Abstract type for substitution models, using a continous time Markov model on a phylogeny. Adapted from SubstitutionModels.jl in BioJulia.
For variable rates, see RateVariationAcrossSites
For sub types, see NucleicAcidSubstitutionModel
, TraitSubstitutionModel
All these models are supposed to have fields rate
and eigeninfo
.
PhyloNetworks.TopologyConstraint
— TypeType for various topological constraints, such as:
- a set of taxa forming a clade in the major tree
- a set of individuals belonging to the same species
- a set of taxa forming a clade in any one of the displayed trees
PhyloNetworks.TopologyConstraint
— MethodTopologyConstraint(type::UInt8, taxonnames::Vector{String}, net::HybridNetwork)
Create a topology constraint from user-given type, taxon names, and network. There are 3 types of constraints:
- type 1: A set of tips that forms a species.
- type 2: A set of tips that forms a clade in the major tree. Note that the root matters. Constraining a set of species to be an outgroup is equivalent to constraining the ingroup to form a clade.
- type 3: A set of tips that forms a clade in any one of the displayed trees.
Note: currently, with type-1 constraints, hybridizations are prevented from coming into or going out of a species group.
examples
julia> net = readTopology("(((2a,2b),(((((1a,1b,1c),4),(5)#H1),(#H1,(6,7))))#H2),(#H2,10));");
julia> c_species1 = PhyloNetworks.TopologyConstraint(0x01, ["1a","1b","1c"], net)
Species constraint, on tips: 1a, 1b, 1c
stem edge number 7
crown node number -9
julia> c_species2 = PhyloNetworks.TopologyConstraint(0x01, ["2a","2b"], net)
Species constraint, on tips: 2a, 2b
stem edge number 3
crown node number -4
julia> nni!(net , net.edge[3], true, true, [c_species2]) === nothing # we get nothing: would break species 2
true
julia> # the following gives an error, because 4,5 do not form a clade in "net"
# PhyloNetworks.TopologyConstraint(0x02, ["4","5"], net)
julia> c_clade145 = PhyloNetworks.TopologyConstraint(0x02, ["1a","1b","1c","4","5"], net)
Clade constraint, on tips: 1a, 1b, 1c, 4, 5
stem edge number 12
crown node number -7
julia> nni!(net , net.edge[12], true, true, [c_species1, c_clade145]) # we get nothing (failed NNI): would break the clade
PhyloNetworks.WithinSpeciesCTM
— TypeWithinSpeciesCTM
CTM stands for "continuous trait model". Contains the estimated variance components (between-species phylogenetic variance rate and within-species variance) and output from the NLopt
optimization used in the estimation.
Fields
wsp_var
: intra/within-species variance.bsp_var
: inter/between-species variance-rate.wsp_ninv
: vector of the inverse sample-sizes (e.g. [1/n₁, ..., 1/nₖ], where data from k species was used to fit the model and nᵢ is the no. of observations for the ith species).rss
: within-species sum of squaresoptsum
: anOptSummary
object.
functions
Base.getindex
— Functiongetindex(obj, d,[ indTips, nonmissing])
Getting submatrices of an object of type MatrixTopologicalOrder
.
Arguments
obj::MatrixTopologicalOrder
: the matrix from which to extract.d::Symbol
: a symbol precising which sub-matrix to extract. Can be::Tips
columns and/or rows corresponding to the tips:InternalNodes
columns and/or rows corresponding to the internal nodes Includes tips not listed inindTips
or missing data according tononmissing
.:TipsNodes
columns corresponding to internal nodes, and row to tips (works only is indexation="b")
indTips::Vector{Int}
: optional argument precising a specific order for the tips (internal use).nonmissing::BitArray{1}
: optional argument saying which tips have data (internal use). Tips with missing data are treated as internal nodes.
Base.getindex
— Functiongetindex(obj, d)
Getting submatrices of an object of type TraitSimulation
.
Arguments
obj::TraitSimulation
: the matrix from which to extract.d::Symbol
: a symbol precising which sub-matrix to extract. Can be::Tips
columns and/or rows corresponding to the tips:InternalNodes
columns and/or rows corresponding to the internal nodes
PhyloNetworks.P!
— MethodP!(Pmat::AbstractMatrix, obj::SM, t::Float64)
Fill in the input matrix Pmat
with the transition rates to go from each state to another in time t
, according to rates in Q
. see also: P
.
julia> m1 = BinaryTraitSubstitutionModel([1.0,2.0], ["low","high"])
Binary Trait Substitution Model:
rate low→high α=1.0
rate high→low β=2.0
julia> PhyloNetworks.P!(Matrix{Float64}(undef,2,2), m1, 0.3) # fills an uninitialized 2x2 matrix of floats
2×2 Matrix{Float64}:
0.80219 0.19781
0.39562 0.60438
julia> m2 = JC69([1.]);
julia> PhyloNetworks.P!(Matrix{Float64}(undef,4,4), m2, 0.2)
4×4 Matrix{Float64}:
0.824446 0.0585179 0.0585179 0.0585179
0.0585179 0.824446 0.0585179 0.0585179
0.0585179 0.0585179 0.824446 0.0585179
0.0585179 0.0585179 0.0585179 0.824446
PhyloNetworks.P
— MethodP(obj, t)
Probability transition matrix for a TraitSubstitutionModel
, of the form
P[1,1] ... P[1,k]
. .
. .
P[k,1] ... P[k,k]
where P[i,j] is the probability of ending in state j after time t, given that the process started in state i. see also: P!
.
HKY example:
julia> m1 = HKY85([0.5], [0.20, 0.30, 0.30, 0.20])
HKY85 Substitution Model base frequencies: [0.2, 0.3, 0.3, 0.2]
relative rate version with transition/tranversion ratio kappa = 0.5,
scaled so that there is one substitution per unit time
rate matrix Q:
A C G T
A * 0.4839 0.2419 0.3226
C 0.3226 * 0.4839 0.1613
G 0.1613 0.4839 * 0.3226
T 0.3226 0.2419 0.4839 *
julia> PhyloNetworks.P(m1, 0.2)
4×4 StaticArraysCore.MMatrix{4, 4, Float64, 16} with indices SOneTo(4)×SOneTo(4):
0.81592 0.0827167 0.0462192 0.0551445
0.0551445 0.831326 0.0827167 0.0308128
0.0308128 0.0827167 0.831326 0.0551445
0.0551445 0.0462192 0.0827167 0.81592
Juke-Cantor example:
julia> m1 = JC69([1.]);
julia> PhyloNetworks.P(m1, 0.2)
4×4 StaticArraysCore.MMatrix{4, 4, Float64, 16} with indices SOneTo(4)×SOneTo(4):
0.824446 0.0585179 0.0585179 0.0585179
0.0585179 0.824446 0.0585179 0.0585179
0.0585179 0.0585179 0.824446 0.0585179
0.0585179 0.0585179 0.0585179 0.824446
PhyloNetworks.addAlternativeHybridizations!
— MethodaddAlternativeHybridizations!(net::HybridNetwork, BSe::DataFrame;
cutoff=10::Number, top=3::Int)
Modify the network net
(the best estimated network) by adding some of the hybridizations present in the bootstrap networks. By default, it will only add hybrid edges with more than 10% bootstrap support (cutoff
) and it will only include the top 3 hybridizations (top
) sorted by bootstrap support.
The dataframe BSe
is also modified. In the original BSe
, supposedly obtained with hybridBootstrapSupport
, hybrid edges that do not appear in the best network have a missing number. After hybrid edges from bootstrap networks are added, BSe
is modified to include the edge numbers of the newly added hybrid edges. To distinguish hybrid edges present in the original network versus new edges, an extra column of true/false values is also added to BSe
, named "alternative", with true for newly added edges absent from the original network.
The hybrid edges added to net
are added as minor edges, to keep the underlying major tree topology.
example
julia> bootnet = readMultiTopology(joinpath(dirname(pathof(PhyloNetworks)), "..","examples", "bootsnaq.out")); # vector of 10 networks
julia> bestnet = readTopology("((O,(E,#H7:::0.196):0.314):0.332,(((A)#H7:::0.804,B):10.0,(C,D):10.0):0.332);");
julia> BSn, BSe, BSc, BSgam, BSedgenum = hybridBootstrapSupport(bootnet, bestnet);
julia> BSe[1:6,[:edge,:hybrid_clade,:sister_clade,:BS_hybrid_edge]]
6×4 DataFrame
Row │ edge hybrid_clade sister_clade BS_hybrid_edge
│ Int64? String String Float64
─────┼─────────────────────────────────────────────────────
1 │ 7 H7 B 33.0
2 │ 3 H7 E 32.0
3 │ missing c_minus3 c_minus8 44.0
4 │ missing c_minus3 H7 44.0
5 │ missing E O 12.0
6 │ missing c_minus6 c_minus8 9.0
julia> PhyloNetworks.addAlternativeHybridizations!(bestnet, BSe)
julia> BSe[1:6,[:edge,:hybrid_clade,:sister_clade,:BS_hybrid_edge,:alternative]]
6×5 DataFrame
Row │ edge hybrid_clade sister_clade BS_hybrid_edge alternative
│ Int64? String String Float64 Bool
─────┼──────────────────────────────────────────────────────────────────
1 │ 7 H7 B 33.0 false
2 │ 3 H7 E 32.0 false
3 │ 16 c_minus3 c_minus8 44.0 true
4 │ 19 c_minus3 H7 44.0 true
5 │ 22 E O 12.0 true
6 │ missing c_minus6 c_minus8 9.0 false
julia> # using PhyloPlots; plot(bestnet, edgelabel=BSe[:,[:edge,:BS_hybrid_edge]]);
PhyloNetworks.addHybridBetweenClades!
— MethodaddHybridBetweenClades!(net::HybridNetwork, hybnum::Number, sisnum::Number)
Modify net
by adding a minor hybrid edge from "donor" to "recipient", where "donor" is the major parent edge e1
of node number hybnum
and "recipient" is the major parent edge e2
of node number sisnum
. The new nodes are currently inserted at the middle of these parent edges.
If a hybrid edge from e1
to e2
would create a directed cycle in the network, then this hybrid cannot be added. In that case, the donor edge e1
is moved up if its parent is a hybrid node, to ensure that the sister clade to the new hybrid would be a desired (the descendant taxa from e1
) and a new attempt is made to create a hybrid edge.
Output: number of the new hybrid edge, or nothing
if the desired hybridization is not possible.
See also: addhybridedge!
(used by this method) and directionalconflict
to check that net
would still be a DAG.
PhyloNetworks.addhybridedge!
— Functionaddhybridedge!(net::HybridNetwork, edge1::Edge, edge2::Edge, hybridpartnernew::Bool,
edgelength=-1.0::Float64, gamma=-1.0::Float64)
Add hybridization to net
coming from edge1
going into edge2
. 2 new nodes and 3 new edges are created: edge1
are edge2
are both cut into 2 edges, and a new edge is created linking the 2 new "middle" nodes, pointing from edge1
to edge2
. The new node in the middle of edge1
is a tree node. The new node in the middle of edge2
is a hybrid node. Its parent edges are the newly created hybrid edge (with γ = gamma, missing by default), and either the newly edge "above" edge2
if hybridpartnernew=true
, or the old edge2
otherwise (which would reverse the direction of edge2
and others).
Should be called from the other method, which performs a bunch of checks. Updates containRoot
attributes for edges below the new hybrid node.
Output: new hybrid node (middle of the old edge2
) and new hybrid edge.
examples
julia> net = readTopology("((S8,(((S1,(S5)#H1),(#H1,S6)))#H2),(#H2,S10));");
julia> hybnode, hybedge = PhyloNetworks.addhybridedge!(net, net.edge[13], net.edge[8], true, 0.0, 0.2)
(PhyloNetworks.Node:
number:9
name:H3
hybrid node
attached to 3 edges, numbered: 8 16 17
, PhyloNetworks.EdgeT{PhyloNetworks.Node}:
number:17
length:0.0
minor hybrid edge with gamma=0.2
attached to 2 node(s) (parent first): 8 9
)
julia> writeTopology(net)
"((S8,(((S1,(S5)#H1),((#H1,S6))#H3:::0.8))#H2),(#H2,(S10,#H3:0.0::0.2)));"
PhyloNetworks.addhybridedge!
— Functionaddhybridedge!(net::HybridNetwork, nohybridladder::Bool, no3cycle::Bool,
constraints=TopologyConstraint[]::Vector{TopologyConstraint};
maxattempts=10::Int, fixroot=false::Bool)
Randomly choose two edges in net
then: add hybrid edge from edge 1 to edge 2 of length 0.01. The two halves of edge 1 (and of edge 2) have equal lengths. The hybrid partner edge (top half of edge 2, if fixroot is true) will point towards the newly-created node on the middle of the original edge 2.
If the resulting network is a DAG, satisfies the constraint(s), does not contain any 3-cycle (if no3cycle=true
), and does not have a hybrid ladder (if nohybridladder=true
) then the proposal is successful: net
is modified, and the function returns the newly created hybrid node and newly created hybrid edge.
If the resulting network is not acceptable, then a new set of edges is proposed (using a blacklist) until one is found acceptable, or until a maximum number of attempts have been made (maxattempts
). If none of the attempted proposals are successful, nothing
is returned (without causing an error).
After a pair of edges is picked, the "top" half of edge2 is proposed as the partner hybrid edge with probability 0.8 if fixroot
is false, (to avoid changing the direction of edge2 with more than 50% chance) and with probability 1.0 if fixroot
is true. If this choice does not work and if fixroot
is false, the other half of edge2 is proposed as the partner hybrid edge. Note that choosing the "bottom" half of edge2 as the partner edge requires to flip the direction of edge 2, and to move the root accordingly (to the original child of edge2).
examples
julia> net = readTopology("((S1,(((S2,(S3)#H1),(#H1,S4)))#H2),(#H2,S5));");
julia> using Random
julia> Random.seed!(170);
julia> PhyloNetworks.addhybridedge!(net, true, true)
(PhyloNetworks.Node:
number:9
name:H3
hybrid node
attached to 3 edges, numbered: 5 16 17
, PhyloNetworks.EdgeT{PhyloNetworks.Node}:
number:17
length:0.01
minor hybrid edge with gamma=0.32771460911632916
attached to 2 node(s) (parent first): 8 9
)
julia> writeTopology(net, round=true, digits=2)
"((S1,(((#H1,S4),((S2,(S3)#H1))#H3:::0.67))#H2),((#H2,S5),#H3:0.01::0.33));"
PhyloNetworks.addhybridedgeLiNC!
— MethodaddhybridedgeLiNC!(obj::SSM, currLik::Float64,
no3cycle::Bool, nohybridladder::Bool,
constraints::Vector{TopologyConstraint},
ftolAbs::Float64,
γcache::CacheGammaLiNC, lcache::CacheLengthLiNC)
Completes checks, adds hybrid in a random location, updates SSM object, and optimizes branch lengths and gammas locally as part of PhyLiNC optimization.
Return true if accepted add hybrid move. If move not accepted, return false. If cannot add a hybrid, return nothing.
For arguments, see phyLiNC
. Called by optimizestructure!
.
PhyloNetworks.addindividuals!
— Methodaddindividuals!(net::HybridNetwork, species::AbstractString, individuals::Vector)
Add individuals to their species leaf as a star, and return the corresponding species constraint. nothing
is returned in 2 cases:
- if
individuals
contains only 1 individual, in which case the name of the leaf for that species is replaced by the name of its one individual representative - if
species
is not found in the network
Spaces in individuals' names are eliminated, see cleantaxonname
.
Called by mapindividuals
.
examples
julia> net = readTopology("(S8,(((S1,S4),(S5)#H1),(#H1,S6)));");
julia> PhyloNetworks.addindividuals!(net, "S1", ["S1A", "S1B", "S1C"])
Species constraint, on tips: S1A, S1B, S1C
stem edge number 2
crown node number 2
julia> writeTopology(net) # 3 new nodes, S1 now internal: not a tip
"(S8,((((S1A,S1B,S1C)S1,S4),(S5)#H1),(#H1,S6)));"
PhyloNetworks.addleaf!
— Functionaddleaf!(net::HybridNetwork, node::Node, leafname::String, edgelength::Float64=-1.0)
addleaf!(net::HybridNetwork, edge::Edge, leafname::String, edgelength::Float64=-1.0)
Add a new external edge between node
or between the "middle" of edge
and a newly-created leaf, of name leafname
. By default, the new edge length is missing (-1).
output: newly created leaf node.
examples
julia> net = readTopology("((S1,(((S2,(S3)#H1),(#H1,S4)))#H2),(#H2,S5));");
julia> net.node[6].name # leaf S4
"S4"
julia> PhyloNetworks.addleaf!(net, net.node[6], "4a"); # adding leaf to a node
julia> writeTopology(net, internallabel=true)
"((S1,(((S2,(S3)#H1),(#H1,(4a)S4)))#H2),(#H2,S5));"
julia> PhyloNetworks.addleaf!(net, net.node[6], "4b");
julia> writeTopology(net, internallabel=true)
"((S1,(((S2,(S3)#H1),(#H1,(4a,4b)S4)))#H2),(#H2,S5));"
julia> net = readTopology("((S1,(((S2,(S3)#H1),(#H1,S4)))#H2),(#H2,S5));");
julia> [n.name for n in net.edge[7].node] # external edge to S4
2-element Vector{String}:
"S4"
""
julia> PhyloNetworks.addleaf!(net, net.edge[7], "4a"); # adding leaf to an edge
julia> writeTopology(net, internallabel=true)
"((S1,(((S2,(S3)#H1),(#H1,(S4,4a))))#H2),(#H2,S5));"
PhyloNetworks.adjacentedges
— Methodadjacentedges(centeredge::Edge)
Vector of all edges that share a node with centeredge
.
PhyloNetworks.afterOptBL!
— MethodafterOptBL
road map
Function that will check if there are h==0,1;t==0,hz==0,1
cases in a network after calling optBL!
.
Arguments:
closeN=true
will move origin/target, if false, add/delete N times before giving up (we have only testedcloseN=true
)origin=true
will move origin, false will move target. We added this to avoid going back and forth between the same networksmovesgamma
vector of counts of number of times each move is proposed to fix a gamma zero problem:(add,mvorigin,mvtarget,chdir,delete,nni)
Procedure:
First we split the
ht
vector innh,nt,nhz
(gammas, lengths, gammaz)If we find a
h==0,1
, we loop throughnh
to find a hybrid edge with h==0 or 1 and want to try to fix this by doing:gammaZero!(currT,d,edge,closeN,origin,N,movesgamma)
which returns true if there was a successful change, and we stop the loop
If we find a
t==0
, we loop through allnt
to find such edge, and do NNI move on this edge; return true if change successful and we stop the loopIf we find a
hz==0,1
, we loop throughnhz
to find such hybrid edge and callgammaZero
againIf we did a successful change, we run
optBL
again, and recheck if there are no more problems.Returns successchange, flagh, flagt,flaghz (flag=true means no problems)
If it is the multiple alleles case, it will not try to fix
h==0,1;hz==0,1
because it can reach a case that violates the multiple alleles condition. If we add a check here, things become horribly slow and inefficient, so we just delete a hybridization that hash==0,1;hz==0,1
** Important: ** afterOptBL
is doing only one change, but we need to repeat multiple times to be sure that we fix all the gamma zero problems, which is why we call afterOptBLRepeat
PhyloNetworks.afterOptBLAll!
— MethodafterOptBLAll
road map
After optBL
, we want to call afterOptBLAll
(or afterOptBLAllMultipleAlleles
) to check if there are h==0,1
; t==0
; hz==0,1
. This function will try to fix the gamma zero problem, but if it cannot, it will call moveDownLevel
, to delete the hybridization from the network.
Procedure:
While startover=true
and tries<N
- While
badliks < N2
(number of bad pseudolikelihoods are less thanN2
)- Run
success = afterOptBLRepeat
- If
success = true
(it changed something):- If worse pseudolik, then go back to original topology
currT
, setstartover=true
andbadliks++
- If better pseudolik, then check flags. If all good, then
startover=false
; otherwisestartover = true
- If worse pseudolik, then go back to original topology
- If
success = false
(nothing changed), then setbadliks=N2+1
(to end the while oncurrT
)- If all flags are ok, then
startover = false
- If bad h or hz, then call
moveDownLevel
(delete one hybridization), and setstartover = true
(maybe deleting that hybridization did not fix other gamma zero problems) - If bad t, then set
startover = false
- If all flags are ok, then
- Run
- If left second while by back to original
currT
, and still bad h/hz, then move down one level, andstartover=true
; otherwisestartover=false
If first while ends by tries>N
, then it checks one last time the flags, if bad h/hz will move down one level, and exit
PhyloNetworks.afterOptBLRepeat!
— MethodafterOptBLRepeat
road map
afterOptBL
is doing only one change, but we need to repeat multiple times to be sure that we fix all the gamma zero problems, which is why we call afterOptBLRepeat
. This function will repeat afterOptBL
every time a successful change happened; this is done only if closeN=false
, because we would delete/add hybridizations and need to stop after tried N times. If closeN=true
(default), then afterOptBLRepeat
only does one afterOptBL
, because in this case, only the neighbor edges need to be tested, and this would have been done already in gammaZero
.
PhyloNetworks.allowrootbelow!
— Methodallowrootbelow!(net::HybridNetwork)
Set containRoot
to true
for each edge below the root node, then traverses net
in preorder to update containRoot
of all edges (stopping at hybrid nodes): see the other methods. Assumes correct isChild1
edge field.
PhyloNetworks.allowrootbelow!
— Methodallowrootbelow!(e::Edge)
allowrootbelow!(n::Node, parent_edge_of_n::Edge)
Set containRoot
to true
for edge e
and all edges below, recursively. The traversal stops whenever a hybrid node is encountered: if the child of e
is a hybrid node (that is, if e
is a hybrid edge) or if n
is a hybrid node, then the edges below e
or n
are not traversed.
PhyloNetworks.anova
— Methodanova(objs::PhyloNetworkLinearModel...)
Takes several nested fits of the same data, and computes the F statistic for each pair of models.
The fits must be results of function phylolm
called on the same data, for models that have more and more effects.
Returns a DataFrame object with the anova table.
PhyloNetworks.assignhybridnames!
— Methodassignhybridnames!(net)
Assign names to hybrid nodes in the network net
. Hybrid nodes with an empty name
field ("") are modified with a name that does not conflict with other hybrid names in the network. The preferred name is "H3" if the node number is 3 or -3, but an index other than 3 would be used if "H3" were the name of another node already.
If two hybrid nodes have non-empty and equal names, the name of one of them is changed and re-assigned as described above (with a warning).
PhyloNetworks.biconnectedcomponent_entrynodes
— Functionbiconnectedcomponent_entrynodes(net, bcc, preorder=true)
Array containing the entry node of the each biconnected component in bcc
. bcc
is supposed to contain the biconnected components as output by biconnectedComponents
, that is, an array of array of edges.
These entry nodes depend on the rooting (whereas the BCC only depend on the unrooted graph). They are either the root of the network or cut node (articulation points).
PhyloNetworks.biconnectedcomponent_exitnodes
— Functionbiconnectedcomponent_exitnodes(net, bcc, preorder=true)
Array containing an array of the exit node(s) of the each biconnected component in bcc
. bcc
is supposed to contain the biconnected components as output by biconnectedComponents
, that is, an array of array of edges.
These exit nodes depend on the rooting (whereas the BCC only depend on the unrooted graph). The degree of a blob is the number of exit nodes + 1 if the blob doesn't contain the root (its entry node is a cut node), or + 0 if the blob contains the root (which enters into the blob but isn't a cut node).
Warning (or positive side effect?): the edge .inCycle
attribute is modified. It stores the index (in bcc
) of the biconnected component that an edge belongs to. If an edge doesn't belong in any (e.g. if trivial blobs are ignored), then its .inCycle
is set to -1.
PhyloNetworks.blobInfo
— FunctionblobInfo(network, ignoreTrivial=true)
Calculate the biconnected components (blobs) using function biconnectedComponents
then:
- set node field
isExtBadTriangle
to true at the root of each non-trivial blob (and at the network root), false otherwise. (a better name for the field would be something like "isBlobRoot".) - output:
- array of nodes that are the roots of each non-trivial blob, and the network root. If the root of the full network is not part of a non-trivial blob, a corresponding blob is added to the list.
- array of arrays: for each non-trivial blob, array of major hybrid edges in that blob.
- array of arrays: same as #2 but for minor hybrid edges, with hybrids listed in the same order, for each blob.
Blobs are ordered in reverse topological ordering (aka post order). If ignoreTrivial
is true, trivial components are ignored.
keyword argument: checkPreorder
, true by default. If false, the isChild1
edge field and the net.nodes_changed
network field are supposed to be correct.
warning: see biconnectedComponents
for node attributes modified during the algorithm.
PhyloNetworks.breakedge!
— Methodbreakedge!(edge::Edge, net::HybridNetwork)
Break an edge into 2 edges, each of length half that of original edge, creating a new node of degree 2. Useful to root network along an edge. Return the new node and the new edge, which is the "top" half of the original starting edge. These new node & edge are pushed last in net.node
and net.edge
.
If the starting edge was:
n1 --edge--> n2
then we get this:
n1 --newedge--> newnode --edge--> n2
isChild1
and containRoot
are updated, but not fields for level-1 networks like inCycle
, partition
, gammaz
, etc.
examples
julia> net = readTopology("(((S8,S9),((((S1,S4),(S5)#H1),(#H1,(S6,S7))))#H2),(#H2,S10));");
julia> length(net.node)
19
julia> net.edge[4] # edge 4 goes from node -8 to 3
PhyloNetworks.EdgeT{PhyloNetworks.Node}:
number:4
length:-1.0
attached to 2 node(s) (parent first): -8 3
julia> newnode, newedge = PhyloNetworks.breakedge!(net.edge[4], net);
julia> length(net.node) # one more than before
20
julia> newedge # new edge 21 goes from node -8 and 11 (new)
PhyloNetworks.EdgeT{PhyloNetworks.Node}:
number:21
length:-1.0
attached to 2 node(s) (parent first): -8 11
julia> net.edge[4] # original edge 4 now goes from node 11 (new) to 3
PhyloNetworks.EdgeT{PhyloNetworks.Node}:
number:4
length:-1.0
attached to 2 node(s) (parent first): 11 3
julia> writeTopology(net) # note extra pair of parentheses around S1
"(((S8,S9),((((S4,(S1)),(S5)#H1),(#H1,(S6,S7))))#H2),(#H2,S10));"
See also: fuseedgesat!
PhyloNetworks.calculateObsCFAll!
— MethodcalculateObsCFAll!(DataCF, taxa::Union{Vector{<:AbstractString}, Vector{Int}})
Calculate observed concordance factors: update the .quartet[i].obsCF
values of the DataCF
object based on its .tree vector.
calculateObsCFAll!(vector of quartets, vector of trees, taxa)
Calculate observed concordance factors: update the .obsCF
values of the quartets, based on the trees, and returns a new DataCF
object with these updated quartets and trees.
calculateObsCFAll_noDataCF!(vector of quartets, vector of trees, taxa)
update the .obsCF
values of the quartets based on the trees, but returns nothing.
Warning: all these functions need input trees (without any reticulations: h=0).
See also: countquartetsintrees
, which uses a faster algorithm, processing each input tree only once. calculateObsCFAll_noDataCF!
processes each input tree # quartet
times.
PhyloNetworks.checkMapDF
— MethodcheckMapDF(mapping_allele2species::DataFrame)
Check that the data frame has one column named "allele" or "individual", and one column named "species". Output: indices of these column.
PhyloNetworks.checkNumHybEdges!
— MethodcheckNumHybEdges!(net)
Check for consistency between hybrid-related attributes in the network:
- for each hybrid node: 2 or more hybrid edges
- exception: allows for a leaf to be attached to a single hybrid edge
- exactly 2 incoming parent hybrid edges
Run after storeHybrids!
. See also check2HybEdges
.
PhyloNetworks.check_matchtaxonnames!
— Methodcheck_matchtaxonnames!(species, data, net)
Modify species
and dat
by removing the species (rows) absent from the network. Return a new network (net
is not modified) with tips matching those in species: if some species in net
have no data, these species are pruned from the network. The network also has its node names reset, such that leaves have nodes have consecutive numbers starting at 1, with leaves first. Used by fitdiscrete
to build a new StatisticalSubstitutionModel
.
PhyloNetworks.check_nonmissing_nonnegative_edgelengths
— Functioncheck_nonmissing_nonnegative_edgelengths(net, str="")
Throw an Exception if net
has undefined edge lengths (coded as -1.0) or negative edge lengths. The error message indicates the number of the offending edge(s), followed by str
.
PhyloNetworks.checknetwork_LiNC!
— Functionchecknetwork_LiNC!(net::HybridNetwork, maxhybrid::Int, no3cycle::Bool,
nohybridladder::Bool,
constraints=TopologyConstraint[]::Vector{TopologyConstraint},
verbose::Bool=false)
Check that net
is an adequate starting network before phyLiNC!
: remove nodes of degree 2 (possibly including the root); check that net
meets the topological constraints
, has no polytomies (except at species constraints), and maxhybrid
of fewer reticulations. According to user-given options, also check for the absence of 3-cycles and/or hybrid ladders.
julia> maxhybrid = 3;
julia> net = readTopology("(((A:2.0,(B:1.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0):0.5,D:2.0);");
julia> preorder!(net) # for correct unzipping in checknetwork_LiNC!
julia> PhyloNetworks.checknetwork_LiNC!(net, maxhybrid, true, true)
HybridNetwork, Rooted Network
8 edges
8 nodes: 4 tips, 1 hybrid nodes, 3 internal tree nodes.
tip labels: A, B, C, D
((A:2.0,(B:1.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0,D:2.5);
PhyloNetworks.checkspeciesnetwork!
— Methodcheckspeciesnetwork!(network::HybridNetwork,
constraints::Vector{TopologyConstraint})
Check that the network satisfies a number of requirements:
- no polytomies, other than at species constraints: throws an error otherwise
- no unnecessary nodes: fuse edges at nodes with degree two, including at the root (hence the bang:
net
may be modified) - topology constraints are met.
Output: true if all is good, false if one or more clades are violated.
PhyloNetworks.cleantaxonname
— Methodcleantaxonname(taxonname::AbstractString)
Return a String with leading and trailing spaces removed, and interior spaces replaced by underscores: good for using as tip names in a network without causing future error when reading the newick description of the network.
PhyloNetworks.constraintviolated
— Methodconstraintviolated(network::HybridNetwork,
constraints::Vector{TopologyConstraint})
True if network
violates one (or more) of the constraints of type 1 (individuals in a species group) or type 2 (must be clades in the major tree). Warning: constraints of type 3 are not implemented.
PhyloNetworks.defaultsubstitutionmodel
— Functiondefaultsubstitutionmodel(network, modsymbol::Symbol, data::DataFrame,
siteweights::Vector)
Return a statistical substitution model (SSM) with appropriate state labels and a rate appropriate for the branch lengths in net
(see startingrate
). The data
frame must have the actual trait/site data in columns 2 and up, as when the species names are in column 1. For DNA data, the relative rate model is returned, with a stationary distribution equal to the empirical frequencies.
PhyloNetworks.deleteEdge!
— MethoddeleteEdge!(net::HybridNetwork, e::Edge; part=true)
deleteEdge!(net::QuartetNetwork, e::Edge)
Delete edge e
from net.edge
and update net.numEdges
. If part
is true, update the network's partition field.
PhyloNetworks.deleteLeaf!
— MethoddeleteLeaf!(net::HybridNetwork, leaf::AbstractString)
deleteLeaf!(net::Network, leaf::Node)
Deletes the leaf taxon from the network. The leaf argument is the name of the taxon to delete.
Warnings:
- requires a level-1 network with up-to-date attributes for snaq! (e.g. non-missing branch lengths, gammaz, etc.)
- does not care where the root is and does not update it to a sensible location if the root is affected by the leaf removal.
- does not merge edges, i.e. does not remove all nodes of degree 2. Within snaq!, this is used to extract quartets and to keep track of which edge lengths in the original network map to the quartet network.
PhyloNetworks.deleteNode!
— MethoddeleteNode!(net::HybridNetwork, n::Node)
deleteNode!(net::QuartetNetwork, n::Node)
Delete node n
from a network, i.e. removes it from net.node, and from net.hybrid or net.leaf as appropriate. Update attributes numNodes
, numTaxa
, numHybrids
. Warning: net.names
is not updated, and this is a feature (not a bug) for networks of type QuartetNetwork.
Warning: if the root is deleted, the new root is arbitrarily set to the first node in the list. This is intentional to save time because this function is used frequently in snaq!, which handles semi-directed (unrooted) networks.
PhyloNetworks.deletehybridedge!
— Functiondeletehybridedge!(net::HybridNetwork, edge::Edge,
nofuse=false, unroot=false,
multgammas=false, simplify=true, keeporiginalroot=false)
Delete a hybrid edge
from net
and return the network. The network does not have to be of level 1 and may contain polytomies, although each hybrid node must have exactly 2 parents. Branch lengths are updated, allowing for missing values.
If nofuse
is false, when edge
is removed, its child (hybrid) node is removed and its partner hybrid edge is removed. Its child edge is retained (below the hybrid node), fused with the former partner, with new length: old length + length of edge
's old partner. Any 2-cycle is simplified into a single edge, unless simplify
is false.
If nofuse
is true, edges with descendant leaves are kept as is, and are not fused. Nodes are retained during edge removal, provided that they have at least one descendant leaf. The hybrid edge that is partner to edge
becomes a tree edge, but has its γ value unchanged (it is not set to 1), since it is not merged with its child edge after removal of the reticulation. Also, 2-cycles are not simplified if nofuse
is true. That is, if we get 2 hybrid edges both from the same parent to the same child, these hybrid edges are retained without being fused into a single tree edge.
If unroot
is false and if the root is up for deletion during the process, it will be kept if it's of degree 2 or more. A root node of degree 1 will be deleted unless keeporiginalroot
is true.
If multgammas
is true: inheritance weights are kept by multiplying together the inheritance γ's of edges that are merged. For example, if there is a hybrid ladder, the partner hybrid edge remains a hybrid edge (with a new partner), and its γ is the product of the two hybrid edges that have been fused. So it won't add up to 1 with its new partner's γ.
If keeporiginalroot
is true, a root of degree one will not be deleted.
Warnings:
containRoot
is updated, but this requires correctisChild1
fields- if the parent of
edge
is the root and ifnofuse
is false, the root is moved to keep the network unrooted with a root of degree two. - does not update attributes needed for snaq! (like inCycle, edge.z, edge.y etc.)
PhyloNetworks.deletehybridedgeLiNC!
— MethoddeletehybridedgeLiNC!(obj::SSM, currLik::Float64,
no3cycle::Bool,
constraints::Vector{TopologyConstraint},
γcache::CacheGammaLiNC, lcache::CacheLengthLiNC)
Deletes a random hybrid edge and updates SSM object as part of PhyLiNC optimization. Return true if the move is accepted, false if not.
Note: if net is tree-child and a hybrid edge is deleted, then the resulting network is still tree-child. But: if net
has no hybrid ladders, deleting an existing reticulation may create a hybrid ladder. It happens if there is a reticulation above a W structure (tree node whose children are both hybrid nodes). This creates a problem if the user asked for nohybridladder
: this request may not be met. fixit: In future, we could check for this case and prevent it.
For a description of arguments, see phyLiNC
. Called by optimizestructure!
, which does some checks.
PhyloNetworks.descendants
— Functiondescendants(edge::Edge, internal::Bool=false)
Return the node numbers of the descendants of a given edge: all descendant nodes if internal
is true (internal nodes and tips), or descendant tips only otherwise (defaults).
edge
should belong in a rooted network for which isChild1
is up-to-date. Run directEdges!
beforehand. This is very important, otherwise one might enter an infinite loop, and the function does not test for this.
Examples
julia> net5 = "(A,((B,#H1),(((C,(E)#H2),(#H2,F)),(D)#H1)));" |> readTopology |> directEdges! ;
julia> PhyloNetworks.descendants(net5.edge[12], true) # descendants of 12th edge: all of them
7-element Vector{Int64}:
-6
-7
4
6
5
-9
7
julia> PhyloNetworks.descendants(net5.edge[12]) # descendant leaves only
3-element Vector{Int64}:
4
5
7
PhyloNetworks.directionalconflict
— Methoddirectionalconflict(parent::Node, edge::Edge, hybridpartnernew::Bool)
Check if creating a hybrid edge down of parent
node into the middle of edge
would create a directed cycle in net
, i.e. not a DAG. The proposed hybrid would go in the direction of edge
down its child node if hybridpartnernew
is true. Otherwise, both halves of edge
would have their direction reversed, for the hybrid to go towards the original parent node of edge
. Does not modify the network.
Output: true
if a conflict would arise (non-DAG), false
if no conflict.
PhyloNetworks.discrete_backwardlikelihood_trait!
— Methoddiscrete_backwardlikelihood_trait!(obj::SSM, tree::Integer, ri::Integer)
Update and return the backward likelihood (last argument backwardlik
) assuming rate category ri
and tree index tree
, using current forward and backwards likelihoods in obj
: these depend on the trait (or site) given to the last call to discrete_corelikelihood_trait!
. Used by ancestralStateReconstruction
.
warning: assume correct transition probabilities.
PhyloNetworks.discrete_corelikelihood!
— Methoddiscrete_corelikelihood!(obj::StatisticalSubstitutionModel;
whichtrait::AbstractVector{Int} = 1:obj.nsites)
Calculate the likelihood and update obj.loglik
for discrete characters on a network, calling discrete_corelikelihood_trait!
. The algorithm extracts all displayed trees and weighs the likelihood under all these trees. The object's partial likelihoods are updated:
- forward and direct partial likelihoods are re-used, one trait at a time,
- overall likelihoods on each displayed tree, given each rate category and for each given site/trait: are cached in
_loglikcache
.
PhyloNetworks.discrete_corelikelihood_trait!
— Methoddiscrete_corelikelihood_trait!(obj::SSM, t::Integer, ci::Integer, ri::Integer)
Return the likelihood for tree t
, trait (character/site) index ci
and rate category ri
. Update & modify the forward & directional log-likelihoods obj.forwardlik
and obj.directlik
, which are indexed by [state, nodenumber or edgenumber]. Used by discrete_corelikelihood!
.
Preconditions: obj.logtrans
updated, edges directed, nodes/edges preordered
PhyloNetworks.displayedNetworks!
— FunctiondisplayedNetworks!(net::HybridNetwork, node::Node, keepNode=false,
unroot=false, multgammas=false, keeporiginalroot=false)
Extracts the two networks that simplify a given network at a given hybrid node: deleting either one or the other parent hybrid edge. If nofuse
is true, the original edges (and nodes) are kept in both networks, provided that they have one or more descendant leaves. If unroot
is true, the root will be deleted if it becomes of degree 2. If keeporiginalroot
is true, the root is retained even if it is of degree 1.
- the original network is modified: the minor edge removed.
- returns one HybridNetwork object: the network with the major edge removed
PhyloNetworks.edgerelation
— Methodedgerelation(e::Edge, node::Node, origin::Edge)
Return a symbol:
:origin
ife
is equal toorigin
, and otherwise::parent
ife
is a parent ofnode
,:child
ife
is a child ofnode
using the isChild1
attribute of edges. Useful when e
iterates over all edges adjacent to node
and when origin
is one of the edges adjacent to node
, to known the order in which these edges come.
example:
labs = [edgerelation(e, u, uv) for e in u.edge] # assuming u is a node of edge uv
parentindex = findfirst(isequal(:parent), labs) # could be 'nothing' if no parent
childindices = findall( isequal(:child), labs) # vector. could be empty
PhyloNetworks.fliphybrid!
— Functionfliphybrid!(net::HybridNetwork, hybridnode::Node, minor=true::Bool,
nohybridladder=false::Bool,
constraints=TopologyConstraint[]::Vector{TopologyConstraint})
Flip the direction of a single hybrid edge: the minor parent edge of hybridnode
by default, or the major parent edge if minor
is false. The parent node of the hybrid edge becomes the new hybrid node. The former hybrid edge partner is converted to a tree edge (with γ=1), and hybridnode
becomes a tree node.
For the flip to be admissible, the new network must be a semi-directed phylogenetic network: with a root such that the rooted version is a DAG. If nohybridladder
is false (default), the flip may create a hybrid ladder If nohybridladder
is true and if the flip would create a hybrid ladder, then the flip is not admissible. A hybrid ladder is when a hybrid child of another hybrid.
The new hybrid partner is an edge adjacent to the new hybrid node, such that the flip is admissible (so it must be a tree edge). The flipped edge retains its original γ. The new hybrid edge is assigned inheritance 1-γ.
Output: (newhybridnode, flippededge, oldchildedge)
if the flip is admissible, nothing
otherwise.
The network is unchanged if the flip is not admissible. If the flip is admissible, the root position may be modified, and the direction of tree edges (via isChild1
) is modified accordingly. If the root needs to be modified, then the new root is set to the old hybrid node.
The index of the new hybrid node in net.hybrid
is equal to that of the old hybridnode
.
Warning: Undoing this move may not recover the original root if the root position was modified.
PhyloNetworks.fliphybrid!
— Functionfliphybrid!(net::HybridNetwork, minor=true::Bool, nohybridladder=false::Bool,
constraints=TopologyConstraint[]::Vector{TopologyConstraint})
Cycle through hybrid nodes in random order until an admissible flip is found. At this hybrid node, flip the indicated hybrid parent edge (minor or major).
If an admissible flip is found, return the tuple: newhybridnode, flippededge, oldchildedge. Otherwise, return nothing.
The flip can be undone with fliphybrid!(net, newhybridnode, minor, constraints)
, or fliphybrid!(net, newhybridnode, !flippededge.isMajor, constraints)
more generally, such as if the flipped edge had its γ modified after the original flip.
Warnings
- if the root needed to be reset and if the original root was of degree 2, then a node of degree 2 remains in the modified network.
- undoing the flip may not recover the original root in case the root position was modified during the original flip.
PhyloNetworks.fliphybridedgeLiNC!
— MethodfliphybridedgeLiNC!(obj::SSM, currLik::Float64, nohybridladder::Bool,
constraints::Vector{TopologyConstraint}, ftolAbs::Float64,
γcache::CacheGammaLiNC, lcache::CacheLengthLiNC)
Randomly chooses a minor hybrid edge and tries to flip its direction (that is, reverse the direction of gene flow) using fliphybrid!
. If the flip fails, it looks for the next minor hybrid edge. If all minor edges fail, tries to flip major edges in random order.
After a successful flip, optimize branch lengths and gammas, then compare the likelihood of the previous network with the new one.
Return:
- true if a flip hybrid move was completed and improved the likelihood
- false if a move was completed but did not improve the likelihoood
- nothing if no hybrid flip move was admissible in this network.
Warning: Undoing this move may not recover the original root if the root position was modified.
For arguments, see phyLiNC
. Called by optimizestructure!
.
PhyloNetworks.fuseedgesat!
— Functionfuseedgesat!(i::Integer,net::HybridNetwork, multgammas=false::Bool)
Removes i
th node in net.node, if it is of degree 2. The parent and child edges of this node are fused. If either of the edges is hybrid, the hybrid edge is retained. Otherwise, the edge with the lower edge number is retained.
Reverts the action of breakedge!
.
returns the fused edge.
PhyloNetworks.gammaZero!
— MethodgammaZero
road map
Function that tries to fix a gamma zero problem (h==0,1; t==0; hz==0,1
)
- First tries to do
changeDirection
- If not successful from start, we call
moveHybrid
- If successful move (change direction), we call
optBL
and check if we fixed the problem - If problem fixed and we do not have worse pseudolik, we return
success=true
- If still problem or worse pseudolik, we call
moveHybrid
** Important: ** Any function (afterOptBL
) calling gammaZero
is assuming that it only made a change, so if the returned value is true, then a change was made, and the other function needs to run optBL
and check that all parameters are 'valid'. If the returned value is false, then no change was possible and we need to remove a hybridization if the problem is h==0,1; hz==0,1. If the problem is t==0, we ignore this problem.
PhyloNetworks.getDataValue!
— MethodgetdataValue!(s::IO, int, numLeft::Array{Int,1})
Helper function for parseEdgeData!
. Read a single floating point edge data value in a tree topology. Ignore (and skip) nexus-style comments before & after the value (see readnexuscomment
).
Return -1.0 if no value exists before the next colon, return the value as a float otherwise. Modifies s by advancing past the next colon character. Only call this function to read a value when you know a numerical value exists!
PhyloNetworks.getGammas
— MethodgetGammas(net)
Vector of inheritance γ's of all major edges (tree edges and major hybrid edges), ordered according to the pre-order index of their child node, assuming this pre-order is already calculated (with up-to-date field nodes_changed
). Here, a "major" edge is an edge with field isMajor
set to true, regardless of its actual γ (below, at or above 0.5).
See setGammas!
PhyloNetworks.getHeights
— FunctiongetHeights(net, checkpreorder::Bool=true)
Return the height (distance to the root) of all nodes, assuming a time-consistent network (where all paths from the root to a given hybrid node have the same length) but not necessarily ultrametric: tips need not all be at the same distance from the root. If checkpreorder=false
, assumes the network has already been preordered with preorder!
, because it uses getGammas
and setGammas!
).
Output: vector of node heights, one per node, in the same order as in net.nodes_changed
. Examples:
julia> net = readTopology("(((C:1,(A:1)#H1:1.5::0.7):1,(#H1:0.3::0.3,E:2.0):2.2):1.0,O:5.2);");
julia> # using PhyloPlots; plot(net, useedgelength=true, showedgelength=true, shownodenumber=true); # to see
julia> nodeheight = PhyloNetworks.getHeights(net)
9-element Vector{Float64}:
0.0
5.2
1.0
3.2
5.2
2.0
3.5
4.5
3.0
julia> [node.number => (nodeheight[i], node.name) for (i,node) in enumerate(net.nodes_changed)]
9-element Vector{Pair{Int64, Tuple{Float64, String}}}:
-2 => (0.0, "")
5 => (5.2, "O")
-3 => (1.0, "")
-6 => (3.2, "")
4 => (5.2, "E")
-4 => (2.0, "")
3 => (3.5, "H1")
2 => (4.5, "A")
1 => (3.0, "C")
PhyloNetworks.getNeighborsTarget
— MethodgetNeighborsTarget(hybrid_node, majoredge)
Vector of edges that are incident to either:
- the node incident to
majoredge
other thanhybrid_node
, or - the tree child of
hybrid_node
.
This vector of edges is used as the list of suitable neighbors of "othermin" to move the target of a hybrid edge, in moveTargetUpdateRepeat!
.
PhyloNetworks.getTipSubmatrix
— MethodgetTipSubmatrix(M, net; indexation=:both)
Extract submatrix of M, with rows and/or columns corresponding to tips in the network, ordered like in net.leaf
. In M, rows and/or columns are assumed ordered as in net.nodes_changed
.
indexation: one of :rows
, :cols
or :both
: are nodes numbers indexed in the matrix by rows, by columns, or both? Subsetting is taken accordingly.
PhyloNetworks.getlengths
— Methodgetlengths(edges::Vector{Edge})
Vector of edge lengths for a vector of edges
.
PhyloNetworks.getparameters
— Methodgetparameters(obj::RateVariationAcrossSites)
Return a copy of the alpha and/or pinv parameters of model obj
, in a single vector.
PhyloNetworks.getparamindex
— Methodgetparamindex(obj::RateVariationAcrossSites)
Indices of parameters in (p_invariable, alpha).
PhyloNetworks.hardwiredClusterDistance_unrooted
— MethodhardwiredClusterDistance_unrooted(net1::HybridNetwork, net2::HybridNetwork)
Miminum hardwired cluster dissimilarity between the two networks, considered as unrooted (or semi-directed). This dissimilarity is defined as the minimum rooted distance, over all root positions that are compatible with the direction of hybrid edges. Called by hardwiredClusterDistance
.
To avoid repeating identical clusters, all degree-2 nodes are deleted before starting the comparison. Since rooting the network at a leaf creates a root node of degree 2 and an extra cluster, leaves are excluded from possible rooting positions.
PhyloNetworks.hashybridladder
— Methodhashybridladder(net::HybridNetwork)
Return true if net
contains a hybrid ladder: where a hybrid node's child is itself a hybrid node. This makes the network not treechild, assuming it is fully resolved. (One of the nodes does not have any tree-node child).
PhyloNetworks.hybrid3cycle
— Methodhybrid3cycle(edge1::Edge, edge2::Edge)
Check if proposed hybrid edge from edge1
into edge2
would create a 3 cycle, that is, if edge1
and edge2
have a node in common. (This move cannot create a 2-cycles because new nodes would be created in the middle of edges 1 and 2.)
PhyloNetworks.hybridEdges
— MethodhybridEdges(node::Node, e::Edge)
Return the 2 edges connected to node
other than e
, in the same order as node.edge
, except that e
absent from the list.
Despite what the name suggest, node
need not be a hybrid node! node
is assumed to have 3 edges, though.
PhyloNetworks.hybridEdges
— MethodhybridEdges(node::Node)
Return the 3 edges attached to node
in a specific order [e1,e2,e3]. Warning: assume a level-1 network with node field hasHybEdge
and edge field inCycle
up-to-date.
If node
is a hybrid node:
- e1 is the major hybrid parent edge of
node
- e2 is the minor hybrid parent edge
- e3 is the tree edge, child of
node
.
If node
is a tree node parent of one child edge:
- e1 is the hybrid edge, child of
node
- e2 is the tree edge that belongs to the cycle created by e1
- e3 is the other tree edge attached to
node
(not in a cycle)
Otherwise:
- e3 is an external edge from
node
to a leaf, if one exists.
PhyloNetworks.hybridatnode
— Methodhybridatnode(net::HybridNetwork, nodeNumber::Integer)
Move the hybrid node in a cycle to make node number nodeNumber
a hybrid node Compared to [hybridatnode!
], this method checks that net
is of level 1 (required) and does not modify it.
PhyloNetworks.inheritanceWeight
— MethodinheritanceWeight(tree::HybridNetwork)
Return the log inheritance weight of a network or tree (as provided by displayedTrees
with nofuse
= true for instance). For a tree displayed in a network, its inheritance weight is the log of the product of γ's of all edges retained in the tree. To avoid underflow, the log is calculated: i.e. sum of log(γ) across retained edges.
If any edge has a negative γ, it is assumed to mean that its γ is missing, and the function returns missing
.
Example
julia> net = readTopology("(((A,(B)#H1:::0.9),(C,#H1:::0.1)),D);");
julia> trees = displayedTrees(net,0.0; nofuse=true);
julia> PhyloNetworks.inheritanceWeight.(trees)
2-element Vector{Float64}:
-0.105361
-2.30259
PhyloNetworks.initializeWeightsFromLeaves!
— MethodinitializeWeightsFromLeaves!(w, net, tips, stateset, criterion)
Modify weight in w: to Inf for w[n, i] if the "tips" data has a state different from the lineage state of index i at node number n. Assumes that w was initialized to 0 for the leaves.
criterion: should be one of :softwired
, :parental
or :hardwired
.
- softwired parsimony: lineage states are in this order: ∅,{1},{2},{3},...,{nstates}
PhyloNetworks.initializeWeightsFromLeavesSoftwired!
— MethodinitializeWeightsFromLeavesSoftwired!(w, net, tips, charset)
Modify weight in w: to Inf for w[n, s] if the "tips" data has a state different from s at node number n. Assumes that w was initialized to 0 for the leaves.
PhyloNetworks.isconnected
— Methodisconnected(node1::Node, node2::Node)
Check if two nodes are connected by an edge. Return true if connected, false if not connected.
PhyloNetworks.isdescendant
— Methodisdescendant(des:Node, anc::Node)
Return true if des
is a strict descendant of anc
, using isChild1
fields to determine the direction of edges. See isdescendant_undirected
for a version that does not use isChild1
.
PhyloNetworks.isdescendant_undirected
— Methodisdescendant_undirected(des:Node, ancestor::Node, parentedge)
Return true
if des
is a strict descendant of ancestor
when starting from edge parentedge
and going towards ancestor
onward, regardless of the field isChild1
of tree edges; false
otherwise.
This is useful to know how descendant relationships would change as a result of reverting the direction of a tree edge, without actually modifying the direction (isChild1
) of any edge.
parentedge
should be connected to ancestor
(not checked). The direction of hybrid edges is respected (via isChild1
), that is, the traversal does not go from the child to the parent of a hybrid edge.
PhyloNetworks.ladderpartition
— Methodladderpartition(tree::HybridNetwork)
For each node in tree
, calculate the clade below each child edge of the node, and each clade moving up the "ladder" from the node to the root. The output is a tuple of 2 vectors (node) of vector (clade) of vectors (taxon in clade): below,above
. More specifically, for node number n
, below[n]
is generally of vector of 2 clades: one for the left child and one for the right child of the node (unless the node is of degree 2 or is a polytomy). above[n]
contains the grade of clades above node number n
.
WARNING: assumes that
- node numbers and edge numbers can be used as indices, that is, be all distinct, positive, covering exactly 1:#nodes and 1:#edges.
- edges are corrected directed (
isChild1
is up-to-date) and nodes have been pre-ordered already (fieldnodes_changed
up-to-date).
examples
julia> tree = readTopology("(O,A,((B1,B2),(E,(C,D))));");
julia> PhyloNetworks.resetNodeNumbers!(tree; checkPreorder=true, type=:postorder)
julia> printNodes(tree)
node leaf hybrid hasHybEdge name inCycle edges'numbers
1 true false false O -1 1
2 true false false A -1 2
3 true false false B1 -1 3
4 true false false B2 -1 4
8 false false false -1 3 4 5
5 true false false E -1 6
6 true false false C -1 7
7 true false false D -1 8
9 false false false -1 7 8 9
10 false false false -1 6 9 10
11 false false false -1 5 10 11
12 false false false -1 1 2 11
julia> below, above = PhyloNetworks.ladderpartition(tree);
julia> below
12-element Vector{Vector{Vector{Int64}}}:
[[1]]
[[2]]
[[3]]
[[4]]
[[5]]
[[6]]
[[7]]
[[3], [4]]
[[6], [7]]
[[5], [6, 7]]
[[3, 4], [5, 6, 7]]
[[1], [2], [3, 4, 5, 6, 7]]
julia> for n in 8:12
println("clades below node ", n, ": ", join(below[n], " "))
end
clades below node 8: [3] [4]
clades below node 9: [6] [7]
clades below node 10: [5] [6, 7]
clades below node 11: [3, 4] [5, 6, 7]
clades below node 12: [1] [2] [3, 4, 5, 6, 7]
julia> above[8:12] # clades sister to and above nodes 8 through 12:
5-element Vector{Vector{Vector{Int64}}}:
[[5, 6, 7], [1], [2]]
[[5], [3, 4], [1], [2]]
[[3, 4], [1], [2]]
[[1], [2]]
[]
PhyloNetworks.lambda!
— Methodlambda!(m::PhyloNetworkLinearModel, newlambda)
lambda!(m::ContinuousTraitEM, newlambda)
Assign a new value to the lambda parameter.
PhyloNetworks.lambda
— Methodlambda(m::PhyloNetworkLinearModel)
lambda(m::ContinuousTraitEM)
Value assigned to the lambda parameter, if appropriate.
PhyloNetworks.learnlabels
— Methodlearnlabels(model::Symbol, dat::DataFrame)
Return unique non-missing values in dat
, and check that these labels can be used to construct of substitution model of type model
.
examples:
julia> using DataFrames
julia> dat = DataFrame(trait1 = ["A", "C", "A", missing]); # 4×1 DataFrame
julia> PhyloNetworks.learnlabels(:BTSM, dat)
2-element Vector{String}:
"A"
"C"
julia> PhyloNetworks.learnlabels(:JC69, dat)
2-element Vector{String}:
"A"
"C"
PhyloNetworks.leaststableancestor
— Functionleaststableancestor(net, preorder=true::Bool)
Return (lsa, lsa_index)
where lsa
is the least stable ancestor node (LSA) in net
, and lsa_index
is the index of lsa
in net.nodes_changed
. The LSA the lowest node n
with the following property: any path between any leaf and the root must go through n
. All such nodes with this property are ancestral to the LSA (and therefore must have an index that is lower or equal to lsa_index
).
Exception: if the network has a single leaf, the output lsa
is the leaf's parent node, to maintain one external edge between the root and the leaf.
Warning: uses biconnectedComponents
and biconnectedcomponent_exitnodes
, therefore share the same caveats regarding the use of fields .inCycle
(for edges and nodes), .k
(for nodes) etc. As a positivie side effect, the biconnected components can be recovered via the edges' .inCycle
field –including the trivial blobs (cut edges).
See also: deleteaboveLSA!
PhyloNetworks.majoredgelength
— Methodmajoredgelength(net::HybridNetwork)
Generate vector of edge lengths of major net
edges organized in the same order as the edge
matrix created via majoredgematrix
. Considers values of -1.0
as missing values, recognized as NA in R
. Output: vector allowing for missing values.
Assume nodes_changed
was updated, to list nodes in pre-order.
Examples
julia> net = readTopology("(((A:3.1,(B:0.2)#H1:0.3::0.9),(C,#H1:0.3::0.1):1.1),D:0.7);");
julia> directEdges!(net); preorder!(net);
julia> PhyloNetworks.majoredgelength(net)
8-element Vector{Union{Missing, Float64}}:
missing
0.7
missing
1.1
missing
3.1
0.3
0.2
PhyloNetworks.majoredgematrix
— Methodmajoredgematrix(net::HybridNetwork)
Matrix of major edges from net
where edge[i,1] is the number of the parent node of edge i and edge[i,2] is the number of the child node of edge i. Assume nodes_changed
was updated, to list nodes in pre-order.
Examples
julia> net = readTopology("(A,(B,(C,D)));");
julia> PhyloNetworks.resetNodeNumbers!(net);
julia> PhyloNetworks.majoredgematrix(net)
6×2 Matrix{Int64}:
5 1
5 6
6 2
6 7
7 3
7 4
PhyloNetworks.makemissing!
— Methodmakemissing!(x::AbstractVector)
Turn to missing
any element of x
exactly equal to -1.0. Used for branch lengths and γs. x
needs to accept missing values. If not, this can be done with allowmissing(x)
.
PhyloNetworks.mapAllelesCFtable!
— MethodmapAllelesCFtable!(quartet CF DataFrame, mapping DataFrame, columns, write?, filename)
Modify (and return) the quartet concordance factor (CF) DataFrame: replace each allele name by the species name that the allele maps onto based on the mapping data frame. This mapping data frame should have columns named "allele" and "species" (see rename!
to change column names if need be).
If write?
is true
, the modified data frame is written to a file named "filename".
Warning: mapAllelesCFtable
takes the quartet data file as its second argument, while mapAllelesCFtable!
takes the quartet data (which it modifies) as its first argument.
PhyloNetworks.mapindividuals
— Methodmapindividuals(net::HybridNetwork, mappingFile::String)
Return a network expanded from net
, where species listed in the mapping file are replaced by individuals mapped to that species. If a species has only 1 individual, the name of the leaf for that species is replaced by the name of its one individual representative. If a species has 2 or more individuals, the leaf for that species is expanded into a "star" (polytomy if 3 or more individuals) with a tip for each individual. If a species is in the network but not listed in the mapping file, the tip for that species is left as is. Species listed in the mapping file but not present in the network are ignored.
The mapping file should be readable by CSV.File
and contain two columns: one for the species names and one for the individual (or allele) names. fixit: make this function more flexible by accepting column names
Output: individual-level network and vector of species constraint(s).
examples
julia> species_net = readTopology("(((S8,S9),((((S1,S4),(S5)#H1),(#H1,(S6,S7))))#H2),(#H2,S10));");
julia> filename = joinpath(dirname(Base.find_package("PhyloNetworks")), "..", "examples", "mappingIndividuals.csv");
julia> filename |> read |> String |> print # to see what the mapping file contains
species,individual
S1,S1A
S1,S1B
S1,S1C
julia> individual_net, species_constraints = PhyloNetworks.mapindividuals(species_net, filename);
julia> writeTopology(individual_net, internallabel=true)
"(((S8,S9),(((((S1A,S1B,S1C)S1,S4),(S5)#H1),(#H1,(S6,S7))))#H2),(#H2,S10));"
julia> species_constraints
1-element Vector{PhyloNetworks.TopologyConstraint}:
Species constraint, on tips: S1A, S1B, S1C
stem edge number 4
crown node number 3
PhyloNetworks.maxParsimonyNetRun1
— FunctionRoad map for various functions behind maxParsimonyNet
maxParsimonyNet
maxParsimonyNetRun1
maxParsimonyNetRun1!
All return their optimized network. Only maxParsimonyNet returns a rooted network (though all functions guarantee that the returned networks agree with the outgroup).
- maxParsimonyNet calls maxParsimonyNetRun1 per run, after a read(write(.)) of the starting network (to ensure level-1 and semi-directedness).
- maxParsimonyNetRun1 will make a copy of the topology, and will call findStartingTopology! to modify the topology according to random NNI/move origin/move target moves. It then calls maxParsimonyNetRun1! on the modified network
- maxParsimonyNetRun1! proposes new network with various moves (same moves as snaq), and stops when it finds the most parsimonious network, using
parsimonyGF
.
None of these functions allow for multiple alleles yet.
Note that the search algorithm keeps two HybridNetworks at a time: currT (current topology) and newT (proposed topology). Both are kept unrooted (semi-directed), otherwise the moves in proposedTop! function fail. We only root the topologies to calculate the parsimony, so we create a rooted copy (currTr, newTr) to compute parsimony score in this copied topology. We do not root and calculate parsimony score in the original HybridNetworks objects (currT,newT) because the computation of the parsimony score overwrites the inCycle attribute of the Nodes, which messes with the search moves.
Extensions:
- other criteria: hardwired, parental (only softwired implemented now)
- remove level-1 restriction: this will involve changing the proposedTop! function to use rSPR or rNNI moves (instead of the level-1 moves coded for snaq!). We need:
- functions for rSPR and rNNI moves
- create new proposedTop! function (proposedRootedTop?) to choose from the rSPR/rNNI/other moves
- have the search in maxParsimonuNetRun1! to have rooted currT and rooted newT, instead of keeping semi-directed objects (currT, newT), only to root for the parsimony score (currTr, newTr)
- outgroup is currently String or Node number, but it would be good if it allowed Edge number as an option too. Not sure best way to distinguish between Node number and Edge number, which is why left as Node number for now.
PhyloNetworks.maxParsimonyNetRun1!
— FunctionRoad map for various functions behind maxParsimonyNet
maxParsimonyNet
maxParsimonyNetRun1
maxParsimonyNetRun1!
All return their optimized network. Only maxParsimonyNet returns a rooted network (though all functions guarantee that the returned networks agree with the outgroup).
- maxParsimonyNet calls maxParsimonyNetRun1 per run, after a read(write(.)) of the starting network (to ensure level-1 and semi-directedness).
- maxParsimonyNetRun1 will make a copy of the topology, and will call findStartingTopology! to modify the topology according to random NNI/move origin/move target moves. It then calls maxParsimonyNetRun1! on the modified network
- maxParsimonyNetRun1! proposes new network with various moves (same moves as snaq), and stops when it finds the most parsimonious network, using
parsimonyGF
.
None of these functions allow for multiple alleles yet.
Note that the search algorithm keeps two HybridNetworks at a time: currT (current topology) and newT (proposed topology). Both are kept unrooted (semi-directed), otherwise the moves in proposedTop! function fail. We only root the topologies to calculate the parsimony, so we create a rooted copy (currTr, newTr) to compute parsimony score in this copied topology. We do not root and calculate parsimony score in the original HybridNetworks objects (currT,newT) because the computation of the parsimony score overwrites the inCycle attribute of the Nodes, which messes with the search moves.
Extensions:
- other criteria: hardwired, parental (only softwired implemented now)
- remove level-1 restriction: this will involve changing the proposedTop! function to use rSPR or rNNI moves (instead of the level-1 moves coded for snaq!). We need:
- functions for rSPR and rNNI moves
- create new proposedTop! function (proposedRootedTop?) to choose from the rSPR/rNNI/other moves
- have the search in maxParsimonuNetRun1! to have rooted currT and rooted newT, instead of keeping semi-directed objects (currT, newT), only to root for the parsimony score (currTr, newTr)
- outgroup is currently String or Node number, but it would be good if it allowed Edge number as an option too. Not sure best way to distinguish between Node number and Edge number, which is why left as Node number for now.
PhyloNetworks.minorreticulationgamma
— Methodminorreticulationgamma(net::HybridNetwork)
Vector of minor edge gammas (inheritance probabilities) organized in the same order as in the matrix created via minorreticulationmatrix
. Considers values of -1.0
as missing values, recognized as NA in R
. Output: vector allowing for missing values.
Examples
julia> net = readTopology("(((A,(B)#H1:::0.9),(C,#H1:::0.1)),D);");
julia> PhyloNetworks.minorreticulationgamma(net)
1-element Vector{Union{Float64, Missings.Missing}}:
0.1
PhyloNetworks.minorreticulationlength
— Methodminorreticulationlength(net::HybridNetwork)
Vector of lengths for the minor hybrid edges, organized in the same order as in the matrix created via minorreticulationmatrix
. Replace values of -1.0
with missing values recognized by R
. Output: vector allowing for missing values.
Examples
julia> net = readTopology("(((A:3.1,(B:0.2)#H1:0.4::0.9),(C,#H1:0.3::0.1):1.1),D:0.7);");
julia> PhyloNetworks.minorreticulationlength(net)
1-element Vector{Union{Missing, Float64}}:
0.3
PhyloNetworks.minorreticulationmatrix
— Methodminorreticulationmatrix(net::HybridNetwork)
Matrix of integers, representing the minor hybrid edges in net
. edge[i,1] is the number of the parent node of the ith minor hybrid edge, and edge[i,2] is the number of its child node. Node numbers may be negative, unless they were modified by resetNodeNumbers!
. Assumes correct isChild1
fields.
Examples
julia> net = readTopology("(((A,(B)#H1:::0.9),(C,#H1:::0.1)),D);");
julia> PhyloNetworks.minorreticulationmatrix(net)
1×2 Matrix{Int64}:
-6 3
PhyloNetworks.moveHybrid!
— MethodmoveHybrid
road map
Function that tries to fix a gamma zero problem (h==0,1; t==0; hz==0,1
) after changing direction of hybrid edge failed. This function is called in gammaZero
.
Arguments:
closeN=true
will try move origin/target on all neighbors (first choose minor/major edge at random, then make list of all neighbor edges and tries to put the hybrid node in all the neighbors until successful move); if false, will delete and add hybrid until successful move up to N times (this is never tested)
Returns true if change was successful (not testing optBL
again), and false if we could not move anything
PhyloNetworks.moveTargetUpdate!
— MethodmoveTargetUpdate!(net, hybrid_node, majoredge, newedge)
Modify a level-1 network net
by moving majoredge
, which should be a hybrid edge parent of hybrid_node
. Within SNaQ, majoredge
is chosen by chooseMinorMajor
.
- calls
moveTarget!(net,hybrid_node, majoredge, treeedge_belowhybrid, newedge)
, which does the move but does not update any attributes - updates all level-1 attributes needed for SNaQ: gammaz, containRoot
- un-does the move and updates if the move is invalid, through another call to
moveTarget!
but with the "undo" option.
newedge
should be a tree edge (enforced by chooseEdgeOriginTarget!
) adjacent to the parent node of majoredge
or to the tree child of hybrid_node
(enforced by getNeighborsTarget
)
Output: tuple of 3 booleans (success, flag_triangle, flag_root)
.
success
is false if the move failed (lead to an invalid network for SNaQ)flag_triangle
is false ifnet.hasVeryBadTriangle
flag_root
is false if the set of edges to place the root is empty
If success
is false, then the flags are not meant to be used downstream.
PhyloNetworks.moveroot!
— Functionmoveroot!(net::HybridNetwork, constraints=TopologyConstraint[]::Vector{TopologyConstraint})
Move the root to a randomly chosen non-leaf node that is different from the current root, and not within a constraint clade or species. Output: true
if successul, nothing
otherwise.
PhyloNetworks.nameinternalnodes!
— Methodnameinternalnodes!(net::HybridNetwork, prefix)
Add names to nodes in net
that don't already have a name. Leaves already have names; but if not, they will be given names as well. New node names will be of the form "prefixI" where I is an integer.
examples
julia> net = readTopology("((a:1,(b:1)#H1:1::0.8):5,(#H1:0::0.2,c:1):1);");
julia> PhyloNetworks.nameinternalnodes!(net, "I") # by default, shown without internal node names
HybridNetwork, Rooted Network
7 edges
7 nodes: 3 tips, 1 hybrid nodes, 3 internal tree nodes.
tip labels: a, b, c
((a:1.0,(b:1.0)#H1:1.0::0.8)I1:5.0,(#H1:0.0::0.2,c:1.0)I2:1.0)I3;
julia> writeTopology(net; internallabel=false) # by default, writeTopology shows internal names if they exist
"((a:1.0,(b:1.0)#H1:1.0::0.8):5.0,(#H1:0.0::0.2,c:1.0):1.0);"
julia> net = readTopology("((int5:1,(b:1)#H1:1::0.8):5,(#H1:0::0.2,c:1):1);"); # one taxon name starts with "int"
julia> PhyloNetworks.nameinternalnodes!(net, "int");
julia> writeTopology(net)
"((int5:1.0,(b:1.0)#H1:1.0::0.8)int6:5.0,(#H1:0.0::0.2,c:1.0)int7:1.0)int8;"
PhyloNetworks.nchoose1234
— Methodnchoose1234(nmax)
nmax+1 x 4
matrix containing the binomial coefficient "n choose k" in row n+1
and column k
. In other words, M[i,k]
gives "i-1 choose k". It is useful to store these values and look them up to rank (a large number of) 4-taxon sets: see quartetrank
.
PhyloNetworks.nj!
— Functionnj!(D::Matrix{Float64}, names::AbstractVector{<:AbstractString}=String[];
force_nonnegative_edges::Bool=false)
Construct a phylogenetic tree from the input distance matrix and vector of names (as strings), using the Neighbour-Joinging algorithm (Satou & Nei 1987). The order of the names
argument should match that of the row (and column) of the matrix D
. With force_nonnegative_edges
being true
, any negative edge length is changed to 0.0 (with a message).
Warning: D
is modified.
PhyloNetworks.nni_LiNC!
— Methodnni_LiNC!(obj::SSM, no3cycle::Bool, nohybridladder::Bool,
constraints::Vector{TopologyConstraint},
ftolAbs::Float64,
γcache::CacheGammaLiNC, lcache::CacheLengthLiNC)
Loop over possible edges for a nearest-neighbor interchange move until one is found. Performs move and compares the original and modified likelihoods. If the modified likelihood is greater than the original by likAbs
, the move is accepted.
Return true if move accepted, false if move rejected. Return nothing if there are no nni moves possible in the network.
For arguments, see phyLiNC
.
Called by optimizestructure!
, which is called by phyLiNC!
.
Note: an RR move does not change the best likelihood. RR means that there's a hybrid ladder, so it looks like a hard polytomy at the reticulation after unzipping. Theoretically, we could avoid the re-optimizing the likelihood accept the move: just change inheritance values to get same likelihood, and update the tree priors. Not done.
PhyloNetworks.nnimax
— Methodnnimax(e::Edge)
Return the number of NNI moves around edge e
, assuming that the network is semi-directed. Return 0 if e
is not internal, and more generally if either node attached to e
does not have 3 edges.
Output: UInt8 (e.g. 0x02
)
PhyloNetworks.norootbelow!
— Methodnorootbelow!(e::Edge)
Set containRoot
to false
for edge e
and all edges below, recursively. The traversal stops if e.containRoot
is already false
, assuming that containRoot
is already false all the way below down that edge.
PhyloNetworks.optBL!
— MethodoptBL
road map
Function that optimizes the numerical parameters (branch lengths and inheritance probabilities) for a given network. This function is called multiple times inside optTopLevel!
.
- Input: network
net
, datad
- Numerical tolerances:
ftolAbs, ftolRel, xtolAbs, xtolRel
- Function based on
MixedModels
fit
function - The function assumes
net
has all the right attributes, and cannot check this inside because it would be inefficient
Procedure:
ht = parameters!(net)
extracts the vector of parameters to estimate(h,t,gammaz)
, and sets asnet.ht
; identifies a bad diamond I, setsnet.numht
(vector of hybrid node numbers for h, edge numbers for t, hybrid node numbers for gammaz), andnet.index
to keep track of the vector of parameters to estimateextractQuartet!(net,d)
does the following for all quartets ind.quartet
:- Extract quartet by deleting all leaves not in q -> create
QuartetNetwork
object saved inq.qnet
- This network is ugly and does not have edges collapsed. This is done to keep a one-to-one correspondence between the edges in
q.qnet
and the edges innet
(if we remove nodes with only two edges, we will lose this correspondence) - Calculate expected CF with
calculateExpCFAll
for a copy ofq.qnet
. We do this copy because we want to keepq.qnet
as it is (without collapsed edges into one). The function will then save theexpCF
inq.qnet.expCF
- Extract quartet by deleting all leaves not in q -> create
calculateExpCFAll!(qnet)
will- identify the type of quartet as type 1 (equivalent to a tree) or type 2 (minor CF different). Here the code will first clean up any hybrid node by removing nodes with only two edges before identifying the
qnet
(because identification depends on neighbor nodes to hybrid node); later, setqnet.which
(1 or 2),node.prev
(neighbor node to hybrid node), updatesnode.k
(number of nodes in hybridization cycle, this can change after deleting the nodes with only two edges),node.typeHyb
(1,2,3,4,5 depending on the number of nodes in the hybridization cycle and the origin/target of the minor hybrid edge; this attribute is never used). - eliminate hybridization: this will remove type 1 hybridizations first. If
qnet.which=1
, then theqnet
is similar to a tree quartet, so it will calculate the internal length of the tree quartet:qnet.t1
. - update split for
qnet.which=1
, to determine which taxa are together. For example, for the quartet 12|34, the split is [1,1,2,2] or [2,2,1,1], that is, taxon 1 and 2 are on the same side of the split. This will updateqnet.split
- update formula for
qnet.which=1
to know the order of minorCF and majorCF in the vectorqnet.expCF
. That is, if the quartet is 1342 (order inqnet.quartet.taxon
), then the expected CF should match the observed CF in 13|42, 14|32, 12|34 and theqnet
is 12|34 (given byqnet.split
),qnet.formula
will be [2,2,1] minor, minor, major calculateExpCF!(qnet)
forqnet.which=1
, it will do1-2/3exp(-qnet.t1)
ifqnet.formula[i]==1
, and1/3exp(qnet.t1)
ifqnet.formula[i]==2
. Forqnet.which=2
, we need to make sure that there is only one hybrid node, and compute the major, minor1,minor2 expected CF in the order 12|34, 13|24, 14|23 of the taxa inqnet.quartet.taxon
- identify the type of quartet as type 1 (equivalent to a tree) or type 2 (minor CF different). Here the code will first clean up any hybrid node by removing nodes with only two edges before identifying the
Then we create a NLopt
object with algorithm BOBYQA and k parameters (length of ht). We define upper and lower bounds and define the objective function that should only depend on x=(h,t,gz)
and g (gradient, which we do not have, but still need to put as argument).
The objective function obj(x,g)
calls
calculateExpCFAll!(d,x,net)
needs to be run afterextractQuartet(net,d)
that will updateq.qnet
for all quartet. Assumes thatqnet.indexht
is updated already: we only need to do this at the beginning ofoptBL!
because the topology is fixed at this point)- First it will update the edge lengths according to x
- If the
q.qnet.changed=true
(that is, any ofqnet
branches changed value), we need to callcalculateExpCFAll!(qnet)
on a copy ofq.qnet
(again because we want to leaveq.qnet
with the edge correspondence tonet
)
update!(net,x)
simply saves the new x innet.ht
Finally, we call NLopt.optimize
, and we update the net.loglik
and net.ht
at the end. After optBL
, we want to call afterOptBLAll
(or afterOptBLAllMultipleAlleles
) to check if there are h==0,1
; t==0
; hz==0,1
.
PhyloNetworks.optTopLevel!
— MethodoptTopLevel
road map
Function that does most of the heavy-lifting of snaq
. It optimizes the pseudolikelihood for a given starting topology, and returns the best network. Assumes that the starting topology is level-1 network, and has all the attributes correctly updated.
Input parameters:
- Starting topology
currT
, input dataDataCF
d
, maximum number of hybridizationshmax
- Numerical optimization parameters:
liktolAbs, Nfail, ftolRel, ftolAbs, xtolRel, xtolAbs
- Print parameters:
verbose, logfile, writelog
- Parameters to tune the search in space of networks:
closeN=true
only propose move origin/target to neighbor edges (coded, but not tested withcloseN=false
),Nmov0
vector with maximum number of trials allowed per type of move(add, mvorigin, mvtarget, chdir, delete, nni)
, by default computed inside with coupon’s collector formulas
The optimization procedure keeps track of
movescount
: count of proposed moves,movesgamma
: count of proposed moves to fix a gamma zero situation (see below for definition of this situation),movesfail
: count of failed moves by violation of level-1 network (inCycle
attribute) or worse pseudolikelihood than current,failures
: number of failed proposals that had a worse pseudolikelihood
Optimization procedure:
While the difference between current loglik and proposed loglik is greater than liktolAbs
, or failures<Nfail
, or stillmoves=true
:
Nmov
is updated based onnewT
. The type of move proposed will depend onnewT
(which is the same ascurrT
at this point). For example, ifcurrT
is a tree, we cannot propose move origin/target.move = whichMove
selects randomly a type of move, depending onNmov,movesfail,hmax,newT
with weights 1/5 by default for all, and 0 for delete. These weights are adjusted depending onnewT.numHybrids
andhmax
. IfnewT.numHybrids
is far fromhmax
, we give higher probability to adding a new hybrid (we want to reach thehmax
sooner, maybe not the best strategy, easy to change). Later, we adjust the weights bymovesfail
(first, give weight of 0 ifmovesfail[i]>Nmov[i]
, that is, if we reached the maximum possible number of moves allowed for a certain type) and then increase the probability of the other moves. So, unless one move hasw=0
, nothing changes. This could be improved by using the outlier quartets to guide the proposal of moves.whichMove
will choose a move randomly from the weights, it will returnnone
if no more moves allowed, in which case, the optimization endsflag=proposedTop!(move, newT)
will modifynewT
based onmove
. The functionproposedTop
will returnflag=true
if the move was successful (the move succeeded byinCycle
,containRoot
, available edge to make the move (more details inproposedTop
)). Ifflag=false
, thennewT
is cleaned, except for the case of multiple alleles. The functionproposedTop
keeps count ofmovescount
(successful move),movesfail
(unsuccessful move),Options:
random=true
: moves major/minor hybrid edge with prob h,1-h, respectivelyN=10
: number of trials for NNI edge.if(flag) Optimize branch lengths with
optBL
If
newT.loglik
is better thancurrT.loglik
byliktolAbs
, jump tonewT
(accepted=true
) and fixgamma=0, t=0
problems (more info onafterOptBL
)If(accepted)
failures=0
,movesfail=zeros
,movescount
for successful move +1
end while
After choosing the best network newT
, we do one last more thorough optimization of branch lengths with optBL
, we change non identifiable branch lengths to -1 (only in debug mode) and return newT
PhyloNetworks.optTopRun1!
— MethodoptTopRun1!(net, liktolAbs, Nfail, d::DataCF, hmax, etc.)
The function will run 1 run by modifying the starting topology and calling optTopLevel
. See optTopRuns!
for a roadmap.
probST
(default in snaq is 0.3) is the probability of starting one run at the same input tree. So, with probability 1-probST
, we will change the topology by a NNI move on a tree edge without neighbor hybrid. If the starting topology is a network, then with probability 1-probST
it will also modify one randomly chosen hybrid edge: with prob 0.5, the function will move origin, with prob 0.5 will do move target.
If there are multiple alleles (d.repSpecies
not empty), then the function has to check that the starting topology does not violate the multiple alleles condition.
After modifying the starting topology with NNI and/or move origin/target, optTopLevel
is called.
PhyloNetworks.optTopRuns!
— MethodRoad map for various functions behind snaq!
snaq!
optTopRuns!
optTopRun1!
optTopLevel!
optBL!
All return their optimized network.
snaq!
callsoptTopRuns!
once, after a deep copy of the starting network. If the data contain multiple alleles from a given species,snaq!
first expands the leaf for that species into 2 separate leaves, and merges them back into a single leaf after callingoptTopRuns!
.optTopRuns!
callsoptTopRun1!
several (nrun
) times. assumes level-1 network with >0 branch lengths. assumes same tips in network as in data: i.e. 2 separate tips per species that has multiple alleles. each call tooptTopRun1!
gets the same starting network.optTopRun1!
callsoptTopLevel!
once, after deep copying + changing the starting network slightly.optTopLevel!
callsoptBL!
various times and proposes new network with various moves.
PhyloNetworks.optimizeallgammas_LiNC!
— Methodoptimizeallgammas_LiNC!(obj::SSM, ftolAbs::Float64,
γcache::CacheGammaLiNC, maxeval=1000::Int)
Optimize all γ's in a network, one by one using [optimizegamma_LiNC!
] until maxeval
γ's have been optimized or until the difference in log-likelihood falls below ftolAbs
.
At the end: hybrid edges with γ=0 are deleted (if any).
Output: true if reticulations have been deleted, false otherwise. If true, updateSSM!
needs to be called afterwards, with constraints if any. (Constraints are not known here). Before updating the displayed trees in the SSM, shrink2cycles!
or shrink3cycles!
could be called, if desired, despite the (slight?) change in likelihood that this shrinking would cause. 2/3-cycles are not shrunk here.
PhyloNetworks.optimizealllengths_LiNC!
— Methodoptimizealllengths_LiNC!(obj::SSM, lcache::CacheLengthLiNC)
Optimize all branch lengths, except those below hybrid edges (whose lengths are not identifiable). Same assumptions as in optimizelocalBL_LiNC!
.
PhyloNetworks.optimizegamma_LiNC!
— Functionoptimizegamma_LiNC!(obj::SSM, focusedge::Edge,
ftolAbs::Float64, cache::CacheGammaLiNC, maxNR=10::Int)
Optimize γ on a single hybrid edge
using the Newton-Raphson method (the log-likelihood is concave). The new log-likelihood is returned after updating obj
and its fields with the new γ.
The search stops if the absolute difference in log-likelihood between two consecutive iterations is below ftolAbs
, or after maxNR
Newton-Raphson iterations.
Warnings:
- no check that
edge
is hybrid obj._loglikcache
and displayed trees (etc.) are assumed to be up-to-date, and is updated alongside the new γ.
Used by optimizelocalgammas_LiNC!
and optimizeallgammas_LiNC!
.
PhyloNetworks.optimizelength_LiNC!
— Methodoptimizelength_LiNC!(obj::SSM, edge::Edge, cache::CacheLengthLiNC, Qmatrix)
Optimize the length of a single edge
using a gradient method, if edge
is not below a reticulation (the unzipped canonical version should have a length of 0 below reticulations). Output: nothing.
Warning: displayed trees are assumed up-to-date, with nodes preordered
PhyloNetworks.optimizelocalBL_LiNC!
— MethodoptimizelocalBL_LiNC!(obj::SSM, edge::Edge, lcache::CacheLengthLiNC)
Optimize branch lengths in net
locally around edge
. Update all edges that share a node with edge
(including itself). Constrains branch lengths to zero below hybrid nodes. Return vector of updated edges
(excluding constrained edges). The tolerance values and parameters controlling the search are set in lcache
.
Used after nni!
or addhybridedge!
moves to update local branch lengths. See phyLiNC!
.
Assumptions:
- displayed trees in
obj
are up-to-date with nodes preordered, andobj.priorltw
are up-to-date. - branch lengths are at or above the lower bound, and definitely non-negative.
julia> net = readTopology("(((A:2.0,(B:0.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0):0.5,D:2.0);");
julia> fastafile = abspath(joinpath(dirname(Base.find_package("PhyloNetworks")), "..", "examples", "simple.aln"));
julia> obj = PhyloNetworks.StatisticalSubstitutionModel(net, fastafile, :JC69);
julia> obj.loglik = -Inf64; # not calculated yet
julia> e = obj.net.edge[4];
julia> e.length
1.5
julia> using Random; Random.seed!(1234);
julia> PhyloNetworks.optimizelocalBL_LiNC!(obj, e, PhyloNetworks.CacheLengthLiNC(obj, 1e-6,1e-6,1e-2,1e-2, 10));
julia> round(e.length, sigdigits=6) # we get the lower bound from PhyLiNC in this case
1.0e-8
julia> writeTopology(obj.net; round=true)
"(((A:0.338,(B:0.0)#H1:0.04::0.9):0.0,(C:0.6,#H1:1.0::0.1):0.0):0.0,D:2.0);"
PhyloNetworks.optimizelocalgammas_LiNC!
— Functionoptimizelocalgammas_LiNC!(obj::SSM, edge::Edge,
ftolAbs::Float64, γcache::CacheGammaLiNC,
maxeval=1000::Int)
Optimize γ's in net
locally around edge
. Update all edges adjacent to edge
(including itself), one by one using [optimizegamma_LiNC!
] until maxeval
γ's have been optimized or until the difference in log-likelihood falls below ftolAbs
. nothing
is returned.
Used after nni!
or addhybridedge!
moves to update local gammas.
Assumptions:
- correct
isChild1
field foredge
and for hybrid edges - no in-coming polytomy: a node has 0, 1 or 2 parents, no more
```jldoctest julia> net = readTopology("(((A:2.0,(B:1.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0):0.5,D:2.0);");
julia> fastafile = abspath(joinpath(dirname(Base.find_package("PhyloNetworks")), "..", "examples", "simple.aln"));
julia> obj = PhyloNetworks.StatisticalSubstitutionModel(net, fastafile, :JC69);
julia> obj.net.edge[3].gamma 0.9
julia> using Random; Random.seed!(1234);
julia> PhyloNetworks.optimizelocalgammas_LiNC!(obj, obj.net.edge[3], 1e-3, PhyloNetworks.CacheGammaLiNC(obj));
julia> obj.net.edge[3].gamma 0.0 ````
PhyloNetworks.optimizestructure!
— Methodoptimizestructure!(obj::SSM, maxmoves::Integer, maxhybrid::Integer,
no3cycle::Bool, nohybridladder::Bool,
nreject::Integer, nrejectmax::Integer,
constraints::Vector{TopologyConstraint},
ftolAbs::Float64,
γcache::CacheGammaLiNC, lcache::CacheLengthLiNC)
Alternate NNI moves, hybrid additions / deletions, and root changes based on their respective weights in moveweights_LiNC
([0.4, 0.2, 0.2, 0.2]). Branch lengths and hybrid γs around the NNI focal edge or around the added / deleted hybrid edge are optimized (roughly) on the proposed network. Each proposed network is accepted –or not– if the likelihood improves (or doesn't decrease much for hybrid deletion). After adding or removing a hybrid, obj
is updated, to have correct displayed trees
and node/edge numberings: done by nni_LiNC!
, addhybridedgeLiNC!
and deletehybridedgeLiNC!
.
Output: number of consecutive rejections so far.
The percent of nni moves, hybrid additions, hybrid deletions, and root changes to be performed is in PhyloNetworks.moveweights_LiNC
.
maxmoves
: maximum number of moves to be performed, including root changes, which don't actually change the semi-directed topology and likelihood.nreject
: number of consecutive rejections (ignoring root changes), prior to starting the search (from a prior function call)nrejectmax
: the search stops when there has been this number of moves that have been rejected in a row (ignoring root changes)
For a description of other arguments, see phyLiNC
.
Assumptions:
checknetworkbeforeLiNC
anddiscrete_corelikelihood!
have been called onobj.net
.- starting with a network without 2- and 3- cycles (checked by
checknetworkbeforeLiNC
)
Note: When removing a hybrid edge, always removes the minor edge.
PhyloNetworks.pairwiseTaxonDistanceGrad
— MethodpairwiseTaxonDistanceGrad(net; checkEdgeNumber=true, nodeAges=[])
3-dim array: gradient of pairwise distances between all nodes. (internal and leaves); gradient with respect to edge lengths if nodeAges
is empty; with respect to node ages otherwise. Assume correct net.nodes_changed
(preorder). This gradient depends on the network's topology and γ's only, not on branch lengths or node ages (distances are linear in either).
WARNING: edge numbers need to range between 1 and #edges.
PhyloNetworks.parseEdgeData!
— MethodparseEdgeData!(s::IO, edge, numberOfLeftParentheses::Array{Int,1})
Helper function for readSubtree!. Modifies e
according to the specified edge length and gamma values in the tree topology. Advances the stream s
past any existing edge data. Edges in a topology may optionally be followed by ":edgeLen:bootstrap:gamma" where edgeLen, bootstrap, and gamma are decimal values. Nexus-style comments [&...]
, if any, are ignored.
PhyloNetworks.parseHybridNode!
— MethodparseHybridNode!(node, parentNode, hybridName, net, hybrids)
Helper function for readSubtree!
. Create the parent edge for node
. Return this edge, and the hybrid node retained (node
or its clone in the newick string). Insert new edge and appropriate node into net
and hybrids
accordingly. Handles any type of given hybrid node. Called after a #
has been found in a tree topology.
PhyloNetworks.parseRemainingSubtree!
— MethodparseRemainingSubtree!(s::IO, numLeft, net, hybrids)
Create internal node. Helper for readSubtree!
, which creates the parent edge of the node created by parseRemainingSubtree!
: readSubtree!
calls parseRemainingSubtree!
, and vice versa. Called once a (
has been read in a tree topology and reads until the corresponding )
has been found. This function performs the recursive step for readSubtree!
. Advances s
past the subtree, adds discovered nodes and edges to net
, and hybrids
.
Does not read the node name and the edge information of the subtree root: this is done by readSubtree!
PhyloNetworks.parseTreeNode!
— MethodparseTreeNode!(node, parentNode, net)
Helper function for readSubtree!
. Insert the input tree node and associated edge (created here) into net
.
PhyloNetworks.parsimonyBottomUpFitch!
— MethodparsimonyBottomUpFitch!(node, states, score)
Bottom-up phase (from tips to root) of the Fitch algorithm: assign sets of character states to internal nodes based on character states at tips. Polytomies okay. Assumes a tree (no reticulation) and correct isChild1 attribute.
output: dictionary with state sets and most parsimonious score
PhyloNetworks.parsimonyBottomUpGF!
— MethodparsimonyBottomUpGF!(node, blobroot, nchar, w, scores,
costmatrix1, costmatrix2)
Compute the MP scores (one for each assignment of the blob root state) given the descendants of a blob, conditional on the states at predefined parents of hybrids in the blobs (one parent per hybrid) as described in
Leo Van Iersel, Mark Jones, Celine Scornavacca (2017). Improved Maximum Parsimony Models for Phylogenetic Networks, Systematic Biology, (https://doi.org/10.1093/sysbio/syx094).
Assumes a set of state guesses, ie correct initialization of w
for predefined hybrid parents, and correct fromBadDiamondI
field for the children edges of these predefined parents. fromBadDiamondI
is true for edges that are cut.
The field isExtBadTriangle
is used to know which nodes are at the root of a blob. The field isChild1
is used (and assumed correct). Field inCycle
is assumed to store the # of detached parents (with guessed states)
nchar
: number of characters considered at internal lineages. For softwired parsimony, this is # states + 1, because characters at internal nodes are ∅, {1}, {2}, etc. For parental parsimony, this is 2^#states -1, because characters are all sets on {1,2,...} except for the empty set ∅.costmatrix1
[i,j] andcostmatrix2
[k][i,j]: 2d array and vector of 2d arrays containing the cost of going to character j starting from character i when the end node has a single parent, or the cost of a child node having character j when its parents have characters k and i. These cost matrices are pre-computed depending on the parsimony criterion (softwired, hardwired, parental etc.)
used by parsimonyGF
.
PhyloNetworks.parsimonyBottomUpSoftwired!
— MethodparsimonyBottomUpSoftwired!(node, blobroot, states, w, scores)
Computing the MP scores (one for each assignment of the root state) of a swicthing as described in Algorithm 1 in the following paper:
Fischer, M., van Iersel, L., Kelk, S., Scornavacca, C. (2015). On computing the Maximum Parsimony score of a phylogenetic network. SIAM J. Discrete Math., 29(1):559-585.
Assumes a switching (ie correct fromBadDiamondI
field) and correct isChild1 field. The field isExtBadTriangle
is used to know which nodes are at the root of a blob.
PhyloNetworks.parsimonyDiscreteFitch
— MethodparsimonyDiscreteFitch(net, tipdata)
Calculate the most parsimonious (MP) score of a network given a discrete character at the tips. The softwired parsimony concept is used: where the number of state transitions is minimized over all trees displayed in the network. Tip data can be given in a data frame, in which case the taxon names are to appear in column 1 or in a column named "taxon" or "species", and trait values are to appear in column 2 or in a column named "trait". Alternatively, tip data can be given as a dictionary taxon => trait.
also return the union of all optimized character states at each internal node as obtained by Fitch algorithm, where the union is taken over displayed trees with the MP score.
PhyloNetworks.parsimonySummaryFitch
— MethodparsimonySummaryFitch(tree, nodestates)
summarize character states at nodes, assuming a tree
PhyloNetworks.parsimonyTopDownFitch!
— MethodparsimonyTopDownFitch!(node, states)
Top-down phase (root to tips) of the Fitch algorithm: constrains character states at internal nodes based on the state of the root. Assumes a tree: no reticulation.
output: dictionary with state sets
PhyloNetworks.phyLiNC
— FunctionphyLiNC(net::HybridNetwork, fastafile::String, substitutionModel::Symbol)
Estimate a phylogenetic network from concatenated DNA data using maximum likelihood, ignoring incomplete lineage sorting (phyLiNC: phylogenetic Likelihood Network from Concatenated data). The network is constrained to have maxhybrid
reticulations at most, but can be of any level. The search starts at (or near) the network net
, using a local hill-climbing search to optimize the topology (nearest-neighbor interchange moves, add hybridizations, and remove hybridizations). Also optimized are evolutionary rates, amount of rate variation across sites, branch lengths and inheritance γs. This search strategy is run nruns
times, and the best of the nruns
networks is returned.
Return a StatisticalSubstitutionModel
object, say obj
, which contains the estimated network in obj.net
.
Required arguments:
net
: a network or tree of typeHybridNetwork
, to serve as a starting point in the search for the best network. Newick strings can be converted to this format withreadTopology
.fastafile
: file with the sequence data in FASTA format. Ambiguous states are treated as missing.substitutionModel
: A symbol indicating which substitution model is used. Choose:JC69
JC69
for the Jukes-Cantor model or:HKY85
for the Hasegawa, Kishino, Yano modelHKY85
.
The length of the edge below a reticulation is not identifiable. Therefore, phyLiNC estimates the canonical version of the network: with reticulations unzipped: edges below reticulations are set to 0, and hybrid edges (parental lineages) have estimated lengths that are increased accordingly.
If any branch lengths are missing in the input network, phyLiNC estimates the starting branch lengths using pairwise distances. Otherwise, it uses the input branch lengths as starting branch lengths, only unzipping all reticulations, as described above.
Optional arguments (default value in parenthesis):
- symbol for the model of rate variation across sites (
:noRV
for no rate variation): use:G
or:Gamma
for Gamma-distributed rates,:I
or:Inv
for a proportion of invariable sites, and:GI
or:GammaInv
for a combination (not recommended). - integer (4) for the number of categories to use in estimating evolutionary rates using a discretized gamma model. When allowing for rate variation, four categories is standard. With 1 category, no rate variation is assumed. See
RateVariationAcrossSites
.
Main optional keyword arguments (default value in parenthesis):
speciesfile
(""): path to a csv file with samples in rows and two columns: species (column 1), individual (column 2) Include this file to group individuals by species.cladefile
(""): path to a csv file containing two columns: clades and individuals used to create one or more clade topology constraints to meet during the search. (NOTE: clade contraints not yet implemented.)filename
("phyLiNC"): root name for the output files (.out
,.err
). If empty (""), files are not created, progress log goes to the screen only (standard out).maxhybrid
(1): maximum number of hybridizations allowed.net
(starting network) must havemaxhybrid
or fewer reticulations.nruns
(10): number of independent starting points for the searchno3cycle
(true): prevents 3-cycles, which are (almost) not identifiablenohybridladder
(true): prevents hybrid ladder in network. If true, the input network must not have hybrid ladders. Warning: Setting this to true will avoid most hybrid ladders, but some can still occur in some cases when deleting hybrid edges. This might be replaced with an option to avoid all non-tree-child networks in the future.
Optional arguments controlling the search:
seed
(default 0 to get it from the clock): seed to replicate a given searchnreject
(75): maximum number of times that new topologies are proposed and rejected in a row. Lower values ofnreject
result in a less thorough but faster search. Controls when to stop proposing new network topologies.probST
(0.5): probability to usenet
as the starting topology for each given run. If probST < 1, the starting topology is k NNI moves away fromnet
, where k is drawn from a geometric distribution: p (1-p)ᵏ, with success probability p =probST
.maxmoves
(100): maximum number of topology moves before branch lengths, hybrid γ values, evolutionary rates, and rate variation parameters are reestimated.verbose
(true): set to false to turn off screen outputalphamin
(0.02) andalphamax
(50.0): minimum and maximum values for the shape parameter alpha for Gamma-distributed rates across sites.pinvmin
(1e-8) andpinvmax
(0.99): minimum and maximum values for the proportion of invariable sites, if included in the model.
The following optional arguments control when to stop the optimization of branch lengths and gamma values on each individual candidate network. Defaults in parentheses.
ftolRel
(1e-6) andftolAbs
(1e-6): relative and absolute differences of the network score between the current and proposed parametersxtolRel
(1e-5) andxtolAbs
(1e-5): relative and absolute differences between the current and proposed parameters.
Greater values will result in a less thorough but faster search. These parameters are used when evaluating candidate networks only. Regardless of these arguments, once a final topology is chosen, branch lenghts are optimized using stricter tolerances (1e-10, 1e-12, 1e-10, 1e-10) for better estimates.
PhyloNetworks.phyLiNC!
— MethodphyLiNC!(obj::SSM; kwargs...)
Called by phyLiNC
after obj
is created (containing both the data and the model) and after checks are made to start from a network that satisfies all the constraints. Different runs are distributed to different processors, if more than one are available.
PhyloNetworks.phyLiNCone!
— MethodphyLiNCone!(obj::SSM, maxhybrid::Int, no3cycle::Bool,
nohybridladder::Bool, maxmoves::Int, nrejectmax::Int,
verbose::Bool, writelog_1proc::Bool, logfile::IO,
seed::Int, probST::Float64,
constraints::Vector{TopologyConstraint},
ftolRel::Float64, ftolAbs::Float64,
xtolRel::Float64, xtolAbs::Float64,
alphamin::Float64, alphamax::Float64,
pinvmin::Float64, pinvmax::Float64,
γcache::CacheGammaLiNC)
Estimate one phylogenetic network (or tree) from concatenated DNA data, like phyLiNC!
, but doing one run only, and taking as input an StatisticalSubstitutionModel object obj
. The starting network is obj.net
and is assumed to meet all the requirements.
writelog_1proc
is passed by phyLiNC! an indicates if a log should be written. If the number of processors is > 1, this will be false because workers can't write on streams opened by master. logfile
will be stdout if writelog_1proc
is false. Otherwise, it will be the log file created by phyLiNC!
.
See phyLiNC
for other arguments.
PhyloNetworks.posterior_loghybridweight
— Functionposterior_loghybridweight(obj::SSM, hybrid_name, trait = 1)
posterior_loghybridweight(obj::SSM, edge_number, trait = 1)
Log-posterior probability for all trees displaying the minor parent edge of hybrid node named hybrid_name
, or displaying the edge number edge_number
. That is: log of P(hybrid minor parent | trait) if a single trait
is requested, or A[i]= log of P(hybrid minor parent | trait i
) if trait
is a vector or range (e.g. trait = 1:obj.nsites
). These probabilities are conditional on the model parameters in obj
.
Precondition: _loglikcache
updated by discrete_corelikelihood!
examples
julia> net = readTopology("(((A:2.0,(B:1.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0):0.5,D:2.0);");
julia> m1 = BinaryTraitSubstitutionModel([0.1, 0.1], ["lo", "hi"]); # arbitrary rates
julia> using DataFrames
julia> dat = DataFrame(species=["C","A","B","D"], trait=["hi","lo","lo","hi"]);
julia> fit = fitdiscrete(net, m1, dat); # optimized rates: α=0.27 and β=0.35
julia> plhw = PhyloNetworks.posterior_loghybridweight(fit, "H1");
julia> round(exp(plhw), digits=5) # posterior probability of going through minor hybrid edge
0.08017
julia> hn = net.node[3]; getparentedgeminor(hn).gamma # prior probability
0.1
PhyloNetworks.posterior_logtreeweight
— Functionposterior_logtreeweight(obj::SSM, trait = 1)
Array A of log-posterior probabilities for each tree displayed in the network: such that A[t] = log of P(tree t
| trait trait
) if a single trait
is requested, or A[t,i]= log of P(tree t
| trait i
) if trait
is a vector or range (e.g. trait = 1:obj.nsites
). These probabilities are conditional on the model parameters in obj
.
Displayed trees are listed in the order in which they are stored in the fitted model object obj
.
Precondition: _loglikcache
updated by discrete_corelikelihood!
examples
julia> net = readTopology("(((A:2.0,(B:1.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0):0.5,D:2.0);");
julia> m1 = BinaryTraitSubstitutionModel([0.1, 0.1], ["lo", "hi"]); # arbitrary rates
julia> using DataFrames
julia> dat = DataFrame(species=["C","A","B","D"], trait=["hi","lo","lo","hi"]);
julia> fit = fitdiscrete(net, m1, dat); # optimized rates: α=0.27 and β=0.35
julia> pltw = PhyloNetworks.posterior_logtreeweight(fit);
julia> round.(exp.(pltw), digits=5) # posterior trees probabilities (sum up to 1)
2-element Vector{Float64}:
0.91983
0.08017
julia> round.(exp.(fit.priorltw), digits=4) # the prior tree probabilities are similar here (tiny data set!)
2-element Vector{Float64}:
0.9
0.1
PhyloNetworks.problem4cycle
— Methodproblem4cycle(β::Node, δ::Node, α::Node, γ::Node)
Check if the focus edge uv has a 4 cycle that could lead to a 3 cycle after an chosen NNI. Return true if there is a problem 4 cycle, false if none.
PhyloNetworks.proposedTop!
— MethodproposedTop!(move,newT,random,count,N,movescount,movesfail,multall)
road map
Function to change the current network newT
by a given move
, and checks that the move was successful (correct attributes). If not successful, newT
is changed back to its original state, except for the case of multiple alleles.
Note that the update of attributes by each move is not done in all the network, but only in the local edges that were changed by the move. This is efficient (and makes a move easy to undo), but makes the code of each move function very clunky.
Arguments:
- move chosen from
whichMove
as described inoptTopLevel
newT
is the topology that will be modified inside with the moverandom=true
: chooses minor hybrid edge with prob 1-h, and major edge with prob h, if false, always chooses minor hybrid edgecount
: simply which likelihood step we are in in the optimization atoptTopLevel
movescount
andmovesfail
: vector of counts of number of moves proposedmultall=true
if multiple alleles case: we need to check if the move did not violate the multiple alleles condition (sister alleles together and no gene flow into the alleles). This is inefficient because we are proposing moves that we can reject later, instead of being smart about the moves we propose: for example, move origin/target could rule out some neighbors that move gene flow into the alleles, the same for add hybridization; nni move can check if it is trying to separate the alleles)
Moves:
addHybridizationUpdate(newT,N)
:
will choose a partition first (to avoid choosing edges that will create a non level-1 network) will choose two edges from this partition randomly, will not allow two edges in a cherry (non-identifiable), or sister edges that are not identifiable (the blacklist was a way to keep track of "bad edges" were we should not waste time trying to put hybridizations, it has never been used nor tested). Also choose gamma from U(0,0.5). The "Update" in the function name means that it creates the new hybrid, and also updates all the attributes of newT
node = chooseHybrid(newT)
choose a hybrid randomly for the next moves:moveOriginUpdateRepeat!(newT,node,random)
will choose randomly the minor/major hybrid edge to move (if random=true
); will get the list of all neighbor edges where to move the origin, will move the origin and update all the attributes and check if the move was successful (not conflicting attributes); if not, will undo the move, and try with a different neighbor until it runs out of neighbors. Return true if the move was successful.
moveTargetUpdateRepeat!(newT,node,random)
same as move origin but moving the target
changeDirectionUpdate!(newT,node,random)
chooses minor/major hybrid edge at random (if `random=true), and changes the direction, and updates all the attributes. Checks if the move was successful (returns true), or undoes the change and returns false.
deleteHybridizationUpdate!(newT,node)
removes the hybrid node, updates the attributes, no need to check any attributes, always successful move
- NNIRepeat!(newT,N)
choose an edge for nni that does not have a neighbor hybrid. It will try to find such an edge N times, and if it fails, it will return false (unsuccessful move). N=10 by default. If N=1, it rarely finds such an edge if the network is small or complex. The function cannot choose an external edge. it will update locally the attributes.
** Important: ** All the moves undo what they did if the move was not successful, so at the end you either have a newT
with a new move and with all good attributes, or the same newT
that started. This is important to avoid having to do deepcopy of the network before doing the move. Also, after each move, when we update the attributes, we do not update the attributes of the whole network, we only update the attributes of the edges that were affected by the move. This saves time, but makes the code quite clunky. Only the case of multiple alleles the moves does not undo what it did, because it finds out that it failed after the function is over, so just need to treat this case special.
PhyloNetworks.quartetdata_columnnames
— Methodquartetdata_columnnames(T) where T <: StaticArray
Vector of column names to hold the quartet data of type T
in a data frame. If T is a length-3 vector type, they are "CF1234","CF1324","CF1423". If T is a length-4 vector type, the 4th name is "ngenes". If T is a 3×n matrix type, the output vector contains 3×n names, 3 for each of "CF", "V2", "V3", ... "Vn".
Used by writeTableCF
to build a data frame from a vector of QuartetT
objects.
PhyloNetworks.quartetrank
— Methodquartetrank(t1,t2,t3,t4, nCk::Matrix)
quartetrank([t1,t2,t3,t4], nCk)
Return the rank of a four-taxon set with taxon numbers t1,t2,t3,t4
, assuming that ti
s are positive integers such that t1<t2, t2<t3 and t3<t4 (assumptions not checked!). nCk
should be a matrix of "n choose k" binomial coefficients: see nchoose1234
.
examples
julia> nCk = PhyloNetworks.nchoose1234(5)
6×4 Matrix{Int64}:
0 0 0 0
1 0 0 0
2 1 0 0
3 3 1 0
4 6 4 1
5 10 10 5
julia> PhyloNetworks.quartetrank([1,2,3,4], nCk)
1
julia> PhyloNetworks.quartetrank([3,4,5,6], nCk)
15
PhyloNetworks.readCSVtoArray
— MethodreadCSVtoArray(dat::DataFrame)
readCSVtoArray(filename::String)
Read a CSV table containing both species names and data, create two separate arrays: one for the species names, a second for the data, in a format that parsimonyGF
needs.
Warning:
- it will try to find a column 'taxon' or 'species' for the taxon names. If none found, it will assume the taxon names are in column 1.
- will use all other columns as characters
PhyloNetworks.readFastaToArray
— FunctionreadFastaToArray(filename::AbstractString, sequencetype=BioSequences.LongDNA{4})
Read a fasta-formatted file. Return a tuple species, sequences
where species
is a vector of Strings with identifier names, and sequences
is a vector of BioSequences, each of type sequencetype
(DNA by default).
Warnings:
- assumes a semi-sequential format, not interleaved. More specifically, each taxon name should appear only once. For this one time, the corresponding sequence may be broken across several lines though.
- fails if all sequences aren't of the same length
PhyloNetworks.readInputData
— MethodreadInputData(trees, quartetfile, whichQuartets, numQuartets, writetable, tablename, writeQfile, writesummary)
readInputData(trees, whichQuartets, numQuartets, taxonlist, writetable, tablename, writeQfile, writesummary)
Read gene trees and calculate the observed quartet concordance factors (CF), that is, the proportion of genes (and the number of genes) that display each quartet for a given list of four-taxon sets.
Input:
trees
: name of a file containing a list of input gene trees, or vector of trees (HybridNetwork
objects)
Optional arguments (defaults):
quartetfile
: name of a file containing a list of quartets, or more precisely, a list of four-taxon setswhichQuartets
(:all
): which quartets to sample.:all
for all of them,:rand
for a random sample.numQuartets
: number of quartets in the sample. default: total number of quartets ifwhichQuartets=:all
and 10% of total ifwhichQuartets=:rand
taxonlist
(all in the input gene trees): Iftaxonlist
is used,whichQuartets
will consist of all sets of 4 taxa in thetaxonlist
.writetable
(true): write the table of observed CF?tablename
("tableCF.txt"): ifwritetable
is true, the table of observed CFs is write to filetablename
writeQfile
(false): write intermediate file with sampled quartets?writesummary
(true): write a summary file? if so, the summary will go in file "summaryTreesQuartets.txt".
Uses calculateObsCFAll!
, which implements a slow algorithm.
See also: countquartetsintrees
, which uses a much faster algorithm; readTrees2CF
, which is basically a re-naming of readInputData
, and readTableCF
to read a table of quartet CFs directly.
PhyloNetworks.readSubtree!
— MethodreadSubtree!(s::IO, parentNode, numLeft, net, hybrids)
Recursive helper method for readTopology
: read a subtree from an extended Newick topology. input s
: IOStream/IOBuffer.
Reads additional info formatted as: :length:bootstrap:gamma
. Allows for name of internal nodes without # after closing parenthesis: (1,2)A. Warning if hybrid edge without γ, or if γ (ignored) without hybrid edge
PhyloNetworks.readnexus_assigngammas!
— Methodreadnexus_assigngammas!(net, d::Dict)
Assign d[i] as the .gamma
value of the minor parent edge of hybrid "Hi", if this hybrid node name is found, and if its minor parent doesn't already have a non-missing γ. See readnexus_extractgamma
PhyloNetworks.readnexus_extractgamma
— Methodreadnexus_extractgamma(nexus_string)
Extract γ from comments and return a dictionary hybrid number ID => γ, from one single phylogeny given as a string. The output from BEAST2 uses this format for reticulations at minor edges, as output by bacter (Vaughan et al. 2017):
#11[&conv=0, relSize=0.08, ...
or as output by SpeciesNetwork (Zhang et al. 2018):
#H11[&gamma=0.08]
The function below assumes that the "H" was already added back if not present already (from bacter), like this:
#H11[&conv=0, relSize=0.19, ...
The bacter format is tried first. If this format doesn't give any match, then the SpeciesNetwork format is tried next. See readnexus_assigngammas!
.
PhyloNetworks.readnexus_translatetable
— Methodreadnexus_translatetable(io)
Read translate table from IO object io
, whose first non-empty line should contain "translate". Then each line should have "number name" and the end of the table is indicated by a ;. Output tuple:
- line that was last read, and is not part of the translate table, taken from
io
- translate: boolean, whether a table was successfully read
- id2name: dictionary mapping number to name.
PhyloNetworks.readnexuscomment
— Methodreadnexuscomment(s::IO, c::Char)
Read (and do nothing with) nexus-style comments: [& ... ]
Assumption: 'c' is the next character to be read from s. Output: nothing.
Comments can appear after (or instead of) a node or leaf name, before or after an edge length, and after another comment.
PhyloNetworks.readnodename
— Methodreadnodename(s::IO, c::Char, net, numLeft)
Auxiliary function to read a taxon name during newick parsing. output: tuple (number, name, pound_boolean)
Names may have numbers: numbers are treated as strings. Accepts #
as part of the name (but excludes it from the name), in which case pound_boolean
is true. #
is used in extended newick to flag hybrid nodes.
Nexus-style comments following the node name, if any, are read and ignored.
PhyloNetworks.recursionPostOrder
— MethodrecursionPostOrder(net::HybridNetwork, checkPreorder::Bool,
init_function, tip_function, node_function,
indexation="b", parameters...)
recursionPostOrder(nodes, init_function, tip_function, node_function,
parameters)
updatePostOrder!(index, nodes, updated_matrix, tip_function, node_function,
parameters)
Generic tool to apply a post-order (or topological ordering) algorithm, acting on a matrix where rows & columns correspond to nodes. Used by descendenceMatrix
.
PhyloNetworks.recursionPreOrder!
— FunctionrecursionPreOrder(nodes, init_function, root_function, tree_node_function,
hybrid_node_function, parameters)
recursionPreOrder!(nodes, AbstractArray, root_function, tree_node_function,
hybrid_node_function, parameters)
updatePreOrder(index, nodes, updated_matrix, root_function, tree_node_function,
hybrid_node_function, parameters)
Generic tool to apply a pre-order (or topological ordering) algorithm. Used by sharedPathMatrix
and by pairwiseTaxonDistanceMatrix
.
PhyloNetworks.recursionPreOrder
— MethodrecursionPreOrder(nodes, init_function, root_function, tree_node_function,
hybrid_node_function, parameters)
recursionPreOrder!(nodes, AbstractArray, root_function, tree_node_function,
hybrid_node_function, parameters)
updatePreOrder(index, nodes, updated_matrix, root_function, tree_node_function,
hybrid_node_function, parameters)
Generic tool to apply a pre-order (or topological ordering) algorithm. Used by sharedPathMatrix
and by pairwiseTaxonDistanceMatrix
.
PhyloNetworks.removeHybrid!
— MethodremoveHybrid!(net::Network, n::Node)
Delete a hybrid node n
from net.hybrid
, and update net.numHybrid
. The actual node n
is not deleted. It is kept in the full list net.node
.
PhyloNetworks.resetEdgeNumbers!
— FunctionresetEdgeNumbers!(net::HybridNetwork, verbose=true)
Check that edge numbers of net
are consecutive numbers from 1 to the total number of edges. If not, reset the edge numbers to be so.
PhyloNetworks.resetNodeNumbers!
— MethodresetNodeNumbers!(net::HybridNetwork; checkPreorder=true, type=:ape)
Change internal node numbers of net
to consecutive numbers from 1 to the total number of nodes.
keyword arguments:
type
: default is:ape
, to get numbers that satisfy the conditions assumed by theape
R package: leaves are 1 to n, the root is n+1, and internal nodes are higher consecutive integers. If:postorder
, nodes are numbered in post-order, with leaves from 1 to n (and the root last). If:internalonly
, leaves are unchanged. Only internal nodes are modified, to take consecutive numbers from (max leaf number)+1 and up. With this last option, the post-ordering of nodes is by-passed.checkPreorder
: if false, theisChild1
edge field and thenet.nodes_changed
network field are supposed to be correct (to get nodes in preorder). This is not needed whentype=:internalonly
.
Examples
julia> net = readTopology("(A,(B,(C,D)));");
julia> PhyloNetworks.resetNodeNumbers!(net)
julia> printNodes(net) # first column "node": root is 5
node leaf hybrid hasHybEdge name inCycle edges'numbers
1 true false false A -1 1
2 true false false B -1 2
3 true false false C -1 3
4 true false false D -1 4
7 false false false -1 3 4 5
6 false false false -1 2 5 6
5 false false false -1 1 6
julia> net = readTopology("(A,(B,(C,D)));");
julia> PhyloNetworks.resetNodeNumbers!(net; type=:postorder)
julia> printNodes(net) # first column "node": root is 7
node leaf hybrid hasHybEdge name inCycle edges'numbers
1 true false false A -1 1
2 true false false B -1 2
3 true false false C -1 3
4 true false false D -1 4
5 false false false -1 3 4 5
6 false false false -1 2 5 6
7 false false false -1 1 6
PhyloNetworks.rezip_canonical!
— Methodrezip_canonical!(hybridnodes::Vector{Node}, childedges::Vector{Edge},
originallengths::Vector{Float64})
Undo unzip_canonical!
.
PhyloNetworks.sameTaxa
— MethodsameTaxa(Quartet, HybridNetwork)
Return true
if all taxa in the quartet are represented in the network, false
if one or more taxa in the quartet does not appear in the network.
warning: the name can cause confusion. A more appropriate name might be "in", or "taxain", or "taxonsubset", or etc.
PhyloNetworks.sampleBootstrapTrees
— MethodsampleBootstrapTrees(vector of tree lists; seed=0::Integer, generesampling=false, row=0)
sampleBootstrapTrees!(tree list, vector of tree lists; seed=0::Integer, generesampling=false, row=0)
Sample bootstrap gene trees, 1 tree per gene. Set the seed with keyword argument seed
, which is 0 by default. When seed=0
, the actual seed is set using the clock. Assumes a vector of vectors of networks (see readBootstrapTrees
), each one of length 1 or more (error if one vector is empty, tested in bootsnaq
).
- site resampling: always, from sampling one bootstrap tree from each given list. This tree is sampled at random unless
row>0
(see below). - gene resampling: if
generesampling=true
(default is false), genes (i.e. lists) are sampled with replacement. row=i
: samples the ith bootstrap tree for each gene.row
is turned back to 0 if gene resampling is true.
output: one vector of trees. the modifying function (!) modifies the input tree list and returns it.
PhyloNetworks.sampleCFfromCI
— FunctionsampleCFfromCI(data frame, seed=0)
sampleCFfromCI!(data frame, seed=0)
Read a data frame containing CFs and their credibility intervals, and sample new obsCF uniformly within the CIs. These CFs are then rescaled to sum up to 1 for each 4-taxon sets. Return a data frame with taxon labels in first 4 columns, sampled obs CFs in columns 5-7 and credibility intervals in columns 8-13.
- The non-modifying function creates a new data frame (with re-ordered columns) and returns it. If
seed=-1
, the new df is a deep copy of the input df, with no call to the random number generator. Otherwise,seed
is passed to the modifying function. - The modifying function overwrites the input data frame with the sampled CFs and returns it. If
seed=0
, the random generator is seeded from the clock. Otherwise the random generator is seeded usingseed
.
Warning: the modifying version does not check the data frame: assumes correct columns.
optional argument: delim=','
by default: how columns are delimited.
PhyloNetworks.setBLGammaParsimony!
— MethodsetBLGammaParsimony!(net::HybridNetwork)
Maximum parsimony function does not provide estimates for branch lengths, or gamma. But since the maxParsimonyNet
function is using snaq move functions, branch lengths and gamma values are set randomly (to initialize optimization). We need to remove these random values before returning the maximum parsimony network.
PhyloNetworks.setBranchLength!
— MethodsetBranchLength!(Edge, newlength)
Set the length of an Edge object. The new length needs to be non-negative, or -1.0 to be interpreted as missing. edge.y
and edge.z
are updated accordingly.
PhyloNetworks.setGammaBLfromGammaz!
— MethodsetGammaBLfromGammaz!(node, network)
Update the γ values of the two sister hybrid edges in a bad diamond I, given the gammaz
values of their parent nodes, and update the branch lengths t1 and t2 of their parent edges (those across from the hybrid nodes), in such a way that t1=t2 and that these branch lengths and γ values are consistent with the gammaz
values in the network.
Similar to the first section of undoGammaz!
, but does not update anything else than γ and t's. Unlike undoGammaz!
, no error if non-hybrid node
or not at bad diamond I.
PhyloNetworks.setGammas!
— MethodsetGammas!(net, γ vector)
Set inheritance γ's of hybrid edges, using input vector for major edges. Assume pre-order calculated already, with up-to-date field nodes_changed
. See getGammas
.
Warning: very different from setGamma!
, which focuses on a single hybrid event, updates the field isMajor
according to the new γ, and is not used here.
Assumption: each hybrid node has only 2 parents, a major and a minor parent (according to the edges' field isMajor
).
PhyloNetworks.setNonIdBL!
— MethodsetNonIdBL!(net)
Set non-identifiable edge branch lengths to -1.0 (i.e. missing) for a level-1 network net
, except for edges in
- a good triangle: the edge below the hybrid is constrained to 0.
- a bad diamond II: the edge below the hybrid is constrained to 0
- a bad diamond I: the edges across from the hybrid node have non identifiable lengths but are kept, because the two γ*(1-exp(-t)) values are identifiable.
will break if inCycle
attributes are not initialized (at -1) or giving a correct node number.
see Node
for the meaning of boolean attributes isBadTriangle
(which corresponds to a "good" triangle above), isBadDiamondI
and isBadDiamondII
.
PhyloNetworks.setalpha!
— Methodsetalpha!(obj, alpha)
Set the shape parameter alpha
in a RateVariationAcrossSites model obj
, and update the rate multipliers accordingly. Return the modified object.
PhyloNetworks.seteigeninfo!
— Methodfor a [BinaryTraitSubstitutionModel
]: store eigenvalue (q01+q10) and stationary distribution
PhyloNetworks.seteigeninfo!
— Methodfor a [EqualRatesSubstitutionModel
]: store lambda = k/(k-1), where k is the number of states
PhyloNetworks.seteigeninfo!
— Methodfor HKY85
: store piR, piY, the 2 non-zero eigenvalues and a scaling factor
PhyloNetworks.seteigeninfo!
— Methodfor JC69
: store lambda = 4/3 (if relative) or rate * 4/3 (absolute).
PhyloNetworks.seteigeninfo!
— Methodseteigeninfo!(obj)
Calculate eigenvalue & eigenfector information for a substitution model (SM) object (as needed to calculate transition rate matrices) and store this info within the object.
PhyloNetworks.setlengths!
— Methodsetlengths!(edges::Vector{Edge}, lengths::Vector{Float64})
Assign new lengths to a vector of edges
.
PhyloNetworks.setparameters!
— Methodsetparameters!(obj::RateVariationAcrossSites, par::AbstractVector)
Set the values of the alpha and/or pinv parameters of model obj
. See also setalpha!
, setpinv!
and setpinvalpha!
PhyloNetworks.setpinv!
— Methodsetpinv!(obj, pinv)
Set the proportion of invariable sites pinv
in a RateVariationAcrossSites model obj
, and update the rate multipliers & weights accordingly. For RVASInvGamma
objects, the original rate multipliers are assumed correct, according to the original pinv
value. Return the modified object.
PhyloNetworks.setpinvalpha!
— Methodsetpinvalpha!(obj, pinv, alpha)
Set the proportion of invariable sites pinv
and the alpha
parameter for the discretized gamma distribution in a model obj
of type RVASGammaInv{S}
. Update the rate multipliers & weights accordingly. The mean of the distribution is constrained to 1.
Return the modified object.
PhyloNetworks.setrates!
— Methodsetrates!(model, rates)
update rates then call seteigeninfo!
to update a model's eigeninfo
PhyloNetworks.showQ
— MethodshowQ(IO, model)
Print the Q matrix to the screen, with trait states as labels on rows and columns. adapted from prettyprint function by mcreel, found 2017/10 at https://discourse.julialang.org/t/display-of-arrays-with-row-and-column-names/1961/6
PhyloNetworks.showdata
— Functionshowdata(io::IO, obj::SSM, fullsiteinfo=false::Bool)
Return information about the data in an SSM object: number of species, number or traits or sites, number of distinct patterns, and more information if fullsiteinfo
is true: number sites with missing data only, number of invariant sites, number of sites with 2 distinct states, number of parsimony-informative sites (with 2+ states being observed in 2+ tips), number of sites with some missing data, and overall proportion of entries with missing data.
Note: Missing is not considered an additional state. For example, if a site contains some missing data, but all non-missing values take the same state, then this site is counted in the category "invariant".
PhyloNetworks.shrink2cycleat!
— Methodshrink2cycleat!(net::HybridNetwork, minor::Edge, major::Edge, unroot::Bool)
Remove minor
edge then update the branch length of the remaining major
edge. Called by shrink2cycles!
Assumption: minor
and major
do form a 2-cycle. That is, they start and end at the same node.
PhyloNetworks.shrink3cycleat!
— Methodshrink3cycleat!(net::HybridNetwork, hybrid::Node, edge1::Edge, edge2::Edge,
node1::Node, node2::Node, unroot::Bool)
Replace a 3-cycle at a given hybrid
node by a single node, if any. Assumption: edge1
(node1
) and edge2
(node2
) are the parent edges (nodes) of hybrid
. Return true if a 3-cycle is found and removed, false otherwise. There is a 3-cycle if nodes 1 & 2 are connected, by an edge called e3
below.
There are two cases, with differing effects on the γ inheritance values and branch lengths.
Hybrid case: the 3-cycle is shrunk to a hybrid node, which occurs if either node 1 or 2 is a hybrid node (that is, e3 is hybrid). If e3 goes from node 1 to node 2, the 3-cycle (left) is shrunk as on the right:
\eA /eB \eA /eB
1--e3->2 γ1+γ2γ3 \ / γ2(1-γ3)
\ / hybrid
γ1\ /γ2
hybrid
with new branch lengths: new tA = tA + (γ1.t1 + γ2γ3.(t2+t3))/(γ1+γ2γ3), new tB = tB + t2, provided that γ1, γ2=1-γ1, and γ3 are not missing. If one of them is missing then γ1 and γ2 remain as is, and e3 is deleted naively, such that new tA = tA + t1 and new tB = tB + t2. If γ's are not missing but one of t1,t2,t3 is missing, then the γ's are updated to γ1+γ2γ3 and γ2(1-γ3), but t's are update naively.
Tree case: the 3-cycle is shrunk to a tree node, which occurs if node 1 & 2 are both tree nodes (that is, e3 is a tree edge). If eC is the child edge of hybrid
, the 3-cycle (left) is shrunk as on the right:
\eA \eA
1--e3--2--eB-- \
\ / n--eB--
γ1\ /γ2 |
hybrid |eC
|
|eC
with new branch lengths: new tA = tA + γ2.t3, new tB = tB + γ1.t3, new tC = tC + γ1.t1 + γ2.t2, provided that γ1, γ2=1-γ1, t1, t2 and t3 are not missing. If one is missing, then e1 is deleted naively such that tB is unchanged, new tC = tC + t2 and new tA = tA + t3.
PhyloNetworks.shrinkedge!
— Methodshrinkedge!(net::HybridNetwork, edge::Edge)
Delete edge
from net, provided that it is a non-external tree edge. Specifically: delete its child node (as determined by isChild1
) and connect all edges formerly incident to this child node to the parent node of edge
, thus creating a new polytomy, unless the child was of degree 2.
Warning: it's best for isChild1
to be in sync with the root for this. If not, the shrinking may fail (if edge
is a tree edge but its "child" is a hybrid) or the root may change arbitrarily (if the child of edge
is the root).
Output: true if the remaining node (parent of edge
) becomes a hybrid node with more than 1 child after the shrinking; false otherwise (e.g. no polytomy was created, or the new polytomy is below a tree node)
PhyloNetworks.sort_stringasinteger!
— Methodsort_stringasinteger!(taxa)
Sort a vector of strings taxa
, numerically if elements can be parsed as an integer, alphabetically otherwise.
PhyloNetworks.startingBL!
— FunctionstartingBL!(net::HybridNetwork,
trait::AbstractVector{Vector{Union{Missings.Missing,Int}}},
siteweight=ones(length(trait[1]))::AbstractVector{Float64})
Calibrate branch lengths in net
by minimizing the mean squared error between the JC-adjusted pairwise distance between taxa, and network-predicted pairwise distances, using calibrateFromPairwiseDistances!
. siteweight[k]
gives the weight of site (or site pattern) k
(default: all 1s). Note: the network is not "unzipped". PhyLiNC unzips reticulations later.
Assumptions:
- all species have the same number of traits (sites):
length(trait[i])
constant trait[i]
is for leaf withnode.number = i
innet
, andtrait[i][j] = k
means that leaf numberi
has state indexk
for traitj
. These indices are those used in a substitution model: kth value ofgetlabels(model)
.- Hamming distances are < 0.75 with four states, or < (n-1)/n for n states. If not, all pairwise hamming distances are scaled by
.75/(m*1.01)
wherem
is the maximum observed hamming distance, to make them all < 0.75.
PhyloNetworks.startingrate
— Methodstartingrate(net)
Estimate an evolutionary rate appropriate for the branch lengths in the network, which should be a good starting value before optimization in fitdiscrete
, assuming approximately 1 change across the entire tree. If all edge lengths are missing, set starting rate to 1/(number of taxa).
PhyloNetworks.startree_newick
— Functionstartree_newick(n, l)
String for the Newick parenthetical description of the star tree with n
tips, and all branch lengths equal to l
.
PhyloNetworks.symmetricnet_newick
— Functionsymmetricnet_newick(n::Int, h::Int, γ::Real)
Create a string for a symmetric network with 2^n tips, numbered from 1 to 2^n, with a symmetric major tree, whose branch lengths are all equal. 2^(n-h) hybrids are added from level h to h-1 "symmetrically". The network is time-consistent and ultrametric, with a total height of 1.
PhyloNetworks.symmetricnet_newick
— Methodsymmetricnet_newick(n::Int, i::Int, j::Int, γ::Real, l::Real)
Newick string for a network with a symmetric major tree with 2^n tips, numbered from 1 to 2^n. All the branch lengths of the major tree are set to l
. One hybrid branch, going from level i to level j is added, cutting in half each initial edge in the tree. The new edge has length l
and inheritance γ
.
PhyloNetworks.symmetrictree_newick
— Functionsymmetrictree_newick(n::Int, ell::Real, i=1)
String for the Newick parenthetical description of a symmetric tree with 2^n tips, numbered from i to i+2^n-1. All branch lengths are set equal to ell
. The tree can be created later by reading the string with readTopology
.
PhyloNetworks.synchronizePartnersData!
— MethodsynchronizePartnersData!(e::Edge, n::Node)
Synchronize γ and isMajor for edges e
and its partner, both hybrid edges with the same child n
:
- if one γ is missing and the other is not: set the missing γ to 1 - the other
- γ's should sum up to 1.0
- update
isMajor
to match the γ information: the major edge is the one with γ > 0.5.
Warnings: does not check that e
is a hybrid edge, nor that n
is the child of e
.
PhyloNetworks.taxadiff
— Methodtaxadiff(Vector{Quartet}, network; multiplealleles=true)
taxadiff(DataCF, network; multiplealleles=true)
Return 2 vectors:
- taxa in at least 1 of the quartets but not in the network, and
- taxa in the network but in none of the quartets.
When multiplealleles
is true, the taxon names that end with "__2" are ignored in the quartets: they are not expected to appear in the networks that users give as input, or get as output.
PhyloNetworks.traitlabels2indices
— Methodtraitlabels2indices(data, model::SubstitutionModel)
Check that the character states in data
are compatible with (i.e. subset of) the trait labels in model
. All columns are used. data
can be a DataFrame or a Matrix (multiple traits), or a Vector (one trait). Return a vector of vectors (one per species) with integer entries, where each state (label) is replaced by its index in model
. For DNA data, any ambiguous site is treated as missing.
PhyloNetworks.traverseContainRoot!
— MethodupdateContainRoot!(HybridNetwork, Node)
traverseContainRoot!(Node, Edge, edges_changed::Array{Edge,1}, rightDir::Vector{Bool})
The input node
to updateContainRoot!
must be a hybrid node (can come from searchHybridNode). updateContainRoot!
starts at the input node and calls traverseContainRoot!
, which traverses the network recursively. By default, containRoot attributes of edges are true. Changes containRoot
to false for all the visited edges: those below the input node, but not beyond any other hybrid node.
updateContainRoot!
Returns a flag
and an array of edges whose containRoot has been changed from true to false. flag
is false if the set of edges to place the root is empty
In traverseContainRoot!
, rightDir
turns false if hybridizations have incompatible directions (vector of length 1, to be modified).
Warning:
- does not update
containRoot
of minor hybrid edges. - assumes correct
isMajor
attributes: to stop the recursion at minor hybrid edges. - assumes correct hybrid attributes of both nodes & edges: to check if various hybridizations have compatible directions. For each hybrid node that is encountered, checks if it was reached via a hybrid edge (ok) or tree edge (not ok).
rightDir
: vector of length 1 boolean, to be mutable and modified by the function
PhyloNetworks.undoGammaz!
— MethodundoGammaz!(node, network)
Undo updateGammaz!
for the 2 cases: bad diamond I,II. node
should be a hybrid node. Set length to edges that were not identifiable and change edges' gammaz
attribute to -1.0. Recalculate branch lengths in terms of gammaz
. warning: needs to know incycle
attributes
PhyloNetworks.unzip_canonical!
— Methodunzip_canonical!(net::HybridNetwork)
Unzip all reticulations: set the length of child edge to 0, and increase the length of both parent edges by the original child edge's length, to obtain the canonical version of the network according to Pardi & Scornavacca (2015).
Output: vector of hybrid node in postorder, vector of child edges whose length is constrained to be 0, and vector of their original branch lengths to re-zip if needed using rezip_canonical!
.
Assumption: net.hybrid
is correct, but a preordering of all nodes is not assumed.
Note: This unzipping is not as straightforward as it might seem, because of "nested" zippers: when the child of a hybrid node is itself a hybrid node. The unzipping is propagated all the way through.
PhyloNetworks.unzipat_canonical!
— Methodunzipat_canonical!(hyb::Node, childedge::Edge)
Unzip the reticulation a node hyb
. See unzip_canonical!
. Warning: no check that hyb
has a single child.
Output: constrained edge (child of hyb
) and its original length.
PhyloNetworks.updateBL!
— MethodupdateBL!(net::HybridNetwork, d::DataCF)
Update internal branch lengths of net
based on the average quartet concordance factor (CF) across all quartets that exactly correspond to a given branch: new branch length = -log(3/2(1-mean(CF observed in d)))
. net
is assumed to be a tree, such that the above equation holds.
PhyloNetworks.updateContainRoot!
— FunctionupdateContainRoot!(HybridNetwork, Node)
traverseContainRoot!(Node, Edge, edges_changed::Array{Edge,1}, rightDir::Vector{Bool})
The input node
to updateContainRoot!
must be a hybrid node (can come from searchHybridNode). updateContainRoot!
starts at the input node and calls traverseContainRoot!
, which traverses the network recursively. By default, containRoot attributes of edges are true. Changes containRoot
to false for all the visited edges: those below the input node, but not beyond any other hybrid node.
updateContainRoot!
Returns a flag
and an array of edges whose containRoot has been changed from true to false. flag
is false if the set of edges to place the root is empty
In traverseContainRoot!
, rightDir
turns false if hybridizations have incompatible directions (vector of length 1, to be modified).
Warning:
- does not update
containRoot
of minor hybrid edges. - assumes correct
isMajor
attributes: to stop the recursion at minor hybrid edges. - assumes correct hybrid attributes of both nodes & edges: to check if various hybridizations have compatible directions. For each hybrid node that is encountered, checks if it was reached via a hybrid edge (ok) or tree edge (not ok).
rightDir
: vector of length 1 boolean, to be mutable and modified by the function
PhyloNetworks.updatePostOrder!
— FunctionrecursionPostOrder(net::HybridNetwork, checkPreorder::Bool,
init_function, tip_function, node_function,
indexation="b", parameters...)
recursionPostOrder(nodes, init_function, tip_function, node_function,
parameters)
updatePostOrder!(index, nodes, updated_matrix, tip_function, node_function,
parameters)
Generic tool to apply a post-order (or topological ordering) algorithm, acting on a matrix where rows & columns correspond to nodes. Used by descendenceMatrix
.
PhyloNetworks.updatePreOrder!
— FunctionrecursionPreOrder(nodes, init_function, root_function, tree_node_function,
hybrid_node_function, parameters)
recursionPreOrder!(nodes, AbstractArray, root_function, tree_node_function,
hybrid_node_function, parameters)
updatePreOrder(index, nodes, updated_matrix, root_function, tree_node_function,
hybrid_node_function, parameters)
Generic tool to apply a pre-order (or topological ordering) algorithm. Used by sharedPathMatrix
and by pairwiseTaxonDistanceMatrix
.
PhyloNetworks.updateSSM!
— FunctionupdateSSM!(obj::SSM, renumber=false::Bool;
constraints=TopologyConstraint[]::Vector{TopologyConstraint})
After adding or removing a hybrid, displayed trees will change. Updates the displayed tree list. Return SSM object.
if renumber
, reorder edge and internal node numbers. Only need to renumber after deleting a hybrid (which could remove edges and nodes from the middle of the edge and node lists).
Assumptions:
- The SSM object has cache arrays of size large enough, that is, the constructor
StatisticalSubstitutionModel
was previously called with maxhybrid equal or greater than inobj.net
.obj.priorltw
is not part of the "cache" arrays.
Warning: Does not update the likelihood.
julia> net = readTopology("(((A:2.0,(B:1.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0):0.5,D:2.0);");
julia> fastafile = abspath(joinpath(dirname(Base.find_package("PhyloNetworks")), "..", "examples", "simple.aln"));
julia> obj = PhyloNetworks.StatisticalSubstitutionModel(net, fastafile, :JC69; maxhybrid=3);
julia> PhyloNetworks.checknetwork_LiNC!(obj.net, 3, true, true);
julia> using Random; Random.seed!(432);
julia> PhyloNetworks.addhybridedge!(obj.net, obj.net.edge[8], obj.net.edge[1], true, 0.0, 0.4);
julia> writeTopology(obj.net)
"(((B:1.0)#H1:0.1::0.9,(A:1.0)#H2:1.0::0.6):1.5,(C:0.6,#H1:1.0::0.1):1.0,(D:1.25,#H2:0.0::0.4):1.25);"
julia> length(obj.displayedtree) # still as if 1 single reticulation
2
julia> PhyloNetworks.updateSSM!(obj);
julia> length(obj.displayedtree) # now correct 4 displayed treess: 2 reticulations
4
PhyloNetworks.updateSSM_root!
— MethodupdateSSM_root!(obj::SSM)
Update root and direction of edges in displayed trees to match the root in the network.
PhyloNetworks.update_logtrans
— Methodupdate_logtrans(obj::SSM, edge::Edge)
Update the log-transition probabilities associates to one particular edge
in the network.
PhyloNetworks.update_logtrans
— Methodupdate_logtrans(obj::SSM)
Initialize and update obj.logtrans
, the log transition probabilities along each edge in the full network. They are re-used for each displayed tree, which is why edges are not fused around degree-2 nodes when extracting displayed trees.
PhyloNetworks.updatecache_edge!
— Methodupdatecache_edge!(lcache::CacheLengthLiNC, obj::SSM, focusedge)
Update fields in lcache
, to correspond to the forward likelihood at the child node of the focus edge, backwards (at parent node) x directional (at sister edges) likelihoods, and keep track of which displayed trees do have the focus edge. These don't change if the length of the focus edge is modified. These quantities are then rescaled on the log scale (to get a max of 0) and exponentiated to get back to the probability scale.
Assumptions: the following fields of obj
are up-to-date:
- displayed trees, with nodes preordered
- prior log tree weights in
.priorltw
- log transition probabilities in
.logtrans
Output:
- missing, if the edge length does not affect the likelihood,
- constant used to rescale each site on the log scale, otherwise.
PhyloNetworks.updateconstraintfields!
— Methodupdateconstraintfields!(constraints::Vector{TopologyConstraint}, net::HybridNetwork)
Update fields stem edge
and crown node
to match the given net
.
Assumes that the constraints are still met in net
, and that nodes & edges are numbered identically in net
as in the network used to create all constraints
.
fixit: remove the assumption that constraints are still met, since an NNI near the crown of a constrained clade might change the crown node and / or the stem edge (u and v exchange).
PhyloNetworks.updateconstraints!
— Methodupdateconstraints!(constraints::Vector{TopologyConstraint}, net::HybridNetwork)
Update the set taxonnum
in each constraint, assuming that the stem edge and the crown node are still correct, and that their descendants are still correct. May be needed if the node and edge numbers were modified by resetNodeNumbers!
or resetEdgeNumbers!
.
Warning: does not check that the names of leaves with numbers in taxonnum
are taxonnames
.
julia> net = readTopology("(((2a,2b),(((((1a,1b,1c),4),(5)#H1),(#H1,(6,7))))#H2),(#H2,10));");
julia> c_species1 = PhyloNetworks.TopologyConstraint(0x01, ["1a","1b","1c"], net)
Species constraint, on tips: 1a, 1b, 1c
stem edge number 7
crown node number -9
julia> c_species1.taxonnums
Set{Int64} with 3 elements:
5
4
3
julia> c_clade145 = PhyloNetworks.TopologyConstraint(0x02, ["1a","1b","1c","4","5"], net)
Clade constraint, on tips: 1a, 1b, 1c, 4, 5
stem edge number 12
crown node number -7
julia> PhyloNetworks.resetNodeNumbers!(net)
julia> net.node[4].number = 111;
julia> PhyloNetworks.updateconstraints!([c_species1, c_clade145], net)
julia> c_species1
Species constraint, on tips: 1a, 1b, 1c
stem edge number 7
crown node number 21
julia> c_species1.taxonnums
Set{Int64} with 3 elements:
5
4
111
PhyloNetworks.writeTopologyLevel1
— MethodwriteTopologyLevel1(net::HybridNetwork)
Write the extended Newick parenthetical format of a level-1 network object with many optional arguments (see below). Makes a deep copy of net: does not modify net
.
- di=true: write in format for Dendroscope (default false)
- namelabel=true: If
namelabel
is true, taxa are labelled by their names;
otherwise taxa are labelled by their numbers (unique identifiers).
- outgroup (string): name of outgroup to root the tree/network. if "none" is given, the root is placed wherever possible.
- printID=true, only print branch lengths for identifiable egdes according to the snaq estimation procedure (default false) (true inside of
snaq!
.) - round: rounds branch lengths and heritabilities γ (default: true)
- digits: digits after the decimal place for rounding (defult: 3)
- string: if true (default), returns a string, otherwise returns an IOBuffer object.
- multall: (default false). set to true when there are multiple alleles per population.
The topology may be written using a root different than net.root, if net.root is incompatible with one of more hybrid node. Missing hybrid names are written as "#Hi" where "i" is the hybrid node number if possible.
StatsAPI.coeftable
— Methodcoeftable(m::PhyloNetworkLinearModel; level::Real=0.95)
Return coefficient estimates, standard errors, t-values, p-values, and t-intervals as a StatsBase.CoefTable
.
StatsAPI.confint
— Methodconfint(m::PhyloNetworkLinearModel; level::Real=0.95)
Return confidence intervals for coefficients, with confidence level level
, based on the t-distribution whose degree of freedom is determined by the number of species (as returned by dof_residual
)
StatsAPI.deviance
— FunctionStatsBase.deviance(m::PhyloNetworkLinearModel)
-2 loglikelihood of the fitted model. See also loglikelihood
.
Note: this is not the residual-sum-of-squares deviance as output by GLM, such as one would get with deviance(m.model)
.
StatsAPI.deviance
— MethodStatsBase.deviance(m::PhyloNetworkLinearModel, Val(true))
Residual sum of squares with metric V, the estimated phylogenetic covariance, if the model is appropriate.
StatsAPI.fit
— Methodfit(StatisticalSubstitutionModel, net, model, traits; kwargs...)
fit!(StatisticalSubstitutionModel; kwargs...)
Internal function called by fitdiscrete
: with same key word arguments kwargs
. But dangerous: traits
should be a vector of vectors as for fitdiscrete
but here traits
need to contain the indices of trait values corresponding to the indices in getlabels(model)
, and species should appear in traits
in the order corresponding to the node numbers in net
. See traitlabels2indices
to convert trait labels to trait indices.
Warning: does not perform checks. fitdiscrete
calls this function after doing checks, preordering nodes in the network, making sure nodes have consecutive numbers, species are matched between data and network etc.
StatsAPI.loglikelihood
— Methodloglikelihood(m::PhyloNetworkLinearModel)
Log likelihood, or log restricted likelihood (REML), depending on m.reml
.
For models with no within-species variation, the likelihood (or REML) is calculated based on the joint density for species-level mean responses.
For within-species variation models, the likelihood is calculated based on the joint density for individual-level responses. This can be calculated from individual-level data, but also by providing species-level means and standard deviations which is accepted by phylolm
.
Warning: many summaries are based on the species-level model, like "dof_residual", "residuals", "predict" or "deviance". So deviance
is innapropriate to compare models with within-species variation. Use loglikelihood
to compare models based on data at the individual level.
Reminder: do not compare ML or REML values across models fit on different data. Do not compare REML values across models that do not have the same predictors (fixed effects): use ML instead, for that purpose.
StatsAPI.nobs
— MethodStatsBase.nobs(m::PhyloNetworkLinearModel)
Number of observations: number of species with data, if the model assumes known species means, and number of individuals with data, if the model accounts for within-species variation.
StatsAPI.nulldeviance
— MethodStatsBase.nulldeviance(m::PhyloNetworkLinearModel)
For appropriate phylogenetic linear models, the deviance of the null model is the total sum of square with respect to the metric V, the estimated phylogenetic covariance matrix.
StatsAPI.stderror
— Methodstderror(m::PhyloNetworkLinearModel)
Return the standard errors of the coefficient estimates. See vcov
for related information on how these are computed.
StatsAPI.vcov
— Methodvcov(m::PhyloNetworkLinearModel)
Return the variance-covariance matrix of the coefficient estimates.
For the continuous trait evolutionary models currently implemented, species-level mean response (conditional on the predictors), Y|X is modeled as:
- Y|X ∼ 𝒩(Xβ, σ²ₛV) for models assuming known species mean values (no within-species variation)
- Y|X ∼ 𝒩(Xβ, σ²ₛV + σ²ₑD⁻¹) for models with information from multiple individuals and assuming within-species variation
The matrix V is inferred from the phylogeny, but may also depend on additional parameters to be estimated (e.g. lambda
for Pagel's Lambda model). See ContinuousTraitEM
, PhyloNetworkLinearModel
for more details.
If (1), then return σ²ₛ(X'V⁻¹X)⁻¹, where σ²ₛ is estimated with REML, even if the model was fitted with reml=false
. This follows the conventions of nlme::gls
and stats::glm
in R.
If (2), then return σ²ₛ(X'W⁻¹X)⁻¹, where W = V+(σ²ₑ/σ²ₛ)D⁻¹ is estimated, and σ²ₛ & σₑ are the estimates obtained with ML or REML, depending on the reml
option used to fit the model m
. This follows the convention of MixedModels.fit
in Julia.
StatsModels.isnested
— Methodisnested(m1::PhyloNetworkLinearModel, m2::PhyloNetworkLinearModel)
isnested(m1::ContinuousTraitEM, m2::ContinuousTraitEM)
True if m1
is nested in m2
, false otherwise. Models fitted with different criteria (ML and REML) are not nested. Models with different predictors (fixed effects) must be fitted with ML to be considered nested.
PhyloNetworks.fAbs
— Constantdefault values for tolerance parameters, used in the optimization of branch lengths (fAbs, fRel, xAbs, xRel) and in the acceptance of topologies (likAbs, numFails).
if changes are made here, make the same in the docstring for snaq! below
version | fAbs | fRel | xAbs | xRel | numFails | likAbs | multiplier |
---|---|---|---|---|---|---|---|
v0.5.1 | 1e-6 | 1e-6 | 1e-3 | 1e-2 | 75 | 1e-6 | |
v0.3.0 | 1e-6 | 1e-5 | 1e-4 | 1e-3 | 100 | 0.01 | |
v0.0.1 | 1e-6 | 1e-5 | 1e-4 | 1e-3 | 100 | 10000 | |
older | 1e-10 | 1e-12 | 1e-10 | 1e-10 |
v0.5.1: based on Nan Ji's work. same xAbs and xRel as in phylonet (as of 2015). earlier: multiplier was used; later: likAbs = multiplier*fAbs) "older": values from GLM.jl, Prof Bates
default values used on a single topology, to optimize branch lengths and gammas, at the very end of snaq!, and by topologyMaxQPseudolik! since v0.5.1.
version | fAbsBL | fRelBL | xAbsBL | xRelBL |
---|---|---|---|---|
v0.0.1 | 1e-10 | 1e-12 | 1e-10 | 1e-10 |