internal documentation
Documentation for PhyloNetworks
's internal functions. These functions are not exported and their access (API) should not be considered stable. But they can still be used, like this for example: PhyloNetworks.foo()
for a function named foo()
.
functions & types
PhyloNetworks.ANode
— TypeANode
Abstract node. An object of type EdgeT
has a node
attribute, which is an vector of 2 objects of some subtype of ANode
. The concrete type Node
is a subtype of ANode
, and has an edge
attribute, which is vector of Edge
objects (where Edge is an alias for EdgeT{Node}).
PhyloNetworks.EdgeT
— TypeEdgeT{node type}
Edge = EdgeT{Node}
Edge(number, length=1.0)
Data structure for an edge and its various attributes. Most notably:
number
(integer): serves as unique identifier; remains unchanged when the network is modified, with a nearest neighbor interchange for examplenode
: vector ofNode
s, normally just 2 of themischild1
(boolean):true
ifnode[1]
is the child node of the edge, false ifnode[1]
is the parent node of the edgelength
: branch lengthhybrid
(boolean): whether the edge is a tree edge or a hybrid edge (in which caseischild1
is important, even if the network is semi-directed)gamma
: proportion of genetic material inherited by the child node via the edge; 1.0 for a tree edgeismajor
(boolean): whether the edge is the major path to the child node;true
for tree edges, since a tree edge is the only path to its child node; normally true ifgamma>0.5
.containroot
(boolean): is the interior of this edge a valid rooting position? That is, if we added a node on the edge, could we direct all edges away from this new node to root the semidirected network and get a valid rooted network?
and other fields, used very internally
PhyloNetworks.MatrixTopologicalOrder
— TypeMatrixTopologicalOrder
Matrix associated to a HybridNetwork
in which rows and/or columns correspond to nodes in the network. In this matrix, nodes are indexed by their topological order. For example, if rows list nodes in network net
, then V[i,:]
corresponds to node net.vec_node[i]
.
The following functions and extractors can be applied to it: tiplabels
, obj[:tips]
, obj[:internalnodes]
, obj[:tipsnodes]
. See documentation for getindex(::MatrixTopologicalOrder, ::Symbol)
).
A MatrixTopologicalOrder
object has field V
(the matrix of interest) and fields for mapping indices in V
to node numbers nodenumbers_toporder
, internalnodenumbers
, tipnumbers
, tipnames
, indexation
. Type in "?MatrixTopologicalOrder.field" to get documentation on a specific field.
PhyloNetworks.Node
— TypeNode(number, leaf)
Node(number, leaf, hybrid)
Data structure for a node and its various attributes. Most notably:
number
(integer): serves as unique identifier; remains unchanged when the network is modified, with a nearest neighbor interchange for exampleleaf
(boolean): whether the node is a leaf (with data typically) or an internal node (no data typically)name
(string): taxon name for leaves; internal node may or may not have a nameedge
: vector ofEdge
s that the node is attached to; 1 if the node is a leaf, 2 if the node is the root, 3 otherwise, and potentially more if the node has a polytomyhybrid
(boolean): whether the node is a hybrid node (with 2 or more parents) or a tree node (with a single parent)
PhyloNetworks.Partition
— TypePartition
Data structure for a collection of edges and (node) numbers. Fields:
cycle
: vector ofInt
edges
: vector ofEdge
process_biconnectedcomponents!
uses edges
to store the edges in a given biconnected component. (blocks = biconnected components partition edges, while blobs = 2-edge connected components partition nodes. They are in 1-to-1 correspondence in binary networks.) cycle
is used to store the indices, in net.vec_node
, of the block's articulation nodes (or root), with the first one being the entry node (or root) and other being exit articulation nodes. Note that a leaf is not an articulation node: it does not connect the pendent edge to another block.
For level-1 networks, SNaQ uses 1 partition for each maximum tree of cut-edges. edges
stores the cut edges in this tree. cycle
stores an integer for each cycle that the tree is adjacent to. This integer is the number of the unique hybrid node in this cycle.
See also entrynode_preindex
, number_exitnodes
PhyloNetworks.QuartetT
— TypeQuartetT{T}
Generic type for 4-taxon sets. Fields:
number
: rank of the 4-taxon settaxonnumber
: static vector of 4 integers, assumed to be distinct and sorteddata
: object of typeT
For easier look-up, a unique mapping is used between the rank (number
) of a 4-taxon set and its 4 taxa (see quartetrank
and nchoose1234
):
rank-1 = (t1-1) choose 1 + (t2-1) choose 2 + (t3-1) choose 3 + (t4-1) choose 4
examples
julia> nCk = PhyloNetworks.nchoose1234(5)
6×4 Matrix{Int64}:
0 0 0 0
1 0 0 0
2 1 0 0
3 3 1 0
4 6 4 1
5 10 10 5
julia> PhyloNetworks.QuartetT(1,3,4,6, [.92,.04,.04, 100], nCk)
4-taxon set number 8; taxon numbers: 1,3,4,6
data: [0.92, 0.04, 0.04, 100.0]
PhyloNetworks.TopologyConstraint
— TypeType for various topological constraints, such as:
- a set of taxa forming a clade in the major tree
- a set of individuals belonging to the same species
- a set of taxa forming a clade in any one of the displayed trees
PhyloNetworks.TopologyConstraint
— MethodTopologyConstraint(type::UInt8, taxonnames::Vector{String}, net::HybridNetwork)
Create a topology constraint from user-given type, taxon names, and network. There are 3 types of constraints:
- type 1: A set of tips that forms a species.
- type 2: A set of tips that forms a clade in the major tree. Note that the root matters. Constraining a set of species to be an outgroup is equivalent to constraining the ingroup to form a clade.
- type 3: A set of tips that forms a clade in any one of the displayed trees.
Note: currently, with type-1 constraints, hybridizations are prevented from coming into or going out of a species group.
examples
julia> net = readnewick("(((2a,2b),(((((1a,1b,1c),4),(5)#H1),(#H1,(6,7))))#H2),(#H2,10));");
julia> c_species1 = PhyloNetworks.TopologyConstraint(0x01, ["1a","1b","1c"], net)
Species constraint, on tips: 1a, 1b, 1c
stem edge number 7
crown node number -9
julia> c_species2 = PhyloNetworks.TopologyConstraint(0x01, ["2a","2b"], net)
Species constraint, on tips: 2a, 2b
stem edge number 3
crown node number -4
julia> nni!(net , net.edge[3], true, true, [c_species2]) === nothing # we get nothing: would break species 2
true
julia> # the following gives an error, because 4,5 do not form a clade in "net"
# PhyloNetworks.TopologyConstraint(0x02, ["4","5"], net)
julia> c_clade145 = PhyloNetworks.TopologyConstraint(0x02, ["1a","1b","1c","4","5"], net)
Clade constraint, on tips: 1a, 1b, 1c, 4, 5
stem edge number 12
crown node number -7
julia> nni!(net , net.edge[12], true, true, [c_species1, c_clade145]) # we get nothing (failed NNI): would break the clade
PhyloNetworks.exitnodes_preindex
— Typeexitnodes_preindex(biconnected_component::Partition)
Iterator over the preorder index of articulation exit nodes of a biconnected component, considering the network rooted (see number_exitnodes
). Indices are indices in the network's preordering .vec_node
.
Base.getindex
— Functiongetindex(obj::MatrixTopologicalOrder,
d::Symbol,[ indtips, nonmissing])
Get submatrices of a MatrixTopologicalOrder
object. In obj.V
, row and/column i
corresponds to the i
th node in topological order. In contrast, in matrix obj[:tips]
for example, row and/or column i
corresponds to the i
th tip when tips are listed in the same order as in the network's original .node
vector.
Arguments:
obj::MatrixTopologicalOrder
: the matrix from which to extract.d
: symbol specifying which sub-matrix to extract. Can be::tips
columns and/or rows corresponding to the tips:internalnodes
columns and/or rows corresponding to the internal nodes Includes tips not listed inindtips
or missing data according tononmissing
.:tipsnodes
columns corresponding to internal nodes, and row to tips (works only is indexation="b"):all
nodes, listed in topological order
indtips::Vector{Int}
: optional argument precising a specific order for the tips (internal use).nonmissing::BitVector
: optional argument saying which tips have data (internal use). Tips with missing data are treated as internal nodes.
PhyloNetworks._getnodeheights
— Function_getnodeheights(net::HybridNetwork, fixmissing::Bool,
inconsistencyhandler::Function, checkpreorder::Bool=true)
Helper to determine time-consistency and calculate node heights (distance from the root), used by getnodeheights
for example.
output: (isconsistent, nodes_distance_from_root)
Arguments:
fixmissing
:- if
false
, any missing hybrid edge length will cause a warning, the network is not modified, and the best-case is assumed to determine time-consistency (as explained below) - if
true
, will attempt to find values for missing lengths of hybrid edges, if any, to make the network time-consistent, placing hybrid nodes as close to the root as possible (as this gives most chances to find a time-consistent assignment of all missing hybrid edge lengths). If there is a time-consistent assignment, then the network is modified (with missing hybrid edge lengths set to time-consistent values). Otherwise, the network is not modified.
This option is passed to
update_getnodeheights_hybrid!
that handles 1 hybrid node at a time.- if
inconsistencyhandler
: function to check & handle time-consistency as desired as a given hybrid nodeh
, and to decide if the traversal should continue. It should take as input:- a vector of ≥1 candicate heights for
h
from parent edges with non-missing length, and - a vector of ≥0 heights of parent nodes whose child edge to
h
has no length isconsistent
: a boolean that is modified tofalse
if the network is not time-consistent ath
(unless an error is thrown anyway!).
This handler function decides what to do if the candidate heights are not all equal (the network is time-inconsistent), and if the values to be assigned to missing edge lengths would be negative. Its output should be:
(keepgoing_boolean, hybrid_node_height)
.Examples:
timeinconsistency_error
is conservative and throws an error in both cases.timeinconsistency_average
is lenient: only throws warnings, but keeps going and returns γ-weighted average node heights- a vector of ≥1 candicate heights for
PhyloNetworks.addAlternativeHybridizations!
— MethodaddAlternativeHybridizations!(net::HybridNetwork, BSe::DataFrame;
cutoff::Number=10, top::Int=3)
Modify the network net
(the best estimated network) by adding some of the hybridizations present in the bootstrap networks. By default, it will only add hybrid edges with more than 10% bootstrap support (cutoff
) and it will only include the top 3 hybridizations (top
) sorted by bootstrap support.
The dataframe BSe
is also modified. In the original BSe
, supposedly obtained with hybridclades_support
, hybrid edges that do not appear in the best network have a missing number. After hybrid edges from bootstrap networks are added, BSe
is modified to include the edge numbers of the newly added hybrid edges. To distinguish hybrid edges present in the original network versus new edges, an extra column of true/false values is also added to BSe
, named "alternative", with true for newly added edges absent from the original network.
The hybrid edges added to net
are added as minor edges, to keep the underlying major tree topology.
example
julia> bootnet = readmultinewick(joinpath(dirname(pathof(PhyloNetworks)), "..","examples", "bootsnaq.out")); # vector of 10 networks
julia> bestnet = readnewick("((O,(E,#H7:::0.196):0.314):0.332,(((A)#H7:::0.804,B):10.0,(C,D):10.0):0.332);");
julia> BSn, BSe, BSc, BSgam, BSedgenum = hybridclades_support(bootnet, bestnet);
julia> BSe[1:6,[:edge,:hybrid_clade,:sister_clade,:BS_hybrid_edge]]
6×4 DataFrame
Row │ edge hybrid_clade sister_clade BS_hybrid_edge
│ Int64? String String Float64
─────┼─────────────────────────────────────────────────────
1 │ 7 H7 B 33.0
2 │ 3 H7 E 32.0
3 │ missing c_minus3 c_minus8 44.0
4 │ missing c_minus3 H7 44.0
5 │ missing E O 12.0
6 │ missing c_minus6 c_minus8 9.0
julia> PhyloNetworks.addAlternativeHybridizations!(bestnet, BSe)
julia> BSe[1:6,[:edge,:hybrid_clade,:sister_clade,:BS_hybrid_edge,:alternative]]
6×5 DataFrame
Row │ edge hybrid_clade sister_clade BS_hybrid_edge alternative
│ Int64? String String Float64 Bool
─────┼──────────────────────────────────────────────────────────────────
1 │ 7 H7 B 33.0 false
2 │ 3 H7 E 32.0 false
3 │ 16 c_minus3 c_minus8 44.0 true
4 │ 19 c_minus3 H7 44.0 true
5 │ 22 E O 12.0 true
6 │ missing c_minus6 c_minus8 9.0 false
julia> # using PhyloPlots; plot(bestnet, edgelabel=BSe[:,[:edge,:BS_hybrid_edge]]);
PhyloNetworks.addHybridBetweenClades!
— MethodaddHybridBetweenClades!(net::HybridNetwork, hybnum::Number, sisnum::Number)
Modify net
by adding a minor hybrid edge from "donor" to "recipient", where "donor" is the major parent edge e1
of node number hybnum
and "recipient" is the major parent edge e2
of node number sisnum
. The new nodes are currently inserted at the middle of these parent edges.
If a hybrid edge from e1
to e2
would create a directed cycle in the network, then this hybrid cannot be added. In that case, the donor edge e1
is moved up if its parent is a hybrid node, to ensure that the sister clade to the new hybrid would be a desired (the descendant taxa from e1
) and a new attempt is made to create a hybrid edge.
Output: number of the new hybrid edge, or nothing
if the desired hybridization is not possible.
See also: addhybridedge!
(used by this method) and directionalconflict
to check that net
would still be a DAG.
PhyloNetworks.addhybridedge!
— Functionaddhybridedge!(
net::HybridNetwork,
edge1::Edge,
edge2::Edge,
hybridpartnernew::Bool,
edgelength::Float64=-1.0,
gamma::Float64=-1.0
)
Add hybridization to net
coming from edge1
going into edge2
. 2 new nodes and 3 new edges are created: edge1
are edge2
are both cut into 2 edges, and a new edge is created linking the 2 new "middle" nodes, pointing from edge1
to edge2
. The new node in the middle of edge1
is a tree node. The new node in the middle of edge2
is a hybrid node. Its parent edges are the newly created hybrid edge (with γ = gamma, missing by default), and either the newly edge "above" edge2
if hybridpartnernew=true
, or the old edge2
otherwise (which would reverse the direction of edge2
and others).
Should be called from the other method, which performs a bunch of checks. Updates containroot
attributes for edges below the new hybrid node.
Output: new hybrid node (middle of the old edge2
) and new hybrid edge.
examples
julia> net = readnewick("((S8,(((S1,(S5)#H1),(#H1,S6)))#H2),(#H2,S10));");
julia> hybnode, hybedge = PhyloNetworks.addhybridedge!(net, net.edge[13], net.edge[8], true, 0.0, 0.2)
(PhyloNetworks.Node:
number:9
name:H3
hybrid node
attached to 3 edges, numbered: 8 16 17
, PhyloNetworks.EdgeT{PhyloNetworks.Node}:
number:17
length:0.0
minor hybrid edge with gamma=0.2
attached to 2 node(s) (parent first): 8 9
)
julia> writenewick(net)
"((S8,(((S1,(S5)#H1),((#H1,S6))#H3:::0.8))#H2),(#H2,(S10,#H3:0.0::0.2)));"
PhyloNetworks.addhybridedge!
— Methodaddhybridedge!(
[rng::AbstractRNG,]
net::HybridNetwork,
nohybridladder::Bool,
no3cycle::Bool,
constraints::Vector{TopologyConstraint}=TopologyConstraint[];
maxattempts::Int=10,
fixroot::Bool=false
)
Randomly choose two edges in net
then: add hybrid edge from edge 1 to edge 2 of length 0.01. The two halves of edge 1 (and of edge 2) have equal lengths. The hybrid partner edge (top half of edge 2, if fixroot is true) will point towards the newly-created node on the middle of the original edge 2, and have an inheritance γ randomly uniformly chosen in (0,0.5).
If the resulting network is a DAG, satisfies the constraint(s), does not contain any 3-cycle (if no3cycle=true
), and does not have a hybrid ladder (if nohybridladder=true
) then the proposal is successful: net
is modified, and the function returns the newly created hybrid node and newly created hybrid edge.
If the resulting network is not acceptable, then a new set of edges is proposed (using a blacklist) until one is found acceptable, or until a maximum number of attempts have been made (maxattempts
). If none of the attempted proposals are successful, nothing
is returned (without causing an error).
After a pair of edges is picked, the "top" half of edge2 is proposed as the partner hybrid edge with probability 0.8 if fixroot
is false, (to avoid changing the direction of edge2 with more than 50% chance) and with probability 1.0 if fixroot
is true. If this choice does not work and if fixroot
is false, the other half of edge2 is proposed as the partner hybrid edge. Note that choosing the "bottom" half of edge2 as the partner edge requires to flip the direction of edge 2, and to move the root accordingly (to the original child of edge2).
examples
julia> net = readnewick("((S1,(((S2,(S3)#H1),(#H1,S4)))#H2),(#H2,S5));");
julia> using Random
julia> Random.seed!(170);
julia> PhyloNetworks.addhybridedge!(net, true, true)
(PhyloNetworks.Node:
number:9
name:H3
hybrid node
attached to 3 edges, numbered: 5 16 17
, PhyloNetworks.EdgeT{PhyloNetworks.Node}:
number:17
length:0.01
minor hybrid edge with gamma=0.32771460911632916
attached to 2 node(s) (parent first): 8 9
)
julia> writenewick(net, round=true, digits=2)
"((S1,(((#H1,S4),((S2,(S3)#H1))#H3:::0.67))#H2),((#H2,S5),#H3:0.01::0.33));"
PhyloNetworks.addindividuals!
— Methodaddindividuals!(net::HybridNetwork, species::AbstractString, individuals::Vector)
Add individuals to their species leaf as a star, and return the corresponding species constraint. nothing
is returned in 2 cases:
- if
individuals
contains only 1 individual, in which case the name of the leaf for that species is replaced by the name of its one individual representative - if
species
is not found in the network
Spaces in individuals' names are eliminated, see cleantaxonname
.
Called by mapindividuals
.
examples
julia> net = readnewick("(S8,(((S1,S4),(S5)#H1),(#H1,S6)));");
julia> PhyloNetworks.addindividuals!(net, "S1", ["S1A", "S1B", "S1C"])
Species constraint, on tips: S1A, S1B, S1C
stem edge number 2
crown node number 2
julia> writenewick(net) # 3 new nodes, S1 now internal: not a tip
"(S8,((((S1A,S1B,S1C)S1,S4),(S5)#H1),(#H1,S6)));"
PhyloNetworks.addleaf!
— Functionaddleaf!(net::HybridNetwork, node::Node, leafname::String, edgelength::Float64=-1.0)
addleaf!(net::HybridNetwork, edge::Edge, leafname::String, edgelength::Float64=-1.0)
Add a new external edge between node
or between the "middle" of edge
and a newly-created leaf, of name leafname
. By default, the new edge length is missing (-1).
output: newly created leaf node.
examples
julia> net = readnewick("((S1,(((S2,(S3)#H1),(#H1,S4)))#H2),(#H2,S5));");
julia> net.node[6].name # leaf S4
"S4"
julia> PhyloNetworks.addleaf!(net, net.node[6], "4a"); # adding leaf to a node
julia> writenewick(net, internallabel=true)
"((S1,(((S2,(S3)#H1),(#H1,(4a)S4)))#H2),(#H2,S5));"
julia> PhyloNetworks.addleaf!(net, net.node[6], "4b");
julia> writenewick(net, internallabel=true)
"((S1,(((S2,(S3)#H1),(#H1,(4a,4b)S4)))#H2),(#H2,S5));"
julia> net = readnewick("((S1,(((S2,(S3)#H1),(#H1,S4)))#H2),(#H2,S5));");
julia> [n.name for n in net.edge[7].node] # external edge to S4
2-element Vector{String}:
"S4"
""
julia> PhyloNetworks.addleaf!(net, net.edge[7], "4a"); # adding leaf to an edge
julia> writenewick(net, internallabel=true)
"((S1,(((S2,(S3)#H1),(#H1,(S4,4a))))#H2),(#H2,S5));"
PhyloNetworks.adjacentedges
— Methodadjacentedges(centeredge::Edge)
Vector of all edges that share a node with centeredge
.
PhyloNetworks.allowrootbelow!
— Methodallowrootbelow!(net::HybridNetwork)
Set containroot
to true
for each edge below the root node, then traverses net
in preorder to update containroot
of all edges (stopping at hybrid nodes): see the other methods. Assumes correct ischild1
edge field.
PhyloNetworks.allowrootbelow!
— Methodallowrootbelow!(e::Edge)
allowrootbelow!(n::Node, parent_edge_of_n::Edge)
Set containroot
to true
for edge e
and all edges below, recursively. The traversal stops whenever a hybrid node is encountered: if the child of e
is a hybrid node (that is, if e
is a hybrid edge) or if n
is a hybrid node, then the edges below e
or n
are not traversed.
PhyloNetworks.assignhybridnames!
— Methodassignhybridnames!(net)
Assign names to hybrid nodes in the network net
. Hybrid nodes with an empty name
field ("") are modified with a name that does not conflict with other hybrid names in the network. The preferred name is "H3" if the node number is 3 or -3, but an index other than 3 would be used if "H3" were the name of another node already.
If two hybrid nodes have non-empty and equal names, the name of one of them is changed and re-assigned as described above (with a warning).
PhyloNetworks.biconnectedcomponent_entrynodes
— Functionbiconnectedcomponent_entrynodes(net, bcc, preorder=true)
Array containing the entry node of the each biconnected component in bcc
. bcc
is supposed to contain the biconnected components as output by biconnectedcomponents
, that is, an array of array of edges.
These entry nodes depend on the rooting (whereas the BCC only depend on the unrooted graph). They are either the root of the network or cut node (articulation points).
PhyloNetworks.biconnectedcomponent_exitnodes
— Functionbiconnectedcomponent_exitnodes(net, bcc, preorder=true)
Array containing an array of the exit node(s) of the each biconnected component in bcc
. bcc
is supposed to contain the biconnected components as output by biconnectedcomponents
, that is, an array of array of edges.
These exit nodes depend on the rooting (whereas the BCC only depend on the unrooted graph). The degree of a blob is the number of exit nodes + 1 if the blob doesn't contain the root (its entry node is a cut node), or + 0 if the blob contains the root (which enters into the blob but isn't a cut node).
Warning (or positive side effect?): the edge .inte1
attribute is modified. It stores the index (in bcc
) of the biconnected component that an edge belongs to. If an edge doesn't belong in any (e.g. if trivial blobs are ignored), then its .inte1
is set to -1.
PhyloNetworks.blobinfo
— Functionblobinfo(network, ignoreTrivial=true)
Calculate the biconnected components (blobs) using function biconnectedcomponents
then:
- set node field
booln4
to true at the root of each non-trivial blob (and at the network root), false otherwise. (a better name for the field would be something like "isBlobRoot".) - output:
- array of nodes that are the roots of each non-trivial blob, and the network root. If the root of the full network is not part of a non-trivial blob, a corresponding blob is added to the list.
- array of arrays: for each non-trivial blob, array of major hybrid edges in that blob.
- array of arrays: same as #2 but for minor hybrid edges, with hybrids listed in the same order, for each blob.
Blobs are ordered in reverse topological ordering (aka post order). If ignoreTrivial
is true, trivial components are ignored.
keyword argument: checkpreorder
, true by default. If false, the ischild1
edge field and the net.vec_node
network field are supposed to be correct.
warning: see biconnectedcomponents
for node attributes modified during the algorithm.
PhyloNetworks.breakedge!
— Methodbreakedge!(edge::Edge, net::HybridNetwork)
Break an edge into 2 edges, each of length half that of original edge, creating a new node of degree 2. Useful to root network along an edge. Return the new node and the new edge, which is the "top" half of the original starting edge. These new node & edge are pushed last in net.node
and net.edge
.
If the starting edge was:
n1 --edge--> n2
then we get this:
n1 --newedge--> newnode --edge--> n2
ischild1
and containroot
are updated, but not internal fields (e.g. not those used by SNaQ for level-1 networks).
examples
julia> net = readnewick("(((S8,S9),((((S1,S4),(S5)#H1),(#H1,(S6,S7))))#H2),(#H2,S10));");
julia> length(net.node)
19
julia> net.edge[4] # edge 4 goes from node -8 to 3
PhyloNetworks.EdgeT{PhyloNetworks.Node}:
number:4
length:-1.0
attached to 2 node(s) (parent first): -8 3
julia> newnode, newedge = PhyloNetworks.breakedge!(net.edge[4], net);
julia> length(net.node) # one more than before
20
julia> newedge # new edge 21 goes from node -8 and 11 (new)
PhyloNetworks.EdgeT{PhyloNetworks.Node}:
number:21
length:-1.0
attached to 2 node(s) (parent first): -8 11
julia> net.edge[4] # original edge 4 now goes from node 11 (new) to 3
PhyloNetworks.EdgeT{PhyloNetworks.Node}:
number:4
length:-1.0
attached to 2 node(s) (parent first): 11 3
julia> writenewick(net) # note extra pair of parentheses around S1
"(((S8,S9),((((S4,(S1)),(S5)#H1),(#H1,(S6,S7))))#H2),(#H2,S10));"
See also: fuseedgesat!
PhyloNetworks.checkNumHybEdges!
— MethodcheckNumHybEdges!(net)
Check for consistency between hybrid-related attributes in the network:
- for each hybrid node: 2 or more hybrid edges
- exception: allows for a leaf to be attached to a single hybrid edge
- exactly 2 incoming parent hybrid edges
Run after storeHybrids!
.
PhyloNetworks.check_nonmissing_nonnegative_edgelengths
— Functioncheck_nonmissing_nonnegative_edgelengths(net, str="")
Throw an Exception if net
has undefined edge lengths (coded as -1.0) or negative edge lengths. The error message indicates the number of the offending edge(s), followed by str
.
PhyloNetworks.checkspeciesnetwork!
— Methodcheckspeciesnetwork!(network::HybridNetwork,
constraints::Vector{TopologyConstraint})
Check that the network satisfies a number of requirements:
- no polytomies, other than at species constraints: throws an error otherwise
- no unnecessary nodes: fuse edges at nodes with degree two, including at the root (hence the bang:
net
may be modified) - topology constraints are met.
Output: true if all is good, false if one or more clades are violated.
PhyloNetworks.cleantaxonname
— Methodcleantaxonname(taxonname::AbstractString)
Return a String with leading and trailing spaces removed, and interior spaces replaced by underscores: good for using as tip names in a network without causing future error when reading the newick description of the network.
PhyloNetworks.constraintviolated
— Methodconstraintviolated(network::HybridNetwork,
constraints::Vector{TopologyConstraint})
True if network
violates one (or more) of the constraints of type 1 (individuals in a species group) or type 2 (must be clades in the major tree). Warning: constraints of type 3 are not implemented. See TopologyConstraint
for constraint types.
PhyloNetworks.deleteEdge!
— MethoddeleteEdge!(net::HybridNetwork, e::Edge; part=true)
Delete edge e
from net.edge
and update net.numedges
. If part
is true, update the network's partition field.
Warning: if part
is true (the default), then net
is assumed to be of level-1, with valid internal fields e.inte1
(to track which cycle e
may be in) and valid net.partition
, which then gets updated.
PhyloNetworks.deleteNode!
— MethoddeleteNode!(net::HybridNetwork, n::Node)
Delete node n
from a network, i.e. removes it from net.node, and from net.hybrid or net.leaf as appropriate. Update attributes numnodes
, numtaxa
, numhybrids
.
Warning: if the root is deleted, the new root is arbitrarily set to the first node in the list. This is intentional to save time because this function is used frequently in snaq!, which handles semi-directed (unrooted) networks.
PhyloNetworks.deletehybridedge!
— Functiondeletehybridedge!(net::HybridNetwork, edge::Edge,
nofuse=false, unroot=false,
multgammas=false, simplify=true, keeporiginalroot=false)
Delete a hybrid edge
from net
and return the network. The network does not have to be of level 1 and may contain polytomies, although each hybrid node must have exactly 2 parents. Branch lengths are updated, allowing for missing values.
If nofuse
is false, when edge
is removed, its child (hybrid) node is removed and its partner hybrid edge is removed. Its child edge is retained (below the hybrid node), fused with the former partner, with new length: old length + length of edge
's old partner. Any 2-cycle is simplified into a single edge, unless simplify
is false.
If nofuse
is true, edges with descendant leaves are kept as is, and are not fused. Nodes are retained during edge removal, provided that they have at least one descendant leaf. The hybrid edge that is partner to edge
becomes a tree edge, but has its γ value unchanged (it is not set to 1), since it is not merged with its child edge after removal of the reticulation. Also, 2-cycles are not simplified if nofuse
is true. That is, if we get 2 hybrid edges both from the same parent to the same child, these hybrid edges are retained without being fused into a single tree edge.
If unroot
is false and if the root is up for deletion during the process, it will be kept if it's of degree 2 or more. A root node of degree 1 will be deleted unless keeporiginalroot
is true.
If multgammas
is true: inheritance weights are kept by multiplying together the inheritance γ's of edges that are merged. For example, if there is a hybrid ladder, the partner hybrid edge remains a hybrid edge (with a new partner), and its γ is the product of the two hybrid edges that have been fused. So it won't add up to 1 with its new partner's γ.
If keeporiginalroot
is true, a root of degree one will not be deleted.
Warnings:
containroot
is updated, but this requires correctischild1
fields- if the parent of
edge
is the root and ifnofuse
is false, the root is moved to keep the network unrooted with a root of degree two. - does not update attributes needed for snaq! (like inte1, edge.z, edge.y etc.)
PhyloNetworks.descendants
— Functiondescendants(edge::Edge, internal::Bool=false)
Return the node numbers of the descendants of a given edge: all descendant nodes if internal
is true (internal nodes and tips), or descendant tips only otherwise (defaults).
edge
should belong in a rooted network for which ischild1
is up-to-date. Run directedges!
beforehand. This is very important, otherwise one might enter an infinite loop, and the function does not test for this.
Examples
julia> net5 = "(A,((B,#H1),(((C,(E)#H2),(#H2,F)),(D)#H1)));" |> readnewick |> directedges! ;
julia> PhyloNetworks.descendants(net5.edge[12], true) # descendants of 12th edge: all of them
7-element Vector{Int64}:
-6
-7
4
6
5
-9
7
julia> PhyloNetworks.descendants(net5.edge[12]) # descendant leaves only
3-element Vector{Int64}:
4
5
7
PhyloNetworks.directionalconflict
— Methoddirectionalconflict(parent::Node, edge::Edge, hybridpartnernew::Bool)
Check if creating a hybrid edge down of parent
node into the middle of edge
would create a directed cycle in net
, i.e. not a DAG. The proposed hybrid would go in the direction of edge
down its child node if hybridpartnernew
is true. Otherwise, both halves of edge
would have their direction reversed, for the hybrid to go towards the original parent node of edge
. Does not modify the network.
Output: true
if a conflict would arise (non-DAG), false
if no conflict.
PhyloNetworks.displayednetworks!
— Functiondisplayednetworks!(net::HybridNetwork, node::Node, keepNode=false,
unroot=false, multgammas=false, keeporiginalroot=false)
Extracts the two networks that simplify a given network at a given hybrid node: deleting either one or the other parent hybrid edge. If nofuse
is true, the original edges (and nodes) are kept in both networks, provided that they have one or more descendant leaves. If unroot
is true, the root will be deleted if it becomes of degree 2. If keeporiginalroot
is true, the root is retained even if it is of degree 1.
- the original network is modified: the minor edge removed.
- returns one HybridNetwork object: the network with the major edge removed
PhyloNetworks.edgerelation
— Methodedgerelation(e::Edge, node::Node, origin::Edge)
Return a symbol:
:origin
ife
is equal toorigin
, and otherwise::parent
ife
is a parent ofnode
,:child
ife
is a child ofnode
using the ischild1
attribute of edges. Useful when e
iterates over all edges adjacent to node
and when origin
is one of the edges adjacent to node
, to known the order in which these edges come.
example:
labs = [edgerelation(e, u, uv) for e in u.edge] # assuming u is a node of edge uv
parentindex = findfirst(isequal(:parent), labs) # could be 'nothing' if no parent
childindices = findall( isequal(:child), labs) # vector. could be empty
PhyloNetworks.entrynode_preindex
— Methodentrynodepreindex(biconnectedcomponent::Partition)
Preorder index of the entry node of a biconnected component, considering the network rooted. It is an articulation node, except if it is the network's root.
PhyloNetworks.fliphybrid!
— Functionfliphybrid!(
net::HybridNetwork,
hybridnode::Node,
minor::Bool=true,
nohybridladder::Bool=false,
constraints::Vector{TopologyConstraint}=TopologyConstraint[]
)
Flip the direction of a single hybrid edge: the minor parent edge of hybridnode
by default, or the major parent edge if minor
is false. The parent node of the hybrid edge becomes the new hybrid node. The former hybrid edge partner is converted to a tree edge (with γ=1), and hybridnode
becomes a tree node.
For the flip to be admissible, the new network must be a semi-directed phylogenetic network: with a root such that the rooted version is a DAG. If nohybridladder
is false (default), the flip may create a hybrid ladder If nohybridladder
is true and if the flip would create a hybrid ladder, then the flip is not admissible. A hybrid ladder is when a hybrid child of another hybrid.
The new hybrid partner is an edge adjacent to the new hybrid node, such that the flip is admissible (so it must be a tree edge). The flipped edge retains its original γ. The new hybrid edge is assigned inheritance 1-γ.
Output: (newhybridnode, flippededge, oldchildedge)
if the flip is admissible, nothing
otherwise.
The network is unchanged if the flip is not admissible. If the flip is admissible, the root position may be modified, and the direction of tree edges (via ischild1
) is modified accordingly. If the root needs to be modified, then the new root is set to the old hybrid node.
The index of the new hybrid node in net.hybrid
is equal to that of the old hybridnode
.
Warning: Undoing this move may not recover the original root if the root position was modified.
PhyloNetworks.fliphybrid!
— Methodfliphybrid!(
[rng::AbstractRNG,]
net::HybridNetwork,
minor::Bool,
nohybridladder::Bool=false,
constraints::Vector{TopologyConstraint}=TopologyConstraint[]
)
Cycle through hybrid nodes in random order until an admissible flip is found. At this hybrid node, flip the indicated hybrid parent edge (minor or major).
If an admissible flip is found, return the tuple: newhybridnode, flippededge, oldchildedge. Otherwise, return nothing.
The flip can be undone with fliphybrid!(net, newhybridnode, minor, constraints)
, or fliphybrid!(net, newhybridnode, !flippededge.ismajor, constraints)
more generally, such as if the flipped edge had its γ modified after the original flip.
Warnings
- if the root needed to be reset and if the original root was of degree 2, then a node of degree 2 remains in the modified network.
- undoing the flip may not recover the original root in case the root position was modified during the original flip.
PhyloNetworks.fuseedgesat!
— Functionfuseedgesat!(i::Integer,net::HybridNetwork, multgammas::Bool=false)
Removes i
th node in net.node, if it is of degree 2. The parent and child edges of this node are fused. If either of the edges is hybrid, the hybrid edge is retained. Otherwise, the edge with the lower edge number is retained.
Reverts the action of breakedge!
.
returns the fused edge.
PhyloNetworks.getTipSubmatrix
— MethodgetTipSubmatrix(M, net; indexation=:both)
Extract submatrix of M, with rows and/or columns corresponding to tips in the network, ordered like in net.leaf
. In M, rows and/or columns are assumed ordered as in net.vec_node
.
indexation: one of :rows
, :cols
or :both
: are nodes numbers indexed in the matrix by rows, by columns, or both? Subsetting is taken accordingly.
PhyloNetworks.getconnectingedge
— Methodgetconnectingedge(node1::Node, node2::Node)
Edge shared by (or connecting) node1
and node2
, that is: edge incident to both nodes. An error is thrown if the 2 nodes are not connected.
See also isconnected
PhyloNetworks.getlengths
— Methodgetlengths(edges::Vector{Edge})
Vector of edge lengths for a vector of edges
.
PhyloNetworks.hardwiredclusterdistance_unrooted
— Methodhardwiredclusterdistance_unrooted(net1::HybridNetwork, net2::HybridNetwork)
Miminum hardwired cluster dissimilarity between the two networks, considered as unrooted (or semi-directed). This dissimilarity is defined as the minimum rooted distance, over all root positions that are compatible with the direction of hybrid edges. Called by hardwiredclusterdistance
.
To avoid repeating identical clusters, all degree-2 nodes are deleted before starting the comparison. Since rooting the network at a leaf creates a root node of degree 2 and an extra cluster, leaves are excluded from possible rooting positions.
PhyloNetworks.hybrid3cycle
— Methodhybrid3cycle(edge1::Edge, edge2::Edge)
Check if proposed hybrid edge from edge1
into edge2
would create a 3 cycle, that is, if edge1
and edge2
have a node in common. (This move cannot create a 2-cycles because new nodes would be created in the middle of edges 1 and 2.)
PhyloNetworks.hybridEdges
— MethodhybridEdges(node::Node, e::Edge)
Return the 2 edges connected to node
other than e
, in the same order as node.edge
, except that e
absent from the list.
Despite what the name suggest, node
need not be a hybrid node! node
is assumed to have 3 edges, though.
PhyloNetworks.hybridEdges
— MethodhybridEdges(node::Node)
Return the 3 edges attached to node
in a specific order [e1,e2,e3]. Warning: assume a level-1 network with up-to-date node fields booln1
(tracking whether the node is incident to a hybrid edge and edge) and field intn1
(tracking the number given to the cycle in which the node might be).
If node
is a hybrid node:
- e1 is the major hybrid parent edge of
node
- e2 is the minor hybrid parent edge
- e3 is the tree edge, child of
node
.
If node
is a tree node parent of one child hybrid edge:
- e1 is the hybrid edge, child of
node
- e2 is the tree edge that belongs to the cycle created by e1
- e3 is the other tree edge attached to
node
(not in a cycle)
Otherwise:
- e3 is an external edge from
node
to a leaf, if one exists.
PhyloNetworks.inheritanceweight
— Methodinheritanceweight(tree::HybridNetwork)
Return the log inheritance weight of a network or tree (as provided by displayedtrees
with nofuse
= true for instance). For a tree displayed in a network, its inheritance weight is the log of the product of γ's of all edges retained in the tree. To avoid underflow, the log is calculated: i.e. sum of log(γ) across retained edges.
If any edge has a negative γ, it is assumed to mean that its γ is missing, and the function returns missing
.
Example
julia> net = readnewick("(((A,(B)#H1:::0.9),(C,#H1:::0.1)),D);");
julia> trees = displayedtrees(net,0.0; nofuse=true);
julia> PhyloNetworks.inheritanceweight.(trees)
2-element Vector{Float64}:
-0.105361
-2.30259
PhyloNetworks.initializeweightsfromleaves!
— Methodinitializeweightsfromleaves!(w, net, tips, stateset, criterion)
Modify weight in w: to Inf for w[n, i] if the "tips" data has a state different from the lineage state of index i at node number n. Assumes that w was initialized to 0 for the leaves.
criterion: should be one of :softwired
, :parental
or :hardwired
.
- softwired parsimony: lineage states are in this order: ∅,{1},{2},{3},...,{nstates}
PhyloNetworks.initializeweightsfromleaves_softwired!
— Methodinitializeweightsfromleaves_softwired!(w, net, tips, charset)
Modify weight in w: to Inf for w[n, s] if the "tips" data has a state different from s at node number n. Assumes that w was initialized to 0 for the leaves.
PhyloNetworks.isconnected
— Methodisconnected(node1::Node, node2::Node)
Check if two nodes are connected by an edge. Return true if connected, false if not connected.
See also getconnectingedge
PhyloNetworks.isdescendant
— Methodisdescendant(des:Node, anc::Node)
Return true if des
is a strict descendant of anc
, using ischild1
fields to determine the direction of edges. See isdescendant_undirected
for a version that does not use ischild1
.
PhyloNetworks.isdescendant_undirected
— Methodisdescendant_undirected(des:Node, ancestor::Node, parentedge)
Return true
if des
is a strict descendant of ancestor
when starting from edge parentedge
and going towards ancestor
onward, regardless of the field ischild1
of tree edges; false
otherwise.
This is useful to know how descendant relationships would change as a result of reverting the direction of a tree edge, without actually modifying the direction (ischild1
) of any edge.
parentedge
should be connected to ancestor
(not checked). The direction of hybrid edges is respected (via ischild1
), that is, the traversal does not go from the child to the parent of a hybrid edge.
PhyloNetworks.ladderpartition
— Methodladderpartition(tree::HybridNetwork)
For each node in tree
, calculate the clade below each child edge of the node, and each clade moving up the "ladder" from the node to the root. The output is a tuple of 2 vectors (node) of vector (clade) of vectors (taxon in clade): below,above
. More specifically, for node number n
, below[n]
is generally of vector of 2 clades: one for the left child and one for the right child of the node (unless the node is of degree 2 or is a polytomy). above[n]
contains the grade of clades above node number n
.
Warning: assumes that
- the network is a tree (this is checked)
- node numbers and edge numbers can be used as indices, that is, be all distinct, positive, covering exactly 1:#nodes and 1:#edges (this is checked for nodes).
- edges are corrected directed (
ischild1
is up-to-date) and nodes have been pre-ordered already (fieldvec_node
up-to-date).
examples
julia> tree = readnewick("(O,A,((B1,B2),(E,(C,D))));");
julia> PhyloNetworks.resetnodenumbers!(tree; checkpreorder=true, type=:postorder)
julia> printnodes(tree)
node leaf hybrid name i_cycle edges'numbers
1 true false O -1 1
2 true false A -1 2
3 true false B1 -1 3
4 true false B2 -1 4
8 false false -1 3 4 5
5 true false E -1 6
6 true false C -1 7
7 true false D -1 8
9 false false -1 7 8 9
10 false false -1 6 9 10
11 false false -1 5 10 11
12 false false -1 1 2 11
julia> below, above = PhyloNetworks.ladderpartition(tree);
julia> below
12-element Vector{Vector{Vector{Int64}}}:
[[1]]
[[2]]
[[3]]
[[4]]
[[5]]
[[6]]
[[7]]
[[3], [4]]
[[6], [7]]
[[5], [6, 7]]
[[3, 4], [5, 6, 7]]
[[1], [2], [3, 4, 5, 6, 7]]
julia> for n in 8:12
println("clades below node ", n, ": ", join(below[n], " "))
end
clades below node 8: [3] [4]
clades below node 9: [6] [7]
clades below node 10: [5] [6, 7]
clades below node 11: [3, 4] [5, 6, 7]
clades below node 12: [1] [2] [3, 4, 5, 6, 7]
julia> above[8:12] # clades sister to and above nodes 8 through 12:
5-element Vector{Vector{Vector{Int64}}}:
[[5, 6, 7], [1], [2]]
[[5], [3, 4], [1], [2]]
[[3, 4], [1], [2]]
[[1], [2]]
[]
PhyloNetworks.leaststableancestor
— Functionleaststableancestor(net, preorder=true, preprocess=true)
Tuple (lsa, lsa_index)
where lsa
is the least stable ancestor (LSA) node in net
, and lsa_index
is the index of lsa
in net.vec_node
. The LSA the lowest node n
with the following property: any path between any leaf and the root must go through n
. All such nodes with this property are ancestral to the LSA (and therefore must have an index that is lower or equal to lsa_index
).
Exception: if the network has a single leaf, the output lsa
is the leaf's parent node, to maintain one external edge between the root and the leaf.
Arguments:
preorder
to direct edges according to the root and store a pre-ordering of nodes innet.vec_node
, if not done earlier on the network (e.g. after a topology modification)preprocess
to recalculate & store the biconnected components withprocess_biconnectedcomponents!
, assuming the preordering was done
Warning: uses biconnectedcomponents
and biconnectedcomponent_exitnodes
, therefore share the same caveats regarding the use of field .intn1
and .intn2
for nodes; boole2
and .inte1
for edges. As a positivie side effect, the biconnected components can be recovered via the edges' .inte1
field –including the trivial blobs (cut edges).
See also: deleteaboveLSA!
PhyloNetworks.majoredgelength
— Methodmajoredgelength(net::HybridNetwork)
Generate vector of edge lengths of major net
edges organized in the same order as the edge
matrix created via majoredgematrix
. Considers values of -1.0
as missing values, recognized as NA in R
. Output: vector allowing for missing values.
Assume vec_node
was updated, to list nodes in pre-order.
Examples
julia> net = readnewick("(((A:3.1,(B:0.2)#H1:0.3::0.9),(C,#H1:0.3::0.1):1.1),D:0.7);");
julia> directedges!(net); preorder!(net);
julia> PhyloNetworks.majoredgelength(net)
8-element Vector{Union{Missing, Float64}}:
missing
0.7
missing
1.1
missing
3.1
0.3
0.2
PhyloNetworks.majoredgematrix
— Methodmajoredgematrix(net::HybridNetwork)
Matrix of major edges from net
where edge[i,1] is the number of the parent node of edge i and edge[i,2] is the number of the child node of edge i. Assume vec_node
was updated, to list nodes in pre-order.
Examples
julia> net = readnewick("(A,(B,(C,D)));");
julia> PhyloNetworks.resetnodenumbers!(net);
julia> PhyloNetworks.majoredgematrix(net)
6×2 Matrix{Int64}:
5 1
5 6
6 2
6 7
7 3
7 4
PhyloNetworks.makemissing!
— Methodmakemissing!(x::AbstractVector)
Turn to missing
any element of x
exactly equal to -1.0. Used for branch lengths and γs. x
needs to accept missing values. If not, this can be done with allowmissing(x)
.
PhyloNetworks.mapindividuals
— Methodmapindividuals(net::HybridNetwork, mappingFile::String)
Return a network expanded from net
, where species listed in the mapping file are replaced by individuals mapped to that species. If a species has only 1 individual, the name of the leaf for that species is replaced by the name of its one individual representative. If a species has 2 or more individuals, the leaf for that species is expanded into a "star" (polytomy if 3 or more individuals) with a tip for each individual. If a species is in the network but not listed in the mapping file, the tip for that species is left as is. Species listed in the mapping file but not present in the network are ignored.
The mapping file should be readable by CSV.File
and contain two columns: one for the species names and one for the individual (or allele) names. fixit: make this function more flexible by accepting column names
Output: individual-level network and vector of species constraint(s).
examples
julia> species_net = readnewick("(((S8,S9),((((S1,S4),(S5)#H1),(#H1,(S6,S7))))#H2),(#H2,S10));");
julia> filename = joinpath(dirname(Base.find_package("PhyloNetworks")), "..", "examples", "mappingIndividuals.csv");
julia> filename |> read |> String |> print # to see what the mapping file contains
species,individual
S1,S1A
S1,S1B
S1,S1C
julia> individual_net, species_constraints = PhyloNetworks.mapindividuals(species_net, filename);
julia> writenewick(individual_net, internallabel=true)
"(((S8,S9),(((((S1A,S1B,S1C)S1,S4),(S5)#H1),(#H1,(S6,S7))))#H2),(#H2,S10));"
julia> species_constraints
1-element Vector{PhyloNetworks.TopologyConstraint}:
Species constraint, on tips: S1A, S1B, S1C
stem edge number 4
crown node number 3
PhyloNetworks.maxParsimonyNet
— MethodmaxParsimonyNet(T::HybridNetwork, df::DataFrame)
This function has been disabled. It will be re-implemented, without the level-1 restriction. Please use version 0.16 of PhyloNetworks to access this older functionality, until a better one is made available.
Search for the most parsimonious network (or tree). A level-1 network is assumed. df
should be a data frame containing the species names in column 1, or in a column named species
or taxon
. Trait data are assumed to be in all other columns. The search starts from topology T
, which can be a tree or a network with no more than hmax
hybrid nodes (see optional arguments below for hmax
).
Output:
- estimated network in file
.out
(also in.log
): best network overall and list of networks from each individual run. - if any error occurred, file
.err
provides information (seed) to reproduce the error.
Optional arguments include
- hmax: maximum number of hybridizations allowed (default 1)
- runs: number of starting points for the search (default 10); each starting point is
T
with probabilityprobST
=0.3 or a modification ofT
otherwise (using a NNI move, or a hybrid edge direction change) - Nfail: number of failures (proposed networks with equal or worse score) before the search is aborted. 75 by default: this is quite small, which is okay for a first trial. Larger values are recommended.
- outgroup: outgroup taxon. It can be a taxon name (String) or Node number (Integer). If none provided, or if the outgroup conflicts the starting topology, the function returns an error
- filename: root name for the output files. Default is "mp". If empty (""), files are not created, progress log goes to the screen only (standard out).
- seed: seed to replicate a given search
- criterion: parsimony score could be hardwired, softwired (default) or parental. Currently, only softwired is implemented
References
Leo Van Iersel, Mark Jones, Celine Scornavacca (2017). Improved Maximum Parsimony Models for Phylogenetic Networks, Systematic Biology, (https://doi.org/10.1093/sysbio/syx094).
Fischer, M., van Iersel, L., Kelk, S., Scornavacca, C. (2015). On computing the Maximum Parsimony score of a phylogenetic network. SIAM J. Discrete Math., 29(1):559-585.
PhyloNetworks.minorreticulationgamma
— Methodminorreticulationgamma(net::HybridNetwork)
Vector of minor edge gammas (inheritance probabilities) organized in the same order as in the matrix created via minorreticulationmatrix
. Considers values of -1.0
as missing values, recognized as NA in R
. Output: vector allowing for missing values.
Examples
julia> net = readnewick("(((A,(B)#H1:::0.9),(C,#H1:::0.1)),D);");
julia> PhyloNetworks.minorreticulationgamma(net)
1-element Vector{Union{Float64, Missings.Missing}}:
0.1
PhyloNetworks.minorreticulationlength
— Methodminorreticulationlength(net::HybridNetwork)
Vector of lengths for the minor hybrid edges, organized in the same order as in the matrix created via minorreticulationmatrix
. Replace values of -1.0
with missing values recognized by R
. Output: vector allowing for missing values.
Examples
julia> net = readnewick("(((A:3.1,(B:0.2)#H1:0.4::0.9),(C,#H1:0.3::0.1):1.1),D:0.7);");
julia> PhyloNetworks.minorreticulationlength(net)
1-element Vector{Union{Missing, Float64}}:
0.3
PhyloNetworks.minorreticulationmatrix
— Methodminorreticulationmatrix(net::HybridNetwork)
Matrix of integers, representing the minor hybrid edges in net
. edge[i,1] is the number of the parent node of the ith minor hybrid edge, and edge[i,2] is the number of its child node. Node numbers may be negative, unless they were modified by resetnodenumbers!
. Assumes correct ischild1
fields.
Examples
julia> net = readnewick("(((A,(B)#H1:::0.9),(C,#H1:::0.1)),D);");
julia> PhyloNetworks.minorreticulationmatrix(net)
1×2 Matrix{Int64}:
-6 3
PhyloNetworks.moveroot!
— Methodmoveroot!(
[rng::AbstractRNG,]
net::HybridNetwork,
constraints::Vector{TopologyConstraint}=TopologyConstraint[]
)
Move the root to a randomly chosen non-leaf node that is different from the current root, and not within a constraint clade or species. Output: true
if successul, nothing
otherwise.
PhyloNetworks.nchoose1234
— Methodnchoose1234(nmax)
nmax+1 x 4
matrix containing the binomial coefficient "n choose k" in row n+1
and column k
. In other words, M[i,k]
gives "i-1 choose k". It is useful to store these values and look them up to rank (a large number of) 4-taxon sets: see quartetrank
.
PhyloNetworks.nj!
— Functionnj!(D::Matrix{Float64}, names::AbstractVector{<:AbstractString}=String[];
force_nonnegative_edges::Bool=false)
Construct a phylogenetic tree from the input distance matrix and vector of names (as strings), using the Neighbour-Joinging algorithm (Satou & Nei 1987). The order of the names
argument should match that of the row (and column) of the matrix D
. With force_nonnegative_edges
being true
, any negative edge length is changed to 0.0 (with a message).
Warning: D
is modified.
PhyloNetworks.nnimax
— Methodnnimax(e::Edge)
Return the number of NNI moves around edge e
, assuming that the network is semi-directed. Return 0 if e
is not internal, and more generally if either node attached to e
does not have 3 edges.
Output: UInt8 (e.g. 0x02
)
PhyloNetworks.norootbelow!
— Methodnorootbelow!(e::Edge)
Set containroot
to false
for edge e
and all edges below, recursively. The traversal stops if e.containroot
is already false
, assuming that containroot
is already false all the way below down that edge.
PhyloNetworks.number_exitnodes
— Methodnumber_exitnodes(biconnected_component::Partition)
Number of articulation exit nodes of a biconnected component, considering the network rooted. An articulation node is such that if removed, the network is disconnected. An articulation exit node is a node in the block that is incident to a different block below.
Note that leaves are not considered articulation exit nodes because they are not articulation nodes. So external (pendent) edges are trivial blobs with 0 (non-internal) exit nodes. If the network is rooted at a leaf, there may be some pathological behaviors depending on the downstream task (e.g. this is checked for by [leaststableancestor
])(@ref)).
PhyloNetworks.pairwisetaxondistance_gradient
— Methodpairwisetaxondistance_gradient(net; checkEdgeNumber=true, nodeAges=[])
3-dim array: gradient of pairwise distances between all nodes. (internal and leaves); gradient with respect to edge lengths if nodeAges
is empty; with respect to node ages otherwise. Assume correct net.vec_node
(preorder). This gradient depends on the network's topology and γ's only, not on branch lengths or node ages (distances are linear in either).
WARNING: edge numbers need to range between 1 and #edges.
PhyloNetworks.parsenewick_edgedata!
— Methodparsenewick_edgedata!(s::IO, edge, numberOfLeftParentheses::Array{Int,1})
Helper function for readnewick_subtree!. Modifies e
according to the specified edge length and gamma values in the tree topology. Advances the stream s
past any existing edge data. Edges in a topology may optionally be followed by ":edgeLen:bootstrap:gamma" where edgeLen, bootstrap, and gamma are decimal values. Nexus-style comments [&...]
, if any, are ignored.
PhyloNetworks.parsenewick_getfloat!
— Methodparsenewick_getfloat!(s::IO, int, numLeft::Array{Int,1})
Helper function for parsenewick_edgedata!
. Read a single floating point edge data value in a tree topology. Ignore (and skip) nexus-style comments before & after the value (see readnexus_comment
).
Return -1.0 if no value exists before the next colon, return the value as a float otherwise. Modifies s by advancing past the next colon character. Only call this function to read a value when you know a numerical value exists!
PhyloNetworks.parsenewick_hybridnode!
— Methodparsenewick_hybridnode!(node, parentNode, hybridName, net, hybrids)
Helper function for readnewick_subtree!
. Create the parent edge for node
. Return this edge, and the hybrid node retained (node
or its clone in the newick string). Insert new edge and appropriate node into net
and hybrids
accordingly. Handles any type of given hybrid node. Called after a #
has been found in a tree topology.
PhyloNetworks.parsenewick_remainingsubtree!
— Methodparsenewick_remainingsubtree!(s::IO, numLeft, net, hybrids)
Create internal node. Helper for readnewick_subtree!
, which creates the parent edge of the node created by parsenewick_remainingsubtree!
: readnewick_subtree!
calls parsenewick_remainingsubtree!
, and vice versa. Called once a (
has been read in a tree topology and reads until the corresponding )
has been found. This function performs the recursive step for readnewick_subtree!
. Advances s
past the subtree, adds discovered nodes and edges to net
, and hybrids
.
Does not read the node name and the edge information of the subtree root: this is done by readnewick_subtree!
PhyloNetworks.parsenewick_treenode!
— Methodparsenewick_treenode!(node, parentNode, net)
Helper function for readnewick_subtree!
. Insert the input tree node and associated edge (created here) into net
.
PhyloNetworks.parsimonyGF_bottomup!
— MethodparsimonyGF_bottomup!(node, blobroot, nchar, w, scores,
costmatrix1, costmatrix2)
Compute the MP scores (one for each assignment of the blob root state) given the descendants of a blob, conditional on the states at predefined parents of hybrids in the blobs (one parent per hybrid) as described in
Leo Van Iersel, Mark Jones, Celine Scornavacca (2017). Improved Maximum Parsimony Models for Phylogenetic Networks, Systematic Biology, (https://doi.org/10.1093/sysbio/syx094).
Assumes a set of state guesses, ie correct initialization of w
for predefined hybrid parents, and correct boole2
field for the children edges of these predefined parents. boole2
is true for edges that are cut.
The field booln4
is used to know which nodes are at the root of a blob. The field ischild1
is used (and assumed correct). Field intn1
is assumed to store the # of detached parents (with guessed states)
nchar
: number of characters considered at internal lineages. For softwired parsimony, this is # states + 1, because characters at internal nodes are ∅, {1}, {2}, etc. For parental parsimony, this is 2^#states -1, because characters are all sets on {1,2,...} except for the empty set ∅.costmatrix1
[i,j] andcostmatrix2
[k][i,j]: 2d array and vector of 2d arrays containing the cost of going to character j starting from character i when the end node has a single parent, or the cost of a child node having character j when its parents have characters k and i. These cost matrices are pre-computed depending on the parsimony criterion (softwired, hardwired, parental etc.)
used by parsimonyGF
.
PhyloNetworks.parsimonyfitch
— Methodparsimonyfitch(net, tipdata)
Calculate the most parsimonious (MP) score of a network given a discrete character at the tips. The softwired parsimony concept is used: where the number of state transitions is minimized over all trees displayed in the network. Tip data can be given in a data frame, in which case the taxon names are to appear in column 1 or in a column named "taxon" or "species", and trait values are to appear in column 2 or in a column named "trait". Alternatively, tip data can be given as a dictionary taxon => trait.
also return the union of all optimized character states at each internal node as obtained by Fitch algorithm, where the union is taken over displayed trees with the MP score.
PhyloNetworks.parsimonyfitch_bottomup!
— Methodparsimonyfitch_bottomup!(node, states, score)
Bottom-up phase (from tips to root) of the Fitch algorithm: assign sets of character states to internal nodes based on character states at tips. Polytomies okay. Assumes a tree (no reticulation) and correct ischild1 attribute.
output: dictionary with state sets and most parsimonious score
PhyloNetworks.parsimonyfitch_summary
— Methodparsimonyfitch_summary(tree, nodestates)
summarize character states at nodes, assuming a tree
PhyloNetworks.parsimonyfitch_topdown!
— Methodparsimonyfitch_topdown!(node, states)
Top-down phase (root to tips) of the Fitch algorithm: constrains character states at internal nodes based on the state of the root. Assumes a tree: no reticulation.
output: dictionary with state sets
PhyloNetworks.parsimonysoftwired_bottomup!
— Methodparsimonysoftwired_bottomup!(node, blobroot, states, w, scores)
Computing the MP scores (one for each assignment of the root state) of a swicthing as described in Algorithm 1 in the following paper:
Fischer, M., van Iersel, L., Kelk, S., Scornavacca, C. (2015). On computing the Maximum Parsimony score of a phylogenetic network. SIAM J. Discrete Math., 29(1):559-585.
Assumes a switching (ie correct boole2
field) and correct ischild1 field. The field booln4
is used to know which nodes are at the root of a blob.
PhyloNetworks.postorder_nodeupdate!
— Functiontraversal_postorder(nodes, init_function,
tip_function, internalnode_function, parameters...)
postorder_nodeupdate!(node_index, nodes, output_array,
tip_function, internalnode_function, parameters...)
Generic tool to apply a post-order (or reverse topological ordering) algorithm, acting on a matrix where rows & columns correspond to nodes. Used by descendencematrix
.
output: matrix output_array
.
arguments:
nodes
: array of nodes in the network, pre-ordered, typically the internalnet.vec_node
after applyingpreorder!(net)
. This array is traversed in reverse order.output_array
: output object, of typeMatrix
, namedV
belowinit_function
: to initialize the output array, taking(nodes, parameters...)
as argumentstip_function
: to do whatever needs to be done toV
at a leaf node, using(V, tip_index, parameters...)
as argumentsinternalnode_function
: to do whatever needs to be done toV
at an internal node, using arguments(V, node_index, childnodes_index_vector, childedges_vector, parameters...)
The last 3 functions should return a boolean. If true: traversal continues. If false: traversal is stopped, that is, the next node is not processed.
See also traversal_preorder
, and traversalupdate_default!
for a default function that does nothing to V
and returns true, with an adequate signature to be used here.
PhyloNetworks.preorder_nodeupdate!
— Functiontraversal_preorder(nodes, init_function, root_function, tree_node_function,
hybrid_node_function, parameters...)
traversal_preorder!(nodes, output_array, root_function, tree_node_function,
hybrid_node_function, parameters...)
preorder_nodeupdate!(node_index, nodes, output_array, root_function, tree_node_function,
hybrid_node_function, parameters...)
Generic tool to apply a pre-order (or topological ordering) algorithm. Used by sharedpathmatrix
and by pairwisetaxondistancematrix
, for example. preorder_nodeupdate!
is a helper that calls the root / tree node / hybrid node function as appropriate.
output: array object output_array
.
arguments:
nodes
: array of nodes in the network, pre-ordered, typically the internalnet.vec_node
after applyingpreorder!(net)
output_array
: output object, of typeAbstractArray
, namedV
belowinit_function
: to initialize the output array, taking(nodes, parameters...)
as argumentsroot_function
: to do whatever needs to be done toV
at the root, using(V, rootnode_index, parameters...)
as argumentstree_node_function
: to do whatever needs to be done toV
at a tree node, using(V, treenode_index, parentnode_index, parentedge, parameters...)
as argumentshybrid_node_function
: to do whatever needs to be done toV
at a hybrid node, using arguments(V, hybnode_index, parentnodes_index_vector, parentedges_vector, parameters...)
The last 3 functions should return a boolean. If true: traversal continues. If false: traversal is stopped, that is, the next node is not processed.
See also traversal_postorder
, and traversalupdate_default!
for a default function that does nothing to V
and returns true, with an adequate signature to be used here.
PhyloNetworks.problem4cycle
— Methodproblem4cycle(β::Node, δ::Node, α::Node, γ::Node)
Check if the focus edge uv has a 4 cycle that could lead to a 3 cycle after an chosen NNI. Return true if there is a problem 4 cycle, false if none.
PhyloNetworks.process_biconnectedcomponents!
— Functionprocess_biconnectedcomponents!(net, preorder=true)
Calculate biconnected components (aka "blobs" in binary networks, "blocks" more generally) and save them in net
for re-use later. The field net.partition
is reset to store the biconnected components, by storing them in preorder (a block comes before any of its descendents). Each one is stored as a Partition
object, which stores a list of edges in the biconnected component (which partition edges).
Trivial blocks are formed by one cut-edge and its 2 adjacent nodes. They are stored.
The edge field .inte1
is used to store the biconnected component that the edge belongs to, that is, an edge belongs in net.partition[edge.inte1]
. This is an internal coding that could break in the future, and that could be modified by other functions.
This preprocessing and net.partition
become obsolete and incorrect if the network's topology is later modified, such as after a semidirected NNI, the deletion/addition of a hybrid edge / a leaf. Functions that modify a network are not required to update the blob decomposition (to save time for analyses that don't need the blob decomposition).
After a re-rooting with rootatnode!
, the blobs remain the same with the same set of edges, but their entry & exit nodes may become incorrect. After rootonedge!
, the number of blobs remains correct but one edge was subdivided into 2 edges (and the old root node might have been suppressed), so the edge list in some blobs becomes obsolete.
examples
The network below is binary, so each blocks and blobs are the same. It has 2 non-trivial blobs: one with 1 hybrid and the other with 2 hybrids. We first get the blobs, then look at the 2-hybrid blob in more detail.
julia> net = readnewick("(((D,#H2),((((E)#H2,C),#H1),((B)#H1,A))),((H,#H3),((F)#H3,G)));");
julia> # using PhyloPlots; plot(net, showedgenumber=true, shownodenumber=true);
julia> PhyloNetworks.process_biconnectedcomponents!(net);
julia> length(net.partition) # number of biconnected components
12
julia> findall(!PhyloNetworks.istrivial(blob) for blob in net.partition)
2-element Vector{Int64}:
3
7
julia> blob = net.partition[7]; getlevel(blob)
2
julia> [e.number for e in blob.edges]
9-element Vector{Int64}:
14
9
13
11
8
7
5
2
3
julia> i = PhyloNetworks.entrynode_preindex(blob); net.vec_node[i] # entry to the blob
PhyloNetworks.Node:
number:-3
attached to 3 edges, numbered: 3 14 15
julia> PhyloNetworks.number_exitnodes(blob) # 5 exit nodes connecting to other blobs below
5
julia> [net.vec_node[i].number for i in PhyloNetworks.exitnodes_preindex(blob)]
5-element Vector{Int64}:
-9
5
-7
2
-4
PhyloNetworks.quartetdata_columnnames
— Methodquartetdata_columnnames(T) where T <: StaticArray
Vector of column names to hold the quartet data of type T
in a table. If T is a length-3 vector type, they are "CF1234","CF1324","CF1423". If T is a length-4 vector type, the 4th name is "ngenes". If T is a 3×n matrix type, the output vector contains 3×n names, 3 for each of "CF", "V2", "V3", ... "Vn".
Used by tablequartetCF
to build a table from a vector of QuartetT
objects.
PhyloNetworks.quartetrank
— Methodquartetrank(t1,t2,t3,t4, nCk::Matrix)
quartetrank([t1,t2,t3,t4], nCk)
Return the rank of a four-taxon set with taxon numbers t1,t2,t3,t4
, assuming that ti
s are positive integers such that t1<t2, t2<t3 and t3<t4 (assumptions not checked!). nCk
should be a matrix of "n choose k" binomial coefficients: see nchoose1234
.
examples
julia> nCk = PhyloNetworks.nchoose1234(5)
6×4 Matrix{Int64}:
0 0 0 0
1 0 0 0
2 1 0 0
3 3 1 0
4 6 4 1
5 10 10 5
julia> PhyloNetworks.quartetrank([1,2,3,4], nCk)
1
julia> PhyloNetworks.quartetrank([3,4,5,6], nCk)
15
PhyloNetworks.readcsvtoarray
— Methodreadcsvtoarray(dat::DataFrame)
readcsvtoarray(filename::String)
Read a CSV table containing both species names and data, create two separate arrays: one for the species names, a second for the data, in a format that parsimonyGF
needs.
Warning:
- it will try to find a column 'taxon' or 'species' for the taxon names. If none found, it will assume the taxon names are in column 1.
- will use all other columns as characters
PhyloNetworks.readfastatoarray
— Functionreadfastatoarray(filename::AbstractString, sequencetype=BioSequences.LongDNA{4})
Read a fasta-formatted file. Return a tuple species, sequences
where species
is a vector of Strings with identifier names, and sequences
is a vector of BioSequences, each of type sequencetype
(DNA by default).
Warnings:
- assumes a semi-sequential format, not interleaved. More specifically, each taxon name should appear only once. For this one time, the corresponding sequence may be broken across several lines though.
- fails if all sequences aren't of the same length
PhyloNetworks.readnewick_nodename
— Methodreadnewick_nodename(s::IO, c::Char, net, numLeft)
Auxiliary function to read a taxon name during newick parsing. output: tuple (number, name, pound_boolean)
Names may have numbers: numbers are treated as strings. Accepts #
as part of the name (but excludes it from the name), in which case pound_boolean
is true. #
is used in extended newick to flag hybrid nodes.
Nexus-style comments following the node name, if any, are read and ignored.
PhyloNetworks.readnewick_subtree!
— Methodreadnewick_subtree!(s::IO, parentNode, numLeft, net, hybrids)
Recursive helper method for readnewick
: read a subtree from an extended Newick topology. input s
: IOStream/IOBuffer.
Reads additional info formatted as: :length:bootstrap:gamma
. Allows for name of internal nodes without # after closing parenthesis: (1,2)A. Warning if hybrid edge without γ, or if γ (ignored) without hybrid edge
PhyloNetworks.readnexus_assigngammas!
— Methodreadnexus_assigngammas!(net, d::Dict)
Assign d[i] as the .gamma
value of the minor parent edge of hybrid "Hi", if this hybrid node name is found, and if its minor parent doesn't already have a non-missing γ. See readnexus_extractgamma
PhyloNetworks.readnexus_comment
— Methodreadnexus_comment(s::IO, c::Char)
Read (and do nothing with) nexus-style comments: [& ... ]
Assumption: 'c' is the next character to be read from s. Output: nothing.
Comments can appear after (or instead of) a node or leaf name, before or after an edge length, and after another comment.
PhyloNetworks.readnexus_extractgamma
— Methodreadnexus_extractgamma(nexus_string)
Extract γ from comments and return a dictionary hybrid number ID => γ, from one single phylogeny given as a string. The output from BEAST2 uses this format for reticulations at minor edges, as output by bacter (Vaughan et al. 2017):
#11[&conv=0, relSize=0.08, ...
or as output by SpeciesNetwork (Zhang et al. 2018):
#H11[&gamma=0.08]
The function below assumes that the "H" was already added back if not present already (from bacter), like this:
#H11[&conv=0, relSize=0.19, ...
The bacter format is tried first. If this format doesn't give any match, then the SpeciesNetwork format is tried next. See readnexus_assigngammas!
.
PhyloNetworks.readnexus_translatetable
— Methodreadnexus_translatetable(io)
Read translate table from IO object io
, whose first non-empty line should contain "translate". Then each line should have "number name" and the end of the table is indicated by a ;. Output tuple:
- line that was last read, and is not part of the translate table, taken from
io
- translate: boolean, whether a table was successfully read
- id2name: dictionary mapping number to name.
PhyloNetworks.removeHybrid!
— MethodremoveHybrid!(net::Network, n::Node)
Delete a hybrid node n
from net.hybrid
, and update net.numHybrid
. The actual node n
is not deleted. It is kept in the full list net.node
. Very internal function, used by deletehybridedge!
and others.
PhyloNetworks.remove_edgelengthsgammas!
— Methodremove_edgelengthsgammas!(net::HybridNetwork)
Reset all edge lengths and all hybrid edge γs to be missing (coded as -1.0).
PhyloNetworks.resetedgenumbers!
— Functionresetedgenumbers!(net::HybridNetwork, verbose=true)
Check that edge numbers of net
are consecutive numbers from 1 to the total number of edges. If not, reset the edge numbers to be so.
PhyloNetworks.resetnodenumbers!
— Methodresetnodenumbers!(net::HybridNetwork; checkpreorder=true, type=:ape)
Change internal node numbers of net
to consecutive numbers from 1 to the total number of nodes.
keyword arguments:
type
: default is:ape
, to get numbers that satisfy the conditions assumed by theape
R package: leaves are 1 to n, the root is n+1, and internal nodes are higher consecutive integers. If:postorder
, nodes are numbered in post-order, with leaves from 1 to n (and the root last). If:internalonly
, leaves are unchanged. Only internal nodes are modified, to take consecutive numbers from (max leaf number)+1 and up. With this last option, the post-ordering of nodes is by-passed.checkpreorder
: if false, theischild1
edge field and thenet.vec_node
network field are supposed to be correct (to get nodes in preorder). This is not needed whentype=:internalonly
.
Examples
julia> net = readnewick("(A,(B,(C,D)));");
julia> PhyloNetworks.resetnodenumbers!(net)
julia> printnodes(net) # first column "node": root is 5
node leaf hybrid name i_cycle edges'numbers
1 true false A -1 1
2 true false B -1 2
3 true false C -1 3
4 true false D -1 4
7 false false -1 3 4 5
6 false false -1 2 5 6
5 false false -1 1 6
julia> net = readnewick("(A,(B,(C,D)));");
julia> PhyloNetworks.resetnodenumbers!(net; type=:postorder)
julia> printnodes(net) # first column "node": root is 7
node leaf hybrid name i_cycle edges'numbers
1 true false A -1 1
2 true false B -1 2
3 true false C -1 3
4 true false D -1 4
5 false false -1 3 4 5
6 false false -1 2 5 6
7 false false -1 1 6
PhyloNetworks.resolvetreepolytomy!
— Functionresolvetreepolytomy!(net::HybridNetwork, n::Node, e1::Edge, e2::Edge)
Create new node v
and new edge n --> v
, detach edges e1
and e2
from n
to connect them to v
instead. Return the new edge, whose child is v
.
Warning: assumes that n
is incident to both e1
and e2
. It is best if these edges are originally children of n
because the new edge is directed from n
to the newly created node.
PhyloNetworks.resolvetreepolytomy!
— Methodresolvetreepolytomy!(net::HybridNetwork, n::Node)
Create new nodes and edges from a tree node n
, until it has degree 3. It is best if the network's tree edges are correctly directed beforehand.
PhyloNetworks.rezip_canonical!
— Methodrezip_canonical!(hybridnodes::Vector{Node}, childedges::Vector{Edge},
originallengths::Vector{Float64})
Undo unzip_canonical!
.
PhyloNetworks.samplebootstrap_multiloci
— Methodsamplebootstrap_multiloci(vector of tree lists; seed=0, generesampling=false, row=0)
samplebootstrap_multiloci!(tree list, vector of tree lists; seed=0, generesampling=false, row=0)
Sample bootstrap gene trees, 1 tree per gene. Set the seed with keyword argument seed
, which is 0 by default. When seed=0
, the actual seed is set using the clock. Assumes a vector of vectors of networks (see readmultinewick_files
), each one of length 1 or more (error if one vector is empty, tested in bootsnaq
).
- site resampling: always, from sampling one bootstrap tree from each given list. This tree is sampled at random unless
row>0
(see below). - gene resampling: if
generesampling=true
(default is false), genes (i.e. lists) are sampled with replacement. row=i
: samples the ith bootstrap tree for each gene.row
is turned back to 0 if gene resampling is true.
output: one vector of trees. the modifying function (!) modifies the input tree list and returns it.
PhyloNetworks.setmultiplegammas!
— Methodsetmultiplegammas!(edges::Vector{Edge}, γs::Vector{Float64})
Set the inheritance of the ith edge to the ith γ value, calling setgamma!
.
PhyloNetworks.shrink2cycleat!
— Methodshrink2cycleat!(net::HybridNetwork, minor::Edge, major::Edge, unroot::Bool)
Remove minor
edge then update the branch length of the remaining major
edge. Called by shrink2cycles!
Assumption: minor
and major
do form a 2-cycle. That is, they start and end at the same node.
PhyloNetworks.shrink3cycleat!
— Methodshrink3cycleat!(net::HybridNetwork, hybrid::Node, edge1::Edge, edge2::Edge,
node1::Node, node2::Node, unroot::Bool)
Replace a 3-cycle at a given hybrid
node by a single node, if any. Assumption: edge1
(node1
) and edge2
(node2
) are the parent edges (nodes) of hybrid
. Return true if a 3-cycle is found and removed, false otherwise. There is a 3-cycle if nodes 1 & 2 are connected, by an edge called e3
below.
There are two cases, with differing effects on the γ inheritance values and branch lengths.
Hybrid case: the 3-cycle is shrunk to a hybrid node, which occurs if either node 1 or 2 is a hybrid node (that is, e3 is hybrid). If e3 goes from node 1 to node 2, the 3-cycle (left) is shrunk as on the right:
\eA /eB \eA /eB
1--e3->2 γ1+γ2γ3 \ / γ2(1-γ3)
\ / hybrid
γ1\ /γ2
hybrid
with new branch lengths: new tA = tA + (γ1.t1 + γ2γ3.(t2+t3))/(γ1+γ2γ3), new tB = tB + t2, provided that γ1, γ2=1-γ1, and γ3 are not missing. If one of them is missing then γ1 and γ2 remain as is, and e3 is deleted naively, such that new tA = tA + t1 and new tB = tB + t2. If γ's are not missing but one of t1,t2,t3 is missing, then the γ's are updated to γ1+γ2γ3 and γ2(1-γ3), but t's are update naively.
Tree case: the 3-cycle is shrunk to a tree node, which occurs if node 1 & 2 are both tree nodes (that is, e3 is a tree edge). If eC is the child edge of hybrid
, the 3-cycle (left) is shrunk as on the right:
\eA \eA
1--e3--2--eB-- \
\ / n--eB--
γ1\ /γ2 |
hybrid |eC
|
|eC
with new branch lengths: new tA = tA + γ2.t3, new tB = tB + γ1.t3, new tC = tC + γ1.t1 + γ2.t2, provided that γ1, γ2=1-γ1, t1, t2 and t3 are not missing. If one is missing, then e1 is deleted naively such that tB is unchanged, new tC = tC + t2 and new tA = tA + t3.
PhyloNetworks.shrinkedge!
— Methodshrinkedge!(net::HybridNetwork, edge::Edge)
Delete edge
from net, provided that it is a non-external tree edge. Specifically: delete its child node (as determined by ischild1
) and connect all edges formerly incident to this child node to the parent node of edge
, thus creating a new polytomy, unless the child was of degree 2.
Warning: it's best for ischild1
to be in sync with the root for this. If not, the shrinking may fail (if edge
is a tree edge but its "child" is a hybrid) or the root may change arbitrarily (if the child of edge
is the root).
Output: true if the remaining node (parent of edge
) becomes a hybrid node with more than 1 child after the shrinking; false otherwise (e.g. no polytomy was created, or the new polytomy is below a tree node)
PhyloNetworks.sort_stringasinteger!
— Methodsort_stringasinteger!(taxa)
Sort a vector of strings taxa
, numerically if elements can be parsed as an integer, alphabetically otherwise.
PhyloNetworks.startingBL!
— FunctionstartingBL!(net::HybridNetwork,
trait::AbstractVector{Vector{Union{Missings.Missing,Int}}},
siteweight::AbstractVector{Float64}=ones(length(trait[1])))
Calibrate branch lengths in net
by minimizing the mean squared error between the JC-adjusted pairwise distance between taxa, and network-predicted pairwise distances, using calibratefrompairwisedistances!
. The network is not forced to be time-consistent nor ultrametric. To avoid one source of non-identifiability, the network is "zipped" by forcing minor hybrid edges to have length 0.
siteweight[k]
gives the weight of site (or site pattern) k
(default: all 1s).
Assumptions:
- all species have the same number of traits (sites):
length(trait[i])
constant trait[i]
is for leaf withnode.number = i
innet
, andtrait[i][j] = k
means that leaf numberi
has state indexk
for traitj
. These indices are those used in a substitution model (seePhyloTraits.jl
): kth value ofgetlabels(model)
.
Other:
- Hamming distances are calculated for each pair, ignoring any site in which one of 2 values is missing. The total number of differences is then divided by the total weight of all sites, ignoring that some of them may have been missing for the pair. This is an inexact rescaling of the hamming distance, assuming a small proportion of missing values for each pair.
- Theoretically, Hamming distances are < 0.75 with four states, or < (n-1)/n for n states. If not, all pairwise hamming distances are scaled by
.75/(m*1.01)
wherem
is the maximum observed hamming distance, to make them all < 0.75. The JC correction calculates: dJC = - 0.75 log(1-dhamming/0.75) for n=4 states - At the end, any edge length smaller than 1.0e-10 is reset to 0.0001, to avoid the 0 boundary.
PhyloNetworks.startree_newick
— Functionstartree_newick(n, l)
String for the Newick parenthetical description of the star tree with n
tips, and all branch lengths equal to l
.
PhyloNetworks.symmetricnet_newick
— Functionsymmetricnet_newick(n::Int, h::Int, γ::Real)
Create a string for a symmetric network with 2^n tips, numbered from 1 to 2^n, with a symmetric major tree, whose branch lengths are all equal. 2^(n-h) hybrids are added from level h to h-1 "symmetrically". The network is time-consistent and ultrametric, with a total height of 1.
PhyloNetworks.symmetricnet_newick
— Methodsymmetricnet_newick(n::Int, i::Int, j::Int, γ::Real, l::Real)
Newick string for a network with a symmetric major tree with 2^n tips, numbered from 1 to 2^n. All the branch lengths of the major tree are set to l
. One hybrid branch, going from level i to level j is added, cutting in half each initial edge in the tree. The new edge has length l
and inheritance γ
.
PhyloNetworks.symmetrictree_newick
— Functionsymmetrictree_newick(n::Int, ell::Real, i=1)
String for the Newick parenthetical description of a symmetric tree with 2^n tips, numbered from i to i+2^n-1. All branch lengths are set equal to ell
. The tree can be created later by reading the string with readnewick
.
PhyloNetworks.synchronizepartnersdata!
— Methodsynchronizepartnersdata!(e::Edge, n::Node)
Synchronize γ and ismajor for edges e
and its partner, both hybrid edges with the same child n
:
- if one γ is missing and the other is not: set the missing γ to 1 - the other
- γ's should sum up to 1.0
- update
ismajor
to match the γ information: the major edge is the one with γ > 0.5.
Warnings: does not check that e
is a hybrid edge, nor that n
is the child of e
.
PhyloNetworks.timeinconsistency_average
— Methodtimeinconsistency_average(
candidate_nodeheights,
missingparent_heights,
isconsistent::Ref{Bool},
parent_edges,
nonmissingparent_j;
atol::Real=1e-8, rtol::Real=√eps(Float64))
Calculate the γ-weighted average node height of a given hybrid node h
, based on the candidate node heights from its parents with non-missing edge lengths.
- If some of these parent edges have a missing γ, then equal weights are used and a warning is issued.
- If the hybrid node's average height (calculate from non-missing lengths) turns out to be lower than one of the parent's height with a missing length (such that this parent edge would need to be assigned a negative value) then a warning is issued.
Outcome:
- update
isconsistent
to false if the candidate node heights are not all equal or if some missing edge length would have to be assigned a negative value to make the network time-consistent - returns
(true, nodeheight)
Assumption: candidate_nodeheight
is not empty, that is, the node has at least one parent edge with a non-missing length.
See also _getnodeheights
and getnodeheights_average
PhyloNetworks.timeinconsistency_check
— Functiontimeinconsistency_error(
candidate_nodeheights,
missingparent_heights,
args...;
atol::Real=1e-8, rtol::Real=√eps(Float64))
timeinconsistency_check
Check that all candidate node heights are approximately equal to one another, and that this shared value nodeheight
is higher (farther from the root) than the height of parents connected by edge of missing length missingparent_heights
to ensure that these edge lengths would be assigned non-negative values.
If any of these conditions is not met, timeinconsistency_error
throws an error. Otherwise, it returns (true, nodeheight)
where true
means that the network is (or could be) time-consistent at the node being considered. timeinconsistency_check
returns (is_timeconsistent, nodeheight)
but does not throw an error the is_timeconsistent
if false (for either reason).
Assumption: candidate_nodeheight
is not empty, that is, the node has at least one parent edge with a non-missing length.
See also _getnodeheights
PhyloNetworks.timeinconsistency_error
— Methodtimeinconsistency_error(
candidate_nodeheights,
missingparent_heights,
args...;
atol::Real=1e-8, rtol::Real=√eps(Float64))
timeinconsistency_check
Check that all candidate node heights are approximately equal to one another, and that this shared value nodeheight
is higher (farther from the root) than the height of parents connected by edge of missing length missingparent_heights
to ensure that these edge lengths would be assigned non-negative values.
If any of these conditions is not met, timeinconsistency_error
throws an error. Otherwise, it returns (true, nodeheight)
where true
means that the network is (or could be) time-consistent at the node being considered. timeinconsistency_check
returns (is_timeconsistent, nodeheight)
but does not throw an error the is_timeconsistent
if false (for either reason).
Assumption: candidate_nodeheight
is not empty, that is, the node has at least one parent edge with a non-missing length.
See also _getnodeheights
PhyloNetworks.timeinconsistency_majortree
— Methodtimeinconsistency_majortree(
candidate_nodeheights,
missingparent_heights,
isconsistent::Ref{Bool},
parent_edges,
nonmissingparent_j;
atol::Real=1e-8, rtol::Real=√eps(Float64))
Calculate node height of a given hybrid node h
, based on the its major parent node height, if its major parent edge has a non-missing length. If missing, then the non-missing edge length with the largest γ is used and a warning is issued. If all parent edges have missing γ values then an error is thrown.
Outcome:
- update
isconsistent
to false if the candidate node heights are not all equal or if some missing edge length would have to be assigned a negative value to make the network time-consistent - returns
(true, nodeheight)
Assumption: candidate_nodeheight
is not empty, that is, the node has at least one parent edge with a non-missing length.
See also _getnodeheights
and getnodeheights_average
PhyloNetworks.traversal_postorder
— Methodtraversal_postorder(nodes, init_function,
tip_function, internalnode_function, parameters...)
postorder_nodeupdate!(node_index, nodes, output_array,
tip_function, internalnode_function, parameters...)
Generic tool to apply a post-order (or reverse topological ordering) algorithm, acting on a matrix where rows & columns correspond to nodes. Used by descendencematrix
.
output: matrix output_array
.
arguments:
nodes
: array of nodes in the network, pre-ordered, typically the internalnet.vec_node
after applyingpreorder!(net)
. This array is traversed in reverse order.output_array
: output object, of typeMatrix
, namedV
belowinit_function
: to initialize the output array, taking(nodes, parameters...)
as argumentstip_function
: to do whatever needs to be done toV
at a leaf node, using(V, tip_index, parameters...)
as argumentsinternalnode_function
: to do whatever needs to be done toV
at an internal node, using arguments(V, node_index, childnodes_index_vector, childedges_vector, parameters...)
The last 3 functions should return a boolean. If true: traversal continues. If false: traversal is stopped, that is, the next node is not processed.
See also traversal_preorder
, and traversalupdate_default!
for a default function that does nothing to V
and returns true, with an adequate signature to be used here.
PhyloNetworks.traversal_preorder!
— Functiontraversal_preorder(nodes, init_function, root_function, tree_node_function,
hybrid_node_function, parameters...)
traversal_preorder!(nodes, output_array, root_function, tree_node_function,
hybrid_node_function, parameters...)
preorder_nodeupdate!(node_index, nodes, output_array, root_function, tree_node_function,
hybrid_node_function, parameters...)
Generic tool to apply a pre-order (or topological ordering) algorithm. Used by sharedpathmatrix
and by pairwisetaxondistancematrix
, for example. preorder_nodeupdate!
is a helper that calls the root / tree node / hybrid node function as appropriate.
output: array object output_array
.
arguments:
nodes
: array of nodes in the network, pre-ordered, typically the internalnet.vec_node
after applyingpreorder!(net)
output_array
: output object, of typeAbstractArray
, namedV
belowinit_function
: to initialize the output array, taking(nodes, parameters...)
as argumentsroot_function
: to do whatever needs to be done toV
at the root, using(V, rootnode_index, parameters...)
as argumentstree_node_function
: to do whatever needs to be done toV
at a tree node, using(V, treenode_index, parentnode_index, parentedge, parameters...)
as argumentshybrid_node_function
: to do whatever needs to be done toV
at a hybrid node, using arguments(V, hybnode_index, parentnodes_index_vector, parentedges_vector, parameters...)
The last 3 functions should return a boolean. If true: traversal continues. If false: traversal is stopped, that is, the next node is not processed.
See also traversal_postorder
, and traversalupdate_default!
for a default function that does nothing to V
and returns true, with an adequate signature to be used here.
PhyloNetworks.traversal_preorder
— Methodtraversal_preorder(nodes, init_function, root_function, tree_node_function,
hybrid_node_function, parameters...)
traversal_preorder!(nodes, output_array, root_function, tree_node_function,
hybrid_node_function, parameters...)
preorder_nodeupdate!(node_index, nodes, output_array, root_function, tree_node_function,
hybrid_node_function, parameters...)
Generic tool to apply a pre-order (or topological ordering) algorithm. Used by sharedpathmatrix
and by pairwisetaxondistancematrix
, for example. preorder_nodeupdate!
is a helper that calls the root / tree node / hybrid node function as appropriate.
output: array object output_array
.
arguments:
nodes
: array of nodes in the network, pre-ordered, typically the internalnet.vec_node
after applyingpreorder!(net)
output_array
: output object, of typeAbstractArray
, namedV
belowinit_function
: to initialize the output array, taking(nodes, parameters...)
as argumentsroot_function
: to do whatever needs to be done toV
at the root, using(V, rootnode_index, parameters...)
as argumentstree_node_function
: to do whatever needs to be done toV
at a tree node, using(V, treenode_index, parentnode_index, parentedge, parameters...)
as argumentshybrid_node_function
: to do whatever needs to be done toV
at a hybrid node, using arguments(V, hybnode_index, parentnodes_index_vector, parentedges_vector, parameters...)
The last 3 functions should return a boolean. If true: traversal continues. If false: traversal is stopped, that is, the next node is not processed.
See also traversal_postorder
, and traversalupdate_default!
for a default function that does nothing to V
and returns true, with an adequate signature to be used here.
PhyloNetworks.traversalupdate_default!
— Methodtraversalupdate_default!(::AbstractArray, ::Int, args...)
Returns true
. With its signature, this function can be used as a default update function at any node (root/tree/hybrid) in traversal_preorder
or traversal_postorder
, if we need a function that does nothing and keeps the traversal going.
PhyloNetworks.unzip_canonical!
— Methodunzip_canonical!(net::HybridNetwork)
Unzip all reticulations: set the length of child edge to 0, and increase the length of both parent edges by the original child edge's length, to obtain the canonical version of the network according to Pardi & Scornavacca (2015).
Output: vector of hybrid node in postorder, vector of child edges whose length is constrained to be 0, and vector of their original branch lengths to re-zip if needed using rezip_canonical!
.
Assumption: net.hybrid
is correct, but a preordering of all nodes is not assumed.
Note: This unzipping is not as straightforward as it might seem, because of "nested" zippers: when the child of a hybrid node is itself a hybrid node. The unzipping is propagated all the way through.
PhyloNetworks.unzipat_canonical!
— Methodunzipat_canonical!(hyb::Node, childedge::Edge)
Unzip the reticulation a node hyb
. See unzip_canonical!
. Warning: no check that hyb
has a single child.
Output: constrained edge (child of hyb
) and its original length.
PhyloNetworks.updateconstraintfields!
— Methodupdateconstraintfields!(constraints::Vector{TopologyConstraint}, net::HybridNetwork)
Update fields stem edge
and crown node
to match the given net
.
Assumes that the constraints are still met in net
, and that nodes & edges are numbered identically in net
as in the network used to create all constraints
.
fixit: remove the assumption that constraints are still met, since an NNI near the crown of a constrained clade might change the crown node and / or the stem edge (u and v exchange).
PhyloNetworks.updateconstraints!
— Methodupdateconstraints!(constraints::Vector{TopologyConstraint}, net::HybridNetwork)
Update the set taxonnum
in each constraint, assuming that the stem edge and the crown node are still correct, and that their descendants are still correct. May be needed if the node and edge numbers were modified by resetnodenumbers!
or resetedgenumbers!
.
Warning: does not check that the names of leaves with numbers in taxonnum
are taxonnames
.
julia> net = readnewick("(((2a,2b),(((((1a,1b,1c),4),(5)#H1),(#H1,(6,7))))#H2),(#H2,10));");
julia> c_species1 = PhyloNetworks.TopologyConstraint(0x01, ["1a","1b","1c"], net)
Species constraint, on tips: 1a, 1b, 1c
stem edge number 7
crown node number -9
julia> c_species1.taxonnums
Set{Int64} with 3 elements:
5
4
3
julia> c_clade145 = PhyloNetworks.TopologyConstraint(0x02, ["1a","1b","1c","4","5"], net)
Clade constraint, on tips: 1a, 1b, 1c, 4, 5
stem edge number 12
crown node number -7
julia> PhyloNetworks.resetnodenumbers!(net)
julia> net.node[4].number = 111;
julia> PhyloNetworks.updateconstraints!([c_species1, c_clade145], net)
julia> c_species1
Species constraint, on tips: 1a, 1b, 1c
stem edge number 7
crown node number 21
julia> c_species1.taxonnums
Set{Int64} with 3 elements:
5
4
111
index
PhyloNetworks.ANode
PhyloNetworks.EdgeT
PhyloNetworks.MatrixTopologicalOrder
PhyloNetworks.Node
PhyloNetworks.Partition
PhyloNetworks.QuartetT
PhyloNetworks.TopologyConstraint
PhyloNetworks.TopologyConstraint
PhyloNetworks.exitnodes_preindex
Base.getindex
PhyloNetworks._getnodeheights
PhyloNetworks.addAlternativeHybridizations!
PhyloNetworks.addHybridBetweenClades!
PhyloNetworks.addhybridedge!
PhyloNetworks.addhybridedge!
PhyloNetworks.addindividuals!
PhyloNetworks.addleaf!
PhyloNetworks.adjacentedges
PhyloNetworks.allowrootbelow!
PhyloNetworks.allowrootbelow!
PhyloNetworks.assignhybridnames!
PhyloNetworks.biconnectedcomponent_entrynodes
PhyloNetworks.biconnectedcomponent_exitnodes
PhyloNetworks.blobinfo
PhyloNetworks.breakedge!
PhyloNetworks.checkNumHybEdges!
PhyloNetworks.check_nonmissing_nonnegative_edgelengths
PhyloNetworks.checkspeciesnetwork!
PhyloNetworks.cleantaxonname
PhyloNetworks.constraintviolated
PhyloNetworks.deleteEdge!
PhyloNetworks.deleteNode!
PhyloNetworks.deletehybridedge!
PhyloNetworks.descendants
PhyloNetworks.directionalconflict
PhyloNetworks.displayednetworks!
PhyloNetworks.edgerelation
PhyloNetworks.entrynode_preindex
PhyloNetworks.fliphybrid!
PhyloNetworks.fliphybrid!
PhyloNetworks.fuseedgesat!
PhyloNetworks.getTipSubmatrix
PhyloNetworks.getconnectingedge
PhyloNetworks.getlengths
PhyloNetworks.hardwiredclusterdistance_unrooted
PhyloNetworks.hybrid3cycle
PhyloNetworks.hybridEdges
PhyloNetworks.hybridEdges
PhyloNetworks.inheritanceweight
PhyloNetworks.initializeweightsfromleaves!
PhyloNetworks.initializeweightsfromleaves_softwired!
PhyloNetworks.isconnected
PhyloNetworks.isdescendant
PhyloNetworks.isdescendant_undirected
PhyloNetworks.ladderpartition
PhyloNetworks.leaststableancestor
PhyloNetworks.majoredgelength
PhyloNetworks.majoredgematrix
PhyloNetworks.makemissing!
PhyloNetworks.mapindividuals
PhyloNetworks.maxParsimonyNet
PhyloNetworks.minorreticulationgamma
PhyloNetworks.minorreticulationlength
PhyloNetworks.minorreticulationmatrix
PhyloNetworks.moveroot!
PhyloNetworks.nchoose1234
PhyloNetworks.nj!
PhyloNetworks.nnimax
PhyloNetworks.norootbelow!
PhyloNetworks.number_exitnodes
PhyloNetworks.pairwisetaxondistance_gradient
PhyloNetworks.parsenewick_edgedata!
PhyloNetworks.parsenewick_getfloat!
PhyloNetworks.parsenewick_hybridnode!
PhyloNetworks.parsenewick_remainingsubtree!
PhyloNetworks.parsenewick_treenode!
PhyloNetworks.parsimonyGF_bottomup!
PhyloNetworks.parsimonyfitch
PhyloNetworks.parsimonyfitch_bottomup!
PhyloNetworks.parsimonyfitch_summary
PhyloNetworks.parsimonyfitch_topdown!
PhyloNetworks.parsimonysoftwired_bottomup!
PhyloNetworks.postorder_nodeupdate!
PhyloNetworks.preorder_nodeupdate!
PhyloNetworks.problem4cycle
PhyloNetworks.process_biconnectedcomponents!
PhyloNetworks.quartetdata_columnnames
PhyloNetworks.quartetrank
PhyloNetworks.readcsvtoarray
PhyloNetworks.readfastatoarray
PhyloNetworks.readnewick_nodename
PhyloNetworks.readnewick_subtree!
PhyloNetworks.readnexus_assigngammas!
PhyloNetworks.readnexus_comment
PhyloNetworks.readnexus_extractgamma
PhyloNetworks.readnexus_translatetable
PhyloNetworks.removeHybrid!
PhyloNetworks.remove_edgelengthsgammas!
PhyloNetworks.resetedgenumbers!
PhyloNetworks.resetnodenumbers!
PhyloNetworks.resolvetreepolytomy!
PhyloNetworks.resolvetreepolytomy!
PhyloNetworks.rezip_canonical!
PhyloNetworks.samplebootstrap_multiloci
PhyloNetworks.setmultiplegammas!
PhyloNetworks.shrink2cycleat!
PhyloNetworks.shrink3cycleat!
PhyloNetworks.shrinkedge!
PhyloNetworks.sort_stringasinteger!
PhyloNetworks.startingBL!
PhyloNetworks.startree_newick
PhyloNetworks.symmetricnet_newick
PhyloNetworks.symmetricnet_newick
PhyloNetworks.symmetrictree_newick
PhyloNetworks.synchronizepartnersdata!
PhyloNetworks.timeinconsistency_average
PhyloNetworks.timeinconsistency_check
PhyloNetworks.timeinconsistency_error
PhyloNetworks.timeinconsistency_majortree
PhyloNetworks.traversal_postorder
PhyloNetworks.traversal_preorder
PhyloNetworks.traversal_preorder!
PhyloNetworks.traversalupdate_default!
PhyloNetworks.unzip_canonical!
PhyloNetworks.unzipat_canonical!
PhyloNetworks.updateconstraintfields!
PhyloNetworks.updateconstraints!