public documentation

Documentation for PhyloNetworks's public (exported) interface.

functions & types

PhyloNetworks.biconnectedcomponents — Function

biconnectedcomponents(network, ignoreTrivial=false)

Calculate biconnected components (aka "blobs") using Tarjan's algorithm.

Output: array of arrays of edges.

the length of the array is the number of blobs
each element is an array of all the edges inside a given blob.

These blobs are returned in post-order, but within a blob, edges are not necessarily sorted in topological order. If ignoreTrivial is true, trivial components (of a single edge) are not returned. The network is assumed to be connected.

Warnings: for nodes, fields intn2 and intn1 are modified during the algorithm. They are used to store the node's "index" (time of visitation), "lowpoint", and the node's "parent", as defined by the order in which nodes are visited. For edges, field boole2 is modified, to store whether the edge has been already been visited or not.

References:

p. 153 of Tarjan (1972). Depth first search and linear graph algorithms, SIAM Journal on Computing, 1(2):146-160
on geeksforgeeks, there is an error (as of 2018-01-30): elif v != parent[u] and low[u] > disc[v]: (python version) should be replaced by elif v != parent[u] and disc[u] > disc[v]:
nice explanation at this url

PhyloNetworks.biconnectedcomponents — Method

biconnectedcomponents(node, index, S, blobs, ignoreTrivial)

Helper recursive function starting at a node (not a network). index is an array containing a single integer, thus mutable: order in which nodes are visited.

PhyloNetworks.blobdecomposition — Method

blobdecomposition!(network)
blobdecomposition(network)

Find blobs using biconnectedcomponents; find their roots using blobinfo; create a forest in the form of a disconnected network (for efficiency), by deconnecting the root of each non-trivial blob from its parent. The root of each blob corresponds to a new leaf (in another tree of the forest): the number of the blob's root is given to the newly created leaf.

The first (bang) version modifies the network and returns the array of blob roots. The second version copies the network then returns a tuple: the forest and the array of blob roots.

Warnings:

the forest is represented by a single HybridNetwork object, on which most functions don't work (like writenewick, plotting etc.) because the network is disconnected (to make the forest). Revert back to low-level functions, e.g. printedges and printnodes.
see biconnectedcomponents for node attributes modified during the algorithm.

PhyloNetworks.calibratefrompairwisedistances! — Method

calibratefrompairwisedistances!(net, distances::Matrix{Float64},
    taxon_names::Vector{<:AbstractString})

Calibrate the network to match (as best as possible) input pairwise distances between taxa, such as observed from sequence data. taxon_names should provide the list of taxa, in the same order in which they they are considered in the distances matrix. The optimization criterion is the sum of squares between the observed distances, and the distances from the network (weighted average of tree distances, weighted by γ's). The network's edge lengths are modified.

Warning: for many networks, mutiple calibrations can fit the pairwise distance data equally well (lack of identifiability). This function will output one of these equally good calibrations.

optional arguments (default):

checkpreorder (true)
forceMinorLength0 (false) to force minor hybrid edges to have a length of 0
ultrametric (true) to force the network to be
- time-consistent: all paths from the root to a given node must have the same length, so the age of this node is well-defined, and
- ultrametric: all tips are at the same distance from the root, so have the same age.
NLoptMethod (:LD_MMA) for the optimization algorithm. Other options include :LN_COBYLA (derivative-free); see NLopt package.
tolerance values to control when the optimization is stopped: ftolRel (1e-12), ftolAbs (1e-10) on the criterion, and xtolRel (1e-10), xtolAbs (1e-10) on branch lengths / divergence times.
verbose (false)

PhyloNetworks.checkroot! — Method

checkroot!(net)
checkroot!(net::HybridNetwork, membership::Dict{Node, Int})

Set the root of net to an appropriate node and update the edges containroot field appropriately, using the membership output by treeedgecomponents. A node is appropriate to serve as root if it belongs in the root tree-edge component, that is, the root of the tree-edge component graph.

If the current root is appropriate, it is left as is. The direction of edges (via ischild1) is also left as is, assuming it was in synch with the existing root.
Otherwise, the root is set to the first appropriate node in net.node, that is not a leaf. Then edges are directed away from this root.

A RootMismatch error is thrown if net is not a valid semidirected phylogenetic network (i.e. it is not possible to root the network in a way compatible with the given hybrid edges).

Output: the membership ID of the root component. The full set of nodes in the root component can be obtained as shown below. Warning: only use the output component ID after calling the second version checkroot!(net, membership).

julia> net = readnewick("(#H1:::0.1,#H2:::0.2,(((b)#H1)#H2,a));");

julia> membership = treeedgecomponents(net);

julia> rootcompID = checkroot!(net, membership);

julia> rootcomp = keys(filter(p -> p.second == rootcompID, membership));

julia> sort([n.number for n in rootcomp]) # number of nodes in the root component
3-element Vector{Int64}:
 -3
 -2
  4

PhyloNetworks.cladewiseorder! — Method

cladewiseorder!(net::HybridNetwork)

Update the internal attribute net.vec_int1. Used for plotting the network. In the major tree, all nodes in a given clade are consecutive. On a tree, this function also provides a pre-ordering of the nodes. The edges' direction needs to be correct before calling cladewiseorder!, using directedges!

PhyloNetworks.countquartetsintrees — Function

countquartetsintrees(trees [, taxonmap]; which=:all, weight_byallele=true)

Calculate the quartet concordance factors (CF) observed in the trees vector. If present, taxonmap should be a dictionary that maps each allele name to it's species name. To save to a file, first convert to a data frame using tablequartetCF. When which=:all, quartet CFs are calculated for all 4-taxon sets. (Other options are not implemented yet.)

The algorithm runs in O(mn⁴) where m is the number of trees and n is the number of tips in the trees.

CFs are calculated at the species level only, that is, considering 4-taxon sets made of 4 distinct species, even if the gene trees may have multiple alleles from the same species. For 4 distinct species a,b,c,d, all alleles from each species (a etc.) will be considered to calculate the quartet CF.

By default, each gene has a weight of 1. So if there are n_a alleles from a, n_b alleles from b etc. in a given gene, then each set of 4 alleles has a weight of 1/(n_a n_b b_c n_c) in the calculation of the CF for a,b,c,d. With option weight_byallele=true, then each set of 4 alleles is given a weight of 1 instead. This inflates the total number of sets used to calculate the quartet CFs (to something larger than the number of genes). This may also affect the CF values if the number of alleles varies across genes: genes with more alleles will be given more weight.

examples

julia> tree1 = readnewick("(E,(A,B),(C,D),O);"); tree2 = readnewick("(((A,B),(C,D)),E);");

julia> q,t = countquartetsintrees([tree1, tree2]);
Reading in trees, looking at 15 quartets in each...
0+--+100%
  **

julia> t # taxon order: t[i] = name of taxon number i
6-element Vector{String}:
 "A"
 "B"
 "C"
 "D"
 "E"
 "O"

julia> length(q) # 15 four-taxon sets on 6 taxa
15

julia> q[1] # both trees agree on AB|CD: resolution 1
4-taxon set number 1; taxon numbers: 1,2,3,4
data: [1.0, 0.0, 0.0, 2.0]

julia> q[8] # tree 2 is missing O (taxon 6), tree 1 wants resolution 3: AO|CD
4-taxon set number 8; taxon numbers: 1,3,4,6
data: [0.0, 0.0, 1.0, 1.0]

julia> q[11] # tree 1 has ACEO unresolved, and tree 2 is missing O: no data for this quartet
4-taxon set number 11; taxon numbers: 1,3,5,6
data: [0.0, 0.0, 0.0, 0.0]

In the next example, each tree has 2 individuals from population A.

julia> tree1 = readnewick("(E,(a1,B),(a2,D),O);"); tree2 = readnewick("(((a1,a2),(B,D)),E);");

julia> q,t = countquartetsintrees([tree1, tree2], Dict("a1"=>"A", "a2"=>"A"); showprogressbar=false);

julia> t
5-element Vector{String}:
 "A"
 "B"
 "D"
 "E"
 "O"

julia> q[1] # tree 1 has discordance: a1B|DE and a2D|BE. tree 2 has AE|BD for both alleles of A
4-taxon set number 1; taxon numbers: 1,2,3,4
data: [0.25, 0.25, 0.5, 2.0]

julia> q[3] # tree 2 is missing O (taxon 5), and a2 is unresolved in tree 1. There's only a1B|EO
4-taxon set number 3; taxon numbers: 1,2,4,5
data: [1.0, 0.0, 0.0, 0.5]

Next we show how to convert these objects to a table using tablequartetCF. The output is a NamedTuple. It can be saved later to a DataFrame for example, using option copycols=false to avoid copying the columns (which can be very large if there are many 4-taxon sets). Data frames are easier to visualize, filter etc., but performance can be better on named tuples.

julia> nt = tablequartetCF(q,t); # named tuple

julia> using DataFrames

julia> df = DataFrame(nt, copycols=false); # convert to a data frame, without copying the column data

julia> show(df, allcols=true) # data frames are displayed much more nicely than named tuples
5×9 DataFrame
 Row │ qind   t1      t2      t3      t4      CF12_34  CF13_24  CF14_23  ngenes  
     │ Int64  String  String  String  String  Float64  Float64  Float64  Float64 
─────┼───────────────────────────────────────────────────────────────────────────
   1 │     1  A       B       D       E          0.25     0.25      0.5      2.0
   2 │     2  A       B       D       O          0.5      0.5       0.0      1.0
   3 │     3  A       B       E       O          1.0      0.0       0.0      0.5
   4 │     4  A       D       E       O          1.0      0.0       0.0      0.5
   5 │     5  B       D       E       O          0.0      0.0       0.0      0.0

julia> # using CSV; CSV.write(nt, "filename.csv");

Note that CSV.write can take a data frame or a named tuple as input, to write the table to a file.

Finally, the example below shows the effect of using weight_byallele=true when caculating quartet concordance factors from gene trees with multiple alleles per population, and of filtering out 4-taxon sets with no informative genes with keepQwithoutgenes=false when converting to a table.

julia> tree2 = readnewick("((A,(B,D)),E);");

julia> q,t = countquartetsintrees([tree1, tree2], Dict("a1"=>"A", "a2"=>"A"); weight_byallele=true);
Reading in trees, looking at 5 quartets in each...
0+--+100%
  **

julia> nt = tablequartetCF(q,t; keepQwithoutgenes=false); # qind=5 excluded: 0 genes

julia> show(DataFrame(nt, copycols=false), allcols=true)
4×9 DataFrame
 Row │ qind   t1      t2      t3      t4      CF12_34   CF13_24   CF14_23   ngenes  
     │ Int64  String  String  String  String  Float64   Float64   Float64   Float64 
─────┼──────────────────────────────────────────────────────────────────────────────
   1 │     1  A       B       D       E       0.333333  0.333333  0.333333      3.0
   2 │     2  A       B       D       O       0.5       0.5       0.0           2.0
   3 │     3  A       B       E       O       1.0       0.0       0.0           1.0
   4 │     4  A       D       E       O       1.0       0.0       0.0           1.0

PhyloNetworks.deleteaboveLSA! — Function

deleteaboveLSA!(net, preorder=true)

Delete edges and nodes above (ancestral to) the least stable ancestor (LSA) of the leaves in net. See leaststableancestor for the definition of the LSA. Output: modified network net.

PhyloNetworks.deletehybridthreshold! — Function

deletehybridthreshold!(net::HybridNetwork, threshold::Float64,
                       nofuse=false, unroot=false, multgammas=false,
                       keeporiginalroot=false)

Deletes from a network all hybrid edges with heritability below a threshold gamma. Returns the network.

if threshold<0.5: delete minor hybrid edges with γ < threshold (or with a missing γ, for any threshold > -1.0)
if threshold=0.5: delete all minor hybrid edges (i.e normally with γ < 0.5, if γ non-missing)
nofuse: if true, do not fuse edges and keep original nodes.
unroot: if false, the root will not be deleted if it becomes of degree 2.
multgammas: if true, the modified edges have γ values equal to the proportion of genes that the extracted subnetwork represents. For an edge e in the modified network, the inheritance γ for e is the product of γs of all edges in the original network that have been merged into e.

-keeporiginalroot: if true, the root will be retained even if of degree 1.

Warnings:

by default, nofuse is false, partner hybrid edges are fused with their child edge and have their γ changed to 1.0. If nofuse is true: the γ's of partner hybrid edges are unchanged.
assumes correct ismajor fields, and correct ischild1 fields to update containroot.

PhyloNetworks.deleteleaf! — Method

deleteleaf!(HybridNetwork, leafName::AbstractString; ...)
deleteleaf!(HybridNetwork, Node; ...)
deleteleaf!(HybridNetwork, Integer; index=false, ...)

Delete a node from the network, possibly from its name, number, or index in the network's array of nodes. The first two versions require that the node is a leaf. The third version does not require that the node is a leaf: If it has degree 3 or more, nothing happens. If it has degree 1 or 2, then it is deleted.

keyword arguments

simplify: if true and if deleting the node results in 2 hybrid edges forming a cycle of k=2 nodes, then these hybrid edges are merged and simplified as a single tree edge.

unroot: if true, a root of degree 1 or 2 is deleted. If false, the root is deleted if it is of degree 1 (no root edge is left), but is kept if it is of degree 2. Deleting all leaves in an outgroup clade or grade will leave the ingroup rooted (that is, the new root will be of degree 2).

nofuse: if true, keep nodes (and edges) provided that they have at least one descendant leaf, even if they are of degree 2. This will keep two-cycles (forcing simplify to false). Nodes without any descendant leaves are deleted. If nofuse is false, edges adjacent to degree-2 nodes are fused.

multgammas: if true, the fused edge has γ equal to the product of the hybrid edges that have been fused together, which may result in tree edges with γ<1, or with reticulations in which the two parent γ don't add up to 1.

keeporiginalroot: if true, keep the root even if it is of degree one (forcing unroot to be false).

Warning: does not update edges' containroot nor internal attributes (e.g. those used by SNaQ for level-1 networks). Does not require branch lengths, and designed to work on networks of all levels.

PhyloNetworks.descendencematrix — Method

descendencematrix(net::HybridNetwork; checkpreorder::Bool=true)

Descendence matrix between all the nodes of a network: object D of type MatrixTopologicalOrder in which D[i,j] is the proportion of genetic material in node i that can be traced back to node j. If D[i,j]>0 then i is a descendent of j (and j is an ancestor of i). The network is assumed to be pre-ordered if checkpreorder is false. If checkpreorder is true (default), preorder! is run on the network beforehand.

PhyloNetworks.directedges! — Method

directedges!(net::HybridNetwork; checkMajor::Bool=true)

Updates the edges' attribute ischild1, according to the root placement. Also updates edges' attribute containroot, for other possible root placements compatible with the direction of existing hybrid edges. Relies on hybrid nodes having exactly 1 major hybrid parent edge, but checks for that if checkMajor is true.

Warnings:

Assumes that ischild1 is correct on hybrid edges (to avoid changing the identity of which nodes are hybrids and which are not).
Does not check for cycles (to maintain a network's DAG status)

Returns the network. Throws a 'RootMismatch' Exception if the root was found to conflict with the direction of any hybrid edge.

PhyloNetworks.displayednetworkat! — Function

displayednetworkat!(net::HybridNetwork, node::Node, nofuse=false,
                    unroot=false, multgammas=false)

Delete all the minor hybrid edges, except at input node. The network is left with a single hybridization, and otherwise displays the same major tree as before. If nofuse is true, edges are not fused (degree-2 nodes are kept).

Warning: assume correct ismajor fields.

PhyloNetworks.displayedtrees — Method

displayedtrees(net::HybridNetwork, gamma::Float64; nofuse::Bool=false,
               unroot::Bool=false, multgammas::Bool=false,
               keeporiginalroot::Bool=false)

Extracts all trees displayed in a network, following hybrid edges with heritability >= γ threshold (or >0.5 if threshold=0.5) and ignoring any hybrid edge with heritability lower than γ. Returns an array of trees, as HybridNetwork objects.

nofuse: if true, do not fuse edges (keep degree-2 nodes) during hybrid edge removal. unroot: if false, the root will not be deleted if it becomes of degree 2 unless keeporiginalroot is true. multgammas: if true, the edges in the displayed trees have γ values equal to the proportion of genes that the edge represents, even though all these edges are tree edges. The product of all the γ values across all edges is the proportion of genes that the tree represents. More specifically, edge e in a given displayed tree has γ equal to the product of γs of all edges in the original network that have been merged into e. keeporiginalroot: if true, keep root even if of degree 1.

Warnings:

if nofuse is true: the retained partner hybrid edges have their γ values unchanged, but their ismajor is changed to true
assume correct ismajor attributes.

PhyloNetworks.expectedf2matrix — Method

expectedf2matrix(net::HybridNetwork; preorder::Bool=true)

Matrix of f2 statistics expected from net, assuming that branch lengths in net represent "f2 distance". When data are based on allele frequencies, edge lengths measure drift units scaled by a variance factor p(1-p) depending on the allele frequency p at the root.

The rows and columns correspond to the taxa in the network, in the same order as listed by tiplabels(net).

For background on f-statistics, see for example Patterson et al. 2012 or Lipson 2020.

example

julia> net2 = readnewick("(O:5.5,(((E:1.5)#H1:2.5::0.7,((#H1:0,D:1.5):1.5,((C:1,B:1):1)#H2:1::0.6):1.0):1.0,(#H2:0,A:2):3):0.5);");

julia> f2 = expectedf2matrix(net2)
6×6 Matrix{Float64}:
  0.0   9.95  11.0   9.56  9.56  11.0
  9.95  0.0    5.45  5.95  5.95   8.95
 11.0   5.45   0.0   6.16  6.16  10.0
  9.56  5.95   6.16  0.0   2.0    6.16
  9.56  5.95   6.16  2.0   0.0    6.16
 11.0   8.95  10.0   6.16  6.16   0.0

julia> tiplabels(net2) # order of taxa in rows and columns of f2 matrix above
6-element Vector{String}:
 "O"
 "E"
 "D"
 "C"
 "B"
 "A"

PhyloNetworks.expectedf4table — Method

expectedf4table(net::HybridNetwork;
                showprogressbar::Bool=true,
                preorder::Bool=true)

Calculate the f4 statistics expected from net, assuming that branch lengths in net represent "f2 distance". Output: (q,t) where t is a list of taxa and q is a list of 4-taxon set objects of type PhyloNetworks.QuartetT{datatype}. In each element of q, taxonnumber gives the indices in taxa of the 4 taxa of interest; and data contains the 3 primary expected f4-statistics, for the following 3 ordering of the 4 taxa:

t1,t2|t3,t4   t1,t3|t4,t2   t1,t4|t2,t3.

This output is similar to that of countquartetsintrees, with 4-taxon sets listed in the same alphabetical order (same output t, then same order of 4-taxon sets in q).

For background on f-statistics, see for example Patterson et al. 2012 and Lipson 2020.

f4-statistics are linear combination of f2-statistics:

f4[t1,t2|t3,t4] = (f2[t1,t4] + f2[t2,t3] - f2[t1,t3] - f2[t2,t4])/2

Given a set of 4 taxa, there are 12 ways to order them, but there are only 2 "degrees of freedom" in the associated 12 f4-statistics, thanks to symmetries:

f4[t2,t1|t3,t4] = - f4[t1,t2|t3,t4]
f4[t3,t4|t1,t2] = f4[t1,t2|t3,t4]
f4[t1,t2|t3,t4] + f4[t1,t3|t4,t2] + f4[t1,t4|t2,t3] = 0

example

The first example is a tree: on which some f4 values are expected to be 0.

julia> net0 = readnewick("((D:0.6,((a1:.1,a2:.1):0.1,B:0.2):0.3),C:0.4);");

julia> # using PhyloPlots; plot(net0, showedgelength=true);

julia> f4,t = expectedf4table(net0);
Calculation of expected f4 for 5 4-taxon sets...
0+-----+100%
  *****

julia> show(t)
["B", "C", "D", "a1", "a2"]
julia> show(f4[1].taxonnumber) # taxa numbered 1-4 are: B,C,D,a1
[1, 2, 3, 4]
julia> for q in f4
         println(join(t[q.taxonnumber],",") * ": " * string(round.(q.data, digits=3)))
       end
B,C,D,a1: [-0.3, 0.3, 0.0]
B,C,D,a2: [-0.3, 0.3, 0.0]
B,C,a1,a2: [0.0, -0.1, 0.1]
B,D,a1,a2: [0.0, -0.1, 0.1]
C,D,a1,a2: [0.0, -0.4, 0.4]

The zeros correspond to splits in the tree: Ba1|CD, Ba2|CD, BC|a1a2, BD|a1a2, CD|a1a2. The other values correspond to the internal path length for each split.

Next, we use a network with 2 reticulations, each time between sister species (resulting in 3-cycle blobs).

julia> net = readnewick("(D:1,((C:1,#H25:0):0.1,
        ((((B1:10,B2:1):1.5,#H1:0):10.8,
        ((A1:1,A2:1):0.001)#H1:0::0.5):0.5)#H25:0::0.501):1);");

julia> # plot(net, showedgelength=true);

julia> f4,t = expectedf4table(net, showprogressbar=false);

julia> using DataFrames

julia> df = tablequartetf4(f4, t) |> DataFrame
15×8 DataFrame
 Row │ qind   t1      t2      t3      t4      f4_12_34  f4_13_42  f4_14_23 
     │ Int64  String  String  String  String  Float64   Float64   Float64  
─────┼─────────────────────────────────────────────────────────────────────
   1 │     1  A1      A2      B1      B2           0.0    -4.201     4.201
   2 │     2  A1      A2      B1      C            0.0     2.699    -2.699
   3 │     3  A1      A2      B2      C            0.0     2.699    -2.699
   4 │     4  A1      B1      B2      C           -6.9     6.9       0.0
   5 │     5  A2      B1      B2      C           -6.9     6.9       0.0
   6 │     6  A1      A2      B1      D            0.0     2.699    -2.699
   7 │     7  A1      A2      B2      D            0.0     2.699    -2.699
   8 │     8  A1      B1      B2      D           -6.9     6.9       0.0
   9 │     9  A2      B1      B2      D           -6.9     6.9       0.0
  10 │    10  A1      A2      C       D            0.0    -3.176     3.176
  11 │    11  A1      B1      C       D            0.0    -5.875     5.875
  12 │    12  A2      B1      C       D            0.0    -5.875     5.875
  13 │    13  A1      B2      C       D            0.0    -5.875     5.875
  14 │    14  A2      B2      C       D            0.0    -5.875     5.875
  15 │    15  B1      B2      C       D            0.0   -12.775    12.775

PhyloNetworks.getchild — Method

getchild(edge)
getchild(node)
getchildren(node)

Get child(ren) node(s).

getchild: single child node of edge, or of node after checking that node has a single child.
getchildren: vector of all children nodes of node.

getchildedge(node)

Single child edge of node. Checks that it's a single child.

Warning: these functions rely on correct edge direction, via their ischild1 field.

See also: getparent, getpartneredge, isparentof, hassinglechild.

PhyloNetworks.getchildedge — Function

getchild(edge)
getchild(node)
getchildren(node)

Get child(ren) node(s).

getchild: single child node of edge, or of node after checking that node has a single child.
getchildren: vector of all children nodes of node.

getchildedge(node)

Single child edge of node. Checks that it's a single child.

Warning: these functions rely on correct edge direction, via their ischild1 field.

See also: getparent, getpartneredge, isparentof, hassinglechild.

PhyloNetworks.getchildren — Function

getchild(edge)
getchild(node)
getchildren(node)

Get child(ren) node(s).

getchild: single child node of edge, or of node after checking that node has a single child.
getchildren: vector of all children nodes of node.

getchildedge(node)

Single child edge of node. Checks that it's a single child.

Warning: these functions rely on correct edge direction, via their ischild1 field.

See also: getparent, getpartneredge, isparentof, hassinglechild.

PhyloNetworks.getlevel — Function

getlevel(net, preorder=true, preprocess=false)

Level of net: maximum number of edges to delete within a biconnected component to make it a tree. If the network is bicombining (each hybrid has 2 parents), then the level is the maximum number of hybrid nodes within a biconnected component. Arguments:

preorder to direct edges according to the root and store a pre-ordering of nodes in net.vec_node, if not done earlier on the network (e.g. after a topology modification)
preprocess to recalculate & store the biconnected components with process_biconnectedcomponents!, assuming the preordering was done

PhyloNetworks.getnodeages — Method

getnodeages(net)

vector of node ages in pre-order, as in vec_node.

Warnings: net is assumed to

have been preordered before (to calculate vec_node)
be time-consistent (all paths to the root to a given hybrid have the same length)
be ultrametric (all leaves have the same age: 0)

PhyloNetworks.getnodeheights — Function

getnodeheights(net, checkpreorder::Bool=true)
getnodeheights!(net, checkpreorder::Bool=true)

Vector of node heights, that is: the distance of each node to the root. An error is thrown if the network is not time-consistent. A network is time-consistent if, for any node v, all paths from the root to v have the same length. (It is sufficient to check this condition at hybrid nodes). Ultrametricity is not assumed: tips need not all be at the same distance from the root. If checkpreorder=false, assumes the network has already been preordered with preorder!.

If a tree edge has a missing length (coded as -1), both functions throw an error. In general, there may be an exponential number of ways to assign tree edge lengths that make the network time-consistent.

getnodeheights sends a warning upon finding a missing hybrid edge length, otherwises proceeds as getnodeheights! but without modifying the network. getnodeheights! will attempt to assign values to missing lengths, for hybrid edges only, so as to make the network time-consistent.

If a hybrid edge e has a missing length, getnodeheights! proceeds as follows at its child hybrid node h:

If all of h's parent edges lack a length: the shortest non-negative lengths are assigned to make the network time-consistent at h. In particular, one of the partner edges is assigned length 0, and h is made as old as possible, that is, as close to the root as possible: the reticulation is "zipped-up".
Otherwise: the length of e is set to the unique value that makes the network time-consistent at h, based on the partner edge's length. If this value is negative, then an error is thrown.

Output: vector of node heights, one per node, in the same order as in net.vec_node.

See also: istimeconsistent and getnodeheights_average.

Examples:

julia> net = readnewick("(((C:1,(A:1)#H1:1.5::0.7):1,(#H1:0.3::0.3,E:2.0):2.2):1.0,O:5.2)root;");

julia> # using PhyloPlots; plot(net, useedgelength=true, showedgelength=true, shownodenumber=true); # to see

julia> nodeheight = getnodeheights(net)
9-element Vector{Float64}:
 0.0
 5.2
 1.0
 3.2
 5.2
 2.0
 3.5
 4.5
 3.0

julia> [node.number => (height, node.name) for (height,node) in zip(nodeheight, net.vec_node)]
9-element Vector{Pair{Int64, Tuple{Float64, String}}}:
 -2 => (0.0, "root")
  5 => (5.2, "O")
 -3 => (1.0, "")
 -6 => (3.2, "")
  4 => (5.2, "E")
 -4 => (2.0, "")
  3 => (3.5, "H1")
  2 => (4.5, "A")
  1 => (3.0, "C")

PhyloNetworks.getnodeheights! — Function

getnodeheights(net, checkpreorder::Bool=true)
getnodeheights!(net, checkpreorder::Bool=true)

Vector of node heights, that is: the distance of each node to the root. An error is thrown if the network is not time-consistent. A network is time-consistent if, for any node v, all paths from the root to v have the same length. (It is sufficient to check this condition at hybrid nodes). Ultrametricity is not assumed: tips need not all be at the same distance from the root. If checkpreorder=false, assumes the network has already been preordered with preorder!.

If a tree edge has a missing length (coded as -1), both functions throw an error. In general, there may be an exponential number of ways to assign tree edge lengths that make the network time-consistent.

getnodeheights sends a warning upon finding a missing hybrid edge length, otherwises proceeds as getnodeheights! but without modifying the network. getnodeheights! will attempt to assign values to missing lengths, for hybrid edges only, so as to make the network time-consistent.

If a hybrid edge e has a missing length, getnodeheights! proceeds as follows at its child hybrid node h:

If all of h's parent edges lack a length: the shortest non-negative lengths are assigned to make the network time-consistent at h. In particular, one of the partner edges is assigned length 0, and h is made as old as possible, that is, as close to the root as possible: the reticulation is "zipped-up".
Otherwise: the length of e is set to the unique value that makes the network time-consistent at h, based on the partner edge's length. If this value is negative, then an error is thrown.

Output: vector of node heights, one per node, in the same order as in net.vec_node.

See also: istimeconsistent and getnodeheights_average.

Examples:

julia> net = readnewick("(((C:1,(A:1)#H1:1.5::0.7):1,(#H1:0.3::0.3,E:2.0):2.2):1.0,O:5.2)root;");

julia> # using PhyloPlots; plot(net, useedgelength=true, showedgelength=true, shownodenumber=true); # to see

julia> nodeheight = getnodeheights(net)
9-element Vector{Float64}:
 0.0
 5.2
 1.0
 3.2
 5.2
 2.0
 3.5
 4.5
 3.0

julia> [node.number => (height, node.name) for (height,node) in zip(nodeheight, net.vec_node)]
9-element Vector{Pair{Int64, Tuple{Float64, String}}}:
 -2 => (0.0, "root")
  5 => (5.2, "O")
 -3 => (1.0, "")
 -6 => (3.2, "")
  4 => (5.2, "E")
 -4 => (2.0, "")
  3 => (3.5, "H1")
  2 => (4.5, "A")
  1 => (3.0, "C")

PhyloNetworks.getnodeheights_average — Function

getnodeheights_average(net, checkpreorder::Bool=true; warn=true)

Vector of average node heights, that is: the average distance from the root to each node. The average is a weighted average with weights taken to be the hybrid edges' inheritance values γ, if available. Equal weights are used at hybrid nodes with some parents lacking a γ inheritance value (with a warning).

missing edge lengths:

An error is thrown if a tree edge has a missing edge length.
If all parent hybrid edges have missing lengths at a given hybrid node, then the hybrid node is assumed to be as close to the root as possible, that is, the reticulation is assumed "zipped-up" with one of its hybrid edges of length 0.
If some but not all parent hybrid edges have a missing length, then the average node height is calculated based on the non-missing parents only. If the hybrid node height turns out to be lower than one of the parent's height (such that some missing length would need to be negative) then a warning is issued.

A warning is issued, unless warn=false, if the network is not time-consistent.

See also: istimeconsistent, getnodeheights, and getnodeheights_majortree](@ref).

PhyloNetworks.getnodeheights_majortree — Function

getnodeheights_majortree(net, checkpreorder::Bool=true; warn=true)

Vector of node heights from the major tree, that is: the distance from the root to each node when considering the major tree for node heights.

missing edge lengths:

An error is thrown if a tree edge has a missing edge length.
If all parent hybrid edges have missing lengths at a given hybrid node, then the hybrid node is assumed to be as close to the root as possible, that is, the reticulation is assumed "zipped-up" with one of its hybrid edges of length 0.
If a major hybrid edge has a missing length, then the hybrid node height will be calculated using the node height and edge length of the minor parent with the largest inheritance γ (with a warning). If the major hybrid edge lacks a length and all non-missing minor edges lack an inheritance γ or have the same value, then an error is thrown.

A warning is issued, unless warn=false, if the network is not time-consistent.

See also: istimeconsistent, getnodeheights and getnodeheights_average.

#node heights of time-consistent networks are the same 
julia> consistent_net = readnewick("((A:2.5,#H1:1.5::0.4):0.25,(C:1.5,(B:1)#H1:0.5::0.6):1.25);");

julia> heights = getnodeheights(consistent_net)
7-element Vector{Float64}:
 0.0
 1.25
 2.75
 0.25
 1.75
 2.75
 2.75

julia> heights_average = getnodeheights_average(consistent_net);

julia> heights_major = getnodeheights_majortree(consistent_net);

julia> heights == heights_average == heights_major  
true

#inconsistent networks give different results
julia> inconsistent_net = readnewick("((A:2.5,#H1:1.5::0.4):0.25,(C:1.5,(B:1)#H1:2.5::0.6):1.25);");

julia> getnodeheights_average(inconsistent_net;warn=false)
7-element Vector{Float64}:
 0.0
 1.25
 2.75
 0.25
 2.95
 3.95
 2.75

julia> getnodeheights_majortree(inconsistent_net;warn=false) 
7-element Vector{Float64}:
 0.0
 1.25
 2.75
 0.25
 3.75
 4.75
 2.75

PhyloNetworks.getparent — Method

getparent(edge)
getparent(node)
getparentminor(node)
getparents(node)

Get parental node(s).

getparent: major (or only) parent node of edge or node
getparentminor: minor parent node of node
getparents: vector of all parent nodes of node.

getparentedge(node)
getparentedgeminor(node)

Get one parental edge of a node.

getparentedge: major parent edge. For a tree node, it's its only parent edge.
getparentedgeminor: minor parent edge, if node is hybrid (with an error if node has no minor parent).

If node has multiple major (resp. minor) parent edges, the first one would be returned without any warning or error.

Warning: these functions use the field ischild1 of edges.

See also: getchild, getpartneredge.

PhyloNetworks.getparentedge — Function

getparent(edge)
getparent(node)
getparentminor(node)
getparents(node)

Get parental node(s).

getparent: major (or only) parent node of edge or node
getparentminor: minor parent node of node
getparents: vector of all parent nodes of node.

getparentedge(node)
getparentedgeminor(node)

Get one parental edge of a node.

getparentedge: major parent edge. For a tree node, it's its only parent edge.
getparentedgeminor: minor parent edge, if node is hybrid (with an error if node has no minor parent).

If node has multiple major (resp. minor) parent edges, the first one would be returned without any warning or error.

Warning: these functions use the field ischild1 of edges.

See also: getchild, getpartneredge.

PhyloNetworks.getparentedgeminor — Function

getparent(edge)
getparent(node)
getparentminor(node)
getparents(node)

Get parental node(s).

getparent: major (or only) parent node of edge or node
getparentminor: minor parent node of node
getparents: vector of all parent nodes of node.

getparentedge(node)
getparentedgeminor(node)

Get one parental edge of a node.

getparentedge: major parent edge. For a tree node, it's its only parent edge.
getparentedgeminor: minor parent edge, if node is hybrid (with an error if node has no minor parent).

If node has multiple major (resp. minor) parent edges, the first one would be returned without any warning or error.

Warning: these functions use the field ischild1 of edges.

See also: getchild, getpartneredge.

PhyloNetworks.getparentminor — Function

getparent(edge)
getparent(node)
getparentminor(node)
getparents(node)

Get parental node(s).

getparent: major (or only) parent node of edge or node
getparentminor: minor parent node of node
getparents: vector of all parent nodes of node.

getparentedge(node)
getparentedgeminor(node)

Get one parental edge of a node.

getparentedge: major parent edge. For a tree node, it's its only parent edge.
getparentedgeminor: minor parent edge, if node is hybrid (with an error if node has no minor parent).

If node has multiple major (resp. minor) parent edges, the first one would be returned without any warning or error.

Warning: these functions use the field ischild1 of edges.

See also: getchild, getpartneredge.

PhyloNetworks.getparents — Function

getparent(edge)
getparent(node)
getparentminor(node)
getparents(node)

Get parental node(s).

getparent: major (or only) parent node of edge or node
getparentminor: minor parent node of node
getparents: vector of all parent nodes of node.

getparentedge(node)
getparentedgeminor(node)

Get one parental edge of a node.

getparentedge: major parent edge. For a tree node, it's its only parent edge.
getparentedgeminor: minor parent edge, if node is hybrid (with an error if node has no minor parent).

If node has multiple major (resp. minor) parent edges, the first one would be returned without any warning or error.

Warning: these functions use the field ischild1 of edges.

See also: getchild, getpartneredge.

PhyloNetworks.getpartneredge — Method

getpartneredge(edge::Edge)
getpartneredge(edge::Edge, node::Node)

Edge that is the hybrid partner of edge, meaning that is has the same child node as edge. This child node is given as an argument in the second method. Assumptions, not checked:

no in-coming polytomy: a node has 0, 1 or 2 parents, no more
when node is given, it is assumed to be the child of edge (the first method calls the second).

See also: getparent, getchild

PhyloNetworks.getroot — Method

getroot(net)

Node used to root net. If net is to be considered as semi-directed or unrooted, this root node is used to write the networks' Newick parenthetical description or for network traversals.

See also: isrootof

PhyloNetworks.hardwiredcluster — Method

hardwiredcluster(edge::Edge, taxa::Union{AbstractVector{String},AbstractVector{Int}})
hardwiredcluster!(v::Vector{Bool}, edge::Edge, taxa)
hardwiredcluster!(v::Vector{Bool}, edge::Edge, taxa, visited::Vector{Int})

Calculate the hardwired cluster of edge, coded as a vector of booleans: true for taxa that are descendent of the edge, false for other taxa (including missing taxa).

The edge should belong in a rooted network for which ischild1 is up-to-date. Run directedges! beforehand. This is very important, otherwise one might enter an infinite loop, and the function does not test for this.

visited: vector of node numbers, of all visited nodes.

Examples:

julia> net5 = "(A,((B,#H1),(((C,(E)#H2),(#H2,F)),(D)#H1)));" |> readnewick |> directedges! ;

julia> taxa = net5 |> tiplabels # ABC EF D
6-element Vector{String}:
 "A"
 "B"
 "C"
 "E"
 "F"
 "D"

julia> hardwiredcluster(net5.edge[12], taxa) # descendants of 12th edge = CEF
6-element Vector{Bool}:
 0
 0
 1
 1
 1
 0

See also hardwiredclusterdistance and hardwiredclusters

PhyloNetworks.hardwiredclusterdistance — Method

hardwiredclusterdistance(net1::HybridNetwork, net2::HybridNetwork, rooted::Bool)

Hardwired cluster distance between the topologies of net1 and net2, that is, the number of hardwired clusters found in one network and not in the other (with multiplicity, see below).

If the 2 networks are trees, this is the Robinson-Foulds distance. If rooted=false, then both networks are considered as semi-directed.

Networks are assumed bicombining (each hybrid has exactly 2 parents, no more).

Dissimilarity vs distance

This is not a distance per se on the full space of phylogenetic networks: there are pairs of distinct networks for which this dissimilarity is 0. But it is a distance on some classes of networks, such as the class of tree-child networks that are "normal" (without shortcuts), or the class of tree-child networks that can be assigned node ages such that hybrid edges have length 0 and tree edges have non-negative lengths. See Cardona, Rossello & Valiente (2008), Cardona, Llabres, Rossello & Valiente (2008), and Huson, Rupp, Scornavacca (2010).

Example

julia> net1 = readnewick("(t6,(t5,((t4,(t3,((t2,t1))#H1)),#H1)));");

julia> taxa = sort(tiplabels(net1)); # t1 through t6, sorted alphabetically

julia> # using PhyloPlots; plot(net1, showedgenumber=true);

julia> # in matrix below: column 1: edge number. last column: tree (10) vs hybrid (11) edge
       # middle columns: for 'taxa': t1,...t6. 1=descendant, 0=not descendant
       hardwiredclusters(net1, taxa)
6×8 Matrix{Int64}:
 13  1  1  1  1  1  0  10
 12  1  1  1  1  0  0  10
 10  1  1  1  1  0  0  10
  9  1  1  1  0  0  0  10
  8  1  1  0  0  0  0  11
  7  1  1  0  0  0  0  10

julia> net2 = readnewick("(t6,(t5,((t4,(t3)#H1),(#H1,(t1,t2)))));");

julia> hardwiredclusters(net2, taxa)
6×8 Matrix{Int64}:
 13  1  1  1  1  1  0  10
 12  1  1  1  1  0  0  10
  6  0  0  1  1  0  0  10
  5  0  0  1  0  0  0  11
 11  1  1  1  0  0  0  10
 10  1  1  0  0  0  0  10

julia> hardwiredclusterdistance(net1, net2, true) # true: as rooted networks
4

What is a hardwired cluster?

Each edge in a network is associated with its hardwired cluster, that is, the set of all its descendant taxa (leaves). The set of hardwired cluster of a network is the set of its edges' hardwired clusters. The dissimilarity d_hard defined in Huson, Rupp, Scornavacca (2010) is the number of hardwired clusters that are in one network but not in the other.

This implementation is a slightly more discriminative version of d_hard, where each cluster is counted with multiplicity and annotated with its edge's hybrid status, as follows:

External edges are not counted (they are tree edges to a leaf, shared by all phylogenetic networks).
A cluster is counted for each edge for which it's the hardwired cluster.
At a given hybrid node, both hybrid partner edges have the same cluster, so this cluster is only counted once for both partners.
A given cluster is matched between the two networks only if it's the cluster from a tree edge in both networks, or from a hybrid edge in both networks.

In the example above, net1 has a shortcut (hybrid edge 11) resulting in 2 tree edges (12 and 10) with the same cluster {t1,t2,t3,t4}. So cluster {t1,t2,t3,t4} has multiplicity 2 in net1. net2 also has this cluster, but only associated with 1 tree edge, so this cluster contributes (2-1)=1 towards the hardwired cluster distance between the two networks. The distance of 4 corresponds to these 4 clusters:

{t1,t2,t3,t4}: twice in net1, once in net2
{t3,t4}: absent in net1, once in net2
{t1,t2}: twice in net1 (from a hybrid edge & a tree edge), once in net2
{t3}: absent in net1 (because external edges are not counted), once in net2 (from a hybrid edge).

Degree-2 nodes cause multiple edges to have the same cluster, so counting clusters with multiplicity distinguishes a network with extra degree-2 nodes from the "same" network after these nodes have been suppressed (e.g. with PhyloNetworks.fuseedgesat! or PhyloNetworks.shrinkedge!).

Networks as semi-directed

If rooted is false and one of the phylogenies is not a tree (1+ reticulations), then all degree-2 nodes are removed before comparing the hardwired clusters, and the minimum distance is returned over all possible ways to root the networks at internal nodes.

See also: hardwiredclusters, hardwiredcluster

PhyloNetworks.hardwiredclusters — Method

hardwiredclusters(net::HybridNetwork, taxon_labels)

Returns a matrix describing all the hardwired clusters in a network, with taxa listed in same order as in taxon_labels to describe their membership in each cluster. Allows for missing taxa, with entries all 0.

Warnings:

clusters are rooted, so the root must be correct.
each hybrid node is assumed to have exactly 2 parents (no more).

Each row corresponds to one internal edge, that is, external edges are excluded. If the root is a leaf node, the external edge to that leaf is included (first row). Both parent hybrid edges to a given hybrid node only contribute a single row (they share the same hardwired cluster).

first column: edge number
next columns: 0/1. 1=descendant of edge, 0=not a descendant, or missing taxon.
last column: 10/11 values. 10=tree edge, 11=hybrid edge

See also hardwiredclusterdistance and hardwiredcluster.

PhyloNetworks.hashybridladder — Method

hashybridladder(net::HybridNetwork)

Return true if net contains a hybrid ladder: where a hybrid node's child is itself a hybrid node. This makes the network not treechild, assuming it is fully resolved. (One of the nodes does not have any tree-node child).

PhyloNetworks.hassinglechild — Method

hassinglechild(node)

true if node has a single child, based on the edges' ischild1 field; false otherwise.

See also: getchild, getparent

PhyloNetworks.hybridclades_support — Method

hybridclades_support(boot_net::Vector{HybridNetwork}, ref_net::HybridNetwork; rooted=false)

Match hybrid nodes in a reference network with those in an array of networks, like bootstrap networks or in a posterior sample of networks. All networks must be fully resolved, and on the same taxon set. If rooted=true, all networks are assumed to have been properly rooted beforehand. Otherwise, the origin of each hybrid edge is considered as an unrooted bipartition (default).

Two hybrid edges in two networks are said to match if they share the same "hybrid" clade (or recipient) and the same "donor clade", which is a sister to the hybrid clade in the network. Since a hybrid clade has 2 parent edges, it is sister to two clades simultaneously: one is its major sister (following the major hybrid edge with γ>0.5) and one is its minor sister (following the major hybrid edge with γ<0.5).

To calculate these hybrid and sister clades at a given hybrid node, all other hybrid edges are first removed from the network. Then, the hybrid clade is the hardwired cluster (descendants) of either hybrid edge and major/minor clade is the hardwired cluster of the sibling edge of the major/minor hybrid parent. If rooted=false, sister clades are considered as bipartitions.

Output:

a "node" data frame (see below)
an "edge" data frame (see below)
a "clade" data frame to describe the make up of all clades found as hybrids or sisters, starting with a column taxa that lists all taxa. All other columns correspond to a given clade and contain true/false values. true means that a given taxon belongs in a given clade. For a clade named H1, for instance, and if the data frame was named cla, the list of taxa in this clade can be obtained with cla[:taxa][cla[:H1]].
an array of gamma values, with one row for each network in the sample and two columns (major/minor) for each hybrid edge in the reference network. If this hybrid edge was found in the sampled network (i.e. same hybrid and sister clades, after removal of all other hybrid nodes), its inheritcance gamma value is recorded here. Otherwise, the gamma entry is 0.0.
a vector with the number of each hybrid edge in the reference network, in the same order as for the columns in the array of gamma values above.

The "node" data frame has one row per clade and 9 columns giving:

:clade: the clade's name, like the taxon name (if a hybrid is a single taxon) or the hybrid tag (like 'H1') in the reference network
:node: the node number in the reference network. missing if the clade is not in this network.
:hybridnode: typically the same node number as above, except for hybrid clades in the reference network. For those, the hybrid node number is listed here.
:edge: number of the parent edge, parent to the node in column 2, if found in the ref network. missing otherwise.
:BS_hybrid: percentage of sample networks in which the clade is found to be a hybrid clade.
:BS_sister: percentage of sample networks in which the clade is found to be sister to some hybrid clade (sum of the next 2 columns)
:BS_major_sister: percentage of sample networks in which the clade is found to be the major sister to some hybrid clade
:BS_minor_sister: same as previous, but minor
:BS_hybrid_samesisters: percentage of sample networks in which the clade is found to be a hybrid and with the same set of sister clades as in the reference network. Applies to hybrid clades found in the reference network only, missing for all other clades.

The "edge" data frame has one row for each pair of clades, and 8 columns:

:edge: hybrid edge number, if the edge appears in the reference network. missing otherwise.
:hybrid_clade: name of the clade found to be a hybrid, descendent of 'edge'
:hybrid: node number of that clade, if it appears in the reference network. missing otherwise.
:sister_clade: name of the clade that is sister to 'edge', i.e. be sister to a hybrid
:sister: node number of that clade, if in the ref network.
:BS_hybrid_edge: percentage of sample networks in which 'edge' is found to be a hybrid edge, i.e. when the clade in the 'hybrid' column is found to be a hybrid and the clade in the 'sister' column is one of its sisters.
:BS_major: percentage of sample networks in which 'edge' is found to be a major hybrid edge, i.e. when 'hybrid' is found to be a hybrid clade and 'sister' is found to be its major sister.
:BS_minor: same as previous, but minor

PhyloNetworks.hybridlambdaformat — Method

hybridlambdaformat(net::HybridNetwork; prefix="I")

Output net as a string in the format that the Hybrid-Lambda simulator expects, namely:

all internal nodes are named, including the root, with names that are unique and start with a letter.
hybrid nodes are written as H6#γ1:length1 and H6#γ1:length2 instead of #H6:length1::γ1 and #H6:length2::γ2 (note the samme γ value expected by Hybrid-Lambda)

This is a modified version of the extended Newick format.

Optional keyword argument prefix: must start with a letter, other than "H". Internal nodes are given names like "I1", "I2", etc. Existing internal non-hybrid node names are replaced, which is crucial if some of them don't start with a letter (e.g. in case node names are bootstrap values). See nameinternalnodes! to add node names.

examples

julia> net = readnewick("((a:1,(b:1)#H1:1::0.8):5,(#H1:0::0.2,c:1):1);");

julia> hybridlambdaformat(net) # net is unchanged here
"((a:1.0,(b:1.0)H1#0.8:1.0)I1:5.0,(H1#0.8:0.0,c:1.0)I2:1.0)I3;"

julia> # using PhyloPlots; plot(net, shownodenumber=true) # shows that node -2 is the root

julia> rotate!(net, -2)

julia> writenewick(net) # now the minor edge with γ=0.2 appears first
"((#H1:0.0::0.2,c:1.0):1.0,(a:1.0,(b:1.0)#H1:1.0::0.8):5.0);"

julia> hybridlambdaformat(net)
"((H1#0.2:0.0,c:1.0)I2:1.0,(a:1.0,(b:1.0)H1#0.2:1.0)I1:5.0)I3;"

julia> net = readnewick("((((B)#H1:::.6)#H2,((D,C,#H2:::0.8),(#H1,A))));"); # 2 reticulations, no branch lengths

julia> writenewick(net, round=true)
"(#H2:::0.2,((D,C,((B)#H1:::0.6)#H2:::0.8),(#H1:::0.4,A)));"

julia> hybridlambdaformat(net; prefix="int")
"(H2#0.2,((D,C,((B)H1#0.6)H2#0.2)int1,(H1#0.6,A)int2)int3)int4;"

PhyloNetworks.ischildof — Function

isparentof(node, edge)
ischildof(node, edge)

true if node is the tail / head, or parent / child, of edge; false otherwise. Assumes that the edge's direction is correct, meaning its field ischild1 is reliable (in sync with the rooting).

See also: getparent, getchild, isrootof

PhyloNetworks.isexternal — Function

isrootof(node, net)

true if node is the root of net (or used as such for network traversals in case the network is considered as semi-directed); false otherwise.

isleaf(node)
isexternal(edge)

true if node is a leaf or edge is adjacent to a leaf, false otherwise.

See also: getroot, getparent, getchild

PhyloNetworks.isgalled — Function

isgalled(net::HybridNetwork, checkpreorder::Bool=true)

true (resp. false) if net is (resp. is not) a galled network, assuming that net is bicombining (every hybrid has exactly 2 parents). A network is galled if every hybrid node is contained in a cycle that has its 2 hybrid parent edges and otherwise tree edges only. See for example Huson, Rupp & Scornavacca (2010) and Gunawan, DasGupta & Zhang (2017). Equivalently, a bicombining network is galled if, for any hybrid node n, both of its parent edges originate from the same tree component. We get the tree components by deleting all hybrid edges from the network: we are left with a forest of trees.

The network is traversed in preorder (from the root to the leaves). Turn checkpreorder to false if the pre-ordering was run and is known to be up-to-date. Assumption: tree edges are directly correctly according to the current root.

PhyloNetworks.isleaf — Function

isrootof(node, net)

true if node is the root of net (or used as such for network traversals in case the network is considered as semi-directed); false otherwise.

isleaf(node)
isexternal(edge)

true if node is a leaf or edge is adjacent to a leaf, false otherwise.

See also: getroot, getparent, getchild

PhyloNetworks.isparentof — Method

isparentof(node, edge)
ischildof(node, edge)

true if node is the tail / head, or parent / child, of edge; false otherwise. Assumes that the edge's direction is correct, meaning its field ischild1 is reliable (in sync with the rooting).

See also: getparent, getchild, isrootof

PhyloNetworks.isrootof — Method

isrootof(node, net)

true if node is the root of net (or used as such for network traversals in case the network is considered as semi-directed); false otherwise.

isleaf(node)
isexternal(edge)

true if node is a leaf or edge is adjacent to a leaf, false otherwise.

See also: getroot, getparent, getchild

PhyloNetworks.istimeconsistent — Function

istimeconsistent(net, checkpreorder::Bool=true)

True (resp. false) if net network is (resp. is not) time-consistent. A network is time-consistent if for any node v, all paths from the root to v have the same length. It is sufficient to check this condition at nodes v that are hybrid nodes.

See also getnodeheights and getnodeheights_average.

PhyloNetworks.istreechild — Method

istreechild(net::HybridNetwork)

Tuple (rooted, semi_weakly, semi_strongly) in which each element is true or false, to indicate if net is / is not

tree-child as a rooted network
weakly tree-child as a semidirected network
strongly tree-child as a semidirected network

A rooted network is tree-child if all of its internal nodes have at least one child that is a tree node (or equivalently, one child edge that is a tree edge). A semidirected network is strongly (resp. weakly) tree-child if all (resp. at least one) of its rooted partners are tree-child.

Note that degree-2 nodes are not suppressed: a degree-2 node with a single hybrid child causes the network to not be tree-child, despite the fact that suppressing this node may render the reduced network tree-child.

Assumes that tree edges are directly correctly according to the current root.

PhyloNetworks.majortree — Method

majortree(net::HybridNetwork; nofuse::Bool=false, unroot::Bool=false,
          keeporiginalroot::Bool=false)

Extract the major tree displayed in a network, keeping the major edge and dropping the minor edge at each hybrid node.

nofuse: if true, edges and degree-2 nodes are retained during edge removal. Otherwise, at each reticulation the child edge (below the hybrid node) is retained: the major hybrid edge is fused with it.

unroot: is true, the root will be deleted if it becomes of degree 2.

keeporiginalroot: the network's root is kept even if it becomes of degree 1.

Warnings:

if nofuse is true: the hybrid edges that are retained (without fusing) have their γ values unchanged, but their ismajor is changed to true
assume correct ismajor attributes.

PhyloNetworks.minortreeat — Function

minortreeat(net::HybridNetwork, hybindex::Integer, nofuse=false, unroot::Bool=false)

Extract the tree displayed in the network, following the major hybrid edge at each hybrid node, except at the ith hybrid node (i=hybindex), where the minor hybrid edge is kept instead of the major hybrid edge. If nofuse is true, edges are not fused (degree-2 nodes are kept). If unroot is true, the root will be deleted if it becomes of degree 2.

Warning: assume correct ismajor fields.

PhyloNetworks.mudistance_rooted — Method

mudistance_rooted(net1::HybridNetwork, net2::HybridNetwork;
    labels::AbstractVector{<:AbstractString}=sort!(union(tiplabels(net1), tiplabels(net2))),
    preorder::Bool=true)

Distance or matrix of distances between the networks, considered as rooted, based on their node-based μ-representation: number of nodes in one network whose μ-vector does not match with that of a node in the other network. The μ-vector of a node has n+1 coordinates: one for each tip taxon, and one for hybrids. Each value counts the number of paths starting at the node and ending at a specific leaf, or ending at a hybrid node (any hybrid node).

See Cardona, Rossello & Valiente (2009) for the original μ-distance, and Cardona et al. (2024) for the extension using paths ending at hybrids.

The distance implemented here considers one μ-vector per non-leaf node: without ignoring any non-leaf node (including degree-2 nodes if any), and ignoring leaves whose μ-vectors are trivial.

Assumption: networks have a single root.

See also node_murepresentation.

keyword arguments

preorder indicates whether to preprocess each network with directedges! and preorder!.
labels: tip labels that each μ-vector will track, by default the union of taxon labels across both networks. If a taxon is absent from net1 (say) but tracked because present in net2, then there are 0 paths from any node in net1 to that taxon.

If the networks have different tip labels, consider pruning leaves that are not shared between them beforehand, using deleteleaf!, to compare the subnetworks on their shared leaf set.

If a non-default label set is used, such as the sorted shared taxon set sort!(intersect(tiplabels(net1), tiplabels(net2))), then the distance may drop, because the number of paths to non-shared taxa will be ignored. But all nodes still count towards the distance, including degree-2 nodes that would be suppressed in subnetworks.

PhyloNetworks.mudistance_semidirected — Method

mudistance_semidirected(net1::HybridNetwork, net2::HybridNetwork; ...)
mudistance_semidirected(nets::AbstractVector{HybridNetwork}; ...)

Distance or matrix of distances between the networks, considered as semidirected, based on their edge-based μ-representation: number of edges in one network whose μ-entry does not match with that of an edge in the other network.

The μ-vector of an edge, assigned some specific direction, has n+1 coordinates: one for each tip taxon, and one for hybrids. Each value counts the number of paths starting with the edge (directed as specified) and ending at a specific leaf, or ending at a (any) hybrid node. For more details, see Maxfield, Xu & Ané 2025. The implementation here departs from Maxfield, Xu & Ané (2025) by considering μ-vectors "extended" with the counts of paths ending at hybrids, as defined on rooted networks by Cardona et al. (2024).

Each μ-entry includes 1 or 2 μ-vector(s) and tag(s) indicating the edge type (e.g. hybrid). The number of μ-vector(s) depends on the direction(s) that an edge may have under different rootings.

If the networks have different tip labels, the union of all their labels is considered to build μ-vectors, and the resulting distance must be positive. Consider pruning leaves that are not shared between the two networks beforehand, using deleteleaf!, to compare the subnetworks on their shared leaf set.

Assumption: networks have a single root.

See also edge_murepresentation.

keyword arguments

preorder=true indicates whether to preprocess each network with directedges! and preorder!.
userootμ=false controls whether the root μ-vector in included in the set of μ-entries being compared. It is ignored by default (departing from Maxfield, Xu & Ané 2025) because different root μ-vectors for networks with a single root would cause all μ-entries in their root components to be different.
suppressroot=true to exclude one internal tree edge incident to the root, in case the root is incident to exactly 2 edges which are not both hybrid edges. In that case, re-rooting the network would suppress that internal tree edge, hence its associated μ-entry. This option does not modify the network.
labels: tip labels that each μ-vector will track, by default the union of taxon labels across both networks, sorted alphabetically. If a taxon is absent from net1 (say) but tracked because present in net2, then there are 0 paths from any edge in net1 to that taxon.

If the networks have different tip labels, consider pruning leaves that are not shared between them beforehand, using deleteleaf!, to compare the subnetworks on their shared leaf set.

If a non-default label set is used, such as the sorted shared taxon set sort!(intersect(tiplabels(net1), tiplabels(net2))), then the distance may drop, as the number of paths to non-shared taxa will be ignored. But all edges still count towards the distance, whereas edges below degree-2 nodes would be suppressed in subnetworks.

PhyloNetworks.nameinternalnodes! — Function

nameinternalnodes!(net::HybridNetwork, prefix="i")

Add names to nodes in net that don't already have a name. Leaves already have names; but if not, they will be given names as well. New node names will be of the form "prefixk" where k is an integer. So by default, new node names will be of the form "i1", "i2", etc.

examples

julia> net = readnewick("((a:1,(b:1)#H1:1::0.8):5,(#H1:0::0.2,c:1):1);");

julia> nameinternalnodes!(net, "I") # by default, shown without internal node names
HybridNetwork, Rooted Network
7 edges
7 nodes: 3 tips, 1 hybrid nodes, 3 internal tree nodes.
tip labels: a, b, c
((a:1.0,(b:1.0)#H1:1.0::0.8)I1:5.0,(#H1:0.0::0.2,c:1.0)I2:1.0)I3;

julia> writenewick(net; internallabel=false) # by default, writenewick shows internal names if they exist
"((a:1.0,(b:1.0)#H1:1.0::0.8):5.0,(#H1:0.0::0.2,c:1.0):1.0);"

julia> net = readnewick("((int5:1,(b:1)#H1:1::0.8):5,(#H1:0::0.2,c:1):1);"); # one taxon name starts with "int"

julia> nameinternalnodes!(net, "int");

julia> writenewick(net)
"((int5:1.0,(b:1.0)#H1:1.0::0.8)int6:5.0,(#H1:0.0::0.2,c:1.0)int7:1.0)int8;"

PhyloNetworks.nj — Method

nj(D::DataFrame; force_nonnegative_edges::Bool=false)

Construct a tree from a distance matrix by neighbor joining, where D is a DataFrame of the distance matrix, with taxon names taken from the header of the data frame. The rows are assumed to correspond to tips in the tree in the same order as they do in columns. With force_nonnegative_edges being true, any negative edge length is changed to 0.0 (with a message).

For the algorithm, see Satou & Nei 1987.

See nj! for using a matrix as input.

PhyloNetworks.nni! — Method

nni!([rng::AbstractRNG,]
     net::HybridNetwork,
     e::Edge,
     nohybridladder::Bool=true,
     no3cycle::Bool=true,
     constraints::Vector{TopologyConstraint}=TopologyConstraint[]
)

Attempt to perform a nearest neighbor interchange (NNI) around edge e, randomly chosen among all possible NNIs (e.g 3, sometimes more depending on e) satisfying the constraints, and such that the new network is a DAG. The number of possible NNI moves around an edge depends on whether the edge's parent/child nodes are tree or hybrid nodes. This is calculated by nnimax.

The option no3cycle forbids moves that would create a 3-cycle in the network. When no3cycle = false, 2-cycle and 3-cycles may be generated.

Note that the defaults values are for positional (not keyword) arguments, so two or more arguments can be used, but in a specific order: nni!(net, e) or nni!(net, e, nohybridladder), nni!(net, e, nohybridladder, no3cycle), nni!(net, e, nohybridladder, no3cycle, contraints).

Assumptions:

The starting network does not have 3-cycles, if no3cycle=true. No check for the presence of 2- and 3-cycles in the input network.
The edges' field ischild1 is correct in the input network. (This field will be correct in the output network.)

Output: information indicating how to undo the move or nothing if all NNIs failed.

examples

julia> str_network = "(((S8,S9),(((((S1,S2,S3),S4),(S5)#H1),(#H1,(S6,S7))))#H2),(#H2,S10));";

julia> net = readnewick(str_network);

julia> # using Random; Random.seed!(3); ## commented out for doctest reproducibility across julia versions, but users can use this line to set the seed in their analyses.

julia> undoinfo = nni!(net, net.edge[3], true, true); # true's to avoid hybrid ladders and 3-cycles

In the next example, we use a stable RNG to make the example reproducible across julia versions. However, this particular RNG is not recommended. The RNG used by default is better (e.g. much more efficient).

julia> str_network = "(((S8,S9),(((((S1,S2,S3),S4),(S5)#H1),(#H1,(S6,S7))))#H2),(#H2,S10));";

julia> net = readnewick(str_network);

julia> # using Pkg; Pkg.add("StableRNGs") # to install StableRNGs if not done earlier

julia> using StableRNGs

julia> rng = StableRNG(791);

julia> undoinfo = nni!(rng, net, net.edge[3], true, true); # true's to avoid hybrid ladders and 3-cycles

julia> writenewick(net)
"((S9,((((((S1,S2,S3),S4),(S5)#H1),(#H1,(S6,S7))))#H2,S8)),(#H2,S10));"

julia> nni!(undoinfo...);

julia> writenewick(net) == str_network # net back to original topology: the NNI was "undone"
true

PhyloNetworks.nni! — Method

nni!(αu::Edge, u::Node, uv::Edge, v::Node, vδ::Edge)
nni!(αu,u,uv,v,vδ, flip::Bool, inner::Bool, indices)

Transform a network locally around the focus edge uv with the following NNI, that detaches u-β and grafts it onto vδ:

α - u -- v ------ δ
    |    |
    β    γ

α ------ v -- u - δ
         |    |
         γ    β

flip boolean indicates if the uv edge was flipped inner boolean indicates if edges αu and uv both point toward node u, i.e. α->u<-v<-δ. If this is true, we flip the hybrid status of αu and vδ.

indices give indices for nodes and edges uinαu, αuinu, vδinv, and vinvδ. These are interpreted as:

u_in_αu: the index for u in the edge αu
αu_in_u: the index for αu in node u
vδ_in_v: the index for vδ in node v
v_in_vδ: the index for v in edge vδ

Warnings:

No check of assumed adjacencies
Not implemented for cases that are not necessary thanks to symmetry, such as cases covered by nni!(vδ, v, uv, u, αu) or nni!(βu, u, v, vγ). More specifically, these cases are not implemented (and not checked):
- u not hybrid & v hybrid
- u hybrid, v not hybrid, α -> u <- v -> δ
Because of this, nni(αu,u,uv,v,vδ, ...) should not be used directly; use instead nni!(uv, move_number).
nni!(undoinfo...) restores the topology, but edges below hybrid nodes will have length 0.0 even if they didn't before.

Node numbers and edge numbers are not modified. Edge uv keeps its direction unchanged unless the directions were α -> u -> v -> δ or α <- u <- v <- δ, in which case the direction of uv is flipped.

The second version's input has the same signature as the output, but will undo the NNI more easily. This means that if output = nni!(input), then nni!(output...) is valid and undoes the first operation.

Right now, branch lengths are not modified except when below a hybrid node. Future versions might implement options to modify branch lengths.

PhyloNetworks.nni! — Method

nni!(uv::Edge, nummove::UInt8, nohybridladder::Bool, no3cycle::Bool)

Modify a network with a nearest neighbor interchange (NNI) around its edge uv. Return the information necessary to undo the NNI, or nothing if the move was not successful (such as if the resulting graph was not acyclic (not a DAG) or if the focus edge is adjacent to a polytomy). If the move fails, the network is not modified. nummove specifies which of the available NNIs is performed.

rooted-NNI options according to Gambette et al. (2017), fig. 8:

BB: 2 moves, both to BB, if directed edges. 8 moves if undirected.
RR: 2 moves, both to RR.
BR: 3 moves, 1 RB & 2 BRs, if directed. 6 moves if e is undirected.
RB: 4 moves, all 4 BRs.

The extra options are due to assuming a semi-directed network, whereas Gambette et al (2017) describe options for rooted networks. On a semi-directed network, there might be a choice of how to direct the edges that may contain the root, e.g. choice of e=uv versus vu, and choice of labelling adjacent nodes as α/β (BB), or as α/γ (BR).

nohybridladder = true prevents moves that would create a hybrid ladder in the network, that is, 2 consecutive hybrid nodes (one parent of the other). no3cycle = true prevents NNI moves that would make a 3-cycle, and assumes that the input network does not have any 2- or 3-cycles. If no3cycle is false, 3-cycles can be generated, but NNIs generating 2-cycles are prevented.

The edge field ischild1 is assumed to be correct in the overall network (that is, in sync with the network's field .rooti).

PhyloNetworks.pairwisetaxondistancematrix — Method

pairwisetaxondistancematrix(net; keepInternal=false,
                            checkpreorder=true, nodeAges=[])
pairwisetaxondistancematrix!(M, net, nodeAges)

Return the matrix M of pairwise distances between nodes in the network:

between all nodes (internal and leaves) if keepInternal=true, in which case the nodes are listed in M in the order in which they appear in net.vec_node
between taxa only otherwise, in which case the nodes are listed in M in the order in which they appear in tiplabels(net) (i.e. same order as in net.leaf)

The second form modifies M in place, assuming all nodes.

The distance between the root and a given hybrid node (to take an example) is the weighted average of path lengths from the root to that node, where each path is weighted by the product of γs of all edges on that path. This distance measures the average genetic distance across the genome, if branch lengths are in substitutions/site.

optional arguments:

checkpreorder: if true, net.vec_node is updated to get a topological ordering of nodes.
nodeAges: if not provided, i.e. empty vector, the network is not modified. If provided and non-empty, nodeAges should list node ages in the pre-order in which nodes are listed in vec_node (including leaves), and edge lengths in net are modified accordingly.

Providing node ages hence makes the network time consistent: such that all paths from the root to a given hybrid node have the same length. If node ages are not provided, the network need not be time consistent.

PhyloNetworks.parsimonyGF — Method

parsimonyGF(net, tip_dictionary, criterion=:softwired)
parsimonyGF(net, species, sequenceData, criterion=:softwired)

Calculate the most parsimonious score of a network given discrete characters at the tips using a general framework (Van Iersel et al. 2018) allowing for various parsimony criteria: softwired (default), hardwired, parental etc. Only softwired is implemented at the moment.

Data can given in one of the following:

tipdata: data frame for a single trait, in which case the taxon names are to appear in column 1 or in a column named "taxon" or "species", and trait values are to appear in column 2 or in a column named "trait".
tipdata: dictionary taxon => state, for a single trait.
species: array of strings, and sequences: array of sequences, in the order corresponding to the order of species names.

algorithm

The complexity of the algorithm is exponential in the level of the network, that is, the maximum number of hybridizations in a single blob, or biconnected component (Fischer et al. 2015). The function loops over all the state assignments of the minor parent of each hybrid node within a blob, so its complexity is of the order of n * m * c^2 * c^level where n is the number of tips, m the number of traits and c the number of states.

See parsimonysoftwired for a faster algorithm, but solving the softwired criterion only.

references

Leo Van Iersel, Mark Jones, Celine Scornavacca (2017). Improved Maximum Parsimony Models for Phylogenetic Networks, Systematic Biology, (https://doi.org/10.1093/sysbio/syx094).
Fischer, M., van Iersel, L., Kelk, S., Scornavacca, C. (2015). On computing the Maximum Parsimony score of a phylogenetic network. SIAM J. Discrete Math., 29(1):559-585.

Use the recursive helper function parsimonyGF_bottomup!. Use the fields ischild1, booln4 to know which nodes are at the root of a blob, and boole2 to know which edges are cut (below the minor parent of each hybrid).

PhyloNetworks.parsimonysoftwired — Method

parsimonysoftwired(net, tipdata)
parsimonysoftwired(net, species, sequences)

Calculate the most parsimonious (MP) score of a network given a discrete character at the tips. The softwired parsimony concept is used: where the number of state transitions is minimized over all trees displayed in the network.

Data can given in one of the following:

tipdata: data frame for a single trait, in which case the taxon names are to appear in column 1 or in a column named "taxon" or "species", and trait values are to appear in column 2 or in a column named "trait".
tipdata: dictionary taxon => state, for a single trait.
species: array of strings, and sequences: array of sequences, in the order corresponding to the order of species names.

algorithm

The dynamic programming algorithm by Fischer et al. (2015) is used. The function loops over all the displayed subtrees within a blob (biconnected component), so its complexity is of the order of n * m * c^2 * 2^level where n is the number of tips, m the number of traits, c the number of states, and level is the level of the network: the maximum number of hybridizations within a blob.

See parsimonyGF for a different algorithm, slower but extendable to other parsimony criteria.

references

Fischer, M., van Iersel, L., Kelk, S., Scornavacca, C. (2015). On computing the Maximum Parsimony score of a phylogenetic network. SIAM J. Discrete Math., 29(1):559-585.

PhyloNetworks.preorder! — Method

preorder!(net::HybridNetwork)

Update attribute net.vec_node in which the nodes are pre-ordered (also called topological sorting), such that each node is visited after its parent(s). The edges' direction needs to be correct before calling preorder!, using directedges!

PhyloNetworks.printedges — Method

printedges(net)
printedges(io::IO, net)

Print information on the edges of a HybridNetwork net: edge number, numbers of nodes attached to it, edge length, whether it's a hybrid edge, its γ inheritance value, whether it's a major edge, if it could contain the root (this field is not always updated, though) and one more attribute pertaining to level-1 networks used in SNaQ: in which cycle it is contained (-1 if no cycle).

PhyloNetworks.printnodes — Method

printnodes(net)
printnodes(io, net)

Print information on the nodes of a HybridNetwork net: node number, whether it's a leaf, whether it's a hybrid node, it's name (label), its intn1 field (for level-1 networks in SNaQ: number given to the cycle in which the node might be, -1 if the node it not in a cycle cycle), and the list of edges attached to it, by their numbers.

PhyloNetworks.readfastatodna — Function

readfastatodna(filename::String, countPatterns::Bool=false)

Read a fasta file to a dataframe containing a column for each site. If countPatterns is true, calculate weights and remove identical site patterns to reduce matrix dimension.

Return a tuple containing:

data frame of BioSequence DNA sequences, with taxon names in column 1 followed by a column for each site pattern, in columns 2-npatterns;
array of weights, one weight for each of the site columns. The length of the weight vector is equal to npatterns.

Warning: assumes a semi-sequential format, not interleaved, where each taxon name appears only once. For this one time, the corresponding sequence may be broken across several lines though.

PhyloNetworks.readmultinewick — Function

readmultinewick(filename::AbstractString, fast=true)
readmultinewick(newicktrees_list::Vector{<:AbstractString})

Read a list of networks in parenthetical format, either from a file (one network per line) if the input is a string giving the path to the file, or from a vector of strings with each string corresponding to a newick-formatted topology. By default (fast=true), Functors.fmap is used for repeatedly reading the newick trees into of HybridNetwork-type objects. The option fast=false corresponds to the behavior up until v0.14.3: with a file name as input, it prints a message (without failing) when a phylogeny cannot be parsed, and allows for empty lines. Each network is read with readnewick.

Return an array of HybridNetwork objects.

Examples

julia> multitreepath = joinpath(dirname(Base.find_package("PhyloNetworks")), "..", "examples", "multitrees.newick");
julia> multitree = readmultinewick(multitreepath) # vector of 25 HybridNetworks
julia> multitree = readmultinewick(multitreepath, false) # same but slower & safer
julia> treestrings = readlines(multitreepath) # vector of 25 strings
julia> multitree = readmultinewick(treestrings)
julia> readmultinewick(treestrings, false) # same, but slower

PhyloNetworks.readmultinewick_files — Method

readmultinewick_files(listfile; relative2listfile=true)

Read the list of file names in listfile, then read all the trees in each of these files. Output: vector of vectors of trees (networks with h>0 allowed).

listfile should be the name of a file containing the path/name to multiple bootstrap files, one on each line (no header). Each named bootstrap file should contain multiple trees, one per line (such as bootstrap trees from a single gene).

The path/name to each bootstrap file should be relative to listfile. Otherwise, use option relative2listfile=false, in which case the file names are interpreted as usual: relative to the user's current directory if not given as absolute paths.

PhyloNetworks.readnewick — Method

readnewick(file name)
readnewick(parenthetical description)
readnewick(IO)

Read tree or network topology from parenthetical format (extended Newick). If the root node has a single child: ignore (i.e. delete from the topology) the root node and its child edge.

Input: text file or parenthetical format directly. The file name may not start with a left parenthesis, otherwise the file name itself would be interpreted as the parenthetical description. Nexus-style comments ([&...]) are ignored, and may be placed after (or instead) of a node name, and before/after an edge length.

A root edge, not enclosed within a pair a parentheses, is ignored. If the root node has a single edge, this one edge is removed.

PhyloNetworks.readnexus_treeblock — Function

readnexus_treeblock(filename, treereader=readnewick, args...;
                    reticulate=true, stringmodifier=[r"#(\d+)" => s"#H\1"])

Read the first "trees" block of a nexus-formatted file, using the translate table if present, and return a vector of HybridNetworks. Information inside [&...] are interpreted as comments and are discarded by the default tree reader. Optional arguments args are passed to the tree reader.

For the nexus format, see Maddison, Swofford & Maddison (1997).

Unless reticulate is false, the following is done to read networks with reticulations.

Prior to reading each phylogeny, each instance of #number is replaced by #Hnumber to fit the standard extended Newick format at hybrid nodes. This behavior can be changed with option stringmodifier, which should be a vector of pairs accepted by replace.

Inheritance γ values are assumed to be given within "comment" blocks at minor hybrid edges (cut as tips to form the extended Newick) like this for example, as output by bacter (Vaughan et al. 2017):

#11[&conv=0, relSize=0.08, ...

or like this, as output by SpeciesNetwork (Zhang et al. 2018):

#H11[&gamma=0.08]

In this example, the corresponding edge to hybrid H11 has γ=0.08.

PhyloNetworks.removedegree2nodes! — Function

removedegree2nodes!(net::HybridNetwork, keeproot::Bool=false)

Delete all nodes of degree two in net, fusing the two adjacent edges together each time, and return the network. If the network has a degree-2 root and keeproot is false, then the root is eliminated as well, leaving the network unrooted. The only exception to this rule is if the root is incident to 2 (outgoing) hybrid edges. Removing the root should leave a loop-edge (equal end point), which we don't want to do, to preserve the paths in the original network. In this case, the root is maintained even if keeproot is false. If keeproot is true, then the root is kept even if it's of degree 2.

See fuseedgesat!.

julia> net = readnewick("(((((S1,(S2)#H1),(#H1,S3)))#H2),(#H2,S4));");

julia> PhyloNetworks.breakedge!(net.edge[3], net); # create a degree-2 node along hybrid edge

julia> PhyloNetworks.breakedge!(net.edge[3], net); # another one: 2 in a row

julia> PhyloNetworks.breakedge!(net.edge[10], net); # another one, elsewhere

julia> writenewick(net) # extra pairs of parentheses
"((#H2,S4),(((((S1,(((S2)#H1))),(#H1,S3)))#H2)));"

julia> removedegree2nodes!(net);

julia> writenewick(net) # even the root is gone
"(#H2,S4,(((S1,(S2)#H1),(#H1,S3)))#H2);"

julia> net = readnewick("((((C:0.9)I1:0.1)I3:0.1,((A:1.0)I2:0.4)I3:0.6):1.4,(((B:0.2)H1:0.6)I2:0.5)I3:2.1);");

julia> removedegree2nodes!(net, true);

julia> writenewick(net, round=true) # the root was kept
"((C:1.1,A:2.0):1.4,B:3.4);"

PhyloNetworks.rootatnode! — Method

rootatnode!(HybridNetwork, nodeNumber::Integer; index::Bool=false)
rootatnode!(HybridNetwork, Node)
rootatnode!(HybridNetwork, nodeName::AbstractString)

Root the network/tree object at the node with name 'nodeName' or number 'nodeNumber' (by default) or with index 'nodeNumber' if index=true. Attributes ischild1 and containroot are updated along the way. Use plot(net, shownodenumber=true, showedgelength=false) to visualize and identify a node of interest. (see package PhyloPlots)

Return the network.

Warnings:

If the node is a leaf, the root will be placed along the edge adjacent to the leaf. This might add a new node.
If the desired root placement is incompatible with one or more hybrids, then
- the original network is restored with its old root and edges' direction.
- a RootMismatch error is thrown.

See also: rootonedge!.

PhyloNetworks.rootonedge! — Method

rootonedge!(HybridNetwork, edgeNumber::Integer; index::Bool=false)
rootonedge!(HybridNetwork, Edge)

Root the network/tree along an edge with number edgeNumber (by default) or with index edgeNumber if index=true. Attributes ischild1 and containroot are updated along the way.

This adds a new node and a new edge to the network. Use plot(net, showedgenumber=true, showedgelength=false) to visualize and identify an edge of interest. (see package PhyloPlots)

See also: rootatnode!.

PhyloNetworks.rotate! — Method

rotate!(net::HybridNetwork, nodeNumber::Integer; orderedEdgeNum::Array{Int,1})

Rotates the order of the node's children edges. Useful for plotting, to remove crossing edges. If node is a tree node with no polytomy, the 2 children edges are switched and the optional argument orderedEdgeNum is ignored.

Use plot(net, shownodenumber=true, showedgenumber=false) to map node and edge numbers on the network, as shown in the examples below. (see package PhyloPlots)

Warning: assumes that edges are correctly directed (ischild1 updated). This is done by plot(net). Otherwise run directedges!(net).

Example

julia> net = readnewick("(A:1.0,((B:1.1,#H1:0.2::0.2):1.2,(((C:0.52,(E:0.5)#H2:0.02::0.7):0.6,(#H2:0.01::0.3,F:0.7):0.8):0.9,(D:0.8)#H1:0.3::0.8):1.3):0.7):0.1;");
julia> using PhyloPlots
julia> plot(net, shownodenumber=true)
julia> rotate!(net, -4)
julia> plot(net)
julia> net=readnewick("(4,((1,(2)#H7:::0.864):2.069,(6,5):3.423):0.265,(3,#H7:::0.136):10.0);");
julia> plot(net, shownodenumber=true, showedgenumber=true)
julia> rotate!(net, -1, orderedEdgeNum=[1,12,9])
julia> plot(net, shownodenumber=true, showedgenumber=true)
julia> rotate!(net, -3)
julia> plot(net)

Note that LinearAlgebra also exports a function named rotate! in Julia v1.5. If both packages need to be used in Julia v1.5 or higher, usage of rotate! needs to be qualified, such as with PhyloNetworks.rotate!.

PhyloNetworks.setgamma! — Function

setgamma!(Edge, new γ, change_other=true)

Set inheritance probability γ for an edge, which must be a hybrid edge. The new γ needs to be in [0,1]. The γ of the "partner" hybrid edge is changed accordingly, to 1-γ. The field ismajor is also changed accordingly. If the new γ is approximately 0.5, Edge is set to the major parent, its partner is set to the minor parent.

If net is a HybridNetwork object, printedges(net) will show the list of edges and their γ's. The γ of the third hybrid edge (say) can be changed to 0.2 with setgamma!(net.edge[3],0.2). This will automatically set γ of the partner hybrid edge to 0.8.

The last argument is true by default. If false: the partner edge is not updated. This is useful if the new γ is 0.5, and the partner's γ is already 0.5, in which case the ismajor attributes can remain unchanged.

See also PhyloNetworks.setmultiplegammas!

PhyloNetworks.setlength! — Method

setlength!(edge::Edge, new_length)

Assign new length to edge. new_length should be non-negative, or missing (or -1, interpreted as missing).

PhyloNetworks.setlengths! — Method

setlengths!(edges::Vector{Edge}, lengths::AbstractVector)

Assign new lengths to a vector of edges. Checks that the new edge lengths are non-negative or missing (or -1 to be interpreted as missing).

PhyloNetworks.sharedpathmatrix — Method

sharedpathmatrix(net::HybridNetwork; checkpreorder::Bool=true)

Matrix Ω of shared path lengths from the root to a pair of nodes, for all pairs of nodes in network net. This is the covariance under a Brownian motion model, returned as a data frame by vcv. The network is assumed to be pre-ordered if checkpreorder is false. If checkpreorder is true (default), preorder! is run on the network beforehand.

Returns an object of type MatrixTopologicalOrder.

See also descendenceweight, which returns a matrix P independent of edge lengths. P only depends on the network topology and γ. Then Ω = PDP' where D is a diagonal matrix with the network's edge lengths on its diagonal.

PhyloNetworks.shrink2cycles! — Function

shrink2cycles!(net::HybridNetwork, unroot::Bool=false)

If net contains a 2-cycle, collapse the cycle into one edge of length tA + γt1+(1-γ)t2 + tB (see below), and return true. Return false otherwise. A 2-cycle is a set of 2 parallel hybrid edges, from the same parent node to the same hybrid child node.

       A                A
       | tA             |
     parent             |
       | \              |
t2,1-γ |  | t1,γ        | tA + γ*t1 + (1-γ)*t2 + tB
       | /              |
     hybrid             |
       | tB             |
       B                B

If any of the lengths or gammas associated with a 2-cycle are missing, the combined length is missing. If γ is missing, branch lengths are calculated using γ=0.5.

If unroot is false and the root is up for deletion, it will be kept only if it is has degree 2 or more. If unroot is true and the root is up for deletion, it will be kept only if it has degree 3 or more. A root node with degree 1 will be deleted in both cases.

PhyloNetworks.shrink3cycles! — Function

shrink3cycles!(net::HybridNetwork, unroot::Bool=false)

Remove all 2- and 3-cycles from a network.

Return true if net contains a 2-cycle or a 3-cycle; false otherwise. A 3-cycle (2-cycle) is a set of 3 (2) nodes that are all connected. One of them must be a hybrid node, since net is a DAG.

If unroot is false and the root is up for deletion, it will be kept only if it is has degree 2 or more. If unroot is true and the root is up for deletion, it will be kept only if it has degree 3 or more. A root node with degree 1 will be deleted in both cases.

See shrink3cycleat! for details on branch lengths and inheritance values.

PhyloNetworks.tablequartetCF — Method

tablequartetCF(quartetlist::Vector{QuartetT} [, taxonnames];
               keepQwithoutgenes=true, colnames=nothing)

Convert a vector of QuartetT objects to a table with 1 row for each four-taxon set in the list. Each four-taxon set contains quartet data of some type (vector of length 3 or 4, or 3×n matrix), which determines the number of columns in the table (3, 4, or 3n). The first 3 columns are named "CF1234", "CF1324", "CF14_23", unless the full list of column names is provided in colnames. See tablequartetdata for details.

The output is a NamedTuple, to which we can apply common table operations as a Table.jl-compatible source. It can easily be converted to other table formats such as a DataFrame.

See countquartetsintrees for examples.

PhyloNetworks.tablequartetf4 — Method

tablequartetf4(quartetlist::Vector{QuartetT} [, taxonnames];
               keepQwithoutgenes=true, colnames=nothing)

Convert a vector of QuartetT objects to a table with 1 row for each four-taxon set in the list. Each four-taxon set contains quartet data of some type (vector of length 3 or 4, or 3×n matrix), which determines the number of columns in the table (3, 4, or 3n). The first 3 columns are named "f41234", "f41342" and "f41423", unless the full list of column names is provided in colnames. See tablequartetdata for details.

The output is a NamedTuple, to which we can apply common table operations as a Table.jl-compatible source. It can easily be converted to other table formats such as a DataFrame.

See countquartetsintrees for examples.

PhyloNetworks.tiplabels — Method

tiplabels(x)

Vector of taxon names at the leaves, defined for objects of various types: HybridNetwork, MatrixTopologicalOrder.

For a network, the taxon names are coerced to strings.

PhyloNetworks.treeedgecomponents — Method

treeedgecomponents(net::HybridNetwork)

Return the tree-edge components of the semidirected network as a membership dictionary Node => Int. Nodes with the same membership integer value are in the same tree-edge component. The tree-edge components of a network are the connected components of the network when all hybrid edges are removed.

A RootMismatch error is thrown if there exists a cycle in any of the tree-edge components, or if a tree-edge component has more than one "entry" hybrid node.

Warnings:

since Nodes are mutable, the network should not be modified until usage of the output membership dictionary is over.
the component IDs are not predicable, but will be consecutive integers from 1 to the number of components.

PhyloNetworks.treeedges_support — Method

treeedges_support(sample_net::Vector{HybridNetwork}, ref_net::HybridNetwork)`

Read a sample of networks sample_net (such as a bootstrap sample or a posterior sample) and a reference network (ref_net), and calculate the support for the tree edges in the reference network. All minor hybrid edges (γ<0.5) are removed to extract the major tree from each network. All remaining edges are tree edges, each associated with a bipartition.

output:

a data frame with one row per tree edge and two columns: edge number, bootstrap support (as a percentage)
the major tree from the reference network, where minor hybrid edges (with γ<0.5) have been removed.

PhyloNetworks.treeofblobs — Method

treeofblobs(net::HybridNetwork; preorder::Bool=true, preprocess::Bool=true)

Tree of blobs for net. This tree is obtained from the network by shrinking any blob (2-edge connected component) that is not trivial (more than 1 edge) into a single node. The remaining edges are cut-edges (or bridges): they disconnect the network if removed. After shrinking blobs, only these cut-edges remain (with their edge length if any) and the graph forms a tree: the tree-of-blobs. It typically has polytomies, and may have degree-2 nodes. See Allman et al. (2022) (on binary networks) and Xu & Ané (2023).

If the network is binary, each blob is a block (biconnected component). If the network is not binary, then multiple blocks forming a single blob are shrunk into the same node.

The network is not modified. PhyloNetworks.treeofblobs!(net) modifies net, but is unsafe to use, in the sense that it assumes that the network is already preordered and preprocessed to get its biconnected component.

In the output tree, the biconnected component information is stored.

Arguments:

preorder to direct edges according to the root and store a pre-ordering of nodes in net.vec_node, if not done earlier on the network (e.g. after a topology modification)
preprocess to recalculate & store the biconnected components with process_biconnectedcomponents!, assuming the preordering was done.

examples

In this first example, the network is binary (no polytomies of any kind) and has 2 blobs.

julia> net = readnewick("((S1:0.1,(((S2,(S3)#H1),(#H1,S4:0.4)):0.2)#H2:0.2),(#H2,S5));");

julia> tob = treeofblobs(net);

julia> writenewick(tob) # 2 polytomies: at root, and to S2,S3,S4
"(S5,S1:0.1,(S4:0.4,S2,S3):0.2);"

Now we use the same network but without the edge connecting the 2 blobs: there are now 2 blocks sharing one node, so 1 blob only.

julia> net = readnewick("((S1:0.1,((S2,(S3)#H1),(#H1,S4:0.4))#H2:0.2),(#H2,S5));");

julia> # using PhyloPlots; plot(net, style=:majortree, arrowlen=0.1);

julia> tob = treeofblobs(net);

julia> writenewick(tob) # star tree: 1 big polytomy
"(S5,S1:0.1,S4:0.4,S2,S3);"

PhyloNetworks.vcv — Method

vcv(net::HybridNetwork; model::AbstractString="BM",
                        corr::Bool=false,
                        checkpreorder::Bool=true)

This function computes the variance covariance matrix between the tips of the network, assuming a Brownian model of trait evolution (with unit variance). If optional argument corr is set to true, then the correlation matrix is returned instead.

The function returns a DataFrame object, with columns named by the tips of the network.

The calculation of the covariance matrix requires a pre-ordering of nodes to be fast. If checkpreorder is true (default), then preorder! is run on the network beforehand. Otherwise, the network is assumed to be already in pre-order.

This function internally calls sharedpathmatrix, which computes the variance matrix between all the nodes of the network.

Examples

julia> tree_str = "(((t2:0.14,t4:0.33):0.59,t3:0.96):0.14,(t5:0.70,t1:0.18):0.90);";

julia> tree = readnewick(tree_str);

julia> C = vcv(tree)
5×5 DataFrame
 Row │ t2       t4       t3       t5       t1      
     │ Float64  Float64  Float64  Float64  Float64 
─────┼─────────────────────────────────────────────
   1 │    0.87     0.73     0.14      0.0     0.0
   2 │    0.73     1.06     0.14      0.0     0.0
   3 │    0.14     0.14     1.1       0.0     0.0
   4 │    0.0      0.0      0.0       1.6     0.9
   5 │    0.0      0.0      0.0       0.9     1.08

The following block needs ape to be installed (not run):

julia> using RCall # Comparison with ape vcv function

julia> R"ape::vcv(ape::read.tree(text = $tree_str))"
RCall.RObject{RCall.RealSxp}
     t2   t4   t3  t5   t1
t2 0.87 0.73 0.14 0.0 0.00
t4 0.73 1.06 0.14 0.0 0.00
t3 0.14 0.14 1.10 0.0 0.00
t5 0.00 0.00 0.00 1.6 0.90
t1 0.00 0.00 0.00 0.9 1.08

The covariance can also be calculated on a network (for the model, see Bastide et al. 2018)

julia> net = readnewick("((t1:1.0,#H1:0.1::0.30):0.5,((t2:0.9)#H1:0.2::0.70,t3:1.1):0.4);");

julia> C = vcv(net)
3×3 DataFrame
 Row │ t1       t2       t3      
     │ Float64  Float64  Float64 
─────┼───────────────────────────
   1 │    1.5     0.15      0.0
   2 │    0.15    1.248     0.28
   3 │    0.0     0.28      1.5

PhyloNetworks.writemultinewick — Method

writemultinewick(nets, file_name; append=false)
writemultinewick(nets, IO)

Write an array of networks in parenthetical extended Newick format, one network per line. Use the option append=true to append to the file. Otherwise, the default is to create a new file or overwrite it, if it already existed. Each network is written with writenewick.

Examples

julia> net = [readnewick("(D,((A,(B)#H7:::0.864):2.069,(F,E):3.423):0.265,(C,#H7:::0.1361111):10);"),
              readnewick("(A,(B,C));"),readnewick("(E,F);"),readnewick("(G,H,F);")];

julia> writemultinewick(net, "fournets.net") # to (over)write to file "fournets.net"
julia> writemultinewick(net, "fournets.net", append=true) # to append to this file
julia> writemultinewick(net, stdout)         # to write to the screen (standard out)
(D,((A,(B)#H7:::0.864):2.069,(F,E):3.423):0.265,(C,#H7:::0.1361111):10.0);
(A,(B,C));
(E,F);
(G,H,F);

PhyloNetworks.writenewick — Method

writenewick(net)
writenewick(net, filename)
writenewick(net, IO)

Write the parenthetical extended Newick format of a network, as a string, to a file or to an IO buffer / stream. Optional arguments (default values):

di (false): write in format for Dendroscope
round (false): rounds branch lengths and heritabilities γ
digits (3): digits after the decimal place for rounding
append (false): if true, appends to the file
internallabel (true): if true, writes internal node labels

If the current root placement is not admissible, other placements are tried. The network is updated with this new root placement, if successful.

Uses lower-level function writesubtree!.

PhyloNetworks.writesubtree! — Method

writesubtree!(IO, node, edge, dendroscope::Bool, namelabel::Bool,
              round_branch_lengths::Bool, digits::Integer, internallabel::Bool)

Write the extended newick format of the sub-network rooted at node and assuming that edge is a parent of node.

If the parent edge is nothing, the edge attribute ischild1 is used and assumed to be correct to write the subtree rooted at node. This is useful to write a subtree starting at a non-root node. Example:

net = readnewick("(((A,(B)#H1:::0.9),(C,#H1:::0.1)),D);")
directedges!(net)
s = IOBuffer()
writesubtree!(s, net.node[7], nothing, false, true)
String(take!(s))

Used by writenewick.

PhyloNetworks.writesubtree! — Method

writesubtree!(IO, network, dendroscope::Bool, namelabel::Bool,
              round_branch_lengths::Bool, digits::Integer,
              internallabel::Bool)

Write to IO the extended newick format (parenthetical description) of a network. If written for dendroscope, inheritance γ's are not written. If namelabel is true, taxa are labelled by their names; otherwise taxa are labelled by their number IDs. If unspecified, branch lengths and γ's are rounded to 3 digits. Use internallabel=false to suppress the labels of internal nodes.

PhyloNetworks.HybridNetwork — Type

HybridNetwork

Subtype of abstract Network type. Explicit network or tree with the following attributes:

numtaxa: number of taxa, that is, number of are leaves (or tips). Leaves are required to be attached to a single edge.
numnodes: total number of nodes: tips and internal nodes
numedges: total number of edges
numhybrids: total number of hybrid nodes
edge: vector of Edges
node: vector of Nodes
rooti: index of the root in vector 'node'. May be artificial in a semidirected network, but is necessary for printing and traversal purposes.
hybrid: vector of Nodes: those are are hybrid nodes
leaf: vector of Nodes: those that are leaves
fscore: score after fitting network to data, i.e. parsimony score, or multipe of the negative log pseudodeviance for SNaQ
isrooted: true or false
partition: vector of Partition

index

PhyloNetworks.HybridNetwork
PhyloNetworks.biconnectedcomponents
PhyloNetworks.biconnectedcomponents
PhyloNetworks.blobdecomposition
PhyloNetworks.calibratefrompairwisedistances!
PhyloNetworks.checkroot!
PhyloNetworks.cladewiseorder!
PhyloNetworks.countquartetsintrees
PhyloNetworks.deleteaboveLSA!
PhyloNetworks.deletehybridthreshold!
PhyloNetworks.deleteleaf!
PhyloNetworks.descendencematrix
PhyloNetworks.directedges!
PhyloNetworks.displayednetworkat!
PhyloNetworks.displayedtrees
PhyloNetworks.expectedf2matrix
PhyloNetworks.expectedf4table
PhyloNetworks.getchild
PhyloNetworks.getchildedge
PhyloNetworks.getchildren
PhyloNetworks.getlevel
PhyloNetworks.getnodeages
PhyloNetworks.getnodeheights
PhyloNetworks.getnodeheights!
PhyloNetworks.getnodeheights_average
PhyloNetworks.getnodeheights_majortree
PhyloNetworks.getparent
PhyloNetworks.getparentedge
PhyloNetworks.getparentedgeminor
PhyloNetworks.getparentminor
PhyloNetworks.getparents
PhyloNetworks.getpartneredge
PhyloNetworks.getroot
PhyloNetworks.hardwiredcluster
PhyloNetworks.hardwiredclusterdistance
PhyloNetworks.hardwiredclusters
PhyloNetworks.hashybridladder
PhyloNetworks.hassinglechild
PhyloNetworks.hybridclades_support
PhyloNetworks.hybridlambdaformat
PhyloNetworks.ischildof
PhyloNetworks.isexternal
PhyloNetworks.isgalled
PhyloNetworks.isleaf
PhyloNetworks.isparentof
PhyloNetworks.isrootof
PhyloNetworks.istimeconsistent
PhyloNetworks.istreechild
PhyloNetworks.majortree
PhyloNetworks.minortreeat
PhyloNetworks.mudistance_rooted
PhyloNetworks.mudistance_semidirected
PhyloNetworks.nameinternalnodes!
PhyloNetworks.nj
PhyloNetworks.nni!
PhyloNetworks.nni!
PhyloNetworks.nni!
PhyloNetworks.pairwisetaxondistancematrix
PhyloNetworks.parsimonyGF
PhyloNetworks.parsimonysoftwired
PhyloNetworks.preorder!
PhyloNetworks.printedges
PhyloNetworks.printnodes
PhyloNetworks.readfastatodna
PhyloNetworks.readmultinewick
PhyloNetworks.readmultinewick_files
PhyloNetworks.readnewick
PhyloNetworks.readnexus_treeblock
PhyloNetworks.removedegree2nodes!
PhyloNetworks.rootatnode!
PhyloNetworks.rootonedge!
PhyloNetworks.rotate!
PhyloNetworks.setgamma!
PhyloNetworks.setlength!
PhyloNetworks.setlengths!
PhyloNetworks.sharedpathmatrix
PhyloNetworks.shrink2cycles!
PhyloNetworks.shrink3cycles!
PhyloNetworks.tablequartetCF
PhyloNetworks.tablequartetf4
PhyloNetworks.tiplabels
PhyloNetworks.treeedgecomponents
PhyloNetworks.treeedges_support
PhyloNetworks.treeofblobs
PhyloNetworks.vcv
PhyloNetworks.writemultinewick
PhyloNetworks.writenewick
PhyloNetworks.writesubtree!
PhyloNetworks.writesubtree!