Comparing and manipulating networks

Examples below use networks available from the package. We can load them as follows. astraltree is a tree estimated using ASTRAL, net0 is a network estimated using SNaQ under the constraint of 0 reticulation, so it's also a tree, and net1 is a network estimated under the constraint of 1 reticulation, so not a tree.

All of them are to be interpreted as semidirected networks, because their root is not identifiable by the method used to estimate them. This affects how we want them.

astralfile = joinpath(dirname(pathof(PhyloNetworks)), "..","examples","astral.tre")
astraltree = readmultinewick(astralfile)[102] # 102th tree = last tree here
net0 = readnewick(joinpath(dirname(pathof(PhyloNetworks)), "..","examples","net0.out"))
net1 = readnewick(joinpath(dirname(pathof(PhyloNetworks)), "..","examples","net1.out"))

Robinson-Foulds distance for trees

Do the 2 trees astraltree and net0 have the same topology, unrooted? We can calculate the Robinson-Foulds distance between them, using a function that extends the Robinson-Foulds distance to general networks (more on this below).

julia> hardwiredclusterdistance(astraltree, net0, false)0

The last option false is to consider topologies as unrooted. The RF distance is 0, so the two unrooted topologies are the same. If we had considered them as rooted, with whatever root they currently have in their internal representation, we would find a difference:

julia> hardwiredclusterdistance(astraltree, net0, true)5

Re-rooting trees and networks

We can re-root our networks with the outgroup, O, and then re-compare the ASTRAL tree and the SNaQ tree as rooted topologies (and find no difference):

julia> rootatnode!(astraltree, "O")HybridNetwork, Rooted Network
10 edges
11 nodes: 6 tips, 0 hybrid nodes, 5 internal tree nodes.
tip labels: E, O, C, D, ...
(O,(((C,D)100.0:3.029,(B,A)100.0:1.324)100.0:0.464,E));
julia> rootatnode!(net0, "O")HybridNetwork, Rooted Network 10 edges 11 nodes: 6 tips, 0 hybrid nodes, 5 internal tree nodes. tip labels: C, D, B, A, ... (O,(E,((B,A):1.396,(C,D):10.0):0.485));
julia> hardwiredclusterdistance(astraltree, net0, true)0
using PhyloPlots, RCall
R"name <- function(x) file.path('..', 'assets', 'figures', x)"
R"svg(name('net0_O.svg'), width=4, height=4)"
R"par"(mar=[0,0,0,0])
plot(net0);
R"dev.off()"

net0_O

Note that we use the possibilities of RCall to save the plot. We only show these commands once, but they will be run behind the scene each time a plot is called.

After trees/networks are rooted with a correct outgroup, their visualization is more meaningful.

Networks can be re-rooted at a given node or along a given edge. Get help (type ?) on the functions rootatnode! and rootonedge! for more info. There are examples in the Network support section.

If the network is plotted with crossing edges, you may identify ways to rotate the children edges at some nodes to untangle some crossing edges. This can be done using the function rotate!. See an example in the Network support section, or type ? then rotate!.

Does the root conflict with the direction of a reticulation?

With 1 hybridization or more, the direction of hybrid edges constrain the position of the root. The root cannot be downstream of hybrid edges. Any hybrid node has to be younger than, or of the same age as both of its parents. So time has to flow "downwards" of any hybrid node, and the root cannot be placed "below" a hybrid node. An attempt to re-root the network at a position incompatible with hybrid edges will fail, with a RootMismatch error. To show an example, let's use the network below. We plotted the edge numbers, because we will want to use them later to place the root.

net7taxa = readnewick("(C,D,((O,(E,#H7:::0.196):0.314):0.664,(((A1,A2))#H7:::0.804,B):10.0):10.0);")
plot(net7taxa, showgamma=true, showedgenumber=true, tipoffset=0.2);

reroot net7taxa 1

Let's imagine that A1 and A2 are our outgroups, and we estimated the network above. According to this network, time must flow from the hybrid node towards A1 and A2. So any attempt to reroot the network with A1 as outgroup, or with A2 as outgroup, or with the A clade (on edge 11), will fail with a RootMismatch error:

rootatnode!(net7taxa, "A1"); # ERROR: RootMismatch: non-leaf node 5 had 0 children. ...
rootatnode!(net7taxa, "A2"); # ERROR: RootMismatch (again)
rootonedge!(net7taxa, 10);   # ERROR: RootMismatch (again)

In this case, however, it is possible to root the network on either parent edge of the hybrid node. These edges have numbers 12 and 5, based on the plot above. We get these 2 rooted versions of the network:

R"layout(matrix(1:2,1,2))";
rootonedge!(net7taxa, 11);
rotate!(net7taxa, -5)
plot(net7taxa, showgamma=true, tipoffset=0.2, shownodenumber=true);
R"mtext"("rooted on hybrid edge 11 (major)", line=-1)
rootonedge!(net7taxa, 5);
plot(net7taxa, showgamma=true, tipoffset=0.2);
R"mtext"("rooted on hybrid edge 5 (minor)", line=-1);

reroot net7taxa 2

On the second plot, the A clade does not appear to be an outgroup, but this is just because the plot follows the major tree primarily, based the major hybrid edges (those with γ>0.5). We can display the exact same network differently, by changing the γ inheritance values to invert the major/minor consideration of the hybrid edges.

net7taxa.edge[5] # just to check that it's one of the 2 hybrid edges of interest
setgamma!(net7taxa.edge[5], 0.501) # switch major/minor edges
plot(net7taxa, tipoffset=0.2); # not showing gamma values, because we changed them artificially
R"mtext"("rooted on hybrid edge 5 (considered major)", line=-1);

reroot net7taxa 3

Conclusion, in this particular example: it is possible to re-root the network to a place where the A clade is indeed an outgroup. But it did require some care, and we discovered that there are 2 acceptable rooting options. The first is more plausible, if we think that the species tree is the major tree, meaning that any gene flow or introgression event replaced less than 50% of the genes in the recipient population.

In other cases, it may not be possible to re-root the network with a known outgroup. It would be the case if A1 was the only outgroup, and if A2 was an ingroup taxon. In such a case, the outgroup knowledge tells us that our estimated network is wrong. One (or more) reticulation in the network must be incorrect. Its placement might be correct, but then its direction would be incorrect.

Extracting the major tree

We can also compare the networks estimated with h=0 (net0) and h=1 (net1):

julia> rootatnode!(net1, "O"); # the ; suppresses screen output
julia> hardwiredclusterdistance(net0, net1, true)2
plot(net1, showgamma=true);

net1_O

They differ by 2 clusters: that's because A is of hybrid descent in net1 (descendant of each hybrid edge), not in net0.

To beyond this hybrid difference, we can extract the major tree from the network with 1 hybridization, that is, delete the hybrid edge supported by less than 50% of genes. Then we can compare this tree with the ASTRAL/SNaQ tree net0.

julia> tree1 = majortree(net1); # major tree from net1
julia> hardwiredclusterdistance(net0, tree1, true)0

They are identical (at distance 0), so here the species network with 1 hybrid node is a refinement of the estimated species tree (this needs not be the case always).

Hardwired-cluster distance

Is net1, the SNaQ network with 1 hybrid node, the same as the true network, the network that was initially used to simulate the data?

(Digression on this example's data: gene trees were simulated under the coalescent along some "true" network, then 500 base-pair alignments were simulated along each gene tree with the HKY model, gene trees were estimated from each alignment with RAxML, and these estimated gene trees served as input to both ASTRAL and SNaQ.)

The "true" network is shown below, correctly rooted at the outgroup O, and plotted with branch lengths proportional to their values in coalescence units:

julia> truenet = readnewick("((((D:0.4,C:0.4):4.8,((A:0.8,B:0.8):2.2)#H1:2.2::0.7):4.0,(#H1:0::0.3,E:3.0):6.2):2.0,O:11.2);");
plot(truenet, useedgelength=true, showgamma=true);

truenet

We can compare two networks using the hardwired-cluster dissimilarity.

  • This dissimilarity counts the number of hardwired clusters present in one network but absent in the other.
  • The hardwired cluster associated with an edge is the set of all tips descendant from that edge, i.e. all tips that inherited at least some genetic material from that edge.
  • Each edge is associated with its hardwired cluster of descendants, and also with a "tree" tag or "hybrid" tag depending on the edge type.
distance versus dissimilarity

When comparing level-1 networks, or tree-child networks more generally, the hardwired-cluster dissimilarity is a distance: d(N, N') = 0 exactly when N and N' have the same topology. (Cardona et al. 2009, Bai et al. Maxfield et al. 2024). Unfortunately, this is not generally true for complex network: there are networks at hardwired-cluster 'distance' 0, that have different topologies. But no dissimilarity measure can both be fast to calculate and be a distance on the full space of phylogenetic networks (Cardona et al. 2014).

julia> hardwiredclusterdistance(net1, truenet, true) # true: yes consider these networks as rooted4

Our estimated network is at distance 4 (not 0), so it is different from the true network (there was estimation error). From the plots, we see that:

  • the underlying tree is correctly estimated
  • the origin of gene flow is correctly estimated: E
  • the target of gene flow is not correctly estimated: it was the lineage ancestral to (A,B), but it is estimated to be A only.

The 4 cluster differences correspond to the 2 hybrid edges in net1 (whose cluster is A) plus the 2 hybrid edges in truenet (whose cluster is AB).

Displayed trees and subnetworks

We can extract all trees displayed in a network. These trees are obtained by picking one parent hybrid edge at each hybrid node, and dropping the other parent hybrid edge. We can choose to pick the "important" hybrid edges only, with heritability γ at or above a threshold. Below we use a γ threshold of 0, so we get all displayed trees:

julia> t = displayedtrees(net1, 0.0) # list of trees displayed in network2-element Vector{HybridNetwork}:
 HybridNetwork, Rooted Network
10 edges
11 nodes: 6 tips, 0 hybrid nodes, 5 internal tree nodes.
tip labels: C, D, O, E, ...
(O,(((B,A):10.0,(C,D):10.0):0.664,E));

 HybridNetwork, Rooted Network
10 edges
11 nodes: 6 tips, 0 hybrid nodes, 5 internal tree nodes.
tip labels: C, D, O, E, ...
(O,((E,A):0.314,((C,D):10.0,B):0.664));
julia> writenewick(t[1], round=true)"(O,(((B,A):10.0,(C,D):10.0):0.664,E));"
julia> writenewick(t[2], round=true)"(O,((E,A):0.314,((C,D):10.0,B):0.664));"

If we decide to keep edges with γ>0.2 only, then we are left with a single tree in the list (the major tree). This is because our example has 1 hybrid node with minor γ=0.196.

julia> t = displayedtrees(net1, 0.2)1-element Vector{HybridNetwork}:
 HybridNetwork, Rooted Network
10 edges
11 nodes: 6 tips, 0 hybrid nodes, 5 internal tree nodes.
tip labels: C, D, O, E, ...
(O,(((B,A):10.0,(C,D):10.0):0.664,E));

We can also delete all "non-important" reticulations, those with a minor heritability γ below some threshold. The function below changes our network net1, as indicated by its name ending with a !.

julia> deletehybridthreshold!(net1, 0.1)HybridNetwork, Rooted Network
13 edges
13 nodes: 6 tips, 1 hybrid nodes, 6 internal tree nodes.
tip labels: C, D, O, E, ...
(O,((E,#H7:::0.196):0.314,((B,(A)#H7:::0.804):10.0,(C,D):10.0):0.664));

Nothing happened to our network: because its γ is above 0.1. But if we set the threshold to 0.3, then our reticulation disappears:

julia> deletehybridthreshold!(net1, 0.3)HybridNetwork, Rooted Network
10 edges
11 nodes: 6 tips, 0 hybrid nodes, 5 internal tree nodes.
tip labels: C, D, O, E, ...
(O,(((B,A):10.0,(C,D):10.0):0.664,E));

See also function displayednetworkat! to get the network with a single reticulation of interest, and eliminate all other reticulations.