Comparing and manipulating networks
Examples below use networks available from the package. We can load them as follows. astraltree
is a tree estimated using ASTRAL, net0
is a network estimated using SNaQ under the constraint of 0 reticulation, so it's also a tree, and net1
is a network estimated under the constraint of 1 reticulation, so not a tree.
All of them are to be interpreted as semidirected networks, because their root is not identifiable by the method used to estimate them. This affects how we want them.
astralfile = joinpath(dirname(pathof(PhyloNetworks)), "..","examples","astral.tre")
astraltree = readmultinewick(astralfile)[102] # 102th tree = last tree here
net0 = readnewick(joinpath(dirname(pathof(PhyloNetworks)), "..","examples","net0.out"))
net1 = readnewick(joinpath(dirname(pathof(PhyloNetworks)), "..","examples","net1.out"))
Robinson-Foulds distance for trees
Do the 2 trees astraltree
and net0
have the same topology, unrooted? We can calculate the Robinson-Foulds distance between them, using a function that extends the Robinson-Foulds distance to general networks (more on this below).
julia> hardwiredclusterdistance(astraltree, net0, false)
0
The last option false
is to consider topologies as unrooted. The RF distance is 0, so the two unrooted topologies are the same. If we had considered them as rooted, with whatever root they currently have in their internal representation, we would find a difference:
julia> hardwiredclusterdistance(astraltree, net0, true)
5
Re-rooting trees and networks
We can re-root our networks with the outgroup, O, and then re-compare the ASTRAL tree and the SNaQ tree as rooted topologies (and find no difference):
julia> rootatnode!(astraltree, "O")
HybridNetwork, Rooted Network 10 edges 11 nodes: 6 tips, 0 hybrid nodes, 5 internal tree nodes. tip labels: E, O, C, D, ... (O,(((C,D)100.0:3.029,(B,A)100.0:1.324)100.0:0.464,E));
julia> rootatnode!(net0, "O")
HybridNetwork, Rooted Network 10 edges 11 nodes: 6 tips, 0 hybrid nodes, 5 internal tree nodes. tip labels: C, D, B, A, ... (O,(E,((B,A):1.396,(C,D):10.0):0.485));
julia> hardwiredclusterdistance(astraltree, net0, true)
0
using PhyloPlots, RCall
R"name <- function(x) file.path('..', 'assets', 'figures', x)"
R"svg(name('net0_O.svg'), width=4, height=4)"
R"par"(mar=[0,0,0,0])
plot(net0);
R"dev.off()"
Note that we use the possibilities of RCall
to save the plot. We only show these commands once, but they will be run behind the scene each time a plot is called.
After trees/networks are rooted with a correct outgroup, their visualization is more meaningful.
Networks can be re-rooted at a given node or along a given edge. Get help (type ?
) on the functions rootatnode!
and rootonedge!
for more info. There are examples in the Network support section.
If the network is plotted with crossing edges, you may identify ways to rotate the children edges at some nodes to untangle some crossing edges. This can be done using the function rotate!
. See an example in the Network support section, or type ?
then rotate!
.
Does the root conflict with the direction of a reticulation?
With 1 hybridization or more, the direction of hybrid edges constrain the position of the root. The root cannot be downstream of hybrid edges. Any hybrid node has to be younger than, or of the same age as both of its parents. So time has to flow "downwards" of any hybrid node, and the root cannot be placed "below" a hybrid node. An attempt to re-root the network at a position incompatible with hybrid edges will fail, with a RootMismatch
error. To show an example, let's use the network below. We plotted the edge numbers, because we will want to use them later to place the root.
net7taxa = readnewick("(C,D,((O,(E,#H7:::0.196):0.314):0.664,(((A1,A2))#H7:::0.804,B):10.0):10.0);")
plot(net7taxa, showgamma=true, showedgenumber=true, tipoffset=0.2);
Let's imagine that A1 and A2 are our outgroups, and we estimated the network above. According to this network, time must flow from the hybrid node towards A1 and A2. So any attempt to reroot the network with A1 as outgroup, or with A2 as outgroup, or with the A clade (on edge 11), will fail with a RootMismatch
error:
rootatnode!(net7taxa, "A1"); # ERROR: RootMismatch: non-leaf node 5 had 0 children. ...
rootatnode!(net7taxa, "A2"); # ERROR: RootMismatch (again)
rootonedge!(net7taxa, 10); # ERROR: RootMismatch (again)
In this case, however, it is possible to root the network on either parent edge of the hybrid node. These edges have numbers 12 and 5, based on the plot above. We get these 2 rooted versions of the network:
R"layout(matrix(1:2,1,2))";
rootonedge!(net7taxa, 11);
rotate!(net7taxa, -5)
plot(net7taxa, showgamma=true, tipoffset=0.2, shownodenumber=true);
R"mtext"("rooted on hybrid edge 11 (major)", line=-1)
rootonedge!(net7taxa, 5);
plot(net7taxa, showgamma=true, tipoffset=0.2);
R"mtext"("rooted on hybrid edge 5 (minor)", line=-1);
On the second plot, the A clade does not appear to be an outgroup, but this is just because the plot follows the major tree primarily, based the major hybrid edges (those with γ>0.5). We can display the exact same network differently, by changing the γ inheritance values to invert the major/minor consideration of the hybrid edges.
net7taxa.edge[5] # just to check that it's one of the 2 hybrid edges of interest
setgamma!(net7taxa.edge[5], 0.501) # switch major/minor edges
plot(net7taxa, tipoffset=0.2); # not showing gamma values, because we changed them artificially
R"mtext"("rooted on hybrid edge 5 (considered major)", line=-1);
Conclusion, in this particular example: it is possible to re-root the network to a place where the A clade is indeed an outgroup. But it did require some care, and we discovered that there are 2 acceptable rooting options. The first is more plausible, if we think that the species tree is the major tree, meaning that any gene flow or introgression event replaced less than 50% of the genes in the recipient population.
In other cases, it may not be possible to re-root the network with a known outgroup. It would be the case if A1 was the only outgroup, and if A2 was an ingroup taxon. In such a case, the outgroup knowledge tells us that our estimated network is wrong. One (or more) reticulation in the network must be incorrect. Its placement might be correct, but then its direction would be incorrect.
Extracting the major tree
We can also compare the networks estimated with h=0 (net0
) and h=1 (net1
):
julia> rootatnode!(net1, "O"); # the ; suppresses screen output
julia> hardwiredclusterdistance(net0, net1, true)
2
plot(net1, showgamma=true);
They differ by 2 clusters: that's because A is of hybrid descent in net1
(descendant of each hybrid edge), not in net0
.
To beyond this hybrid difference, we can extract the major tree from the network with 1 hybridization, that is, delete the hybrid edge supported by less than 50% of genes. Then we can compare this tree with the ASTRAL/SNaQ tree net0
.
julia> tree1 = majortree(net1); # major tree from net1
julia> hardwiredclusterdistance(net0, tree1, true)
0
They are identical (at distance 0), so here the species network with 1 hybrid node is a refinement of the estimated species tree (this needs not be the case always).
Hardwired-cluster distance
Is net1
, the SNaQ network with 1 hybrid node, the same as the true network, the network that was initially used to simulate the data?
(Digression on this example's data: gene trees were simulated under the coalescent along some "true" network, then 500 base-pair alignments were simulated along each gene tree with the HKY model, gene trees were estimated from each alignment with RAxML, and these estimated gene trees served as input to both ASTRAL and SNaQ.)
The "true" network is shown below, correctly rooted at the outgroup O, and plotted with branch lengths proportional to their values in coalescence units:
julia> truenet = readnewick("((((D:0.4,C:0.4):4.8,((A:0.8,B:0.8):2.2)#H1:2.2::0.7):4.0,(#H1:0::0.3,E:3.0):6.2):2.0,O:11.2);");
plot(truenet, useedgelength=true, showgamma=true);
We can compare two networks using the hardwired-cluster dissimilarity.
- This dissimilarity counts the number of hardwired clusters present in one network but absent in the other.
- The hardwired cluster associated with an edge is the set of all tips descendant from that edge, i.e. all tips that inherited at least some genetic material from that edge.
- Each edge is associated with its hardwired cluster of descendants, and also with a "tree" tag or "hybrid" tag depending on the edge type.
When comparing level-1 networks, or tree-child networks more generally, the hardwired-cluster dissimilarity is a distance: d(N, N') = 0 exactly when N and N' have the same topology. (Cardona et al. 2009, Bai et al. Maxfield et al. 2024). Unfortunately, this is not generally true for complex network: there are networks at hardwired-cluster 'distance' 0, that have different topologies. But no dissimilarity measure can both be fast to calculate and be a distance on the full space of phylogenetic networks (Cardona et al. 2014).
julia> hardwiredclusterdistance(net1, truenet, true) # true: yes consider these networks as rooted
4
Our estimated network is at distance 4 (not 0), so it is different from the true network (there was estimation error). From the plots, we see that:
- the underlying tree is correctly estimated
- the origin of gene flow is correctly estimated: E
- the target of gene flow is not correctly estimated: it was the lineage ancestral to (A,B), but it is estimated to be A only.
The 4 cluster differences correspond to the 2 hybrid edges in net1
(whose cluster is A) plus the 2 hybrid edges in truenet
(whose cluster is AB).
Displayed trees and subnetworks
We can extract all trees displayed in a network. These trees are obtained by picking one parent hybrid edge at each hybrid node, and dropping the other parent hybrid edge. We can choose to pick the "important" hybrid edges only, with heritability γ at or above a threshold. Below we use a γ threshold of 0, so we get all displayed trees:
julia> t = displayedtrees(net1, 0.0) # list of trees displayed in network
2-element Vector{HybridNetwork}: HybridNetwork, Rooted Network 10 edges 11 nodes: 6 tips, 0 hybrid nodes, 5 internal tree nodes. tip labels: C, D, O, E, ... (O,(((B,A):10.0,(C,D):10.0):0.664,E)); HybridNetwork, Rooted Network 10 edges 11 nodes: 6 tips, 0 hybrid nodes, 5 internal tree nodes. tip labels: C, D, O, E, ... (O,((E,A):0.314,((C,D):10.0,B):0.664));
julia> writenewick(t[1], round=true)
"(O,(((B,A):10.0,(C,D):10.0):0.664,E));"
julia> writenewick(t[2], round=true)
"(O,((E,A):0.314,((C,D):10.0,B):0.664));"
If we decide to keep edges with γ>0.2 only, then we are left with a single tree in the list (the major tree). This is because our example has 1 hybrid node with minor γ=0.196.
julia> t = displayedtrees(net1, 0.2)
1-element Vector{HybridNetwork}: HybridNetwork, Rooted Network 10 edges 11 nodes: 6 tips, 0 hybrid nodes, 5 internal tree nodes. tip labels: C, D, O, E, ... (O,(((B,A):10.0,(C,D):10.0):0.664,E));
We can also delete all "non-important" reticulations, those with a minor heritability γ below some threshold. The function below changes our network net1
, as indicated by its name ending with a !
.
julia> deletehybridthreshold!(net1, 0.1)
HybridNetwork, Rooted Network 13 edges 13 nodes: 6 tips, 1 hybrid nodes, 6 internal tree nodes. tip labels: C, D, O, E, ... (O,((E,#H7:::0.196):0.314,((B,(A)#H7:::0.804):10.0,(C,D):10.0):0.664));
Nothing happened to our network: because its γ is above 0.1. But if we set the threshold to 0.3, then our reticulation disappears:
julia> deletehybridthreshold!(net1, 0.3)
HybridNetwork, Rooted Network 10 edges 11 nodes: 6 tips, 0 hybrid nodes, 5 internal tree nodes. tip labels: C, D, O, E, ... (O,(((B,A):10.0,(C,D):10.0):0.664,E));
See also function displayednetworkat!
to get the network with a single reticulation of interest, and eliminate all other reticulations.