consensus phylogenies

consensus of phylogenetic trees

Given a set of input phylogenies that are all trees, we can get their greedy consensus or their majority-rule consensus with consensustree.

To give an example, we will use a set of trees from an example file that comes with the package:

julia> using PhyloNetworks
julia> inputfile = joinpath(dirname(pathof(PhyloSummaries)), "..","test","raxmltrees.tre");
julia> treesample = readmultinewick(inputfile);
julia> length(treesample) # 30 trees30
julia> treesample[1] # first tree in the listPhyloNetworks.HybridNetwork, Rooted Network
9 edges
10 nodes: 6 tips, 0 hybrid nodes, 4 internal tree nodes.
tip labels: E, A, B, C, ...
(E:0.038,((A:0.014,B:0.01):0.01,(C:0.008,D:0.002):0.01):0.025,O:0.078);

To visualize trees and network, we use package PhyloPlots.

using PhyloPlots
using RCall            # to tweak our plot within R

R"layout"([1 2])       # figure of 2 panels
R"par"(mar=[0,0,1,0])  # for smaller margins
plot(treesample[1], showedgelength=true);
R"mtext"("tree 1")     # add text annotation: title here
plot(treesample[2], showedgelength=true);
R"mtext"("tree 2")

trees 1-2, from raxml.tre

By default, we get the greedy consensus tree of our input trees, considered as unrooted trees.

julia> con = consensustree(treesample)PhyloNetworks.HybridNetwork, Semidirected Network
9 edges
10 nodes: 6 tips, 0 hybrid nodes, 4 internal tree nodes.
tip labels: A, B, C, D, ...
(E,O,((A,B),(C,D)));
julia> writenewick(con, support=true)"(E,O,((A,B)::0.8333333333333334,(C,D)::1.0)::0.5333333333333333);"

To plot the consensus tree showing support values, we can first extract the support values into a data frame, then use it to label edges. Below, we multiple support values by 100 to get percentages.

julia> using DataFrames
julia> esup = DataFrame(
           number = [e.number for e in con.edge if !isexternal(e)],
           support = [round(100 * e.y, digits=1) for e in con.edge if !isexternal(e)]
       )3×2 DataFrame
 Row │ number  support 
     │ Int64   Float64 
─────┼─────────────────
   1 │      7     53.3
   2 │      8     83.3
   3 │      9    100.0
julia> ecol = Dict(r[:number] => (r[:support] < 70 ? "orange2" : "black")
                   for r in eachrow(esup))Dict{Int64, String} with 3 entries:
  7 => "orange2"
  9 => "black"
  8 => "black"

plot(con, showedgenumber=true);
R"mtext"("edge numbers", line=0)
plot(con, edgelabel=esup, edgecolor=ecol);
R"mtext"("support: % input trees", line=0)

majority rule consensus tree

consensus tree of blob

Given a set of input phylogenetic networks (which may be trees or not), we can obtain a consensus of their "tree of blobs".

For a given network, the tree of blob is obtained by contracting each blob (2-edge connected component) in the network into a single node. We get a tree, whose edges are exactly the "cut edges" of the network: the edges that, when removed from the network, cut it into disconnected components.

The example below uses a set of 10 networks from a file that comes with the package, all of which have a single reticulation:

julia> netfile = joinpath(dirname(pathof(PhyloSummaries)), "..",
           "test","bootstrapnets_h1.nwk");
julia> netsample = readmultinewick(netfile);
julia> length(netsample) # 10 networks10
julia> netsample[1] # first network. taxa: t1 through t6PhyloNetworks.HybridNetwork, Rooted Network
12 edges
12 nodes: 6 tips, 1 hybrid nodes, 5 internal tree nodes.
tip labels: t2, t5, t6, t3, ...
(t2,(((t5,(t6)#H7:::0.665):1.116,t3):0.509,t4):0.532,(t1,#H7:::0.335):0.812);

Let's look at the first and second, say:

plot(netsample[1], showedgelength=true);
plot(netsample[2], showedgelength=true);
R"mtext"("net 2");

nets 1 & 2, from bootstrapnets_h1

The function consensus_treeofblobs returns a NamedTuple with the following keys:

tob: the consensus tree of blobs
taxa: list of taxon names, alphabetically, to connect with taxon numbers in some columns of the tables below
blob_table: support for each blob partition in the consensus.
circorder_table: support for circular orderings of the taxon blocks within each blob
hybrid_table: support for hybrid clades within blobs
bipartition_table: support for cut-edge bipartitions not adjacent to an interesting blob (i.e. non-redundant bipartitions).

The tables are row-oriented (NamedTuples of vectors) and can be converted to data frames.

julia> res = consensus_treeofblobs(netsample);┌ Info: Node & edge numbers in tables correspond to current numbers in net.
│ If the network is modified or re-read from file, restore matching
└ numbers with `resetnodenumbers_fromnames!` and `edgenumber`.
julia> keys(res)(:tob, :blob_table, :circorder_table, :hybrid_table, :bipartition_table, :taxa)
julia> tree = res[:tob]PhyloNetworks.HybridNetwork, Semidirected Network
6 edges
7 nodes: 6 tips, 0 hybrid nodes, 1 internal tree nodes.
tip labels: t1, t2, t3, t4, ...
(t1,t2,t3,t4,t5,t6)_7_blob1;
julia> writenewick(tree, internallabel=false) # star tree! 1 big blob"(t1,t2,t3,t4,t5,t6);"
julia> writenewick(tree) # internal node names: indicate their numbers used later"(t1,t2,t3,t4,t5,t6)_7_blob1;"

Here our consensus tree of blob is a star! The root is a polytomy with 6 edges: one to each taxon. This root node represents a blob that is present in many input networks. It was assigned the name _7_blob1, so later this node will be referred as "node 7" and stands for "blob 1".

The other parts of the results are tables that let us inspect the support for the blob(s) and other features. Here, we use the DataFrames package to display the tables.

julia> using DataFrames
julia> res[:taxa]6-element Vector{String}:
 "t1"
 "t2"
 "t3"
 "t4"
 "t5"
 "t6"
julia> blb_df = DataFrame(res[:blob_table], copycols=false); # re-use in memory
julia> show(blb_df, allcols=true)1×6 DataFrame
 Row │ blob   degree  node   support_partition  partition_num  partition         
     │ Int64  Int64   Int64  Float64            String         String            
─────┼───────────────────────────────────────────────────────────────────────────
   1 │     1       6      7                0.6  2|4|3|5|6|1    t2|t4|t3|t5|t6|t1

This blb_df table describes each blob partition, and shows its support. Here our consensus tree of blobs has a single blob, so this table has a single row.

When the edges and nodes of this blob are removed, the full taxon set is disconnected into multiple taxon blocks: a partition of the taxa. This partition is described by a list of taxa in each block, with blocks separated by |. So here we see that our partition has 6 parts, or taxon blocks, and each taxon is alone in its own block.
The number of taxon blocks is the degree of the blob, given in the "degree" column.
The "node" column given the number of the node that represents the blob, in the tree of blobs. It's a node with ≥ 4 edges (a polytomy), "degree" edges actually: here 6 edges. Here, it's the root. This node number is indicated in the name assigned to internal node (here 7: in _7_blob1, see above).
The support_partition column gives the proportion of input networks that have a blob with this particular partition: when the blob is removed from the input network, the taxa are divided into these same blocks. Here, 60% of our networks have this partition –and 40% don't. Scrolling back up, we see that "net 1" does have a blob with this partition, and "net 2" does not. This second network "net 2" has t1,t2 together in the same block, when we delete its blob.
The partition is given twice: with taxon numbers in column "partition_num", and with taxon names in column "partition". With longer taxon names than in this simple example, the full partition is often long and not shown in full. To see more with all taxon names, we can increase the truncate number. For example:

julia> show(select(blb_df, [:support_partition,:partition]), truncate=64)1×2 DataFrame
 Row │ support_partition  partition         
     │ Float64            String            
─────┼──────────────────────────────────────
   1 │               0.6  t2|t4|t3|t5|t6|t1

Next we can look the table describing bipartitions: edges in the consensus tree of blobs that, when deleted, separate the taxon into two parts. Since many edges are already in the tree because they are connected to a blob, we only look at the support of edges as being non-redundant with any blob, that is, additional support: the proportion of input networks that have this edge not connected to any blob.

Here we don't have any such edges because our consensus is a star tree (a single blob, no cut edges). So the corresponding table bip_df is empty:

julia> bip_df = DataFrame(res[:bipartition_table], copycols=false)0×6 DataFrame
 Row │ node1  node2  edge   support_nonredundant  cluster_num  cluster 
     │ Int64  Int64  Int64  Float64               String       String  
─────┴─────────────────────────────────────────────────────────────────

More interestingly then, we can look at which taxon blocks are coming off of a (any) blob from a hybrid node:

julia> hyb_df = DataFrame(res[:hybrid_table], copycols=false)4×7 DataFrame
 Row │ blob   node_from  node_to  edge   support_hybrid  cluster_num  cluster 
     │ Int64  Int64      Int64    Int64  Float64         String       String  
─────┼────────────────────────────────────────────────────────────────────────
   1 │     1          7        6      6             0.5  6            t6
   2 │     1          7        1      1             0.1  1            t1
   3 │     1          7        4      4             0.2  4            t4
   4 │     1          7        3      3             0.2  3            t3

This hyb_df table shows which taxon blocks are of hybrid origin in the input networks, and in which proportion of networks. A taxon block is of "hybrid origin" if it is below a hybrid node that "exits" a blob: a hybrid with a child edge outside the blob that the hybrid is in, and whose descendant clade is exactly this taxon block.

In our example:

The most frequent hybrid is t6: in 50% of input networks. For example, t6 is of hybrid origin in "net 1" above. Note that it may be of hybrid origin in networks that do not have the blob in our consensus tree of blobs.
The other rows list the other taxon blocks of hybrid origin in some input networks: t3, t4 (each in 20% of networks) and t1 (10%). For example, t3 is of hybrid origin in "net 2", although below a blob that partitions the taxa differently than the blob retained in the consensus.

The table about circular orders is about blobs of level 1 in input networks, for which the taxon blocks are arranged around the blob like around a circle. This "circular order" can be read clock-wise or counter-clockwise, and starting from any taxon block. The table below lists all the circular orders found in the input network, for the blob partitions retained in the consensus.

julia> cir_df = DataFrame(res[:circorder_table], copycols=false);
julia> show(cir_df, allcols=true)1×5 DataFrame
 Row │ blob   order               support_circorder  partition_num  partition         
     │ Int64  Tuple…              Float64            String         String            
─────┼────────────────────────────────────────────────────────────────────────────────
   1 │     1  (1, 2, 3, 4, 5, 6)                0.6  2|4|3|5|6|1    t2|t4|t3|t5|t6|t1

Here, this table has a single row: the same circular order was found in all input networks that have the blob in our consensus (60% of input networks from the blob information in blb_df above). This circular order is indicated in the columns partition_num (taxon numbers) and partition (taxon names). Turning around the cycle, and starting arbitrarily at t2, our networks have this order: t2 → t4 → t3 → t5 → t6 → t1 (then back to t2).

Below, we visualize some of these support values on the consensus tree of blobs. We show:

in blue, at nodes: the support for the blob partition at each blob node
in black, along edges adjacent to a blob: the support for each taxon block to be of hybrid origin (in input networks that have this blob or not).

To annotate the appropriate nodes and edges of the consensus, we use the node and edge numbers in the table. To clarify, we first plot the consensus with nodes and edges annotated with their numbers. And for fun, we plot the 8th intput network: which has the blob in the consensus, but with t1 of hybrid origin, not t6 like in "net 1".

R"layout"([1 2 3]);
R"par"(mar=[0,0,0.5,0]);
plot(tree, shownodenumber=true, showedgenumber=true, tipoffset=0.05);
R"mtext"("node & edge numbers", side=3, line=-1);
plot(tree, nodelabelcolor="orangered", edgelabelcolor="deepskyblue",
    nodelabeladj=1.1, edgelabeladj=[.5, -0.2],
    nodelabel=select(blb_df, [:node, :support_partition]),
    edgelabel=select(hyb_df, [:edge, :support_hybrid]));
R"mtext"("blob (red) & hybrid (blue) support", side=3, line=-1);
rotate!(netsample[8], -3);
rotate!(netsample[8], -6); # to de-tangle crossing edges in plot below
plot(netsample[8]);
R"mtext"("input net 8", side=3, line=-1);

consensus tree of blobs with support values

consensus level-1 network

The function consensus_level1network first calculates the consensus tree of blobs, then expands each blob node in that tree, choosing the best-supported circular ordering and hybrid clade for each blob. This function only takes level-1 networks as input: in which each blob is a cycle with a circular order and a single hybrid node. The result is a level-1 phylogenetic network.

Using the same set of input networks:

julia> res_l1 = consensus_level1network(netsample);┌ Info: Node & edge numbers in tables correspond to current numbers in net.
│ If the network is modified or re-read from file, restore matching
└ numbers with `resetnodenumbers_fromnames!` and `edgenumber`.
julia> keys(res_l1)(:net, :blob_table, :hybrid_table, :bipartition_table, :taxa)
julia> con = res_l1[:net] # node labels match node numbers used laterPhyloNetworks.HybridNetwork, Semidirected Network
12 edges
12 nodes: 6 tips, 1 hybrid nodes, 5 internal tree nodes.
tip labels: t1, t2, t3, t4, ...
(t2,(t4,(t3,(t5,#H11)_10)_9)_8,(t1,(t6)#H11)_12)_7_blob1;
julia> writenewick(con, internallabel=false)"(t2,(t4,(t3,(t5,#H11))),(t1,(t6)#H11));"

The plot below shows our consensus level-1 network, with node and edges annotated by their numbers. We used the first plot to decide at which nodes to rotate edges, to untangle the crossing lines.

R"par"(mar=[0,0,0.5,0]);
plot(con, shownodenumber=true, showedgenumber=true, tipoffset=0.05);
rotate!(con, 12); # rotating edges below node 12 will un-cross lines
R"mtext"("node & edge numbers", side=3, line=-1);
plot(con, shownodelabel=true);
R"mtext"("node names: match node numbers", side=3, line=-1);

consensus level-1 network with edge/node numbers and names

Like for the consensus tree of blobs, the rest of the result gives us support values for various features of our consensus.

The blob_table for a level-1 network consensus includes additional columns compared to the tree-of-blobs version: hybrid (the hybrid node number), support_circorder (support for the chosen circular ordering) and hybrid_cluster (the taxon block chosen as the hybrid clade):

julia> blb_l1 = DataFrame(res_l1[:blob_table], copycols=false)1×11 DataFrame
 Row │ blob   degree  node   hybrid  support_partition  support_circorder  support_hybrid  partition_num  hybrid_cluster_num  partition          hybrid_cluster 
     │ Int64  Int64   Int64  Int64   Float64            Float64            Float64         String         String              String             String         
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │     1       6      7      11                0.6                0.6             0.5  2|4|3|5|6|1    6                   t2|t4|t3|t5|t6|t1  t6
julia> names(blb_l1) # column names: including those not shown above11-element Vector{String}:
 "blob"
 "degree"
 "node"
 "hybrid"
 "support_partition"
 "support_circorder"
 "support_hybrid"
 "partition_num"
 "hybrid_cluster_num"
 "partition"
 "hybrid_cluster"
julia> select(blb_l1, r"hybrid") # columns with 'hybrid' in col name1×4 DataFrame
 Row │ hybrid  support_hybrid  hybrid_cluster_num  hybrid_cluster 
     │ Int64   Float64         String              String         
─────┼────────────────────────────────────────────────────────────
   1 │     11             0.5  6                   t6
julia> show(select(blb_l1, r"partition"), truncate=64)1×3 DataFrame
 Row │ support_partition  partition_num  partition         
     │ Float64            String         String            
─────┼─────────────────────────────────────────────────────
   1 │               0.6  2|4|3|5|6|1    t2|t4|t3|t5|t6|t1

We know from the tree-of-blobs consensus that other taxon blocks (like t1) are of hybrid origin in some input networks. The blb_l1 table only lists t6 as hybrid, because it's the hybrid used to build the level-1 consensus network, and it has 1 row per blob. We see that t6 is the block below the blob's hybrid node in the consensus, and it is of hybrid origin (perhaps in different types of blobs) in 60% of input networks.

But we get the information that t1 is an alternative from the table about hybrid clades, which can have multiple rows per blob, with the same columns as from the tree-of-blobs consensus above:

julia> hyb_l1 = DataFrame(res_l1[:hybrid_table], copycols=false)4×7 DataFrame
 Row │ blob   node_from  node_to  edge   support_hybrid  cluster_num  cluster 
     │ Int64  Int64      Int64    Int64  Float64         String       String  
─────┼────────────────────────────────────────────────────────────────────────
   1 │     1         11        6      6             0.5  6            t6
   2 │     1         12        1      1             0.1  1            t1
   3 │     1          8        4      4             0.2  4            t4
   4 │     1          9        3      3             0.2  3            t3

It's easier to see support values on a plot. Below, no edge has any additional support for being non-redundant with a blob (because our blob is large: no internal edge is outside the blob).

plot(con, shownodenumber=true, showedgenumber=true, tipoffset=0.05);
R"mtext"("node & edge numbers", side=3, line=-.5);
plot(con, nodelabelcolor="orangered", edgelabelcolor="deepskyblue",
    nodelabeladj=-0.1, edgelabeladj=[.5, -0.2],
    nodelabel=select(blb_l1, [:node, :support_partition]),
    edgelabel=select(hyb_l1, [:edge, :support_hybrid]));
R"mtext"("blob (red) & hybrid (blue) support", side=3, line=-.5);

consensus level-1 network with support values

saving the results

To save the consensus network and its data tables to files, use consensus_level1network_save. The following will create 4 files: level1consensus_net.nwk with the consensus network in newick format, and the various tables in csv format: level1consensus_blob.csv, level1consensus_hybrid.csv and level1consensus_bipartition.csv.

julia> consensus_level1network_save(res_l1, "level1consensus")

We can re-read these files later, like this:

con2 = readnewick("level1consensus_net.nwk");
writenewick(con2) # internal node names give node numbers used in tables

"(t2,(t4,(t3,(t5,#H11)_10)_9)_8,((t6)#H11,t1)_12)_7_blob1;"

using CSV
blb2 = CSV.read("level1consensus_blob.csv", DataFrame)
hyb2  = CSV.read("level1consensus_hybrid.csv", DataFrame); # no edge column
bip2  = CSV.read("level1consensus_bipartition.csv", DataFrame); # no edge

However, care should be taken after re-reading the network from the newick file to recover the original node and edge numbers, because readnewick may set them differently. We want to recover internal node numbers in the network that match with those saved in the csv files. To get the original node numbers, which are part of the node labels, use resetnodenumbers_fromnames! like below.

julia> printnodes(con2) # node numbers are in the first columnnode leaf  hybrid name     i_cycle edges'numbers
1    true  false  t2       -1      1   
2    true  false  t4       -1      2   
3    true  false  t3       -1      3   
4    true  false  t5       -1      4   
6    false false  _10      -1      4    5    6   
7    false false  _9       -1      3    6    7   
8    false false  _8       -1      2    7    8   
9    true  false  t6       -1      9   
5    false true   H11      -1      9    5    10  
10   true  false  t1       -1      11  
11   false false  _12      -1      10   11   12  
-2   false false  _7_blob1 -1      1    8    12
julia> resetnodenumbers_fromnames!(con2)PhyloNetworks.HybridNetwork, Rooted Network
12 edges
12 nodes: 6 tips, 1 hybrid nodes, 5 internal tree nodes.
tip labels: t2, t4, t3, t5, ...
(t2,(t4,(t3,(t5,#H11)_10)_9)_8,((t6)#H11,t1)_12)_7_blob1;
julia> printnodes(con2) # new node numbers: now match names. everything else the samenode leaf  hybrid name     i_cycle edges'numbers
2    true  false  t2       -1      1   
4    true  false  t4       -1      2   
3    true  false  t3       -1      3   
5    true  false  t5       -1      4   
10   false false  _10      -1      4    5    6   
9    false false  _9       -1      3    6    7   
8    false false  _8       -1      2    7    8   
6    true  false  t6       -1      9   
11   false true   H11      -1      9    5    10  
1    true  false  t1       -1      11  
12   false false  _12      -1      10   11   12  
7    false false  _7_blob1 -1      1    8    12

Now, we can re-do our plot, but the original edge numbers are not recovered. To map hybrid support, or support for edges as coming off a hybrid node, we need to get the current number of each edge based on its nodes.

julia> hyb2 # no 'edge' column, because its old numbers are different after readnewick4×6 DataFrame
 Row │ blob   node_from  node_to  support_hybrid  cluster_num  cluster 
     │ Int64  Int64      Int64    Float64         Int64        String3 
─────┼─────────────────────────────────────────────────────────────────
   1 │     1         11        6             0.5            6  t6
   2 │     1         12        1             0.1            1  t1
   3 │     1          8        4             0.2            4  t4
   4 │     1          9        3             0.2            3  t3
julia> hyb2.edge = edgenumbers_fromnodenumbers(hyb2, con2);
julia> hyb2 # new edge column: with number that match those in "con2" network4×7 DataFrame
 Row │ blob   node_from  node_to  support_hybrid  cluster_num  cluster  edge  
     │ Int64  Int64      Int64    Float64         Int64        String3  Int64 
─────┼────────────────────────────────────────────────────────────────────────
   1 │     1         11        6             0.5            6  t6           9
   2 │     1         12        1             0.1            1  t1          11
   3 │     1          8        4             0.2            4  t4           2
   4 │     1          9        3             0.2            3  t3           3

Edge support, as non-redundant bipartitions, can be extracted from the network itself: these support values were saved in the newick file. Or it can be recovered from the bipartition table like we did for the hybrid table above:

bip2.edge = edgenumbers_fromnodenumbers(bip2, con2)

Here this is not interesting because no edge had support as a non-redundant bipartition, so this table is empty.

plot(con2, shownodenumber=true, showedgenumber=true, tipoffset=0.05);
R"mtext"("node & edge numbers", side=3, line=-.5);
R"mtext"("nodes: as before. edges: different", side=1, line=-1);
plot(con2, nodelabelcolor="orangered", edgelabelcolor="deepskyblue",
    nodelabeladj=-0.1, edgelabeladj=[.5, -0.2],
    nodelabel = select(blb2, [:node, :support_partition]),
    edgelabel = select(hyb2, [:edge, :support_hybrid])
);
R"mtext"("blob (red) & hybrid (blue) support", side=3, line=-.5);

consensus level-1 network with support values, after re-reading results from file

outgroup

We can specify one outgroup to root the consensus network. This forces the root to be along the edge between the outgroup taxon and the rest of the network, and prevents any reticulation from the ingroup into this outgroup.

Here, for example, if we indicate that t6 is the outgroup, then the second most frequent hybrid block is chosen instead of t6. We saw that t3 and t4 are tied next (in section consensus tree of blob). One of the 2 will be chosen arbitrarily, here t4:

res_t6out = consensus_level1network(netsample; outgroup="t6", suppressinfo=true);
writenewick(res_t6out[:net], internallabel=false) # t4 below the hybrid H10

"(t6,((t5,(t3,(t4)#H10)),(t1,(t2,#H10))));"

plot(res_t6out[:net], shownodenumber=true, tipoffset=0.05);
R"mtext"("node numbers", side=3, line=-.5);
R"mtext"("to see around which ones to rotate edges", side=1, line=-1);
for i in [13,9] rotate!(res_t6out[:net], i); end
plot(res_t6out[:net], tipoffset=0.05, nodelabeladj=-0.1, edgelabeladj=[.5,-0.2],
    nodelabelcolor="orangered", edgelabelcolor="deepskyblue",
    nodelabel = DataFrame(res_t6out[:blob_table][[:node, :support_partition]]),
    edgelabel = DataFrame(res_t6out[:hybrid_table][[:edge, :support_hybrid]])
);
R"mtext"("blob (red) & hybrid (blue) support", side=3, line=-.5);
R"mtext"("we see t6 hybrid in 50% input nets", side=1, line=-1);

consensus level-1 network, required to be rooted with t6 as outgroup

non-redundant bipartitions

We use a different example for this: again a small set of small input networks, so it's easy to look at them. They are all of level 1.

netfile = joinpath(dirname(pathof(PhyloSummaries)), "..",
    "test","level1_7taxa_abc.nwk");
netsample = readmultinewick(netfile);
length(netsample) # 5 networks: very small example

We first get the consensus tree of blobs, to check if there are multiple circular orders, and if the top ones may be tied (we don't get this info from the level-1 consensus):

julia> tob = consensus_treeofblobs(netsample, suppressinfo=true);
julia> DataFrame(tob[:circorder_table]) # no ties: only 1 order1×5 DataFrame
 Row │ blob   order            support_circorder  partition_num  partition            
     │ Int64  Tuple…           Float64            String         String               
─────┼────────────────────────────────────────────────────────────────────────────────
   1 │     1  (1, 2, 3, 4, 5)                0.6  3|4|5,6,7|1|2  a3|a4|b1,c1,c2|a1|a2

We see only 1 circular order, meaning that the input networks with the top-ranking blob all share the same circular order. In particular, there's no tie (good to know).

julia> con = consensus_level1network(netsample, suppressinfo=true);
julia> writenewick(con[:net], support=true) # edges support: as non-redundant biparts"(a3,(a4,#H12)_11,(a2,(a1,((b1,(c1,c2)_8::0.6)_10::0.2)#H12)_13)_14)_9_blob1;"
julia> bip_df = DataFrame(con[:bipartition_table])2×6 DataFrame
 Row │ node1  node2  edge   support_nonredundant  cluster_num  cluster        
     │ Int64  Int64  Int64  Float64               String       String         
─────┼────────────────────────────────────────────────────────────────────────
   1 │    10      8      9                   0.6  1,2,3,4,5    a1,a2,a3,a4,b1
   2 │    12     10      8                   0.2  1,2,3,4      a1,a2,a3,a4

Let's plot our level-1 consensus: with support values on edges that show support for hybrid clades (middle) or for bipartitions to be non-redundant with blobs (right):

net = con[:net]
blb_df = DataFrame(con[:blob_table])
hyb_df = DataFrame(con[:hybrid_table])
bip_df = DataFrame(con[:bipartition_table])
plot(net, shownodenumber=true, tipoffset=0.05);
R"mtext"("node numbers: to decide rotations", side=3, line=-1);
for i in [14,13] rotate!(net, i); end
plot(net, nodelabeladj=1.1, edgelabeladj=[.5,-0.2],
    nodelabelcolor="orangered", edgelabelcolor="deepskyblue",
    nodelabel=select(blb_df, [:node, :support_partition]),
    edgelabel=select(hyb_df, [:edge, :support_hybrid]));
R"mtext"("blob (red) & hybrid (blue) support", side=3, line=-1);
plot(net, nodelabeladj=1.1, edgelabeladj=[.5,-0.2], nodelabelcolor="orangered",
    nodelabel=select(blb_df, [:node, :support_partition]),
    edgelabel=select(bip_df, [:edge, :support_nonredundant]));
R"mtext"("support for edges as non-redundant (black)", side=3, line=-1);

consensus level-1 network with support values, 5 input networks on 7 taxa

(b1,c1,c2) has 20% support as a non-redundant bipartition. We also know that this clade is in the 60% input networks that have the blob (and redundant with the blob in these networks), so it's in at least 80% of the input networks.
(c1,c2) has 60% support as a non-redundant bipartition. This clade may be present in more than 60% networks, if it's connected to a blob in these other networks, but we don't get to see this here.

support for hybrid sisters

There may be high support for a clade to be of hybrid origin, in which case this clade has 2 sisters (in the simplest level-1 case). We may want to know if there is uncertainty about where gene flow came from, that is, what groups are its sisters. For this, we can use functions currently in PhyloNetworks, that map support onto a reference network: PhyloNetworks.treeedges_support and PhyloNetworks.hybridclades_support.

Below, we use these functions to map the support of more features (like hybrid sisters) onto our level-1 consensus network as a reference network. Note that, unlike the consensus functions here, the results of treeedges_support and hybridclades_support depend on which hybrid edges are major or minor, in the input networks. The support for an edge to be a "tree" edge is the proportion of input networks in which this edge is retained after deleting all the minor hybrid edges are deleted, to get the major tree of each input network.

julia> BSe_majortree, majortree = treeedges_support(netsample, net);┌ Info: edge numbers in the data frame correspond to the current edge numbers in the network.
│ If the network is modified, the edge numbers in the (modified) network might not correspond
│ to those in the bootstrap table. Plot the bootstrap values onto the current network with
└ plot(network_name, edgelabel=bootstrap_table_name)
julia> BSn, BSe, BSc, BSgam, BSedgenum = hybridclades_support(netsample, net);

plot(net, edgelabel=BSe_majortree);
R"mtext"("support for edges to be in major tree", side=3, line=-1);
R"mtext"("support are in %, not proportions!", side=1, line=-1);
plot(net, edgelabeladj=[.5,-.2], nodelabeladj=[.5,.5],
    edgelabelcolor="orangered", nodelabelcolor="deepskyblue4",
    edgelabel=select(BSe,[:edge, :BS_hybrid_edge]),
    nodelabel=select(BSn,[:hybridnode,:BS_hybrid_samesisters])
);
R"mtext"("support for sister-hybrid relationships", side=3, line=-1);
R"mtext"("blue: hybrid + both sisters", side=1, line=-1);
plot(net, edgelabeladj=[.5,-.2],
    edgelabel=filter(row->row[:BS_hybrid]>0, BSn)[!,[:edge,:BS_hybrid]]);
R"mtext"("support for hybrid clades", side=3, line=-1);
R"mtext"("b1 of hybrid origin in 20% nets", side=1, line=-1);

support for major-tree edges and sister-relationships, 5 input networks on 7 taxa

From the left panel, we see that (b1,c1,c2) and (c1,c2) are in fact clades in the major tree of 100% of our input networks.

See the section about PhyloNetworks's Network support for more details.

support for blobs in a reference network

The function blobpartitions_support takes a reference network and a sample of networks, and returns support values for blob-related features in the reference network, measured by how often they are found in the sample: the partition of taxa from each blob, the circular order of these taxon blocks (if the sample networks are of level 1), the clades below a lowest hybrid node in a blob, and the bipartitions non-redundant with a blob. Like consensus_treeofblobs, the sample networks can be of any level.

The example below uses the same 10 sample networks as in the consensus tree of blob section, but with a twist. Here we will give unequal weights to the sample networks, as if the 10 networks came from 8 bootstrap replicates. Some methods (like NANUQ+) may return multiple networks with equally good scores. We will do here as if:

for the 5th bootstrap data set, 3 networks were returned. We will give each of these 3 networks a weight of 1/3.
for bootstrap data sets 1-4, and 6-8, only 1 network was returned, so each one of these networks will be given a weight of 1.

julia> netfile = joinpath(dirname(pathof(PhyloSummaries)), "..",
           "test","bootstrapnets_h1.nwk");
julia> bootnet = readmultinewick(netfile);
julia> wts = [1,1,1,1, 1/3,1/3,1/3, 1,1,1]; # nets 5,6,7 from same bootstrap data

Next we read our reference network, and map support for its blob features.

julia> refnwk = "((t6,(t5,#H9:::0.336)),(((t3)#H9:::0.664,(t2,t1)),t4));";
julia> refnet = readnewick(refnwk);
julia> res_bs = blobpartitions_support(bootnet, refnet; netweights=wts);
julia> keys(res_bs) # names of tables we should look at(:blob_table, :circorder_table, :hybrid_table, :bipartition_table, :taxa)

The output is a NamedTuple with the same keys as consensus_treeofblobs and consensus_level1network. Support values are proportions of the total weight (sum(wts) = 8.0 from 8 bootstrap data sets) rather than the total number of sample networks (10).

julia> blb_bs = DataFrame(res_bs[:blob_table])1×6 DataFrame
 Row │ blob   degree  node   support_partition  partition_num  partition         
     │ Int64  Int64   Int64  Float64            String         String            
─────┼───────────────────────────────────────────────────────────────────────────
   1 │     1       5     -2               0.25  6|5|3|1,2|4    t6|t5|t3|t1,t2|t4

Our reference network has only 1 reticulation, so only 1 blob. We see that this blob has support_partition of 0.25. If we plotted our input networks, we would see that this blob is in networks 2 and 3: whose weights sum to 2 out of 8. These support values did not appear in the consensus tree of blob section, because this blob is not selected to be in the consensus tree of blob.

julia> DataFrame(res_bs[:circorder_table])2×5 DataFrame
 Row │ blob   order            support_circorder  partition_num  partition         
     │ Int64  Tuple…           Float64            String         String            
─────┼─────────────────────────────────────────────────────────────────────────────
   1 │     1  (1, 2, 3, 5, 4)              0.125  6|5|3|4|1,2    t6|t5|t3|t4|t1,t2
   2 │     1  (1, 2, 3, 4, 5)              0.125  6|5|3|1,2|4    t6|t5|t3|t1,t2|t4

We see that the blob appears in our sample with 2 different circular orders of its taxon blocks: the first in the table is the order in the reference network, and is the order that network 2 has. Let's visualize the reference network alongside input network 3, which has the alternative circular order. We use rotate! to untangle edges in network 3, to better see the circular order visually:

plot(refnet,  tipoffset=0.05);
R"mtext"("reference network", side=3, line=-1);
net3 = bootnet[3]
rotate!(net3, -6)
plot(net3, tipoffset=0.05);
R"mtext"("net 3: same blocks around the blob,\n≠ circular order",
         side=3, line=-2, adj=0.1);

reference network and network 3 with alternative circular order

julia> hyb_bs = DataFrame(res_bs[:hybrid_table])3×7 DataFrame
 Row │ blob   node_from  node_to  edge   support_hybrid  cluster_num  cluster 
     │ Int64  Int64      Int64    Int64  Float64         String       String  
─────┼────────────────────────────────────────────────────────────────────────
   1 │     1         -5        7     12           0.25   4            t4
   2 │     1          3        4      6           0.25   3            t3
   3 │     1         -3        1      1           0.375  6            t6

The taxon block {t3} is of hybrid origin in the reference network, but only 25% of the sample networks (support 0.25 = 2/8), same as {t4}. Clade {t6} is of hybrid origin with higher support (0.375 = 3/8).

julia> DataFrame(res_bs[:bipartition_table]) # refnet has 0 non-redundant bipartitions0×6 DataFrame
 Row │ node1  node2  edge   support_nonredundant  cluster_num  cluster 
     │ Int64  Int64  Int64  Float64               String       String  
─────┴─────────────────────────────────────────────────────────────────

The table is empty because every cut-edge in the reference network is adjacent to one of its two blobs — there are no non-redundant bipartitions.

We can plot these support values onto the reference network in the same way as for the consensus outputs above:

plot(refnet, nodelabeladj=-0.1, edgelabeladj=[.5,-0.2],
  nodelabelcolor="orangered", edgelabelcolor="deepskyblue",
  nodelabel=select(blb_bs, [:node, :support_partition]),
  edgelabel=select(hyb_bs, [:edge, :support_hybrid]));
R"mtext"("red: blob partition support\nblue: hybrid support, on edges",
  side=3, line=-.8);
plot(refnet, nodelabeladj=[-0.1,-0.2], nodelabelcolor="deepskyblue",
  nodelabel=select(hyb_bs, [:node_from, :support_hybrid]));
R"mtext"("blue: hybrid support, at nodes\n(follow the edge that exits the blob)",
  side=3, line=-.8);
R"mtext"("support as proportions (not %!)", outer=true, side=1, line=-1);

blob partition support mapped onto reference network, 5 input networks on 7 taxa