expected concordance factors

The quartet concordance factors expected from a network can be calculated exactly on a network of any level. This can be done here using a recursive algorithm (see Ané et al. 2024) – although it may run slowly as its complexity was not optimized. Below is an example, with a network on 6 taxa with 2 reticulations.

julia> net = readnewick("(D:1,((C:1,#H25:0):0.1,((((B1:10,B2:1):1.5,#H1:0):10.8,
               ((A1:1,A2:1):0.001)#H1:0::0.5):0.5)#H25:0::0.501):1);");
julia> eCFs,t = network_expectedCF(net);Calculation quartet CFs for 15 quartets... 0+---------------+100% ***************
julia> t # taxon list6-element Vector{String}: "A1" "A2" "B1" "B2" "C" "D"

below: we look at the first two 4-taxon sets, each with 3 quartet CFs. Taxon numbers are indices in the taxon list above. For example, taxon 1 is "A1".

julia> first(eCFs, 2)2-element Vector{PhyloNetworks.QuartetT{StaticArraysCore.MVector{3, Float64}}}:
 4-taxon set number 1; taxon numbers: 1,2,3,4
data: [0.8885456713760765, 0.05572716431196175, 0.05572716431196175]
 4-taxon set number 2; taxon numbers: 1,2,3,5
data: [0.16750297999120484, 0.41624851000439755, 0.41624851000439755]

To look at these quartet CFs, we define below a function to convert them to a string. This string can then be printed to the screen (shown below with print) or written to a file (using write, not shown).

julia> qCFstring(qlist,taxa, sigdigits=4) =
        join(
         [join(taxa[q.taxonnumber],",") * ": " * string(round.(q.data, sigdigits=sigdigits))
           for q in qlist],
         "\n");
julia> print(qCFstring(eCFs,t))A1,A2,B1,B2: [0.8885, 0.05573, 0.05573] A1,A2,B1,C: [0.1675, 0.4162, 0.4162] A1,A2,B2,C: [0.1675, 0.4162, 0.4162] A1,B1,B2,C: [0.03719, 0.03719, 0.9256] A2,B1,B2,C: [0.03719, 0.03719, 0.9256] A1,A2,B1,D: [0.1675, 0.4162, 0.4162] A1,A2,B2,D: [0.1675, 0.4162, 0.4162] A1,B1,B2,D: [0.03719, 0.03719, 0.9256] A2,B1,B2,D: [0.03719, 0.03719, 0.9256] A1,A2,C,D: [0.6898, 0.1551, 0.1551] A1,B1,C,D: [0.793, 0.1035, 0.1035] A2,B1,C,D: [0.793, 0.1035, 0.1035] A1,B2,C,D: [0.793, 0.1035, 0.1035] A2,B2,C,D: [0.793, 0.1035, 0.1035] B1,B2,C,D: [1.0, 9.422e-7, 9.422e-7]

For each 4-taxon set above, the 3 concordance factors are for the quartets listed in this order: 12|34, 13|24, 14|23 where 1,2,3,4 refer to the order of the taxa listed to the left.

We can also convert our list to a data frame:

julia> using DataFrames
julia> df = DataFrame(tablequartetCF(eCFs, t), copycols=false)15×8 DataFrame Row qind t1 t2 t3 t4 CF12_34 CF13_24 CF14_23 Int64 String String String String Float64 Float64 Float64 ─────┼────────────────────────────────────────────────────────────────────────── 1 │ 1 A1 A2 B1 B2 0.888546 0.0557272 0.0557272 2 │ 2 A1 A2 B1 C 0.167503 0.416249 0.416249 3 │ 3 A1 A2 B2 C 0.167503 0.416249 0.416249 4 │ 4 A1 B1 B2 C 0.0371891 0.0371891 0.925622 5 │ 5 A2 B1 B2 C 0.0371891 0.0371891 0.925622 6 │ 6 A1 A2 B1 D 0.167503 0.416249 0.416249 7 │ 7 A1 A2 B2 D 0.167503 0.416249 0.416249 8 │ 8 A1 B1 B2 D 0.0371891 0.0371891 0.925622 9 │ 9 A2 B1 B2 D 0.0371891 0.0371891 0.925622 10 │ 10 A1 A2 C D 0.689828 0.155086 0.155086 11 │ 11 A1 B1 C D 0.793009 0.103496 0.103496 12 │ 12 A2 B1 C D 0.793009 0.103496 0.103496 13 │ 13 A1 B2 C D 0.793009 0.103496 0.103496 14 │ 14 A2 B2 C D 0.793009 0.103496 0.103496 15 │ 15 B1 B2 C D 0.999998 9.42151e-7 9.42151e-7

If we wanted to compare with the observed frequency among 100 gene trees, we could use data frame manipulations to join the two data frames:

julia> using PhyloCoalSimulations
julia> genetrees = simulatecoalescent(net, 100, 1); # 100 genes, 1 individual / pop
julia> nt_sim = tablequartetCF(countquartetsintrees(genetrees; showprogressbar=false)...);
julia> df_sim = DataFrame(nt_sim, copycols=false)15×9 DataFrame Row qind t1 t2 t3 t4 CF12_34 CF13_24 CF14_23 ngenes Int64 String String String String Float64 Float64 Float64 Float64 ─────┼─────────────────────────────────────────────────────────────────────────── 1 │ 1 A1 A2 B1 B2 0.88 0.06 0.06 100.0 2 │ 2 A1 A2 B1 C 0.16 0.27 0.57 100.0 3 │ 3 A1 A2 B2 C 0.16 0.27 0.57 100.0 4 │ 4 A1 B1 B2 C 0.02 0.02 0.96 100.0 5 │ 5 A2 B1 B2 C 0.05 0.05 0.9 100.0 6 │ 6 A1 A2 B1 D 0.16 0.27 0.57 100.0 7 │ 7 A1 A2 B2 D 0.16 0.27 0.57 100.0 8 │ 8 A1 B1 B2 D 0.02 0.02 0.96 100.0 9 │ 9 A2 B1 B2 D 0.05 0.05 0.9 100.0 10 │ 10 A1 A2 C D 0.69 0.19 0.12 100.0 11 │ 11 A1 B1 C D 0.75 0.17 0.08 100.0 12 │ 12 A2 B1 C D 0.84 0.09 0.07 100.0 13 │ 13 A1 B2 C D 0.75 0.17 0.08 100.0 14 │ 14 A2 B2 C D 0.84 0.09 0.07 100.0 15 │ 15 B1 B2 C D 1.0 0.0 0.0 100.0
julia> first(df_sim, 2)2×9 DataFrame Row qind t1 t2 t3 t4 CF12_34 CF13_24 CF14_23 ngenes Int64 String String String String Float64 Float64 Float64 Float64 ─────┼─────────────────────────────────────────────────────────────────────────── 1 │ 1 A1 A2 B1 B2 0.88 0.06 0.06 100.0 2 │ 2 A1 A2 B1 C 0.16 0.27 0.57 100.0
julia> select!(df_sim, Not([:qind,:ngenes])); # delete columns with q index & # genes
julia> df_both = outerjoin(select(df, Not(:qind)), df_sim; on=[:t1,:t2,:t3,:t4], renamecols = "exact" => "sim")15×10 DataFrame Row t1 t2 t3 t4 CF12_34exact CF13_24exact CF14_23exact CF12_34sim CF13_24sim CF14_23sim String String String String Float64? Float64? Float64? Float64? Float64? Float64? ─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────── 1 │ A1 A2 B1 B2 0.888546 0.0557272 0.0557272 0.88 0.06 0.06 2 │ A1 A2 B1 C 0.167503 0.416249 0.416249 0.16 0.27 0.57 3 │ A1 A2 B2 C 0.167503 0.416249 0.416249 0.16 0.27 0.57 4 │ A1 B1 B2 C 0.0371891 0.0371891 0.925622 0.02 0.02 0.96 5 │ A2 B1 B2 C 0.0371891 0.0371891 0.925622 0.05 0.05 0.9 6 │ A1 A2 B1 D 0.167503 0.416249 0.416249 0.16 0.27 0.57 7 │ A1 A2 B2 D 0.167503 0.416249 0.416249 0.16 0.27 0.57 8 │ A1 B1 B2 D 0.0371891 0.0371891 0.925622 0.02 0.02 0.96 9 │ A2 B1 B2 D 0.0371891 0.0371891 0.925622 0.05 0.05 0.9 10 │ A1 A2 C D 0.689828 0.155086 0.155086 0.69 0.19 0.12 11 │ A1 B1 C D 0.793009 0.103496 0.103496 0.75 0.17 0.08 12 │ A2 B1 C D 0.793009 0.103496 0.103496 0.84 0.09 0.07 13 │ A1 B2 C D 0.793009 0.103496 0.103496 0.75 0.17 0.08 14 │ A2 B2 C D 0.793009 0.103496 0.103496 0.84 0.09 0.07 15 │ B1 B2 C D 0.999998 9.42151e-7 9.42151e-7 1.0 0.0 0.0