expected concordance factors
The quartet concordance factors expected from a network can be calculated exactly on a network of any level. This can be done here using a recursive algorithm – although it may run slowly as its complexity was not optimized. Below is an example, with a network on 6 taxa with 2 reticulations.
julia> net = readTopology("(D:1,((C:1,#H25:0):0.1,((((B1:10,B2:1):1.5,#H1:0):10.8,((A1:1,A2:1):0.001)#H1:0::0.5):0.5)#H25:0::0.501):1);");
julia> eCFs,t = network_expectedCF(net);
Calculation quartet CFs for 15 quartets... 0+---------------+100% ***************
julia> t # taxon list
6-element Vector{String}: "A1" "A2" "B1" "B2" "C" "D"
julia> first(eCFs, 2) # first 2 4-taxon sets, each with 3 quartet CFs. taxon numbers are indices in the taxon list above
2-element Vector{PhyloNetworks.QuartetT{StaticArraysCore.MVector{3, Float64}}}: 4-taxon set number 1; taxon numbers: 1,2,3,4 data: [0.8885456713760765, 0.05572716431196175, 0.05572716431196175] 4-taxon set number 2; taxon numbers: 1,2,3,5 data: [0.16750297999120484, 0.41624851000439755, 0.41624851000439755]
To look at these quartet CFs, we define below a function to convert them to a string. This string can then be printed to the screen (shown below with print
) or written to a file (using write
, not shown).
julia> qCFstring(qlist,taxa, sigdigits=4) = join( [join(taxa[q.taxonnumber],",") * ": " * string(round.(q.data, sigdigits=sigdigits)) for q in qlist], "\n");
julia> print(qCFstring(eCFs,t))
A1,A2,B1,B2: [0.8885, 0.05573, 0.05573] A1,A2,B1,C: [0.1675, 0.4162, 0.4162] A1,A2,B2,C: [0.1675, 0.4162, 0.4162] A1,B1,B2,C: [0.03719, 0.03719, 0.9256] A2,B1,B2,C: [0.03719, 0.03719, 0.9256] A1,A2,B1,D: [0.1675, 0.4162, 0.4162] A1,A2,B2,D: [0.1675, 0.4162, 0.4162] A1,B1,B2,D: [0.03719, 0.03719, 0.9256] A2,B1,B2,D: [0.03719, 0.03719, 0.9256] A1,A2,C,D: [0.6898, 0.1551, 0.1551] A1,B1,C,D: [0.793, 0.1035, 0.1035] A2,B1,C,D: [0.793, 0.1035, 0.1035] A1,B2,C,D: [0.793, 0.1035, 0.1035] A2,B2,C,D: [0.793, 0.1035, 0.1035] B1,B2,C,D: [1.0, 9.422e-7, 9.422e-7]
For each 4-taxon set above, the 3 concordance factors are for the quartets listed in this order: 12|34, 13|24, 14|23 where 1,2,3,4 refer to the order of the taxa listed to the left.
We can also convert our list to a data frame:
julia> using DataFrames
julia> df = writeTableCF(eCFs, t) # requires PhyloNetworks v0.16.1
15×8 DataFrame Row │ qind t1 t2 t3 t4 CF12_34 CF13_24 CF14_23 │ Int64 String String String String Float64 Float64 Float64 ─────┼────────────────────────────────────────────────────────────────────────── 1 │ 1 A1 A2 B1 B2 0.888546 0.0557272 0.0557272 2 │ 2 A1 A2 B1 C 0.167503 0.416249 0.416249 3 │ 3 A1 A2 B2 C 0.167503 0.416249 0.416249 4 │ 4 A1 B1 B2 C 0.0371891 0.0371891 0.925622 5 │ 5 A2 B1 B2 C 0.0371891 0.0371891 0.925622 6 │ 6 A1 A2 B1 D 0.167503 0.416249 0.416249 7 │ 7 A1 A2 B2 D 0.167503 0.416249 0.416249 8 │ 8 A1 B1 B2 D 0.0371891 0.0371891 0.925622 9 │ 9 A2 B1 B2 D 0.0371891 0.0371891 0.925622 10 │ 10 A1 A2 C D 0.689828 0.155086 0.155086 11 │ 11 A1 B1 C D 0.793009 0.103496 0.103496 12 │ 12 A2 B1 C D 0.793009 0.103496 0.103496 13 │ 13 A1 B2 C D 0.793009 0.103496 0.103496 14 │ 14 A2 B2 C D 0.793009 0.103496 0.103496 15 │ 15 B1 B2 C D 0.999998 9.42151e-7 9.42151e-7
If we wanted to compare with the observed frequency among 100 gene trees, we could use data frame manipulations to join the two data frames:
julia> using PhyloCoalSimulations
julia> genetrees = simulatecoalescent(net, 100, 1); # 100 genes, 1 individual / pop
julia> df_sim = writeTableCF(countquartetsintrees(genetrees; showprogressbar=false)...);
julia> first(df_sim, 2)
2×9 DataFrame Row │ qind t1 t2 t3 t4 CF12_34 CF13_24 CF14_23 ngenes │ Int64 String String String String Float64 Float64 Float64 Float64 ─────┼─────────────────────────────────────────────────────────────────────────── 1 │ 1 A1 A2 B1 B2 0.93 0.04 0.03 100.0 2 │ 2 A1 A2 B1 C 0.12 0.39 0.49 100.0
julia> select!(df_sim, Not([:qind,:ngenes])); # delete columns with q index and number of genes
julia> df_both = outerjoin(select(df, Not(:qind)), df_sim; on=[:t1,:t2,:t3,:t4], renamecols = "exact" => "sim")
15×10 DataFrame Row │ t1 t2 t3 t4 CF12_34exact CF13_24exact CF14_23exact CF12_34sim CF13_24sim CF14_23sim │ String String String String Float64? Float64? Float64? Float64? Float64? Float64? ─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────── 1 │ A1 A2 B1 B2 0.888546 0.0557272 0.0557272 0.93 0.04 0.03 2 │ A1 A2 B1 C 0.167503 0.416249 0.416249 0.12 0.39 0.49 3 │ A1 A2 B2 C 0.167503 0.416249 0.416249 0.12 0.38 0.5 4 │ A1 B1 B2 C 0.0371891 0.0371891 0.925622 0.04 0.02 0.94 5 │ A2 B1 B2 C 0.0371891 0.0371891 0.925622 0.04 0.01 0.95 6 │ A1 A2 B1 D 0.167503 0.416249 0.416249 0.12 0.4 0.48 7 │ A1 A2 B2 D 0.167503 0.416249 0.416249 0.12 0.39 0.49 8 │ A1 B1 B2 D 0.0371891 0.0371891 0.925622 0.04 0.02 0.94 9 │ A2 B1 B2 D 0.0371891 0.0371891 0.925622 0.04 0.01 0.95 10 │ A1 A2 C D 0.689828 0.155086 0.155086 0.66 0.17 0.17 11 │ A1 B1 C D 0.793009 0.103496 0.103496 0.74 0.13 0.13 12 │ A2 B1 C D 0.793009 0.103496 0.103496 0.87 0.06 0.07 13 │ A1 B2 C D 0.793009 0.103496 0.103496 0.74 0.13 0.13 14 │ A2 B2 C D 0.793009 0.103496 0.103496 0.87 0.06 0.07 15 │ B1 B2 C D 0.999998 9.42151e-7 9.42151e-7 1.0 0.0 0.0