internal documentation
Documentation for QuartetNetworkGoodnessFit's internal functions. These functions are not exported, but they can still be used by prefixing them with the module name, as in QuartetNetworkGoodnessFit.foo() for a function named foo().
functions
QuartetNetworkGoodnessFit.dirichlet_max
dirichlet_max(dcf::DataCF)
Calculate outlier p-values, one for each four-taxon set, using the maximum concordance factor under a Dirichlet distribution. Used by ticr!.
output
- vector of outlier p-values, one for each 4-taxon set
- value of the concentration parameter α
- value of the pseudo likelihood (optimized at α)
QuartetNetworkGoodnessFit.dirichlet_min
dirichlet_min(dcf::DataCF)
First calculate outlier p-values using each of the three concordance factors for each quartet under a Dirichlet distribution, then take the smallest p-value among the three, as the outlier p-value for each four-taxon set.
output
- a vector of outlier p-values, one for each quartet
- value of the concentration parameter α
- value of the pseudo likelihood (optimized at α)
QuartetNetworkGoodnessFit.expectedCF_ordered
expectedCF_ordered(dcf::DataCF, net::HybridNetwork, suffix=""::AbstractString)
Expected quartet concordance factors in dcf, but ordered as they would be if output by PhyloNetworks.countquartetsintrees. Output:
- 2-dimensional SharedArray (number of 4-taxon sets x 3). dcf.quartet[i].qnet.expCF[j] for 4-taxon set i and resolution j is stored in row qi and column k if qi is the rank of 4-taxon set i (see PhyloNetworks.quartetrank). This rank depends on how taxa are ordered.
- vector of taxon names, whose order matters. These are the tip labels in net with suffix suffix added, then ordered alphabetically, or numerically if the taxon names can be parsed as integers.
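The sketch below covers the two ordering steps described above, under stated assumptions. The helpers ordertaxa and colexrank are hypothetical and are not part of QuartetNetworkGoodnessFit or PhyloNetworks. ordertaxa adds the suffix and then sorts numerically only when every resulting label parses as an integer. colexrank ranks a 4-taxon set in colexicographic order from the positions of its taxa in the ordered list. This ranking is assumed to follow the same idea as PhyloNetworks.quartetrank, but the exact formula used there is not checked here.

```julia
# Hypothetical helpers, not part of QuartetNetworkGoodnessFit or PhyloNetworks.

# Add the suffix to each label, then sort numerically if every resulting
# label parses as an integer, and alphabetically otherwise.
function ordertaxa(labels::Vector{String}, suffix::AbstractString = "")
    labs = [string(l, suffix) for l in labels]
    if all(l -> tryparse(Int, l) !== nothing, labs)
        return sort(labs; by = l -> parse(Int, l))
    end
    return sort(labs)
end

# Colexicographic rank of the 4-taxon set whose taxa sit at positions
# t1 < t2 < t3 < t4 in the ordered taxon list (rank 1 for {1, 2, 3, 4}).
# Assumed to mirror the idea of PhyloNetworks.quartetrank; not checked.
function colexrank(t1::Int, t2::Int, t3::Int, t4::Int)
    1 <= t1 < t2 < t3 < t4 || throw(ArgumentError("need 1 <= t1 < t2 < t3 < t4"))
    return binomial(t1 - 1, 1) + binomial(t2 - 1, 2) +
           binomial(t3 - 1, 3) + binomial(t4 - 1, 4) + 1
end
```

For example, ordertaxa(["10", "2", "1"]) sorts numerically to ["1", "2", "10"], while a non-numeric suffix such as "_x" makes the sort alphabetical. With 5 taxa, the 5 possible 4-taxon sets receive the ranks 1 to 5, one per row.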
QuartetNetworkGoodnessFit.multinom_lrt!
multinom_lrt!(pval::AbstractVector{Float64}, quartet::Vector{Quartet})
multinom_lrt!(pval::AbstractVector{Float64}, obsCF, expCF::AbstractMatrix{Float64})
Calculate outlier p-values (one per four-taxon set) using the likelihood ratio test under a multinomial distribution for the observed concordance factors.
QuartetNetworkGoodnessFit.multinom_pearson!
multinom_pearson!(pval::AbstractVector{Float64}, quartet::Vector{Quartet})
multinom_pearson!(pval::AbstractVector{Float64}, obsCF, expCF::AbstractMatrix{Float64})
Calculate outlier p-values (one per four-taxon set) using Pearson's chi-squared statistic under a multinomial distribution for the observed concordance factors.
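The multinomial likelihood-ratio and Pearson statistics described above can be sketched as follows, with one row of gene-tree counts per 4-taxon set. Several parts are assumptions and not the package's code: the counts matrix, and the names lrt_stat, pearson_stat, outlier_pvalues! and chisq2_sf. The sketch also assumes a chi-squared reference distribution with 2 degrees of freedom (3 resolutions, expected CFs fixed). That distribution's upper tail has the closed form exp(-x/2), so no distributions package is needed.

```julia
# Hypothetical sketch, not the package's implementation.

# Upper tail of the chi-squared distribution with 2 degrees of freedom:
# P(X > x) = exp(-x / 2). Using df = 2 is an assumption of this sketch.
chisq2_sf(x::Real) = exp(-x / 2)

# Likelihood-ratio (G) statistic for one 4-taxon set: n[j] gene trees show
# resolution j, and e[j] are the expected CFs (summing to 1).
function lrt_stat(n::AbstractVector{<:Real}, e::AbstractVector{<:Real})
    N = sum(n)
    g = 0.0
    for j in eachindex(n)
        n[j] > 0 && (g += n[j] * log(n[j] / (N * e[j])))  # 0 * log(0) counts as 0
    end
    return 2g
end

# Pearson's chi-squared statistic for one 4-taxon set.
function pearson_stat(n::AbstractVector{<:Real}, e::AbstractVector{<:Real})
    N = sum(n)
    return sum((n[j] - N * e[j])^2 / (N * e[j]) for j in eachindex(n))
end

# Fill pval in place: row i of `counts` and of `expCF` (both: sets x 3)
# describes 4-taxon set i; `stat` is lrt_stat or pearson_stat.
function outlier_pvalues!(pval::AbstractVector{Float64}, counts::AbstractMatrix{<:Real},
                          expCF::AbstractMatrix{Float64}, stat::Function)
    for i in eachindex(pval)
        pval[i] = chisq2_sf(stat(view(counts, i, :), view(expCF, i, :)))
    end
    return pval
end
```

A row whose counts match its expected CFs exactly gets a statistic of 0 and therefore a p-value of 1. A row far from its expected CFs gets a p-value below 0.05.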
QuartetNetworkGoodnessFit.multinom_qlog!
multinom_qlog!(pval::AbstractVector{Float64}, quartet::Vector{Quartet})
multinom_qlog!(pval::AbstractVector{Float64}, obsCF, expCF::AbstractMatrix{Float64})
Calculate outlier p-values (one per four-taxon set) using the Qlog statistic (Lorenzen, 1995), under a multinomial distribution for the observed concordance factors.
QuartetNetworkGoodnessFit.network_expectedCF!
network_expectedCF!(quartet::QuartetT, net::HybridNetwork, taxa, taxonnumber, inheritancecorrelation)
Update quartet.data to contain the quartet concordance factors expected from the multispecies coalescent along network net for the 4-taxon set taxa[quartet.taxonnumber]. taxa should contain the tip labels in net. quartet.taxonnumber gives the indices in taxa of the 4 taxa of interest. taxonnumber should be a dictionary mapping taxon labels to their indices in taxa, for easier lookup.
net is not modified.
For inheritancecorrelation, see network_expectedCF. Its value should be between 0 and 1 (not checked by this internal function).
QuartetNetworkGoodnessFit.network_expectedCF_4taxa!
network_expectedCF_4taxa!(net::HybridNetwork, fourtaxa, inheritancecorrelation)
Return the quartet concordance factors expected from the multispecies coalescent along network net, where the 3 quartet topologies are ordered following the ordering of taxon names in fourtaxa. That is, if fourtaxa is a,b,c,d, then the concordance factors are listed in this order:
(qCF(ab|cd), qCF(ac|bd), qCF(ad|bc))
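The hypothetical sketch below illustrates only the ordering convention, in the simplest case: a 4-taxon tree without hybridization whose internal edge has length t in coalescent units. Under the multispecies coalescent, the displayed topology then has qCF 1 - (2/3)exp(-t), and each of the other two topologies has (1/3)exp(-t). The sketch does not handle hybrid edges or inheritancecorrelation, which network_expectedCF_4taxa! handles recursively. The names treeexpectedCF and topologyindex are illustrative only.

```julia
# Hypothetical helpers, not part of QuartetNetworkGoodnessFit.

# Index, in the order (ab|cd, ac|bd, ad|bc), of the topology that groups
# fourtaxa[1] with `partner`: b -> 1, c -> 2, d -> 3.
function topologyindex(fourtaxa::Vector{String}, partner::String)
    length(fourtaxa) == 4 || throw(ArgumentError("need exactly 4 taxa"))
    k = findfirst(==(partner), fourtaxa)
    k === nothing && throw(ArgumentError("partner is not in fourtaxa"))
    k == 1 && throw(ArgumentError("partner must differ from fourtaxa[1]"))
    return k - 1
end

# Expected qCFs for a 4-taxon tree (no hybridization) whose internal edge
# has length t in coalescent units, listed in the order above.
function treeexpectedCF(fourtaxa::Vector{String}, partner::String, t::Real)
    t >= 0 || throw(ArgumentError("edge length must be non-negative"))
    minor = exp(-t) / 3
    cf = fill(minor, 3)
    cf[topologyindex(fourtaxa, partner)] = 1 - 2 * minor
    return cf
end
```

With fourtaxa = ["a", "b", "c", "d"] and a tree grouping a with c, treeexpectedCF(fourtaxa, "c", 1.0) puts the largest value in the second position, the slot of ac|bd.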
Assumptions about net:
- has 4 taxa, and those are the same as fourtaxa
- no degree-2 nodes, except perhaps for the root
- edge lengths are non-missing
- hybrid edge γ's are non-missing
The network is modified as follows: what's above the LSA is removed, the 2 edges incident to the root are fused (if the root is of degree 2), and external degree-2 blobs are removed. net is then simplified recursively by removing hybrid edges for the recursive calculation of qCFs.
For inheritancecorrelation, see network_expectedCF. Its value should be between 0 and 1 (not checked by this internal function).
QuartetNetworkGoodnessFit.quarnetGoFtest
quarnetGoFtest(quartet::Vector{Quartet}, outlierp_fun!::Function)
quarnetGoFtest(outlier_pvalues::AbstractVector)
Calculate an outlier p-value for each quartet according to function outlierp_fun! (or take outlier p-values as input: second version), and calculate the z-value to test the null hypothesis that 5% of the p-values are < 0.05, versus the one-sided alternative of more outliers than expected.
See quarnetGoFtest! for more details.
Output:
- z-value
- outlier p-values (first version only)
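The sketch below computes the z-value under one assumption: that it is the usual one-proportion z statistic. That statistic is the observed proportion of outlier p-values below 0.05, minus 0.05, divided by sqrt(0.05 × 0.95 / n) for n four-taxon sets. The function name outlier_zvalue is hypothetical. The sketch treats the 4-taxon sets as independent, which real 4-taxon sets sharing taxa need not be.

```julia
# Hypothetical sketch: one-proportion z statistic for the number of
# outlier p-values, under the null that a fraction `level` are outliers.
function outlier_zvalue(outlier_pvalues::AbstractVector{<:Real}; level::Real = 0.05)
    n = length(outlier_pvalues)
    n > 0 || throw(ArgumentError("need at least one p-value"))
    prop = count(<(level), outlier_pvalues) / n     # observed proportion of outliers
    return (prop - level) / sqrt(level * (1 - level) / n)
end
```

With 20 four-taxon sets and exactly one p-value below 0.05, the z-value is 0. With 10 outliers among 20, it is about 9.2, strong evidence of more outliers than expected.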
QuartetNetworkGoodnessFit.quarnetGoFtest_simulation
quarnetGoFtest_simulation(net::HybridNetwork, dcf::DataCF, outlierp_fun!::Function, seed::Int, nsim::Int, verbose::Bool, keepfiles::Bool)
Simulate gene trees under the multispecies coalescent model along network net using PhyloCoalSimulations. The quartet concordance factors (CFs) from these simulated gene trees are used as input to outlierp_fun! to categorize each 4-taxon set as an outlier (p-value < 0.05) or not. For each simulated data set, a goodness-of-fit z-value is calculated by comparing the proportion of outlier 4-taxon sets to 0.05. The standard deviation of these z-values (assuming a mean of 0) and the z-values themselves are returned.
Used by quarnetGoFtest!.
Warning: the quartet CFs expected from net are assumed to be stored in dcf.quartet[i].qnet.expCF. This is not checked.
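The final step above is the standard deviation of the simulated z-values under an assumed mean of 0, sketched below with a stand-in for the simulation. The sketch does not simulate gene trees along net with PhyloCoalSimulations or run outlierp_fun!. Instead, the hypothetical function simulated_zvalues draws independent uniform p-values, a stand-in for outlier p-values under a well-fitting model. sd_mean0 divides the sum of squared z-values by the number of replicates, which is also an assumption about the exact denominator.

```julia
using Random

# Hypothetical stand-in: each replicate draws independent Uniform(0, 1)
# p-values for nquartets 4-taxon sets (no gene trees are simulated here),
# then computes the goodness-of-fit z-value of that replicate.
function simulated_zvalues(nquartets::Int, nsim::Int; seed::Int = 123, level::Real = 0.05)
    rng = MersenneTwister(seed)
    sigma = sqrt(level * (1 - level) / nquartets)
    z = Vector{Float64}(undef, nsim)
    for r in 1:nsim
        pvals = rand(rng, nquartets)
        z[r] = (count(<(level), pvals) / nquartets - level) / sigma
    end
    return z
end

# Standard deviation of z-values assuming a mean of 0
# (dividing by the number of replicates is an assumption).
sd_mean0(z::AbstractVector{<:Real}) = sqrt(sum(abs2, z) / length(z))
```

With independent uniform p-values, sd_mean0(simulated_zvalues(100, 500)) should be close to 1.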
QuartetNetworkGoodnessFit.reroot!
reroot!(net, refnet)
Reroot net to minimize the hardwired cluster distance between net (with the new root position) and the reference network refnet. Candidate root positions are limited to internal nodes (leaves are excluded) that are compatible with the direction of hybrid edges.
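The tree-only sketch below illustrates the rerooting criterion under stated assumptions. The network is replaced by an unrooted tree given as an adjacency list, so there are no hybrid edges and no direction constraints. Each rooting defines one cluster (leaf set) per non-root node. The hardwired cluster distance is taken as the number of clusters present in one rooting but not in the reference. For networks, the distance must also handle hybrid edges, which this sketch ignores. All names are hypothetical.

```julia
# Hypothetical tree-only sketch; not the package's implementation.
const Adj = Dict{String,Vector{String}}   # undirected adjacency list

isleaf(adj::Adj, v) = length(adj[v]) == 1

# Clusters of the tree rooted at `root`: one leaf set per non-root node.
function clusters(adj::Adj, root::String)
    out = Set{Set{String}}()
    # leaf set below v, when v is reached from `parent`
    function below(v::String, parent::String)
        s = isleaf(adj, v) ? Set([v]) : Set{String}()
        for w in adj[v]
            w == parent || union!(s, below(w, v))
        end
        push!(out, s)
        return s
    end
    for w in adj[root]
        below(w, root)
    end
    return out
end

# Number of clusters found in exactly one of the two cluster sets.
hardwired_distance(a::Set{Set{String}}, b::Set{Set{String}}) = length(symdiff(a, b))

# Try every internal node as the root; keep the one closest to the reference.
function best_root(adj::Adj, refclusters::Set{Set{String}})
    candidates = sort!([v for v in keys(adj) if !isleaf(adj, v)])  # deterministic ties
    dists = [hardwired_distance(clusters(adj, r), refclusters) for r in candidates]
    return candidates[argmin(dists)]
end

# Example: unrooted tree ((a,b),c,(d,e)) with internal nodes u, v, w.
example_adj = Adj("u" => ["a", "b", "v"], "v" => ["u", "c", "w"], "w" => ["v", "d", "e"],
                  "a" => ["u"], "b" => ["u"], "c" => ["v"], "d" => ["w"], "e" => ["w"])
```

Using the rooting at w as the reference, rooting at v gives distance 2 and rooting at u gives distance 4. best_root therefore recovers w.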
QuartetNetworkGoodnessFit.ticr_optimalpha
ticr_optimalpha(dcf::DataCF)
Find the concentration parameter α by maximizing the pseudo-log-likelihood of observed quartet concordance factors. The model assumes a Dirichlet distribution with mean equal to the expected concordance factors calculated from a phylogenetic network (under ILS). These expected CFs are assumed to be already calculated and stored in dcf.
When calculating the pseudo-log-likelihood, this function checks the observed concordance factors for any values equal to zero. Such values cause a problem because the Dirichlet log-density is not finite at 0: the density is 0 there when the corresponding Dirichlet parameter (α times the expected CF) exceeds 1. Those 0.0 observed CF values are re-set to the minimum of:
- the minimum of all expected concordance factors, and
- the minimum of all nonzero observed concordance factors.
output
- maximized pseudo-loglikelihood
- value of α where the pseudo-loglikelihood is maximized
- return code of the optimization
The optimization uses NLopt, with the :LN_BOBYQA method. Optional arguments can tune the optimization: nloptmethod, xtol_rel (1e-6 by default), and the starting α value x_start (1.0 by default).
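The dependency-free sketch below covers the pseudo-log-likelihood and the zero replacement described above, under several assumptions. It swaps in a golden-section search over log α for NLopt's :LN_BOBYQA. It assumes a single maximum of the pseudo-log-likelihood on the search interval α in [0.01, 10^4]. It uses its own Stirling-series loggamma_pos instead of SpecialFunctions.loggamma. All names are hypothetical. Each row of obs and expCF holds the 3 observed and expected CFs of one 4-taxon set, and the Dirichlet parameters are α·expCF.

```julia
# Hypothetical, dependency-free sketch; not the package's implementation.

# log Γ(x) for x > 0: shift x up with log Γ(x) = log Γ(x + 1) - log(x),
# then use the Stirling series (error below about 1e-9 once x >= 7).
function loggamma_pos(x::Float64)
    x > 0 || throw(DomainError(x, "x must be positive"))
    shift = 0.0
    while x < 7
        shift -= log(x)
        x += 1
    end
    z = 1 / (x * x)
    return shift + (x - 0.5) * log(x) - x + 0.5 * log(2π) +
           (1 / x) * (1 / 12 - z * (1 / 360 - z / 1260))
end

# Replace observed CFs equal to 0 by the minimum of all expected CFs and of
# all nonzero observed CFs, as described above.
function replace_zeros(obs::Matrix{Float64}, expCF::Matrix{Float64})
    nonzero = filter(>(0), obs)
    fillvalue = isempty(nonzero) ? minimum(expCF) : min(minimum(expCF), minimum(nonzero))
    return map(x -> x == 0 ? fillvalue : x, obs)
end

# Sum over 4-taxon sets (rows) of the Dirichlet log-density of the observed
# CFs, with Dirichlet parameters alpha * expCF[i, :].
function pseudologlik(alpha::Float64, obs::Matrix{Float64}, expCF::Matrix{Float64})
    ll = 0.0
    for i in axes(obs, 1)
        ll += loggamma_pos(alpha * sum(view(expCF, i, :)))
        for j in axes(obs, 2)
            a = alpha * expCF[i, j]
            ll += (a - 1) * log(obs[i, j]) - loggamma_pos(a)
        end
    end
    return ll
end

# Golden-section search over log(alpha), swapped in for NLopt's :LN_BOBYQA;
# assumes a single maximum on [lo, hi]. Returns (pseudo-loglik, alpha).
function optimize_alpha(obs::Matrix{Float64}, expCF::Matrix{Float64};
                        lo = log(1e-2), hi = log(1e4), tol = 1e-6)
    obs0 = replace_zeros(obs, expCF)
    f(t) = pseudologlik(exp(t), obs0, expCF)
    g = (sqrt(5) - 1) / 2                 # inverse golden ratio
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol
        if fc > fd                        # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else                              # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
        end
    end
    t = (a + b) / 2
    return f(t), exp(t)
end
```

optimize_alpha returns the maximized pseudo-log-likelihood and α. Unlike ticr_optimalpha, the sketch returns no optimization return code.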
QuartetNetworkGoodnessFit.ultrametrize!
ultrametrize!(net::HybridNetwork, verbose::Bool)
Assign values to missing branch lengths in net to make the network time-consistent (all paths from the root to a given hybrid node have the same length) and ultrametric (all paths from the root to the tips have the same length), if possible. Warnings are given if this is not possible and if verbose is true.
Output: true if the modified network is ultrametric, false otherwise.
The major tree is used to calculate the distance from nodes to the root. If a tree edge has a missing length, this length is changed to the following:
- 0 if the edge is internal,
- the smallest value possible to make the network ultrametric if the edge is external.
It is assumed that hybrid nodes are not leaves, so that external edges are necessarily tree edges. If a hybrid edge has a missing length, this length is changed as follows:
- If both partner hybrid edges lack a length: the shortest lengths are assigned to make the network time-consistent at the hybrid node. In particular, either the major edge or the minor edge is assigned length 0.0.
- Otherwise: the value needed to make the network time-consistent given the partner edge's length, if this value is non-negative, and 0 if this ideal value is negative.
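The sketch below applies the rules above in two parts, under stated assumptions. ultrametrize_tree! works on a tree given by parent links and a table of edge lengths keyed by child node (missing for unknown), standing in for the major tree. hybrid_lengths applies the two hybrid-edge rules at a single hybrid node, given the root-to-parent distances d1 and d2 of its two parent edges. Both names are hypothetical, and neither function emits warnings.

```julia
# Hypothetical tree-only sketch; not the package's implementation.
# parent[v] is the parent of node v (the root has no entry);
# len[v] is the length of the edge parent[v] -> v, or missing.
function ultrametrize_tree!(parent::Dict{String,String},
                            len::Dict{String,Union{Missing,Float64}})
    haschild = Set(values(parent))              # nodes with at least one child
    isleafnode(v) = !(v in haschild)
    # rule 1: a missing internal edge length becomes 0
    for v in keys(parent)
        if !isleafnode(v) && ismissing(len[v])
            len[v] = 0.0
        end
    end
    # distance from the root to node v
    depth(v) = haskey(parent, v) ? depth(parent[v]) + len[v] : 0.0
    leaves = [v for v in keys(parent) if isleafnode(v)]
    known = [depth(v) for v in leaves if !ismissing(len[v])]
    parentdepth = [depth(parent[v]) for v in leaves if ismissing(len[v])]
    D = maximum(vcat(known, parentdepth))       # smallest reachable common tip depth
    # rule 2: a missing external edge length is the smallest value reaching D
    for v in leaves
        if ismissing(len[v])
            len[v] = D - depth(parent[v])
        end
    end
    return all(v -> isapprox(depth(v), D), leaves)  # true if ultrametric
end

# Hybrid node with parent edges of lengths l1, l2 (missing allowed), whose
# parent nodes are at distances d1, d2 from the root.
function hybrid_lengths(d1::Float64, d2::Float64,
                        l1::Union{Missing,Float64}, l2::Union{Missing,Float64})
    if ismissing(l1) && ismissing(l2)
        t = max(d1, d2)                         # shortest choice: one edge gets 0
        return (t - d1, t - d2)
    elseif ismissing(l1)
        return (max(d2 + l2 - d1, 0.0), l2)     # match the partner, or 0 if negative
    elseif ismissing(l2)
        return (l1, max(d1 + l1 - d2, 0.0))
    end
    return (l1, l2)
end
```

For example, hybrid_lengths(1.0, 2.0, missing, missing) returns (1.0, 0.0), so the edge from the deeper parent gets length 0.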