public documentation
Documentation for PhyloCoalSimulations
's public (exported) functions. Most functions are internal (not exported).
functions & types
PhyloCoalSimulations.gene_edgemapping!
— Functiongene_edgemapping!(gene_tree, species_network, checknames=true)
Given a gene tree with labeled internal nodes that map to a species phylogeny (a species tree or a species network), this function maps each gene edge to the species edge that it is contained "within". Gene edge mappings are stored in the inte1
field of each gene tree edge, but it's best to access this mapping via population_mappedto
.
Assumption: the species_network
has unique node names to uniquely identify the speciation and reticulation events; and the gene_tree
has degree-2 nodes with matching names, to indicate which species event each degree-2 node corresponds to.
The checknames
argument takes a boolean, and, if true
, then the function will check that both the species and the gene phylogeny have the same internal node names. The mapping of edges is recovered from matching names between nodes in the gene tree and nodes in the species network.
PhyloCoalSimulations.population_mappedto
— Methodpopulation_mappedto(edge or node)
Identifier of the population (edge in the species network) that a gene tree's edge or a node is mapped onto, or nothing
if not mapped. For example, coalescent nodes in gene trees map to a node in the species phylogeny, instead of mapping to an edge.
PhyloCoalSimulations.simulatecoalescent
— Methodsimulatecoalescent([rng::AbstractRNG,] net, nloci, nindividuals;
nodemapping=false, inheritancecorrelation=0.0)
Simulate nloci
gene trees with nindividuals
from each species under the multispecies network coalescent, along network net
whose branch lengths are assumed to be in coalescent units (ratio: number of generations / effective population size). The coalescent model uses the infinite-population-size approximation.
The random number generator rng
is optional.
Output: vector of gene trees, of length nloci
.
nindividuals
can be a single integer, or a dictionary listing the number of individuals to be simulated for each species.
If nodemapping
is true, each simulated gene tree is augmented with degree-2 nodes that can be mapped to speciation or hybridization events. The mapping of gene tree nodes & edges to network edges is carried by their attribute .intn1
(for nodes) and .inte1
(for edges). The mapping of gene tree nodes to network nodes is carried by the .name
attribute. Namely:
- A degree-3 node (1 parent + 2 children) represents a coalescent event that occurred along a population edge in
net
. Its.intn1
attribute is set to the number of that network population edge. Its parent edge has its.inte1
attribute also set to the number of the population edge that it originated from. - The gene tree's root node (of degree 2) represents a coalescent event along the network's root edge. Its
.intn1
attribute is the number assigned to the network's root edge, which is set byget_rootedgenumber
as the maximum edge number + 1. - A leaf (or degree-1 node) represents an individual. It maps to a species in
net
. The individual leaf name is set to the species name ifnindividuals
is 1. Otherwise, its name is set tospeciesname_i
wherei
is the individual number in that species. Itsintn1
attribute is the default-1
. - A non-root degree-2 node represents a speciation or hybridization and maps to a population node in
net
. Itsintn1
attribute is the default-1
. Its name is set to network node name, if it exists. If the network node has no name, the gene tree node is given a name built from the network node number.
By default, lineages at a hybrid node come from a parent (chosen according to inheritance probabilities γ) independently across lineages. Positive dependence can be simulated with option inheritancecorrelation
. For example, if this correlation is set to 1, then all lineages inherit from the same (randomly sampled) parent. More generally, the lineages' parents are simulated according to a Dirichlet process with base distribution determined by the γ values, and with concentration parameter α = (1-r)/r, that is, r = 1/(1+α), where r
is the input inheritance correlation. For more details about this model, please read the package manual or refer to Fogg, Allman & Ané (2023).
Assumptions:
net
must have non-missing edge lengths and γ values.- If
nindividuals
is a dictionary, it must have a key for all species, with the same spelling of species names in its keys as in the tip labels ofnet
.
examples
julia> using PhyloNetworks
julia> net = readnewick("(A:1,B:1);"); # branch lengths of 1 coalescent unit
julia> using Random; Random.seed!(54321); # for replicability of examples below
julia> simulatecoalescent(net, 2, 1) # 2 gene trees, 1 individual/species
2-element Vector{HybridNetwork}:
PhyloNetworks.HybridNetwork, Rooted Network
2 edges
3 nodes: 2 tips, 0 hybrid nodes, 1 internal tree nodes.
tip labels: B, A
(B:1.023,A:1.023);
PhyloNetworks.HybridNetwork, Rooted Network
2 edges
3 nodes: 2 tips, 0 hybrid nodes, 1 internal tree nodes.
tip labels: B, A
(B:2.328,A:2.328);
julia> simulatecoalescent(net, 1, 3)[1] # 1 gene tree, 3 individuals/species
PhyloNetworks.HybridNetwork, Rooted Network
10 edges
11 nodes: 6 tips, 0 hybrid nodes, 5 internal tree nodes.
tip labels: B_2, B_1, B_3, A_3, ...
(((B_2:0.12,B_1:0.12):2.39,B_3:2.51):0.692,(A_3:0.518,(A_2:0.461,A_1:0.461):0.057):2.684);
julia> simulatecoalescent(net, 1, Dict("A"=>2, "B"=>1))[1] # 2 individuals in A, 1 in B
PhyloNetworks.HybridNetwork, Rooted Network
4 edges
5 nodes: 3 tips, 0 hybrid nodes, 2 internal tree nodes.
tip labels: B, A_2, A_1
(B:2.801,(A_2:0.344,A_1:0.344):2.457);
In the next example, we use a custom random number generator (RNG), to show how to do so. In this example we use a stable RNG to make the example reproducible across julia versions. However, this particular RNG is not recommended. The RNG used by default is better (e.g. much more efficient).
julia> # using Pkg; Pkg.add("StableRNGs") # to install StableRNGs if not done earlier
julia> using StableRNGs
julia> rng = StableRNG(791);
julia> tree1 = simulatecoalescent(rng, net,2,2; nodemapping=true)[1]; # first gene tree only
julia> writenewick(tree1, round=true)
"(((B_2:0.82,B_1:0.82):0.18)minus2:0.992,((A_1:1.0)minus2:0.427,(A_2:1.0)minus2:0.427):0.565);"
julia> nameinternalnodes!(net); writenewick(net)
"(A:1.0,B:1.0)i1;"
julia> tree1 = simulatecoalescent(rng, net,2,2; nodemapping=true)[1]; writenewick(tree1, round=true)
"((B_2:1.0)i1:0.621,((B_1:1.0)i1:0.018,((A_1:0.61,A_2:0.61):0.39)i1:0.018):0.604);"
julia> printnodes(net)
node leaf hybrid name i_cycle edges'numbers
1 true false A -1 1
2 true false B -1 2
-2 false false i1 -1 1 2
julia> printnodes(tree1)
node leaf hybrid name i_cycle edges'numbers
10 false false 3 8 9
8 false false i1 -1 5 8
5 true false B_2 -1 5
9 false false 3 7 6 9
7 false false i1 -1 4 7
4 true false B_1 -1 4
6 false false i1 -1 3 6
3 false false 1 1 2 3
1 true false A_1 -1 1
2 true false A_2 -1 2
julia> [(tree_edge_number = e.number,
pop_edge_number = population_mappedto(e)) for e in tree1.edge]
9-element Vector{@NamedTuple{tree_edge_number::Int64, pop_edge_number::Int64}}:
(tree_edge_number = 8, pop_edge_number = 3)
(tree_edge_number = 5, pop_edge_number = 2)
(tree_edge_number = 9, pop_edge_number = 3)
(tree_edge_number = 7, pop_edge_number = 3)
(tree_edge_number = 4, pop_edge_number = 2)
(tree_edge_number = 6, pop_edge_number = 3)
(tree_edge_number = 3, pop_edge_number = 1)
(tree_edge_number = 1, pop_edge_number = 1)
(tree_edge_number = 2, pop_edge_number = 1)
PhyloCoalSimulations.simulatecoalescent
— Methodsimulatecoalescent([rng::AbstractRNG,] net, nloci, nindividuals, populationsize;
nodemapping=false, round_generationnumber=true,
inheritancecorrelation=0.0)
Simulate nloci
gene trees with nindividuals
from each species under the multispecies network coalescent, along network net
, whose branch lengths are assumed to be in number of generations. populationsize
should be a single number, assumed to be the (haploid) effective population size Nₑ, constant across the species phylogeny. Alternatively, populationsize
can be a dictionary mapping the number of each edge in net
to its Nₑ, including an extra edge number for the population above the root of the network.
Coalescent units are then calculated as u=g/Nₑ
where g
is the edge length in net
(number of generations), and the coalescent model is applied using the infinite-population-size approximation.
Output: vector of gene trees with edge lengths in number of generations, calculated as g=uNₑ
and then rounded to be an integer, unless round_generationnumber
is false.
When populationsize
Nₑ is not provided as input, all edge lengths are in coalescent units. When populationsize
is given as an argument, all edge lengths are in number of generations. The second method (using # generation and Nₑ as input) is a wrapper around the first (using coalescent units).
julia> using PhyloNetworks
julia> net = readnewick("(A:500,B:500);"); # branch lengths of 100 generations
julia> Ne = Dict(e.number => 1_000 for e in net.edge);
julia> rootedgenumber = PhyloCoalSimulations.get_rootedgenumber(net)
3
julia> push!(Ne, rootedgenumber => 2_000) # Ne for population above the root
Dict{Int64, Int64} with 3 entries:
2 => 1000
3 => 2000
1 => 1000
julia> using Random; Random.seed!(54321); # for replicability of example below
julia> genetrees = simulatecoalescent(net, 2, 1, Ne);
julia> writeMultiTopology(genetrees, stdout) # branch lengths: number of generations
(B:546.0,A:546.0);
(B:3155.0,A:3155.0);