public documentation

Documentation for PhyloCoalSimulations's public (exported) functions. Most functions are internal (not exported).

functions & types

PhyloCoalSimulations.gene_edgemapping!Function
gene_edgemapping!(gene_tree, species_network, checknames=true)

Given a gene tree with labeled internal nodes that map to a species phylogeny (a species tree or a species network), this function maps each gene edge to the species edge that it is contained "within". Gene edge mappings are stored in the inte1 field of each gene tree edge, but it's best to access this mapping via population_mappedto.

Assumption: the species_network has unique node names to uniquely identify the speciation and reticulation events; and the gene_tree has degree-2 nodes with matching names, to indicate which species event each degree-2 node corresponds to.

The checknames argument takes a boolean, and, if true, then the function will check that both the species and the gene phylogeny have the same internal node names. The mapping of edges is recovered from matching names between nodes in the gene tree and nodes in the species network.

source
PhyloCoalSimulations.population_mappedtoMethod
population_mappedto(edge or node)

Identifier of the population (edge in the species network) that a gene tree's edge or a node is mapped onto, or nothing if not mapped. For example, coalescent nodes in gene trees map to a node in the species phylogeny, instead of mapping to an edge.

source
PhyloCoalSimulations.simulatecoalescentMethod
simulatecoalescent([rng::AbstractRNG,] net, nloci, nindividuals;
    nodemapping=false, inheritancecorrelation=0.0)

Simulate nloci gene trees with nindividuals from each species under the multispecies network coalescent, along network net whose branch lengths are assumed to be in coalescent units (ratio: number of generations / effective population size). The coalescent model uses the infinite-population-size approximation.

The random number generator rng is optional.

Output: vector of gene trees, of length nloci.

nindividuals can be a single integer, or a dictionary listing the number of individuals to be simulated for each species.

If nodemapping is true, each simulated gene tree is augmented with degree-2 nodes that can be mapped to speciation or hybridization events. The mapping of gene tree nodes & edges to network edges is carried by their attribute .intn1 (for nodes) and .inte1 (for edges). The mapping of gene tree nodes to network nodes is carried by the .name attribute. Namely:

  • A degree-3 node (1 parent + 2 children) represents a coalescent event that occurred along a population edge in net. Its .intn1 attribute is set to the number of that network population edge. Its parent edge has its .inte1 attribute also set to the number of the population edge that it originated from.
  • The gene tree's root node (of degree 2) represents a coalescent event along the network's root edge. Its .intn1 attribute is the number assigned to the network's root edge, which is set by get_rootedgenumber as the maximum edge number + 1.
  • A leaf (or degree-1 node) represents an individual. It maps to a species in net. The individual leaf name is set to the species name if nindividuals is 1. Otherwise, its name is set to speciesname_i where i is the individual number in that species. Its intn1 attribute is the default -1.
  • A non-root degree-2 node represents a speciation or hybridization and maps to a population node in net. Its intn1 attribute is the default -1. Its name is set to network node name, if it exists. If the network node has no name, the gene tree node is given a name built from the network node number.

By default, lineages at a hybrid node come from a parent (chosen according to inheritance probabilities γ) independently across lineages. Positive dependence can be simulated with option inheritancecorrelation. For example, if this correlation is set to 1, then all lineages inherit from the same (randomly sampled) parent. More generally, the lineages' parents are simulated according to a Dirichlet process with base distribution determined by the γ values, and with concentration parameter α = (1-r)/r, that is, r = 1/(1+α), where r is the input inheritance correlation. For more details about this model, please read the package manual or refer to Fogg, Allman & Ané (2023).

Assumptions:

  • net must have non-missing edge lengths and γ values.
  • If nindividuals is a dictionary, it must have a key for all species, with the same spelling of species names in its keys as in the tip labels of net.

examples

julia> using PhyloNetworks

julia> net = readnewick("(A:1,B:1);"); # branch lengths of 1 coalescent unit

julia> using Random; Random.seed!(54321); # for replicability of examples below

julia> simulatecoalescent(net, 2, 1) # 2 gene trees, 1 individual/species
2-element Vector{HybridNetwork}:
 PhyloNetworks.HybridNetwork, Rooted Network
2 edges
3 nodes: 2 tips, 0 hybrid nodes, 1 internal tree nodes.
tip labels: B, A
(B:1.023,A:1.023);

 PhyloNetworks.HybridNetwork, Rooted Network
2 edges
3 nodes: 2 tips, 0 hybrid nodes, 1 internal tree nodes.
tip labels: B, A
(B:2.328,A:2.328);


julia> simulatecoalescent(net, 1, 3)[1] # 1 gene tree, 3 individuals/species
PhyloNetworks.HybridNetwork, Rooted Network
10 edges
11 nodes: 6 tips, 0 hybrid nodes, 5 internal tree nodes.
tip labels: B_2, B_1, B_3, A_3, ...
(((B_2:0.12,B_1:0.12):2.39,B_3:2.51):0.692,(A_3:0.518,(A_2:0.461,A_1:0.461):0.057):2.684);


julia> simulatecoalescent(net, 1, Dict("A"=>2, "B"=>1))[1] # 2 individuals in A, 1 in B
PhyloNetworks.HybridNetwork, Rooted Network
4 edges
5 nodes: 3 tips, 0 hybrid nodes, 2 internal tree nodes.
tip labels: B, A_2, A_1
(B:2.801,(A_2:0.344,A_1:0.344):2.457);

In the next example, we use a custom random number generator (RNG), to show how to do so. In this example we use a stable RNG to make the example reproducible across julia versions. However, this particular RNG is not recommended. The RNG used by default is better (e.g. much more efficient).

julia> # using Pkg; Pkg.add("StableRNGs") # to install StableRNGs if not done earlier

julia> using StableRNGs

julia> rng = StableRNG(791);

julia> tree1 = simulatecoalescent(rng, net,2,2; nodemapping=true)[1]; # first gene tree only

julia> writenewick(tree1, round=true)
"(((B_2:0.82,B_1:0.82):0.18)minus2:0.992,((A_1:1.0)minus2:0.427,(A_2:1.0)minus2:0.427):0.565);"

julia> nameinternalnodes!(net); writenewick(net)
"(A:1.0,B:1.0)i1;"

julia> tree1 = simulatecoalescent(rng, net,2,2; nodemapping=true)[1]; writenewick(tree1, round=true)
"((B_2:1.0)i1:0.621,((B_1:1.0)i1:0.018,((A_1:0.61,A_2:0.61):0.39)i1:0.018):0.604);"

julia> printnodes(net)
node leaf  hybrid name i_cycle edges'numbers
1    true  false  A    -1      1   
2    true  false  B    -1      2   
-2   false false  i1   -1      1    2   

julia> printnodes(tree1)
node leaf  hybrid name i_cycle edges'numbers
10   false false       3       8    9   
8    false false  i1   -1      5    8   
5    true  false  B_2  -1      5   
9    false false       3       7    6    9   
7    false false  i1   -1      4    7   
4    true  false  B_1  -1      4   
6    false false  i1   -1      3    6   
3    false false       1       1    2    3   
1    true  false  A_1  -1      1   
2    true  false  A_2  -1      2   

julia> [(tree_edge_number = e.number,
         pop_edge_number = population_mappedto(e)) for e in tree1.edge]
9-element Vector{@NamedTuple{tree_edge_number::Int64, pop_edge_number::Int64}}:
 (tree_edge_number = 8, pop_edge_number = 3)
 (tree_edge_number = 5, pop_edge_number = 2)
 (tree_edge_number = 9, pop_edge_number = 3)
 (tree_edge_number = 7, pop_edge_number = 3)
 (tree_edge_number = 4, pop_edge_number = 2)
 (tree_edge_number = 6, pop_edge_number = 3)
 (tree_edge_number = 3, pop_edge_number = 1)
 (tree_edge_number = 1, pop_edge_number = 1)
 (tree_edge_number = 2, pop_edge_number = 1)
source
PhyloCoalSimulations.simulatecoalescentMethod
simulatecoalescent([rng::AbstractRNG,] net, nloci, nindividuals, populationsize;
    nodemapping=false, round_generationnumber=true,
    inheritancecorrelation=0.0)

Simulate nloci gene trees with nindividuals from each species under the multispecies network coalescent, along network net, whose branch lengths are assumed to be in number of generations. populationsize should be a single number, assumed to be the (haploid) effective population size Nₑ, constant across the species phylogeny. Alternatively, populationsize can be a dictionary mapping the number of each edge in net to its Nₑ, including an extra edge number for the population above the root of the network.

Coalescent units are then calculated as u=g/Nₑ where g is the edge length in net (number of generations), and the coalescent model is applied using the infinite-population-size approximation.

Output: vector of gene trees with edge lengths in number of generations, calculated as g=uNₑ and then rounded to be an integer, unless round_generationnumber is false.

Warning

When populationsize Nₑ is not provided as input, all edge lengths are in coalescent units. When populationsize is given as an argument, all edge lengths are in number of generations. The second method (using # generation and Nₑ as input) is a wrapper around the first (using coalescent units).

julia> using PhyloNetworks

julia> net = readnewick("(A:500,B:500);"); # branch lengths of 100 generations

julia> Ne = Dict(e.number => 1_000 for e in net.edge);

julia> rootedgenumber = PhyloCoalSimulations.get_rootedgenumber(net)
3

julia> push!(Ne, rootedgenumber => 2_000) # Ne for population above the root
Dict{Int64, Int64} with 3 entries:
  2 => 1000
  3 => 2000
  1 => 1000

julia> using Random; Random.seed!(54321); # for replicability of example below

julia> genetrees = simulatecoalescent(net, 2, 1, Ne);

julia> writeMultiTopology(genetrees, stdout) # branch lengths: number of generations
(B:546.0,A:546.0);
(B:3155.0,A:3155.0);
source

index