Discrete trait simulation

rand can simulate discrete traits along a known network.

binary trait example

For example, we can define a binary trait model with states "carnivory" (state 1) and "non-carnivory" (state 2), then ask for a trait to be simulated along our network. We can ask for 3 independent simulations, giving us 3 traits then, arranged in 3 rows.

julia> m1 = BinaryTraitSubstitutionModel(1.0,2.0, ["carnivory", "non-carnivory"])Binary Trait Substitution Model:
rate carnivory→non-carnivory α=1.0
rate non-carnivory→carnivory β=2.0

We also need a phylogeny. We re-use the network from the section on Discrete trait evolution. We then simulate traits with a stable random number generator (RNG) for reproducibility, but recommend using the default RNG by simply removing the argument rng below.

julia> net = readnewick("(O:4,(A:3,((B:0.4)#H1:1.6::0.92,((#H1:0::0.08,C:0.4):0.6,(D:.2,E:.2):0.8):1):1):1);");
julia> using StableRNGs; rng = StableRNG(123); # for reproducibility of this example
julia> traitmatrix, nodecolumn = rand(rng, m1, net; ntraits=3);
julia> traitmatrix3×13 Matrix{Int64}: 2 1 2 1 2 2 1 1 1 1 1 2 1 1 1 2 2 2 2 1 2 1 1 2 1 1 2 1 1 2 2 2 2 1 1 1 1 2 1

In this trait matrix, each column corresponds to a node, each row is a trait, and each entry gives the state of that trait for that node, as an index. To get the state labels:

julia> labs = getlabels(m1)2-element Vector{String}:
 "carnivory"
 "non-carnivory"
julia> labs[traitmatrix]3×13 Matrix{String}: "non-carnivory" "carnivory" … "non-carnivory" "carnivory" "carnivory" "carnivory" "carnivory" "carnivory" "non-carnivory" "carnivory" "non-carnivory" "carnivory"

The nodecolumn vector says which node corresponds to which column in the trait matrix, and we can compare to the node numbers in the network. For example, the first column corresponds to node -2, which is the root. (The root is always in the first column: that's where the simulation starts.)

julia> nodecolumn13-element Vector{String}:
 "-2"
 "-3"
 "-4"
 "-6"
 "-8"
 "E"
 "D"
 "-7"
 "C"
 "H1"
 "B"
 "A"
 "O"
julia> getroot(net)PhyloNetworks.Node: number:-2 attached to 2 edges, numbered: 1 13

As another example, we can extract the data for species "A". We first get the column number for "A" (column 12), then get A's data in that column.

julia> findfirst(isequal("A"), nodecolumn) # 1212
julia> nodecolumn[12]"A"
julia> traitmatrix[:,12] # data for species A, as category indices3-element Vector{Int64}: 2 1 2
julia> labs[traitmatrix[:,12]] # same data, as category names3-element Vector{String}: "non-carnivory" "carnivory" "non-carnivory"

example of DNA simulation

Below we simulate 4 sites of a DNA alignment, independent from an HKY model with transition/transversion ratio κ=0.5 and stationary base frequencies of 0.2 for A and T and 0.3 for C and G:

and

julia> m2 = HKY85([.5], [0.20, 0.30, 0.30, 0.20])HKY85 Substitution Model base frequencies: [0.2, 0.3, 0.3, 0.2]
relative rate version with transition/tranversion ratio kappa = 0.5,
 scaled so that there is one substitution per unit time
rate matrix Q:
               A       C       G       T
       A       *  0.4839  0.2419  0.3226
       C  0.3226       *  0.4839  0.1613
       G  0.1613  0.4839       *  0.3226
       T  0.3226  0.2419  0.4839       *
julia> rng = StableRNG(36154); # again, for reproducibility
julia> traitmatrix, nodecolumn = rand(rng, m2, net; ntraits=4);
julia> traitmatrix4×13 Matrix{Int64}: 2 2 2 4 3 4 2 4 4 4 4 2 2 2 3 2 3 3 2 4 4 2 2 1 3 3 1 2 4 1 3 4 3 2 2 4 1 2 4 1 1 1 3 4 4 4 3 3 1 3 1 3
julia> labs = getlabels(m2)4-element Vector{BioSymbols.DNA}: DNA_A DNA_C DNA_G DNA_T

To get the data at the tips only, and in a specific order, we can do this.

julia> taxa = tiplabels(net)6-element Vector{String}:
 "O"
 "A"
 "B"
 "C"
 "D"
 "E"
julia> taxoncol = indexin(taxa, nodecolumn) # column indices to get taxa in order6-element Vector{Union{Nothing, Int64}}: 13 12 11 9 7 6
julia> labs[traitmatrix[:,taxoncol]] # trait at the tips only, ordered as in 'taxa'4×6 Matrix{BioSymbols.DNA}: DNA_C DNA_C DNA_T DNA_T DNA_C DNA_T DNA_G DNA_G DNA_A DNA_C DNA_T DNA_C DNA_T DNA_C DNA_A DNA_C DNA_G DNA_T DNA_G DNA_A DNA_G DNA_G DNA_T DNA_T

This type of DNA data is from the package BioSymbols, for interoperability with packages from BioJulia.

julia> using BioSequences
julia> d = Dict(taxa[i] => LongDNA{4}(labs[traitmatrix[:,taxoncol[i]]]) for i in eachindex(taxa))Dict{String, BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}} with 6 entries: "B" => TAAG "A" => CGCA "C" => TCCG "D" => CTGT "E" => TCTT "O" => CGTG