public documentation

Documentation for PhyloTraits's public (exported) functions. Most functions are internal (not exported).

functions & types

Base.randFunction
rand([rng::AbstractRNG,]
      net::HybridNetwork,
      params::ParamsProcess,
      checkpreorder::Bool=true)

Simulate traits on net using the parameters params. For now, only parameters of type ParamsBM (univariate Brownian Motion) and ParamsMultiBM (multivariate Brownian motion) are accepted.

The simulation using a recursion from the root to the tips of the network, therefore, a pre-ordering of nodes is needed. If checkpreorder=true (default), PhyloNetworks.preorder! is called on the network beforehand. Otherwise, it is assumed that the preordering has already been calculated.

Returns an object of type TraitSimulation, which has a matrix with the trait expectations and simulated trait values at all the nodes.

See examples below for accessing expectations and simulated trait values.

Examples

Univariate

julia> phy = readnewick("(A:2.5,((U:1,#H1:0.5::0.4):1,(C:1,(D:0.5)#H1:0.5::0.6):1):0.5);");

julia> par = ParamsBM(1, 0.1) # BM with expectation 1 and variance 0.1.
ParamsBM:
Parameters of a BM with fixed root:
mu: 1.0
Sigma2: 0.1


julia> sim = rand(phy, par) # simulate along the network
TraitSimulation:
Trait simulation results on a network with 4 tips, using a BM model, with parameters:
mu: 1.0
Sigma2: 0.1

Below, we re-run the same simulation but with our own fixed random number generator for reproducibility.

julia> # using Pkg; Pkg.add("StableRNGs") # to install StableRNGs if not done earlier

julia> using StableRNGs; rng = StableRNG(791); # for reproducibility

julia> sim = rand(rng, phy, par); # re-simulate

julia> traits = sim[:tips] # extract simulated values at the tips.
4-element Vector{Float64}:
 0.5991561486238962
 0.861066346792992
 1.367634062992289
 1.6439310845929571

julia> tiplabels(sim) # name of tips, in the same order as values above
4-element Vector{String}:
 "A"
 "U"
 "C"
 "D"

So, for example, the simulated trait value for taxon U (listed second) is ~0.86. For some purposes, we might want to access the values simulated at internal nodes, or at all nodes at once:

julia> traits = sim[:internalnodes] # extract simulated values at internal nodes. Order: as in sim.M.internalnodenumbers
5-element Vector{Float64}:
 1.0716684901027937
 1.6007823049044083
 1.6756374575490327
 1.2194286026283034
 1.0

julia> traits = sim[:all] # simulated values at all nodes, ordered as in sim.M.nodenumbers_toporder
9-element Vector{Float64}:
 1.0
 1.2194286026283034
 1.6756374575490327
 1.367634062992289
 1.0716684901027937
 1.6007823049044083
 1.6439310845929571
 0.861066346792992
 0.5991561486238962

We might also want to extract the expected mean values (without noise). This is not very interesting under a standard BM model, but may become interesting under more complex models (e.g. with shifts). We can do so with an extra :exp index:

julia> traits = sim[:tips, :exp] # expected values at the tips (also works for sim[:all, :exp] and sim[:internalnodes, :exp]).
4-element Vector{Float64}:
 1.0
 1.0
 1.0
 1.0

Multivariate

julia> phy = readnewick("(A:2.5,((B:1,#H1:0.5::0.4):1,(C:1,(V:0.5)#H1:0.5::0.6):1):0.5);");

julia> par = ParamsMultiBM([1.0, 2.0], [1.0 0.5; 0.5 1.0]) # BM with expectation [1.0, 2.0] and variance [1.0 0.5; 0.5 1.0].
ParamsMultiBM:
Parameters of a MBD with fixed root:
mu: [1.0, 2.0]
Sigma: [1.0 0.5; 0.5 1.0]

julia> using StableRNGs; rng = StableRNG(9851); # for reproducibility

julia> sim = rand(rng, phy, par) # simulate on the phylogeny
TraitSimulation:
Trait simulation results on a network with 4 tips, using a MBD model, with parameters:
mu: [1.0, 2.0]
Sigma: [1.0 0.5; 0.5 1.0]


julia> traits = sim[:tips] # extract simulated values at the tips (each column contains the simulated traits for one node).
2×4 Matrix{Float64}:
 3.8013    0.839485  0.346092  1.91131
 5.91725  -0.59143   0.458569  0.629048

The 2 rows to the 2 correlated traits and the columns correspond to the 4 taxa (tips). The order in which taxa are listed is obtained with tiplabels:

julia> tiplabels(sim)
4-element Vector{String}:
 "A"
 "B"
 "C"
 "V"

As in the univariate case, we can also extract the values simulated at internal nodes, or all nodes (but listed in preorder), or expected mean values.

julia> sim[:internalnodes] # simulated values at internal nodes. order: same as in sim.M.internalnodenumbers
2×5 Matrix{Float64}:
 -0.604224  0.755722  2.14755  0.484292  1.0
 -0.338922  0.373921  1.15174  0.695561  2.0

julia> traits = sim[:all]; # 2×9 Matrix: values at all nodes, ordered as in sim.M.nodenumbers_toporder

julia> sim[:tips, :exp] # expected values (also works for sim[:all, :exp] and sim[:internalnodes, :exp])
2×4 Matrix{Float64}:
 1.0  1.0  1.0  1.0
 2.0  2.0  2.0  2.0
source
Base.randMethod
rand([rng::AbstractRNG,]
      model::TraitSubstitutionModel,
      t::Float64,
      start::AbstractVector{Int})

Simulate discrete traits along one edge of length t. A random number generator rng is optional. start must be a vector of integers, each representing the starting value of one trait.

examples

julia> m1 = BinaryTraitSubstitutionModel(1.0, 2.0)
Binary Trait Substitution Model:
rate 0→1 α=1.0
rate 1→0 β=2.0

julia> using StableRNGs; rng = StableRNG(135);

julia> rand(rng, m1, 0.2, [1,2,1,2,1])
5-element Vector{Int64}:
 2
 2
 1
 2
 1
source
Base.randMethod
rand([rng::AbstractRNG,]
      model::TraitSubstitutionModel,
      net::HybridNetwork;
      ntraits=1,
      keepinternal=true,
      checkpreorder=true)

Simulate evolution of discrete traits on a rooted evolutionary network based on the supplied evolutionary model. Trait sampling is uniform at the root.

optional arguments:

  • ntraits: number of traits to be simulated (default: 1 trait).
  • keepinternal: if true, export character states at all nodes, including internal nodes. if false, export character states at tips only.

output:

  • matrix of character states with one row per trait, one column per node; these states are indices in model.label, not the trait labels themselves.
  • vector of node labels (for tips) or node numbers (for internal nodes) in the same order as columns in the character state matrix

examples

julia> m1 = BinaryTraitSubstitutionModel(1.0, 2.0, ["low","high"]);

julia> net = readnewick("(((A:4.0,(B:1.0)#H1:1.1::0.9):0.5,(C:0.6,#H1:1.0::0.1):1.0):3.0,D:5.0);");

julia> using Random; Random.seed!(95);

julia> trait, lab = rand(m1, net)
([1 2 … 1 1], ["-2", "D", "-3", "-6", "C", "-4", "H1", "B", "A"])

julia> trait
1×9 Matrix{Int64}:
 1  2  1  1  2  2  1  1  1

julia> lab
9-element Vector{String}:
 "-2" 
 "D"  
 "-3" 
 "-6" 
 "C"  
 "-4" 
 "H1"
 "B"  
 "A"  
source
PhyloTraits.QMethod

For a BinaryTraitSubstitutionModel, the rate matrix Q is of the form:

-α  α
 β -β
source
PhyloTraits.QMethod
Q(model)

Substitution rate matrix for a given substitution model: Q[i,j] is the rate of transitioning from state i to state j.

source
PhyloTraits.ancestralreconstructionFunction
ancestralreconstruction(obj::SSM, trait::Integer = 1)

Estimate the marginal probability of ancestral states for discrete character number trait (first trait by default). The parameters of the StatisticalSubstitutionModel object obj must first be fitted using fitdiscrete, and ancestral state reconstruction is conditional on the estimated parameters. If these parameters were estimated using all traits, they are used as is, to do ancestral state reconstruction of the particular trait of interest.

output: data frame with a first column for the node numbers, a second column for the node labels, and a column for each possible state: the entries in these columns give the marginal probability that a given node has a given state.

warnings:

  • node numbers and node labels refer to those in obj.net, which might have a different internal representation of nodes than the original network used to build obj.
  • obj is modified: its likelihood fields (forward, directional & backward) are updated to make sure that they correspond to the current parameter values in obj.model, and to the trait of interest.

limitations: the following are not checked.

  • Assumes that every node in the large network is also present (with descendant leaves) in each displayed tree. This is not true if the network is not tree-child...
  • Assumes that the root is also in each displayed tree, which may not be the case if the root had a hybrid child edge.

See also posterior_logtreeweight and discrete_backwardlikelihood_trait! to update obj.backwardlik.

examples

julia> net = readnewick("(((A:2.0,(B:1.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0):0.5,D:2.0);");

julia> m1 = BinaryTraitSubstitutionModel([0.1, 0.1], ["lo", "hi"]);

julia> using DataFrames

julia> dat = DataFrame(species=["C","A","B","D"], trait=["hi","lo","lo","hi"]);

julia> fit1 = fitdiscrete(net, m1, dat);

julia> asr = ancestralreconstruction(fit1)
9×4 DataFrame
 Row │ nodenumber  nodelabel  lo        hi
     │ Int64       String     Float64   Float64
─────┼───────────────────────────────────────────
   1 │          1  A          1.0       0.0
   2 │          2  B          1.0       0.0
   3 │          3  C          0.0       1.0
   4 │          4  D          0.0       1.0
   5 │          5  5          0.286021  0.713979
   6 │          6  6          0.319456  0.680544
   7 │          7  7          0.16855   0.83145
   8 │          8  8          0.767359  0.232641
   9 │          9  H1         0.782776  0.217224

julia> using PhyloPlots

julia> plot(fit1.net, nodelabel = asr[!,[:nodenumber, :lo]], tipoffset=0.2); # pp for "lo" state
source
PhyloTraits.ancestralreconstructionMethod
ancestralreconstruction(fr::AbstractDataFrame, net::HybridNetwork; kwargs...)

Estimate the ancestral traits on a network, given some data at the tips. Uses function phylolm to perform a phylogenetic regression of the data against an intercept (amounts to fitting an evolutionary model on the network).

See documentation on phylolm and ancestralreconstruction(obj::PhyloNetworkLinearModel[, X_n::Matrix]) for further details.

Returns an object of type ReconstructedStates.

source
PhyloTraits.ancestralreconstructionMethod
ancestralreconstruction(obj::PhyloNetworkLinearModel[, X_n::Matrix])

Function to find the ancestral traits reconstruction on a network, given an object fitted by function phylolm. By default, the function assumes that the regressor is just an intercept. If the value of the regressor for all the ancestral states is known, it can be entered in X_n, a matrix with as many columns as the number of predictors used, and as many lines as the number of unknown nodes or tips.

Returns an object of type ReconstructedStates.

Examples

julia> using DataFrames, CSV # to read data file

julia> phy = readnewick(joinpath(dirname(pathof(PhyloTraits)), "..", "examples", "carnivores_tree.txt"));

julia> dat = CSV.read(joinpath(dirname(pathof(PhyloTraits)), "..", "examples", "carnivores_trait.txt"), DataFrame);

julia> using StatsModels # for statistical model formulas

julia> fitBM = phylolm(@formula(trait ~ 1), dat, phy);

julia> ancStates = ancestralreconstruction(fitBM) # Should produce a warning, as variance is unknown.
┌ Warning: These prediction intervals show uncertainty in ancestral values,
│ assuming that the estimated variance rate of evolution is correct.
│ Additional uncertainty in the estimation of this variance rate is
│ ignored, so prediction intervals should be larger.
└ @ PhyloTraits ~/build/JuliaPhylo/PhyloTraits.jl/src/traits_continuous.jl:2601
ReconstructedStates:
───────────────────────────────────────────────
  Node index      Pred.        Min.  Max. (95%)
───────────────────────────────────────────────
        -5.0   1.32139   -0.33824      2.98102
        -8.0   1.03258   -0.589695     2.65485
        -7.0   1.41575   -0.140705     2.97221
        -6.0   1.39417   -0.107433     2.89577
        -4.0   1.39961   -0.102501     2.90171
        -3.0   1.51341   -0.220523     3.24733
       -13.0   5.3192     3.92279      6.71561
       -12.0   4.51176    2.89222      6.13131
       -16.0   1.50947   -0.0186118    3.03755
       -15.0   1.67425    0.196069     3.15242
       -14.0   1.80309    0.309992     3.29618
       -11.0   2.7351     1.17608      4.29412
       -10.0   2.73217    1.12361      4.34073
        -9.0   2.41132    0.603932     4.21871
        -2.0   2.04138   -0.0340955    4.11686
        14.0   1.64289    1.64289      1.64289
         8.0   1.67724    1.67724      1.67724
         5.0   0.331568   0.331568     0.331568
         2.0   2.27395    2.27395      2.27395
         4.0   0.275237   0.275237     0.275237
         6.0   3.39094    3.39094      3.39094
        13.0   0.355799   0.355799     0.355799
        15.0   0.542565   0.542565     0.542565
         7.0   0.773436   0.773436     0.773436
        10.0   6.94985    6.94985      6.94985
        11.0   4.78323    4.78323      4.78323
        12.0   5.33016    5.33016      5.33016
         1.0  -0.122604  -0.122604    -0.122604
        16.0   0.73989    0.73989      0.73989
         9.0   4.84236    4.84236      4.84236
         3.0   1.0695     1.0695       1.0695
───────────────────────────────────────────────

julia> using StatsBase # for predict function

julia> predict(ancStates)
31×2 DataFrame
 Row │ nodenumber  prediction 
     │ Int64       Float64    
─────┼────────────────────────
   1 │         -5    1.32139
   2 │         -8    1.03258
   3 │         -7    1.41575
   4 │         -6    1.39417
   5 │         -4    1.39961
   6 │         -3    1.51341
   7 │        -13    5.3192
   8 │        -12    4.51176
  ⋮  │     ⋮           ⋮
  25 │         10    6.94985
  26 │         11    4.78323
  27 │         12    5.33016
  28 │          1   -0.122604
  29 │         16    0.73989
  30 │          9    4.84236
  31 │          3    1.0695
               16 rows omitted

julia> predict(ancStates, interval = :prediction)
31×4 DataFrame
 Row │ nodenumber  prediction  lower       upper     
     │ Int64       Float64     Float64     Float64   
─────┼───────────────────────────────────────────────
   1 │         -5    1.32139   -0.33824     2.98102
   2 │         -8    1.03258   -0.589695    2.65485
   3 │         -7    1.41575   -0.140705    2.97221
   4 │         -6    1.39417   -0.107433    2.89577
   5 │         -4    1.39961   -0.102501    2.90171
   6 │         -3    1.51341   -0.220523    3.24733
   7 │        -13    5.3192     3.92279     6.71561
   8 │        -12    4.51176    2.89222     6.13131
  ⋮  │     ⋮           ⋮           ⋮           ⋮
  25 │         10    6.94985    6.94985     6.94985
  26 │         11    4.78323    4.78323     4.78323
  27 │         12    5.33016    5.33016     5.33016
  28 │          1   -0.122604  -0.122604   -0.122604
  29 │         16    0.73989    0.73989     0.73989
  30 │          9    4.84236    4.84236     4.84236
  31 │          3    1.0695     1.0695      1.0695
                                      16 rows omitted

julia> using PhyloPlots # next: plot ancestral states on the tree

julia> plot(phy, nodelabel = predict(ancStates));

julia> pred = predict(ancStates, interval = :prediction, text = true);

julia> plot(phy, nodelabel = pred[!,[:nodenumber,:interval]]);

julia> allowmissing!(dat, :trait);

julia> dat[[2, 5], :trait] .= missing; # missing values allowed to fit model

julia> fitBM = phylolm(@formula(trait ~ 1), dat, phy);

julia> ancStates = ancestralreconstruction(fitBM);
┌ Warning: These prediction intervals show uncertainty in ancestral values,
│ assuming that the estimated variance rate of evolution is correct.
│ Additional uncertainty in the estimation of this variance rate is
│ ignored, so prediction intervals should be larger.
└ @ PhyloTraits ~/build/JuliaPhylo/PhyloTraits.jl/src/traits_continuous.jl:2601

julia> first(predict(ancStates), 3) # looking at first 3 nodes only
3×2 DataFrame
 Row │ nodenumber  prediction 
     │ Int64       Float64    
─────┼────────────────────────
   1 │         -5     1.42724
   2 │         -8     1.35185
   3 │         -7     1.61993

julia> first(predict(ancStates, interval=:prediction), 3)
3×4 DataFrame
 Row │ nodenumber  prediction  lower      upper   
     │ Int64       Float64     Float64    Float64 
─────┼────────────────────────────────────────────
   1 │         -5     1.42724  -0.373749  3.22824
   2 │         -8     1.35185  -0.698432  3.40214
   3 │         -7     1.61993  -0.17179   3.41165

julia> plot(phy, nodelabel = predict(ancStates, text=true));

julia> pred = predict(ancStates, interval = :prediction, text = true);

julia> plot(phy, nodelabel = pred[!,[:nodenumber,:interval]]);
source
PhyloTraits.ancestralreconstructionMethod
ancestralreconstruction(net::HybridNetwork, Y::Vector, params::ParamsBM)

Compute the conditional expectations and variances of the ancestral (un-observed) traits values at the internal nodes of the phylogenetic network (net), given the values of the traits at the tips of the network (Y) and some known parameters of the process used for trait evolution (params, only BM with fixed root works for now).

This function assumes that the parameters of the process are known. For a more general function, see ancestralreconstruction(obj::PhyloNetworkLinearModel[, X_n::Matrix]).

source
PhyloTraits.descendencedataframeMethod
descendencedataframe(net::HybridNetwork, which=:allhybrids; checkpreorder=true)
descendencedataframe(net::HybridNetwork, node::Vector{Node}; checkpreorder=true)
descendencedataframe(net::HybridNetwork, edge::Vector{Edge}; checkpreorder=true)

Data frame containing the genomic proportion inherited by each taxon in net, from each hybrid node by default in the first method, or from each node or edge in the input vector in the second and third methods. The data frame has 1 row per tip (taxon) in the network and the following columns:

  • 1 column per edge or node, with columns named according to the pattern shift{edgenumber}" where edge_number is the number of the edge associated with an input edge or node (in which case the parent edge is used)
  • 1 additional column labeled tipnames, containing the tip labels.

The shift_* columns in this data frame can be used as regressor vectors associated with shifts on input edges or on edges that are above input nodes. With option which=:allhybrids in last method, each shift column is associated with a hybrid node in net, to model a shift on the edge that is immediately below the hybrid node. This can be used to test for transgressive evolution: see examples below.

These methods use PhyloNetworks.descendencematrix, so net might be modified to store a vector of its nodes sorted in a pre-order.

Examples

julia> net = readnewick("(A:2.5,((B:1,#H1:0.5::0.4):1,(C:1,(D:0.5)#H1:0.5::0.6):1):0.5);");

julia> preorder!(net)

julia> using PhyloPlots

julia> plot(net, shownodenumber=true); # to locate nodes

julia> nodes_shifts = indexin([1,-5], [n.number for n in net.node]) # Put a shift on edges ending at nodes 1 and -5
2-element Vector{Union{Nothing, Int64}}:
 1
 7

julia> params = ParamsBM(10, 0.1, ShiftNet(net.node[nodes_shifts], [3.0, -3.0],  net))
ParamsBM:
Parameters of a BM with fixed root:
mu: 10.0
Sigma2: 0.1

There are 2 shifts on the network:
──────────────────────────
  Edge Number  Shift Value
──────────────────────────
          8.0         -3.0
          1.0          3.0
──────────────────────────

julia> using Random; Random.seed!(2468); # sets the seed for reproducibility

julia> sim = rand(net, params); # simulate a dataset with shifts

julia> using DataFrames # to handle data frames

julia> dat = DataFrame(trait = sim[:tips], tipnames = sim.M.tipnames);

julia> dat = DataFrame(trait = [13.391976856737717, 9.55741491696386, 7.17703734817448, 7.889062527849697],
        tipnames = ["A","B","C","D"]) # hard-coded, to be independent of random number generator
4×2 DataFrame
 Row │ trait     tipnames 
     │ Float64   String   
─────┼────────────────────
   1 │ 13.392    A
   2 │  9.55741  B
   3 │  7.17704  C
   4 │  7.88906  D

julia> dfr_shift = descendencedataframe(net, net.node[nodes_shifts]) # the regressors matching the shifts.
4×3 DataFrame
 Row │ shift_1  shift_8  tipnames 
     │ Float64  Float64  String   
─────┼────────────────────────────
   1 │     1.0      0.0  A
   2 │     0.0      0.0  B
   3 │     0.0      1.0  C
   4 │     0.0      0.6  D

julia> dfr = innerjoin(dat, dfr_shift, on=:tipnames); # join data and regressors in a single dataframe

julia> using StatsModels # for statistical model formulas

julia> fitBM = phylolm(@formula(trait ~ shift_1 + shift_8), dfr, net; reml=false) # actual fit
PhyloNetworkLinearModel

Formula: trait ~ 1 + shift_1 + shift_8

Model: Brownian motion

Parameter Estimates, using ML:
phylogenetic variance rate: 0.0112618

Coefficients:
────────────────────────────────────────────────────────────────────────
                Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────
(Intercept)   9.48238    0.327089  28.99    0.0220    5.32632   13.6384
shift_1       3.9096     0.46862    8.34    0.0759   -2.04479    9.86399
shift_8      -2.4179     0.422825  -5.72    0.1102   -7.7904     2.95461
────────────────────────────────────────────────────────────────────────
Log Likelihood: 1.8937302027
AIC: 4.2125395947

Next we illustrate the model with heterosis, aka transgressive evolution: with a shift in the trait after successful hybridization. First how to simulated according to this model:

julia> nodes_hybrids = indexin([5], [n.number for n in net.node]); # Put a shift on edges below hybrids

julia> params = ParamsBM(10, 0.1, ShiftNet(net.node[nodes_hybrids], [3.0],  net));

julia> using Random; Random.seed!(2468); # sets the seed for reproducibility

julia> sim = rand(net, params); # simulate a dataset with shifts

julia> dat = DataFrame(trait = sim[:tips], tipnames = sim.M.tipnames);

and next how to analyze data under a transgressive evolution model. Below we hard-code data values for more reproducibility.

julia> dat = DataFrame(trait = [10.391976856737717, 9.55741491696386, 10.17703734817448, 12.689062527849698],
          tipnames = ["A","B","C","D"])
4×2 DataFrame
 Row │ trait     tipnames 
     │ Float64   String   
─────┼────────────────────
   1 │ 10.392    A
   2 │  9.55741  B
   3 │ 10.177    C
   4 │ 12.6891   D

julia> dfr_hybrid = descendencedataframe(net) # the regressors matching the hybrids (all of them)
4×3 DataFrame
 Row │ shift_6  tipnames  sum     
     │ Float64  String    Float64 
─────┼────────────────────────────
   1 │     0.0  A             0.0
   2 │     0.0  B             0.0
   3 │     0.0  C             0.0
   4 │     1.0  D             1.0

julia> dfr = innerjoin(dat, dfr_hybrid, on=:tipnames); # join data and regressors in a single dataframe

julia> using StatsModels

julia> fitBM = phylolm(@formula(trait ~ shift_6), dfr, net; reml=false) # actual fit
PhyloNetworkLinearModel

Formula: trait ~ 1 + shift_6

Model: Brownian motion

Parameter Estimates, using ML:
phylogenetic variance rate: 0.041206

Coefficients:
────────────────────────────────────────────────────────────────────────
                Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────
(Intercept)  10.064      0.277959  36.21    0.0008    8.86805   11.26
shift_6       2.72526    0.315456   8.64    0.0131    1.36796    4.08256
────────────────────────────────────────────────────────────────────────
Log Likelihood: -0.7006021946
AIC: 7.4012043891

See also

phylolm, PhyloNetworks.descendencematrix.

source
PhyloTraits.empiricalDNAfrequenciesFunction
empiricalDNAfrequencies(DNAdata::AbstractDataFrame, DNAweights,
                        correction=true, useambiguous=true)

Estimate base frequencies in DNA data DNAdata, ordered ACGT.

  • DNAdata: data frame. All columns are used. If the first column gives species names, find a way to ignore it before calculating empirical frequencies, e.g. empiricalDNAfrequencies(view(DNAdata, :, 2:size(DNAdata, 2))). Data type must be BioSymbols.DNA or Char or String. WARNING: this is checked on the first column only.
  • DNAweights: vector of weights, to weigh each column in DNAdata.
  • correction: if true, add 1 to each count and 4 to the denominator for a more stable estimator, similar to Bayes prior of 1/4 and the Agresti-Coull interval in binomial estimation.
  • useambiguous: if true, ambiguous bases are used (except gaps and Ns). For example, Y adds 0.5 weight to C and 0.5 weight to T.
source
PhyloTraits.fitdiscreteMethod
fitdiscrete(net, model, tipdata)
fitdiscrete(net, model, RateVariationAcrossSites, tipdata)
fitdiscrete(net, model, species, traits)
fitdiscrete(net, model, RateVariationAcrossSites, species, traits)
fitdiscrete(net, model, dnadata, dnapatternweights)
fitdiscrete(net, model, RateVariationAcrossSites, dnadata, dnapatternweights)
fitdiscrete(net, modSymbol, species, traits)
fitdiscrete(net, modSymbol, dnadata, dnapatternweights)

Calculate the maximum likelihood (ML) score of a network or tree given one or more discrete characters at the tips. Along each edge, transitions are modelled with a continous time Markov model, whose parameters are estimated (by maximizing the likelihood). At each hybrid node, the trait is assumed to be inherited from either of the two immediate parents according to the parents' average genetic contributions (inheritance γ). The model ignores incomplete lineage sorting. The algorithm extracts all trees displayed in the network.

Data can given in one of the following:

  • tipdata: dictionary taxon => state label, for a single trait.
  • tipdata: data frame for a single trait, in which case the taxon names are to appear in column 1 or in a column named "taxon" or "species", and trait labels are to appear in column 2 or in a column named "trait". Here, trait labels should be as they appear in getlabels(model).
  • species: vector of strings, and traits: DataFrame of traits, with rows in the order corresponding to the order of species names. Again, trait labels should be as they appear in getlabels(model). All traits are assumed to follow the same model, with same parameters.
  • dnadata: the first part of the output of readfastatodna, a dataframe of BioSequence DNA sequences, with taxon in column 1 and a column for each site.
  • dnapatternweights: the second part of the output of readfastatodna, an array of weights, one weights for each of the site columns. The length of the weight is equal to nsites. If using dnapatternweights, must provide dnadata.
  • RateVariationAcrossSites: model for rate variation (optional)

Optional arguments (default):

  • optimizeQ (true): should model rate parameters be fixed, or should they be optimized?
  • optimizeRVAS (true): should the model optimize the parameters for the variability of rates across sites (α and/or p_invariable)?
  • NLoptMethod (:LN_COBYLA, derivative-free) for the optimization algorithm. For other options, see the NLopt.
  • tolerance values to control when the optimization is stopped: ftolRel (1e-12), ftolAbs (1e-10) on the likelihood, and xtolRel (1e-10), xtolAbs (1e-10) on the model parameters.
  • bounds for the alpha parameter of the Gamma distribution of rates across sites: alphamin=0.05, alphamax=50.
  • verbose (false): if true, more information is output.
  • suppresswarnings (false): if true, warnings from check_matchtaxonnames! are suppressed.

examples:

julia> net = readnewick("(((A:2.0,(B:1.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0):0.5,D:2.0);");

julia> m1 = BinaryTraitSubstitutionModel([0.1, 0.1], ["lo", "hi"]);

julia> using DataFrames

julia> dat = DataFrame(species=["C","A","B","D"], trait=["hi","lo","lo","hi"]);

julia> fit1 = fitdiscrete(net, m1, dat)
PhyloTraits.StatisticalSubstitutionModel:
Binary Trait Substitution Model:
  rate lo→hi α=0.27222
  rate hi→lo β=0.34981
on a network with 1 reticulations
data:
  4 species
  1 trait
log-likelihood: -2.7277

julia> tips = Dict("A" => "lo", "B" => "lo", "C" => "hi", "D" => "hi");

julia> fit2 = fitdiscrete(net, m1, tips; xtolRel=1e-16, xtolAbs=1e-16, ftolRel=1e-16)
PhyloTraits.StatisticalSubstitutionModel:
Binary Trait Substitution Model:
  rate lo→hi α=0.27222
  rate hi→lo β=0.34981
on a network with 1 reticulations
data:
  4 species
  1 trait
log-likelihood: -2.7277

Note that a copy of the network is stored in the fitted object, but the internal representation of the network may be different in fit1.net and in the original network net:

julia> net = readnewick("(sp1:3.0,(sp2:2.0,(sp3:1.0,sp4:1.0):1.0):1.0);");

julia> using BioSymbols

julia> tips = Dict("sp1" => BioSymbols.DNA_A, "sp2" => BioSymbols.DNA_A, "sp3" => BioSymbols.DNA_G, "sp4" => BioSymbols.DNA_G);

julia> mJC69 = JC69([0.25], false);

julia> fitJC69 = fitdiscrete(net, mJC69, tips)
PhyloTraits.StatisticalSubstitutionModel:
Jukes and Cantor 69 Substitution Model,
  absolute rate version
  off-diagonal rates equal to 0.29233/3.
  rate matrix Q:
                 A       C       G       T
         A       *  0.0974  0.0974  0.0974
         C  0.0974       *  0.0974  0.0974
         G  0.0974  0.0974       *  0.0974
         T  0.0974  0.0974  0.0974       *
on a network with 0 reticulations
data:
  4 species
  1 trait
log-likelihood: -4.99274

julia> rv = RateVariationAcrossSites(alpha=1.0, ncat=4)
Rate variation across sites: discretized Gamma
alpha: 1.0
categories for Gamma discretization: 4
rates: [0.146, 0.513, 1.071, 2.27]

julia> fitdiscrete(net, mJC69, rv, tips; optimizeQ=false, optimizeRVAS=false)
PhyloTraits.StatisticalSubstitutionModel:
Jukes and Cantor 69 Substitution Model,
  absolute rate version
  off-diagonal rates equal to 0.25/3.
  rate matrix Q:
                 A       C       G       T
         A       *  0.0833  0.0833  0.0833
         C  0.0833       *  0.0833  0.0833
         G  0.0833  0.0833       *  0.0833
         T  0.0833  0.0833  0.0833       *
Rate variation across sites: discretized Gamma
  alpha: 1.0
  categories for Gamma discretization: 4
  rates: [0.146, 0.513, 1.071, 2.27]
on a network with 0 reticulations
data:
  4 species
  1 trait
log-likelihood: -5.2568

fixit: add option to allow users to specify root prior, using either equal frequencies or stationary frequencies for trait models.

source
PhyloTraits.getlabelsMethod

for a given NucleicAcidSubstitutionModel, labels are symbols from BioSymbols. For now, only ACGTs are allowed. (When fitting data, any ambiguity code in the data would be treated as missing value).

examples

julia> getlabels(JC69([0.03], false))
4-element Vector{BioSymbols.DNA}:
 DNA_A
 DNA_C
 DNA_G
 DNA_T

julia> getlabels(HKY85([.5], repeat([0.25], 4)))
4-element Vector{BioSymbols.DNA}:
 DNA_A
 DNA_C
 DNA_G
 DNA_T
source
PhyloTraits.getshiftedgenumberMethod
getshiftedgenumber(shift::ShiftNet)

Get the edge numbers where the shifts are located, for an object ShiftNet. If a shift is placed at the root node with no parent edge, the edge number of a shift is set to -1 (as if missing).

source
PhyloTraits.nparamsMethod
nparams(model)

Number of parameters for a given trait evolution model (length of field model.rate).

source
PhyloTraits.nstatesMethod

for a BinaryTraitSubstitutionModel, this is 2:

julia> m1 = BinaryTraitSubstitutionModel([1.0,2.0], ["low","high"])
Binary Trait Substitution Model:
rate low→high α=1.0
rate high→low β=2.0

julia> nstates(m1)
2
source
PhyloTraits.nstatesMethod

For example, this is 4 for a NucleicAcidSubstitutionModel.

julia> nstates(JC69([0.03], false))
4

julia> nstates(HKY85([.5], [0.25, 0.25, 0.25, 0.25]))
4
source
PhyloTraits.phylolmMethod
phylolm(f::StatsModels.FormulaTerm, fr::AbstractDataFrame, net::HybridNetwork; kwargs...)

Fit a phylogenetic linear regression model to data. Return an object of type PhyloNetworkLinearModel. It contains a linear model from the GLM package, in object.lm, of type GLM.LinearModel.

Arguments

  • f: formula to use for the regression, see StatsModels
  • fr: DataFrame containing the response values, predictor values, species/tip labels for each observation/row.
  • net: phylogenetic network to use. Should have labelled tips.

Keyword arguments

  • model="BM": model for trait evolution (as a string) "lambda" (Pagel's lambda), "scalinghybrid" are other possible values (see ContinuousTraitEM)
  • tipnames=:tipnames: column name for species/tip-labels, represented as a symbol. For example, if the column containing the species/tip labels in fr is named "Species", then do tipnames=:Species.
  • no_names=false: If true, force the function to ignore the tips names. The data is then assumed to be in the same order as the tips of the network. Default is false, setting it to true is dangerous, and strongly discouraged.
  • reml=true: if true, use REML criterion ("restricted maximum likelihood") for estimating variance components, else use ML criterion.
  • suppresswarnings=false: if true, PhyloTraits-specific warnings are suppressed. Currently, only Pagel's lambda model may throw a warning (if the network is not time-consistent).

The following tolerance parameters control the optimization of lambda if model="lambda" or model="scalinghybrid", and control the optimization of the variance components if model="BM" and withinspecies_var=true.

  • fTolRel=1e-10: relative tolerance on the likelihood value

  • fTolAbs=1e-10: absolute tolerance on the likelihood value

  • xTolRel=1e-10: relative tolerance on the parameter value

  • xTolAbs=1e-10: absolute tolerance on the parameter value

  • startingValue=0.5: If model="lambda" or "scalinghybrid", this provides the starting value for the optimization in lambda.

  • fixedValue=missing: If model="lambda" or "scalinghybrid", and fixedValue is a number, then lambda is set to this number and is not optimized.

  • withinspecies_var=false: If true, fits a within-species variation model. Currently only implemented for model="BM".

  • y_mean_std::Bool=false: If true, and withinspecies_var=true, then accounts for within-species variation, using species-level statistics provided in fr.

Methods applied to fitted models

To access the response values, do response(object). To access the model matrix, do modelmatrix(object). To access the model formula, do formula(object).

Within-species variation

For a high-level description, see PhyloNetworkLinearModel. To fit a model with within-species variation in the response variable, either of the following must be provided in the data frame fr:

(1) Individual-level data: There should be columns for response, predictors, and species/tip-labels. Every row should correspond to an individual observation. At least one species must be represented by two or more individuals.

(2) Species-level statistics: There should be columns for mean response, predictors, species/tip-labels, species sample-sizes (number of individuals for each species), and species standard deviations (standard deviations of the response values by species). Every row should correspond to a species: each species should be represented by a unique row. The column names for species sample-sizes and species standard deviations are expected to be "[response column name]_n" and "[response column name]_sd". For example, if the response column name is "y", then the column names should be "y_n" and "y_sd" for the sample-sizes and standard deviations.

Regardless of whether the data provided follows (1) or (2), withinspecies_var should be set to true. If the data provided follows (2), then y_mean_std should be set to false.

Within-species variation in predictors

The model assumes no within-species variation in predictors, because it aims to capture the evolutionary (historical, phylogenetic) relationship between the predictors and the response, not the within-species (present-day, or phenotypic) relationship.

If a within-species variation model is fitted on individual-level data, and if there are individuals within the same species with different values for the same predictor, these values are all replaced by the mean predictor value for all the individuals in that species. For example, suppose there are 3 individuals in a given species, and that their predictor values are (x₁=3, x₂=6), (x₁=4, x₂=8) and (x₁=2, x₂=1). Then the predictor values for these 3 individuals are each replaced by (x₁=(3+4+2)/3, x₂=(6+8+1)/3) before model fitting. If a fourth individual had data (x₁=10, x₂=missing), then that individual would be ignored for any model using x₂, and would not contribute any information to its species data for these models.

Missing data

Rows with missing data for either the response or the predictors are omitted from the model-fitting. There should minimally be columns for response, predictors, species/tip-labels. As detailed above, additional columns may be required for fitting within-species variation. Missing data in the columns for species names, species standard deviation / sample sizes (if used) will throw an error.

See also

rand, ancestralreconstruction, PhyloNetworks.vcv

Examples: Without within-species variation

We first load data from the package and fit the default BM model.

julia> phy = readnewick(joinpath(dirname(pathof(PhyloTraits)), "..", "examples", "caudata_tree.txt"));

julia> using DataFrames, CSV # to read data file, next

julia> dat = CSV.read(joinpath(dirname(pathof(PhyloTraits)), "..", "examples", "caudata_trait.txt"), DataFrame);

julia> using StatsModels # for stat model formulas

julia> fitBM = phylolm(@formula(trait ~ 1), dat, phy; reml=false);

julia> fitBM # Shows a summary
PhyloNetworkLinearModel

Formula: trait ~ 1

Model: Brownian motion

Parameter Estimates, using ML:
phylogenetic variance rate: 0.00294521

Coefficients:
─────────────────────────────────────────────────────────────────────
             Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────────
(Intercept)  4.679    0.330627  14.15    <1e-31    4.02696    5.33104
─────────────────────────────────────────────────────────────────────
Log Likelihood: -78.9611507833
AIC: 161.9223015666

We can extra parameters, likelihood, AIC etc.

julia> round(sigma2_phylo(fitBM), digits=6) # rounding for jldoctest convenience
0.002945

julia> round(mu_phylo(fitBM), digits=4)
4.679

julia> using StatsBase # for aic() stderror() loglikelihood() etc.

julia> round(loglikelihood(fitBM), digits=10)
-78.9611507833

julia> round(aic(fitBM), digits=10)
161.9223015666

julia> round(aicc(fitBM), digits=10)
161.9841572367

julia> round(bic(fitBM), digits=10)
168.4887090241

julia> round.(coef(fitBM), digits=4)
1-element Vector{Float64}:
 4.679

julia> confint(fitBM) # 95% (default) confidence interval for the coefficient(s)
1×2 Matrix{Float64}:
 4.02696  5.33104

julia> abs(round(r2(fitBM), digits=10)) # absolute value for jldoctest convenience
0.0

julia> abs(round(adjr2(fitBM), digits=10))
0.0

julia> round.(vcov(fitBM), digits=6) # variance-covariance of estimated parameters: squared standard error
1×1 Matrix{Float64}:
 0.109314

The residuals are the variance not explained by predictors. The phylogenetic correlation modelled by the BM is about them. The trait may have 2 sources of phylogenetic signal: from the predictor with which it the response may be associated, and from the residuals.

julia> round.(residuals(fitBM), digits=6)
197-element Vector{Float64}:
 -0.237648
 -0.357937
 -0.159387
 -0.691868
 -0.323977
 -0.270452
 -0.673486
 -0.584654
 -0.279882
 -0.302175
  ⋮
 -0.777026
 -0.385121
 -0.443444
 -0.327303
 -0.525953
 -0.673486
 -0.603158
 -0.211712
 -0.439833

julia> round.(response(fitBM), digits=5)
197-element Vector{Float64}:
 4.44135
 4.32106
 4.51961
 3.98713
 4.35502
 4.40855
 4.00551
 4.09434
 4.39912
 4.37682
 ⋮
 3.90197
 4.29388
 4.23555
 4.3517
 4.15305
 4.00551
 4.07584
 4.46729
 4.23917

julia> round.(predict(fitBM), digits=5)
197-element Vector{Float64}:
 4.679
 4.679
 4.679
 4.679
 4.679
 4.679
 4.679
 4.679
 4.679
 4.679
 ⋮
 4.679
 4.679
 4.679
 4.679
 4.679
 4.679
 4.679
 4.679
 4.679

Examples: With within-species variation (two different input formats shown)

We use a smaller network here. We can input data as 1 row per individual, multiple rows per species:

julia> net = readnewick("((((D:0.4,C:0.4):4.8,((A:0.8,B:0.8):2.2)#H1:2.2::0.7):4.0,(#H1:0::0.3,E:3.0):6.2):2.0,O:11.2);");

julia> df = DataFrame( # individual-level observations
           species = repeat(["D","C","A","B","E","O"],inner=3),
           trait1 = [4.08298,4.08298,4.08298,3.10782,3.10782,3.10782,2.17078,2.17078,2.17078,1.87333,1.87333,
              1.87333,2.8445,2.8445,2.8445,5.88204,5.88204,5.88204],
           trait2 = [-7.34186,-7.34186,-7.34186,-7.45085,-7.45085,-7.45085,-3.32538,-3.32538,-3.32538,-4.26472,
              -4.26472,-4.26472,-5.96857,-5.96857,-5.96857,-1.99388,-1.99388,-1.99388],
           trait3 = [18.8101,18.934,18.9438,17.0687,17.0639,17.0732,14.4818,14.1112,14.2817,13.0842,12.9562,
              12.9019,15.4373,15.4075,15.4317,24.2249,24.1449,24.1302]);

julia> m1 = phylolm(@formula(trait3 ~ trait1), df, net;
                    tipnames=:species, withinspecies_var=true)
PhyloNetworkLinearModel

Formula: trait3 ~ 1 + trait1

Model: Brownian motion

Parameter Estimates, using REML:
phylogenetic variance rate: 0.156188
within-species variance: 0.0086343

Coefficients:
──────────────────────────────────────────────────────────────────────
               Coef.  Std. Error     t  Pr(>|t|)  Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────
(Intercept)  9.65347    1.3066    7.39    0.0018    6.02577   13.2812
trait1       2.30358    0.276163  8.34    0.0011    1.53683    3.07033
──────────────────────────────────────────────────────────────────────
Log Likelihood: 1.9446255188
AIC: 4.1107489623

Alternatively, we can input the data as 1 row per species and 2 extra columns: standard deviation of the response trait among individuals in the same species, and number of individuals per species for which we calculated this SD. The result is the same.

julia> df_r = DataFrame( # species-level statistics (sample means, standard deviations)
           species = ["D","C","A","B","E","O"],
           trait1 = [4.08298,3.10782,2.17078,1.87333,2.8445,5.88204],
           trait2 = [-7.34186,-7.45085,-3.32538,-4.26472,-5.96857,-1.99388],
           trait3 = [18.896,17.0686,14.2916,12.9808,15.4255,24.1667],
           trait3_sd = [0.074524,0.00465081,0.185497,0.0936,0.0158379,0.0509643],
           trait3_n = [3, 3, 3, 3, 3, 3]);

julia> m2 = phylolm(@formula(trait3 ~ trait1), df_r, net;
                tipnames=:species, withinspecies_var=true, y_mean_std=true)
PhyloNetworkLinearModel

Formula: trait3 ~ 1 + trait1

Model: Brownian motion

Parameter Estimates, using REML:
phylogenetic variance rate: 0.15618
within-species variance: 0.0086343

Coefficients:
──────────────────────────────────────────────────────────────────────
               Coef.  Std. Error     t  Pr(>|t|)  Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────
(Intercept)  9.65342    1.30657   7.39    0.0018    6.02582   13.281
trait1       2.30359    0.276156  8.34    0.0011    1.53686    3.07032
──────────────────────────────────────────────────────────────────────
Log Likelihood: 1.9447243714
AIC: 4.1105512573
source
PhyloTraits.shiftathybridsMethod
shiftathybrids(value::Vector{T} where T<:Real, net::HybridNetwork; checkpreorder::Bool=true)

Construct an object ShiftNet with shifts on all the edges below hybrid nodes, with values provided. The vector of values must have the same length as the number of hybrids in the network.

source
Random.rand!Method
rand!(rng::AbstractRNG,
      end::AbstractVector{Int},
      model::TraitSubstitutionModel,
      t::Float64,
      start::AbstractVector{Int})

Simulate discrete traits along one edge of length t like rand, but modifying end in place to store the simulated values.

source
PhyloTraits.BinaryTraitSubstitutionModelType
BinaryTraitSubstitutionModel(α, β [, label])

Model for binary traits, that is, with 2 states. Default labels are "0" and "1". α is the rate of transition from "0" to "1", and β from "1" to "0".

source
PhyloTraits.EqualRatesSubstitutionModelType
EqualRatesSubstitutionModel(numberStates, α, labels)

TraitSubstitutionModel for traits with any number of states and equal substitution rates α between all states. Default labels are "1","2",...

example

julia> m1 = EqualRatesSubstitutionModel(2, [1.0], ["low","high"])
Equal Rates Substitution Model with k=2,
all rates equal to α=1.0.
rate matrix Q:
             low    high
     low       *  1.0000
    high  1.0000       *
source
PhyloTraits.HKY85Type
HKY85(rate, pi, relative)

A nucleic acid substitution model based on Hasegawa et al. 1985 substitution model. rate should be a vector of 1 or 2 rates, and pi a vector of 4 probabilities summing to 1.

If relative is false, the 2 rates represent the transition rate and the transversion rate, α and β. If relative is true (default), only the first rate is used and represents the transition/transversion ratio: κ=α/β. The rate transition matrix Q is normalized to have 1 change / unit of time on average, i.e. the absolute version of Q is divided by 2(piT*piC + piA*piG)α + 2(piY*piR)β.

nparams returns 1 or 2. In other words: the stationary distribution is not counted in the number of parameters (and fitdiscrete does not optimize the pi values at the moment).

examples

julia> m1 = HKY85([.5], [0.20, 0.30, 0.30, 0.20])
HKY85 Substitution Model base frequencies: [0.2, 0.3, 0.3, 0.2]
relative rate version with transition/tranversion ratio kappa = 0.5,
 scaled so that there is one substitution per unit time
rate matrix Q:
               A       C       G       T
       A       *  0.4839  0.2419  0.3226
       C  0.3226       *  0.4839  0.1613
       G  0.1613  0.4839       *  0.3226
       T  0.3226  0.2419  0.4839       *

julia> nstates(m1)
4

julia> m2 = HKY85([0.5, 0.5], [0.20, 0.30, 0.30, 0.20], false)
HKY85 Substitution Model base frequencies: [0.2, 0.3, 0.3, 0.2]
absolute rate version with transition/transversion ratio kappa = a/b = 1.0
 with rates a = 0.5 and b = 0.5
rate matrix Q:
               A       C       G       T
       A       *  0.1500  0.1500  0.1000
       C  0.1000       *  0.1500  0.1000
       G  0.1000  0.1500       *  0.1000
       T  0.1000  0.1500  0.1500       *
source
PhyloTraits.JC69Type
JC69(rate, relative)

Jukes Cantor (1969) nucleic acid substitution model, which has a single rate parameter. rate corresponds to the absolute diagonal elements, that is, the rate of change (to any of the other 2 states). Individual rates are rate/3. If relative is true (default), the transition matrix Q is normalized to an average of 1 transition per unit of time: in which case rate is set to 1.0.

examples

julia> m1 = JC69([0.25], false)
Jukes and Cantor 69 Substitution Model,
absolute rate version
off-diagonal rates equal to 0.25/3.
rate matrix Q:
               A       C       G       T
       A       *  0.0833  0.0833  0.0833
       C  0.0833       *  0.0833  0.0833
       G  0.0833  0.0833       *  0.0833
       T  0.0833  0.0833  0.0833       *

julia> nstates(m1)
4

julia> nparams(m1)
1

julia> m2 = JC69([0.5])
Jukes and Cantor 69 Substitution Model,
relative rate version
off-diagonal rates equal to 1/3
rate matrix Q:
               A       C       G       T
       A       *  0.3333  0.3333  0.3333
       C  0.3333       *  0.3333  0.3333
       G  0.3333  0.3333       *  0.3333
       T  0.3333  0.3333  0.3333       *

julia> nparams(m2)
0
source
PhyloTraits.ParamsBMType
ParamsBM <: ParamsProcess

Type for a BM process on a network. Fields are mu (expectation), sigma2 (variance), randomRoot (whether the root is random, default to false), and varRoot (if the root is random, the variance of the root, default to NaN).

source
PhyloTraits.ParamsMultiBMType
ParamsMultiBM <: ParamsProcess

Type for a multivariate Brownian diffusion (MBD) process on a network. Fields are mu (expectation), sigma (covariance matrix), randomRoot (whether the root is random, default to false), varRoot (if the root is random, the covariance matrix of the root, default to [NaN]), shift (a ShiftNet type, default to missing), and L (the lower triangular of the cholesky decomposition of sigma, computed automatically)

examples and constructors

julia> ParamsMultiBM([1.0, -0.5], [2.0 0.3; 0.3 1.0]) # no shifts
ParamsMultiBM:
Parameters of a MBD with fixed root:
mu: [1.0, -0.5]
Sigma: [2.0 0.3; 0.3 1.0]

julia> net = readnewick("((A:1,B:1):1,C:2);");

julia> shifts = ShiftNet(net.node[2], [-1.0, 2.0], net);

julia> ParamsMultiBM([1.0, -0.5], [2.0 0.3; 0.3 1.0], shifts) # with shifts
ParamsMultiBM:
Parameters of a MBD with fixed root:
mu: [1.0, -0.5]
Sigma: [2.0 0.3; 0.3 1.0]

There are 2 shifts on the network:
───────────────────────────────────────────
  Edge Number  Shift Value 1  Shift Value 2
───────────────────────────────────────────
          2.0           -1.0            2.0
───────────────────────────────────────────
source
PhyloTraits.PhyloNetworkLinearModelType
PhyloNetworkLinearModel <: GLM.LinPredModel

Phylogenetic linear model representation.

Fields

lm, V, Vy, RL, Y, X, logdetVy, reml, ind, nonmissing, evomodel, model_within and formula. The following syntax pattern can be used to get more information on a specific field: e.g. to find out about the lm field, do ?PhyloNetworkLinearModel.lm.

Methods applied to fitted models

The following StatsAPI / StatsBase functions can be applied: coef, nobs, vcov, stderror, confint, coeftable, dof_residual, dof, deviance, residuals, response, predict, loglikelihood, nulldeviance, nullloglikelihood, r2, adjr2, aic, aicc, bic, ftest, lrtest etc.

The estimated variance-rate and estimated mean of the species-level trait model (see ContinuousTraitEM) can be retrieved using sigma2_phylo and mu_phylo respectively.

If relevant, the estimated individual-level/within-species variance can be retrieved using sigma2_within.

The optimized λ parameter for Pagel's λ model (see PagelLambda) can be retrieved using lambda_estim.

An ancestral state reconstruction can be performed using ancestralreconstruction.

Within-species variation

The true species/population means for the response trait/variable (or the residuals: conditional on the predictors) are jointly modeled as 𝒩(·, σ²ₛV) where V depends on the trait model (see ContinuousTraitEM) and on the species network. σ²ₛ is the between-species variance-rate.

Within-species variation is modeled by assuming that the individual-level responses are iid 𝒩(0, σ²ₑ) about the true species means, so that the species-level sample means (conditional on the predictors) are jointly modeled as 𝒩(·, σ²ₛV + σ²ₑD⁻¹), where σ²ₑ is the within-species variance and D⁻¹ is a diagonal matrix whose entries are the inverse sample-sizes (see WithinSpeciesCTM).

Although the above two models can be expressed in terms of a joint distribution for the species-level sample means (or residuals conditional on the predictors), more data are required to fit a model accounting for within-species variation, that is, a model recognizing that the sample means are estimates of the true population means. To fit a model without within-species variation, data on the species means are sufficient. To fit a model with within-species variation, we need to have the species means and the standard deviations of the response variable for each species.

phylolm can fit a model with within-species variation either from species-level statistics ("mean response" and "standard deviation in response") or from individual-level data (in which case the species-level statistics are computed internally). See phylolm for more details on these two input choices.

In the object, obj.Y and obj.X are the observed species means. predict, residuals and response return the values at the species level.

source
PhyloTraits.RateVariationAcrossSitesType
RateVariationAcrossSites(rvsymbol::Symbol, ncategories::Int=4)

Default model for rate variation across site, specified by a symbol:

  • :noRV for no rate variation
  • :G or :Gamma for gamma-distributed rates
  • :I or :Inv for two categories: invariable and variable
  • :GI or :GI for both.
source
PhyloTraits.RateVariationAcrossSitesMethod
RateVariationAcrossSites(; pinv=0.0, alpha=Inf, ncat=4)

Model for variable substitution rates across sites (or across traits) using the discrete Gamma model (+G, Yang 1994, Journal of Molecular Evolution) or the invariable-sites model (+I, Hasegawa, Kishino & Yano 1985 J Mol Evol). Both types of rate variation can be combined (+G+I, Gu, Fu & Li 1995, Mol Biol Evol) but this is discouraged (Jia, Lo & Ho 2014 PLOS One). Using rate variation increases the number of parameters by one (+G or +I) or by two (+G+I).

Because the mean of the desired distribution or rates is 1, we use a Gamma distribution with shape α and scale θ=1/α (rate β=α) if no invariable sites, or scale θ=1/(α(1-pinv)), that is rate β=α(1-pinv) with a proportion pinv of invariable sites. The shape parameter is referred to as alpha here. The Gamma distribution is discretized into ncat categories. In each category, the category's rate multiplier is a normalized quantile of the gamma distribution.

julia> rv = RateVariationAcrossSites()
Rate variation across sites: discretized Gamma
categories for Gamma discretization: 1
rates: [1.0]

julia> nparams(rv)
0

julia> typeof(rv)
PhyloTraits.RVASGamma{1}

julia> rv = RateVariationAcrossSites(alpha=1.0, ncat=4)
Rate variation across sites: discretized Gamma
alpha: 1.0
categories for Gamma discretization: 4
rates: [0.146, 0.513, 1.071, 2.27]

julia> typeof(rv)
PhyloTraits.RVASGamma{4}

julia> PhyloTraits.setalpha!(rv, 2.0)
Rate variation across sites: discretized Gamma
alpha: 2.0
categories for Gamma discretization: 4
rates: [0.319, 0.683, 1.109, 1.889]

julia> nparams(rv)
1

julia> rv = RateVariationAcrossSites(pinv=0.3)
Rate variation across sites: +I (invariable sites)
pinv: 0.3
rates: [0.0, 1.429]

julia> nparams(rv)
1

julia> typeof(rv)
PhyloTraits.RVASInv

julia> PhyloTraits.setpinv!(rv, 0.05)
Rate variation across sites: +I (invariable sites)
pinv: 0.05
rates: [0.0, 1.053]

julia> rv = RateVariationAcrossSites(pinv=0.3, alpha=2.0, ncat=4)
Rate variation across sites: discretized Gamma+I
pinv: 0.3
alpha: 2.0
categories for Gamma discretization: 4
rates: [0.0, 0.456, 0.976, 1.584, 2.698]
probabilities: [0.3, 0.175, 0.175, 0.175, 0.175]

julia> nparams(rv)
2

julia> typeof(rv)
PhyloTraits.RVASGammaInv{5}

julia> PhyloTraits.setalpha!(rv, 3.0)
Rate variation across sites: discretized Gamma+I
pinv: 0.3
alpha: 3.0
categories for Gamma discretization: 4
rates: [0.0, 0.6, 1.077, 1.584, 2.454]
probabilities: [0.3, 0.175, 0.175, 0.175, 0.175]

julia> PhyloTraits.setpinv!(rv, 0.05)
Rate variation across sites: discretized Gamma+I
pinv: 0.05
alpha: 3.0
categories for Gamma discretization: 4
rates: [0.0, 0.442, 0.793, 1.167, 1.808]
probabilities: [0.05, 0.238, 0.238, 0.238, 0.238]

julia> PhyloTraits.setpinvalpha!(rv, 0.1, 5.0)
Rate variation across sites: discretized Gamma+I
pinv: 0.1
alpha: 5.0
categories for Gamma discretization: 4
rates: [0.0, 0.593, 0.91, 1.221, 1.721]
probabilities: [0.1, 0.225, 0.225, 0.225, 0.225]
source
PhyloTraits.ReconstructedStatesType
ReconstructedStates

Type containing the inferred information about the law of the ancestral states given the observed tips values. The missing tips are considered as ancestral states.

Reconstructed states and prediction intervals can be recovered with function predict, and the standard error can be obtained with stderror.

The ReconstructedStates object has fields: traits_nodes, variances_nodes, nodenumbers, traits_tips, tipnumbers, model. Type in "?ReconstructedStates.field" to get help on a specific field.

source
PhyloTraits.ShiftNetType
ShiftNet

Shifts mapped to tree nodes and their (unique) parent edge on a PhyloNetworks.HybridNetwork sorted in topological order. Its shift field is a vector of shift values, one for each node, corresponding to the shift on the parent edge of the node (which makes sense for tree nodes only: they have a single parent edge).

Two ShiftNet objects on the same network can be concatened with *.

ShiftNet(node::Vector{Node}, value::AbstractVector, net::HybridNetwork; checkpreorder::Bool=true)

Constructor from a vector of nodes and associated values. The shifts are located on the edges above the nodes provided. Warning, shifts on hybrid edges are not allowed.

ShiftNet(edge::Vector{Edge}, value::AbstractVector, net::HybridNetwork; checkpreorder::Bool=true)

Constructor from a vector of edges and associated values. Warning, shifts on hybrid edges are not allowed.

Extractors: getshiftedgenumber, getshiftvalue

source
PhyloTraits.TwoBinaryTraitSubstitutionModelType
TwoBinaryTraitSubstitutionModel(rate [, label])

TraitSubstitutionModel for two binary traits, possibly correlated. Default labels are "x0", "x1" for trait 1, and "y0", "y1" for trait 2. If provided, label should be a vector of size 4, listing labels for trait 1 first then labels for trait 2. rate should be a vector of substitution rates of size 8. rate[1],...,rate[4] describe rates of changes in trait 1. rate[5],...,rate[8] describe rates of changes in trait 2. In the transition matrix, trait combinations are listed in the following order: x0-y0, x0-y1, x1-y0, x1-y1.

example

model = TwoBinaryTraitSubstitutionModel([2.0,1.2,1.1,2.2,1.0,3.1,2.0,1.1],
        ["carnivory", "noncarnivory", "wet", "dry"]);
model
using PhyloPlots
plot(model) # to visualize states and rates. not supported with PhyloPlots v2.0.0. requires PhyloPlots v1.0.0
source

index