public documentation
Documentation for PhyloTraits
's public (exported) functions. Most functions are internal (not exported).
functions & types
Base.rand
— Functionrand([rng::AbstractRNG,]
net::HybridNetwork,
params::ParamsProcess,
checkpreorder::Bool=true)
Simulate traits on net
using the parameters params
. For now, only parameters of type ParamsBM
(univariate Brownian Motion) and ParamsMultiBM
(multivariate Brownian motion) are accepted.
The simulation using a recursion from the root to the tips of the network, therefore, a pre-ordering of nodes is needed. If checkpreorder=true
(default), PhyloNetworks.preorder!
is called on the network beforehand. Otherwise, it is assumed that the preordering has already been calculated.
Returns an object of type TraitSimulation
, which has a matrix with the trait expectations and simulated trait values at all the nodes.
See examples below for accessing expectations and simulated trait values.
Examples
Univariate
julia> phy = readnewick("(A:2.5,((U:1,#H1:0.5::0.4):1,(C:1,(D:0.5)#H1:0.5::0.6):1):0.5);");
julia> par = ParamsBM(1, 0.1) # BM with expectation 1 and variance 0.1.
ParamsBM:
Parameters of a BM with fixed root:
mu: 1.0
Sigma2: 0.1
julia> sim = rand(phy, par) # simulate along the network
TraitSimulation:
Trait simulation results on a network with 4 tips, using a BM model, with parameters:
mu: 1.0
Sigma2: 0.1
Below, we re-run the same simulation but with our own fixed random number generator for reproducibility.
julia> # using Pkg; Pkg.add("StableRNGs") # to install StableRNGs if not done earlier
julia> using StableRNGs; rng = StableRNG(791); # for reproducibility
julia> sim = rand(rng, phy, par); # re-simulate
julia> traits = sim[:tips] # extract simulated values at the tips.
4-element Vector{Float64}:
0.5991561486238962
0.861066346792992
1.367634062992289
1.6439310845929571
julia> tiplabels(sim) # name of tips, in the same order as values above
4-element Vector{String}:
"A"
"U"
"C"
"D"
So, for example, the simulated trait value for taxon U (listed second) is ~0.86. For some purposes, we might want to access the values simulated at internal nodes, or at all nodes at once:
julia> traits = sim[:internalnodes] # extract simulated values at internal nodes. Order: as in sim.M.internalnodenumbers
5-element Vector{Float64}:
1.0716684901027937
1.6007823049044083
1.6756374575490327
1.2194286026283034
1.0
julia> traits = sim[:all] # simulated values at all nodes, ordered as in sim.M.nodenumbers_toporder
9-element Vector{Float64}:
1.0
1.2194286026283034
1.6756374575490327
1.367634062992289
1.0716684901027937
1.6007823049044083
1.6439310845929571
0.861066346792992
0.5991561486238962
We might also want to extract the expected mean values (without noise). This is not very interesting under a standard BM model, but may become interesting under more complex models (e.g. with shifts). We can do so with an extra :exp
index:
julia> traits = sim[:tips, :exp] # expected values at the tips (also works for sim[:all, :exp] and sim[:internalnodes, :exp]).
4-element Vector{Float64}:
1.0
1.0
1.0
1.0
Multivariate
julia> phy = readnewick("(A:2.5,((B:1,#H1:0.5::0.4):1,(C:1,(V:0.5)#H1:0.5::0.6):1):0.5);");
julia> par = ParamsMultiBM([1.0, 2.0], [1.0 0.5; 0.5 1.0]) # BM with expectation [1.0, 2.0] and variance [1.0 0.5; 0.5 1.0].
ParamsMultiBM:
Parameters of a MBD with fixed root:
mu: [1.0, 2.0]
Sigma: [1.0 0.5; 0.5 1.0]
julia> using StableRNGs; rng = StableRNG(9851); # for reproducibility
julia> sim = rand(rng, phy, par) # simulate on the phylogeny
TraitSimulation:
Trait simulation results on a network with 4 tips, using a MBD model, with parameters:
mu: [1.0, 2.0]
Sigma: [1.0 0.5; 0.5 1.0]
julia> traits = sim[:tips] # extract simulated values at the tips (each column contains the simulated traits for one node).
2×4 Matrix{Float64}:
3.8013 0.839485 0.346092 1.91131
5.91725 -0.59143 0.458569 0.629048
The 2 rows to the 2 correlated traits and the columns correspond to the 4 taxa (tips). The order in which taxa are listed is obtained with tiplabels
:
julia> tiplabels(sim)
4-element Vector{String}:
"A"
"B"
"C"
"V"
As in the univariate case, we can also extract the values simulated at internal nodes, or all nodes (but listed in preorder), or expected mean values.
julia> sim[:internalnodes] # simulated values at internal nodes. order: same as in sim.M.internalnodenumbers
2×5 Matrix{Float64}:
-0.604224 0.755722 2.14755 0.484292 1.0
-0.338922 0.373921 1.15174 0.695561 2.0
julia> traits = sim[:all]; # 2×9 Matrix: values at all nodes, ordered as in sim.M.nodenumbers_toporder
julia> sim[:tips, :exp] # expected values (also works for sim[:all, :exp] and sim[:internalnodes, :exp])
2×4 Matrix{Float64}:
1.0 1.0 1.0 1.0
2.0 2.0 2.0 2.0
Base.rand
— Methodrand([rng::AbstractRNG,]
model::TraitSubstitutionModel,
t::Float64,
start::AbstractVector{Int})
Simulate discrete traits along one edge of length t. A random number generator rng
is optional. start
must be a vector of integers, each representing the starting value of one trait.
examples
julia> m1 = BinaryTraitSubstitutionModel(1.0, 2.0)
Binary Trait Substitution Model:
rate 0→1 α=1.0
rate 1→0 β=2.0
julia> using StableRNGs; rng = StableRNG(135);
julia> rand(rng, m1, 0.2, [1,2,1,2,1])
5-element Vector{Int64}:
2
2
1
2
1
Base.rand
— Methodrand([rng::AbstractRNG,]
model::TraitSubstitutionModel,
net::HybridNetwork;
ntraits=1,
keepinternal=true,
checkpreorder=true)
Simulate evolution of discrete traits on a rooted evolutionary network based on the supplied evolutionary model. Trait sampling is uniform at the root.
optional arguments:
ntraits
: number of traits to be simulated (default: 1 trait).keepinternal
: if true, export character states at all nodes, including internal nodes. if false, export character states at tips only.
output:
- matrix of character states with one row per trait, one column per node; these states are indices in
model.label
, not the trait labels themselves. - vector of node labels (for tips) or node numbers (for internal nodes) in the same order as columns in the character state matrix
examples
julia> m1 = BinaryTraitSubstitutionModel(1.0, 2.0, ["low","high"]);
julia> net = readnewick("(((A:4.0,(B:1.0)#H1:1.1::0.9):0.5,(C:0.6,#H1:1.0::0.1):1.0):3.0,D:5.0);");
julia> using Random; Random.seed!(95);
julia> trait, lab = rand(m1, net)
([1 2 … 1 1], ["-2", "D", "-3", "-6", "C", "-4", "H1", "B", "A"])
julia> trait
1×9 Matrix{Int64}:
1 2 1 1 2 2 1 1 1
julia> lab
9-element Vector{String}:
"-2"
"D"
"-3"
"-6"
"C"
"-4"
"H1"
"B"
"A"
PhyloTraits.Q
— MethodFor a BinaryTraitSubstitutionModel, the rate matrix Q is of the form:
-α α
β -β
PhyloTraits.Q
— MethodQ(model)
Substitution rate matrix for a given substitution model: Q[i,j] is the rate of transitioning from state i to state j.
PhyloTraits.ancestralreconstruction
— Functionancestralreconstruction(obj::SSM, trait::Integer = 1)
Estimate the marginal probability of ancestral states for discrete character number trait
(first trait by default). The parameters of the StatisticalSubstitutionModel
object obj
must first be fitted using fitdiscrete
, and ancestral state reconstruction is conditional on the estimated parameters. If these parameters were estimated using all traits, they are used as is, to do ancestral state reconstruction of the particular trait
of interest.
output: data frame with a first column for the node numbers, a second column for the node labels, and a column for each possible state: the entries in these columns give the marginal probability that a given node has a given state.
warnings:
- node numbers and node labels refer to those in
obj.net
, which might have a different internal representation of nodes than the original network used to buildobj
. obj
is modified: its likelihood fields (forward, directional & backward) are updated to make sure that they correspond to the current parameter values inobj.model
, and to thetrait
of interest.
limitations: the following are not checked.
- Assumes that every node in the large network is also present (with descendant leaves) in each displayed tree. This is not true if the network is not tree-child...
- Assumes that the root is also in each displayed tree, which may not be the case if the root had a hybrid child edge.
See also posterior_logtreeweight
and discrete_backwardlikelihood_trait!
to update obj.backwardlik
.
examples
julia> net = readnewick("(((A:2.0,(B:1.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0):0.5,D:2.0);");
julia> m1 = BinaryTraitSubstitutionModel([0.1, 0.1], ["lo", "hi"]);
julia> using DataFrames
julia> dat = DataFrame(species=["C","A","B","D"], trait=["hi","lo","lo","hi"]);
julia> fit1 = fitdiscrete(net, m1, dat);
julia> asr = ancestralreconstruction(fit1)
9×4 DataFrame
Row │ nodenumber nodelabel lo hi
│ Int64 String Float64 Float64
─────┼───────────────────────────────────────────
1 │ 1 A 1.0 0.0
2 │ 2 B 1.0 0.0
3 │ 3 C 0.0 1.0
4 │ 4 D 0.0 1.0
5 │ 5 5 0.286021 0.713979
6 │ 6 6 0.319456 0.680544
7 │ 7 7 0.16855 0.83145
8 │ 8 8 0.767359 0.232641
9 │ 9 H1 0.782776 0.217224
julia> using PhyloPlots
julia> plot(fit1.net, nodelabel = asr[!,[:nodenumber, :lo]], tipoffset=0.2); # pp for "lo" state
PhyloTraits.ancestralreconstruction
— Methodancestralreconstruction(fr::AbstractDataFrame, net::HybridNetwork; kwargs...)
Estimate the ancestral traits on a network, given some data at the tips. Uses function phylolm
to perform a phylogenetic regression of the data against an intercept (amounts to fitting an evolutionary model on the network).
See documentation on phylolm
and ancestralreconstruction(obj::PhyloNetworkLinearModel[, X_n::Matrix])
for further details.
Returns an object of type ReconstructedStates
.
PhyloTraits.ancestralreconstruction
— Methodancestralreconstruction(obj::PhyloNetworkLinearModel[, X_n::Matrix])
Function to find the ancestral traits reconstruction on a network, given an object fitted by function phylolm
. By default, the function assumes that the regressor is just an intercept. If the value of the regressor for all the ancestral states is known, it can be entered in X_n, a matrix with as many columns as the number of predictors used, and as many lines as the number of unknown nodes or tips.
Returns an object of type ReconstructedStates
.
Examples
julia> using DataFrames, CSV # to read data file
julia> phy = readnewick(joinpath(dirname(pathof(PhyloTraits)), "..", "examples", "carnivores_tree.txt"));
julia> dat = CSV.read(joinpath(dirname(pathof(PhyloTraits)), "..", "examples", "carnivores_trait.txt"), DataFrame);
julia> using StatsModels # for statistical model formulas
julia> fitBM = phylolm(@formula(trait ~ 1), dat, phy);
julia> ancStates = ancestralreconstruction(fitBM) # Should produce a warning, as variance is unknown.
┌ Warning: These prediction intervals show uncertainty in ancestral values,
│ assuming that the estimated variance rate of evolution is correct.
│ Additional uncertainty in the estimation of this variance rate is
│ ignored, so prediction intervals should be larger.
└ @ PhyloTraits ~/build/JuliaPhylo/PhyloTraits.jl/src/traits_continuous.jl:2601
ReconstructedStates:
───────────────────────────────────────────────
Node index Pred. Min. Max. (95%)
───────────────────────────────────────────────
-5.0 1.32139 -0.33824 2.98102
-8.0 1.03258 -0.589695 2.65485
-7.0 1.41575 -0.140705 2.97221
-6.0 1.39417 -0.107433 2.89577
-4.0 1.39961 -0.102501 2.90171
-3.0 1.51341 -0.220523 3.24733
-13.0 5.3192 3.92279 6.71561
-12.0 4.51176 2.89222 6.13131
-16.0 1.50947 -0.0186118 3.03755
-15.0 1.67425 0.196069 3.15242
-14.0 1.80309 0.309992 3.29618
-11.0 2.7351 1.17608 4.29412
-10.0 2.73217 1.12361 4.34073
-9.0 2.41132 0.603932 4.21871
-2.0 2.04138 -0.0340955 4.11686
14.0 1.64289 1.64289 1.64289
8.0 1.67724 1.67724 1.67724
5.0 0.331568 0.331568 0.331568
2.0 2.27395 2.27395 2.27395
4.0 0.275237 0.275237 0.275237
6.0 3.39094 3.39094 3.39094
13.0 0.355799 0.355799 0.355799
15.0 0.542565 0.542565 0.542565
7.0 0.773436 0.773436 0.773436
10.0 6.94985 6.94985 6.94985
11.0 4.78323 4.78323 4.78323
12.0 5.33016 5.33016 5.33016
1.0 -0.122604 -0.122604 -0.122604
16.0 0.73989 0.73989 0.73989
9.0 4.84236 4.84236 4.84236
3.0 1.0695 1.0695 1.0695
───────────────────────────────────────────────
julia> using StatsBase # for predict function
julia> predict(ancStates)
31×2 DataFrame
Row │ nodenumber prediction
│ Int64 Float64
─────┼────────────────────────
1 │ -5 1.32139
2 │ -8 1.03258
3 │ -7 1.41575
4 │ -6 1.39417
5 │ -4 1.39961
6 │ -3 1.51341
7 │ -13 5.3192
8 │ -12 4.51176
⋮ │ ⋮ ⋮
25 │ 10 6.94985
26 │ 11 4.78323
27 │ 12 5.33016
28 │ 1 -0.122604
29 │ 16 0.73989
30 │ 9 4.84236
31 │ 3 1.0695
16 rows omitted
julia> predict(ancStates, interval = :prediction)
31×4 DataFrame
Row │ nodenumber prediction lower upper
│ Int64 Float64 Float64 Float64
─────┼───────────────────────────────────────────────
1 │ -5 1.32139 -0.33824 2.98102
2 │ -8 1.03258 -0.589695 2.65485
3 │ -7 1.41575 -0.140705 2.97221
4 │ -6 1.39417 -0.107433 2.89577
5 │ -4 1.39961 -0.102501 2.90171
6 │ -3 1.51341 -0.220523 3.24733
7 │ -13 5.3192 3.92279 6.71561
8 │ -12 4.51176 2.89222 6.13131
⋮ │ ⋮ ⋮ ⋮ ⋮
25 │ 10 6.94985 6.94985 6.94985
26 │ 11 4.78323 4.78323 4.78323
27 │ 12 5.33016 5.33016 5.33016
28 │ 1 -0.122604 -0.122604 -0.122604
29 │ 16 0.73989 0.73989 0.73989
30 │ 9 4.84236 4.84236 4.84236
31 │ 3 1.0695 1.0695 1.0695
16 rows omitted
julia> using PhyloPlots # next: plot ancestral states on the tree
julia> plot(phy, nodelabel = predict(ancStates));
julia> pred = predict(ancStates, interval = :prediction, text = true);
julia> plot(phy, nodelabel = pred[!,[:nodenumber,:interval]]);
julia> allowmissing!(dat, :trait);
julia> dat[[2, 5], :trait] .= missing; # missing values allowed to fit model
julia> fitBM = phylolm(@formula(trait ~ 1), dat, phy);
julia> ancStates = ancestralreconstruction(fitBM);
┌ Warning: These prediction intervals show uncertainty in ancestral values,
│ assuming that the estimated variance rate of evolution is correct.
│ Additional uncertainty in the estimation of this variance rate is
│ ignored, so prediction intervals should be larger.
└ @ PhyloTraits ~/build/JuliaPhylo/PhyloTraits.jl/src/traits_continuous.jl:2601
julia> first(predict(ancStates), 3) # looking at first 3 nodes only
3×2 DataFrame
Row │ nodenumber prediction
│ Int64 Float64
─────┼────────────────────────
1 │ -5 1.42724
2 │ -8 1.35185
3 │ -7 1.61993
julia> first(predict(ancStates, interval=:prediction), 3)
3×4 DataFrame
Row │ nodenumber prediction lower upper
│ Int64 Float64 Float64 Float64
─────┼────────────────────────────────────────────
1 │ -5 1.42724 -0.373749 3.22824
2 │ -8 1.35185 -0.698432 3.40214
3 │ -7 1.61993 -0.17179 3.41165
julia> plot(phy, nodelabel = predict(ancStates, text=true));
julia> pred = predict(ancStates, interval = :prediction, text = true);
julia> plot(phy, nodelabel = pred[!,[:nodenumber,:interval]]);
PhyloTraits.ancestralreconstruction
— Methodancestralreconstruction(net::HybridNetwork, Y::Vector, params::ParamsBM)
Compute the conditional expectations and variances of the ancestral (un-observed) traits values at the internal nodes of the phylogenetic network (net
), given the values of the traits at the tips of the network (Y
) and some known parameters of the process used for trait evolution (params
, only BM with fixed root works for now).
This function assumes that the parameters of the process are known. For a more general function, see ancestralreconstruction(obj::PhyloNetworkLinearModel[, X_n::Matrix])
.
PhyloTraits.descendencedataframe
— Methoddescendencedataframe(net::HybridNetwork, which=:allhybrids; checkpreorder=true)
descendencedataframe(net::HybridNetwork, node::Vector{Node}; checkpreorder=true)
descendencedataframe(net::HybridNetwork, edge::Vector{Edge}; checkpreorder=true)
Data frame containing the genomic proportion inherited by each taxon in net
, from each hybrid node by default in the first method, or from each node or edge in the input vector in the second and third methods. The data frame has 1 row per tip (taxon) in the network and the following columns:
- 1 column per edge or node, with columns named according to the pattern shift{edgenumber}" where
edge_number
is the number of the edge associated with an input edge or node (in which case the parent edge is used) - 1 additional column labeled
tipnames
, containing the tip labels.
The shift_*
columns in this data frame can be used as regressor vectors associated with shifts on input edge
s or on edges that are above input node
s. With option which=:allhybrids
in last method, each shift column is associated with a hybrid node in net
, to model a shift on the edge that is immediately below the hybrid node. This can be used to test for transgressive evolution: see examples below.
These methods use PhyloNetworks.descendencematrix
, so net
might be modified to store a vector of its nodes sorted in a pre-order.
Examples
julia> net = readnewick("(A:2.5,((B:1,#H1:0.5::0.4):1,(C:1,(D:0.5)#H1:0.5::0.6):1):0.5);");
julia> preorder!(net)
julia> using PhyloPlots
julia> plot(net, shownodenumber=true); # to locate nodes
julia> nodes_shifts = indexin([1,-5], [n.number for n in net.node]) # Put a shift on edges ending at nodes 1 and -5
2-element Vector{Union{Nothing, Int64}}:
1
7
julia> params = ParamsBM(10, 0.1, ShiftNet(net.node[nodes_shifts], [3.0, -3.0], net))
ParamsBM:
Parameters of a BM with fixed root:
mu: 10.0
Sigma2: 0.1
There are 2 shifts on the network:
──────────────────────────
Edge Number Shift Value
──────────────────────────
8.0 -3.0
1.0 3.0
──────────────────────────
julia> using Random; Random.seed!(2468); # sets the seed for reproducibility
julia> sim = rand(net, params); # simulate a dataset with shifts
julia> using DataFrames # to handle data frames
julia> dat = DataFrame(trait = sim[:tips], tipnames = sim.M.tipnames);
julia> dat = DataFrame(trait = [13.391976856737717, 9.55741491696386, 7.17703734817448, 7.889062527849697],
tipnames = ["A","B","C","D"]) # hard-coded, to be independent of random number generator
4×2 DataFrame
Row │ trait tipnames
│ Float64 String
─────┼────────────────────
1 │ 13.392 A
2 │ 9.55741 B
3 │ 7.17704 C
4 │ 7.88906 D
julia> dfr_shift = descendencedataframe(net, net.node[nodes_shifts]) # the regressors matching the shifts.
4×3 DataFrame
Row │ shift_1 shift_8 tipnames
│ Float64 Float64 String
─────┼────────────────────────────
1 │ 1.0 0.0 A
2 │ 0.0 0.0 B
3 │ 0.0 1.0 C
4 │ 0.0 0.6 D
julia> dfr = innerjoin(dat, dfr_shift, on=:tipnames); # join data and regressors in a single dataframe
julia> using StatsModels # for statistical model formulas
julia> fitBM = phylolm(@formula(trait ~ shift_1 + shift_8), dfr, net; reml=false) # actual fit
PhyloNetworkLinearModel
Formula: trait ~ 1 + shift_1 + shift_8
Model: Brownian motion
Parameter Estimates, using ML:
phylogenetic variance rate: 0.0112618
Coefficients:
────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────────────
(Intercept) 9.48238 0.327089 28.99 0.0220 5.32632 13.6384
shift_1 3.9096 0.46862 8.34 0.0759 -2.04479 9.86399
shift_8 -2.4179 0.422825 -5.72 0.1102 -7.7904 2.95461
────────────────────────────────────────────────────────────────────────
Log Likelihood: 1.8937302027
AIC: 4.2125395947
Next we illustrate the model with heterosis, aka transgressive evolution: with a shift in the trait after successful hybridization. First how to simulated according to this model:
julia> nodes_hybrids = indexin([5], [n.number for n in net.node]); # Put a shift on edges below hybrids
julia> params = ParamsBM(10, 0.1, ShiftNet(net.node[nodes_hybrids], [3.0], net));
julia> using Random; Random.seed!(2468); # sets the seed for reproducibility
julia> sim = rand(net, params); # simulate a dataset with shifts
julia> dat = DataFrame(trait = sim[:tips], tipnames = sim.M.tipnames);
and next how to analyze data under a transgressive evolution model. Below we hard-code data values for more reproducibility.
julia> dat = DataFrame(trait = [10.391976856737717, 9.55741491696386, 10.17703734817448, 12.689062527849698],
tipnames = ["A","B","C","D"])
4×2 DataFrame
Row │ trait tipnames
│ Float64 String
─────┼────────────────────
1 │ 10.392 A
2 │ 9.55741 B
3 │ 10.177 C
4 │ 12.6891 D
julia> dfr_hybrid = descendencedataframe(net) # the regressors matching the hybrids (all of them)
4×3 DataFrame
Row │ shift_6 tipnames sum
│ Float64 String Float64
─────┼────────────────────────────
1 │ 0.0 A 0.0
2 │ 0.0 B 0.0
3 │ 0.0 C 0.0
4 │ 1.0 D 1.0
julia> dfr = innerjoin(dat, dfr_hybrid, on=:tipnames); # join data and regressors in a single dataframe
julia> using StatsModels
julia> fitBM = phylolm(@formula(trait ~ shift_6), dfr, net; reml=false) # actual fit
PhyloNetworkLinearModel
Formula: trait ~ 1 + shift_6
Model: Brownian motion
Parameter Estimates, using ML:
phylogenetic variance rate: 0.041206
Coefficients:
────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────────────
(Intercept) 10.064 0.277959 36.21 0.0008 8.86805 11.26
shift_6 2.72526 0.315456 8.64 0.0131 1.36796 4.08256
────────────────────────────────────────────────────────────────────────
Log Likelihood: -0.7006021946
AIC: 7.4012043891
See also
PhyloTraits.empiricalDNAfrequencies
— FunctionempiricalDNAfrequencies(DNAdata::AbstractDataFrame, DNAweights,
correction=true, useambiguous=true)
Estimate base frequencies in DNA data DNAdata
, ordered ACGT.
DNAdata
: data frame. All columns are used. If the first column gives species names, find a way to ignore it before calculating empirical frequencies, e.g.empiricalDNAfrequencies(view(DNAdata, :, 2:size(DNAdata, 2)))
. Data type must beBioSymbols.DNA
orChar
orString
. WARNING: this is checked on the first column only.DNAweights
: vector of weights, to weigh each column inDNAdata
.correction
: iftrue
, add 1 to each count and 4 to the denominator for a more stable estimator, similar to Bayes prior of 1/4 and the Agresti-Coull interval in binomial estimation.useambiguous
: iftrue
, ambiguous bases are used (except gaps and Ns). For example,Y
adds 0.5 weight toC
and 0.5 weight toT
.
PhyloTraits.fitdiscrete
— Methodfitdiscrete(net, model, tipdata)
fitdiscrete(net, model, RateVariationAcrossSites, tipdata)
fitdiscrete(net, model, species, traits)
fitdiscrete(net, model, RateVariationAcrossSites, species, traits)
fitdiscrete(net, model, dnadata, dnapatternweights)
fitdiscrete(net, model, RateVariationAcrossSites, dnadata, dnapatternweights)
fitdiscrete(net, modSymbol, species, traits)
fitdiscrete(net, modSymbol, dnadata, dnapatternweights)
Calculate the maximum likelihood (ML) score of a network or tree given one or more discrete characters at the tips. Along each edge, transitions are modelled with a continous time Markov model
, whose parameters are estimated (by maximizing the likelihood). At each hybrid node, the trait is assumed to be inherited from either of the two immediate parents according to the parents' average genetic contributions (inheritance γ). The model ignores incomplete lineage sorting. The algorithm extracts all trees displayed in the network.
Data can given in one of the following:
tipdata
: dictionary taxon => state label, for a single trait.tipdata
: data frame for a single trait, in which case the taxon names are to appear in column 1 or in a column named "taxon" or "species", and trait labels are to appear in column 2 or in a column named "trait". Here, trait labels should be as they appear ingetlabels(model)
.species
: vector of strings, andtraits
: DataFrame of traits, with rows in the order corresponding to the order of species names. Again, trait labels should be as they appear ingetlabels(model)
. All traits are assumed to follow the same model, with same parameters.dnadata
: the first part of the output of readfastatodna, a dataframe of BioSequence DNA sequences, with taxon in column 1 and a column for each site.dnapatternweights
: the second part of the output of readfastatodna, an array of weights, one weights for each of the site columns. The length of the weight is equal to nsites. If using dnapatternweights, must provide dnadata.- RateVariationAcrossSites: model for rate variation (optional)
Optional arguments (default):
optimizeQ
(true): should model rate parameters be fixed, or should they be optimized?optimizeRVAS
(true): should the model optimize the parameters for the variability of rates across sites (α and/or p_invariable)?NLoptMethod
(:LN_COBYLA
, derivative-free) for the optimization algorithm. For other options, see the NLopt.- tolerance values to control when the optimization is stopped:
ftolRel
(1e-12),ftolAbs
(1e-10) on the likelihood, andxtolRel
(1e-10),xtolAbs
(1e-10) on the model parameters. - bounds for the alpha parameter of the Gamma distribution of rates across sites:
alphamin=0.05
,alphamax=50
. verbose
(false): if true, more information is output.suppresswarnings
(false): if true, warnings fromcheck_matchtaxonnames!
are suppressed.
examples:
julia> net = readnewick("(((A:2.0,(B:1.0)#H1:0.1::0.9):1.5,(C:0.6,#H1:1.0::0.1):1.0):0.5,D:2.0);");
julia> m1 = BinaryTraitSubstitutionModel([0.1, 0.1], ["lo", "hi"]);
julia> using DataFrames
julia> dat = DataFrame(species=["C","A","B","D"], trait=["hi","lo","lo","hi"]);
julia> fit1 = fitdiscrete(net, m1, dat)
PhyloTraits.StatisticalSubstitutionModel:
Binary Trait Substitution Model:
rate lo→hi α=0.27222
rate hi→lo β=0.34981
on a network with 1 reticulations
data:
4 species
1 trait
log-likelihood: -2.7277
julia> tips = Dict("A" => "lo", "B" => "lo", "C" => "hi", "D" => "hi");
julia> fit2 = fitdiscrete(net, m1, tips; xtolRel=1e-16, xtolAbs=1e-16, ftolRel=1e-16)
PhyloTraits.StatisticalSubstitutionModel:
Binary Trait Substitution Model:
rate lo→hi α=0.27222
rate hi→lo β=0.34981
on a network with 1 reticulations
data:
4 species
1 trait
log-likelihood: -2.7277
Note that a copy of the network is stored in the fitted object, but the internal representation of the network may be different in fit1.net
and in the original network net
:
julia> net = readnewick("(sp1:3.0,(sp2:2.0,(sp3:1.0,sp4:1.0):1.0):1.0);");
julia> using BioSymbols
julia> tips = Dict("sp1" => BioSymbols.DNA_A, "sp2" => BioSymbols.DNA_A, "sp3" => BioSymbols.DNA_G, "sp4" => BioSymbols.DNA_G);
julia> mJC69 = JC69([0.25], false);
julia> fitJC69 = fitdiscrete(net, mJC69, tips)
PhyloTraits.StatisticalSubstitutionModel:
Jukes and Cantor 69 Substitution Model,
absolute rate version
off-diagonal rates equal to 0.29233/3.
rate matrix Q:
A C G T
A * 0.0974 0.0974 0.0974
C 0.0974 * 0.0974 0.0974
G 0.0974 0.0974 * 0.0974
T 0.0974 0.0974 0.0974 *
on a network with 0 reticulations
data:
4 species
1 trait
log-likelihood: -4.99274
julia> rv = RateVariationAcrossSites(alpha=1.0, ncat=4)
Rate variation across sites: discretized Gamma
alpha: 1.0
categories for Gamma discretization: 4
rates: [0.146, 0.513, 1.071, 2.27]
julia> fitdiscrete(net, mJC69, rv, tips; optimizeQ=false, optimizeRVAS=false)
PhyloTraits.StatisticalSubstitutionModel:
Jukes and Cantor 69 Substitution Model,
absolute rate version
off-diagonal rates equal to 0.25/3.
rate matrix Q:
A C G T
A * 0.0833 0.0833 0.0833
C 0.0833 * 0.0833 0.0833
G 0.0833 0.0833 * 0.0833
T 0.0833 0.0833 0.0833 *
Rate variation across sites: discretized Gamma
alpha: 1.0
categories for Gamma discretization: 4
rates: [0.146, 0.513, 1.071, 2.27]
on a network with 0 reticulations
data:
4 species
1 trait
log-likelihood: -5.2568
fixit: add option to allow users to specify root prior, using either equal frequencies or stationary frequencies for trait models.
PhyloTraits.getlabels
— Methodfor a given NucleicAcidSubstitutionModel
, labels are symbols from BioSymbols. For now, only ACGTs are allowed. (When fitting data, any ambiguity code in the data would be treated as missing value).
examples
julia> getlabels(JC69([0.03], false))
4-element Vector{BioSymbols.DNA}:
DNA_A
DNA_C
DNA_G
DNA_T
julia> getlabels(HKY85([.5], repeat([0.25], 4)))
4-element Vector{BioSymbols.DNA}:
DNA_A
DNA_C
DNA_G
DNA_T
PhyloTraits.getlabels
— Methodgetlabels(model)
State labels of a substitution model.
PhyloTraits.getshiftedgenumber
— Methodgetshiftedgenumber(shift::ShiftNet)
Get the edge numbers where the shifts are located, for an object ShiftNet
. If a shift is placed at the root node with no parent edge, the edge number of a shift is set to -1 (as if missing).
PhyloTraits.getshiftvalue
— Methodgetshiftvalue(shift::ShiftNet)
Get the values of the shifts, for an object ShiftNet
.
PhyloTraits.lambda_estim
— Methodlambda_estim(m::PhyloNetworkLinearModel)
Estimated lambda parameter for a fitted object.
PhyloTraits.mu_phylo
— Methodmu_phylo(m::PhyloNetworkLinearModel)
Estimated root value for a fitted object.
PhyloTraits.nparams
— Methodfor HKY85
model: 1 if relative, 2 if absolute
PhyloTraits.nparams
— Methodfor JC69
model: 0 if relative, 1 if absolute
PhyloTraits.nparams
— Methodnparams(model)
Number of parameters for a given trait evolution model (length of field model.rate
).
PhyloTraits.nstates
— Methodfor a BinaryTraitSubstitutionModel
, this is 2:
julia> m1 = BinaryTraitSubstitutionModel([1.0,2.0], ["low","high"])
Binary Trait Substitution Model:
rate low→high α=1.0
rate high→low β=2.0
julia> nstates(m1)
2
PhyloTraits.nstates
— MethodFor example, this is 4 for a NucleicAcidSubstitutionModel
.
julia> nstates(JC69([0.03], false))
4
julia> nstates(HKY85([.5], [0.25, 0.25, 0.25, 0.25]))
4
PhyloTraits.nstates
— Methodnstates(model)
Number of character states for a given evolution model.
PhyloTraits.phylolm
— Functionphylolm(X::Matrix, Y::Vector, net::HybridNetwork, model::ContinuousTraitEM=BM(); kwargs...)
Return a PhyloNetworkLinearModel
object. This method is called by phylolm(formula, data, network; kwargs...)
.
PhyloTraits.phylolm
— Methodphylolm(f::StatsModels.FormulaTerm, fr::AbstractDataFrame, net::HybridNetwork; kwargs...)
Fit a phylogenetic linear regression model to data. Return an object of type PhyloNetworkLinearModel
. It contains a linear model from the GLM package, in object.lm
, of type GLM.LinearModel.
Arguments
f
: formula to use for the regression, see StatsModelsfr
: DataFrame containing the response values, predictor values, species/tip labels for each observation/row.net
: phylogenetic network to use. Should have labelled tips.
Keyword arguments
model="BM"
: model for trait evolution (as a string) "lambda" (Pagel's lambda), "scalinghybrid" are other possible values (seeContinuousTraitEM
)tipnames=:tipnames
: column name for species/tip-labels, represented as a symbol. For example, if the column containing the species/tip labels infr
is named "Species", then dotipnames=:Species
.no_names=false
: Iftrue
, force the function to ignore the tips names. The data is then assumed to be in the same order as the tips of the network. Default is false, setting it to true is dangerous, and strongly discouraged.reml=true
: iftrue
, use REML criterion ("restricted maximum likelihood") for estimating variance components, else use ML criterion.suppresswarnings=false
: iftrue
, PhyloTraits-specific warnings are suppressed. Currently, only Pagel's lambda model may throw a warning (if the network is not time-consistent).
The following tolerance parameters control the optimization of lambda if model="lambda"
or model="scalinghybrid"
, and control the optimization of the variance components if model="BM"
and withinspecies_var=true
.
fTolRel=1e-10
: relative tolerance on the likelihood valuefTolAbs=1e-10
: absolute tolerance on the likelihood valuexTolRel=1e-10
: relative tolerance on the parameter valuexTolAbs=1e-10
: absolute tolerance on the parameter valuestartingValue=0.5
: Ifmodel
="lambda" or "scalinghybrid", this provides the starting value for the optimization in lambda.fixedValue=missing
: Ifmodel
="lambda" or "scalinghybrid", andfixedValue
is a number, then lambda is set to this number and is not optimized.withinspecies_var=false
: Iftrue
, fits a within-species variation model. Currently only implemented formodel
="BM".y_mean_std::Bool=false
: Iftrue
, andwithinspecies_var=true
, then accounts for within-species variation, using species-level statistics provided infr
.
Methods applied to fitted models
To access the response values, do response(object)
. To access the model matrix, do modelmatrix(object)
. To access the model formula, do formula(object)
.
Within-species variation
For a high-level description, see PhyloNetworkLinearModel
. To fit a model with within-species variation in the response variable, either of the following must be provided in the data frame fr
:
(1) Individual-level data: There should be columns for response, predictors, and species/tip-labels. Every row should correspond to an individual observation. At least one species must be represented by two or more individuals.
(2) Species-level statistics: There should be columns for mean response, predictors, species/tip-labels, species sample-sizes (number of individuals for each species), and species standard deviations (standard deviations of the response values by species). Every row should correspond to a species: each species should be represented by a unique row. The column names for species sample-sizes and species standard deviations are expected to be "[response column name]_n" and "[response column name]_sd". For example, if the response column name is "y", then the column names should be "y_n" and "y_sd" for the sample-sizes and standard deviations.
Regardless of whether the data provided follows (1) or (2), withinspecies_var
should be set to true. If the data provided follows (2), then y_mean_std
should be set to false.
Within-species variation in predictors
The model assumes no within-species variation in predictors, because it aims to capture the evolutionary (historical, phylogenetic) relationship between the predictors and the response, not the within-species (present-day, or phenotypic) relationship.
If a within-species variation model is fitted on individual-level data, and if there are individuals within the same species with different values for the same predictor, these values are all replaced by the mean predictor value for all the individuals in that species. For example, suppose there are 3 individuals in a given species, and that their predictor values are (x₁=3, x₂=6), (x₁=4, x₂=8) and (x₁=2, x₂=1). Then the predictor values for these 3 individuals are each replaced by (x₁=(3+4+2)/3, x₂=(6+8+1)/3) before model fitting. If a fourth individual had data (x₁=10, x₂=missing), then that individual would be ignored for any model using x₂, and would not contribute any information to its species data for these models.
Missing data
Rows with missing data for either the response or the predictors are omitted from the model-fitting. There should minimally be columns for response, predictors, species/tip-labels. As detailed above, additional columns may be required for fitting within-species variation. Missing data in the columns for species names, species standard deviation / sample sizes (if used) will throw an error.
See also
rand
, ancestralreconstruction
, PhyloNetworks.vcv
Examples: Without within-species variation
We first load data from the package and fit the default BM model.
julia> phy = readnewick(joinpath(dirname(pathof(PhyloTraits)), "..", "examples", "caudata_tree.txt"));
julia> using DataFrames, CSV # to read data file, next
julia> dat = CSV.read(joinpath(dirname(pathof(PhyloTraits)), "..", "examples", "caudata_trait.txt"), DataFrame);
julia> using StatsModels # for stat model formulas
julia> fitBM = phylolm(@formula(trait ~ 1), dat, phy; reml=false);
julia> fitBM # Shows a summary
PhyloNetworkLinearModel
Formula: trait ~ 1
Model: Brownian motion
Parameter Estimates, using ML:
phylogenetic variance rate: 0.00294521
Coefficients:
─────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
─────────────────────────────────────────────────────────────────────
(Intercept) 4.679 0.330627 14.15 <1e-31 4.02696 5.33104
─────────────────────────────────────────────────────────────────────
Log Likelihood: -78.9611507833
AIC: 161.9223015666
We can extra parameters, likelihood, AIC etc.
julia> round(sigma2_phylo(fitBM), digits=6) # rounding for jldoctest convenience
0.002945
julia> round(mu_phylo(fitBM), digits=4)
4.679
julia> using StatsBase # for aic() stderror() loglikelihood() etc.
julia> round(loglikelihood(fitBM), digits=10)
-78.9611507833
julia> round(aic(fitBM), digits=10)
161.9223015666
julia> round(aicc(fitBM), digits=10)
161.9841572367
julia> round(bic(fitBM), digits=10)
168.4887090241
julia> round.(coef(fitBM), digits=4)
1-element Vector{Float64}:
4.679
julia> confint(fitBM) # 95% (default) confidence interval for the coefficient(s)
1×2 Matrix{Float64}:
4.02696 5.33104
julia> abs(round(r2(fitBM), digits=10)) # absolute value for jldoctest convenience
0.0
julia> abs(round(adjr2(fitBM), digits=10))
0.0
julia> round.(vcov(fitBM), digits=6) # variance-covariance of estimated parameters: squared standard error
1×1 Matrix{Float64}:
0.109314
The residuals are the variance not explained by predictors. The phylogenetic correlation modelled by the BM is about them. The trait may have 2 sources of phylogenetic signal: from the predictor with which it the response may be associated, and from the residuals.
julia> round.(residuals(fitBM), digits=6)
197-element Vector{Float64}:
-0.237648
-0.357937
-0.159387
-0.691868
-0.323977
-0.270452
-0.673486
-0.584654
-0.279882
-0.302175
⋮
-0.777026
-0.385121
-0.443444
-0.327303
-0.525953
-0.673486
-0.603158
-0.211712
-0.439833
julia> round.(response(fitBM), digits=5)
197-element Vector{Float64}:
4.44135
4.32106
4.51961
3.98713
4.35502
4.40855
4.00551
4.09434
4.39912
4.37682
⋮
3.90197
4.29388
4.23555
4.3517
4.15305
4.00551
4.07584
4.46729
4.23917
julia> round.(predict(fitBM), digits=5)
197-element Vector{Float64}:
4.679
4.679
4.679
4.679
4.679
4.679
4.679
4.679
4.679
4.679
⋮
4.679
4.679
4.679
4.679
4.679
4.679
4.679
4.679
4.679
Examples: With within-species variation (two different input formats shown)
We use a smaller network here. We can input data as 1 row per individual, multiple rows per species:
julia> net = readnewick("((((D:0.4,C:0.4):4.8,((A:0.8,B:0.8):2.2)#H1:2.2::0.7):4.0,(#H1:0::0.3,E:3.0):6.2):2.0,O:11.2);");
julia> df = DataFrame( # individual-level observations
species = repeat(["D","C","A","B","E","O"],inner=3),
trait1 = [4.08298,4.08298,4.08298,3.10782,3.10782,3.10782,2.17078,2.17078,2.17078,1.87333,1.87333,
1.87333,2.8445,2.8445,2.8445,5.88204,5.88204,5.88204],
trait2 = [-7.34186,-7.34186,-7.34186,-7.45085,-7.45085,-7.45085,-3.32538,-3.32538,-3.32538,-4.26472,
-4.26472,-4.26472,-5.96857,-5.96857,-5.96857,-1.99388,-1.99388,-1.99388],
trait3 = [18.8101,18.934,18.9438,17.0687,17.0639,17.0732,14.4818,14.1112,14.2817,13.0842,12.9562,
12.9019,15.4373,15.4075,15.4317,24.2249,24.1449,24.1302]);
julia> m1 = phylolm(@formula(trait3 ~ trait1), df, net;
tipnames=:species, withinspecies_var=true)
PhyloNetworkLinearModel
Formula: trait3 ~ 1 + trait1
Model: Brownian motion
Parameter Estimates, using REML:
phylogenetic variance rate: 0.156188
within-species variance: 0.0086343
Coefficients:
──────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
──────────────────────────────────────────────────────────────────────
(Intercept) 9.65347 1.3066 7.39 0.0018 6.02577 13.2812
trait1 2.30358 0.276163 8.34 0.0011 1.53683 3.07033
──────────────────────────────────────────────────────────────────────
Log Likelihood: 1.9446255188
AIC: 4.1107489623
Alternatively, we can input the data as 1 row per species and 2 extra columns: standard deviation of the response trait among individuals in the same species, and number of individuals per species for which we calculated this SD. The result is the same.
julia> df_r = DataFrame( # species-level statistics (sample means, standard deviations)
species = ["D","C","A","B","E","O"],
trait1 = [4.08298,3.10782,2.17078,1.87333,2.8445,5.88204],
trait2 = [-7.34186,-7.45085,-3.32538,-4.26472,-5.96857,-1.99388],
trait3 = [18.896,17.0686,14.2916,12.9808,15.4255,24.1667],
trait3_sd = [0.074524,0.00465081,0.185497,0.0936,0.0158379,0.0509643],
trait3_n = [3, 3, 3, 3, 3, 3]);
julia> m2 = phylolm(@formula(trait3 ~ trait1), df_r, net;
tipnames=:species, withinspecies_var=true, y_mean_std=true)
PhyloNetworkLinearModel
Formula: trait3 ~ 1 + trait1
Model: Brownian motion
Parameter Estimates, using REML:
phylogenetic variance rate: 0.15618
within-species variance: 0.0086343
Coefficients:
──────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
──────────────────────────────────────────────────────────────────────
(Intercept) 9.65342 1.30657 7.39 0.0018 6.02582 13.281
trait1 2.30359 0.276156 8.34 0.0011 1.53686 3.07032
──────────────────────────────────────────────────────────────────────
Log Likelihood: 1.9447243714
AIC: 4.1105512573
PhyloTraits.shiftathybrids
— Methodshiftathybrids(value::Vector{T} where T<:Real, net::HybridNetwork; checkpreorder::Bool=true)
Construct an object ShiftNet
with shifts on all the edges below hybrid nodes, with values provided. The vector of values must have the same length as the number of hybrids in the network.
PhyloTraits.sigma2_phylo
— Methodsigma2_phylo(m::PhyloNetworkLinearModel)
Estimated between-species variance-rate for a fitted object.
PhyloTraits.sigma2_within
— Methodsigma2_within(m::PhyloNetworkLinearModel)
Estimated within-species variance for a fitted object.
PhyloTraits.stationary
— Methodstationary(substitutionmodel)
Stationary distribution of a Markov model
Random.rand!
— Methodrand!(rng::AbstractRNG,
end::AbstractVector{Int},
model::TraitSubstitutionModel,
t::Float64,
start::AbstractVector{Int})
Simulate discrete traits along one edge of length t
like rand
, but modifying end
in place to store the simulated values.
PhyloTraits.BinaryTraitSubstitutionModel
— TypeBinaryTraitSubstitutionModel(α, β [, label])
Model for binary traits, that is, with 2 states. Default labels are "0" and "1". α is the rate of transition from "0" to "1", and β from "1" to "0".
PhyloTraits.EqualRatesSubstitutionModel
— TypeEqualRatesSubstitutionModel(numberStates, α, labels)
TraitSubstitutionModel
for traits with any number of states and equal substitution rates α between all states. Default labels are "1","2",...
example
julia> m1 = EqualRatesSubstitutionModel(2, [1.0], ["low","high"])
Equal Rates Substitution Model with k=2,
all rates equal to α=1.0.
rate matrix Q:
low high
low * 1.0000
high 1.0000 *
PhyloTraits.HKY85
— TypeHKY85(rate, pi, relative)
A nucleic acid substitution model based on Hasegawa et al. 1985 substitution model. rate
should be a vector of 1 or 2 rates, and pi
a vector of 4 probabilities summing to 1.
If relative
is false, the 2 rates represent the transition rate and the transversion rate, α and β. If relative
is true (default), only the first rate is used and represents the transition/transversion ratio: κ=α/β. The rate transition matrix Q is normalized to have 1 change / unit of time on average, i.e. the absolute version of Q is divided by 2(piT*piC + piA*piG)α + 2(piY*piR)β
.
nparams
returns 1 or 2. In other words: the stationary distribution is not counted in the number of parameters (and fitdiscrete
does not optimize the pi values at the moment).
examples
julia> m1 = HKY85([.5], [0.20, 0.30, 0.30, 0.20])
HKY85 Substitution Model base frequencies: [0.2, 0.3, 0.3, 0.2]
relative rate version with transition/tranversion ratio kappa = 0.5,
scaled so that there is one substitution per unit time
rate matrix Q:
A C G T
A * 0.4839 0.2419 0.3226
C 0.3226 * 0.4839 0.1613
G 0.1613 0.4839 * 0.3226
T 0.3226 0.2419 0.4839 *
julia> nstates(m1)
4
julia> m2 = HKY85([0.5, 0.5], [0.20, 0.30, 0.30, 0.20], false)
HKY85 Substitution Model base frequencies: [0.2, 0.3, 0.3, 0.2]
absolute rate version with transition/transversion ratio kappa = a/b = 1.0
with rates a = 0.5 and b = 0.5
rate matrix Q:
A C G T
A * 0.1500 0.1500 0.1000
C 0.1000 * 0.1500 0.1000
G 0.1000 0.1500 * 0.1000
T 0.1000 0.1500 0.1500 *
PhyloTraits.JC69
— TypeJC69(rate, relative)
Jukes Cantor (1969) nucleic acid substitution model, which has a single rate parameter. rate
corresponds to the absolute diagonal elements, that is, the rate of change (to any of the other 2 states). Individual rates are rate
/3. If relative
is true (default), the transition matrix Q
is normalized to an average of 1 transition per unit of time: in which case rate
is set to 1.0.
examples
julia> m1 = JC69([0.25], false)
Jukes and Cantor 69 Substitution Model,
absolute rate version
off-diagonal rates equal to 0.25/3.
rate matrix Q:
A C G T
A * 0.0833 0.0833 0.0833
C 0.0833 * 0.0833 0.0833
G 0.0833 0.0833 * 0.0833
T 0.0833 0.0833 0.0833 *
julia> nstates(m1)
4
julia> nparams(m1)
1
julia> m2 = JC69([0.5])
Jukes and Cantor 69 Substitution Model,
relative rate version
off-diagonal rates equal to 1/3
rate matrix Q:
A C G T
A * 0.3333 0.3333 0.3333
C 0.3333 * 0.3333 0.3333
G 0.3333 0.3333 * 0.3333
T 0.3333 0.3333 0.3333 *
julia> nparams(m2)
0
PhyloTraits.ParamsBM
— TypeParamsBM <: ParamsProcess
Type for a BM process on a network. Fields are mu
(expectation), sigma2
(variance), randomRoot
(whether the root is random, default to false
), and varRoot
(if the root is random, the variance of the root, default to NaN
).
PhyloTraits.ParamsMultiBM
— TypeParamsMultiBM <: ParamsProcess
Type for a multivariate Brownian diffusion (MBD) process on a network. Fields are mu
(expectation), sigma
(covariance matrix), randomRoot
(whether the root is random, default to false
), varRoot
(if the root is random, the covariance matrix of the root, default to [NaN]
), shift
(a ShiftNet type, default to missing
), and L
(the lower triangular of the cholesky decomposition of sigma
, computed automatically)
examples and constructors
julia> ParamsMultiBM([1.0, -0.5], [2.0 0.3; 0.3 1.0]) # no shifts
ParamsMultiBM:
Parameters of a MBD with fixed root:
mu: [1.0, -0.5]
Sigma: [2.0 0.3; 0.3 1.0]
julia> net = readnewick("((A:1,B:1):1,C:2);");
julia> shifts = ShiftNet(net.node[2], [-1.0, 2.0], net);
julia> ParamsMultiBM([1.0, -0.5], [2.0 0.3; 0.3 1.0], shifts) # with shifts
ParamsMultiBM:
Parameters of a MBD with fixed root:
mu: [1.0, -0.5]
Sigma: [2.0 0.3; 0.3 1.0]
There are 2 shifts on the network:
───────────────────────────────────────────
Edge Number Shift Value 1 Shift Value 2
───────────────────────────────────────────
2.0 -1.0 2.0
───────────────────────────────────────────
PhyloTraits.PhyloNetworkLinearModel
— TypePhyloNetworkLinearModel <: GLM.LinPredModel
Phylogenetic linear model representation.
Fields
lm
, V
, Vy
, RL
, Y
, X
, logdetVy
, reml
, ind
, nonmissing
, evomodel
, model_within
and formula
. The following syntax pattern can be used to get more information on a specific field: e.g. to find out about the lm
field, do ?PhyloNetworkLinearModel.lm
.
Methods applied to fitted models
The following StatsAPI / StatsBase functions can be applied: coef
, nobs
, vcov
, stderror
, confint
, coeftable
, dof_residual
, dof
, deviance
, residuals
, response
, predict
, loglikelihood
, nulldeviance
, nullloglikelihood
, r2
, adjr2
, aic
, aicc
, bic
, ftest
, lrtest
etc.
The estimated variance-rate and estimated mean of the species-level trait model (see ContinuousTraitEM
) can be retrieved using sigma2_phylo
and mu_phylo
respectively.
If relevant, the estimated individual-level/within-species variance can be retrieved using sigma2_within
.
The optimized λ parameter for Pagel's λ model (see PagelLambda
) can be retrieved using lambda_estim
.
An ancestral state reconstruction can be performed using ancestralreconstruction
.
Within-species variation
The true species/population means for the response trait/variable (or the residuals: conditional on the predictors) are jointly modeled as 𝒩(·, σ²ₛV) where V depends on the trait model (see ContinuousTraitEM
) and on the species network. σ²ₛ is the between-species variance-rate.
Within-species variation is modeled by assuming that the individual-level responses are iid 𝒩(0, σ²ₑ) about the true species means, so that the species-level sample means (conditional on the predictors) are jointly modeled as 𝒩(·, σ²ₛV + σ²ₑD⁻¹), where σ²ₑ is the within-species variance and D⁻¹ is a diagonal matrix whose entries are the inverse sample-sizes (see WithinSpeciesCTM
).
Although the above two models can be expressed in terms of a joint distribution for the species-level sample means (or residuals conditional on the predictors), more data are required to fit a model accounting for within-species variation, that is, a model recognizing that the sample means are estimates of the true population means. To fit a model without within-species variation, data on the species means are sufficient. To fit a model with within-species variation, we need to have the species means and the standard deviations of the response variable for each species.
phylolm
can fit a model with within-species variation either from species-level statistics ("mean response" and "standard deviation in response") or from individual-level data (in which case the species-level statistics are computed internally). See phylolm
for more details on these two input choices.
In the object, obj.Y
and obj.X
are the observed species means. predict
, residuals
and response
return the values at the species level.
PhyloTraits.RateVariationAcrossSites
— TypeRateVariationAcrossSites(rvsymbol::Symbol, ncategories::Int=4)
Default model for rate variation across site, specified by a symbol:
:noRV
for no rate variation:G
or:Gamma
for gamma-distributed rates:I
or:Inv
for two categories: invariable and variable:GI
or:GI
for both.
PhyloTraits.RateVariationAcrossSites
— MethodRateVariationAcrossSites(; pinv=0.0, alpha=Inf, ncat=4)
Model for variable substitution rates across sites (or across traits) using the discrete Gamma model (+G, Yang 1994, Journal of Molecular Evolution) or the invariable-sites model (+I, Hasegawa, Kishino & Yano 1985 J Mol Evol). Both types of rate variation can be combined (+G+I, Gu, Fu & Li 1995, Mol Biol Evol) but this is discouraged (Jia, Lo & Ho 2014 PLOS One). Using rate variation increases the number of parameters by one (+G or +I) or by two (+G+I).
Because the mean of the desired distribution or rates is 1, we use a Gamma distribution with shape α and scale θ=1/α (rate β=α) if no invariable sites, or scale θ=1/(α(1-pinv)), that is rate β=α(1-pinv) with a proportion pinv of invariable sites. The shape parameter is referred to as alpha here. The Gamma distribution is discretized into ncat
categories. In each category, the category's rate multiplier is a normalized quantile of the gamma distribution.
julia> rv = RateVariationAcrossSites()
Rate variation across sites: discretized Gamma
categories for Gamma discretization: 1
rates: [1.0]
julia> nparams(rv)
0
julia> typeof(rv)
PhyloTraits.RVASGamma{1}
julia> rv = RateVariationAcrossSites(alpha=1.0, ncat=4)
Rate variation across sites: discretized Gamma
alpha: 1.0
categories for Gamma discretization: 4
rates: [0.146, 0.513, 1.071, 2.27]
julia> typeof(rv)
PhyloTraits.RVASGamma{4}
julia> PhyloTraits.setalpha!(rv, 2.0)
Rate variation across sites: discretized Gamma
alpha: 2.0
categories for Gamma discretization: 4
rates: [0.319, 0.683, 1.109, 1.889]
julia> nparams(rv)
1
julia> rv = RateVariationAcrossSites(pinv=0.3)
Rate variation across sites: +I (invariable sites)
pinv: 0.3
rates: [0.0, 1.429]
julia> nparams(rv)
1
julia> typeof(rv)
PhyloTraits.RVASInv
julia> PhyloTraits.setpinv!(rv, 0.05)
Rate variation across sites: +I (invariable sites)
pinv: 0.05
rates: [0.0, 1.053]
julia> rv = RateVariationAcrossSites(pinv=0.3, alpha=2.0, ncat=4)
Rate variation across sites: discretized Gamma+I
pinv: 0.3
alpha: 2.0
categories for Gamma discretization: 4
rates: [0.0, 0.456, 0.976, 1.584, 2.698]
probabilities: [0.3, 0.175, 0.175, 0.175, 0.175]
julia> nparams(rv)
2
julia> typeof(rv)
PhyloTraits.RVASGammaInv{5}
julia> PhyloTraits.setalpha!(rv, 3.0)
Rate variation across sites: discretized Gamma+I
pinv: 0.3
alpha: 3.0
categories for Gamma discretization: 4
rates: [0.0, 0.6, 1.077, 1.584, 2.454]
probabilities: [0.3, 0.175, 0.175, 0.175, 0.175]
julia> PhyloTraits.setpinv!(rv, 0.05)
Rate variation across sites: discretized Gamma+I
pinv: 0.05
alpha: 3.0
categories for Gamma discretization: 4
rates: [0.0, 0.442, 0.793, 1.167, 1.808]
probabilities: [0.05, 0.238, 0.238, 0.238, 0.238]
julia> PhyloTraits.setpinvalpha!(rv, 0.1, 5.0)
Rate variation across sites: discretized Gamma+I
pinv: 0.1
alpha: 5.0
categories for Gamma discretization: 4
rates: [0.0, 0.593, 0.91, 1.221, 1.721]
probabilities: [0.1, 0.225, 0.225, 0.225, 0.225]
PhyloTraits.ReconstructedStates
— TypeReconstructedStates
Type containing the inferred information about the law of the ancestral states given the observed tips values. The missing tips are considered as ancestral states.
Reconstructed states and prediction intervals can be recovered with function predict
, and the standard error can be obtained with stderror
.
The ReconstructedStates
object has fields: traits_nodes
, variances_nodes
, nodenumbers
, traits_tips
, tipnumbers
, model
. Type in "?ReconstructedStates.field" to get help on a specific field.
PhyloTraits.ShiftNet
— TypeShiftNet
Shifts mapped to tree nodes and their (unique) parent edge on a PhyloNetworks.HybridNetwork
sorted in topological order. Its shift
field is a vector of shift values, one for each node, corresponding to the shift on the parent edge of the node (which makes sense for tree nodes only: they have a single parent edge).
Two ShiftNet
objects on the same network can be concatened with *
.
ShiftNet(node::Vector{Node}, value::AbstractVector, net::HybridNetwork; checkpreorder::Bool=true)
Constructor from a vector of nodes and associated values. The shifts are located on the edges above the nodes provided. Warning, shifts on hybrid edges are not allowed.
ShiftNet(edge::Vector{Edge}, value::AbstractVector, net::HybridNetwork; checkpreorder::Bool=true)
Constructor from a vector of edges and associated values. Warning, shifts on hybrid edges are not allowed.
Extractors: getshiftedgenumber
, getshiftvalue
PhyloTraits.TraitSimulation
— TypeTraitSimulation
Result of a trait simulation on an PhyloNetworks.HybridNetwork
with rand
.
The following functions and extractors can be applied to it: tiplabels
, obj[:tips]
, obj[:internalnodes]
(see documentation for function getindex(::TraitSimulation, ::Symbol)
).
The TraitSimulation
object has fields: M
, params
, evomodel
.
PhyloTraits.TraitSubstitutionModel
— TypeTraitSubstitutionModel
For subtypes, see BinaryTraitSubstitutionModel
, EqualRatesSubstitutionModel
, TwoBinaryTraitSubstitutionModel
PhyloTraits.TwoBinaryTraitSubstitutionModel
— TypeTwoBinaryTraitSubstitutionModel(rate [, label])
TraitSubstitutionModel
for two binary traits, possibly correlated. Default labels are "x0", "x1" for trait 1, and "y0", "y1" for trait 2. If provided, label
should be a vector of size 4, listing labels for trait 1 first then labels for trait 2. rate
should be a vector of substitution rates of size 8. rate[1],...,rate[4] describe rates of changes in trait 1. rate[5],...,rate[8] describe rates of changes in trait 2. In the transition matrix, trait combinations are listed in the following order: x0-y0, x0-y1, x1-y0, x1-y1.
example
model = TwoBinaryTraitSubstitutionModel([2.0,1.2,1.1,2.2,1.0,3.1,2.0,1.1],
["carnivory", "noncarnivory", "wet", "dry"]);
model
using PhyloPlots
plot(model) # to visualize states and rates. not supported with PhyloPlots v2.0.0. requires PhyloPlots v1.0.0
index
PhyloTraits.BinaryTraitSubstitutionModel
PhyloTraits.EqualRatesSubstitutionModel
PhyloTraits.HKY85
PhyloTraits.JC69
PhyloTraits.ParamsBM
PhyloTraits.ParamsMultiBM
PhyloTraits.PhyloNetworkLinearModel
PhyloTraits.RateVariationAcrossSites
PhyloTraits.RateVariationAcrossSites
PhyloTraits.ReconstructedStates
PhyloTraits.ShiftNet
PhyloTraits.TraitSimulation
PhyloTraits.TraitSubstitutionModel
PhyloTraits.TwoBinaryTraitSubstitutionModel
Base.rand
Base.rand
Base.rand
PhyloTraits.Q
PhyloTraits.Q
PhyloTraits.ancestralreconstruction
PhyloTraits.ancestralreconstruction
PhyloTraits.ancestralreconstruction
PhyloTraits.ancestralreconstruction
PhyloTraits.descendencedataframe
PhyloTraits.empiricalDNAfrequencies
PhyloTraits.fitdiscrete
PhyloTraits.getlabels
PhyloTraits.getlabels
PhyloTraits.getshiftedgenumber
PhyloTraits.getshiftvalue
PhyloTraits.lambda_estim
PhyloTraits.mu_phylo
PhyloTraits.nparams
PhyloTraits.nparams
PhyloTraits.nparams
PhyloTraits.nstates
PhyloTraits.nstates
PhyloTraits.nstates
PhyloTraits.phylolm
PhyloTraits.phylolm
PhyloTraits.shiftathybrids
PhyloTraits.sigma2_phylo
PhyloTraits.sigma2_within
PhyloTraits.stationary
Random.rand!