Introduction

With PhyloNetworks installed, we can load the package and start using it to read, manipulate, and analyze phylogenetic trees and networks in Julia.

julia> using PhyloNetworks

Here is a very small test to see if we correctly installed and loaded PhyloNetworks.

julia> net = readnewick("(A,(B,(C,D)));");
julia> tiplabels(net)
4-element Vector{String}:
 "A"
 "B"
 "C"
 "D"
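As an optional extra check, here is a minimal sketch that assumes the PhyloNetworks v1 function writenewick, which returns the parenthetical string when no file name is given. It writes the tree back to newick, reads that string again, and compares the tip labels. The variable names (newick_out, net2, same_tips) are only illustrative.

```julia
using PhyloNetworks

net = readnewick("(A,(B,(C,D)));")
# write the network back to a newick string (no file argument given)
newick_out = writenewick(net)
println(newick_out)
# read the string again and compare the sets of tip labels
net2 = readnewick(newick_out)
same_tips = sort(tiplabels(net)) == sort(tiplabels(net2))
println("same tips after round trip: ", same_tips)
```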

You can see a list of all the functions (and other names) exported by the package with

varinfo(PhyloNetworks)

You can also press ? inside Julia to switch to help mode, then type the name of a function (or type) to get more details about it.
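The same information can be queried from code. This sketch uses only base Julia: names(PhyloNetworks) returns the exported names as a vector of Symbols, and @doc retrieves the docstring that help mode would show. The list of functions checked below is just a sample taken from this tutorial.

```julia
using PhyloNetworks

# names exported by the package, as a vector of Symbols
exported = names(PhyloNetworks)
println(length(exported), " exported names")
# check that a few functions used in this tutorial are among them
for f in (:readnewick, :readmultinewick, :tiplabels, :printedges)
    println(f, " exported: ", f in exported)
end
# programmatic equivalent of typing ?readnewick in help mode
display(@doc readnewick)
```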

Often you may wish to work in the directory that contains your data. To change the directory used by Julia in a session, say to the "examples" folder found in the PhyloNetworks package, you have 2 options:

1. quit your session, navigate to the directory and restart Julia there;
2. or change the working directory within your Julia session with the function cd().

The following code changes the working directory to the examples folder within PhyloNetworks' source directory.

julia> examples_path = joinpath(dirname(dirname(pathof(PhyloNetworks))), "examples");
julia> cd(examples_path)

You will need to set the path to the folder where your data are located.
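To check where Julia currently looks for files, or to work in a folder only temporarily, base Julia provides pwd() and the do-block form of cd, which runs a function inside the given directory and then returns to the original one. A minimal sketch; examples_path is recomputed here so that the block runs on its own, and the variable names start_dir and found are ours.

```julia
using PhyloNetworks

# folder "examples" at the root of the installed PhyloNetworks package
examples_path = joinpath(dirname(dirname(pathof(PhyloNetworks))), "examples")
start_dir = pwd()
println("current directory: ", start_dir)

# cd(f, dir) changes to dir, runs f, then returns to the previous directory
found = cd(examples_path) do
    isfile("raxmltrees.tre")   # a relative path works inside the block
end
println("raxmltrees.tre found in examples folder: ", found)
println("back in: ", pwd())
```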

Julia types

Each object in Julia has a type. We show here small examples of how to get more information on an object. To know the type of a particular object, use typeof. For example, let's read a list of gene trees that come with the package. First, we need the file name, which we build from the path to the "examples" folder:

julia> raxmltreefile = joinpath(examples_path, "raxmltrees.tre")
"/home/runner/work/PhyloNetworks.jl/PhyloNetworks.jl/examples/raxmltrees.tre"
julia> # raxmltreefile = "raxmltrees.tre" # if your working directory contains the file
       typeof(raxmltreefile)
String

The object raxmltreefile is a basic string (of characters). Let's create our list of gene trees by reading this file. Note that if you changed your working directory as mentioned above, you do not need joinpath to join the path to the examples folder with the file name.

julia> genetrees = readmultinewick(raxmltreefile); # the semicolon suppresses info on the result
julia> typeof(genetrees)
Vector{HybridNetwork} (alias for Array{HybridNetwork, 1})

which shows us that genetrees is of type Vector{HybridNetwork}, that is, a vector containing networks. If we want to know about the attributes the object has, we can type ? in Julia, followed by HybridNetwork for a description.
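Besides typeof, base Julia offers eltype, to get the type of the elements of a vector, and fieldnames, to list the fields (attributes) of a type. This makes a quick complement to the help entry for HybridNetwork. A self-contained sketch that rereads the gene trees:

```julia
using PhyloNetworks

treefile = joinpath(dirname(dirname(pathof(PhyloNetworks))),
                    "examples", "raxmltrees.tre")
genetrees = readmultinewick(treefile)
println(typeof(genetrees))
println(eltype(genetrees))                 # type of each element
println(genetrees isa Vector{HybridNetwork})
# names of the fields (attributes) stored in each HybridNetwork object
println(fieldnames(HybridNetwork))
```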

Typing varinfo() will provide a list of objects and packages in memory, including raxmltreefile and genetrees that we just created.

Quick start

As a sanity check, we can verify that the list contains the number of gene trees we expected, and that the third tree has the taxon names we expected:

julia> length(genetrees)
30
julia> tiplabels(genetrees[3])
6-element Vector{String}:
 "E"
 "A"
 "B"
 "C"
 "D"
 "O"
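To check all gene trees at once rather than one at a time, we can collect the taxon names of every tree and compare each tree against the full set. A sketch combining tiplabels with the base Julia functions union, issetequal and findall; the variable names are ours.

```julia
using PhyloNetworks

treefile = joinpath(dirname(dirname(pathof(PhyloNetworks))),
                    "examples", "raxmltrees.tre")
genetrees = readmultinewick(treefile)
# taxon names in each gene tree (one vector per tree)
taxa_per_tree = tiplabels.(genetrees)
# all taxon names found in at least one gene tree
alltaxa = sort(reduce(union, taxa_per_tree))
println("taxa across all gene trees: ", alltaxa)
# indices of gene trees whose taxon set differs from the full set
incomplete = findall(t -> !issetequal(t, alltaxa), taxa_per_tree)
println("gene trees with missing taxa: ", incomplete)
```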

We can also see some basic information on the third gene tree, say:

julia> genetrees[3]
HybridNetwork, Rooted Network
9 edges
10 nodes: 6 tips, 0 hybrid nodes, 4 internal tree nodes.
tip labels: E, A, B, C, ...
((E:0.015,(A:0.006,B:0.006):0.003):0.041,(C:0.006,D:0.0):0.041,O:0.052);

To visualize any of these gene trees, use the PhyloPlots package:

julia> using PhyloPlots
julia> plot(genetrees[3]); # tree for 3rd gene

(figure: plot of the third gene tree)

Phylogenetic networks

In phylogenetics, there are two types of networks:

Explicit networks have a biological interpretation: internal nodes represent ancestral species (or populations), and the main evolutionary history is depicted by the "major tree". Various methods that estimate explicit networks use models that account for incomplete lineage sorting (ILS) and for gene tree estimation error.

(figure: example of an explicit network)

Implicit networks are typically descriptive: internal nodes do not represent ancestral species. Implicit networks do not discriminate between ILS, gene flow/hybridization or gene tree estimation error, and can be hard to interpret biologically.

In PhyloNetworks, we consider explicit phylogenetic networks exclusively.

Extended newick format

In parenthetical format, internal nodes can have a name. For example, in the tree written as (A,B)C in newick format, the internal node that is the parent of A and B is named C.

To represent networks in parenthetical format, the extended newick format splits each hybrid node into two nodes with the same name:

(figure: a hybrid node split into 2 nodes of the same name)

By convention, the hybrid tag is # followed by H, LGT or R and a number (such as #H1), and the minor hybrid edge leads to the copy of the hybrid node that is a leaf.

Thus, we get: (((A,(B)#H1),(C,#H1)),D);. We can write inheritance probabilities in the parenthetical format, right after the hybrid tag: (C,#H1:branch length:bootstrap support:inheritance probability).
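For example, here is the same network with branch lengths and inheritance probabilities, with each hybrid edge written as #H1:length::γ (bootstrap support left empty). The values (lengths of 1, γ of 0.9 and 0.1) are made up for illustration. This sketch reads the string and lists the γ values stored on the hybrid edges:

```julia
using PhyloNetworks

# same topology as above, with branch lengths and γ on both hybrid edges
net = readnewick("(((A:1,(B:1)#H1:1::0.9):1,(C:1,#H1:1::0.1):1):1,D:1);")
# γ values of the hybrid edges only, sorted
hybrid_gammas = sort([e.gamma for e in net.edge if e.hybrid])
println("γ of hybrid edges: ", hybrid_gammas)
```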

We can read a network from a newick-formatted string, and, for example, print a list of its edges:

julia> newickstring = "(((A,(B)#H1),(C,#H1)),D);";
julia> net = readnewick(newickstring);
julia> printedges(net)
edge parent child  length  hybrid ismajor gamma   containroot i_cycle
1    -4     1              false  true    1       true        -1     
2    3      2              false  true    1       false       -1     
3    -4     3              true   true            true        -1     
4    -3     -4             false  true    1       true        -1     
5    -6     4              false  true    1       true        -1     
6    -6     3              true   false           true        -1     
7    -3     -6             false  true    1       true        -1     
8    -2     -3             false  true    1       true        -1     
9    -2     5              false  true    1       true        -1
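The hybrid edges can also be located from code, by testing the hybrid field of each edge in net.edge. A small sketch; in the printedges output above, these are edges 3 and 6.

```julia
using PhyloNetworks

net = readnewick("(((A,(B)#H1),(C,#H1)),D);")
# positions (in net.edge) of the hybrid edges, i.e. the "hybrid = true" rows
hybrid_idx = findall(e -> e.hybrid, net.edge)
println("hybrid edges at positions: ", hybrid_idx)
# all other edges are tree edges
tree_idx = findall(e -> !e.hybrid, net.edge)
println(length(tree_idx), " tree edges")
```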

We see that the edges do not have branch lengths, and the hybrid edges do not have gamma (inheritance) values. We can set them with

julia> setlength!(net.edge[1], 1.9)
julia> setgamma!(net.edge[3], 0.8)
julia> printedges(net)
edge parent child  length  hybrid ismajor gamma   containroot i_cycle
1    -4     1      1.900   false  true    1       true        -1     
2    3      2              false  true    1       false       -1     
3    -4     3              true   true    0.8     true        -1     
4    -3     -4             false  true    1       true        -1     
5    -6     4              false  true    1       true        -1     
6    -6     3              true   false   0.2     true        -1     
7    -3     -6             false  true    1       true        -1     
8    -2     -3             false  true    1       true        -1     
9    -2     5              false  true    1       true        -1

where 1 and 3 are the positions, in the list of edges net.edge, of the edges to modify. We can only change the γ value of hybrid edges, not tree edges (for which γ=1 necessarily). An attempt to change it on a tree edge, as below, causes an error with a message explaining that the edge is a tree edge:

setgamma!(net.edge[4], 0.7)
# should throw this error:
# ERROR: cannot change gamma in a tree edge
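The γ values of the two hybrid edges leading to the same hybrid node sum to 1, which is why setgamma! also updated the partner edge (edge 6 shows 0.2 in the output above). The sketch below checks the stored values directly and wraps the invalid call in try/catch, so that a script can continue after the error instead of stopping. It assumes the edge numbering shown above; the variable name caught is ours.

```julia
using PhyloNetworks

net = readnewick("(((A,(B)#H1),(C,#H1)),D);")
setlength!(net.edge[1], 1.9)
setgamma!(net.edge[3], 0.8)            # edge 3 is a hybrid edge
# the partner hybrid edge (edge 6) was updated so that the two γs sum to 1
println("γ of edges 3 and 6: ", net.edge[3].gamma, ", ", net.edge[6].gamma)

# setgamma! on a tree edge throws an error: catch it instead of stopping
caught = try
    setgamma!(net.edge[4], 0.7)        # edge 4 is a tree edge
    false
catch err
    println("error caught: ", sprint(showerror, err))
    true
end
```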