API

DeadBird.add_internal!Method
add_internal!(dag, ndata, parts, x, n)

For a species tree internal node n, this adds the gene family nodes associated with n to the graph and provides the bound on the number of lineages that survive to the present below n for each gene family. Note that x is a vector of tuples of DAG nodes that each will be joined into a newly added node. The resulting nodes are returned.

!!! note: I believe this also works for multifurcating species trees (like the Csuros Miklos algorithm does too)

source
DeadBird.add_leaves!Method
add_leaves!(dag, ndata, parts, x, n)

For a species tree leaf node n, this adds the vector of (gene) counts x for that species to the graph. This returns for each gene family the corresponding node that was added to the graph

source
DeadBird.cm!Method
cm!(dag, node, model)

Compute the conditional survival probabilities at n using the Csuros & Miklos (2009) algorithm. This assumes the model already contains the computed transition probability matrices W and that the partial loglikelihood vectors for the child nodes in the DAG are already computed and available.

source
DeadBird.conditionfactorMethod
conditionfactor(model)

Compute the condition factor for the model for the associated data filtering strategy.

source
DeadBird.discretizeMethod
discretize(d, K)

Discretize a distribution d in K equal probability classes. Uses the median of each class as representative rate, and rescales the resulting vector x so that mean(x) == mean(d).

Note

Better would be to have the mean value of each class as representative I guess, but the median is much more straightforward to obtain given that we have quantile functions available.

Example

julia> discretize(Gamma(10, 0.1), 5)
5-element Array{Float64,1}:
 0.6269427439826725
 0.8195837806573205
 0.9743503743962694
 1.1475354999847722
 1.4315876009789656
source
DeadBird.example_dataMethod
example_data()

Get some example_data.

Example (and benchmark)

julia> x = DeadBird.example_data();

julia> @btime DeadBird.loglikelihood!(x.dag, x.model)
  36.974 μs (431 allocations: 31.53 KiB)
-26.30930561857625

julia> @btime DeadBird.loglikelihood!(x.mat, x.model)
  32.876 μs (420 allocations: 29.91 KiB)
-26.309305618576246
source
DeadBird.extpMethod
extp(t, λ, μ, ϵ)

Compute the extinction probability of a single lineage evolving according to a linear BDP for time t with rate λ and μ and with extinction probability of a single lineage at t equal to ϵ. This is ∑ᵢℙ{Xₜ=i|X₀=1}ϵ^i

Note

Takes ϵ on a [0,1] scale

source
DeadBird.getθMethod
getθ(m<:RatesModel, node)

Get the parameter values from a RatesModel relevant for a particular node in a phylogeny. Should be implemented for each RatesModel where parameters differ across branches.

source
DeadBird.getϕψMethod
getϕψ(t, λ, μ)

Returns ϕ = μ(eʳ - 1)/(λeʳ - μ) where r = t*(λ-μ) and ψ = ϕ*λ/μ, with special cases for λ ≈ μ. These methods should be implemented as to prevent underflow/overflow issues. Note these quantities are also called p and q (in Csuros & Miklos) or α and β (in Bailey). Note that ϕ = P(Xₜ=0|X₀=1), i.e. the extinction probability for a single gene.

source
DeadBird.getϕψ′Method
getϕψ′(ϕ, ψ, ϵ)

Adjusted ϕ and ψ for a linear BDP process with extinction probability ϵ after the process.

\[\phi' = \frac{\psi(1-\epsilon) + (1 - \psi) \epsilon}{1 - \psi \epsilon} \psi' = \frac{\psi(1-\epsilon)}{1 - \psi \epsilon}\]

Some edge cases are when ϵ is 1 or 0. Other edge cases may be relevant when ψ and or ϕ is 1 or 0.

Note

We take ϵ on [0,1] scale.

source
DeadBird.insertwgmsMethod
insertwgms(model, wgms::Dict)

Insert a bunch of WGMs in a given PhyloBDP model, will return a new model object. wgms should be a dict with vectors of tuples, keeping for each branch a vector with (t, k) tuples. This version does not modify anything in the template model.

This is not particularly convenient for use in rjMCMC algorithms, where we want to efficiently add and remove single WGM events...

Example

x = DeadBird.example_data()
m = PhyloBDP(RatesModel(ConstantDLGWGD(q=ones(9))), x.tr, 5)
insertwgms(m, Dict(3=>[(0.1, 2)], 2=>[(0.3, 4)]))
source
DeadBird.loglikelihood!Method
loglikelihood!(dag::CountDAG, model::PhyloBDP)

Compute the log likelihood on the DAG using the Csuros & Miklos algorithm.

source
DeadBird.marginal_extinctionpFunction
marginal_extinctionp(d, logϵ)

Compute the marginal log probability of having no observed descendants for a branching process starting off with n genes given by a probability distribution d when the probability that a single gene goes extinct is ϵ. This is:

\[\sum_{k=1}^\infty ϵ^k P\{X₀ = k\}\]

For many priors a closed form can be obtained by manipulating the sum so that it becomes a geometric series.

source
DeadBird.marginalizeFunction
marginalize(p::ShiftedBetaGeometric, ℓvec, logϵ, imax=100)

There seems to be no closed form for this, but we can devise a recursion, analogous to the ShiftedBetaGeometric case. This could probably share code, but I'll have it separate for now.

source
DeadBird.marginalizeFunction
marginalize(p::ShiftedBetaGeometric, ℓvec, logϵ, imax=100)

There seems to be no closed form for this, but we can devise a recursion and obtain a near-exact solution efficiently.

source
DeadBird.marginalizeFunction
marginalize(prior, ℓvec, logϵ)

Compute the log-likelihood of the data at the root by marginalizing the partial conditional survival likelihoods at the root over the prior on the number of genes at the root. This is the following sum

\[\ell = \sum_{n=1}^{b} \ell[n] \Big( \sum_{i=0}^\infty \binom{n+i}{i} \epsilon^i (1 - \epsilon)^n P\{X_o = n+i\} \Big)\]

Where ℓ[n] = P{data|Yₒ=n}, where Yₒ denotes the number of genes at the root that leave observed descendants and Xₒ denotes the total number of genes at the root, for which we specified the prior. b is the bound on the number of surviving lineages, which is determined by the observed data. For many priors, the innner infinite sum can be simplified to a closed form after some algebraic manipulation and relying on the fact that ∑ binom(α + k - 1, k) z^k = (1 - z)^(-k) for |z| < 1.

source
DeadBird.newmodelMethod
newmodel(m::M, θ) where M<:RatesModel

Construct a new model of type M by taking the parameters of m and parameters defined in the named tuple θ, the latter overriding the former.

source
DeadBird.nonextinctfromrootconditionMethod
nonextinctfromrootcondition(model)

Compute the probability that a family existing at the root of the species tree leaves observed descendants in both clades stemming from the root, i.e. does not go extinct in any of the two clades stemming from the root. This uses the marginalization of the extinction probability over the prior distribution on the number of genes at the root using marginal_extinctionp

source
DeadBird.ppmfMethod
ppmf(node, x, prior, bound)

Compute the posterior pmf for the ancestral state at node node, where x holds the partial likelihoods somewhere, and prior corresponds to the root or transition probability density that acts as a prior on the node state.

source
DeadBird.sample_ancestralMethod
sample_ancestral(spl::AncestralSampler, x)

Sample a set of ancestral states using a pre-order traversal over the tree. This assumes the partial likelihoods are available in x.

source
DeadBird.sample_ancestral_nodeMethod
sample_ancestral_node(rng, node, x, prior, bound)

Sample ancestral state for node node with prior prior and relevant partial likelihoods computed in x. The prior refers to either a root prior distribution or the transient probability distribution of the process given the parent state.

source
DeadBird.setw!Method
setw!(W, θ, t)

Compute transition probability matrix for the ordinary (not conditional on survival that is) birth-death process. Using the recursive formulation of Csuros & Miklos.

source
DeadBird.simulateFunction
simulate(mfun::Function, data::DataFrame, chain, N)

Perform posterior predictive simulations. mfun should be a function that takes an iterate of the chain and returns a PhyloBDP model, i.e. mfun(chain[i]) should return a parameterized model. data is the observed data set to which posterior predictive simulations should correspond.

Example

julia> x = DeadBird.example_data();

julia> DeadBird.simulate(y->x.model((λ=y, μ=y)), x.df, ones(10), 100)
PP simulations (N = 100, n = 10)
source
DeadBird.simulateFunction
simulate(m::ModelArray)
simulate(m::MixtureModel, n)
simulate(m::PhyloBDP, n)

Simulate a set of random profiles from a phylogenetic birth-death model.

Example

julia> x = DeadBird.example_data();

julia> simulate(x.model, 5)
5×5 DataFrame
│ Row │ A     │ B     │ C     │ rejected │ extinct │
│     │ Int64 │ Int64 │ Int64 │ Int64    │ Int64   │
├─────┼───────┼───────┼───────┼──────────┼─────────┤
│ 1   │ 1     │ 1     │ 1     │ 0        │ 0       │
│ 2   │ 1     │ 1     │ 1     │ 0        │ 0       │
│ 3   │ 2     │ 2     │ 2     │ 0        │ 0       │
│ 4   │ 0     │ 1     │ 1     │ 1        │ 1       │
│ 5   │ 1     │ 1     │ 1     │ 0        │ 0       │
source
DeadBird.wgmϵMethod
wgmϵ(q, k, logϵ)

Compute the log-extinction probability of a single lineage going through a k-plication event, given the extinction probability of a single lineage after the WGM event. Assumes the single parameter WGM retention model (assuming a single gene before the WGM, the number of retained genes after the WGM is a rv X' = 1 + Y where Y is Binomial(k - 1, q)).

source
DeadBird.wstar!Method
wstar!(W::Matrix, t, θ, ϵ)

Compute the transition probabilities for the conditional survival process recursively (not implemented using recursion though!). Note that the resulting transition matrix is not a stochastic matrix of some Markov chain.

source
DeadBird.AncestralSamplerType
AncestralSampler(model, bound)

A wrapper that contains the transition probability matrices for the transient distributions for the PhyloBDP (not conditioned on survival) along each branch.

source
DeadBird.ConstantDLGType
ConstantDLG{T}

Simple constant rates duplication-loss and gain model. All nodes of the tree are associated with the same parameters (duplication rate λ, loss rate μ, gain rate κ).

source
DeadBird.ConstantDLGWGMType
ConstantDLGWGM{T}

Similar to ConstantDLG, but with a field for whole-genome multiplication (WGM) nodes in the phylogeny, which have a single retention rate parameter q each.

source
DeadBird.CountDAGType
CountDAG(df::DataFrame, tree::Node)

Get a CountDAG from a count matrix, i.e. the directed acyclic graph (DAG) representation of a phylogenetic profile for an (assumed known) species tree. This is a multitree.

Example

julia> x = DeadBird.example_data();

julia> dag = CountDAG(x.df, x.tr)
(dag = CountDAG({17, 20} directed simple Int64 graph), bound = 7)
source
DeadBird.DLGType
DLG{T}

Simple branch-wise rates duplication-loss and gain model.

source
DeadBird.NodeDataType
NodeData{I}

Keeps some relevant information for nodes in the DAG representation of a phylogenetic profile matrix.

source
DeadBird.PPSimType
PPPSim

Container for posterior predictive simulations, constructor should not be called directly nor exported.

source
DeadBird.PhyloBDPType
PhyloBDP(ratesmodel, tree, bound)

The phylogenetic birth-death process model as defined by Csuros & Miklos (2009). The bound is exactly defined by the data under consideration.

!!! note: implemented as a <: DiscreteMultivariateDistribution (for convenience with Turing.jl), however does not support a lot of the Distributions.jl interface.

Example

julia> x = DeadBird.example_data();

julia> rates = RatesModel(ConstantDLG(λ=0.1, μ=0.1));

julia> dag, bound = CountDAG(x.df, x.tr);

julia> rates = RatesModel(ConstantDLG(λ=0.1, μ=0.1));

julia> PhyloBDP(rates, x.tr, bound)
PhyloBDP(
~root
RatesModel with () fixed
ConstantDLG{Float64}
  λ: Float64 0.1
  μ: Float64 0.1
  κ: Float64 0.0
  η: Float64 0.66
)
source
DeadBird.ProfileType
Profile{T,I}

A phylogenetic profile, i.e. an observation of a discrete random variable associated with the leaves of a phylogenetic tree. This has a field x for the extended profile (which records the bound on the number of lineages that survive below an internal node for internal nodes) and a field for the 'partial likelihoods' .

source
DeadBird.ProfileMatrixType
ProfileMatrix(df::DataFrame, tree)

Obtain a ProfileMatrix struct for a count dataframe.

Example

julia> x = DeadBird.example_data();

julia> mat, bound = ProfileMatrix(x.df, x.tr)
(matrix = Profile{Float64,Int64}[2 1 … 0 1; 3 2 … 1 1; 7 3 … 0 4; 7 3 … 3 4], bound = 7)
source
DeadBird.RatesModelType
RatesModel

Abstract type for diferent rate models for phylogenies (e.g. constant rates across the tree, branch-specific rates, models with WGD nodes, ...).

source
DeadBird.ShiftedBetaGeometricType
ShiftedBetaGeometric(η, ζ)

Beta-Geometric compound distribution on the domain [1, 2, ..., ∞). The pmf is given by

\[p_k = \frac{\mathrm{B}(\alpha + 1, \beta + k - 1)}{\mathrm{B}(\alpha, \beta)}\]

Note

We use the alternative parameterization using the mean η = α/(α+β) and offset 'sample size' ζ = α + β - 1, where ζ > 0. That is, we assume α + β > 1.

source
DeadBird.TransientType
Transient

Transient distribution P(X(t)|X(0)=k). This is a simple struct for the sampler.

source