API
Beluga.BMRevJumpPrior
— TypeBMRevJumpPrior
Bivariate autocorrelated rates (Brownian motion) prior inspired by coevol (Lartillot & Poujol 2010), with an Inverse Wishart prior on the unknown 2×2 covariance matrix. Crucially, this is defined for the Node-based model, i.e. states at model nodes are assumed to be states at nodes of the phylogeny.
Beluga.BranchKernel
— TypeBranchKernel
Reversible jump kernel that introduces a WGD, decreases λ and increases μ on the associated branch.
Beluga.CRRevJumpPrior
— TypeCRRevJumpPrior
Constant-rates model.
Beluga.DLWGD
— TypeDLWGD{T<:Real,V<:ModelNode{T}}
Duplication, loss and WGD model. This holds a dictionary for easy access of the nodes in the probabilistic graphical model and the leaf names.
Beluga.DLWGD
— Method(m::DLWGD)(row::DataFrameRow)
Instantiate a model based on a row from a trace data frame. This returns a modified copy of the input model.
Beluga.DropKernel
— TypeDropKernel
Reversible jump kernel that introduces a WGD and decreases λ on the associated branch.
Beluga.IRRevJumpPrior
— TypeIRRevJumpPrior
Bivariate uncorrelated rates prior with an Inverse Wishart prior on the unknown 2×2 covariance matrix. Crucially, this is defined for the Branch-based model, i.e. states at model nodes are assumed to be states at branches of the phylogeny.
Beluga.PArray
— TypePArray{T<:Real}
Distributed array of phylogenetic profiles.
Beluga.PostPredSim
— TypePostPredSim(chain, data::DataFrame, n::Int64)
Perform posterior predictive simulations.
Beluga.Profile
— TypeProfile{T<:Real}
Struct for a phylogenetic profile of a single family. Geared towards MCMC applications (temporary storage fields) and parallel applications (using DArrays). See also PArray.
Beluga.RevJumpChain
— TypeRevJumpChain
Reversible jump chain struct for DLWGD model inference.
Note: after construction, an explicit call to init! is required.
Beluga.SimpleKernel
— TypeSimpleKernel
Reversible jump kernel that only introduces a WGD, without changing λ or μ.
Beluga.addwgd!
— Methodaddwgd!(d::DLWGD, n::ModelNode, t, q)
Insert a WGD node with retention rate q at distance t above node n.
Beluga.addwgds!
— Methodaddwgds!(m::DLWGD, p::PArray, config::Array)
Add WGDs from array of named tuples e.g. [(lca="ath,cpa", t=rand(), q=rand())] and update the profile array.
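As a rough illustration of the configuration format, the sketch below builds such an array in plain Julia without calling Beluga. Only the field names lca, t and q come from the description above. The second entry, its taxon names, and the reading of lca as a comma-separated list of leaf names are assumptions made for the example.

```julia
# Each WGD is described by a named tuple. Assumed meaning of the fields:
# `lca` lists leaf names whose common ancestor identifies the branch,
# `t` is the distance above that node and `q` is the retention rate.
config = [
    (lca="ath,cpa", t=0.5, q=0.2),
    (lca="vvi,ath", t=0.1, q=0.4),   # hypothetical taxa, for illustration only
]

# Split the `lca` field into the individual leaf names.
leafnames = [split(c.lca, ",") for c in config]
```

An array like config would then be passed as the third argument of addwgds!, together with an existing model and profile array.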
Beluga.addwgds!
— Methodaddwgds!(m::DLWGD, p::PArray, config::String)
addwgds!(m::DLWGD, p::PArray, config::Dict{Int64,Tuple})
Add WGDs from a (stringified) dictionary (as in the wgds column of the trace data frame in rjMCMC applications) and update the profile array.
Beluga.asvector
— Methodasvector(d::DLWGD)
Get a parameter vector for the DLWGD model, structured as [λ1, …, λn, μ1, …, μn, q1, …, qk, η].
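To make the layout of this vector concrete, here is a minimal sketch that splits a flat vector with the structure [λ1, …, λn, μ1, …, μn, q1, …, qk, η] back into its parts. The helper unpack is hypothetical and not part of Beluga; n and k have to be supplied by the caller.

```julia
# Hypothetical helper (not a Beluga function): split a flat parameter vector
# laid out as [λ1..λn, μ1..μn, q1..qk, η] into named components.
function unpack(v::AbstractVector, n::Integer, k::Integer)
    @assert length(v) == 2n + k + 1
    λ = v[1:n]            # duplication rates
    μ = v[n+1:2n]         # loss rates
    q = v[2n+1:2n+k]      # retention rates, one per WGD
    η = v[end]            # last entry
    return (λ=λ, μ=μ, q=q, η=η)
end

v = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.25, 0.9]   # n = 3 rates, k = 1 WGD
p = unpack(v, 3, 1)
```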
Beluga.bayesfactors
— Methodbayesfactors(trace::DataFrame, model::DLWGD, p::Float64)
Compute Bayes Factors for all branch WGD configurations. Returns a data frame that is more or less self-explanatory.
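For orientation, a Bayes factor compares posterior odds to prior odds. The sketch below shows that general definition on made-up numbers. It is not Beluga's implementation, and both the per-sample WGD counts and the prior probabilities are hypothetical.

```julia
# Bayes factor in favour of model 1 over model 0:
# posterior odds divided by prior odds.
bayesfactor(post1, post0, prior1, prior0) = (post1 / post0) / (prior1 / prior0)

# Hypothetical number of WGDs on one branch in each posterior sample.
counts = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
post1 = count(c -> c == 1, counts) / length(counts)   # posterior frequency of one WGD
post0 = count(c -> c == 0, counts) / length(counts)   # posterior frequency of no WGD

# Hypothetical prior probabilities for one and zero WGDs on that branch.
K = bayesfactor(post1, post0, 0.25, 0.5)
```

Values of K above one indicate that the samples support one WGD on the branch more strongly than the prior alone does.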
Beluga.getrates
— Methodgetrates(model::DLWGD{T})
Get the duplication and loss rate matrix (2 × n).
Beluga.getwgdtrace
— Methodgetwgdtrace(chain)
Summarize all WGD models from an rjMCMC trace. This provides the data to evaluate retention rates for WGD models etc. Returns a dict of dicts with data frames (which is a horrible data structure, I know) structured as (branch1 => (1 WGD => trace, 2 WGDs => trace, ...), branch2 => (), ...).
Beluga.gradient
— Methodgradient(d::DLWGD, p::PArray{T})
Accumulate the gradient ∇ℓ(λ,μ,q,η|X) in parallel for the phylogenetic profile matrix p.
Beluga.gradient
— Methodgradient(d::DLWGD, x::Vector)
Compute the gradient of the log likelihood under the DLWGD model for a single count vector x, ∇ℓ(λ,μ,q,η|x).
Currently the gradient seems to work only in NaN-safe mode (there is a GitHub issue on this).
Beluga.init!
— Methodinit!(chain::RevJumpChain)
Initialize the chain.
Beluga.posteriorE!
— MethodposteriorE!(chain)
Compute E[Xi|Xparent=1,λ,μ] for the joint posterior; i.e. the expected number of lineages at node i under the linear birth-death process given that there was one lineage at the parent of i, for each sample from the posterior. This can give an idea of gene family expansion/contraction.
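For a single branch of length t without a WGD, the expectation of a linear birth-death process started from one lineage has the closed form exp((λ − μ)t). The sketch below evaluates that formula. The function name is hypothetical, and how Beluga combines this with WGD nodes is not covered here.

```julia
# Expected number of lineages after time t under a linear birth-death process
# with duplication rate λ and loss rate μ, starting from a single lineage.
expected_lineages(λ, μ, t) = exp((λ - μ) * t)

expansion   = expected_lineages(0.3, 0.2, 1.0)   # λ > μ: expectation above one
contraction = expected_lineages(0.2, 0.3, 1.0)   # λ < μ: expectation below one
```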
Beluga.posteriorΣ!
— MethodposteriorΣ!(chain)
Sample the covariance matrix of the bivariate process post hoc from the posterior under the Inverse Wishart prior. Based on Lartillot & Poujol 2010.
Beluga.pppvalues
— Methodpppvalues(pps::PostPredSim)
Compute posterior predictive p-values based on the posterior predictive distribution and the observed summary statistics (see e.g. Gelman et al. 2013).
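As a minimal sketch of the general idea, a one-sided posterior predictive p-value can be taken as the fraction of simulated summary statistics that are at least as large as the observed one. The function ppp and the numbers are hypothetical. Whether Beluga uses this one-sided form is an assumption, not something stated above.

```julia
# Fraction of simulated summary statistics at least as large as the observed one.
ppp(simulated, observed) = count(s -> s >= observed, simulated) / length(simulated)

sims = [3, 5, 4, 6, 2, 5, 7, 4]   # hypothetical simulated statistics
obs = 5                           # hypothetical observed statistic
pval = ppp(sims, obs)
```

Values close to 0 or 1 suggest that the observed statistic is unusual under the fitted model.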
Beluga.removewgd!
— Functionremovewgd!(d::DLWGD, n::ModelNode, reindex::Bool=true, set::Bool=true)
Remove WGD/T node n from the DLWGD model. If reindex is true, the model nodes are reindexed to be consecutive. If set is true, the model internals (transition and extinction probabilities) are recomputed.
Beluga.removewgds!
— Methodremovewgds!(d::DLWGD)
Remove all WGD nodes from the model.
Beluga.setrates!
— Methodsetrates!(model::DLWGD{T}, X::Matrix{T})
Set duplication and loss rates for each non-WGD node (or branch) in the model. Rates should be provided as a 2 × n matrix, where the columns correspond to model node indices.
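The 2 × n layout can be sketched in plain Julia as follows. That the first row holds duplication rates and the second row loss rates is an assumption of this example, and the rate values are made up.

```julia
# Hypothetical rates for n = 4 model nodes.
λ = [0.20, 0.25, 0.30, 0.35]   # duplication rates
μ = [0.30, 0.30, 0.40, 0.40]   # loss rates

# Stack the rates as rows so that column i holds the rates of node i.
X = vcat(λ', μ')

node3 = X[:, 3]   # both rates for model node 3
```

A matrix built this way has one column per model node, which is the shape described for setrates!.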
Distributions.logpdf!
— Methodlogpdf!(L::Matrix, d::DLWGD, x::Vector{Int64})
Compute the log likelihood under the DLWGD model for a single count vector x, ℓ(λ,μ,q,η|x), and update the dynamic programming matrix (L).
Distributions.logpdf!
— Methodlogpdf!(L::Matrix, n::ModelNode, x::Vector{Int64})
Compute the log likelihood under the DLWGD model for a single count vector x, ℓ(λ,μ,q,η|x), and update the dynamic programming matrix (L), only recomputing the matrix above node n.
Distributions.logpdf!
— Methodlogpdf!(d::DLWGD, p::PArray{T})
logpdf!(n::ModelNode, p::PArray{T})
Accumulate the log-likelihood ℓ(λ,μ,q,η|X) in parallel for the phylogenetic profile matrix. If the first argument is a ModelNode, this will recompute the dynamic programming matrices starting from that node to save computation. Assumes (of course) that the phylogenetic profiles are iid from the same DLWGD model.
Distributions.logpdf
— Methodlogpdf(d::DLWGD, x::Vector{Int64})
Compute the log likelihood under the DLWGD model for a single count vector x, ℓ(λ,μ,q,η|x).
Beluga.AMMProposals
— TypeAMMProposals(d)
Adaptive Mixture Metropolis (AMM) proposals, where d is the dimensionality of the rates vector (should be 2 × number of nodes in tree).
Beluga.ConstantDistribution
— TypeConstantDistribution(x)
A 'constant' distribution (Dirac mass), sometimes useful.
Beluga.MWGProposals
— TypeMWGProposals
Proposals for the Metropolis-within-Gibbs algorithm. The MWG algorithm iterates over each node in the tree, resulting in very good mixing and fast convergence in terms of number of iterations, but has quite a high computational cost per generation.
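The cost per generation can be seen in a generic Metropolis-within-Gibbs sweep, sketched below on a toy target. This is not Beluga's sampler: the target, the function names and the fixed proposal scale σ are all assumptions chosen to keep the example self-contained.

```julia
using Random

# Toy target: unnormalised log density of independent standard normals.
logtarget(x) = -0.5 * sum(abs2, x)

# One MWG sweep: update each coordinate in turn with a random-walk proposal.
function mwg_sweep!(rng, x, logp, σ)
    for i in eachindex(x)
        old = x[i]
        x[i] = old + σ * randn(rng)          # propose a new value for coordinate i
        newlogp = logtarget(x)               # one target evaluation per coordinate
        if log(rand(rng)) < newlogp - logp   # Metropolis acceptance step
            logp = newlogp
        else
            x[i] = old                       # reject: restore the old value
        end
    end
    return logp
end

# Run several sweeps and return the log density of the final state.
function run_mwg!(rng, x, nsweeps, σ)
    logp = logtarget(x)
    for _ in 1:nsweeps
        logp = mwg_sweep!(rng, x, logp, σ)
    end
    return logp
end

x = zeros(4)
logp = run_mwg!(MersenneTwister(1), x, 100, 0.5)
```

Each sweep evaluates the target once per coordinate, so the work per generation grows with the number of parameters being updated.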
Beluga.UpperBoundedGeometric
— TypeUpperBoundedGeometric{T<:Real}
An upper bounded geometric distribution, basically a constructor for a DiscreteNonParametric distribution with the relevant probabilities.
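One way to picture the relevant probabilities is to truncate a geometric distribution at an upper bound b and renormalise. The sketch below does this in plain Julia. The support 0:b and the function name are assumptions of the example, and Beluga's own parameterisation may differ.

```julia
# Probabilities of a geometric distribution with success probability p,
# truncated to the support 0:b and renormalised to sum to one.
# The support 0:b is an assumption made for this sketch.
function bounded_geometric_probs(p::Real, b::Integer)
    w = [p * (1 - p)^k for k in 0:b]   # unnormalised geometric weights
    return w ./ sum(w)
end

ps = bounded_geometric_probs(0.5, 3)
```

With Distributions.jl loaded, a vector like ps together with collect(0:3) could be passed to DiscreteNonParametric to obtain a distribution object over the bounded support.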
Base.rand
— Methodrand(d::DLWGD, N::Int64 [; condition::Vector{Vector{Symbol}}])
Simulate N random phylogenetic profiles under the DLWGD model, subject to the constraint that there is at least one gene in each clade specified in the condition array (by default, conditioning is on non-extinction).
Examples:
```julia-repl
julia> # include completely extinct families
julia> rand(d, N, condition=[])

julia> # condition on at least one lineage in both clades stemming from the root
julia> rand(d, N, condition=Beluga.rootclades(d))
```
Base.rand
— Methodrand(d::DLWGD)
Simulate a random phylogenetic profile from the DLWGD model.
Beluga.set!
— Methodset!(d::DLWGD)
Compute all model internals in postorder.