List of useful methods
The following lists docstrings for various important functions.
Once a DataModel object has been defined, it can subsequently be used to compute various quantities as follows:
StatsAPI.loglikelihood — Method
loglikelihood(DM::DataModel, θ::AbstractVector) -> Real

Calculates the logarithm of the likelihood $L$, i.e. $\ell(\mathrm{data} \, | \, \theta) \coloneqq \mathrm{ln} \big( L(\mathrm{data} \, | \, \theta) \big)$, given a DataModel and a parameter configuration $\theta$.
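For illustration, a minimal sketch of how a DataModel might be constructed and its log-likelihood evaluated. The data values and the linear model below are hypothetical, chosen only for this example:

```julia
using InformationGeometry

# Hypothetical data: x-values, y-values and 1σ uncertainties of the y-values
DS = DataSet([1, 2, 3, 4], [4.0, 5.0, 6.5, 7.8], [0.5, 0.45, 0.6, 0.8])

# A simple linear model y(x; θ) = θ₁·x + θ₂
model(x, θ) = θ[1] * x + θ[2]

DM = DataModel(DS, model)

# Evaluate ℓ(data | θ) at some parameter configuration
loglikelihood(DM, [1.0, 3.0])
```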
InformationGeometry.MLE — Method
MLE(DM::DataModel) -> Vector

Returns the parameter configuration $\theta_\text{MLE} \in \mathcal{M}$ which is estimated to have the highest likelihood of producing the observed data (under the assumption that the specified model captures the true relationship present in the data). For performance reasons, the maximum likelihood estimate is stored as a part of the DataModel type.
InformationGeometry.MLEuncert — Method
MLEuncert(DM::AbstractDataModel, mle::AbstractVector=MLE(DM), F::AbstractMatrix=FisherMetric(DM, mle); Safe::Bool=false, threshold::Real=1e-10, verbose::Bool=true)

Returns a vector of type Measurements.Measurement in which the parameter uncertainties are approximated via the diagonal of the inverse Fisher metric. That is, the stated uncertainties are a linearized symmetric approximation of the true parameter uncertainties around the MLE.
If Safe=false, infinite values are only imputed for parameters which are structurally non-identifiable themselves, i.e. about which no information is contained in the data. However, as a secondary effect, many more parameters than these can also exhibit flat profile likelihoods if their effects are related to (and can be compensated for by) the degenerate parameters. Upon fixing or removing the purely degenerate parameters, those parameters whose profiles were only flat as a secondary effect become structurally identifiable again. For Safe=true, infinite values are also imputed for any parameters related to the degenerate parameters, whose profiles may be secondarily affected.
InformationGeometry.LogLikeMLE — Method
LogLikeMLE(DM::DataModel) -> Real

Returns the value of the log-likelihood $\ell$ when evaluated at the maximum likelihood estimate, i.e. $\ell(\mathrm{data} \, | \, \theta_\text{MLE})$. For performance reasons, this value is stored as a part of the DataModel type.
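Continuing the hypothetical example above, the stored maximum likelihood estimate and its linearized uncertainty might be inspected as follows:

```julia
θmle = MLE(DM)    # parameter configuration with the highest likelihood
LogLikeMLE(DM)    # value of ℓ at the MLE, stored in the DataModel
MLEuncert(DM)     # vector of Measurements.Measurement, e.g. θ₁ ± σ₁
```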
Properties related to structural identifiability can be computed via:
InformationGeometry.StructurallyIdentifiable — Function
StructurallyIdentifiable(DM::AbstractDataModel, mle::AbstractVector{<:Number}=MLE(DM); showall::Bool=false, noise::Real=1e-5, threshold::Real=1e-10, N::Int=3)

Checks whether the Jacobian of the model with respect to the parameters has singular values below threshold and provides the associated singular directions.
InformationGeometry.PracticallyIdentifiable — Function
PracticallyIdentifiable(DM::AbstractDataModel, Confnum::Real=3; plot::Bool=isloaded(:Plots), IsCost::Bool=false, kwargs...) -> Real

Determines the maximum confidence level (in units of standard deviations σ) at which the given DataModel is still practically identifiable.
PracticallyIdentifiable(P::ParameterProfiles) -> Real

Determines the maximum level at which ALL the given profiles in ParameterProfiles are still practically identifiable. If IsCost=true was chosen for the profiles, the output is the maximal deviation in cost function value W = 2(L_MLE - PL_i(θ)). If instead IsCost=false was chosen, so that cost function deviations have already been rescaled to confidence levels, the output of PracticallyIdentifiable is the maximal confidence level in units of standard deviations σ at which the model is still practically identifiable.
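A sketch of how these identifiability checks might be invoked on the example DataModel from above (the confidence level of 3σ is an arbitrary choice for illustration):

```julia
StructurallyIdentifiable(DM)    # singular value analysis of the model Jacobian at the MLE
PracticallyIdentifiable(DM, 3)  # maximal confidence level ≤ 3σ at which DM stays identifiable
```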
InformationGeometry.IsLinearParameter — Function
IsLinearParameter(DM::DataModel, MLE::AbstractVector) -> BitVector

Checks with respect to which parameters the model function model(x,θ) is linear and returns a vector of booleans, where true indicates linearity. This test is performed by comparing the Jacobians of the model for two random configurations $\theta_1, \theta_2 \in \mathcal{M}$ column by column.
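For the hypothetical linear model θ[1]*x + θ[2] from the example above, both parameters enter linearly, so one would expect:

```julia
IsLinearParameter(DM, MLE(DM))   # expected: [true, true] for the linear example model
```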
Various geometric quantities which are intrinsic to the parameter manifold $\mathcal{M}$ can be computed from the Fisher metric $g$ (and a subsequent choice of the Levi-Civita connection), such as the Riemann and Ricci tensors and the Ricci scalar $R$.
InformationGeometry.Score — Method
Score(DM::DataModel, θ::AbstractVector{<:Number}; ADmode::Val=Val(:ForwardDiff))

Calculates the gradient of the log-likelihood $\ell$ with respect to a set of parameters $\theta$. ADmode=Val(false) computes the Score using the Jacobian dmodel provided in DM, i.e. by having to separately evaluate both the model as well as dmodel. Other choices of ADmode compute the Score directly by differentiating the formula for the log-likelihood, i.e. only one evaluation on a dual variable is performed.
InformationGeometry.FisherMetric — Method
FisherMetric(DM::DataModel, θ::AbstractVector{<:Number})

Computes the Fisher metric $g$ given a DataModel and a parameter configuration $\theta$ under the assumption that the likelihood $L(\mathrm{data} \, | \, \theta)$ is a multivariate normal distribution.
\[g_{ab}(\theta) \coloneqq -\int_{\mathcal{D}} \mathrm{d}^m y_{\mathrm{data}} \, L(y_{\mathrm{data}} \,|\, \theta) \, \frac{\partial^2 \, \mathrm{ln}(L)}{\partial \theta^a \, \partial \theta^b} = -\mathbb{E} \bigg( \frac{\partial^2 \, \mathrm{ln}(L)}{\partial \theta^a \, \partial \theta^b} \bigg)\]
InformationGeometry.GeometricDensity — Method
GeometricDensity(DM::AbstractDataModel, θ::AbstractVector) -> Real

Computes the square root of the determinant of the Fisher metric $\sqrt{\mathrm{det}\big(g(\theta)\big)}$ at the point $\theta$.
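These quantities can be evaluated at any point of the parameter manifold; a short sketch using the example DataModel from above:

```julia
θ = MLE(DM)
Score(DM, θ)             # gradient of ℓ; approximately the zero vector at the MLE
FisherMetric(DM, θ)      # 2×2 positive (semi-)definite matrix g(θ)
GeometricDensity(DM, θ)  # √det g(θ)
```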
InformationGeometry.ChristoffelSymbol — Method
ChristoffelSymbol(DM::DataModel, θ::AbstractVector; BigCalc::Bool=false)
ChristoffelSymbol(Metric::Function, θ::AbstractVector; BigCalc::Bool=false)

Calculates the components of the $(1,2)$ Christoffel symbol $\Gamma$ at a point $\theta$ (i.e. the Christoffel symbol "of the second kind") through finite differencing of the Metric. Accurate to ≈ 3e-11. BigCalc=true increases accuracy through BigFloat calculation.
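The Metric::Function variants accept a user-supplied metric. As a sanity check, a sketch with a hypothetical constant metric (not part of the package API), whose Christoffel symbols should vanish:

```julia
FlatMetric(θ) = [1.0 0.0; 0.0 1.0]         # constant Euclidean metric on ℝ², for illustration
ChristoffelSymbol(FlatMetric, [1.0, 2.0])  # expected ≈ zeros(2, 2, 2)
```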
InformationGeometry.Riemann — Method
Riemann(DM::DataModel, θ::AbstractVector; BigCalc::Bool=false)
Riemann(Metric::Function, θ::AbstractVector; BigCalc::Bool=false)

Calculates the components of the $(1,3)$ Riemann tensor by finite differencing of the Metric. BigCalc=true increases accuracy through BigFloat calculation.
InformationGeometry.Ricci — Method
Ricci(DM::DataModel, θ::AbstractVector; BigCalc::Bool=false)
Ricci(Metric::Function, θ::AbstractVector; BigCalc::Bool=false)

Calculates the components of the $(0,2)$ Ricci tensor by finite differencing of the Metric. BigCalc=true increases accuracy through BigFloat calculation.
InformationGeometry.RicciScalar — Method
RicciScalar(DM::DataModel, θ::AbstractVector; BigCalc::Bool=false) -> Real
RicciScalar(Metric::Function, θ::AbstractVector; BigCalc::Bool=false) -> Real

Calculates the Ricci scalar by finite differencing of the Metric. BigCalc=true increases accuracy through BigFloat calculation.
InformationGeometry.GeodesicDistance — Method
GeodesicDistance(DM::DataModel, P::AbstractVector{<:Number}, Q::AbstractVector{<:Number}; tol::Real=1e-10)
GeodesicDistance(Metric::Function, P::AbstractVector{<:Number}, Q::AbstractVector{<:Number}; tol::Real=1e-10)

Computes the length of a geodesic connecting the points P and Q.
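A sketch of the curvature and distance computations on the example DataModel. Note that for a linearly parametrized model the Fisher metric is constant in $\theta$, so the parameter manifold is flat and the curvature quantities below are expected to vanish up to numerical error:

```julia
θ = MLE(DM)
Riemann(DM, θ)                     # (1,3) Riemann tensor; ≈ 0 for the linear example model
Ricci(DM, θ)                       # (0,2) Ricci tensor
RicciScalar(DM, θ)                 # Ricci scalar R
GeodesicDistance(DM, θ, θ .+ 0.1)  # length of a geodesic between two points on 𝓜
```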
Different model comparison and model selection criteria are available via:
InformationGeometry.AIC — Method
AIC(DM::DataModel, θ::AbstractVector) -> Real

Calculates the Akaike Information Criterion given a parameter configuration $\theta$, defined by $\mathrm{AIC} = 2 \, \mathrm{length}(\theta) - 2 \, \ell(\mathrm{data} \, | \, \theta)$. Lower values for the AIC indicate that the associated model function is more likely to be correct. For linearly parametrized models and small sample sizes, the AICc should be used instead. The pure AIC tends to select overfitting models. An asymptotically consistent generalization is given by the CAIC and CAICF.
InformationGeometry.AICc — Method
AICc(DM::DataModel, θ::AbstractVector) -> Real

Computes the Akaike Information Criterion with an added correction term that prevents the AIC from selecting models with too many parameters (i.e. overfitting) in the case of small sample sizes. $\mathrm{AICc} = \mathrm{AIC} + \frac{2\mathrm{length}(\theta)^2 + 2 \mathrm{length}(\theta)}{N - \mathrm{length}(\theta) - 1}$ where $N$ is the number of data points. Whereas the AIC constitutes a first order estimate of the information loss, the AICc constitutes a second order estimate. This correction term is only exact for linearly parametrized models, and an approximation otherwise.
InformationGeometry.BIC — Method
BIC(DM::DataModel, θ::AbstractVector) -> Real

Calculates the Bayesian Information Criterion given a parameter configuration $\theta$, defined by $\mathrm{BIC} = \mathrm{ln}(N) \cdot \mathrm{length}(\theta) - 2 \, \ell(\mathrm{data} \, | \, \theta)$ where $N$ is the number of data points.
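For the example model, these criteria might be evaluated at the MLE; the absolute numbers are meaningless in isolation, only differences between competing models on the same data matter:

```julia
AIC(DM, MLE(DM))    # 2·length(θ) − 2ℓ
AICc(DM, MLE(DM))   # small-sample corrected AIC
BIC(DM, MLE(DM))    # ln(N)·length(θ) − 2ℓ
```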
InformationGeometry.CAIC — Method
CAIC(DM::DataModel, θ::AbstractVector) -> Real

Calculates the consistent Akaike Information Criterion developed by Bozdogan (https://doi.org/10.1007/BF02294361). Given a parameter configuration $\theta$, it is defined by $\mathrm{CAIC} = (1 + \mathrm{ln}(N)) \cdot \mathrm{length}(\theta) - 2 \, \ell(\mathrm{data} \, | \, \theta) = \mathrm{BIC} + \mathrm{length}(\theta)$ where $N$ is the number of data points. Therefore, it gives a slightly stronger penalization of the parameter degrees of freedom than the BIC. While this approximates exactly the same quantity as the AIC, namely minus twice the expected entropy, it provides a more accurate and asymptotically consistent approximation. That is, in the limit $N \longrightarrow \infty$, both the probabilities of under- and overfitting the true model dimensionality go to zero for the CAIC and CAICF.
InformationGeometry.CAICF — Method
CAICF(DM::DataModel, θ::AbstractVector) -> Real

Calculates the consistent Akaike Information Criterion with Fisher information developed by Bozdogan (https://doi.org/10.1007/BF02294361). Given a parameter configuration $\theta$, it is defined by $\mathrm{CAICF} = (2 + \mathrm{ln}(N)) \cdot \mathrm{length}(\theta) + \mathrm{logdet}(F) - 2 \, \ell(\mathrm{data} \, | \, \theta)$ where $F$ is the Fisher information at $\theta$ and $N$ is the number of data points. Therefore, $\mathrm{CAICF} = \mathrm{AIC} + \mathrm{length}(\theta) \cdot \mathrm{ln}(N) + \mathrm{logdet}(F)$. While this approximates exactly the same quantity as the AIC, namely minus twice the expected entropy, it provides a more accurate and asymptotically consistent approximation. That is, in the limit $N \longrightarrow \infty$, both the probabilities of under- and overfitting the true model dimensionality go to zero for the CAIC and CAICF. Intuitively, the idea is that a model with good fit but lower Fisher information generalizes better, since many parameter values fit the data well.
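Both consistent criteria are invoked like the criteria above; a short sketch:

```julia
CAIC(DM, MLE(DM))    # per the definition above: BIC(DM, MLE(DM)) + length(MLE(DM))
CAICF(DM, MLE(DM))   # additionally penalizes the logdet of the Fisher information
```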
InformationGeometry.ModelComparison — Function
ModelComparison(DM1::AbstractDataModel, DM2::AbstractDataModel) -> Tuple{Int,Real}

Compares the AICc values of both models at their respective best fits and estimates the probability that one model is more likely than the other. The first entry of the tuple indicates which model is more likely to be correct (1 or 2), whereas the second entry gives the ratio of probabilities p_better/p_worse.
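To compare against an alternative model on the same data, one might define a second DataModel and call ModelComparison; the quadratic model below is hypothetical, chosen only for illustration:

```julia
# Hypothetical competing model on the same DataSet as above
model2(x, θ) = θ[1] * x^2 + θ[2] * x + θ[3]
DM2 = DataModel(DS, model2)

ModelComparison(DM, DM2)   # (index of the preferred model, ratio p_better/p_worse)
```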