# Maximum Likelihood Estimation
Unless the kwarg `SkipOptim=true` is passed to the `DataModel` constructor, it automatically tries to optimize the model parameters in an attempt to find the MLE. If no initial guess for the parameter values is provided, the constructor will try to infer the correct number of parameters and make a random initial guess. However, more control can be exerted over the optimization process by choosing an appropriate optimization method and corresponding options. These options may either be passed to the `DataModel` constructor, or the optimization method `InformationGeometry.minimize` may be invoked explicitly after the `DataModel` has been constructed, either when the automatic optimization was unsuccessful or when it was skipped altogether.
Most importantly, `InformationGeometry.minimize` is compatible with the Optimization.jl ecosystem of optimizer methods via the `meth` keyword. Best support is available for optimizers from Optim.jl via a custom wrapper. Choosing `meth=nothing` also allows for using a Levenberg-Marquardt method from the LsqFit.jl optimizer, which is however only possible for data with fixed Gaussian uncertainties and no priors.
```julia
DM = DataModel(DataSet(1:3, [4,5,6.5], [0.5,0.45,0.6]), (x,p)->(p[1]+p[2])*x + exp(p[1]-p[2]), [1.3, 0.2]; SkipOptim=true)
plot(DM; Confnum=0)
```
The optimization can then be performed with:
```julia
using Optim
mle = InformationGeometry.minimize(DM; meth=NewtonTrustRegion(), tol=1e-12, maxtime=60.0, Domain=HyperCube(zeros(2), 10ones(2)), verbose=true)
```
```
2-element Vector{Float64}:
 1.1084748685027297
 0.12215893446056099
```
The full solution object is returned if the keyword argument `Full=true` is additionally provided.
Alternatively, one might use optimizers from the Optimization.jl ecosystem, e.g. via

```julia
using Optimization, OptimizationOptimisers
mle = InformationGeometry.minimize(DM, rand(2); meth=Optimisers.AdamW())
```

or

```julia
using Optimization, OptimizationNLopt
mle = InformationGeometry.minimize(DM, rand(2); meth=NLopt.GN_DIRECT())
```
Finally, the newly found optimum can be visually inspected with

```julia
plot(DM, mle)
```

and added to the original unoptimized `DataModel` object via

```julia
DM = DataModel(Data(DM), Predictor(DM), mle; SkipOptim=true)
```
## Multistart Optimization
When a reasonable estimate for the initial parameter configuration is not available and optimizations starting from the approximate center of the parameter domain do not converge to a suitable optimum, a systematic search of the parameter space is needed. This can be achieved with the `MultistartFit` method, which samples the parameter space $\mathcal{M}$ either according to a given probability distribution (e.g. uniform / normal) on $\mathcal{M}$ or by drawing the parameter values from a low-discrepancy sequence such as the Sobol sequence. Compared with uniformly drawn values, values drawn from low-discrepancy sequences achieve a more even and "equidistant" coverage, thereby slightly increasing the chances of discovering a larger number of distinct local optima overall.
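To illustrate the coverage difference, here is a small plain-Julia sketch (independent of the package) comparing the largest gap left on $[0,1]$ by a base-2 van der Corput sequence, the simplest one-dimensional low-discrepancy sequence, against uniform random draws:

```julia
# Van der Corput sequence (base 2): a simple 1D low-discrepancy sequence,
# obtained by mirroring the digits of n about the radix point.
function vandercorput(n::Int, base::Int=2)
    q, bk = 0.0, 1.0 / base
    while n > 0
        q += (n % base) * bk
        n ÷= base
        bk /= base
    end
    return q
end

# Largest gap between neighbouring sample points on [0, 1] (endpoints included).
maxgap(xs) = maximum(diff(sort(vcat(0.0, xs, 1.0))))

N = 256
lds = [vandercorput(n) for n in 1:N]   # low-discrepancy points
rnd = rand(N)                          # uniform random points

println(maxgap(lds))   # 1/256: perfectly even coverage
println(maxgap(rnd))   # typically noticeably larger
```

The same evenness carries over to multi-dimensional sequences such as the Sobol sequence used by `MultistartFit`.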
Just like `InformationGeometry.minimize`, the `MultistartFit` method is also compatible with the Optimization.jl ecosystem of optimizer methods via the `meth` keyword.
```julia
using Optim
R = MultistartFit(DM; N=200, maxval=500, meth=LBFGS())
```
```
MultistartResults with 200/251 successful starts
Median number of iterations: 2
5/200 (2.5 %) convergent fits had same final objective value
```
The most relevant keywords are:

* `MultistartDomain::HyperCube` for defining the domain from which the initial guesses are drawn
* `N::Int` for choosing the number of starts
* `meth` for choosing the optimizer
* `resampling=false` disables the drawing of new guesses when the objective function cannot be evaluated in some regions of the parameter space
* `maxval::Real=1e5`: if no `MultistartDomain` is specified, a cube of size `(-maxval, maxval)^n` is constructed
This returns a `MultistartResults` object, which saves some additional information such as the number of iterations taken per run, the initial guesses, etc. The optimal parameter configuration is obtained with `MLE(R)`. Alternatively, one could for example specify a distribution for the initial guesses of the multistart optimization via

```julia
MultistartFit(DM, MvNormal([0,0], Diagonal(ones(2))), MultistartDomain=HyperCube([-1,-1],[3,4]), N=200, meth=Newton())
```
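Conceptually, the multistart procedure (including the resampling of non-finite initial guesses) can be sketched in plain Julia. This is a simplified illustration with a toy gradient-descent optimizer and a hypothetical double-well objective, not the package's implementation:

```julia
# Sketch of the multistart idea: draw initial guesses from a box, redraw
# guesses where the objective is non-finite (resampling), run a local
# optimizer from each start, and keep the best result.
function multistart(f, lo::Vector, hi::Vector; N=40, steps=2000, lr=1e-2)
    dim = length(lo)
    # Resampling: redraw until the objective is finite at the initial guess.
    function finite_draw()
        while true
            p = lo .+ rand(dim) .* (hi .- lo)
            isfinite(f(p)) && return p
        end
    end
    # Crude central-difference gradient as a stand-in for a real optimizer.
    grad(p, h=1e-6) = [(f([j == i ? p[j] + h : p[j] for j in 1:dim]) -
                        f([j == i ? p[j] - h : p[j] for j in 1:dim])) / (2h) for i in 1:dim]
    best, bestval = copy(lo), Inf
    for _ in 1:N
        p = finite_draw()
        for _ in 1:steps
            p = p .- lr .* grad(p)
        end
        if f(p) < bestval
            best, bestval = p, f(p)
        end
    end
    return best, bestval
end

# Double-well objective: local minima near x ≈ ±1, the tilt makes x ≈ -1 global.
f(p) = (p[1]^2 - 1)^2 + 0.3p[1]
xbest, fbest = multistart(f, [-3.0], [3.0])
println(xbest, "  ", fbest)   # xbest[1] ≈ -1.04
```

Starts landing in the basin of the shallower optimum near $x \approx +1$ converge there, but keeping only the best result recovers the global optimum with high probability.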
The results of a multistart optimization can be visualized with a plot of the sorted final objective values, ideally showing prominent step-like features where the optimization reliably converged to the same local optima multiple times.

```julia
WaterfallPlot(R; BiLog=true)
```
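The step detection underlying such a plot can be illustrated in plain Julia (using hypothetical final objective values, not the package's internals):

```julia
# Sort the final objective values of all runs and count the "steps", i.e.
# jumps between neighbouring values that exceed a tolerance; each plateau
# corresponds to a group of runs that converged to the same optimum.
finals = [0.12, 3.5, 0.12001, 0.12, 3.5, 8.9]   # hypothetical final objective values
StepTol = 0.01
nsteps = count(d -> d > StepTol, diff(sort(finals)))
println(nsteps + 1)   # → 3 distinct optimum levels
```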
It can also be useful to investigate whether the parameter configurations within one "step" of the plot, where the final objective function value was the same, are actually "close" to each other, or whether there are distinct (or spread out) local optima which nevertheless produce a fit of the same quality. This can be plotted via the `ParameterPlot` method, which can display the data either as a `:dotplot`, `:boxplot` or `:violin` plot.

```julia
using StatsPlots
ParameterPlot(R; st=:dotplot)
```
Seeing a spread in the `ParameterPlot` for the parameter configurations of the lowest "step" is already an indication that some parameters may be non-identifiable.
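What "spread" means here can be sketched in plain Julia on hypothetical optima (not package output): within a group of fits that reached the same final objective value, compare the per-parameter range.

```julia
# Hypothetical parameter vectors from fits with equal final objective value.
group = [[1.01, 0.49], [0.99, 0.51], [1.00, 2.50]]
# Per-parameter spread: a large range for a parameter whose value barely
# affects the fit quality hints at non-identifiability of that parameter.
spread = [maximum(p[i] for p in group) - minimum(p[i] for p in group) for i in 1:2]
println(spread)   # ≈ [0.02, 2.01]: the second parameter is poorly constrained
```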
### InformationGeometry.minimize — Function

```julia
minimize(F::Function, start::AbstractVector{<:Number}; tol::Real=1e-10, meth=NelderMead(), Full::Bool=false, maxtime::Real=600, kwargs...) -> Vector
minimize(F, dF, start::AbstractVector{<:Number}; tol::Real=1e-10, meth=LBFGS(), Full::Bool=false, maxtime::Real=600, kwargs...) -> Vector
minimize(F, dF, ddF, start::AbstractVector{<:Number}; tol::Real=1e-10, meth=NewtonTrustRegion(), Full::Bool=false, maxtime::Real=600, kwargs...) -> Vector
```

Minimizes the scalar input function starting from the given `start`, using any algorithm from the Optimization.jl ecosystem specified via the keyword `meth`. `Full=true` returns the full solution object instead of only the minimizing result. Optionally, the search domain can be bounded by passing a suitable `HyperCube` object as the third argument (ignoring derivatives).
### InformationGeometry.MultistartFit — Function

```julia
MultistartFit(DM::AbstractDataModel; maxval::Real=1e5, MultistartDomain::HyperCube=FullDomain(pdim(DM), maxval), kwargs...)
MultistartFit(DS::AbstractDataSet, model::ModelOrFunction, LogPriorFn::Union{Nothing,Function}, MultistartDomain::HyperCube; N::Int=100, resampling::Bool=true, timeout::Real=120, Full=true, parallel::Bool=true, Robust::Bool=true, p::Real=2, kwargs...)
```

Performs multistart optimization with `N` starts, where fits time out after `timeout` seconds. If `resampling=true`, initial starts for which the likelihood is non-finite are redrawn until `N` suitable initials are found. If `Robust=true`, the optimization is performed with respect to the p-norm given by the kwarg `p`. For `Full=false`, only the final MLE is returned; otherwise a `MultistartResults` object is returned, which can be further analyzed and plotted.

Any further keyword arguments are passed through to the optimization procedure `InformationGeometry.minimize`, such as tolerances, optimization methods, domain constraints, etc.
### InformationGeometry.LocalMultistartFit — Function

```julia
LocalMultistartFit(DM::AbstractDataModel, scale::Real=sqrt(InvChisqCDF(DOF(DM), ConfVol(2.0))); kwargs...)
LocalMultistartFit(DM::AbstractDataModel, mle::AbstractVector{<:Number}, scale::Real=sqrt(InvChisqCDF(DOF(DM), ConfVol(2.0))); kwargs...)
```

Performs a multistart search locally around a given MLE in a cube constructed from the Fisher information and multiplied by `scale`.
### InformationGeometry.WaterfallPlot — Function

```julia
WaterfallPlot(R::MultistartResults; BiLog::Bool=true, MaxValue::Real=3000, StepTol::Real=0.01, kwargs...)
```

Shows the waterfall plot for the given results of a `MultistartFit`. `StepTol` is used to decide which difference between two neighbouring values in the waterfall plot constitutes a step; `StepTol=0` deactivates the step marks. `MaxValue` sets the threshold for ignoring points whose cost function after optimization is too large compared with the best optimum. `BiLog=false` disables the logarithmic scale for the cost function.
### InformationGeometry.ParameterPlot — Method

```julia
ParameterPlot(R::MultistartResults; st=:dotplot, BiLog::Bool=true, Nsteps::Int=5, StepTol::Real=0.01, MaxValue=3000)
```

Plots the parameter values of the `MultistartResults` separated by step to show whether the different optima are localized or not. `st` can be either `:dotplot`, `:boxplot` or `:violin`.

StatsPlots.jl needs to be loaded to use this plotting function.
### InformationGeometry.IncrementalTimeSeriesFit — Method

```julia
IncrementalTimeSeriesFit(DM::AbstractDataModel, initial::AbstractVector{<:Number}=MLE(DM); steps::Int=length(Data(DM))÷5, Method::Function=InformationGeometry.minimize, kwargs...) -> Vector
```

Fits the `DataModel` incrementally by splitting up the time series into chunks, e.g. fitting only the first quarter of the data points, then half, and so on. This can yield much better fitting results from random starting points, particularly for autocorrelated time series data, for example when the time series oscillates in such a way that the optimization often gets stuck in a local optimum where the model fits a mostly straight line through the data, not correctly recognizing the oscillations.
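The incremental chunking can be sketched as follows (a hypothetical helper illustrating the idea, not the package's code):

```julia
# Split 1:n into `steps` growing prefixes: fit the first prefix, then reuse
# the result as the starting point for fitting the next, larger prefix, etc.
chunks(n::Int, steps::Int) = [1:min(n, cld(n, steps) * k) for k in 1:steps]

println(chunks(20, 4))   # growing prefixes 1:5, 1:10, 1:15, 1:20
```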
### InformationGeometry.RobustFit — Function

```julia
RobustFit(DM::AbstractDataModel, start::AbstractVector{<:Number}; tol::Real=1e-10, p::Real=1, kwargs...)
```

Uses the `p`-norm specified by the keyword `p` to judge distances on the data space.