diff --git a/paper.md b/paper.md
index 3bed463..b8e4b56 100644
--- a/paper.md
+++ b/paper.md
@@ -51,7 +51,7 @@ Addressing real-world challenges across diverse domains, including engineering,
 $$
 x^* = \operatorname*{argmin}_{x\in\mathcal{X}} f(x),
 $$
-where, $f:\mathcal{X}\to\mathbb{R}$ is some objective function over the state space $\mathcal{X}$. While in many cases, gradient-based methods achieve state-of-the-art performance, there are various scenarios where so-called derivative-free methods are more appropriate. This can be attributed to the unavailability or difficulty in evaluating the gradient of $f$. Additionally, it might be that $f$ is non-smooth or non-convex, which also hinders the applicability of gradient-based methods.
+where $f\colon\mathcal{X}\to\mathbb{R}$ is some objective function over the state space $\mathcal{X}$. While gradient-based methods achieve state-of-the-art performance in many cases, there are various scenarios where so-called derivative-free methods are more appropriate: the gradient of $f$ may be unavailable or difficult to evaluate, or $f$ may be non-smooth or non-convex, which hinders the applicability of gradient-based methods.
 
 Numerous techniques exist for derivative-free optimization, such as random or pattern search [@friedman1947planning;@rastrigin1963convergence;@hooke1961direct], Bayesian optimization [@movckus1975bayesian] or simulated annealing [@henderson2003theory]. However, we focus on particle-based methods, specifically on consensus-based optimization (CBO) as proposed in [@pinnau2017consensus]. For an ensemble of $N$ particles $x=(x^1,\ldots, x^N)\in \mathcal{X}^N$, the update of the $i$th particle is given by
 $$
@@ -79,7 +79,7 @@ We summarize the motivation and main features of the packages in what follows.
 
 # Mathematical background
 
-CBO methods use a finite number of agents $x=(x^1,\dots,x^N)$ to explore the domain and to form a global consensus about the location of the minimizer $x^*$ as time passes. They are described through a system of stochastic differential equations (SDEs), expressed in Ito's formula as
+CBO methods use a finite number of agents $x=(x^1,\dots,x^N)$ to explore the domain and to form a global consensus about the location of the minimizer $x^*$ as time passes. They are described through a system of stochastic differential equations (SDEs):
 $$
 dx^i_t = -\lambda\ \underbrace{(x^i_t-c_\alpha(x_t))\,dt}_{\text{consensus drift}} + \sigma\ \underbrace{D(x^i_t-c_\alpha(x_t))\,dB^i_t}_{\text{scaled diffusion}},
 $$
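To make the discretized dynamics concrete, here is a minimal NumPy sketch of one Euler-Maruyama step of the above SDE system. This is an illustrative sketch, not the CBXPy implementation: it assumes isotropic diffusion, $D(y) = \lVert y \rVert\,\mathrm{Id}$, and the name `cbo_step` and its parameter defaults are made up. The consensus point $c_\alpha$ is the exponentially weighted mean defined in the next hunk.

```python
import numpy as np

def cbo_step(x, f, lam=1.0, sigma=0.5, alpha=30.0, dt=0.01, rng=None):
    """One Euler-Maruyama step of the CBO dynamics (illustrative sketch).

    x : (N, d) array of particle positions
    f : objective mapping an (N, d) array to an (N,) array of values
    Assumes isotropic diffusion, D(y) = ||y|| * Id.
    """
    rng = np.random.default_rng() if rng is None else rng
    fx = f(x)
    # Consensus point c_alpha: exponentially weighted mean of the particles.
    # Subtracting fx.min() leaves the weight ratios unchanged but avoids
    # underflow of exp(-alpha * f) for large alpha.
    w = np.exp(-alpha * (fx - fx.min()))
    c = (w[:, None] * x).sum(axis=0) / w.sum()
    diff = x - c
    drift = -lam * diff * dt                            # consensus drift
    noise = sigma * np.linalg.norm(diff, axis=1, keepdims=True) \
        * np.sqrt(dt) * rng.standard_normal(x.shape)    # dB ~ N(0, dt * Id)
    return x + drift + noise

# Hypothetical usage on f(x) = ||x - 1||^2, minimized at (1, 1):
x = np.random.default_rng(0).standard_normal((50, 2))
for _ in range(1000):
    x = cbo_step(x, lambda y: ((y - 1.0) ** 2).sum(axis=1))
print(x.mean(axis=0))  # for these illustrative parameters, close to [1, 1]
```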
@@ -89,14 +89,14 @@ Global information about the objective function is encoded in the consensus point
 $$
 c_\alpha(x_t) = \frac{1}{\sum_{i=1}^N \omega_\alpha(x^i_t)} \sum_{i=1}^N x^i_t\ \omega_\alpha(x^i_t), \quad\text{ with }\quad \omega_\alpha(\,\cdot\,) = \mathrm{exp}(-\alpha f(\,\cdot\,)).
 $$
-Each particle is driven by a drift toward the consensus (confinement) and subject to a scaled diffusion (exploration). The scaling factor of the diffusion is proportional to the distance of the particle from the consensus point. Hence, whenever a particle's position and the location of the weighted mean coincide, the particle stops moving. Concerning the analysis of the methods, the main challenge is to balance the drift and diffusion term in such a way that all particles converge to the consensus point, which is located at the globally best position of the state space. If the drift is too strong, the convergence may happen prematurely.
-On the other hand, if the diffusion is too strong, there may be no convergence at all. The choice of the weight function defining the mean is motivated by the Laplace principle [@dembo1998large], which ensures that the consensus point converges to the position of the particle with best objective value (assuming that this particle is unique). From a computational perspective, the method is attractive as the particle interactions scale linearly with the number of particles.
+Each particle is driven by a drift toward the consensus (confinement) and subject to a scaled diffusion (exploration). The scaling factor of the diffusion is proportional to the distance of the particle from the consensus point. Hence, whenever the position of a particle and the location of the weighted mean coincide, the particle stops moving. Concerning the analysis of the methods, the main challenge is to balance the drift and diffusion terms in such a way that (i) all particles converge to a consensus point and (ii) the consensus point is close to the global minimizer $x^*$. If the drift is too strong, consensus formation may happen prematurely, far from the global minimizer. On the other hand, if the diffusion is too strong, there may be no convergence at all. The choice of the weight function defining the mean is motivated by the Laplace principle [@dembo1998large], which ensures that the consensus point is close to the position of the particle with best objective value (assuming that this particle is unique). From a computational perspective, the method is attractive as the particle interactions scale linearly with the number of particles.
 
 A theoretical convergence analysis is not conducted directly on the above SDE system due to its highly complex behavior, but on its macroscopic mean-field limit (infinite-particle limit) [@huang2021MFLCBO], which can be described by a nonlinear nonlocal Fokker-Planck equation [@pinnau2017consensus;@carrillo2018analytical;@carrillo2021consensus;@fornasier2021consensus;@fornasier2021convergence]. The implemented CBO code originates from a simple Euler-Maruyama time discretization of the above SDE system. A convergence statement is therefore available in [@fornasier2021consensus;@fornasier2021convergence]. Similar analysis techniques have further made it possible to obtain theoretical convergence guarantees for a variety of CBO variants [@bungert2022polarized;@riedl2022leveraging;@fornasier2023consensus] as well as for PSO [@qiu2022PSOconvergence].
 
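The Laplace-principle claim above can be checked numerically: as $\alpha$ grows, $c_\alpha(x)$ approaches the position of the best particle in the ensemble. The sketch below is again illustrative (the helper `consensus_point` and the toy objective are made up); computing the weights in log-space with a max-shift keeps $\mathrm{exp}(-\alpha f)$ numerically stable for large $\alpha$.

```python
import numpy as np

def consensus_point(x, fx, alpha):
    """c_alpha as a weighted mean of the particles; the log-sum-exp
    shift keeps exp(-alpha * f) from underflowing at large alpha."""
    logw = -alpha * fx
    w = np.exp(logw - logw.max())
    return (w[:, None] * x).sum(axis=0) / w.sum()

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 2))   # random ensemble of 100 particles in 2d
fx = (x ** 2).sum(axis=1)           # toy objective f(x) = ||x||^2
best = x[fx.argmin()]               # particle with the best objective value
for alpha in (1.0, 10.0, 100.0, 1000.0):
    gap = np.linalg.norm(consensus_point(x, fx, alpha) - best)
    print(f"alpha = {alpha:6.1f}   ||c_alpha - x_best|| = {gap:.2e}")
```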
-As of now, CBX methods have been deployed in several different settings and for different purposes, such as for solving constrained optimization problems [@fornasier2020consensus_sphere_convergence;@borghi2021constrained], multi-objective optimizations [@borghi2022adaptive;@klamroth2022consensus], saddle point problems [@huang2022consensus], federated learning tasks [@carrillo2023fedcbo], in the setting of uncertainty quantification [@althaus2023consensus] or for sampling [@carrillo2022consensus].
+At present, CBX methods have been deployed in several different settings and for different purposes, such as constrained optimization [@fornasier2020consensus_sphere_convergence;@borghi2021constrained], multi-objective optimization [@borghi2022adaptive;@klamroth2022consensus], saddle point problems [@huang2022consensus], federated learning [@carrillo2023fedcbo], uncertainty quantification [@althaus2023consensus], and sampling [@carrillo2022consensus].
 
 In addition, recent work [@riedl2023gradient] establishes a connection between CBO and stochastic gradient descent-type methods, suggesting a more fundamental link of theoretical interest between derivative-free and gradient-based methods.
 
 # Features of CBXPy