The structure of GPflowOpt¶
Joachim van der Herten
In this document, the structure of the GPflowOpt library is explained,
including some small examples. First the Domain
and Optimizer
interfaces are shortly illustrated, followed by a description of the
BayesianOptimizer
. At the end, a step-by-step walkthrough of the
BayesianOptimizer
is given.
Optimization¶
The underlying design principles of GPflowOpt were chosen to address the following task: optimizing problems of the form
The objective function \(f: \mathcal{X} \rightarrow \mathbb{R}^p\) maps a candidate optimum to a score (or multiple). Here \(\mathcal{X}\) represents the input domain. This domain encloses all candidate solutions to the optimization problem and can be entirely continuous (i.e., a \(d\)-dimensional hypercube) but may also consist of discrete and categorical parameters.
In GPflowOpt, the Domain
and Optimizer
interfaces and
corresponding subclasses are used to explicitly represent the
optimization problem.
Objective function¶
The objective function itself is provided and must be implemented as any python callable (function or object with implemented call operator), accepting a two dimensional numpy array with shape \((n, d)\) as an input, with \(n\) the batch size. It returns a tuple of numpy arrays: the first element of the tuple has shape \((n, p)\) and returns the objective scores for each point to evaluate. The second element is the gradient in every point, shaped either \((n, d)\) if we have a single-objective optimization problem, or \((n, d, p)\) in case of a multi-objective function. If the objective function is passed on to a gradient-free optimization method, the gradient is automatically discarded. GPflowOpt provides decorators which handle batch application of a function along the n points of the input matrix, or dealing with functions which accept each feature as function argument.
Here, we define a simple quadratic objective function:
In [5]:
import numpy as np
def fx(X):
X = np.atleast_2d(X)
# Return objective & gradient
return np.sum(np.square(X), axis=1)[:,None], 2*X
Domain¶
Then, we represent \(\mathcal{X}\) by composing parameters. This is how a simple continuous square domain is defined:
In [6]:
from gpflowopt.domain import ContinuousParameter
domain = ContinuousParameter('x1', -2, 2) + ContinuousParameter('x2', -1, 2)
domain
Out[6]:
Name | Type | Values |
x1 | Continuous | [-2. 2.] |
x2 | Continuous | [-1. 2.] |
Optimize¶
Based on the domain and a valid objective function, we can now easily
apply one of the included optimizers to optimize objective functions.
GPflowOpt defines an intuitive Optimizer
interface which can be used
to specify the domain, the initial point(s), constraints (to be
implemented) etc. Some popular optimization approaches are provided.
Here is how our function is optimized using one of the available methods
of SciPy’s minimize:
In [7]:
from gpflowopt.optim import SciPyOptimizer
optimizer = SciPyOptimizer(domain, method='SLSQP')
optimizer.set_initial([-1,-1])
optimizer.optimize(fx)
Out[7]:
fun: 0.0
jac: array([ 0., 0.])
message: 'Optimization terminated successfully.'
nfev: 3
nit: 2
njev: 2
status: 0
success: True
x: array([[ 0., 0.]])
And here is how we optimize it Monte-Carlo. We can pass the same function as the gradients are automatically discarded.
In [8]:
from gpflowopt.optim import MCOptimizer
optimizer = MCOptimizer(domain, 200)
optimizer.optimize(fx)
Out[8]:
fun: array([[ 0.02115951]])
message: 'OK'
nfev: 200
success: True
x: array([[ 0.05731395, -0.13369599]])
Bayesian Optimization¶
In Bayesian Optimization (BO), the typical assumption is that \(f\) is expensive to evaluate and no gradients are available. The typical approach is to sequentially select a limited set of decisions \(\boldsymbol{x}_0, \boldsymbol{x}_1, ... \boldsymbol{x}_{n-1}\) using a sampling policy. Hence each decision \(\boldsymbol{x}_i \in \mathcal{X}\) itself is the result of an optimization problem
Each iteration, a function \(\alpha_i\) which is cheap-to-evaluate
acts as a surrogate for the expensive function. It is typically a
mapping of the predictive distribution of a (Bayesian) model built on
all decisions and their corresponding (noisy) evaluations. The mapping
introduces an order in \(\mathcal{X}\) to obtain a certain goal. The
typical goal within the context of BO is the search for optimality or
feasibility while keeping the amount of required evaluations
(\(n\)) a small number. As we can have several functions \(f\)
representing objectives and constraints, BO may invoke several models
and mappings \(\alpha\). These mappings are typically referred to as
acquisition functions (or infill criteria). GPflowOpt defines an
Acquisition
interface to implement these mappings and provides
implementations of some default choices. In combination with a special
Optimizer
implementation for BO, following steps are required for a
typical workflow:
- Define the problem domain. Its dimensionality matches the input to the objective and constraint functions. (like normal optimization)
- Specify the (GP) models for the constraints and objectives. This involves choice of kernels, priors, fixes, transforms... this step follows the standard way of setting up GPflow models. GPflowOpt does not further wrap models hence it is possible to implement custom models in GPflow and use them directly in GPflowOpt
- Set up the acquisition function(s) using the available built-in
implementations in GPflowOpt, or design your own by implementing the
Acquisition
interface. - Set up an optimizer for the acquisition function.
- Run the high-level
BayesianOptimizer
which implements a typical BO flow.BayesianOptimizer
in itself is compliant with theOptimizer
interface. Exceptionally, theBayesianOptimizer
requires that the objective function returns no gradients.
Alternatively, advanced users requiring finer control can easily implement their own flow based on the low-level interfaces of GPflowOpt, as the coupling between these objects was intentionally kept loose.
As illustration of the described flow, the previous example is optimized using Bayesian optimization instead, with the well-known Expected Improvement acquisition function:
In [9]:
from gpflowopt.bo import BayesianOptimizer
from gpflowopt.design import FactorialDesign
from gpflowopt.acquisition import ExpectedImprovement
import gpflow
# The Bayesian Optimizer does not expect gradients to be returned
def fx(X):
X = np.atleast_2d(X)
# Return objective & gradient
return np.sum(np.square(X), axis=1)[:,None]
X = FactorialDesign(2, domain).generate()
Y = fx(X)
# initializing a standard BO model, Gaussian Process Regression with
# Matern52 ARD Kernel
model = gpflow.gpr.GPR(X, Y, gpflow.kernels.Matern52(2, ARD=True))
alpha = ExpectedImprovement(model)
# Now we must specify an optimization algorithm to optimize the acquisition
# function, each iteration.
acqopt = SciPyOptimizer(domain)
# Now create the Bayesian Optimizer
optimizer = BayesianOptimizer(domain, alpha, optimizer=acqopt)
with optimizer.silent():
r = optimizer.optimize(fx, n_iter=15)
print(r)
fun: array([ 0.17684308])
message: 'OK'
nfev: 15
success: True
x: array([[ 0. , 0.42052714]])
This brief snippet code starts from a 2-level grid (corner points of the
domain) and uses a GP model to model the response surface of the
objective function. The BayesianOptimizer
follows the same interface
as other optimizers and is initialized with a domain, the acquisition
function and an additional optimization method to optimize the
acquisition function each iteration. Finally, the optimizer performs 10
iterations to optimize fx.
The code to evaluate the acquisition function on the model is written in TensorFlow, allowing gradient-based optimization without additional effort due to the automated differentation.
Step-by-step description of the BayesianOptimizer¶
Prior to running BayesianOptimizer.optimize()
, the acquisition
function is initialized with an underlying model. Any data previously
included in the model (through the GPModel.__init__
constructor in
GPflow) is used as initial data. When optimize(function, n_iter)
is
called:
- Any data points returned by
get_initial()
are evaluated. Afterwards the evaluated points are added to the models by calling_update_model_data()
. n_iter
iterations are performed. Each iteration the acquisition function is optimized, and the models are updated by calling_update_model_data()
The updating of a model through _update_model_data()
calls
set_data(X, Y)
on the acquisition function. This covers following
aspects:
GPModel.X
andGPModel.Y
are updated- Each of the contained models are returned to the state when the
acquisition function was initialized and optimized. If the
optimize_restarts
parameter of theAcquisition.__init__()
was set to \(n>1\), the state of the model is randomized and optimized \(n-1\) times. Finally, the state resulting in the bestlog_likelihood()
is the new model state - Call
Acquisition.setup()
to perform any pre-calculation of quantities independent of candidate points, which can be used inbuild_acquisition()
.
The GPflow tree¶
The Acquisition
interface, mapping the belief of the model(s) to a
score indicating areas of optimality/feasibility, is implemented as part
of the GPflow tree
structure.
More specifically it implements the Parameterized
interface
permitting the use of the useful AutoFlow
decorator. The
build_acquisition()
method to be implemented by subclasses is a
TensorFlow method, allowing automated differentiation of the acquisition
function which enables gradient-based optimization thereof (not of the
objective!). It may directly access the graph for computing the
predictive distribution of a model by calling build_predict()
.