# The structure of GPflowOpt¶

*Joachim van der Herten*

In this document, the structure of the GPflowOpt library is explained,
including some small examples. First the `Domain`

and `Optimizer`

interfaces are shortly illustrated, followed by a description of the
`BayesianOptimizer`

. At the end, a step-by-step walkthrough of the
`BayesianOptimizer`

is given.

## Optimization¶

The underlying design principles of GPflowOpt were chosen to address the following task: optimizing problems of the form

The *objective function* \(f: \mathcal{X} \rightarrow \mathbb{R}^p\)
maps a candidate optimum to a score (or multiple). Here
\(\mathcal{X}\) represents the input domain. This domain encloses
all candidate solutions to the optimization problem and can be entirely
continuous (i.e., a \(d\)-dimensional hypercube) but may also
consist of discrete and categorical parameters.

In GPflowOpt, the `Domain`

and `Optimizer`

interfaces and
corresponding subclasses are used to explicitly represent the
optimization problem.

### Objective function¶

The objective function itself is provided and must be implemented as any python callable (function or object with implemented call operator), accepting a two dimensional numpy array with shape \((n, d)\) as an input, with \(n\) the batch size. It returns a tuple of numpy arrays: the first element of the tuple has shape \((n, p)\) and returns the objective scores for each point to evaluate. The second element is the gradient in every point, shaped either \((n, d)\) if we have a single-objective optimization problem, or \((n, d, p)\) in case of a multi-objective function. If the objective function is passed on to a gradient-free optimization method, the gradient is automatically discarded. GPflowOpt provides decorators which handle batch application of a function along the n points of the input matrix, or dealing with functions which accept each feature as function argument.

Here, we define a simple quadratic objective function:

```
In [5]:
```

```
import numpy as np
def fx(X):
X = np.atleast_2d(X)
# Return objective & gradient
return np.sum(np.square(X), axis=1)[:,None], 2*X
```

### Domain¶

Then, we represent \(\mathcal{X}\) by composing parameters. This is how a simple continuous square domain is defined:

```
In [6]:
```

```
from gpflowopt.domain import ContinuousParameter
domain = ContinuousParameter('x1', -2, 2) + ContinuousParameter('x2', -1, 2)
domain
```

```
Out[6]:
```

Name | Type | Values |

x1 | Continuous | [-2. 2.] |

x2 | Continuous | [-1. 2.] |

### Optimize¶

Based on the domain and a valid objective function, we can now easily
apply one of the included optimizers to optimize objective functions.
GPflowOpt defines an intuitive `Optimizer`

interface which can be used
to specify the domain, the initial point(s), constraints (to be
implemented) etc. Some popular optimization approaches are provided.
Here is how our function is optimized using one of the available methods
of SciPy’s minimize:

```
In [7]:
```

```
from gpflowopt.optim import SciPyOptimizer
optimizer = SciPyOptimizer(domain, method='SLSQP')
optimizer.set_initial([-1,-1])
optimizer.optimize(fx)
```

```
Out[7]:
```

```
fun: 0.0
jac: array([ 0., 0.])
message: 'Optimization terminated successfully.'
nfev: 3
nit: 2
njev: 2
status: 0
success: True
x: array([[ 0., 0.]])
```

And here is how we optimize it Monte-Carlo. We can pass the same function as the gradients are automatically discarded.

```
In [8]:
```

```
from gpflowopt.optim import MCOptimizer
optimizer = MCOptimizer(domain, 200)
optimizer.optimize(fx)
```

```
Out[8]:
```

```
fun: array([[ 0.02115951]])
message: 'OK'
nfev: 200
success: True
x: array([[ 0.05731395, -0.13369599]])
```

## Bayesian Optimization¶

In Bayesian Optimization (BO), the typical assumption is that \(f\) is expensive to evaluate and no gradients are available. The typical approach is to sequentially select a limited set of decisions \(\boldsymbol{x}_0, \boldsymbol{x}_1, ... \boldsymbol{x}_{n-1}\) using a sampling policy. Hence each decision \(\boldsymbol{x}_i \in \mathcal{X}\) itself is the result of an optimization problem

Each iteration, a function \(\alpha_i\) which is cheap-to-evaluate
acts as a surrogate for the expensive function. It is typically a
mapping of the predictive distribution of a (Bayesian) model built on
all decisions and their corresponding (noisy) evaluations. The mapping
introduces an order in \(\mathcal{X}\) to obtain a certain goal. The
typical goal within the context of BO is the search for *optimality* or
*feasibility* while keeping the amount of required evaluations
(\(n\)) a small number. As we can have several functions \(f\)
representing objectives and constraints, BO may invoke several models
and mappings \(\alpha\). These mappings are typically referred to as
*acquisition functions* (or *infill criteria*). GPflowOpt defines an
`Acquisition`

interface to implement these mappings and provides
implementations of some default choices. In combination with a special
`Optimizer`

implementation for BO, following steps are required for a
typical workflow:

- Define the
**problem domain**. Its dimensionality matches the input to the objective and constraint functions. (like normal optimization) - Specify the
**(GP) models**for the constraints and objectives. This involves choice of kernels, priors, fixes, transforms… this step follows the standard way of setting up GPflow models. GPflowOpt does not further wrap models hence it is possible to implement custom models in GPflow and use them directly in GPflowOpt - Set up the
**acquisition function(s)**using the available built-in implementations in GPflowOpt, or design your own by implementing the`Acquisition`

interface. - Set up an
**optimizer**for the acquisition function. - Run the
**high-level**`BayesianOptimizer`

which implements a typical BO flow.`BayesianOptimizer`

in itself is compliant with the`Optimizer`

interface. Exceptionally, the`BayesianOptimizer`

requires that the objective function returns**no gradients**.

Alternatively, advanced users requiring finer control can easily implement their own flow based on the low-level interfaces of GPflowOpt, as the coupling between these objects was intentionally kept loose.

As illustration of the described flow, the previous example is optimized using Bayesian optimization instead, with the well-known Expected Improvement acquisition function:

```
In [9]:
```

```
from gpflowopt.bo import BayesianOptimizer
from gpflowopt.design import FactorialDesign
from gpflowopt.acquisition import ExpectedImprovement
import gpflow
# The Bayesian Optimizer does not expect gradients to be returned
def fx(X):
X = np.atleast_2d(X)
# Return objective & gradient
return np.sum(np.square(X), axis=1)[:,None]
X = FactorialDesign(2, domain).generate()
Y = fx(X)
# initializing a standard BO model, Gaussian Process Regression with
# Matern52 ARD Kernel
model = gpflow.gpr.GPR(X, Y, gpflow.kernels.Matern52(2, ARD=True))
alpha = ExpectedImprovement(model)
# Now we must specify an optimization algorithm to optimize the acquisition
# function, each iteration.
acqopt = SciPyOptimizer(domain)
# Now create the Bayesian Optimizer
optimizer = BayesianOptimizer(domain, alpha, optimizer=acqopt)
with optimizer.silent():
r = optimizer.optimize(fx, n_iter=15)
print(r)
```

```
fun: array([ 0.17684308])
message: 'OK'
nfev: 15
success: True
x: array([[ 0. , 0.42052714]])
```

This brief snippet code starts from a 2-level grid (corner points of the
domain) and uses a GP model to model the response surface of the
objective function. The `BayesianOptimizer`

follows the same interface
as other optimizers and is initialized with a domain, the acquisition
function and an additional optimization method to optimize the
acquisition function each iteration. Finally, the optimizer performs 10
iterations to optimize fx.

The code to evaluate the acquisition function on the model is written in TensorFlow, allowing gradient-based optimization without additional effort due to the automated differentation.

### Step-by-step description of the BayesianOptimizer¶

Prior to running `BayesianOptimizer.optimize()`

, the acquisition
function is initialized with an underlying model. Any data previously
included in the model (through the `GPModel.__init__`

constructor in
GPflow) is used as initial data. When `optimize(function, n_iter)`

is
called:

- Any data points returned by
`get_initial()`

are evaluated. Afterwards the evaluated points are added to the models by calling`_update_model_data()`

. `n_iter`

iterations are performed. Each iteration the acquisition function is optimized, and the models are updated by calling`_update_model_data()`

The updating of a model through `_update_model_data()`

calls
`set_data(X, Y)`

on the acquisition function. This covers following
aspects:

`GPModel.X`

and`GPModel.Y`

are updated- Each of the contained models are returned to the state when the
acquisition function was initialized and optimized. If the
`optimize_restarts`

parameter of the`Acquisition.__init__()`

was set to \(n>1\), the state of the model is randomized and optimized \(n-1\) times. Finally, the state resulting in the best`log_likelihood()`

is the new model state - Call
`Acquisition.setup()`

to perform any pre-calculation of quantities independent of candidate points, which can be used in`build_acquisition()`

.

## The GPflow tree¶

The `Acquisition`

interface, mapping the belief of the model(s) to a
score indicating areas of optimality/feasibility, is implemented as part
of the GPflow tree
structure.
More specifically it implements the `Parameterized`

interface
permitting the use of the useful `AutoFlow`

decorator. The
`build_acquisition()`

method to be implemented by subclasses is a
TensorFlow method, allowing automated differentiation of the acquisition
function which enables gradient-based optimization thereof (not of the
objective!). It may directly access the graph for computing the
predictive distribution of a model by calling `build_predict()`

.