GPflowOpt Documentation¶
Introduction¶
GPflowOpt is a library for Bayesian Optimization with GPflow. It makes use of TensorFlow for computation of acquisition functions, to offer scalability, and avoid implementation of gradients. The package was created, and is currently maintained by Joachim van der Herten and Ivo Couckuyt
The project is open source: if you feel you have some relevant skills and are interested in contributing then please contact us on GitHub by opening an issue or pull request.
Install¶
- Install package
A straightforward way to install GPflowOpt is to clone its repository and run
pip install . --process-dependency-links
in the root folder. This also installs required dependencies including TensorFlow. For alternative TensorFlow installations (e.g., gpu), please see the instructions on the main TensorFlow webpage.
- Development
GPflowOpt is a pure python library so you could just add it to your python path. We use
pip install -e . --process-dependency-links
- Testing and documentation
The tests require some additional dependencies that need to be installed first with
pip install -e .[test]
. Afterwards the tests can be run with python setup.py test
.
Similarly, to build the documentation, first install the extra dependencies with
pip install -e .[docs]
. Then proceed with python setup.py build_sphinx
.
Getting started¶
A simple example of Bayesian optimization to get up and running is provided by the first steps into Bayesian optimization notebook
For more advanced use cases have a look at the other tutorial notebooks and the API and architecture.
Acknowledgements¶
Joachim van der Herten and Ivo Couckuyt are Ghent University - imec postdoctoral fellows. Ivo Couckuyt is supported by FWO Vlaanderen.
Tutorials and examples¶
First steps into Bayesian optimization¶
Ivo Couckuyt, Joachim van der Herten
Introduction¶
Bayesian optimization is particularly useful for expensive optimization problems. This includes optimization problems where the objective (and constraints) are time-consuming to evaluate: measurements, engineering simulations, hyperparameter optimization of deep learning models, etc. Another area where Bayesian optimization may provide a benefit is in the presence of (a lot of) noise. If your problem does not satisfy these requirements other optimization algorithms might be better suited.
To setup a Bayesian optimization scheme with GPflowOpt you have to:
- define your objective and specify the optimization domain
- setup a GPflow model and choose an acquisition function
- create a BayesianOptimizer
Objective function¶
In [1]:
import numpy as np
from gpflowopt.domain import ContinuousParameter
def fx(X):
X = np.atleast_2d(X)
return np.sum(np.square(X), axis=1)[:, None]
domain = ContinuousParameter('x1', -2, 2) + ContinuousParameter('x2', -1, 2)
domain
Out[1]:
Name | Type | Values |
x1 | Continuous | [-2. 2.] |
x2 | Continuous | [-1. 2.] |
Bayesian optimizer¶
In [2]:
import gpflow
from gpflowopt.bo import BayesianOptimizer
from gpflowopt.design import LatinHyperCube
from gpflowopt.acquisition import ExpectedImprovement
from gpflowopt.optim import SciPyOptimizer
# Use standard Gaussian process Regression
lhd = LatinHyperCube(21, domain)
X = lhd.generate()
Y = fx(X)
model = gpflow.gpr.GPR(X, Y, gpflow.kernels.Matern52(2, ARD=True))
model.kern.lengthscales.transform = gpflow.transforms.Log1pe(1e-3)
# Now create the Bayesian Optimizer
alpha = ExpectedImprovement(model)
optimizer = BayesianOptimizer(domain, alpha)
# Run the Bayesian optimization
with optimizer.silent():
r = optimizer.optimize(fx, n_iter=15)
print(r)
Warning: optimization restart 1/5 failed
Warning: optimization restart 2/5 failed
Warning: optimization restart 3/5 failed
Warning: optimization restart 2/5 failed
fun: array([ 0.01])
message: 'OK'
nfev: 15
success: True
x: array([[ 0. , -0.1]])
That’s all! Your objective function has now been optimized for 15 iterations.
Bayesian Optimization with black-box constraints¶
Joachim van der Herten
Introduction¶
This notebook demonstrates the optimization of an analytical function using the well known Expected Improvement (EI) function. The problem is constrained by a black-box constraint function. The feasible regions are learnt jointly with the optimal regions by considering a second acquisition function known as the Probability of Feasibility (PoF), following the approach of Gardner et al. (2014)
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import gpflow
import gpflowopt
import numpy as np
Constrained problem¶
First we set up an objective function (the townsend function) and a constraint function. We further assume both functions are black-box. We also define the optimization domain (2 continuous parameters).
In [2]:
# Objective & constraint
def townsend(X):
return -(np.cos((X[:,0]-0.1)*X[:,1])**2 + X[:,0] * np.sin(3*X[:,0]+X[:,1]))[:,None]
def constraint(X):
return -(-np.cos(1.5*X[:,0]+np.pi)*np.cos(1.5*X[:,1])+np.sin(1.5*X[:,0]+np.pi)*np.sin(1.5*X[:,1]))[:,None]
# Setup input domain
domain = gpflowopt.domain.ContinuousParameter('x1', -2.25, 2.5) + \
gpflowopt.domain.ContinuousParameter('x2', -2.5, 1.75)
# Plot
def plotfx():
X = gpflowopt.design.FactorialDesign(101, domain).generate()
Zo = townsend(X)
Zc = constraint(X)
mask = Zc>=0
Zc[mask] = np.nan
Zc[np.logical_not(mask)] = 1
Z = Zo * Zc
shape = (101, 101)
f, axes = plt.subplots(1, 1, figsize=(7, 5))
axes.contourf(X[:,0].reshape(shape), X[:,1].reshape(shape), Z.reshape(shape))
axes.set_xlabel('x1')
axes.set_ylabel('x2')
axes.set_xlim([domain.lower[0], domain.upper[0]])
axes.set_ylim([domain.lower[1], domain.upper[1]])
return axes
plotfx();

Modeling and joint acquisition function¶
We proceed by assigning the objective and constraint function a GP prior. Both functions are evaluated on a space-filling set of points (here, a Latin Hypercube design). Two GPR models are created. The EI is based on the model of the objective function (townsend), whereas PoF is based on the model of the constraint function. We then define the joint criterioin as the product of the EI and PoF.
In [3]:
# Initial evaluations
design = gpflowopt.design.LatinHyperCube(11, domain)
X = design.generate()
Yo = townsend(X)
Yc = constraint(X)
# Models
objective_model = gpflow.gpr.GPR(X, Yo, gpflow.kernels.Matern52(2, ARD=True))
objective_model.likelihood.variance = 0.01
constraint_model = gpflow.gpr.GPR(np.copy(X), Yc, gpflow.kernels.Matern52(2, ARD=True))
constraint_model.kern.lengthscales.transform = gpflow.transforms.Log1pe(1e-3)
constraint_model.likelihood.variance = 0.01
constraint_model.likelihood.variance.prior = gpflow.priors.Gamma(1./4.,1.0)
# Setup
ei = gpflowopt.acquisition.ExpectedImprovement(objective_model)
pof = gpflowopt.acquisition.ProbabilityOfFeasibility(constraint_model)
joint = ei * pof
Initial belief¶
We can now inspect our belief about the optimization problem by plotting the models, the EI, PoF and joint mappings. Both models clearly are not very accurate yet. More specifically, the constraint model does not correctly capture the feasibility yet.
In [4]:
def plot():
Xeval = gpflowopt.design.FactorialDesign(101, domain).generate()
Yevala,_ = joint.operands[0].models[0].predict_f(Xeval)
Yevalb,_ = joint.operands[1].models[0].predict_f(Xeval)
Yevalc = np.maximum(ei.evaluate(Xeval), 0)
Yevald = pof.evaluate(Xeval)
Yevale = np.maximum(joint.evaluate(Xeval), 0)
shape = (101, 101)
plots = [('Objective model', Yevala), ('Constraint model', Yevalb),
('EI', Yevalc), ('PoF', Yevald),
('EI * PoF', Yevale)]
plt.figure(figsize=(10,10))
for i, plot in enumerate(plots):
if i == 4:
ax = plt.subplot2grid((3, 4), (2, 1), colspan=2)
else:
ax = plt.subplot2grid((3, 2), (int(i/2), i % 2))
ax.contourf(Xeval[:,0].reshape(shape), Xeval[:,1].reshape(shape), plot[1].reshape(shape))
ax.scatter(joint.data[0][:,0], joint.data[0][:,1], c='w')
ax.set_title(plot[0])
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_xlim([domain.lower[0], domain.upper[0]])
ax.set_ylim([domain.lower[1], domain.upper[1]])
plt.tight_layout()
# Plot representing the model belief, and the belief mapped to EI and PoF
plot()
print(constraint_model)
name.kern.lengthscales transform:+ve prior:None
[ 0.15481272 0.14080552]
name.kern.variance transform:+ve prior:None
[ 0.63087138]
name.likelihood.variance transform:+ve prior:Ga([ 0.25],[ 1.])
[ 0.16824172]

Running Bayesian Optimizer¶
Running the Bayesian optimization is the next step. For this, we must set up an appropriate strategy to optimize the joint acquisition function. Sometimes this can be a bit challenging as often large non-varying areas may occur. A typical strategy is to apply a Monte Carlo optimization step first, then optimize the point with the best value (several variations exist). This approach is followed here. We then run the Bayesian Optimization and allow it to select up to 50 additional decisions.
The joint acquisition function assures the feasibility (w.r.t the constraint) is taken into account while selecting decisions for optimality.
In [5]:
# First setup the optimization strategy for the acquisition function
# Combining MC step followed by L-BFGS-B
acquisition_opt = gpflowopt.optim.StagedOptimizer([gpflowopt.optim.MCOptimizer(domain, 200),
gpflowopt.optim.SciPyOptimizer(domain)])
# Then run the BayesianOptimizer for 50 iterations
optimizer = gpflowopt.BayesianOptimizer(domain, joint, optimizer=acquisition_opt)
with optimizer.silent():
result = optimizer.optimize([townsend, constraint], n_iter=50)
print(result)
fun: array([[-3.19946868]])
message: 'OK'
nfev: 50
success: True
x: array([[-2.25 , -1.29648192]])
Results¶
If we now plot the belief, we clearly see the constraint model has improved significantly. More specifically, its PoF mapping is an accurate representation of the true constraint function. By multiplying the EI by the PoF, the search is restricted to the feasible regions.
In [6]:
# Plotting belief again
print(constraint_model)
plot()
name.kern.lengthscales transform:+ve prior:None
[ 0.46705724 0.49762857]
name.kern.variance transform:+ve prior:None
[ 9.14493985]
name.likelihood.variance transform:+ve prior:Ga([ 0.25],[ 1.])
[ 3.10492691e-06]

If we inspect the sampling distribution, we can see that the amount of samples in the infeasible regions is limited. The optimization has focussed on the feasible areas. In addition, it has been active mostly in two optimal regions.
In [7]:
# Plot function, overlayed by the constraint. Also plot the samples
axes = plotfx()
valid = joint.feasible_data_index()
axes.scatter(joint.data[0][valid,0], joint.data[0][valid,1], label='feasible data', c='w')
axes.scatter(joint.data[0][np.logical_not(valid),0], joint.data[0][np.logical_not(valid),1], label='data', c='r');
axes.legend()
Out[7]:
<matplotlib.legend.Legend at 0x7fd40a3a9e48>

Finally, the evolution of the best value over the number of iterations clearly shows a very good solution is already found after only a few evaluations.
In [8]:
f, axes = plt.subplots(1, 1, figsize=(7, 5))
f = joint.data[1][:,0]
f[joint.data[1][:,1] > 0] = np.inf
axes.plot(np.arange(0, joint.data[0].shape[0]), np.minimum.accumulate(f))
axes.set_ylabel('fmin')
axes.set_xlabel('Number of evaluated points');

Defining new acquisition functions¶
Joachim van der Herten
Introduction¶
GPflowOpt implements supports some acquisition functions for common
scenarios, such as EI and PoF. However, it is straightforward to
implement your own strategy. For most strategies, it is sufficient to
implement the Acquisition
interface. In case a more sophisticated
model is needed, this can easily be achieved with GPflow.
In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import copy
import gpflow
import gpflowopt
import tensorflow as tf
In [2]:
rng = np.random.RandomState(5)
def camelback(X):
f = (4. - 2.1*X[:,0]**2 + 0.3* X[:,0]**4) * X[:,0]**2 + np.prod(X,axis=1) + 4 * (X[:,1]**2-1) * X[:,1]**2
return f[:,None] + rng.rand(X.shape[0], 1) * 0.25
# Setup input domain
domain = gpflowopt.domain.ContinuousParameter('x1', -3, 3) + \
gpflowopt.domain.ContinuousParameter('x2', -2, 2)
Augmented expected improvement¶
As an example on how to implement a custom acquisition function, we illustrate the Augmented EI (Huang et al. 2006), a modification for Expected Improvement for optimization of noisy functions. It is defined as
This definition can be interpreted as rescaling of the EI score, with respect to the noise variance. For \(\sigma^2_n=0\), the rescale term equals 1 and normal EI is recovered. For \(\sigma^2_n > 0\), small prediction variances are punished, which decreases concentration of sampling and enhances exploration.
To implement this acquisition function, we could override the
build_acquisition
method of Acquisition
, however in this case
its easier to override ExpectedImprovement
and obtain the EI score
through a call to build_acquisition of the parent class.
In [3]:
class AugmentedEI(gpflowopt.acquisition.ExpectedImprovement):
def __init__(self, model):
super(AugmentedEI, self).__init__(model)
def build_acquisition(self, Xcand):
ei = super(AugmentedEI, self).build_acquisition(Xcand)
X = self.models[0].input_transform.build_forward(Xcand)
_, pvar = self.models[0].wrapped.build_predict(X)
return tf.multiply(ei, 1 - tf.sqrt(self.models[0].likelihood.variance)
/ (tf.sqrt(pvar + self.models[0].likelihood.variance)))
Results¶
This small experiment on the six hump camelback illustrates impact of the penalty term.
In [4]:
design = gpflowopt.design.LatinHyperCube(9, domain)
X = design.generate()
Y = camelback(X)
m = gpflow.gpr.GPR(X, Y, gpflow.kernels.Matern52(2, ARD=True, lengthscales=[10,10], variance=10000))
m.likelihood.variance = 1
m.likelihood.variance.fixed = True
ei = gpflowopt.acquisition.ExpectedImprovement(m)
m = gpflow.gpr.GPR(X, Y, gpflow.kernels.Matern52(2, ARD=True, lengthscales=[10,10], variance=10000))
m.likelihood.variance = 1
m.likelihood.variance.fixed = False
aei = AugmentedEI(m)
opt = gpflowopt.optim.StagedOptimizer([gpflowopt.optim.MCOptimizer(domain, 200),
gpflowopt.optim.SciPyOptimizer(domain)])
bopt1 = gpflowopt.BayesianOptimizer(domain, ei, optimizer=opt)
with bopt1.silent():
bopt1.optimize(camelback, n_iter=50)
bopt2 = gpflowopt.BayesianOptimizer(domain, aei, optimizer=opt)
with bopt2.silent():
bopt2.optimize(camelback, n_iter=50)
f, axes = plt.subplots(1,2, figsize=(14,7))
Xeval = gpflowopt.design.FactorialDesign(101, domain).generate()
Yeval = camelback(Xeval)
titles = ['EI', 'AEI']
shape = (101, 101)
for ax, t, acq in zip(axes, titles, [ei, aei]):
pred = acq.models[0].predict_f(Xeval)[0]
ax.contourf(Xeval[:,0].reshape(shape), Xeval[:,1].reshape(shape),
pred.reshape(shape))
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_title(t)
ax.scatter(acq.data[0][:,0], acq.data[0][:,1], c='w')

Bayesian Optimization of Hyperparameters¶
Vincent Dutordoir, Joachim van der Herten
Introduction¶
The paper Practical Bayesian Optimization of Machine Learning algorithms by Snoek et al. 2012 describes the use of Bayesian optimization for hyperparameter optimization. In this paper, the (at the time) state-of-the-art test error for convolutional neural networks on the CIFAR-10 dataset was improved significantly by optimizing the parameters of the training process with Bayesian optimization.
In this notebook we demonstrate the principle by optimizing the starting point of the maximum likelihood estimation of a GP. Note that we use a GP to optimize the initial hyperparameter values of another GP.
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
# Loading airline data
import numpy as np
data = np.load('airline.npz')
X_train, Y_train = data['X_train'], data['Y_train']
D = Y_train.shape[1];
Data set¶
The data to illustrate hyperparameter optimization is the well-known airline passenger volume data. It is a one-dimensional time series of the passenger volumes of airlines over time. We wish to use it to make forecasts. Plotting the data below, it is clear that the data contains a pattern.
In [2]:
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_xlabel('Time (years)')
ax.set_ylabel('Airline passengers ($10^3$)')
ax.plot(X_train.flatten(),Y_train.flatten(), c='b')
ax.set_xticklabels([1949, 1952, 1955, 1958, 1961, 1964])
plt.tight_layout()

Modeling¶
To forecast this timeseries, we will pick up its pattern using a Spectral Mixture kernel (Wilson et al, 2013). Essentially, this kernel is a sum of \(Q\) products of Cosine and RBF kernels. For this one-dimensional problem each term of the sum has 4 hyperparameters. To account for the upward trend, we also add a Linear and a Bias kernel.
In [3]:
from gpflow.kernels import RBF, Cosine, Linear, Bias, Matern52
from gpflow import transforms
from gpflow.gpr import GPR
Q = 10 # nr of terms in the sum
max_iters = 1000
# Trains a model with a spectral mixture kernel, given an ndarray of 2Q frequencies and lengthscales
def create_model(hypers):
f = np.clip(hypers[:Q], 0, 5)
weights = np.ones(Q) / Q
lengths = hypers[Q:]
kterms = []
for i in range(Q):
rbf = RBF(D, lengthscales=lengths[i], variance=1./Q)
rbf.lengthscales.transform = transforms.Exp()
cos = Cosine(D, lengthscales=f[i])
kterms.append(rbf * cos)
k = np.sum(kterms) + Linear(D) + Bias(D)
m = GPR(X_train, Y_train, kern=k)
return m
X_test, X_complete = data['X_test'], data['X_complete']
def plotprediction(m):
# Perform prediction
mu, var = m.predict_f(X_complete)
# Plot
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_xlabel('Time (years)')
ax.set_ylabel('Airline passengers ($10^3$)')
ax.set_xticklabels([1949, 1952, 1955, 1958, 1961, 1964, 1967, 1970, 1973])
ax.plot(X_train.flatten(),Y_train.flatten(), c='b')
ax.plot(X_complete.flatten(), mu.flatten(), c='g')
lower = mu - 2*np.sqrt(var)
upper = mu + 2*np.sqrt(var)
ax.plot(X_complete, upper, 'g--', X_complete, lower, 'g--', lw=1.2)
ax.fill_between(X_complete.flatten(), lower.flatten(), upper.flatten(),
color='g', alpha=.1)
plt.tight_layout()
m = create_model(np.ones((2*Q,)))
m.optimize(maxiter=max_iters)
plotprediction(m)

In total, a lot of hyperparameters must be optimized. Furthermore, the optimization surface of the spectral mixture is highly multimodal. Starting from the default hyperparameter values the optimized GP is able to pick up the linear trend, and the RBF kernels perform local interpolation. However, the kernel is not able to extrapolate away from the data. In sum, with this starting point, the likelihood optimization ends in a local minimum.
Hyperparameter optimization¶
This issue is a known problem of the spectram mixture kernel, and several recommendations exist on how to improve the starting point. Here, we will use GPflowOpt to optimize the initial values for the lengthscales of the RBF and the Cosine kernel (i.e., the frequencies of the latter kernel). The other hyperparameters (rbf and cosine variances, likelihood variances and the linear and bias terms) are kept at their defaults and will be optimized by the standard likelihood optimization.
First, we setup the objective function accepting proposed starting points. The objective function returns the negative log likelihood, obtained by optimizing the hyperparameters from the given starting point. Then, we setup the 20D optimization domain for the frequencies and lengthscales.
In [4]:
from gpflowopt.domain import ContinuousParameter
from gpflowopt.objective import batch_apply
# Objective function for our optimization
# Input: N x 2Q ndarray, output: N x 1.
# returns the negative log likelihood obtained by training with given frequencies and rbf lengthscales
# Applies some tricks for stability similar to GPy's jitchol
@batch_apply
def objectivefx(freq):
m = create_model(freq)
for i in [0] + [10**exponent for exponent in range(6,1,-1)]:
try:
mean_diag = np.mean(np.diag(m.kern.compute_K_symm(X_train)))
m.likelihood.variance = 1 + mean_diag * i
m.optimize(maxiter=max_iters)
return -m.compute_log_likelihood()
except:
pass
raise RuntimeError("Frequency combination failed indefinately.")
# Setting up optimization domain.
lower = [0.]*Q
upper = [5.]*int(Q)
df = np.sum([ContinuousParameter('freq{0}'.format(i), l, u) for i, l, u in zip(range(Q), lower, upper)])
lower = [1e-5]*Q
upper = [300]*int(Q)
dl = np.sum([ContinuousParameter('l{0}'.format(i), l, u) for i, l, u in zip(range(Q), lower, upper)])
domain = df + dl
domain
Out[4]:
Name | Type | Values |
freq0 | Continuous | [ 0. 5.] |
freq1 | Continuous | [ 0. 5.] |
freq2 | Continuous | [ 0. 5.] |
freq3 | Continuous | [ 0. 5.] |
freq4 | Continuous | [ 0. 5.] |
freq5 | Continuous | [ 0. 5.] |
freq6 | Continuous | [ 0. 5.] |
freq7 | Continuous | [ 0. 5.] |
freq8 | Continuous | [ 0. 5.] |
freq9 | Continuous | [ 0. 5.] |
l0 | Continuous | [ 1.00000000e-05 3.00000000e+02] |
l1 | Continuous | [ 1.00000000e-05 3.00000000e+02] |
l2 | Continuous | [ 1.00000000e-05 3.00000000e+02] |
l3 | Continuous | [ 1.00000000e-05 3.00000000e+02] |
l4 | Continuous | [ 1.00000000e-05 3.00000000e+02] |
l5 | Continuous | [ 1.00000000e-05 3.00000000e+02] |
l6 | Continuous | [ 1.00000000e-05 3.00000000e+02] |
l7 | Continuous | [ 1.00000000e-05 3.00000000e+02] |
l8 | Continuous | [ 1.00000000e-05 3.00000000e+02] |
l9 | Continuous | [ 1.00000000e-05 3.00000000e+02] |
High-dimensional Bayesian optimization is tricky, although the complexity of the problem is significantly reduced due to symmetry in the optimization domain (interchanging frequencies does not make a difference) and because we still optimize the likelihood given the starting point. Therefore, getting near a mode is sufficient. Furthermore we disable ARD of the kernel of the model approximating the objective function to avoid optimizing a lot of lengthscales with little data. We then use EI to pick new candidate starting points and evaluate our objective.
In [5]:
from gpflowopt.design import LatinHyperCube
from gpflowopt.acquisition import ExpectedImprovement
from gpflowopt import optim, BayesianOptimizer
design = LatinHyperCube(6, domain)
X = design.generate()
Y = objectivefx(X)
m = GPR(X, Y, kern=Matern52(domain.size, ARD=False))
ei = ExpectedImprovement(m)
opt = optim.StagedOptimizer([optim.MCOptimizer(domain, 5000), optim.SciPyOptimizer(domain)])
optimizer = BayesianOptimizer(domain, ei, optimizer=opt)
with optimizer.silent():
result = optimizer.optimize(objectivefx, n_iter=24)
Warning: inf or nan in gradient: replacing with zeros
Warning: inf or nan in gradient: replacing with zeros
Warning: inf or nan in gradient: replacing with zeros
Warning: inf or nan in gradient: replacing with zeros
In [6]:
m = create_model(result.x[0,:])
m.optimize()
plotprediction(m)

Clearly, the optimization point identified with BO is a lot better than the default values. We now obtain a proper forecasting. By inspecting the evolution of the best likelihood value obtained so far, we see the solution is identified quickly.
In [7]:
f, axes = plt.subplots(1, 1, figsize=(7, 5))
f = ei.data[1][:,0]
axes.plot(np.arange(0, ei.data[0].shape[0]), np.minimum.accumulate(f))
axes.set_ylabel('fmin')
axes.set_xlabel('Number of evaluated points');

Bayesian Multiobjective Optimization¶
Joachim van der Herten, Ivo Couckuyt
Introduction¶
This notebook demonstrates the multiobjective optimization of an analytical function using the hypervolume-based probability of improvement function.
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import gpflow
import gpflowopt
import numpy as np
We setup the Veldhuizen and Lamont multiobjective optimization problem 2 (vlmop2). The objectives of vlmop2 are very easy to model. Ideal for illustrating Bayesian multiobjective optimization.
In [2]:
# Objective
def vlmop2(x):
transl = 1 / np.sqrt(2)
part1 = (x[:, [0]] - transl) ** 2 + (x[:, [1]] - transl) ** 2
part2 = (x[:, [0]] + transl) ** 2 + (x[:, [1]] + transl) ** 2
y1 = 1 - np.exp(-1 * part1)
y2 = 1 - np.exp(-1 * part2)
return np.hstack((y1, y2))
# Setup input domain
domain = gpflowopt.domain.ContinuousParameter('x1', -2, 2) + \
gpflowopt.domain.ContinuousParameter('x2', -2, 2)
# Plot
def plotfx():
X = gpflowopt.design.FactorialDesign(101, domain).generate()
Z = vlmop2(X)
shape = (101, 101)
axes = []
plt.figure(figsize=(15, 5))
for i in range(Z.shape[1]):
axes = axes + [plt.subplot2grid((1, 2), (0, i))]
axes[-1].contourf(X[:,0].reshape(shape), X[:,1].reshape(shape), Z[:,i].reshape(shape))
axes[-1].set_title('Objective {}'.format(i+1))
axes[-1].set_xlabel('x1')
axes[-1].set_ylabel('x2')
axes[-1].set_xlim([domain.lower[0], domain.upper[0]])
axes[-1].set_ylim([domain.lower[1], domain.upper[1]])
return axes
plotfx();

Multiobjective acquisition function¶
We can model the belief of each objective by one GP prior or model each objective separately using a GP prior. We illustrate the latter approach here. A set of data points arranged in a Latin Hypercube is evaluated on the vlmop2 function.
In multiobjective optimization the definition of improvement is ambigious. Here we define improvement using the contributing hypervolume which will determine the focus on density and uniformity of the Pareto set during sampling. For instance, due to the nature of the contributing hypervolume steep slopes in the Pareto front will be sampled less densely. The hypervolume-based probability of improvement is based on the model(s) of the objective functions (vlmop2) and aggregates all the information in one cost function which is a balance between:
- improving our belief of the objectives (high uncertainty)
- favoring points improving the Pareto set (large contributing hypervolume with a likely higher uncertainty)
- focussing on augmenting the Pareto set (small contributing hypervolume but with low uncertainty).
In [3]:
# Initial evaluations
design = gpflowopt.design.LatinHyperCube(11, domain)
X = design.generate()
Y = vlmop2(X)
# One model for each objective
objective_models = [gpflow.gpr.GPR(X.copy(), Y[:,[i]].copy(), gpflow.kernels.Matern52(2, ARD=True)) for i in range(Y.shape[1])]
for model in objective_models:
model.likelihood.variance = 0.01
hvpoi = gpflowopt.acquisition.HVProbabilityOfImprovement(objective_models)
Running the Bayesian optimizer¶
The optimization surface of multiobjective acquisition functions can be even more challenging than, e.g., standard expected improvement. Hence, a hybrid optimization scheme is preferred: a Monte Carlo optimization step first, then optimize the point with the best value.
We then run the Bayesian Optimization and allow it to select up to 20 additional decisions.
In [4]:
# First setup the optimization strategy for the acquisition function
# Combining MC step followed by L-BFGS-B
acquisition_opt = gpflowopt.optim.StagedOptimizer([gpflowopt.optim.MCOptimizer(domain, 1000),
gpflowopt.optim.SciPyOptimizer(domain)])
# Then run the BayesianOptimizer for 20 iterations
optimizer = gpflowopt.BayesianOptimizer(domain, hvpoi, optimizer=acquisition_opt)
with optimizer.silent():
result = optimizer.optimize([vlmop2], n_iter=20)
print(result)
print(optimizer.acquisition.pareto.front.value)
fun: array([[ 0.88762951, 0.29191139],
[ 0.68279472, 0.59802318],
[ 0.42659568, 0.79430421],
[ 0.07354128, 0.94891563],
[ 0.82764037, 0.37153952],
[ 0.21803674, 0.89654802],
[ 0.5652733 , 0.69464852],
[ 0.00923563, 0.9738539 ],
[ 0.9334239 , 0.12286183],
[ 0.71007284, 0.54565196],
[ 0.76405389, 0.47382314],
[ 0.34539271, 0.83984545],
[ 0.48687039, 0.75355875],
[ 0.63004337, 0.63471413],
[ 0.98267516, 0.00178449],
[ 0.91892508, 0.16232397],
[ 0.85634937, 0.30878024],
[ 0.29492334, 0.86269861],
[ 0.89223755, 0.22705924],
[ 0.79104171, 0.43108073]])
message: 'OK'
nfev: 20
success: True
x: array([[-0.16140283, -0.48940691],
[ 0.0727557 , -0.15649316],
[ 0.22042083, 0.14203954],
[ 0.52761769, 0.49694149],
[-0.17070714, -0.28668133],
[ 0.38576529, 0.32936723],
[ 0.09298785, 0.03190632],
[ 0.66424394, 0.62084368],
[-0.40696469, -0.50460869],
[-0.10057392, -0.05825421],
[-0.10349761, -0.18006422],
[ 0.2923697 , 0.20539677],
[ 0.14213026, 0.11716751],
[-0.01662408, 0.02111616],
[-0.74495067, -0.68829395],
[-0.36858442, -0.45705387],
[-0.26088788, -0.29457053],
[ 0.27909045, 0.29936819],
[-0.3425574 , -0.35403933],
[-0.20960284, -0.14451762]])
[[ 0.00924157 0.97385548]
[ 0.07353906 0.9489149 ]
[ 0.21803833 0.89654648]
[ 0.29492503 0.86270261]
[ 0.34538813 0.83984479]
[ 0.42659985 0.79430552]
[ 0.48686827 0.75354473]
[ 0.56527447 0.69466632]
[ 0.63003846 0.63471877]
[ 0.68279543 0.59801444]
[ 0.71007725 0.54563413]
[ 0.76405382 0.47382832]
[ 0.82763436 0.37155876]
[ 0.8563462 0.30880294]
[ 0.88763433 0.29189238]
[ 0.89225033 0.2270049 ]
[ 0.91890866 0.16234889]
[ 0.93342994 0.12287979]
[ 0.98267631 0.00178968]]
For multiple objectives the returned OptimizeResult
object contains
the identified Pareto set instead of just a single optimum. Note that
this is computed on the raw data Y.
The hypervolume-based probability of improvement operates on the Pareto
set derived from the model predictions of the training data (to handle
noise). This latter Pareto set can be found as
optimizer.acquisition.pareto.front.value
.
Plotting the results¶
Lets plot the belief of the final models and acquisition function.
In [5]:
def plot():
grid_size = 51 # 101
shape = (grid_size, grid_size)
Xeval = gpflowopt.design.FactorialDesign(grid_size, domain).generate()
Yeval_1, _ = hvpoi.models[0].predict_f(Xeval)
Yeval_2, _ = hvpoi.models[1].predict_f(Xeval)
Yevalc = hvpoi.evaluate(Xeval)
plots = [((0,0), 1, 1, 'Objective 1 model', Yeval_1[:,0]),
((0,1), 1, 1, 'Objective 2 model', Yeval_2[:,0]),
((1,0), 2, 2, 'hypervolume-based PoI', Yevalc)]
plt.figure(figsize=(7,7))
for i, (plot_pos, plot_rowspan, plot_colspan, plot_title, plot_data) in enumerate(plots):
data = hvpoi.data[0]
ax = plt.subplot2grid((3, 2), plot_pos, rowspan=plot_rowspan, colspan=plot_colspan)
ax.contourf(Xeval[:,0].reshape(shape), Xeval[:,1].reshape(shape), plot_data.reshape(shape))
ax.scatter(data[:,0], data[:,1], c='w')
ax.set_title(plot_title)
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_xlim([domain.lower[0], domain.upper[0]])
ax.set_ylim([domain.lower[1], domain.upper[1]])
plt.tight_layout()
# Plot representing the model belief, and the belief mapped to EI and PoF
plot()
for model in objective_models:
print(model)
name.kern.lengthscales transform:+ve prior:None
[ 0.58170479 0.57493089]
name.kern.variance transform:+ve prior:None
[ 4.39140011]
name.likelihood.variance transform:+ve prior:None
[ 1.00000000e-06]
name.kern.lengthscales transform:+ve prior:None
[ 0.60083304 0.58044734]
name.kern.variance transform:+ve prior:None
[ 3.35582088]
name.likelihood.variance transform:+ve prior:None
[ 1.00000007e-06]

Finally, we can extract and plot the Pareto front ourselves using the
pareto.non_dominated_sort
function on the final data matrix Y.
The non-dominated sort returns the Pareto set (non-dominated solutions) as well as a dominance vector holding the number of dominated points for each point in Y. For example, we could only select the points with dom == 2, or dom == 0 (the latter retrieves the non-dominated solutions). Here we choose to use the dominance vector to color the points.
In [6]:
# plot pareto front
plt.figure(figsize=(9, 4))
R = np.array([1.5, 1.5])
print('R:', R)
hv = hvpoi.pareto.hypervolume(R)
print('Hypervolume indicator:', hv)
plt.figure(figsize=(7, 7))
pf, dom = gpflowopt.pareto.non_dominated_sort(hvpoi.data[1])
plt.scatter(hvpoi.data[1][:,0], hvpoi.data[1][:,1], c=dom)
plt.title('Pareto set')
plt.xlabel('Objective 1')
plt.ylabel('Objective 2')
R: [ 1.5 1.5]
Hypervolume indicator: 1.55890890103
Out[6]:
<matplotlib.text.Text at 0x7f42a9bd1908>
<matplotlib.figure.Figure at 0x7f4324517860>

API and architecture¶
The structure of GPflowOpt¶
Joachim van der Herten
In this document, the structure of the GPflowOpt library is explained,
including some small examples. First the Domain
and Optimizer
interfaces are shortly illustrated, followed by a description of the
BayesianOptimizer
. At the end, a step-by-step walkthrough of the
BayesianOptimizer
is given.
Optimization¶
The underlying design principles of GPflowOpt were chosen to address the following task: optimizing problems of the form
The objective function \(f: \mathcal{X} \rightarrow \mathbb{R}^p\) maps a candidate optimum to a score (or multiple). Here \(\mathcal{X}\) represents the input domain. This domain encloses all candidate solutions to the optimization problem and can be entirely continuous (i.e., a \(d\)-dimensional hypercube) but may also consist of discrete and categorical parameters.
In GPflowOpt, the Domain
and Optimizer
interfaces and
corresponding subclasses are used to explicitly represent the
optimization problem.
Objective function¶
The objective function itself is provided and must be implemented as any python callable (function or object with implemented call operator), accepting a two dimensional numpy array with shape \((n, d)\) as an input, with \(n\) the batch size. It returns a tuple of numpy arrays: the first element of the tuple has shape \((n, p)\) and returns the objective scores for each point to evaluate. The second element is the gradient in every point, shaped either \((n, d)\) if we have a single-objective optimization problem, or \((n, d, p)\) in case of a multi-objective function. If the objective function is passed on to a gradient-free optimization method, the gradient is automatically discarded. GPflowOpt provides decorators which handle batch application of a function along the n points of the input matrix, or dealing with functions which accept each feature as function argument.
Here, we define a simple quadratic objective function:
In [5]:
import numpy as np
def fx(X):
X = np.atleast_2d(X)
# Return objective & gradient
return np.sum(np.square(X), axis=1)[:,None], 2*X
Domain¶
Then, we represent \(\mathcal{X}\) by composing parameters. This is how a simple continuous square domain is defined:
In [6]:
from gpflowopt.domain import ContinuousParameter
domain = ContinuousParameter('x1', -2, 2) + ContinuousParameter('x2', -1, 2)
domain
Out[6]:
Name | Type | Values |
x1 | Continuous | [-2. 2.] |
x2 | Continuous | [-1. 2.] |
Optimize¶
Based on the domain and a valid objective function, we can now easily
apply one of the included optimizers to optimize objective functions.
GPflowOpt defines an intuitive Optimizer
interface which can be used
to specify the domain, the initial point(s), constraints (to be
implemented) etc. Some popular optimization approaches are provided.
Here is how our function is optimized using one of the available methods
of SciPy’s minimize:
In [7]:
from gpflowopt.optim import SciPyOptimizer
optimizer = SciPyOptimizer(domain, method='SLSQP')
optimizer.set_initial([-1,-1])
optimizer.optimize(fx)
Out[7]:
fun: 0.0
jac: array([ 0., 0.])
message: 'Optimization terminated successfully.'
nfev: 3
nit: 2
njev: 2
status: 0
success: True
x: array([[ 0., 0.]])
And here is how we optimize it Monte-Carlo. We can pass the same function as the gradients are automatically discarded.
In [8]:
from gpflowopt.optim import MCOptimizer
optimizer = MCOptimizer(domain, 200)
optimizer.optimize(fx)
Out[8]:
fun: array([[ 0.02115951]])
message: 'OK'
nfev: 200
success: True
x: array([[ 0.05731395, -0.13369599]])
Bayesian Optimization¶
In Bayesian Optimization (BO), the typical assumption is that \(f\) is expensive to evaluate and no gradients are available. The typical approach is to sequentially select a limited set of decisions \(\boldsymbol{x}_0, \boldsymbol{x}_1, ... \boldsymbol{x}_{n-1}\) using a sampling policy. Hence each decision \(\boldsymbol{x}_i \in \mathcal{X}\) itself is the result of an optimization problem
Each iteration, a function \(\alpha_i\) which is cheap-to-evaluate
acts as a surrogate for the expensive function. It is typically a
mapping of the predictive distribution of a (Bayesian) model built on
all decisions and their corresponding (noisy) evaluations. The mapping
introduces an order in \(\mathcal{X}\) to obtain a certain goal. The
typical goal within the context of BO is the search for optimality or
feasibility while keeping the amount of required evaluations
(\(n\)) a small number. As we can have several functions \(f\)
representing objectives and constraints, BO may invoke several models
and mappings \(\alpha\). These mappings are typically referred to as
acquisition functions (or infill criteria). GPflowOpt defines an
Acquisition
interface to implement these mappings and provides
implementations of some default choices. In combination with a special
Optimizer
implementation for BO, following steps are required for a
typical workflow:
- Define the problem domain. Its dimensionality matches the input to the objective and constraint functions. (like normal optimization)
- Specify the (GP) models for the constraints and objectives. This involves choice of kernels, priors, fixes, transforms... this step follows the standard way of setting up GPflow models. GPflowOpt does not further wrap models hence it is possible to implement custom models in GPflow and use them directly in GPflowOpt
- Set up the acquisition function(s) using the available built-in
implementations in GPflowOpt, or design your own by implementing the
Acquisition
interface. - Set up an optimizer for the acquisition function.
- Run the high-level
BayesianOptimizer
which implements a typical BO flow.BayesianOptimizer
in itself is compliant with theOptimizer
interface. Exceptionally, theBayesianOptimizer
requires that the objective function returns no gradients.
Alternatively, advanced users requiring finer control can easily implement their own flow based on the low-level interfaces of GPflowOpt, as the coupling between these objects was intentionally kept loose.
As illustration of the described flow, the previous example is optimized using Bayesian optimization instead, with the well-known Expected Improvement acquisition function:
In [9]:
from gpflowopt.bo import BayesianOptimizer
from gpflowopt.design import FactorialDesign
from gpflowopt.acquisition import ExpectedImprovement
import gpflow
# The Bayesian Optimizer does not expect gradients to be returned
def fx(X):
X = np.atleast_2d(X)
# Return objective & gradient
return np.sum(np.square(X), axis=1)[:,None]
X = FactorialDesign(2, domain).generate()
Y = fx(X)
# initializing a standard BO model, Gaussian Process Regression with
# Matern52 ARD Kernel
model = gpflow.gpr.GPR(X, Y, gpflow.kernels.Matern52(2, ARD=True))
alpha = ExpectedImprovement(model)
# Now we must specify an optimization algorithm to optimize the acquisition
# function, each iteration.
acqopt = SciPyOptimizer(domain)
# Now create the Bayesian Optimizer
optimizer = BayesianOptimizer(domain, alpha, optimizer=acqopt)
with optimizer.silent():
r = optimizer.optimize(fx, n_iter=15)
print(r)
fun: array([ 0.17684308])
message: 'OK'
nfev: 15
success: True
x: array([[ 0. , 0.42052714]])
This brief snippet code starts from a 2-level grid (corner points of the
domain) and uses a GP model to model the response surface of the
objective function. The BayesianOptimizer
follows the same interface
as other optimizers and is initialized with a domain, the acquisition
function and an additional optimization method to optimize the
acquisition function each iteration. Finally, the optimizer performs 10
iterations to optimize fx.
The code to evaluate the acquisition function on the model is written in TensorFlow, allowing gradient-based optimization without additional effort due to the automated differentation.
Step-by-step description of the BayesianOptimizer¶
Prior to running BayesianOptimizer.optimize()
, the acquisition
function is initialized with an underlying model. Any data previously
included in the model (through the GPModel.__init__
constructor in
GPflow) is used as initial data. When optimize(function, n_iter)
is
called:
- Any data points returned by
get_initial()
are evaluated. Afterwards the evaluated points are added to the models by calling_update_model_data()
. n_iter
iterations are performed. Each iteration the acquisition function is optimized, and the models are updated by calling_update_model_data()
The updating of a model through _update_model_data()
calls
set_data(X, Y)
on the acquisition function. This covers following
aspects:
GPModel.X
andGPModel.Y
are updated- Each of the contained models are returned to the state when the
acquisition function was initialized and optimized. If the
optimize_restarts
parameter of theAcquisition.__init__()
was set to \(n>1\), the state of the model is randomized and optimized \(n-1\) times. Finally, the state resulting in the bestlog_likelihood()
is the new model state - Call
Acquisition.setup()
to perform any pre-calculation of quantities independent of candidate points, which can be used inbuild_acquisition()
.
The GPflow tree¶
The Acquisition
interface, mapping the belief of the model(s) to a
score indicating areas of optimality/feasibility, is implemented as part
of the GPflow tree
structure.
More specifically it implements the Parameterized
interface
permitting the use of the useful AutoFlow
decorator. The
build_acquisition()
method to be implemented by subclasses is a
TensorFlow method, allowing automated differentiation of the acquisition
function which enables gradient-based optimization thereof (not of the
objective!). It may directly access the graph for computing the
predictive distribution of a model by calling build_predict()
.
Bayesian Optimizer¶
Acquisition functions¶
The GPflowOpt package currently supports a limited number of popular acquisition functions. These are summarized in the table below. Detailed description for each can be found below.
Method | Objective | Constraint | # Outputs |
---|---|---|---|
GPflowOpt.acquisition.ExpectedImprovement |
✔ | 1 | |
GPflowOpt.acquisition.ProbabilityOfFeasibility |
✔ | 1 | |
GPflowOpt.acquisition.ProbabilityOfImprovement |
✔ | 1 | |
GPflowOpt.acquisition.LowerConfidenceBound |
✔ | 1 | |
GPflowOpt.acquisition.HVProbabilityOfImprovement |
✔ | > 1 |