## Abstract

This thesis deals with an efficient approach for learning the optimal hyper-parameters for

Support Vector Machines (SVMs). The common method to determine hyper-parameters

is grid search. Grid search typically involves the definition of a discretized ”grid” of

possible parameter values with a certain resolution and a search for the values that

result in the minimal validation error of the learned model. A major limitation of grid

search is that the search space grows exponentially in the parameters which makes the

approach only practical for determining very few hyper-parameters. Additionally, grid

search operates on discrete parameter values which leads to suboptimal solutions. In

this thesis we develop an approach to use bi-level optimization for learning the optimal

hyper-parameters and solve both major shortcomings of grid search in an efficient and

elegant way. Bi-level learning is an optimization method where one optimization problem

has another optimization problem as its constraint. The goal of the bi-level program is to

find optimal hyper-parameters such that the validation error (the higher level objective)

is minimized, while the optimal training problem is solved for the underlying SVM (the

lower level objective). We use Lagrange multipliers to solve the bi-level problem and

formulate the solution for several variants of the SVM (linear, kernel, multiple kernel).

We can show that, using this method, the model selection problem (i.e. selection of

hyper-parameters) can be solved also for a large number of hyper-parameters. The bi-

level approach exploits the continuity of the hyper-parameters which allows for better

solutions than with grid search. In the experiments, we investigate different properties

of the bi-level approach and try to give insights into the advantages of this method. We

find that highly parametrized kernel SVMs perform best compared to simpler models

which is a clear advantage of bi-level optimization against grid search for model selection.

Support Vector Machines (SVMs). The common method to determine hyper-parameters

is grid search. Grid search typically involves the definition of a discretized ”grid” of

possible parameter values with a certain resolution and a search for the values that

result in the minimal validation error of the learned model. A major limitation of grid

search is that the search space grows exponentially in the parameters which makes the

approach only practical for determining very few hyper-parameters. Additionally, grid

search operates on discrete parameter values which leads to suboptimal solutions. In

this thesis we develop an approach to use bi-level optimization for learning the optimal

hyper-parameters and solve both major shortcomings of grid search in an efficient and

elegant way. Bi-level learning is an optimization method where one optimization problem

has another optimization problem as its constraint. The goal of the bi-level program is to

find optimal hyper-parameters such that the validation error (the higher level objective)

is minimized, while the optimal training problem is solved for the underlying SVM (the

lower level objective). We use Lagrange multipliers to solve the bi-level problem and

formulate the solution for several variants of the SVM (linear, kernel, multiple kernel).

We can show that, using this method, the model selection problem (i.e. selection of

hyper-parameters) can be solved also for a large number of hyper-parameters. The bi-

level approach exploits the continuity of the hyper-parameters which allows for better

solutions than with grid search. In the experiments, we investigate different properties

of the bi-level approach and try to give insights into the advantages of this method. We

find that highly parametrized kernel SVMs perform best compared to simpler models

which is a clear advantage of bi-level optimization against grid search for model selection.

Original language | English |
---|---|

Publication status | Published - 2014 |