### Abstract

This thesis deals with an efficient approach for learning the optimal hyper-parameters for Support Vector Machines (SVMs). The common method for determining hyper-parameters is grid search. Grid search typically involves defining a discretized "grid" of possible parameter values with a certain resolution and searching for the values that result in the minimal validation error of the learned model. A major limitation of grid search is that the search space grows exponentially with the number of hyper-parameters, which makes the approach practical only for determining very few of them. Additionally, grid search operates on discrete parameter values, which can lead to suboptimal solutions.

In this thesis we develop an approach that uses bi-level optimization to learn the optimal hyper-parameters and addresses both major shortcomings of grid search in an efficient and elegant way. Bi-level optimization is a method in which one optimization problem has another optimization problem as its constraint. The goal of the bi-level program is to find hyper-parameters that minimize the validation error (the upper-level objective) while the training problem of the underlying SVM (the lower-level objective) is solved to optimality. We use Lagrange multipliers to solve the bi-level problem and formulate the solution for several variants of the SVM (linear, kernel, and multiple kernel).
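To make the Lagrange-multiplier idea concrete, here is a generic sketch of a bi-level program in our own notation, which is not necessarily the thesis's: $\theta$ denotes the hyper-parameters, $w$ the model parameters, $T(w,\theta)$ the training objective and $E(w)$ the validation error. It assumes that $T$ is twice differentiable and strictly convex in $w$, so the lower-level problem can be replaced by its optimality condition; the plain hinge-loss SVM is not differentiable everywhere, so this should be read as an illustration of the principle rather than the exact SVM formulation. Here $\nabla^2_{w\theta} T$ is the matrix with entries $\partial^2 T / \partial w_i \partial \theta_j$.

$$
\begin{aligned}
&\min_{\theta}\; E\bigl(w^*(\theta)\bigr)
\quad\text{s.t.}\quad w^*(\theta) = \arg\min_{w}\, T(w,\theta), \\
&\text{lower level replaced by its optimality condition: } \nabla_w T(w,\theta) = 0, \\
&\mathcal{L}(w,\theta,\lambda) = E(w) + \lambda^{\top} \nabla_w T(w,\theta), \\
&\nabla_w \mathcal{L} = 0 \;\Longrightarrow\; \nabla^2_{ww} T(w,\theta)\,\lambda = -\nabla_w E(w), \\
&\frac{dE}{d\theta} = \nabla_\theta \mathcal{L} = \bigl(\nabla^2_{w\theta} T(w,\theta)\bigr)^{\top} \lambda .
\end{aligned}
$$

Under these assumptions, one hypergradient costs one training solve plus one linear system in $w$ for the multiplier $\lambda$, rather than one training run per grid point.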

We show that, using this method, the model selection problem (i.e., the selection of hyper-parameters) can also be solved for a large number of hyper-parameters. The bi-level approach exploits the continuity of the hyper-parameters, which allows for better solutions than grid search. In the experiments, we investigate different properties of the bi-level approach and try to give insights into the advantages of this method. We find that highly parametrized kernel SVMs perform best compared to simpler models, which is a clear advantage of bi-level optimization over grid search for model selection.
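The claim that a bi-level approach scales to many continuous hyper-parameters can be illustrated with a small sketch. It is not the thesis's method: purely as an assumption for illustration, the SVM is swapped for weighted ridge regression, whose training problem is smooth and has a closed-form solution, and each of the d features gets its own regularization weight exp(θ_j), giving d continuous hyper-parameters. The hypergradient comes from the Lagrange multiplier (adjoint) of the lower-level optimality condition, is checked against finite differences, and drives a simple projected gradient descent. The function names (`train`, `hypergradient`, `numeric_gradient`, `optimize`, `make_data`) and the synthetic data are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for the SVM training problem (NOT the thesis's SVM):
# weighted ridge regression with one regularization weight exp(theta_j) per feature,
#   T(w, theta) = 0.5 * ||X_tr w - y_tr||^2 + 0.5 * sum_j exp(theta_j) * w_j^2.
# Upper level: validation error E(w) = 0.5 * ||X_val w - y_val||^2.


def train(theta, X_tr, y_tr):
    """Lower level: solve grad_w T(w, theta) = 0 (T is strictly convex in w)."""
    H = X_tr.T @ X_tr + np.diag(np.exp(theta))  # Hessian of T w.r.t. w
    w = np.linalg.solve(H, X_tr.T @ y_tr)
    return w, H


def val_loss(w, X_val, y_val):
    r = X_val @ w - y_val
    return 0.5 * float(r @ r)


def hypergradient(theta, X_tr, y_tr, X_val, y_val):
    """Return E(w*(theta)) and dE/dtheta via the Lagrange multiplier of grad_w T = 0."""
    w, H = train(theta, X_tr, y_tr)
    grad_E_w = X_val.T @ (X_val @ w - y_val)  # nabla_w E
    lam = -np.linalg.solve(H, grad_E_w)       # adjoint equation: H lam = -nabla_w E
    # d(grad_w T)/d theta is diagonal with entries exp(theta_j) * w_j,
    # so dE/dtheta = diag(exp(theta) * w) @ lam, computed elementwise.
    return val_loss(w, X_val, y_val), np.exp(theta) * w * lam


def numeric_gradient(theta, data, eps=1e-5):
    """Central finite differences of E(w*(theta)), used only as a sanity check."""
    g = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = eps
        g[j] = (hypergradient(theta + e, *data)[0]
                - hypergradient(theta - e, *data)[0]) / (2 * eps)
    return g


def optimize(theta, data, steps=100, lr=1.0, bound=10.0):
    """Projected gradient descent on theta; a step is kept only if E decreases."""
    loss, g = hypergradient(theta, *data)
    for _ in range(steps):
        cand = np.clip(theta - lr * g, -bound, bound)  # box keeps exp(theta) finite
        cand_loss, cand_g = hypergradient(cand, *data)
        if cand_loss < loss:
            theta, loss, g = cand, cand_loss, cand_g
            lr *= 1.2
        else:
            lr *= 0.5
    return theta, loss


def make_data(seed=0, n=40, d=10):
    """Synthetic regression data in which only the first 3 features matter."""
    rng = np.random.default_rng(seed)
    w_true = np.zeros(d)
    w_true[:3] = [2.0, -1.0, 0.5]
    X_tr, X_val = rng.normal(size=(n, d)), rng.normal(size=(n, d))
    y_tr = X_tr @ w_true + 0.5 * rng.normal(size=n)
    y_val = X_val @ w_true + 0.5 * rng.normal(size=n)
    return X_tr, y_tr, X_val, y_val


if __name__ == "__main__":
    data = make_data()
    theta0 = np.zeros(10)  # d = 10 hyper-parameters, all weights exp(0) = 1 initially
    g = hypergradient(theta0, *data)[1]
    print("max |analytic - numeric|:", np.max(np.abs(g - numeric_gradient(theta0, data))))
    theta, loss = optimize(theta0, data)
    print("validation error:", hypergradient(theta0, *data)[0], "->", loss)
```

The number of solves per gradient step (one training solve, one adjoint solve) does not depend on how many hyper-parameters there are, whereas a grid with 10 values per hyper-parameter would need 10^10 training runs for the 10 hyper-parameters used here. The accept-only-if-better step rule is a design choice that keeps the sketch's validation error from increasing; it is not taken from the thesis.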

| Field | Value |
|---|---|
| Original language | English |
| Publication status | Published - 2014 |

### Cite this

**Bi-level Optimization for Support Vector Machines.** / Klatzer, Teresa.

Research output: Thesis › Master's Thesis › Research

TY - THES

T1 - Bi-level Optimization for Support Vector Machines

AU - Klatzer, Teresa

PY - 2014

Y1 - 2014

AB - This thesis deals with an efficient approach for learning the optimal hyper-parameters for Support Vector Machines (SVMs). The common method to determine hyper-parameters is grid search. Grid search typically involves the definition of a discretized "grid" of possible parameter values with a certain resolution and a search for the values that result in the minimal validation error of the learned model. A major limitation of grid search is that the search space grows exponentially in the parameters which makes the approach only practical for determining very few hyper-parameters. Additionally, grid search operates on discrete parameter values which leads to suboptimal solutions. In this thesis we develop an approach to use bi-level optimization for learning the optimal hyper-parameters and solve both major shortcomings of grid search in an efficient and elegant way. Bi-level learning is an optimization method where one optimization problem has another optimization problem as its constraint. The goal of the bi-level program is to find optimal hyper-parameters such that the validation error (the higher level objective) is minimized, while the optimal training problem is solved for the underlying SVM (the lower level objective). We use Lagrange multipliers to solve the bi-level problem and formulate the solution for several variants of the SVM (linear, kernel, multiple kernel). We can show that, using this method, the model selection problem (i.e. selection of hyper-parameters) can be solved also for a large number of hyper-parameters. The bi-level approach exploits the continuity of the hyper-parameters which allows for better solutions than with grid search. In the experiments, we investigate different properties of the bi-level approach and try to give insights into the advantages of this method. We find that highly parametrized kernel SVMs perform best compared to simpler models which is a clear advantage of bi-level optimization against grid search for model selection.

M3 - Master's Thesis

ER -