Bi-level Optimization for Support Vector Machines

Teresa Klatzer

Research output: Thesis › Master's Thesis

Abstract

This thesis deals with an efficient approach for learning the optimal hyper-parameters of Support Vector Machines (SVMs). The most common method for determining hyper-parameters is grid search. Grid search typically involves defining a discretized "grid" of possible parameter values at a certain resolution and searching for the values that result in the minimal validation error of the learned model. A major limitation of grid search is that the search space grows exponentially with the number of parameters, which makes the approach practical only for determining very few hyper-parameters. Additionally, grid search operates on discrete parameter values, which leads to suboptimal solutions.
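To make the combinatorial cost concrete, the following is a minimal sketch of grid search for an RBF-kernel SVM in Python. It assumes scikit-learn and a synthetic dataset; the data and grid values are illustrative, not taken from the thesis.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative dataset; the thesis's experiments use their own data.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# A discretized "grid": each hyper-parameter gets a handful of candidate
# values, so the number of trained models grows exponentially with the
# number of hyper-parameters (here 5 x 5 = 25 candidates, times 5 CV folds).
param_grid = {
    "C": np.logspace(-2, 2, 5),
    "gamma": np.logspace(-3, 1, 5),
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)

With a third hyper-parameter at the same resolution, the count would jump to 125 candidates, which is why grid search is limited to very few hyper-parameters.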
this thesis we develop an approach to use bi-level optimization for learning the optimal
hyper-parameters and solve both major shortcomings of grid search in an efficient and
elegant way. Bi-level learning is an optimization method where one optimization problem
has another optimization problem as its constraint. The goal of the bi-level program is to
find optimal hyper-parameters such that the validation error (the higher level objective)
is minimized, while the optimal training problem is solved for the underlying SVM (the
lower level objective). We use Lagrange multipliers to solve the bi-level problem and
formulate the solution for several variants of the SVM (linear, kernel, multiple kernel).
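In generic notation (ours, not necessarily the thesis's), the bi-level program can be written as follows, with hyper-parameters $\lambda$ (e.g. $C$ and kernel parameters), a validation loss $\ell$, and the standard hinge-loss SVM as the lower level:

\begin{align*}
\min_{\lambda}\;& \sum_{(x_i,\,y_i) \in \mathcal{D}_{\mathrm{val}}}
    \ell\bigl(y_i,\, f(x_i;\, w^{*}(\lambda))\bigr) \\
\text{s.t.}\;& w^{*}(\lambda) \in \arg\min_{w}\;
    \tfrac{1}{2}\lVert w \rVert^{2}
    + C \sum_{(x_j,\,y_j) \in \mathcal{D}_{\mathrm{train}}}
      \max\bigl(0,\; 1 - y_j\, f(x_j;\, w)\bigr)
\end{align*}

Replacing the lower-level training problem by its Karush-Kuhn-Tucker (i.e. Lagrange) optimality conditions turns the bi-level program into a single-level problem with additional constraints, which is the standard route for the Lagrange-multiplier treatment mentioned above.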
We show that, using this method, the model selection problem (i.e. the selection of hyper-parameters) can be solved even for a large number of hyper-parameters. The bi-level approach exploits the continuity of the hyper-parameters, which allows for better solutions than grid search. In the experiments, we investigate different properties of the bi-level approach and give insights into the advantages of this method. We find that highly parametrized kernel SVMs perform best compared to simpler models, which is a clear advantage of bi-level optimization over grid search for model selection.
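To illustrate what treating hyper-parameters as continuous buys over a fixed grid, here is a minimal Python sketch that minimizes the validation error directly over (log C, log gamma) with a derivative-free optimizer. This is a generic stand-in, not the thesis's Lagrangian bi-level solver, and it assumes scikit-learn and SciPy:

import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

def val_error(log_params):
    # Lower level: train the SVM for the given hyper-parameters.
    C, gamma = np.exp(log_params)
    model = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_tr, y_tr)
    # Upper level: validation error of the trained model.
    return 1.0 - model.score(X_val, y_val)

# Search the continuum instead of a fixed grid; Nelder-Mead is used here
# because the 0/1 validation error is piecewise constant and offers no
# useful gradient.
result = minimize(val_error, x0=np.log([1.0, 0.1]), method="Nelder-Mead")
print("C, gamma:", np.exp(result.x), "validation error:", result.fun)

A Lagrangian treatment like the one in the thesis goes further: it replaces the non-smooth pieces with optimality conditions, so that true gradient-based optimization over many hyper-parameters becomes possible.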
Language: English
Status: Published - 2014

Fingerprint

Support vector machines
Lagrange multipliers
Experiments

Cite this

Bi-level Optimization for Support Vector Machines. / Klatzer, Teresa.

2014. 84 p.

Research output: Thesis › Master's Thesis

@mastersthesis{1b006fa5f8694e0998b7e7bbba0d7267,
title = "Bi-level Optimization for Support Vector Machines",
author = "Teresa Klatzer",
year = "2014",
language = "English",

}
