Ratsaby, J;
(2003)
A Stochastic Gradient Descent Algorithm for Structure Risk Minimisation.
In: Gavalda, R and Jantke, KP and Takimoto, E, (eds.)
**Algorithmic Learning Theory: 14th International Conference, ALT 2003, Sapporo, Japan, October 17-19, 2003: Proceedings.**
(pp. 205 - 220).
Springer-Verlag: Berlin / Heidelberg, Germany.

## Abstract

Structural risk minimisation (SRM) is a general complexity regularization method which automatically selects the model complexity that approximately minimises the misclassification error probability of the empirical risk minimiser. It does so by adding a complexity penalty term ε(m, k) to the empirical risk of the candidate hypotheses and then, for any fixed sample size m, minimising the sum with respect to the model complexity variable k. When learning multicategory classification there are M subsamples of sizes m_i, corresponding to the M pattern classes with a priori probabilities p_i, 1 ≤ i ≤ M. Using the usual representation for a multicategory classifier as M individual boolean classifiers, the penalty becomes Σ_{i=1}^{M} p_i ε(m_i, k_i). If the m_i are given, then the standard SRM trivially applies here by minimising the penalised empirical risk with respect to k_i, i = 1, …, M. However, in situations where the total sample size Σ_{i=1}^{M} m_i needs to be minimal, one needs to also minimise the penalised empirical risk with respect to the variables m_i, i = 1, …, M. The obvious problem is that the empirical risk can only be defined after the subsamples (and hence their sizes) are known. Utilising an on-line stochastic gradient descent approach, this paper overcomes this difficulty and introduces a sample-querying algorithm which extends the standard SRM principle. It minimises the penalised empirical risk not only with respect to the k_i, as the standard SRM does, but also with respect to the m_i, i = 1, …, M. The challenge here is in defining a stochastic empirical criterion which, when minimised, yields a sequence of subsample-size vectors which asymptotically achieve the Bayes optimal error convergence rate.
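To make the standard SRM selection step described in the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm; the candidate hypotheses and the penalty ε(m, k) = √(k ln m / m) are illustrative assumptions, chosen only to show penalised empirical risk minimisation over the complexity index k.

```python
import math

def srm_select(sample, hypotheses_by_k, penalty):
    """Standard SRM step: pick the complexity index k that minimises
    empirical risk plus the complexity penalty epsilon(m, k)."""
    m = len(sample)
    best_k, best_score = None, float("inf")
    for k, h in hypotheses_by_k.items():
        # Empirical risk: fraction of misclassified sample points.
        emp_risk = sum(h(x) != y for x, y in sample) / m
        score = emp_risk + penalty(m, k)
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score

# Illustrative use: two candidate classifiers of increasing complexity.
sample = [(x, 1 if x > 0 else 0) for x in range(-50, 50)]
hypotheses = {
    1: lambda x: 0,                   # constant classifier (complexity 1)
    2: lambda x: 1 if x > 0 else 0,   # threshold classifier (complexity 2)
}
penalty = lambda m, k: math.sqrt(k * math.log(m) / m)  # illustrative penalty
k_star, _ = srm_select(sample, hypotheses, penalty)    # selects k_star = 2
```

The paper's extension goes further: rather than taking the subsample sizes m_i as given, its sample-querying algorithm also descends on the m_i via stochastic gradient steps, which this fixed-sample sketch cannot capture.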

| Type | Proceedings paper |
| --- | --- |
| Title | A Stochastic Gradient Descent Algorithm for Structure Risk Minimisation |
| ISBN | 3540202919 |
| DOI | 10.1007/978-3-540-39624-6_17 |
| Publisher version | http://dx.doi.org/10.1007/978-3-540-39624-6_17 |
| Keywords | algorithmic, Artificial Intelligence, CONFERENCE, INTELLIGENCE, learning, LECTURE, proc, Theories |
| UCL classification | UCL > School of BEAMS > Faculty of Engineering Science > Computer Science |
| URI | http://discovery.ucl.ac.uk/id/eprint/155440 |
