Ratsaby, J; (2003) A Stochastic Gradient Descent Algorithm for Structure Risk Minimisation. In: Gavalda, R and Jantke, KP and Takimoto, E, (eds.) Algorithmic Learning Theory: 14th International Conference, ALT 2003, Sapporo, Japan, October 17-19, 2003: Proceedings. (pp. 205 - 220). Springer-Verlag: Berlin / Heidelberg, Germany.
Full text not available from this repository.
Structural risk minimisation (SRM) is a general complexity regularization method which automatically selects the model complexity that approximately minimises the misclassification error probability of the empirical risk minimiser. It does so by adding a complexity penalty term ε(m,k) to the empirical risk of the candidate hypotheses and then for any fixed sample size m it minimises the sum with respect to the model complexity variable k. When learning multicategory classification there are M subsamples m i , corresponding to the M pattern classes with a priori probabilities p i , 1 ≤ i ≤ M. Using the usual representation for a multi-category classifier as M individual boolean classifiers, the penalty becomes Si=1Mpie(mi,ki)Mi=1pi(miki). If the m i are given then the standard SRM trivially applies here by minimizing the penalised empirical risk with respect to k i , 1...,M. However, in situations where the total sample size Si=1MmiMi=1mi needs to be minimal one needs to also minimize the penalised empirical risk with respect to the variables m i , i = 1...,M. The obvious problem is that the empirical risk can only be defined after the subsamples (and hence their sizes) are given (known). Utilising an on-line stochastic gradient descent approach, this paper overcomes this difficulty and introduces a sample-querying algorithm which extends the standard SRM principle. It minimises the penalised empirical risk not only with respect to the k i , as the standard SRM does, but also with respect to the m i , i = 1...,M. The challenge here is in defining a stochastic empirical criterion which when minimised yields a sequence of subsample-size vectors which asymptotically achieve the Bayes’ optimal error convergence rate.
|Title:||A Stochastic Gradient Descent Algorithm for Structure Risk Minimisation|
|Keywords:||algorithmic, Artificial Intelligence, CONFERENCE, INTELLIGENCE, learning, LECTURE, proc, Theories|
|UCL classification:||UCL > School of BEAMS > Faculty of Engineering Science > Computer Science|
Archive Staff Only: edit this record