UCL Discovery

Proceedings paper #79148

Saunders, C; Shawe-Taylor, J; Vinokourov, A; (2003) UNSPECIFIED In: (Proceedings) String Kernels, Fisher Kernels and Finite State Automata.

Full text not available from this repository.

Abstract

In this paper we show how the generation of documents can be thought of as a k-stage Markov process, which leads to a Fisher kernel from which the n-gram and string kernels can be reconstructed. The Fisher kernel view offers a more flexible interpretation of the string kernel and suggests how it can be parametrised in a way that reflects the statistics of the training corpus. Furthermore, the probabilistic modelling approach suggests extending the Markov process to consider sub-sequences of varying length, rather than the fixed-length sub-sequences used in the standard string kernel. We give a procedure for determining which sub-sequences are informative features and hence generate a Finite State Machine model, which can again be used to obtain a Fisher kernel. By adjusting the parametrisation we can also influence the weighting the features receive; in this way we are able to obtain a logarithmic weighting in a Fisher kernel. Finally, we report experiments comparing the different kernels, using the standard Bag of Words kernel as a baseline.
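As a rough illustration of the link the abstract describes between n-gram kernels and a Fisher kernel over a k-stage Markov process, the sketch below computes both on short strings. It is not the authors' implementation. The function names (`ngram_counts`, `ngram_kernel`, `fisher_scores`, `fisher_kernel`) and the `probs` dictionary are hypothetical, and the Fisher part makes two simplifying assumptions: the transition probabilities are treated as free parameters (the sum-to-one constraint is ignored) and the Fisher information matrix is replaced by the identity.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every contiguous substring of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_kernel(s, t, n):
    """Inner product of the n-gram count vectors of s and t."""
    cs, ct = ngram_counts(s, n), ngram_counts(t, n)
    return sum(count * ct[gram] for gram, count in cs.items())


def fisher_scores(text, k, probs):
    """Fisher scores under a k-stage Markov model (sketch, see assumptions).

    probs maps each (k+1)-gram "uc" to an assumed transition probability
    p(c | u). Treating these as free parameters, the derivative of the
    log-likelihood with respect to p(c | u) is count(uc) / p(c | u).
    """
    counts = ngram_counts(text, k + 1)
    return {gram: count / probs[gram] for gram, count in counts.items()}


def fisher_kernel(s, t, k, probs):
    """Inner product of Fisher scores, with identity Fisher information (assumption)."""
    fs, ft = fisher_scores(s, k, probs), fisher_scores(t, k, probs)
    return sum(score * ft.get(gram, 0.0) for gram, score in fs.items())


if __name__ == "__main__":
    uniform = {"ab": 0.5, "ba": 0.5}  # hypothetical uniform transition probabilities
    print(ngram_kernel("abab", "baba", 2))
    print(fisher_kernel("abab", "baba", 1, uniform))
```

With uniform transition probabilities the Fisher kernel in this sketch is a constant multiple of the n-gram kernel. Non-uniform values in `probs` reweight each feature by the inverse of its probability, which gives one simple picture of how a parametrisation can reflect corpus statistics.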

Type: Proceedings paper
Event: String Kernels, Fisher Kernels and Finite State Automata
Keywords: String Kernels
UCL classification: UCL > School of BEAMS > Faculty of Engineering Science > Computer Science