UCL Discovery

Natural language text classification and filtering with trigrams and evolutionary nearest neighbour classifiers

Langdon, WB; (2000) Natural language text classification and filtering with trigrams and evolutionary nearest neighbour classifiers. : Kruislaan 413, NL-1098 SJ Amsterdam, The Netherlands.

Full text not available from this repository.

Abstract

Ngrams offer fast, language-independent, multi-class text categorization. Text is reduced in a single pass to ngram vectors, which are assigned to one of several classes by (a) a nearest neighbour (KNN) classifier and (b) a genetic algorithm operating on the weights in a nearest neighbour classifier. Binary classification of short, multi-author technical English documents reaches 91 percent accuracy. Accuracy falls as more categories are used, but 69 percent is still obtained with 8 classes. Zipf's law is found not to apply to trigrams.
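
The pipeline the abstract describes (reduce text to trigram vectors in one pass, then assign a class by nearest neighbour) might be sketched roughly as follows. This is only an illustration and not the report's implementation: the use of character trigrams, cosine similarity as the closeness measure, a single nearest neighbour, and the function names `trigram_vector`, `cosine_similarity` and `classify` are all assumptions made here. The genetic algorithm that tunes weights in the classifier is not shown.

```python
from collections import Counter
import math


def trigram_vector(text):
    """Count overlapping character trigrams in one pass (character-level is an assumption)."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse trigram count vectors (assumed measure)."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def classify(text, labelled_examples):
    """Return the label of the most similar example (1-nearest neighbour)."""
    query = trigram_vector(text)
    best_label, _ = max(
        ((label, cosine_similarity(query, trigram_vector(example)))
         for example, label in labelled_examples),
        key=lambda pair: pair[1],
    )
    return best_label


# Toy, made-up training data purely for illustration.
examples = [
    ("genetic algorithm crossover mutation", "ga"),
    ("neural network backpropagation", "nn"),
]
print(classify("mutation and crossover", examples))
```

Because the vectors are built from raw character sequences, no tokenizer or dictionary is needed, which is one way to read the abstract's "language independent" claim.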

Type: Report
Title: Natural language text classification and filtering with trigrams and evolutionary nearest neighbour classifiers
Publisher version: http://www.cwi.nl/ftp/CWIreports/SEN/SEN-R0022.ps....
Additional information: email: W.Langdon@cs.ucl.ac.uk; keywords: genetic algorithms, ngrams, trigrams, natural language processing, NLP; notes: Also available as GECCO’2000 Late Breaking paper langdon:2000:ngramLB; size: 10 pages
UCL classification: UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: http://discovery.ucl.ac.uk/id/eprint/1327729