Tissot, H; Peschl, G; Del Fabro, MD; (2014) Fast phonetic similarity search over large repositories. In: Database and Expert Systems Applications. (pp. 74-81). Springer: Cham, Switzerland.
Abstract
Analysis of unstructured data may be inefficient in the presence of spelling errors. Existing approaches use string similarity methods to search for valid words within a text, with a supporting dictionary. However, they are not rich enough to encode phonetic information to assist the search. In this paper, we present a novel approach to efficiently perform phonetic similarity search over large data sources, using a data structure called PhoneticMap to encode language-specific phonetic information. We validate our approach through an experiment over a data set using a Portuguese variant of a well-known repository, to automatically correct words with spelling errors.
Type: | Proceedings paper |
---|---|
Title: | Fast phonetic similarity search over large repositories |
Event: | DEXA 2014: Database and Expert Systems Applications |
ISBN-13: | 9783319100845 |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1007/978-3-319-10085-2_6 |
Publisher version: | https://doi.org/10.1007/978-3-319-10085-2_6 |
Language: | English |
Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. |
Keywords: | Phonetic Similarity, String Similarity, Fast Search |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics |
URI: | https://discovery.ucl.ac.uk/id/eprint/10065716 |