TY  - GEN
SP  - 74
T3  - Lecture Notes in Computer Science
SN  - 1611-3349
N2  - Analysis of unstructured data may be inefficient in the presence of spelling errors. Existing approaches use string similarity methods to search for valid words within a text, with a supporting dictionary. However, they are not rich enough to encode phonetic information to assist the search. In this paper, we present a novel approach for efficiently performing phonetic similarity search over large data sources, which uses a data structure called PhoneticMap to encode language-specific phonetic information. We validate our approach through an experiment over a data set using a Portuguese variant of a well-known repository, to automatically correct words with spelling errors.
AV  - public
N1  - This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions.
ID  - discovery10065716
UR  - https://doi.org/10.1007/978-3-319-10085-2_6
PB  - Springer
EP  - 81
A1  - Tissot, H
A1  - Peschl, G
A1  - Del Fabro, MD
KW  - Phonetic Similarity
KW  - String Similarity
KW  - Fast Search
TI  - Fast phonetic similarity search over large repositories
CY  - Cham, Switzerland
Y1  - 2014/01/01/
ER  -