eprintid: 10065716
rev_number: 33
eprint_status: archive
userid: 608
dir: disk0/10/06/57/16
datestamp: 2020-02-11 13:21:49
lastmod: 2022-01-03 00:07:41
status_changed: 2020-02-11 13:21:49
type: proceedings_section
metadata_visibility: show
creators_name: Tissot, H
creators_name: Peschl, G
creators_name: Del Fabro, MD
title: Fast phonetic similarity search over large repositories
ispublished: pub
divisions: UCL
divisions: B02
divisions: DD4
keywords: Phonetic Similarity, String Similarity, Fast Search
note: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
abstract: Analysis of unstructured data may be inefficient in the presence of spelling errors. Existing approaches use string similarity methods to search for valid words within a text, with a supporting dictionary. However, they are not rich enough to encode phonetic information to assist the search. In this paper, we present a novel approach for efficiently performing phonetic similarity search over large data sources, which uses a data structure called PhoneticMap to encode language-specific phonetic information. We validate our approach through an experiment over a data set using a Portuguese variant of a well-known repository, to automatically correct words with spelling errors.
date: 2014-01-01
date_type: published
publisher: Springer
official_url: https://doi.org/10.1007/978-3-319-10085-2_6
oa_status: green
full_text_type: other
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 1561840
doi: 10.1007/978-3-319-10085-2_6
isbn_13: 9783319100845
lyricists_name: Correa Tissot, Hegler
lyricists_id: HTISS81
actors_name: Tissot, Hegler
actors_id: HTISS81
actors_role: owner
full_text_status: public
series: Lecture Notes in Computer Science
publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
volume: 8645
number: PART 2
place_of_pub: Cham, Switzerland
pagerange: 74-81
event_title: DEXA 2014: Database and Expert Systems Applications
issn: 1611-3349
book_title: Database and Expert Systems Applications
citation: Tissot, H; Peschl, G; Del Fabro, MD; (2014) Fast phonetic similarity search over large repositories. In: Database and Expert Systems Applications. (pp. 74-81). Springer: Cham, Switzerland. Green open access
 
document_url: https://discovery.ucl.ac.uk/id/eprint/10065716/1/DEXA-2014-FPSS.pdf