eprintid: 10065716
rev_number: 33
eprint_status: archive
userid: 608
dir: disk0/10/06/57/16
datestamp: 2020-02-11 13:21:49
lastmod: 2022-01-03 00:07:41
status_changed: 2020-02-11 13:21:49
type: proceedings_section
metadata_visibility: show
creators_name: Tissot, H
creators_name: Peschl, G
creators_name: Del Fabro, MD
title: Fast phonetic similarity search over large repositories
ispublished: pub
divisions: UCL
divisions: B02
divisions: DD4
keywords: Phonetic Similarity, String Similarity, Fast Search
note: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
abstract: Analysis of unstructured data may be inefficient in the presence of spelling errors. Existing approaches use string similarity methods to search for valid words within a text, with a supporting dictionary. However, they are not rich enough to encode phonetic information to assist the search. In this paper, we present a novel approach to efficiently perform phonetic similarity search over large data sources that uses a data structure called PhoneticMap to encode language-specific phonetic information. We validate our approach through an experiment over a data set using a Portuguese variant of a well-known repository, to automatically correct words with spelling errors.
date: 2014-01-01
date_type: published
publisher: Springer
official_url: https://doi.org/10.1007/978-3-319-10085-2_6
oa_status: green
full_text_type: other
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 1561840
doi: 10.1007/978-3-319-10085-2_6
isbn_13: 9783319100845
lyricists_name: Correa Tissot, Hegler
lyricists_id: HTISS81
actors_name: Tissot, Hegler
actors_id: HTISS81
actors_role: owner
full_text_status: public
series: Lecture Notes in Computer Science
publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
volume: 8645
number: PART 2
place_of_pub: Cham, Switzerland
pagerange: 74-81
event_title: DEXA 2014: Database and Expert Systems Applications
issn: 1611-3349
book_title: Database and Expert Systems Applications
citation: Tissot, H; Peschl, G; Del Fabro, MD; (2014) Fast phonetic similarity search over large repositories. In: Database and Expert Systems Applications. (pp. 74-81). Springer: Cham, Switzerland. Green open access
document_url: https://discovery.ucl.ac.uk/id/eprint/10065716/1/DEXA-2014-FPSS.pdf