Poisot, Timothee; Gibb, Rory; Ryan, Sadie J; Carlson, Colin J; (2025) NCBITaxonomy.jl: rapid biological names finding and reconciliation. BMC Ecology and Evolution, 25 (1), Article 84. 10.1186/s12862-025-02425-4.
Abstract
NCBITaxonomy.jl is a Julia package designed to address the complex challenges of taxonomic name reconciliation using a local copy of the NCBI taxonomic backbone (Federhen in Nucleic Acids Res 40:D136–D143, 2012; Schoch et al. in Database 2020:baaa062, 2020). The package provides advanced name matching capabilities that handle common issues in taxonomic data, including synonyms, homonyms, vernacular names, nomenclatural changes, and typographical errors. Core functionalities include case-insensitive search, customizable fuzzy string matching, and taxonomically restricted searches. The package implements a robust exception system that explicitly handles ambiguous matches without interrupting workflow execution, enabling automated processing of large datasets. NCBITaxonomy.jl works with Julia 1.6 and up and uses the Apache Arrow format for efficient local storage. It also provides lineage navigation and taxonomic distance functions. The package has been successfully deployed in large-scale projects for automated name reconciliation and cleaning, demonstrating its effectiveness for high-throughput name reconciliation across heterogeneous biological datasets. The design prioritizes programmatic access over command-line usage, making it well-suited for integration into bioinformatics pipelines requiring reliable taxonomic standardization.
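The matching behaviour described in the abstract (case-insensitive lookup, optional fuzzy matching, and explicit exceptions for ambiguous names that a batch job can catch and carry on from) can be illustrated with a small sketch. This is not the package's Julia API: it is a Python illustration using only the standard library, and every name in it (`find_taxon`, `reconcile`, `AmbiguousNameError`, `NameNotFoundError`, `TOY_NAMES`) is hypothetical. The toy table uses made-up identifiers rather than real NCBI taxon IDs, and `difflib`'s similarity ratio stands in for whichever string distance the package actually applies.

```python
import difflib


class NameNotFoundError(Exception):
    """Raised when no taxon matches the query."""


class AmbiguousNameError(Exception):
    """Raised when a query matches more than one taxon (e.g. a homonym)."""

    def __init__(self, name, candidates):
        super().__init__(f"{name!r} matches {len(candidates)} taxa")
        self.name = name
        self.candidates = candidates


# Toy backbone of (name, identifier) pairs. The identifiers are made up for
# illustration; they are NOT real NCBI taxon IDs.
TOY_NAMES = [
    ("Homo sapiens", 1),
    ("human", 1),         # vernacular name pointing at the same taxon
    ("Mus musculus", 2),
    ("Bacteria", 3),      # homonym: one name, two different taxa
    ("Bacteria", 4),
]


def find_taxon(name, names=TOY_NAMES, strict=True, cutoff=0.8):
    """Return the identifier for `name`, or raise an explicit exception."""
    # Case-insensitive index from name to the set of identifiers it covers.
    index = {}
    for label, taxid in names:
        index.setdefault(label.casefold(), set()).add(taxid)

    query = name.casefold()
    if query in index:
        keys = [query]
    elif strict:
        keys = []
    else:
        # Fuzzy fallback: keep only the single closest name above the cutoff.
        keys = difflib.get_close_matches(query, list(index), n=1, cutoff=cutoff)

    ids = sorted({taxid for key in keys for taxid in index[key]})
    if not ids:
        raise NameNotFoundError(name)
    if len(ids) > 1:
        raise AmbiguousNameError(name, ids)
    return ids[0]


def reconcile(raw_names, **kwargs):
    """Resolve a batch of names, collecting failures instead of stopping."""
    resolved, problems = {}, {}
    for raw in raw_names:
        try:
            resolved[raw] = find_taxon(raw, **kwargs)
        except (AmbiguousNameError, NameNotFoundError) as err:
            problems[raw] = err
    return resolved, problems
```

In this sketch, calling `reconcile(["human", "bacteria", "Homo sapins"], strict=False)` resolves the vernacular name and the misspelling, while the homonym is reported in `problems` with its candidate identifiers so it can be reviewed later without halting the run.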
Field | Value |
---|---|
Type: | Article |
Title: | NCBITaxonomy.jl: rapid biological names finding and reconciliation |
Location: | England |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1186/s12862-025-02425-4 |
Publisher version: | https://doi.org/10.1186/s12862-025-02425-4 |
Language: | English |
Additional information: | Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
Keywords: | Science & Technology, Life Sciences & Biomedicine, Ecology, Evolutionary Biology, Genetics & Heredity, Environmental Sciences & Ecology, NCBI, Taxonomic names, ICTV, Database, TAXONOMY |
UCL classification: | UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Genetics, Evolution and Environment |
URI: | https://discovery.ucl.ac.uk/id/eprint/10214467 |