UCL Discovery

A hands-on introduction to querying evolutionary relationships across multiple data sources using SPARQL [version 2; peer review: 1 approved, 2 approved with reservations]

Sima, AC; Dessimoz, C; Stockinger, K; Zahn-Zabal, M; Mendes de Farias, T; (2020) A hands-on introduction to querying evolutionary relationships across multiple data sources using SPARQL [version 2; peer review: 1 approved, 2 approved with reservations]. F1000Research, 8, Article 1822. 10.12688/f1000research.21027.2. Green open access

Abstract

The increasing use of Semantic Web technologies in the life sciences, in particular the Resource Description Framework (RDF) and the RDF query language SPARQL, opens the path for novel integrative analyses that combine information from multiple data sources. However, analyzing evolutionary data in RDF is not trivial, because of the steep learning curve required to understand both the data models adopted by different RDF data sources and the SPARQL constructs required to benefit from this data, in particular recursive property paths. In this article, we provide a hands-on introduction to querying evolutionary data across several data sources that publish orthology information in RDF, namely the Orthologous MAtrix (OMA), the European Bioinformatics Institute (EBI) RDF platform, the Database of Orthologous Groups (OrthoDB) and the Microbial Genome Database (MBGD). We present four protocols in increasing order of complexity. In these protocols, we demonstrate through SPARQL queries how to retrieve pairwise orthologs, homologous groups, and hierarchical orthologous groups. Finally, we show how orthology information in different data sources can be compared using federated SPARQL queries.

Type: Article
Title: A hands-on introduction to querying evolutionary relationships across multiple data sources using SPARQL [version 2; peer review: 1 approved, 2 approved with reservations]
Open access status: An open access version is available from UCL Discovery
DOI: 10.12688/f1000research.21027.2
Publisher version: https://doi.org/10.12688/f1000research.21027.2
Language: English
Additional information: © 2020 Sima AC et al. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: Orthology, Comparative Genomics, Sequence Homology, Resource Description Framework (RDF), SPARQL
UCL classification: UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Div of Biosciences > Genetics, Evolution and Environment
URI: https://discovery.ucl.ac.uk/id/eprint/10106401