West, Robert;
Brown, Jamie;
Shahab, Lion;
Baird, Harriet;
Webb, Thomas;
Squires, Hazel;
Tattan-Birch, Harry;
... Michie, Susan;
(2025) Annotating datasets in behavioural and social sciences to promote interoperability: development of the Schema for Ontology-based Dataset Annotation (SODA) version 1.0 [version 1; peer review: awaiting peer review]. Wellcome Open Research, 10, Article 455. 10.12688/wellcomeopenres.24234.1.
Abstract
Background and aims: Ontologies are increasingly employed to help find, use and synthesise information, but methods for using them to annotate documents and datasets remain in their infancy in the behavioural and social sciences. The Behavioural Research UK DEMO-DATA project aimed to develop a prototype schema for annotating datasets in behavioural and social sciences.

Methods: A case-study dataset (the ‘Smoking Toolkit Study’), used to inform an Agent-Based Model of trajectories in cigarette smoking and cessation in England, was chosen for annotation using two ontologies: the Behaviour Change Intervention Ontology (BCIO) and the Addiction Ontology (AddictO). The dataset included 21 variables representing sociodemographic attributes and tobacco and nicotine use attributes of the study population. A preliminary version of the schema for linking variables to ontology classes was developed as a basis for annotating each variable in the dataset. This was applied and revised iteratively until a panel of domain experts and modellers judged that it represented the variables accurately enough to enable searching for and integration of data.

Results: The prototype Schema for Ontology-based Dataset Annotation (SODA) version 1.0 was developed over seven iterations. Variables were represented by an ‘object property’|‘ontology class’ expression (e.g., ‘has characteristic’|‘extent of social smoking’) together with information about the data types (e.g., numbers, ontology subclasses, or Boolean values), measurement source, unit of measurement, any coding or data transformations, and whether or not the variable was fully characterised by the annotation. The prototype schema was applied successfully to the smoking dataset, with 15 new ontology classes being created as required.

Conclusions: A prototype schema for annotating behavioural and social science datasets was developed and successfully applied to a dataset on smoking in England using ontology relations and classes. The next step is to further develop and evaluate the schema by application to case studies with a range of users and other datasets.
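
The annotation structure described in the Results can be pictured as one record per dataset variable. The following Python sketch is illustrative only: it is not the published SODA format, and the class name `VariableAnnotation`, its field names, the helper `find_by_class`, the variable name `social_smoking`, and the data type chosen in the example are all assumptions introduced here. Only the ‘object property’|‘ontology class’ pairing and the list of accompanying details come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VariableAnnotation:
    """Hypothetical record for one annotated variable (not the official SODA format)."""

    variable_name: str                        # column name in the dataset (assumed)
    object_property: str                      # e.g. 'has characteristic'
    ontology_class: str                       # e.g. 'extent of social smoking'
    data_type: str                            # e.g. 'number', 'ontology subclass', 'boolean'
    measurement_source: Optional[str] = None  # where the measurement came from
    unit: Optional[str] = None                # unit of measurement, if any
    coding: Optional[str] = None              # coding or data transformations applied
    fully_characterised: bool = True          # does the annotation capture the variable fully?

    def expression(self) -> str:
        """Render the 'object property'|'ontology class' expression."""
        return f"'{self.object_property}'|'{self.ontology_class}'"


def find_by_class(
    annotations: List[VariableAnnotation], ontology_class: str
) -> List[VariableAnnotation]:
    """Return the annotations that point at a given ontology class."""
    return [a for a in annotations if a.ontology_class == ontology_class]


# Illustrative use; the variable name and data type are invented for this example.
example = VariableAnnotation(
    variable_name="social_smoking",
    object_property="has characteristic",
    ontology_class="extent of social smoking",
    data_type="number",
)
print(example.expression())
print(len(find_by_class([example], "extent of social smoking")))
```

Keeping the ontology class as its own field, rather than only inside the rendered expression, is what would let annotated datasets be searched or matched by class, which is the interoperability goal the abstract states.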
Type: | Article |
---|---|
Title: | Annotating datasets in behavioural and social sciences to promote interoperability: development of the Schema for Ontology-based Dataset Annotation (SODA) version 1.0 [version 1; peer review: awaiting peer review] |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.12688/wellcomeopenres.24234.1 |
Publisher version: | https://doi.org/10.12688/wellcomeopenres.24234.1 |
Language: | English |
Additional information: | Copyright © 2025 West R et al. This is an open access work distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences > Clinical, Edu and Hlth Psychology; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Epidemiology and Health > Behavioural Science and Health |
URI: | https://discovery.ucl.ac.uk/id/eprint/10212682 |