Routsis, Vasileios
(2025)
Bridging Complex Data and Generative AI for Data Retrieval.
Presented at: IASSIST 2025, Bristol, UK.
Slideshow: IASSIST_2025.pdf (Published Version, 2 MB)
Abstract
Data services aim to provide accessible and intuitive tools for researchers and policymakers to navigate an increasingly complex and diverse data landscape. Modern interfaces have significantly improved accessibility but often still require considerable technical expertise and domain-specific knowledge, perpetuating barriers to effective data discovery and retrieval. CORDIAL-AI, a pilot project funded under the ESRC Future Data Services programme, explores the potential of Generative AI (GenAI) to address some of these barriers by enabling users to interact with a Large Language Model (LLM) to retrieve complex, bespoke UK census flow data. Flow data is the most complex type of census data, characterised by its substantial size, extensive code lists, large volumes of numerical information, and intricate relational structures. It therefore provides a compelling case study of how LLMs can help users identify and extract tailored subsets of such data, while also highlighting the significant challenges GenAI faces in handling highly structured datasets with large-scale categorical and numerical variables. The pilot also seeks to equip UK Data Service (UKDS) staff with the technical expertise and transferable skills needed to engage with this rapidly evolving technology, so that future data services can adapt to new advancements and become more efficient. From a technical perspective, the project builds on a newly developed experimental census API with advanced subsetting capabilities as part of the UKDS’s 2024–2030 data-driven strategy. The presentation will detail the methods used to construct reliable pipelines between structured datasets and GenAI systems, combining the API with techniques from prompt engineering, natural language processing, AI agents, and model fine-tuning.
It will reflect on practical insights from applying these GenAI techniques to data retrieval and offer perspectives on how such approaches can shape the development of innovative tools for data access.
| Field | Value |
|---|---|
| Type: | Conference item (Presentation) |
| Title: | Bridging Complex Data and Generative AI for Data Retrieval |
| Event: | IASSIST 2025 |
| Location: | Bristol, UK |
| Dates: | 3–6 June 2025 |
| Open access status: | An open access version is available from UCL Discovery |
| DOI: | 10.5281/zenodo.15732721 |
| Publisher version: | https://doi.org/10.5281/zenodo.15732722 |
| Language: | English |
| Additional information: | This is an Open Access presentation published under a Creative Commons Attribution 4.0 International (CC BY 4.0) Licence (https://creativecommons.org/licenses/by/4.0/). |
| Keywords: | API, Data Discovery, DevOps, Census, Artificial Intelligence, Generative AI, Responsible AI, Data Curation, Data Processing |
| UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL SLASH; UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities; UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities > Dept of Information Studies |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10210963 |