UCL Discovery

Bridging Complex Data and Generative AI for Data Retrieval

Routsis, Vasileios; (2025) Bridging Complex Data and Generative AI for Data Retrieval. Presented at: IASSIST 2025, Bristol, UK. Green open access

Slideshow: IASSIST_2025.pdf (Published Version, 2MB)

Abstract

Data services aim to provide accessible and intuitive tools for researchers and policymakers to navigate an increasingly complex and diverse data landscape. Modern interfaces have significantly improved accessibility but often still require considerable technical expertise and domain-specific knowledge, perpetuating barriers to effective data discovery and retrieval. CORDIAL-AI, a pilot project funded under the ESRC Future Data Services programme, explores the potential of Generative AI (GenAI) to address some of these barriers by enabling users to interact with a Large Language Model (LLM) to retrieve complex, bespoke UK census flow data. Flow data is the most complex type of census data, characterised by its substantial size, extensive code lists, large volumes of numerical information, and intricate relational structures. It therefore provides a compelling case study for examining how LLMs can help users identify and extract tailored subsets of such data, while also highlighting the significant challenges GenAI faces in handling highly structured datasets and large-scale categorical and numerical variables. The pilot seeks to equip UK Data Service (UKDS) staff with the technical expertise and transferable skills needed to engage with this rapidly evolving technology, enabling future data services to adapt to new advancements and become more efficient. From a technical perspective, the project builds on a newly developed experimental census API with advanced subsetting capabilities, developed as part of the UKDS’s 2024–2030 data-driven strategy. The presentation will detail the methodologies used to construct reliable pipelines between structured datasets and GenAI systems, leveraging the API alongside advanced techniques in prompt engineering, natural language processing, AI agents, and model fine-tuning.
It will reflect on practical insights from applying advanced GenAI techniques to data retrieval, offering perspectives on how such approaches can shape the development of innovative tools for data access.
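The abstract mentions constructing reliable pipelines between structured datasets and GenAI systems but does not describe how they are implemented. The following is a minimal sketch, under stated assumptions, of one common guard in such pipelines: parsing an LLM's structured (JSON) output and validating it against known code lists before it is translated into a subsetting query. The code lists, field names (`flow_type`, `origins`, `destinations`), function names and query parameters are all hypothetical and are not taken from the UKDS census API or the CORDIAL-AI project.

```python
import json
from dataclasses import dataclass

# Hypothetical, illustrative code lists; real census code lists are far larger.
GEOGRAPHY_CODES = {"E06000023": "Bristol, City of", "E09000001": "City of London"}
FLOW_TYPES = {"migration", "workplace", "student"}


@dataclass(frozen=True)
class SubsetRequest:
    """A validated request for a subset of (hypothetical) flow data."""
    flow_type: str
    origins: tuple[str, ...]
    destinations: tuple[str, ...]


def parse_llm_output(raw: str) -> SubsetRequest:
    """Parse JSON emitted by an LLM and reject values not in the code lists."""
    data = json.loads(raw)
    flow_type = data["flow_type"]
    if flow_type not in FLOW_TYPES:
        raise ValueError(f"unknown flow type: {flow_type!r}")
    origins = tuple(data["origins"])
    destinations = tuple(data["destinations"])
    # Catch hallucinated or misspelled codes before any API call is made.
    unknown = [c for c in origins + destinations if c not in GEOGRAPHY_CODES]
    if unknown:
        raise ValueError(f"unknown geography codes: {unknown}")
    return SubsetRequest(flow_type, origins, destinations)


def to_query_params(req: SubsetRequest) -> dict[str, str]:
    """Translate a validated request into hypothetical API query parameters."""
    return {
        "flow": req.flow_type,
        "origin": ",".join(req.origins),
        "destination": ",".join(req.destinations),
    }


if __name__ == "__main__":
    raw = '{"flow_type": "migration", "origins": ["E06000023"], "destinations": ["E09000001"]}'
    print(to_query_params(parse_llm_output(raw)))
```

Keeping validation deterministic and outside the model means that a code the LLM invents is rejected with an explicit error (which could be fed back to the model for correction) rather than silently producing an empty or wrong subset.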

Type: Conference item (Presentation)
Title: Bridging Complex Data and Generative AI for Data Retrieval
Event: IASSIST 2025
Location: Bristol, UK
Dates: 3–6 June 2025
Open access status: An open access version is available from UCL Discovery
DOI: 10.5281/zenodo.15732721
Publisher version: https://doi.org/10.5281/zenodo.15732722
Language: English
Additional information: This is an Open Access presentation published under a Creative Commons Attribution 4.0 International (CC BY 4.0) Licence (https://creativecommons.org/licenses/by/4.0/).
Keywords: API, Data Discovery, DevOps, Census, Artificial Intelligence, Generative AI, Responsible AI, Data Curation, Data Processing
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL SLASH
UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities
UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities > Dept of Information Studies
URI: https://discovery.ucl.ac.uk/id/eprint/10210963
