UCL Discovery

Toppled Realities: Challenges in Generation and Validation of Synthetic Data

Lam, Percy; Chen, Weiwei; de Silva, Lavindra; Brilakis, Ioannis (2026) Toppled Realities: Challenges in Generation and Validation of Synthetic Data. In: Tonkin, EL and Tourte, GJL and Yordanova, K (eds.) Annotation of Real-World Data for Artificial Intelligence Systems (ARDUOUS 2025). (pp. 36-50). Springer Nature: Cham, Switzerland. (In press).

ECAI_Arduous rev2.pdf - Accepted Version
Access restricted to UCL open access staff until 25 October 2026.

Abstract

In advancing automation in infrastructure maintenance, collecting comprehensive datasets is arduous. While synthetic data offers a promising way to address shortages of real data, problems remain in creating and validating the generated samples. Drawing on our exploratory trials, this position paper aims to push the boundary in generating synthetic data without prior training samples and in validating the synthetic generations with vision language models (VLMs). The trials attempted to generate new toppled road lights in road scene images with several inpainting and image editing tools, and ultimately resorted to a more deterministic approach of "create, prepare, stylise and inpaint". When validating the synthetic toppled road lights, we explored the possibility of automating prompt engineering and made four main observations. Whilst exploration and exploitation can be seen, responses were sensitive to the text prompts. The model struggled with the dilemma between adhering to the instruction without achieving good results and hallucinating good results through goal misspecification. From the exploratory trials, we posit that finding the right starting point is important for generating synthetic data that appears real. VLMs can be more widely adopted for detection and validation with more meticulous auto-prompt engineering.

Type: Proceedings paper
Title: Toppled Realities: Challenges in Generation and Validation of Synthetic Data
Event: Annotation of Real-World Data for Artificial Intelligence Systems (ARDUOUS 2025)
ISBN-13: 9783032091161
DOI: 10.1007/978-3-032-09117-8_3
Publisher version: https://doi.org/10.1007/978-3-032-09117-8_3
Language: English
Additional information: This version is the author-accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Synthetic data, Defect detection, Vision Language Models, Prompt engineering
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
URI: https://discovery.ucl.ac.uk/id/eprint/10218782