UCL Discovery

Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning for Visual Story Synthesis

Song, Tianyi; Cao, Jiuxin; Wang, Kun; Liu, Bo; Zhang, Xiaofeng; (2024) Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning for Visual Story Synthesis. In: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (pp. 3350-3354). IEEE: Seoul, Korea, Republic of. Green open access

Song_2309.09553v4.pdf - Accepted Version


Abstract

The strong text-to-image synthesis capability of diffusion models has driven progress in synthesizing coherent visual stories. The current state-of-the-art method combines the features of historical captions, historical frames, and the current captions as conditions for generating the current frame. However, this method treats every historical frame and caption as contributing equally: it connects them in order with equal weights, ignoring that not all historical conditions are relevant to generating the current frame. To address this issue, we propose Causal-Story, a model that incorporates a local causal attention mechanism which considers the causal relationship between previous captions, previous frames, and the current captions. By weighting the historical conditions according to this relationship when generating the current frame, Causal-Story improves the global consistency of story generation. We evaluated our model on the PororoSV and FlintstonesSV datasets and obtained state-of-the-art FID scores, and the generated frames also demonstrate better visual storytelling.
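The abstract does not spell out how the local causal attention is computed, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that "causal" means a story step may attend only to itself and earlier steps, and that "local" means a fixed look-back window. The names `local_causal_mask`, `masked_attention_weights`, the `window` parameter, and the toy scores are all hypothetical.

```python
import math

def local_causal_mask(num_frames, window):
    """mask[i][j] is True when step i may attend to step j:
    j is not in the future (j <= i) and is fewer than `window` steps back.
    The window size is an assumption made for this sketch."""
    return [[(j <= i) and (i - j < window) for j in range(num_frames)]
            for i in range(num_frames)]

def masked_attention_weights(scores, mask):
    """Row-wise softmax over the allowed positions only.
    Masked positions receive a weight of exactly 0."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        allowed = [s for s, m in zip(row_scores, row_mask) if m]
        peak = max(allowed)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) if m else 0.0
                for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy relevance scores between 4 story steps (made-up numbers, not from the paper).
scores = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [2.0, 0.1, 1.0, 0.0],
    [0.3, 0.2, 0.9, 1.0],
]
mask = local_causal_mask(num_frames=4, window=2)
weights = masked_attention_weights(scores, mask)
```

In this toy setup, step 2 has a high raw score for step 0, but the window of 2 excludes it, so its weight is zero. This contrasts with the equal-weight scheme the abstract criticizes, where every historical condition would contribute the same amount regardless of relevance.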

Type: Proceedings paper
Title: Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning for Visual Story Synthesis
Event: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Dates: 14 Apr 2024 - 19 Apr 2024
Open access status: An open access version is available from UCL Discovery
DOI: 10.1109/icassp48485.2024.10446420
Publisher version: http://dx.doi.org/10.1109/icassp48485.2024.1044642...
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Training, Image quality, Visualization, Coherence, Signal processing, Acoustics, Speech processing
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10199974
