eprintid: 10199974 rev_number: 6 eprint_status: archive userid: 699 dir: disk0/10/19/99/74 datestamp: 2024-11-11 11:16:51 lastmod: 2024-11-11 11:16:51 status_changed: 2024-11-11 11:16:51 type: proceedings_section metadata_visibility: show sword_depositor: 699 creators_name: Song, Tianyi creators_name: Cao, Jiuxin creators_name: Wang, Kun creators_name: Liu, Bo creators_name: Zhang, Xiaofeng title: Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning for Visual Story Synthesis ispublished: pub divisions: UCL divisions: B04 divisions: F48 keywords: Training, Image quality, Visualization, Coherence, Signal processing, Acoustics, Speech processing note: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. abstract: The excellent text-to-image synthesis capability of diffusion models has driven progress in synthesizing coherent visual stories. The current state-of-the-art method combines the features of the historical captions, historical frames, and current caption as conditions for generating the current frame. However, this method treats every historical frame and caption as contributing equally, concatenating them in order with equal weights and ignoring that not all historical conditions are relevant to generating the current frame. To address this issue, we propose Causal-Story. This model incorporates a local causal attention mechanism that considers the causal relationship between previous captions, frames, and the current caption. By assigning weights based on this relationship, Causal-Story generates the current frame, thereby improving the global consistency of story generation. We evaluated our model on the PororoSV and FlintstonesSV datasets and achieved state-of-the-art FID scores; the generated frames also demonstrate stronger visual storytelling.
date: 2024-03-18 date_type: published publisher: IEEE official_url: http://dx.doi.org/10.1109/icassp48485.2024.10446420 oa_status: green full_text_type: other language: eng primo: open primo_central: open_green verified: verified_manual elements_id: 2333361 doi: 10.1109/icassp48485.2024.10446420 lyricists_name: Song, Tianyi lyricists_id: TSONG42 actors_name: Jayawardana, Anusha actors_id: AJAYA51 actors_role: owner full_text_status: public pres_type: paper publication: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) place_of_pub: Seoul, Korea, Republic of pagerange: 3350-3354 event_title: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) event_dates: 14 Apr 2024 - 19 Apr 2024 book_title: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) citation: Song, Tianyi; Cao, Jiuxin; Wang, Kun; Liu, Bo; Zhang, Xiaofeng; (2024) Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning for Visual Story Synthesis. In: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (pp. 3350-3354). IEEE: Seoul, Korea, Republic of. Green open access document_url: https://discovery.ucl.ac.uk/id/eprint/10199974/1/Song_2309.09553v4.pdf
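The abstract's core idea, restricting each frame's attention to causally relevant recent history rather than weighting all historical conditions equally, can be sketched roughly as follows. This is a minimal NumPy illustration under assumed details: the window size, mask construction, and function names are not taken from the paper's implementation.

```python
import numpy as np

def local_causal_mask(num_frames, window=2):
    """Mask[i, j] is True if frame i may attend to historical condition j:
    only past-or-present conditions (j <= i) within a local window.
    The window size is an assumption for illustration, not the paper's value."""
    idx = np.arange(num_frames)
    causal = idx[None, :] <= idx[:, None]           # no attending to future frames
    local = (idx[:, None] - idx[None, :]) < window  # restrict to recent history
    return causal & local

def local_causal_attention(q, k, v, window=2):
    """Scaled dot-product attention with the local causal mask applied,
    so masked-out conditions receive zero weight after the softmax."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    mask = local_causal_mask(q.shape[0], window)
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With `window=2`, frame 0 attends only to itself and frame 2 attends only to conditions 1 and 2, so irrelevant early history contributes nothing to the current frame's generation.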