Pix2Video: Video Editing using Image Diffusion



Ceylan, D; Huang, CHP; Mitra, NJ; (2024) Pix2Video: Video Editing using Image Diffusion. In: Proceedings of the IEEE International Conference on Computer Vision. (pp. 23149-23160). IEEE: Paris, France. Green open access


Abstract

Image diffusion models, trained on massive image collections, have emerged as the most versatile image generators in terms of quality and diversity. They support inverting real images and conditional (e.g., text) generation, making them attractive for high-quality image editing applications. We investigate how to use such pre-trained image models for text-guided video editing. The critical challenge is to achieve the target edits while still preserving the content of the source video. Our method works in two simple steps: first, we use a pre-trained structure-guided (e.g., depth) image diffusion model to perform text-guided edits on an anchor frame; then, in the key step, we progressively propagate the changes to the future frames via self-attention feature injection to adapt the core denoising step of the diffusion model. We then consolidate the changes by adjusting the latent code for the frame before continuing the process. Our approach is training-free and generalizes to a wide range of edits. We demonstrate the effectiveness of the approach through extensive experiments and compare it against four different prior and parallel efforts (on arXiv). We demonstrate that realistic text-guided video edits are possible without any compute-intensive preprocessing or video-specific finetuning. Project page: https://duyguceylan.github.io/pix2video.github.io/
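
The propagation step described in the abstract can be illustrated with a toy sketch. The abstract does not spell out how features are injected, so the code below makes an assumption: queries come from the frame being edited, while keys and values are computed from the tokens of the anchor frame and the previous frame. The function name `injected_self_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the tiny feature sizes are all hypothetical and chosen for illustration; this is not the authors' implementation, and it omits the latent-code adjustment step entirely.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scale = math.sqrt(len(queries[0]))
    keys_t = [list(col) for col in zip(*keys)]
    scores = matmul(queries, keys_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, values)


def injected_self_attention(current, anchor, previous, w_q, w_k, w_v):
    """Assumed form of feature injection (hypothetical, not from the source).

    Queries are projected from the current frame's tokens. Keys and values
    are projected from the anchor and previous frames' tokens, so the
    current frame reads appearance from frames that were already edited.
    """
    source = anchor + previous  # concatenate along the token axis
    q = matmul(current, w_q)
    k = matmul(source, w_k)
    v = matmul(source, w_v)
    return attention(q, k, v)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projections
    anchor = [[1.0, 0.0], [0.0, 1.0]]    # two tokens from the edited anchor frame
    previous = [[1.0, 1.0]]              # one token from the previous edited frame
    current = [[0.5, 0.5], [2.0, -1.0], [0.0, 0.0]]
    for row in injected_self_attention(current, anchor, previous,
                                       identity, identity, identity):
        print(row)
```

Each output row is a weighted average of the anchor and previous-frame value vectors, which is one way to read the claim that edits are propagated without any training: the pre-trained attention weights are reused and only the source of the keys and values changes.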

Type:	Proceedings paper
Title:	Pix2Video: Video Editing using Image Diffusion
Event:	2023 IEEE/CVF International Conference on Computer Vision (ICCV)
Dates:	1 Oct 2023 - 6 Oct 2023
ISBN-13:	9798350307184
Open access status:	An open access version is available from UCL Discovery
DOI:	10.1109/ICCV51070.2023.02121
Publisher version:	http://dx.doi.org/10.1109/iccv51070.2023.02121
Language:	English
Additional information:	This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords:	Adaptation models, Computer vision, Codes, Noise reduction, Generators
UCL classification:	UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI:	https://discovery.ucl.ac.uk/id/eprint/10190379
