UCL Discovery

Semantically Consistent Text-to-Motion with Unsupervised Styles

Wu, Linjun; Tang, Xiangjun; Cong, Jingyuan; Wang, He; Hu, Bo; Gong, Xu; Li, Songnan; ... Jin, Xiaogang (2025) Semantically Consistent Text-to-Motion with Unsupervised Styles. In: Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference: Conference Papers. (p. 56). ACM: Vancouver, BC, Canada. Green open access

Text: Siggraph_2025__Linjun_Wu_.pdf - Accepted Version. Download (5MB)

Abstract

Text-to-stylized human motion generation leverages text descriptions to generate motion with fine-grained style control derived from a reference motion. However, existing approaches typically rely on supervised style learning with labeled datasets, constraining their adaptability and generalization to diverse style control. Additionally, they have not fully explored the temporal correlations between motion, textual descriptions, and style, making it challenging to generate semantically consistent motion with precise style alignment. To address these limitations, we introduce a novel method that integrates unsupervised style from arbitrary references into a text-driven diffusion model to generate semantically consistent stylized human motion. The core innovation lies in leveraging text as a mediator to capture the temporal correspondences between motion and style, enabling the seamless integration of temporally dynamic style into motion features. Specifically, we first train a diffusion model on a text-motion dataset to capture the correlation between motion and text semantics. A style adapter then extracts temporally dynamic style features from reference motions, and a novel Semantic-Aware Style Injection (SASI) module infuses these features into the diffusion model. The SASI module computes the semantic correlation between motion and style features based on text, selectively incorporating style features that align with motion content, ensuring semantic consistency and precise style alignment. Our style adapter does not require a labeled style dataset for training, enhancing the adaptability and generalization of style control. Extensive evaluations show that our method outperforms previous approaches in terms of semantic consistency and style expressivity. Our webpage, https://fivezerojun.github.io/stylization.github.io/, includes links to the supplementary video and code.
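To make the SASI mechanism concrete, the following is a minimal PyTorch sketch of one plausible reading of the abstract: text embeddings act as a mediator, and a gate built from text-space agreement selects which style frames are injected into motion features via cross-attention. This is not the authors' implementation (see the linked code for that); all class, parameter, and variable names here are hypothetical, and the shapes are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticAwareStyleInjection(nn.Module):
    # Hypothetical sketch: text-mediated, gated cross-attention
    # from style features to motion features.
    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)     # queries from motion features
        self.k_proj = nn.Linear(dim, dim)     # keys from style features
        self.v_proj = nn.Linear(dim, dim)     # values from style features
        self.text_proj = nn.Linear(dim, dim)  # shared semantic space for text
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, motion, style, text):
        # motion: (B, T_m, D) denoiser features per motion frame
        # style:  (B, T_s, D) temporally dynamic features from the style adapter
        # text:   (B, T_t, D) encoded text tokens (the semantic mediator)
        t = self.text_proj(text).mean(dim=1, keepdim=True)       # (B, 1, D)
        m_sem = F.cosine_similarity(motion, t, dim=-1)           # (B, T_m)
        s_sem = F.cosine_similarity(style, t, dim=-1)            # (B, T_s)
        # Pairwise agreement of motion and style frames w.r.t. the text.
        gate = torch.sigmoid(m_sem.unsqueeze(2) * s_sem.unsqueeze(1))  # (B, T_m, T_s)

        q, k, v = self.q_proj(motion), self.k_proj(style), self.v_proj(style)
        attn = torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        attn = attn * gate  # suppress style frames that disagree with the content
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-6)
        return motion + self.out_proj(attn @ v)  # residual style injection

Multiplying the attention weights by the text-space agreement gate is one way to realize the abstract's "selectively incorporating style features that align with motion content": style frames whose semantics conflict with the current motion content contribute little to the injected residual.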

Type: Proceedings paper
Title: Semantically Consistent Text-to-Motion with Unsupervised Styles
Event: SIGGRAPH Conference Papers '25: Special Interest Group on Computer Graphics and Interactive Techniques Conference
Open access status: An open access version is available from UCL Discovery
DOI: 10.1145/3721238.3730641
Publisher version: https://doi.org/10.1145/3721238.3730641
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10214396
