Wu, Linjun;
Tang, Xiangjun;
Cong, Jingyuan;
Wang, He;
Hu, Bo;
Gong, Xu;
Li, Songnan;
... Jin, Xiaogang;
(2025)
Semantically Consistent Text-to-Motion with Unsupervised Styles.
In:
Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers.
(p. 56).
ACM: Vancouver, BC, Canada.
Siggraph_2025__Linjun_Wu_.pdf (Accepted Version, 5MB)
Abstract
Text-to-stylized human motion generation leverages text descriptions to generate motion with fine-grained style control relative to a reference motion. However, existing approaches typically rely on supervised style learning with labeled datasets, limiting their adaptability and generalization to diverse styles. Moreover, they have not fully explored the temporal correlations between motion, textual descriptions, and style, making it challenging to generate semantically consistent motion with precise style alignment. To address these limitations, we introduce a novel method that integrates unsupervised style from arbitrary references into a text-driven diffusion model to generate semantically consistent stylized human motion. The core innovation lies in leveraging text as a mediator to capture the temporal correspondences between motion and style, enabling the seamless integration of temporally dynamic style into motion features. Specifically, we first train a diffusion model on a text-motion dataset to capture the correlation between motion and text semantics. A style adapter then extracts temporally dynamic style features from reference motions, and a novel Semantic-Aware Style Injection (SASI) module infuses these features into the diffusion model. The SASI module computes the semantic correlation between motion and style features based on the text, selectively incorporating style features that align with the motion content; this ensures semantic consistency and precise style alignment. Because the style adapter does not require a labeled style dataset for training, the adaptability and generalization of style control are enhanced. Extensive evaluations show that our method outperforms previous approaches in terms of semantic consistency and style expressivity. Our webpage, https://fivezerojun.github.io/stylization.github.io/, includes links to the supplementary video and code.
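To make the SASI idea concrete, below is a minimal PyTorch sketch of one plausible realization: motion features act as attention queries, modulated by a pooled text embedding before attending over style features (keys/values), so that only style content semantically correlated with the motion, with text as the mediator, is injected back. All module names, tensor shapes, and the modulation scheme here are illustrative assumptions, not the paper's actual implementation; consult the authors' code linked above for the real architecture.

```python
# Hypothetical sketch of a Semantic-Aware Style Injection (SASI) block,
# assuming transformer-style per-frame features. Not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticAwareStyleInjection(nn.Module):
    def __init__(self, dim: int = 256, n_heads: int = 4):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        # Motion features act as queries; style features supply keys/values.
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        # The text embedding modulates the queries so correlation is measured
        # in a text-aligned space ("text as a mediator" between motion and style).
        self.text_mod = nn.Linear(dim, dim)

    def forward(self, motion: torch.Tensor, style: torch.Tensor,
                text: torch.Tensor) -> torch.Tensor:
        # motion: (B, T_m, dim) denoiser features
        # style:  (B, T_s, dim) temporally dynamic style-adapter features
        # text:   (B, dim) pooled text embedding
        B, T_m, _ = motion.shape
        q = self.q_proj(motion + self.text_mod(text).unsqueeze(1))
        k, v = self.k_proj(style), self.v_proj(style)

        def split(x: torch.Tensor) -> torch.Tensor:
            # (B, T, dim) -> (B, heads, T, head_dim)
            return x.view(B, -1, self.n_heads, self.head_dim).transpose(1, 2)

        # Attention weights express the motion-style semantic correlation;
        # frames of the reference whose style matches the motion content
        # contribute most to the injected features.
        attn = F.scaled_dot_product_attention(split(q), split(k), split(v))
        attn = attn.transpose(1, 2).reshape(B, T_m, -1)
        # Residual injection keeps the original motion semantics intact.
        return motion + self.out_proj(attn)

# Smoke test with random tensors.
sasi = SemanticAwareStyleInjection()
out = sasi(torch.randn(2, 60, 256), torch.randn(2, 80, 256), torch.randn(2, 256))
print(out.shape)  # torch.Size([2, 60, 256])
```

Text-modulated queries are just one simple way to realize text-mediated correlation; the paper may compute the motion-style correspondence differently, but the selective, attention-weighted injection pattern matches the behavior the abstract describes.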
| Type: | Proceedings paper |
|---|---|
| Title: | Semantically Consistent Text-to-Motion with Unsupervised Styles |
| Event: | SIGGRAPH Conference Papers '25: Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers |
| Open access status: | An open access version is available from UCL Discovery |
| DOI: | 10.1145/3721238.3730641 |
| Publisher version: | https://doi.org/10.1145/3721238.3730641 |
| Language: | English |
| Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. |
| UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10214396 |