Chanin, David; Hunter, Anthony; Camburu, Oana-Maria (2024) Identifying Linear Relational Concepts in Large Language Models. In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1524-1535). Association for Computational Linguistics: Mexico City, Mexico.
PDF: 2024.naacl-long.85.pdf - Published Version (621kB)
Abstract
Transformer language models (LMs) have been shown to represent concepts as directions in the latent space of hidden activations. However, for any human-interpretable concept, how can we find its direction in the latent space? We present a technique called linear relational concepts (LRC) for finding concept directions corresponding to human-interpretable concepts by first modeling the relation between subject and object as a linear relational embedding (LRE) (Hernandez et al., 2023b). We find that inverting the LRE and using earlier object layers results in a powerful technique for finding concept directions that outperforms standard black-box probing classifiers. We evaluate LRCs on their performance as concept classifiers as well as their ability to causally change model output.
Type: | Proceedings paper |
---|---|
Title: | Identifying Linear Relational Concepts in Large Language Models |
Event: | 2024 Conference of the North American Chapter of the Association for Computational Linguistics |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.18653/v1/2024.naacl-long.85 |
Publisher version: | https://aclanthology.org/2024.naacl-long.85/ |
Language: | English |
Additional information: | © 2024 Association for Computational Linguistics. ACL materials are Copyright © 1963–2024 ACL; other materials are copyrighted by their respective copyright holders. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License. https://creativecommons.org/licenses/by/4.0/ |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/10196786 |