UCL Discovery

Identifying Linear Relational Concepts in Large Language Models

Chanin, David; Hunter, Anthony; Camburu, Oana-Maria; (2024) Identifying Linear Relational Concepts in Large Language Models. In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1524-1535). Association for Computational Linguistics: Mexico City, Mexico. Green open access

PDF: 2024.naacl-long.85.pdf (Published Version, 621kB)

Abstract

Transformer language models (LMs) have been shown to represent concepts as directions in the latent space of hidden activations. However, for any human-interpretable concept, how can we find its direction in the latent space? We present a technique called linear relational concepts (LRC) for finding concept directions corresponding to human-interpretable concepts by first modeling the relation between subject and object as a linear relational embedding (LRE) (Hernandez et al., 2023b). We find that inverting the LRE and using earlier object layers results in a powerful technique for finding concept directions that outperforms standard black-box probing classifiers. We evaluate LRCs on their performance as concept classifiers as well as their ability to causally change model output.
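The abstract describes fitting an LRE between subject and object activations and then inverting it to obtain concept directions. The NumPy sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes the LRE is an affine map o ≈ W s + b from subject activations to object activations, fits W and b by ordinary least squares (a swapped-in estimator; Hernandez et al. derive W from the model's Jacobian), inverts W with a truncated-SVD pseudo-inverse, and takes a concept's direction to be the normalized mean of the inverted object activations for that concept. The function names `fit_lre`, `low_rank_pinv`, `concept_direction`, and `classify` are hypothetical, and the demo uses synthetic vectors rather than real LM activations.

```python
import numpy as np

# Hypothetical sketch of LRE inversion for concept directions.
# Not the authors' code; estimator and averaging recipe are assumptions.


def fit_lre(S, O):
    """Fit an affine map O ≈ S @ W.T + b by ordinary least squares.

    Assumption: least squares stands in for the Jacobian-based estimate
    of Hernandez et al. S has shape (n, d_subj); O has shape (n, d_obj).
    """
    # Append a column of ones so lstsq solves for W and b jointly.
    S1 = np.hstack([S, np.ones((S.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(S1, O, rcond=None)
    W = coef[:-1].T  # (d_obj, d_subj)
    b = coef[-1]     # (d_obj,)
    return W, b


def low_rank_pinv(W, rank):
    """Pseudo-inverse of W keeping only its top-`rank` singular values."""
    U, sigma, Vt = np.linalg.svd(W, full_matrices=False)
    return Vt[:rank].T @ np.diag(1.0 / sigma[:rank]) @ U[:, :rank].T


def concept_direction(W_pinv, b, O_concept):
    """Map one concept's object activations back to subject space,
    average them, and normalize to a unit vector (assumed recipe)."""
    inverted = (O_concept - b) @ W_pinv.T  # row i = W_pinv @ (o_i - b)
    v = inverted.mean(axis=0)
    return v / np.linalg.norm(v)


def classify(s, directions):
    """Return the concept whose direction has the largest dot product with s."""
    names = list(directions)
    scores = [directions[name] @ s for name in names]
    return names[int(np.argmax(scores))]


if __name__ == "__main__":
    # Synthetic stand-ins for LM activations; no real model is involved.
    rng = np.random.default_rng(0)
    d = 4
    W_true = 2 * np.eye(d) + 0.1 * rng.standard_normal((d, d))
    b_true = rng.standard_normal(d)
    S_a = 3 * np.eye(d)[0] + 0.1 * rng.standard_normal((10, d))
    S_b = 3 * np.eye(d)[1] + 0.1 * rng.standard_normal((10, d))
    O_a = S_a @ W_true.T + b_true
    O_b = S_b @ W_true.T + b_true

    W, b = fit_lre(np.vstack([S_a, S_b]), np.vstack([O_a, O_b]))
    W_pinv = low_rank_pinv(W, rank=d)
    directions = {
        "A": concept_direction(W_pinv, b, O_a),
        "B": concept_direction(W_pinv, b, O_b),
    }
    print(classify(3 * np.eye(d)[0], directions))
```

In practice S and O would be hidden activations collected from an LM; the abstract reports that taking object activations from earlier layers helps. The `rank` argument of the pseudo-inverse is a free choice in this sketch; a low rank keeps only the dominant directions of W, which can make the inverse less sensitive to small singular values.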

Type: Proceedings paper
Title: Identifying Linear Relational Concepts in Large Language Models
Event: 2024 Conference of the North American Chapter of the Association for Computational Linguistics
Open access status: An open access version is available from UCL Discovery
DOI: 10.18653/v1/2024.naacl-long.85
Publisher version: https://aclanthology.org/2024.naacl-long.85/
Language: English
Additional information: © 2024 Association for Computational Linguistics. ACL materials are Copyright © 1963–2024 ACL; other materials are copyrighted by their respective copyright holders. Materials published in or after 2016 are licensed under a Creative Commons Attribution 4.0 International License. https://creativecommons.org/licenses/by/4.0/
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10196786
Downloads since deposit: 13