Large language models reduce public knowledge sharing on online Q&A platforms

Del Rio-Chanona, R Maria; Laurentsyeva, Nadzeya; Wachs, Johannes; (2024) Large language models reduce public knowledge sharing on online Q&A platforms. PNAS Nexus , 3 (9) , Article pgae400. 10.1093/pnasnexus/pgae400. Green open access

Large language models reduce public knowledge sharing on online Q&A platforms.pdf - Accepted Version

Abstract

Large language models (LLMs) are a potential substitute for human-generated data and knowledge resources. This substitution, however, can present a significant problem for the training data needed to develop future models if it leads to a reduction of human-generated content. In this work, we document a reduction in activity on Stack Overflow coinciding with the release of ChatGPT, a popular LLM. To test whether this reduction in activity is specific to the introduction of this LLM, we use counterfactuals involving similar human-generated knowledge resources that should not be affected by the introduction of ChatGPT to such an extent. Within 6 months of ChatGPT's release, activity on Stack Overflow decreased by 25% relative to its Russian and Chinese counterparts, where access to ChatGPT is limited, and to similar forums for mathematics, where ChatGPT is less capable. We interpret this estimate as a lower bound of the true impact of ChatGPT on Stack Overflow. The decline is larger for posts related to the most widely used programming languages. We find no significant change in post quality, measured by peer feedback, and observe similar decreases in content creation by more and less experienced users alike. Thus, LLMs are not only displacing duplicate, low-quality, or beginner-level content. Our findings suggest that the rapid adoption of LLMs reduces the production of public data needed to train them, with significant consequences.
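
Illustrative note: a decrease "relative to" comparison platforms can be read as a difference-in-differences on log activity. The sketch below is a minimal illustration of that arithmetic only. The abstract does not name the estimator, so the functional form is an assumption; the function name `relative_change` is hypothetical, and the post counts are invented, chosen so that the result comes out at a 25% relative decline. They are not data from the article.

```python
import math


def relative_change(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences on log activity, returned as a relative change.

    A value of -0.25 means the treated platform's activity fell by 25%
    relative to the trend observed on the control platform.
    """
    log_diff_treated = math.log(treated_after) - math.log(treated_before)
    log_diff_control = math.log(control_after) - math.log(control_before)
    return math.exp(log_diff_treated - log_diff_control) - 1.0


# Hypothetical weekly post counts, invented purely for illustration.
effect = relative_change(
    treated_before=1000, treated_after=720,  # platform exposed to the LLM
    control_before=500, control_after=480,   # comparison platform
)
print(f"Relative change in activity: {effect:.1%}")
```

In this toy example the treated platform falls to 72% of its earlier level while the control falls to 96%, so the relative change is 0.72 / 0.96 - 1. Subtracting the control trend removes declines common to both platforms, which is why the comparison isolates the part of the drop specific to the treated one.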

Type: Article
Title: Large language models reduce public knowledge sharing on online Q&A platforms
Location: England
Open access status: An open access version is available from UCL Discovery
DOI: 10.1093/pnasnexus/pgae400
Publisher version: https://doi.org/10.1093/pnasnexus/pgae400
Language: English
Additional information: Copyright © The Author(s) 2024. Published by Oxford University Press on behalf of National Academy of Sciences. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: AI, ChatGPT, online public goods, web
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10202775