Chen, Yihong (2025)
Structure and Destructure: Dual Forces in the Making of Knowledge Engines.
Doctoral thesis (Ph.D), UCL (University College London).
Text: thesis.pdf - Accepted Version (21MB)
Abstract
The making of knowledge engines in natural language processing has been shaped by two seemingly distinct paradigms: one grounded in structure, the other driven by massively available unstructured data. The structured paradigm leverages predefined symbolic interactions, such as knowledge graphs, as priors, and designs models to capture such priors. In contrast, the unstructured paradigm centres on scaling transformer architectures with increasingly vast data and model sizes, as seen in modern large language models. Despite their divergence, this thesis seeks to establish conceptual connections that bridge these two paradigms. Two key connections are identified:

• Structure Formation: Self-supervised objectives, such as language modelling, induce structural patterns in model computation across both paradigms. These objectives support data graph reconstruction, facilitating link prediction in the structured paradigm and providing interpretability in the unstructured paradigm through extracted n-gram patterns.

• Destructure for Plasticity: Embeddings, a critical yet often overlooked component in both paradigms, cache message-passing computations over symbols during training. However, excessive caching can hinder generalization. Embedding forgetting, defined as the periodic reset of embedding weights, improves model plasticity and enables generalization to previously unseen scenarios, such as novel predicates or languages.

These connections form a new recipe for developing general knowledge engines, where the guidelines not only include modelling of the seen symbolic interactions but also modelling of the unseen, the latter being relatively under-explored. Efficiently modelling the seen necessitates structure formation, regardless of whether the data is inherently structured or not. Conversely, modelling the unseen benefits from active destructuring of the learned cache, which promotes robustness and adaptability.

By bridging the two paradigms, this thesis establishes structure and destructure as complementary forces in the design of knowledge engines that can support transparent, controllable, and adaptable intelligent systems.
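
The abstract defines embedding forgetting as the periodic reset of embedding weights. The following is a minimal Python sketch of that idea only; it is not taken from the thesis. The names `reset_embeddings`, `train_with_forgetting`, `toy_update`, the Gaussian re-initialisation, the reset interval, and the choice to skip a reset on the final step are all assumptions made for illustration.

```python
import random


def reset_embeddings(embeddings, rng, scale=0.02):
    """Re-initialise every embedding vector in place (the 'forgetting' step).

    Gaussian re-initialisation is an assumption; any fresh init would do.
    """
    for row in embeddings:
        for j in range(len(row)):
            row[j] = rng.gauss(0.0, scale)


def train_with_forgetting(embeddings, num_steps, reset_every, update_fn, seed=0):
    """Run num_steps updates, resetting the embeddings every reset_every steps.

    Returns the list of steps at which a reset happened.
    """
    rng = random.Random(seed)
    reset_steps = []
    for step in range(1, num_steps + 1):
        update_fn(embeddings, step)  # hypothetical stand-in for a training update
        # Assumption: skip the reset on the last step so the final
        # embeddings are not discarded right after training ends.
        if step % reset_every == 0 and step < num_steps:
            reset_embeddings(embeddings, rng)
            reset_steps.append(step)
    return reset_steps


def toy_update(embeddings, step):
    """Toy update: nudge every weight by a constant (not a real optimiser)."""
    for row in embeddings:
        for j in range(len(row)):
            row[j] += 0.1


# Three symbols with 4-dimensional embeddings, reset every 4 steps.
emb = [[0.0] * 4 for _ in range(3)]
steps = train_with_forgetting(emb, num_steps=10, reset_every=4, update_fn=toy_update)
```

In a real model, only the embedding table would be reset while the remaining parameters keep their learned values, which is what the abstract's "periodic reset of embedding weights" suggests; this toy has no other parameters, so it shows only the reset schedule.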
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Structure and Destructure: Dual Forces in the Making of Knowledge Engines |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
Keywords: | Artificial Intelligence, Knowledge Engine |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science; UCL |
URI: | https://discovery.ucl.ac.uk/id/eprint/10211291 |