On the Limits of Large Language Models: Operating Within vs. Discovering New Fields of Study
This post originated in response to a post by Victor Taelin on X: https://x.com/VictorTaelin/status/1865144235227517053
LLMs should, within their limitations, still eventually be able to discover new fields of study. My view of their limitations is this: while they will struggle to operate within novel theories, they will still be able to discover and arrive at them. Consider the example Taelin provides, quantum mechanics (QM):
"A few years before quantum physics was discovered, its core ideas were completely outside of human discourse, thoughts, and no amount of circling the same box (which is what reasoning models do) would get us there."
However, that’s not quite the full picture.
Twenty to thirty years before quantum theory, Boltzmann came remarkably close to initiating quantum physics. He started from a correct probabilistic model for the distribution of discrete energy quanta among molecules, a framework from which the Planck distribution (and, by extension, the Bose-Einstein distribution) can be derived. Boltzmann stopped just short of such results because he viewed his discretization as a mathematical contrivance rather than as something physically real.
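To make that claim concrete, here is a minimal numerical sketch in Python (the oscillator count N, the quantum size eps, and the values of P are illustrative choices of mine, not numbers from Boltzmann). It takes Boltzmann's combinatorial count of the ways to distribute P indistinguishable quanta among N oscillators, extracts a temperature from the entropy S = k ln W, and checks that the mean occupancy implied by the counting matches the Planck/Bose-Einstein form 1/(e^(ε/kT) − 1):

```python
from math import comb, log, exp

# Boltzmann's 1877 counting: the number of ways to distribute P
# indistinguishable energy quanta among N distinguishable oscillators
# (stars and bars): W = C(N + P - 1, P).
def entropy(N, P):
    return log(comb(N + P - 1, P))  # S = k ln W, with k_B = 1

N = 2000      # number of oscillators (illustrative choice)
eps = 1.0     # size of one energy quantum
for P in (500, 2000, 8000):  # total quanta; total energy U = P * eps
    # Temperature from the entropy: 1/T = dS/dU, estimated here by a
    # central finite difference in P (dU = eps * dP).
    beta = (entropy(N, P + 1) - entropy(N, P - 1)) / (2 * eps)
    n_counting = P / N                    # mean occupancy from the count
    n_planck = 1 / (exp(beta * eps) - 1)  # Planck / Bose-Einstein form
    print(f"P={P:5d}  counting: {n_counting:.4f}  Planck form: {n_planck:.4f}")
```

For these values the two columns agree to within a fraction of a percent, which is the sense in which the Planck distribution is already latent in Boltzmann's counting.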
It is also worth noting that Boltzmann, by arguing for the existence of atoms, was something of a heretic in his own time. Nonetheless, his work, particularly the Boltzmann factor, was central to the development of both quantum and statistical mechanics. Indeed, Planck's initial work on quantum theory, his hand forced by the need to resolve the ultraviolet catastrophe, leaned heavily on Boltzmann's ideas, including taking his discretization seriously. Boltzmann's contributions were far ahead of their time and directly shaped the foundations Planck and Einstein laid for quantum theory.
The fact that the initial foundation of quantum theory, laid by both Planck and Einstein, relied as much as it did on the work of Boltzmann is not consistent with the claim that a "few years before quantum physics was discovered, its core ideas were completely outside of human discourse".
In principle, expertise in the methods of combinatorics and probability, knowledge of thermodynamics, familiarity with Maxwell's kinetic theory of gases (itself built on methods for error statistics in astronomy), and a grounding in classical mechanics, all available and known to many at the time, could have led to the development of quantum physics from ideas well within the discourse of a period decades before its actual arrival.
While I do not consider modern LLMs capable of a Boltzmann-level leap, nothing in principle stops them from eventually getting there, even granting their current limitations. The Boltzmann leap was not a matter of operating within a novel theory, but of finding and selecting the right perspective on already available knowledge.
As long as the path from what is already known is not too long, they should be capable of such feats. Problems arise only on paths distant from what is known: since LLMs cannot adapt their weights to new information, all new learning must occur in context. But there is neither enough time nor enough space there to learn and do anything of depth, and reasoning remains driven primarily by patterns and abstract program templates derived exclusively from existing knowledge. At least, that is how I think about it. While I'm sympathetic to the view that LLM-derived AGI is unlikely, I do believe human-LLM symbiotic ASIs (in a weak sense) are feasible.