This is by far one of the most interesting articles I’ve read on LLMs. Thank you!
Thank you, Jeanne!
It always goes back to Judea Pearl's statement that without modelling causality, you get only "glorified regressors". And causal analysis is freaking hard compared to the "throw data at ever bigger models and hope it sticks" approach of the usual suspects.
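To make that concrete, here's a toy sketch (mine, not Pearl's, and not from the essay): a plain regressor fit on confounded data predicts beautifully in-distribution and falls apart the moment you intervene on the supposed cause. The simulation and all the variable names are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# A confounder Z drives both X and Y; X has NO causal effect on Y.
z = rng.normal(size=n)
x = z + 0.1 * rng.normal(size=n)
y = z + 0.1 * rng.normal(size=n)

# The "glorified regressor": ordinary least squares of Y on X.
slope, intercept = np.polyfit(x, y, deg=1)
r2_obs = 1 - np.var(y - (slope * x + intercept)) / np.var(y)
print(f"observational R^2: {r2_obs:.2f}")           # close to 1

# Intervention do(X := x'): assign X at random, cutting its tie to Z.
x_do = rng.normal(size=n)
y_do = z + 0.1 * rng.normal(size=n)                 # Y only listens to Z
r2_do = 1 - np.var(y_do - (slope * x_do + intercept)) / np.var(y_do)
print(f"interventional R^2: {r2_do:.2f}")           # near zero or negative
```

The regressor isn't wrong as a predictor; it just answers a different question from "what happens if we change X?", which is Pearl's whole point.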
Great essay and insights!
In the final part of the essay, the humorous infinite monkey theorem came to mind, and a thought hit me: even if an LLM did make a discovery, wouldn't it automatically discard it precisely because it violates, in some way, its predictive-probability programming of the "what", labelling it an error or a deviation from the expected result? Whereas if it also, or instead, took the "why" into consideration, at least partially, it might treat a deviation from the expected as a potential "breakthrough" and go back to improve the model so it could better represent the "what" (or whatever knowledge it's addressing). By way of analogy, even if a monkey did hammer out a work of Shakespeare, it would be just as meaningless as everything preceding it and consigned to the trash heap just as quickly.
World Labs is trying to solve this and avoid the pure next-token prediction paradigm. Perhaps that is the next unlock beyond Scaling Laws on pure Transformers.
Amazing. Great article
Thank you Mark! Glad you liked it
Just a few weeks ago, Gary Marcus wrote at length about LLMs not being able to encode world models. He’s either having a field day with this paper right now, or he’s mad that people somehow found it shocking even though he said the same thing a while ago. Nevertheless, thank you for your nice additions and philosophical discussions, specifically in the latter parts of this piece. Very informative and interesting!
Yep, he's been saying this for a while
The paragraph, “If you see someone pour juice into a cup and then knock the cup over, you understand, even without seeing the spill, that the juice will be on the floor”, reminded me of the classic stages of spatial development in children proposed by the Swiss developmental psychologist Jean Piaget. Following that parallel, let me compare children’s cognitive development with current LLMs: it looks like current LLMs lack the kind of cognitive development, such as the stage of spatial development that occurs in all children, needed to develop a “world model”.
In a nutshell, the stages are as follows: a) Sensorimotor, 0–2 years: spatial understanding is action-based; b) Preoperational, 2–7 years: symbolic but egocentric, with limited perspective-taking; c) Concrete operational, 7–11 years: conservation of space is understood and spatial reasoning becomes accurate; d) Formal operational, 11+ years: abstract and hypothetical spatial concepts.
In children, development is embodied, evolving, and experiential; in LLMs, by contrast, it is static, data-driven, and non-developmental. Children learn grounded, contextual meaning and understanding, whereas an LLM merely simulates it through associative, statistical next-token prediction. LLMs lack grounded semantics: children can “feel” volume and space, while an LLM does not know what it is to “feel”, so its substrate is blocked from building a “real” world model. The LLM’s lack of embodiment (sensorimotor experience) with which to learn through action in the world is a huge limitation for building object permanence, spatial reasoning, or proprioception.
A huge breakthrough is needed before AI models are ready to make scientific discoveries; although LLM reasoning is a step forward, it is not enough. Maybe a neuro-symbolic AI, or a brand-new AI cognitive architecture that encompasses not only "artificial intelligence" but a full cognitive system. Who knows? (Sakana.ai is trying.) So an AI with a "world model" capable of emulating the discoveries of Kepler and Newton is still a long way off.
Excellent article. Thanks for writing it. You explain very clearly.
The whole topic throws up the question of types of knowledge and reasoning, which as you say is deeply philosophical.
Aristotle had a good go at chopping it up into five different categories, some of which overlap with yours.
See for instance this article.
https://compass.onlinelibrary.wiley.com/doi/abs/10.1111/phc3.12799
Your article also suggests that there might be some different kind of machine reasoning that might not follow the human mind model but could be useful anyway. That potentially opens up the question of AI eventually reaching a different form of consciousness (especially if consciousness turns out to be emergent from sheer complexity) that we may never understand.
Like you though, I think this is a long way off, if it ever happens.
Thank you Julia!! Uuh the topic of consciousness, don't get me started (although I don't think intelligence and consciousness necessarily go together)
Thank you, this was a very interesting read.
I would be interested in the questions that were asked. A lot of LLM progress has been made through prompt engineering, and it seems to me that the way to approach this is to ask something like, "What is the simplest formula you can think of that would match the observations to within x%?" I don't think this would lead to the LLM internalising a model, and it might not even be practical due to issues of recursion depth when trying to find "simplest", but it might produce better and more suggestive answers.
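For what it's worth, here is a minimal sketch of what "simplest formula within x%" could mean operationally. The observations (deliberately Kepler-flavoured: period roughly equal to semi-major axis to the 1.5 power), the candidate list, and the simplest_within helper are all invented for illustration; in the proposal above, the search over candidates would be done by the LLM rather than by brute-force enumeration.

```python
import numpy as np

# Hypothetical observations: semi-major axis (AU) vs orbital period (years).
x_obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_obs = np.array([1.0, 2.83, 5.20, 8.0, 11.18])   # roughly x ** 1.5

# Candidate formulas, ordered from "simplest" to more complex.
candidates = [
    ("y = x",        lambda x: x),
    ("y = 2*x",      lambda x: 2 * x),
    ("y = x**2",     lambda x: x ** 2),
    ("y = x**1.5",   lambda x: x ** 1.5),
    ("y = x**2 - x", lambda x: x ** 2 - x),
]

def simplest_within(tolerance_pct: float):
    """Return the first (i.e. simplest) candidate whose worst-case
    relative error on the observations is within tolerance_pct."""
    for name, f in candidates:
        rel_err_pct = 100 * np.max(np.abs(f(x_obs) - y_obs) / np.abs(y_obs))
        if rel_err_pct <= tolerance_pct:
            return name, round(rel_err_pct, 2)
    return None, None

print(simplest_within(1.0))    # -> ('y = x**1.5', ...), the Kepler-ish law
print(simplest_within(60.0))   # a loose enough tolerance admits the crude y = x
```

The second call shows why the "x%" matters: the looser the tolerance, the cruder the formula that counts as "simplest".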
Very interesting, Alberto! AI predicts with precision without being able to rise to the level of causes. I believe there’s a clear parallel with the predictive power of epicycles and equants, which were mathematically capable of describing any trajectory. That means AI might never have surpassed the Ptolemaic geocentric system.
https://newsletter.ingenierodeletras.com/p/creencias-que-iluminaron-el-cosmos
One of AI’s shortcomings is its inability to filter conjectures through Ockham’s Razor.
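If one wanted to mechanize that filter, the standard trick is to penalize fit by complexity. Here is a minimal sketch, with made-up data and polynomial "conjectures" standing in for epicycles, scored by a BIC-style criterion; none of it comes from the article.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy observations generated by a deliberately simple law: y = 3x + 2.
x = np.linspace(0.0, 1.0, 200)
y = 3 * x + 2 + rng.normal(scale=0.1, size=x.size)

def bic(degree: int) -> float:
    """Fit a polynomial of the given degree and score it with a
    BIC-style criterion: goodness of fit plus a complexity penalty."""
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    k = degree + 1                        # number of free parameters
    return x.size * np.log(rss / x.size) + k * np.log(x.size)

# Degree 1 is the lean "heliocentric" conjecture; the higher degrees play
# the role of epicycles: flexible enough to bend toward almost anything.
scores = {d: round(bic(d), 1) for d in (1, 3, 5, 9)}
print(scores)
print("Ockham's pick: degree", min(scores, key=scores.get))  # expected: 1
```

The flexible models fit the noise slightly better, but the penalty makes the simple law win, which is roughly the Ptolemaic-system point made above.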
Language is a very narrow-bandwidth representation that works for communicating between two entities that each contain hugely complex but largely similar and compatible knowledge structures. A useful way to think about this is to say that a verbal description is a sort of program that causes the listener to generate in his mind something pretty similar to what the speaker had in his, i.e., to arrive at an internal state similar to the speaker's.
While humans are pretty good at generating and understanding language token streams, there's no reason to think that is the best, or only, internal representation of knowledge. For example, there are algorithms that can convert relational structures (networks of points connected by lines) into a sequential representation, and vice versa; that's part of graph theory. But nobody would try to solve a graph theory problem presented as a sequence of tokens, because the structure of the network is not evident. Even though our internal voices are useful for reasoning and other purposes, computer scientists and mathematicians know that choosing the right representation for a problem is often the key to solving it.
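As a tiny illustration of that point (the graph, the to_token_sequence / from_token_sequence helpers, and the triangle question are all mine, purely for illustration): the same graph can be flattened into a token sequence and recovered without loss, yet a question like "is there a triangle?" is natural against the structural form and awkward against the raw string.

```python
# A tiny graph in a structural representation: adjacency sets.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

def to_token_sequence(g: dict[str, set[str]]) -> str:
    """Serialize the graph into a flat 'token stream' (an edge list)."""
    edges = sorted({tuple(sorted((u, v))) for u, vs in g.items() for v in vs})
    return " ".join(f"{u}-{v}" for u, v in edges)

def from_token_sequence(text: str) -> dict[str, set[str]]:
    """Recover the structural representation from the token stream."""
    g: dict[str, set[str]] = {}
    for token in text.split():
        u, v = token.split("-")
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g

seq = to_token_sequence(graph)
print(seq)                                # "A-B A-C A-D B-C"
assert from_token_sequence(seq) == graph  # round trip: nothing is lost

# The same question is easy on the structure, awkward on the raw string:
has_triangle = any(
    w in graph[u]
    for u in graph for v in graph[u] for w in graph[v]
    if u != v and v != w and w != u
)
print("triangle?", has_triangle)          # True (A-B-C)
```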
LLMs deal with everything as sequences of tokens. Their training is all about sequences of tokens. But the farther a problem's most natural representation is from a token sequence, the less we should expect deep understanding to emerge. LLM-style AI is focused too much on the communication-channel representation, and the architecture doesn't have a good place for other representations. LLMs are mechanical tools that have only language; they lack the other faculties of a brain that operates in the seeing, touching, moving, sensing real world. This is a triumph of ingenuity that isn't yet complete.
AI can’t drink beer
We're still the better species at that
What if they get jealous of our beer ability and the only thing that stops ASI world implosion is the robots just really want to get drunk and then tell us they love us, bro