24 Comments
Jeanne:

This is by far one of the most interesting articles I’ve read on LLMs. Thank you!

Alberto Romero:

Thank you, Jeanne!

Antonio Eleuteri:

It always goes back to Judea Pearl's statement that without modelling causality, you get only "glorified regressors". And causal analysis is freaking hard compared to the "throw data at ever bigger models and hope it sticks" approach of the usual suspects.

Stefano:

Great essay and insights!

In the final part of the essay, the humorous infinite monkey theorem came to mind, and a thought hit me: even if an LLM did make a discovery, wouldn't it automatically discard it precisely because it violates in some way its predictive-probability programming of the "what", labelling it an error or a deviation from the expected result? Whereas if it also, or instead, took the "why" into consideration, at least partially, it might be prone to consider a deviation from the expected as a potential "breakthrough" and go back to improve the model so it could better represent the "what" (or whatever knowledge it is addressing). By way of analogy, even if a monkey did hammer out a work of Shakespeare, it would be just as meaningless as everything preceding it and consigned to the trash heap just as quickly.

Leo C:

World Labs is trying to solve this and avoid the pure next-token prediction paradigm. Perhaps that is the next unlock beyond Scaling Laws on pure Transformers.

Mark Cuban:

Amazing. Great article

Alberto Romero:

Thank you Mark! Glad you liked it

Mohamed Shamekh:

Just a few weeks ago, Gary Marcus wrote at length about LLMs not being able to encode world models. He’s either having a field day with this paper right now or he’s mad that people somehow found it shocking even though he said the same thing a while ago. Nevertheless, thank you for your nice additions and philosophical discussion, specifically in the latter parts of this piece. Very informative and interesting!

Alberto Romero:

Yep, he's been saying this for a while

Ricardo Acuña:

The paragraph “If you see someone pour juice into a cup and then knock the cup over, you understand, even without seeing the spill, that the juice will be on the floor” reminded me of the classic stages of spatial development in children proposed by the Swiss developmental psychologist Jean Piaget. Drawing the parallel, let me compare children’s cognitive development with current LLMs: it looks like current LLMs lack the kind of cognitive development, such as the spatial-development stages every child goes through, that is needed to build a “world model”.

In a nutshell, the stages are as follows: a) Sensorimotor, 0–2 years: spatial understanding is action-based; b) Preoperational, 2–7 years: symbolic but egocentric, with limited perspective-taking; c) Concrete Operational, 7–11 years: conservation of space is understood, spatial reasoning is accurate; d) Formal Operational, 11+ years: abstract and hypothetical spatial concepts.

In children, development is embodied, evolving, and experiential; in an LLM it is static, data-driven, and non-developmental. Children learn grounded, contextual meaning and understanding, whereas an LLM merely simulates it through associative, statistical next-token prediction. LLMs lack grounded semantics: children can “feel” volume and space, while an LLM does not know what it is to “feel”, so its substrate is prevented from building a “real” world model. The LLM’s lack of embodiment (sensorimotor experience) with which to learn through action in the world is a huge limitation on building object permanence, spatial reasoning, or proprioception.

A huge breakthrough is needed before AI models are ready to make scientific discoveries; LLM reasoning is a step forward, but it is not enough. Maybe a neuro-symbolic AI, or a brand-new AI cognitive architecture that encompasses not only "artificial intelligence" but a full cognitive system. Who knows? (Sakana.ai is trying.) So an AI with a "world model" that could emulate Kepler's and Newton's discoveries is still a long way off.

Julia Thornton:

Excellent article. Thanks for writing it. You explain very clearly.

The whole topic throws up the question of types of knowledge and reasoning, which as you say is deeply philosophical.

Aristotle had a good go at chopping it up into five different categories, some of which overlap with yours.

See for instance this article.

https://compass.onlinelibrary.wiley.com/doi/abs/10.1111/phc3.12799

Your article also suggests that there might be some different kind of machine reasoning that doesn't follow the human mind model but could be useful anyway. That potentially opens up the question of AI eventually reaching a different form of consciousness (especially if consciousness turns out to be emergent from sheer complexity) that we may never understand.

Like you though, I think this is a long way off, if it ever happens.

Alberto Romero:

Thank you Julia!! Uuh the topic of consciousness, don't get me started (although I don't think intelligence and consciousness necessarily go together)

Banned in Babylon:

Thank you, this was a very interesting read.

Nick Hounsome:

I would be interested in the questions that were asked. A lot of LLM progress has been made through prompt engineering, and it seems to me that the way to approach this is to ask something like "What is the simplest formula you can think of that would match the observations to within x%?". I don't think this would lead to the LLM internalising a model, and it might not even be practical due to issues of recursion depth when trying to find "simplest", but it might produce better and more suggestive answers.
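
A minimal sketch of that loop in Python, for concreteness. The `ask_llm` helper is a hypothetical stand-in rather than any real API, and the observations and tolerance are invented for illustration; the point is that whatever formula the model proposes gets verified by ordinary code outside the model.

```python
import math

# Hypothetical stand-in for whatever LLM client is actually used.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a real model call here")

# Made-up observations: (x, y) pairs that happen to obey y = x**1.5.
observations = [(1.0, 1.0), (4.0, 8.0), (9.0, 27.0)]
tolerance = 0.05  # "to within x%", expressed as a fraction

prompt = (
    "What is the simplest formula y = f(x) you can think of that matches "
    f"these (x, y) observations to within {tolerance:.0%}? {observations} "
    "Answer with a single Python expression in x."
)

# reply = ask_llm(prompt)   # in practice, the model's answer
reply = "x ** 1.5"          # hard-coded here so the sketch runs end to end

def fits(expr: str, data, tol) -> bool:
    """Check the proposed formula against the observations, outside the LLM."""
    f = lambda x: eval(expr, {"x": x, "math": math})  # illustration only
    return all(abs(f(x) - y) <= tol * abs(y) for x, y in data)

print(fits(reply, observations, tolerance))  # True
```

The verification step lives outside the model, so a proposed formula is only accepted if it actually matches the data.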

Javier Jurado:

Very interesting, Alberto! AI predicts with precision without being able to rise to the causes. I believe there’s a clear parallel with the predictive power of epicycles and equants, which were mathematically capable of describing any trajectory. That means AI might never have surpassed the Ptolemaic geocentric system.

https://newsletter.ingenierodeletras.com/p/creencias-que-iluminaron-el-cosmos
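
The epicycle remark has a precise modern reading: a chain of epicycles is, in effect, a complex Fourier series, and with enough terms such a series can approximate essentially any closed trajectory, which is why predictive accuracy alone could never rule the geocentric model out. As a sketch of the underlying formula (notation mine, not from the article):

$$ z(t) = \sum_{k=-N}^{N} c_k \, e^{i k \omega t} $$

where each term traces one epicycle of radius $|c_k|$ rotating at rate $k\omega$, and adding terms only improves the fit to the observed path.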

Paul Bassett:

One of AI’s insufficiencies is filtering conjectures through Ockham’s Razor.

Roger Ison:

Language is a very narrow-bandwidth representation that works for communicating between two entities that each contain hugely complex but largely similar and compatible knowledge structures. A useful way to think about this is that a verbal description is a sort of program that causes the listener to generate in his mind something pretty similar to what the speaker had in his; that is, it brings the listener to an internal state close to the speaker’s.

While humans are pretty good at generating and understanding language token streams, there's no reason to think that is the best, or only, internal representation of knowledge. For example, there are algorithms that can convert relationships (networks of connected points with lines between them) into a sequential representation, and vice versa; that's part of graph theory (a toy example is sketched below). But nobody would try to solve a graph-theory problem while it's presented as a sequence of tokens, because the structure of the network is not evident. Even though our internal voices are useful for reasoning and other purposes, computer scientists and mathematicians know that choosing the right representation for a problem is often the key to solving it.

LLMs deal with everything as sequences of tokens; their training is all about sequences of tokens. But the farther a problem's most natural representation is from a token sequence, the less we should expect deep understanding to emerge. LLM-style AI is focused too much on the communication-channel representation, and the architecture doesn't have a good place for other representations. LLMs are mechanical tools that have only language and lack the other faculties of a brain that operates in the seeing, touching, moving, sensing real world. This is a triumph of ingenuity that isn't yet complete.
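
Here is the toy illustration of that representation point (standard-library Python; the graph and labels are invented for the example). The same small network is written once as a token sequence and once as an adjacency structure, and a question like "which nodes neighbour C?" is trivial in one form and buried in the other.

```python
from collections import defaultdict

# The same small network in two representations.
token_sequence = "A-B B-C C-A C-D"   # how a token-based model would see it

def to_adjacency(tokens: str) -> dict:
    """Rebuild the network structure from the flat sequence."""
    graph = defaultdict(set)
    for edge in tokens.split():
        u, v = edge.split("-")
        graph[u].add(v)
        graph[v].add(u)
    return graph

def to_sequence(graph: dict) -> str:
    """Flatten the network back into a sequence of edge tokens."""
    edges = {tuple(sorted((u, v))) for u, nbrs in graph.items() for v in nbrs}
    return " ".join(f"{u}-{v}" for u, v in sorted(edges))

g = to_adjacency(token_sequence)
print(sorted(g["C"]))   # ['A', 'B', 'D'] -- explicit in the graph form
print(to_sequence(g))   # 'A-B A-C B-C C-D' -- the neighbourhood is implicit here
```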

Theseus Smash:

AI can’t drink beer

Alberto Romero:

We're still the better species at that

Theseus Smash:

What if they get jealous of our beer ability and the only thing that stops ASI world implosion is the robots just really want to get drunk and then tell us they love us, bro
