Ok, but there’s a lot of complexity hiding in the simple statement “they approximate the most probable sequence.”
And there is a way to make that human-intelligible.
For instance, is the LLM using the proximity of tokens to one another throughout the data set to determine the probability one token will come after another? Ok, how many parameters can you get out of that? Or is the grammar of the preceding tokens considered when choosing the next token? Or is there semantic meaning, perhaps etymology, maybe even metaphorical relationships elucidated through repetition in the training data? Maybe there are underlying structures to the ways we speak that we have no idea about whatsoever, but the LLMs picked them up as parameters.
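To make the first of those questions concrete, here is a minimal sketch of the degenerate case: a bigram model that estimates next-token probabilities purely from adjacency counts in a corpus. The corpus and names are illustrative, and real LLMs learn far richer statistics than this, but this is “proximity of tokens” in its simplest possible form.

```python
from collections import Counter, defaultdict

# Toy corpus; a real LLM is trained on trillions of tokens.
corpus = "the cat sat on the mat the cat ate the rat".split()

# Count how often each token follows each other token.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_token_probs(token):
    """Estimate P(next | token) from raw adjacency counts."""
    total = sum(follows[token].values())
    return {nxt: n / total for nxt, n in follows[token].items()}

print(next_token_probs("the"))
# -> {'cat': 0.5, 'mat': 0.25, 'rat': 0.25}
```

Each count in a table like this is, in effect, one parameter. Over a 50,000-token vocabulary even this crude model implies up to 2.5 billion possible counts, which hints at why parameter counts explode so quickly.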
All it does is look for patterns, right?
Those are all types of patterns that it can replicate without “understanding.”
The same way most English speakers know “pong ping” and “tok tik” sound weird, but few know there’s a rule of grammar that the “i” sound should generally come before the “o” sound.
And don’t forget, LLMs aren’t the only type of neural net. What about recommendation algorithms? They maximize the probability that you’ll spend more time on an app.
What are their parameters? How do we get to 1 trillion?
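On the “how do we get to 1 trillion” question, the arithmetic itself is not mysterious. Below is a back-of-the-envelope sketch using GPT-3’s published dimensions; the exact accounting varies by architecture, and biases, layer norms, and position embeddings are ignored here.

```python
# Rough parameter count for a GPT-3-sized transformer (illustrative accounting).
vocab  = 50_000   # vocabulary size
d      = 12_288   # hidden dimension (GPT-3's published value)
layers = 96       # transformer blocks (GPT-3's published value)

embedding = vocab * d   # one d-dimensional vector per token
attention = 4 * d * d   # Q, K, V, and output projection matrices
mlp       = 8 * d * d   # two matrices: d -> 4d and 4d -> d
total = embedding + layers * (attention + mlp)

print(f"{total:,}")  # 174,560,575,488 -- roughly GPT-3's ~175 billion
```

Scale the hidden dimension and layer count up by modest factors and the total passes a trillion. None of those weights is individually named, or nameable.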
Dear Geoff, yes it is precisely that. It predicts the answer to a question based on the probability of a next word following the prompt (and some hidden context that the LLM adds based on some pre-programmed heuristics and rules). The free parameters of the model are like the interpolation coefficients for curve fitting. It is like tabulating a function by listing all of its values, except that you don’t include all values, but just some, which allow the others to be approximated.
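A minimal sketch of that curve-fitting analogy, with an arbitrary function and polynomial degree chosen purely for illustration:

```python
import numpy as np

# "Tabulate" sin(x), but only at 7 of its values...
xs = np.linspace(0, np.pi, 7)
ys = np.sin(xs)

# ...then compress the whole table into 5 polynomial coefficients.
coeffs = np.polyfit(xs, ys, deg=4)

# The coefficients now approximate values that were never stored.
x = 1.234
print(np.polyval(coeffs, x), np.sin(x))  # both approximately 0.94
```

The five numbers in coeffs are the free parameters here. An LLM’s weights play the same role for an enormously higher-dimensional function over token sequences.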
Awesome! We agree. That’s it precisely.
Now name one trillion (an unfathomably huge number) of such parameters without handwaving. Literally name them.
You can’t.
I think the point is no one can.
Can you see how that pertains to your original question? That’s what more there is to understand.
Geoff, but who in their right mind would care about that, or even understand what the purpose of it would be?
Look, respectfully: sometimes, to grasp technical concepts fully, we first need to understand them to a certain depth and rigour. Otherwise, we are discussing in a vacuum of naive remarks. The first time in history that the Lumière brothers projected a film of an approaching train onto a screen, the audience ran out of the room in panic, fearing that the train would somehow come out of the projector and kill them all. That was because they were forming a completely naive and wrong model of a technology that was entirely new to them.
I am afraid the question you are asking is about as relevant as asking for the name of each byte in the mp4 file of a recording of Beethoven’s Ninth Symphony. The parameters of an LLM encode, in a compressed way, the shape of a mathematical function that predicts next tokens given a previous string. The parameters do not stand for domain concepts, much as the bytes of the mp4 file do not stand for the trumpets or for the words of Schiller’s poem “An die Freude.”
This is an analogy I hope may help you. Otherwise, if you need something more precise, consider the role of parameters in a regression model of the kind routinely used every day in predictive statistics.
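To sharpen that contrast, a minimal regression sketch with made-up numbers: in predictive statistics the parameters are few, and each one answers a question a human posed.

```python
import numpy as np

# Made-up data: apartment size (m^2) vs. price (thousands).
size  = np.array([50.0, 70.0, 90.0, 110.0, 130.0])
price = np.array([150.0, 200.0, 255.0, 300.0, 360.0])

# Fit price = a * size + b by least squares.
A = np.column_stack([size, np.ones_like(size)])
(a, b), *_ = np.linalg.lstsq(A, price, rcond=None)

# Here each parameter has a name and a human meaning:
print(f"a = {a:.2f} per m^2, b = {b:.2f} baseline")
```

An LLM’s trillion weights were never assigned such questions; they are closer to the polynomial coefficients above, only in a space far too large to inspect.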
I get the sense you didn’t read this well-researched article. Or that you didn’t understand it. And I think the same may be true of my comments. Maybe you don’t see the difference between what something’s function is and how it works?
I find that unlikely. But I don’t understand why we aren’t connecting and it’s making me feel frustrated.
You’re clearly incredibly smart—slinging words around like depth, rigor and “a vacuum of naive remarks” (🤯!)—and you obviously have some specialization in this area… but I think I’ll go back to talking to ChatGPT. It at least pantomimes a desire to take the perspective of the person whose point of view it is criticizing.
Cheers!
Yes, I see... The problem here is that, no pain, no gain: it is always easier to leave everything confused. The point is that it is a thankless task to look for meaning in the individual parameters of an LLM.
This article points out nicely that ML AI lacks explanatory power, but it misses the point of WHY it is inherently so.
As for reading, instead of ChatGPT you may look at this open-source book on deep learning. Sorry for any misunderstanding on my part; so much hype and misunderstanding about the way LLMs work just appalls me... People keep anthropomorphizing and making misplaced comparisons with the human brain. The brain is a very complex dynamic system, a thinking machine, while an artificial neural net is essentially a huge compressed lookup table, fixed at training time.
https://www.deeplearningbook.org/
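A minimal sketch of that “fixed at training time” point, with random stand-in weights in place of trained ones: once training ends, inference is nothing but applying the same frozen matrices to each input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for weights produced by training; after training they never change.
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 4))

def forward(x):
    """Inference: the same fixed arithmetic, for every input, forever."""
    h = np.maximum(x @ W1, 0.0)  # ReLU hidden layer
    return h @ W2

x = rng.normal(size=(1, 8))
print(forward(x).shape)  # (1, 4) -- no learning happens at inference time
```

Nothing in forward changes as it is used, which is the sense in which it is a frozen, compressed lookup table.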