This is one of your better posts. I don't fully agree with you on the "averaging" interpretation... that's a bit too simplistic, and it's the same error I made when first judging MidJourney. And yet... I can say it has challenges with anomalies and outliers. You repeat the misconception -- *intentionally*, you *know* this to be false -- that GPT has "memorized" the internet. Two clarifications:
a) The training dataset comprises significantly less than 1/3rd of the internet. And certainly (at this point) does not include video, which is a massive store of untapped information.
b) It isn't, as we now understand, memorization. It's fractal compression. It's pattern recognition. It's much, much more similar to the highly imperfect mechanism of human memory than it is to storing to a database or a hard drive with error-correction and fault-tolerance. From my understanding, GPT's method of "memory" is basically reconstructing context from patterns that were "burned in" to its neural net while digesting the training dataset and then reinforced with months of RLHF. So it's much more like reconstructive, symbolic human memory -- stories grown from "idea seeds," abstract relations of disparate concepts, strange triggers (a smell) expanding into massive sensory concepts (that day we met) -- than it is like literal bit-for-bit file storage.
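To make the "reconstruct from patterns, don't store verbatim" point concrete, here's a deliberately tiny sketch (nothing like GPT's architecture or scale, just an illustration of the idea): a bigram toy that keeps only transition statistics from its "training" text, discards the text itself, and can still regenerate familiar-looking phrases -- lossily -- from those statistics.

```python
# Toy illustration only -- NOT how GPT works, just the "reconstructive memory" idea:
# the model keeps learned statistics, not the original text, and re-grows
# sequences from them.
from collections import defaultdict
import random

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Training": record which words tend to follow which (the burned-in pattern).
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

# "Recall": regenerate a plausible sequence from a starting word.
random.seed(0)
word, out = "the", ["the"]
for _ in range(6):
    word = random.choice(follows[word])   # reconstruct; no verbatim lookup
    out.append(word)
print(" ".join(out))   # a familiar phrase, or a blend of the two sentences
```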
"that GPT has 'memorized' the internet" Yes, I know that's not accurate, it was a bit of artistic license (that's why I put the quotation marks). Yet, one could wonder if so many parameters could allow for memorization of some bits if they're repeated enough times during training...
> one could wonder if... memorization of some bits.
Certainly. That's why it's mostly (~99%+) accurate on most (~98%) "common" queries... queries that have 1,000,000+ Google results. But it's truly "lossy," like JPEG. And like JPEG (at reasonable compression ratios), for 99.99% of uses, users don't need pixel-accuracy (e.g., TIFF or RAW). (Sidenote: nor, btw, do users generally need mathematical accuracy past the 2nd decimal.) So you compress "the memorized internet" dataset (?800TB?) into a neural net that fits on a laptop (?2TB?). I'm guessing at those sizes, but I think I'm within an order of magnitude on both figures. It's rocking compression any way you look at it, and that's not even giving credit to the embedded "contextual understanding" and functionality of an LLM.
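Just to spell out the arithmetic behind those numbers (both sizes are my guesses, not published figures):

```python
# Both figures below are rough guesses from the comment above, not published numbers.
training_data_tb = 800    # assumed size of the raw training text/web data
model_on_disk_tb = 2      # assumed size of the trained weights on disk

print(f"implied compression: ~{training_data_tb / model_on_disk_tb:.0f}x")  # ~400x

# For scale: JPEG at "reasonable" quality is typically on the order of 10-20x,
# so even if both guesses are off by an order of magnitude, the implied ratio
# stays far beyond ordinary lossy codecs.
```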
That's why I mentioned fractal compression, which I think is the most accurate "memory" analogy. What GPT does is look at an oak tree, then look at 10,000 oak trees, and somehow back-derive the DNA of the "seeds" that created those trees, which is an insane form of compression. This model was recently validated by OTOY releasing a new 3D model standard (as opposed to polygons & NURBS) called the "neural object model." It takes a 3D object and "digests" it via a neural net into a seed. It can then hyper-efficiently "re-grow"/generate the model from that seed, much like LLMs grow/generate responses.
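If it helps, here's that "seed" idea in miniature (my own simplification, not OTOY's actual format or anyone's production code): overfit a tiny network to a signal, throw the raw samples away, then regenerate an approximation of the signal from the weights alone.

```python
# Toy "neural object" sketch: digest a signal into ~100 network weights (the "seed"),
# then re-grow a lossy approximation of it from those weights.
import numpy as np

rng = np.random.default_rng(0)

# The "object": 200 samples of a 1D curve we want to compress.
x = np.linspace(-1, 1, 200).reshape(-1, 1)
y = np.sin(3 * x)

# The "seed": a 1-hidden-layer net with ~100 parameters.
W1 = rng.normal(0, 1.0, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.1, (32, 1)); b2 = np.zeros(1)

lr = 0.05
for _ in range(5000):                      # "digesting" the object
    h = np.tanh(x @ W1 + b1)
    err = (h @ W2 + b2) - y
    gW2 = h.T @ err / len(x);  gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)       # hand-rolled backprop for this toy net
    gW1 = x.T @ dh / len(x);   gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# "Re-growing" from the seed: evaluate the net; the raw samples are no longer needed.
recon = np.tanh(x @ W1 + b1) @ W2 + b2
print("mean abs error:", float(np.abs(recon - y).mean()))  # nonzero: the reconstruction is lossy
```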
Thank you for your service to the community, Alberto! Keep it up!