Stephen Wolfram, a renowned computational physicist, has recently mentioned on podcasts that he is using LLMs and is perplexed as to why anyone wouldn't.
Exactly
Agreed. I programmed an AI agent to do 90% of the deep research for my latest non-fiction book. Supplementing human intelligence is the effective use of human intelligence.
Have you tried Deep Research by Google?
Yep. But I prefer the ability to build my own agent to do my dirty work. Also, having a commercial license for ChatGPT is even more helpful.
Absolutely, why not? I did physics as my degree - it was super hard back then, before personal computers. We would have killed to have these tools available. It would totally be next level - and produce graduates with even better science skills. Looking forward to seeing what his graduates can dream and achieve.
100%
Math, science, and coding are also the skills that AI researchers need to improve AI. Any improvement in AI research through AI tooling could lead to a feedback loop, and it will be interesting to keep track of developments on that front.
Yeah, they can use it as a tutor because they are actual physicists who can discern when the model hallucinates. I can use it to code because I can read code and verify that it is correct. I think this actually supports the opposite of the point you are trying to make.
And what point do you think I'm trying to make?
That LLMs are underestimated by "laypeople" (kind of a dismissive term, btw) and that you're justifying LLM hype. LLMs can create very good summaries, and you can query info much more ergonomically with natural language, but people without domain expertise can't really on them blindly.
I agree with you. "Laypeople" is not a dismissive term in this context. It's a term that works for differentiating experts from non-experts (in this case, in physics).
Rely*
This could be the case even if LLMs were half as clever. "Physics professors continue to read books on physics" is hardly a surprising statement, so to the extent that reading LLM output is quite literally a proxy for reading books and discussion forums and all the other training source materials they were made from, this is the nothingest of nothing burgers.
Do yourself a favor and watch the podcast episode
Sorry, I won't be using LLMs to create AI slop based on work stolen from people like me. I'd say these profs are lazy. Use your own human intelligence.
That's ok. I agree. I don't think it's good that we generate stuff and publish it online. That's lazy. That's also not what this post is about
That's the Luddite view - prompt-based querying is the new normal. For example, querying research papers on a specific topic to glean what the published science says can be done in minutes instead of days. Students absolutely need to keep abreast of these skills.
No, it's not a Luddite view. Students pay tuition to learn from the guidance and feedback of professors, not from LLMs that rip off others and generate unreliable slop.
The podcast is 2:43:37 long. The question that elicits the response about tutors is at 0:45:46, a bit more than a quarter of the way through.
As for your comments about "laypeople" vs. physics professors, let's not forget that physics professors -- even great geniuses -- once supported the ether, caloric, astrology, and the notion that by the late 19th Century everything about physics had been discovered. (Some might include string theory in that list, too.)
The experience of most people with LLMs is with ChatGPT and its commercial, off-the-shelf siblings. These unquestionably are filled with imprecisions, errors, and fabrications that justify the negative opinion of "laypeople," i.e., people who can't create their own LLMs (or, rather, of certain laypeople, because many of them, e.g., undergraduates like my students, are a vast population of credulous users).
E.g., I just now ran a (non-paying) ChatGPT search on "Find and list the top 10 papers in scientific journals about contemporary ocean biodiversity in 2024, with full citations." "Contemporary" was added because without it, almost all returns were paleontological.
I got zero with full cite information, and at most two that could fit the remaining criteria (if one counts a cite to a Wikipedia footnote). The rest were to arXiv preprints, reports on NPO websites, The White House website, etc., and/or weren't mainly about marine biodiversity.
Similarly, a search on "What are the most significant papers in the Astrophysical Journal published in 2024 to date, with citations and criteria of significance?" gave 3 hits, 2 of which were to Ap.J. Letters, and not the Ap.J.; none had full cites. Plus, the introduction to the LLM's list said "Significance in this context is determined by factors such as citation counts, contributions to advancing the field, and recognition through awards," when all the listed papers came from one IOP webpage that used citation count as the sole criterion.
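(An aside, for anyone who wants to reproduce this kind of spot-check less laboriously: below is a minimal sketch, entirely my own illustration rather than anything the chatbot or the original post suggests, that asks the public CrossRef API whether a citation string resolves to a real, dated journal article with a DOI. The example citation string is invented.)

```python
# Minimal spot-check sketch (illustrative only): ask the public CrossRef API
# whether a citation string returned by a chatbot resolves to a real,
# dated journal article with a DOI.
import requests

def lookup(citation: str):
    """Return the closest bibliographic match CrossRef can find, or None."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation, "rows": 1},
        timeout=30,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    if not items:
        return None
    top = items[0]
    return {
        "title": (top.get("title") or [""])[0],
        "journal": (top.get("container-title") or [""])[0],
        "year": (top.get("issued", {}).get("date-parts") or [[None]])[0][0],
        "doi": top.get("DOI"),
    }

# Hypothetical citation string, just to show the call; substitute whatever
# the chatbot produced and compare its claims against the returned record.
print(lookup("Contemporary patterns of marine biodiversity, 2024"))
```

Feeding it each citation the chatbot produced and comparing the returned record against the chatbot's claims takes a minute or two.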
If this is the LLM that physics profs are relying on, then your rhetorical appeal to authority is off-base: the skeptical "laypeople" are right, and the profs are misguided, or lazy, or spend a lot of time de-bugging their searches. Adam Brown's comment (in his answer to the question at the timestamp above) that physics profs use LLMs to "debug" their own understanding of a topic also becomes quite scary.
OTOH, if physics profs are creating their own LLMs or otherwise using some tool not available to the "lay" public, one might be more open-minded about the use Prof. Brown describes, until more can be learned about the practice. In that case, though, your rhetorical comparison between "laypeople" and physics profs would also be misleading, because you'd be eliding the fact that you're comparing different sorts of LLM usage, the proverbial apples and oranges.
A couple of things. ChatGPT doesn't work for retrieving citations. There are other tools for that (like Deep Research by Google). No one who knows how to use ChatGPT (or its other off-the-shelf siblings) would do that. (If you do that, you don't know how to use ChatGPT, so my statement stands.) Those physics professors are not creating their own LLMs because it's not necessary. They just need to know enough about the subject to keep ChatGPT from saying stupid things that don't make sense. It seems you didn't like the term "laypeople"; as it happens, it's a great way to separate experts from non-experts. Don't you think that distinction is necessary?
Thanks for your reply. What I didn't like was the form of your argument, which in rhetoric is called the informal fallacy of appeal to authority. ('Informal' because the fallacy doesn't arise purely as a matter of logical form.) The form of your argument is that what "laypeople" think about LLMs can't be right because "experts" (sc., experts about physics, not about LLMs) use LLMs for their expert activities.
1. My point about ChatGPT et al. and citations isn't simply that those LLMs are bad at citations. It's that (1) "laypeople" who disparage LLMs are basing their opinions on that set of commercially available LLMs, and (2) it is very reasonable for them to disparage those LLMs for that purpose, as you yourself seem to agree.
You seem to be saying that experts use a *different* set of LLMs for their research. Then this is the situation in the last paragraph of my comment. In that case, there isn't necessarily any contradiction between what "laypeople" say and what "experts" say, because they're talking about different sets of LLMs. The subtext of your argument, that "experts" are right and "laypeople" are wrong, then fails -- at least in the matter of citations.
(BTW, as for me personally I use arXiv and its siblings, cites and reference lists in papers, and other forms of online search to find papers -- not ChatGPT.)
2. I believe you're addressing the more general argument, about LLMs as tutors, when you say physics experts "just need to know enough about the subject to keep ChatGPT from saying stupid things that don't make sense [in singular and plural, 'STTDMS' for short]."
This suggests that they do use ChatGPT et al. for *this* purpose. Of course, if they don't, then we're back at the situation in my last paragraph, apples vs oranges, as with citations. So in the sequel, I'll assume physics profs are using ChatGPT et al. as tutors.
Then I think your remark about STTDMS misidentifies the problem. Given the way that LLMs work, there isn't any way to "keep it from saying" something stupid. Still, eventually there may be a point where after sufficient tailoring of prompts the expert user can feel that nothing in the output is an STTDMS. That's the easy part. But in the hypothesized use case, the user isn't an expert in the field of inquiry, while being an expert in an adjacent field. So the real problem concerns when the LLM outputs *wrong things that appear to make sense* ("WTTATMS") -- how can the adjacent-expert user distinguish WTTATMS from correct things?
E.g., suppose I got a Ph.D. in physics 30 years ago, and now am a professor of biophysics, an expert on the mechanics of animal locomotion. (Or professor of astrophysics, polymer physics, architectural acoustics, etc.) And now I decide I want to learn more about topological insulators (TIs), which I keep encountering as I page through Nature and Science. My only background in solid state is a grad course I took 30+ years ago; and I have zero background in algebraic topology, Chern classes, or other abstruse pertinent areas of math. I ask ChatGPT for a tutorial on TIs. This is not an easy subject. How the heck am I going to be able to identify a WTTATMS in the output? In fact, I might even overlook some STTDMS.
Physics profs (like anyone else) can be very confident about things that turn out to be wrong. And from observation I would guess that physics profs, along with economists and lawyers, tend to be more prone to overconfidence than experts in other fields, on average. (Confession: my AB is in physics, I am a lawyer, and I have written and taught about economics.)
Given that an LLM like ChatGPT et al. has neither a model of reality nor any understanding of the material it has been trained on, then
(1) the likelihood of the output containing some WTTATMS relative to a given user seems high, and
(2) the likelihood of the user being unable to distinguish WTTATMS from correct statements is ≥ (1 - Q•U), where Q•U is the dot product of the queried area of expertise (Q) and the user's area of expertise (U) -- to speak metaphorically.
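(To make that metaphor concrete, here is a toy numerical sketch, purely my own illustration with invented subject areas and weights, treating Q and U as unit vectors over a handful of subject areas:)

```python
# Toy rendering of the metaphor above: represent the queried topic (Q) and the
# user's expertise (U) as unit vectors over a few hand-picked subject areas.
# The smaller the overlap Q·U, the larger the assumed floor 1 - Q·U on the
# chance of not catching a plausible-sounding error. Numbers are invented.
import numpy as np

# Components: [solid state, algebraic topology, biomechanics, acoustics]
Q = np.array([0.8, 0.6, 0.0, 0.0])  # a topological-insulators tutorial query
U = np.array([0.1, 0.0, 0.9, 0.4])  # the hypothetical biophysics professor

Q = Q / np.linalg.norm(Q)
U = U / np.linalg.norm(U)

overlap = float(Q @ U)  # Q·U in the comment's notation
print(f"overlap Q·U ≈ {overlap:.2f}")
print(f"metaphorical risk floor 1 - Q·U ≈ {1 - overlap:.2f}")
```

With these invented numbers the overlap is small and the metaphorical risk floor sits near 1, which is just the topological-insulators scenario above expressed numerically.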
In the podcast, Adam Brown admits that the best way to find out about a field is to consult a colleague who knows about it. So maybe the reliance of physics profs on LLMs as tutors is overstated, in which case the supposed conflict between "layperson" and "expert" is deflated. But for those who do rely on LLMs as tutors, the more the user needs a tutor, the less they may be able to recognize WTTATMS (while possibly being overconfident that they can recognize them). I.e., they shouldn't be relying on the LLM. That's consistent with the "lay" misgivings about LLMs, and so doesn't support the experts-right-laypersons-wrong argument in the original post.
Sorry for the long and pedantic explanation; see "Confession" above.
Oh, now I see where the misunderstanding lies! Contrary to what you say, I'm not claiming that laypeople are wrong to call LLMs dumb just because physicists are using them to learn physics. I'm just noting the interesting contrast between where each group chooses to put the emphasis when thinking about--or using--these tools. LLMs can be both dumb and useful physics tutors. How you choose to conceptualize them says more about you than about them, poor shapeless amoebas; mirrors of our frustrations and objects of our projections.
I believe everything else you wrote is downstream from having misunderstood my position and argument, perhaps as a result of my lack of bias toward physicists, whom I don't consider more overconfident than the average person--if anything, I'd say they're more aware of what they don't know than the laypeople who confidently frame LLMs to match their biased--and tragically close-minded--preconceptions.
Thanks for your reply. After a first scan I was tempted to "like" it and leave it there, but your last sentence made me unsure of whether there was an implicit ad hominem snipe there. Also, why do you base your argument on my supposed "frustrations" and "projections"? I didn't make any comments personally directed at you.
Your OP says "Why do laypeople feel so confident declaring that large language models are dumb, know nothing, make foolish mistakes, or engage in faulty reasoning—extrapolating anecdotal instances to define their entire understanding of this technology—but knowledgeable physicists are using them to learn physics? [¶] Brown’s words should make you reconsider your idea of what AI is capable of."
This is the seed of the alleged misunderstanding. What did I miss? Why would anyone need to reconsider, if they're not among those who are wrong, in your view?
I note also that you don't base your argument on anything about under-the-hood LLM technology per se that "laypeople" misunderstand and that physicists do understand. The basis of your argument is simply the anecdotal testimony of one physicist in a podcast, that physicists are comfortable with using LLMs. A classic instance of appeal to authority.
I don't think my argument about WTTATMS relies on the overconfidence vel non of any particular profession: it's a more general epistemological problem, and at the level of an individual user. To that extent my remarks about overconfidence could be seen as gratuitous, and I apologize -- although I included myself as a self-deprecated object of those remarks. Apropos of that, physics has been part of my career and my recreational interests for half a century. I'm not sure how my positive bias towards physicists could create the misunderstanding you speak of.
(BTW, I said physicists on the whole were more overconfident than most *experts,* not people in general. Evidence for this is not only personal observation of individuals (anecdotal! and therefore wrong, right?), but that one more often sees physicists and economists trying to reduce other fields, e.g. biology or education, to the categories of their own field rather than the other way round. Of course there are many individual exceptions, too -- some of my best friends are physics profs, etc. 🙂)
Finally, I wasn't sure whether you included me among "those laypeople who confidently frame LLMs to match their biased--and tragically close-minded--preconceptions." There may be such people, but I'd never heard of LLMs until early 2023. Then I tried using them, read the 2nd ed. of David Foster's book, and consulted with experts at my university. So my conceptions are ex post, not ex ante. But why should I need to demonstrate my lack of "tragically close-minded preconceptions" when I've been a premium subscriber to this Substack for more than a year? -- Happy New Year 🎉
I didn't refer to your frustrations or projections, though. I was referring to those of laypeople (particularly critics of LLMs). (The "you" in that sentence is not a personal "you" but a generic "you". I could have said "how one chooses to conceptualize...")
Reconsidering one's idea of something can be to correct a wrong view but also to expand an incomplete one. It's the latter in this case.
I don't see anything wrong with appeal to authority if it makes sense beyond just authority. Physicists are experts in physics. The fact that they are using LLMs to learn physics should tell people the kind of thing an LLM can do. It's not just dumb or a generator of unreliable slop.
Finally, no I sure didn't include you in that last sentence. Again, I'm only talking about the people I refer to in the OP.
Have a great start to the year!
What does AI think of us being a simulation?