I'm not a betting man. But if I was, I would put my money on Anthropic.
Now I have a more interesting comment: we need to closely monitor what happens in software development, because it's one of the first areas to be impacted by AI.
It has the highest adoption, some of the best model performance, and very receptive users. So to me, software engineering is a live lab for what will come next for knowledge work.
Agreed. Even if code is "easier" than creative writing and other stuff, I don't see why it wouldn't happen there, too.
It's nuts. The craziest part is that they coded it in 1.5 weeks. But Alberto, using Claude Code is really easy. I mean, it's a CLI (reminds me of my youth), but besides that you talk to Claude Code like you talk to Claude. You should try it ;-)
Yeah, it's not that it's hard but that I haven't found any use case for work. Claude Code for writing, however? I'm sold!
Try it for something fun then - make a recommender for which beach has the best weather today based on public data.
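A toy version of that beach recommender could look like this. To be clear, the beach names, scoring weights, and the shape of the observations below are all made up for illustration; a real version would pull live observations from a public weather API instead of hardcoded samples:

```python
# Hypothetical sketch: rank beaches by today's weather.
# Scoring weights are arbitrary; a real version would fetch
# observations from a public weather API.

def score(weather):
    """Higher is better: warm, sunny, not too windy."""
    temp_score = max(0, 10 - abs(weather["temp_c"] - 26))  # ideal ~26 °C
    sun_score = weather["sunshine_pct"] / 10               # 0-10 points
    wind_penalty = weather["wind_kmh"] / 5                 # mild wind is fine
    return temp_score + sun_score - wind_penalty

def best_beach(observations):
    """Pick the beach whose current weather scores highest."""
    return max(observations, key=lambda o: score(o["weather"]))["name"]

# Fabricated observations standing in for API responses
obs = [
    {"name": "Barceloneta", "weather": {"temp_c": 27, "sunshine_pct": 80, "wind_kmh": 10}},
    {"name": "Ocata",       "weather": {"temp_c": 22, "sunshine_pct": 60, "wind_kmh": 25}},
]
print(best_beach(obs))  # → Barceloneta
```

The point is less the weather math and more that this is exactly the size of throwaway project where an agentic CLI shines.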
I'm working with Claude Code to constrain Claude Code's workflow in a safe way to allow Claude Code to autonomously improve Claude Code's workflow. And using Claude Code to test my guardrails agentically. Yeah.
Wonderful!
Didn't read it yet, but 👏 for the tongue twister :)
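For anyone curious what "constraining the workflow in a safe way" can mean in practice, here is a minimal guardrail sketch: gate an agent's proposed shell commands against an allowlist before anything runs. The allowlist and function names are invented for illustration; this is not Claude Code's actual hook mechanism:

```python
# Hypothetical guardrail: only commands matching a vetted prefix
# are allowed through; everything else is blocked before execution.

ALLOWED_PREFIXES = ("git status", "git diff", "pytest", "ls")

def guard(command: str) -> bool:
    """Return True only if the command starts with a vetted prefix."""
    return command.strip().startswith(ALLOWED_PREFIXES)

def run_if_safe(command: str) -> str:
    """Dry-run wrapper; a real version would hand off to subprocess.run."""
    if not guard(command):
        return f"BLOCKED: {command!r} not in allowlist"
    return f"OK: would execute {command!r}"

print(run_if_safe("pytest -q"))   # allowed
print(run_if_safe("rm -rf /"))    # blocked
```

The interesting (and slightly vertiginous) part is then letting the agent propose edits to its own allowlist, with the guard as the last line of defense.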
We need some kind of external verification and should not take claims from AI companies at face value, given their propensity to at best exaggerate and at worst fix the game.
Maybe we should ask Claude haha
LOL. I do use LLMs to help me explain the limits of LLMs, but we might have to wait for Claude Omniscient to get answers they don't want us to know.
Exactly. If that's really the case, then why would I pay $100 for a product I can build with a Claude Code license that costs less?
again, give it a try!
Let me guess, you think it's not doable? :) And that is the point: this product might have been auto-generated by an LLM, but that is irrelevant if you can't reproduce it with the same LLM model.
I doubt it was "auto-generated": you need the input, but more importantly, the taste of a good dev.
Meta-building moments like this are useful because they stress the whole agent loop, not just isolated completion quality. When agents build tools for agents, weaknesses in orchestration and test discipline show up immediately. I have seen the same pattern while running multi-agent experiments: throughput rises fast, but reliability only improves when every step has verification hooks.
I shared one full experiment where four agents shipped a complete output chain and what the failures taught me: https://thoughts.jock.pl/p/opus-4-6-agent-experiment-2026 Speed is easy to get; stable quality is the real work.
Such an interesting read. I think we are yet to see the full potential of Cowork and the things it can do, and given that it's targeted at non-developers, it might be the best AI agent ever built.
AI building AI tools is the inflection point everyone talks about but rarely sees in practice.
This matters because it changes who can deploy automation.
I write about these readiness shifts: https://vivander.substack.com
It's interesting that we're starting to see some of the promised features of AI now, and the monetization story could be becoming clearer. Still, I wonder what the true value of some of these tools is. We're valued as workers because of the difficult edge cases, which end up being the hardest for GenAI to address. How this plays into the future of work remains to be seen.
Separately, I think Claude Cowork isn't a great example of "recursive self-improvement". As I've understood it, that term refers to accelerating and automating the research process itself, which is different from writing a useful program/shell around an existing LLM, IMO.
Yeah, Claude Code coding Claude Code is a better example, but I wanted to emphasize the fact that Cowork was entirely AI-programmed (if we accept their testimony).
I agree that the value of a human worker lies in the edge cases; that's why I recently wrote a long article on the future of work, proposing a hybrid model of office work and trade/guild work as the likely and desirable future of white-collar jobs.
If that is true, then I can build it myself with Claude Code?
Sure, you can try!
This is all bottlenecked by "trust", fundamentally.
Even if a job or task involves little human tacit knowledge, other humans still do not trust AI end-to-end. It sort of doesn't matter whether AI is capable or not.
I believe hallucinations have been detrimental to people's trust in AI as well, and the sheer ubiquity of AI is also contributing to its negative perception.