Share this comment
Very likely. Here's a paper from Anthropic on this: anthropic.com/news/slee…. Worth a read, it's the exact security flaw you're talking about. I don't see LLMs being used in any high-stakes category where cybersecurity is a real concern any time soon (sleeper agents is just one of many different problems, like jailbreaking, prompt inject…
© 2025 Alberto Romero
Substack is the home for great culture
Very likely. Here's a paper from Anthropic on this: https://www.anthropic.com/news/sleeper-agents-training-deceptive-llms-that-persist-through-safety-training. Worth a read, it's the exact security flaw you're talking about. I don't see LLMs being used in any high-stakes category where cybersecurity is a real concern any time soon (sleeper agents is just one of many different problems, like jailbreaking, prompt injection, and adversarial attacks, feel free to look those up as well).