Also quite niche, but for a class in my master's we were asked to work on AI and financial institutions. Speaking with cybersecurity experts in the field, something that came up constantly was this fear that models would be trained on bad data and give bad results that would cause weaknesses in the security infrastructure, and also that …
Also quite niche, but for a class in my master's we were asked to work on AI and financial institutions. Speaking with cybersecurity experts in the field, something that came up constantly was this fear that models would be trained on bad data and give bad results that would cause weaknesses in the security infrastructure, and also that employees would put sensitive info into things like chatGPT and it would leak. On the technical side of things, can you speak to how likely these threats actually are?
Very likely. Here's a paper from Anthropic on this: https://www.anthropic.com/news/sleeper-agents-training-deceptive-llms-that-persist-through-safety-training. Worth a read, it's the exact security flaw you're talking about. I don't see LLMs being used in any high-stakes category where cybersecurity is a real concern any time soon (sleeper agents is just one of many different problems, like jailbreaking, prompt injection, and adversarial attacks, feel free to look those up as well).
Also quite niche, but for a class in my master's we were asked to work on AI and financial institutions. Speaking with cybersecurity experts in the field, something that came up constantly was this fear that models would be trained on bad data and give bad results that would cause weaknesses in the security infrastructure, and also that employees would put sensitive info into things like chatGPT and it would leak. On the technical side of things, can you speak to how likely these threats actually are?
Very likely. Here's a paper from Anthropic on this: https://www.anthropic.com/news/sleeper-agents-training-deceptive-llms-that-persist-through-safety-training. Worth a read, it's the exact security flaw you're talking about. I don't see LLMs being used in any high-stakes category where cybersecurity is a real concern any time soon (sleeper agents is just one of many different problems, like jailbreaking, prompt injection, and adversarial attacks, feel free to look those up as well).