When AI is trained for treachery, it becomes the perfect agent

We’re blind to malicious AI until it hits. We can still open our eyes to stopping it
Opinion Last year, The Register reported on AI sleeper agents. A major academic study explored how to train an LLM to hide destructive behavior from its users, and how to find that behavior before it triggered. The answers were unambiguously asymmetric: the first is easy, the second very difficult. Not what anyone wanted to hear.…