Anthropic’s Claude models can now shut down harmful conversations
Anthropic has introduced a new feature in its Claude Opus 4 and 4.1 models that allows the generative AI (genAI) tool to end a conversation on its own if a user repeatedly tries to push it toward harmful or illegal content.
The new behavior is meant to be used only after all attempts to redirect a conversation have failed, or when a user explicitly asks for the conversation to be terminated. It is not designed to be activated in situations where people may be at risk of harming themselves or others. Users can still start new conversations or continue a previous one by editing their replies.
The purpose of the feature is not to protect users; it’s to protect the model itself. While Anthropic emphasizes it does not consider Claude to be sentient, tests found the model showed strong resistance to certain types of requests, along with “apparent discomfort.” So, the company is now testing measures for better “AI wellness,” in case that becomes relevant in the future.