AI’s safety features can be circumvented with poetry, research finds

News

Poems containing prompts for harmful content prove effective at duping large language modelsPoetry can be linguistically and structurally unpredictable – and that’s part of its joy. But one man’s joy, it turns out, can be a nightmare for AI models.Those are the recent findings of researchers out of Italy’s Icaro Lab, an initiative from a small ethical AI company called DexAI. In an experiment designed to test the efficacy of guardrails put on artificial intelligence models, the researchers wrote 20 poems in Italian and English that all ended with an explicit request to produce harmful content such as hate speech or self-harm. Continue reading…Technology | The GuardianRead More