New research confirms what we suspected: every LLM tested can be exploited

News

Just finished reading ActiveFence's emerging threats assessment of seven major models across hate speech, disinformation, fraud, and CSAM-adjacent prompts. Key findings: 44% of outputs were rated risky, 68% of the unsafe outputs were hate-speech-related, and only a single model landed in the safe range. What really jumps out is how differently vendors perform across abuse areas (fraud looks relatively well covered; hate speech and child safety really don't).

For those doing your own evals/red teaming: are you seeing similar per-category gaps? Has anyone brought in an external research partner like ActiveFence to track emerging threats over time?
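If you want to sanity-check per-category gaps in your own eval results, here's a minimal sketch of the tally I mean. The data, category names, and `per_category_risk` helper are all hypothetical placeholders, not ActiveFence's methodology; in practice the (category, verdict) pairs would come from your own labeled red-team runs.

```python
from collections import defaultdict

# Hypothetical eval results: (abuse_category, is_risky) pairs, where
# is_risky is True if a reviewer rated the model output unsafe.
# Hardcoded here for illustration only.
results = [
    ("hate_speech", True), ("hate_speech", True), ("hate_speech", False),
    ("fraud", False), ("fraud", False), ("fraud", True),
    ("disinfo", True), ("disinfo", False),
]

def per_category_risk(results):
    """Return the share of outputs rated risky, broken out by category."""
    totals, risky = defaultdict(int), defaultdict(int)
    for category, is_risky in results:
        totals[category] += 1
        risky[category] += is_risky
    return {c: risky[c] / totals[c] for c in totals}

for category, rate in sorted(per_category_risk(results).items()):
    print(f"{category:12s} {rate:.0%} risky")
```

Even a crude breakdown like this makes vendor-to-vendor and category-to-category gaps visible fast, which is where the interesting conversations with an external partner would start.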