Game-theoretic feedback loops for LLM-based pentesting: doubling success rates in test ranges
We're sharing results from a recent paper on guiding LLM-based pentesting with explicit game-theoretic feedback. The idea is to close the loop between LLM-driven security testing and formal attacker-defender games: the system extracts attack graphs from live pentesting logs, computes Nash equilibria with effort-aware scoring, and injects a concise strategic digest back into the agent's system prompt to guide subsequent actions.

In a 44-run test-range benchmark (Shellshock, CVE-2014-6271), adding the digest:

- increased the success rate from 20.0% to 42.9%
- reduced cost per successful run by 2.7×
- reduced tool-use variance by 5.2×

In Attack & Defense exercises, sharing a single game-theoretic graph between red and blue agents (the "Purple" setup) wins ~2:1 against LLM-only agents and ~3.7:1 against independently guided teams.

The game-theoretic layer doesn't invent new exploits: it constrains the agent's search space, suppresses hallucinations, and keeps the agent anchored to strategically relevant paths.

PDF: https://arxiv.org/pdf/2601.05887
Code: https://github.com/aliasrobotics/cai

submitted by /u/Obvious-Language4462
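The post doesn't spell out the paper's equilibrium solver or digest format, but the core loop (effort-aware payoffs → Nash equilibrium → prompt digest) can be sketched for the simplest case: a two-player zero-sum attacker-defender game solved by linear programming. All action names, payoff numbers, and the `strategic_digest` helper below are hypothetical illustrations, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(payoff):
    """Attacker's maximin mixed strategy for a zero-sum game via LP.

    payoff[i, j] is the attacker's (effort-adjusted) payoff when the
    attacker plays action i and the defender plays action j.
    Returns (mixed strategy over attacker actions, game value).
    """
    m, n = payoff.shape
    # Variables: x_1..x_m (strategy weights) and v (game value); minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every defender column j: v - sum_i x_i * payoff[i, j] <= 0
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Strategy weights sum to 1.
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

def strategic_digest(actions, strategy, value, top_k=3):
    """Format an equilibrium as a short digest for an agent's system prompt."""
    ranked = sorted(zip(actions, strategy), key=lambda p: -p[1])[:top_k]
    lines = [f"Game value: {value:.2f}. Prioritize actions by equilibrium weight:"]
    lines += [f"- {name}: p={p:.2f}" for name, p in ranked]
    return "\n".join(lines)

# Hypothetical effort-aware payoffs (impact minus effort cost) for two
# attacker actions vs. two defender responses.
payoff = np.array([
    [3.0, -1.0],   # exploit_shellshock vs (monitor, patch)
    [-0.5, 1.0],   # lateral_move     vs (monitor, patch)
])
strategy, value = solve_zero_sum(payoff)
digest = strategic_digest(["exploit_shellshock", "lateral_move"], strategy, value)
```

The digest string would then be appended to the agent's system prompt before the next action, which is the "feedback loop" the post describes.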