Vulnhalla: Picking the true vulnerabilities from the CodeQL haystack

December 21, 2025 Yanac

Full disclosure: I’m a researcher at CyberArk Labs. This is a technical deep dive from our threat research team: no marketing fluff, just code and methodology.

Static analysis tools like CodeQL are great at identifying “maybe” issues, but the signal-to-noise ratio is often overwhelming. You get thousands of alerts, and manually triaging all of them is impossible. The sheer volume of false positives often tricks us into thinking a codebase is “clean enough” just because we can’t physically get through the backlog, which is deeply frustrating. Still, the vulnerabilities remain, hidden in the noise.

We built an open-source tool, Vulnhalla, to address this. It feeds CodeQL’s “haystack” of alerts into GPT-4o, which reasons about the surrounding code context to decide whether each alert is legitimate.

Once we used GPT-4o to strip away ~96% of the false positives, we uncovered confirmed CVEs in the Linux Kernel, FFmpeg, Redis, Bullet3, and RetroArch. We found these in just 2 days of running the tool and triaging the output (total API cost under $80). Running the tool for longer periods, with improved models, can reveal many more vulnerabilities.

Write-up & Tool:
Technical Blog: https://www.cyberark.com/resources/threat-research-blog/vulnhalla-picking-the-true-vulnerabilities-from-the-codeql-haystack
GitHub: https://github.com/cyberark/Vulnhalla

submitted by /u/ES_CY
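The post describes the pipeline only at a high level: CodeQL alerts go in, an LLM judges each one against its code context, and only the alerts it accepts come out. The following is a minimal sketch of that filtering idea. It is not Vulnhalla’s actual code. It assumes the CodeQL results were exported as SARIF 2.1.0, and it hides the model behind a hypothetical `classify` callable. `Alert`, `load_alerts`, `build_prompt`, `triage`, and the prompt wording are all illustrative names and choices.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Alert:
    """One CodeQL finding, flattened from SARIF (hypothetical container)."""
    rule_id: str
    message: str
    path: str
    line: int  # 1-based, as in SARIF region.startLine


def load_alerts(sarif_text: str) -> list[Alert]:
    """Flatten every result of every run in a SARIF 2.1.0 document."""
    doc = json.loads(sarif_text)
    alerts = []
    for run in doc.get("runs", []):
        for result in run.get("results", []):
            # Only the primary location is used in this sketch.
            loc = result["locations"][0]["physicalLocation"]
            alerts.append(Alert(
                rule_id=result.get("ruleId", "unknown"),
                message=result["message"]["text"],
                path=loc["artifactLocation"]["uri"],
                line=loc["region"]["startLine"],
            ))
    return alerts


def build_prompt(alert: Alert, source_lines: list[str], radius: int = 20) -> str:
    """Hypothetical prompt: the alert plus +/- `radius` numbered source lines."""
    start = max(alert.line - 1 - radius, 0)           # 0-based, inclusive
    end = min(alert.line + radius, len(source_lines))  # 0-based, exclusive
    snippet = "\n".join(f"{n + 1}: {source_lines[n]}" for n in range(start, end))
    return (
        f"CodeQL rule {alert.rule_id} reported: {alert.message}\n"
        f"File {alert.path}, line {alert.line}:\n{snippet}\n"
        "Is this a real, exploitable issue? Answer TRUE or FALSE, then explain."
    )


def triage(
    alerts: Iterable[Alert],
    read_source: Callable[[str], list[str]],  # path -> file lines
    classify: Callable[[str], str],           # prompt -> model reply (hypothetical LLM wrapper)
) -> list[Alert]:
    """Keep only the alerts whose model reply starts with TRUE."""
    kept = []
    for alert in alerts:
        reply = classify(build_prompt(alert, read_source(alert.path)))
        if reply.strip().upper().startswith("TRUE"):
            kept.append(alert)
    return kept
```

To try it, pass a `read_source` that returns a file’s lines (for example `lambda p: open(p).read().splitlines()`) and a `classify` that wraps whichever LLM client you use. Keeping the model behind a plain callable means the filtering logic can be tested offline with a stub. The verdict parsing here is deliberately naive.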