Armis Labs Report: Trusted Vibing Benchmark
100% of AI Models Fail to Generate Secure Code for Critical Development Scenarios
The Trusted Vibing Benchmark provides a critical analysis of 18 leading generative AI models, evaluating their ability to generate secure code across 31 development scenarios. The report reveals a sobering reality: 100% of tested models, including top-tier commercial and open-source options, failed to consistently produce secure code.
A key finding of the benchmark is the dramatic performance gap between newer and older model generations. The report also highlights the rising value of open-source models, which offer competitive security performance at a fraction of the cost of proprietary alternatives.
Ultimately, the report concludes that current AI code generators are insufficient for production-level development without rigorous, independent security oversight. To mitigate the “security debt” introduced by these models, Armis Labs recommends that enterprises implement AI-native Application Security (AppSec) controls.