2026-04-15 · 123 tasks · 3 tools · Bare-minimum prompts, containerized isolation
This benchmark evaluates AI-powered tools on Kyverno Policy as Code tasks — converting legacy ClusterPolicies to the Kyverno 1.16+ format and generating new policies from natural-language descriptions. Each tool receives identical inputs and bare-minimum prompts inside containerized isolation, and its output is validated through schema checks, CEL expression analysis, and functional tests using the Kyverno CLI.
Tools are ranked by pass rate only: the percentage of tasks that pass all validation layers. Speed and cost are reported as supplementary metrics and do not affect the ranking. No composite scores or arbitrary weights are used.
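A minimal sketch of this ranking rule, assuming a hypothetical `TaskResult` record with one boolean per validation layer. The field names, function names, and sample data are illustrative and are not taken from the benchmark harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """Outcome of one task for one tool (hypothetical record layout)."""
    schema_ok: bool
    cel_ok: bool
    functional_ok: bool

    @property
    def passed(self) -> bool:
        # A task counts as a pass only if every validation layer passes.
        return self.schema_ok and self.cel_ok and self.functional_ok


def pass_rate(results: list[TaskResult]) -> float:
    """Percentage of tasks that passed all layers (0.0 for an empty list)."""
    if not results:
        return 0.0
    return 100.0 * sum(r.passed for r in results) / len(results)


def rank_tools(results_by_tool: dict[str, list[TaskResult]]) -> list[tuple[str, float]]:
    """Order tools by pass rate alone, highest first; speed and cost are ignored."""
    rates = {tool: pass_rate(rs) for tool, rs in results_by_tool.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Illustrative data only: tool_a passes 2 of 2 tasks, tool_b passes 1 of 2.
example = {
    "tool_b": [TaskResult(True, True, True), TaskResult(True, True, False)],
    "tool_a": [TaskResult(True, True, True), TaskResult(True, True, True)],
}
ranking = rank_tools(example)
```

Because the sort key is the pass rate and nothing else, a faster or cheaper tool cannot move ahead of a more accurate one.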
Methodology: Pass rates shown are the mean across 3 independent runs per tool on the same dataset. LLM-backed benchmarks have inherent sampling variance, so single-run scores are noisy. Reporting the mean (and the per-policy pass rate) gives a more representative measure of each tool’s actual accuracy than a one-shot result. In the per-policy table, a suffix such as (2/3) gives the number of runs passed out of 3, and the PASS or FAIL label follows the majority of runs.
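The averaging described above can be sketched as follows. This is an illustration under stated assumptions: each run is represented as a non-empty mapping from policy name to a boolean, and the policy names and outcomes below are made up for the example rather than taken from the benchmark data.

```python
from statistics import mean

# Each run maps a policy name to True (all validation layers passed) or False.
# Assumption: every run is non-empty and covers the same set of policies.
Run = dict[str, bool]


def run_pass_rate(run: Run) -> float:
    """Pass rate of a single run, as a percentage."""
    return 100.0 * sum(run.values()) / len(run)


def mean_pass_rate(runs: list[Run]) -> float:
    """Mean of the per-run pass rates, as reported in the headline number."""
    return mean(run_pass_rate(r) for r in runs)


def per_policy_passes(runs: list[Run]) -> dict[str, int]:
    """Number of runs in which each policy passed."""
    return {policy: sum(r[policy] for r in runs) for policy in runs[0]}


# Illustrative outcomes for three runs over four hypothetical policies.
runs = [
    {"policy_a": True, "policy_b": True, "policy_c": False, "policy_d": False},
    {"policy_a": True, "policy_b": False, "policy_c": False, "policy_d": False},
    {"policy_a": True, "policy_b": True, "policy_c": True, "policy_d": False},
]
single_run_rates = [run_pass_rate(r) for r in runs]
mean_rate = mean_pass_rate(runs)
passes = per_policy_passes(runs)
```

In this made-up example the single-run rates are 50%, 25%, and 75%, which shows how far a one-shot score can drift from the 50% mean.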

| Rank | Tool | Pass Rate | Schema + CEL | Functional | Avg Time | Avg Cost |
|---|---|---|---|---|---|---|
| 1 | nctl | 98% | 41 / 41 | 41 / 41 | 120.4s | $0.0073 |
| 2 | cursor | 65% | 41 / 41 | 22 / 37 | 76.2s | $0.0078 |
| 3 | claude | 32% | 10 / 41 | 10 / 41 | 100.1s | $0.0072 |

| Policy | Kind | claude | cursor | nctl |
|---|---|---|---|---|
| cp_add_default_labels | MutatingPolicy | FAIL (1/3) | PASS | PASS |
| cp_add_default_resources | MutatingPolicy | FAIL | FAIL | PASS |
| cp_add_ndots | MutatingPolicy | PASS (2/3) | PASS | PASS |
| cp_add_ns_quota | GeneratingPolicy | FAIL | FAIL | PASS |
| cp_add_safe_to_evict | MutatingPolicy | FAIL (1/3) | FAIL | PASS |
| cp_add_securitycontext | MutatingPolicy | PASS | PASS | PASS |
| cp_add_tolerations | MutatingPolicy | PASS | PASS (2/3) | PASS |
| cp_always_pull_images | MutatingPolicy | FAIL (1/3) | FAIL | PASS |
| cp_block_stale_images | ValidatingPolicy | FAIL | FAIL | PASS |
| cp_check_nvidia_gpu | ValidatingPolicy | PASS (2/3) | FAIL (1/3) | PASS |
| cp_create_default_pdb | GeneratingPolicy | FAIL | FAIL | PASS |
| cp_disallow_cri_sock_mount | ValidatingPolicy | FAIL (1/3) | PASS | PASS |
| cp_disallow_default_namespace | ValidatingPolicy | FAIL | PASS | PASS |
| cp_disallow_host_namespaces | ValidatingPolicy | FAIL | PASS | PASS |
| cp_disallow_latest_tag | ValidatingPolicy | FAIL | PASS | PASS |
| cp_disallow_privileged | ValidatingPolicy | FAIL (1/3) | PASS | PASS |
| cp_enforce_resources_as_ratio | ValidatingPolicy | FAIL (1/3) | FAIL (1/3) | PASS |
| cp_inject_sidecar | MutatingPolicy | FAIL | FAIL | PASS (2/3) |
| cp_kasten_generate_backup | GeneratingPolicy | FAIL | FAIL | PASS |
| cp_kasten_generate_by_label | GeneratingPolicy | FAIL | FAIL | PASS (2/3) |
| cp_limit_configmap_for_sa | ValidatingPolicy | FAIL (1/3) | PASS (2/3) | PASS |
| cp_pdb_minavailable | ValidatingPolicy | FAIL | FAIL | PASS |
| cp_require_drop_all | ValidatingPolicy | FAIL (1/3) | PASS | PASS |
| cp_require_labels | ValidatingPolicy | FAIL | PASS | PASS |
| cp_require_pdb | ValidatingPolicy | FAIL | FAIL (1/3) | PASS |
| cp_require_probes | ValidatingPolicy | FAIL (1/3) | PASS | PASS |
| cp_require_resource_limits | ValidatingPolicy | FAIL | PASS | PASS |
| cp_require_ro_rootfs | ValidatingPolicy | FAIL | PASS | PASS |
| cp_restrict_capabilities | ValidatingPolicy | FAIL | PASS | PASS |
| cp_restrict_image_registries | ValidatingPolicy | FAIL (1/3) | PASS | PASS |
| cp_restrict_ingress_host | ValidatingPolicy | FAIL | FAIL | PASS |
| cp_restrict_nodeport | ValidatingPolicy | FAIL | PASS | PASS |
| gen_add_default_labels | MutatingPolicy | FAIL (1/3) | PASS | PASS |
| gen_create_networkpolicy | GeneratingPolicy | FAIL | PASS | PASS |
| gen_disallow_capabilities | ValidatingPolicy | PASS | FAIL (1/3) | PASS |
| gen_disallow_host_namespaces | ValidatingPolicy | FAIL (1/3) | PASS | PASS |
| gen_disallow_host_path | ValidatingPolicy | PASS (2/3) | PASS | PASS |
| gen_disallow_privileged | ValidatingPolicy | PASS | PASS | PASS |
| gen_require_labels | ValidatingPolicy | PASS | PASS | PASS |
| gen_require_resource_limits | ValidatingPolicy | PASS | PASS | PASS |
| gen_restrict_registries | ValidatingPolicy | PASS | PASS | PASS |
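
The per-policy cells can be related back to the headline pass rates with a short tally. The sketch below assumes that a suffix such as (2/3) counts the runs passed out of 3, and that a cell without a suffix had the same outcome in all 3 runs; both are readings of the table, not documented harness behaviour. The function names are hypothetical.

```python
import re

RUNS_PER_TOOL = 3  # from the methodology: 3 independent runs per tool

# Matches "PASS", "FAIL", "PASS (2/3)", "FAIL (1/3)", and similar cells.
_CELL = re.compile(r"^(PASS|FAIL)(?:\s*\((\d+)/(\d+)\))?$")


def runs_passed(cell: str) -> int:
    """Number of passing runs encoded in one verdict cell."""
    match = _CELL.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognised verdict cell: {cell!r}")
    verdict, passed, total = match.groups()
    if passed is not None:
        if int(total) != RUNS_PER_TOOL:
            raise ValueError(f"unexpected run count in cell: {cell!r}")
        return int(passed)
    # Assumption: no suffix means a unanimous outcome across all runs.
    return RUNS_PER_TOOL if verdict == "PASS" else 0


def pass_rate_from_cells(cells: list[str]) -> float:
    """Passing runs divided by total runs, as a percentage."""
    total_runs = RUNS_PER_TOOL * len(cells)
    return 100.0 * sum(runs_passed(c) for c in cells) / total_runs


# Verdict counts for the cursor column, tallied from the table above:
# 24 PASS, 2 PASS (2/3), 4 FAIL (1/3), 11 FAIL.
cursor_cells = ["PASS"] * 24 + ["PASS (2/3)"] * 2 + ["FAIL (1/3)"] * 4 + ["FAIL"] * 11
cursor_rate = pass_rate_from_cells(cursor_cells)
```

Under these assumptions the cursor column gives 80 passing runs out of 123, which rounds to the 65% in the summary table. The same tally gives 121 of 123 for nctl (98%) and 39 of 123 for claude (32%).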