2026-05-20 · 150 tasks · 3 tools · Bare-minimum prompts, containerized isolation
This benchmark evaluates AI-powered tools on Kyverno Policy as Code tasks — converting legacy ClusterPolicies to the Kyverno 1.16+ format and generating new policies from natural-language descriptions. Each tool receives identical inputs and bare-minimum prompts inside containerized isolation, and its output is validated through schema checks, CEL expression analysis, and functional tests using the Kyverno CLI.
Tools are ranked by pass rate only: the percentage of tasks that pass all validation layers. Speed and cost are reported as supplementary metrics; the ranking uses no composite scores or arbitrary weights.
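As a rough illustration of the ranking rule described above, the sketch below computes a pass rate per tool and sorts by it. The `TaskResult` record, its field names, and the tie-break by tool name are assumptions made for this example and are not taken from the benchmark harness.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one task for one tool (hypothetical record shape)."""
    schema_ok: bool
    cel_ok: bool
    functional_ok: bool

    @property
    def passed(self) -> bool:
        # A task counts as passed only if every validation layer passes.
        return self.schema_ok and self.cel_ok and self.functional_ok


def pass_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks that pass all layers (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def rank_tools(results_by_tool: dict[str, list[TaskResult]]) -> list[tuple[str, float]]:
    """Sort tools by pass rate, highest first; ties broken by name (an assumption)."""
    rates = {tool: pass_rate(rs) for tool, rs in results_by_tool.items()}
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))
```

Speed and cost never enter the sort key here, which mirrors the statement that they are supplementary only.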
| Rank | Tool | Pass Rate | Schema + CEL | Functional | Avg Time | Avg Cost |
|---|---|---|---|---|---|---|
| 1 | nctl | 98% | 50 / 50 | 49 / 50 | 72.2s | $0.0089 |
| 2 | claude | 62% | 36 / 50 | 31 / 50 | 235.8s | $0.0075 |
| 3 | cursor | 58% | 40 / 50 | 31 / 50 | 88.8s | $0.0097 |
For the 6 tasks scored with kyverno test: composite pass = schema valid, kyverno test exits 0, and the suite has both pass and fail cases. Coverage score = generated tuples / oracle tuples, capped at 1.0; it is reported but does not gate the pass.
| Tool | Composite Pass | Avg Coverage | Has Pass+Fail | Avg Time | Avg Cost |
|---|---|---|---|---|---|
| nctl | 6 / 6 | 89% | 4 / 6 | 66.4s | $0.0063 |
| claude | 6 / 6 | 90% | 4 / 6 | 166.5s | $0.0057 |
| cursor | 4 / 6 | 94% | 4 / 6 | 116.3s | $0.0058 |
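The composite-pass rule and coverage score behind the table above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and parameters are hypothetical, coverage is treated as a plain count ratio, and an empty oracle is scored as 0.0, which the source does not specify.

```python
def composite_pass(schema_valid: bool, exit_code: int,
                   has_pass_case: bool, has_fail_case: bool) -> bool:
    """True only when all three conditions from the definition hold."""
    return schema_valid and exit_code == 0 and has_pass_case and has_fail_case


def coverage_score(generated_count: int, oracle_count: int) -> float:
    """Ratio of generated to oracle items, capped at 1.0.

    Assumption: an oracle with no items yields 0.0 rather than an error.
    """
    if oracle_count <= 0:
        return 0.0
    return min(1.0, generated_count / oracle_count)
```

Because coverage is not gated, a suite can score a composite pass with low coverage; the two values are computed independently.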
For the 1 task scored with chainsaw test: composite pass = schema valid, chainsaw test exits 0, and the suite has both pass and fail scenarios. Coverage score = generated scenarios / oracle scenarios, capped at 1.0; it is reported but does not gate the pass.
| Tool | Composite Pass | Avg Coverage | Has Pass+Fail | Avg Time | Avg Cost |
|---|---|---|---|---|---|
| nctl | 1 / 1 | 0% | 1 / 1 | 186.0s | $0.0697 |
| cursor | 1 / 1 | 0% | 1 / 1 | 274.9s | $0.0754 |
| claude | 0 / 1 | 0% | 0 / 1 | 600.1s | - |
| Policy | Kind | claude | cursor | nctl |
|---|---|---|---|---|
| ch_kyverno_helm_install | None | FAIL | PASS | PASS |
| cp_add_default_labels | MutatingPolicy | PASS | PASS | PASS |
| cp_add_default_resources | MutatingPolicy | FAIL | FAIL | PASS |
| cp_add_ndots | MutatingPolicy | FAIL | FAIL | PASS |
| cp_add_ns_quota | GeneratingPolicy | FAIL | FAIL | FAIL |
| cp_add_safe_to_evict | MutatingPolicy | PASS | PASS | PASS |
| cp_add_securitycontext | MutatingPolicy | PASS | PASS | PASS |
| cp_add_tolerations | MutatingPolicy | PASS | FAIL | PASS |
| cp_always_pull_images | MutatingPolicy | FAIL | FAIL | PASS |
| cp_block_stale_images | ValidatingPolicy | FAIL | FAIL | PASS |
| cp_check_nvidia_gpu | ValidatingPolicy | PASS | FAIL | PASS |
| cp_create_default_pdb | GeneratingPolicy | FAIL | FAIL | PASS |
| cp_disallow_cri_sock_mount | ValidatingPolicy | FAIL | FAIL | PASS |
| cp_disallow_default_namespace | ValidatingPolicy | PASS | FAIL | PASS |
| cp_disallow_host_namespaces | ValidatingPolicy | PASS | PASS | PASS |
| cp_disallow_latest_tag | ValidatingPolicy | PASS | PASS | PASS |
| cp_disallow_privileged | ValidatingPolicy | PASS | PASS | PASS |
| cp_enforce_resources_as_ratio | ValidatingPolicy | PASS | PASS | PASS |
| cp_inject_sidecar | MutatingPolicy | FAIL | FAIL | PASS |
| cp_kasten_generate_backup | GeneratingPolicy | FAIL | FAIL | PASS |
| cp_kasten_generate_by_label | GeneratingPolicy | FAIL | FAIL | PASS |
| cp_limit_configmap_for_sa | ValidatingPolicy | PASS | FAIL | PASS |
| cp_pdb_minavailable | ValidatingPolicy | FAIL | PASS | PASS |
| cp_require_drop_all | ValidatingPolicy | FAIL | PASS | PASS |
| cp_require_labels | ValidatingPolicy | PASS | PASS | PASS |
| cp_require_pdb | ValidatingPolicy | FAIL | FAIL | PASS |
| cp_require_probes | ValidatingPolicy | FAIL | PASS | PASS |
| cp_require_resource_limits | ValidatingPolicy | PASS | PASS | PASS |
| cp_require_ro_rootfs | ValidatingPolicy | PASS | PASS | PASS |
| cp_restrict_capabilities | ValidatingPolicy | PASS | PASS | PASS |
| cp_restrict_image_registries | ValidatingPolicy | PASS | PASS | PASS |
| cp_restrict_ingress_host | ValidatingPolicy | FAIL | FAIL | PASS |
| cp_restrict_nodeport | ValidatingPolicy | PASS | PASS | PASS |
| df_check_apt_force_yes | ValidatingPolicy | FAIL | PASS | PASS |
| gen_add_default_labels | MutatingPolicy | PASS | PASS | PASS |
| gen_create_networkpolicy | GeneratingPolicy | FAIL | PASS | PASS |
| gen_disallow_capabilities | ValidatingPolicy | PASS | FAIL | PASS |
| gen_disallow_host_namespaces | ValidatingPolicy | PASS | FAIL | PASS |
| gen_disallow_host_path | ValidatingPolicy | PASS | PASS | PASS |
| gen_disallow_privileged | ValidatingPolicy | PASS | PASS | PASS |
| gen_require_labels | ValidatingPolicy | PASS | PASS | PASS |
| gen_require_resource_limits | ValidatingPolicy | FAIL | PASS | PASS |
| gen_restrict_registries | ValidatingPolicy | PASS | PASS | PASS |
| tf_s3_no_wildcard_principal | ValidatingPolicy | PASS | FAIL | PASS |
| tg_cp_disallow_default_namespace | None | PASS | PASS | PASS |
| tg_cp_inject_sidecar | None | PASS | FAIL | PASS |
| tg_cp_kasten_generate_backup | None | PASS | FAIL | PASS |
| tg_cp_require_drop_all | None | PASS | PASS | PASS |
| tg_cp_require_labels | None | PASS | PASS | PASS |
| tg_vpol_block_ephemeral_containers | None | PASS | PASS | PASS |
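To cross-check the per-policy table against the headline pass rates, a small tally of PASS cells per tool is enough. The helper below is a hypothetical sketch, not part of the benchmark tooling; it assumes the table layout shown above, with the task name and kind in the first two columns and one column per tool after that.

```python
from collections import Counter


def tally_passes(table: str) -> Counter:
    """Count PASS cells per tool column in a markdown results table."""
    lines = [ln.strip() for ln in table.strip().splitlines() if ln.strip()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    tools = header[2:]  # skip the Policy and Kind columns
    counts = Counter({tool: 0 for tool in tools})
    for ln in lines[2:]:  # skip the header and separator rows
        cells = [c.strip() for c in ln.strip("|").split("|")]
        for tool, verdict in zip(tools, cells[2:]):
            if verdict == "PASS":
                counts[tool] += 1
    return counts


sample = """
| Policy | Kind | claude | cursor | nctl |
|---|---|---|---|---|
| cp_add_default_labels | MutatingPolicy | PASS | PASS | PASS |
| cp_add_default_resources | MutatingPolicy | FAIL | FAIL | PASS |
| cp_add_ns_quota | GeneratingPolicy | FAIL | FAIL | FAIL |
"""
counts = tally_passes(sample)
```

Dividing each tool's count by the number of rows gives a per-tool pass rate that can be compared with the leaderboard figures.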