Boundary Shape Sampling Nails 78% of Kernel Bugs at 0% False Positives

78% recall on 10 buggy kernels, 0% false positives on 16 correct controls. That’s the operating point of boundary-only shape sampling when you’re hunting the kinds of bugs LLMs introduce into tensor kernels.

The numbers come from a systematic evaluation of seven test-input generation strategies, run by the GPUEMU op-schema-aware seeded fuzzer across a 26-op corpus on an RTX 3060 GPU. The corpus held 16 correct kernels and 10 buggy variants seeded with documented transcription patterns from LLM outputs.
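For readers who want the two metrics pinned down, here is a minimal sketch of how recall and false-positive rate could be computed from per-trial outcomes. The `rate` helper and the trial counts are made up for illustration and are not taken from the paper. The sketch also assumes each kernel is tested over several trials, which would explain how a 10-kernel corpus can yield a figure like 78%.

```python
def rate(flags: list[bool]) -> float:
    """Fraction of True entries; raises on an empty list."""
    if not flags:
        raise ValueError("need at least one trial")
    return sum(flags) / len(flags)

# Made-up outcomes, one entry per (kernel, trial) pair. Illustrative only.
buggy_flagged = [True] * 39 + [False] * 11   # trials on buggy kernels
control_flagged = [False] * 48               # trials on correct controls

# Recall: share of buggy-kernel trials that the strategy flagged.
recall = rate(buggy_flagged)
# False-positive rate: share of correct-control trials that were flagged.
fp_rate = rate(control_flagged)

print(f"recall={recall:.0%}, control FP={fp_rate:.0%}")
```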

The Sampling Strategies That Actually Work

Boundary-only shape sampling wins on operational safety: 78% recall with zero false alarms. Adversarial value sampling, which injects NaN and Inf, pushes recall to 99% but inflates the false-positive rate on the correct controls to 94%. Reason: the validator’s NaN check fires on every kernel that propagates these values, not just buggy ones. That’s not a bug finder; it’s a noise generator.
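The false-alarm mechanism is easy to reproduce in miniature. The sketch below is an assumption-laden toy, not GPUEMU's validator: `scale_kernel`, `naive_validate`, and `nan_aware_validate` are hypothetical names. It shows a correct kernel being flagged just because it propagates an injected NaN, and one possible way to compare outputs that treats matching NaN/Inf positions as agreement.

```python
import math

def scale_kernel(xs, factor=2.0):
    """Toy 'correct' kernel (hypothetical): elementwise scaling."""
    return [factor * x for x in xs]

def naive_validate(out, ref, tol=1e-6):
    """Fails on any NaN in the output, even if the reference has it too."""
    if any(math.isnan(v) for v in out):
        return False
    return len(out) == len(ref) and all(
        abs(a - b) <= tol for a, b in zip(out, ref)
    )

def nan_aware_validate(out, ref, tol=1e-6):
    """Treats NaN/Inf as correct when the reference has the same value there."""
    if len(out) != len(ref):
        return False
    for a, b in zip(out, ref):
        if math.isnan(a) or math.isnan(b):
            if not (math.isnan(a) and math.isnan(b)):
                return False
        elif math.isinf(a) or math.isinf(b):
            if a != b:  # also rejects +inf vs -inf
                return False
        elif abs(a - b) > tol:
            return False
    return True

# Adversarial input with injected NaN and Inf.
adversarial = [1.0, float("nan"), -3.5, float("inf")]
ref = [2.0 * x for x in adversarial]   # reference propagates NaN/Inf
out = scale_kernel(adversarial)        # so does the correct kernel

naive_ok = naive_validate(out, ref)        # correct kernel gets flagged
aware_ok = nan_aware_validate(out, ref)    # correct kernel passes
```

With the naive check, the correct kernel fails on this input; with the NaN-aware comparison it passes, while an output that drops the NaN would still be rejected.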

Regular sampling, the default in most projects, misses the most interesting bugs. On the two softmax tail-mask bugs, the regular strategy caught 0%. Boundary sampling raised recall to 100% and 62% respectively. That gap is the clearest single signal in the data: if you’re not testing boundary shapes, you’re blind to the edge-case bugs that LLMs produce.
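Here is a rough illustration of why a tail-mask bug hides from regular shapes. The article doesn't spell out the exact seeded bugs, so this is a hypothetical reconstruction: a tiled softmax with an assumed tile width `BLOCK = 4` that pads the last tile with 0.0 instead of -inf when `mask_tail=False`. All names are made up for the sketch.

```python
import math

BLOCK = 4  # hypothetical tile width

def softmax_ref(row):
    """Plain reference softmax over one row."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def softmax_tiled(row, mask_tail=True):
    """Softmax over a row padded up to a multiple of BLOCK.

    A correct kernel masks padded lanes with -inf so they contribute 0.
    The buggy variant (mask_tail=False) pads with 0.0, which leaks
    exp(0 - max) into the normalizing sum.
    """
    n = len(row)
    padded_len = -(-n // BLOCK) * BLOCK  # round n up to a multiple of BLOCK
    pad = -math.inf if mask_tail else 0.0
    padded = row + [pad] * (padded_len - n)
    m = max(padded)
    exps = [math.exp(x - m) for x in padded]
    s = sum(exps)
    return [e / s for e in exps[:n]]

def detects_bug(n, tol=1e-9):
    """True if a row of length n exposes the missing tail mask."""
    row = [0.25 * i for i in range(n)]
    ref = softmax_ref(row)
    out = softmax_tiled(row, mask_tail=False)
    return any(abs(a - b) > tol for a, b in zip(out, ref))

regular_hit = detects_bug(8)    # length is a multiple of BLOCK: no padding
boundary_hit = detects_bug(9)   # one past a tile edge: padding is exercised
```

In this toy, a length that is a multiple of the tile width never touches the padded lanes, so the buggy and correct paths are identical. Any length just off a tile edge exposes the bug.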

What This Means for Kernel Testing

Most projects pick a representative shape and dtype, run a fixed-shape allclose check, and ship. This paper makes those choices explicit and measures them. The takeaway isn’t about bug rates in any specific deployed LLM; it’s about which test strategy catches which bug pattern. For teams shipping tensor kernels, the data says: start with boundary shapes, verify your validator handles NaN/Inf before turning on adversarial sampling, and never assume a single shape covers your risk.
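As a starting point for "start with boundary shapes", the sketch below generates sizes at and around multiples of a tile width. The article doesn't give the paper's definition of a boundary shape, so treating it as "1, plus each tile multiple and its neighbors" is an assumption, and `boundary_sizes`, `boundary_shapes`, and the width 32 are hypothetical.

```python
from itertools import product

def boundary_sizes(block: int, max_blocks: int = 2) -> list[int]:
    """Sizes at and one off each multiple of `block`, plus the size 1."""
    sizes = {1}
    for k in range(1, max_blocks + 1):
        for delta in (-1, 0, 1):
            n = k * block + delta
            if n >= 1:
                sizes.add(n)
    return sorted(sizes)

def boundary_shapes(block: int, rank: int = 2, max_blocks: int = 2):
    """All rank-`rank` shapes whose dims are drawn from boundary_sizes."""
    sizes = boundary_sizes(block, max_blocks)
    return list(product(sizes, repeat=rank))

sizes_1d = boundary_sizes(32)    # assumed tile width of 32
shapes_2d = boundary_shapes(32)  # every pairing of those sizes
```

Feeding each of these shapes through the same comparison you already run on your one representative shape is a cheap way to replace a single fixed-shape check with a sweep.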

The GPUEMU fuzzer already exposes the seeded bug corpus. Next step: see whether these recall numbers hold when the same strategies are run against production kernels from PyTorch or JAX.

Source: Test-Input Generation for Tensor Programs: What Actually Finds Kernel Bugs
Domain: arxiv.org
