Unprivileged Cloud GPU Attestation Uses Latency Fingerprints to ID Blackwell Dies at 100% Accuracy

A software-only CUDA probe and streaming reducer create a physical fingerprint that identifies specific GPU dies, detects NV-HBI cross-die penalties, and locates the server within 44 km - all without privileged access...

Tags: nvidia, cuda, blackwell, cloud gpu attestation, topology certificates, ripe atlas

100% leave-one-out classification accuracy across distinct Blackwell dies, and a six-hour RTX 5090 burn-in with median temporal jitter of just 0.09 cycles. Those are the headline numbers from a new software-only attestation primitive that lets cloud GPU tenants verify the physical accelerator they're renting, with no vendor key and no privileged access.

How a CUDA Probe Creates a Physical Fingerprint

The trick: a CUDA probe measures an SM-by-memory-region latency matrix. It uses physical SM labels and dependent global loads to build a per-SM latency map that acts as a stable physical fingerprint. A streaming reducer then compresses the sufficient statistics, configuration, code hashes, network evidence, and raw data into a certificate that any verifier can check without a GPU.
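To make the reducer idea concrete, here is a minimal Python sketch of one way such a pipeline could work. It is an illustration under stated assumptions, not the paper's implementation: the names `CellStats`, `reduce_stream`, `build_certificate`, and `verify_certificate` are hypothetical, the sufficient statistics are assumed to be a per-cell count, mean, and variance, and network evidence is left out for brevity. The plain SHA-256 digest only shows how the pieces can be bound together; it is not a signature.

```python
import hashlib
import json


class CellStats:
    """Running count, mean and M2 (sum of squared deviations) for one (SM, region) cell."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        # Welford's online update: one pass, constant memory per cell.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def reduce_stream(samples):
    """Fold (sm_id, region_id, latency_cycles) samples into per-cell stats plus a raw-data hash."""
    cells = {}
    raw_hash = hashlib.sha256()
    for sm, region, cycles in samples:
        cells.setdefault((sm, region), CellStats()).update(cycles)
        raw_hash.update(f"{sm},{region},{cycles}\n".encode())
    return cells, raw_hash.hexdigest()


def _canonical(body):
    # Deterministic JSON so prover and verifier hash identical bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def build_certificate(cells, raw_digest, config, probe_source):
    """Bind statistics, configuration, code hash and raw-data hash under one digest."""
    body = {
        "config": config,
        "code_sha256": hashlib.sha256(probe_source.encode()).hexdigest(),
        "raw_sha256": raw_digest,
        "cells": {
            f"{sm}:{region}": [s.n, round(s.mean, 6), round(s.variance(), 6)]
            for (sm, region), s in sorted(cells.items())
        },
    }
    body["digest"] = hashlib.sha256(_canonical(body)).hexdigest()
    return body


def verify_certificate(cert):
    """Recompute the digest from the certificate body; no GPU is involved."""
    body = {k: v for k, v in cert.items() if k != "digest"}
    return hashlib.sha256(_canonical(body)).hexdigest() == cert["digest"]
```

The verifier only needs the certificate itself, which matches the article's point that checking does not require a GPU. Altering any stored mean, the configuration, or either hash changes the recomputed digest and makes `verify_certificate` return `False`.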

Three distinct claims are packed into that certificate. First, the latency map itself is stable over time - the six-hour RTX 5090 burn-in showed median temporal jitter of only 0.09 cycles. Second, cache-bypassing HBM sweeps recover hardware-class topology across generations. Third, public network landmarks bind the certificate to a coarse location.

Stable Latency Maps Survive Full Load and Classify Dies Perfectly

Shape-only leave-one-out classification separated distinct Blackwell dies with 100.0% accuracy. That means the latency fingerprint is not just stable but also unique enough to tell apart individual GPUs that share the same model name. No privileged access, no vendor-signed attestation - just a software probe and a clever reduction.
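The following Python sketch shows what a shape-only leave-one-out test can look like. It assumes that "shape-only" means discarding each run's overall level and scale (subtract the mean, normalise to unit length) and that the classifier is a simple nearest neighbour; the article does not specify either, and the function names are hypothetical.

```python
import math


def shape(vec):
    """Strip level and scale: subtract the mean, then normalise to unit length."""
    mean = sum(vec) / len(vec)
    centered = [v - mean for v in vec]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        return centered  # a flat vector has no shape to compare
    return [c / norm for c in centered]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leave_one_out_accuracy(runs):
    """runs: list of (die_label, latency_vector). Returns the fraction classified correctly.

    Each run is held out in turn and labelled with the die of its nearest
    remaining neighbour in shape space.
    """
    shaped = [(label, shape(vec)) for label, vec in runs]
    correct = 0
    for i, (label, vec) in enumerate(shaped):
        others = [
            (distance(vec, other_vec), other_label)
            for j, (other_label, other_vec) in enumerate(shaped)
            if j != i
        ]
        predicted = min(others)[1]
        correct += predicted == label
    return correct / len(shaped)
```

Because only the normalised pattern is compared, a uniform latency offset between two runs on the same die does not affect the result. Each die needs at least two runs in the set, otherwise its held-out run has no same-die neighbour to match.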

Topology Recovery Across Generations - V100, H200, B200

The same technique reveals topology that hardware vendors don't expose to tenants. On a Volta V100 it sees a unified memory domain. On a Hopper H200 it resolves a two-way L2 split. On the Blackwell B200 it detects the two-die NV-HBI package: a 74/74 SM partition carrying a 30-cycle, 15.5 ns cross-die penalty. These measurements check cloud-GPU identity and class without any privileged access or a vendor key.
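As a rough illustration of how a partition like the B200's could be read off a latency matrix, the Python sketch below groups SMs by which of two memory regions they reach faster and reports the median gap. The two-region layout, the function names, and any base latencies fed into it are assumptions made for the example, not details from the paper. Dividing the article's two figures, 30 cycles over 15.5 ns, implies a clock near 1.94 GHz; that is an inference from the numbers quoted here, not a value stated in the source.

```python
from statistics import median


def split_by_nearest_region(latency):
    """latency[sm] = [cycles to region 0, cycles to region 1].

    Returns a dict mapping each region to the SMs that reach it fastest.
    """
    groups = {0: [], 1: []}
    for sm, (r0, r1) in enumerate(latency):
        groups[0 if r0 <= r1 else 1].append(sm)
    return groups


def cross_partition_penalty(latency):
    """Median extra cycles an SM pays for its far region relative to its near one."""
    return median(abs(r0 - r1) for r0, r1 in latency)


def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds at the given clock (GHz = cycles per ns)."""
    return cycles / clock_ghz
```

Fed a synthetic 148-row matrix in which half the SMs see one region 30 cycles sooner than the other, this returns a 74/74 split and a 30-cycle penalty. A single-domain device such as the V100 described above would instead show a penalty near zero.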

Network Landmarks Bind the Certificate to a Real Datacenter

For location attestation, the certificate includes public network landmarks. In the B200 run, 169 RIPE Atlas probes placed the server within 44 km of its claimed datacenter and rejected all 11 decoy sites. Because this coarse location check lives in the same certificate as the hardware fingerprint, a tenant can confirm that the job is running in the region the provider claims.
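One common way to turn round-trip times into a location check is a speed-of-light bound: a landmark that sees a given RTT cannot be farther away than the signal could travel in half that time. The Python sketch below applies that bound to accept or reject candidate sites. It is an assumption that the paper works along these lines; the 200 km/ms fibre speed, the function names, and the landmark tuple layout are illustrative choices, not taken from the source or from the RIPE Atlas API.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBRE_KM_PER_MS = 200.0  # roughly two thirds of c; an assumed propagation speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def max_distance_km(rtt_ms):
    """Upper bound on one-way distance implied by a round-trip time."""
    return (rtt_ms / 2.0) * FIBRE_KM_PER_MS


def site_is_feasible(site, landmarks):
    """site: (lat, lon). landmarks: list of (lat, lon, min_rtt_ms).

    A site survives only if every landmark's RTT is long enough to reach it.
    """
    return all(
        haversine_km(site[0], site[1], lat, lon) <= max_distance_km(rtt)
        for lat, lon, rtt in landmarks
    )
```

Each landmark draws a circle the server must lie inside, and a decoy is rejected as soon as it falls outside any one of them. More landmarks shrink the intersection, which is presumably why a large probe set can narrow the estimate to tens of kilometres.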

This work closes a trust hole in cloud GPU usage: tenants get a hardware-rooted attestation of the exact accelerator, its internal topology, and its approximate physical location, all from user-space CUDA code. No firmware changes, no vendor key infrastructure, just a latency matrix and a streaming reducer. Expect this to become a standard tool for anyone buying GPU cycles on someone else's iron.


Source: Unprivileged Topology Certificates for Cloud GPU Attestation
Domain: arxiv.org
