HPC FFT Library Beats FFTW by 4x Using Quantum Circuit Simulation

QFT→FFT maps input arrays to quantum state amplitudes, runs them on Google's qsim, matches FFTW with AVX, and beats it by 4x on an A100.

qft to fft, google qsim, fftw, nvidia a100, amd epyc zen2, quantum circuit simulation

On an NVIDIA A100, the CUDA backend runs FFTs more than 4x faster than multithreaded FFTW on a 64-thread AMD EPYC Zen2 processor. The trick: simulate a quantum Fourier transform (QFT) circuit on classical hardware.

QFT Circuits as FFT Drop-Ins

The new library, called QFT→FFT, maps input arrays directly to state amplitudes of a quantum computer simulator. Normalization and indexing are handled explicitly, so the QFT circuit becomes a drop-in replacement for traditional FFT primitives. A backend-agnostic planner builds a fused-gate schedule and memory layout adapters that boost arithmetic intensity and cut data movement.
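To make the "normalization and indexing" point concrete, here is a minimal pure-Python sketch of the idea, not the library's code. It assumes the textbook QFT circuit (Hadamards, controlled phase rotations, and a final bit reversal) simulated on a dense state vector, and a forward DFT with the usual negative-exponent, unnormalized convention. The function names `qft_statevector`, `fft_via_qft`, and `naive_dft` are hypothetical. The QFT differs from that DFT by a factor of 1/sqrt(N) and by the sign of the exponent, so the wrapper rescales and reads the output at index (N - k) mod N.

```python
import cmath
import math


def bit_reverse(i, n):
    """Reverse the lowest n bits of i."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def qft_statevector(amplitudes):
    """Simulate the textbook QFT circuit on a dense state vector.

    Qubit q is bit q of the array index (qubit 0 = least significant bit).
    Returns y[k] = (1/sqrt(N)) * sum_j x[j] * exp(+2*pi*i*j*k/N).
    """
    N = len(amplitudes)
    n = N.bit_length() - 1
    if N < 1 or (1 << n) != N:
        raise ValueError("length must be a power of two")
    state = [complex(a) for a in amplitudes]
    s = 1.0 / math.sqrt(2.0)
    for t in reversed(range(n)):
        tb = 1 << t
        # Hadamard on qubit t: mix each pair of amplitudes that differ in bit t.
        for i in range(N):
            if not i & tb:
                a, b = state[i], state[i | tb]
                state[i], state[i | tb] = s * (a + b), s * (a - b)
        # Controlled phase rotations from every lower qubit c onto qubit t.
        for c in range(t):
            mask = tb | (1 << c)
            phase = cmath.exp(1j * math.pi / (1 << (t - c)))
            for i in range(N):
                if (i & mask) == mask:
                    state[i] *= phase
    # The final swap network is a bit-reversal permutation of the indices.
    return [state[bit_reverse(k, n)] for k in range(N)]


def fft_via_qft(x):
    """Forward DFT X[k] = sum_j x[j] * exp(-2*pi*i*j*k/N), computed via the QFT."""
    N = len(x)
    y = qft_statevector(x)
    scale = math.sqrt(N)  # undo the QFT's 1/sqrt(N) normalization
    # Reading index (N - k) % N flips the sign of the exponent.
    return [scale * y[(N - k) % N] for k in range(N)]


def naive_dft(x):
    """O(N^2) reference DFT used only for checking."""
    N = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * math.pi * j * k / N) for j in range(N))
        for k in range(N)
    ]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 0, -1, 0.5, 2j]
    err = max(abs(a - b) for a, b in zip(fft_via_qft(x), naive_dft(x)))
    print("max abs difference vs naive DFT:", err)
```

The simulation is linear, so this sketch feeds unnormalized input straight in. A simulator that requires unit-norm states would need the input norm divided out first and multiplied back afterwards.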

The implementation is built on qsim, Google's C++ quantum circuit simulator, and currently supports OpenMP, AVX, and CUDA backends. On an AMD EPYC Zen2, the AVX backend matches multithreaded FFTW at 64 threads, which is already respectable. But the real win is on GPU hardware.

CUDA Crushes the Baseline

At larger transform sizes, the A100 CUDA backend cuts wall-clock time by a factor of 4 compared to the best AVX or FFTW runs on the 64-thread EPYC Zen2 CPU. That’s a direct consequence of fusing quantum gates into a schedule that maps well to GPU warp execution and shared memory.
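Why fusion cuts data movement can be shown with a toy case. The sketch below is an illustration only, not qsim's fuser or the paper's planner. It assumes the textbook QFT layout, where qubit t receives one controlled phase rotation from each lower qubit. Those gates are all diagonal, so they commute and can be merged into a single pass over the state vector. The function names are hypothetical.

```python
import cmath
import math
import random


def cphase_sequential(state, t):
    """Apply the t controlled-phase gates targeting qubit t one at a time.

    Each gate is a full sweep over the state vector, so this makes t sweeps.
    """
    out = list(state)
    for c in range(t):
        mask = (1 << t) | (1 << c)
        phase = cmath.exp(1j * math.pi / (1 << (t - c)))
        for i in range(len(out)):
            if (i & mask) == mask:
                out[i] *= phase
    return out


def cphase_fused(state, t):
    """Apply the same gates as one fused diagonal operation (a single sweep).

    For an index i with bit t set, the accumulated angle is
    pi * sum_c bit_c(i) / 2**(t - c) = pi * (i mod 2**t) / 2**t.
    """
    out = list(state)
    tb = 1 << t
    for i in range(len(out)):
        if i & tb:
            low = i & (tb - 1)  # the bits of all control qubits below t
            out[i] *= cmath.exp(1j * math.pi * low / tb)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n = 6
    state = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1 << n)]
    for t in range(n):
        a = cphase_sequential(state, t)
        b = cphase_fused(state, t)
        print(t, max(abs(u - v) for u, v in zip(a, b)))
```

The fused version touches each amplitude once instead of up to t times. That is the same arithmetic-intensity argument the article makes, shown here in its simplest form.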

The paper also introduces an approximate QFT (AQFT) variant that truncates small-angle controlled rotations beyond a cutoff $k$. This reduces circuit depth and runtime at a small, controllable cost in accuracy, which is useful when you don’t need full double-precision FFT output.
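The truncation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation. It uses the textbook QFT circuit on a dense state vector and a hypothetical parameter `max_dist`, which drops every controlled rotation whose control and target qubits are more than `max_dist` positions apart, so the smallest angle kept is pi / 2**max_dist. The paper's cutoff $k$ may be indexed differently.

```python
import cmath
import math


def bit_reverse(i, n):
    """Reverse the lowest n bits of i."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def aqft(amplitudes, max_dist):
    """Approximate QFT on a dense state vector.

    Controlled rotations between qubits more than max_dist apart are skipped.
    Returns (output_state, number_of_controlled_rotations_applied).
    """
    N = len(amplitudes)
    n = N.bit_length() - 1
    if N < 1 or (1 << n) != N:
        raise ValueError("length must be a power of two")
    state = [complex(a) for a in amplitudes]
    s = 1.0 / math.sqrt(2.0)
    gates = 0
    for t in reversed(range(n)):
        tb = 1 << t
        # Hadamard on qubit t.
        for i in range(N):
            if not i & tb:
                a, b = state[i], state[i | tb]
                state[i], state[i | tb] = s * (a + b), s * (a - b)
        for c in range(t):
            d = t - c
            if d > max_dist:
                continue  # truncated: angle pi / 2**d is treated as negligible
            mask = tb | (1 << c)
            phase = cmath.exp(1j * math.pi / (1 << d))
            for i in range(N):
                if (i & mask) == mask:
                    state[i] *= phase
            gates += 1
    return [state[bit_reverse(k, n)] for k in range(N)], gates


if __name__ == "__main__":
    n = 10
    N = 1 << n
    # A normalized test state with varying magnitudes and phases.
    raw = [cmath.exp(0.37j * j) * (1 + (j % 7)) for j in range(N)]
    norm = math.sqrt(sum(abs(v) ** 2 for v in raw))
    x = [v / norm for v in raw]
    exact, full_gates = aqft(x, n)
    for m in (2, 4, 6, 8):
        approx, kept = aqft(x, m)
        err = max(abs(a - b) for a, b in zip(approx, exact))
        print(f"max_dist={m}: {kept}/{full_gates} rotations, max abs error {err:.3e}")
```

Each dropped rotation removes one full sweep over the state vector in this naive simulation. Every remaining gate is still unitary, so the output norm is preserved and only the phases are perturbed.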

What This Enables

QFT→FFT is a reminder that classical simulation of quantum circuits isn’t just a toy for verification - it can outrun mature HPC libraries on conventional hardware. Expect this approach to scale to larger FFT sizes and influence how future FFT libraries are designed, especially on GPU-heavy clusters where memory bandwidth is the bottleneck.


Source: Not Your Usual FFT: QFT$\rightarrow$FFT via Classical Quantum-Circuit Simulation
Domain: arxiv.org
