Parallel Attention Fix Shrinks 5G Channel Predictor by 58%, Beats Baseline by 6 dB

Deploying high-accuracy transformer models for 5G channel state information (CSI) prediction on base-station hardware is a non-starter when the best models carry 30 million parameters or more. Lightweight PCGAE-Net, proposed in a recent arXiv preprint, cuts that count to 8.54 million while still beating the previous state of the art on prediction quality.

Why 30M-Parameter Models Bleed Efficiency

The baseline, CS3T-UNet, has two architectural problems that the paper identifies and fixes. First, it chains cross-shaped spatial attention (CSA) and group-wise temporal attention (GTA) sequentially, with CSA first. GTA therefore never sees the raw temporal features: it only processes data already transformed by spatial attention, which distorts the temporal signal it is supposed to capture. Second, at the deepest encoder stage, where channel depth reaches $4C$, CS3T-UNet runs full self-attention over an uncompressed bottleneck. Its cost grows quadratically with the number of bottleneck tokens, and the uncompressed features carry redundancy that does not help prediction.
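To make the quadratic-cost point concrete, the sketch below counts multiply-accumulates (MACs) for one full self-attention layer over $N$ bottleneck tokens of width $d = 4C$. It is a back-of-the-envelope estimate, not the paper's accounting: the token counts are hypothetical, softmax, normalization, and biases are ignored, and $C=48$ is borrowed from the PCGAE-Net configuration purely for scale. The function name `attention_macs` is illustrative.

```python
def attention_macs(num_tokens: int, width: int) -> int:
    """Approximate MACs for one full self-attention layer (hypothetical helper).

    Q/K/V and output projections cost 4 * N * d^2 (linear in N).
    The score matrix Q K^T and the weighted sum over V cost 2 * N^2 * d
    (quadratic in N). Splitting into heads leaves both totals unchanged.
    """
    projections = 4 * num_tokens * width * width
    scores_and_mix = 2 * num_tokens * num_tokens * width
    return projections + scores_and_mix


C = 48          # base width from the PCGAE-Net config, used only for scale
width = 4 * C   # channel depth at the deepest encoder stage

for n in (64, 128, 256):  # hypothetical bottleneck token counts
    print(f"N={n}: {attention_macs(n, width):,} MACs")
```

The projection term grows linearly in $N$ while the score-and-mix term grows with $N^2$; at $N=256$ and $d=192$ the quadratic term is already 40% of the total, and its share keeps rising as the bottleneck gets larger.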

Parallel CrossGate and a Bottleneck AutoEncoder

PCGAE-Net feeds the same layer-normalized input to both CSA and GTA, runs them in parallel, and combines their outputs through a learned per-channel sigmoid gate: the CrossGate mechanism. Because neither branch consumes the other's output, the block preserves both spatial and temporal information from the start. For the bottleneck, it replaces heavy self-attention with a Bottleneck AutoEncoder (BAE) that uses $1\times1$ convolutions to halve the channel depth from $4C$ to $2C$, plus an auxiliary reconstruction loss that prevents information collapse. Wrapping these blocks in a shallower encoder-decoder with frequency-domain dimensionality reduction ($N_f=32$, $C=48$) yields a model with just 8.54 million parameters.
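Below is a minimal, framework-free sketch of the CrossGate and BAE data flow, written for readability rather than speed. Several details are assumptions, not taken from the paper: the CSA and GTA branches are passed in as placeholder callables; the gate is modeled as one learned logit $w_c$ per channel, mixing the branches as $g_c\,\mathrm{CSA}(h) + (1-g_c)\,\mathrm{GTA}(h)$ with $g_c=\sigma(w_c)$; the BAE has no nonlinearity between its two $1\times1$ convolutions; and the reconstruction loss is plain mean squared error. All function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical stand-in for a feature map: [channels][positions].
FeatureMap = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def layer_norm(x: FeatureMap, eps: float = 1e-5) -> FeatureMap:
    """Normalize each position across channels (learned scale/shift omitted)."""
    num_ch, num_pos = len(x), len(x[0])
    out = [[0.0] * num_pos for _ in range(num_ch)]
    for p in range(num_pos):
        column = [x[c][p] for c in range(num_ch)]
        mean = sum(column) / num_ch
        var = sum((v - mean) ** 2 for v in column) / num_ch
        for c in range(num_ch):
            out[c][p] = (x[c][p] - mean) / math.sqrt(var + eps)
    return out


def cross_gate(
    x: FeatureMap,
    csa: Callable[[FeatureMap], FeatureMap],
    gta: Callable[[FeatureMap], FeatureMap],
    gate_logits: List[float],
) -> FeatureMap:
    """Run both branches on the same normalized input, then mix per channel."""
    h = layer_norm(x)
    s, t = csa(h), gta(h)  # parallel: GTA reads h, never CSA's output
    out = []
    for c, logit in enumerate(gate_logits):
        g = sigmoid(logit)  # one gate value per channel (assumed form)
        out.append([g * sv + (1.0 - g) * tv for sv, tv in zip(s[c], t[c])])
    return out


def conv1x1(x: FeatureMap, weight: List[List[float]], bias: List[float]) -> FeatureMap:
    """A 1x1 convolution: the same [out_ch][in_ch] linear map at every position."""
    num_pos = len(x[0])
    return [
        [sum(w * x[i][p] for i, w in enumerate(row)) + b for p in range(num_pos)]
        for row, b in zip(weight, bias)
    ]


def bottleneck_autoencoder(
    x: FeatureMap,
    w_enc: List[List[float]], b_enc: List[float],
    w_dec: List[List[float]], b_dec: List[float],
) -> Tuple[FeatureMap, FeatureMap, float]:
    """Compress 4C -> 2C channels, expand back to 4C, and score the reconstruction."""
    code = conv1x1(x, w_enc, b_enc)      # [2C][positions]
    recon = conv1x1(code, w_dec, b_dec)  # [4C][positions]
    n = len(x) * len(x[0])
    # Auxiliary MSE term, added to the prediction loss during training.
    recon_loss = sum(
        (r - v) ** 2 for r_row, x_row in zip(recon, x) for r, v in zip(r_row, x_row)
    ) / n
    return code, recon, recon_loss
```

With every gate logit at zero, $\sigma(0)=0.5$ and the block starts as an even blend of the two branches; training then lets each channel lean toward spatial or temporal features. Which BAE output (the $2C$ code or the $4C$ reconstruction) feeds the decoder is not specified here, so the sketch returns both.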

Measured Gains on QuaDRiGa

On the QuaDRiGa dataset, which simulates realistic 5G massive MIMO channels, PCGAE-Net outperforms CS3T-UNet in single-step prediction by 3.26 dB at 5 km/h and by a striking 6.0 dB at 9 km/h. This is not a trade of size for accuracy: the model gets both smaller and more accurate. Gains of that size should translate into better beamforming decisions and more reliable millimeter-wave links. The natural next step is to test the architecture on real baseband hardware at scale, where a model of 8.5 million parameters should fit far more comfortably into the compute budget of today's gNB platforms.
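If the reported gains are NMSE improvements expressed in decibels (the usual metric for CSI prediction, but an assumption here), they convert to linear error ratios as $10^{\Delta/10}$. A one-line helper, with a hypothetical name, makes the arithmetic explicit:

```python
def db_to_ratio(gain_db: float) -> float:
    """Convert an error reduction in dB to a linear factor (10 dB -> 10x)."""
    return 10 ** (gain_db / 10)


# Single-step gains over CS3T-UNet as reported in the article.
for speed_kmh, gain_db in ((5, 3.26), (9, 6.0)):
    print(f"{speed_kmh} km/h: {gain_db} dB -> {db_to_ratio(gain_db):.2f}x lower NMSE")
```

Under that reading, 3.26 dB corresponds to roughly 2.1 times lower NMSE and 6.0 dB to roughly 4 times lower.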

Source: Lightweight PCGAE-Net: Parallel CrossGate Attention and Bottleneck AutoEncoder for Efficient 5G Channel Prediction
Domain: arxiv.org