Base Station on Chip: Harnessing RISC-V Vector DSPs for AI-Driven 5G and 6G Networks

A new paper explores approaches to meeting the rising computational and energy demands of next-generation wireless networks. Written by researchers at TU Dresden and the Centre for Tactile Internet with Human-in-the-Loop (CeTI), the study examines how Base Station on Chip (BSoC) architectures could change the deployment of 5G and emerging 6G technologies. By integrating signal processing, neural network computation, and network management on a single chip, the paper outlines a hardware/software co-design aimed at improving performance, scalability, and power efficiency.

Focusing on RISC-V vector Digital Signal Processors (DSPs), the research also examines how vector extensions and custom instructions can optimize core functions such as Channel Estimation, massive MIMO (mMIMO), and beamforming to meet the stringent latency and throughput requirements of modern Radio Access Networks (RANs).

The Challenge

The paper's introduction describes how Open Radio Access Network (Open RAN) is reshaping cellular networks by decoupling hardware and software into modular components connected via open interfaces such as eCPRI. This architecture allows RAN functions to be virtualized on Commercial-Off-The-Shelf (COTS) servers, reducing dependency on proprietary equipment. However, COTS systems rely on fixed-length SIMD architectures, which are poorly suited to the computational demands of LOW PHY signal processing tasks such as Channel Estimation (CE), beamforming, and massive MIMO (mMIMO). These tasks are key operations at the Open Radio Unit (O-RU) in Open RAN base stations.

To address these limitations, this work investigates the use of the customizable RISC-V Instruction Set Architecture (ISA) to implement hardware accelerators for next-generation 6G base stations through a Base Station on Chip (BSoC) platform. RISC-V’s flexible vector lengths enable the design of specialized vector processors capable of handling complex, compute-intensive LOW PHY operations like matrix multiplications, inversions, and FFT/iFFT—tasks that scale with system size and antenna count. Using the RISC-V-based Ara processor developed by the PULP group, this study aims to (1) assess kernel execution speedup through data-level parallelism and (2) design custom hardware modules optimized for each signal processing kernel. The BSoC approach offers a compact, energy-efficient, and scalable solution for future 6G network demands.
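To make the idea of flexible vector lengths concrete, the following is a minimal sketch in portable C of a strip-mined complex matrix-vector product, the kind of loop structure a vector-length-agnostic kernel typically uses. It is not code from the paper: `set_vl` is a hypothetical stand-in for the RISC-V `vsetvl` instruction, `vlmax` is an illustrative parameter, and the innermost loop marks the work that a vector core would issue as a single vector multiply-accumulate.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical stand-in for the RVV vsetvl instruction: returns how many
 * elements are processed in this pass (never more than vlmax). */
static size_t set_vl(size_t remaining, size_t vlmax) {
    return remaining < vlmax ? remaining : vlmax;
}

/* y = H * x for a rows-by-cols row-major matrix, strip-mined over columns.
 * The same code works unchanged for any vlmax > 0. */
void matvec_strip_mined(const float complex *H, const float complex *x,
                        float complex *y, size_t rows, size_t cols,
                        size_t vlmax) {
    assert(vlmax > 0);
    for (size_t r = 0; r < rows; ++r) {
        float complex acc = 0.0f;
        for (size_t c = 0; c < cols; ) {
            size_t vl = set_vl(cols - c, vlmax);
            /* On a vector core, this inner loop would correspond to one
             * vector multiply-accumulate over vl elements. */
            for (size_t i = 0; i < vl; ++i) {
                acc += H[r * cols + c + i] * x[c + i];
            }
            c += vl;
        }
        y[r] = acc;
    }
}
```

Because the loop asks for the vector length on every pass instead of hard-coding it, the same source can run on cores with different register lengths, which is the property that fixed-length SIMD code lacks.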

Proposed Solution

The authors use the Ara processor, a high-performance, open-source RISC-V vector core designed for parallel processing, to accelerate compute-intensive LOW PHY algorithms through vectorization. They implement key wireless communication kernels: Channel Estimation (both Least Square Error, LSE, and Minimum Mean Square Error, MMSE), Fast Fourier Transform (FFT) using the Cooley–Tukey algorithm, massive MIMO processing via Zero-Forcing techniques, and digital beamforming with steering vector-based matrix construction. Each kernel runs as C-based software on the Ara core, with experiments varying the vector register length and the number of parallel lanes to evaluate performance. The study focuses on execution time and on the impact of vectorization on computation speed and efficiency.
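As a rough illustration of two of the kernels named above, here is a scalar C sketch of least-squares channel estimation (each pilot subcarrier's received symbol divided by the known pilot) and of Zero-Forcing detection for a 2x2 channel (where the Zero-Forcing filter reduces to the matrix inverse). This is not the authors' implementation: the function names, the row-major layout, and the 2x2 size are assumptions chosen for brevity, and a real mMIMO system involves far larger matrices.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Least-squares channel estimate per pilot subcarrier: h[k] = y[k] / x[k],
 * where x[k] is the known transmitted pilot and y[k] the received symbol. */
void ls_channel_estimate(const float complex *y, const float complex *x,
                         float complex *h, size_t n_pilots) {
    for (size_t k = 0; k < n_pilots; ++k) {
        assert(x[k] != 0.0f); /* pilots must be non-zero */
        h[k] = y[k] / x[k];
    }
}

/* Zero-Forcing detection for a square 2x2 channel: s_hat = H^{-1} r.
 * H is row-major {h00, h01, h10, h11}.
 * Returns 0 on success, -1 if H is singular. */
int zf_detect_2x2(const float complex H[4], const float complex r[2],
                  float complex s_hat[2]) {
    float complex det = H[0] * H[3] - H[1] * H[2];
    if (det == 0.0f) {
        return -1;
    }
    s_hat[0] = ( H[3] * r[0] - H[1] * r[1]) / det;
    s_hat[1] = (-H[2] * r[0] + H[0] * r[1]) / det;
    return 0;
}
```

Both functions are element-wise or small dense linear algebra, which is why kernels of this kind expose the data-level parallelism that a vector processor can exploit once the matrices grow with the antenna count.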

Preliminary Results

The researchers implemented key wireless communication kernels, namely Channel Estimation (LSE and MMSE), mMIMO, and beamforming, as C-based software on the Ara RISC-V processor. By varying the number of lanes and the vector register length (VLEN), the study measures clock cycles to evaluate how vectorization affects performance. Results show that larger VLEN values enable more parallel processing per clock cycle, reducing execution time, with matrix size also influencing computational efficiency.
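The relationship between VLEN and the amount of work per vector instruction can be sketched with a little arithmetic. In the RISC-V vector extension, the maximum number of elements per vector register group is VLMAX = LMUL × VLEN / SEW, where SEW is the element width in bits and LMUL the register grouping factor. The helper names and the example values below are illustrative assumptions, not configurations or results from the paper, and only integer LMUL is handled.

```c
#include <assert.h>
#include <stddef.h>

/* Elements per vector register group: VLMAX = LMUL * VLEN / SEW.
 * Only integer LMUL is modeled here (fractional LMUL is ignored). */
size_t vlmax_elems(size_t vlen_bits, size_t sew_bits, size_t lmul) {
    assert(sew_bits > 0 && vlen_bits % sew_bits == 0);
    return lmul * (vlen_bits / sew_bits);
}

/* Number of vector passes needed to cover n elements when each pass
 * handles at most vlmax of them (ceiling division). */
size_t strip_count(size_t n, size_t vlmax) {
    assert(vlmax > 0);
    return (n + vlmax - 1) / vlmax;
}
```

For example, with 32-bit elements and LMUL = 1, a VLEN of 512 bits holds 16 elements per register while a VLEN of 4096 bits holds 128, so a 1024-element vector needs 64 passes in the first case and 8 in the second. The pass count is not a cycle count: how many cycles each pass takes depends on the number of lanes and on the memory system, which is what the paper's measurements capture.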

Conclusion

By leveraging the data-level parallelism (DLP) inherent in these algorithms, the researchers achieved a reduction in clock cycle count as the number of parallel processing lanes increases. Future work is expected to include custom hardware implementations of some of these kernels and their integration into the Ara processor via AXI interfaces. The development of tailored instructions to support that hardware is also planned.

Publisher: everything RF