
Hardware DSP for Real-Time SDR
R4W Development Team
(Aida, Joe Mooney, Claude Code)
December 2025
Software DSP hits limits for high-rate SDR:
| Challenge | Solution |
|---|---|
| CPU latency | Dedicated hardware pipelines |
| Power consumption | Efficient parallel processing |
| Deterministic timing | Hardware clock domains |
| Multi-GS/s rates | Direct RF interface |

Supported platforms and toolchains:
| Platform | Toolchain | Status |
|---|---|---|
| Xilinx Zynq-7000 | Vivado 2022.2+ | Production |
| Xilinx Zynq UltraScale+ | Vivado 2022.2+ | Production |
| Lattice iCE40 | Yosys + nextpnr | Implemented |
| Lattice ECP5 | Yosys + nextpnr | Implemented |

    ┌─────────────────────────────────────────────────────────────┐
    │                 ZYNQ PS (ARM Cortex-A9/A53)                 │
    │                                                             │
    │  ┌────────────────┐    ┌────────────────┐                   │
    │  │ R4W Rust       │───►│ Linux Kernel   │──► /dev/mem       │
    │  │ Application    │    │ (UIO/devmem)   │    /dev/uio*      │
    │  └────────────────┘    └────────────────┘                   │
    └─────────────────────────────┬───────────────────────────────┘
                                  │ AXI Interconnect
    ┌─────────────────────────────┴───────────────────────────────┐
    │                      FPGA Fabric (PL)                       │
    │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐         │
    │  │ r4w_    │  │ r4w_    │  │ r4w_    │  │ r4w_    │         │
    │  │ fft     │  │ fir     │  │ nco     │  │ chirp   │         │
    │  └─────────┘  └─────────┘  └─────────┘  └─────────┘         │
    └─────────────────────────────────────────────────────────────┘

| IP Core | Description | Resources |
|---|---|---|
| r4w_fft | FFT/IFFT up to 4096 points | ~15K LUTs |
| r4w_fir | FIR filter up to 256 taps | ~8K LUTs |
| r4w_nco | NCO with CORDIC or LUT | ~1.5K LUTs |
| r4w_chirp_gen | LoRa chirp generator | ~3K LUTs |
| r4w_chirp_corr | Chirp correlator | ~12K LUTs |
| r4w_dma | DMA controller | ~4K LUTs |
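
The LUT option listed for r4w_nco can be illustrated with a small software model. This is a sketch under stated assumptions, not the core's actual implementation: the 32-bit phase accumulator, the 256-entry table, the 16-bit output scaling, and the names `LutNco` and `freq_word` are all hypothetical choices made for the example.

```rust
use std::f64::consts::PI;

/// Sine table address width (hypothetical; the real core's table size is not documented here).
const TABLE_BITS: u32 = 8;
const TABLE_LEN: usize = 1 << TABLE_BITS;

/// LUT-based NCO model: a 32-bit phase accumulator whose top bits index a sine table.
pub struct LutNco {
    phase: u32,
    step: u32,
    table: [i16; TABLE_LEN],
}

impl LutNco {
    /// `step` is the frequency control word: f_out = step / 2^32 * f_clk.
    pub fn new(step: u32) -> Self {
        let mut table = [0i16; TABLE_LEN];
        for (k, entry) in table.iter_mut().enumerate() {
            let angle = 2.0 * PI * k as f64 / TABLE_LEN as f64;
            *entry = (32767.0 * angle.sin()).round() as i16;
        }
        Self { phase: 0, step, table }
    }

    /// Return the next (I, Q) = (cos, sin) pair and advance the phase.
    pub fn next_iq(&mut self) -> (i16, i16) {
        let idx = (self.phase >> (32 - TABLE_BITS)) as usize;
        let sin = self.table[idx];
        // Cosine is the same table read a quarter period ahead.
        let cos = self.table[(idx + TABLE_LEN / 4) % TABLE_LEN];
        // Wrapping addition gives the modulo-2^32 phase rollover for free.
        self.phase = self.phase.wrapping_add(self.step);
        (cos, sin)
    }
}

/// Frequency control word for a desired output frequency at a given clock.
pub fn freq_word(f_out_hz: f64, f_clk_hz: f64) -> u32 {
    ((f_out_hz / f_clk_hz) * 4_294_967_296.0).round() as u32
}
```

For example, `LutNco::new(freq_word(25.0e6, 100.0e6))` steps through a quarter of the table per clock, so successive calls to `next_iq()` walk around the unit circle in four samples. A CORDIC variant would trade the table memory for iterative shift-and-add logic.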

Data and control interfaces:
| Interface | Use Case | Bandwidth |
|---|---|---|
| AXI-Lite | Control registers, config | Low |
| AXI-Stream | Sample data, streaming | High |
| DMA | Bulk transfers, buffers | High |
| Register | Single sample, debug | Very Low |

    vivado/ip/
    ├── r4w_fft/
    │   ├── hdl/
    │   │   └── r4w_fft.v
    │   ├── tb/
    │   │   └── r4w_fft_tb.v
    │   └── component.xml
    ├── r4w_fir/
    ├── r4w_nco/
    ├── r4w_chirp_gen/
    ├── r4w_chirp_corr/
    └── r4w_dma/

1024-point streaming FFT with:
- Input: 16-bit I[15:0], Q[15:0]
- Output: 32-bit magnitude[31:0]
- Latency: ~log2(N) * 4 cycles

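
To put the latency estimate above into numbers, the following sketch evaluates log2(N) * 4 and converts it to time at a given clock. The function names are hypothetical, the 64-4096 range is taken from the FFT_SIZE register description, and the power-of-two restriction is an assumption. The result is only as accurate as the approximation itself.

```rust
/// Approximate FFT latency in clock cycles using the estimate log2(N) * 4.
/// Returns None for sizes outside 64..=4096 or sizes that are not powers of
/// two (the power-of-two restriction is an assumption of this sketch).
pub fn fft_latency_cycles(n: u32) -> Option<u32> {
    if !(64..=4096).contains(&n) || !n.is_power_of_two() {
        return None;
    }
    // For a power of two, trailing_zeros() equals log2(n).
    Some(n.trailing_zeros() * 4)
}

/// Convert a cycle count to nanoseconds at the given clock frequency in MHz.
pub fn cycles_to_ns(cycles: u32, clock_mhz: f64) -> f64 {
    cycles as f64 * 1000.0 / clock_mhz
}
```

With these definitions, `fft_latency_cycles(1024)` gives 40 cycles, which `cycles_to_ns(40, 100.0)` converts to 400 ns at a 100 MHz processing clock.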
| Offset | Name | R/W | Description |
|---|---|---|---|
| 0x00 | CTRL | RW | Control register |
| 0x04 | STATUS | R | Status register |
| 0x08 | FFT_SIZE | RW | FFT size (64-4096) |
| 0x0C | SCALE | RW | Scaling factors |
| 0x10 | SAMPLE_IN | W | Input sample |
| 0x14 | SAMPLE_OUT | R | Output magnitude |
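
The register map above could be mirrored on the Rust side roughly as follows. This is a minimal sketch, not the R4W driver: the offsets come from the table, but `FftRegs`, its methods, and `pack_iq` are hypothetical. The array-backed storage stands in for a real `/dev/uio*` or `/dev/mem` mapping. Writing the point count directly to FFT_SIZE, the power-of-two check, and the I-low/Q-high packing are assumptions.

```rust
/// Byte offsets from the r4w_fft register map.
pub const CTRL: usize = 0x00;
pub const STATUS: usize = 0x04;
pub const FFT_SIZE: usize = 0x08;
pub const SCALE: usize = 0x0C;
pub const SAMPLE_IN: usize = 0x10;
pub const SAMPLE_OUT: usize = 0x14;

/// Hypothetical register window. A real driver would hold a pointer to an
/// mmap-ed region and use volatile accesses; a plain array is used here so
/// the sketch runs on any machine.
pub struct FftRegs {
    words: [u32; 6],
}

impl FftRegs {
    pub fn new() -> Self {
        Self { words: [0; 6] }
    }

    /// Write a 32-bit value at a byte offset.
    pub fn write(&mut self, offset: usize, value: u32) {
        self.words[offset / 4] = value;
    }

    /// Read a 32-bit value at a byte offset.
    pub fn read(&self, offset: usize) -> u32 {
        self.words[offset / 4]
    }

    /// Set the FFT size, rejecting values outside the documented 64-4096
    /// range. Requiring a power of two is an assumption of this sketch.
    pub fn set_fft_size(&mut self, n: u32) -> Result<(), String> {
        if !(64..=4096).contains(&n) || !n.is_power_of_two() {
            return Err(format!("unsupported FFT size {}", n));
        }
        self.write(FFT_SIZE, n);
        Ok(())
    }
}

/// Pack a 16-bit I/Q pair into one word (I in the low half, Q in the high
/// half: an assumed layout, not taken from the core's documentation).
pub fn pack_iq(i: i16, q: i16) -> u32 {
    ((q as u16 as u32) << 16) | (i as u16 as u32)
}
```

A caller would configure the size once with `set_fft_size(1024)` and then push samples with `write(SAMPLE_IN, pack_iq(i, q))`. For sustained rates, the AXI-Stream or DMA paths are the better fit, since single-register access is listed as very low bandwidth.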

Hardware LoRa chirp generation:
| Parameter | Range | Description |
|---|---|---|
| SF | 5-12 | Spreading factor |
| BW | 125/250/500 kHz | Bandwidth |
| Symbol | 0 to 2^SF-1 | Symbol value |

Generates:
- Upchirp, downchirp
- Symbol-shifted chirps
- Continuous streaming
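
A floating-point reference model helps show what the chirp generator produces. The sketch below is an illustration, not the r4w_chirp_gen logic: the function name `lora_chirp` is hypothetical, and it assumes one sample per chip (sample rate equal to BW), so the 125/250/500 kHz setting only determines how fast the samples are played out.

```rust
use std::f64::consts::PI;

/// Generate one LoRa symbol as (I, Q) samples, one sample per chip.
/// `sf` is the spreading factor (5..=12), `symbol` is in 0..2^sf, and
/// `up` selects an upchirp (true) or downchirp (false).
pub fn lora_chirp(sf: u32, symbol: u32, up: bool) -> Vec<(f64, f64)> {
    assert!((5..=12).contains(&sf), "SF must be in 5..=12");
    let n = 1u32 << sf;
    assert!(symbol < n, "symbol must be below 2^SF");

    let mut phase = 0.0f64;
    let mut out = Vec::with_capacity(n as usize);
    for i in 0..n {
        out.push((phase.cos(), phase.sin()));
        // Instantaneous frequency in cycles per sample, sweeping from -0.5
        // to +0.5 (that is, -BW/2 to +BW/2). The symbol value shifts the
        // starting point and the sweep wraps around modulo 2^SF.
        let k = (symbol + i) % n;
        let mut f = k as f64 / n as f64 - 0.5;
        if !up {
            f = -f; // a downchirp is the conjugate of the upchirp
        }
        // Phase accumulator, as an NCO would do in hardware.
        phase = (phase + 2.0 * PI * f) % (2.0 * PI);
    }
    out
}
```

Calling `lora_chirp(7, 0, true)` yields a 128-sample base upchirp, and `lora_chirp(7, 0, false)` yields its conjugate. A fixed-point hardware version would replace the `f64` phase with an integer accumulator, but the frequency ramp follows the same pattern.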

Open-source toolchain (Yosys + nextpnr):

    # Build for iCE40
    cd lattice
    make DEVICE=ice40-hx8k synth

    # Build for ECP5
    make DEVICE=ecp5-45f synth

Lower resources, simpler peripherals, great for:
- Low-power nodes
- Educational platforms
- Prototyping

| Operation | Software | FPGA | Speedup |
|---|---|---|---|
| FFT 1024-pt | 371 MS/s | 1+ GS/s | 2.7x+ |
| FIR 128-tap | ~100 MS/s | 500+ MS/s | 5x+ |
| LoRa decode | 45 MS/s | 200+ MS/s | 4x+ |

Plus: Consistent latency, lower power

Resource utilization:
| IP Core | LUTs | FFs | BRAM | DSP |
|---|---|---|---|---|
| r4w_fft | 15K | 12K | 4 | 8 |
| r4w_fir | 8K | 6K | 2 | 4 |
| r4w_nco | 1.5K | 1K | 0 | 1 |
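| r4w_chirp_gen | 3K | 2K | 1 | 2 |
| Total | ~28K | ~21K | 7 | 15 |

Available on the target device: 53K LUTs, 106K FFs, 140 BRAM, 220 DSP. The four cores above use roughly half of the LUTs; r4w_chirp_corr and r4w_dma are not included in the total.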
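
The utilization figures can be checked with a few lines of arithmetic. The sketch below sums the per-core numbers from the table and compares them with the available resources. The `Resources` type and helper names are hypothetical, and the inputs are the rounded values shown above rather than exact synthesis results.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Resources {
    pub luts: u32,
    pub ffs: u32,
    pub bram: u32,
    pub dsp: u32,
}

/// Per-core figures copied from the resource table (rounded values).
pub const CORES: [(&str, Resources); 4] = [
    ("r4w_fft", Resources { luts: 15_000, ffs: 12_000, bram: 4, dsp: 8 }),
    ("r4w_fir", Resources { luts: 8_000, ffs: 6_000, bram: 2, dsp: 4 }),
    ("r4w_nco", Resources { luts: 1_500, ffs: 1_000, bram: 0, dsp: 1 }),
    ("r4w_chirp_gen", Resources { luts: 3_000, ffs: 2_000, bram: 1, dsp: 2 }),
];

/// Device capacity as quoted in the text.
pub const AVAILABLE: Resources =
    Resources { luts: 53_000, ffs: 106_000, bram: 140, dsp: 220 };

/// Sum the resources of all listed cores.
pub fn total(cores: &[(&str, Resources)]) -> Resources {
    let zero = Resources { luts: 0, ffs: 0, bram: 0, dsp: 0 };
    cores.iter().fold(zero, |acc, (_, r)| Resources {
        luts: acc.luts + r.luts,
        ffs: acc.ffs + r.ffs,
        bram: acc.bram + r.bram,
        dsp: acc.dsp + r.dsp,
    })
}

/// Utilization of one resource type as a percentage.
pub fn percent(used: u32, available: u32) -> f64 {
    100.0 * used as f64 / available as f64
}

/// True if every resource type fits within the device.
pub fn fits(used: &Resources, available: &Resources) -> bool {
    used.luts <= available.luts
        && used.ffs <= available.ffs
        && used.bram <= available.bram
        && used.dsp <= available.dsp
}
```

With the table's numbers, LUTs are the tightest resource at about 52%, while BRAM and DSP usage stay in the single digits. Adding r4w_chirp_corr at ~12K LUTs would raise LUT usage noticeably, so it is worth rerunning this check before combining cores.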

Each IP includes Verilog testbenches:

    # Run FFT testbench
    cd vivado/ip/r4w_fft/tb
    iverilog -o fft_tb r4w_fft_tb.v ../hdl/r4w_fft.v
    vvp fft_tb

    # Run all testbenches
    make -C vivado test

Compares against software reference.

- vivado/ip/<name>/
- hdl/
- tb/
- component.xml for Vivado
- r4w-fpga

Develop without hardware:

    use r4w_fpga::simulator::FpgaSimulator;

    // Create simulated FPGA
    let sim = FpgaSimulator::new();

    // Register IP cores
    sim.add_core("fft", Box::new(FftSim::new()));

    // Process (uses software models)
    sim.process_samples(&samples)?;

Same API, no board required.

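
The idea of swapping hardware for software models behind one API can be sketched with a trait object registry. This is not the `r4w_fpga` API: `CoreModel`, `PowerModel`, and `MiniSim` are hypothetical names invented for illustration, and the toy model computes per-sample power rather than an FFT.

```rust
use std::collections::HashMap;

/// Hypothetical trait: anything that can stand in for an IP core.
pub trait CoreModel {
    fn process(&mut self, samples: &[(i16, i16)]) -> Vec<u32>;
}

/// Toy software model that outputs I^2 + Q^2 for each sample.
pub struct PowerModel;

impl CoreModel for PowerModel {
    fn process(&mut self, samples: &[(i16, i16)]) -> Vec<u32> {
        samples
            .iter()
            .map(|&(i, q)| {
                // Widen to i64 so the extreme input (-32768, -32768) cannot overflow.
                let (i, q) = (i as i64, q as i64);
                (i * i + q * q) as u32
            })
            .collect()
    }
}

/// Minimal simulator: a registry of named core models.
pub struct MiniSim {
    cores: HashMap<String, Box<dyn CoreModel>>,
}

impl MiniSim {
    pub fn new() -> Self {
        Self { cores: HashMap::new() }
    }

    /// Register a model under a name, replacing any previous entry.
    pub fn add_core(&mut self, name: &str, core: Box<dyn CoreModel>) {
        self.cores.insert(name.to_string(), core);
    }

    /// Run samples through the named core, or report that it is missing.
    pub fn run(&mut self, name: &str, samples: &[(i16, i16)]) -> Result<Vec<u32>, String> {
        let core = self
            .cores
            .get_mut(name)
            .ok_or_else(|| format!("no core named {}", name))?;
        Ok(core.process(samples))
    }
}
```

A hardware-backed type could implement the same trait by writing to memory-mapped registers. Application code holding a `Box<dyn CoreModel>` would then not need to know whether a board is attached.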
| Clock | Typical | Use |
|---|---|---|
| PS clock | 100-300 MHz | AXI interface |
| DSP clock | 100-500 MHz | Processing |
| ADC clock | 100-250 MHz | Sample rate |

CDC (Clock Domain Crossing) handled by:
- AXI async FIFOs
- Handshake protocols

| Tool | Purpose |
|---|---|
| ILA (Integrated Logic Analyzer) | Signal capture |
| VIO (Virtual I/O) | Runtime control |
| Console (UART) | Debug messages |
| Rust tracing | Software logging |

Status summary:
| Component | Status |
|---|---|
| 6 IP Cores | Production ready |
| Zynq Support | Complete |
| Lattice Support | Complete |
| Testbenches | All cores |
| Rust Drivers | Memory-mapped I/O |
| Documentation | FPGA Developer’s Guide |

Accelerate your SDR with R4W FPGA!
R4W - FPGA Acceleration
github.com/joemooney/r4w
Docs: docs/FPGA_DEVELOPERS_GUIDE.md