R4W - Rust for Waveforms Comprehensive Guide for FPGA Engineers
This guide is designed for FPGA engineers integrating R4W (Rust for Waveforms) hardware acceleration into SDR systems. It provides detailed technical specifications, register maps, interface protocols, and integration procedures for implementing DSP accelerators on Xilinx Zynq and Lattice FPGA platforms.
R4W provides a complete FPGA acceleration layer:
| Component | Description |
|---|---|
| IP Cores | Pre-built DSP blocks (FFT, FIR, NCO, chirp, correlator, DMA) |
| AXI Wrappers | Standard AXI4-Lite control interfaces |
| Streaming | AXI4-Stream data paths for high-throughput sample transfer |
| Rust Drivers | Memory-mapped I/O drivers for Linux |
| Simulation | Software FPGA simulator for development without hardware |
| Testbenches | Verilog testbenches for verification |
As an FPGA developer, you are responsible for:
┌───────────────────────────────────────────────────────────────────────────┐
│ R4W FPGA Acceleration Layer │
├───────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ ZYNQ PS (ARM Cortex-A9/A53) │ │
│ │ │ │
│ │ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │ │
│ │ │ R4W Rust │ │ Linux Kernel │ │ /dev/mem │ │ │
│ │ │ Application │───►│ (UIO/devmem) │───►│ /dev/uio* │ │ │
│ │ └────────────────┘ └────────────────┘ └───────┬────────┘ │ │
│ │ │ │ │
│ └───────────────────────────────────────────────────────┼─────────────┘ │
│ │ │
│ AXI Interconnect │ │
│ ┌───────────────────────────────────────────────────────┼─────────────┐ │
│ │ │ │ │
│ │ AXI-Lite (Control/Status) AXI-Stream (Data) │ │ │
│ │ │ │ │ │ │
│ └───────────┼───────────────────────────┼───────────────┼─────────────┘ │
│ │ │ │ │
│ ┌───────────┴───────────────────────────┴───────────────┴─────────────┐ │
│ │ FPGA Fabric (PL) │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ r4w_ │ │ r4w_ │ │ r4w_ │ │ r4w_ │ │ r4w_ │ │ │
│ │ │ fft │ │ fir │ │ nco │ │ chirp │ │ dma │ │ │
│ │ │ │ │ │ │ │ │ gen │ │ │ │ │
│ │ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ │
│ │ │ │ │ │ │ │ │
│ │ ┌────┴────────────┴────────────┴────────────┴────────────┴────┐ │ │
│ │ │ AXI-Stream Router │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└───────────────────────────────────────────────────────────────────────────┘
| Interface | Use Case | Bandwidth | Latency |
|---|---|---|---|
| AXI-Lite | Control registers, status, configuration | Low | Low |
| AXI-Stream | Sample data, continuous streaming | High | Low |
| DMA | Bulk transfers, buffer management | High | Medium |
| Register | Single sample, debugging | Very Low | Very Low |
Xilinx Zynq-7000:
| Feature | Specification |
|---|---|
| Target Devices | xc7z010, xc7z020, xc7z030, xc7z045 |
| Toolchain | Vivado 2022.2+ |
| AXI Interface | AXI4-Lite (32-bit), AXI4-Stream |
| Memory Access | /dev/mem, /dev/uio* |
| Typical Clock | 100 MHz fabric, 667 MHz PS |
Xilinx Zynq UltraScale+:
| Feature | Specification |
|---|---|
| Target Devices | xczu3eg, xczu7ev, xczu9eg |
| Toolchain | Vivado 2022.2+ |
| AXI Interface | AXI4-Lite (32/64-bit), AXI4-Stream |
| Memory Access | /dev/mem, /dev/uio*, CMA |
| Typical Clock | 300 MHz fabric, 1.2 GHz PS |
Lattice iCE40:
| Feature | Specification |
|---|---|
| Target Devices | iCE40-HX8K, iCE40-LP8K |
| Toolchain | Yosys + nextpnr-ice40 + IceStorm |
| Interface | SPI slave (custom protocol) |
| Memory Access | FTDI SPI or GPIO bit-bang |
| Typical Clock | 12-48 MHz |
Lattice ECP5:
| Feature | Specification |
|---|---|
| Target Devices | LFE5U-25F, LFE5U-45F, LFE5U-85F |
| Toolchain | Yosys + nextpnr-ecp5 + Trellis |
| Interface | SPI slave (custom protocol) |
| Memory Access | FTDI SPI or board-specific |
| Typical Clock | 25-100 MHz (PLL multiplied) |
| IP Core | File | ID | Description |
|---|---|---|---|
| r4w_fft | vivado/ip/r4w_fft/ | 0x52345746 (“R4WF”) | FFT/IFFT up to 4096 points |
| r4w_fir | vivado/ip/r4w_fir/ | 0x52344649 (“R4FI”) | FIR filter up to 256 taps |
| r4w_nco | vivado/ip/r4w_nco/ | 0x5234494F (“R4IO”) | NCO with CORDIC or LUT |
| r4w_chirp_gen | vivado/ip/r4w_chirp_gen/ | 0x52344347 (“R4CG”) | LoRa chirp generator |
| r4w_chirp_corr | vivado/ip/r4w_chirp_corr/ | 0x52344343 (“R4CC”) | Chirp correlator/demodulator |
| r4w_dma | vivado/ip/r4w_dma/ | 0x52344D41 (“R4MA”) | DMA controller |
All IP cores share common parameters:
parameter C_S_AXI_DATA_WIDTH = 32 // AXI data width (always 32)
parameter C_S_AXI_ADDR_WIDTH = 8 // Register space size (2^N bytes)
Core-specific parameters:
| Core | Parameter | Default | Description |
|---|---|---|---|
| r4w_fft | MAX_FFT_SIZE | 1024 | Maximum FFT size (power of 2) |
| r4w_fft | DATA_WIDTH | 16 | I/Q sample width |
| r4w_fir | MAX_TAPS | 256 | Maximum filter taps |
| r4w_fir | COEF_WIDTH | 16 | Coefficient bit width |
| r4w_nco | PHASE_WIDTH | 32 | Phase accumulator width |
| r4w_nco | OUTPUT_WIDTH | 16 | Output sample width |
| r4w_nco | CORDIC_STAGES | 16 | CORDIC pipeline stages |
| r4w_chirp_gen | MAX_SF | 12 | Maximum spreading factor |
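All IP cores share a common register layout; the per-core maps in this guide list the exceptions (r4w_nco places STATUS at 0x10, r4w_chirp_corr places ID at 0x30, and r4w_dma uses 0x0C for STATUS and 0x1C for IRQ_STATUS):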
| Offset | Name | Access | Description |
|---|---|---|---|
| 0x00 | CTRL | R/W | Control register |
| 0x04 | CONFIG | R/W | Configuration (core-specific) |
| 0x08 | STATUS | R/O | Status register |
| 0x1C | VERSION | R/O | IP core version |
| 0x20 | ID | R/O | IP core identifier |
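The ID values listed in this guide are four ASCII bytes packed most significant byte first, so software can turn an ID readback into a readable tag. The following is a minimal sketch of that decoding; the function names `id_to_ascii` and `is_r4w_id` are hypothetical helpers, not part of the R4W driver API.

```rust
/// Decode a 32-bit ID register value into its four ASCII characters,
/// most significant byte first (0x52345746 decodes to "R4WF").
/// Returns None when any byte is not printable ASCII.
pub fn id_to_ascii(id: u32) -> Option<String> {
    let bytes = id.to_be_bytes();
    if bytes.iter().all(|b| b.is_ascii_graphic()) {
        Some(bytes.iter().map(|&b| b as char).collect())
    } else {
        // For example 0xFFFFFFFF, which an unmapped or unclocked core may return.
        None
    }
}

/// True when the upper half-word carries the "R4" prefix (0x5234)
/// used by the IDs in this guide.
pub fn is_r4w_id(id: u32) -> bool {
    id >> 16 == 0x5234
}
```

A probe routine could call `is_r4w_id` first and log `id_to_ascii` only for diagnostics, which keeps an all-ones readback from being mistaken for a valid core.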
Base address: Configurable (default 0x43C0_0000 on Zynq)
r4w_fft register map:
| Offset | Name | Bits | Access | Description |
|---|---|---|---|---|
| 0x00 | CTRL | [0] | R/W | START - Begin FFT/IFFT |
| | | [1] | R/W | INVERSE - 1=IFFT, 0=FFT |
| | | [31] | R/W | RESET - Soft reset (auto-clear) |
| 0x04 | SIZE | [3:0] | R/W | log2(N): 6=64, 7=128, 8=256, 9=512, 10=1024 |
| 0x08 | STATUS | [0] | R/O | DONE - Operation complete |
| | | [1] | R/O | BUSY - Operation in progress |
| | | [2] | R/O | ERROR - Error occurred |
| 0x10 | DATA_IN | [31:16] | W/O | I sample (signed 16-bit) |
| | | [15:0] | W/O | Q sample (signed 16-bit) |
| 0x14 | DATA_OUT | [31:16] | R/O | I result (signed 16-bit) |
| | | [15:0] | R/O | Q result (signed 16-bit) |
| 0x1C | VERSION | [31:0] | R/O | Version (major.minor.patch) |
| 0x20 | ID | [31:0] | R/O | 0x52345746 (“R4WF”) |
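The SIZE register takes log2(N) rather than N, so driver code benefits from validating the length before writing it. The sketch below assumes the accepted range is exactly the set listed in the SIZE row (64 to 1024 points); `fft_size_code` is a hypothetical helper name, and the upper bound would need raising for a build with a larger MAX_FFT_SIZE.

```rust
/// Convert an FFT length to the log2(N) code written to the SIZE register.
/// Accepts only the power-of-two sizes listed in the register table (64..=1024);
/// this range is an assumption taken from that table.
pub fn fft_size_code(n: usize) -> Option<u32> {
    if n.is_power_of_two() && (64..=1024).contains(&n) {
        // For a power of two, the number of trailing zero bits equals log2(n).
        Some(n.trailing_zeros())
    } else {
        None
    }
}
```

For example, `fft_size_code(256)` yields the value 8 that the usage example writes to offset 0x04.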
Usage Example (C/Rust pseudo-code):
// Configure FFT size (256 points)
write_reg(base + 0x04, 8); // log2(256) = 8
// Write input samples
for sample in input_samples {
let packed = ((sample.i as u32) << 16) | (sample.q as u16 as u32);
write_reg(base + 0x10, packed);
}
// Start FFT
write_reg(base + 0x00, 0x01);
// Wait for completion
while (read_reg(base + 0x08) & 0x01) == 0 {}
// Read output samples
for i in 0..256 {
let packed = read_reg(base + 0x14);
output[i].i = (packed >> 16) as i16;
output[i].q = packed as i16;
}
r4w_fir register map:
| Offset | Name | Bits | Access | Description |
|---|---|---|---|---|
| 0x00 | CTRL | [0] | R/W | START - Begin filtering |
| | | [1] | R/W | RELOAD_TAPS - Load new coefficients |
| | | [31] | R/W | RESET - Soft reset |
| 0x04 | NUM_TAPS | [8:0] | R/W | Number of filter taps (1-256) |
| 0x08 | STATUS | [0] | R/O | DONE |
| | | [1] | R/O | BUSY |
| | | [2] | R/O | ERROR |
| 0x10 | DATA_IN | [31:0] | W/O | Packed I/Q input |
| 0x14 | DATA_OUT | [31:0] | R/O | Packed I/Q output |
| 0x1C | VERSION | [31:0] | R/O | Version |
| 0x20 | ID | [31:0] | R/O | 0x52344649 (“R4FI”) |
| 0x100-0x4FF | TAPS[0-255] | [15:0] | R/W | Filter coefficients (signed 16-bit) |
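The TAPS window holds one 16-bit coefficient per 32-bit word, so tap `i` sits at offset 0x100 + 4·i. A minimal sketch of the offset calculation and the float-to-Q15 conversion follows; `tap_offset` and `quantize_tap` are hypothetical helper names. Clamping and round-to-nearest are choices made for this sketch, whereas the loading code in this guide truncates.

```rust
/// Byte offset of the first tap register, from the register map.
pub const TAPS_BASE: usize = 0x100;
/// Number of tap registers in the 0x100-0x4FF window.
pub const MAX_TAPS: usize = 256;

/// Byte offset of TAPS[index], or None if the index is outside the window.
pub fn tap_offset(index: usize) -> Option<usize> {
    (index < MAX_TAPS).then(|| TAPS_BASE + index * 4)
}

/// Convert a coefficient in [-1.0, 1.0] to the 32-bit value written to a tap
/// register: signed 16-bit in bits [15:0], upper bits zero.
/// Out-of-range inputs are clamped (an assumption of this sketch).
pub fn quantize_tap(tap: f32) -> u32 {
    let fixed = (tap.clamp(-1.0, 1.0) * 32767.0).round() as i16;
    // Go through u16 so negative values are not sign-extended into bits [31:16].
    fixed as u16 as u32
}
```

Keeping bits [31:16] at zero avoids relying on the core to ignore the upper half of the write data.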
Coefficient Loading:
// Write coefficients
for (i, tap) in coefficients.iter().enumerate() {
let tap_fixed = (*tap * 32767.0) as i16;
// Cast through u16 so negative taps are not sign-extended into bits [31:16]
write_reg(base + 0x100 + i * 4, tap_fixed as u16 as u32);
}
// Set tap count and trigger reload
write_reg(base + 0x04, coefficients.len() as u32);
write_reg(base + 0x00, 0x02); // RELOAD_TAPS bit
r4w_nco register map:
| Offset | Name | Bits | Access | Description |
|---|---|---|---|---|
| 0x00 | CTRL | [0] | R/W | ENABLE - Start NCO output |
| | | [1] | R/W | RESET_PHASE - Reset phase accumulator |
| | | [31] | R/W | RESET - Soft reset |
| 0x04 | FREQ | [31:0] | R/W | Frequency word (phase increment) |
| 0x08 | PHASE | [31:0] | R/W | Phase offset |
| 0x0C | AMPLITUDE | [15:0] | R/W | Output amplitude (0-32767) |
| 0x10 | STATUS | [0] | R/O | VALID - Output valid |
| 0x14 | I_OUT | [15:0] | R/O | Cosine output (signed) |
| 0x18 | Q_OUT | [15:0] | R/O | Sine output (signed) |
| 0x1C | VERSION | [31:0] | R/O | Version |
| 0x20 | ID | [31:0] | R/O | 0x5234494F (“R4IO”) |
Frequency Calculation:
Frequency Word = (Desired_Frequency / Sample_Rate) * 2^PHASE_WIDTH
Example: 1 kHz at 100 MHz sample rate with 32-bit phase:
freq_word = (1000 / 100_000_000) * 2^32 ≈ 42949.67, which rounds to 42950 (truncation gives 42949)
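The frequency-word formula can be sketched in Rust as follows. This assumes the default PHASE_WIDTH of 32 and rounds to the nearest integer, which gives 42950 for the 1 kHz example (plain truncation would give 42949). The function names are hypothetical, and mapping negative frequencies to their two's-complement word is an assumption about how the accumulator wraps.

```rust
/// Phase accumulator width, assumed to match the PHASE_WIDTH default (32).
const PHASE_WIDTH: u32 = 32;

/// Frequency word = (freq / sample_rate) * 2^PHASE_WIDTH, rounded to nearest.
/// Negative frequencies wrap modulo 2^PHASE_WIDTH (two's complement).
pub fn nco_freq_word(freq_hz: f64, sample_rate_hz: f64) -> u32 {
    let scale = (1u64 << PHASE_WIDTH) as f64; // 2^32
    let word = (freq_hz / sample_rate_hz * scale).round();
    word.rem_euclid(scale) as u32
}

/// Frequency actually produced by a given word, useful for reporting
/// the quantization error of the requested frequency.
pub fn nco_actual_freq(word: u32, sample_rate_hz: f64) -> f64 {
    word as f64 * sample_rate_hz / (1u64 << PHASE_WIDTH) as f64
}
```

With a 100 MHz sample rate the frequency resolution is 100e6 / 2^32, roughly 0.023 Hz per LSB, so the rounding choice matters only for very tight frequency tolerances.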
r4w_chirp_gen register map:
| Offset | Name | Bits | Access | Description |
|---|---|---|---|---|
| 0x00 | CTRL | [0] | R/W | START - Generate chirp |
| | | [1] | R/W | UPCHIRP - 1=upchirp, 0=downchirp |
| | | [2] | R/W | CONTINUOUS - Repeat indefinitely |
| | | [31] | R/W | RESET - Soft reset |
| 0x04 | SF | [3:0] | R/W | Spreading factor (5-12) |
| 0x08 | STATUS | [0] | R/O | DONE |
| | | [1] | R/O | BUSY |
| | | [2] | R/O | OVERFLOW |
| 0x0C | SYMBOL | [11:0] | R/W | Symbol value (0 to 2^SF-1) |
| 0x10 | DATA_OUT | [31:0] | R/O | Packed I/Q output |
| 0x14 | BANDWIDTH | [1:0] | R/W | 0=125k, 1=250k, 2=500k |
| 0x1C | VERSION | [31:0] | R/O | Version |
| 0x20 | ID | [31:0] | R/O | 0x52344347 (“R4CG”) |
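The SF, SYMBOL and BANDWIDTH fields constrain each other (the symbol must be below 2^SF, and only three bandwidth codes are defined), so it can help to validate them in software before touching the registers. The following is a sketch under those table constraints only; the type and function names are hypothetical.

```rust
/// Reasons a chirp configuration is rejected (hypothetical error type).
#[derive(Debug, PartialEq)]
pub enum ChirpConfigError {
    SpreadingFactor(u8),
    Symbol { symbol: u16, max: u16 },
    BandwidthCode(u8),
}

/// Bandwidth in Hz for the BANDWIDTH register codes listed in the table.
pub fn bandwidth_hz(code: u8) -> Option<u32> {
    match code {
        0 => Some(125_000),
        1 => Some(250_000),
        2 => Some(500_000),
        _ => None,
    }
}

/// Check SF, SYMBOL and BANDWIDTH values against the register map.
/// On success, returns the number of chips per symbol (2^SF).
pub fn validate_chirp(sf: u8, symbol: u16, bw_code: u8) -> Result<u32, ChirpConfigError> {
    if !(5..=12).contains(&sf) {
        return Err(ChirpConfigError::SpreadingFactor(sf));
    }
    let chips = 1u32 << sf;
    let max = (chips - 1) as u16;
    if symbol > max {
        return Err(ChirpConfigError::Symbol { symbol, max });
    }
    if bandwidth_hz(bw_code).is_none() {
        return Err(ChirpConfigError::BandwidthCode(bw_code));
    }
    Ok(chips)
}
```

For SF7 this returns 128 chips per symbol, and for SF12 it returns 4096.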
r4w_chirp_corr register map:
| Offset | Name | Bits | Access | Description |
|---|---|---|---|---|
| 0x00 | CTRL | [0] | R/W | START - Begin correlation |
| | | [31] | R/W | RESET - Soft reset |
| 0x04 | SF | [3:0] | R/W | Spreading factor (5-12) |
| 0x08 | STATUS | [0] | R/O | DONE |
| | | [1] | R/O | BUSY |
| | | [2] | R/O | DETECTED - Symbol detected |
| 0x10 | DATA_IN | [31:0] | W/O | Packed I/Q input |
| 0x1C | VERSION | [31:0] | R/O | Version |
| 0x20 | SYMBOL | [11:0] | R/O | Detected symbol value |
| 0x24 | MAGNITUDE | [31:0] | R/O | Correlation magnitude |
| 0x28 | THRESHOLD | [31:0] | R/W | Detection threshold |
| 0x30 | ID | [31:0] | R/O | 0x52344343 (“R4CC”) |
r4w_dma register map:
| Offset | Name | Bits | Access | Description |
|---|---|---|---|---|
| 0x00 | CTRL | [0] | R/W | START_TX - Start transmit |
| | | [1] | R/W | START_RX - Start receive |
| | | [2] | R/W | ABORT - Abort transfer |
| | | [3] | R/W | CONTINUOUS - Continuous mode |
| | | [31] | R/W | RESET - Soft reset |
| 0x04 | TX_LEN | [15:0] | R/W | TX transfer length (samples) |
| 0x08 | RX_LEN | [15:0] | R/W | RX transfer length (samples) |
| 0x0C | STATUS | [0] | R/O | TX_BUSY |
| | | [1] | R/O | RX_BUSY |
| | | [2] | R/O | TX_DONE |
| | | [3] | R/O | RX_DONE |
| | | [4] | R/O | TX_ERROR |
| | | [5] | R/O | RX_ERROR |
| 0x10 | TX_ADDR | [31:0] | R/W | TX buffer address |
| 0x14 | RX_ADDR | [31:0] | R/W | RX buffer address |
| 0x18 | IRQ_EN | [2:0] | R/W | Interrupt enables |
| 0x1C | IRQ_STATUS | [2:0] | R/W1C | Interrupt status (write 1 to clear) |
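The DMA STATUS word packs six independent flags, which is easier to handle in driver code once decoded into named fields. Here is a minimal sketch using the bit positions from the table above; `DmaStatus` is a hypothetical type, not the one in the R4W crate.

```rust
/// Decoded view of the r4w_dma STATUS register (offset 0x0C).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DmaStatus {
    pub tx_busy: bool,
    pub rx_busy: bool,
    pub tx_done: bool,
    pub rx_done: bool,
    pub tx_error: bool,
    pub rx_error: bool,
}

impl DmaStatus {
    /// Decode the raw register value; bits above [5] are ignored.
    pub fn from_bits(raw: u32) -> Self {
        let bit = |n: u32| raw & (1 << n) != 0;
        DmaStatus {
            tx_busy: bit(0),
            rx_busy: bit(1),
            tx_done: bit(2),
            rx_done: bit(3),
            tx_error: bit(4),
            rx_error: bit(5),
        }
    }

    /// True if either direction reports an error.
    pub fn has_error(&self) -> bool {
        self.tx_error || self.rx_error
    }
}
```

A caller would typically read offset 0x0C once, decode it, and branch on `has_error()` before checking the done flags.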
All IP cores implement the standard AXI4-Lite slave interface:
// Clock and Reset
input wire S_AXI_ACLK,
input wire S_AXI_ARESETN,
// Write Address Channel
input wire [C_S_AXI_ADDR_WIDTH-1:0] S_AXI_AWADDR,
input wire [2:0] S_AXI_AWPROT,
input wire S_AXI_AWVALID,
output wire S_AXI_AWREADY,
// Write Data Channel
input wire [C_S_AXI_DATA_WIDTH-1:0] S_AXI_WDATA,
input wire [C_S_AXI_DATA_WIDTH/8-1:0] S_AXI_WSTRB,
input wire S_AXI_WVALID,
output wire S_AXI_WREADY,
// Write Response Channel
output wire [1:0] S_AXI_BRESP,
output wire S_AXI_BVALID,
input wire S_AXI_BREADY,
// Read Address Channel
input wire [C_S_AXI_ADDR_WIDTH-1:0] S_AXI_ARADDR,
input wire [2:0] S_AXI_ARPROT,
input wire S_AXI_ARVALID,
output wire S_AXI_ARREADY,
// Read Data Channel
output wire [C_S_AXI_DATA_WIDTH-1:0] S_AXI_RDATA,
output wire [1:0] S_AXI_RRESP,
output wire S_AXI_RVALID,
input wire S_AXI_RREADY
For high-throughput data paths:
// Input Stream (Slave)
input wire [31:0] S_AXIS_TDATA, // Packed I/Q: [31:16]=I, [15:0]=Q
input wire S_AXIS_TVALID,
output wire S_AXIS_TREADY,
input wire S_AXIS_TLAST, // Optional end-of-frame
// Output Stream (Master)
output wire [31:0] M_AXIS_TDATA,
output wire M_AXIS_TVALID,
input wire M_AXIS_TREADY,
output wire M_AXIS_TLAST
I/Q samples are packed as 32-bit words:
Bit 31 16 15 0
┌─────────────────────┬─────────────────────┐
│ I (signed 16-bit) │ Q (signed 16-bit) │
└─────────────────────┴─────────────────────┘
Conversion:
// Rust: float to fixed-point
fn pack_iq(i: f32, q: f32) -> u32 {
let i_fixed = (i * 32767.0) as i16;
let q_fixed = (q * 32767.0) as i16;
((i_fixed as u32) << 16) | (q_fixed as u16 as u32)
}
// Rust: fixed-point to float
fn unpack_iq(packed: u32) -> (f32, f32) {
let i_fixed = (packed >> 16) as i16;
let q_fixed = packed as i16;
(i_fixed as f32 / 32767.0, q_fixed as f32 / 32767.0)
}
# Create block design
create_bd_design "r4w_system"
# Add Zynq PS
create_bd_cell -type ip -vlnv xilinx.com:ip:processing_system7:5.5 zynq_ps
# Apply board preset
apply_bd_automation -rule xilinx.com:bd_rule:processing_system7 \
-config {make_external "FIXED_IO, DDR" Master "Disable" Slave "Disable"} \
[get_bd_cells zynq_ps]
# Enable GP0 master
set_property -dict [list \
CONFIG.PCW_USE_M_AXI_GP0 {1} \
] [get_bd_cells zynq_ps]
# Add R4W IP cores
add_files -norecurse {
vivado/ip/common/axi_lite_slave.v
vivado/ip/common/iq_pack.v
vivado/ip/r4w_fft/r4w_fft.v
vivado/ip/r4w_fft/r4w_fft_axi.v
vivado/ip/r4w_fir/r4w_fir.v
vivado/ip/r4w_fir/r4w_fir_axi.v
vivado/ip/r4w_nco/r4w_nco.v
vivado/ip/r4w_nco/r4w_nco_axi.v
vivado/ip/r4w_chirp_gen/r4w_chirp_gen.v
vivado/ip/r4w_chirp_gen/r4w_chirp_gen_axi.v
vivado/ip/r4w_chirp_corr/r4w_chirp_corr.v
vivado/ip/r4w_chirp_corr/r4w_chirp_corr_axi.v
vivado/ip/r4w_dma/r4w_dma.v
vivado/ip/r4w_dma/r4w_dma_axi.v
}
# Add interconnect
create_bd_cell -type ip -vlnv xilinx.com:ip:axi_interconnect:2.1 axi_interconnect_0
set_property -dict [list CONFIG.NUM_MI {6}] [get_bd_cells axi_interconnect_0]
# Connect PS to interconnect
connect_bd_intf_net [get_bd_intf_pins zynq_ps/M_AXI_GP0] \
[get_bd_intf_pins axi_interconnect_0/S00_AXI]
Recommended address map for Zynq-7000:
| IP Core | Base Address | Size |
|---|---|---|
| r4w_fft | 0x43C0_0000 | 4 KB |
| r4w_fir | 0x43C1_0000 | 4 KB |
| r4w_nco | 0x43C2_0000 | 256 B |
| r4w_chirp_gen | 0x43C3_0000 | 256 B |
| r4w_chirp_corr | 0x43C4_0000 | 256 B |
| r4w_dma | 0x43C5_0000 | 4 KB |
For high-throughput operations, configure AXI DMA:
# Add AXI DMA
create_bd_cell -type ip -vlnv xilinx.com:ip:axi_dma:7.1 axi_dma_0
set_property -dict [list \
CONFIG.c_include_sg {0} \
CONFIG.c_sg_include_stscntrl_strm {0} \
CONFIG.c_include_mm2s {1} \
CONFIG.c_include_s2mm {1} \
CONFIG.c_m_axis_mm2s_tdata_width {32} \
CONFIG.c_s_axis_s2mm_tdata_width {32} \
] [get_bd_cells axi_dma_0]
# Enable HP port for DMA
set_property -dict [list CONFIG.PCW_USE_S_AXI_HP0 {1}] [get_bd_cells zynq_ps]
The Rust driver accesses hardware via memory-mapped I/O:
// Assumes ZynqConfig is exported from the crate root alongside ZynqFpga
use r4w_fpga::{ZynqConfig, ZynqFpga};
// Auto-detect Zynq platform
let fpga = ZynqFpga::auto_detect()?;
// Or with explicit configuration
let config = ZynqConfig {
register_regions: vec![
("fft".to_string(), 0x43C0_0000, 0x1000),
("fir".to_string(), 0x43C1_0000, 0x1000),
("nco".to_string(), 0x43C2_0000, 0x100),
("chirp_gen".to_string(), 0x43C3_0000, 0x100),
("chirp_corr".to_string(), 0x43C4_0000, 0x100),
("dma".to_string(), 0x43C5_0000, 0x1000),
],
enable_dma: true,
dma_buffer_size: 65536,
..Default::default()
};
let fpga = ZynqFpga::new(config)?;
For UIO-based access (recommended over /dev/mem):
/dts-v1/;
/plugin/;
/ {
fragment@0 {
target = <&amba>;
__overlay__ {
r4w_fft@43C00000 {
compatible = "r4w,fft";
reg = <0x43C00000 0x1000>;
interrupt-parent = <&intc>;
interrupts = <0 29 4>;
};
r4w_fir@43C10000 {
compatible = "r4w,fir";
reg = <0x43C10000 0x1000>;
};
r4w_dma@43C50000 {
compatible = "r4w,dma";
reg = <0x43C50000 0x1000>;
interrupt-parent = <&intc>;
interrupts = <0 30 4>;
};
};
};
};
The iCE40 design uses a simpler SPI-based interface:
┌─────────────────────────────────────────────────────────────┐
│ iCE40 FPGA │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ SPI Slave │───►│ Register │───►│ Chirp Gen │ │
│ │ Interface │ │ Decoder │ │ + NCO │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ ▲ │ │
│ │ ▼ │
│ ┌─────┴─────┐ ┌──────────────┐ │
│ │ FTDI │ │ DAC I/Q │ │
│ │ SPI │ │ Output │ │
│ └───────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
Transaction Format:
Command Byte: [R/W][ADDR_HI]
Address Byte: [ADDR_LO]
Data Bytes: [DATA_3][DATA_2][DATA_1][DATA_0]
R/W: 0=Read, 1=Write
ADDR: 16-bit register address
DATA: 32-bit data (MSB first)
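The transaction format can be sketched as a frame builder on the host side. This sketch follows the textual format above and assumes the R/W flag occupies bit 7 of the first byte, with the upper seven address bits in the remaining bits. That leaves a 15-bit address space, and the layout is an assumption rather than a confirmed detail. The timing diagram draws CMD separately from ADDR_H, so the exact header should be checked against the r4w_spi_slave RTL. All names are hypothetical.

```rust
/// R/W flag in the first header byte: 1 = write, 0 = read
/// (bit position is an assumption of this sketch).
pub const SPI_WRITE: u8 = 0x80;

/// Build a write transaction: two header bytes, then 32-bit data MSB first.
/// Returns None if the address does not fit in 15 bits.
pub fn spi_write_frame(addr: u16, data: u32) -> Option<[u8; 6]> {
    if addr > 0x7FFF {
        return None;
    }
    let [hi, lo] = addr.to_be_bytes();
    let [d3, d2, d1, d0] = data.to_be_bytes();
    Some([SPI_WRITE | hi, lo, d3, d2, d1, d0])
}

/// Build the two header bytes of a read transaction (R/W bit cleared).
/// The host then clocks out four more bytes to receive the data.
pub fn spi_read_header(addr: u16) -> Option<[u8; 2]> {
    if addr > 0x7FFF {
        return None;
    }
    Some(addr.to_be_bytes())
}

/// Assemble the four data bytes received on MISO (MSB first) into a word.
pub fn spi_read_data(rx: [u8; 4]) -> u32 {
    u32::from_be_bytes(rx)
}
```

The register addresses used by the Lattice designs are small, so the 15-bit limit imposed by this header layout is not a practical restriction.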
Timing:
CS_N: ──┐ ┌──
└────────────────────────────────────┘
CLK: ____/¯¯\__/¯¯\__/¯¯\__/¯¯\__ ... __/¯¯\__
MOSI: ----<CMD><ADDR_H><ADDR_L><D3><D2><D1><D0>
MISO: --------------------------------<D3><D2><D1><D0> (read)
| Address | Name | Description |
|---|---|---|
| 0x0000 | CTRL | Control register |
| 0x0004 | SF | Spreading factor |
| 0x0008 | STATUS | Status register |
| 0x000C | SYMBOL | Symbol value |
| 0x0010 | NCO_FREQ | NCO frequency word |
| 0x0014 | NCO_PHASE | NCO phase offset |
| 0x0018 | DATA_I | I output (read-only) |
| 0x001C | DATA_Q | Q output (read-only) |
| 0x0020 | ID | 0x52344C49 (“R4LI”) |
| 0x0024 | VERSION | Version number |
The ECP5 version adds:
Using the open-source toolchain:
# iCE40
cd lattice
yosys -p "synth_ice40 -top r4w_top_ice40 -json r4w_ice40.json" \
design/r4w_top_ice40.v ip/r4w_spi_slave/*.v ip/r4w_nco/*.v ip/r4w_chirp_gen/*.v
nextpnr-ice40 --hx8k --package ct256 --json r4w_ice40.json \
--asc r4w_ice40.asc --pcf constraints/ice40_hx8k.pcf
icepack r4w_ice40.asc r4w_ice40.bin
# ECP5
yosys -p "synth_ecp5 -top r4w_top_ecp5 -json r4w_ecp5.json" \
design/r4w_top_ecp5.v ip/r4w_spi_slave/*.v ip/r4w_nco/*.v ip/r4w_chirp_gen/*.v
nextpnr-ecp5 --25k --package CABGA381 --json r4w_ecp5.json \
--lpf constraints/ulx3s_v3.lpf --textcfg r4w_ecp5.config
ecppack r4w_ecp5.config r4w_ecp5.bit
# Create project
create_project r4w_zynq ./r4w_zynq -part xc7z020clg400-1
# Add source files
add_files -norecurse [glob vivado/ip/*/*.v]
add_files -norecurse [glob vivado/ip/common/*.v]
# Add constraints
add_files -fileset constrs_1 -norecurse vivado/constraints/zynq_z020.xdc
# Create block design
source vivado/scripts/create_bd.tcl
# Generate wrapper and add it to the project
# (make_wrapper returns the path of the generated file, so no hard-coded path is needed)
add_files -norecurse [make_wrapper -files [get_files r4w_system.bd] -top]
# Synthesize
launch_runs synth_1 -jobs 8
wait_on_run synth_1
# Implement
launch_runs impl_1 -to_step write_bitstream -jobs 8
wait_on_run impl_1
# vivado/constraints/zynq_z020.xdc
# System clock (from PS)
create_clock -period 10.000 -name clk_100 [get_ports FCLK_CLK0]
# Clock domain crossings (if any)
set_clock_groups -asynchronous \
-group [get_clocks clk_100] \
-group [get_clocks spi_clk]
# False paths for configuration registers
set_false_path -from [get_cells -hierarchical -filter {NAME =~ */ctrl_reg*}]
set_false_path -to [get_cells -hierarchical -filter {NAME =~ */status_*}]
For Zynq xc7z020:
| IP Core | LUTs | FFs | DSP | BRAM |
|---|---|---|---|---|
| r4w_fft (1024pt) | 2,500 | 3,000 | 4 | 4 |
| r4w_fir (64 tap) | 800 | 1,200 | 2 | 0.5 |
| r4w_nco (CORDIC) | 400 | 600 | 0 | 0 |
| r4w_chirp_gen | 600 | 800 | 0 | 0.5 |
| r4w_chirp_corr | 3,000 | 3,500 | 4 | 4 |
| r4w_dma | 500 | 700 | 0 | 2 |
| Total | ~7,800 | ~9,800 | 10 | 11 |
| Available (z020) | 53,200 | 106,400 | 220 | 140 |
| Utilization | 15% | 9% | 5% | 8% |
Each IP core has an associated testbench:
vivado/sim/
├── tb_r4w_fft.v
├── tb_r4w_fir.v
├── tb_r4w_nco.v
├── tb_r4w_chirp_gen.v
├── tb_r4w_chirp_corr.v
└── tb_r4w_dma.v
lattice/sim/
├── tb_r4w_top_ice40.v
└── tb_r4w_top_ecp5.v
Vivado:
# Create simulation fileset
add_files -fileset sim_1 -norecurse [glob vivado/sim/*.v]
# Set top-level
set_property top tb_r4w_fft [get_filesets sim_1]
# Run simulation
launch_simulation
run 100 us
Icarus Verilog (Lattice):
cd lattice
# Compile
iverilog -o sim/tb_r4w_top_ice40.vvp \
sim/tb_r4w_top_ice40.v \
design/r4w_top_ice40.v \
ip/r4w_spi_slave/r4w_spi_slave.v \
ip/r4w_nco/r4w_nco.v \
ip/r4w_chirp_gen/r4w_chirp_gen.v
# Run
vvp sim/tb_r4w_top_ice40.vvp
# View waveforms
gtkwave tb_r4w_top_ice40.vcd
The Rust codebase can generate test vectors for verification:
use std::fs::File;
use std::io::Write;
use r4w_core::dsp::chirp::generate_chirp;
// Generate reference chirp
let reference = generate_chirp(7, true, 125000.0, 500000.0);
// Write to file for Verilog testbench
let mut file = File::create("test_vectors/chirp_sf7_up.mem")?;
for sample in reference {
let i_fixed = (sample.i * 32767.0) as i16;
let q_fixed = (sample.q * 32767.0) as i16;
writeln!(file, "{:04X}{:04X}", i_fixed as u16, q_fixed as u16)?;
}
Software developers use the FpgaAccelerator trait.
Your FPGA implementation must support:
pub trait FpgaAccelerator: Send + Sync {
// Device info
fn info(&self) -> FpgaInfo;
fn is_available(&self) -> bool;
fn capabilities(&self) -> FpgaCapabilities;
// Core DSP
fn fft(&self, samples: &[IQSample], inverse: bool) -> FpgaResult<Vec<IQSample>>;
fn fir_filter(&self, samples: &[IQSample], taps: &[f32]) -> FpgaResult<Vec<IQSample>>;
fn complex_multiply(&self, a: &[IQSample], b: &[IQSample]) -> FpgaResult<Vec<IQSample>>;
// Waveform-specific
fn generate_chirp(&self, sf: u8, upchirp: bool) -> FpgaResult<Vec<IQSample>>;
fn chirp_correlate(&self, samples: &[IQSample], sf: u8) -> FpgaResult<(u32, f32)>;
// Streaming
fn start_stream(&mut self, config: StreamConfig) -> FpgaResult<StreamHandle>;
fn stop_stream(&mut self, handle: StreamHandle) -> FpgaResult<()>;
fn write_stream(&mut self, handle: StreamHandle, samples: &[IQSample]) -> FpgaResult<usize>;
fn read_stream(&mut self, handle: StreamHandle, buffer: &mut [IQSample]) -> FpgaResult<usize>;
// Low-level
fn read_register(&self, address: usize) -> FpgaResult<u32>;
fn write_register(&mut self, address: usize, value: u32) -> FpgaResult<()>;
fn reset(&mut self) -> FpgaResult<()>;
}
Software will query capabilities at startup:
let caps = fpga.capabilities();
println!("Max FFT: {} points", caps.max_fft_size);
println!("Max FIR: {} taps", caps.max_fir_taps);
println!("DMA: {}", if caps.supports_streaming { "yes" } else { "no" });
for core in &caps.ip_cores {
println!("IP Core: {} v{} at 0x{:08X}",
core.name, core.version, core.base_address);
}
Hardware errors should map to these Rust error types:
pub enum FpgaError {
DeviceNotFound(String),
PlatformNotSupported,
MmapFailed { address: usize, reason: String },
Timeout { address: usize, timeout_ms: u32 },
DmaError(String),
StreamError(String),
IpCoreNotAvailable(String),
ConfigError(String),
BufferSizeMismatch { expected: usize, actual: usize },
InvalidAddress(usize),
NotSupported(String),
}
| Metric | Target | Notes |
|---|---|---|
| FFT (1024pt) | < 10 μs | At 100 MHz fabric clock |
| FIR (per sample) | < 100 ns | Pipelined, 64 taps |
| Chirp generation | < 1 ms | For SF12 (4096 samples) |
| DMA transfer | > 100 MS/s | Using HP ports |
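The latency targets translate directly into clock-cycle budgets, which is usually the more useful number when planning a pipeline. The sketch below does that arithmetic in integer nanoseconds to avoid floating-point rounding; both function names are hypothetical.

```rust
/// Number of whole clock cycles available within a latency target.
pub fn cycle_budget(target_ns: u64, clock_hz: u64) -> u64 {
    target_ns * clock_hz / 1_000_000_000
}

/// Minimum sustained sample rate (samples per second) needed to
/// produce `samples` samples within `target_ns` nanoseconds.
pub fn required_rate_sps(samples: u64, target_ns: u64) -> u64 {
    samples * 1_000_000_000 / target_ns
}
```

At the 100 MHz fabric clock, the 10 μs FFT target leaves 1000 cycles and the 100 ns FIR target leaves 10 cycles per sample. Since 1000 cycles is fewer than the 1024 input samples, the FFT figure presumably refers to latency after the input is loaded, or implies processing more than one sample per cycle. The chirp target for SF12 works out to at least 4.096 MS/s.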
For maximum throughput, implement pipelining:
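// Pipeline stages for FIR filter (Q channel shown; I uses s_axis_tdata[31:16])
// Assumes `integer i;` and the sample_d/mult_result/sum_* arrays are declared at module scope
always @(posedge clk) begin
    // Stage 1: Input register and tap delay line
    if (s_axis_tvalid) begin
        sample_d[0] <= s_axis_tdata[15:0];
        for (i = 1; i < NUM_TAPS; i = i + 1) begin
            sample_d[i] <= sample_d[i-1];
        end
    end
    valid_d1 <= s_axis_tvalid;
    // Stage 2: Multiply each delayed sample by its coefficient
    for (i = 0; i < NUM_TAPS; i = i + 1) begin
        mult_result[i] <= $signed(sample_d[i]) * $signed(coef[i]);
    end
    valid_d2 <= valid_d1;
    // Stage 3: Accumulate (tree structure)
    sum_l1[0] <= mult_result[0] + mult_result[1];
    sum_l1[1] <= mult_result[2] + mult_result[3];
    // ...
    valid_d3 <= valid_d2;
    // Stage 4: Final sum (`output` is a reserved word in Verilog, so the register is named fir_out)
    fir_out <= sum_l2[0] + sum_l2[1];
    fir_out_valid <= valid_d3;
end
For area-constrained designs, consider time-multiplexing: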
// Shared multiplier for NCO and FIR
always @(posedge clk) begin
case (state)
STATE_NCO: begin
mult_a <= phase_sin;
mult_b <= amplitude;
end
STATE_FIR: begin
mult_a <= sample_in;
mult_b <= current_tap;
end
endcase
end
assign mult_result = mult_a * mult_b;
//-----------------------------------------------------------------------------
// R4W New IP Core with AXI-Lite Interface
//-----------------------------------------------------------------------------
module r4w_newcore_axi #(
parameter C_S_AXI_DATA_WIDTH = 32,
parameter C_S_AXI_ADDR_WIDTH = 8
)(
// AXI4-Lite Interface
input wire S_AXI_ACLK,
input wire S_AXI_ARESETN,
// ... (standard AXI-Lite ports)
// Optional AXI-Stream ports
input wire [31:0] S_AXIS_TDATA,
input wire S_AXIS_TVALID,
output wire S_AXIS_TREADY,
output wire [31:0] M_AXIS_TDATA,
output wire M_AXIS_TVALID,
input wire M_AXIS_TREADY
);
// Standard constants
localparam IP_ID = 32'h5234xxxx; // "R4xx" pattern
localparam IP_VERSION = 32'h00010000; // 1.0.0
// Register addresses
localparam ADDR_CTRL = 8'h00;
localparam ADDR_CONFIG = 8'h04;
localparam ADDR_STATUS = 8'h08;
localparam ADDR_VERSION = 8'h1C;
localparam ADDR_ID = 8'h20;
// Control registers
reg [31:0] ctrl_reg;
wire start = ctrl_reg[0];
wire soft_reset = ctrl_reg[31];
// Instantiate common AXI slave
axi_lite_slave #(
.C_S_AXI_DATA_WIDTH(C_S_AXI_DATA_WIDTH),
.C_S_AXI_ADDR_WIDTH(C_S_AXI_ADDR_WIDTH)
) axi_slave ( /* ... */ );
// Your processing logic here
endmodule
To expose your new IP core to software:
crates/r4w-fpga/src/types.rs:
pub enum IpCoreType {
Fft { size: usize },
Fir { max_taps: usize },
Nco,
ChirpGenerator,
ChirpCorrelator,
Dma,
NewCore { /* your params */ }, // Add here
}
fn probe_newcore(regions: &[MemoryRegion], config: &Config) -> Option<IpCore> {
let base = config.newcore_base_addr?;
for region in regions {
if region.contains(base) {
if let Ok(id) = region.read32(base + 0x20) {
if id == 0x5234xxxx { // Your IP ID
return Some(IpCore {
name: "r4w_newcore".to_string(),
core_type: IpCoreType::NewCore { /* params */ },
base_address: base,
// ...
});
}
}
}
}
None
}
| Issue | Symptom | Solution |
|---|---|---|
| No response from IP | Read returns 0xFFFFFFFF | Check address mapping, clock, reset |
| Wrong data | Incorrect results | Verify data format, endianness |
| Timeout | DONE never asserts | Check state machine, enable signals |
| DMA failure | Transfer incomplete | Verify buffer alignment, length |
| Timing failure | Unpredictable behavior | Add pipeline stages, reduce clock |
Insert Integrated Logic Analyzer for runtime debugging:
# Create ILA core
create_debug_core u_ila_0 ila
# Note: the ILA clock input (u_ila_0/clk) must also be connected to the fabric clock net
# Connect probes (probe0 exists by default; further probes must be created first)
set_property port_width 32 [get_debug_ports u_ila_0/probe0]
connect_debug_port u_ila_0/probe0 [get_nets [list {r4w_fft/ctrl_reg[*]}]]
create_debug_port u_ila_0 probe
set_property port_width 32 [get_debug_ports u_ila_0/probe1]
connect_debug_port u_ila_0/probe1 [get_nets [list {r4w_fft/status_reg[*]}]]
# Add trigger
create_debug_port u_ila_0 probe
set_property port_width 1 [get_debug_ports u_ila_0/probe2]
connect_debug_port u_ila_0/probe2 [get_nets [list r4w_fft/start]]
Software can dump all registers for debugging:
fn dump_registers(fpga: &dyn FpgaAccelerator, base: usize, count: usize) {
println!("Register dump at 0x{:08X}:", base);
for i in 0..count {
let addr = base + i * 4;
match fpga.read_register(addr) {
Ok(val) => println!(" 0x{:03X}: 0x{:08X}", i * 4, val),
Err(e) => println!(" 0x{:03X}: ERROR - {:?}", i * 4, e),
}
}
}
Use LEDs for quick status indication:
// iCE40/ECP5 status LEDs
assign led[0] = pll_locked; // Green = clock OK
assign led[1] = chirp_busy; // Activity indicator
assign led[2] = chirp_done; // Completion indicator
assign led[3] = !spi_cs_n; // SPI activity
assign led[7:4] = sf[3:0]; // Current SF
Zynq xc7z020:
| Resource | Available | Used (Typical) | Remaining |
|---|---|---|---|
| LUTs | 53,200 | ~8,000 (15%) | 45,200 |
| FFs | 106,400 | ~10,000 (9%) | 96,400 |
| DSP48E1 | 220 | 10 (5%) | 210 |
| BRAM | 140 (36Kb) | 11 (8%) | 129 |
Lattice iCE40 (HX8K/LP8K):
| Resource | Available | Used (Typical) | Remaining |
|---|---|---|---|
| LCs | 7,680 | ~3,500 (46%) | 4,180 |
| FFs | 7,680 | ~2,000 (26%) | 5,680 |
| BRAM | 32 (4Kb) | 4 (13%) | 28 |
| Configuration | LUTs | DSP | Notes |
|---|---|---|---|
| Minimal (NCO + chirp only) | 1,000 | 0 | Good for iCE40 |
| Standard (FFT + FIR + chirp) | 5,000 | 6 | Typical deployment |
| Full (all cores + DMA) | 8,000 | 10 | Full acceleration |
| Extended (multi-FFT) | 15,000 | 16 | High throughput |
┌─────────────────────────────────────────────────────────────────────────┐
│ Development Workflow │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. Software Developer FPGA Developer │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Define │ │ Review │ │
│ │ Requirements │──────────────►│ Feasibility │ │
│ └─────────────────┘ └─────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Design & │ │
│ │ Implement │ │
│ └─────────────────┘ │
│ │ │
│ ┌─────────────────┐ ┌──────┴──────────┐ │
│ │ Write Driver │◄───────────────│ Provide │ │
│ │ Code │ │ Register Map │ │
│ └─────────────────┘ └─────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Generate │──────────────►│ Verify with │ │
│ │ Test Vectors │ │ Test Vectors │ │
│ └─────────────────┘ └─────────────────┘ │
│ │ │ │
│ └──────────────┬───────────────────┘ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Integration │ │
│ │ Testing │ │
│ └─────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Standard format for exchanging test data:
# test_vectors/fft_1024_input.csv
# Format: I,Q (floating point -1.0 to 1.0)
0.5,0.0
0.353553,0.353553
0.0,0.5
-0.353553,0.353553
...
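The CSV exchange format carries floating-point I,Q pairs, while the Verilog testbench loads packed hex words with `$readmemh`, so a conversion step sits between the two. Here is a minimal sketch of that step. It uses the same truncating ×32767 scaling as the test-vector generator shown earlier; the function name and the clamping of out-of-range values are assumptions of this sketch.

```rust
/// Convert one CSV line ("I,Q" as floats in -1.0..=1.0) into an 8-digit
/// hex word with I in the upper 16 bits and Q in the lower 16 bits.
/// Returns None for comments, blank lines, and anything that does not parse.
pub fn csv_line_to_mem(line: &str) -> Option<String> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    let (i_str, q_str) = line.split_once(',')?;
    let i: f32 = i_str.trim().parse().ok()?;
    let q: f32 = q_str.trim().parse().ok()?;
    // Truncating conversion to signed 16-bit, clamped to the valid input range.
    let to_fixed = |x: f32| (x.clamp(-1.0, 1.0) * 32767.0) as i16;
    Some(format!("{:04X}{:04X}", to_fixed(i) as u16, to_fixed(q) as u16))
}
```

Applying this to every line of the CSV file and writing the `Some` results one per line produces a file that `$readmemh` can load into a 32-bit-wide memory.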
// Load in testbench
initial begin
$readmemh("fft_1024_input.mem", input_memory);
end
For development without hardware, software uses the SimulatedFpga:
#[cfg(feature = "sim")]
let fpga: Box<dyn FpgaAccelerator> = Box::new(SimulatedFpga::new());
#[cfg(not(feature = "sim"))]
let fpga: Box<dyn FpgaAccelerator> = Box::new(ZynqFpga::auto_detect()?);
// Same API works for both
let result = fpga.fft(&samples, false)?;
This allows software developers to work without FPGA hardware while you develop the bitstream.
vivado/
├── ip/
│ ├── common/
│ │ ├── axi_lite_slave.v # Reusable AXI-Lite slave
│ │ └── iq_pack.v # I/Q packing/unpacking
│ ├── r4w_fft/
│ │ ├── r4w_fft.v # FFT core logic
│ │ └── r4w_fft_axi.v # AXI wrapper
│ ├── r4w_fir/
│ ├── r4w_nco/
│ ├── r4w_chirp_gen/
│ ├── r4w_chirp_corr/
│ └── r4w_dma/
├── sim/
│ ├── tb_r4w_fft.v
│ └── ...
├── scripts/
│ └── create_bd.tcl
└── constraints/
├── zynq_z020.xdc
└── zynq_ultra.xdc
lattice/
├── ip/
│ ├── r4w_spi_slave/
│ ├── r4w_nco/
│ └── r4w_chirp_gen/
├── design/
│ ├── r4w_top_ice40.v
│ └── r4w_top_ecp5.v
├── sim/
│ └── tb_r4w_top_ice40.v
└── constraints/
├── ice40_hx8k.pcf
└── ulx3s_v3.lpf
crates/r4w-fpga/
├── src/
│ ├── lib.rs
│ ├── traits.rs # FpgaAccelerator trait
│ ├── types.rs # FpgaInfo, IpCore, etc.
│ ├── error.rs # FpgaError enum
│ ├── zynq/
│ │ ├── mod.rs # ZynqFpga implementation
│ │ ├── config.rs
│ │ ├── dma.rs
│ │ ├── mmap.rs
│ │ └── registers.rs
│ ├── lattice/
│ │ ├── mod.rs # LatticeFpga implementation
│ │ ├── spi.rs
│ │ └── ftdi.rs
│ └── sim/
│ └── mod.rs # SimulatedFpga
└── Cargo.toml
| Core | ID (Hex) | ID (ASCII) |
|---|---|---|
| FFT | 0x52345746 | “R4WF” |
| FIR | 0x52344649 | “R4FI” |
| NCO | 0x5234494F | “R4IO” |
| Chirp Gen | 0x52344347 | “R4CG” |
| Chirp Corr | 0x52344343 | “R4CC” |
| DMA | 0x52344D41 | “R4MA” |
| Lattice iCE40 | 0x52344C49 | “R4LI” |
| Lattice ECP5 | 0x52344C45 | “R4LE” |
| Register | Bit | Meaning |
|---|---|---|
| CTRL | 0 | START |
| CTRL | 31 | SOFT_RESET |
| STATUS | 0 | DONE |
| STATUS | 1 | BUSY |
| STATUS | 2 | ERROR |
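These common bits are enough to write a generic completion poll that gives up after a deadline, so a stuck state machine shows up as a timeout instead of a hung process. The sketch below takes the register read as a closure so it does not depend on any particular driver; `wait_done` and `WaitError` are hypothetical names, and the bit positions apply only to cores that follow the common layout.

```rust
use std::time::{Duration, Instant};

pub const STATUS_DONE: u32 = 1 << 0;
pub const STATUS_BUSY: u32 = 1 << 1;
pub const STATUS_ERROR: u32 = 1 << 2;

/// Outcome of an unsuccessful poll (hypothetical error type).
#[derive(Debug, PartialEq, Eq)]
pub enum WaitError {
    /// The core set ERROR; the raw STATUS value is attached.
    CoreError(u32),
    /// DONE did not assert before the deadline.
    Timeout,
}

/// Poll STATUS through `read_status` until DONE or ERROR is set,
/// or until `timeout` has elapsed. Returns the final STATUS value on success.
pub fn wait_done(
    mut read_status: impl FnMut() -> u32,
    timeout: Duration,
) -> Result<u32, WaitError> {
    let deadline = Instant::now() + timeout;
    loop {
        let status = read_status();
        if (status & STATUS_ERROR) != 0 {
            return Err(WaitError::CoreError(status));
        }
        if (status & STATUS_DONE) != 0 {
            return Ok(status);
        }
        if Instant::now() >= deadline {
            return Err(WaitError::Timeout);
        }
        std::hint::spin_loop();
    }
}
```

With a real driver the closure would wrap a read of the STATUS offset, for example a read of `base + 0x08` on the FFT core.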
# Check Zynq platform
cat /sys/firmware/devicetree/base/compatible
# Read register (devmem)
devmem2 0x43C00020 w # Read FFT ID
# Program iCE40
iceprog r4w_ice40.bin
# Program ECP5
ecpprog r4w_ecp5.bit
| Version | Date | Changes |
|---|---|---|
| 1.0 | 2025-01-01 | Initial release |
For questions or contributions, please open an issue on the R4W GitHub repository.