7.5 Verilog FFT Design

Category Verilog Tutorial

FFT (Fast Fourier Transform) is an efficient algorithm for DFT (Discrete Fourier Transform). It plays an irreplaceable role in digital processing methods based on time-frequency transformation analysis.

FFT Principle

Formula Derivation

The computation formula for DFT is:

Among which,

By breaking down the Discrete Fourier Transform formula into odd and even terms, the first N/2 points can be expressed as:

Similarly, the last N/2 points can be expressed as:

It can be seen that the values of the last N/2 points can be completely determined by the intermediate values calculated during the first N/2 points. Continuing the odd-even decomposition of A[k] and B[k] until it becomes a 2-point DFT, thus avoiding a lot of redundant calculations, the process of Fast Discrete Fourier Transform (FFT) is realized.

Algorithm Structure

The structural diagram of the 8-point FFT calculation is as follows.

From the diagram, it can be seen that only a few simple multiplications and additions are needed to complete the Discrete Fourier Transform process, instead of multiplying and accumulating each data point.

Important Features

(1) The concept of levels

Each division is called one level of operation.

Assuming the FFT operation has N points, there are M levels of operations, which satisfy:

Each level of operation is labeled as m = 0, 1, 2, ..., M-1.

To facilitate division and calculation, the number of points N in FFT often takes the integer power of 2.

(2) Butterfly Unit

The FFT calculation structure is composed of several butterfly operation units, and the schematic of each unit is as follows:

The input and output of the butterfly unit satisfy:

Among them,

Each butterfly unit performs one multiplication and two additions during operation.

There are N/2 butterfly units in each level.

Therefore, the number of multiplications and additions required to complete an FFT is:

(3) The concept of groups

Each level of N/2 butterfly units can be divided into several groups, each with the same structure and

For example, when m=0, it can be divided into N/2=4 groups.

When m=1, it can be divided into N/4=2 groups.

When m=M-1, it can only be divided into 1 group.

(4)

In the second level operation of the 8-point FFT, that is, m=1, the butterfly operation factor can be simplified to:

(5) Bit-reversal

For the FFT calculation of N=8 points, the positions corresponding to X(0) ~ X(7) are in binary code as follows:

X(000), X(001), X(010), X(011), X(100), X(101), X(110), X(111)

By reversing the binary code of the positions:

X(000), X(100), X(010), X(110), X(001), X(101), X(011), X(111)

The corresponding decimal positions are:

X(0), X(4), X(2), X(6), X(1), X(5), X(3), X(7)

This corresponds exactly to the order of the first-level input data of the FFT.

This feature is conducive to the programming implementation of FFT.

FFT Design

Design Description

To illustrate the transformation process of FFT with a simple simulation, the number of data points is taken as a smaller value of 8.

If the data is serially input, it needs to be cached first, so the data input mode is designed to be parallel.

The data input is divided into two parts, real and imaginary, so the calculation result is also divided into real and imaginary parts.

The design adopts a pipeline structure and does not consider the consumption of resources for the time being.

To simplify the design structure, a compromise is made here, and the multiplication calculation directly uses the multiplication sign. If the FFT design is applied to reality, the multiplication structure must be replaced with a multiplier that can be pipelined, or use the more efficient multiplier IP provided by the official.

Butterfly Unit Design

The butterfly unit is a fixed-point operation, and the rotation factor needs to be quantified in fixed-point.

With the help of MATLAB, the rotation factor is increased by 8192 times (left-shifted by 13 bits), which can be referred to in the appendix.

To prevent the multiplication and addition in the butterfly operation from causing the bit width to increase step by step, the output data needs to be fixed-width truncation after each level of operation, which can yq_real_r <= xp_real_d1 - xq_wnr_real; yq_imag_r <= xp_imag_d1 - xq_wnr_imag; end end

//(3) discard the low 13bits because of Wnr assign yp_real = {yp_real_r[39], yp_real_r[13+23:13]}; assign yp_imag = {yp_imag_r[39], yp_imag_r[13+23:13]}; assign yq_real = {yq_real_r[39], yq_real_r[13+23:13]}; assign yq_imag = {yq_imag_r[39], yq_imag_r[13+23:13]}; assign valid = en_r[2];

endmodule


**Top-Level Instantiation**

According to the structural diagram of the FFT algorithm, instantiate the butterfly unit to complete the final FFT function.

You can manually instantiate the 12 butterfly units according to the input-output correspondence of each level of butterfly units, or you can use generate for instantiation. In this case, you need to be very familiar with the characteristics of "groups" and "levels" in FFT:

-

(1) For an 8-point FFT design, 3 levels of operations are required, with 4 butterfly units at each level, and the number of groups at each level are 4, 2, and 1, respectively.

-

(2) The distance between the two input ports of a butterfly unit within a group at each level is always

-

(3) The distance between the first input port of the first butterfly unit between adjacent groups at each level is

The instantiation code is as follows.

Here, the one-dimensional and two-dimensional addresses of the matrix signal xm_real (xm_imag) represent the identifiers of the level and group.

When judging the connection relationship between signal ports, a seemingly complex judgment logic is used, and there is also a multiplication sign. In fact, the circuit finally generated is exactly the same as the method of manually writing code to instantiate 12 butterfly units. Because the variables in generate are only auxiliary to generate the actual circuit, and the calculation and judgment of related values have been completed during compilation. These variables will not generate actual circuits, but provide a method for faster module instantiation.

## Example

timescale 1ns/100ps module fft8 ( input clk, input rstn, input en,

input signed [23:0] x0_real, input signed [23:0] x0_imag, input signed [23:0] x1_real, input signed [23:0] x1_imag, input signed [23:0] x2_real, input signed [23:0] x2_imag, input signed [23:0] x3_real, input signed [23:0] x3_imag, input signed [23:0] x4_real, input signed [23:0] x4_imag, input signed [23:0] x5_real, input signed [23:0] x5_imag, input signed [23:0] x6_real, input signed [23:0] x6_imag, input signed [23:0] x7_real, input signed [23:0] x7_imag,

output valid, output signed [23:0] y0_real, output signed [23:0] y0_imag, output signed [23:0] y1_real, output signed [23:0] y1_imag, output signed [23:0] y2_real, output signed [23:0] y2_imag, output signed [23:0] y3_real, output signed [23:0] y3_imag, output signed [23:0] y4_real, output signed [23:0] y4_imag, output signed [23:0] y5_real, output signed [23:0] y5_imag, output signed [23:0] y6_real, output signed [23:0] y6_imag, output signed [23:0] y7_real, output signed [23:0] This is a Chinese to English translation, please provide the English translation for this text. Do not provide any explanations or text apart from the translation. Chinese: (k[3:m] << (m+1)) + (k[m:0]-(1<<m))) + (1<<m) ],

.factor_real(factor_real[k[m:0]<(1<<m)? k[m:0] : k[m:0]-(1<<m) ]), .factor_imag(factor_imag[k[m:0]<(1<<m)? k[m:0] : k[m:0]-(1<<m) ]),

//output data .valid (en_connect[ (m+1)*4 + k ] ), .yp_real (xm_real[ m+1 ][k[m:0] < (1<<m) ? (k[3:m] << (m+1)) + k[m:0] : (k[3:m] << (m+1)) + (k[m:0]-(1<<m))] ), .yp_imag (xm_imag[ m+1 ][(k[m:0]) < (1<<m) ? (k[3:m] << (m+1)) + k[m:0] : (k[3:m] << (m+1)) + (k[m:0]-(1<<m))] ), .yq_real (xm_real[ m+1 ][(k[m:0] < (1<<m) ? (k[3:m] << (m+1)) + k[m:0] : (k[3:m] << (m+1)) + (k[m:0]-(1<<m))) + (1<<m) ]), .yq_imag (xm_imag[ m+1 ][((k[m:0]) < (1<<m) ? (k[3:m] << (m+1)) + k[m:0] : (k[3:m] << (m+1)) + (k[m:0]-(1<<m))) + (1<<m) ]) ); end end endgenerate

assign valid = en_connect[12]; assign y0_real = xm_real[3][0] ; assign y0_imag = xm_imag[3][0] ; assign y1_real = xm_real[3][1] ; assign y1_imag = xm_imag[3][1] ; assign y2_real = xm_real[3][2] ; assign y2_imag = xm_imag[3][2] ; assign y3_real = xm_real[3][3] ; assign y3_imag = xm_imag[3][3] ; assign y4_real = xm_real[3][4] ; assign y4_imag = xm_imag[3][4] ; assign y5_real = xm_real[3][5] ; assign y5_imag = xm_imag[3][5] ; assign y6_real = xm_real[3][6] ; assign y6_imag = xm_imag[3][6] ; assign y7_real = xm_real[3][7] ; assign y7_imag = xm_imag[3][7] ;

endmodule


**testbench**

The testbench is written as follows, mainly for the continuous input of 16 real and complex data streams. Since each FFT only has 8 data points, the data fed in is quite arbitrary and is not regular data such as sine waves.

## Example

`timescale 1ns/100ps module test ; reg clk; reg rstn; reg en ;

reg signed [23:0] x0_real; reg signed [23:0] x0_imag; reg signed [23:0] x1_real; reg signed [23:0] x1_imag; reg signed [23:0] x2_real; reg signed [23:0] x2_imag; reg signed [23:0] x3_real; reg signed [23: x1_real = (x1_real > 22'h3F_ffff) ? 'b0 : x1_real + 1; x2_real = (x2_real > 22'h3F_ffff) ? 'b0 : x2_real + 31; x3_real = (x3_real > 22'h3F_ffff) ? 'b0 : x3_real + 1; x4_real = (x4_real > 22'h3F_ffff) ? 'b0 : x4_real + 23; x5_real = (x5_real > 22'h3F_ffff) ? 'b0 : x5_real + 1; x6_real = (x6_real > 22'h3F_ffff) ? 'b0 : x6_real + 6; x7_real = (x7_real > 22'h3F_ffff) ? 'b0 : x7_real + 1;

x0_imag = (x0_imag > 22'h3F_ffff) ? 'b0 : x0_imag + 2; x1_imag = (x1_imag > 22'h3F_ffff) ? 'b0 : x1_imag + 5; x2_imag = (x2_imag > 22'h3F_ffff) ? 'b0 : x2_imag + 3; x3_imag = (x3_imag > 22'h3F_ffff) ? 'b0 : x3_imag + 6; x4_imag = (x4_imag > 22'h3F_ffff) ? 'b0 : x4_imag + 4; x5_imag = (x5_imag > 22'h3F_ffff) ? 'b0 : x5_imag + 8; x6_imag = (x6_imag > 22'h3F_ffff) ? 'b0 : x6_imag + 11; x7_imag = (x7_imag > 22'h3F_ffff) ? 'b0 : x7_imag + 7; end end

//finish simulation initial #1000 $finish; endmodule

Simulation Results

It can be roughly seen that the FFT results can be output in a pipeline manner.

Using MATLAB's built-in FFT function to calculate the FFT of the same data, the results of the first two sets of data are as follows.

It can be seen that when the first input signal has only the real part effective, the FFT results are exactly the same.

However, when the second input also has a signal in the imaginary part, the results between the two begin to have errors, and sometimes the errors are quite large.

Simulating the FFT process implemented in Verilog with MATLAB, it is found that the FFT results of this process are basically consistent with the FFT results implemented in Verilog.

The two FFT results with errors are compared by taking the absolute value, as shown in the figure below.

It can be seen that the trend of the FFT results is roughly the same, but there are visible errors at some points.

Design Summary:

As mentioned when designing the butterfly unit, the choice of bit width when quantizing the rotation factor will introduce errors.

Moreover, the results of the operation of each butterfly unit will be truncated, which will also introduce errors.

When MATLAB calculates FFT, there is no need to consider the issue of precision, and it performs FFT calculations on the data with the highest precision.

The above will all lead to the difference in the results of the two FFT calculation methods.

Interested scholars can increase the bit width of the rotation factor and the input data, and increase the number of FFT points, and then perform simulation comparison again, the relative error should be reduced.

Appendix: MATLAB Usage

Generating Rotation Factors

An 8-point FFT only needs to use 4 rotation factors. The expansion multiple of the rotation factor is 8192.

Example

``` clear all;close all;clc; %======================================================= % Wnr calcuting %======================================================= for r = 0:3 Wnr_factor = cos(pi/4*r) - j*sin(pi/4*r) ; Wnr_integer = floor(Wnr_factor * 2^13) ;

if (real(Wnr_integer)<0) Wnr_real = real(Wnr_integer) + 2^16 ; %Two's complement for negative numbers else Wnr_real = real(Wnr_integer) ; end if (imag(Wnr_integer)<0) Wnr_imag = imag(Wnr_integer) + 2^16 ; else Wnr_imag = imag(Wnr_integer); end

Follow on WeChat

❮ C Int2Str Windows10 Java Setup ❯