4.4 Verilog FIFO Design
Category Advanced Verilog Tutorial
FIFO (First In First Out) is a memory commonly used for asynchronous data transfer. Data leaves it in the same order it entered: first in, first out (equivalently, last in, last out). In practice, asynchronous transfer of wide multi-bit data between clock domains, whether from a fast clock domain to a slow one or the other way round, can be handled with a FIFO.
FIFO Principles
Operation Process
After reset, data is written into the FIFO under the control of the write clock and the status signals. The RAM write address starts at 0 and increments by one after each write, pointing to the next storage location. When the FIFO is full, no more data may be written; otherwise unread data would be overwritten and lost.
As long as the FIFO is not empty, data can be read from it under the control of the read clock and the status signals. The RAM read address starts at 0 and increments by one after each read, pointing to the next storage location. When the FIFO is empty, no more data may be read; otherwise the data read out would be invalid.
The storage element of the FIFO is a dual-port RAM, which allows reads and writes to happen at the same time. A typical asynchronous FIFO structure is shown below. The ports and internal signals are explained along with the code.
Read and Write Moments
Regarding write moments, as long as the FIFO is not full, write operations can be performed; if the FIFO is full, further writing is prohibited.
Regarding read moments, as long as the FIFO is not empty, read operations can be performed; if the FIFO is empty, further reading is prohibited.
In any case, when the FIFO is read and written simultaneously over a sustained period, the average write rate must not exceed the read rate; otherwise the FIFO eventually fills up and the writer has to stall.
Empty State
Upon initial reset, the FIFO contains no data, and the empty status signal is active. When data is written into the FIFO, the empty status signal is pulled low and becomes inactive. When the read data address catches up with the write address, i.e., both addresses are equal, the FIFO is in an empty state.
Because the FIFO is asynchronous, the write address must first be synchronized into the read clock domain through register stages before the two addresses can be compared, and this takes a few clock cycles. The empty flag is therefore not real-time but lags the actual state. If new data is written into the FIFO during this delay, the empty flag can still be asserted even though the FIFO actually contains data.
Strictly speaking, the empty flag is then wrong. However, its only purpose is to prevent reading from an empty FIFO. Asserting it while data is present is a conservative (early) decision: stopping reads at that moment is always safe, and reading resumes once the updated write address arrives. The design is therefore sound in practice.
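The synchronization step described above can be sketched as a plain two-flop synchronizer. This is a minimal illustration, not the attached source: the module name `sync2` and parameter `W` are hypothetical. It shows where the delay of the empty flag comes from: a pointer value presented at `d` appears at `q` only after two edges of the destination clock.

```verilog
// Two-flop synchronizer (hypothetical module name sync2).
// Each bit of d is registered twice in the destination clock domain,
// so q lags d by two destination clock edges. This latency is why the
// empty and full flags are not real-time.
module sync2 #(parameter W = 4) (
    input              clk,    // destination clock
    input              rstn,   // active-low asynchronous reset
    input      [W-1:0] d,      // Gray-coded pointer from the source domain
    output reg [W-1:0] q       // synchronized pointer
);
    reg [W-1:0] meta;          // first stage, may go metastable in hardware
    always @(posedge clk or negedge rstn) begin
        if (!rstn) begin
            meta <= {W{1'b0}};
            q    <= {W{1'b0}};
        end
        else begin
            meta <= d;         // stage 1 samples the asynchronous input
            q    <= meta;      // stage 2 gives stage 1 a full cycle to settle
        end
    end
endmodule
```

The input is expected to be Gray coded, so that at most one bit is changing whenever it is sampled.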
Full State
Upon initial reset, the FIFO contains no data and the full flag is inactive. When data is written and reads are absent or relatively slow, the full flag is asserted once the write address leads the read address by one FIFO depth. At that moment the write and read addresses are equal again, exactly as in the empty state, but the meaning is the opposite.
To distinguish the two cases, one extra bit is usually appended to both the read and write addresses as an extension bit. When the addresses and the extension bits are all identical, the numbers of reads and writes are equal and the FIFO is empty. When the addresses are identical but the extension bits differ, the writes lead the reads by exactly one FIFO depth and the FIFO is full. This holds only as long as reading from an empty FIFO and writing to a full FIFO are both prohibited.
Similarly, because the read address reaches the write clock domain only after synchronization, the full flag is not real-time either. It errs in the same conservative direction: it may stay asserted briefly after space has been freed, and refraining from writing during that time does not affect the correctness of the application.
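The extension-bit rule fits in a few lines. This minimal sketch assumes both pointers are binary and already in the same clock domain (synchronization is omitted); the module name `ptr_flags` and the parameter `AW` (log2 of the depth) are hypothetical.

```verilog
// Empty/full decision from binary pointers with one extension (MSB) bit.
// Hypothetical module; both pointers are assumed to be in the same clock
// domain already, so no synchronization is shown.
module ptr_flags #(parameter AW = 3) (   // FIFO depth = 2**AW entries
    input  [AW:0] wptr,   // {extension bit, write address}
    input  [AW:0] rptr,   // {extension bit, read address}
    output        empty,
    output        full
);
    // Same extension bit and same address: as many reads as writes
    assign empty = (wptr == rptr);
    // Opposite extension bits, same address: writes lead by one full depth
    assign full  = (wptr[AW] != rptr[AW]) &&
                   (wptr[AW-1:0] == rptr[AW-1:0]);
endmodule
```

With an 8-entry FIFO (AW = 3), 19 writes and 11 reads give wptr = 4'b0_011 and rptr = 4'b1_011: same address, opposite extension bits, so the FIFO holds 8 entries and is full.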
FIFO Design
Design Requirements
To design a FIFO applicable to various scenarios, the following requirements are proposed:
(1) The FIFO depth and width are parameterized. Besides the empty and full flags, there is a programmable full flag, which is asserted when the amount of data in the FIFO reaches a configurable threshold.
(2) The input and output data widths may differ, but the total capacity seen from the write side must match that seen from the read side. For example, with an 8-bit write width and a 6-bit write address (64 entries, 512 bits), a 32-bit read width requires a 4-bit read address (16 entries, 512 bits).
(3) The FIFO is asynchronous: the read and write control signals come from different clock domains. Before the empty and full flags are generated, the read and write addresses are converted to Gray code and synchronized into the opposite clock domain. Because consecutive Gray codes differ in only one bit, a pointer sampled while it is changing resolves to either its old or its new value rather than an arbitrary mix of bits. The conversion between Gray code and binary is shown in the following diagram.
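The conversion can also be written directly in Verilog. This is a minimal sketch with a hypothetical module name, `gray_conv`. The encoder is the same shift-and-XOR used for the pointers in the FIFO code below; the decoder uses XOR reductions, an equivalent formulation of the bit-by-bit loop used there.

```verilog
// Binary <-> Gray code conversion (hypothetical module gray_conv).
module gray_conv #(parameter W = 4) (
    input  [W-1:0] bin_in,    // binary value to encode
    output [W-1:0] gray_out,  // Gray code of bin_in
    input  [W-1:0] gray_in,   // Gray code to decode
    output [W-1:0] bin_out    // binary value of gray_in
);
    // Encode: each Gray bit is the XOR of two adjacent binary bits
    assign gray_out = (bin_in >> 1) ^ bin_in;

    // Decode: binary bit k is the XOR of all Gray bits from the MSB down to k
    genvar k;
    generate
        for (k = 0; k < W; k = k + 1) begin : g_g2b
            assign bin_out[k] = ^gray_in[W-1:k];
        end
    endgenerate
endmodule
```

For W = 4, binary 2 (0010) encodes to 0011, and the sequence wraps from 15 (Gray 1000) back to 0 (Gray 0000) with a single bit change, which is what makes the extended pointer safe to synchronize.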
Dual-Port RAM Design
The RAM port parameters are configurable, and the read and write widths may differ. It is recommended to declare the memory array with the longer address width and the narrower data width, so that each wide word occupies several consecutive narrow entries that can be accessed by simple indexing.
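Assuming, as in every configuration in this section, that the width ratio is a power of two, the capacity rule from requirement (2) and the resulting index mapping can be written as follows. The names match the ramdp parameters, for the case where the read side is wider:

```latex
\mathrm{DWI}\cdot 2^{\mathrm{AWI}} = \mathrm{DWO}\cdot 2^{\mathrm{AWO}},
\qquad
\mathrm{EXTENT} = \frac{\mathrm{DWO}}{\mathrm{DWI}} = 2^{\mathrm{AWI}-\mathrm{AWO}}

\text{wide word } a,\ \text{slice } i \;\longmapsto\; \text{narrow entry } a\cdot\mathrm{EXTENT}+i,
\qquad 0 \le i < \mathrm{EXTENT}
```

For the example in requirement (2), 8 · 2^6 = 32 · 2^4 = 512 bits and EXTENT = 4 = 2^(6-4), so a 32-bit read at address a collects narrow entries 4a to 4a+3. When the write side is wider, SHRINK = DWI/DWO plays the same role with the two sides swapped.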
The Verilog description is as follows.
Example
module ramdp
#(  parameter AWI = 5 ,
    parameter AWO = 7 ,
    parameter DWI = 64 ,
    parameter DWO = 16
)
(
    input                  CLK_WR ,   // Write clock
    input                  WR_EN ,    // Write enable
    input      [AWI-1:0]   ADDR_WR ,  // Write address
    input      [DWI-1:0]   D ,        // Write data
    input                  CLK_RD ,   // Read clock
    input                  RD_EN ,    // Read enable
    input      [AWO-1:0]   ADDR_RD ,  // Read address
    output reg [DWO-1:0]   Q          // Read data
);
    // Output width greater than input width: expansion factor and corresponding address bits
    parameter EXTENT     = DWO/DWI ;
    parameter EXTENT_BIT = AWI-AWO > 0 ? AWI-AWO : 'b1 ;
    // Input width greater than output width: shrink factor and corresponding address bits
    parameter SHRINK     = DWI/DWO ;
    parameter SHRINK_BIT = AWO-AWI > 0 ? AWO-AWI : 'b1 ;

    genvar i ;
    generate
    // Data width expansion (read address narrower than write address)
    if (DWO >= DWI) begin : g_extent
        // Memory of 2**AWI narrow (DWI-bit) entries; one entry written per clock
        reg [DWI-1:0] mem [(1<<AWI)-1 : 0] ;
        always @(posedge CLK_WR) begin
            if (WR_EN) begin
                mem[ADDR_WR] <= D ;
            end
        end
        // Each read fetches EXTENT consecutive narrow entries in one clock;
        // the lowest address goes to the least significant slice of Q
        for (i=0; i<EXTENT; i=i+1) begin : g_rd
            always @(posedge CLK_RD) begin
                if (RD_EN) begin
                    Q[(i+1)*DWI-1 : i*DWI] <= mem[(ADDR_RD*EXTENT) + i] ;
                end
            end
        end
    end
    //=================================================
    // Data width reduction (read address wider than write address)
    else begin : g_shrink
        // Memory of 2**AWO narrow (DWO-bit) entries; each write stores SHRINK of them
        reg [DWO-1:0] mem [(1<<AWO)-1 : 0] ;
        for (i=0; i<SHRINK; i=i+1) begin : g_wr
            always @(posedge CLK_WR) begin
                if (WR_EN) begin
                    mem[(ADDR_WR*SHRINK)+i] <= D[(i+1)*DWO-1 : i*DWO] ;
                end
            end
        end
        // One narrow entry read per clock
        always @(posedge CLK_RD) begin
            if (RD_EN) begin
                Q <= mem[ADDR_RD] ;
            end
        end
    end
    endgenerate
endmodule
Counter Design
The counter generates the read and write addresses. Its width is configurable and no terminal count is needed: it simply overflows and wraps back to zero. The Verilog description is as follows.
Example
module ccnt
#(parameter W = 8)                 // Counter width (Verilog-2001/2005 require a default value)
(
    input              rstn ,
    input              clk ,
    input              en ,
    output [W-1:0]     count
);
    reg [W-1:0] count_r ;
    always @(posedge clk or negedge rstn) begin
        if (!rstn) begin
            count_r <= 'b0 ;
        end
        else if (en) begin
            count_r <= count_r + 1'b1 ;   // Wraps to 0 on overflow
        end
    end
    assign count = count_r ;
endmodule
FIFO Design
This module is the main part of the FIFO, generating read and write control logic, and producing empty, full, and programmable full status signals.
Due to space limitations, only the logic code for the case where the read data width is greater than the write data width is provided here. The code description for the case where the write data width is greater than the read data width can be found in the attachment.
Example
module fifo
#(  parameter AWI        = 5 ,
    parameter AWO        = 3 ,
    parameter DWI        = 4 ,
    parameter DWO        = 16 ,
    parameter PROG_DEPTH = 16)          // Programmable full threshold (in write words)
(
    input                rstn,          // Read and write use the same reset
    input                wclk,          // Write clock
    input                winc,          // Write enable
    input  [DWI-1 : 0]   wdata,         // Write data
    input                rclk,          // Read clock
    input                rinc,          // Read enable
    output [DWO-1 : 0]   rdata,         // Read data
    output               wfull,         // Full flag (write clock domain)
    output               rempty,        // Empty flag (read clock domain)
    output               prog_full      // Programmable full flag (write clock domain)
);
    // Output width greater than input width: expansion factor and corresponding address bits
    parameter EXTENT     = DWO/DWI;
    parameter EXTENT_BIT = AWI-AWO;
    // Output width less than input width: shrink factor and corresponding address bits
    parameter SHRINK     = DWI/DWO;
    parameter SHRINK_BIT = AWO-AWI;

    //==================== push/wr counter ===============
    wire [AWI-1:0] waddr;
    wire           wover_flag;          // Extra MSB: write address extension bit
    ccnt #(.W(AWI+1))
    u_push_cnt(
        .rstn  (rstn),
        .clk   (wclk),
        .en    (winc && !wfull),        // Disable write when full
        .count ({wover_flag, waddr})
    );

    //============== pop/rd counter ===================
    wire [AWO-1:0] raddr;
    wire           rover_flag;          // Extra MSB: read address extension bit
    ccnt #(.W(AWO+1))
    u_pop_cnt(
        .rstn  (rstn),
        .clk   (rclk),
        .en    (rinc && !rempty),       // Disable read when empty
        .count ({rover_flag, raddr})
    );

    //==============================================
    // Narrow data in, wide data out
    generate
    if (DWO >= DWI) begin : EXTENT_WIDTH
        // Gray code conversion of both pointers, each in its own width.
        // The read pointer is converted before it is scaled to write-word
        // units: the scaled value advances by EXTENT per read, and its Gray
        // code would then change more than one bit at a time.
        wire [AWI:0] wptr = ({wover_flag, waddr} >> 1) ^ {wover_flag, waddr};
        wire [AWO:0] rptr = ({rover_flag, raddr} >> 1) ^ {rover_flag, raddr};

        // Synchronize the write pointer into the read clock domain
        reg [AWI:0] rq2_wptr_r0;
        reg [AWI:0] rq2_wptr_r1;
        always @(posedge rclk or negedge rstn) begin
            if (!rstn) begin
                rq2_wptr_r0 <= 'b0;
                rq2_wptr_r1 <= 'b0;
            end
            else begin
                rq2_wptr_r0 <= wptr;
                rq2_wptr_r1 <= rq2_wptr_r0;
            end
        end

        // Synchronize the read pointer into the write clock domain
        reg [AWO:0] wq2_rptr_r0;
        reg [AWO:0] wq2_rptr_r1;
        always @(posedge wclk or negedge rstn) begin
            if (!rstn) begin
                wq2_rptr_r0 <= 'b0;
                wq2_rptr_r1 <= 'b0;
            end
            else begin
                wq2_rptr_r0 <= rptr;
                wq2_rptr_r1 <= wq2_rptr_r0;
            end
        end

        // Gray code decoding
        // If only the empty and full flags were needed, no decoding would be required;
        // the programmable full flag needs binary pointers for the subtraction
        reg [AWI:0] rq2_wptr_decode;
        reg [AWO:0] wq2_rptr_decode;
        integer i, j;
        always @(*) begin
            rq2_wptr_decode[AWI] = rq2_wptr_r1[AWI];
            for (i=AWI-1; i>=0; i=i-1) begin
                rq2_wptr_decode[i] = rq2_wptr_decode[i+1] ^ rq2_wptr_r1[i];
            end
        end
        always @(*) begin
            wq2_rptr_decode[AWO] = wq2_rptr_r1[AWO];
            for (j=AWO-1; j>=0; j=j-1) begin
                wq2_rptr_decode[j] = wq2_rptr_decode[j+1] ^ wq2_rptr_r1[j];
            end
        end
        // Read pointer scaled to write-word units; its MSB is still the read extension bit
        wire [AWI:0] wq2_rptr_ex = wq2_rptr_decode << EXTENT_BIT;

        // Empty: the count of complete wide words written (upper bits of the write
        // pointer, extension bit included) equals the count of wide words read
        assign rempty = ({rover_flag, raddr} == rq2_wptr_decode[AWI:EXTENT_BIT]);
        // Full: same address but opposite extension bits
        assign wfull  = (wover_flag != wq2_rptr_ex[AWI]) &&
                        (waddr == wq2_rptr_ex[AWI-1:0]);
        // Equal extension bits: the write address is not less than the read address.
        // Different extension bits: the write address has wrapped, so it is
        // effectively one FIFO depth larger.
        assign prog_full = (wover_flag == wq2_rptr_ex[AWI]) ?
                           (waddr - wq2_rptr_ex[AWI-1:0] >= PROG_DEPTH-1) :
                           (waddr + (1<<AWI) - wq2_rptr_ex[AWI-1:0] >= PROG_DEPTH-1);

        // Dual-port RAM instantiation
        ramdp
        #(  .AWI (AWI),
            .AWO (AWO),
            .DWI (DWI),
            .DWO (DWO))
        u_ramdp
        (
            .CLK_WR  (wclk),
            .WR_EN   (winc & !wfull),       // Disable write when full
            .ADDR_WR (waddr),
            .D       (wdata[DWI-1:0]),
            .CLK_RD  (rclk),
            .RD_EN   (rinc & !rempty),      // Disable read when empty
            .ADDR_RD (raddr),
            .Q       (rdata[DWO-1:0])
        );
    end
    //==============================================
    // Big in and small out
    /*
    else begin: SHRINK_WIDTH
    ……
    end
    */
    endgenerate
endmodule
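The prog_full expression distinguishes equal and different extension bits. The small sketch below checks that this two-branch occupancy equals a single modular subtraction of the extended pointers. It assumes both pointers are binary, AW+1 bits wide, and already expressed in write-word units; the module and port names are hypothetical.

```verilog
// Write-side occupancy from extended binary pointers (hypothetical module).
// occ_branch: two-branch form, as used for prog_full
// occ_mod   : single (AW+1)-bit modular subtraction
module fifo_occupancy #(parameter AW = 5) (   // depth = 2**AW write words
    input  [AW:0] wptr,        // {wover_flag, waddr}
    input  [AW:0] rptr,        // read pointer in write-word units, binary
    output [AW:0] occ_branch,
    output [AW:0] occ_mod
);
    // Equal extension bits: plain difference of the addresses.
    // Different extension bits: the write address has wrapped, add one depth.
    assign occ_branch = (wptr[AW] == rptr[AW]) ?
                        (wptr[AW-1:0] - rptr[AW-1:0]) :
                        (wptr[AW-1:0] + (1 << AW) - rptr[AW-1:0]);
    // Modulo 2**(AW+1), the extension bit absorbs the wrap automatically
    assign occ_mod = wptr - rptr;
endmodule
```

With either form, the programmable full condition is simply "occupancy >= PROG_DEPTH-1". The single subtraction is shorter and avoids the 32-bit intermediate created by `1 << AW`.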
FIFO Instantiation
The FIFO designed above can be instantiated as follows to transfer multi-bit data asynchronously between clock domains.
The write data width is 4 bits and the write depth is 32.
The read data width is 16 bits and the read depth is 8; the programmable full threshold is 16.
Example
module fifo_s2b(
    input              rstn,
    input  [4-1 : 0]   din,         // Asynchronous write data
    input              din_clk,     // Asynchronous write clock
    input              din_en,      // Asynchronous write enable
    output [16-1 : 0]  dout,        // Synchronized data
    input              dout_clk,    // Synchronization clock
    output             dout_en      // Synchronized data enable
);
    wire        fifo_empty, fifo_full, prog_full;
    wire        rd_en_wir;
    wire [15:0] dout_wir;
    // Disable read when empty, otherwise read continuously
    assign rd_en_wir = fifo_empty ? 1'b0 : 1'b1;
    fifo #(.AWI(5), .AWO(3), .DWI(4), .DWO(16), .PROG_DEPTH(16))
    u_buf_s2b(
        .rstn      (rstn),
        .wclk      (din_clk),
        .winc      (din_en),
        .wdata     (din),
        .rclk      (dout_clk),
        .rinc      (rd_en_wir),
        .rdata     (dout_wir),
        .wfull     (fifo_full),
        .rempty    (fifo_empty),
        .prog_full (prog_full));
    // Data and enable after buffering and synchronization.
    // The RAM output is registered, so read data appears one read clock
    // after the read enable; the enable is delayed by one cycle to match.
    reg dout_en_r;
    always @(posedge dout_clk or negedge rstn) begin
        if (!rstn) begin
            dout_en_r <= 1'b0;
        end
        else begin
            dout_en_r <= rd_en_wir;
        end
    end
    assign dout    = dout_wir;
    assign dout_en = dout_en_r;
endmodule
Testbench
Example
`timescale 1ns/1ns
`define SMALL2BIG
module test;
`ifdef SMALL2BIG
    reg         rstn;
    reg         clk_slow, clk_fast;
    reg  [3:0]  din;
    reg         din_en;
    wire [15:0] dout;
    wire        dout_en;
    // Reset
    initial begin
        clk_slow = 0;
        clk_fast = 0;
        rstn     = 0;
        #50 rstn = 1;
    end
    // Write clock clk_fast: period 10 ns. Read clock clk_slow: period 38 ns,
    // slightly faster than 1/4 of clk_fast. With 4-bit writes and 16-bit
    // reads, the read bandwidth is slightly higher than the write bandwidth.
    parameter CYCLE_WR = 40;
    always #(CYCLE_WR/2/4) clk_fast = ~clk_fast;
    always #(CYCLE_WR/2-1) clk_slow = ~clk_slow;
    // Data generation
    initial begin
        din    = 4'h1;
        din_en = 0;
        wait (rstn);
        // (1) Test full, prog_full, empty signals: block reads, write 32 words
        force test.u_data_buf2.u_buf_s2b.rinc = 1'b0;
        repeat(32) begin
            @(negedge clk_fast);
            din_en = 1'b1;
            din    = {$random} % 16;
        end
        @(negedge clk_fast) din_en = 1'b0;
        // (2) Test data read and write: reset, then read and write simultaneously
        #500;
        rstn = 0;
        #10 rstn = 1;
        release test.u_data_buf2.u_buf_s2b.rinc;
        repeat(100) begin
            @(negedge clk_fast);
            din_en = 1'b1;
            din    = {$random} % 16;
        end
        // (3) Stop reading and test empty, full, prog_full signals again
        // (din_en stays high afterwards, so writing continues until the FIFO is full)
        force test.u_data_buf2.u_buf_s2b.rinc = 1'b0;
        repeat(18) begin
            @(negedge clk_fast);
            din_en = 1'b1;
            din    = {$random} % 16;
        end
    end
    fifo_s2b u_data_buf2(
        .rstn     (rstn),
        .din      (din),
        .din_clk  (clk_fast),
        .din_en   (din_en),
        .dout     (dout),
        .dout_clk (clk_slow),
        .dout_en  (dout_en));
`else
`endif
    // Stop simulation
    initial begin
        forever begin
            #100;
            if ($time >= 5000) $finish;
        end
    end
endmodule
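The clock settings in the testbench can be checked against the rate requirement from the principles section (the write rate must not exceed the read rate). Because the two sides have different widths, the comparison is made in bits per nanosecond. With CYCLE_WR = 40, clk_fast toggles every 5 ns and clk_slow every 19 ns:

```latex
T_{wr} = 2\cdot\frac{40}{2\cdot 4}\,\mathrm{ns} = 10\,\mathrm{ns},
\qquad
T_{rd} = 2\cdot\left(\frac{40}{2}-1\right)\mathrm{ns} = 38\,\mathrm{ns}

B_{wr} = \frac{4\,\mathrm{bit}}{10\,\mathrm{ns}} = 0.4\,\mathrm{bit/ns}
\;<\;
B_{rd} = \frac{16\,\mathrm{bit}}{38\,\mathrm{ns}} \approx 0.421\,\mathrm{bit/ns}
```

The read side can therefore drain data about 5% faster than the write side supplies it, so during simultaneous reading and writing the reader repeatedly catches up with the writer.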
Simulation Analysis
Based on the 3-step test stimuli in the testbench, the analysis is as follows:
Test (1): The timing results for FIFO ports and some internal signals are as follows.
As shown in the figure, once the FIFO starts receiving data, the empty flag is deasserted only after a delay, which is caused by synchronizing the write address into the read clock domain.
Since no reads are performed at this point, the full and prog_full flags are asserted with almost no delay relative to the writes. Both are computed in the write clock domain from the local write address, and the synchronized read address is not changing, so its synchronization latency plays no role.
Test (2): With the FIFO read and written simultaneously, the port signals of the top-level asynchronous module are shown below; the two diagrams show the beginning and the end of the transfer.
As shown in the figures, data is transferred correctly at both the beginning and the end, completing the asynchronous transfer of multi-bit data across clock domains.
Test (3): The timing diagram of the whole read and write process, including the point where reading stops, is shown below.
As shown in the figure, during simultaneous reading and writing the empty flag rempty is deasserted once data has been written into the FIFO. Because the read rate is slightly higher than the write rate, and the write address reaches the read domain only after the synchronization delay, the reader periodically catches up with the writer, so rempty is intermittently asserted during the transfer.
During the read and write process, full and prog_full remain low, meaning the amount of data in the FIFO never reaches their thresholds. Once reading stops, both flags are soon asserted as the FIFO fills up. A careful comparison of the read and write addresses confirms that the FIFO behaves correctly.
The complete FIFO design can be found in the attachment, including the asynchronous design and simulation when the input data width is less than the output data width.