
4.4 Verilog FIFO Design


A FIFO (First In, First Out) is a memory commonly used for asynchronous data transfer. Its defining property is that data is read out in the same order in which it was written: the first item written is the first item read. Transferring multi-bit data across clock domains, whether from a fast clock domain to a slow one or vice versa, can be handled with a FIFO.


FIFO Principles

Operation Process

After reset, under the control of the write clock and status signals, data is written into the FIFO. The write address of the RAM starts from 0, and with each write operation, the write address pointer increments by one, pointing to the next storage unit. When the FIFO is full, no more data can be written, otherwise data will be lost due to overwriting.

When the FIFO is not empty (this includes the full state), data can be read from it under the control of the read clock and status signals. The read address of the RAM starts from 0, and each read operation increments the read address pointer by one so that it points to the next storage unit. When the FIFO is empty, no more data can be read; otherwise the data read out will be invalid.

The storage structure of the FIFO is dual-port RAM, allowing simultaneous read and write operations. A typical asynchronous FIFO structure is shown below. Ports and internal signals will be explained during code writing.

Read and Write Moments

Regarding write moments, as long as the FIFO is not full, write operations can be performed; if the FIFO is full, further writing is prohibited.

Regarding read moments, as long as the FIFO is not empty, read operations can be performed; if the FIFO is empty, further reading is prohibited.
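
The two rules above amount to qualifying each request with the corresponding status flag. A minimal sketch, assuming hypothetical signal names (wr_req, rd_req, full and empty are illustrative and are not taken from the design later in this section):

```verilog
// Illustrative only: gate each request with the matching status flag.
module fifo_gate_sketch (
    input  wire wr_req,   // upstream wants to write
    input  wire full,     // full flag, valid in the write clock domain
    input  wire rd_req,   // downstream wants to read
    input  wire empty,    // empty flag, valid in the read clock domain
    output wire wr_en,    // qualified write enable
    output wire rd_en     // qualified read enable
);
    assign wr_en = wr_req & ~full;   // never write into a full FIFO
    assign rd_en = rd_req & ~empty;  // never read from an empty FIFO
endmodule
```

The FIFO module presented later applies the same gating to its counters and RAM enables (winc with wfull, rinc with rempty).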

In either case, when the FIFO is read and written simultaneously over a sustained period, the average write data rate must not exceed the average read data rate; otherwise the FIFO will eventually fill up.
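
When the two ports have different widths, the quantities to compare are data rates (bits per unit time) rather than clock frequencies. As a sketch, let W_wr and W_rd denote the write and read data widths, and f_wr and f_rd the rates at which write and read operations are actually performed (these symbols are introduced here for illustration only):

```latex
W_{wr} \cdot f_{wr} \;\le\; W_{rd} \cdot f_{rd}
\qquad\text{e.g.}\qquad
\frac{4\ \text{bit}}{10\ \text{ns}} = 0.4\ \tfrac{\text{bit}}{\text{ns}}
\;\le\;
\frac{16\ \text{bit}}{38\ \text{ns}} \approx 0.42\ \tfrac{\text{bit}}{\text{ns}}
```

The numbers in the example are those of the testbench later in this section: 4-bit writes with a 10 ns write clock period and 16-bit reads with a 38 ns read clock period, so the condition holds with a small margin.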

Empty State

Upon initial reset, the FIFO contains no data, and the empty status signal is active. When data is written into the FIFO, the empty status signal is pulled low and becomes inactive. When the read data address catches up with the write address, i.e., both addresses are equal, the FIFO is in an empty state.

Since the FIFO is asynchronous, when comparing read and write addresses, synchronization logic with clock delays is required, which takes some time. Therefore, the empty status indication signal is not real-time and has a certain delay. If new data is written into the FIFO during this delay, there will be a situation where the empty status indication signal is active, but data actually exists in the FIFO.

Strictly speaking, the empty indication is then inaccurate. However, the purpose of the empty flag is to prevent reads from an empty FIFO. Asserting empty while the FIFO already holds data is a conservative indication: it merely postpones the read, so stopping reads at that moment is always safe. The design is therefore sound in practice.
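
The synchronization delay discussed above typically comes from a two-stage register chain clocked by the destination domain. A minimal sketch, assuming the pointer is already Gray-coded so that only one bit changes per increment; the module name, port names and default width are illustrative:

```verilog
// Illustrative two-flop synchronizer for a Gray-coded pointer.
module sync_2ff_sketch
  #(parameter W = 6)                 // pointer width incl. extension bit (assumed)
   (input  wire         clk_dst,     // destination-domain clock
    input  wire         rstn,        // active-low asynchronous reset
    input  wire [W-1:0] ptr_src,     // Gray-coded pointer from the other domain
    output wire [W-1:0] ptr_dst);    // pointer as seen two clk_dst edges later

   reg [W-1:0] stage0, stage1;
   always @(posedge clk_dst or negedge rstn) begin
      if (!rstn) begin
         stage0 <= {W{1'b0}};
         stage1 <= {W{1'b0}};
      end
      else begin
         stage0 <= ptr_src;          // first stage may go metastable
         stage1 <= stage0;           // second stage gives it time to settle
      end
   end
   assign ptr_dst = stage1;
endmodule
```

The FIFO module later in this section contains the same structure in its rq2_wptr_r0/rq2_wptr_r1 and wq2_rptr_r0/wq2_rptr_r1 registers; the two destination clock cycles it takes are the source of the flag delay.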

Full State

Upon initial reset, the FIFO contains no data, and the full signal is inactive. When data is written into the FIFO, and read operations are not performed or are relatively slow, a full status signal is generated when the write data address exceeds the read data address by one FIFO depth. At this time, the write address and read address are also equal, but the meaning is different.

At this point, an extra 1-bit is often used as an extension bit for both read and write addresses to distinguish between an empty and a full state when the read and write addresses are the same. When the read and write addresses and the extension bit are all the same, it indicates that the number of read and write data is consistent, and the FIFO is in an empty state. If the read and write addresses are the same, but the extension bit is opposite, it indicates that the number of write data has exceeded the number of read data by one FIFO depth, and the FIFO is in a full state. Of course, this condition holds true only if reading from an empty state and writing to a full state are prohibited.
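
The comparison described above can be sketched as follows. The sketch assumes binary pointers of equal width that are already in the same clock domain (synchronization is deliberately left out), and all names are illustrative:

```verilog
// Illustrative flag generation from pointers with one extension bit.
module ptr_flags_sketch
  #(parameter AW = 3)                  // address width, depth = 2**AW
   (input  wire [AW:0] wptr,           // {extension bit, write address}
    input  wire [AW:0] rptr,           // {extension bit, read address}
    output wire        empty,
    output wire        full,
    output wire [AW:0] count);         // number of stored words

   // Same address and same extension bit: nothing left to read
   assign empty = (wptr == rptr);
   // Same address, opposite extension bit: writer is one full depth ahead
   assign full  = (wptr[AW] != rptr[AW]) &&
                  (wptr[AW-1:0] == rptr[AW-1:0]);
   // Modulo-2**(AW+1) difference gives the occupancy (0 .. 2**AW)
   assign count = wptr - rptr;
endmodule
```

With AW = 3, wptr = 4'b1000 and rptr = 4'b0000 give full = 1 and count = 8, while wptr equal to rptr gives empty = 1 and count = 0. The occupancy value is the kind of quantity a programmable full flag compares against a threshold.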

Similarly, because of the synchronization delay, the full status signal is not real-time either. If data is read during the delay, full may still be asserted although free space already exists. This is again a conservative indication: holding off writes at that moment does not affect the correctness of the application.
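
When the pointers cross clock domains in Gray code, as they do in the FIFO module later in this section, the empty and full conditions can also be evaluated directly on the Gray-coded values. The simulation-only sketch below assumes equal pointer widths on both sides and AW >= 2, and checks exhaustively that the Gray-domain comparison agrees with the binary one; all names are illustrative:

```verilog
// Illustrative check: full/empty on Gray pointers vs. binary pointers.
module gray_flags_sketch;
   parameter AW = 3;                   // depth = 2**AW, pointers are AW+1 bits
   reg  [AW:0] wbin, rbin;
   wire [AW:0] wgray = (wbin >> 1) ^ wbin;
   wire [AW:0] rgray = (rbin >> 1) ^ rbin;

   // Binary-domain reference
   wire full_bin   = (wbin[AW] != rbin[AW]) &&
                     (wbin[AW-1:0] == rbin[AW-1:0]);
   wire empty_bin  = (wbin == rbin);

   // Gray domain: full when the top two bits are inverted and the rest match
   wire full_gray  = (wgray == {~rgray[AW:AW-1], rgray[AW-2:0]});
   wire empty_gray = (wgray == rgray);

   integer w, r, errors;
   initial begin
      errors = 0;
      for (w = 0; w < (2 << AW); w = w + 1)
         for (r = 0; r < (2 << AW); r = r + 1) begin
            wbin = w;
            rbin = r;
            #1;                        // let the continuous assignments settle
            if (full_bin !== full_gray || empty_bin !== empty_gray)
               errors = errors + 1;
         end
      $display("mismatches = %0d", errors);   // prints: mismatches = 0
   end
endmodule
```

The top-two-bit rule follows from the Gray mapping: flipping the MSB of a binary value flips the two most significant bits of its Gray code and leaves the others unchanged.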


FIFO Design

Design Requirements

To design a FIFO applicable to various scenarios, the following requirements are proposed:

Dual-Port RAM Design

The RAM port parameters are configurable, and the read and write data widths may differ. It is recommended to declare the memory array with the larger of the two address widths and the smaller of the two data widths, so that the wider port can access several consecutive array elements at once.

The Verilog description is as follows.

Example

module ramdp
    #(  parameter       AWI     = 5 ,
        parameter       AWO     = 7 ,
        parameter       DWI     = 64 ,
        parameter       DWO     = 16
        )
    (
        input                   CLK_WR , // Write clock
        input                   WR_EN ,  // Write enable
        input [AWI-1:0]         ADDR_WR ,// Write address
        input [DWI-1:0]         D ,      // Write data
        input                   CLK_RD , // Read clock
        input                   RD_EN ,  // Read enable
        input [AWO-1:0]         ADDR_RD ,// Read address
        output reg [DWO-1:0]    Q        // Read data
     );
   // Output width greater than input width, calculate the expansion factor and corresponding bits
   parameter       EXTENT       = DWO/DWI ;
   parameter       EXTENT_BIT   = AWI-AWO > 0 ? AWI-AWO : 'b1 ;
   // Input width greater than output width, calculate the shrink factor and corresponding bits
   parameter       SHRINK       = DWI/DWO ;
   parameter       SHRINK_BIT   = AWO-AWI > 0 ? AWO-AWI : 'b1;

   genvar i ;
   generate
      // Data width expansion (address width reduction)
      if (DWO >= DWI) begin
         // Write logic, write once per clock
         reg [DWI-1:0]         mem [(1<<AWI)-1 : 0] ;
         always @(posedge CLK_WR) begin
            if (WR_EN) begin
               mem[ADDR_WR]  <= D ;
            end
         end

         // Read logic, read 4 times per clock
         for (i=0; i<EXTENT; i=i+1) begin
            always @(posedge CLK_RD) begin
               if (RD_EN) begin
                  Q[(i+1)*DWI-1: i*DWI]  <= mem[(ADDR_RD*EXTENT) + i ] ;
               end
            end
         end
      end

      //=================================================
      // Data width reduction (address width expansion)
      else begin
         // Write logic, write 4 times per clock
         reg [DWO-1:0]         mem [(1<<AWO)-1 : 0] ;
         for (i=0; i<SHRINK; i=i+1) begin
            always @(posedge CLK_WR) begin
               if (WR_EN) begin
                  mem[(ADDR_WR*SHRINK)+i]  <= D[(i+1)*DWO -1: i*DWO] ;
               end
            end
         end

         // Read logic, read once per clock
         always @(posedge CLK_RD) begin
            if (RD_EN) begin
               Q <= mem[ADDR_RD] ;
            end
         end
      end
   endgenerate

endmodule
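
In the DWO >= DWI branch above, word i of a group is placed in bits [(i+1)*DWI-1 : i*DWI] of Q, so the first-written narrow word ends up in the least significant bits of the wide read word. A small simulation-only sketch of that ordering, with illustrative names and values:

```verilog
// Illustrative only: how four 4-bit words appear in one 16-bit read word.
module pack_order_sketch;
   parameter DWI    = 4;               // narrow (write) data width
   parameter EXTENT = 4;               // narrow words per wide read word
   reg [DWI-1:0]        mem [0:EXTENT-1];
   reg [DWI*EXTENT-1:0] q;
   integer i;
   initial begin
      // Four consecutive 4-bit writes: 1, 2, 3, 4
      mem[0] = 4'h1; mem[1] = 4'h2; mem[2] = 4'h3; mem[3] = 4'h4;
      // One wide read of group 0, using the same bit positions as ramdp
      for (i = 0; i < EXTENT; i = i + 1)
         q[i*DWI +: DWI] = mem[0*EXTENT + i];
      $display("q = %h", q);           // prints: q = 4321
   end
endmodule
```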

Counter Design

The counter is used to generate read and write address information, with configurable width, and does not need to set an end value, allowing it to overflow and automatically restart counting. The Verilog description is as follows.

Example

module ccnt
 #(parameter W = 8 )   // Default is a placeholder (Verilog-2001 requires one); override per instance
  (
    input               rstn ,
    input               clk ,
    input               en ,
    output [W-1:0]      count
    );

  reg [W-1:0]          count_r ;
  always @(posedge clk or negedge rstn) begin
     if (!rstn) begin
        count_r        <= 'b0 ;
     end
     else if (en) begin
        count_r        <= count_r + 1'b1 ;
     end
  end
  assign count = count_r ;

endmodule
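
Because the counter simply overflows, a pointer built from it can be split into an extension bit and an address by concatenation, which is how the FIFO module below connects it. A simulation-only sketch with an assumed 3-bit address (all names are illustrative):

```verilog
// Illustrative only: a 4-bit count viewed as {extension bit, 3-bit address}.
module wrap_demo_sketch;
   reg  [3:0] cnt;                     // 1 extension bit + 3 address bits
   wire       over_flag;
   wire [2:0] addr;
   assign {over_flag, addr} = cnt;

   integer k;
   initial begin
      cnt = 4'd14;
      for (k = 0; k < 4; k = k + 1) begin
         #1 $display("cnt=%0d over_flag=%b addr=%0d", cnt, over_flag, addr);
         cnt = cnt + 1'b1;             // 15 + 1 overflows back to 0
      end
   end
   // Prints:
   //   cnt=14 over_flag=1 addr=6
   //   cnt=15 over_flag=1 addr=7
   //   cnt=0 over_flag=0 addr=0
   //   cnt=1 over_flag=0 addr=1
endmodule
```

The address wraps every 8 counts, and the extension bit toggles at each wrap, so no explicit end value or compare logic is needed.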

FIFO Design

This module is the main part of the FIFO, generating read and write control logic, and producing empty, full, and programmable full status signals.

Due to space limitations, only the logic code for the case where the read data width is greater than the write data width is provided here. The code description for the case where the write data width is greater than the read data width can be found in the attachment.

Example

module fifo
    #(  parameter       AWI        = 5 ,
        parameter       AWO        = 3 ,
        parameter       DWI        = 4 ,
        parameter       DWO        = 16 ,
        parameter       PROG_DEPTH = 16) // Programmable depth
    (
        input                   rstn,  // Read and write use the same reset
        input                   wclk,  // Write clock
        input                   winc,  // Write enable
        input [DWI-1: 0]        wdata, // Write data

        input                   rclk,  // Read clock
        input                   rinc,  // Read enable
        output [DWO-1 : 0]      rdata, // Read data

        output                  wfull,   // Write full flag
        output                  rempty,  // Empty flag
        output                  prog_full // Programmable full flag
    );

// Output bit width is greater than input bit width, calculate the expansion factor and corresponding bits
parameter EXTENT = DWO/DWI;
parameter EXTENT_BIT = AWI-AWO;
// Output bit width is less than input bit width, calculate the shrink factor and corresponding bits
parameter SHRINK = DWI/DWO;
parameter SHRINK_BIT = AWO-AWI;

//==================== push/wr counter ===============
wire [AWI-1:0] waddr;
wire wover_flag; // Use one extra bit for write address extension
ccnt #(.W(AWI+1))
u_push_cnt(
   .rstn (rstn),
   .clk (wclk),
   .en (winc && !wfull), // Disable write when full
   .count ({wover_flag, waddr})
);

//============== pop/rd counter ===================
wire [AWO-1:0] raddr;
wire rover_flag; // Use one extra bit for read address extension
ccnt #(.W(AWO+1))
u_pop_cnt(
   .rstn (rstn),
   .clk (rclk),
   .en (rinc & !rempty), // Disable read when empty
   .count ({rover_flag, raddr})
);

//==============================================
// Narrow data in, wide data out
generate
   if (DWO >= DWI) begin : EXTENT_WIDTH

      // Gray code conversion
      wire [AWI:0] wptr = ({wover_flag, waddr}>>1) ^ ({wover_flag, waddr});
      // Synchronize write data pointer to read clock domain
      reg [AWI:0] rq2_wptr_r0;
      reg [AWI:0] rq2_wptr_r1;
      always @(posedge rclk or negedge rstn) begin
         if (!rstn) begin
            rq2_wptr_r0 <= 'b0;
            rq2_wptr_r1 <= 'b0;
         end
         else begin
            rq2_wptr_r0 <= wptr;
            rq2_wptr_r1 <= rq2_wptr_r0;
         end
      end

      // Gray code conversion
      wire [AWI-1:0] raddr_ex = raddr << EXTENT_BIT;
      wire [AWI:0] rptr = ({rover_flag, raddr_ex}>>1) ^ ({rover_flag, raddr_ex});
      // Synchronize read data pointer to write clock domain
      reg [AWI:0] wq2_rptr_r0;
      reg [AWI:0] wq2_rptr_r1;
      always @(posedge wclk or negedge rstn) begin
         if (!rstn) begin
            wq2_rptr_r0 <= 'b0;
            wq2_rptr_r1 <= 'b0;
         end
         else begin
            wq2_rptr_r0 <= rptr;
            wq2_rptr_r1 <= wq2_rptr_r0;
         end
      end

      // Gray code decoding
      // If only empty and full status signals are needed, no decoding is required
      // Due to the presence of the programmable full status signal, address decoding is convenient for comparison
      reg [AWI:0] wq2_rptr_decode;
      reg [AWI:0] rq2_wptr_decode;
      integer i;
      always @(*) begin
         wq2_rptr_decode[AWI] = wq2_rptr_r1[AWI];
         for (i=AWI-1; i>=0; i=i-1) begin
            wq2_rptr_decode[i] = wq2_rptr_decode[i+1] ^ wq2_rptr_r1[i];
         end
      end
      always @(*) begin
         rq2_wptr_decode[AWI] = rq2_wptr_r1[AWI];
         for (i=AWI-1; i>=0; i=i-1) begin
            rq2_wptr_decode[i] = rq2_wptr_decode[i+1] ^ rq2_wptr_r1[i];
         end
      end

      // Read and write addresses and extension bits are identical, indicating an empty state
      assign rempty = (rover_flag == rq2_wptr_decode[AWI]) &&
                      (raddr_ex >= rq2_wptr_decode[AWI-1:0]);
      // Read and write addresses are the same, but extension bits are different, indicating a full state
      assign wfull = (wover_flag != wq2_rptr_decode[AWI]) &&
                     (waddr >= wq2_rptr_decode[AWI-1:0]);
      // When the extension bits are the same, the write address must not be less than the read address
      // When the extension bits are different, the write address part must be less than the read address, and the actual write address should be increased by one FIFO depth
      assign prog_full = (wover_flag == wq2_rptr_decode[AWI]) ?
                         (waddr - wq2_rptr_decode[AWI-1:0] >= PROG_DEPTH-1) :
                         (waddr + (1<<AWI) - wq2_rptr_decode[AWI-1:0] >= PROG_DEPTH-1);

      // Dual-port RAM instantiation
      ramdp
      #( .AWI (AWI),
         .AWO (AWO),
         .DWI (DWI),
         .DWO (DWO))
      u_ramdp
      (
         .CLK_WR (wclk),
         .WR_EN (winc & !wfull), // Disable write when full
         .ADDR_WR (waddr),
         .D (wdata[DWI-1:0]),
         .CLK_RD (rclk),
         .RD_EN (rinc & !rempty), // Disable read when empty
         .ADDR_RD (raddr),
         .Q (rdata[DWO-1:0])
      );

   end

   //==============================================
   // Wide data in, narrow data out
   /*
   else begin: SHRINK_WIDTH
      ……
   end
   */
endgenerate
endmodule
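
The Gray code conversion and decoding used in the module above can be exercised in isolation. The following simulation-only sketch (function and module names are illustrative) checks that decoding inverts encoding and that consecutive pointer values differ in exactly one bit, including at wrap-around:

```verilog
// Illustrative self-check of Gray encoding and decoding.
module gray_roundtrip_sketch;
   parameter W = 6;                    // AWI+1 for AWI = 5

   // Binary to Gray: XOR each bit with its more significant neighbour
   function [W-1:0] bin2gray (input [W-1:0] b);
      bin2gray = (b >> 1) ^ b;
   endfunction

   // Gray to binary: running XOR from the MSB downwards
   function [W-1:0] gray2bin (input [W-1:0] g);
      integer k;
      begin
         gray2bin[W-1] = g[W-1];
         for (k = W-2; k >= 0; k = k - 1)
            gray2bin[k] = gray2bin[k+1] ^ g[k];
      end
   endfunction

   integer     n, errors;
   reg [W-1:0] b, diff;
   initial begin
      errors = 0;
      for (n = 0; n < (1 << W); n = n + 1) begin
         b    = n;
         // Bits that change between this pointer value and the next one
         diff = bin2gray(b) ^ bin2gray(b + 1'b1);
         if (gray2bin(bin2gray(b)) !== b)
            errors = errors + 1;       // decode must invert encode
         if (diff == 0 || (diff & (diff - 1)) != 0)
            errors = errors + 1;       // exactly one bit must change
      end
      $display("errors = %0d", errors);   // prints: errors = 0
   end
endmodule
```

The single-bit-change property is what makes it safe to sample a multi-bit pointer in another clock domain: a sample taken during a transition yields either the old or the new value, never an unrelated one.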

FIFO Instantiation

The FIFO designed above can be instantiated as follows to transfer multi-bit data across asynchronous clock domains.

The write data width is 4 bits, and the write depth is 32.

The read data width is 16 bits, and the read depth is 8, with the programmable full depth set to 16.
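
The two sets of figures describe the same storage: 4 bits x 32 entries and 16 bits x 8 entries are both 128 bits, and the address widths differ by log2(16/4) = 2. A simulation-only sketch that checks this relationship for a given set of parameters (the module name and message text are illustrative):

```verilog
// Illustrative consistency check between write-side and read-side geometry.
module fifo_param_check_sketch;
   parameter AWI = 5, DWI = 4;         // write side: 2**5 entries of 4 bits
   parameter AWO = 3, DWO = 16;        // read side : 2**3 entries of 16 bits
   initial begin
      if ((DWI << AWI) == (DWO << AWO))
         $display("OK: both ports see %0d bits", DWI << AWI);
      else
         $display("MISMATCH: write side %0d bits, read side %0d bits",
                  DWI << AWI, DWO << AWO);
   end
   // With the values above this prints: OK: both ports see 128 bits
endmodule
```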

Example

module fifo_s2b(
      input rstn,
      input [4-1: 0] din, // Asynchronous write data
      input din_clk, // Asynchronous write clock
      input din_en, // Asynchronous write enable

      output [16-1 : 0] dout, // Synchronized data
      input dout_clk, // Synchronization clock
      output dout_en // Synchronized data enable
);

   wire fifo_empty, fifo_full, prog_full;
   wire rd_en_wir;
   wire [15:0] dout_wir;

   // Disable read when empty, otherwise read continuously
   assign rd_en_wir = fifo_empty ? 1'b0 : 1'b1;

   fifo #(.AWI(5), .AWO(3), .DWI(4), .DWO(16), .PROG_DEPTH(16))
   u_buf_s2b(
      .rstn (rstn),
      .wclk (din_clk),
      .winc (din_en),
      .wdata (din),

      .rclk (dout_clk),
      .rinc (rd_en_wir),
      .rdata (dout_wir),

      .wfull (fifo_full),
      .rempty (fifo_empty),

      .prog_full (prog_full)
   );


// Data and enable after cache synchronization
reg dout_en_r;
always @(posedge dout_clk or negedge rstn) begin
   if (!rstn) begin
      dout_en_r <= 1'b0;
   end
   else begin
      dout_en_r <= rd_en_wir;
   end
end
assign dout = dout_wir;
assign dout_en = dout_en_r;

endmodule

Testbench

Example

`timescale 1ns/1ns
`define SMALL2BIG
module test;

`ifdef SMALL2BIG
   reg rstn;
   reg clk_slow, clk_fast;
   reg [3:0] din;
   reg din_en;
   wire [15:0] dout;
   wire dout_en;

   // Reset
   initial begin
      clk_slow = 0;
      clk_fast = 0;
      rstn = 0;
      #50 rstn = 1;
   end

   // Read clock clk_slow runs slightly faster than 1/4 of the write clock clk_fast,
   // so that the read data rate is slightly higher than the write data rate
   parameter CYCLE_WR = 40;
   always #(CYCLE_WR/2/4) clk_fast = ~clk_fast;
   always #(CYCLE_WR/2-1) clk_slow = ~clk_slow;

   // Data generation
   initial begin
      din = 4'h1; // din is 4 bits wide; the former 16'h4321 was truncated to this value
      din_en = 0;
      wait (rstn);
      // (1) Test full, prog_full, empty signals
      force test.u_data_buf2.u_buf_s2b.rinc = 1'b0;
      repeat(32) begin
         @(negedge clk_fast);
         din_en = 1'b1;
         din = {$random()} % 16;
      end
      @(negedge clk_fast) din_en = 1'b0;

      // (2) Test data read and write
      #500;
      rstn = 0;
      #10 rstn = 1;
      release test.u_data_buf2.u_buf_s2b.rinc;
      repeat(100) begin
         @(negedge clk_fast);
         din_en = 1'b1;
         din = {$random()} % 16;
      end

      // (3) Stop reading and test empty, full, prog_full signals again
      force test.u_data_buf2.u_buf_s2b.rinc = 1'b0;
      repeat(18) begin
         @(negedge clk_fast);
         din_en = 1'b1;
         din = {$random()} % 16;
      end
   end

   fifo_s2b u_data_buf2(
         .rstn (rstn),
         .din (din),
         .din_clk (clk_fast),
         .din_en (din_en),

         .dout (dout),
         .dout_clk (clk_slow),
         .dout_en (dout_en));

`else
`endif

   // Stop simulation
   initial begin
      forever begin
         #100;
         if ($time >= 5000) $finish;
      end
   end

endmodule

Simulation Analysis

Based on the 3-step test stimuli in the testbench, the analysis is as follows:

Test (1): The timing results for FIFO ports and some internal signals are as follows.

As shown in the figure, the FIFO starts writing data, and there is a delay before the empty status signal is pulled low, which is caused by synchronizing read and write address information.

Since no FIFO reading operation is performed at this time, relative to the write data operation, the full and prog_full signals are pulled high with almost no delay.

Test (2): When the FIFO is read and written simultaneously, the port signals of the asynchronous processing module at the top level of the digital system are shown below, with two diagrams displaying the reading process at the beginning and end of data transmission.

As shown in the figure, data can be correctly transmitted at the beginning and end, completing the asynchronous processing of multi-bit wide data across different clock domains.

Test (3): The timing simulation diagram of the entire FIFO read and write behavior and the read stop is shown below.

As shown in the figure, once reading and writing proceed simultaneously, the read empty flag rempty is pulled low, indicating that data has been written into the FIFO. However, because the read data rate is slightly higher than the write data rate and synchronization adds delay, rempty is intermittently pulled high during the transfer.

During the read and write process, the full and prog_full signals remain low, indicating that the data in the FIFO has not reached a certain amount. When the read operation is stopped, the two full signals are soon pulled high, indicating that the FIFO is full. By carefully comparing the read and write address information, the FIFO behavior is correct.

The complete FIFO design can be found in the attachment, including the asynchronous design and simulation when the input data width is less than the output data width.

