6.4 Verilog RTL-Level Low Power Design (Part 2)
Gated Clock
A clock tree typically consists of a large number of buffers and inverters. The clock is the signal with the highest toggle rate in the design, and the clock tree can account for up to 30% of the design's total power consumption. Adding a gated clock (clock gating) circuit reduces the switching activity of the clock tree and so saves switching power. Less switching on the clock pins also lowers the internal power consumption of the registers. Clock gating is therefore an effective way to reduce power consumption.
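As a reminder of why toggle rate matters, the usual first-order model of CMOS switching power is sketched below. The symbols are not taken from the text above and are assumed here: α is the switching activity of a node (the average number of charging transitions per clock cycle), C_L is the switched load capacitance, V_DD is the supply voltage, and f_clk is the clock frequency.

```latex
P_{\text{switch}} = \alpha \, C_L \, V_{DD}^{2} \, f_{\text{clk}}
```

Under this model, gating leaves C_L, V_DD and f_clk unchanged. It lowers the effective α of the gated clock branch, roughly in proportion to the fraction of cycles in which the enable is low.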
Implementation Principle
Put simply, clock gating means shutting off the clock to a module or flip-flop while it is idle, without affecting the normal function of the logic. Because the clock is then no longer always present, it is called a gated clock.
There are three main ways to implement a gated clock.
1. Using AND Logic
The simplest method is to AND the clock directly with the clock-enable (gating) signal.
For example, gating the clock of a RAM in this way looks as follows:
Example
// use and-logic
module clkgate_basic
    (
        input        clk,
        input        clken,
        input        rstn,
        input        wr_en,
        input [3:0]  addr,
        input [7:0]  data,
        output [7:0] q
    );

    //clk gate
    wire clk_gate = clk & clken;

    ram #(4, 8)
    u1_ram16x8
    (
        .CLK (clk_gate),
        .A   (addr),
        .D   (data),
        .EN  (clken),
        .WR  (wr_en),
        .Q   (q)
    );
endmodule
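One property of plain AND gating is worth checking in simulation: if clken changes while clk is high, the gated clock gets a shortened pulse. The following is a minimal sketch of that effect, not part of the original design. The module name, the delays and the 50 ns threshold are all chosen here for illustration.

```verilog
`timescale 1ns/1ns
// Hypothetical demo module; not part of the original design.
module and_gate_glitch_demo;
    reg  clk   = 0;
    reg  clken = 0;
    wire clk_gate = clk & clken;      // same AND gating as clkgate_basic

    integer runt_pulses = 0;          // gated pulses shorter than half a period
    time    rise_t      = 0;          // time of the last rising edge of clk_gate

    // 100 ns period: clk is high during [50,100), [150,200), [250,300), ...
    always #50 clk = ~clk;

    always @(posedge clk_gate) rise_t = $time;

    always @(negedge clk_gate)
        // Skip the x->0 settling at time 0, then flag any pulse under 50 ns.
        if ($time > 0 && ($time - rise_t) < 50)
            runt_pulses = runt_pulses + 1;

    initial begin
        #20  clken = 1;   // t=20 : clk low, safe; full pulse from 50 to 100
        #100 clken = 0;   // t=120: clk low, safe
        #150 clken = 1;   // t=270: clk HIGH, gated clock rises mid-cycle
        #50  clken = 0;   // t=320: clk low, safe
        #100;
        $display("runt pulses on clk_gate: %0d", runt_pulses);
        $finish;
    end
endmodule
```

With these delays, the enable raised at 270 ns produces a gated pulse only 30 ns wide, because clk falls at 300 ns. Changing the enable only while clk is low avoids this.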
The RAM model is as follows:
Example
module ram
    #(  parameter AW = 2,
        parameter DW = 3 )
    (
        input               CLK,
        input [AW-1:0]      A,
        input [DW-1:0]      D,
        input               EN,
        input               WR,   //1 for write and 0 for read
        output reg [DW-1:0] Q
    );

    parameter MASK = 3;
    reg [DW-1:0] mem [0:(1<<AW)-1];

    always @(posedge CLK) begin
        if (EN && WR) begin
            mem[A] <= D;
        end
        else if (EN && !WR) begin
            Q <= mem[A];
        end
    end
endmodule
The testbench code is as follows:
Example
`timescale 1ns/1ns
module test;
    //signals declaration
    reg        rstn;
    reg        clk;
    reg        clken;
    reg        wr_en;
    reg [3:0]  addr;
    reg [7:0]  data;
    wire [7:0] q;

    //reset: active low, released after 7 ns
    initial begin
        rstn = 0;
        #7 rstn = 1;
    end

    //clock: 100 ns period
    always begin
        #50 clk = 0;
        #50 clk = 1;
    end

    //data logic
    initial begin
        clken = 0;
        wr_en = 0;
        addr  = 4'h3;
        data  = 8'h31;
        # 53;
        //(1) normal write and read
        clken = 1;
        wr_en = 1;
        repeat(9) begin
            @(negedge clk);
            data = data + 1;
            addr = addr + 1;
        end
        @(negedge clk);
        clken = 0;
        wr_en = 0;

        //read
        #211;
        addr  = 4'h3;
        clken = 1;
        repeat(9) begin
            @(negedge clk);
            addr = addr + 1;
        end
        @(negedge clk);
        //end
        clken = 0;
    end // initial begin

    clkgate_basic u_ram_clkgate
    (
        .clk   (clk),
        .clken (clken),
        .rstn  (rstn),
        .wr_en (wr_en),
        .addr  (addr),
        .data  (data),
        .q     (q)
    );

    //simulation finish
    always begin
        #100;
        if ($time >= ...) $finish; //the end time is cut off in the source
    end
endmodule
The RTL front-end simulation is shown in the figure below:
The post-synthesis simulation waveforms often appear as shown in the figure below:
Comparing the two waveforms shows that the EN signal is no longer present after synthesis, and that the gated clock (the CP pin) no longer toggles continuously. The logic stays correct while clock toggling, and with it power consumption, is reduced.
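The reduction in clock toggling can also be checked without waveforms by counting rising edges. Below is a minimal sketch under assumed conditions: the module name, the counters and the enable window are made up for illustration, and the enable is changed only while clk is low.

```verilog
`timescale 1ns/1ns
// Hypothetical demo module; not part of the original design.
module clk_toggle_count_demo;
    reg  clk   = 0;
    reg  clken = 0;
    wire clk_gate = clk & clken;      // AND-gated clock, as in clkgate_basic

    integer free_edges  = 0;          // rising edges of the free-running clock
    integer gated_edges = 0;          // rising edges of the gated clock

    // 100 ns period: rising edges at 50, 150, 250, ... ns
    always #50 clk = ~clk;

    always @(posedge clk)      free_edges  = free_edges  + 1;
    always @(posedge clk_gate) gated_edges = gated_edges + 1;

    initial begin
        #120 clken = 1;   // t=120: clk is low, enable the clock
        #300 clken = 0;   // t=420: clk is low, disable the clock
        #600;             // t=1020: stop and report
        $display("free-running edges = %0d, gated edges = %0d",
                 free_edges, gated_edges);
        $finish;
    end
endmodule
```

With this enable window the gated clock delivers 3 rising edges (at 150, 250 and 350 ns) against 10 on the free-running clock, so anything clocked from clk_gate toggles correspondingly less.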