Guide for RTL design and SystemVerilog development on the RISC-V CPU. Use when asked about CPU architecture, instruction set, FSM states, memory interface, or SystemVerilog coding conventions.
This is a multi-cycle non-pipelined RISC-V RV32IMACF CPU implementation in SystemVerilog with Rust-based verification.
Key Components:
top (CPU)
├── fetch_buffer (RV32C fetch buffer - manages compressed instruction alignment)
├── decompress (RV32C instruction decompressor - combinational)
├── decoder (Instruction decoder)
├── alu (ALU operations - RV32I + M extension)
│ └── div_unit (Hardware division unit)
├── regfile (Register file)
├── csr_file (Control and Status Registers)
├── branch_unit (Branch comparison)
├── mem_interface (Memory interface logic)
└── writeback_mux (Result selection)
The CPU uses a 12-state finite state machine:
(State diagram not reproduced here; its transitions are gated by imem_ready and dmem_ready.)
Different instruction types require different numbers of cycles:
| Instruction Class | Base Cycles | States |
|---|---|---|
| R-type (ADD, SUB, etc.) | 4 | FETCH → DECODE → EXECUTE → WRITEBACK |
| I-type Arithmetic | 4 | FETCH → DECODE → EXECUTE → WRITEBACK |
| Load (LW, LH, LB) | 5 | FETCH → DECODE → MEM_ADDR → MEM_READ → WRITEBACK |
| Store (SW, SH, SB) | 4 | FETCH → DECODE → MEM_ADDR → MEM_WRITE |
| Branch | 3 | FETCH → DECODE → BRANCH |
| Jump (JAL/JALR) | 4 | FETCH → DECODE → EXECUTE → WRITEBACK |
| Upper Immediate | 4 | FETCH → DECODE → EXECUTE → WRITEBACK |
| M-Extension (MUL/DIV) | 4 | FETCH → DECODE → EXECUTE → WRITEBACK |
| System (FENCE) | 2 | FETCH → DECODE |
| System (ECALL/EBREAK) | 2 | FETCH → DECODE → HALT |
| CSR Operations | 4 | FETCH → DECODE → CSR → WRITEBACK |
Note: Memory latency adds additional cycles. For example, with 3-cycle memory latency, a load instruction takes 5 base cycles + 3 cycles in FETCH + 3 cycles in MEM_READ = 11 total cycles.
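The wait states behind that arithmetic can be sketched in SystemVerilog. The module below is hypothetical and is not taken from rtl/. It models only the load path. It assumes the FSM stays in FETCH until imem_ready and in MEM_READ until dmem_ready, which is where the note places the extra cycles. The state encoding and port list are invented for illustration; the real FSM has 12 states and serves every instruction class. Consider a memory that raises ready on the fourth cycle of a request, which gives 3 wait cycles. With that memory, the sketch spends 4 (FETCH) + 1 (DECODE) + 1 (MEM_ADDR) + 4 (MEM_READ) + 1 (WRITEBACK) = 11 cycles per load, matching the example above.

```systemverilog
`default_nettype none

// Hypothetical sketch, not the rtl/ implementation: only the load path of the
// multi-cycle FSM. State names follow the cycle table; the encoding, ports and
// ready semantics are assumptions.
module load_fsm_sketch (
    input  wire  clk,
    input  wire  rst,             // active-high synchronous reset
    input  wire  imem_ready,      // instruction data valid this cycle
    input  wire  dmem_ready,      // data memory operation complete this cycle
    output logic imem_req,
    output logic dmem_req,
    output logic instr_complete   // 1-cycle pulse when the load finishes
);
    typedef enum logic [2:0] {
        S_FETCH, S_DECODE, S_MEM_ADDR, S_MEM_READ, S_WRITEBACK
    } state_t;

    state_t state, next_state;

    // Next-state logic: FETCH and MEM_READ hold until their ready input is high.
    always_comb begin
        next_state = state;
        unique case (state)
            S_FETCH:     if (imem_ready) next_state = S_DECODE;
            S_DECODE:    next_state = S_MEM_ADDR;   // sketch: every instruction is a load
            S_MEM_ADDR:  next_state = S_MEM_READ;
            S_MEM_READ:  if (dmem_ready) next_state = S_WRITEBACK;
            S_WRITEBACK: next_state = S_FETCH;
            default:     next_state = S_FETCH;
        endcase
    end

    always_ff @(posedge clk) begin
        if (rst) state <= S_FETCH;
        else     state <= next_state;
    end

    assign imem_req       = (state == S_FETCH);
    assign dmem_req       = (state == S_MEM_READ);
    assign instr_complete = (state == S_WRITEBACK);
endmodule

`default_nettype wire
```

A memory with different handshake timing, for example one that asserts ready in the same cycle as the request, would shift the count accordingly.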
The multi-cycle design adds handshaking signals:
Instruction Memory:
- imem_req (output): CPU requests instruction fetch
- imem_ready (input): Memory has valid instruction data
- imem_addr (output): Instruction address
- imem_data (input): Instruction data

Data Memory:
- dmem_req (output): CPU requests memory operation
- dmem_ready (input): Memory operation complete
- dmem_addr (output): Data address
- dmem_wdata (output): Write data
- dmem_rdata (input): Read data
- dmem_we (output): Write enable
- dmem_re (output): Read enable
- dmem_size (output): Operation size (byte/halfword/word)

Status:
- instr_complete (output): High for 1 cycle when an instruction finishes execution

Naming conventions:
- snake_case for signal names
- Prefixes that identify the interface or unit: imem_, dmem_, alu_, etc.
- RISC-V instruction field names: rs1, rs2, rd, funct3, etc.

Reset convention:
- Use an active-high synchronous reset named rst for internal RTL modules. The supported FPGA flows map these resets efficiently without extra inversion logic, so active-high remains the project default.
- Write sequential logic as always_ff @(posedge clk) (or the clock of the local clock domain) and perform reset inside the block with if (rst).
- When a register is qualified by a valid, pending, or similar control bit, reset the control bit rather than the payload register itself. Write or refresh the payload whenever you capture new data, typically in the same branch where you set the control bit. Downstream logic must ignore the payload whenever that control bit is low.

`default_nettype Guards (MANDATORY)

Every .sv file must begin with `default_nettype none and end with `default_nettype wire:
`default_nettype none
// … file content …
`default_nettype wire
- `default_nettype none turns implicit net declarations into compile errors, catching undeclared signals before they silently become 1-bit wires.
- `default_nettype wire restores the default so the guard does not leak into files the compiler reads after this one.

Existing .sv files already carry these guards. Any new file added under rtl/ must include them.

# Lint SystemVerilog files before committing (RTL files are in subdirectories)
find rtl/common -name '*.sv' -exec verilator --lint-only --Wno-MULTITOP {} +
All SystemVerilog code should pass Verilator linting before being committed.
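A small file that follows the conventions in this guide can serve as a template for new code under rtl/. Those conventions are the default_nettype guards, an active-high synchronous rst, and a reset valid bit guarding a payload that is not reset. The module below is a hypothetical sketch. Its name and the consume input are assumptions for illustration. It is intended to lint cleanly with the Verilator command above, but it has not been checked against this repository's lint setup.

```systemverilog
`default_nettype none

// Hypothetical example (not part of rtl/): capture dmem_rdata when the data
// memory reports completion and hold it until a consumer takes it.
module rdata_capture_sketch (
    input  wire         clk,
    input  wire         rst,          // active-high, synchronous
    input  wire         dmem_ready,   // memory operation completed this cycle
    input  wire  [31:0] dmem_rdata,   // read data, meaningful only with dmem_ready
    input  wire         consume,      // hypothetical: downstream has used the data
    output logic        rdata_valid,  // control bit: cleared by reset
    output logic [31:0] rdata_q       // payload: deliberately not reset
);
    always_ff @(posedge clk) begin
        if (rst) begin
            rdata_valid <= 1'b0;           // reset only the control bit
        end else if (dmem_ready) begin
            rdata_valid <= 1'b1;
            rdata_q     <= dmem_rdata;     // refresh payload where valid is set
        end else if (consume) begin
            rdata_valid <= 1'b0;           // payload keeps its stale value
        end
    end
endmodule

`default_nettype wire
```

Because rdata_q is written only in the branch that sets rdata_valid and never in the reset branch, the payload flops need no reset. Only rdata_valid is cleared, and consumers must check it before using rdata_q.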
# Verify RTL can be synthesized to FPGA (whenever SystemVerilog is modified)
(cd rtl/fpga && make)
Important: CI automatically runs FPGA synthesis verification on all SystemVerilog changes. The design must successfully synthesize to the default ECP5 target (ecp5_icepi_zero) using Yosys/nextpnr-ecp5.
Key constraints:
- ENABLE_F_EXT=0; check current build reports for the latest resource/timing headroom.

CRITICAL RULE: When debugging hardware, NEVER rely heavily on abstract reasoning about what signals "should" be doing.
- Add $display() statements to observe actual signal values:

always_ff @(posedge clk) begin
if (state == S_FETCH) begin
$display("FETCH: pc=%h instr=%h imem_ready=%b", pc, imem_data, imem_ready);
end
if (state == S_EXECUTE) begin
$display("EXECUTE: alu_op=%h rs1_data=%h rs2_data=%h result=%h",
alu_op, rs1_data, rs2_data, alu_result);
end
end
Key Principle: Treat hardware debugging like experimental science - observe first, then reason based on evidence.
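The same principle applies to the cycle table. A simulation-only monitor can print one line per retired instruction and measure how many cycles each one actually took, using the instr_complete pulse. This is a hypothetical sketch. The module name, the pc input, and one timing detail are illustrative assumptions. That detail is that an instruction's first cycle immediately follows the previous instr_complete or reset. The monitor is meant for testbenches rather than the synthesized design.

```systemverilog
`default_nettype none

// Hypothetical simulation-only monitor: counts clock cycles between
// instr_complete pulses and prints one line per retired instruction.
module retire_monitor_sketch (
    input  wire         clk,
    input  wire         rst,             // active-high synchronous reset
    input  wire         instr_complete,  // 1-cycle pulse per finished instruction
    input  wire  [31:0] pc,              // assumption: PC of the finishing instruction
    output logic [31:0] last_cycles      // cycles taken by the most recent instruction
);
    logic [31:0] cycle_cnt;              // cycles spent on the current instruction so far

    always_ff @(posedge clk) begin
        if (rst) begin
            cycle_cnt <= 32'd1;          // first cycle after reset is cycle 1
        end else if (instr_complete) begin
            last_cycles <= cycle_cnt;
            cycle_cnt   <= 32'd1;        // next instruction starts in the next cycle
            $display("RETIRE: pc=%h cycles=%0d", pc, cycle_cnt);
        end else begin
            cycle_cnt <= cycle_cnt + 32'd1;
        end
    end
endmodule

`default_nettype wire
```

Comparing the printed counts against the cycle table, including the extra memory-latency cycles, gives direct evidence of where the FSM is stalling.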