1 Learning Outcomes

2 Building a processor that adds

To start off, let’s build the simplest processor we can, one that supports only a single instruction: add. Programs will just be a series of adds:

add x18 x18 x10
add x18 x18 x18
add ...

In order to support add in our datapath, we consider the two state elements changed by this instruction’s operations:

Other state elements:

Figure 1: For now, we disconnect DMEM since it is unused for add. We will add it back when we discuss loads and stores.

3 Tracing the add Datapath

Given the above analysis, we can now connect wires between key elements of our processor.

Figure 2: The add datapath.

  1. Instruction Fetch:

    • On the rising clock edge, the pc wire updates to the address of the instruction to execute in this cycle. It feeds into IMEM, which, after some delay, updates the inst output signal.

    • Increment the PC to the next instruction. The pc wire also feeds into a small adder that adds 4. The output of this small adder is wired to the input of the PC register, set up and ready to update on the next rising clock edge.

  2. Instruction Decode: We only have one instruction, so decoding reduces to extracting the bit fields that identify the registers. We use the green card and our R-Type format to introduce a splitter on the inst signal to “index” into the RegFile as follows:

    • Wire inst[7:11] (bits 7 through 11, inclusive) to the rd input of RegFile.

    • Wire inst[15:19] to the rs1 input of RegFile.

    • Wire inst[20:24] to the rs2 input of RegFile.

    After some delay, the RegFile updates the rdata1 and rdata2 signals to the values of R[rs1] and R[rs2], where rs1 and rs2 are determined from the instruction inst.

  3. Execute: Our ALU (see below) should perform the addition operation. For now, we just mark this block as an Adder. Feed the two RegFile output signals into the A and B inputs of the “ALU.” After some delay, the output signal of the ALU updates to the sum R[rs1] + R[rs2].

  4. Memory: (We don’t access memory, so skip this.)

  5. Write Back: Connect the output signal of the ALU to the wdata input signal of the RegFile. Set the RegFile control signal RegWEn to 1 to indicate that wdata should be written to R[rd] on the next rising clock edge.

    Around the next rising clock edge, wdata, RegWEn, and rd should be held stable through setup and hold time of RegFile.
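The five steps above can be sketched in software. The following is a minimal Python model of one clock cycle of the add-only datapath, not a hardware description. The names `encode_add`, `step`, and the word-indexed `imem` list are hypothetical, introduced here only for illustration, and the model assumes the standard RISC-V convention that x0 is hard-wired to zero.

```python
MASK32 = 0xFFFFFFFF  # keep every value within 32 bits

def encode_add(rd, rs1, rs2):
    """Assemble an R-type add (funct7 = 0, funct3 = 0, opcode = 0110011)."""
    return (rs2 << 20) | (rs1 << 15) | (rd << 7) | 0b0110011

def step(pc, regs, imem):
    """Model one clock cycle; returns the PC value for the next cycle."""
    # 1. Instruction Fetch: imem is a list of words, so divide the byte address by 4.
    inst = imem[pc // 4]
    pc_next = pc + 4                      # the small adder that adds 4

    # 2. Instruction Decode: split inst into register indices, then read the RegFile.
    rd = (inst >> 7) & 0x1F               # bits 7 through 11
    rs1 = (inst >> 15) & 0x1F             # bits 15 through 19
    rs2 = (inst >> 20) & 0x1F             # bits 20 through 24
    rdata1, rdata2 = regs[rs1], regs[rs2]

    # 3. Execute: the "ALU" is just an adder for now.
    alu_out = (rdata1 + rdata2) & MASK32

    # 4. Memory: skipped, since add does not access DMEM.

    # 5. Write Back: RegWEn = 1, so wdata is written to R[rd].
    #    Assumption: x0 ignores writes, as in standard RISC-V.
    if rd != 0:
        regs[rd] = alu_out
    return pc_next

regs = [0] * 32
regs[18], regs[10] = 5, 7
imem = [encode_add(18, 18, 10),           # add x18 x18 x10
        encode_add(18, 18, 18)]           # add x18 x18 x18
pc = 0
pc = step(pc, regs, imem)                 # x18 = 5 + 7
pc = step(pc, regs, imem)                 # x18 = 12 + 12
print(pc, regs[18])                       # prints: 8 24
```

In real hardware all of these steps happen within a single clock period as signals propagate; the sequential Python statements only mirror the order in which the signals settle.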

4 Building a processor that adds and subs

Next, let’s improve our processor by supporting two instructions: add and sub. Example program:

add x18 x18 x10
sub x18 x18 x18
sub ...
add ...

Let’s again consider the state elements changed by these instructions’ operations:

sub is almost the same as add, except now the ALU subtracts. We implement the support for both add and sub by assuming more complexity in the Control Logic “block” (Figure 3).

Figure 3: To implement sub and add, we update control logic.

How do we determine add or sub? Recall our discussion of Design Decisions for R-Type: add and sub have the same opcode and funct3 fields, but different funct7 fields. Importantly, the inst[30] bit is 1 for sub and 0 for add.
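As a rough illustration of this decision, the sketch below models control logic that inspects inst[30] and an ALU that either adds or subtracts. The function names (`encode_r`, `control`, `alu`) and the string-valued ALUSel are assumptions made for readability; a real datapath would use a numeric select signal.

```python
MASK32 = 0xFFFFFFFF

def encode_r(funct7, rs2, rs1, funct3, rd, opcode=0b0110011):
    """Pack R-type fields into a 32-bit instruction word."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)

def control(inst):
    """Control logic for an add/sub-only processor: returns (RegWEn, ALUSel)."""
    bit30 = (inst >> 30) & 1              # 0 for add, 1 for sub
    alu_sel = "sub" if bit30 else "add"   # hypothetical string encoding of ALUSel
    return 1, alu_sel                     # both instructions write R[rd]

def alu(a, b, alu_sel):
    """Two-operation ALU; results wrap around to 32 bits."""
    result = a - b if alu_sel == "sub" else a + b
    return result & MASK32

add_inst = encode_r(0b0000000, 10, 18, 0b000, 18)   # add x18 x18 x10
sub_inst = encode_r(0b0100000, 18, 18, 0b000, 18)   # sub x18 x18 x18
print(control(add_inst), control(sub_inst))         # prints: (1, 'add') (1, 'sub')
```

Because opcode and funct3 are identical for the two instructions, this two-instruction processor needs to examine only one bit of funct7.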

5 Building an R-Type processor

We can extend our reasoning above to build a processor that implements all R-Type instructions:

6 Arithmetic Logic Unit (ALU)

We encourage revisiting this section after reading a few more example datapath traces.

In the previous chapter we implemented a basic four-operation ALU. As shown in Figure 4 and Table 1, the RISC-V ALU takes the same input and output, but the control signal ALUSel is much wider to accommodate the functionality needed for the full RISC-V datapath.

Figure 4: ALU Block.

Table 1: Signals for ALU Block

Name        Direction   Bit Width   Description
A           Input       32          Data to use for Input A in the ALU operation
B           Input       32          Data to use for Input B in the ALU operation
ALUSel      Input       4           Selects which operation the ALU should perform (see Table 2)
ALUResult   Output      32          Result of the ALU operation

In the full RISC-V implementation, our ALU (Figure 4) must perform arithmetic for many signals:

6.1 Course Project Details

Below, we detail the ALU operations that must be implemented for the course project’s datapath.

Table 2: Operations for ALU Block for the course project

ALUSel Value (for Project)   Operation   ALU Function
0                            add         ALUResult = A + B
1                            sll         ALUResult = A << B[4:0]
2                            slt         ALUResult = (A < B (signed)) ? 1 : 0
3                            Unused      -
4                            xor         ALUResult = A ^ B
5                            srl         ALUResult = (unsigned) A >> B[4:0]
6                            or          ALUResult = A | B
7                            and         ALUResult = A & B
8                            mul         ALUResult = (signed) (A * B)[31:0]
9                            mulh        ALUResult = (signed) (A * B)[63:32]
10                           Unused      -
11                           mulhu       ALUResult = (A * B)[63:32]
12                           sub         ALUResult = A - B
13                           sra         ALUResult = (signed) A >> B[4:0]
14                           Unused      -
15                           bsel        ALUResult = B
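To make the table concrete, here is a small behavioral sketch of the ALU in Python. It is a software model under stated assumptions, not the project's circuit: the helper `to_signed` and the function name `alu` are hypothetical, and raising an error for the unused ALUSel values (3, 10, 14) is a choice made here, since the table does not define their output.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Reinterpret a 32-bit pattern as a two's complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def alu(a, b, alu_sel):
    """Behavioral model of the ALU operations listed in Table 2."""
    a &= MASK32
    b &= MASK32
    sa, sb = to_signed(a), to_signed(b)
    shamt = b & 0x1F                       # B[4:0]
    if alu_sel == 0:                       # add
        result = a + b
    elif alu_sel == 1:                     # sll
        result = a << shamt
    elif alu_sel == 2:                     # slt (signed comparison)
        result = 1 if sa < sb else 0
    elif alu_sel == 4:                     # xor
        result = a ^ b
    elif alu_sel == 5:                     # srl (logical: zero-fills from the left)
        result = a >> shamt
    elif alu_sel == 6:                     # or
        result = a | b
    elif alu_sel == 7:                     # and
        result = a & b
    elif alu_sel == 8:                     # mul (lower 32 bits of the product)
        result = sa * sb
    elif alu_sel == 9:                     # mulh (upper 32 bits, signed x signed)
        result = (sa * sb) >> 32
    elif alu_sel == 11:                    # mulhu (upper 32 bits, unsigned x unsigned)
        result = (a * b) >> 32
    elif alu_sel == 12:                    # sub
        result = a - b
    elif alu_sel == 13:                    # sra (arithmetic: sign-extends)
        result = sa >> shamt
    elif alu_sel == 15:                    # bsel
        result = b
    else:
        # Assumption: unused select values are treated as errors in this model.
        raise ValueError(f"unused ALUSel value: {alu_sel}")
    return result & MASK32                 # ALUResult is always 32 bits wide

print(hex(alu(0x80000000, 4, 5)))          # srl prints: 0x8000000
print(hex(alu(0x80000000, 4, 13)))         # sra prints: 0xf8000000
```

The two printed lines show the difference between srl and sra on the same inputs: the logical shift fills with zeros, while the arithmetic shift copies the sign bit.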

Observations/reminders:

6.2 General Multiplication

An ALU that implements the mul, mulh, and mulhu instructions can support parts of the RISC-V “M” extension.

Instruction         Name                              Description                                   Type   Opcode     Funct3   Funct7
mul rd rs1 rs2      MULtiply                          R[rd] = (R[rs1] * R[rs2])[31:0]               R      011 0011   000      000 0001
mulh rd rs1 rs2     MULtiply Higher Bits              R[rd] = (R[rs1] * R[rs2])[63:32] (Signed)     R      011 0011   001      000 0001
mulhu rd rs1 rs2    MULtiply Higher Bits (Unsigned)   R[rd] = (R[rs1] * R[rs2])[63:32] (Unsigned)   R      011 0011   011      000 0001

The product of two 32-bit numbers can be up to 64 bits wide, but we’re limited to 32-bit data lines, so mulh and mulhu are used to get the upper 32 bits of the product. The Multiplier component has a Carry Out output (described as “the upper bits of the product”), which may be particularly useful for certain multiply operations.
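A short worked example may help show why both mulh and mulhu exist. The Python sketch below (the helper names are hypothetical, and Python's unbounded integers stand in for the 64-bit product) multiplies the bit pattern 0xFFFFFFFF by 2. The lower 32 bits are the same either way, but the upper 32 bits depend on whether the operands are interpreted as signed or unsigned.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Reinterpret a 32-bit pattern as a two's complement signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def mul(a, b):
    """Lower 32 bits of the product (identical for signed and unsigned)."""
    return (a * b) & MASK32

def mulh(a, b):
    """Upper 32 bits of the signed x signed 64-bit product."""
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32

def mulhu(a, b):
    """Upper 32 bits of the unsigned x unsigned 64-bit product."""
    return ((a * b) >> 32) & MASK32

a, b = 0xFFFFFFFF, 2   # a is -1 when signed, 4294967295 when unsigned
print(hex(mul(a, b)), hex(mulh(a, b)), hex(mulhu(a, b)))
# prints: 0xfffffffe 0xffffffff 0x1
```

Read as signed values, the product is -2, whose upper half is all ones; read as unsigned values, the product is 0x1FFFFFFFE, whose upper half is 1.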