CPU (RISC, basic)

The following is a design for a simple 32-bit central processing unit (CPU). It is an example of a Reduced Instruction Set Computer (RISC) with a hardwired control unit. The CPU is pipelined, that is to say it is constructed so that the execution of multiple instructions can overlap. The design is quite basic, and there is no hardware support for handling pipeline hazards.

Instruction Set Architecture

The CPU's instruction set architecture (instructions and registers) is detailed in this section.

Registers

There are 32 register addresses but only 31 registers (1 to 31). Each register is 32 bits wide. Register 0 always reads as zero, and loads to it have no effect.
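The register-zero convention can be summarised behaviourally. The following is a minimal sketch and not the structural register file of this design; the module name register_file_sketch and the use of the rising clock edge are assumptions made for illustration.

```verilog
// Behavioural sketch of the register conventions (hypothetical module,
// not part of the design): 31 storage registers, address 0 reads as zero.
module register_file_sketch(A, B, D, AA, BA, DA, Load, CLK);
   output [31:0] A;     // Contents of the register addressed by AA.
   output [31:0] B;     // Contents of the register addressed by BA.
   input [31:0]  D;     // Data to load.
   input [4:0]   AA;    // Read address A.
   input [4:0]   BA;    // Read address B.
   input [4:0]   DA;    // Load address.
   input         Load;  // Load enable, active high.
   input         CLK;   // Clock (rising edge assumed).

   reg [31:0]    R [1:31];  // Only registers 1 to 31 have storage.

   // Address 0 always reads as zero.
   assign A = (AA == 5'd0) ? 32'h00000000 : R[AA];
   assign B = (BA == 5'd0) ? 32'h00000000 : R[BA];

   // A load to address 0 has no effect.
   always @(posedge CLK)
     if (Load && DA != 5'd0)
       R[DA] <= D;
endmodule
```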

Instructions

For each instruction the following is given:

a description of the instruction in English,
a 32-bit wide bit pattern which identifies the instruction, and
a description of the instruction in the form of a register transfer language (RTL) expression.

The symbols used are as follows:

⇐ - a register transfer.
& - the bitwise and (conjunction) operator.
| - the bitwise or (disjunction) operator.
^ - the bitwise exclusive-or operator.
~ - the bitwise negation/complement operator.
= - the equality operator.
≠ - the inequality operator.
if - the conditional operator.
MEMORY[L] - the contents of the memory at location L.
PC - the Program Counter.
REG[A] - the contents of the register with address A.
sign_extend(V) - the value V is sign extended to 32 bits. Example: sign_extend(101) = 11111111111111111111111111111101.
zero_fill(V) - the value V is extended to 32 bits with zeroes. Example: zero_fill(101) = 00000000000000000000000000000101.
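For the 16-bit immediate field used by this CPU, the two extension operators could be expressed as below. This is a sketch only: the module name extend_sketch and its ports V32 and SE are hypothetical and are not taken from the design.

```verilog
// Hypothetical illustration of sign_extend and zero_fill for a 16-bit value.
module extend_sketch(V32, IMM, SE);
   output [31:0] V32;  // The extended 32-bit value.
   input [15:0]  IMM;  // The 16-bit value to extend.
   input         SE;   // 1 = sign_extend, 0 = zero_fill.

   // sign_extend: replicate the top bit of IMM into the upper 16 bits.
   wire [31:0]   sign_extended = {{16{IMM[15]}}, IMM};
   // zero_fill: set the upper 16 bits to zero.
   wire [31:0]   zero_filled   = {16'h0000, IMM};

   assign V32 = SE ? sign_extended : zero_filled;
endmodule
```

With IMM = 16'hFFFD, the sign-extended result is 32'hFFFFFFFD and the zero-filled result is 32'h0000FFFD.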

Generic Format

The generic format for instructions is given in the following table.

Instruction Format
3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
Opcode						DR					SA					SB
Opcode						DR					SA					IMM

Here DR is Destination Register, SA is Source Register A, SB is Source Register B, and IMM is IMMediate operand.
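As an illustration of the two formats, the fragment below packs the fields into a 32-bit word. It is a sketch under assumptions: the module and function names are hypothetical, and bits 10 to 0 of the register format, which the table leaves unspecified, are assumed to be zero. The opcodes used are those of ADD (000001) and ADI (010001) from the instruction list.

```verilog
// Hypothetical helper functions for assembling instruction words.
module encode_sketch;
   // Register format: Opcode | DR | SA | SB | 11 unused bits (assumed zero).
   function [31:0] encode_reg;
      input [5:0] opcode;
      input [4:0] dr, sa, sb;
      begin
         encode_reg = {opcode, dr, sa, sb, 11'b0};
      end
   endfunction

   // Immediate format: Opcode | DR | SA | IMM.
   function [31:0] encode_imm;
      input [5:0]  opcode;
      input [4:0]  dr, sa;
      input [15:0] imm;
      begin
         encode_imm = {opcode, dr, sa, imm};
      end
   endfunction

   initial begin
      // ADD: REG[3] <= REG[1] + REG[2]
      $display("%h", encode_reg(6'b000001, 5'd3, 5'd1, 5'd2));     // 04611000
      // ADI: REG[1] <= REG[0] + sign_extend(16'hFFFF), i.e. 0 + (-1)
      $display("%h", encode_imm(6'b010001, 5'd1, 5'd0, 16'hFFFF)); // 4420ffff
   end
endmodule
```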

Instruction List

The list of instructions is now given.

AND

Logical AND with register operand.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
0	0	0	0	1	1	DR					SA					SB

REG[DR] ⇐ REG[SA] & REG[SB]

ADD

Addition with register operand.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
0	0	0	0	0	1	DR					SA					SB

REG[DR] ⇐ REG[SA] + REG[SB]

ADI

Addition with signed immediate operand.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
0	1	0	0	0	1	DR					SA					IMM

REG[DR] ⇐ REG[SA] + sign_extend(IMM)

ADU

Addition with unsigned immediate operand.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
1	0	0	0	0	1	DR					SA					IMM

REG[DR] ⇐ REG[SA] + zero_fill(IMM)

ANI

Logical AND with unsigned immediate operand.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
0	1	0	0	1	1	DR					SA					IMM

REG[DR] ⇐ REG[SA] & zero_fill(IMM)

BNZ

Branch if not zero.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
1	0	0	1	1	1						SA					IMM

if REG[SA] ≠ 0 then PC ⇐ PC + sign_extend(IMM)

BZ

Branch if zero.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
1	1	0	1	1	1						SA					IMM

if REG[SA] = 0 then PC ⇐ PC + sign_extend(IMM)

JML

Jump and link.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
1	1	0	0	0	0	DR										IMM

REG[DR] ⇐ PC + 1
PC ⇐ PC + sign_extend(IMM)

JMP

Jump.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
1	1	1	0	0	0											IMM

PC ⇐ PC + sign_extend(IMM)

JMR

Jump to register contents.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
1	1	0	1	0	1	DR					SA

PC ⇐ REG[SA]

LD

Load.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
1	1	0	1	0	0	DR					SA

REG[DR] ⇐ MEMORY[REG[SA]]

LSL

Logical shift left.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
0	0	1	0	0	1	DR					SA					SB

REG[DR] ⇐ REG[SA] << REG[SB]

LSR

Logical shift right.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
0	0	1	0	0	0	DR					SA					SB

REG[DR] ⇐ REG[SA] >> REG[SB]

NOP

No operation.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
0	0	0	0	0	0

Do nothing.

NOT

Logical NOT.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
0	0	0	1	1	0	DR					SA

REG[DR] ⇐ ~REG[SA]

OR

Logical OR with register operand.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
0	0	0	1	0	0	DR					SA					SB

REG[DR] ⇐ REG[SA] | REG[SB]

ORI

Logical OR with unsigned immediate operand.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
0	1	0	1	0	0	DR					SA					IMM

REG[DR] ⇐ REG[SA] | zero_fill(IMM)

SBI

Subtraction with signed immediate operand.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
0	1	0	0	1	0	DR					SA					IMM

REG[DR] ⇐ REG[SA] - sign_extend(IMM)

SBU

Subtraction with unsigned immediate operand.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
1	0	0	0	1	0	DR					SA					IMM

REG[DR] ⇐ REG[SA] - zero_fill(IMM)

SLT

Set if less than.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
1	1	0	0	1	0	DR					SA					SB

if REG[SA] < REG[SB] then REG[DR] ⇐ 1 else REG[DR] ⇐ 0

ST

Store.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
1	1	0	0	1	1						SA					SB

MEMORY[REG[SA]] ⇐ REG[SB]

SUB

Subtraction with register operand.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
0	0	0	0	1	0	DR					SA					SB

REG[DR] ⇐ REG[SA] - REG[SB]

XOR

Logical exclusive OR with register operand.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
0	0	0	1	0	1	DR					SA					SB

REG[DR] ⇐ REG[SA] ^ REG[SB]

XRI

Logical exclusive OR with unsigned immediate operand.

3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
0	1	0	1	0	1	DR					SA					IMM

REG[DR] ⇐ REG[SA] ^ zero_fill(IMM)

CPU Top-level

The top-level module connects up the five pipeline stages, and also handles the reset signal so that execution can start at address zero and no erroneous memory writes or register loads are generated.

	
module cpu(IADDR2, MEMA4, MW4R, MEMDOUT4, MEMDIN4, IDATA2, RESETn, CLK);
   output [31:0] IADDR2;
   output [31:0] MEMA4;
   output 	 MW4R;
   output [31:0] MEMDOUT4;
   input [31:0]  MEMDIN4;   
   input [31:0]  IDATA2;
   input 	 RESETn;   
   input 	 CLK;
   
   wire [31:0] 	 PC1, PC2, PC2R, PC3;
   wire [31:0] 	 IR2;
   wire 	 RL3, RL4, RL5, RL5R;
   wire [4:0] 	 DA3, DA4, DA5;
   wire [4:0] 	 AA3;
   wire [4:0] 	 BA3;
   wire [1:0] 	 MD3, MD4;
   wire [1:0] 	 BS3, BS4, BS4R;
   wire 	 PS3, PS4;
   wire 	 MW3, MW4;
   wire [3:0] 	 FS3;
   wire [4:0] 	 SH3;
   wire [31:0] 	 BRA4;
   wire [31:0] 	 RAA4;
   wire [31:0] 	 RA3;
   wire [31:0] 	 RB3;
   wire [31:0] 	 D5;
   wire [31:0] 	 A3;
   wire [31:0] 	 B3;
   wire [31:0] 	 DMEM4;
   wire [31:0] 	 DALU4;
   wire 	 Z4, Z4R;
   wire 	 LT4;
   
   multiplexer_2_1 #(32)muxPC2(PC2R, 32'h00000000, PC2, RESETn); // Set PC to 0 on reset.
   multiplexer_2_1 #(1)muxZ(Z4R, 1'b0, Z4, RESETn); // Set Z to 0 on reset (so we can set PC).
   multiplexer_2_1 #(2)muxBS(BS4R, 2'b00, BS4, RESETn); // Set BS to 0 on reset (so we can set PC).
   
   multiplexer_2_1 #(1)muxMW(MW4R, 1'b0, MW4, RESETn); // Set MW to 0 on reset.
   multiplexer_2_1 #(1)muxRL(RL5R, 1'b0, RL5, RESETn); // Set RL to 0 on reset.
   
   register_file rf0(RA3, RB3, D5, AA3, BA3, DA5, RL5R, CLK);  // Read in stage 3, Loaded in stage 5.
   
   
   stage1 s1(PC1, PC2R, BS4R, PS4, Z4R, BRA4, RAA4, CLK);
   stage2 s2(IR2, PC2, IADDR2, PC1, IDATA2, CLK);
   stage3 s3(PC3, RL3, DA3, MD3, BS3, PS3, MW3, FS3, SH3, A3, B3, AA3, BA3, RA3, RB3, PC2, IR2, CLK);
   stage4 s4(BS4, PS4, BRA4, RAA4, Z4, MW4, DMEM4, MEMA4, MEMDOUT4, RL4, DA4, MD4, LT4, DALU4, MEMDIN4, PC3, RL3, DA3, MD3, BS3, PS3, MW3, FS3, SH3, A3, B3, CLK);
   stage5 s5(D5, DA5, RL5, DALU4, DMEM4, LT4, MD4, RL4, DA4);
endmodule // cpu
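The reset handling relies on the port order of multiplexer_2_1, whose source is not shown in this section. Judging from the instantiations above, the ports appear to be (output, input 0, input 1, select), with the width given as a parameter, so that a low RESETn selects the constant. A behavioural sketch under that assumption follows; the module name multiplexer_2_1_sketch, the port names, and the parameter name WIDTH are hypothetical.

```verilog
// Assumed behaviour of a parameterised 2-to-1 multiplexer (hypothetical).
module multiplexer_2_1_sketch(X, A0, A1, S);
   parameter WIDTH = 16;      // Overridden at instantiation, e.g. #(32).
   output [WIDTH-1:0] X;      // Selected value.
   input [WIDTH-1:0]  A0;     // Selected when S = 0 (reset active).
   input [WIDTH-1:0]  A1;     // Selected when S = 1 (normal operation).
   input              S;      // Select input.

   assign X = S ? A1 : A0;
endmodule
```

Connected as multiplexer_2_1_sketch #(32) m(PC2R, 32'h00000000, PC2, RESETn), the output is zero while RESETn is low and follows PC2 otherwise.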

Stage 1

The first pipeline stage handles the program counter (PC). The PC is either incremented in the usual case, or set to a given address upon a jump. The cause of the jump is determined by BS4[1] and BS4[0].

Jump Instruction
BS4[1]	BS4[0]	Meaning
0	0	No jump
0	1	Jump conditional (BZ/BNZ)
1	0	Jump unconditional register (JMR)
1	1	Jump unconditional immediate (JMP, JML)

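Which address is used for the jump (BRA or RAA) is determined by BS4[1] and W2, the two select inputs of the PC multiplexer. W2 is 1 when BS4[0] is 1 and the jump is either unconditional or has its condition satisfied.

Jump Destination Source
BS4[1]	W2	Meaning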
0	0	No jump
0	1	Jump BRA
1	0	Jump RAA
1	1	Jump BRA

	
module stage1(PC1, PC2, BS4, PS4, Z4, BRA4, RAA4, CLK);
   output [31:0] PC1;    // PC output.
   input [31:0]  PC2;    // Incremented PC (current PC+1).  
   input [31:0]  RAA4;   // Register contents for JMR
   input [31:0]  BRA4;   // BRanch Address.
   input 	 Z4;     // Zero status bit.
   input 	 PS4;    // Pass Status/complement status. (BZ or BNZ)
   input [1:0] 	 BS4;    // Branch Select.
   input 	 CLK;    // CPU clock.
   
   wire 	 W0, W1, W2; // Misc. temp. wires.
   wire [31:0] 	 PC;
   
   xor(W0, Z4, PS4);     // W0==1 = BZ or BNZ condition satisfied.
   or(W1, W0, BS4[1]);   // BS[1]==0 = conditional jump, BS[1]==1 = unconditional jump.
   and(W2, BS4[0], W1);  // BS[0]==0 = register address, BS[0]==1 = IMMEDIATE address
   multiplexer_4_1 #(32)muxPC(PC, PC2, BRA4, RAA4, BRA4, BS4[1], W2);
   
   register_parallel_load rplPC(PC1, PC, 1'b1, CLK);  // The PC register.
endmodule // stage1
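The gate-level selection in stage1 can be restated behaviourally. The sketch below is an assumed equivalent written for illustration; the module name next_pc_sketch and the output name PCNEXT are hypothetical, and the PC register itself is omitted. It follows the jump tables and the xor/or/and network of the module above.

```verilog
// Behavioural restatement of the stage 1 next-PC selection (hypothetical).
module next_pc_sketch(PCNEXT, PC2, BS4, PS4, Z4, BRA4, RAA4);
   output [31:0] PCNEXT; // Value to be loaded into the PC register.
   input [31:0]  PC2;    // Incremented PC.
   input [1:0]   BS4;    // Branch Select.
   input         PS4;    // Pass Status/complement status.
   input         Z4;     // Zero status bit.
   input [31:0]  BRA4;   // BRanch Address.
   input [31:0]  RAA4;   // Register contents for JMR.

   reg [31:0]    PCNEXT;

   always @(*)
     case (BS4)
       2'b00:   PCNEXT = PC2;                      // No jump.
       2'b01:   PCNEXT = (Z4 ^ PS4) ? BRA4 : PC2;  // BZ/BNZ, taken if the condition holds.
       2'b10:   PCNEXT = RAA4;                     // JMR.
       2'b11:   PCNEXT = BRA4;                     // JMP, JML.
       default: PCNEXT = PC2;
     endcase
endmodule
```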

Stage 2

The second pipeline stage loads the instruction register (IR2) from the instruction memory and increments the PC.

	
module stage2(IR2, PC2, IADDR2, PC1, IDATA2, CLK);
   output [31:0] IR2;    // The Instruction Register.
   output [31:0] PC2;    // The incremented PC.
   output [31:0] IADDR2; // The ADDRess sent to the instruction memory.
   input [31:0]  PC1;    // The PC from the previous stage.
   input [31:0]  IDATA2; // The instruction from the instruction memory.
   input 	 CLK;    // The CPU clock.
   
   wire 	 C, V;  // Ignored status bits from the adder/subtractor circuit.
   wire [31:0] 	 PCinc; // Output of the PC increment operation.
   
   assign IADDR2 = PC1;  // IDATA will be the instruction at the address held in the PC.
   
   carry_select_adder_subtractor addsubPC(PCinc, C, V, PC1, 32'h00000001, 1'b0, 1'b0); // Ignore C, and V. No carry in.
   register_parallel_load rplPC(PC2, PCinc, 1'b1, CLK);   // The PC register.
   register_parallel_load rplIR(IR2, IDATA2, 1'b1, CLK);  // The IR register.
endmodule // stage2

Stage 3

The third pipeline stage decodes the contents of the instruction register and generates the control signals used in other pipeline stages. These signals are loaded into their respective registers.

	
module stage3(PC3, RL3, DA3, MD3, BS3, PS3, MW3, FS3, SH3, A3, B3, AA, BA, RA, RB, PC2, IR2, CLK);
   output [31:0] PC3; // Program Counter
   output 	 RL3; // Register Load.
   output [4:0]  DA3; // Register D Address.
   output [1:0]  MD3; // Bus D Mux select.
   output [1:0]  BS3; // Branch Select.
   output 	 PS3; // Pass Status/complement status.
   output 	 MW3; // Memory Read/Write.
   output [3:0]  FS3; // Function unit operation select.
   output [4:0]  SH3; // SHift amount. 
   output [31:0] A3;  // A bus.
   output [31:0] B3;  // B bus.
   output [4:0]  AA;  // Register A Address.
   output [4:0]  BA;  // Register B Address.
   input [31:0]  RA;  // Contents of register RA.
   input [31:0]  RB;  // Contents of register RB.
   input [31:0]  PC2; // Program Counter.
   input [31:0]  IR2; // Instruction Register.
   input 	 CLK; // CPU clock.

   wire 	 CS;
   wire 	 MA;
   wire  	 MB;
   wire [3:0] 	 FS;
   wire 	 MW;
   wire 	 PS;
   wire [1:0] 	 BS;
   wire [1:0] 	 MD;
   wire 	 RL;
   wire [4:0] 	 DA;
   wire [31:0] 	 CONSTANT;
   wire [31:0] 	 A;
   wire [31:0] 	 B;
   
   instruction_decoder id(AA, BA, DA, RL, MD, BS, PS, MW, FS, MB, MA, CS, IR2);
   constant_unit cu0(CONSTANT, IR2[15:0], CS);
   
   multiplexer_2_1 #(32)muxA(A, RA, PC2, MA);
   multiplexer_2_1 #(32)muxB(B, RB, CONSTANT, MB);

   register_parallel_load rplPC(PC3, PC2, 1'b1, CLK);  // The PC register.
   register_parallel_load_1 rplRL(RL3, RL, 1'b1, CLK); // The RL register.
   register_parallel_load_5 rplDA(DA3, DA, 1'b1, CLK); // The DA register.
   register_parallel_load_2 rplMD(MD3, MD, 1'b1, CLK); // The MD register.
   register_parallel_load_2 rplBS(BS3, BS, 1'b1, CLK); // The BS register.
   register_parallel_load_1 rplPS(PS3, PS, 1'b1, CLK); // The PS register.
   register_parallel_load_1 rplMW(MW3, MW, 1'b1, CLK); // The MW register.
   register_parallel_load_4 rplFS(FS3, FS, 1'b1, CLK); // The FS register.
   register_parallel_load_5 rplSH(SH3, IR2[4:0], 1'b1, CLK);  // The SH register.
   register_parallel_load rplA(A3, A, 1'b1, CLK);  // The A register.
   register_parallel_load rplB(B3, B, 1'b1, CLK);  // The B register.
endmodule // stage3

Stage 4

The fourth pipeline stage executes the instruction. The contents of the A and B busses (A3, B3) are sent to the function unit and the data memory. The outputs of the function unit and the memory are passed on to the next stage.

	
module stage4(BS4, PS4, BRA4, RAA4, Z4, MW4, DMEM4, MEMA4, MEMDOUT4, RL4, DA4, MD4, LT4, DALU4, MEMDIN4, PC3, RL3, DA3, MD3, BS3, PS3, MW3, FS3, SH3, A3, B3, CLK);
   output [1:0]  BS4;   // Branch Select.
   output 	 PS4;   // Pass Status/complement status.
   output [31:0] BRA4;  // BRanch Address.
   output [31:0] RAA4;  // Register A contents for branch Address.
   output 	 Z4;    // Zero status bit.
   output 	 MW4;   // Memory Write.
   output [31:0] DMEM4; // Memory Data DFF.
   output [31:0] MEMA4; // Memory Address.
   output [31:0] MEMDOUT4; // Memory Data Output.
   output 	 RL4;   // Register Load.
   output [4:0]  DA4;   // Register D Address.
   output [1:0]  MD4;   // Mux D select.
   output 	 LT4;   // Less Than status bit.
   output [31:0] DALU4; // ALU Data (result).
   input [31:0]  MEMDIN4; // Memory Data Input.   
   input [31:0]  PC3;   // Program Counter.
   input 	 RL3;   // Register Load.
   input [4:0] 	 DA3;   // Register D Address.
   input [1:0] 	 MD3;   // Mux D select.
   input [1:0] 	 BS3;   // Branch Select.
   input 	 PS3;   // Pass Status/complement status.
   input 	 MW3;   // Memory Write.
   input [3:0] 	 FS3;   // Function Select.
   input [4:0] 	 SH3;   // SHift amount.
   input [31:0]  A3;    // Bus A contents.
   input [31:0]  B3;    // Bus B contents.
   input 	 CLK;   // CPU clock.
   
   wire  C4, V4, N4, Z4; // Function Unit status bits.
   wire  LT; // Less Than status bit.
   wire  Cinc, Vinc; // Adder/subtractor status bits. These are ignored.
   wire [31:0] DALU;
   
   assign BS4 = BS3;
   assign PS4 = PS3;   
   assign MEMA4 = A3;
   assign MEMDOUT4 = B3;
   assign MW4 = MW3;
   
   carry_select_adder_subtractor addsub0(BRA4, Cinc, Vinc, PC3, B3, 1'b0, 1'b0); // Ignore C, and V. No carry in.
   function_unit fu0(DALU, C4, V4, N4, Z4, A3, B3, SH3, FS3);
   xor(LT, N4, V4);
   assign RAA4 = A3;

   register_parallel_load_1 rplRL(RL4, RL3, 1'b1, CLK); // The RL register.
   register_parallel_load_5 rplDA(DA4, DA3, 1'b1, CLK); // The DA register.
   register_parallel_load_2 rplMD(MD4, MD3, 1'b1, CLK); // The MD register.
   register_parallel_load_1 rplLT(LT4, LT, 1'b1, CLK); // The LT register.
   register_parallel_load rplDALU(DALU4, DALU, 1'b1, CLK); // The ALU result register.
   register_parallel_load rplDMEM(DMEM4, MEMDIN4, 1'b1, CLK); // The Memory Data bus register.
endmodule // stage4
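Stage 4 derives the less-than status as LT = N xor V. After computing A - B in two's complement, the sign bit N gives the correct signed comparison unless the subtraction overflowed, in which case V is set and the sign is inverted. The sketch below illustrates this reasoning. How function_unit actually produces N and V is not shown in this section, so the overflow expression and the module name slt_sketch are assumptions.

```verilog
// Hypothetical illustration of signed less-than from subtraction flags.
module slt_sketch(LT, A, B);
   output        LT;   // 1 when A < B as signed 32-bit values.
   input [31:0]  A;
   input [31:0]  B;

   wire [31:0]   DIFF = A - B;     // Two's complement subtraction.
   wire          N    = DIFF[31];  // Negative status: sign of the result.
   // Overflow: the operands have different signs and the sign of the
   // result differs from the sign of A.
   wire          V    = (A[31] != B[31]) && (DIFF[31] != A[31]);

   assign LT = N ^ V;
endmodule
```

For example, with A = 32'h80000000 (the most negative value) and B = 1, the difference is 32'h7FFFFFFF, so N = 0 and V = 1, and LT is correctly 1.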

Stage 5

The final stage of the pipeline takes the outputs of the previous stage and uses MD4 to determine which to use. The control signal RL4 determines whether or not the result is loaded into the register file. If so, then the input DA4 holds the address of the register to be loaded. The outputs RL5 and DA5 are just copies of RL4 and DA4. These are fed into the register file by the top-level CPU module.

	
module stage5(D5, DA5, RL5, DALU4, DMEM4, LT4, MD4, RL4, DA4);
   output [31:0] D5;   
   output [4:0]  DA5;
   output 	 RL5;
   input [31:0]  DALU4;
   input [31:0]  DMEM4;
   input 	 LT4;
   input [1:0] 	 MD4;
   input 	 RL4;
   input [4:0] 	 DA4;
   
   
   multiplexer_4_1 #(32)muxD(D5, DALU4, DMEM4, {31'h0, LT4}, 32'h0, MD4[1], MD4[0]);
   assign RL5 = RL4;
   assign DA5 = DA4;   
endmodule // stage5
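The selection made by Mux D can also be written as a case statement. This sketch assumes that multiplexer_4_1 takes its ports in the order (output, input 0 to input 3, select bit 1, select bit 0), which is inferred from the instantiation above rather than from its source; the module name writeback_sketch is hypothetical.

```verilog
// Behavioural restatement of the stage 5 result selection (hypothetical).
module writeback_sketch(D5, DALU4, DMEM4, LT4, MD4);
   output [31:0] D5;     // Value offered to the register file.
   input [31:0]  DALU4;  // Function unit result.
   input [31:0]  DMEM4;  // Data read from memory.
   input         LT4;    // Less Than status bit.
   input [1:0]   MD4;    // Mux D select.

   reg [31:0]    D5;

   always @(*)
     case (MD4)
       2'b00:   D5 = DALU4;          // Function unit result.
       2'b01:   D5 = DMEM4;          // LD: memory data.
       2'b10:   D5 = {31'h0, LT4};   // SLT: 1 or 0.
       default: D5 = 32'h00000000;   // Unused select value.
     endcase
endmodule
```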

Instruction Decoder

The instruction decoder is used in stage 3 to generate the control signals from the contents of the instruction register. The control signals are hardwired.

The format of instructions is given in the table below.

Instruction Format
3	3	2	2	2	2	2	2	2	2	2	2	1	1	1	1	1	1	1	1	1	1
1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0	9	8	7	6	5	4	3	2	1	0
Opcode						DR					SA					SB
Opcode						DR					SA					IMM

The decoding scheme used is shown in the table below. The fields which cover control signals are as follows:

RL - Load the register in the instruction operand field DR with the result from stage 4.
MD - Selection signal for Mux D found in stage 5.
BS - Selection signal for branch type in stage 1.
PS - Selection signal for the jump condition (Pass Status) in stage 1.
MW - Memory write signal.
FS - Function Unit selection signal.
MB - Selection signal for Mux B in stage 3.
MA - Selection signal for Mux A in stage 3.
CS - Selection signal for the Constant Unit in stage 3.

Instruction Decode
Mnemonic	Opcode	RL	MD	BS	PS	MW	FS	MB	MA	CS
NOP	000000	0	--	00	-	0	----	-	-	-
ADD	000001	1	00	00	-	0	0001	0	0	-
SUB	000010	1	00	00	-	0	0010	0	0	-
AND	000011	1	00	00	-	0	0011	0	0	-
OR	000100	1	00	00	-	0	0100	0	0	-
XOR	000101	1	00	00	-	0	0101	0	0	-
NOT	000110	1	00	00	-	0	0110	-	0	-
LSR	001000	1	00	00	-	0	1000	0	0	-
LSL	001001	1	00	00	-	0	1001	0	0	-
ADI	010001	1	00	00	-	0	0001	1	0	1
SBI	010010	1	00	00	-	0	0010	1	0	1
ANI	010011	1	00	00	-	0	0011	1	0	0
ORI	010100	1	00	00	-	0	0100	1	0	0
XRI	010101	1	00	00	-	0	0101	1	0	0
ADU	100001	1	00	00	-	0	0001	1	0	0
SBU	100010	1	00	00	-	0	0010	1	0	0
BNZ	100111	0	--	01	1	0	0111	1	0	1
JML	110000	1	00	11	-	0	0000	1	1	1
SLT	110010	1	10	00	-	0	0010	0	0	-
ST	110011	0	00	00	-	1	----	0	0	-
LD	110100	1	01	00	-	0	----	0	0	-
JMR	110101	0	--	10	-	0	----	-	-	-
BZ	110111	0	--	01	0	0	0111	1	0	1
JMP	111000	0	--	11	-	0	----	1	1	1

	
`define   NOP    6'b000000
`define   ADD    6'b000001
`define   SUB    6'b000010
`define   AND    6'b000011
`define   OR     6'b000100
`define   XOR    6'b000101
`define   NOT    6'b000110
`define   LSR    6'b001000
`define   LSL    6'b001001
`define   ADI    6'b010001
`define   SBI    6'b010010
`define   ANI    6'b010011
`define   ORI    6'b010100
`define   XRI    6'b010101
`define   ADU    6'b100001
`define   SBU    6'b100010
`define   BNZ    6'b100111
`define   JML    6'b110000
`define   SLT    6'b110010
`define   ST     6'b110011
`define   LD     6'b110100
`define   JMR    6'b110101
`define   BZ     6'b110111
`define   JMP    6'b111000


module instruction_decoder(AA, BA, DA, RL, MD, BS, PS, MW, FS, MB, MA, CS, INSTR);
   output [4:0] AA;     // Register A Address.
   output [4:0] BA;     // Register B Address.
   output [4:0] DA;     // Register D Address.
   output 	RL;     // Register Load.
   output [1:0]	MD;     // Bus D mux select.
   output [1:0]	BS;     // Branch Select.
   output 	PS;     // Pass Status/complement status.
   output 	MW;     // Memory Read/Write.
   output [3:0] FS;     // Function unit operation select.
   output 	MB;     // Bus B mux select.
   output 	MA;     // Bus A mux select.
   output 	CS;     // Constant sign extend/zero fill select.
   input [31:0] INSTR;  // The instruction to decode.

   assign FS = INSTR[29:26];
   assign DA = INSTR[25:21];
   assign AA = INSTR[20:16];
   assign BA = INSTR[15:11];

   assign RL = ((INSTR[31:26] == `ADD)
		|| (INSTR[31:26] == `SUB)
		|| (INSTR[31:26] == `AND)
		|| (INSTR[31:26] == `OR)
		|| (INSTR[31:26] == `XOR)
		|| (INSTR[31:26] == `NOT)
		|| (INSTR[31:26] == `LSL)
		|| (INSTR[31:26] == `LSR)
		|| (INSTR[31:26] == `ADI)
		|| (INSTR[31:26] == `SBI)
		|| (INSTR[31:26] == `ANI)
		|| (INSTR[31:26] == `ORI)
		|| (INSTR[31:26] == `XRI)
		|| (INSTR[31:26] == `ADU)
		|| (INSTR[31:26] == `SBU)
		|| (INSTR[31:26] == `JML)
		|| (INSTR[31:26] == `SLT)
		|| (INSTR[31:26] == `LD)) ? 1'b1 : 1'b0;
   
   assign MD = INSTR[31:26] == `LD ? 2'b01 : (INSTR[31:26] == `SLT ? 2'b10 : 2'b00);
   
   assign BS = (INSTR[31:26] == `JML || INSTR[31:26] == `JMP) ? 2'b11 : 
	       (INSTR[31:26] == `JMR ? 2'b10 :
		((INSTR[31:26] == `BZ || INSTR[31:26] == `BNZ) ? 2'b01 :
		 2'b00));
   
   assign  PS =  INSTR[31:26] == `BNZ ? 1'b1 : 1'b0;
   
   assign  MW = INSTR[31:26] == `ST ? 1'b1 : 1'b0;
   
   assign MB = (INSTR[31:26] == `ADI ||
		INSTR[31:26] == `SBI ||
		INSTR[31:26] == `ANI ||
		INSTR[31:26] == `ORI ||
		INSTR[31:26] == `XRI ||
		INSTR[31:26] == `ADU ||
		INSTR[31:26] == `SBU ||
		INSTR[31:26] == `BZ ||
		INSTR[31:26] == `BNZ ||
		INSTR[31:26] == `JMP ||
		INSTR[31:26] == `JML) ? 1'b1 :
	       1'b0;

   assign MA = (INSTR[31:26] == `JML || INSTR[31:26] == `JMP) ? 1'b1 : 1'b0;
   
   assign CS = (INSTR[31:26] == `ADI
		|| INSTR[31:26] == `SBI
		|| INSTR[31:26] == `JML
		|| INSTR[31:26] == `BZ
		|| INSTR[31:26] == `BNZ
		|| INSTR[31:26] == `JMP) ? 1'b1 :
	       1'b0;
endmodule // instruction_decoder

Register File

The register file contains 31 32-bit registers, 1 to 31, and register 0 always contains zero. Registers are read in stage 3, and loaded in stage 5.

	
module register_file(A, B, D, AA, BA, DA, Load, CLK);
   output [31:0] A;     // Data contents of A reg.
   output [31:0] B;     // Data contents of B reg.
   input [31:0]  D;     // Data to load into D reg.
   input [4:0] 	 AA;    // Address of A reg.
   input [4:0] 	 BA;    // Address of B reg.
   input [4:0] 	 DA;    // Address of D reg.
   input 	 Load;  // Enable loading of D reg - active high.
   input 	 CLK;   // Clock.
   
   wire [31:0] 	 Q1, Q2, Q3, Q4, Q5, Q6, Q7;
   wire [31:0] 	 Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15;
   wire [31:0] 	 Q16, Q17, Q18, Q19, Q20, Q21, Q22, Q23, Q24;
   wire [31:0] 	 Q25, Q26, Q27, Q28, Q29, Q30, Q31;	 
   
   wire 	 dr0, dr1, dr2, dr3, dr4, dr5, dr6, dr7;
   wire 	 dr8, dr9, dr10, dr11, dr12, dr13, dr14, dr15;
   wire 	 dr16, dr17, dr18, dr19, dr20, dr21, dr22, dr23, dr24;
   wire 	 dr25, dr26, dr27, dr28, dr29, dr30, dr31; 	 
   wire 	 load1, load2, load3, load4, load5, load6, load7;
   wire 	 load8, load9, load10, load11, load12, load13, load14, load15;
   wire 	 load16, load17, load18, load19, load20, load21, load22, load23, load24;
   wire 	 load25, load26, load27, load28, load29, load30, load31;
   
   multiplexer_32_1 #(32)muxa(A, 32'h00000000, Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, 
			      Q16, Q17, Q18, Q19, Q20, Q21, Q22, Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30, Q31,
			      AA);
   multiplexer_32_1 #(32)muxb(B, 32'h00000000, Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, 
			      Q16, Q17, Q18, Q19, Q20, Q21, Q22, Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30, Q31,
			      BA);

   // Note: dr0 is unused.
   regaddr_decoder dec(dr0, dr1, dr2, dr3, dr4, dr5, dr6, dr7, dr8, dr9, dr10, dr11, dr12, dr13, dr14, dr15, 
		       dr16, dr17, dr18, dr19, dr20, dr21, dr22, dr23, dr24, dr25, dr26, dr27, dr28, dr29, dr30, dr31,
		       DA[4], DA[3], DA[2], DA[1], DA[0], 1'b1);

   and(load1, dr1, Load);
   and(load2, dr2, Load);
   and(load3, dr3, Load);
   and(load4, dr4, Load);
   and(load5, dr5, Load);
   and(load6, dr6, Load);
   and(load7, dr7, Load);
   and(load8, dr8, Load);
   and(load9, dr9, Load);
   and(load10, dr10, Load);
   and(load11, dr11, Load);
   and(load12, dr12, Load);
   and(load13, dr13, Load);
   and(load14, dr14, Load);
   and(load15, dr15, Load);
   and(load16, dr16, Load);
   and(load17, dr17, Load);
   and(load18, dr18, Load);
   and(load19, dr19, Load);
   and(load20, dr20, Load);
   and(load21, dr21, Load);
   and(load22, dr22, Load);
   and(load23, dr23, Load);
   and(load24, dr24, Load);
   and(load25, dr25, Load);
   and(load26, dr26, Load);
   and(load27, dr27, Load);
   and(load28, dr28, Load);
   and(load29, dr29, Load);
   and(load30, dr30, Load);
   and(load31, dr31, Load);
   
   register_parallel_load r1(Q1, D, load1, CLK);
   register_parallel_load r2(Q2, D, load2, CLK);
   register_parallel_load r3(Q3, D, load3, CLK);
   register_parallel_load r4(Q4, D, load4, CLK);
   register_parallel_load r5(Q5, D, load5, CLK);
   register_parallel_load r6(Q6, D, load6, CLK);
   register_parallel_load r7(Q7, D, load7, CLK);
   register_parallel_load r8(Q8, D, load8, CLK);
   register_parallel_load r9(Q9, D, load9, CLK);
   register_parallel_load r10(Q10, D, load10, CLK);
   register_parallel_load r11(Q11, D, load11, CLK);
   register_parallel_load r12(Q12, D, load12, CLK);
   register_parallel_load r13(Q13, D, load13, CLK);
   register_parallel_load r14(Q14, D, load14, CLK);
   register_parallel_load r15(Q15, D, load15, CLK);
   register_parallel_load r16(Q16, D, load16, CLK);
   register_parallel_load r17(Q17, D, load17, CLK);
   register_parallel_load r18(Q18, D, load18, CLK);
   register_parallel_load r19(Q19, D, load19, CLK);
   register_parallel_load r20(Q20, D, load20, CLK);
   register_parallel_load r21(Q21, D, load21, CLK);
   register_parallel_load r22(Q22, D, load22, CLK);
   register_parallel_load r23(Q23, D, load23, CLK);
   register_parallel_load r24(Q24, D, load24, CLK);
   register_parallel_load r25(Q25, D, load25, CLK);
   register_parallel_load r26(Q26, D, load26, CLK);
   register_parallel_load r27(Q27, D, load27, CLK);
   register_parallel_load r28(Q28, D, load28, CLK);
   register_parallel_load r29(Q29, D, load29, CLK);
   register_parallel_load r30(Q30, D, load30, CLK);
   register_parallel_load r31(Q31, D, load31, CLK);
endmodule // register_file

module register_parallel_load(Q, D, Load, CLK);
   output [31:0] Q;
   input [31:0]  D;
   input 	 Load;
   input 	 CLK;
   
   wire 	 Loadn;
   wire 	 w1, w2, w3, w4, w5, w6, w7, w8, w9, w10, w11, w12;           // Connecting wires.
   wire 	 w13, w14, w15, w16, w17, w18, w19, w20, w21, w22, w23, w24;  // Connecting wires.
   wire 	 w25, w26, w27, w28, w29, w30, w31, w32, w33, w34, w35, w36;  // Connecting wires.
   wire 	 w37, w38, w39, w40, w41, w42, w43, w44, w45, w46, w47, w48;  // Connecting wires.
   wire 	 w49, w50, w51, w52, w53, w54, w55, w56, w57, w58, w59, w60;  // Connecting wires.
   wire 	 w61, w62, w63, w64, w65, w66, w67, w68, w69, w70, w71, w72;  // Connecting wires.
   wire 	 w73, w74, w75, w76, w77, w78, w79, w80, w81, w82, w83, w84;  // Connecting wires.
   wire 	 w85, w86, w87, w88, w89, w90, w91, w92, w93, w94, w95, w96;  // Connecting wires.
   
   not(Loadn, Load);

   and(w1, Q[0], Loadn);
   and(w2, D[0], Load);
   or(w3, w2, w1);

   and(w4, Q[1], Loadn);
   and(w5, D[1], Load);
   or(w6, w5, w4);

   and(w7, Q[2], Loadn);
   and(w8, D[2], Load);
   or(w9, w8, w7);

   and(w10, Q[3], Loadn);
   and(w11, D[3], Load);
   or(w12, w11, w10);

   and(w13, Q[4], Loadn);
   and(w14, D[4], Load);
   or(w15, w14, w13);

   and(w16, Q[5], Loadn);
   and(w17, D[5], Load);
   or(w18, w17, w16);

   and(w19, Q[6], Loadn);
   and(w20, D[6], Load);
   or(w21, w20, w19);

   and(w22, Q[7], Loadn);
   and(w23, D[7], Load);
   or(w24, w23, w22);

   and(w25, Q[8], Loadn);
   and(w26, D[8], Load);
   or(w27, w26, w25);

   and(w28, Q[9], Loadn);
   and(w29, D[9], Load);
   or(w30, w29, w28);

   and(w31, Q[10], Loadn);
   and(w32, D[10], Load);
   or(w33, w32, w31);

   and(w34, Q[11], Loadn);
   and(w35, D[11], Load);
   or(w36, w35, w34);

   and(w37, Q[12], Loadn);
   and(w38, D[12], Load);
   or(w39, w38, w37);

   and(w40, Q[13], Loadn);
   and(w41, D[13], Load);
   or(w42, w41, w40);

   and(w43, Q[14], Loadn);
   and(w44, D[14], Load);
   or(w45, w44, w43);

   and(w46, Q[15], Loadn);
   and(w47, D[15], Load);
   or(w48, w47, w46);
   
   and(w49, Q[16], Loadn);
   and(w50, D[16], Load);
   or(w51, w50, w49);

   and(w52, Q[17], Loadn);
   and(w53, D[17], Load);
   or(w54, w53, w52);

   and(w55, Q[18], Loadn);
   and(w56, D[18], Load);
   or(w57, w56, w55);

   and(w58, Q[19], Loadn);
   and(w59, D[19], Load);
   or(w60, w59, w58);

   and(w61, Q[20], Loadn);
   and(w62, D[20], Load);
   or(w63, w62, w61);

   and(w64, Q[21], Loadn);
   and(w65, D[21], Load);
   or(w66, w65, w64);

   and(w67, Q[22], Loadn);
   and(w68, D[22], Load);
   or(w69, w68, w67);

   and(w70, Q[23], Loadn);
   and(w71, D[23], Load);
   or(w72, w71, w70);

   and(w73, Q[24], Loadn);
   and(w74, D[24], Load);
   or(w75, w74, w73);

   and(w76, Q[25], Loadn);
   and(w77, D[25], Load);
   or(w78, w77, w76);

   and(w79, Q[26], Loadn);
   and(w80, D[26], Load);
   or(w81, w80, w79);

   and(w82, Q[27], Loadn);
   and(w83, D[27], Load);
   or(w84, w83, w82);

   and(w85, Q[28], Loadn);
   and(w86, D[28], Load);
   or(w87, w86, w85);

   and(w88, Q[29], Loadn);
   and(w89, D[29], Load);
   or(w90, w89, w88);

   and(w91, Q[30], Loadn);
   and(w92, D[30], Load);
   or(w93, w92, w91);

   and(w94, Q[31], Loadn);
   and(w95, D[31], Load);
   or(w96, w95, w94);
   
   d_flip_flop_edge_triggered dff0(Q[0], CLK, w3);
   d_flip_flop_edge_triggered dff1(Q[1], CLK, w6);
   d_flip_flop_edge_triggered dff2(Q[2], CLK, w9);
   d_flip_flop_edge_triggered dff3(Q[3], CLK, w12);
   d_flip_flop_edge_triggered dff4(Q[4], CLK, w15);
   d_flip_flop_edge_triggered dff5(Q[5], CLK, w18);
   d_flip_flop_edge_triggered dff6(Q[6], CLK, w21);
   d_flip_flop_edge_triggered dff7(Q[7], CLK, w24);
   d_flip_flop_edge_triggered dff8(Q[8], CLK, w27);
   d_flip_flop_edge_triggered dff9(Q[9], CLK, w30);
   d_flip_flop_edge_triggered dff10(Q[10], CLK, w33);
   d_flip_flop_edge_triggered dff11(Q[11], CLK, w36);
   d_flip_flop_edge_triggered dff12(Q[12], CLK, w39);
   d_flip_flop_edge_triggered dff13(Q[13], CLK, w42);
   d_flip_flop_edge_triggered dff14(Q[14], CLK, w45);
   d_flip_flop_edge_triggered dff15(Q[15], CLK, w48);
   d_flip_flop_edge_triggered dff16(Q[16], CLK, w51);
   d_flip_flop_edge_triggered dff17(Q[17], CLK, w54);
   d_flip_flop_edge_triggered dff18(Q[18], CLK, w57);
   d_flip_flop_edge_triggered dff19(Q[19], CLK, w60);
   d_flip_flop_edge_triggered dff20(Q[20], CLK, w63);
   d_flip_flop_edge_triggered dff21(Q[21], CLK, w66);
   d_flip_flop_edge_triggered dff22(Q[22], CLK, w69);
   d_flip_flop_edge_triggered dff23(Q[23], CLK, w72);
   d_flip_flop_edge_triggered dff24(Q[24], CLK, w75);
   d_flip_flop_edge_triggered dff25(Q[25], CLK, w78);
   d_flip_flop_edge_triggered dff26(Q[26], CLK, w81);
   d_flip_flop_edge_triggered dff27(Q[27], CLK, w84);
   d_flip_flop_edge_triggered dff28(Q[28], CLK, w87);
   d_flip_flop_edge_triggered dff29(Q[29], CLK, w90);
   d_flip_flop_edge_triggered dff30(Q[30], CLK, w93);
   d_flip_flop_edge_triggered dff31(Q[31], CLK, w96);
   
endmodule // register_parallel_load

module register_parallel_load_5(Q, D, Load, CLK);
   output [4:0] Q;
   input [4:0]  D;
   input 	 Load;
   input 	 CLK;
   
   wire 	 Loadn;
   wire 	 w1, w2, w3, w4, w5, w6, w7, w8, w9, w10, w11, w12;           // Connecting wires.
   wire 	 w13, w14, w15;  // Connecting wires.
   
   not(Loadn, Load);

   and(w1, Q[0], Loadn);
   and(w2, D[0], Load);
   or(w3, w2, w1);

   and(w4, Q[1], Loadn);
   and(w5, D[1], Load);
   or(w6, w5, w4);

   and(w7, Q[2], Loadn);
   and(w8, D[2], Load);
   or(w9, w8, w7);

   and(w10, Q[3], Loadn);
   and(w11, D[3], Load);
   or(w12, w11, w10);

   and(w13, Q[4], Loadn);
   and(w14, D[4], Load);
   or(w15, w14, w13);
   
   d_flip_flop_edge_triggered dff0(Q[0], CLK, w3);
   d_flip_flop_edge_triggered dff1(Q[1], CLK, w6);
   d_flip_flop_edge_triggered dff2(Q[2], CLK, w9);
   d_flip_flop_edge_triggered dff3(Q[3], CLK, w12);
   d_flip_flop_edge_triggered dff4(Q[4], CLK, w15);
   
endmodule // register_parallel_load_5

module register_parallel_load_4(Q, D, Load, CLK);
   output [3:0] Q;
   input [3:0]  D;
   input 	 Load;
   input 	 CLK;
   
   wire 	 Loadn;
   wire 	 w1, w2, w3, w4, w5, w6, w7, w8, w9, w10, w11, w12;           // Connecting wires.
   
   not(Loadn, Load);

   and(w1, Q[0], Loadn);
   and(w2, D[0], Load);
   or(w3, w2, w1);

   and(w4, Q[1], Loadn);
   and(w5, D[1], Load);
   or(w6, w5, w4);

   and(w7, Q[2], Loadn);
   and(w8, D[2], Load);
   or(w9, w8, w7);

   and(w10, Q[3], Loadn);
   and(w11, D[3], Load);
   or(w12, w11, w10);

   d_flip_flop_edge_triggered dff0(Q[0], CLK, w3);
   d_flip_flop_edge_triggered dff1(Q[1], CLK, w6);
   d_flip_flop_edge_triggered dff2(Q[2], CLK, w9);
   d_flip_flop_edge_triggered dff3(Q[3], CLK, w12);
   
endmodule // register_parallel_load_4

module register_parallel_load_2(Q, D, Load, CLK);
   output [1:0] Q;
   input [1:0]  D;
   input 	 Load;
   input 	 CLK;
   
   wire 	 Loadn;
   wire 	 w1, w2, w3, w4, w5, w6;           // Connecting wires.
   
   not(Loadn, Load);

   and(w1, Q[0], Loadn);
   and(w2, D[0], Load);
   or(w3, w2, w1);

   and(w4, Q[1], Loadn);
   and(w5, D[1], Load);
   or(w6, w5, w4);

   d_flip_flop_edge_triggered dff0(Q[0], CLK, w3);
   d_flip_flop_edge_triggered dff1(Q[1], CLK, w6);
   
endmodule // register_parallel_load_2

module register_parallel_load_1(Q, D, Load, CLK);
   output  Q;
   input   D;
   input   Load;
   input   CLK;
   
   wire    Loadn;
   wire    w1, w2, w3;           // Connecting wires.
   
   not(Loadn, Load);
   
   and(w1, Q, Loadn);
   and(w2, D, Load);
   or(w3, w2, w1);
   
   d_flip_flop_edge_triggered dff0(Q, CLK, w3);
   
endmodule // register_parallel_load_1

module d_flip_flop_edge_triggered(Q, C, D);
   output Q;
   input  C;
   input  D;
   
   reg 	  Q;
   
   always @(posedge C)
     begin
	Q <= D;
     end
endmodule // d_flip_flop_edge_triggered

module regaddr_decoder(X0, X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, 
		   X16, X17, X18, X19, X20, X21, X22, X23, X24, X25, X26, X27, X28, X29, X30, X31,
		   A4, A3, A2, A1, A0, 
		   E);
   output X0; // Minterm 0
   output X1; // Minterm 1
   output X2; // Minterm 2
   output X3; // Minterm 3
   output X4; // Minterm 4
   output X5; // Minterm 5
   output X6; // Minterm 6
   output X7; // Minterm 7
   output X8; // Minterm 8
   output X9; // Minterm 9
   output X10; // Minterm 10
   output X11; // Minterm 11
   output X12; // Minterm 12
   output X13; // Minterm 13
   output X14; // Minterm 14
   output X15; // Minterm 15
   output X16; // Minterm 16
   output X17; // Minterm 17
   output X18; // Minterm 18
   output X19; // Minterm 19
   output X20; // Minterm 20
   output X21; // Minterm 21
   output X22; // Minterm 22
   output X23; // Minterm 23
   output X24; // Minterm 24
   output X25; // Minterm 25
   output X26; // Minterm 26
   output X27; // Minterm 27
   output X28; // Minterm 28
   output X29; // Minterm 29
   output X30; // Minterm 30
   output X31; // Minterm 31

   input  A4;  // Input binary code most significant bit   
   input  A3;
   input  A2;
   input  A1;  
   input  A0;  // Input binary code least significant bit

   input  E;   // Enable signal

   wire   A4n; // A4 negated
   wire   A3n; // A3 negated
   wire   A2n; // A2 negated
   wire   A1n; // A1 negated
   wire   A0n; // A0 negated

   not(A4n, A4);
   not(A3n, A3);
   not(A2n, A2);
   not(A1n, A1);
   not(A0n, A0);
   
   and(X0, A4n, A3n, A2n, A1n, A0n, E);  // Minterm 0: 00000
   and(X1, A4n, A3n, A2n, A1n, A0, E);   // Minterm 1: 00001
   and(X2, A4n, A3n, A2n, A1, A0n, E);   // Minterm 2: 00010
   and(X3, A4n, A3n, A2n, A1, A0, E);    // Minterm 3: 00011
   and(X4, A4n, A3n, A2, A1n, A0n, E);   // Minterm 4: 00100
   and(X5, A4n, A3n, A2, A1n, A0, E);    // Minterm 5: 00101
   and(X6, A4n, A3n, A2, A1, A0n, E);    // Minterm 6: 00110
   and(X7, A4n, A3n, A2, A1, A0, E);     // Minterm 7: 00111
   and(X8, A4n, A3, A2n, A1n, A0n, E);   // Minterm 8: 01000
   and(X9, A4n, A3, A2n, A1n, A0, E);    // Minterm 9: 01001
   and(X10, A4n, A3, A2n, A1, A0n, E);   // Minterm 10: 01010
   and(X11, A4n, A3, A2n, A1, A0, E);    // Minterm 11: 01011
   and(X12, A4n, A3, A2, A1n, A0n, E);   // Minterm 12: 01100
   and(X13, A4n, A3, A2, A1n, A0, E);    // Minterm 13: 01101
   and(X14, A4n, A3, A2, A1, A0n, E);    // Minterm 14: 01110
   and(X15, A4n, A3, A2, A1, A0, E);     // Minterm 15: 01111
   and(X16, A4, A3n, A2n, A1n, A0n, E);  // Minterm 16: 10000
   and(X17, A4, A3n, A2n, A1n, A0, E);   // Minterm 17: 10001
   and(X18, A4, A3n, A2n, A1, A0n, E);   // Minterm 18: 10010
   and(X19, A4, A3n, A2n, A1, A0, E);    // Minterm 19: 10011
   and(X20, A4, A3n, A2, A1n, A0n, E);   // Minterm 20: 10100
   and(X21, A4, A3n, A2, A1n, A0, E);    // Minterm 21: 10101
   and(X22, A4, A3n, A2, A1, A0n, E);    // Minterm 22: 10110
   and(X23, A4, A3n, A2, A1, A0, E);     // Minterm 23: 10111
   and(X24, A4, A3, A2n, A1n, A0n, E);   // Minterm 24: 11000
   and(X25, A4, A3, A2n, A1n, A0, E);    // Minterm 25: 11001
   and(X26, A4, A3, A2n, A1, A0n, E);    // Minterm 26: 11010
   and(X27, A4, A3, A2n, A1, A0, E);     // Minterm 27: 11011
   and(X28, A4, A3, A2, A1n, A0n, E);    // Minterm 28: 11100
   and(X29, A4, A3, A2, A1n, A0, E);     // Minterm 29: 11101
   and(X30, A4, A3, A2, A1, A0n, E);     // Minterm 30: 11110
   and(X31, A4, A3, A2, A1, A0, E);      // Minterm 31: 11111
endmodule // regaddr_decoder

Function Unit

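The function unit comprises the shifter and the ALU. The most significant bit of the 4-bit operation code selects between them (0 = ALU, 1 = shifter). For the ALU the three low bits select the operation; for the shifter the least significant bit selects the direction. The operations are shown in the table below.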

Function Unit Operations
Op	Function
0000	Increment
0001	Addition
0010	Subtraction
0011	Bitwise AND
0100	Bitwise OR
0101	Bitwise XOR
0110	Bitwise NOT
0111	A
1000	Logical Shift Right
1001	Logical Shift Left
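
As a cross-check on the table, the operations can be restated behaviorally. The sketch below is not part of the design: the names function_unit_model and function_unit_model_tb are assumptions made here, and the model uses Verilog operators where the design uses gates. It covers only the result Y and the N and Z status bits, and leaves encodings that the table does not list undefined.

```verilog
// Behavioral reference sketch of the Function Unit Operations table.
// Illustrative only: function_unit_model is a hypothetical name.
module function_unit_model(Y, N, Z, A, B, SH, Op);
   output [31:0] Y;   // Result.
   output        N;   // Negative: sign bit of the result.
   output        Z;   // Zero: 1 when the result is all zeroes.
   input [31:0]  A;   // Bus A operand.
   input [31:0]  B;   // Bus B operand.
   input [4:0]   SH;  // SHift amount.
   input [3:0]   Op;  // Operation, encoded as in the table.

   reg [31:0]    Y;

   always @(A or B or SH or Op)
     case (Op)
       4'b0000: Y = A + 32'd1;     // Increment
       4'b0001: Y = A + B;         // Addition
       4'b0010: Y = A - B;         // Subtraction
       4'b0011: Y = A & B;         // Bitwise AND
       4'b0100: Y = A | B;         // Bitwise OR
       4'b0101: Y = A ^ B;         // Bitwise XOR
       4'b0110: Y = ~A;            // Bitwise NOT
       4'b0111: Y = A;             // A
       4'b1000: Y = A >> SH;       // Logical Shift Right
       4'b1001: Y = A << SH;       // Logical Shift Left
       default: Y = 32'hxxxxxxxx;  // Not listed in the table.
     endcase

   assign N = Y[31];
   assign Z = (Y == 32'h00000000);
endmodule // function_unit_model

// Small demonstration of three table rows.
module function_unit_model_tb;
   reg [31:0]  a, b;
   reg [4:0]   sh;
   reg [3:0]   op;
   wire [31:0] y;
   wire        n, z;

   function_unit_model dut(y, n, z, a, b, sh, op);

   initial begin
      a = 32'h00000005; b = 32'h00000005; sh = 5'd4;
      op = 4'b0010; #1 $display("sub: Y=%h N=%b Z=%b", y, n, z); // sub: Y=00000000 N=0 Z=1
      op = 4'b0110; #1 $display("not: Y=%h N=%b Z=%b", y, n, z); // not: Y=fffffffa N=1 Z=0
      op = 4'b1001; #1 $display("shl: Y=%h N=%b Z=%b", y, n, z); // shl: Y=00000050 N=0 Z=0
      $finish;
   end
endmodule // function_unit_model_tb
```

A model of this kind is mainly useful as a reference in a test bench: for the ten listed encodings the structural function_unit module should produce the same Y, N, and Z.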

	
module function_unit(Y, C, V, N, Z, A, B, SH, Op);
   output [31:0] Y;   // Bus D result.
   output 	 C;   // Carry output.
   output 	 N;   // Negative.
   output 	 V;   // Overflow.
   output 	 Z;   // Zero.
   input [31:0]  A;   // Bus A operand.
   input [31:0]  B;   // Bus B operand.
   input [4:0] 	 SH;  // SHift amount.
   input [3:0] 	 Op;  // Operation.

   wire [31:0] 	 Ya;  // ALU result output.
   wire [31:0] 	 Ys;  // Shifter result output.

   alu alu0(Ya, C, V, A, B, {Op[2], Op[1], Op[0]});   
   shifter shifter0(Ys, A, SH, Op[0]);
   multiplexer_2_1 #(32) mux0(Y, Ya, Ys, Op[3]);
   
   assign N = Y[31];       // Most significant bit is the sign bit in 2's complement.   
   zero z(Z, Y);           // Zero status bit.
endmodule // function_unit

//
// Z == 1 <=> A == 0
//
module zero(Z, A);
   output Z;        // Result. 
   input [31:0]  A; // Operand.

   wire [31:0] 	 Y; // Temp result.
   
   xnor(Y[0], A[0], 1'b0);
   xnor(Y[1], A[1], 1'b0);
   xnor(Y[2], A[2], 1'b0);
   xnor(Y[3], A[3], 1'b0);
   xnor(Y[4], A[4], 1'b0);
   xnor(Y[5], A[5], 1'b0);
   xnor(Y[6], A[6], 1'b0);
   xnor(Y[7], A[7], 1'b0);
   xnor(Y[8], A[8], 1'b0);
   xnor(Y[9], A[9], 1'b0);
   xnor(Y[10], A[10], 1'b0);
   xnor(Y[11], A[11], 1'b0);
   xnor(Y[12], A[12], 1'b0);
   xnor(Y[13], A[13], 1'b0);
   xnor(Y[14], A[14], 1'b0);
   xnor(Y[15], A[15], 1'b0);
   xnor(Y[16], A[16], 1'b0);
   xnor(Y[17], A[17], 1'b0);
   xnor(Y[18], A[18], 1'b0);
   xnor(Y[19], A[19], 1'b0);
   xnor(Y[20], A[20], 1'b0);
   xnor(Y[21], A[21], 1'b0);
   xnor(Y[22], A[22], 1'b0);
   xnor(Y[23], A[23], 1'b0);
   xnor(Y[24], A[24], 1'b0);
   xnor(Y[25], A[25], 1'b0);
   xnor(Y[26], A[26], 1'b0);
   xnor(Y[27], A[27], 1'b0);
   xnor(Y[28], A[28], 1'b0);
   xnor(Y[29], A[29], 1'b0);
   xnor(Y[30], A[30], 1'b0);
   xnor(Y[31], A[31], 1'b0);
    
   and(Z, 
       Y[0], Y[1], Y[2], Y[3], 
       Y[4], Y[5], Y[6], Y[7], 
       Y[8], Y[9], Y[10], Y[11], 
       Y[12], Y[13], Y[14], Y[15], 
       Y[16], Y[17], Y[18], Y[19],
       Y[20], Y[21], Y[22], Y[23],
       Y[24], Y[25], Y[26], Y[27],
       Y[28], Y[29], Y[30], Y[31]);
endmodule // zero

ALU

The ALU operations are shown in the table below.

ALU Operations
Op	Function
000	Increment
001	Addition
010	Subtraction
011	Bitwise AND
100	Bitwise OR
101	Bitwise XOR
110	Bitwise NOT
111	A

	
module alu(Y, C, V, A, B, Op);
   output [31:0] Y;  // Bus D result.
   output 	 C;  // Carry.
   output 	 V;  // oVerflow.
   input [31:0]  A;  // Bus A operand.
   input [31:0]  B;  // Bus B operand.
   input [2:0] 	 Op; // Operation.

   wire [31:0] 	 Inc, AS, And, Or, Xor, Not;
   wire 	 Cinc, Vinc, Cas, Vas;
   
   // The operations
   carry_select_adder_subtractor addsub0(Inc, Cinc, Vinc, A, 32'h00000001, 1'b0, Op[1]); // Op == 3'b000
   carry_select_adder_subtractor addsub1(AS, Cas, Vas, A, B, 1'b0, Op[1]);             // Op == 3'b001, 3'b010
   andop aluand(And, A, B);                                                        // Op == 3'b011
   orop aluor(Or, A, B);                                                           // Op == 3'b100
   xorop aluxor(Xor, A, B);                                                        // Op == 3'b101
   notop alunot(Not, A);                                                           // Op == 3'b110
   multiplexer_8_1 #(32)muxY(Y, Inc, AS, AS, And, Or, Xor, Not, A, Op);          // Select the result.
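   wire 	 SelAS;                                    // 0 for Increment, 1 for Addition and Subtraction.
   or(SelAS, Op[1], Op[0]);                        // Op[0] alone would select Cinc and Vinc for Subtraction (3'b010).
   multiplexer_2_1 #(1)muxC(C, Cinc, Cas, SelAS);  // Select the right C
   multiplexer_2_1 #(1)muxV(V, Vinc, Vas, SelAS);  // Select the right V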
endmodule // alu

module andop(Y, A, B);
   output [31:0] Y;  // Result.
   input [31:0]  A;  // Operand.
   input [31:0]  B;  // Operand.

   and(Y[0], A[0], B[0]);
   and(Y[1], A[1], B[1]);
   and(Y[2], A[2], B[2]);
   and(Y[3], A[3], B[3]);
   and(Y[4], A[4], B[4]);
   and(Y[5], A[5], B[5]);
   and(Y[6], A[6], B[6]);
   and(Y[7], A[7], B[7]);
   and(Y[8], A[8], B[8]);
   and(Y[9], A[9], B[9]);
   and(Y[10], A[10], B[10]);
   and(Y[11], A[11], B[11]);
   and(Y[12], A[12], B[12]);
   and(Y[13], A[13], B[13]);
   and(Y[14], A[14], B[14]);
   and(Y[15], A[15], B[15]);
   and(Y[16], A[16], B[16]);
   and(Y[17], A[17], B[17]);
   and(Y[18], A[18], B[18]);
   and(Y[19], A[19], B[19]);
   and(Y[20], A[20], B[20]);
   and(Y[21], A[21], B[21]);
   and(Y[22], A[22], B[22]);
   and(Y[23], A[23], B[23]);
   and(Y[24], A[24], B[24]);
   and(Y[25], A[25], B[25]);
   and(Y[26], A[26], B[26]);
   and(Y[27], A[27], B[27]);
   and(Y[28], A[28], B[28]);
   and(Y[29], A[29], B[29]);
   and(Y[30], A[30], B[30]);
   and(Y[31], A[31], B[31]);
endmodule // andop

module orop(Y, A, B);
   output [31:0] Y; // Result.
   input [31:0]  A; // Operand.
   input [31:0]  B; // Operand.

   or(Y[0], A[0], B[0]);
   or(Y[1], A[1], B[1]);
   or(Y[2], A[2], B[2]);
   or(Y[3], A[3], B[3]);
   or(Y[4], A[4], B[4]);
   or(Y[5], A[5], B[5]);
   or(Y[6], A[6], B[6]);
   or(Y[7], A[7], B[7]);
   or(Y[8], A[8], B[8]);
   or(Y[9], A[9], B[9]);
   or(Y[10], A[10], B[10]);
   or(Y[11], A[11], B[11]);
   or(Y[12], A[12], B[12]);
   or(Y[13], A[13], B[13]);
   or(Y[14], A[14], B[14]);
   or(Y[15], A[15], B[15]);
   or(Y[16], A[16], B[16]);
   or(Y[17], A[17], B[17]);
   or(Y[18], A[18], B[18]);
   or(Y[19], A[19], B[19]);
   or(Y[20], A[20], B[20]);
   or(Y[21], A[21], B[21]);
   or(Y[22], A[22], B[22]);
   or(Y[23], A[23], B[23]);
   or(Y[24], A[24], B[24]);
   or(Y[25], A[25], B[25]);
   or(Y[26], A[26], B[26]);
   or(Y[27], A[27], B[27]);
   or(Y[28], A[28], B[28]);
   or(Y[29], A[29], B[29]);
   or(Y[30], A[30], B[30]);
   or(Y[31], A[31], B[31]);
endmodule // orop

module xorop(Y, A, B);
   output [31:0] Y; // Result.
   input [31:0]  A; // Operand.
   input [31:0]  B; // Operand.

   xor(Y[0], A[0], B[0]);
   xor(Y[1], A[1], B[1]);
   xor(Y[2], A[2], B[2]);
   xor(Y[3], A[3], B[3]);
   xor(Y[4], A[4], B[4]);
   xor(Y[5], A[5], B[5]);
   xor(Y[6], A[6], B[6]);
   xor(Y[7], A[7], B[7]);
   xor(Y[8], A[8], B[8]);
   xor(Y[9], A[9], B[9]);
   xor(Y[10], A[10], B[10]);
   xor(Y[11], A[11], B[11]);
   xor(Y[12], A[12], B[12]);
   xor(Y[13], A[13], B[13]);
   xor(Y[14], A[14], B[14]);
   xor(Y[15], A[15], B[15]);
   xor(Y[16], A[16], B[16]);
   xor(Y[17], A[17], B[17]);
   xor(Y[18], A[18], B[18]);
   xor(Y[19], A[19], B[19]);
   xor(Y[20], A[20], B[20]);
   xor(Y[21], A[21], B[21]);
   xor(Y[22], A[22], B[22]);
   xor(Y[23], A[23], B[23]);
   xor(Y[24], A[24], B[24]);
   xor(Y[25], A[25], B[25]);
   xor(Y[26], A[26], B[26]);
   xor(Y[27], A[27], B[27]);
   xor(Y[28], A[28], B[28]);
   xor(Y[29], A[29], B[29]);
   xor(Y[30], A[30], B[30]);
   xor(Y[31], A[31], B[31]);
endmodule // xorop

module notop(Y, A);
   output [31:0] Y; // Result.
   input [31:0]  A; // Operand.
   
   not(Y[0], A[0]);
   not(Y[1], A[1]);
   not(Y[2], A[2]);
   not(Y[3], A[3]);
   not(Y[4], A[4]);
   not(Y[5], A[5]);
   not(Y[6], A[6]);
   not(Y[7], A[7]);
   not(Y[8], A[8]);
   not(Y[9], A[9]);
   not(Y[10], A[10]);
   not(Y[11], A[11]);
   not(Y[12], A[12]);
   not(Y[13], A[13]);
   not(Y[14], A[14]);
   not(Y[15], A[15]);
   not(Y[16], A[16]);
   not(Y[17], A[17]);
   not(Y[18], A[18]);
   not(Y[19], A[19]);
   not(Y[20], A[20]);
   not(Y[21], A[21]);
   not(Y[22], A[22]);
   not(Y[23], A[23]);
   not(Y[24], A[24]);
   not(Y[25], A[25]);
   not(Y[26], A[26]);
   not(Y[27], A[27]);
   not(Y[28], A[28]);
   not(Y[29], A[29]);
   not(Y[30], A[30]);
   not(Y[31], A[31]);
endmodule // notop

      
module carry_select_adder_subtractor(S, C, V, A, B, CB, Op);
   output [31:0] S;   // The 32-bit sum/difference.
   output 	 C;   // The 1-bit carry/borrow status.
   output 	 V;   // The 1-bit overflow status.
   input 	 CB;  // The carry/borrow input.
   input [31:0]  A;   // The 32-bit augend/minuend.
   input [31:0]  B;   // The 32-bit addend/subtrahend.
   input 	 Op;  // The operation: 0=Add, 1=Subtract.
   
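   wire 	 C31; // The carry out bit of adder/subtractor, used to generate final carry/borrow.
   wire [31:0] 	 Bx;  // B when adding, not(B) when subtracting.
   wire 	 CBx; // The carry input to the adder: CB when adding, not(CB) when subtracting.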
   
   // Looking at the truth table for xor we see that
   // B xor 0 = B, and
   // B xor 1 = not(B).
   // So, if Op==1 means we are subtracting, then
   // adding A and B xor Op, along with setting the first
   // carry bit to Op, will give us a result of
   // A+B when Op==0, and A+not(B)+1 when Op==1.
   // Note that not(B)+1 is the 2's complement of B, so
   // this gives us subtraction.
   xor(Bx[0], B[0], Op);
   xor(Bx[1], B[1], Op);
   xor(Bx[2], B[2], Op);
   xor(Bx[3], B[3], Op);
   xor(Bx[4], B[4], Op);
   xor(Bx[5], B[5], Op);
   xor(Bx[6], B[6], Op);
   xor(Bx[7], B[7], Op);
   xor(Bx[8], B[8], Op);
   xor(Bx[9], B[9], Op);
   xor(Bx[10], B[10], Op);
   xor(Bx[11], B[11], Op);
   xor(Bx[12], B[12], Op);
   xor(Bx[13], B[13], Op);
   xor(Bx[14], B[14], Op);
   xor(Bx[15], B[15], Op);
   xor(Bx[16], B[16], Op);
   xor(Bx[17], B[17], Op);
   xor(Bx[18], B[18], Op);
   xor(Bx[19], B[19], Op);
   xor(Bx[20], B[20], Op);
   xor(Bx[21], B[21], Op);
   xor(Bx[22], B[22], Op);
   xor(Bx[23], B[23], Op);
   xor(Bx[24], B[24], Op);
   xor(Bx[25], B[25], Op);
   xor(Bx[26], B[26], Op);
   xor(Bx[27], B[27], Op);
   xor(Bx[28], B[28], Op);
   xor(Bx[29], B[29], Op);
   xor(Bx[30], B[30], Op);
   xor(Bx[31], B[31], Op);
   xor(C, C31, Op);            // Carry = C31 for addition, Carry = not(C31) for subtraction.
   xor(CBx, CB, Op); 
   carry_select_adder csa(S, C31, V, A, Bx, CBx);   
endmodule // carry_select_adder_subtractor

module carry_select_adder(S, C, V, A, B, Cin);
   output [31:0] S;   // The 32-bit sum.
   output 	 C;   // The 1-bit carry.
   output 	 V;   // The 1-bit overflow status.
   input [31:0]  A;   // The 32-bit augend.
   input [31:0]  B;   // The 32-bit addend.
   input 	 Cin; // The initial carry in.

   wire [3:0] 	S1_0;   // Nibble 1 sum output with carry input 0.
   wire [3:0] 	S1_1;   // Nibble 1 sum output with carry input 1.
   wire [3:0] 	S2_0;   // Nibble 2 sum output with carry input 0.
   wire [3:0] 	S2_1;   // Nibble 2 sum output with carry input 1.
   wire [3:0] 	S3_0;   // Nibble 3 sum output with carry input 0.
   wire [3:0] 	S3_1;   // Nibble 3 sum output with carry input 1.
   wire [3:0] 	S4_0;   // Nibble 4 sum output with carry input 0.
   wire [3:0] 	S4_1;   // Nibble 4 sum output with carry input 1.
   wire [3:0] 	S5_0;   // Nibble 5 sum output with carry input 0.
   wire [3:0] 	S5_1;   // Nibble 5 sum output with carry input 1.
   wire [3:0] 	S6_0;   // Nibble 6 sum output with carry input 0.
   wire [3:0] 	S6_1;   // Nibble 6 sum output with carry input 1.
   wire [3:0] 	S7_0;   // Nibble 7 sum output with carry input 0.
   wire [3:0] 	S7_1;   // Nibble 7 sum output with carry input 1.
   wire 	C1_0;   // Nibble 1 carry output with carry input 0.
   wire 	C1_1;   // Nibble 1 carry output with carry input 1.
   wire 	C2_0;   // Nibble 2 carry output with carry input 0.
   wire 	C2_1;   // Nibble 2 carry output with carry input 1.
   wire 	C3_0;   // Nibble 3 carry output with carry input 0.
   wire 	C3_1;   // Nibble 3 carry output with carry input 1.
   wire 	C4_0;   // Nibble 4 carry output with carry input 0.
   wire 	C4_1;   // Nibble 4 carry output with carry input 1.
   wire 	C5_0;   // Nibble 5 carry output with carry input 0.
   wire 	C5_1;   // Nibble 5 carry output with carry input 1.
   wire 	C6_0;   // Nibble 6 carry output with carry input 0.
   wire 	C6_1;   // Nibble 6 carry output with carry input 1.
   wire 	C7_0;   // Nibble 7 carry output with carry input 0.
   wire 	C7_1;   // Nibble 7 carry output with carry input 1.
   wire 	C0;     // Nibble 0 carry output used to select multiplexer output.
   wire 	C1;     // Nibble 1 carry output used to select multiplexer output.
   wire 	C2;     // Nibble 2 carry output used to select multiplexer output.
   wire 	C3;     // Nibble 3 carry output used to select multiplexer output.
   wire 	C4;     // Nibble 4 carry output used to select multiplexer output.
   wire 	C5;     // Nibble 5 carry output used to select multiplexer output.
   wire 	C6;     // Nibble 6 carry output used to select multiplexer output.
   wire 	C7;     // Nibble 7 carry output used to select multiplexer output.
   wire         V0;     // Nibble 0 overflow output.
   wire 	V1_0;   // Nibble 1 overflow output with carry input 0.
   wire 	V1_1;   // Nibble 1 overflow output with carry input 1.
   wire 	V2_0;   // Nibble 2 overflow output with carry input 0.
   wire 	V2_1;   // Nibble 2 overflow output with carry input 1.
   wire 	V3_0;   // Nibble 3 overflow output with carry input 0.
   wire 	V3_1;   // Nibble 3 overflow output with carry input 1.
   wire 	V4_0;   // Nibble 4 overflow output with carry input 0.
   wire 	V4_1;   // Nibble 4 overflow output with carry input 1.
   wire 	V5_0;   // Nibble 5 overflow output with carry input 0.
   wire 	V5_1;   // Nibble 5 overflow output with carry input 1.
   wire 	V6_0;   // Nibble 6 overflow output with carry input 0.
   wire 	V6_1;   // Nibble 6 overflow output with carry input 1.
   wire 	V7_0;   // Nibble 7 overflow output with carry input 0.
   wire 	V7_1;   // Nibble 7 overflow output with carry input 1.
   
   ripple_carry_adder rc_nibble_0(S[3:0], C0, V0, A[3:0], B[3:0], Cin);                 // Calculate S nibble 0.
   ripple_carry_adder rc_nibble_1_carry_0(S1_0, C1_0, V1_0, A[7:4], B[7:4], 1'b0);      // Calculate S nibble 1 with carry input 0.
   ripple_carry_adder rc_nibble_1_carry_1(S1_1, C1_1, V1_1, A[7:4], B[7:4], 1'b1);      // Calculate S nibble 1 with carry input 1.
   ripple_carry_adder rc_nibble_2_carry_0(S2_0, C2_0, V2_0, A[11:8], B[11:8], 1'b0);    // Calculate S nibble 2 with carry input 0.
   ripple_carry_adder rc_nibble_2_carry_1(S2_1, C2_1, V2_1, A[11:8], B[11:8], 1'b1);    // Calculate S nibble 2 with carry input 1.
   ripple_carry_adder rc_nibble_3_carry_0(S3_0, C3_0, V3_0, A[15:12], B[15:12], 1'b0);  // Calculate S nibble 3 with carry input 0.
   ripple_carry_adder rc_nibble_3_carry_1(S3_1, C3_1, V3_1, A[15:12], B[15:12], 1'b1);  // Calculate S nibble 3 with carry input 1.
   ripple_carry_adder rc_nibble_4_carry_0(S4_0, C4_0, V4_0, A[19:16], B[19:16], 1'b0);  // Calculate S nibble 4 with carry input 0.
   ripple_carry_adder rc_nibble_4_carry_1(S4_1, C4_1, V4_1, A[19:16], B[19:16], 1'b1);  // Calculate S nibble 4 with carry input 1.
   ripple_carry_adder rc_nibble_5_carry_0(S5_0, C5_0, V5_0, A[23:20], B[23:20], 1'b0);  // Calculate S nibble 5 with carry input 0.
   ripple_carry_adder rc_nibble_5_carry_1(S5_1, C5_1, V5_1, A[23:20], B[23:20], 1'b1);  // Calculate S nibble 5 with carry input 1.
   ripple_carry_adder rc_nibble_6_carry_0(S6_0, C6_0, V6_0, A[27:24], B[27:24], 1'b0);  // Calculate S nibble 6 with carry input 0.
   ripple_carry_adder rc_nibble_6_carry_1(S6_1, C6_1, V6_1, A[27:24], B[27:24], 1'b1);  // Calculate S nibble 6 with carry input 1.
   ripple_carry_adder rc_nibble_7_carry_0(S7_0, C7_0, V7_0, A[31:28], B[31:28], 1'b0);  // Calculate S nibble 7 with carry input 0.
   ripple_carry_adder rc_nibble_7_carry_1(S7_1, C7_1, V7_1, A[31:28], B[31:28], 1'b1);  // Calculate S nibble 7 with carry input 1.

   multiplexer_2_1 #(1) muxc1(C1, C1_0, C1_1, C0); // C0 selects the carry output for nibble 1.
   multiplexer_2_1 #(1) muxc2(C2, C2_0, C2_1, C1); // C1 selects the carry output for nibble 2.
   multiplexer_2_1 #(1) muxc3(C3, C3_0, C3_1, C2); // C2 selects the carry output for nibble 3.
   multiplexer_2_1 #(1) muxc4(C4, C4_0, C4_1, C3); // C3 selects the carry output for nibble 4.
   multiplexer_2_1 #(1) muxc5(C5, C5_0, C5_1, C4); // C4 selects the carry output for nibble 5.
   multiplexer_2_1 #(1) muxc6(C6, C6_0, C6_1, C5); // C5 selects the carry output for nibble 6.
   multiplexer_2_1 #(1) muxc(C, C7_0, C7_1, C6);   // C6 selects the carry output for nibble 7 which is the global carry output.
   multiplexer_2_1 #(1) muxv(V, V7_0, V7_1, C6);   // C6 selects the overflow output for nibble 7 which is the global overflow output.
   
   multiplexer_2_1 #(4) muxs1(S[7:4], S1_0, S1_1, C0);    // C0 selects the result for nibble 1.
   multiplexer_2_1 #(4) muxs2(S[11:8], S2_0, S2_1, C1);   // C1 selects the result for nibble 2.
   multiplexer_2_1 #(4) muxs3(S[15:12], S3_0, S3_1, C2);  // C2 selects the result for nibble 3.
   multiplexer_2_1 #(4) muxs4(S[19:16], S4_0, S4_1, C3);  // C3 selects the result for nibble 4.
   multiplexer_2_1 #(4) muxs5(S[23:20], S5_0, S5_1, C4);  // C4 selects the result for nibble 5.
   multiplexer_2_1 #(4) muxs6(S[27:24], S6_0, S6_1, C5);  // C5 selects the result for nibble 6.
   multiplexer_2_1 #(4) muxs7(S[31:28], S7_0, S7_1, C6);  // C6 selects the result for nibble 7.
endmodule // carry_select_adder

module ripple_carry_adder(S, C, V, A, B, Cin);
   output [3:0] S;   // The 4-bit sum.
   output 	C;   // The 1-bit carry.
   output       V;   // The 1-bit overflow status.   
   input [3:0] 	A;   // The 4-bit augend.
   input [3:0] 	B;   // The 4-bit addend.
   input 	Cin; // The carry input.
 	
   wire 	C0; // The carry out bit of fa0, the carry in bit of fa1.
   wire 	C1; // The carry out bit of fa1, the carry in bit of fa2.
   wire 	C2; // The carry out bit of fa2, the carry in bit of fa3.
	
   full_adder fa0(S[0], C0, A[0], B[0], Cin);    // Least significant bit.
   full_adder fa1(S[1], C1, A[1], B[1], C0);
   full_adder fa2(S[2], C2, A[2], B[2], C1);
   full_adder fa3(S[3], C, A[3], B[3], C2);    // Most significant bit.
   xor(V, C, C2);  // Overflow: the carry into the most significant bit differs from the carry out of it.
endmodule // ripple_carry_adder

module full_adder(S, Cout, A, B, Cin);
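   output S;     // Sum bit.
   output Cout;  // Carry out.
   input  A;     // Operand bit.
   input  B;     // Operand bit.
   input  Cin;   // Carry in.
   
   wire   w1;    // A xor B.
   wire   w2;    // A and B.
   wire   w3;    // A and Cin.
   wire   w4;    // B and Cin.
   
   xor(w1, A, B);
   xor(S, Cin, w1);       // S = A xor B xor Cin.
   and(w2, A, B);   
   and(w3, A, Cin);
   and(w4, B, Cin);   
   or(Cout, w2, w3, w4);  // Cout is 1 when at least two of A, B, Cin are 1.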
endmodule // full_adder
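
The comment in carry_select_adder_subtractor describes how a single adder serves both addition and subtraction. The following behavioral sketch restates that trick with Verilog operators so that it can be tried in isolation. It is not part of the design: the names addsub_model and addsub_model_tb are assumptions made here, the carry/borrow input is taken to be 0 as in the ALU, and the overflow bit is left out.

```verilog
// Behavioral sketch of the add/subtract trick.
// Illustrative only: addsub_model is a hypothetical name.
module addsub_model(S, C, A, B, Op);
   output [31:0] S;   // Sum (Op == 0) or difference (Op == 1).
   output        C;   // Carry for addition, borrow for subtraction.
   input [31:0]  A;   // Augend/minuend.
   input [31:0]  B;   // Addend/subtrahend.
   input         Op;  // 0 = Add, 1 = Subtract.

   // B when adding, not(B) when subtracting.
   wire [31:0]   Bx  = B ^ {32{Op}};
   // 33-bit sum so that the carry out of bit 31 is kept; Op is the first carry in.
   wire [32:0]   sum = {1'b0, A} + {1'b0, Bx} + {32'b0, Op};

   assign S = sum[31:0];
   assign C = sum[32] ^ Op;  // Carry out when adding, inverted carry out (borrow) when subtracting.
endmodule // addsub_model

module addsub_model_tb;
   reg [31:0]  a, b;
   reg         op;
   wire [31:0] s;
   wire        c;

   addsub_model dut(s, c, a, b, op);

   initial begin
      a = 32'd7; b = 32'd9;
      op = 1'b0; #1 $display("7 + 9 = %0d, carry  = %b", s, c);  // 7 + 9 = 16, carry  = 0
      op = 1'b1; #1 $display("7 - 9 = %h, borrow = %b", s, c);   // 7 - 9 = fffffffe, borrow = 1
      a = 32'd9; b = 32'd7;
      #1 $display("9 - 7 = %0d, borrow = %b", s, c);             // 9 - 7 = 2, borrow = 0
      $finish;
   end
endmodule // addsub_model_tb
```

The second case shows why the carry is inverted for subtraction: 7 + not(9) + 1 produces no carry out of bit 31, which is exactly the situation in which a borrow is needed.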

Barrel Shifter

The barrel shifter can perform logical shifts of up to 31 places to the left or right. The implementation centers on a right rotation. We take the 32-bit value to shift and prepend 32 zero bits, giving a 64-bit value. To shift SHA places to the right, we rotate this value SHA places to the right. To shift SHA places to the left, we rotate it to the right by the 6-bit two's complement of SHA, that is, by 64 - SHA places (0 places when SHA is 0). The wrap-around of the rotation is only needed when shifting left; when shifting right, the bits that wrap around never reach the low 32 bits that form the result.

We use three shifters:

shifter2 takes the 64-bit input and rotates it 0, 16, 32, or 48 bits to the right. The two most significant bits of the 6-bit rotation amount determine which. The output is 47 bits (bits 46 to 0), which is as many as the later stages can use.
shifter1 takes the 47-bit input and shifts it 0, 4, 8, or 12 bits to the right. The middle two bits of the rotation amount determine which. The output is 35 bits.
shifter0 takes the 35-bit input and shifts it 0, 1, 2, or 3 bits to the right. The two least significant bits of the rotation amount determine which. The output is 32 bits.
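
The scheme can be checked with a short behavioral sketch before reading the structural version. The modules below are not part of the design: the names shift_model and shift_model_tb are assumptions made here, and the rotation is done in one step with Verilog shift operators instead of three multiplexer stages. The test bench compares every shift amount in both directions against the built-in >> and << operators.

```verilog
// Behavioral sketch of the rotate-based logical shift.
// Illustrative only: shift_model is a hypothetical name.
module shift_model(OUT, IN, SHA, DIR);
   output [31:0] OUT; // OUTput.
   input [31:0]  IN;  // INput.
   input [4:0]   SHA; // SHift Amount, 0 to 31.
   input         DIR; // DIRection, 0 = right, 1 = left.

   wire [63:0]   x      = {32'h00000000, IN};        // Prepend 32 zero bits.
   wire [5:0]    neg    = ~{1'b0, SHA} + 6'd1;       // 6-bit two's complement: (64 - SHA) mod 64.
   wire [5:0]    places = DIR ? neg : {1'b0, SHA};   // Right-rotation distance.
   wire [6:0]    back   = 7'd64 - {1'b0, places};    // Distance the wrapped bits move to the left.
   wire [63:0]   rot    = (x >> places) | (x << back); // 64-bit right rotation.

   assign OUT = rot[31:0];                           // The low 32 bits are the result.
endmodule // shift_model

module shift_model_tb;
   reg [31:0]  in;
   reg [4:0]   sha;
   reg         dir;
   wire [31:0] out;
   integer     n, errors;

   shift_model dut(out, in, sha, dir);

   initial begin
      errors = 0;
      in = 32'h80000001;
      for (n = 0; n < 32; n = n + 1) begin
         sha = n;
         dir = 1'b0; #1;
         if (out !== (in >> n)) errors = errors + 1;  // Logical shift right.
         dir = 1'b1; #1;
         if (out !== (in << n)) errors = errors + 1;  // Logical shift left.
      end
      if (errors == 0) $display("shift_model: all checks passed");
      else $display("shift_model: %0d mismatches", errors);
      $finish;
   end
endmodule // shift_model_tb
```

As a worked case, a left shift by 3 becomes a right rotation by 64 - 3 = 61 places: bit 0 of the input ends up at position (0 - 61) mod 64 = 3, and positions 0 to 2 receive bits 61 to 63 of the 64-bit value, which are zero.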

	
module shifter(OUT, IN, SHA, DIR);   
   output [31:0] OUT; // OUTput.
   input [31:0]  IN;  // INput.
   input [4:0] 	 SHA; // SHift Amount.
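   input 	 DIR; // DIRection, 0 = right, 1 = left.
   
   wire [46:0] 	 s2;     // Output of the 16-place stage.
   wire [34:0] 	 s1;     // Output of the 4-place stage.
   wire [5:0] 	 places; // Number of places to rotate to the right.
   wire [5:0] 	 SHAneg; // Two's complement of SHA, used when shifting left.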
   
   twos_complement_5bit twos0(SHAneg, SHA);
   multiplexer_2_1 #(6) mux0(places, {1'b0, SHA}, SHAneg, DIR);   
   shifter2 sh2(s2, {32'h00000000,IN}, places[5], places[4]);
   shifter1 sh1(s1, s2, places[3], places[2]);
   shifter0 sh0(OUT, s1, places[1], places[0]);
endmodule // shifter

module shifter2(OUT, IN, SEL1, SEL0);
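   output [46:0] OUT;   // Low 47 bits of IN rotated right by 0, 16, 32, or 48 places.
   input [63:0]  IN;    // 64-bit value to rotate.
   input  	 SEL1;  // Select, most significant bit.
   input 	 SEL0;  // Select, least significant bit.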

   multiplexer_4_1 #(1) mux0(OUT[0], IN[0], IN[16], IN[32], IN[48], SEL1, SEL0);
   multiplexer_4_1 #(1) mux1(OUT[1], IN[1], IN[17], IN[33], IN[49], SEL1, SEL0);
   multiplexer_4_1 #(1) mux2(OUT[2], IN[2], IN[18], IN[34], IN[50], SEL1, SEL0);
   multiplexer_4_1 #(1) mux3(OUT[3], IN[3], IN[19], IN[35], IN[51], SEL1, SEL0);
   multiplexer_4_1 #(1) mux4(OUT[4], IN[4], IN[20], IN[36], IN[52], SEL1, SEL0);
   multiplexer_4_1 #(1) mux5(OUT[5], IN[5], IN[21], IN[37], IN[53], SEL1, SEL0);
   multiplexer_4_1 #(1) mux6(OUT[6], IN[6], IN[22], IN[38], IN[54], SEL1, SEL0);
   multiplexer_4_1 #(1) mux7(OUT[7], IN[7], IN[23], IN[39], IN[55], SEL1, SEL0);
   multiplexer_4_1 #(1) mux8(OUT[8], IN[8], IN[24], IN[40], IN[56], SEL1, SEL0);
   multiplexer_4_1 #(1) mux9(OUT[9], IN[9], IN[25], IN[41], IN[57], SEL1, SEL0);
   multiplexer_4_1 #(1) mux10(OUT[10], IN[10], IN[26], IN[42], IN[58], SEL1, SEL0);
   multiplexer_4_1 #(1) mux11(OUT[11], IN[11], IN[27], IN[43], IN[59], SEL1, SEL0);
   multiplexer_4_1 #(1) mux12(OUT[12], IN[12], IN[28], IN[44], IN[60], SEL1, SEL0);
   multiplexer_4_1 #(1) mux13(OUT[13], IN[13], IN[29], IN[45], IN[61], SEL1, SEL0);
   multiplexer_4_1 #(1) mux14(OUT[14], IN[14], IN[30], IN[46], IN[62], SEL1, SEL0);
   multiplexer_4_1 #(1) mux15(OUT[15], IN[15], IN[31], IN[47], IN[63], SEL1, SEL0);
   multiplexer_4_1 #(1) mux16(OUT[16], IN[16], IN[32], IN[48], IN[0], SEL1, SEL0);
   multiplexer_4_1 #(1) mux17(OUT[17], IN[17], IN[33], IN[49], IN[1], SEL1, SEL0);
   multiplexer_4_1 #(1) mux18(OUT[18], IN[18], IN[34], IN[50], IN[2], SEL1, SEL0);
   multiplexer_4_1 #(1) mux19(OUT[19], IN[19], IN[35], IN[51], IN[3], SEL1, SEL0);
   multiplexer_4_1 #(1) mux20(OUT[20], IN[20], IN[36], IN[52], IN[4], SEL1, SEL0);
   multiplexer_4_1 #(1) mux21(OUT[21], IN[21], IN[37], IN[53], IN[5], SEL1, SEL0);
   multiplexer_4_1 #(1) mux22(OUT[22], IN[22], IN[38], IN[54], IN[6], SEL1, SEL0);
   multiplexer_4_1 #(1) mux23(OUT[23], IN[23], IN[39], IN[55], IN[7], SEL1, SEL0);
   multiplexer_4_1 #(1) mux24(OUT[24], IN[24], IN[40], IN[56], IN[8], SEL1, SEL0);
   multiplexer_4_1 #(1) mux25(OUT[25], IN[25], IN[41], IN[57], IN[9], SEL1, SEL0);
   multiplexer_4_1 #(1) mux26(OUT[26], IN[26], IN[42], IN[58], IN[10], SEL1, SEL0);
   multiplexer_4_1 #(1) mux27(OUT[27], IN[27], IN[43], IN[59], IN[11], SEL1, SEL0);
   multiplexer_4_1 #(1) mux28(OUT[28], IN[28], IN[44], IN[60], IN[12], SEL1, SEL0);
   multiplexer_4_1 #(1) mux29(OUT[29], IN[29], IN[45], IN[61], IN[13], SEL1, SEL0);
   multiplexer_4_1 #(1) mux30(OUT[30], IN[30], IN[46], IN[62], IN[14], SEL1, SEL0);
   multiplexer_4_1 #(1) mux31(OUT[31], IN[31], IN[47], IN[63], IN[15], SEL1, SEL0);
   multiplexer_4_1 #(1) mux32(OUT[32], IN[32], IN[48], IN[0], IN[16], SEL1, SEL0);
   multiplexer_4_1 #(1) mux33(OUT[33], IN[33], IN[49], IN[1], IN[17], SEL1, SEL0);
   multiplexer_4_1 #(1) mux34(OUT[34], IN[34], IN[50], IN[2], IN[18], SEL1, SEL0);
   multiplexer_4_1 #(1) mux35(OUT[35], IN[35], IN[51], IN[3], IN[19], SEL1, SEL0);
   multiplexer_4_1 #(1) mux36(OUT[36], IN[36], IN[52], IN[4], IN[20], SEL1, SEL0);
   multiplexer_4_1 #(1) mux37(OUT[37], IN[37], IN[53], IN[5], IN[21], SEL1, SEL0);
   multiplexer_4_1 #(1) mux38(OUT[38], IN[38], IN[54], IN[6], IN[22], SEL1, SEL0);
   multiplexer_4_1 #(1) mux39(OUT[39], IN[39], IN[55], IN[7], IN[23], SEL1, SEL0);
   multiplexer_4_1 #(1) mux40(OUT[40], IN[40], IN[56], IN[8], IN[24], SEL1, SEL0);
   multiplexer_4_1 #(1) mux41(OUT[41], IN[41], IN[57], IN[9], IN[25], SEL1, SEL0);
   multiplexer_4_1 #(1) mux42(OUT[42], IN[42], IN[58], IN[10], IN[26], SEL1, SEL0);
   multiplexer_4_1 #(1) mux43(OUT[43], IN[43], IN[59], IN[11], IN[27], SEL1, SEL0);
   multiplexer_4_1 #(1) mux44(OUT[44], IN[44], IN[60], IN[12], IN[28], SEL1, SEL0);
   multiplexer_4_1 #(1) mux45(OUT[45], IN[45], IN[61], IN[13], IN[29], SEL1, SEL0);
   multiplexer_4_1 #(1) mux46(OUT[46], IN[46], IN[62], IN[14], IN[30], SEL1, SEL0);
endmodule // shifter2

module shifter1(OUT, IN, SEL1, SEL0);
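   output [34:0] OUT;   // Low 35 bits of IN shifted right by 0, 4, 8, or 12 places.
   input [46:0]  IN;    // 47-bit output of shifter2.
   input  	 SEL1;  // Select, most significant bit.
   input 	 SEL0;  // Select, least significant bit.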

   multiplexer_4_1 #(1) mux0(OUT[0], IN[0], IN[4], IN[8], IN[12], SEL1, SEL0);
   multiplexer_4_1 #(1) mux1(OUT[1], IN[1], IN[5], IN[9], IN[13], SEL1, SEL0);
   multiplexer_4_1 #(1) mux2(OUT[2], IN[2], IN[6], IN[10], IN[14], SEL1, SEL0);
   multiplexer_4_1 #(1) mux3(OUT[3], IN[3], IN[7], IN[11], IN[15], SEL1, SEL0);
   multiplexer_4_1 #(1) mux4(OUT[4], IN[4], IN[8], IN[12], IN[16], SEL1, SEL0);
   multiplexer_4_1 #(1) mux5(OUT[5], IN[5], IN[9], IN[13], IN[17], SEL1, SEL0);
   multiplexer_4_1 #(1) mux6(OUT[6], IN[6], IN[10], IN[14], IN[18], SEL1, SEL0);
   multiplexer_4_1 #(1) mux7(OUT[7], IN[7], IN[11], IN[15], IN[19], SEL1, SEL0);
   multiplexer_4_1 #(1) mux8(OUT[8], IN[8], IN[12], IN[16], IN[20], SEL1, SEL0);
   multiplexer_4_1 #(1) mux9(OUT[9], IN[9], IN[13], IN[17], IN[21], SEL1, SEL0);
   multiplexer_4_1 #(1) mux10(OUT[10], IN[10], IN[14], IN[18], IN[22], SEL1, SEL0);
   multiplexer_4_1 #(1) mux11(OUT[11], IN[11], IN[15], IN[19], IN[23], SEL1, SEL0);
   multiplexer_4_1 #(1) mux12(OUT[12], IN[12], IN[16], IN[20], IN[24], SEL1, SEL0);
   multiplexer_4_1 #(1) mux13(OUT[13], IN[13], IN[17], IN[21], IN[25], SEL1, SEL0);
   multiplexer_4_1 #(1) mux14(OUT[14], IN[14], IN[18], IN[22], IN[26], SEL1, SEL0);
   multiplexer_4_1 #(1) mux15(OUT[15], IN[15], IN[19], IN[23], IN[27], SEL1, SEL0);
   multiplexer_4_1 #(1) mux16(OUT[16], IN[16], IN[20], IN[24], IN[28], SEL1, SEL0);
   multiplexer_4_1 #(1) mux17(OUT[17], IN[17], IN[21], IN[25], IN[29], SEL1, SEL0);
   multiplexer_4_1 #(1) mux18(OUT[18], IN[18], IN[22], IN[26], IN[30], SEL1, SEL0);
   multiplexer_4_1 #(1) mux19(OUT[19], IN[19], IN[23], IN[27], IN[31], SEL1, SEL0);
   multiplexer_4_1 #(1) mux20(OUT[20], IN[20], IN[24], IN[28], IN[32], SEL1, SEL0);
   multiplexer_4_1 #(1) mux21(OUT[21], IN[21], IN[25], IN[29], IN[33], SEL1, SEL0);
   multiplexer_4_1 #(1) mux22(OUT[22], IN[22], IN[26], IN[30], IN[34], SEL1, SEL0);
   multiplexer_4_1 #(1) mux23(OUT[23], IN[23], IN[27], IN[31], IN[35], SEL1, SEL0);
   multiplexer_4_1 #(1) mux24(OUT[24], IN[24], IN[28], IN[32], IN[36], SEL1, SEL0);
   multiplexer_4_1 #(1) mux25(OUT[25], IN[25], IN[29], IN[33], IN[37], SEL1, SEL0);
   multiplexer_4_1 #(1) mux26(OUT[26], IN[26], IN[30], IN[34], IN[38], SEL1, SEL0);
   multiplexer_4_1 #(1) mux27(OUT[27], IN[27], IN[31], IN[35], IN[39], SEL1, SEL0);
   multiplexer_4_1 #(1) mux28(OUT[28], IN[28], IN[32], IN[36], IN[40], SEL1, SEL0);
   multiplexer_4_1 #(1) mux29(OUT[29], IN[29], IN[33], IN[37], IN[41], SEL1, SEL0);
   multiplexer_4_1 #(1) mux30(OUT[30], IN[30], IN[34], IN[38], IN[42], SEL1, SEL0);
   multiplexer_4_1 #(1) mux31(OUT[31], IN[31], IN[35], IN[39], IN[43], SEL1, SEL0);
   multiplexer_4_1 #(1) mux32(OUT[32], IN[32], IN[36], IN[40], IN[44], SEL1, SEL0);
   multiplexer_4_1 #(1) mux33(OUT[33], IN[33], IN[37], IN[41], IN[45], SEL1, SEL0);
   multiplexer_4_1 #(1) mux34(OUT[34], IN[34], IN[38], IN[42], IN[46], SEL1, SEL0);
endmodule // shifter1

module shifter0(OUT, IN, SEL1, SEL0);
   output [31:0] OUT;   
   input [34:0]  IN;
   input  	 SEL1;
   input 	 SEL0;
   
   multiplexer_4_1 #(1) mux0(OUT[0], IN[0], IN[1], IN[2], IN[3], SEL1, SEL0);
   multiplexer_4_1 #(1) mux1(OUT[1], IN[1], IN[2], IN[3], IN[4], SEL1, SEL0);
   multiplexer_4_1 #(1) mux2(OUT[2], IN[2], IN[3], IN[4], IN[5], SEL1, SEL0);
   multiplexer_4_1 #(1) mux3(OUT[3], IN[3], IN[4], IN[5], IN[6], SEL1, SEL0);
   multiplexer_4_1 #(1) mux4(OUT[4], IN[4], IN[5], IN[6], IN[7], SEL1, SEL0);
   multiplexer_4_1 #(1) mux5(OUT[5], IN[5], IN[6], IN[7], IN[8], SEL1, SEL0);
   multiplexer_4_1 #(1) mux6(OUT[6], IN[6], IN[7], IN[8], IN[9], SEL1, SEL0);
   multiplexer_4_1 #(1) mux7(OUT[7], IN[7], IN[8], IN[9], IN[10], SEL1, SEL0);
   multiplexer_4_1 #(1) mux8(OUT[8], IN[8], IN[9], IN[10], IN[11], SEL1, SEL0);
   multiplexer_4_1 #(1) mux9(OUT[9], IN[9], IN[10], IN[11], IN[12], SEL1, SEL0);
   multiplexer_4_1 #(1) mux10(OUT[10], IN[10], IN[11], IN[12], IN[13], SEL1, SEL0);
   multiplexer_4_1 #(1) mux11(OUT[11], IN[11], IN[12], IN[13], IN[14], SEL1, SEL0);
   multiplexer_4_1 #(1) mux12(OUT[12], IN[12], IN[13], IN[14], IN[15], SEL1, SEL0);
   multiplexer_4_1 #(1) mux13(OUT[13], IN[13], IN[14], IN[15], IN[16], SEL1, SEL0);
   multiplexer_4_1 #(1) mux14(OUT[14], IN[14], IN[15], IN[16], IN[17], SEL1, SEL0);
   multiplexer_4_1 #(1) mux15(OUT[15], IN[15], IN[16], IN[17], IN[18], SEL1, SEL0);
   multiplexer_4_1 #(1) mux16(OUT[16], IN[16], IN[17], IN[18], IN[19], SEL1, SEL0);
   multiplexer_4_1 #(1) mux17(OUT[17], IN[17], IN[18], IN[19], IN[20], SEL1, SEL0);
   multiplexer_4_1 #(1) mux18(OUT[18], IN[18], IN[19], IN[20], IN[21], SEL1, SEL0);
   multiplexer_4_1 #(1) mux19(OUT[19], IN[19], IN[20], IN[21], IN[22], SEL1, SEL0);
   multiplexer_4_1 #(1) mux20(OUT[20], IN[20], IN[21], IN[22], IN[23], SEL1, SEL0);
   multiplexer_4_1 #(1) mux21(OUT[21], IN[21], IN[22], IN[23], IN[24], SEL1, SEL0);
   multiplexer_4_1 #(1) mux22(OUT[22], IN[22], IN[23], IN[24], IN[25], SEL1, SEL0);
   multiplexer_4_1 #(1) mux23(OUT[23], IN[23], IN[24], IN[25], IN[26], SEL1, SEL0);
   multiplexer_4_1 #(1) mux24(OUT[24], IN[24], IN[25], IN[26], IN[27], SEL1, SEL0);
   multiplexer_4_1 #(1) mux25(OUT[25], IN[25], IN[26], IN[27], IN[28], SEL1, SEL0);
   multiplexer_4_1 #(1) mux26(OUT[26], IN[26], IN[27], IN[28], IN[29], SEL1, SEL0);
   multiplexer_4_1 #(1) mux27(OUT[27], IN[27], IN[28], IN[29], IN[30], SEL1, SEL0);
   multiplexer_4_1 #(1) mux28(OUT[28], IN[28], IN[29], IN[30], IN[31], SEL1, SEL0);
   multiplexer_4_1 #(1) mux29(OUT[29], IN[29], IN[30], IN[31], IN[32], SEL1, SEL0);
   multiplexer_4_1 #(1) mux30(OUT[30], IN[30], IN[31], IN[32], IN[33], SEL1, SEL0);
   multiplexer_4_1 #(1) mux31(OUT[31], IN[31], IN[32], IN[33], IN[34], SEL1, SEL0);
endmodule // shifter0


/*
 * Take the 5-bit unsigned value IN and output OUT, the 6-bit
 * two's complement (negation) of IN zero-extended to 6 bits.
 */   
module twos_complement_5bit(OUT, IN);
   output [5:0] OUT;
   input [4:0] 	IN;

   wire [4:0] 	INn;

   not(INn[0], IN[0]);
   not(INn[1], IN[1]);
   not(INn[2], IN[2]);
   not(INn[3], IN[3]);
   not(INn[4], IN[4]);

   incrementer_6bit i0(OUT, {1'b1,INn});   
endmodule // twos_complement_5bit


/*
 * Simple 6-bit incrementer using a ripple carry adder.
 */
module incrementer_6bit(S, A);
   output [5:0] S;   // The 6-bit sum.
   input [5:0] 	A;   // The 6-bit augend.

   wire 	C0; // The carry out bit of fa0, the carry in bit of fa1.
   wire 	C1; // The carry out bit of fa1, the carry in bit of fa2.
   wire 	C2; // The carry out bit of fa2, the carry in bit of fa3.
   wire 	C3; // The carry out bit of fa3, the carry in bit of fa4.
   wire 	C4; // The carry out bit of fa4, the carry in bit of fa5.
   wire 	C5; // The carry out bit of fa5, which is ignored.
	
   full_adder fa0(S[0], C0, A[0], 1'b1, 1'b0);    // Least significant bit.
   full_adder fa1(S[1], C1, A[1], 1'b0, C0);
   full_adder fa2(S[2], C2, A[2], 1'b0, C1);
   full_adder fa3(S[3], C3, A[3], 1'b0, C2);    
   full_adder fa4(S[4], C4, A[4], 1'b0, C3);
   full_adder fa5(S[5], C5, A[5], 1'b0, C4);    // Most significant bit.   
endmodule // incrementer_6bit
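
The structural two's complement circuit above can be cross-checked against a behavioral description. The following is a minimal sketch, assuming a Verilog-2001 simulator; the module names twos_complement_5bit_ref and twos_complement_5bit_tb are hypothetical and not part of the design. As a worked value, IN = 5'b00101 is inverted and extended to 6'b111010, and incrementing gives 6'b111011, which is -5 in 6-bit two's complement.

```verilog
// Behavioral reference (hypothetical name, not part of the design):
// zero-extend the 5-bit input to 6 bits, invert every bit and add one.
module twos_complement_5bit_ref(OUT, IN);
   output [5:0] OUT;
   input [4:0]  IN;

   assign OUT = ~{1'b0, IN} + 6'd1;
endmodule // twos_complement_5bit_ref

// Self-checking testbench (hypothetical): tries all 32 input values.
module twos_complement_5bit_tb;
   reg [4:0]   IN;
   wire [5:0]  OUT;
   wire [5:0]  SUM;     // OUT + IN, truncated to 6 bits.
   integer     i;
   integer     errors;

   twos_complement_5bit_ref dut(OUT, IN);

   // A value plus its two's complement must be zero modulo 64.
   assign SUM = OUT + {1'b0, IN};

   initial begin
      errors = 0;
      for (i = 0; i < 32; i = i + 1) begin
         IN = i[4:0];
         #1;
         if (SUM !== 6'd0) begin
            errors = errors + 1;
            $display("MISMATCH: IN=%b OUT=%b", IN, OUT);
         end
      end
      if (errors == 0)
        $display("All 32 inputs negate correctly.");
      $finish;
   end
endmodule // twos_complement_5bit_tb
```

To check the structural version instead, the instance in the testbench could be changed to twos_complement_5bit, provided full_adder and incrementer_6bit are compiled alongside it.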

Constant Unit

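The constant unit converts a 16-bit immediate value into a 32-bit constant in one of two ways: zero filling, which turns a 16-bit unsigned immediate value into the equivalent unsigned 32-bit value, and sign extension, which turns a 16-bit signed immediate value into the equivalent signed 32-bit value. The CS (Constant Select) input chooses between the two.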

	
module constant_unit(CONSTANT, IMMEDIATE, CS);
   output [31:0] CONSTANT;  // The 32-bit output constant.
   input [15:0]  IMMEDIATE; // The immediate 16-bit value input.
   input 	 CS;        // Constant Select: 0=>zero fill, 1=>sign extend.
   
   wire [31:0] 	 ZF;
   wire [31:0] 	 SE;
   
   zero_fill zf0(ZF, IMMEDIATE);
   sign_extend se0(SE, IMMEDIATE);
   
   multiplexer_2_1 #(32)mux0(CONSTANT, ZF, SE, CS);
endmodule // constant_unit

module zero_fill(OUT, IN);
   output [31:0] OUT;
   input [15:0]  IN;
   
   assign OUT = {16'h0000, IN};
endmodule // zero_fill

module sign_extend(OUT, IN);
   output [31:0] OUT;
   input [15:0]  IN;
   
   wire [15:0] 	 HI;
   
   multiplexer_2_1 mux0(HI, 16'h0000, 16'hffff, IN[15]);
   assign OUT={HI, IN};
endmodule // sign_extend
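
For comparison, the same behaviour can be written in a single expression with the Verilog replication operator. This is only a sketch under the assumption that a behavioral description is acceptable; the names constant_unit_ref and constant_unit_tb are hypothetical and do not appear elsewhere in the design.

```verilog
// Behavioral sketch (hypothetical name) of the constant unit.
module constant_unit_ref(CONSTANT, IMMEDIATE, CS);
   output [31:0] CONSTANT;  // The 32-bit output constant.
   input [15:0]  IMMEDIATE; // The immediate 16-bit value input.
   input         CS;        // Constant Select: 0=>zero fill, 1=>sign extend.

   // The upper 16 bits are copies of the sign bit when CS is 1,
   // and all zeroes when CS is 0.
   assign CONSTANT = {{16{CS & IMMEDIATE[15]}}, IMMEDIATE};
endmodule // constant_unit_ref

// Small testbench (hypothetical) showing both modes.
module constant_unit_tb;
   reg [15:0]  IMMEDIATE;
   reg         CS;
   wire [31:0] CONSTANT;

   constant_unit_ref dut(CONSTANT, IMMEDIATE, CS);

   initial begin
      // Negative 16-bit value, zero filled: expect 32'h00008001.
      IMMEDIATE = 16'h8001; CS = 1'b0; #1;
      $display("zero fill   %h -> %h", IMMEDIATE, CONSTANT);

      // Same value, sign extended: expect 32'hffff8001.
      CS = 1'b1; #1;
      $display("sign extend %h -> %h", IMMEDIATE, CONSTANT);

      // Positive value, sign extended: expect 32'h00007fff.
      IMMEDIATE = 16'h7fff; #1;
      $display("sign extend %h -> %h", IMMEDIATE, CONSTANT);
      $finish;
   end
endmodule // constant_unit_tb
```

The two modes differ only when bit 15 of the immediate value is 1, which is why the first two cases use 16'h8001.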

Multiplexers

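The code for the multiplexers used throughout the design (2-to-1, 4-to-1, 8-to-1, 16-to-1 and 32-to-1) is given here. Each multiplexer has a WIDTH parameter, 16 by default, which sets the width of its data lines.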

	
module multiplexer_2_1(X, A0, A1, S);
   parameter WIDTH=16;     // How many bits wide are the lines
   
   output [WIDTH-1:0] X;   // The output line
   
   input [WIDTH-1:0]  A1;  // Input line with id 1'b1
   input [WIDTH-1:0]  A0;  // Input line with id 1'b0
   input 	      S;  // Selection bit
   
   assign X = (S == 1'b0) ? A0 : A1;
endmodule // multiplexer_2_1

module multiplexer_4_1(X, A0, A1, A2, A3, S1, S0);
   parameter WIDTH=16;     // How many bits wide are the lines

   output [WIDTH-1:0] X;   // The output line

   input [WIDTH-1:0]  A3;  // Input line with id 2'b11
   input [WIDTH-1:0]  A2;  // Input line with id 2'b10
   input [WIDTH-1:0]  A1;  // Input line with id 2'b01
   input [WIDTH-1:0]  A0;  // Input line with id 2'b00
   input 	      S0;  // Least significant selection bit
   input 	      S1;  // Most significant selection bit

   assign X = (S1 == 0 
	       ? (S0 == 0 
		  ? A0       // {S1,S0} = 2'b00
		  : A1)      // {S1,S0} = 2'b01
	       : (S0 == 0 
		  ? A2       // {S1,S0} = 2'b10
		  : A3));    // {S1,S0} = 2'b11		  
endmodule // multiplexer_4_1

module multiplexer_8_1(X, A0, A1, A2, A3, A4, A5, A6, A7, S);
   parameter WIDTH=16;     // How many bits wide are the lines

   output [WIDTH-1:0] X;   // The output line

   input [WIDTH-1:0]  A7;  // Input line with id 3'b111
   input [WIDTH-1:0]  A6;  // Input line with id 3'b110
   input [WIDTH-1:0]  A5;  // Input line with id 3'b101
   input [WIDTH-1:0]  A4;  // Input line with id 3'b100
   input [WIDTH-1:0]  A3;  // Input line with id 3'b011
   input [WIDTH-1:0]  A2;  // Input line with id 3'b010
   input [WIDTH-1:0]  A1;  // Input line with id 3'b001
   input [WIDTH-1:0]  A0;  // Input line with id 3'b000
   input [2:0]	      S;   // Selection bits, S[2] most significant

   assign X = (S[2] == 0 
	       ? (S[1] == 0 
		  ? (S[0] == 0 
		     ? A0       // {S2,S1,S0} = 3'b000
		     : A1)      // {S2,S1,S0} = 3'b001
		  : (S[0] == 0 
		     ? A2       // {S2,S1,S0} = 3'b010
		     : A3))     // {S2,S1,S0} = 3'b011
	       : (S[1] == 0 
		  ? (S[0] == 0 
		     ? A4       // {S2,S1,S0} = 3'b100
		     : A5)      // {S2,S1,S0} = 3'b101
		  : (S[0] == 0 
		     ? A6       // {S2,S1,S0} = 3'b110
		     : A7)));   // {S2,S1,S0} = 3'b111
endmodule // multiplexer_8_1

module multiplexer_16_1(X, A0, A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, S);
   parameter WIDTH=16;     // How many bits wide are the lines

   output [WIDTH-1:0] X;   // The output line
   
   input [WIDTH-1:0]  A15;  // Input line with id 4'b1111
   input [WIDTH-1:0]  A14;  // Input line with id 4'b1110
   input [WIDTH-1:0]  A13;  // Input line with id 4'b1101
   input [WIDTH-1:0]  A12;  // Input line with id 4'b1100
   input [WIDTH-1:0]  A11;  // Input line with id 4'b1011
   input [WIDTH-1:0]  A10;  // Input line with id 4'b1010
   input [WIDTH-1:0]  A9;  // Input line with id 4'b1001
   input [WIDTH-1:0]  A8;  // Input line with id 4'b1000
   input [WIDTH-1:0]  A7;  // Input line with id 4'b0111
   input [WIDTH-1:0]  A6;  // Input line with id 4'b0110
   input [WIDTH-1:0]  A5;  // Input line with id 4'b0101
   input [WIDTH-1:0]  A4;  // Input line with id 4'b0100
   input [WIDTH-1:0]  A3;  // Input line with id 4'b0011
   input [WIDTH-1:0]  A2;  // Input line with id 4'b0010
   input [WIDTH-1:0]  A1;  // Input line with id 4'b0001
   input [WIDTH-1:0]  A0;  // Input line with id 4'b0000
   input [3:0]	      S;   // Selection bits, S[3] most significant

   assign X = (S[3] == 0 
	       ? (S[2] == 0 
		  ? (S[1] == 0 
		     ? (S[0] == 0 
			? A0       // {S3, S2,S1,S0} = 4'b0000
			: A1)      // {S3, S2,S1,S0} = 4'b0001
		     : (S[0] == 0 
			? A2       // {S3, S2,S1,S0} = 4'b0010
			: A3))     // {S3, S2,S1,S0} = 4'b0011
		  : (S[1] == 0 
		     ? (S[0] == 0 
			? A4       // {S3, S2,S1,S0} = 4'b0100
			: A5)      // {S3, S2,S1,S0} = 4'b0101
		     : (S[0] == 0 
			? A6       // {S3, S2,S1,S0} = 4'b0110
			: A7)))    // {S3, S2,S1,S0} = 4'b0111
	       : (S[2] == 0 
		  ? (S[1] == 0 
		     ? (S[0] == 0 
			? A8       // {S3, S2,S1,S0} = 4'b1000
			: A9)      // {S3, S2,S1,S0} = 4'b1001
		     : (S[0] == 0 
			? A10      // {S3, S2,S1,S0} = 4'b1010
			: A11))    // {S3, S2,S1,S0} = 4'b1011
		  : (S[1] == 0 
		     ? (S[0] == 0 
			? A12      // {S3, S2,S1,S0} = 4'b1100
			: A13)     // {S3, S2,S1,S0} = 4'b1101
		     : (S[0] == 0 
			? A14      // {S3, S2,S1,S0} = 4'b1110
			: A15)))); // {S3, S2,S1,S0} = 4'b1111
endmodule // multiplexer_16_1

module multiplexer_32_1(X, A0, A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, 
			A16, A17, A18, A19, A20, A21, A22, A23, A24, A25, A26, A27, A28, A29, A30, A31,
			S);
   parameter WIDTH=16;     // How many bits wide are the lines

   output [WIDTH-1:0] X;   // The output line
   
   input [WIDTH-1:0]  A31;  // Input line with id 5'b11111
   input [WIDTH-1:0]  A30;  // Input line with id 5'b11110
   input [WIDTH-1:0]  A29;  // Input line with id 5'b11101
   input [WIDTH-1:0]  A28;  // Input line with id 5'b11100
   input [WIDTH-1:0]  A27;  // Input line with id 5'b11011
   input [WIDTH-1:0]  A26;  // Input line with id 5'b11010
   input [WIDTH-1:0]  A25;  // Input line with id 5'b11001
   input [WIDTH-1:0]  A24;  // Input line with id 5'b11000
   input [WIDTH-1:0]  A23;  // Input line with id 5'b10111
   input [WIDTH-1:0]  A22;  // Input line with id 5'b10110
   input [WIDTH-1:0]  A21;  // Input line with id 5'b10101
   input [WIDTH-1:0]  A20;  // Input line with id 5'b10100
   input [WIDTH-1:0]  A19;  // Input line with id 5'b10011
   input [WIDTH-1:0]  A18;  // Input line with id 5'b10010
   input [WIDTH-1:0]  A17;  // Input line with id 5'b10001
   input [WIDTH-1:0]  A16;  // Input line with id 5'b10000
   input [WIDTH-1:0]  A15;  // Input line with id 5'b01111
   input [WIDTH-1:0]  A14;  // Input line with id 5'b01110
   input [WIDTH-1:0]  A13;  // Input line with id 5'b01101
   input [WIDTH-1:0]  A12;  // Input line with id 5'b01100
   input [WIDTH-1:0]  A11;  // Input line with id 5'b01011
   input [WIDTH-1:0]  A10;  // Input line with id 5'b01010
   input [WIDTH-1:0]  A9;  // Input line with id 5'b01001
   input [WIDTH-1:0]  A8;  // Input line with id 5'b01000
   input [WIDTH-1:0]  A7;  // Input line with id 5'b00111
   input [WIDTH-1:0]  A6;  // Input line with id 5'b00110
   input [WIDTH-1:0]  A5;  // Input line with id 5'b00101
   input [WIDTH-1:0]  A4;  // Input line with id 5'b00100
   input [WIDTH-1:0]  A3;  // Input line with id 5'b00011
   input [WIDTH-1:0]  A2;  // Input line with id 5'b00010
   input [WIDTH-1:0]  A1;  // Input line with id 5'b00001
   input [WIDTH-1:0]  A0;  // Input line with id 5'b00000
   input [4:0]	      S;   // Selection bits, S[4] most significant

   assign X = (S[4] == 0
	       ? (S[3] == 0
		  ? (S[2] == 0
		     ? (S[1] == 0
			? (S[0] == 0
			   ? A0        // {S4,S3,S2,S1,S0} = 5'b00000
			   : A1)       // {S4,S3,S2,S1,S0} = 5'b00001
			: (S[0] == 0
			   ? A2        // {S4,S3,S2,S1,S0} = 5'b00010
			   : A3))      // {S4,S3,S2,S1,S0} = 5'b00011
		     : (S[1] == 0
			? (S[0] == 0
			   ? A4        // {S4,S3,S2,S1,S0} = 5'b00100
			   : A5)       // {S4,S3,S2,S1,S0} = 5'b00101
			: (S[0] == 0
			   ? A6        // {S4,S3,S2,S1,S0} = 5'b00110
			   : A7)))     // {S4,S3,S2,S1,S0} = 5'b00111
		  : (S[2] == 0
		     ? (S[1] == 0
			? (S[0] == 0
			   ? A8        // {S4,S3,S2,S1,S0} = 5'b01000
			   : A9)       // {S4,S3,S2,S1,S0} = 5'b01001
			: (S[0] == 0
			   ? A10       // {S4,S3,S2,S1,S0} = 5'b01010
			   : A11))     // {S4,S3,S2,S1,S0} = 5'b01011
		     : (S[1] == 0
			? (S[0] == 0
			   ? A12       // {S4,S3,S2,S1,S0} = 5'b01100
			   : A13)      // {S4,S3,S2,S1,S0} = 5'b01101
			: (S[0] == 0
			   ? A14       // {S4,S3,S2,S1,S0} = 5'b01110
			   : A15))))   // {S4,S3,S2,S1,S0} = 5'b01111
	       : (S[3] == 0
		  ? (S[2] == 0
		     ? (S[1] == 0
			? (S[0] == 0
			   ? A16       // {S4,S3,S2,S1,S0} = 5'b10000
			   : A17)      // {S4,S3,S2,S1,S0} = 5'b10001
			: (S[0] == 0
			   ? A18       // {S4,S3,S2,S1,S0} = 5'b10010
			   : A19))     // {S4,S3,S2,S1,S0} = 5'b10011
		     : (S[1] == 0
			? (S[0] == 0
			   ? A20       // {S4,S3,S2,S1,S0} = 5'b10100
			   : A21)      // {S4,S3,S2,S1,S0} = 5'b10101
			: (S[0] == 0
			   ? A22       // {S4,S3,S2,S1,S0} = 5'b10110
			   : A23)))    // {S4,S3,S2,S1,S0} = 5'b10111
		  : (S[2] == 0
		     ? (S[1] == 0
			? (S[0] == 0
			   ? A24       // {S4,S3,S2,S1,S0} = 5'b11000
			   : A25)      // {S4,S3,S2,S1,S0} = 5'b11001
			: (S[0] == 0
			   ? A26       // {S4,S3,S2,S1,S0} = 5'b11010
			   : A27))     // {S4,S3,S2,S1,S0} = 5'b11011
		     : (S[1] == 0
			? (S[0] == 0
			   ? A28       // {S4,S3,S2,S1,S0} = 5'b11100
			   : A29)      // {S4,S3,S2,S1,S0} = 5'b11101
			: (S[0] == 0
			   ? A30       // {S4,S3,S2,S1,S0} = 5'b11110
			   : A31))))); // {S4,S3,S2,S1,S0} = 5'b11111
endmodule // multiplexer_32_1
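
The nested conditional expressions above follow a tree structure, and the same tree can be expressed by composing smaller multiplexers. The following is a minimal sketch of a 4-to-1 multiplexer built from three 2-to-1 multiplexers; it is an alternative formulation and not the one used in the design. The module names mux2_sketch, mux4_tree_sketch and mux4_tree_tb are hypothetical.

```verilog
// 2-to-1 multiplexer (hypothetical name), same behaviour as multiplexer_2_1.
module mux2_sketch(X, A0, A1, S);
   parameter WIDTH=16;     // How many bits wide are the lines

   output [WIDTH-1:0] X;   // The output line
   input [WIDTH-1:0]  A0;  // Input line with id 1'b0
   input [WIDTH-1:0]  A1;  // Input line with id 1'b1
   input              S;   // Selection bit

   assign X = (S == 1'b0) ? A0 : A1;
endmodule // mux2_sketch

// 4-to-1 multiplexer (hypothetical name) as a tree of 2-to-1 multiplexers.
module mux4_tree_sketch(X, A0, A1, A2, A3, S1, S0);
   parameter WIDTH=16;     // How many bits wide are the lines

   output [WIDTH-1:0] X;   // The output line
   input [WIDTH-1:0]  A0;  // Input line with id 2'b00
   input [WIDTH-1:0]  A1;  // Input line with id 2'b01
   input [WIDTH-1:0]  A2;  // Input line with id 2'b10
   input [WIDTH-1:0]  A3;  // Input line with id 2'b11
   input              S1;  // Most significant selection bit
   input              S0;  // Least significant selection bit

   wire [WIDTH-1:0]   LO;  // A0 or A1, chosen by S0
   wire [WIDTH-1:0]   HI;  // A2 or A3, chosen by S0

   mux2_sketch #(WIDTH) m_lo(LO, A0, A1, S0);
   mux2_sketch #(WIDTH) m_hi(HI, A2, A3, S0);
   mux2_sketch #(WIDTH) m_out(X, LO, HI, S1);
endmodule // mux4_tree_sketch

// Testbench (hypothetical): overrides WIDTH to 8 and steps through
// all four selections.
module mux4_tree_tb;
   reg [7:0]  A0, A1, A2, A3;
   reg [1:0]  S;
   wire [7:0] X;
   integer    i;

   mux4_tree_sketch #(8) dut(X, A0, A1, A2, A3, S[1], S[0]);

   initial begin
      A0 = 8'h10; A1 = 8'h21; A2 = 8'h32; A3 = 8'h43;
      for (i = 0; i < 4; i = i + 1) begin
         S = i[1:0];
         #1;
         $display("S=%b X=%h", S, X);
      end
      $finish;
   end
endmodule // mux4_tree_tb
```

The #(8) in the testbench overrides the default WIDTH of 16 in the same way that the design overrides it with #(1) and #(32) elsewhere.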

References

Mano, M. Morris, and Kime, Charles R. Logic and Computer Design Fundamentals. 2nd Edition. Prentice Hall, 2000.