Breadboard CPU

◷ 20 min read

✏ published 2020-07-19

Jack Leightcap


- Build Process
- Architecture
- Instruction Set
- Memory
- Example Program
- Emulator
- Microcoding
- Emulate Example Program
- Synthesizing Emulator, Running on FPGA

Build Process

During the current situation (see date), I built Ben Eater's 8-bit breadboard CPU. The simple architecture makes for a modular and testable build, yet the final result is deeply complex.

I found the wiring way too relaxing,

close up of the wiring of Ben Eater's breadboard CPU


Architecture

The CPU consists of a few discrete modules:

In total, the final result looked like

wide shot of all breadboards that make up the CPU

Instruction Set

The CPU uses 8-bit instruction words,

Instruction | Machine Code | Meaning
NOP         | 0000_xxxx    | Do nothing
LDA imm8    | 0001_imm8    | A = ram[imm8]
ADD imm8    | 0010_imm8    | A += ram[imm8]
SUB imm8    | 0011_imm8    | A -= ram[imm8]
STA imm8    | 0100_imm8    | ram[imm8] = A
LDI imm8    | 0101_imm8    | A = imm8
JMP imm8    | 0110_imm8    | PC = imm8
JC imm8     | 0111_imm8    | if carry flag PC = imm8, else PC++
JZ imm8     | 1000_imm8    | if zero flag PC = imm8, else PC++
OUT         | 1110_xxxx    | OUT = A
HLT         | 1111_xxxx    | Halt CPU


Memory

Memory is addressed by the lower nibble of each instruction, which also carries immediates. The imm8 operand in the table is therefore really 4 bits wide, leading to a maximum of 16 bytes of addressable RAM. Try not to get too excited!
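The nibble split can be sketched in a few lines of Python (used here for brevity; it is not the Verilog discussed later in this post). The opcode table is transcribed from the instruction set above, while the names `MNEMONICS` and `decode` are made up for this illustration.

```python
# Opcode table transcribed from the instruction set above
# (upper nibble of the instruction word -> mnemonic).
MNEMONICS = {
    0b0000: "NOP", 0b0001: "LDA", 0b0010: "ADD", 0b0011: "SUB",
    0b0100: "STA", 0b0101: "LDI", 0b0110: "JMP", 0b0111: "JC",
    0b1000: "JZ", 0b1110: "OUT", 0b1111: "HLT",
}


def decode(word):
    """Split an 8-bit instruction word into (mnemonic, 4-bit operand)."""
    opcode = (word >> 4) & 0xF  # upper nibble selects the instruction
    operand = word & 0xF        # lower nibble is the address or immediate
    return MNEMONICS.get(opcode, "???"), operand


print(decode(0b0101_1111))  # ('LDI', 15)
print(decode(0b0110_1110))  # ('JMP', 14)
```

Because the operand is only 4 bits, no instruction can name an address above 15, which is where the 16-byte limit comes from.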

Example Program

Here's an example program that uses almost every feature of the CPU,

0x0 | 0b0101_1111 // LDI 15
0x1 | 0b0100_1111 // STA 15
0x2 | 0b0010_1111 // ADD 15
0x3 | 0b0100_0100 // STA 4
0x4 | 0b0000_0000
0x5 | 0b1110_0000 // OUT
0x6 | 0b0110_1110 // JMP 14
0x7 | 0b0000_0000
0x8 | 0b0000_0000
0x9 | 0b0000_0000
0xa | 0b0000_0000
0xb | 0b0000_0000
0xc | 0b0000_0000
0xd | 0b0000_0000
0xe | 0b1111_1111 // HLT
0xf | 0b0000_0000

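This program:

- loads 15 into A (LDI 15) and stores it at address 15 (STA 15)
- adds ram[15] to A, giving 30 = 0b0001_1110 (ADD 15)
- stores that value at address 4, the very next address to be executed (STA 4)
- executes the freshly written byte, which decodes as LDA 14 and loads the HLT instruction 0b1111_1111 into A
- outputs A, displaying 255 (OUT)
- jumps to address 14 (JMP 14) and halts (HLT)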

This example leans heavily on the idea of code as data: a value is calculated and then executed as an instruction, and an instruction is loaded as data and then output.

This example uses every available feature besides subtraction and conditional jumps; the latter are hard to shoehorn into programs this small (but are there to use regardless).
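To check the code-as-data trick without any hardware, here is a minimal instruction-level sketch in Python. It follows the instruction table above and ignores microcode and timing entirely. The names `run` and `PROGRAM` are hypothetical, and the flag behaviour (SUB adding the two's complement, carry meaning "result overflowed 8 bits") is an assumption about the adder-based ALU rather than something stated in this post.

```python
def run(program, max_steps=100):
    """Execute a 16-byte memory image; return (outputs, final_ram)."""
    ram = list(program)
    a = pc = 0
    carry = zero = False
    outputs = []
    for _ in range(max_steps):
        word = ram[pc]
        opcode, operand = word >> 4, word & 0xF
        pc = (pc + 1) & 0xF                 # PC increments during fetch
        if opcode == 0x1:                   # LDA
            a = ram[operand]
        elif opcode in (0x2, 0x3):          # ADD / SUB
            b = ram[operand]
            # Assumption: SUB adds the two's complement of b.
            total = a + (b if opcode == 0x2 else (~b & 0xFF) + 1)
            carry, a = total > 0xFF, total & 0xFF
            zero = a == 0
        elif opcode == 0x4:                 # STA
            ram[operand] = a
        elif opcode == 0x5:                 # LDI
            a = operand
        elif opcode == 0x6:                 # JMP
            pc = operand
        elif opcode == 0x7 and carry:       # JC
            pc = operand
        elif opcode == 0x8 and zero:        # JZ
            pc = operand
        elif opcode == 0xE:                 # OUT
            outputs.append(a)
        elif opcode == 0xF:                 # HLT
            break
    return outputs, ram


# The example program above, one byte per address.
PROGRAM = [0x5F, 0x4F, 0x2F, 0x44, 0x00, 0xE0, 0x6E] + [0x00] * 7 + [0xFF, 0x00]

outputs, ram = run(PROGRAM)
print(outputs, hex(ram[4]))  # [255] 0x1e
```

Address 4 starts out as a NOP and ends up holding 0x1e, which is the byte the CPU then executes as LDA 14.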


Emulator

Immediately following the build, I wanted to implement the CPU in an HDL. Not that this provides any deeper insight: the CPU can already be single-stepped, and you can just plonk an LED onto a pin to see what it's doing.

Instead, an emulator provides a much simpler interface to mess with the architecture and test new ideas for programs. I used Verilog, after 'learning' it in a Digital Logic class; the result is eateremu.


Microcoding

This emulator is great for seeing the microcode of the CPU in effect as instructions are executed: the even more atomic micro-instructions that build up to assembly. Just looking at how the SUB instruction is microcoded,

    parameter HLT = 16'b1000000000000000; // Halt clock
    parameter MI  = 16'b0100000000000000; // Memory address register in
    parameter RI  = 16'b0010000000000000; // RAM data in
    parameter RO  = 16'b0001000000000000; // RAM data out
    parameter IO  = 16'b0000100000000000; // Instruction register out
    parameter II  = 16'b0000010000000000; // Instruction register in
    parameter AI  = 16'b0000001000000000; // A register in
    parameter AO  = 16'b0000000100000000; // A register out
    parameter EO  = 16'b0000000010000000; // ALU out
    parameter SU  = 16'b0000000001000000; // ALU subtract
    parameter BI  = 16'b0000000000100000; // B register in
    parameter OI  = 16'b0000000000010000; // Output register in
    parameter CE  = 16'b0000000000001000; // Program counter enable
    parameter CO  = 16'b0000000000000100; // Program counter out
    parameter J   = 16'b0000000000000010; // Jump (program counter in)
    parameter FI  = 16'b0000000000000001; // Flags in

    // ...

    always @(posedge clk) begin
        case (count)
            3'b000: begin
                // fetch: program counter -> memory address register
                ctrl_data <= MI | CO;
            end
            3'b001: begin
                // fetch: RAM -> instruction register, increment PC
                ctrl_data <= RO | II | CE;
            end
            3'b010: begin
                // ...
                ctrl_data <= IO | MI;
                // ...
            end
            3'b011: begin case (instruction)
                // ...
                ctrl_data <= RO | BI;
                // ...
            endcase end
            3'b100: begin case (instruction)
                // ...
                ctrl_data <= EO | AI | SU | FI;
                // ...
            endcase end
            // ...
                ctrl_data <= 0;
        endcase
        count <= count + 1;
    end
    // ...

Five microinstructions make up the subtract instruction. As a reminder, the effect of the subtract instruction is A -= ram[imm8]: subtract from the A register the value at the memory location given by an immediate value. The first two microinstructions (which I'll call uops from here on) are common to all instructions; the last three are what implement the subtraction:

- MI | CO
- RO | II | CE
- IO | MI
- RO | BI
- EO | AI | SU | FI

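Note that each step asserts more than one control signal, and these act entirely in parallel! There's lots of room for hardware-based optimizations here. These steps, in effect:

- put the program counter on the bus and latch it into the memory address register
- put the addressed RAM byte on the bus, latch it into the instruction register, and increment the program counter
- put the instruction's lower nibble on the bus and latch it into the memory address register
- put the addressed RAM byte on the bus and latch it into the B register
- with the ALU set to subtract, put its result on the bus, latch it into A, and update the flags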
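The control words in the Verilog excerpt can be exercised with a small bus model, sketched here in Python. The bit positions match the parameters above; everything else is an assumption made for illustration: the state dictionary `s`, the `step` function, the rule that a single "out" signal drives the bus per step, and the ALU subtracting by adding the two's complement of B.

```python
# Control signals, with the same bit positions as the Verilog parameters.
HLT, MI, RI, RO = 1 << 15, 1 << 14, 1 << 13, 1 << 12
IO, II, AI, AO = 1 << 11, 1 << 10, 1 << 9, 1 << 8
EO, SU, BI, OI = 1 << 7, 1 << 6, 1 << 5, 1 << 4
CE, CO, J, FI = 1 << 3, 1 << 2, 1 << 1, 1 << 0

# The five steps of SUB, taken from the excerpt above.
SUB_STEPS = [MI | CO, RO | II | CE, IO | MI, RO | BI, EO | AI | SU | FI]


def step(s, ctrl):
    """Apply one control word to the state dict `s` (hypothetical layout)."""
    # Assumption: with SU set, the ALU adds the two's complement of B.
    operand = ((~s["b"] & 0xFF) + 1) if ctrl & SU else s["b"]
    alu = s["a"] + operand
    # One "out" signal drives the bus from the state before the clock edge.
    bus = None
    if ctrl & CO:
        bus = s["pc"]
    elif ctrl & RO:
        bus = s["ram"][s["mar"]]
    elif ctrl & IO:
        bus = s["ir"] & 0xF
    elif ctrl & AO:
        bus = s["a"]
    elif ctrl & EO:
        bus = alu & 0xFF
    # Every "in" signal latches in the same step.
    if ctrl & RI:
        s["ram"][s["mar"]] = bus
    if ctrl & MI:
        s["mar"] = bus & 0xF
    if ctrl & II:
        s["ir"] = bus
    if ctrl & AI:
        s["a"] = bus
    if ctrl & BI:
        s["b"] = bus
    if ctrl & OI:
        s["out"] = bus
    if ctrl & FI:
        s["carry"], s["zero"] = alu > 0xFF, (alu & 0xFF) == 0
    if ctrl & J:
        s["pc"] = bus & 0xF
    if ctrl & CE:
        s["pc"] = (s["pc"] + 1) & 0xF


# Made-up test case: A = 12, ram[15] = 5, and SUB 15 stored at address 0.
s = {"pc": 0, "mar": 0, "ir": 0, "a": 12, "b": 0, "out": 0,
     "carry": False, "zero": False, "ram": [0] * 16}
s["ram"][0], s["ram"][15] = 0x3F, 5

for ctrl in SUB_STEPS:
    step(s, ctrl)

print(s["a"], s["b"], s["pc"])  # 7 5 1
```

Each call to `step` handles several signals at once, which mirrors the point above: one control word does multiple things in the same clock cycle.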

Emulate Example Program

With that intro to microcode out of the way, let's get a deeper appreciation for the complexity of that example program. Writing the example program in ram.v,

// ...
initial begin
    memory[0]  <= 8'b0101_1111; // LDI 15
    memory[1]  <= 8'b0100_1111; // STA 15
    memory[2]  <= 8'b0010_1111; // ADD 15
    memory[3]  <= 8'b0100_0100; // STA 4
    memory[4]  <= 8'b0000_0000;
    memory[5]  <= 8'b1110_0000; // OUT
    memory[6]  <= 8'b0110_1110; // JMP 14
    memory[7]  <= 8'b0000_0000;
    memory[8]  <= 8'b0000_0000;
    memory[9]  <= 8'b0000_0000;
    memory[10] <= 8'b0000_0000;
    memory[11] <= 8'b0000_0000;
    memory[12] <= 8'b0000_0000;
    memory[13] <= 8'b0000_0000;
    memory[14] <= 8'b1111_1111; // HLT
    memory[15] <= 8'b0000_0000;
end
// ...

Just running the emulator with this program straight out,

; make eateremu_tb
iverilog -o eateremu_tb eateremu_tb.v
; ./eateremu_tb

prints the expected states of the output register: good as a sanity check, but not that exciting.

Instead, compiling eateremu.v with -DVERBOSE shows all the state. The order of bits in the control word here is

  ctrl = {hlt, mi, ri, ro, io, ii, ai, ao, eo, su, bi, oi, ce, co, j, fi}
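For reading ctrl strings in this format, a tiny helper that names the active signals saves a lot of squinting. This is a sketch in Python; the names `SIGNALS` and `active_signals` are hypothetical, and only the signal order comes from the line above.

```python
# Signal names in the bit order given above, most significant bit first.
SIGNALS = ["hlt", "mi", "ri", "ro", "io", "ii", "ai", "ao",
           "eo", "su", "bi", "oi", "ce", "co", "j", "fi"]


def active_signals(ctrl_bits):
    """Return the names of the signals set in a 16-character ctrl string."""
    return [name for name, bit in zip(SIGNALS, ctrl_bits) if bit == "1"]


print(active_signals("0100000000000100"))  # ['mi', 'co']
print(active_signals("0001010000001000"))  # ['ro', 'ii', 'ce']
print(active_signals("0000001010000001"))  # ['ai', 'eo', 'fi']
```

The first two strings are the fetch steps shared by every instruction; the third is the final step of an addition (the same as SUB's last step, minus `su`).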

Executing the above program,

       0: bus=zzzzzzzz ctrl=0000000000000000 mem_addr=0 a=00 b=00 o=00
       1: bus=zzzz0000 ctrl=0100000000000100 mem_addr=0 a=00 b=00 o=00
       3: bus=01011111 ctrl=0001010000001000 mem_addr=0 a=00 b=00 o=00
       5: bus=zzzz1111 ctrl=0000101000000000 mem_addr=0 a=00 b=00 o=00
       6: bus=zzzz1111 ctrl=0000101000000000 mem_addr=0 a=0f b=00 o=00
       7: bus=zzzzzzzz ctrl=0000000000000000 mem_addr=0 a=0f b=00 o=00
      17: bus=zzzz0001 ctrl=0100000000000100 mem_addr=0 a=0f b=00 o=00
      18: bus=zzzz0001 ctrl=0100000000000100 mem_addr=1 a=0f b=00 o=00
      19: bus=01001111 ctrl=0001010000001000 mem_addr=1 a=0f b=00 o=00
      21: bus=zzzz1111 ctrl=0100100000000000 mem_addr=1 a=0f b=00 o=00
      22: bus=zzzz1111 ctrl=0100100000000000 mem_addr=f a=0f b=00 o=00
      23: bus=00001111 ctrl=0010000100000000 mem_addr=f a=0f b=00 o=00
      25: bus=zzzzzzzz ctrl=0000000000000000 mem_addr=f a=0f b=00 o=00
      33: bus=zzzz0010 ctrl=0100000000000100 mem_addr=f a=0f b=00 o=00
      34: bus=zzzz0010 ctrl=0100000000000100 mem_addr=2 a=0f b=00 o=00
      35: bus=00101111 ctrl=0001010000001000 mem_addr=2 a=0f b=00 o=00
      37: bus=zzzz1111 ctrl=0100100000000000 mem_addr=2 a=0f b=00 o=00
      38: bus=zzzz1111 ctrl=0100100000000000 mem_addr=f a=0f b=00 o=00
      39: bus=00001111 ctrl=0001000000100000 mem_addr=f a=0f b=00 o=00
      40: bus=00001111 ctrl=0001000000100000 mem_addr=f a=0f b=0f o=00
      41: bus=00011110 ctrl=0000001010000001 mem_addr=f a=0f b=0f o=00
      42: bus=00101101 ctrl=0000001010000001 mem_addr=f a=1e b=0f o=00
      43: bus=zzzzzzzz ctrl=0000000000000000 mem_addr=f a=1e b=0f o=00
      49: bus=zzzz0011 ctrl=0100000000000100 mem_addr=f a=1e b=0f o=00
      50: bus=zzzz0011 ctrl=0100000000000100 mem_addr=3 a=1e b=0f o=00
      51: bus=01000100 ctrl=0001010000001000 mem_addr=3 a=1e b=0f o=00
      53: bus=zzzz0100 ctrl=0100100000000000 mem_addr=3 a=1e b=0f o=00
      54: bus=zzzz0100 ctrl=0100100000000000 mem_addr=4 a=1e b=0f o=00
      55: bus=00011110 ctrl=0010000100000000 mem_addr=4 a=1e b=0f o=00
      57: bus=zzzzzzzz ctrl=0000000000000000 mem_addr=4 a=1e b=0f o=00
      65: bus=zzzz0100 ctrl=0100000000000100 mem_addr=4 a=1e b=0f o=00
      67: bus=00011110 ctrl=0001010000001000 mem_addr=4 a=1e b=0f o=00
      69: bus=zzzz1110 ctrl=0100100000000000 mem_addr=4 a=1e b=0f o=00
      70: bus=zzzz1110 ctrl=0100100000000000 mem_addr=e a=1e b=0f o=00
      71: bus=11111111 ctrl=0001001000000000 mem_addr=e a=1e b=0f o=00
      72: bus=11111111 ctrl=0001001000000000 mem_addr=e a=ff b=0f o=00
      73: bus=zzzzzzzz ctrl=0000000000000000 mem_addr=e a=ff b=0f o=00
      81: bus=zzzz0101 ctrl=0100000000000100 mem_addr=e a=ff b=0f o=00
      82: bus=zzzz0101 ctrl=0100000000000100 mem_addr=5 a=ff b=0f o=00
      83: bus=11100000 ctrl=0001010000001000 mem_addr=5 a=ff b=0f o=00
      85: bus=11111111 ctrl=0000000100010000 mem_addr=5 a=ff b=0f o=00
      86: bus=11111111 ctrl=0000000100010000 mem_addr=5 a=ff b=0f o=ff
      87: bus=zzzzzzzz ctrl=0000000000000000 mem_addr=5 a=ff b=0f o=ff
      97: bus=zzzz0110 ctrl=0100000000000100 mem_addr=5 a=ff b=0f o=ff
      98: bus=zzzz0110 ctrl=0100000000000100 mem_addr=6 a=ff b=0f o=ff
      99: bus=01101110 ctrl=0001010000001000 mem_addr=6 a=ff b=0f o=ff
     101: bus=zzzz1110 ctrl=0000100000000010 mem_addr=6 a=ff b=0f o=ff
     103: bus=zzzzzzzz ctrl=0000000000000000 mem_addr=6 a=ff b=0f o=ff
     113: bus=zzzz1110 ctrl=0100000000000100 mem_addr=6 a=ff b=0f o=ff
     114: bus=zzzz1110 ctrl=0100000000000100 mem_addr=e a=ff b=0f o=ff
     115: bus=11111111 ctrl=0001010000001000 mem_addr=e a=ff b=0f o=ff
     117: bus=zzzzzzzz ctrl=1000000000000000 mem_addr=e a=ff b=0f o=ff

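Lots going on here! If you're interested in parsing through that giant wall of 1s and 0s, more power to you! The key moments: at time 55, STA 4 writes 0x1e into address 4; at time 67, that same byte is fetched as an instruction (LDA 14); by time 72, A holds 0xff, loaded from the HLT instruction at address 14; and at time 86 that value reaches the output register.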

This is a great example of the benefits that can come from writing 'closer to the metal': it's a really mind-melting experience to execute data as instructions, and store instructions as data. That level of control is hidden behind assemblers and compilers, in exchange for a great deal of convenience.

Synthesizing Emulator, Running on FPGA

With some rewriting to use only synthesizable Verilog, the emulator can run a program on an iCEBreaker FPGA,

; make
# ...lots of output
; make prog

FPGA displaying the number '42' as output of program

To go full circle, looking at the generated logic:

block diagram of yosys output from CPU component

Back to a mess of wires and logic gates.