Z80 Homebrew Computer Part 3 -- Emulator, Instruction Set Simulator

◷ 30 min read

✏ published 2022-03-02

Jack Leightcap

Contents

- background

- but what really is an emulator?

- design

- instruction decoding

- instruction metadata and execution

- registers, memory

- hardware considerations

- conclusion, limitations

background

so, where is this project at? from previous posts, i can:

there's a lot to play with here! in the meantime, i've been working on a UART interface, parallel ATA driver, and a couple programming languages (maybe some posts to come).

there's plenty of limitations to this. i inserted and removed an EEPROM enough times that i broke off the legs of the IC. after a few apartment shuffles, dis- and re-assembling the rat's nest also got very annoying. so, time to design a PCB, er, write an emulator!

but what really is an emulator?

i have a pile of Z80 machine code, and i want to run it on my laptop. how could i go about this?

each of these have tons of complexity associated. focusing on the goal of understanding the Z80, this post will cover a Z80 instruction set simulator; as in, write a software implementation of a Z80. the major question that comes with this approach is

when do you stop emulating?

of course this could go on forever until simulating the universe. the emulator just needs to pick a level of abstraction, and be clear about its limitations.

design

i'm going to be designing a Z80 instruction set simulator in Rust, with the goal of instruction-cycle accuracy.

instruction decoding

so, there's a file called add.bin or whatever on my laptop. it contains some bytes that describe some Z80 program,

; hexdump -C output/do_pass/add.bin
00000000  3e 08 06 04 80 d3 00 76                           |>......v|
00000008

so a good first step is to get those bytes in an accessible form,

let instructions: Vec<u8> = fs::read(&path)?;

looking these bytes up in the Z80 instruction set, disassembling these bytes by hand:

3E 08       → LD A, 0x08
06 04       → LD B, 0x04
80          → ADD A, B
D3 00       → OUT (0), A
76          → HALT

because instructions are encoded with a variable number of bytes, unfortunately can't just map a function u8 → Instruction over the vector of instructions. further, note that sometimes associated data (u8 or u16) is encoded alongside the instruction,

3E XX       → LD A, XX
C3 XX XX    → JP XXXX
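to make the variable-length problem concrete, here's a minimal sketch (not the emulator's actual code) that walks the bytes with a cursor instead of mapping over them. `instr_len` and `split_instructions` are hypothetical helpers, and the length table only knows the handful of opcodes shown in this post.

```rust
/// Length in bytes (opcode + operands) for the few opcodes used in this post.
/// Hypothetical helper for illustration; a real table covers every opcode.
fn instr_len(opcode: u8) -> Option<usize> {
    match opcode {
        0x3e | 0x06 | 0xd3 => Some(2), // LD A,n / LD B,n / OUT (n),A
        0x80 | 0x76 => Some(1),        // ADD A,B / HALT
        0xc3 => Some(3),               // JP nn
        _ => None,                     // not covered by this sketch
    }
}

/// Split a program into per-instruction byte slices by advancing a cursor.
/// Stops early (keeping what it has so far) at an unknown opcode or a
/// truncated operand.
fn split_instructions(bytes: &[u8]) -> Vec<&[u8]> {
    let mut out = Vec::new();
    let mut pc = 0;
    while pc < bytes.len() {
        let len = match instr_len(bytes[pc]) {
            Some(len) => len,
            None => break,
        };
        match bytes.get(pc..pc + len) {
            Some(chunk) => out.push(chunk),
            None => break,
        }
        pc += len;
    }
    out
}
```

the cursor has to start at a real instruction boundary for any of this to mean something: start it one byte off and the same bytes split into a different (wrong) program.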

sometimes there's just... data!

06 04

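this isn't LD B, 0x04 like before, now it's... just the number 0x0604 = 1540 (or 0x0406 = 1030 if read as a little-endian word, which is how the Z80 stores 16-bit values). von Neumann architectures have ruined the day yet again.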

so, how best to deal with this? clearly there's some structure to parse these instructions. encoding all possibilities,

// opcode argument with 8-bit length;
//
// AND R
//     |
//     +------ 8-bit register argument
//
// LD R, *
//    |  |
//    |  +---- u8 immediate argument
//    +------- 8-bit register argument
//
// LD (**), A
//    |     |
//    |     +- 8-bit register argument
//    +------- u16 memory index
//
// LD (SR+*), A
//        |
//        +--- u8 memory offset from register
//
enum Arg8 {
    U8, /* 8 bit immediate argument */
    Reg(R),
    Mem(MemAddr),
    MemOffset(Sr),
}

// opcode argument with 16-bit length;
//
// DEC HL
//     |
//     +------ 16-bit register argument
//
// LD HL, **
//    |   |
//    |   +--- u16 immediate argument
//    +------- 16-bit register argument
pub enum Arg16 {
    U16, /* 16 bit immediate argument */
    Reg(Sr),
    Mem(MemAddr),
}

cool, now there's some algebraic datatypes that reflect the intrinsic structure of instructions.

now, what instructions are there, and how do they fit into this shape?

enum Instr {
    Adc8(Arg8, Arg8),
    Adc16(Arg16, Arg16),
    Add8(Arg8, Arg8),
    Add16(Arg16, Arg16),
    And8(Arg8),
    Bit(u8, Arg8),
    Call(ArgF, Arg16),
    Ccf,
    Cp8(Arg8),
    Cpd,
    Cpdr,
    ...

now, there's a type-level representation for the structure of those earlier instructions:

LD A, 0x08   :: Instr::Ld8(Arg8::Reg(R::A), Arg8::U8)
LD B, 0x04   :: Instr::Ld8(Arg8::Reg(R::B), Arg8::U8)
ADD A, B     :: Instr::Add8(Arg8::Reg(R::A), Arg8::Reg(R::B))
OUT (0), A   :: Instr::Out(Arg8::Mem(MemAddr::Imm), Arg8::Reg(R::A))
HALT         :: Instr::Halt

this allows for very helpful totality checking in the later execution steps. so the general decoding method effectively acts as a look-up table,

fn decode(&mut self, opcode: u8) -> Result<Instr, Error> {
    match opcode {
        ...
        0x80 => Ok(Instr::Add8(Arg8::Reg(R::A), Arg8::Reg(R::B))),
        ...
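a cut-down, compilable version of this look-up idea, as a sketch only: the types are trimmed to what four opcodes need, `decode` returns an Option instead of the real Result<Instr, Error>, and `operand_bytes` is a hypothetical helper showing how the number of trailing bytes can be read off the argument types.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum R { A, B }

#[derive(Debug, PartialEq, Clone, Copy)]
enum Arg8 {
    U8,     // an immediate byte follows the opcode
    Reg(R), // a register operand, nothing follows
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum Instr {
    Ld8(Arg8, Arg8),
    Add8(Arg8, Arg8),
    Halt,
}

/// Opcode look-up table, trimmed to four entries for illustration.
fn decode(opcode: u8) -> Option<Instr> {
    match opcode {
        0x3e => Some(Instr::Ld8(Arg8::Reg(R::A), Arg8::U8)),
        0x06 => Some(Instr::Ld8(Arg8::Reg(R::B), Arg8::U8)),
        0x80 => Some(Instr::Add8(Arg8::Reg(R::A), Arg8::Reg(R::B))),
        0x76 => Some(Instr::Halt),
        _ => None, // everything else is outside this sketch
    }
}

/// 1 if the argument is an immediate byte stored after the opcode, else 0.
fn imm_bytes(arg: Arg8) -> usize {
    match arg {
        Arg8::U8 => 1,
        Arg8::Reg(_) => 0,
    }
}

/// Number of operand bytes following the opcode, derived from the types alone.
fn operand_bytes(instr: Instr) -> usize {
    match instr {
        Instr::Ld8(dst, src) | Instr::Add8(dst, src) => imm_bytes(dst) + imm_bytes(src),
        Instr::Halt => 0,
    }
}
```

since both matches are exhaustive, adding a new Arg8 or Instr variant makes the compiler point at every place that needs updating.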

because there's information about coupled data encoded in the type (Arg8::U8, MemAddr::Imm, etc), the general procedure for parsing:

so now can inch closer to actual execution.

instruction metadata and execution

there's a few considerations for each instruction (some metadata) in order for the execution to accurately update the state of a Z80. the obvious is the semantic meaning of the instruction; the effect of Instr::Add8(Arg8::Reg(R::A), Arg8::Reg(R::B)) should be the expected A += B. but, the execution state also includes the internal state of the Z80.

for something like these 8-bit ADD instructions,

fn add8(&mut self, dst: &Arg8, src: &Arg8) -> Result<InstrMetadata, Error> {
    let srcval = self.arg8_read(src)?;
    let dstval = self.arg8_read(dst)?;
    let ans = srcval.wrapping_add(dstval); // overflows are very much the correct behavior here!
    self.arg8_write(dst, ans)?;

    let (pc, cycles) = match src {
        Arg8::Reg(_) => (Pc::I, 4),
        Arg8::Mem(_) => (Pc::I, 7),
        Arg8::U8 => (Pc::Im8, 7),
        // no instruction encodes this situation. the types could encode this, but
        // the special cases are better described as a per-instruction implementation detail.
        Arg8::MemOffset(_) => unreachable!(),
    };

    Ok(InstrMetadata {
        pc,
        cycles,
        flags: self.add_flag::<u8>(srcval, dstval, ans),
    })
}

note that an important design decision was made here, although kind of ad-hoc. the emulator won't care what the Z80 actually does when it adds; in reality the Z80 ALU is only 4 bits and it performs two sub-adds with a middle carry. this is reflected in the pretty high cycle count for such a simple operation.
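for the curious, the two-step add can be modelled in a few lines. this is only a sketch of the arithmetic (the function name is made up, and it says nothing about real silicon timing), but it shows that two 4-bit adds with a middle carry give the same answer as wrapping_add, which is why the emulator can skip the detail.

```rust
/// Add two bytes the way a 4-bit ALU would: low nibbles first, then high
/// nibbles plus the carry out of the low add.
/// Returns (result, half_carry, carry).
fn add8_by_nibbles(a: u8, b: u8) -> (u8, bool, bool) {
    let lo = (a & 0x0f) + (b & 0x0f); // at most 0x1e, cannot overflow a u8
    let half_carry = lo > 0x0f;       // carry out of bit 3
    let hi = (a >> 4) + (b >> 4) + half_carry as u8; // at most 0x1f
    let carry = hi > 0x0f;            // carry out of bit 7
    (((hi & 0x0f) << 4) | (lo & 0x0f), half_carry, carry)
}
```

the middle carry falls out for free here, and it's the same value the H flag reports after an 8-bit add.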

the flags implementation looks something like

fn add_flag<T>(&mut self, a1: T, a2: T, ans: T) -> FlagState
where
    T: FlagFunc<T> + Copy,
{
    // each flag can be set, reset, or untouched.
    FlagState {
        c: T::c(a1, a2),
        z: T::z(ans),
        pv: T::v(a1, a2, true),
        s: T::s(ans),
        n: Fc::R,
        h: T::h(),
    }
}

the generics and traits might look scary here, but this is to make the flag handling generic across types like u8 or u16 where "ADD" means the same thing. there's some more to explain about flags, and more generally registers, before this makes sense.
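to take some of the scare out of it, here's a stripped-down sketch of the same idea. `AddFlags` is a hypothetical stand-in for FlagFunc, it only covers carry, zero and sign, and it returns plain bools rather than the set/reset/untouched states the emulator uses.

```rust
/// Hypothetical stand-in for the post's FlagFunc trait: just enough to show
/// one flag routine serving both u8 and u16.
trait AddFlags: Copy {
    fn carry(a: Self, b: Self) -> bool; // unsigned overflow out of the top bit
    fn zero(ans: Self) -> bool;         // result is zero
    fn sign(ans: Self) -> bool;         // top bit of the result is set
}

// The bodies are identical for both widths, so a small macro stamps them out.
macro_rules! impl_add_flags {
    ($t:ty) => {
        impl AddFlags for $t {
            fn carry(a: Self, b: Self) -> bool { a.checked_add(b).is_none() }
            fn zero(ans: Self) -> bool { ans == 0 }
            fn sign(ans: Self) -> bool { ans.leading_zeros() == 0 }
        }
    };
}
impl_add_flags!(u8);
impl_add_flags!(u16);

/// Returns (carry, zero, sign) for an addition of either width.
fn add_flags<T: AddFlags>(a: T, b: T, ans: T) -> (bool, bool, bool) {
    (T::carry(a, b), T::zero(ans), T::sign(ans))
}
```

the caller never says which width it wants; the operand type picks the right implementation.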

registers, memory

the Z80 register scheme is... interesting. there's a ton of overlap of responsibilities.

the A and F registers are the special case. the A register acts as the kind of 'main' register: a lot of 8-bit operations default to, or are fastest when going through, this register. likewise HL for 16-bit operations. the F register is the flags register: each flag's state is held in one of its bits.

first, what's the best way to represent this? registers B, C, and BC being closely linked means that a layer of abstraction is needed on top of the bare memory for a uniform interface.

struct Cpu {
    /* registers */
    a: u8, // AF pair
    f: u8, // dual-purpose flags
    b: u8, // BC pair
    c: u8,
    d: u8, // DE pair
    e: u8,
    h: u8,
    l: u8,
    i: u8,
    r: u8,
    ixl: u8,
    ixh: u8,
    iyl: u8,
    iyh: u8,
    pc: u16, // program counter
    sp: u16, // stack pointer
    ...

the Cpu struct (used as the self for previous methods) is, rounding up, the emulator state. this is where the bare u8 are stored.

used earlier, a type of the available registers

// 8-bit registers
enum R {
    A,
    F,
    B,
    C,
    D,
    E,
    H,
    L,
    I,
    R,
    Ixl,
    Ixh,
    Iyl,
    Iyh,
}

// sandwich registers, read/write 2x8-bit registers as 1x16-bit register
enum Sr {
    Af,
    Bc,
    De,
    Hl,
    Sp,
    Pc,
    Ix,
    Iy,
}

these types can then be used to pattern match an interface over the Cpu struct for accessing register memory

// read from a register
fn r8_read(&self, r: &R) -> u8 {
    match r {
        R::A => self.a,
        ...
    }
}
// write to a register
fn r8_write(&mut self, r: &R, v: u8) {
    match r {
        R::A => self.a = v,
        ...
    }
}
// read from a sandwich register
fn r16_read(&self, r: &Sr) -> u16 {
    match r {
        Sr::Bc => ((self.b as u16) << 8) | self.c as u16,
        ...
    }
}
// write to a sandwich register
fn r16_write(&mut self, r: &Sr, x: u16) {
    match r {
        Sr::Bc => {
            self.b = ((x & 0xff00) >> 8) as u8;
            self.c = (x & 0x00ff) as u8;
        }
        ...
    }
}
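the same sandwiching can be written with the standard library's byte-order helpers instead of manual shifts and masks. a small self-contained sketch, assuming a cut-down `Regs` struct with only B and C (not the real Cpu):

```rust
/// Cut-down register file holding just the BC pair, for illustration.
#[derive(Default)]
struct Regs {
    b: u8, // high byte of BC
    c: u8, // low byte of BC
}

impl Regs {
    /// Read B and C as one 16-bit value, B in the high byte.
    fn bc(&self) -> u16 {
        u16::from_be_bytes([self.b, self.c])
    }

    /// Write a 16-bit value back into B (high byte) and C (low byte).
    fn set_bc(&mut self, x: u16) {
        let [hi, lo] = x.to_be_bytes();
        self.b = hi;
        self.c = lo;
    }
}
```

either spelling does the same thing; this one just makes the byte order explicit in the function names.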

so, when the flags are updated at the end of an instruction, the appropriate bits of the F register are written.
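a sketch of what that write-back could look like. the bit positions are the Z80's documented F layout (S=7, Z=6, H=4, P/V=2, N=1, C=0); the `Fc::S` variant name and both function names are my assumptions here, since only `Fc::R` and the printed `X` appear in this post.

```rust
/// Per-flag outcome of an instruction.
/// `R` and `X` match this post; `S` is an assumed name for "set".
#[derive(Clone, Copy)]
enum Fc { S, R, X } // set, reset, untouched

struct FlagState { c: Fc, z: Fc, pv: Fc, s: Fc, n: Fc, h: Fc }

// Z80 flag bit positions within the F register.
const C_BIT: u8 = 0;
const N_BIT: u8 = 1;
const PV_BIT: u8 = 2;
const H_BIT: u8 = 4;
const Z_BIT: u8 = 6;
const S_BIT: u8 = 7;

/// Apply one flag change to F and return the new value.
fn apply_flag(f: u8, bit: u8, change: Fc) -> u8 {
    match change {
        Fc::S => f | (1 << bit),
        Fc::R => f & !(1 << bit),
        Fc::X => f,
    }
}

/// Fold every flag change from an instruction into F.
fn apply_flags(f: u8, st: &FlagState) -> u8 {
    [
        (C_BIT, st.c), (N_BIT, st.n), (PV_BIT, st.pv),
        (H_BIT, st.h), (Z_BIT, st.z), (S_BIT, st.s),
    ]
    .iter()
    .fold(f, |acc, &(bit, change)| apply_flag(acc, bit, change))
}
```

bits 3 and 5 of F are never touched by this sketch.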

hardware considerations

there's now a uniform interface over the registers. what can this emulator do now?

what's this missing? well, a ton, actually. to at least run a program, the next important piece is some memory (RAM and ROM). like with the uniform interface of registers,

struct Cpu {
    ...
    /* memory */
    ram: [u8; RAM_SIZE],
    rom: [u8; ROM_SIZE],
    ...

there's some space in the Cpu struct for keeping around its memory.

now, the Cpu struct is a bit overloaded semantically. bad design choice on my part!

the interface over this memory, based on the memory map described in a previous post,

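fn write(&mut self, x: usize, v: u8) -> Result<(), Error> {
    match x {
        0x0000..=0x7fff => {
            println!("Ignoring write to ROM at {:#06x}", x);
            Ok(())
        } // write to ROM has no action
        0x8000..=0xffff => {
            self.ram[x - 0x8000] = v;
            Ok(())
        }
        // a usize can point past the 16-bit address space, so a default arm is needed.
        // `Error::BadAddress` is a placeholder name; the real variant isn't shown in this post.
        _ => Err(Error::BadAddress(x)),
    }
}

fn read(&self, x: u16) -> u8 {
    match x {
        0x0000..=0x7fff => self.rom[x as usize],
        0x8000..=0xffff => self.ram[(x - 0x8000) as usize],
        // no default arm: these two ranges already cover every u16
    }
}

note that using usize as the indexing type in write means there's an unsatisfying lack of totality over indexes, so it needs that default error case. read takes a u16, so its two ranges cover every possible address and the match is total.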
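as a sketch of one way out of both complaints (the overloaded Cpu struct and the non-total index type), the memory could live in its own type addressed by u16 on both sides. `Bus` is a hypothetical name, and the sizes are assumed from the address ranges above since the real ROM_SIZE and RAM_SIZE values aren't shown.

```rust
const ROM_SIZE: usize = 0x8000; // assumed from the range 0x0000..=0x7fff
const RAM_SIZE: usize = 0x8000; // assumed from the range 0x8000..=0xffff

/// Hypothetical memory bus, split out of the Cpu struct.
struct Bus {
    rom: Vec<u8>,
    ram: Vec<u8>,
}

impl Bus {
    /// Load a program at the start of ROM; anything past ROM_SIZE is dropped.
    fn new(program: &[u8]) -> Self {
        let mut rom = vec![0; ROM_SIZE];
        let n = program.len().min(ROM_SIZE);
        rom[..n].copy_from_slice(&program[..n]);
        Bus { rom, ram: vec![0; RAM_SIZE] }
    }

    /// Every u16 maps to exactly one arm, so no default case is needed.
    fn read(&self, addr: u16) -> u8 {
        match addr {
            0x0000..=0x7fff => self.rom[addr as usize],
            0x8000..=0xffff => self.ram[(addr - 0x8000) as usize],
        }
    }

    /// Returns false when the write targeted ROM and was ignored.
    fn write(&mut self, addr: u16, v: u8) -> bool {
        match addr {
            0x0000..=0x7fff => false,
            0x8000..=0xffff => {
                self.ram[(addr - 0x8000) as usize] = v;
                true
            }
        }
    }
}
```

with a u16 address the compiler checks the memory map for gaps and overlaps, and there's no error path left to write.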

conclusion, limitations

with all that, right now you can grab the source and test out that program from the start,

; cargo run output/do_pass/add.bin
    Compiling jemu v0.1.0 (/home/jleightcap/r/jemu)
    Finished dev [unoptimized + debuginfo] target(s) in 6.37s
    Running `target/debug/jemu output/do_pass/add.bin`
> 0xc

or even watch the state as each instruction executes,

; cargo run output/do_pass/add.bin -vvv
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
    Running `target/debug/jemu output/do_pass/add.bin -vvv`
rom: [
    0x3e,
    0x8,
    0x6,
    0x4,
    0x80,
    0xd3,
    0x0,
    0x76,
]
LD A *
    a: 0x8
    f: 0x0
    b: 0x0
    c: 0x0
    d: 0x0
    e: 0x0
    h: 0x0
    l: 0x0
    i: 0x0
    r: 0x0
    ixl: 0x0
    ixh: 0x0
    iyl: 0x0
    iyh: 0x0
    pc: 0x02
    sp: 0x00
    FlagState { c: X, z: X, pv: X, s: X, n: X, h: X }
LD B *
    a: 0x8
    f: 0x0
    b: 0x4
    c: 0x0
    d: 0x0
    e: 0x0
    h: 0x0
    l: 0x0
    i: 0x0
    r: 0x0
    ixl: 0x0
    ixh: 0x0
    iyl: 0x0
    iyh: 0x0
    pc: 0x04
    sp: 0x00
    FlagState { c: X, z: X, pv: X, s: X, n: X, h: X }
ADD A B
    a: 0xc
    f: 0x0
    b: 0x4
    c: 0x0
    d: 0x0
    e: 0x0
    h: 0x0
    l: 0x0
    i: 0x0
    r: 0x0
    ixl: 0x0
    ixh: 0x0
    iyl: 0x0
    iyh: 0x0
    pc: 0x05
    sp: 0x00
    FlagState { c: R, z: R, pv: R, s: R, n: R, h: X }
OUT [**] A
    a: 0xc
    f: 0x0
    b: 0x4
    c: 0x0
    d: 0x0
    e: 0x0
    h: 0x0
    l: 0x0
    i: 0x0
    r: 0x0
    ixl: 0x0
    ixh: 0x0
    iyl: 0x0
    iyh: 0x0
    pc: 0x07
    sp: 0x00
    FlagState { c: X, z: X, pv: X, s: X, n: X, h: X }
HALT
====HALTED====
> 0xc
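to see the whole fetch/decode/execute loop in one place, here's a toy interpreter for just the five opcodes in add.bin. it's a sketch and not how jemu is structured: no flags, no typed instructions, the port number is ignored, and `Run` and `run` are made-up names. the cycle counts are the documented T-states for these opcodes.

```rust
/// What the toy machine produced: bytes written with OUT, and total T-states.
struct Run {
    output: Vec<u8>,
    cycles: u32,
}

/// Toy fetch/decode/execute loop covering only the opcodes in add.bin.
/// Returns None on an unknown opcode or if execution runs off the end.
fn run(rom: &[u8]) -> Option<Run> {
    let (mut a, mut b) = (0u8, 0u8);
    let mut pc = 0usize;
    let mut state = Run { output: Vec::new(), cycles: 0 };
    loop {
        let opcode = *rom.get(pc)?;
        match opcode {
            0x3e => { // LD A, n
                a = *rom.get(pc + 1)?;
                pc += 2;
                state.cycles += 7;
            }
            0x06 => { // LD B, n
                b = *rom.get(pc + 1)?;
                pc += 2;
                state.cycles += 7;
            }
            0x80 => { // ADD A, B
                a = a.wrapping_add(b);
                pc += 1;
                state.cycles += 4;
            }
            0xd3 => { // OUT (n), A: the port byte is fetched but ignored here
                let _port = *rom.get(pc + 1)?;
                state.output.push(a);
                pc += 2;
                state.cycles += 11;
            }
            0x76 => { // HALT
                state.cycles += 4;
                return Some(state);
            }
            _ => return None,
        }
    }
}
```

feeding it the eight bytes from the hexdump gives one output byte, 0x0c, matching the `> 0xc` above.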

this post covered the design of an emulator that simulates the designed hardware. there's a lot this didn't cover that is implemented:

with this, a huge class of programs can be designed and run. this is plenty of a starting point to get working on

however, there's always more to do!