This program is a complete, self-contained simulator for most of the RV32I instruction set, as well as a simple assembler for the same. It is implemented entirely in AWK. I have tested it only under GNU AWK 5.0.1; it may or may not work under other AWKs.
For usage, please see the comments in riscv.awk.
Here is the simulator running a RISC-V assembly program that computes the
20th Fibonacci number (6765, 0x1a6d). The input file includes the assembly
program, as well as several directives that cause riscv.awk to assemble and
execute it. All but the last line of the output consist of two dumps of the
simulation state, one taken before the CPU is run and one after.
$ awk -f riscv.awk < tests/simulation/fibo.txt
# BEGINNING OF RISCV.AWK DUMP
pcpoke 0x00000000
poke 0x00000000 0x000102b7 # 0x000102b7: LUI x5 0x10000
poke 0x00000004 0x0002a503 # 0x0002a503: LW x10 x5 0
poke 0x00000008 0x00100593 # 0x00100593: ADDI x11 x0 1
poke 0x0000000c 0x00000613 # 0x00000613: ADDI x12 x0 0
poke 0x00000010 0x00100693 # 0x00100693: ADDI x13 x0 1
poke 0x00000014 0x00000713 # 0x00000713: ADDI x14 x0 0
poke 0x00000018 0x00a6c663 # 0x00a6c663: BLT x13 x10 0xc
poke 0x0000001c 0x0000006f # 0x0000006f: JAL x0 0x0
poke 0x00000020 0x0180006f # 0x0180006f: JAL x0 0x18
poke 0x00000024 0x00d60733 # 0x00d60733: ADD x14 x12 x13
poke 0x00000028 0x00d00633 # 0x00d00633: ADD x12 x0 x13
poke 0x0000002c 0x00e006b3 # 0x00e006b3: ADD x13 x0 x14
poke 0x00000030 0x00158593 # 0x00158593: ADDI x11 x11 1
poke 0x00000034 0xfea5c8e3 # 0xfea5c8e3: BLT x11 x10 0xfffffff0
poke 0x00000038 0x00d2a223 # 0x00d2a223: SW x5 x13 4
# END OF RISCV.AWK DUMP
# BEGINNING OF RISCV.AWK DUMP
pcpoke 0x000025b0
rpoke 10 0x00000014
rpoke 11 0x00000014
rpoke 12 0x00001055
rpoke 13 0x00001a6d
rpoke 14 0x00001a6d
poke 0x00000000 0x000102b7 # 0x000102b7: LUI x5 0x10000
poke 0x00000004 0x0002a503 # 0x0002a503: LW x10 x5 0
poke 0x00000008 0x00100593 # 0x00100593: ADDI x11 x0 1
poke 0x0000000c 0x00000613 # 0x00000613: ADDI x12 x0 0
poke 0x00000010 0x00100693 # 0x00100693: ADDI x13 x0 1
poke 0x00000014 0x00000713 # 0x00000713: ADDI x14 x0 0
poke 0x00000018 0x00a6c663 # 0x00a6c663: BLT x13 x10 0xc
poke 0x0000001c 0x0000006f # 0x0000006f: JAL x0 0x0
poke 0x00000020 0x0180006f # 0x0180006f: JAL x0 0x18
poke 0x00000024 0x00d60733 # 0x00d60733: ADD x14 x12 x13
poke 0x00000028 0x00d00633 # 0x00d00633: ADD x12 x0 x13
poke 0x0000002c 0x00e006b3 # 0x00e006b3: ADD x13 x0 x14
poke 0x00000030 0x00158593 # 0x00158593: ADDI x11 x11 1
poke 0x00000034 0xfea5c8e3 # 0xfea5c8e3: BLT x11 x10 0xfffffff0
poke 0x00000038 0x00d2a223 # 0x00d2a223: SW x5 x13 4
poke 0x00010000 0x00000014 # 0x00000014: UNKNOWN
poke 0x00010004 0x00001a6d # 0x00001a6d: UNKNOWN
# END OF RISCV.AWK DUMP
0x00001a6d
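As a cross-check on the register values in the second dump, the same loop can be written directly in AWK. This is only an illustrative sketch: fib_check is a hypothetical helper that is not part of riscv.awk, its variable names simply mirror the registers, and it uses a plain while loop where the assembly tests the condition at the bottom (the two agree for n = 20).

```sh
# Hypothetical helper, not part of riscv.awk.
fib_check() {
    awk -v n="$1" 'BEGIN {
        # Variable names mirror the registers shown in the dump.
        x11 = 1; x12 = 0; x13 = 1
        while (x11 < n) {       # BLT x11 x10: loop while counter < n
            x14 = x12 + x13     # ADD x14 x12 x13
            x12 = x13           # ADD x12 x0 x13
            x13 = x14           # ADD x13 x0 x14
            x11++               # ADDI x11 x11 1
        }
        printf "x11=0x%08x x12=0x%08x x13=0x%08x\n", x11, x12, x13
    }'
}

fib_check 20
# prints: x11=0x00000014 x12=0x00001055 x13=0x00001a6d
```

The printed values match the rpoke lines for registers 11, 12 and 13, and x13 matches the final line of output.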
This program can also be used as a disassembler:
$ cat sample.txt
0x000102b7
0x00028293
0x0002a503
0x00100593
0x00000613
0x00100693
0x00000713
0x00a6ce63
0xf0201073
0xf0301073
0x02c0006f
0xf0269073
0xf0369073
0x0200006f
0x00d60733
0x00d00633
0x00e006b3
0x00158593
0xf0269073
0xf0359073
0xfea5c4e3
0x00d2a223
$ awk -f riscv.awk -v mode=disasm < sample.txt
0x000102b7: LUI x5 0x10000
0x00028293: ADDI x5 x5 0
0x0002a503: LW x10 x5 0
0x00100593: ADDI x11 x0 1
0x00000613: ADDI x12 x0 0
0x00100693: ADDI x13 x0 1
0x00000713: ADDI x14 x0 0
0x00a6ce63: BLT x13 x10 0x1c
0xf0201073: CSRRW x0 x0 -254
0xf0301073: CSRRW x0 x0 -253
0x02c0006f: JAL x0 0x2c
0xf0269073: CSRRW x0 x13 -254
0xf0369073: CSRRW x0 x13 -253
0x0200006f: JAL x0 0x20
0x00d60733: ADD x14 x12 x13
0x00d00633: ADD x12 x0 x13
0x00e006b3: ADD x13 x0 x14
0x00158593: ADDI x11 x11 1
0xf0269073: CSRRW x0 x13 -254
0xf0359073: CSRRW x0 x11 -253
0xfea5c4e3: BLT x11 x10 0xffffffe8
0x00d2a223: SW x5 x13 4
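To see where those operands come from, here is a minimal sketch of splitting a word into the standard RV32I I-type fields. It is not taken from riscv.awk: decode_itype, hex2num and bits are hypothetical names, and the code deliberately sticks to plain arithmetic rather than GNU AWK's bit functions so that it should run under other AWKs too. It also shows why the CSR number 0xf02 is printed as -254: the 12-bit immediate is sign-extended.

```sh
# Hypothetical helper, not part of riscv.awk: print the I-type fields
# of one 32-bit instruction word given as a hex string.
decode_itype() {
    awk -v word="$1" '
    # Convert a "0x..." hex string to a number without gawk extensions.
    function hex2num(s,    i, n, d) {
        s = tolower(s)
        sub(/^0x/, "", s)
        n = 0
        for (i = 1; i <= length(s); i++) {
            d = index("0123456789abcdef", substr(s, i, 1)) - 1
            n = n * 16 + d
        }
        return n
    }
    # Extract width bits of n, starting at bit position lo.
    function bits(n, lo, width) {
        return int(n / 2 ^ lo) % 2 ^ width
    }
    BEGIN {
        w = hex2num(word)
        imm = bits(w, 20, 12)
        if (imm >= 2048) imm -= 4096    # sign-extend the 12-bit immediate
        printf "opcode=0x%02x rd=x%d funct3=%d rs1=x%d imm=%d\n",
            bits(w, 0, 7), bits(w, 7, 5), bits(w, 12, 3), bits(w, 15, 5), imm
    }'
}

decode_itype 0x00100593   # prints: opcode=0x13 rd=x11 funct3=0 rs1=x0 imm=1
decode_itype 0xf0201073   # prints: opcode=0x73 rd=x0 funct3=1 rs1=x0 imm=-254
```

The first word corresponds to the ADDI x11 x0 1 line above, the second to CSRRW x0 x0 -254.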
According to the time command on an i5-4670K, this script takes 2.186 seconds
to run, simulating 10000 CPU cycles in that time, for an effective clock rate
of roughly 4.6 kHz. This is not a particularly scientific assessment, but it is
reasonable to assume that this simulator is not terribly fast. What did you
expect?
Memory is allocated on-the-fly, since AWK "arrays" are actually hash tables, and uninitialized locations are 0 by default. Thus this simulator should be fairly memory efficient.
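The general AWK behaviour behind this can be seen in a few lines. The sketch below is a standalone illustration, not code from riscv.awk; sparse_demo and mem are hypothetical names. One caveat worth knowing: in AWK, merely reading an unset element also creates it, so a presence test should use the in operator. Whether riscv.awk guards its reads this way is not something this example shows.

```sh
# Hypothetical demo, not part of riscv.awk.
sparse_demo() {
    awk 'BEGIN {
        # mem is an associative array: only addresses that have been
        # touched take up space, however far apart they are.
        mem[0] = 1
        mem[65536] = 20         # 0x00010000
        mem[65540] = 6765       # 0x00010004

        # An unset element reads as an empty value, which acts as 0 in
        # arithmetic. The read itself creates the element.
        v = mem[4096] + 0

        n = 0
        for (addr in mem) n++

        print "value at unset address:", v
        print "allocated locations:", n
        # The in operator tests for presence without creating anything.
        print "8192 allocated?", ((8192 in mem) ? "yes" : "no")
    }'
}

sparse_demo
```

Three locations were written, and the read of address 4096 added a fourth, so the count printed is 4.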
Better disassembly could be useful, especially with respect to load and store, and handling of non-canonical no-ops.
A mode could be added to allow streaming instructions directly into memory, rather than having to poke them in one at a time.
A break directive could be added, which halts the simulation on some condition and implies a dump so it can later be resumed.
This started as a joke between myself and my friend Joshua Nelson. I decided to build it anyway, and here we are.
I have found AWK a surprisingly pleasant language in which to implement a CPU simulator. A lot of the challenge is getting data into and out of the simulator, and controlling it, which AWK makes very straightforward. The thinly provisioned memory is also handy.
If you have a question, comment, or patch, you should send it in via my public inbox.
Tests can be run using the ./scripts/run_tests.sh script.
Unit tests are organized as a directory of TSV files in the test/unit/ folder.
The first column of each file is an input to riscv.awk, and the second column
is the expected output. Each TSV file corresponds to one test case.
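One way such a file could be checked is sketched below. This is not the actual scripts/run_tests.sh, which may work quite differently: check_tsv is a hypothetical helper, and it assumes one particular reading of the format, namely that the whole first column is fed to the command as its input and the whole second column is compared against the command's output.

```sh
# Hypothetical sketch, NOT the actual scripts/run_tests.sh.
# Usage: check_tsv FILE COMMAND [ARGS...]
check_tsv() {
    file=$1
    shift
    # Assumption: column 1 is the input stream, column 2 the expected output.
    expected=$(awk -F '\t' '{ print $2 }' "$file")
    actual=$(awk -F '\t' '{ print $1 }' "$file" | "$@")
    if [ "$actual" = "$expected" ]; then
        echo "PASS $file"
    else
        echo "FAIL $file"
        return 1
    fi
}
```

The command under test is passed as the remaining arguments, so the helper can be tried out with any filter before pointing it at awk -f riscv.awk together with whatever options the real tests need.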
Files in test/simulation/ are each used as input to the program; a nonzero
exit code is taken to indicate a test failure.