~seanld/stack-vm

Simplistic stack-based virtual machine, meant as a target for my own languages.
b26b4454 — Sean Wilkerson 2 years ago
Correct documentation table
f5fb2b2b — Sean Wilkerson 2 years ago
Add gitignore for built binary
460caadd — Sean Wilkerson 2 years ago
Saving changes


clone

read-only
https://git.sr.ht/~seanld/stack-vm
read/write
git@git.sr.ht:~seanld/stack-vm


#StackVM

This is my refined attempt (out of many) at creating a general-purpose, stack-based virtual machine, intended as a common compilation target for several of the custom programming language projects I am working on.

How will this differ from your last stack-based VM project hosted on GitHub?

This will add or change a few things compared to my last VM, primarily:

  • The previous VM was implemented in Python 3; this one will be written in C.
  • The previous VM executed plaintext code; this one will execute actual bytecode.
  • Instructions and system architecture will be more generic.
    • The previous VM was built on the assumption that the only thing targeting it would be my earlier custom Lisp implementation.

#Bytecode Spec

Bytecode is laid out so that, reading from left (start) to right (end), the leftmost item is at the bottom of the stack and the rightmost item is at the top.
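A minimal C sketch of that ordering, assuming a fixed-size array stack. The `Stack` type, `STACK_MAX`, and the `push`, `top`, and `push_sequence` helpers are hypothetical illustrations, not part of the VM. Pushing values in reading order leaves the leftmost at the bottom and the rightmost on top; here `t` counts the items, so the top is `items[t - 1]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_MAX 64   /* hypothetical capacity */

/* Hypothetical operand stack: t counts the items, so items[t - 1] is the top. */
typedef struct {
    int32_t items[STACK_MAX];
    size_t  t;
} Stack;

static void push(Stack *s, int32_t v) {
    assert(s->t < STACK_MAX);        /* overflow check */
    s->items[s->t++] = v;
}

static int32_t top(const Stack *s) {
    assert(s->t > 0);                /* underflow check */
    return s->items[s->t - 1];
}

/* Push values in reading order (left to right): the leftmost ends up at
   the bottom (items[0]) and the rightmost on top (items[t - 1]). */
static void push_sequence(Stack *s, const int32_t *seq, size_t n) {
    for (size_t i = 0; i < n; i++)
        push(s, seq[i]);
}
```

Pushing the sequence 1, 2, 3 this way leaves 1 at `items[0]` and 3 on top.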

Here is an example "hello world" program in bytecode form (examples/helloworld.stk):

0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17
-----------------------------------------------------------------------------------------
0x01 0x07 0x48 0x65 0x6c 0x6c 0x6f 0x2c 0x20 0x77 0x6f 0x72 0x6c 0x64 0x21 0x00 0x00 0x08

Bytes 0 and 1 define how the following bytes are to be interpreted. Here, byte 0 (0x01) means "a data item follows", and byte 1 (0x07) means "it is an ASCII string."

Bytes 2-14 are the contents of the ASCII string: the character codes for Hello, world!, shown here in hexadecimal.

Byte 15 is the NUL terminator (0x00) that marks where the ASCII string ends. The VM needs it to know when to stop reading bytes as part of the string.

At this point the VM has read the ASCII string and knows that this byte sequence has ended, so it prepares to read a new byte sequence (datum or instruction).

Byte 16 (0x00) indicates that the next item is an instruction.

Byte 17 (0x08) is a single-byte instruction that prints the item on top of the stack. In this example, that is the string Hello, world!
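The walkthrough above can be sketched as a small interpreter loop in C. It handles only what the example uses (item kinds 0x01 and 0x00, type code 0x07, opcode 0x08). The `run` function, the output buffer it writes to instead of stdout, and the choice to leave the printed string on the stack are assumptions for illustration; the README does not say whether Print pops its operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte values as used in examples/helloworld.stk. */
enum { KIND_INSTRUCTION = 0x00, KIND_DATUM = 0x01 };
enum { TYPE_ASCII_STRING = 0x07 };
enum { OP_PRINT = 0x08 };

#define STACK_MAX 64   /* hypothetical capacity */

/* Runs a bytecode buffer. Strings are pushed as pointers into the bytecode
   itself, since each one is already NUL-terminated there. Print appends the
   top string to `out` (capacity `cap`) rather than writing to stdout.
   Returns 0 on success, -1 on malformed or unsupported input. */
static int run(const uint8_t *code, size_t len, char *out, size_t cap) {
    const char *stack[STACK_MAX];
    size_t t = 0;       /* stack[t - 1] is the top */
    size_t pc = 0;      /* index of the next byte to read */
    size_t used = 0;    /* characters written to out, excluding the NUL */

    assert(cap > 0);
    out[0] = '\0';

    while (pc < len) {
        uint8_t kind = code[pc++];
        if (pc >= len) return -1;          /* kind byte with nothing after it */
        uint8_t tag = code[pc++];          /* type code or opcode */

        if (kind == KIND_DATUM && tag == TYPE_ASCII_STRING) {
            const uint8_t *end = memchr(code + pc, 0x00, len - pc);
            if (end == NULL || t == STACK_MAX) return -1;
            stack[t++] = (const char *)(code + pc);
            pc = (size_t)(end - code) + 1; /* skip past the terminator */
        } else if (kind == KIND_INSTRUCTION && tag == OP_PRINT) {
            if (t == 0) return -1;
            size_t n = strlen(stack[t - 1]);
            if (used + n >= cap) return -1;
            memcpy(out + used, stack[t - 1], n + 1);
            used += n;
        } else {
            return -1;                     /* not covered by this sketch */
        }
    }
    return 0;
}
```

Running `run` over the 18 bytes of examples/helloworld.stk leaves "Hello, world!" in the output buffer; a string with no terminator is rejected.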

#Primitive Data Types

The VM comes with several "primitive" data types that provide basic functionality. Each has a numeric type code that the VM uses to determine how the following bytes in the bytecode stream should be handled.

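Type code (hex)  Data type                Description
------------------------------------------------------------------------------------------------------------
0x01             8-bit unsigned integer   N/A
0x02             16-bit unsigned integer  N/A
0x03             32-bit unsigned integer  N/A
0x04             8-bit signed integer     N/A
0x05             16-bit signed integer    N/A
0x06             32-bit signed integer    N/A
0x07             ASCII string             All following bytes, up to the NUL terminator (0x00), are ASCII characters.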
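One way to represent these type codes in C, assuming each integer payload occupies exactly as many bytes as its bit width implies. The `TypeCode` enum and the `payload_width` and `is_signed` helpers are hypothetical. Byte order for multi-byte integers is not specified here, so the sketch only reports sizes and does not decode payloads.

```c
#include <assert.h>
#include <stdint.h>

/* Type codes from the "Primitive Data Types" table. */
typedef enum {
    TYPE_U8 = 0x01, TYPE_U16 = 0x02, TYPE_U32 = 0x03,
    TYPE_I8 = 0x04, TYPE_I16 = 0x05, TYPE_I32 = 0x06,
    TYPE_ASCII_STRING = 0x07
} TypeCode;

/* Payload size in bytes implied by each integer type's bit width.
   Returns 0 for ASCII strings, whose length is found by scanning for the
   NUL terminator, and -1 for codes not in the table. */
static int payload_width(uint8_t type) {
    switch (type) {
    case TYPE_U8:  case TYPE_I8:  return 1;
    case TYPE_U16: case TYPE_I16: return 2;
    case TYPE_U32: case TYPE_I32: return 4;
    case TYPE_ASCII_STRING:       return 0;
    default:                      return -1;
    }
}

/* Signed integer types occupy the contiguous range 0x04-0x06. */
static int is_signed(uint8_t type) {
    return type >= TYPE_I8 && type <= TYPE_I32;
}
```

Keeping signed and unsigned types in contiguous ranges, as the table does, makes checks like `is_signed` a simple range test.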

#Instruction Set

The following table describes the instruction set that the virtual machine conforms to.

In the table below, t-1 refers to the item on top of the stack and t-2 to the item directly beneath it.

Opcode (hex)  Instruction  Function
-------------------------------------------------------------------------------
0x01          Add          Add the top two stack items, push result
0x02          Subtract     Subtract t-2 from t-1, push result
0x03          Multiply     Multiply the top two stack items, push result
0x04          Divide       Divide t-1 by t-2, push result
0x05          Store        Move value at t-1 to memory address t-2
0x06          Fetch        Push the value stored at memory address t-1
0x07          Read line    Read a line of user input, push string to stack
0x08          Print        Print the string at t-1 to output
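A sketch of the arithmetic and memory instructions in C, following the table's usage in which the arithmetic and Fetch rows treat t-1 as the top item and t-2 as the item beneath it. Several details are assumptions rather than part of the spec: values are 32-bit signed integers, operands are popped before any result is pushed, Fetch replaces the address on the stack with the fetched value, memory is a hypothetical 256-cell `int32_t` array, and errors (underflow, division by zero, out-of-range addresses, unknown opcodes) return -1 without modifying the stack. Read line and Print are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcodes from the "Instruction Set" table (Read line and Print omitted). */
enum { OP_ADD = 0x01, OP_SUB = 0x02, OP_MUL = 0x03, OP_DIV = 0x04,
       OP_STORE = 0x05, OP_FETCH = 0x06 };

#define STACK_MAX 64    /* hypothetical capacity */
#define MEM_SIZE  256   /* hypothetical memory size */

typedef struct {
    int32_t stack[STACK_MAX];
    size_t  t;                /* stack[t - 1] is the top ("t-1" in the table) */
    int32_t mem[MEM_SIZE];
} Vm;

static void push(Vm *vm, int32_t v) {
    assert(vm->t < STACK_MAX);
    vm->stack[vm->t++] = v;
}

/* Executes one instruction. Returns 0 on success, -1 on error; all checks
   happen before the stack is touched. */
static int step(Vm *vm, uint8_t op) {
    if (op == OP_FETCH) {
        if (vm->t < 1) return -1;
        int32_t addr = vm->stack[vm->t - 1];          /* t-1 */
        if (addr < 0 || addr >= MEM_SIZE) return -1;
        vm->stack[vm->t - 1] = vm->mem[addr];         /* address -> value */
        return 0;
    }
    if (op < OP_ADD || op > OP_STORE || vm->t < 2) return -1;

    int32_t a = vm->stack[vm->t - 1];                 /* t-1 (top) */
    int32_t b = vm->stack[vm->t - 2];                 /* t-2 */
    if (op == OP_DIV && (b == 0 || (a == INT32_MIN && b == -1))) return -1;
    if (op == OP_STORE && (b < 0 || b >= MEM_SIZE)) return -1;
    vm->t -= 2;                                       /* pop both operands */

    switch (op) {
    case OP_ADD:   push(vm, a + b); break;  /* overflow is not checked */
    case OP_SUB:   push(vm, a - b); break;  /* t-1 minus t-2 */
    case OP_MUL:   push(vm, a * b); break;  /* overflow is not checked */
    case OP_DIV:   push(vm, a / b); break;  /* t-1 divided by t-2 */
    case OP_STORE: vm->mem[b] = a;  break;  /* value t-1 to address t-2 */
    }
    return 0;
}
```

For example, pushing 2 and then 10 and executing Subtract leaves 8, because t-1 is 10 and t-2 is 2. Note that this operand order is the reverse of many stack machines, which compute second-minus-top.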