~luxferre/Equi

A self-descriptive stack-based PC platform

Clone (read-only): https://git.sr.ht/~luxferre/Equi
Clone (read/write): git@git.sr.ht:~luxferre/Equi

#Equi

Equi is a general-purpose 16-bit stack-based platform (and a programming language/VM named the same) aimed at low-cost, low-energy computing. It was inspired by Forth, Uxn, VTL-2, SIMPL and some other similar projects.

The name Equi comes from the fact that each source code instruction is equivalent to a machine instruction. No, it isn't mapped to one machine instruction: it is the machine instruction. All instructions and data in Equi are represented with printable ASCII characters only. This allows Equi code to be bootstrapped directly from the keyboard (any standard keyboard/keypad that allows serial input) using a tiny interpreter stored, for instance, in the hardware ROM.

This document describes a more-or-less formal specification. A tutorial book on how to use it is a work in progress.

#Specification

Main features of an Equi machine:

  • Instruction bus: 8-bit;
  • Data bus: 16-bit;
  • Address bus: 16-bit;
  • Up to 65536 bytes of RAM;
  • Up to 64 MiB of flat persistent storage (tape, disk, flash, etc.);
  • Serial terminal input and output;
  • Up to 65535 peripheral extension ports, including several virtual ports;
  • Multitasking support with up to 8 concurrently running tasks (by default);
  • Two 256-byte (128-word) stacks, main and return, per task;
  • One 32-byte literal stack per task;
  • 16-bit input buffer pointer and global and individual mode flags.

The default Equi RAM layout is:

Size (bytes) Purpose
2 Main/return stack size in words
1 Literal stack size in bytes (up to 255)
2 Command buffer start address
2 Command buffer size in bytes
2 IBP - input buffer pointer
1 II - instruction ignore mode flag
1 MM - minification/bypass pseudo-mode flag
2 Currently running task ID
varies Task context table (see the next layout)
varies Command buffer area

And the Equi program task context layout in the task context table is:

Size (bytes) Purpose
2 Task ID
1 Active flag
1 Privileged flag
1 CM - compilation mode flag
1 LSP - literal stack pointer
2 MSP - main stack pointer
2 RSP - return stack pointer
2 CLTP - compilation lookup table pointer
2 CBP - compilation buffer pointer
2 Task's GPD start address
2 Task's command buffer start address
2 Task's command buffer length in bytes
2 PC - program counter
varies Main stack
varies Return stack
varies Literal stack
varies Compilation lookup table
varies General purpose data (GPD) area

Equi is strictly case-sensitive: all uppercase basic Latin letters, as well as a number of special characters, are reserved for machine instructions, and all custom words must be defined in lowercase only (additionally, _ character is allowed in the identifiers). Within comments (see below), any characters can be used.

All whitespace characters (space, tabulation, CR or LF) are discarded in Equi upon loading the program and can be used for code clarity any way the author wants.

The interpreter can run in one of four modes: command (default), interpretation (IM), compilation (CM) and instruction ignore (II). An Equi machine always starts in the command mode. The latter three modes are triggered by certain instructions that set the corresponding flags. The semantics of the compilation mode are similar to those of Forth and are covered in more detail below.

In the command mode, the interpreter doesn't execute any instructions and doesn't manipulate the program counter (PC). Instead, it accumulates all characters typed into the standard input in the so-called command buffer. The only instruction Equi must react to in this mode is Q, the quit instruction. Q loads the current command buffer contents into a task context and starts executing them in the interpretation mode. This also means that every Equi program file must end with a Q character, even when run in a non-interactive environment. As long as every program has a halting Q instruction, you can safely concatenate several Equi programs in a single file to be executed sequentially.

In the instruction ignore mode (II flag set), all instructions and arbitrary characters are skipped and discarded, except ), which unsets the II flag. This can be used to write comments. In a well-formed Equi program, the characters enclosed between the ( and ) instructions, as well as any whitespace characters, never enter the command buffer upon loading.
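The loading rules above amount to a single-pass filter over the program text. The C sketch below strips whitespace and ( ... ) comments and stops at the first Q outside a comment. The result is the kind of minified command buffer contents that the reference implementation's m parameter prints. Keeping the terminating Q in the buffer is an assumption, and `equi_load` is a hypothetical name.

```c
#include <stddef.h>

/* Single-pass loader sketch: copies src into buf, dropping whitespace
 * and comments, and stops after the first Q outside a comment.
 * Characters beyond bufsize are dropped. Returns the bytes stored;
 * the terminating Q is kept (assumption). */
size_t equi_load(const char *src, char *buf, size_t bufsize)
{
    size_t n = 0;
    int ignore = 0;                /* II flag: inside ( ... ) */

    for (; *src != '\0'; src++) {
        char c = *src;
        if (ignore) {              /* in II mode only ) matters */
            if (c == ')')
                ignore = 0;
            continue;
        }
        if (c == '(') {            /* set II; ( itself is not stored */
            ignore = 1;
            continue;
        }
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
            continue;              /* whitespace never enters the buffer */
        if (n < bufsize)
            buf[n++] = c;
        if (c == 'Q')              /* Q ends the command mode */
            break;
    }
    return n;
}
```

A Q inside a comment does not end loading, because only ) is significant while the II flag is set.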

In the interpretation mode, when the interpreter encounters any character from the set _, 0-9, A-F and a-z, it pushes its ASCII value onto the 32-byte literal stack. When any other character (except :, " or ') is encountered while the literal stack is not empty, the # instruction logic (see below) is performed automatically. If : is encountered, the compilation mode logic is performed instead. If a Q instruction or a non-printable character is encountered, Equi immediately returns to the command mode.
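The literal rules above can be sketched in C as two small functions. The first decides whether a character goes onto the literal stack. The second performs the # conversion described in the instruction table. The literal stack is modeled as a plain char array in push order. Reading "replacing the missing ones with 0" as left-padding with zeros (so A becomes 000A) is an assumption, as are the function names.

```c
#include <stdint.h>
#include <stddef.h>

/* Characters pushed onto the literal stack: _, 0-9, A-F and a-z. */
int is_literal_char(char c)
{
    return c == '_' || (c >= '0' && c <= '9') ||
           (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'z');
}

/* # logic sketch: lit[0..len-1] is the literal stack in push order
 * (lit[len-1] is the top) and holds only literal characters. Discards
 * _ and a-z, keeps the topmost four remaining hex digits and reads
 * them in push order; fewer digits act as leading zeros (assumed). */
uint16_t literal_to_word(const char *lit, size_t len)
{
    char digits[4];
    size_t n = 0, i;
    uint16_t value = 0;

    /* Walk down from the top of the stack, collecting up to 4 digits. */
    for (i = len; i > 0 && n < 4; i--) {
        char c = lit[i - 1];
        if (c == '_' || (c >= 'a' && c <= 'z'))
            continue;                       /* discarded characters */
        digits[n++] = c;
    }
    /* digits[] is top-first; rebuild the value in push order. */
    while (n > 0) {
        char c = digits[--n];
        value = (uint16_t)((value << 4) |
                           (c <= '9' ? c - '0' : c - 'A' + 10));
    }
    return value;
}
```

Under these assumptions, 12345 yields 0x2345 because only the top four digits survive. A lowercase name such as fizz yields 0, because all a-z characters are discarded.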

In the compilation mode, all instructions except ; are skipped while the CM flag is set. When the interpreter encounters the ; instruction, it performs the finalizing logic to save the compiled word into the CLT (compilation lookup table, see below) and returns to the interpretation mode.

Equi's core instruction set is:

Op Stack state Meaning
# ( -- a ) Literal: pop all characters from the literal stack, discard all _ and a-z characters, keep the top 4 remaining characters (replacing the missing ones with 0) and push the 16-bit value they form (read in the order they were pushed) onto the main stack
" ( -- lit1 lit2 ... ) Pop all the values from the literal stack and push them onto the main stack as 16-bit values
( ( -- ) Set the II flag: when it is set, the interpreter must ignore all instructions except ), used for writing comments
) ( -- ) Unset the II flag, returning to the normal interpretation or compilation mode
: ( -- ) Compilation mode start: set CM flag and set CBP to PC+1 value
; ( -- ) Compilation mode end: replace this instruction in memory with the R instruction, pop all characters from the literal stack, append their CRC16 hash and the CBP value to the lookup table, unset the CM flag and increment the CLTP value
' ( -- ) Call the compiled word: pop all characters from the literal stack, compute their CRC16 hash and look it up in the CLT; if a CBP value is found, push PC onto the return stack and set PC to that CBP value, otherwise error out
R R: ( a -- ) Return: pop a value from the return stack and assign it to PC
] M: ( a -- ) R: ( -- a ) Pop the value from main stack and push onto return stack
[ M: ( -- a ) R: ( a -- ) Pop the value from return stack and push onto main stack
L ( addr -- a ) Load a 16-bit value from addr
S ( a addr -- ) Store a 16-bit value into addr
W ( a addr -- ) Write an 8-bit value into addr (both the value and the address must still be 16-bit on the stack; the higher byte of the value is discarded)
! ( a -- ) Drop the top value from the stack
$ ( a -- a a ) Duplicate the top value on the stack
% ( a b -- b a ) Swap top two values on the stack
@ ( a b c -- b c a ) Rotate top three values on the stack
\ ( a b -- a b a ) Copy over the second value on the stack
J ( rel -- ) Jump: increase or decrease PC according to the relative value (treated as signed, from -32768 to 32767)
I ( cond rel -- ) Pop the relative value and the condition. If the condition value is not zero, jump by the relative value as J does
X ( -- pc ) Locate eXecution point: push PC+1 value onto the main stack
G ( -- gpd_start ) Locate GPD area start: push its flat offset onto the main stack
> ( a b -- a>b ) Push 1 onto the stack if the second popped value is greater than the first, 0 otherwise
< ( a b -- a<b ) Push 1 onto the stack if the second popped value is less than the first, 0 otherwise
= ( a b -- a==b ) Push 1 onto the stack if the two popped values are equal, 0 otherwise
+ ( a b -- a+b ) Sum
- ( a b -- a-b ) Difference
* ( a b -- a*b ) Product
/ ( a b -- a/b rem ) Integer division (with remainder)
N ( a -- -a ) Single-instruction negation (complement to 65536)
T ( a XY -- [a >> X] << Y ) Bitwise shift: by the first nibble to the right and then by the second nibble to the left
~ ( a -- ~a ) Bitwise NOT
& ( a b -- a&b ) Bitwise AND
| ( a b -- a|b ) Bitwise OR
^ ( a b -- a^b ) Bitwise XOR
. ( a -- ) Output the character with the given ASCII (or Unicode, if supported) value to the standard terminal
H ( a -- ) Output the 16-bit value from the stack top in hexadecimal to the standard terminal
, ( -- a ) Non-blocking key input of an ASCII (or Unicode, if supported) value from the standard terminal
? ( -- a ) Blocking key input of an ASCII (or Unicode, if supported) value from the standard terminal
P ( p1 p2 port -- r1 r2 status ) Port I/O: pass two 16-bit parameters to the port and read the operation status and results into the words on the stack top
} ( blk len maddr -- status ) Persistent storage write operation. Stack parameters: block number (in 1K blocks), data length, RAM address
{ ( blk len maddr -- status ) Persistent storage read operation. Stack parameters: block number (in 1K blocks), data length, RAM address
Y ( addr len priv -- taskid ) Fork the len-byte area of the command buffer starting at addr into a new task with the given privileged flag, activate it (see below) and push the task ID onto the stack
Q ( -- ) Quit the interpretation mode (unset the IM flag if set), or quit the interpreter shell itself if in the command mode (halting the machine when there is nowhere to exit to)
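In the stack-effect notation above, the top of the stack is on the right. For binary operators, b is therefore popped first and a second. The minimal C sketch below covers a few of the less obvious instructions. It assumes unsigned 16-bit arithmetic and comparisons, and it assumes that the X nibble of T is bits 4-7 of the shift operand and Y is bits 0-3. The table does not state any of these explicitly. The names `Stack` and `alu_op` are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint16_t cell[128];   /* 128 words, the default stack depth */
    int sp;               /* number of cells in use */
} Stack;

static void push(Stack *s, uint16_t v) { s->cell[s->sp++] = v; }
static uint16_t pop(Stack *s) { return s->cell[--s->sp]; }

/* Executes one of - / > < N T; returns 0 for anything else. */
int alu_op(Stack *s, char op)
{
    uint16_t a, b;

    switch (op) {
    case '-':                                  /* ( a b -- a-b ), mod 65536 */
        b = pop(s); a = pop(s);
        push(s, (uint16_t)(a - b));
        return 1;
    case '/':                                  /* ( a b -- a/b rem ) */
        b = pop(s); a = pop(s);                /* b == 0 not handled here */
        push(s, (uint16_t)(a / b));
        push(s, (uint16_t)(a % b));
        return 1;
    case '>':                                  /* 1 if a > b (unsigned, assumed) */
        b = pop(s); a = pop(s);
        push(s, (uint16_t)(a > b));
        return 1;
    case '<':                                  /* 1 if a < b (unsigned, assumed) */
        b = pop(s); a = pop(s);
        push(s, (uint16_t)(a < b));
        return 1;
    case 'N':                                  /* ( a -- -a ), 65536 - a */
        a = pop(s);
        push(s, (uint16_t)(0u - a));
        return 1;
    case 'T':                                  /* ( a XY -- [a >> X] << Y ) */
        b = pop(s); a = pop(s);
        a = (uint16_t)(a >> ((b >> 4) & 0xF)); /* X: bits 4-7 (assumed) */
        push(s, (uint16_t)((unsigned)a << (b & 0xF))); /* Y: bits 0-3 */
        return 1;
    default:
        return 0;                              /* not covered by this sketch */
    }
}
```

For instance, with 17 and 5 on the stack, / leaves 3 with the remainder 2 on top. Subtracting 7 from 5 wraps to 0xFFFE.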

Note that absolute jumps are not directly supported in Equi and are generally not recommended. This is due to the dynamic nature of word allocation and the ability to reconfigure the runtime environment for different offsets depending on the target. One can still easily perform them with the ]R sequence (move the target address to the return stack, then return to it) and/or calculate absolute positions using the X instruction.
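The following minimal C sketch shows the jump arithmetic. Because PC is 16 bits wide, adding the relative operand modulo 65536 gives the same result as adding it as a signed value. The offset needed to reach a target is simply the difference between the two addresses. The text does not say whether the offset is applied to the address of the J/I instruction itself or to the following one, so the `pc` argument is an assumption. All function names are hypothetical.

```c
#include <stdint.h>

/* J: relative jump. Adding modulo 65536 equals signed addition,
 * so 0xFFFE moves PC back by 2. */
uint16_t op_j(uint16_t pc, uint16_t rel)
{
    return (uint16_t)(pc + rel);
}

/* I: ( cond rel -- ), jump only when cond is non-zero. */
uint16_t op_i(uint16_t pc, uint16_t cond, uint16_t rel)
{
    return cond != 0 ? op_j(pc, rel) : pc;
}

/* Offset to push so that a J taken at pc lands on target. */
uint16_t rel_for(uint16_t pc, uint16_t target)
{
    return (uint16_t)(target - pc);
}

/* ]R: move an absolute address from the main stack to the return
 * stack, then return, which assigns it to PC. */
uint16_t op_abs_jump(uint16_t *rstack, int *rsp, uint16_t addr)
{
    rstack[(*rsp)++] = addr;      /* ] */
    return rstack[--(*rsp)];      /* R */
}
```

The X instruction supplies the pc side of rel_for at run time, which is how relocatable code can compute offsets without hardcoding absolute addresses.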

Please also note that Equi doesn't specify any graphical or sound output capabilities. If such support is required, it generally must be implemented via the port I/O (P) instruction, as with any other peripheral, in a way specific to a particular hardware/software implementation. The same goes for how standard serial terminal input/output is processed: the Equi specification doesn't enforce any particular way. On desktop/laptop PCs, however, the terminal I/O should preferably be VT100-compatible, especially for software-based implementations/VMs. This includes, for instance, control character support and an audiovisual bell for ASCII 0x07 (\a or ^G). Depending on the target, these features may already be supported by the underlying OS's terminal emulator or may be implemented as a part of the VM itself.

See FizzBuzz for a more thorough example of how different features of the current Equi specification are used.

#Reference implementation

Being a purely PC-oriented low-level runtime/programming environment, Equi has a reference emulator/VM implementation, equi.c, written in C (ANSI C89 standard). It compiles and runs on all systems that support standard I/O. Note that, for portability reasons, this emulator:

  • accepts the program from a single file at a time only,
  • only implements four ports for the P instruction: 0 as an echo port (returns the passed parameters as the corresponding result values), 1 as a random port (returns two random values between the two parameter values), 2 as a CRC16 calculation port for a given memory location and length, and 3 for task control (see below); for any other port value, it outputs its parameters to the standard error stream and puts three 0x0000 values back onto the stack,
  • implements the s command-line parameter, which runs the emulator in the silent mode without printing any welcome banners or interactive prompts,
  • sandboxes the { and } operations to the file whose name you supply at compile time via the PERSIST_FILE constant. The file must already exist and be accessible. If it doesn't exist, these operations effectively do nothing except pushing 0x0000 (success status) onto the stack.
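The port behaviour listed above could be dispatched roughly as follows. This C sketch covers only the echo port, the random port and the fallback for unknown ports. Ports 2 and 3 are left out to keep it short. It assumes that a status of 0 means success and that the random range is inclusive. The error message format is made up, and `PortResult` and `port_io` are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    uint16_t r1, r2, status;   /* results of ( p1 p2 port -- r1 r2 status ) */
} PortResult;

/* Random value in [lo, hi]. rand() may not cover all 16 bits when
 * RAND_MAX is small (32767 is common); this sketch ignores that. */
static uint16_t random_between(uint16_t lo, uint16_t hi)
{
    unsigned long span = (unsigned long)hi - lo + 1;
    return (uint16_t)(lo + (unsigned long)rand() % span);
}

PortResult port_io(uint16_t p1, uint16_t p2, uint16_t port)
{
    PortResult r = {0, 0, 0};            /* status 0 = success (assumed) */

    switch (port) {
    case 0:                              /* echo port */
        r.r1 = p1;
        r.r2 = p2;
        break;
    case 1:                              /* random port, inclusive range */
        if (p1 > p2) { uint16_t t = p1; p1 = p2; p2 = t; }
        r.r1 = random_between(p1, p2);
        r.r2 = random_between(p1, p2);
        break;
    default:                             /* unknown port (2 and 3 omitted here) */
        fprintf(stderr, "P: port %u, params %u %u\n",
                (unsigned)port, (unsigned)p1, (unsigned)p2);
        break;                           /* r stays three 0x0000 values */
    }
    return r;
}
```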

Additionally, this emulator implements the m command-line parameter: instead of executing the program, the VM outputs the current command buffer contents upon reaching the Q instruction. This is particularly useful for saving minified versions of .equi files to reuse them in more space-restricted environments. Minified and non-minified files load and run identically, but the size difference can be significant. For example, the source of the current FizzBuzz example is 1544 bytes long, but its actual application snapshot in the command buffer is just 180 bytes long (this snapshot can be dumped as a minified variant with the m parameter). The rest is comments and whitespace characters that are skipped while loading the program into the command buffer.

The source code file should compile with any mainstream C compiler that supports C89, like GCC/DJGPP, Clang, TCC etc. It is also being developed to be compilable with the CC65 compiler for targets like Apple II or Atari 800. All machine/target-specific configuration is done at compile time, using compiler command-line switches. Below are the instructions for building Equi with several known C compilers.

The following constants can be adjusted at compile time:

  • STACK_SIZE - main and return stacks size in bytes (65535 max);
  • LIT_STACK_SIZE - literal stack size in bytes (255 max);
  • GPD_AREA_SIZE - GPD area size in bytes;
  • CMD_BUF_SIZE - command buffer size in bytes (65535 max);
  • CLT_ENTRIES_MAX - size (in entries) of the compilation lookup table (CLT), each entry taking exactly 4 bytes;
  • PERSIST_FILE - the name of persistent storage sandbox file (PERS.DAT by default);
  • EQUI_TASKS_MAX - maximum amount of concurrently running tasks on the system.
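Since each CLT entry takes exactly 4 bytes, a natural layout is a 16-bit CRC16 hash of the word name followed by the 16-bit CBP. The C sketch below shows both sides of the CLT. The ; side appends an entry and advances CLTP. The ' side looks the hash up, pushes the return address and jumps. The entry layout is an assumption, as is the newest-first search that lets a redefinition shadow an older word. All names are hypothetical, and the reference implementation may differ.

```c
#include <stdint.h>
#include <stddef.h>

#ifndef CLT_ENTRIES_MAX
#define CLT_ENTRIES_MAX 64           /* placeholder capacity */
#endif

typedef struct {
    uint16_t hash;                   /* CRC16 of the word's characters */
    uint16_t cbp;                    /* start address of the word's code */
} CltEntry;                          /* 2 + 2 = 4 bytes per entry */

typedef struct {
    CltEntry entry[CLT_ENTRIES_MAX];
    size_t cltp;                     /* CLTP: number of entries in use */
} Clt;

/* ; side: append hash -> CBP and advance CLTP. Returns 0 if full. */
int clt_add(Clt *t, uint16_t hash, uint16_t cbp)
{
    if (t->cltp >= CLT_ENTRIES_MAX)
        return 0;
    t->entry[t->cltp].hash = hash;
    t->entry[t->cltp].cbp = cbp;
    t->cltp++;
    return 1;
}

/* Lookup, newest entry first (assumed), so redefinitions shadow. */
int clt_find(const Clt *t, uint16_t hash, uint16_t *cbp)
{
    size_t i;
    for (i = t->cltp; i > 0; i--) {
        if (t->entry[i - 1].hash == hash) {
            *cbp = t->entry[i - 1].cbp;
            return 1;
        }
    }
    return 0;
}

/* ' side: if found, push the current PC as the return address and
 * jump to the CBP; otherwise return 0 so the caller reports an error. */
int call_word(const Clt *t, uint16_t hash,
              uint16_t *pc, uint16_t *rstack, int *rsp)
{
    uint16_t cbp;
    if (!clt_find(t, hash, &cbp))
        return 0;
    rstack[(*rsp)++] = *pc;
    *pc = cbp;
    return 1;
}
```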

Please keep in mind that the reference implementation code primarily serves as a, well, reference for how the specification should be implemented. It therefore favors code portability and readability over performance whenever such a choice arises.

The project Makefile, provided for convenience, supports passing these constants with the -DFLAGS="..." switch. Below are the steps to build Equi without a Makefile, from the equi.c source file alone. The corresponding make target is specified for each.
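Overridable compile-time constants like these are commonly written as #ifndef guards, so that a -D switch takes precedence over the built-in default. The C sketch below follows that pattern. STACK_SIZE, LIT_STACK_SIZE, EQUI_TASKS_MAX and PERSIST_FILE use the defaults stated in this document. The GPD_AREA_SIZE, CMD_BUF_SIZE and CLT_ENTRIES_MAX values are placeholders, not the ones in equi.c. The typedefs at the end are a C89-compatible way to check the documented limits at compile time.

```c
/* Defaults that a -DNAME=value switch can override. */
#ifndef STACK_SIZE
#define STACK_SIZE 256          /* bytes per stack, i.e. 128 words */
#endif
#ifndef LIT_STACK_SIZE
#define LIT_STACK_SIZE 32
#endif
#ifndef GPD_AREA_SIZE
#define GPD_AREA_SIZE 1024      /* placeholder value */
#endif
#ifndef CMD_BUF_SIZE
#define CMD_BUF_SIZE 4096       /* placeholder value */
#endif
#ifndef CLT_ENTRIES_MAX
#define CLT_ENTRIES_MAX 64      /* placeholder value */
#endif
#ifndef EQUI_TASKS_MAX
#define EQUI_TASKS_MAX 8
#endif
#ifndef PERSIST_FILE
#define PERSIST_FILE "PERS.DAT"
#endif

/* Compile-time limit checks: a negative array size is a compile error
 * if an overridden value exceeds the documented maximum. */
typedef char check_lit_stack[(LIT_STACK_SIZE <= 255) ? 1 : -1];
typedef char check_stack[(STACK_SIZE <= 65535L) ? 1 : -1];
typedef char check_cmd_buf[(CMD_BUF_SIZE <= 65535L) ? 1 : -1];
```

With guards like these, cc -std=c89 -DSTACK_SIZE=512 ... would give each task 512-byte main and return stacks without editing the source.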

#Building with GCC/Clang/MinGW (for current mainstream targets): make

Build with default parameters (you can override any of the above constants with the -D switch):

cc -std=c89 -Os -o equi equi.c [-DSTACK_SIZE=... ...]

#Building with TCC (TinyCC, Tiny C Compiler): make tcc

Equi's codebase detects TCC and attempts to save size by linking against tcclib instead of the standard libraries. Note that TCC doesn't support size optimization switches, and its most recent versions don't support the C89 standard, so it falls back to C99 instead. Either way, the most sensible command to build Equi with TCC is:

tcc -std=c89 -o equi equi.c [-DSTACK_SIZE=... ...]

#Building with CC65 for Enhanced Apple IIe: make a2

This is where things start to get interesting, as we need to specify the exact target machine for CC65 and perform certain target-dependent post-build manipulation. For now, the Equi reference implementation is only being tested on the 65C02-based Enhanced Apple IIe. This is the earliest model that is both supported by the CC65 suite and capable of lowercase character I/O. The command to build it is:

cl65 --standard c89 -O -Os -t apple2enh -o equi.a2enh [-DSTACK_SIZE=... ...] equi.c

Then, if there are no compiler/linker errors, we can proceed with building the image (assuming we're using Java and AppleCommander with an empty 140K ProDOS 8 image bundled in the repo for image assembly):

cp platform-build-tools/apple2/tpl.dsk equi.dsk
java -jar platform-build-tools/apple2/ac.jar -p equi.dsk equi.system sys < $(cl65 --print-target-path)/apple2enh/util/loader.system
java -jar platform-build-tools/apple2/ac.jar -as equi.dsk equi bin < equi.a2enh

This will build a bootable disk image with Equi for Apple II that can be tested on emulators or real hardware.

You can also add a 96K-sized PERS.DAT file shipped in the repo to use the persistent storage capabilities (done automatically with the Makefile target):

java -jar platform-build-tools/apple2/ac.jar -dos equi.dsk PERS.DAT bin < platform-build-tools/PERS.DAT

#Multitasking in Equi

Equi supports running several tasks concurrently scheduled instruction-by-instruction in a round-robin fashion. The general rules are as follows:

  1. Every task context has an ID, starting from 0 and ending with EQUI_TASKS_MAX - 1, and two specific attributes: active and privileged. The active attribute determines whether or not the task is running; the privileged attribute determines whether or not the task can write to command buffer areas that don't belong to it.
  2. The program code passed into Equi on start is loaded into task 0 and its privileged attribute is always set. This way, any code initially run in the machine can act as a loader and launcher for other tasks.
  3. A privileged task can spawn either another privileged task or non-privileged task. A non-privileged task can only spawn another non-privileged task.
  4. No task, whether privileged or not, can write into any RAM area outside its own GPD area and the command buffer. Non-privileged tasks are additionally limited to the command buffer area they already take and cannot write anywhere else.
  5. When a task has ended, its active flag is unset. Equi runtime then may use its task slot to allocate another task when necessary.
  6. Equi machine halts/quits when no active task is left.
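The scheduling rules can be pictured with a small C sketch. A round-robin loop gives every active task one instruction per turn until none is left (rules 5 and 6). A spawn helper reuses the first inactive slot and never grants privilege to a child of a non-privileged task (rules 3 and 5). The `demo_step` function is a stand-in for the real instruction executor. Downgrading a privileged request instead of rejecting it is an assumption, and so is returning -1 when no slot is free. All names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    int active;
    int privileged;
    uint16_t pc;
    uint16_t end;     /* hypothetical: the task ends when pc reaches end */
} Task;

typedef void (*StepFn)(Task *t);

/* Stand-in executor: "runs" one instruction by advancing PC and
 * deactivates the task once it reaches its end address. */
void demo_step(Task *t)
{
    if (++t->pc >= t->end)
        t->active = 0;
}

/* Round-robin: each active task gets one instruction per turn; stops
 * when no task is active (rule 6). Returns instructions executed. */
unsigned long run_round_robin(Task *tasks, int ntasks, StepFn step)
{
    unsigned long executed = 0;
    int any_active;

    do {
        int i;
        any_active = 0;
        for (i = 0; i < ntasks; i++) {
            if (tasks[i].active) {
                step(&tasks[i]);
                executed++;
                any_active = 1;
            }
        }
    } while (any_active);
    return executed;
}

/* Y-style spawn: reuse the first inactive slot (rule 5); a child is
 * privileged only if its parent is (rule 3). Returns the ID or -1. */
int spawn(Task *tasks, int ntasks, const Task *parent,
          int priv, uint16_t pc, uint16_t end)
{
    int i;
    for (i = 0; i < ntasks; i++) {
        if (!tasks[i].active) {
            tasks[i].active = 1;
            tasks[i].privileged = priv && parent->privileged;
            tasks[i].pc = pc;
            tasks[i].end = end;
            return i;
        }
    }
    return -1;
}
```

For example, a 3-instruction task and a 2-instruction task interleave as A, B, A, B, A, for 5 steps in total.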

New tasks are created (and instantly activated) with the Y instruction, which accepts the code address, code length and privileged flag from the stack and returns the task ID on top of the stack. Using this task ID, you can further control the status of the task via system port 3. Pass the task ID as the p1 parameter and one of the following operation codes as the p2 parameter to the P instruction:

  • 0: get task status (active or not) as r1,
  • 1: set active status of the task (start/resume it) if your own task is privileged,
  • 2: unset active status of the task (pause/terminate it) if your own task is privileged,
  • 3: get the privilege status of the task as r1.
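A minimal C sketch of the port 3 operations above follows, with p1 as the task ID and p2 as the operation code. The status values are assumptions: 0 for success, and 1 for an unknown task, an unknown operation or a missing privilege. The type and function names are hypothetical as well.

```c
#include <stdint.h>

typedef struct {
    int active;
    int privileged;
} TaskFlags;

/* Port 3: p1 = task ID, p2 = operation code. Returns the status word
 * (0 = success, 1 = error; both assumed) and writes r1. */
uint16_t port3(TaskFlags *tasks, int ntasks, const TaskFlags *caller,
               uint16_t p1, uint16_t p2, uint16_t *r1)
{
    *r1 = 0;
    if (p1 >= ntasks)
        return 1;                                /* no such task ID */
    switch (p2) {
    case 0:                                      /* get active status */
        *r1 = (uint16_t)(tasks[p1].active != 0);
        return 0;
    case 1:                                      /* start/resume */
    case 2:                                      /* pause/terminate */
        if (!caller->privileged)
            return 1;                            /* privileged callers only */
        tasks[p1].active = (p2 == 1);
        return 0;
    case 3:                                      /* get privilege status */
        *r1 = (uint16_t)(tasks[p1].privileged != 0);
        return 0;
    default:
        return 1;                                /* unknown operation */
    }
}
```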

See this snippet for a very simple example of using the Y instruction to allocate new tasks from existing code.

#FAQ

#Why does the world need another Forth-like system?

Because it aims for a different set of goals than typical Forth systems, mainly to explore the realms of blurring the borders between source and machine code, and to create a VM that can be easily programmed with printable text on the lowest level with no assembly required. Equi is to a typical Forth what VTL-2 was to BASIC, except in this case it is much more capable and extensible at its core.

#What is the main niche for Equi? With a hard 16-bit address bus, is it a Uxn's competitor?

No, not at all. Although Equi was partially inspired by Uxn, it aims for a totally different goal. Uxn was primarily designed for an esoteric computer, Varvara, with graphics, non-blocking input and sound capabilities in mind. It also targets a compact binary machine code size, which requires preprocessing and assembly to obtain. Equi was primarily designed for a more old-school serial terminal experience, and for machine code that is readable and writable by humans at some expense of compactness. Still, FizzBuzz is 180 bytes in Equi when minified, and this size can be reduced even further by switching to single-character words and removing zeroes in hex literals where possible. The resulting .equi file would still be readable by outputting its contents to a terminal. The 99-byte FizzBuzz in Uxn, by contrast, can only be read in a hex viewer or via special disassembly tools.

#I want to use Equi programs in a relatively modern POSIX environment as a part of a scripted process. Is this possible?

Totally! The Makefile for the reference implementation includes sensible default parameters for all targets. Just call cat program.equi | /path/to/equi - s | [other program] in your scripts. The s parameter suppresses all banners, prompts and terminal initialization code in the standard output stream. Make sure to place the PERS.DAT file in the appropriate place if you need the persistence capabilities in your Equi-based scripts. Also avoid input instructions in your programs if you're unsure what they will do with the streamed input. You can, of course, call Equi programs in the usual way just as well, with /path/to/equi /path/to/program.equi s, for instance.

#Too few core instructions! There still are lots of unused uppercase Latin letters, why not utilise them?

Yes, Equi was designed to be usable from a standard keyboard, but this doesn't mean every possible letter should be covered by an instruction. Implementation complexity should be kept low. Besides, new core features not present in every target system are much more convenient to implement via the port I/O mechanism.

#Too many core instructions! E.g. - can be easily replaced with N+, and all bitwise operations can be done using NAND or NOR alone!

While Equi definitely is a minimalist runtime, it's not limited to a 16- or 32-instruction set and tries to keep the balance between simplicity of implementation and simplicity of usage (as far as it can go for a machine-level language). Omitting too many primitive operations would require programmers to paste more instructions instead of one or define them as custom words where it would be totally unnecessary. That being said, Equi's instruction set still might be optimised a little in future versions.

#Why is there a distinction between instructions and custom-defined words? Forth doesn't have one!

This distinction only exists to simplify program interpretation flow. Forth uses whitespace as an essential syntactic feature to delimit words and literals, Equi does not. Therefore, the only way to distinguish between a string literal and compiled word definition is by the means of a special instruction. And using for the compiled words the same approach as for the hexadecimal short literals (automatically try to detect one before an instruction) would be too resource-heavy for the oldest systems as it would involve computing CRC16 on the literal stack contents every single instruction. A dedicated instruction that denotes what to do with the literal stack is much more convenient and straightforward to implement.

#Is Equi self-hosted, i.e. can it compile and run a new version of itself?

Depends on what exactly you mean by this. If you mean something like a Uxntal assembler written in Uxntal, the beauty of Equi is that it doesn't need such a tool, because what you type is what gets directly executed. A single-pass Equi code minifier, similar to what equi m does in the reference implementation, surely can be implemented in Equi itself, and a proof of this concept is under development now. With Equi being Turing-complete, a full Equi VM running inside an Equi VM is theoretically also possible. However, it would be rather slow, complex to implement and of little to no practical use. If you mean a compiler from Equi to the target's machine language, implemented in Equi itself, the amount of work required would be comparable to implementing compilers on a Forth system, and would most likely hit the 64K RAM limit. But for the simplest targets this is also possible if you put enough time and effort into it.

#Where are labels? Macros? Includes? Why doesn't Equi have them? Even Uxntal has them!

Being flexible and human-readable, but a machine language nevertheless, Equi deliberately doesn't include any features that would qualify as preprocessing and require more than a single pass when loading a program into the command buffer. The principle "one source instruction is one machine instruction" is paramount for the entire platform. One can, however, create a translator that compiles a higher level programming language into Equi, with that compiler/language having any required preprocessing features.

Some features can be simulated with tools external to Equi. For instance, includes can be achieved by concatenating several files, as long as only the last of them contains the Q instruction. Loop labels can be emulated by saving jump addresses to the return stack and calling them back when necessary. Only whitespace and comments, which are absolutely needed in order to write readable programs directly in Equi, are stripped during the single-pass program bootup.

#What are the minimum system requirements to implement/port and/or run Equi?

For the reference implementation in ANSI C, at least 32K of RAM and a 6502 or better CPU are recommended, and 64K of RAM or more is ideal. For your own implementations, make sure that the CPU is fast enough to perform 16-bit integer multiplication and division, as well as CRC-16 calculation, without noticeable lags. Also make sure that the command buffer, CLT and GPD areas are large enough to fit programs for your tasks. It is also nice to have at least persistent storage and real-time clock facilities.

#Which CRC-16 variant is required for Equi?

There is no required variant of CRC-16. Using different CRC-16 algorithms in different implementations doesn't make programs for them incompatible, as the hash only affects the internal storage of the compiled words in the CLT. The recommended variant, though, is the one used in the reference implementation: CRC-16-CCITT with the initial value 0xFFFF. It is simple to implement and provides a good pseudo-random distribution even for long sequences of zero bytes.
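For reference, here is a compact bitwise C sketch of the recommended variant: polynomial 0x1021, initial value 0xFFFF, no reflection and no final XOR (also catalogued as CRC-16/CCITT-FALSE). It is table-free, which suits small 8-bit targets. It is not necessarily how equi.c computes the hash, and which literal stack bytes get hashed, and in what order, is up to the implementation.

```c
#include <stdint.h>
#include <stddef.h>

/* CRC-16-CCITT, initial value 0xFFFF: polynomial 0x1021, MSB first,
 * no final XOR. Bitwise, no lookup table. */
uint16_t crc16_ccitt(const unsigned char *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);     /* feed the next byte */
        for (bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

The standard check value for this variant is 0x29B1 for the ASCII string 123456789, which gives a quick way to verify a port.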

#Is non-blocking key input implemented for the targets that support it?

For now, no, but it may be added in future versions. More essential features are currently the focus.

#Credits

Created by Luxferre in 2022, released into public domain.

Made in Ukraine.