~luxferre/nrj-oisc

7c2380ecfbf5061364d53cf3e5dc8deca1c45293 — Luxferre 4 months ago 4ce5363 master
Added the first README draft
1 files changed, 101 insertions(+), 0 deletions(-)

A README.md
A README.md => README.md +101 -0
@@ 0,0 1,101 @@
# NRJ: NOR and Reference Jump one-instruction computer

NRJ is a single-instruction computer architecture designed by Luxferre in 2022 and released into the public domain. The architecture is meant to be portable and easy to reimplement in both software and hardware. The project includes a specification (this README), a reference emulator, a reference assembler and a minimal standard library.

This document is a work in progress, and the specification and reference implementation may change at any time without notice.

## The Machine (`nrj.c`)

NRJ features:

- a single-instruction CPU with a 3-operand NOR and Reference Jump instruction;
- fully allocated word-addressable RAM whose size depends on the word size (e.g. NRJ16 has 65536 words, i.e. 131072 bytes);
- a context-dependent I/O module that provides memory-mapped input and output routines to the CPU.

Depending on the word size, particular NRJ machine variants are called NRJ8, NRJ16, NRJ32 etc.
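
As a rough illustration of how the word size determines the machine limits, here is a minimal Python sketch. The function name `machine_limits` is hypothetical and not part of the reference implementation; it only assumes that RAM covers every address a word can hold, as the NRJ16 figures above suggest.

```python
def machine_limits(word_bits):
    """Return (ram_words, ram_bytes, max_word) for an NRJ variant.

    Hypothetical helper: assumes RAM spans the full address range of a word.
    """
    ram_words = 1 << word_bits                 # one cell per possible address
    ram_bytes = ram_words * (word_bits // 8)   # each cell is word_bits / 8 bytes
    max_word = ram_words - 1                   # largest value a word can hold
    return ram_words, ram_bytes, max_word


print(machine_limits(16))  # (65536, 131072, 65535)
```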

NRJ emulation is remarkably simple to implement. The entire algorithm for running a program on an NRJ machine is as follows:

1. Load the program into the RAM from its start.
2. Set PC to 3.
3. Check memory cell 0. If it is non-zero, run the input routine on its value and the value of cell 2, then clear cell 0.
4. Perform the NOR operation on two cell values: 1) the cell whose address is stored at PC and 2) the cell whose address is stored at PC+1. Write the result to the first of these two cells.
5. Set PC to the value of the cell whose address is stored at PC+2.
6. Check memory cell 1. If it is non-zero, run the output routine on its value and the value of cell 2, then clear cell 1.
7. If PC is equal to the maximum word value, halt the machine, otherwise go to step 3.
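
The steps above can be sketched in a few lines of Python. This is not the reference emulator (`nrj.c`), only an illustration under stated assumptions: the function name `run`, the callback signatures and the hand-built test program are all hypothetical, and I/O is delegated to optional callbacks instead of a real terminal.

```python
def run(program, word_bits=16, input_routine=None, output_routine=None):
    """Run an NRJ program (a list of word values) and return the final memory.

    Hypothetical callbacks (not from the reference implementation):
      input_routine(mem, operand, context_id)
      output_routine(operand, context_id)
    """
    mask = (1 << word_bits) - 1              # maximum word value, also the halt PC
    if len(program) > mask + 1:
        raise ValueError("program does not fit into RAM")
    mem = [0] * (mask + 1)                   # fully allocated RAM
    mem[:len(program)] = [w & mask for w in program]   # step 1
    pc = 3                                   # step 2
    while True:
        if mem[0] != 0:                      # step 3: pending input request
            if input_routine is not None:
                input_routine(mem, mem[0], mem[2])
            mem[0] = 0
        a = mem[pc]                          # step 4: NOR the two referenced cells
        b = mem[(pc + 1) & mask]
        mem[a] = ~(mem[a] | mem[b]) & mask
        pc = mem[mem[(pc + 2) & mask]]       # step 5: reference jump
        if mem[1] != 0:                      # step 6: pending output request
            if output_routine is not None:
                output_routine(mem[1], mem[2])
            mem[1] = 0
        if pc == mask:                       # step 7: halt
            return mem


# Hand-assembled NRJ16 program: cell 1 = NOR(cell 1, cell 6), then halt.
# Cell 6 holds ~0x41, so cell 1 becomes 0x41 ('A'); cell 7 holds the halt address.
program = [0, 0, 0, 1, 6, 7, 0xFFBE, 0xFFFF]
printed = []
run(program, output_routine=lambda code, ctx: printed.append(chr(code)))
print("".join(printed))  # A
```

Note that the jump target is read only after the NOR result has been written, which follows the order of steps 4 and 5.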

For I/O routines, the operand is passed in cell 0 (input) or cell 1 (output), and the context ID is passed in cell 2. The current NRJ reference implementation only provides I/O routines for the standard context (ID 0), which are:

- input (from the standard terminal keyboard) a character value into the address passed into cell 0, and
- output (to the standard terminal text display) a character whose ASCII code was passed into the cell 1.

The standard context input/output should always be implemented as non-blocking and unbuffered.

By convention, NRJ binary files should (but aren't required to) have either the `.nrj` suffix or a suffix that includes the word size, like `.nrj16` or `.nrj32`. If the word size isn't stated explicitly in the suffix, a 16-bit word size should be assumed.
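
A loader could apply this naming convention roughly as follows. This is only a sketch: the function name `word_bits_from_name` is hypothetical, and falling back to 16 bits for names that don't follow the convention is an assumption, since the suffix is not mandatory.

```python
import re


def word_bits_from_name(filename, default=16):
    """Guess the word size from a binary file name such as 'prog.nrj32'."""
    match = re.search(r"\.nrj(\d*)$", filename)
    if match is None or match.group(1) == "":
        return default            # plain '.nrj' or an unrelated suffix
    return int(match.group(1))    # digits in the suffix give the word size


print(word_bits_from_name("hello.nrj"))    # 16
print(word_bits_from_name("hello.nrj32"))  # 32
```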

## The Assembler (`nrjasm.py`)

To start writing software for any new OISC architecture, some toolchain is required. The first reference assembly language for NRJ, along with the corresponding assembler program, is called nrjasm.

By convention, nrjasm source files should have the `.nrjasm` suffix. All numeric values in the code are treated as hexadecimal.

Besides the only possible 3-operand machine instruction (with the basic form `[hex_addr] [hex_addr] [hex_addr]`), the nrjasm language has just 10 core directives, as well as a comment operator (`;`) and two dereferencing operators (`@` and `'`). Everything else is built on top of them in the standard library and your own code.

The directives can be divided into two classes:

- preprocessor directives (`.bit`, `.org`, `.inc`, `.var`, `.set`, `.def`, `.end`);
- pseudo-location directives (`FREE`, `NXT`, `HLT`).

Here, the nrjasm directives and operators are described in the order they are processed by the assembler.

### Comments

Only single-line comments are supported in nrjasm. Everything after `;` in the source line, as well as all the remaining leading or trailing whitespace, is stripped away at the first stage of assembly.
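
In Python terms, this first stage amounts to something like the sketch below. The name `strip_comment` is hypothetical. Note that this naive version would also cut a `';` character literal at the semicolon; whether the reference assembler treats that case specially is not stated here.

```python
def strip_comment(line):
    """Drop everything after the first ';' and trim surrounding whitespace."""
    return line.split(";", 1)[0].strip()


print(repr(strip_comment("   3 4 NXT   ; NOR cell 3 with cell 4 ")))  # '3 4 NXT'
```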

### Includes

You can include another source file anywhere in your main code or in any other included file with the `.inc` directive, which accepts either a relative or an absolute file path. Note that the current implementation resolves relative paths against the directory you run the nrjasm executable from. There is also a basic cyclic inclusion protection, so every source file can only be included once.
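
The include-once behaviour can be illustrated with a short recursive sketch. It is not taken from `nrjasm.py`: the function name `expand_includes` and the `read_lines` callback (which stands in for reading a file from disk) are hypothetical, and paths containing spaces are not handled.

```python
def expand_includes(path, read_lines, seen=None):
    """Return the source lines of `path` with every `.inc` expanded at most once."""
    seen = set() if seen is None else seen
    if path in seen:
        return []                      # already included: skip to break cycles
    seen.add(path)
    result = []
    for line in read_lines(path):
        parts = line.split()
        if len(parts) == 2 and parts[0] == ".inc":
            result.extend(expand_includes(parts[1], read_lines, seen))
        else:
            result.append(line)
    return result


# In-memory stand-in for the file system, with a deliberate cycle.
files = {
    "main.nrjasm": [".inc lib.nrjasm", "1 2 NXT", ".inc lib.nrjasm"],
    "lib.nrjasm": [".inc main.nrjasm", "4 5 NXT"],
}
print(expand_includes("main.nrjasm", files.__getitem__))
# ['4 5 NXT', '1 2 NXT']
```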

### Word size

Although NRJ16 looks like the most practical NRJ variant (and is the default one), nrjasm can build for any word size divisible by 8. To override the default, supply the necessary size with the `.bit` directive, like `.bit 8`. Note that this can only be done once per source; any subsequent `.bit` occurrences will be ignored.

### Macros

Macros are probably the most important nrjasm feature. They allow you to define repeatable, callable and parameterized pieces of code that are expanded quite early at build time. Every macro definition starts with `.def [macro name]` and finishes with `.end`. Each macro can accept up to 3 parameters. Within the macro, you can refer to these parameters with the pseudo-variables `%A`, `%B` and `%C`. If the macro uses more pseudo-variables than the number of parameters actually passed to it, the missing ones are replaced as follows: `%A` and `%B` are set to `HLT`, and `%C` is set to `NXT`. `HLT` and `NXT` are pseudo-location directives we'll cover a bit later.

Nesting macros is not allowed, i.e. you cannot define a macro inside another macro.
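
To make the parameter defaults concrete, here is a small sketch of the substitution step alone. It deliberately leaves out how macros are defined and invoked, since that syntax isn't shown in this section; the name `expand_macro` and the example macro body are hypothetical.

```python
import re

# Defaults for missing parameters, as described above.
DEFAULTS = {"%A": "HLT", "%B": "HLT", "%C": "NXT"}


def expand_macro(body, args=()):
    """Substitute %A, %B and %C in the macro body lines with the given arguments."""
    if len(args) > 3:
        raise ValueError("a macro accepts at most 3 parameters")
    values = dict(DEFAULTS)
    values.update(zip(("%A", "%B", "%C"), args))
    # Single-pass substitution, so an argument is never re-expanded.
    return [re.sub(r"%[ABC]", lambda m: values[m.group(0)], line) for line in body]


body = ["%A %A NXT", "%A %B %C"]            # illustrative body, not from the library
print(expand_macro(body, ["10", "11"]))     # ['10 10 NXT', '10 11 NXT']
print(expand_macro(["%A %B %C"]))           # ['HLT HLT NXT']
```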

### Code start

To denote where exactly in the binary the assembler should place the code, you can use the `.org` directive. Normally, you want to use at least `.org 3` at the start of your code to reserve the first three words for the input, output and I/O context ID ports if you don't plan on pre-populating their values. Unlike `.bit`, the `.org` directive can be used multiple times in nrjasm code, so you can lay out your instructions however you see fit.

### Variables and dereferencing operators

The `.var` directive allows you to set a named alias to a hexadecimal word constant value. To retrieve the value from the alias, we prepend the `@` dereferencing operator to it. Yes, that's it.
However, most of the time we use these hexadecimal constants as variable addresses (hence the name `.var`, not `.const`). And most of the time, we don't care what the address actually is, we just need to reserve some space for our variable. This is when we can use the `FREE` pseudo-location directive (and it can only be used here, in conjunction with `.var`). E.g. instead of writing `.var mycoolvariable 12EF` we can write `.var mycoolvariable FREE` and the assembler will reserve the first location **not yet used by other variables** and make `@mycoolvariable` resolve to its address.

In fact, the `FREE` directive makes the assembler fill in the address values only after all other `.var`s are processed. So the addresses allocated with `FREE` are guaranteed to be larger than any strictly defined ones, and are calculated relative to the maximum used address. For this reason, you must define at least one "strict" `.var` if you want `FREE` to work correctly. Also, keep in mind that nrjasm doesn't know where the code part ends and the variable part begins (it only knows where the latter ends), so you must adjust the minimum variable address manually in case the two start overlapping.
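
The two-pass allocation described here might look like the sketch below. It is an illustration only: the name `allocate_vars` is hypothetical, and handing out consecutive addresses directly above the largest strict address is an assumption about how "relative to the maximum used address" is implemented.

```python
def allocate_vars(decls):
    """Resolve (name, value) pairs, where value is a hex string or 'FREE'."""
    table = {}
    for name, value in decls:              # pass 1: strictly defined addresses
        if value != "FREE":
            table[name] = int(value, 16)
    free_names = [name for name, value in decls if value == "FREE"]
    if free_names and not table:
        raise ValueError("FREE needs at least one strictly defined .var")
    next_free = max(table.values()) + 1 if table else 0
    for name in free_names:                # pass 2: FREE addresses, in source order
        table[name] = next_free            # assumed: consecutive cells above the maximum
        next_free += 1
    return table


decls = [("counter", "FREE"), ("base", "1000"), ("buffer", "FREE")]
print(allocate_vars(decls))
# {'base': 4096, 'counter': 4097, 'buffer': 4098}
```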

Besides the `@` operator, which dereferences a name into its memory address, there is also the `'` operator, which dereferences a character into its ASCII code. This can be useful when you need to output a character or compare something to a character value. E.g. in NRJ16, `'M` is fully equivalent to the `004D` hex constant but easier to write and remember.
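
The character operator boils down to an ASCII lookup padded to the word width. A minimal sketch, with the hypothetical name `char_literal`:

```python
def char_literal(token, word_bits=16):
    """Turn a token like 'M (quote plus one character) into a hex word string."""
    if len(token) != 2 or token[0] != "'":
        raise ValueError("expected a quote followed by one character")
    code = ord(token[1])
    if code > 0x7F:
        raise ValueError("not an ASCII character")
    return format(code, "0{}X".format(word_bits // 4))   # 4 bits per hex digit


print(char_literal("'M"))      # 004D
print(char_literal("'M", 8))   # 4D
```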

Note that the dereferencing operators don't work in `.var` itself: its second parameter can only be `FREE` or a plain hexadecimal value.

### Setting memory values at the build time

To just set a word value at some address (plain or named), you can use the `.set` directive. A common scenario is initializing a variable with a non-zero value at build time: we write `.set @myvar 1234` after declaring `myvar` with, for instance, `.var myvar FREE`. Instead of a plain hex value, the `.set` directive can also take the pseudo-location directives `HLT` and `NXT`, which we're going to describe right now.

### Lookup table and pseudo-location directives to deal with it

Since an NRJ machine cannot simply proceed to the next instruction and needs some memory cell that refers to it first, a lookup table is necessary. In nrjasm, this table is constructed right after the last allocated variable address, or in the middle of the memory space in the highly unlikely event that no variables were declared at all.

The very first entry of the lookup table always refers to the last machine address (`FFFF` in case of NRJ16) to allow the machine to halt. This is why the `HLT` pseudo-location directive always gets substituted with the actual lookup table start offset.

The `NXT` pseudo-location directive, on the other hand, is resolved to the address of the lookup table entry corresponding to the following machine instruction in the actual code. This allows you to write machine instructions in a linear fashion by just using `NXT` as the third operand instead of an absolute address.
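
One way to picture the table is the sketch below. The exact layout used by `nrjasm.py` is not specified here, so this assumes one entry per instruction, placed consecutively after the halt entry; the function name `build_lookup_table` and its arguments are hypothetical.

```python
def build_lookup_table(instr_addrs, table_start, word_bits=16):
    """Return (cells, nxt) for the given instruction addresses.

    cells maps table addresses to their contents;
    nxt maps an instruction address to the table entry its NXT resolves to.
    """
    max_word = (1 << word_bits) - 1
    cells = {table_start: max_word}      # first entry: halt address, target of HLT
    nxt = {}
    slot = table_start + 1
    for addr in instr_addrs:
        cells[slot] = (addr + 3) & max_word   # each instruction is 3 words long
        nxt[addr] = slot
        slot += 1
    return cells, nxt


cells, nxt = build_lookup_table([0x3, 0x6], table_start=0x1003)
print({hex(k): hex(v) for k, v in cells.items()})
# {'0x1003': '0xffff', '0x1004': '0x6', '0x1005': '0x9'}
print({hex(k): hex(v) for k, v in nxt.items()})
# {'0x3': '0x1004', '0x6': '0x1005'}
```

With this layout, an instruction at address `3` whose third operand is `NXT` would carry `1004` there, and the machine would continue at the address stored in that cell, `6`.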

Besides direct machine instructions, `HLT` and `NXT` pseudo-locations can also be used in the `.set` directive. Their logic is processed differently but they essentially give the same effect.

## The Library

_(under construction)_