~luxferre/nrj-oisc

NOR and Reference Jump OISC platform

#NRJ: NOR and Reference Jump one-instruction computer

NRJ is a single-instruction computer architecture designed by Luxferre in 2022 and released into the public domain. This architecture is meant to be portable and easily reimplementable in both software and hardware. It includes a specification (this README), a reference emulator, a reference assembler and a minimal standard library.

This document is a work in progress, and the specification and reference implementation may change at any time without notice.

#The Machine (nrj.c)

NRJ features:

  • a single-instruction CPU with a 3-operand NOR and Reference Jump instruction;
  • fully allocated word-addressable RAM whose size depends on the word size (e.g. NRJ16 has 65536 words, i.e. 131072 bytes);
  • a context-dependent I/O module that provides memory-mapped input and output routines to the CPU.

Depending on the word size, particular NRJ machine variants are called NRJ8, NRJ16, NRJ32 etc.

NRJ emulation is very simple to implement. The entire algorithm an NRJ machine follows to run a program is:

  1. Load the program into the RAM from its start.
  2. Set PC to 3.
  3. Check memory cell 0. If it is non-zero, run the input routine with its value and the value of cell 2, then clear cell 0.
  4. Perform the NOR operation on two cell values: (a) the cell whose address is stored at PC and (b) the cell whose address is stored at PC+1. Write the result to the first of these cells.
  5. Set PC to the value of the cell whose address is stored at PC+2.
  6. Check memory cell 1. If it is non-zero, run the output routine with its value and the value of cell 2, then clear cell 1.
  7. If PC is equal to the maximum word value, halt the machine; otherwise, go to step 3.
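
The seven steps above can be sketched in a few lines of Python. This is only an illustration, not the reference emulator (nrj.c): the function name `run_nrj`, the `read_char` callback, the `max_steps` safety cap and the wrap-around of PC+1 and PC+2 at the end of memory are all assumptions made for this sketch, and only the standard I/O context (ID 0) is handled.

```python
def run_nrj(program, bits=16, read_char=lambda: 0, max_steps=100_000):
    """Run an NRJ program image (a list of words); return (memory, output text)."""
    mask = (1 << bits) - 1                  # maximum word value, also the halt address
    mem = list(program) + [0] * (mask + 1 - len(program))  # step 1: load from address 0
    out = []
    pc = 3                                  # step 2
    for _ in range(max_steps):              # safety cap, not part of the specification
        if mem[0]:                          # step 3: input port
            if mem[2] == 0:                 # only the standard context is sketched
                mem[mem[0]] = read_char() & mask
            mem[0] = 0
        a, b = mem[pc], mem[(pc + 1) & mask]
        mem[a] = ~(mem[a] | mem[b]) & mask  # step 4: NOR, result goes to the first cell
        pc = mem[mem[(pc + 2) & mask]]      # step 5: reference jump
        if mem[1]:                          # step 6: output port
            if mem[2] == 0:
                out.append(chr(mem[1]))
            mem[1] = 0
        if pc == mask:                      # step 7: halt
            break
    return mem, "".join(out)


# Hand-assembled NRJ8 demo: NOR cell 1 (0x00) with cell 6 (0xBE) gives 0x41 ('A'),
# then the jump goes through cell 7, which holds the halt address 0xFF.
image = [0, 0, 0, 1, 6, 7, 0xBE, 0xFF]
memory, text = run_nrj(image, bits=8)
print(text)  # prints A
```

The demo relies on the fact that NOR with a zeroed cell acts as a bitwise NOT: cell 1 starts at 0, so storing the complement of the desired character in cell 6 is enough to write that character to the output port.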

For I/O routines, the operand is passed in cell 0 (input) or cell 1 (output), and the context ID is passed in cell 2. The current NRJ reference implementation only provides I/O routines for the standard context (ID 0), which:

  • input (from the standard terminal keyboard) a character value into the address passed into cell 0, and
  • output (to the standard terminal text display) a character whose ASCII code was passed into the cell 1.

The standard context input/output should always be implemented as non-blocking and unbuffered.

By convention, NRJ binary files should (but aren't required to) have either the .nrj suffix or a suffix that includes the word size, like .nrj16 or .nrj32. If the word size isn't explicitly given in the suffix, a 16-bit word size should be assumed.
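
A loader could apply this naming convention roughly as follows. The helper name `word_size_from_name` is hypothetical and not part of the reference tools; since the suffix is only a convention, anything unrecognized falls back to 16 bits here.

```python
import re


def word_size_from_name(filename, default=16):
    """Return the word size implied by a .nrj or .nrjNN suffix (hypothetical helper)."""
    match = re.search(r"\.nrj(\d*)$", filename)
    if match and match.group(1):
        return int(match.group(1))
    return default  # plain .nrj, or a suffix that says nothing about the size


print(word_size_from_name("hello.nrj32"))  # prints 32
print(word_size_from_name("hello.nrj"))    # prints 16
```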

#The Assembler (nrjasm.py)

To start writing software for any new OISC architecture, some toolchain is required. The first reference assembly language for NRJ and the corresponding assembler program are both called nrjasm.

By convention, nrjasm source files should have the .nrjasm suffix. All numeric values in the code are treated as hexadecimal.

Besides the only possible 3-operand machine instruction (with the basic form [hex_addr] [hex_addr] [hex_addr]), the nrjasm language has just 10 core directives, as well as a comment operator (;) and two dereferencing operators (@ and '). Everything else is built on top of them in the standard library and your own code.

The directives can be divided into two classes:

  • preprocessor directives (.bit, .org, .inc, .var, .set, .def, .end);
  • pseudo-location directives (FREE, NXT, HLT).

Here, the nrjasm directives and operators are described in the order they are processed by the assembler.

#Comments

Only single-line comments are supported in nrjasm. Everything after ; in the source line, as well as all the remaining leading or trailing whitespace, is stripped away at the first stage of assembly.
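
As a minimal sketch of this first stage (the function name `strip_comment` is an assumption, and nrjasm.py may do it differently), the rule can be taken literally: the first ; on a line starts the comment, wherever it appears.

```python
def strip_comment(line):
    """Drop everything from the first ';' on, then trim surrounding whitespace."""
    return line.split(";", 1)[0].strip()


print(repr(strip_comment("   .org 3   ; reserve the I/O ports ")))  # prints '.org 3'
print(repr(strip_comment("; a comment-only line")))                 # prints ''
```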

#Includes

You can include another source file at any place in your main code or in any other included file with the .inc directive, which accepts a file path that can be either relative or absolute. Note that if the path is relative, the current implementation will consider it relative to the directory you run the nrjasm executable from. There is also basic cyclic inclusion protection, so every source file can only be included once.
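
The include behavior described above could look roughly like this. It is a sketch under assumptions, not the code of nrjasm.py: `expand_includes` is a hypothetical name, and tracking already-included files by absolute path is one possible way to get the "included once" guarantee. Relative paths are resolved against the current working directory simply because `open` does that.

```python
import os


def expand_includes(path, seen=None):
    """Return the non-empty source lines of `path` with .inc directives expanded."""
    seen = set() if seen is None else seen
    key = os.path.abspath(path)
    if key in seen:                  # already included once: skip it silently
        return []
    seen.add(key)
    lines = []
    with open(path) as src:
        for raw in src:
            line = raw.split(";", 1)[0].strip()   # comments are stripped first
            if line.startswith(".inc "):
                lines.extend(expand_includes(line.split(None, 1)[1], seen))
            elif line:
                lines.append(line)
    return lines
```

With this approach, two files that include each other are each read exactly once, and the included lines appear at the position of the .inc directive.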

#Word size

Although NRJ16 looks like the most practical NRJ variant (and is the default one), nrjasm can build for any word size divisible by 8. To override the default, supply the necessary size with the .bit directive, like .bit 8. Note that this can only be done once per source; any subsequent .bit occurrences will be ignored.

#Macros

Macros are probably the most important nrjasm feature. They allow you to define repeatable, callable and parameterized pieces of code that are expanded quite early at build time. Every macro definition starts with .def [macro name] and finishes with .end. Each macro can accept up to 3 parameters. Within the macro, you can refer to these parameters with the pseudo-variables %A, %B and %C. If the macro uses more pseudo-variables than the number of parameters actually passed to it, the missing ones are replaced as follows: %A and %B are set to HLT, and %C is set to NXT. HLT and NXT are pseudo-location directives we'll cover a bit later.

Nesting macros is not allowed, i.e. you cannot define a macro inside another macro.
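
The parameter substitution rule can be illustrated with a short sketch. It covers only the replacement of %A, %B and %C inside an already collected macro body; how nrjasm parses .def blocks and macro calls is not shown, and the name `expand_macro` is an assumption.

```python
import re

# Defaults for parameters the caller did not pass, as described above.
DEFAULTS = {"%A": "HLT", "%B": "HLT", "%C": "NXT"}


def expand_macro(body, args):
    """Substitute %A, %B and %C in the macro body lines with the given arguments."""
    if len(args) > 3:
        raise ValueError("a macro accepts at most 3 parameters")
    values = dict(DEFAULTS)
    values.update(zip(("%A", "%B", "%C"), args))
    # A single regex pass, so substituted text is never substituted again.
    return [re.sub(r"%[ABC]", lambda m: values[m.group(0)], line) for line in body]


print(expand_macro(["%A %B %C"], ["@x"]))  # prints ['@x HLT NXT']
```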

#Code start

To denote where exactly in the binary the code should be placed by the assembler, you can use the .org directive. Normally, you want to at least use the .org 3 directive at the start of your code to reserve the first three words for the input, output and I/O context ID ports if you don't plan on pre-populating their values. Unlike .bit, the .org directive can be used multiple times in the nrjasm code, so you can place your instructions however you see fit.

#Variables and dereferencing operators

The .var directive allows you to set a named alias for a hexadecimal word constant value. To retrieve the value from the alias, we prepend the @ dereferencing operator to it. Yes, that's it. However, most of the time we use these hexadecimal constants as variable addresses (hence the name .var, not .const). And most of the time, we don't care what the address actually is; we just need to reserve some space for our variable. This is when we can use the FREE pseudo-location directive (it can only be used here, in conjunction with .var). E.g. instead of writing .var mycoolvariable 12EF we can write .var mycoolvariable FREE, and the assembler will reserve the first location not yet used by other variables and make @mycoolvariable resolve to its address.

In fact, the FREE directive makes the assembler fill in the address values only after all other .vars are processed. So the addresses taken with FREE are guaranteed to be larger than any strictly defined ones, and are in fact calculated relative to the maximum used address. For this reason, you must define at least one "strict" .var if you want FREE to work correctly. Also, keep in mind that nrjasm doesn't know where the code part ends and the variable part begins (it only knows where the variable part ends), so you must adjust the minimum variable address manually in case the two start overlapping.
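
One way to picture the two-pass resolution is the sketch below. It assumes that FREE variables get consecutive addresses starting right after the highest strictly defined one, in declaration order; the text above only guarantees that they end up above every strict address, so the exact numbering in nrjasm.py may differ. The function name `resolve_vars` is hypothetical.

```python
def resolve_vars(decls):
    """decls: list of (name, value) pairs, value being a hex string or 'FREE'."""
    # First pass: strictly defined addresses.
    addresses = {name: int(value, 16) for name, value in decls if value != "FREE"}
    if not addresses:
        raise ValueError("at least one strict .var is needed for FREE to work")
    # Second pass: FREE variables go above the maximum used address (assumed layout).
    next_free = max(addresses.values()) + 1
    for name, value in decls:
        if value == "FREE":
            addresses[name] = next_free
            next_free += 1
    return addresses


table = resolve_vars([("counter", "FREE"), ("base", "1000"), ("tmp", "FREE")])
print({name: format(addr, "04X") for name, addr in sorted(table.items())})
# prints {'base': '1000', 'counter': '1001', 'tmp': '1002'}
```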

Besides the @ operator that dereferences a name into its memory address, there is also the ' operator that dereferences a character into its ASCII code. This can be useful in some cases when you need to output or compare something to a character value. E.g. in NRJ16, 'M is fully equivalent to the 004D hex constant but easier to write and remember.

Note that the dereferencing operators don't work in .var itself; its second parameter can only be FREE or a plain hexadecimal value.
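
The ' operator amounts to a character-to-hex conversion padded to the word size. A small sketch, with the hypothetical helper name `char_literal`:

```python
def char_literal(token, bits=16):
    """Turn a 'X token into the hex word for X's ASCII code, padded to the word size."""
    if len(token) != 2 or token[0] != "'":
        raise ValueError("expected a quote followed by exactly one character")
    return format(ord(token[1]), "0{}X".format(bits // 4))


print(char_literal("'M"))          # prints 004D
print(char_literal("'M", bits=8))  # prints 4D
```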

#Setting memory values at the build time

To just set a word value at some address (plain or named), you can use the .set directive. A common use is initializing a variable with a non-zero value at build time: we write .set @myvar 1234 after declaring myvar with, for instance, .var myvar FREE. Instead of a plain hex value, the .set directive can also take the pseudo-location directives HLT and NXT, which we're going to describe right now.

#Lookup table and pseudo-location directives to deal with it

Since an NRJ machine cannot simply proceed to the next instruction and instead needs a memory cell that refers to it, a lookup table is necessary. In nrjasm, this table is constructed right after the last allocated variable address, or in the middle of the memory space in the highly unlikely event that no variables were declared at all.

The very first entry of the lookup table always refers to the last machine address (FFFF in the case of NRJ16) to make halting possible. This is why the HLT pseudo-location directive always gets substituted with the actual lookup table start offset.

The NXT pseudo-location directive, on the other hand, is resolved to the address of the lookup table entry corresponding to the following machine instruction in the actual code. This allows you to write machine instructions in a linear fashion by just using NXT as the third operand instead of an absolute address.

Besides direct machine instructions, HLT and NXT pseudo-locations can also be used in the .set directive. Their logic is processed differently but they essentially give the same effect.
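
To make the role of the lookup table concrete, here is a simplified sketch of how HLT and NXT inside machine instructions could be resolved. The function name `assemble`, the explicit `table_start` parameter and the choice to allocate one new table entry per NXT use are assumptions of this sketch; the reference assembler places and orders its table entries in its own way, and .set handling is not covered.

```python
def assemble(instructions, org=3, table_start=0x100, bits=16):
    """Resolve HLT/NXT in 3-operand instructions; return a {address: word} image."""
    image = {table_start: (1 << bits) - 1}    # first entry holds the halt address
    next_entry = table_start + 1
    addr = org
    for operands in instructions:
        for offset, operand in enumerate(operands):
            if operand == "HLT":
                word = table_start            # cell that contains the last address
            elif operand == "NXT":
                image[next_entry] = addr + 3  # address of the following instruction
                word = next_entry
                next_entry += 1
            else:
                word = operand                # plain numeric address
            image[addr + offset] = word
        addr += 3
    return image


# NRJ8 example: two instructions, the first continues linearly, the second halts.
image = assemble([(1, 0x10, "NXT"), (1, 1, "HLT")], table_start=0x20, bits=8)
print(hex(image[5]), hex(image[0x21]), hex(image[8]), hex(image[0x20]))
# prints 0x21 0x6 0x20 0xff
```

In the example, the third operand of the first instruction (cell 5) points at table entry 0x21, which holds 6, the address of the second instruction. The third operand of the second instruction points at the table start, which holds 0xFF, so the machine halts after it.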

#The Library

(under construction)