~crc_/retroforth

ref: 2022.1 retroforth/future/utf8.retro -rw-r--r-- 1.0 KiB
6db8a84e — crc update release notes 10 months ago
                                                                                
# UTF-8 Characters

UTF-8 encodes each character in one to four bytes. Since Retro is
32-bit internally, any character fits into a single entry on the
stack. These words will be used to pack and unpack the character
values.

~~~
:uc:pack (????n-c) ;
:uc:unpack (c-????n) ;
~~~
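
The two words above are still stubs, so the packed layout is not yet
defined. The following is only a rough sketch of one possible layout,
under the assumption that the lead byte is kept in the lowest 8 bits
of the cell so that the byte count can be recovered from it. The
sketch:size, sketch:pack and sketch:unpack names are hypothetical and
are not part of Retro.

```
(hypothetical_sketch:_lead_byte_kept_in_the_low_8_bits)

(byte_count_implied_by_a_UTF-8_lead_byte)
:sketch:size (b-n)
  [ #192 gteq? ] [ #224 gteq? ] [ #240 gteq? ] tri
  + + n:negate n:inc ;

(fold_n_bytes_into_one_cell,_first_byte_lowest)
:sketch:pack (????n-c)
  n:dec [ #-8 shift or ] times ;

(split_the_cell_back_into_its_bytes,_then_push_the_count)
:sketch:unpack (c-????n)
  dup #255 and sketch:size dup push
  [ dup #255 and swap #8 shift ] times
  drop pop ;

(round_trip_the_two_bytes_of_U+00E9)
#195 #169 #2 sketch:pack sketch:unpack n:put nl n:put nl n:put nl
```

The last line packs the two bytes of U+00E9 and unpacks them again.
It then displays the count and the bytes from the top of the stack
down. A four byte character sets the high bit of the cell under this
layout, which is why the unpacking side masks each byte with #255
after shifting.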

# UTF-8 Strings

Strings in Retro have been C-style, null-terminated sequences of ASCII
characters. I'm seeking to change this, as I'd like to support Unicode
(UTF-8) and to merge much of the string and array handling code.

This will be an ongoing process.

Temporary sigil.

~~~
:sigil:" (-a) a:from-string class:data ; immediate
~~~

Return the length of a string in UTF-8 characters (us:length) or in
bytes (us:length/bytes).

~~~
:us:length (a-n)  #0 swap [ #192 and #128 -eq? + ] a:for-each n:abs ;
:us:length/bytes (a-n)  a:length ;
~~~
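
As a worked example of the counting trick in us:length: a UTF-8
continuation byte has 10 as its top two bits, so masking it with 192
leaves 128. Every other byte starts a character and adds a true flag
(-1) to the running total, which n:abs turns into a positive count.
The lines below trace this by hand for the two bytes of U+00E9 (195
and 169), without assuming anything about how the string is stored.

```
(the_two_bytes_of_U+00E9,_checked_one_at_a_time)
#0
#195 #192 and #128 -eq? +
#169 #192 and #128 -eq? +
n:abs n:put nl
```

Only the first byte is not a continuation byte, so the total is -1
and the result is a length of one character for two bytes.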


Fetch a character from a string.

~~~
:us:fetch (an-c) ;
~~~

Store a character into a string.

~~~
:us:store (can-) ;
~~~

Tests.

```
"((V⍳V)=⍳⍴V)/V←,V us:length n:put nl
"((V⍳V)=⍳⍴V)/V←,V us:length/bytes n:put nl
```