#+TITLE: The Carth programming language
Features and other stuff to do/implement in/around Carth.
*IMPORTANT*: When done implementing a TODO, do one or a combination of
the following three things, depending on complexity of the TODO and
the fix etc:
1. Delete the todo. Especially appropriate if the todo is a simple
bugfix or such. Also, link to the commit that fixes the bug or
whatever in the message of the commit that removes the todo.
2. Document the changes in the [[https://gitlab.com/JoJoZ/carth-website/tree/master/pages/reference.org][reference]]. Should be done when any kind
of user-facing feature is implemented, or some other "major" change
has been done.
3. Keep the todo, and mark it as *DONE* with a description of what was
done and how it went. This should only be done seldomly I
feel. Also, link to the commit that fixes the TODO would be nice.
* INACTIVE Package system
* NEXT Module system
Postfix syntax for module paths? A bit like web-domains -
"sub.main.top". E.g. "vector.collections.std". Most relevant
information floats to the left. Maybe a good idea, maybe
not. Consider it.
Look at ML modules.
** INACTIVE Allow conflicting imports if unambiguous?
I'm thinking something that would allow the following. It would be
less annoying than having to qualify everything. Also, gotta think
about how this relates to overloading à la C++.
#+BEGIN_SRC carth
(module Foo
(data FooThing First Second)
(define: isFirst
(Fun FooThing Bool)
(fun-match
[First True]
[Second False])))
(module Bar
(data BarThing First Second)
(define: isFirst
(Fun BarThing Bool)
(fun-match
[First True]
[Second False])))
;; First, there should be no error for just importing modules with conflicting
;; defs. This is ok in Haskell, unless one of the conflicting defs is used.
(import Foo)
(import Bar)
;; Second, it should be allowed to use one of a set of conflicting defs if the
;; type makes it unambiguous....
;; either explicitly
(define: x FooThing First)
(define: y BarThing First)
;; or implicitly
(define t (isFirst x))
(define u (isFirst y))
#+END_SRC
* NEXT Typeclasses
** Agda style classes w implicit args
In Haskell, you can only have a single instance of a specific
typeclass for a specific type. This doesn't always make
sense. Consider Semigroup for Int. Both + and * make sense, but we
can only have one unless we goof around with newtypes etc, and that
kinda sucks.
Consider an approach more like agda. That model is more lika basic
Hindley-Milner + dictionsry passing, except the "typeclass"
argument can be passed implicitly with the {} syntax! That seems
really cool.
I'm not sure how implicit arguments work though. Does the compiler
just look at all available bindings and pick the first/only
available variable of that type?
https://agda.readthedocs.io/en/v2.5.2/language/implicit-arguments.html
https://agda.readthedocs.io/en/v2.5.2/language/instance-arguments.html
Or just do it kind of Haskell style, but give the instances names
and allow multiple, overlapping instances, raisi g an error if the
instance is ambiguous somehow.
Problem with instances as implicit arguments:
https://youtu.be/2EdQFCP5mZ8?t=1259. We'd have to know exactly
which instances exist for the same type, and from where they're
imported and what scoping they'll have. That sucks. Another
horrible thing: imagine creating a sorted list with one instance, and doing
a sorted lookup with another (accidentally or not), you could an incorrect
result with no error from the compiler!
Maybe an alternative could be to have both ~primary~ and
~secondary~ instances, where the primary instances may not overlap
or be orphaned, like Rust, but may be passed implicitly, while
secondary instances may overlap and be orphaned, but must be
"overriden"/passed explicitly.
But that may also not work. For the following code,
#+BEGIN_SRC haskell
foo :: Foo a => a -> a
foo = bar
bar :: Foo a => a -> a
bar = ...
#+END_SRC
consider that we call ~foo~ with an explicit secondary
instance. What instance will ~bar~ be given? If we must pass
secondary instances explicitly, it seems ~bar~ would get the
primary instance, and ~foo~ and ~bar~ would be called with
different instances. BAD!
Probably last update for this section: [[https://old.reddit.com/r/haskell/comments/765ogm/multiple_type_class_instances_for_the_same_type/][this thread]] has convinced me
that Haskell-/Rust-style typeclasses is the best idea.
* NEXT Linear types
Linear types would allow predictable performance and behaviour of
e.g. IO tasks. Force a single manual file-close or
buffer-flush. Force a single free for malloc. Affine types would
allow better performance. E.g. pure, in-place modification of
array. If noone else points to it, value can be consumed and
modified rather than cloned. Something like: ~fn push(mut v:
Vec<i32>, x: i32) -> Vec<i32> { v.push(x); v }~ Implemented as maybe
a wrapper, or an interface? Maybe like in haskell with lolly
operator?
Things to consider: Linear arrow vs. `kind` approach or similar?
Check out Idris Uniqueness types, Linear Haskell's linear arrows,
and however Blodwen does it (linear arrows kind of I think).
* NEXT Higher kinded types
* INACTIVE Type families / functional dependencies and multi-param classes / Dependent types
I'm on the fence here, but the consensus seems to be that type
families are better than fundeps. Also, it might be possible to
avoid needing to implement Multi-parameter typeclasses if type
families are available to compensate. Seems that would reduce
ambiguities and mental overhead a bit.
Neither type families or fundeps are necessary if we have dependent
types, but that would likely bring difficulties of it's own.
Type families in Haskell vs Dependent types in a pseudo-Haskell vs
Dependent types in Agda:
** Type families, Haskell
#+BEGIN_SRC haskell
class Iter c where
type Item c
next :: c -> Maybe (Item c, c)
nextList :: [a] -> Maybe (a, [a])
nextList = \case
[] -> Nothing
a : as -> Just (a, as)
instance Iter [a] where
type Item [a] = a
next = nextList
#+END_SRC
** Dependent types, pseudo-Haskell
#+BEGIN_SRC haskell
class Iter c where
item :: Type
next :: c -> Maybe (item, c)
nextList :: [a] -> Maybe (a, [a])
nextList = \case
[] -> Nothing
a : as -> Just (a, as)
instance Iter [a] where
item = a
next = nextList
#+END_SRC
** Dependent types, Agda
#+BEGIN_SRC agda2
record Iter (C : Set) : Set1 where
field
item : Set
next : C -> Maybe (item × C)
nextList : {A : Set} -> List A -> Maybe (A × List A)
nextList [] = nothing
nextList (x ∷ xs) = just (x , xs)
listIter : {A : Set} -> Iter (List A)
listIter {a} = record
{ item = a
; next = nextList
}
#+END_SRC
* Memory model
How do we handle the heap? Garbage collection like Haskell?
Ownership and borrowing like Rust? Something in between?
Should heap allocations be explicit or implicit? Even if we go with
a Haskell-like model, should there be an explicit ~Box a~ type?
** NEXT Consider something Rust-like
I.e. affine/linear types, lifetimes, little/no GC by default.
Would allow writing real-time applications like games.
E.g. GHC seems to prefer throughput over latency, so very long
pauses are possible when you're working with a nontrial amount of
data. "You're actually doing pretty well to have a 51ms pause time
with over 200Mb of live data.".
Lifetimes could fit in with Higher Kinded Types quite
naturally. Instead of just having the kind ~*~ (aka. ~type~), you'd
have two kinds: ~type~ and ~lifetime~. You could then have a type
like ~Ref 'a Int~ where ~Ref~ is a type operator with kind ~lifetime
-> type -> type~.
Another option could be to add ways of controlling when GC happens
so you can reduce spikes of latency. Haskell has ~performGC :: IO
()~ that does this. [[https://old.reddit.com/r/haskell/comments/6d891n/has_anyone_noticed_gc_pause_lag_in_haskell/di0vqb0/][Here is a gameboy]] who eliminates spikes at the
cost of overall performance by calling ~performGC~ every frame.
[[https://github.com/rust-lang/rfcs/blob/master/text/1598-generic_associated_types.md][Some inspiration here]].
** Garbage collector
Until we get linear types, and probably even then, we'll need some
form of GC.
There are many problems with refcounting: Generated llvm ir/asm gets
polluted; While performance is more predictable, it's typically
worse overall; Cycle breaking would either require using weak refs
where appropriate, which would in turn require user input or an
advanced implementation, or a periodic cycle breaker, which would be
costly performance wise. So tracing GC is probably a good idea.
*** NEXT Boehms GC
Simplest way to get rudimentary, but decently performant, GC.
*** INACTIVE DIY Garbage collector
A tracing GC would be quite separate from the rest of the
program. The only pollution would be calls to the allocator (not
much different from the current sitch w malloc) and
(de)registrations of local variables in Let forms (a total of two
function calls per heap allocated variable).
Implementing a tracing GC would also be a fun challenge, and I'm
sure it could be fun to try different algorithms etc.
Look at https://github.com/mkirchner/gc.
**** How it would work
Basically, instead of calling =malloc=, the alloc function of the
GC is called. This function keeps track of either the number of
calls, the time, or the current sum of allocated space, and
periodically performs a mark-and-sweep, walking through the object
graph and marking objects not directly or indirectly referenced by
a "root" node for sweeping.
Root nodes are global variables and all local variables visible in
the current scope. Global variables can be registered in the main
wrapper, while local variables could be registered right after
they've been created (in a Let, Match, ...). They would then be
unregistered right before the function returns (or in the case of
tail calls, right before the tail call). Registering could happen
directly in the GC alloc routine.
** Merging affine/linear types and GC
Best of both worlds? Maybe.
I don't think I want memory management to be quite as explicit and
cumbersome as in Rust, especially wrt lifetimes. An alternative
could be to just add linear types to allow for structures that
require mutability, like HashMap, but not borrowing. This would not
enable us to write *the most* performant code, but we'd be able to
do a lot better than with just GC--games may be quite possible.
* INACTIVE Effect system
* INACTIVE Macros?
* INACTIVE Property system
I'm thinking of a system where you annotate functions in a source
file with pre- and postconditions, which can then be checked in
different modes depending on how much time you've got etc.
- Proof-mode. Exchaustive checking of conditions. All possible
inputs are generated, and the system checks that the precondition
always implies the postcondition.
- Test-mode. Statistical, random testing. Generate enough inputs
such that the precondition is fulfilled for a statistically
significant subset of the complete set of possible inputs.
- Debug-mode. Functions are not tested ahead of time, instead
assertions are inserted and checked at runtime.
- Release-mode. Conditions are completely ignored.
* NEXT Consider using lib for pretty printing
https://hackage.haskell.org/package/pretty-1.1.1.1
* INACTIVE Hoogle equivalent
https://wiki.haskell.org/Hoogle
* INACTIVE Web playground
Like play.rustlang.org
* INACTIVE Language server protocol
[[https://github.com/Microsoft/language-server-protocol]]
[[https://internals.rust-lang.org/t/introducing-rust-language-server-source-release/4209]]
* NEXT Reference
Rust has a [[https://doc.rust-lang.org/reference/][good reference]]. Look at that for inspiration.
** INACTIVE Document syntax
** INACTIVE Document type system
** INACTIVE Document memory model
* NEXT Continuous deployment of webpage
at [[https://carth.jo.zone/]] or some other place.
* INACTIVE HTML documentation generation
Like [[https://www.haskell.org/haddock/][haddock]] and [[https://www.haskell.org/haddock/][rustdoc]].
* INACTIVE Documentation checker
Like a typechecker-pass but for generated documentation. Verify that
all links are alive, that examples compile and produce the expected
output, etc.
* NEXT Debug information in LLVM-IR
You should be able to run a Carth program in GDB and actually be
able to do stuff, so we need to emit metadata about source-locations
and stuff in the LLVM-IR. Something like the following, from the rust playground:
#+BEGIN_EXAMPLE
...
; playground::foo
; Function Attrs: nonlazybind uwtable
define internal i32 @_ZN10playground3foo17hc5d9d5678570880bE() unnamed_addr #1 !dbg !1258 {
start:
; call std::panicking::begin_panic
call void @_ZN3std9panicking11begin_panic17hb85d687efeb64e5dE([0 x i8]* noalias nonnull readonly align 1 bitcast (<{ [4 x i8] }>* @13 to [0 x i8]*), i64 4, { [0 x i64], { [0 x i8]*, i64 }, [0 x i32], i32, [0 x i32], i32, [0 x i32] }* noalias readonly align 8 dereferenceable(24) bitcast (<{ i8*, [16 x i8] }>* @12 to { [0 x i64], { [0 x i8]*, i64 }, [0 x i32], i32, [0 x i32], i32, [0 x i32] }*)), !dbg !1264
unreachable, !dbg !1264
}
; playground::main
; Function Attrs: nonlazybind uwtable
define internal void @_ZN10playground4main17h7395a4a007d16efeE() unnamed_addr #1 !dbg !1265 {
start:
; call playground::foo
%0 = call i32 @_ZN10playground3foo17hc5d9d5678570880bE(), !dbg !1266
br label %bb1, !dbg !1266
bb1: ; preds = %start
ret void, !dbg !1267
}
...
!1266 = !DILocation(line: 6, column: 4, scope: !1265)
#+END_EXAMPLE
* INACTIVE Guarantee no stack overflow for tail recursion
We should guarantee that directly (and indirectly?) recursive
function call should not cause the stack usage to grow
indefinitely. Tail call elimination or trampolining should take
place. Will need to look into what LLVM can do, and what's possible
on different platforms. Hopefully we won't have to resort to
trampolining--that seems slow.
* INACTIVE Boxing to allow for dynamic linking
Boxing vs monomorphization. Boxing results in smaller binary and
dynamically-linkable interface,bot results in slower code.
Maybe monomorphize all package-internal code, and box all
public-facing functions?
* Standard library (std, stdlib)
** INACTIVE Prefer somewhat big / wide stdlib
Small / bad standard library + good package manager => npm / cargo
situation, where everything has sooo many dependencies. Having a dep
is not bad per say, but when the numbers completely blow up, like in
rust- and javascript-land, things can get messy. The best way to
avoid this, I think, is having a standard library that has you
covered for most common things.
Examples of libraries in other ecosystems that should be part of the
stdlib: `is-even` in JavaScript, `composition` in Haskell, `rand` in
Rust.
Go seems to have done this relatively well. Their stdlib has
everything from JPEG codec, to a webserver. The stdlib shouldn't
have everything though, as that will add a bunch of legacy cruft
over time, like in Java. Would not be as much of a problem if we're
not afraid of releasing new major versions removing deprecated
stuff.
** INACTIVE Concurrency / parallelism primitives
Mutex, semaphore, etc.
Look at how Rust and Haskell do it.
Also, look at the crate [[https://crates.io/crates/parking_lot][parking_lot]], which does replaces the
standard Rust primitives with smarter ones. E.g. the mutex does a
small number of spins first, to avoid expensive thread juggling by
the OS when the critical section is very short, but resort to the
usual process interrupts in case it goes on for longer, to avoid
priority inversion which is a problem with spinlocks.
https://matklad.github.io/2020/01/02/spinlocks-considered-harmful.html
https://matklad.github.io/2020/01/04/mutexes-are-faster-than-spinlocks.html