#+TITLE: The Carth programming language

Features and other stuff to do/implement in/around Carth.

* INACTIVE Package system

* NEXT Module system
  Postfix syntax for module paths? A bit like web-domains -
  "sub.main.top". E.g. "vector.collections.std".  Most relevant
  information floats to the left. Maybe a good idea, maybe
  not. Consider it.

  Look at ML modules.

** INACTIVE Allow conflicting imports if unambiguous?
   I'm thinking something that would allow the following. It would be
   less annoying than having to qualify everything. Also, gotta think
   about how this relates to overloading à la C++.

   #+BEGIN_SRC carth
   (module Foo
           (data FooThing First Second)
           (define: isFirst
               (Fun FooThing Bool)
             (fun-match
               [First True]
               [Second False])))

   (module Bar
           (data BarThing First Second)
           (define: isFirst
               (Fun BarThing Bool)
             (fun-match
               [First True]
               [Second False])))

   ;; First, there should be no error for just importing modules with conflicting
   ;; defs. This is ok in Haskell, unless one of the conflicting defs is used.
   (import Foo)
   (import Bar)

   ;; Second, it should be allowed to use one of a set of conflicting defs if the
   ;; type makes it unambiguous....

   ;; either explicitly
   (define: x FooThing First)
   (define: y BarThing First)

   ;; or implicitly
   (define t (isFirst x))
   (define u (isFirst y))
   #+END_SRC

* NEXT Typeclasses
  Need some kind of system like type classes for ad hoc
  polymorphism. Maybe Haskell-style type classes, Agda-style
  implicits, or OCaml-style modules. Not sure.
** Agda style classes w implicit args
   https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#implicit-parameters

   In Haskell, you can only have a single instance of a specific
   typeclass for a specific type. This doesn't always make
   sense. Consider Semigroup for Int. Both + and * make sense, but we
   can only have one unless we goof around with newtypes etc, and that
   kinda sucks.
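
   For reference, this is the newtype workaround in today's Haskell:
   ~Data.Monoid~ provides ~Sum~ and ~Product~ wrappers just so Int can
   effectively get two different Semigroup/Monoid instances.

   #+BEGIN_SRC haskell
   import Data.Monoid (Product (..), Sum (..))

   -- Int itself has no Semigroup instance; you pick the operation by
   -- wrapping the values in a newtype.
   additive :: Int
   additive = getSum (Sum 2 <> Sum 3)                    -- 5

   multiplicative :: Int
   multiplicative = getProduct (Product 2 <> Product 3)  -- 6
   #+END_SRC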

   Consider an approach more like Agda's. That model is more like
   basic Hindley-Milner + dictionary passing, except the "typeclass"
   argument can be passed implicitly with the {} syntax! That seems
   really cool.

   I'm not sure how implicit arguments work though. Does the compiler
   just look at all available bindings and pick the first/only
   available variable of that type?

   https://agda.readthedocs.io/en/v2.5.2/language/implicit-arguments.html

   https://agda.readthedocs.io/en/v2.5.2/language/instance-arguments.html

   Or just do it kind of Haskell style, but give the instances names
   and allow multiple, overlapping instances, raising an error if the
   instance is ambiguous somehow.

   Problem with instances as implicit arguments:
   https://youtu.be/2EdQFCP5mZ8?t=1259.  We'd have to know exactly
   which instances exist for the same type, and from where they're
   imported and what scoping they'll have. That sucks. Another
   horrible thing: imagine creating a sorted list with one instance
   and doing a sorted lookup with another (accidentally or not): you
   could get an incorrect result with no error from the compiler!

   Maybe an alternative could be to have both ~primary~ and
   ~secondary~ instances, where the primary instances may not overlap
   or be orphaned, like Rust, but may be passed implicitly, while
   secondary instances may overlap and be orphaned, but must be
   "overriden"/passed explicitly.

   But that may also not work. For the following code,

   #+BEGIN_SRC haskell
   foo :: Foo a => a -> a
   foo = bar

   bar :: Foo a => a -> a
   bar = ...
   #+END_SRC

   consider that we call ~foo~ with an explicit secondary
   instance. What instance will ~bar~ be given? If we must pass
   secondary instances explicitly, it seems ~bar~ would get the
   primary instance, and ~foo~ and ~bar~ would be called with
   different instances. BAD!

   Probably the last update for this section: [[https://old.reddit.com/r/haskell/comments/765ogm/multiple_type_class_instances_for_the_same_type/][this thread]] has convinced me
   that Haskell-/Rust-style typeclasses are the best idea.

* NEXT Linear types
  Linear types would allow predictable performance and behaviour of
  e.g. IO tasks: force a single manual file-close or buffer-flush,
  force a single free for malloc. Affine types would allow better
  performance, e.g. pure, in-place modification of an array: if no one
  else points to it, the value can be consumed and modified rather
  than cloned. Something like:
  ~fn push(mut v: Vec<i32>, x: i32) -> Vec<i32> { v.push(x); v }~.
  Implemented as maybe a wrapper, or an interface? Maybe like in
  Haskell with the lollipop operator?

  Things to consider: Linear arrow vs. `kind` approach or similar?

  Check out Idris Uniqueness types, Linear Haskell's linear arrows,
  and however Blodwen does it (linear arrows kind of I think).
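
  For a taste of the linear-arrow flavour, here is a minimal sketch in
  Linear Haskell (GHC's ~LinearTypes~ extension). The ~File~ type and
  its operations are hypothetical; the point is that a ~%1~ arrow
  forces the handle to be consumed exactly once, so forgetting to
  close it, or closing it twice, becomes a type error.

  #+BEGIN_SRC haskell
  {-# LANGUAGE LinearTypes #-}

  -- Hypothetical linear file API.
  data File = File

  openFile' :: FilePath -> IO File
  openFile' _ = pure File

  -- The %1 multiplicity means the File argument must be used exactly once.
  closeFile' :: File %1 -> IO ()
  closeFile' File = pure ()
  #+END_SRC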

* NEXT Higher kinded types

* INACTIVE Type families / functional dependencies and multi-param classes / Dependent types
  I'm on the fence here, but the consensus seems to be that type
  families are better than fundeps. Also, it might be possible to
  avoid needing to implement Multi-parameter typeclasses if type
  families are available to compensate. Seems that would reduce
  ambiguities and mental overhead a bit.

  Neither type families nor fundeps are necessary if we have dependent
  types, but those would likely bring difficulties of their own.

  Type families in Haskell vs Dependent types in a pseudo-Haskell vs
  Dependent types in Agda:

** Type families, Haskell
   #+BEGIN_SRC haskell
   class Iter c where
       type Item c
       next :: c -> Maybe (Item c, c)

   nextList :: [a] -> Maybe (a, [a])
   nextList = \case
       [] -> Nothing
       a : as -> Just (a, as)

   instance Iter [a] where
       type Item [a] = a
       next = nextList
   #+END_SRC

** Dependent types, pseudo-Haskell
   #+BEGIN_SRC haskell
   class Iter c where
       item :: Type
       next :: c -> Maybe (item, c)

   nextList :: [a] -> Maybe (a, [a])
   nextList = \case
       [] -> Nothing
       a : as -> Just (a, as)

   instance Iter [a] where
       item = a
       next = nextList
   #+END_SRC

** Dependent types, Agda
   #+BEGIN_SRC agda2
   record Iter (C : Set) : Set1 where
     field
       item : Set
       next : C -> Maybe (item × C)

   nextList : {A : Set} -> List A -> Maybe (A × List A)
   nextList [] = nothing
   nextList (x ∷ xs) = just (x , xs)

   listIter : {A : Set} -> Iter (List A)
   listIter {a} = record
     { item = a
     ; next = nextList
     }
   #+END_SRC

* INACTIVE Custom GC
  Until we get linear types, and even then, we'll need some form of
  GC. Boehm's seems to be working well enough, but a conservative
  collector is not ideal, and I think it would be a fun project to
  write my own GC.

  There are many problems with refcounting:
  - Generated LLVM IR/asm gets polluted.
  - While performance is more predictable, it's typically worse overall.
  - Cycle breaking would either require using weak refs where
    appropriate, which would in turn require user input or an advanced
    implementation, or a periodic cycle breaker, which would be costly
    performance-wise.
  So tracing GC is probably a good idea.

  GHC seems to prefer throughput over latency, so very long pauses are
  possible when you're working with a nontrivial amount of data.
  "You're actually doing pretty well to have a 51ms pause time with
  over 200Mb of live data."

  It could be interesting to add ways of controlling when GC happens
  so you can reduce spikes of latency. Haskell has ~performGC :: IO
  ()~ that does this. [[https://old.reddit.com/r/haskell/comments/6d891n/has_anyone_noticed_gc_pause_lag_in_haskell/di0vqb0/][Here is one example]] where spikes are
  eliminated at the cost of overall performance by calling ~performGC~
  every frame.

  [[https://github.com/rust-lang/rfcs/blob/master/text/1598-generic_associated_types.md][Some inspiration here]].

  A tracing GC would be quite separate from the rest of the
  program. The only pollution would be calls to the allocator (not
  much different from the current situation with malloc) and
  (de)registrations of local variables in Let forms (a total of two
  function calls per heap allocated variable).

  Implementing a tracing GC would be a fun challenge, and I'm sure it
  could be fun to try different algorithms etc.

  Look at https://github.com/mkirchner/gc.

* INACTIVE Effect system

* INACTIVE Macros?

* INACTIVE Property system
  I'm thinking of a system where you annotate functions in a source
  file with pre- and postconditions, which can then be checked in
  different modes depending on how much time you've got etc.

  - Proof-mode. Exhaustive checking of conditions. All possible
    inputs are generated, and the system checks that the precondition
    always implies the postcondition.
  - Test-mode. Statistical, random testing. Generate enough inputs
    such that the precondition is fulfilled for a statistically
    significant subset of the complete set of possible inputs (see the
    QuickCheck-style sketch after this list).
  - Debug-mode. Functions are not tested ahead of time; instead,
    assertions are inserted and checked at runtime.
  - Release-mode. Conditions are completely ignored.
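
  The Test-mode bullet roughly corresponds to property-based testing as
  done by QuickCheck today. A sketch of the idea in Haskell, with a
  hypothetical ~mySqrt~ standing in for the annotated function
  (precondition on the left of ~==>~, postcondition on the right):

  #+BEGIN_SRC haskell
  import Test.QuickCheck

  -- Hypothetical function under test.
  mySqrt :: Double -> Double
  mySqrt = sqrt

  -- Random inputs are generated, the precondition filters them, and
  -- the postcondition is checked on the result.
  prop_sqrt :: Double -> Property
  prop_sqrt x = (x >= 0) ==> approxEq (mySqrt x * mySqrt x) x
    where approxEq a b = abs (a - b) < 1e-6 * max 1 (abs b)

  main :: IO ()
  main = quickCheck prop_sqrt
  #+END_SRC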

* NEXT Consider using lib for pretty printing
  https://hackage.haskell.org/package/pretty-1.1.1.1

* INACTIVE Hoogle equivalent
  https://wiki.haskell.org/Hoogle

* INACTIVE Playground
  Like play.rustlang.org

  https://play.rust-lang.org/help
  https://github.com/google/nsjail

  Might actually be pretty easy by making use of Guix
  containers. Sandboxes the filesystem, and doesn't give network
  access unless `--network` is provided.

  #+BEGIN_EXAMPLE
  guix environment --container --ad-hoc coreutils clang carth
  #+END_EXAMPLE

* INACTIVE Language server protocol
  [[https://github.com/Microsoft/language-server-protocol]]
  [[https://internals.rust-lang.org/t/introducing-rust-language-server-source-release/4209]]

* INACTIVE HTML documentation generation
  Like [[https://www.haskell.org/haddock/][haddock]] and [[https://www.haskell.org/haddock/][rustdoc]].

* INACTIVE Documentation checker
  Like a typechecker-pass but for generated documentation. Verify that
  all links are alive, that examples compile and produce the expected
  output, etc.
* Standard library (std, stdlib)
  Prefer somewhat big / wide stdlib. Small / bad standard library +
  good package manager => npm / cargo situation, where everything has
  sooo many dependencies. Having a dep is not bad per se, but when
  the numbers completely blow up, like in rust- and javascript-land,
  things can get messy. The best way to avoid this, I think, is having
  a standard library that has you covered for most common things.

  Examples of libraries in other ecosystems that should be part of the
  stdlib: `is-even` in JavaScript, `composition` in Haskell, `rand` in
  Rust.

  Go seems to have done this relatively well. Their stdlib has
  everything from a JPEG codec to a webserver. The stdlib shouldn't
  have everything though, as that will add a bunch of legacy cruft
  over time, like in Java. Would not be as much of a problem if we're
  not afraid of releasing new major versions removing deprecated
  stuff.

** INACTIVE Concurrency / parallelism primitives
   Mutex, semaphore, etc.

   Look at how Rust and Haskell do it.

   Also, look at the crate [[https://crates.io/crates/parking_lot][parking_lot]], which replaces the
   standard Rust primitives with smarter ones. E.g. the mutex does a
   small number of spins first, to avoid expensive thread juggling by
   the OS when the critical section is very short, but resorts to the
   usual OS-level blocking if it goes on for longer, to avoid the
   priority inversion that makes spinlocks problematic.
   https://matklad.github.io/2020/01/02/spinlocks-considered-harmful.html
   https://matklad.github.io/2020/01/04/mutexes-are-faster-than-spinlocks.html

* NEXT Some algorithms & data structures
  We need good collections & algs for sorting etc. if Carth is going
  to be of any use to anyone. Would also be a good way to add to the
  set of test-programs & find the worst pain points of current Carth.

  Many of these have implementations to look at and compare to on
  [[rosettacode.org]].

  This list is sort of off the top of my head, so some might not be
  good fits in a purely functional language. Look at some resource on
  persistent data structures as well (see the sketch after the list).

  - Priority queue
  - Binary tree
  - B-tree
  - Random number generator
  - Binary search
  - bubble, insertion, selection sort
  - quicksort
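
  As a taste of what "persistent" means in practice, here is a sketch
  of a persistent binary search tree in Haskell: insertion returns a
  new tree that shares most of its structure with the old one, which
  is why this flavour of collection fits a purely functional language
  well.

  #+BEGIN_SRC haskell
  data Tree a = Leaf | Node (Tree a) a (Tree a)

  -- Insert without mutation: unchanged subtrees are shared between
  -- the old and the new tree.
  insert :: Ord a => a -> Tree a -> Tree a
  insert x Leaf = Node Leaf x Leaf
  insert x t@(Node l y r)
    | x < y     = Node (insert x l) y r
    | x > y     = Node l y (insert x r)
    | otherwise = t
  #+END_SRC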

* INACTIVE "Global" memoization
  This is just an idea I had, and may or may not be wise to implement.

  Add a special function for "memoized application" that acts like the
  application function (in Haskell, ($) :: (a -> b) -> a -> b), the
  difference being that it stores the result in a global, hidden Map
  from function pointers and arguments to results. The user can then
  selectively memoize certain functions (or even just certain
  applications of the function), and not others -- the wise choice
  would be to not memoize cheap functions, but do memoize computation
  heavy functions. This is perfectly legal if the language is
  completely pure, as there can be no side-effects that are not
  repeated properly yada yada.
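
  A rough sketch of the idea in Haskell terms (not how Carth would
  implement it): a memoizing variant of ~($)~. Here the cache lives in
  the closure of ~memoApp f~ rather than in one global map keyed on
  function pointers, which is the harder part of the real proposal.

  #+BEGIN_SRC haskell
  import Data.IORef
  import qualified Data.Map.Strict as Map
  import System.IO.Unsafe (unsafePerformIO)

  -- Memoizing application: like ($), but cache the result per argument.
  memoApp :: Ord a => (a -> b) -> a -> b
  memoApp f = unsafePerformIO $ do
      cache <- newIORef Map.empty
      pure $ \x -> unsafePerformIO $ do
          hit <- Map.lookup x <$> readIORef cache
          case hit of
              Just y -> pure y
              Nothing -> do
                  let y = f x
                  modifyIORef' cache (Map.insert x y)
                  pure y
  {-# NOINLINE memoApp #-}

  -- Usage: bind the memoized function once so the cache is shared.
  expensive :: Int -> Int
  expensive = memoApp (\n -> sum [1 .. n])
  #+END_SRC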

  An alternative could be that the user can mark a function definition
  as memoized, and then it's always memoized, not just certain
  applications. Also, there could then be a unique Map for each such
  function.
* INACTIVE Async I/O
  Zig seems to have a smart solution that doesn't require a separate
  `async` version of the standard library, unlike Rust with
  `async-std`.

  https://ziglang.org/download/0.6.0/release-notes.html#Async-IO

  Also look at how Haskell does it. It's probably smart.

* INACTIVE Boxing to allow for dynamic linking
  Boxing vs monomorphization. Boxing gives a smaller binary and a
  dynamically-linkable interface, but slower code.

  When compiling a library, especially a dynamically linked one, how
  do we allow the export of polymorphic functions? We can't really use
  monomorphization, as we can't predict which types there should be
  instantiations for. Boxing would solve this problem and result in a
  smaller binary, but the code would most likely be slower, and the
  FFI would become more complicated.

  Maybe monomorphize all package-internal code, and require boxing for
  all public-facing polymorphic functions? Could require some keyword
  or special form, like `boxed`, to make it clear when the FFI will be
  affected.
* NEXT Add separate pass before Codegen to compile SrcPos:s
  I think it could be done purely and independently from the rest of
  codegen. Would be cleaner.
* NEXT Refactor & document Codegen & Gen
  It's getting big, complex, and unwieldy. Probably buggy as
  well. There's also a distinct lack of documentation. Always takes a
  sec for me to remember what some badly named function actually does.
* INACTIVE Use GADTs in Infer
* NEXT Have a look at LLVM.IRBuilder
  Might simplify my Codegen

  https://hackage.haskell.org/package/llvm-hs-pure-9.0.0/docs/LLVM-IRBuilder-Module.html#v:function
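
  Roughly, the example from the llvm-hs docs looks like this (building
  a module with a single ~add~ function); if my memory of the exact
  API is off, treat it as a sketch:

  #+BEGIN_SRC haskell
  {-# LANGUAGE OverloadedStrings #-}

  import LLVM.AST (Module)
  import LLVM.AST.Type (i32)
  import LLVM.IRBuilder.Instruction (add, ret)
  import LLVM.IRBuilder.Module (buildModule, function)
  import LLVM.IRBuilder.Monad (block, named)

  -- Build an LLVM module containing: define i32 @add(i32 %a, i32 %b)
  simple :: Module
  simple = buildModule "exampleModule" $ do
      _ <- function "add" [(i32, "a"), (i32, "b")] i32 $ \[a, b] -> do
          _ <- block `named` "entry"
          c <- add a b
          ret c
      pure ()
  #+END_SRC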

* INACTIVE Add basic repl
  Add a basic repl based on the JIT. Something very similar to
  http://www.stephendiehl.com/llvm/.

  Could maybe be the starting point for an on-demand architecture?
  Would probably require some memoization mechanism so that we don't
  unnecessarily check, monomorphise, and compile stuff we don't need
  to.
* NEXT Un-generalize module Selections
  Since we now use the JIT instead of an interpreter, only Codegen uses
  Selections, and we could make it simpler by inlining it.
* NEXT Type aliases
  Like ~type String = [Char]~ in Haskell.

* INACTIVE Compute glob vars at compiletime
  I'm not 100% sure this is a good idea anymore. Need to think about
  this.
* INACTIVE Compile-time eval in arbitrary contexts
  As a step towards something like dependent types
* INACTIVE Benchmark, profile, optimize
  Check out
  https://ollef.github.io/blog/posts/speeding-up-sixty.html. Great
  tips!
* INACTIVE Streamline learning the language
  Not that getting users is a primary concern, but if someone is
  indeed curious, I don't want them to be scared off by the process of
  getting started seeming complex.

  https://news.ycombinator.com/item?id=23347357
  https://www.hillelwayne.com/post/learning-a-language/
* NEXT Unify the different ASTs / IRs
  It's just kinda messy right now. Many files must be changed when
  touching just about any part of the AST representation. Also, takes
  up a lot of lines for not much apparent gain. Use some kind of
  attribute-tag to change the AST for different stages. Like:

  #+BEGIN_SRC haskell
  data Expr attr = Expr attr (Expr' attr)

  type ParsedExpr = Expr (Type, SrcPos)
  type CheckedExpr = Expr CheckedType
  #+END_SRC
* INACTIVE Use algebraic effects instead of mtl
  Not 100% sure about this one -- maybe my monad use is simple enough that
  there wouldn't actually be any gain? But still, I'd like to learn
  effects, so maybe it's worth trying out.

  Polysemy seems like the best one, but I'd have to do a little
  research. https://github.com/thma/PolysemyCleanArchitecture/tree/3a9354a5c31eaf03009e389ce49b318881a2460f#readme
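
  For reference, a minimal Polysemy example (roughly the shape of the
  examples in its README): define an effect, derive the smart
  constructors, and interpret it into IO. The ~Log~ effect here is just
  an illustration, not something Carth's compiler needs.

  #+BEGIN_SRC haskell
  {-# LANGUAGE DataKinds, FlexibleContexts, GADTs, LambdaCase, PolyKinds,
               RankNTypes, ScopedTypeVariables, TemplateHaskell,
               TypeApplications, TypeOperators, TypeFamilies #-}

  import Polysemy

  -- An effect: one operation, logging a message.
  data Log m a where
      LogMsg :: String -> Log m ()

  makeSem ''Log  -- generates: logMsg :: Member Log r => String -> Sem r ()

  -- A program that is polymorphic in how logging is performed.
  program :: Member Log r => Sem r ()
  program = logMsg "hello from an effect"

  -- One possible interpretation: log to stdout.
  logToIO :: Member (Embed IO) r => Sem (Log ': r) a -> Sem r a
  logToIO = interpret $ \case
      LogMsg s -> embed (putStrLn s)

  main :: IO ()
  main = runM (logToIO program)
  #+END_SRC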
* INACTIVE GRIN as alternative to LLVM and some of my own Codegen
  https://github.com/grin-compiler/grin

  GRIN seems promising. I wouldn't have to perform as complex
  transformations from Carth IR to LLVM, instead transforming to this
  more functional IR. GRIN might also be able to perform more
  optimizations.
* INACTIVE Optimize away zero-sized types before codegen
  It's bad that many operations on zero-sized types are currently
  actually compiled to, in practice, a ton of no-ops. I think it might
  be a good idea to add a dedicated optimization pass after
  monomorphization but before codegen that just gets rid of all
  zero-sized types and operations on them. For example, a type like
  ~(data Foo (Foo Bar Unit Baz))~ can be changed to ~(data Foo (Foo
  Bar Baz))~ without affecting the size of the generated struct
  etc. Also, a store of a ~{}~ into a ~{}*~ is really a no-op -- just
  noise in the generated LLVM. Being able to assume no zero-sized
  types in Gen/Codegen would also be really nice, I think.
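
  A sketch of what such a pass could look like, over a hypothetical,
  much-simplified type representation (not the compiler's actual IR
  types):

  #+BEGIN_SRC haskell
  -- Hypothetical monomorphic type representation.
  data Type = TUnit | TInt | TStruct [Type]
      deriving (Eq, Show)

  isZeroSized :: Type -> Bool
  isZeroSized TUnit = True
  isZeroSized (TStruct ts) = all isZeroSized ts
  isZeroSized TInt = False

  -- Drop zero-sized fields from struct types, recursively, so e.g.
  -- TStruct [TInt, TUnit, TInt] becomes TStruct [TInt, TInt].
  dropZsts :: Type -> Type
  dropZsts (TStruct ts) =
      TStruct (map dropZsts (filter (not . isZeroSized) ts))
  dropZsts t = t
  #+END_SRC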

  One issue: If you get rid of all ZSTs, what happens to a function
  with return-type Unit? What does it now return? One option could be
  to add a special LLVM-Void type that just marks that the
  function should return void later. Another, more interesting option,
  would be to simply remove all functions and function-calls where the
  only remaining return type is a ZST, since, in purely functional
  programming, such a function can't do anything anyways. This would
  work, as long as *all* functions with side-effects are marked with
  IO & the RealWorld of IO is not a ZST & unsafePerformIO is known to
  the compiler and is (one of) the only (potentially) ZST-returning
  functions not optimized away, or unsafePerformIO returns something
  like ~(data (UnsafeIOResult a) (UnsafeIOResult a SizedMarker))~ to
  ensure the result is sized.

  Maybe do the flattening thing so there is only one zero sized type,
  but don't optimize away operations returning Unit completely. It
  would still be nice to be able to expect side effects and panics to
  happen. Also, RealWorld wouldn't have to have a size and actually
  impact performance.

* INACTIVE Builtin parsing of C header files
  I think Zig has this, and in Rust you can use the external tool
  ~bindgen~ to generate Rust declarations for C headers ahead of time.

  I just think it would be nice to not need to manually translate
  header files to use external libraries like OpenGL or SDL or
  whatever.
* INACTIVE Investigate alternative linkers
  Linking is one of the bottlenecks. However much caching etc I do in
  the parser & typechecker etc, the linker still has to do everything
  from scratch each time. I read somewhere that "gold" is a newer GNU
  linker? Try using that maybe, unless it's already in use?

  https://news.ycombinator.com/item?id=24615916

* INACTIVE Produce .so:s for debug builds
  Linking is slow, so for debug builds we could try to split the
  output by module into separate .so:s. Then we'd only have to rebuild
  the .so of the affected module in incremental compilation.

  https://news.ycombinator.com/item?id=24615916

* INACTIVE Build Future into IO, or have both IO and AsyncIO?

* NEXT Some algorithms & data structures
  We need good collections & algs for sorting etc. if Carth is going
  to be of any use to anyone. Would also be a good way to add to the
  set of test-programs & find the worst pain points of current Carth.

  Many of these have implementations to look at and compare to on
  [[rosettacode.org]].

  This list is sort of off the top of my head, so some might not be
  good fits in a purely functional language. Look at some resource on
  persistent data structures as well.

  - Priority queue
  - Binary tree (2-3 tree better?)
  - B-tree (specifically 2-3 tree?)
  - Random number generator
  - bubble, insertion, selection sort
  - quicksort
* NEXT Don't actually define stuff like Str in the compiler
  Just assume they're defined by the user. Would mean less stuff in
  the compiler, and more in Carth source. Both positives and
  negatives. I feel it would be nice as a user to be able to inspect
  the .carth source of the stdlib and actually see all the types and
  stuff though.