@@ 0,0 1,148 @@
+# Dynamic linking and dynamic loading
+
+Dynamic linking is the mechanism of locating and loading shared
+libraries whose list is known when the program is built: every library
+on that list needs to be found and loaded before the application can
+start executing. I have already argued that dynamic linking trades
+severe drawbacks, such as inducing sub-optimal library interfaces, for
+minor gains.
+=> ./2021-11-04.1.gmi
+
+Dynamic loading is a different story. It is the mechanism for loading
+shared libraries at runtime, where the set of libraries is not known
+beforehand. With dynamic loading, failure to load a shared library is
+an ordinary error reported by an ordinary library function (usually
+dlopen(3)) and handled by the application; no magic involved. Even in
+its most basic form, dynamic loading carries a lot of extra complexity
+compared to static linking, yet sometimes it is the least of evils.
+
+## Purpose of and alternatives to dynamic loading
+
+Say we have a high-level interpreted programming language, and we want
+to write some functionality in C, be it for performance reasons or to
+expose the functionality of some C library.
+
+One way to achieve this is to compile the necessary C libraries into
+the interpreter itself. Since there are so many C libraries around, and
+everybody needs their own subset of them, we face a dilemma: either
+compile every C library in existence into the interpreter (not
+realistic) or make the build system flexible about which libraries are
+compiled in. The latter is definitely possible (the Linux kernel has
+thousands of configuration options), but it causes an exponential
+explosion in the number of ways the software project can be built. Not
+good.
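+
+Here is a sketch of the compile-in approach. The module names and build flags (WITH_GDBM, WITH_EXPAT) are hypothetical. Each optional entry is guarded by its own flag, so n such flags give 2^n distinct builds of the interpreter:
+
+```c
+#include <stddef.h>
+#include <string.h>
+
+/* Always-present module. */
+static int core_init(void) { return 0; }
+
+/* Optional modules, declared only when their (hypothetical) flag is set. */
+#ifdef WITH_GDBM
+int gdbm_module_init(void);
+#endif
+#ifdef WITH_EXPAT
+int expat_module_init(void);
+#endif
+
+struct builtin {
+    const char *name;
+    int (*init)(void);
+};
+
+static const struct builtin builtins[] = {
+    { "core", core_init },
+#ifdef WITH_GDBM
+    { "gdbm", gdbm_module_init },
+#endif
+#ifdef WITH_EXPAT
+    { "expat", expat_module_init },
+#endif
+};
+
+/* Find a compiled-in module; NULL means this build simply lacks it. */
+const struct builtin *find_builtin(const char *name)
+{
+    for (size_t i = 0; i < sizeof builtins / sizeof builtins[0]; i++)
+        if (strcmp(builtins[i].name, name) == 0)
+            return &builtins[i];
+    return NULL;
+}
+```
+
+A build without any flags only knows about "core". Asking it for "gdbm" fails, even though the source tree supports that module.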
+
+The alternative solution is dynamic loading. Instead of rebuilding the
+interpreter, additional shared libraries can be downloaded and loaded
+into the memory of the interpreter. Debugging stays a worst-case
+O(2^n) problem, since each loaded shared library can potentially clash
+with any other, but at least we don't have O(2^n) versions of the
+interpreter floating around. Win? Probably...
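+
+With dynamic loading, the interpreter usually derives both a file name and an entry-point symbol from the module name. Python (PyInit_<name>) and Lua (luaopen_<name>) follow conventions of this kind. The convention below is hypothetical, including the module_init_ prefix and the module_location function. The resulting path and symbol would then be passed to dlopen(3) and dlsym(3):
+
+```c
+#include <stdio.h>
+#include <string.h>
+
+/* Hypothetical convention: module "foo" is looked up as "<dir>/foo.so"
+   and must export "module_init_foo". Returns 0 on success, -1 on a bad
+   module name or a buffer that is too small. */
+int module_location(const char *dir, const char *module,
+                    char *path, size_t path_len,
+                    char *symbol, size_t symbol_len)
+{
+    /* Refuse empty names and names that could escape `dir`. */
+    if (module[0] == '\0' || strchr(module, '/') != NULL)
+        return -1;
+    int a = snprintf(path, path_len, "%s/%s.so", dir, module);
+    int b = snprintf(symbol, symbol_len, "module_init_%s", module);
+    if (a < 0 || (size_t)a >= path_len || b < 0 || (size_t)b >= symbol_len)
+        return -1;
+    return 0;
+}
+```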
+
+## Challenges of dynamic loading
+
+Dynamic linking and dynamic loading have different purposes, but they
+use the same underlying mechanism and the same file format, ELF, so
+all the complicated stuff from ld.so(8) applies equally to both. Well,
+that complexity comes from supporting the full flexibility of the ELF
+format; maybe if we ask for less, things would be simpler? Let's
+consider a couple of scenarios of ascending complexity.
+
+### Function that only uses its arguments
+
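+```c
+#include <stdint.h>
+
+/* Everything the function may use comes in through its arguments,
+   including the callback supplied by the host. */
+struct context {
+    void (*callback)(struct context *ctx, uint64_t x, uint64_t count);
+};
+
+/* Counts Collatz steps from x (which must be nonzero) down to 1,
+   reporting every step through ctx->callback. */
+uint64_t function(struct context *ctx, uint64_t x)
+{
+    uint64_t count = 0;
+    while (x != 1) {
+        ctx->callback(ctx, x, count);
+        x = (x % 2) ? (3 * x + 1) : (x / 2);
+        count++;
+    }
+    return count;
+}
+```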
+
+If a shared library contains a function that only uses its arguments,
+dynamically loading and executing that function is quite easy. One
+just needs to map the .text section of the library with read and
+execute permissions and call the function at the offset found in the
+.symtab section. If there is only one function, one can even arrange
+for that offset to be zero.
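+
+Finding that offset is mostly bookkeeping. Below is a minimal sketch that assumes a 64-bit ELF file and glibc's <elf.h>. The helpers read_file and elf_find_symbol are hypothetical, and a real loader would validate every offset it follows:
+
+```c
+#include <elf.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* Read a whole file into a malloc'ed buffer; returns NULL on failure. */
+unsigned char *read_file(const char *path, size_t *size)
+{
+    FILE *f = fopen(path, "rb");
+    if (f == NULL)
+        return NULL;
+    unsigned char *buf = NULL;
+    size_t len = 0, cap = 0, n;
+    do {
+        if (len == cap) {
+            cap = cap ? 2 * cap : 65536;
+            unsigned char *tmp = realloc(buf, cap);
+            if (tmp == NULL) { free(buf); fclose(f); return NULL; }
+            buf = tmp;
+        }
+        n = fread(buf + len, 1, cap - len, f);
+        len += n;
+    } while (n > 0);
+    fclose(f);
+    *size = len;
+    return buf;
+}
+
+/* Find `name` in the .symtab of an in-memory ELF64 image and store its
+   st_value. In a shared object st_value is relative to the load base;
+   subtract the sh_addr of .text to get an offset into .text.
+   Sketch only: the image is trusted beyond a few basic checks. */
+int elf_find_symbol(const unsigned char *image, size_t size,
+                    const char *name, uint64_t *value)
+{
+    if (size < sizeof(Elf64_Ehdr) || memcmp(image, ELFMAG, SELFMAG) != 0 ||
+        image[EI_CLASS] != ELFCLASS64)
+        return -1;
+    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)image;
+    if (eh->e_shoff + (uint64_t)eh->e_shnum * sizeof(Elf64_Shdr) > size)
+        return -1;
+    const Elf64_Shdr *sh = (const Elf64_Shdr *)(image + eh->e_shoff);
+    for (size_t i = 0; i < eh->e_shnum; i++) {
+        /* A symbol table's sh_link is the index of its string table. */
+        if (sh[i].sh_type != SHT_SYMTAB || sh[i].sh_link >= eh->e_shnum)
+            continue;
+        const Elf64_Sym *syms = (const Elf64_Sym *)(image + sh[i].sh_offset);
+        const char *names = (const char *)(image + sh[sh[i].sh_link].sh_offset);
+        for (size_t j = 0; j < sh[i].sh_size / sizeof(Elf64_Sym); j++) {
+            if (strcmp(names + syms[j].st_name, name) == 0) {
+                *value = syms[j].st_value;
+                return 0;
+            }
+        }
+    }
+    return -1;
+}
+```
+
+Note that .symtab is optional, and strip(1) removes it. dlsym(3) uses the .dynsym section instead, which survives stripping.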
+
+### Function that uses data and constant data
+
+Since a dynamically loaded function can't know at compile time at
+what address it will be mapped, all references must be compiled
+relative to the current instruction address. This is called
+position-independent code. Position-independent code can be bigger and
+slower than position-dependent code (notably on 32-bit x86, which has
+no instruction-relative data addressing), but there is no way around
+it. Also, to use constant data, like string literals, and global
+variables, one needs to map the .rodata and .data sections
+respectively. They must be mapped at the same distances from .text as
+the linker laid them out, since those distances are baked into the
+instructions. Not that hard.
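+
+Here is a small illustration of the data side. Section placement and code generation are up to the compiler, so treat the comments as typical rather than guaranteed. With GCC or Clang on x86-64 and -fPIC, both file-local objects below are usually addressed relative to %rip. An exported (non-static) global would instead typically be reached through the global offset table, a slot the loader has to fill in:
+
+```c
+/* Typically placed in .rodata: a constant array. */
+static const char greeting_text[] = "hello";
+
+/* Typically placed in .data: an initialized, writable object.
+   A zero-initialized one would usually go to .bss instead. */
+static int next_value = 100;
+
+/* Both objects are file-local (static), so under -fPIC the compiler
+   can address them relative to the instruction pointer. */
+const char *greeting(void)
+{
+    return greeting_text;
+}
+
+int take_next(void)
+{
+    return next_value++;
+}
+```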
+
+### Data relocations
+
+Things get more complicated if we have pointers inside data, as in the
+following:
+
+```c
+const char *strings[] = {"foo", "bar", NULL};
+```
+
+We can find the array relative to the current instruction, sure, but
+what is inside this array? Right: addresses that we don't know at
+compile time. So the dynamic loader has to populate this array with
+the addresses of the strings in .rodata at run time. This is called a
+data relocation. Can we avoid it?
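+
+For pointers to objects inside the same shared object, the linker on x86-64 typically emits R_X86_64_RELATIVE relocations. Each one tells the loader to store base + addend at base + offset. Below is a sketch that applies only that one type. The name apply_relative_relocs is hypothetical. A real loader such as ld.so handles many more types and finds the table through the DT_RELA entries of the dynamic section:
+
+```c
+#include <elf.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Apply R_X86_64_RELATIVE relocations to an object mapped at `base`
+   (x86-64 only). Each entry says: the 8-byte word at base + r_offset
+   must hold base + r_addend. Returns the number of entries applied, or
+   -1 at the first unsupported type (earlier entries stay applied). */
+long apply_relative_relocs(unsigned char *base, const Elf64_Rela *rela,
+                           size_t count)
+{
+    for (size_t i = 0; i < count; i++) {
+        if (ELF64_R_TYPE(rela[i].r_info) != R_X86_64_RELATIVE)
+            return -1;
+        uint64_t target = (uint64_t)(uintptr_t)base + (uint64_t)rela[i].r_addend;
+        /* memcpy avoids assuming the slot is suitably aligned */
+        memcpy(base + rela[i].r_offset, &target, sizeof target);
+    }
+    return (long)count;
+}
+```
+
+For the strings array above, the linker would typically emit one such entry per non-NULL element, since each element holds the address of a literal in the same object.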
+
+Yes, we can. We can store offsets instead of pointers:
+
+```c
+const char strtab[] = "\0foo\0bar\0"; /* an array, not a pointer */
+const long strings[] = {1, 5, -1};    /* offsets into strtab; -1 ends the list */
+```
+
+And whenever we need a string, we use "strtab + strings[i]" instead. It
+is a bit more verbose and requires some build automation to generate
+the array of offsets, but now everything is back in .rodata and no
+relocations are necessary.
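+
+Here is a runnable version of the offset table, sketched with hypothetical helper names. Note that strtab is declared as an array. A const char * variable initialized with a literal is itself a pointer, and it would need a relocation just like the strings array above:
+
+```c
+#include <stddef.h>
+#include <string.h>
+
+/* Offsets into strtab; -1 terminates the list. Both arrays are constant
+   and pointer-free, so they can live in .rodata with no relocations.
+   In practice a build step would generate them. */
+static const char strtab[] = "\0foo\0bar\0";
+static const long strings[] = {1, 5, -1};
+
+/* Return the i-th string, or NULL if i is out of range. */
+const char *string_at(size_t i)
+{
+    size_t count = sizeof strings / sizeof strings[0] - 1; /* skip the -1 */
+    return i < count ? strtab + strings[i] : NULL;
+}
+
+/* Return the index of s, or -1 if it is not in the table. */
+long string_index(const char *s)
+{
+    for (long i = 0; strings[i] >= 0; i++)
+        if (strcmp(strtab + strings[i], s) == 0)
+            return i;
+    return -1;
+}
+```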
+
+### Procedure linkage table
+
+Now suppose we are trying to dynamically load a shared object that has
+some third-party library, like libgdbm or libexpat, compiled in. Let's
+even assume that this third-party library does not need data
+relocations. We can still be almost sure that it will call some
+functions from the standard C library, like read(2) or malloc(3), and
+that it calls them directly, since that is the most natural and
+efficient way with static linking.
+
+In an ideal world all libraries would be minimalistic and only work
+with data provided in input buffers, but in our world that is not the
+case. Actually, I am not sure it is even possible to design an
+equivalent of libcurl in a minimalistic way.
+=> https://nullprogram.com/blog/2018/06/10
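+
+For contrast, here is a sketch of the minimalistic style, in the spirit of the context callback in the first example. The host hands the plugin everything it may use, including the allocator. All names here (struct host, plugin_dup, host_alloc, host_release) are hypothetical. Even this is not airtight, as the comment on plugin_dup explains:
+
+```c
+#include <stddef.h>
+#include <stdlib.h>
+
+/* Hypothetical host interface: every service the plugin may use is
+   passed in explicitly, so the plugin imports nothing by name. */
+struct host {
+    void *(*alloc)(void *user, size_t size);
+    void (*release)(void *user, void *ptr);
+    void *user; /* opaque state for the host's callbacks */
+};
+
+/* Plugin side: copy a buffer using only the host's allocator. Caveat:
+   compilers may still turn loops like this into calls to memcpy; GCC
+   documents that it expects memcpy, memmove, memset and memcmp to be
+   available even in freestanding code. */
+char *plugin_dup(const struct host *h, const char *src, size_t len)
+{
+    char *dst = h->alloc(h->user, len + 1);
+    if (dst == NULL)
+        return NULL;
+    for (size_t i = 0; i < len; i++)
+        dst[i] = src[i];
+    dst[len] = '\0';
+    return dst;
+}
+
+/* Host side: thin adapters around malloc/free that count live blocks. */
+static void *host_alloc(void *user, size_t size)
+{
+    void *p = malloc(size);
+    if (p != NULL)
+        ++*(size_t *)user;
+    return p;
+}
+
+static void host_release(void *user, void *ptr)
+{
+    if (ptr != NULL)
+        --*(size_t *)user;
+    free(ptr);
+}
+```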
+
+The problem is that we don't know the addresses of standard library
+functions either. So the loaded shared object includes an array of
+addresses for them (the global offset table, which calls reach through
+small stubs in the procedure linkage table). It also carries directives
+telling the dynamic loader which function goes into which slot, and
+"which" is answered by function name. To make that possible, the main
+executable must maintain a mapping from every function name in the
+standard library to its address. We might as well include every
+function from other libraries linked into the main executable too,
+since the price is already paid. This is quite an involved process,
+and now the length of function names actually affects the size of the
+executable.
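+
+The name-based resolution can be sketched in a few lines. This is a toy model with hypothetical table layouts. In real ELF objects the names live in .dynsym and .dynstr, lookup goes through a hash table (.gnu.hash), and the filled slots are GOT entries reached via PLT stubs. Still, the sketch shows why every exported name has to be stored in the executable as a string:
+
+```c
+#include <stddef.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* Generic function pointer type; callers cast back before calling. */
+typedef void (*fnptr)(void);
+
+/* Hypothetical export table kept by the main executable. */
+struct host_symbol {
+    const char *name;
+    fnptr addr;
+};
+
+/* Hypothetical import entry in the loaded object: a name and the slot
+   that calls will go through. */
+struct import_slot {
+    const char *name;
+    fnptr *slot;
+};
+
+static const struct host_symbol host_exports[] = {
+    { "malloc", (fnptr)malloc },
+    { "free", (fnptr)free },
+    { "strlen", (fnptr)strlen },
+};
+
+/* Fill every slot by name using a linear search. Returns -1 when all
+   imports are resolved, otherwise the index of the first missing one. */
+long resolve_imports(const struct import_slot *imports, size_t n,
+                     const struct host_symbol *exports, size_t m)
+{
+    for (size_t i = 0; i < n; i++) {
+        size_t j = 0;
+        while (j < m && strcmp(imports[i].name, exports[j].name) != 0)
+            j++;
+        if (j == m)
+            return (long)i;
+        *imports[i].slot = exports[j].addr;
+    }
+    return -1;
+}
+```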
+
+The only alternative to this process is to compile the standard
+library into the shared object itself, but that would mean the process
+has multiple copies of standard library functions in memory at the
+same time. Plus, we now need to somehow ensure that the main
+application and the shared object have compatible implementations of
+malloc(3) compiled in, since memory allocated by one copy may end up
+being freed by the other. Sounds like an even bigger mess.
+
+## Conclusion
+
+When I started writing this post, I wanted to make the point that with
+a bit of coding discipline we can drastically reduce the complexity of
+dynamic loading. While doing the research, I realized that it is not
+true.
+
+While the interface of dlopen(3) could probably be made more
+minimalistic, dynamically loading code that uses functions from the
+standard library is fundamentally hard.