a80f991c1012ca0b694ca354652a89d1ca59cc9e — Dmitry Bogatov 11 months ago 97fc9e3 + e884047
Merge branch 'wip/dynamic-loading'

* wip/dynamic-loading:
1 files changed, 148 insertions(+), 0 deletions(-)

A src/2022-12-06.1.gmi
A src/2022-12-06.1.gmi => src/2022-12-06.1.gmi +148 -0
@@ 0,0 1,148 @@
# Dynamic linking and dynamic loading

Dynamic linking is the mechanism of locating and loading shared
libraries whose list is known at compile time; every library on that
list needs to be found and loaded before the application can start
executing. I already argued that dynamic linking accepts severe
drawbacks, such as inducing sub-optimal library interfaces, in exchange
for minor gains.
=> ./2021-11-04.1.gmi

Dynamic loading is a different story. It is a mechanism for loading
shared libraries at runtime, where the set of libraries is not known
beforehand. With dynamic loading, failure to load a shared library is
an ordinary error reported by an ordinary library function (usually
"dlopen") and handled by the application, no magic involved. Even in
its most basic form, dynamic loading bears a lot of extra complexity
compared to static linking, yet sometimes it is the least of evils.
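
To illustrate the "ordinary error" point, here is a minimal sketch
around dlopen(3) and dlerror(3). The wrapper name "load_library" and
its error-reporting convention are my own choice for illustration,
not part of any library.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * Try to load a shared library. On failure return NULL and store a
 * human-readable reason in *err; the caller decides what to do next.
 */
void *load_library(const char *path, const char **err)
{
	assert(err != NULL);
	*err = NULL;
	void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
	if (handle == NULL)
		*err = dlerror();
	return handle;
}
```

An interpreter could turn a NULL result into a regular exception of
the scripting language and keep running. On older glibc versions the
program has to be linked with -ldl.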

## Purpose and alternative of dynamic loading

Say we have a high-level interpreted programming language, and we want
to write some functionality in C, be it for performance reasons or to
expose functionality of some C library.

One way to achieve it is to compile the necessary C libraries into the
interpreter itself. Since there are so many C libraries around, and
everybody needs their own subset of them, we face a dilemma -- either
compile every C library in existence into the interpreter (not
realistic) or make the build system flexible about what libraries are
compiled in. The latter is definitely possible -- the Linux kernel has
hundreds of configuration options -- but it causes an exponential
explosion in the number of ways the project can be built. Not good.

The alternative solution is dynamic loading. Instead of rebuilding the
interpreter, additional shared libraries can be downloaded and loaded
into the memory of the interpreter. Debugging stays a worst-case
O(2^n) problem, since each loaded shared library can potentially clash
with any other, but at least we don't have O(2^n) versions of the
interpreter floating around. Win? Probably...

## Challenges of dynamic loading

Dynamic linking and dynamic loading have different purposes, but they
use the same underlying mechanism and the same file format -- ELF, so
all the complicated stuff from ld.so(8) applies equally to dynamic
linking and dynamic loading. Well, that is about supporting the full
flexibility of the ELF format; maybe if we ask for less it would be
simpler? Let's consider a couple of scenarios of ascending complexity.

### Function that only uses its arguments

/* Collatz iteration: report every step through the caller's callback. */
int function(struct context *ctx, uint64_t x)
{
	uint64_t count = 0;
	while (x != 1) {
		ctx->callback(ctx, x, count);
		x = (x % 2) ? (3 * x + 1) : (x / 2);
		count++;
	}
	return count;
}

If we have a shared library with a function that only uses its
arguments, dynamically loading and executing that function is quite
easy. One just needs to map the .text section of the library in R+X
mode and call the function at the offset that can be found in the
.symtab section. If there is only one function, one can even assume
that the offset is zero.
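
The mapping step can be sketched with plain mmap(2). This is only a
sketch under assumptions: the caller already knows the file offset and
length of .text (for instance from the section headers), the offset is
page-aligned, and "map_text" is a hypothetical helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map `len` bytes of `path`, starting at page-aligned `offset`, as
 * readable and executable. Returns NULL on any failure.
 */
void *map_text(const char *path, off_t offset, size_t len)
{
	assert(len > 0);
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return NULL;
	void *base = mmap(NULL, len, PROT_READ | PROT_EXEC,
			  MAP_PRIVATE, fd, offset);
	close(fd); /* the mapping stays valid after close */
	return base == MAP_FAILED ? NULL : base;
}
```

With the mapping in hand, calling the function is a cast of
"(char *)base + sym_off" to the right function pointer type, where
"sym_off" stands for the function's offset inside the mapping.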

### Function that uses data and constant data

Since a dynamically loaded function can't know at compilation time at
what address it will be mapped, all references must be compiled as
relative to the current instruction address. This is called
position-independent code. Position-independent code is usually bigger
and slower than regular code, but there is no way around it. Also, to
use constant data, like string literals, and global variables, one
would need to map the .rodata and .data sections respectively. Not that
hard.

### Data relocations

Things get more complicated if we have pointers inside data, like in

const char *strings[] = {"foo", "bar", NULL};

We can find the array relative to the current instruction, sure, but
what is inside this array? Right, addresses that we don't know at
compile time. So the dynamic loader has to populate this array with the
addresses of the strings in .rodata at run-time. This is called data
relocation. Can we avoid it?

Yes, we can. We can store offsets instead of pointers:

const char strtab[] = "\0foo\0bar\0";
const long strings[] = {1, 5, -1};

And whenever we need a string, we use "strtab + strings[i]" instead.
Note that strtab is declared as an array: a pointer variable would
itself hold an address and bring the relocation back. This is a bit
more verbose and requires some build automation to generate the array
of offsets, but now everything is back in .rodata and no relocations
are necessary.
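
A compilable sketch of the same offset table, with two small
accessors. The helper names "string_at" and "find_string" are made up
for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An array, not a pointer: a pointer would itself need a relocation. */
static const char strtab[] = "\0foo\0bar";
/* Offsets into strtab; -1 terminates the list, like NULL did before. */
static const long strings[] = {1, 5, -1};

/* Return the i-th string, or NULL once the terminator is reached. */
const char *string_at(size_t i)
{
	assert(i < sizeof strings / sizeof strings[0]);
	return strings[i] < 0 ? NULL : strtab + strings[i];
}

/* Index of `s` in the table, or -1 if it is absent. */
long find_string(const char *s)
{
	for (size_t i = 0; strings[i] >= 0; i++)
		if (strcmp(strtab + strings[i], s) == 0)
			return (long)i;
	return -1;
}
```

Callers iterate with "string_at(i)" until it returns NULL, exactly as
they would walk the NULL-terminated pointer array.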

### Procedure linkage table

Now suppose we are trying to dynamically load a shared object that has
some third-party library, like libgdbm or libexpat, compiled in. And
let's even assume that this third-party library does not need data
relocations. But we can be almost sure that it will call some
functions from the standard C library, like read(2) or malloc(3), and
do it directly, since that is the most natural and efficient way for
static linking.

In an ideal world all libraries would be minimalistic and only work
with data provided by input buffers, but in our world that is not the
case. Actually, I am not sure if it is even possible to design an
equivalent of "libcurl" in a minimalistic way.
=> https://nullprogram.com/blog/2018/06/10

The problem is that we don't know the addresses of functions in the
standard library either, so the loaded shared object includes an array
of addresses for standard library functions, together with directives
telling the dynamic loader which functions to put there and in which
order. The "which" is answered by function name. To make that
possible, the main executable must maintain a mapping from every
function name in the standard library to its address. We might as well
include every function of the other libraries linked into the main
executable too, since the price is already paid. This is quite an
involved process, and now the length of a function name actually
affects the size of the executable.
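
The bookkeeping described above can be sketched in a few lines. All
names here ("struct export", "resolve", "bind_imports") are
hypothetical, and the linear scan is a simplification: real ELF
loaders look names up through hashed symbol tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef void (*fn_ptr)(void); /* generic function pointer */

/* Name-to-address table kept by the main executable. */
struct export {
	const char *name;
	fn_ptr addr;
};

static const struct export exports[] = {
	{"malloc", (fn_ptr)malloc},
	{"free", (fn_ptr)free},
	{"read", (fn_ptr)read},
};

/* Resolve one name; NULL if the executable does not export it. */
fn_ptr resolve(const char *name)
{
	for (size_t i = 0; i < sizeof exports / sizeof exports[0]; i++)
		if (strcmp(exports[i].name, name) == 0)
			return exports[i].addr;
	return NULL;
}

/* Fill the object's address array in the order its import list says. */
int bind_imports(const char *const *names, fn_ptr *slots, size_t n)
{
	assert(names != NULL && slots != NULL);
	for (size_t i = 0; i < n; i++) {
		slots[i] = resolve(names[i]);
		if (slots[i] == NULL)
			return -1; /* unresolved symbol: loading fails */
	}
	return 0;
}
```

The loaded code then calls through "slots[i]" instead of calling the
function directly, which is the extra indirection the procedure
linkage table exists for.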

The only alternative to this process is to compile the standard library
into the shared object itself, but that would mean that the executable
will have multiple copies of standard library functions in memory at
the same time. Plus, now we need to somehow ensure that the main
application and the shared object have compatible implementations of
malloc(3) compiled in. Sounds like an even bigger mess.

## Conclusion

When I started writing this post, I wanted to make a point that with a
bit of coding discipline we can drastically reduce complexity of
dynamic loading. While doing research I realized that it is not true.

While the interface of dlopen(3) could probably be made more
minimalistic, dynamically loading code that uses functions from the
standard library is fundamentally hard.