d5f25bc4301bc52e767bcf6cf42c75ae6d543fb2 — Dmitry Bogatov 2 years ago 97fc9e3
draft-1

A src/2022-12-06.1.gmi
# Dynamic linking and dynamic loading

Dynamic linking is the mechanism of locating and loading shared
libraries whose list is known at compile time; every library on that
list must be found and loaded before the application can start
executing. I have already argued that dynamic linking trades severe
drawbacks, such as inducing sub-optimal library interfaces, for minor
gains.
=> ./2021-11-04.1.gmi

Dynamic loading is a different story. It is a mechanism for loading
shared libraries at runtime, where the set of libraries is not known
beforehand. With dynamic loading, failure to load a shared library is
an ordinary error, reported by an ordinary library function (usually
"dlopen") and handled by the application; no magic involved. Even in
its most basic form, dynamic loading bears a lot of extra complexity
compared to static linking, yet sometimes it is the least of evils.

## Purpose of dynamic loading and its alternative

Say we have a high-level interpreted programming language, and we
want to write some functionality in C, be it for performance reasons
or to expose the functionality of some C library.

One way to achieve this is to compile the necessary C libraries into
the interpreter itself. Since there are so many C libraries around,
and everybody needs their own subset of them, we face a dilemma --
either compile every C library in existence into the interpreter (not
realistic) or make the build system flexible about which libraries
are compiled in. The latter is definitely possible -- the Linux kernel
has hundreds of configuration options -- but it causes an exponential
explosion of the ways the software project can be built: n optional
libraries give 2^n possible builds. Not good.
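For comparison, compiling libraries into the interpreter usually ends up as a table of built-in modules guarded by preprocessor switches. This is a hypothetical sketch; WITH_ZLIB, WITH_SQLITE and the module functions are invented for illustration:

```c
#include <stddef.h>
#include <string.h>

/* A built-in module: the name the interpreter exposes and its entry point. */
struct builtin {
	const char *name;
	int (*init)(void);
};

static int core_init(void) { return 0; }
#ifdef WITH_ZLIB   /* hypothetical build switch */
static int zlib_init(void) { return 0; }
#endif
#ifdef WITH_SQLITE /* hypothetical build switch */
static int sqlite_init(void) { return 0; }
#endif

/* The set of modules is frozen at compile time: adding one means rebuilding. */
static const struct builtin builtins[] = {
	{ "core", core_init },
#ifdef WITH_ZLIB
	{ "zlib", zlib_init },
#endif
#ifdef WITH_SQLITE
	{ "sqlite", sqlite_init },
#endif
};

/* Return the built-in module called `name`, or NULL if it was not compiled in. */
const struct builtin *find_builtin(const char *name)
{
	for (size_t i = 0; i < sizeof builtins / sizeof builtins[0]; i++)
		if (strcmp(builtins[i].name, name) == 0)
			return &builtins[i];
	return NULL;
}
```

The switches are independent, so two of them already give four different interpreters (neither, -DWITH_ZLIB, -DWITH_SQLITE, both), and n of them give 2^n.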

An alternative solution is dynamic loading. Instead of rebuilding the
interpreter, additional shared libraries can be downloaded and loaded
into the memory of the interpreter. Debugging stays a worst-case
O(2^n) problem, since each loaded shared library can potentially
clash with any other, but at least we don't have O(2^n) versions of
the interpreter floating around. Win? Probably...
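With dynamic loading, the interpreter instead fixes one entry point that every extension must export. Here is a sketch under a made-up convention: each extension exports a descriptor named ext_module carrying an ABI version, which the interpreter checks before calling into the library. All names here are hypothetical:

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical ABI version; extensions built against another one are refused. */
#define EXT_ABI_VERSION 1

/* Hypothetical descriptor each extension exports under the symbol "ext_module". */
struct ext_module {
	int abi_version;
	const char *name;
	int (*init)(void);
};

/* Refuse a descriptor from an incompatible build before calling into it. */
int ext_validate(const struct ext_module *m, char *err, size_t errlen)
{
	if (m->abi_version != EXT_ABI_VERSION) {
		snprintf(err, errlen, "ABI version %d, expected %d",
			 m->abi_version, EXT_ABI_VERSION);
		return -1;
	}
	return 0;
}

/* Load the extension at `path` and run its init function.
 * Returns the descriptor, or NULL with a message in `err`. */
const struct ext_module *ext_load(const char *path, char *err, size_t errlen)
{
	void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
	if (handle == NULL) {
		snprintf(err, errlen, "%s", dlerror());
		return NULL;
	}
	const struct ext_module *m = dlsym(handle, "ext_module");
	if (m == NULL) {
		snprintf(err, errlen, "%s: no ext_module symbol", path);
	} else if (ext_validate(m, err, errlen) == 0) {
		if (m->init != NULL && m->init() == 0)
			return m; /* keep the handle open while the module is in use */
		snprintf(err, errlen, "%s: init failed", path);
	}
	dlclose(handle);
	return NULL;
}
```

An extension built against the wrong ABI is refused with a message, the same way a missing file is.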

## Challenges of dynamic loading

Dynamic linking and dynamic loading have different purposes, but they
use the same underlying mechanism and the same file format -- ELF, so
all the complicated stuff from ld.so(8) applies equally to dynamic
linking and dynamic loading. Well, that complexity comes from
supporting the full flexibility of the ELF format; maybe if we ask for
less, it would be simpler? Let's consider a couple of scenarios of
ascending complexity.

### Function that only uses its arguments

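```c
#include <stdint.h>

struct context {
	/* Reports progress to the caller; reached only through ctx. */
	void (*callback)(struct context *ctx, uint64_t x, uint64_t count);
};

/* Counts Collatz steps from x down to 1 (x must not be zero).
 * Touches nothing but its arguments: no globals, no string literals. */
uint64_t function(struct context *ctx, uint64_t x)
{
	uint64_t count = 0;
	while (x != 1) {
		ctx->callback(ctx, x, count);
		x = (x % 2) ? (3 * x + 1) : (x / 2);
		count++;
	}
	return count;
}
```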

If we have a shared library with a function that only uses its
arguments, dynamically loading and executing that function is quite
easy. One just needs to map the .text section of the library in R+X
mode and call the function at the offset found in the .symtab section.
If there is only one function, one can even assume that its offset is
zero.
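Finding that offset could look like the sketch below, which reads an ELF file and searches its .symtab using the structures from <elf.h>. It assumes a 64-bit, native-endian, well-formed file and does only basic bounds checks. Two caveats: strip(1) removes .symtab, while a shared library keeps the dynamic symbol table .dynsym with the same entry layout; and in a shared library st_value is an address relative to the load base rather than to the start of .text, while in a relocatable object it is the offset within the symbol's section.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a malloc'ed buffer; returns NULL on failure. */
static unsigned char *read_file(const char *path, size_t *len)
{
	FILE *f = fopen(path, "rb");
	if (f == NULL)
		return NULL;
	unsigned char *buf = NULL;
	long n = -1;
	if (fseek(f, 0, SEEK_END) == 0)
		n = ftell(f);
	if (n > 0 && fseek(f, 0, SEEK_SET) == 0 && (buf = malloc((size_t)n)) != NULL) {
		if (fread(buf, 1, (size_t)n, f) == (size_t)n) {
			*len = (size_t)n;
		} else {
			free(buf);
			buf = NULL;
		}
	}
	fclose(f);
	return buf;
}

/* Scan every SHT_SYMTAB section for `name`; its string table is sh_link. */
static int find_in_symtab(const unsigned char *buf, size_t len,
			  const Elf64_Ehdr *eh, const char *name, uint64_t *value)
{
	const Elf64_Shdr *sh = (const Elf64_Shdr *)(buf + eh->e_shoff);
	for (size_t i = 0; i < eh->e_shnum; i++) {
		if (sh[i].sh_type != SHT_SYMTAB || sh[i].sh_link >= eh->e_shnum)
			continue;
		const Elf64_Shdr *str = &sh[sh[i].sh_link];
		if (sh[i].sh_offset + sh[i].sh_size > len ||
		    str->sh_offset + str->sh_size > len)
			continue;
		const Elf64_Sym *syms = (const Elf64_Sym *)(buf + sh[i].sh_offset);
		const char *names = (const char *)(buf + str->sh_offset);
		for (size_t j = 0; j < sh[i].sh_size / sizeof(Elf64_Sym); j++) {
			if (syms[j].st_name < str->sh_size &&
			    strcmp(names + syms[j].st_name, name) == 0) {
				*value = syms[j].st_value;
				return 0;
			}
		}
	}
	return -1;
}

/* Find `name` in the .symtab of the 64-bit ELF file at `path` and store
 * its st_value. Returns 0 on success, -1 otherwise. */
int elf_symtab_lookup(const char *path, const char *name, uint64_t *value)
{
	size_t len = 0;
	unsigned char *buf = read_file(path, &len);
	if (buf == NULL)
		return -1;
	const Elf64_Ehdr *eh = (const Elf64_Ehdr *)buf;
	int rc = -1;
	if (len >= sizeof *eh && memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
	    eh->e_ident[EI_CLASS] == ELFCLASS64 &&
	    eh->e_shoff + (uint64_t)eh->e_shnum * sizeof(Elf64_Shdr) <= len)
		rc = find_in_symtab(buf, len, eh, name, value);
	free(buf);
	return rc;
}
```

On Linux, elf_symtab_lookup("/proc/self/exe", "main", &off) looks up main in the running executable, provided it has not been stripped.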

### Function that uses data and constant data

Since a dynamically loaded function can't know at compile time at
what address it will be mapped, all references must be compiled
relative to the current instruction address. This is called
position-independent code. Position-independent code can be bigger
and slower than regular code, especially on architectures without
PC-relative data addressing, such as 32-bit x86, but there is no way
around it. Also, to use constant data, like string literals, and
global variables, one needs to map the .rodata and .data sections,
respectively, and to keep them at the same distance from .text as at
link time, since that distance is baked into the relative references.
Not that hard.
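A sketch of that mapping step follows. It assumes the pieces have already been pulled out of the file into a hypothetical struct segment array, with link-time addresses starting at zero; a real loader would take them from the PT_LOAD program headers instead. Everything goes into one anonymous mapping so that relative distances survive, and then each piece gets its final permissions with mprotect(2):

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical description of one loadable piece (.text, .rodata, .data/.bss). */
struct segment {
	uintptr_t vaddr;  /* link-time address; the image is assumed to start at 0 */
	const void *data; /* bytes taken from the file */
	size_t filesz;    /* how many bytes `data` holds */
	size_t memsz;     /* size in memory; bytes past filesz stay zero (.bss) */
	int prot;         /* final PROT_READ / PROT_WRITE / PROT_EXEC flags */
};

/* Place every segment at base + vaddr, so the distance between any two
 * segments is the same as at link time. Assumes no two segments share a
 * page. Returns the base address, or NULL on failure. */
unsigned char *map_segments(const struct segment *segs, size_t n)
{
	uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
	uintptr_t end = 0;
	for (size_t i = 0; i < n; i++)
		if (segs[i].vaddr + segs[i].memsz > end)
			end = segs[i].vaddr + segs[i].memsz;
	size_t span = (end + page - 1) & ~(page - 1);
	if (span == 0)
		return NULL;

	/* One anonymous mapping: zero-filled and writable while we copy. */
	unsigned char *base = mmap(NULL, span, PROT_READ | PROT_WRITE,
				   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;
	for (size_t i = 0; i < n; i++)
		if (segs[i].filesz > 0)
			memcpy(base + segs[i].vaddr, segs[i].data, segs[i].filesz);

	/* Gaps become inaccessible; each segment gets its own permissions,
	 * rounded out to whole pages. */
	if (mprotect(base, span, PROT_NONE) != 0)
		goto fail;
	for (size_t i = 0; i < n; i++) {
		uintptr_t start = segs[i].vaddr & ~(page - 1);
		uintptr_t stop = (segs[i].vaddr + segs[i].memsz + page - 1) & ~(page - 1);
		if (stop > start && mprotect(base + start, stop - start, segs[i].prot) != 0)
			goto fail;
	}
	return base;
fail:
	munmap(base, span);
	return NULL;
}
```

Because the whole image is one mapping, the kernel is free to choose the base address; only the offsets inside the mapping have to match what the code was compiled against.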

### Data relocations