~williamvds/website

b8c6c27577c504c7f8bf552d052e707cd4357bcf — williamvds a month ago 5d1ddb5
Add link for ELF, weak clarifications
1 files changed, 20 insertions(+), 10 deletions(-)

M content/blog/linking-by-example/index.md
M content/blog/linking-by-example/index.md => content/blog/linking-by-example/index.md +20 -10
@@ 166,8 166,8 @@ variables because they live in the _data_ section.

As this suggests, an object file is split up into different sections for
different types of symbols. Object formats vary by platform; GNU/Linux uses
the Executable and Linking Format (ELF) from Unix System V. (!! link to some
resource) Checking the object files with `file` shows:
the Executable and Linking Format (ELF) originally from Unix System V[^elf].
Checking the object files with `file` shows:

    $ file ./build/CMakeFiles/example.dir/main.cpp.o
    ./build/CMakeFiles/example.dir/main.cpp.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped


@@ 385,7 385,7 @@ properties rather than code generation properties.
There's only one copy of the symbols in the final executable, and `square` is
now a global _weak_ symbol, while `operand` is a global _unique symbol_.

Here's the explanations from `nm`:
Here are the explanations from `nm`:

> "u" The  symbol  is  a unique global symbol.  This is a GNU extension to the standard set of ELF symbol bindings.  For such a symbol the dynamic linker will make sure
>    that in the entire process there is just one symbol with this name and type in use.


@@ 397,15 397,23 @@ Here's the explanations from `nm`:
>    normal defined symbol is used with no error.  When a weak undefined symbol is linked and the symbol is not defined, the value of the symbol  is  determined  in  a
>    system-specific manner without error.  On some systems, uppercase indicates that a default value has been specified.

So a weak symbol is a soft dependency, and linking won't fail if it's not
present. It can also be overridden by a strong symbol with the same name.
Implicit here is that weak symbols don't collide with one another, and the
linker will pick just one definition if there are several.
So a weak symbol is a kind of soft dependency, and linking won't fail if it's
not present. Also, a symbol declared weak can be overridden by a strong symbol
with the same name. Implicit here is that weak symbols don't collide with one
another, and the linker will pick just one definition if there are several.
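
To make the "soft dependency" part concrete, here's a minimal sketch, assuming GCC or Clang targeting an ELF platform such as GNU/Linux. The name `optional_hook` is hypothetical and not part of the example project. It's declared weak and never defined, so the link still succeeds and the symbol's address should come out as null, which the code checks before calling it.

```cpp
#include <cstdio>

// Hypothetical hook: declared weak, and deliberately never defined here.
// The weak attribute is a GCC/Clang extension, not standard C++.
extern "C" void optional_hook() __attribute__((weak));

// True only if some object file or library provided a definition.
bool hook_is_present() {
    return optional_hook != nullptr;
}

// Calls the hook when it was linked in, otherwise reports that it is missing.
void run_hook_if_present() {
    if (optional_hook != nullptr) {
        optional_hook();
    } else {
        std::puts("optional_hook was not linked in, skipping");
    }
}
```

If another object file later provides a strong definition of `optional_hook`, the same code calls it without any changes.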

<!-- ^^ TODO is there a source which states this explicitly? -->
In our case, the weak symbols are capitalised, meaning a default definition has
been provided. So the end result is the same for both symbols: the linker picks
one definition of each, and there's only one copy in the final executable.

In this case the end result is the same for both symbols: there's only one copy
in the final executable.
<aside>
You may be aware that in C++ you can override the global operators
<code>new</code> and <code>delete</code>. This is useful if, for example, you
want to use your own allocator for dynamic memory allocations. On GNU/Linux,
this is supported using weak symbols: the default operator definitions are
declared as weak symbols, but your override is a strong symbol. The linker
will pick your strong override over the weak default definitions.
</aside>
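
As a rough sketch of such an override, the following replaces the global `new` and `delete` with versions that count allocations. The counter `g_allocations`, the `sink` variable and the helper `allocations_during_one_new` are made up for illustration, and forwarding to `malloc`/`free` is just one simple choice of allocator.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Illustrative counter, atomic because operator new may run on any thread.
static std::atomic<std::size_t> g_allocations{0};

// Strong definition which replaces the library's default operator new.
void* operator new(std::size_t size) {
    g_allocations.fetch_add(1, std::memory_order_relaxed);
    if (size == 0) {
        size = 1;  // operator new must return a unique, non-null pointer
    }
    if (void* memory = std::malloc(size)) {
        return memory;
    }
    throw std::bad_alloc{};
}

// Matching replacements, so memory from malloc is always released with free.
void operator delete(void* memory) noexcept { std::free(memory); }
void operator delete(void* memory, std::size_t) noexcept { std::free(memory); }

// Storing the pointer in a volatile global keeps the allocation observable,
// so the compiler shouldn't optimise the new/delete pair away.
int* volatile sink = nullptr;

// Returns how many times our operator new ran for a single new-expression.
std::size_t allocations_during_one_new() {
    const std::size_t before = g_allocations.load();
    sink = new int(42);
    const std::size_t after = g_allocations.load();
    delete sink;
    sink = nullptr;
    return after - before;
}
```

No special linker flags are needed: defining the operators in any one translation unit of the program is enough for them to take over.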

The general suggestion in C++ is that any functions defined in headers should be
marked `inline`, to avoid running afoul of the One-Definition Rule.
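
A minimal sketch of what that looks like, written as the contents of a hypothetical header `square.hpp` (the guard name and the exact signature are assumptions):

```cpp
// square.hpp (hypothetical): safe to include from several source files.
#ifndef SQUARE_HPP
#define SQUARE_HPP

// inline allows an identical definition in every translation unit that
// includes this header. The linker keeps a single copy.
inline int square(int operand) {
    return operand * operand;
}

#endif  // SQUARE_HPP
```

Without `inline`, including this header from two source files would typically fail at link time with a multiple definition error.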


@@ 708,6 716,8 @@ these examples with GCC 13. Maybe things will improve in later versions.

## References

[^elf]: <https://refspecs.linuxbase.org/elf/elf.pdf>

[^odr]: <https://en.cppreference.com/w/cpp/language/definition>

[^constexpr]: <https://en.cppreference.com/w/cpp/language/constexpr>