This repository contains an implementation of a C17 language compiler written from
scratch. No existing open source compiler infrastructure is being reused. The
main priorities of the project are self-sufficiency, compatibility with the platform
ABI and compliance with the C17 language standard. Any omissions or
incompatibilities between the language standard and Kefir behavior which are not
explicitly documented (see the Implementation & Usage quirks section below) shall
be considered bugs.
Kefir supports modern x86-64 Linux, FreeBSD, OpenBSD and NetBSD environments
(see the Supported environments section below). The compiler is also able to produce
JSON streams containing the program representation at various stages of compilation
(tokens, AST, IR), as well as print source code in preprocessed form. The
compiler targets the GNU As (Intel syntax with or without prefixes, as well as AT&T syntax, is
supported) and Yasm assemblers. Kefir is able to produce debug information in
DWARF5 format for GNU As. Position-independent code generation is supported.
Kefir features a cc-compatible command line interface.
The Kefir website also provides some additional information.
Note to the users of Kefir: if you encounter any behavior that does not comply with the C language standard or significantly diverges from other compilers, please do not hesitate to reach out to me via email directly or through the mailing list. Code snippets for easier reproduction are especially welcome.
The Kefir compiler is named after the fermented milk drink; no other connotations are meant or intended.
Kefir targets the x86-64 ISA and System V ABI. Supported systems include modern
Linux, FreeBSD, OpenBSD and NetBSD operating systems (the full test suite is
executed regularly in these environments). A platform is considered supported if
the full automated test suite (see the Test suite section below) successfully executes
there -- no other guarantees and claims are made. On Linux, the glibc and musl
standard libraries are supported; musl might be preferable because its header
files are more compliant with standard C without extensions; however, as
of now Kefir supports enough GCC extensions to reasonably use include files from
glibc. On BSDs, the system libc can be used (additional macro definitions, such
as __GNUC__ and __GNUC_MINOR__, could be necessary depending on the system
libc features used). Kefir supports selection of the target platform via the --target
command line option.
For each respective target, the compiler expects a set of environment variables
(e.g. KEFIR_GNU_INCLUDE, KEFIR_GNU_LIB, KEFIR_GNU_DYNAMIC_LINKER) to be
present in order to correctly configure system include and library paths -- the
default values for these variables are detected when Kefir is built; however, in the
event of any changes to the system toolchain (e.g. after upgrades), the configuration
needs to be re-generated.
In addition, see the Implementation & Usage quirks section below for some other
specifics of Kefir.
The main motivation of the project is a deeper understanding of the C programming language, as well as practical experience in the broader scope of compiler implementation aspects. Based on this, the following goals were set for the project:
libc implementations.
The following things are NON-goals:
Note on language standard support: initially, the compiler development was focused on C11 language standard support. The migration to C17 happened when the original plan was mostly finished. During the migration, the applicable defect reports (DRs) included in C17 were inspected and the code was updated accordingly. The compiler itself is still written in compliance with the C11 language standard.
See CHANGELOG.
The initial implementation has been finished: at the moment the compiler features all necessary components and, with some known minor idiosyncrasies, supports the C17 language standard. Kefir is able to re-use the standard library provided by target systems. Further effort is concentrated on improving and extending the compiler, including:
Some implementation details that users need to take into account:
Kefir provides a runtime library, libkefirrt.a. The library is linked automatically if the
environment is configured correctly. Kefir can also link built-in versions of the runtime;
however, make sure that the correct --target is specified during the link phase.
Kefir might provide its own versions of some header files as well -- if the
environment is configured correctly, they are also added to the include path
automatically.
Atomic operations rely on a software atomic library (libatomic for
GCC, libcompiler_rt for Clang), thus any program that employs atomic
operations needs to link a libatomic-compatible library. This happens by
default for glibc and *BSD targets. Furthermore, if the <stdatomic.h> header
from Clang includes is used (the default on FreeBSD and OpenBSD), the
-D__GNUC__=4 -D__GNUC_MINOR__=20 command line arguments shall be added to the
Kefir invocation.
STDC pragmas are implemented in the preprocessor. Kefir does not perform the
respective optimizations and implements conservative behavior, thus these
pragmas are no-ops.
Kefir can be used along with the musl libc standard library. Kefir also supports
glibc, as well as the libc implementations provided with FreeBSD, OpenBSD and
NetBSD; however, header files from these libraries tend to include non-standard
compiler features, and thus support might vary for different library versions.
In practice, Kefir implements enough compiler extensions to make use of all
target system standard libraries. However, additional macro definitions (such
as __GNUC__ and __GNUC_MINOR__ on BSD systems) might be needed for successful
compilation.
Several C language extensions are implemented for better compatibility with GCC. All of them are enabled by default in the driver, but disabled if the compiler is invoked directly (consult the manual for details). No specific compatibility guarantees are provided. Among the implemented extensions (non-exhaustive list):
Implicit function declaration -- a function called without a prior declaration is implicitly declared as int function_name(). The feature was part of previous C standards; however, it is absent from C11 onwards.
int as implicit function return type -- a function definition may omit the return type, and int will be used instead.
Designated initializers in the fieldname: form -- an old, deprecated form which is still supported by GCC.
Labels as values -- label addresses can be taken with the && operator, and gotos support arbitrary addresses in the goto * form.
__typeof__, __typeof_unqual__ and __auto_type type specifiers.
__atomic* and __sync* builtins.
Kefir also defines a few non-standard macros by default, such as macros
indicating the data model (__LP64__), endianness (__BYTE_ORDER__ and
__ORDER_LITTLE_ENDIAN__), as well as __KEFIRCC__, which can be used to
identify the compiler.
Kefir supports the asm directive, both at file and function scope.
The implementation supports output and input parameters, parameter constraints
(immediate, register, memory), clobbers and jump labels; however, there is no
full compatibility with the respective GCC/Clang functionality (implemented bits behave
similarly, though, thus basic use cases shall be compatible). Additionally,
asm-labels are supported for non-static non-local variables.
Kefir supports the __attribute__(...) syntax at the parser level; however, attributes
are ignored in most cases, except for the aligned/__aligned__ and __gnu_inline__
attributes. The presence of attributes in source code can be turned into a syntax
error by a CLI option.
Disclaimer: Use at your own risk. This is an experimental project which is not meant for production purposes. No guarantees are made for correctness, completeness, stability or fitness for any particular purpose.
Kefir depends on a C11 compiler (tested with gcc and clang), the GNU As
assembler, GNU Make, as well as basic UNIX utilities for the build. Development
and test dependencies include valgrind (for test execution) as well. After
installing all dependencies, Kefir can be built with a single command:
make all EXTRA_CFLAGS="-march=native" -j$(nproc)
By default, Kefir builds a shared library and links executables to it. Static
linkage can be used by specifying USE_SHARED=no in the make command line
arguments. A sample PKGBUILD is provided in the dist/kefir directory.
It is also advised to run the basic test suite:
LC_ALL=C.UTF-8 make test all # Linux
gmake test all CC=clang # FreeBSD
gmake test all CC=clang AS=gas # OpenBSD
gmake test all CC=gcc AS=gas # NetBSD
Optionally, Kefir can be installed via:
make install DESTDIR=...
A short reference on compiler options can be obtained by running kefir --help, as well
as in the manual which is supplied in the compiler distribution.
At the moment, Kefir is automatically tested in Ubuntu 22.04, FreeBSD 13.2, OpenBSD 7.3 and NetBSD 9.3 environments. Arch Linux is used as the primary development environment.
Kefir provides scripts to build a portable, standalone Kefir distribution package that incorporates a statically-linked Kefir C compiler, musl libc, and the assembler and linker from GNU Binutils. The package targets modern x86_64-based Linux systems and provides a minimalistic C17 development toolchain independent of host system tooling.
The portable package can be obtained via:
make -f dist/portable/Makefile all
# Build artifact is located in bin/portable/kefir-portable-0.3.1.tar.gz
In addition, the portable package can be fully bootstrapped in a 3-stage process:
make -f dist/portable/Makefile BOOTSTRAP=yes all
Kefir supports compilation with Emscripten into a
WebAssembly library, which can be invoked from client-side JavaScript in
Web applications. Kefir functionality in that mode is limited due to the absence of
a normal POSIX environment, linker and assembler utilities: only text output
(assembly code, preprocessed code, tokens, ASTs, IR) can be produced from a
single input file. Furthermore, all include files need to be explicitly supplied
from the JavaScript side in order to be available during compilation. Note that this
does not imply support for WebAssembly as a compilation target: it only serves
as a host environment. To build kefir.js and kefir.wasm in the bin/web
directory, use:
make web -j$(nproc) # Requires Emscripten installed
A simple playground Web application is also available. It bundles the Kefir web build with musl include files and provides a simple Godbolt-like interface. Build it as follows:
make webapp -j$(nproc)
The Web application is static and requires no server-side logic. An example of a simple server command line:
python -m http.server 8000 -d bin/webapp
A hosted version of the Web application is available at Kefir playground (please note that the Web page uses JavaScript and WebAssembly).
Kefir is capable of bootstrapping itself (that is, compiling its own source code). It can be performed as follows:
make bootstrap -j$(nproc)
Alternatively, bootstrap can be performed manually:
# Stage 0: Build & Test initial Kefir version with system compiler.
# Produces dynamically-linked binary in bin/kefir and
# shared library bin/libs/libkefir.so
make test all -j$(nproc)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/bin/libs
# Stage 1: Use previously built Kefir to compile itself.
# Set $INCLUDE and $LIB to the include and library directories of a musl installation.
# Produces statically-linked binary bootstrap/stage1/kefir
make -f bootstrap.mk bootstrap SOURCE=$(pwd)/source HEADERS=$(pwd)/headers BOOTSTRAP=$(pwd)/bootstrap/stage1 KEFIRCC=./bin/kefir \
LIBC_HEADERS="$INCLUDE" LIBC_LIBS="$LIB" -j$(nproc)
rm -rf bin # Remove kefir version produced by stage 0
# Stage 2: Use bootstrapped Kefir version to compile itself once again.
# Set $INCLUDE and $LIB to the include and library directories of a musl installation.
# Produces statically-linked binary bootstrap/stage2/kefir
make -f bootstrap.mk bootstrap SOURCE=$(pwd)/source HEADERS=$(pwd)/headers BOOTSTRAP=$(pwd)/bootstrap/stage2 KEFIRCC=./bootstrap/stage1/kefir \
LIBC_HEADERS="$INCLUDE" LIBC_LIBS="$LIB" -j$(nproc)
# Stage 3: Diff assembly files generated by Stage 1 and Stage 2.
# They shall be identical
./scripts/bootstrap_compare.sh bootstrap/stage1 bootstrap/stage2
Furthermore, Kefir can also be bootstrapped using the normal build process:
make all CC=$PATH_TO_KEFIR -j$(nproc)
Kefir relies on the following tests, most of which are executed as part of CI:
*.c files which are compiled either using the system compiler or kefir, depending on
the file extension. Everything is then linked together and executed. The test suite
is executed on Linux with gcc and clang compilers, on FreeBSD with clang and
on OpenBSD with clang. In Linux and FreeBSD environments, valgrind is used
to check test suite correctness at runtime.
The compile & execute parts of the GCC torture test suite are
executed with the kefir compiler, with some permissive options enabled. At the
moment, out of 3445 tests, 537 fail and 29 are skipped due to being irrelevant
(e.g. SIMD or profiling test cases; there is no exhaustive skip list yet). All
failures happen at the compilation stage; no abortions occur at runtime. Work
on the torture test suite will be continued in order to reduce the number of
failures. The torture tests are included in the CI pipeline with some basic test
result checks.
Furthermore, Kefir also provides an external test suite comprised of open source software that is known to work with Kefir. This suite is not included in the CI; however, it is regularly executed manually. Currently, the external test suite includes: bash 5.2.21, binutils 2.42 (only as and ld), curl 8.9.1, git 2.44.0, libsir 2.2.4, musl 1.2.5, nano 7.2, oksh 7.5, sqlite 3.45.3, tcc 0.9.27, tcl 8.6.14, tin 2.6.3, yasm 1.3.0, zlib 1.3.1. The external test suite is used to verify Kefir compatibility with real-world software.
Kefir's own test suite is deterministic (that is, tests do not fail spuriously); however, problems might arise when it is executed in unusual environments (e.g. with a non-Unicode locale). For instance, some tests contain Unicode characters and require the environment to have an appropriate locale set. Also, issues with the local standard library version might cause test failures.
Currently, extension of the test suite is a major goal. It helps significantly in eliminating bugs, bringing Kefir closer to C language standard support, and improving compiler UX in general.
In order to simplify translation and facilitate portability, an intermediate
representation (IR) layer was introduced. It defines an architecture-agnostic
64-bit stack machine bytecode, providing a generic calling convention and
abstracting out type layout information. The compiler is structured into separate
modules with respect to the IR: code generation, AST analysis and translation. The
IR code is then converted into the optimizer's SSA-like representation. The IR layer
provides several interfaces for the AST analyzer to retrieve necessary target type
layout information (for instance, for constant expression analysis). AST
analysis and translation are separate stages to improve code structure and
reusability. The parser uses a recursive descent approach with unlimited
backtracking. The lexer was implemented before the preprocessor and can be used
independently of it (the preprocessing stage can be completely omitted), thus both
lexer and preprocessor modules share the same lexing facilities. The driver links
kefir as a library and uses fork syscalls in order to isolate the processing of
each file.
The primary code repository is hosted at Sourcehut, with secondary mirrors at Codeberg and author's personal website.
Author: Jevgenijs Protopopovs
The code base also includes patches from:
License: