Original post

The C language is still prominent in the industrial embedded world, where “IoT” often refers to platforms much more limited than a Raspberry Pi. Often having to deal with such environments, we wrote the following informal explainer about C for internal company needs, and thought it could be of interest for more readers. This is basic material, mixing C and operating systems knowledge, aimed at readers with no or limited understanding of how you go from C source code to an executable. We could expand on many points, but for now we just share this meandering overview.

C compilation

C is really a kind of portable-ish assembler with an abstract model of memory. When we say compilation, we specifically mean the C-to-object translation phase. This is called a “translation unit” in C-speak. Firstly, the file to be compiled is read. Then the preprocessor is executed – any preprocessor directives are expanded and removed, meaning headers are literally included directly in the file. Once this is done (and if it does not error), then and only then is the compilation phase run. The job of this phase is to turn C code into assembly and store it in a format for later use.


This is probably the most undocumented black magic all programmers rely on but don’t know about. From the above compilation phase we typically have an “object file”. So what you see is typically .c to .o, which you’re used to, all well and good. However, linking into the overall binary requires some further work and we must first take a digression into.

Program loading

Again, operating systems are hiding a lot of complexity here. Programs on Linux are now ELF files (previously a.out, hence why GCC without a -o option when compiling produces a file named this way) and on Windows the executable format has gone through several iterations: flat DOS files were COM and had no structure at all. COFF followed and has structure, as does its successor PE (Portable Executable).

What these files do is tell the program loader: where in the file the various sections of program code reside, and where the program would like sections to be loaded in memory. Modern operating systems often do not respect this request, partly because flags have been added to mark the code as “position independent” (-fPIC, or /dynamicbase) and so the code contains no addresses that need to be “fixed” by the program loader. An early performance problem in Windows programs was that DLLs had fixed base addresses and using the same one with multiple DLLs in the same process meant all subsequent loads also required rebasing, which was slow in the pre 1GHz processor days.
So to recap: PE and ELF files are used by the operating system to describe their internal contents and roughly what they are, such that code can be marked read+execute, data read only, bss read-write and so on.  These files also contain two other important pieces of information: external library dependencies and information marking what architecture they run on. This allows the operating system to deny loading a program quickly if it isn’t the right architecture (it would likely simply crash if loaded).

Linking again

So now we’ve talked about object formats, the job of the linker is to take the objects produced by compilation and put them together into the desired output. Since C supports functions implemented in other translation units (of course) and even in external libraries, part of the job of the linker is “symbol resolution”. It will try to find where these “symbols” are and match them up when producing the final binary. Specifically, it wants to know what address to encode for using call instructions, or if it should emit an entry saying “this program depends on an external library and wants function X from it, please load this before loading this program”.

There are in general two types of object produced by a linker: an ELF binary executable with an entry point (by convention this is a symbol called start or _start, but this jumps into libc and it is libc which eventually calls main after doing its own initialization) and a shared object (.so files) which basically exports a list of functions other code may use. The shared object concept comes from the days when disk space was limited. This allowed the same piece of code to be loaded by multiple programs, but only need to exist once on disk.

There is a third type of output, but the linker itself isn’t technically responsible for it. This is a “static library”. It is essentially a bit like a zip-archive (but not a zip) of object files. Linkers generally treat these files just as they treat other object files and will look in them and perform resolution as normal. This allows for the code to be entirely included in another executable or shared object without any external dependencies.

A full discussion of dynamic linking (how shared objects are eventually loaded) is incredibly complex and we won’t go into it here. What you need to know is that “soname” is used to allow multiple versions of the same object to exist on Unixes. You might have: /usr/lib64/libe4.so that symlinks to /usr/lib64/libe4.so.1, which symlinks /usr/lib64/libe4.so.1.2.7. This
allows applications to link at two levels. They may link: to /usr/lib64/libe4.so, which means “use the latest libe4” or to /usr/lib64/libe4.so.1, which means “use the latest libe4 with major version 1”.
As an aside, you might wonder to what degree you can control the output of the linker given you mostly just use it without ever thinking. Well, the linker has its own entire scripting language and ld --verbose will dump the default script it uses for your system. Here are some examples from the Xen hypervisor (which is essentially a kernel): https://github.com/xen-project/xen/blob/16bbf8e7b39b50457bb2f6547f166bd54d50e4cd/xen/arch/x86/xen.lds.S and https://github.com/xen-project/xen/blob/16bbf8e7b39b50457bb2f6547f166bd54d50e4cd/xen/arch/arm/xen.lds.S

Static vs. dynamic executables

You might have heard of static vs. dynamic linking. This is quite simple: if you run “file” on a binary and it says “statically linked” then it has no dependencies on shared objects at all. If you try “ldd” it will say “not a dynamic executable”. By contrast, a dynamic executable will say so, and ldd will print the list of libraries required at load-time (more may be loaded in both cases by dlopen). It is not technically possible to have a static binary on Windows at all. In this case it tends to mean “the libc shipped with Windows is linked in statically, rather than as a DLL”. Similarly, with glibc, static linking is very-hard-to-impractical on modern Linux systems.

Lastly, static executables are not magically super portable. They still use system calls and so require a minimal kernel version that implements these calls. They are also bound to the operating system they are compiled for, generally speaking. A static binary can typically be produced with C using musl or uclibc, and Golang does this when cgo is not invoked.


“Toolchain” is the term we use for all the tools needed to build an executable in C. We’ll give an overview of the GNU tools, as these copied earlier Unix tools and are generally everywhere.

GCC & binutils

Firstly, there is binutils. This is a set of tools for working with binaries, specifically including an assembler, linker and objcopy. Secondly, there is the GNU Compiler Collection (GCC). We normally think of GCC as a C compiler (this is called the frontend) but technically it is simply a command line interface to the backend compilers, which are invoked depending on the file type. Now, GNU toolchains have two properties: triplers and platforms. A triplet specifies information about the machine to be produced, e.g. riscv64-none-elf – this says the code should be risc-v, there’s no OS expected, and the output format is ELF.

Platforms specify where the toolchain is to be built, where it will run, and what it will produce code for. These are respectively the GNU build, host, and target options. Yes, this means you can compile GCC on x86-64, where the GCC will run on aarch64, and produce code for riscv32. “Cross compilers” are generally those that target other platforms. Usually build==host in this case (normally x86-64) but the compiler produces output for another platform. GCC+binutils is not the only compiler suite in town, but it’s by far the most common because it runs absolutely everywhere. This is one of the most successful GNU projects and was a strong enabler of the Linux ecosystem.

LLVM & Clang

LLVM and Clang are another pairing. They’re somewhat different in that they were designed to be more modular, which is what happens when you start your project in the 2000s having observed 20 years of GCC mistakes. Here’s how they fit together: LLVM is a virtual machine, but not the VMware sort. It has its own “instruction set” called intermediate representation, or IR. From this it has “backends” that translate that to the assembly of target architectures, and finally assemble them to machine code/objects. Originally it used GNU’s linker to assemble these objects, but has since grown its own. It has its own assembler too.

Clang is the C-language frontend. Specifically, it knows how to produce LLVM IR from C (and also C++ and Objective-C) and invoke the rest of the toolchain to get the linker to work and produce binaries. There are of course other LLVM frontends: Rust and Swift are the two most well known, but there’s an Ada one and an Ocaml one too.


Finally, Microsoft’s compiler is also widely used for Windows platforms. The architecture of their tools is pretty uninteresting from our point of view, but they have: cl.exe (compiler frontend), link.exe (linker), lib.exe (library tool), ml.exe (assembler) etc.

Run time symbols and ABI

Let’s briefly talk about run time symbols briefly. When we’re outputting shared or static objects for the linker, the ELF and PE formats do not support arbitrarily-named functions as they were all invented when you were either quite young or not born. Just like International DNS names are encoded to fit into 80s-era DNS, so too are function names for languages that are not C or do not follow its naming conventions. In particular, a C++ function in a namespace like this:
would be encoded in a mess of _ZN7-prefixed names.

The ABI is the application binary interface. This is what we expect programs to do when calling functions, and how they should pass arguments in registers, and so on. These days there are two standards for x86-64: AMD64 for Unix designed by AMD+Unix people, and Microsoft’s, who felt the need to be different. Since different platforms have different registers and even vastly different features, they typically have their own ABI. Sometimes, as was the case for 32-bit x86, there are multiple competing calling conventions produced by different compiler vendors.

When we talk about the ABI, we generally conflate both of the above. There are no standards in C and C++ for how processes should behave at this level, it’s simply by convention. However there are some important points to note: only the C convention is widely respected. Binary C++, Rust, Go, etc. distributions of libraries generally don’t happen. Hence #[repr(C)] and #[no_mangle] in Rust, and cgo in general.


C doesn’t need a runtime (it is of course portable assembler) but the C standard does actually specify quite a lot that assumes a full operating system, including things like fopen for opening files. This necessitates a standard library to accompany the language. Rust has std and core, Golang is batteries included – this is well understood.

There is some blurring, particularly on Unix, as to what is “libc” i.e. ISO C, what is POSIX i.e. “Unix-like” and what is just plain Linux. A particular case where the roles are thoroughly and entirely blurred together is the dynamic loading of libraries. That is outsourced to /lib/ld-linux.so.2 (on libc6 systems), which is hardcoded into the ELF binary. This library is developed under the auspices of glibc. More information here: https://linux.die.net/man/8/ld-linux.so – in short, libc also provides the dynamic loading resolution, which is sort-of a core part of Linux/Unix. You can see some of the problems this causes with: https://www.musl-libc.org/faq.html and you will also find that anything but a static binary will fail to run on Alpine Linux for the same reason (Alpine uses musl even for dynamic linking).

Like most standard libraries, some initialization is required and hence libc typically hijacks the default entry point symbol _start or start, and defines “main” as the starting point for consuming programs. Just as there are multiple competing compiler implementations, so there are multiple competing libc implementations. glibc is the typical Linux libc, from the GNU people. musl and uclibc are alternatives. Since libc is integral to programs and because there is some blurring between roles, libc also has a role to play as the program loader on Linux, especially when dynamic libraries are used. libc is typically implemented as a shared object (and as an example of soname, I vaguely remember the crossover from libc.so.5 to libc.so.6 from my early Ubuntu days). It is possible not to use libc by using the -ffreestanding option. This assumes the executable will provide everything it needs itself (including implementing its own entry point).

Microcontrollers and platforms other than the PC

If your environment has an operating system, you’re done. The rules are the same as for that operating system and the ABI it uses for that platform. How to do stuff on Linux on ARM is relatively well defined. The only exception to the standard development process is that you will probably use a cross compiler, because x86 is far more powerful than ARM. I highly do not recommend trying to compile large projects on a Raspberry Pi. The NDK for Android, for example, contains a cross compiler capable of targeting common Android targets, particularly armv7-a and aarch64.

Without an OS, you are in the same situation as a kernel engineer. You have to be careful how much of libc you rely on and you must define your entry point as the platform in question requires. You may have to set up interrupt handlers. There is no program loader: you will have to convert your ELF output to something like Intel’s HEX, or just a flat binary file that’ll be flashed to storage. You can use an RTOS kernel (previously discussed) that’ll do some of this work for you if it supports those chipsets. See https://en.wikipedia.org/wiki/Intel_HEX#Format and https://www.zephyrproject.org/
Kernel engineers generally have things slightly easier, namely that common bootloaders like grub actually can read ELF files too. UEFI firmware executes PE images (same as Windows) with a specific subtype to describe them as UEFI components. As a specific example of this, one can build the Linux kernel as an EFI binary. How much work needs to be done really really depends.

Depending on the system in use, the toolchain may integrate and try to “hide” some of the complexity of initializing hardware by hand. For example, Atmel Studio is a Visual Studio Shell-based project (uses VS but implements its own project logic) that has various wizards and emits appropriate code for each board. More commonly, a “board support package” is provided (https://en.wikipedia.org/wiki/Board_support_package) that may look like anything but commonly contains a makefile-based configuration system. For embedded Linux, Yocto is now quite common (based on buildroot). This is a Linux kernel menuconfig style system where you pick your configuration and then type make and wait.

Some concepts in amongst this are important. First is the “hardware abstraction layer”. In Windows, this tries to hide the details of the CPU setup in use, in particular between uniprocessor (old), symmetric multithreading (one processor with multiple cores) and symmetric multiprocessing (multiple independent processors, possibly with multiple threads each). More generally, it means trying to make higher level code as agnostic as possible with regards to the hardware. Second, the concept of scheduling simply does not exist at the very low level. Your kernel is setting an interrupt timer to take back control periodically from various processes, deciding what to schedule next and then doing that, even with kernel threads (this is called a fully pre-emptive system). The opposite extreme is cooperative scheduling, where the process must indicate that it wants to give up control at certain points. If it does not, e.g. if it gets stuck in a spin-loop, that’s game over. Most typical microcontrollers are uni-processor (single threaded).

Debugging and flashing

This last section will be brief, although there’s a lot to say, which might appear in a subsequent post. Essentially the common debug protocol is U(S)ART or the ARM-extension SWD. Modern PC bioses are typically flashed with SPI. Most “development kits” for microcontrollers come with an “On Circuit Debugger”, which means the USB port provides three things: power to the device, sometimes a serial port, and usually a UART-over-USB port. This allows for easy debugging. You can buy dedicated debugging kit, however.