Bare Hello World in C
Created Sunday 28 July 2024
The basic "Hello, world!" program in C that everyone knows looks like this:
#include <stdio.h>

int main(void)
{
    printf("Hello, world!\n");
    return 0;
}
$ gcc -Oz main.c && strip a.out && ./a.out
Hello, world!
$ ldd a.out
linux-vdso.so.1 (0x0000747611e1f000)
libc.so.6 => /usr/lib/libc.so.6 (0x0000747611bfc000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x0000747611e21000)
This results in a 15K binary that depends on libc.
I am a Zig programmer as well. When we compile a hello world with Zig, we get a 4.8K binary:
const std = @import("std");

pub fn main() !void {
    try std.io.getStdOut().writer().writeAll("Hello, world!\n");
}
$ zig build run -Doptimize=ReleaseSmall
Hello, world!
$ ldd ./zig-out/bin/zigtest0
not a dynamic executable
And, not to mention, it isn't even linked to libc! What gives?
Well, there are a couple of drawbacks in Zig. Zig does not enable PIE by default, while that seems to be the case for the C program. So, we'll enable that in build.zig. Next, checksec says there are no stack canaries. It doesn't report stack canaries for the C program either, for that matter, but I suspect that is because our C program only has a main function that calls off to printf, while Zig must have many more functions of its own. At the moment, in order to enable stack canaries in Zig, you must both link libc and compile with a safe optimization mode like ReleaseSafe.
Enabling all of that for the Zig compiler brings us up all the way to... 78K ...what? Compiling with the same settings except without libc brings that down to 17K, and 6K on ReleaseSmall. It seems that linking libc is adding a boatload of extra stuff.
So, I tried static linking libc in the hopes that it would be able to optimize out a lot of the libc code while also having stack protectors.
glibc didn't like static linking, according to the Zig compiler:
$ zig build run -Dtarget="x86_64-linux-gnu" -Doptimize=ReleaseSafe
run
└─ run zigtest0
└─ zig build-exe zigtest0 ReleaseSafe x86_64-linux-gnu failure
error: error: libc of the specified target requires dynamic linking
So then I tried with musl, and that compiled. But checksec still reports no stack canaries!
That's fine, I guess. We don't need to put down stack canaries. I guess. And so, if that is the case, then we'll just go for the ReleaseSmall PIE no-libc version. Which is 6K, and still a lot less than the C program. As a side note, gcc does seem to be able to statically include glibc. But, whatever.
Zig's standard library is made to be libc-optional: it works with libc, and it works without it. Instead of relying on libc, as is standard for many other compilers, Zig uses its own standard library as a replacement, handling all of the startup code that you would normally rely on libc for.
But C is supposed to be the small, low-level language! The one where you have control over everything! Why is Zig, the fancy new modern kid on the block, doing so much better out of the gate?
Well, that's because libc tends to be the default, and defaults aren't always the best choice.
Alright, so Zig does it better at first. But we're C, the low level programming language that can control everything. Can't we do the same, if not better?
The Zig standard library makes syscalls directly. In fact, looking at the Zig program's objdump -D, all of the syscalls used in the program appear to have been inlined.
So, let's try to do the same:
#include <unistd.h>

int main(void)
{
    write(STDOUT_FILENO, "Hello, world!\n", 14);
    return 0;
}
$ gcc -Oz main.c && strip a.out && ./a.out
Hello, world!
...and it still compiles down to 15K. We are still using libc here: write() is libc's wrapper around the syscall, and libc is still doing everything else, including the startup code.
So, we simply must write our own startup code. And our own syscall wrapper.
typedef unsigned long ulong;

__asm__ (
    /* The first three arguments in the C calling convention (rdi, rsi, rdx)
     * line up with the first three syscall arguments, so nothing needs to be
     * moved around other than the syscall ID, which goes into rax. */
    ".globl _sysc1\n"
    ".globl _sysc3\n"
    "_sysc3:\n"
    "mov %rcx, %rax\n" /* id is the 4th C argument */
    "syscall\n"
    "ret\n"
    "_sysc1:\n"
    "mov %rsi, %rax\n" /* id is the 2nd C argument */
    "syscall\n"
    "ret\n"
);

ulong _sysc3(ulong a, ulong b, ulong c, ulong id);
ulong _sysc1(ulong a, ulong id);

/* Referenced https://github.com/ziglang/zig/blob/c15755092821c5c27727ebf416689084eab5b73e/lib/std/os/linux/syscalls.zig#L453 */
#define SYS_WRITE 1
#define SYS_EXIT 60
#define STDOUT_FILENO 1

void _start(void)
{
    _sysc3(STDOUT_FILENO, (ulong)"Hello, world!\n", 14, SYS_WRITE);
    _sysc1(0, SYS_EXIT);
}
Now, that should be pretty small, right? All this is, is a "Hello, world!\n" string stored somewhere, two function calls, and two syscall definitions.
$ gcc -Oz -fno-builtin -nostdlib -ffreestanding -Wl,--no-dynamic-linker -no-pie -fno-stack-protector main.c && strip a.out && ./a.out
Hello, world!
And this comes out to... 8.8K. Still larger than Zig, but it is doing a lot better. In fact, the only executable code is this:
0000000000401000 <.text>:
401000: 48 89 c8 mov %rcx,%rax
401003: 0f 05 syscall
401005: c3 ret
401006: 48 89 f0 mov %rsi,%rax
401009: 0f 05 syscall
40100b: c3 ret
40100c: 50 push %rax
40100d: 48 8d 35 ec 0f 00 00 lea 0xfec(%rip),%rsi # 0x402000
401014: 6a 01 push $0x1
401016: 59 pop %rcx
401017: 6a 0e push $0xe
401019: 5a pop %rdx
40101a: 6a 01 push $0x1
40101c: 5f pop %rdi
40101d: e8 de ff ff ff call 0x401000
401022: 6a 3c push $0x3c
401024: 31 ff xor %edi,%edi
401026: 5e pop %rsi
401027: 5a pop %rdx
401028: e9 d9 ff ff ff jmp 0x401006
That itself is only 45 bytes. And the .rodata section takes up only another 14. So then why is the entire thing 8984 bytes?
Looking at readelf -a for both programs, this is strange, because the Zig binary has even more program headers and sections. Then I noticed something: gcc (or rather the linker it invokes) seems to align the file offset of each LOAD segment to that segment's alignment. Zig does not.
C program:
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x0000000000000254 0x0000000000000254 R 0x1000
LOAD 0x0000000000001000 0x0000000000401000 0x0000000000401000
0x000000000000002d 0x000000000000002d R E 0x1000
LOAD 0x0000000000002000 0x0000000000402000 0x0000000000402000
0x0000000000000058 0x0000000000000058 R 0x1000
NOTE 0x0000000000000200 0x0000000000400200 0x0000000000400200
0x0000000000000030 0x0000000000000030 R 0x8
NOTE 0x0000000000000230 0x0000000000400230 0x0000000000400230
0x0000000000000024 0x0000000000000024 R 0x4
GNU_PROPERTY 0x0000000000000200 0x0000000000400200 0x0000000000400200
0x0000000000000030 0x0000000000000030 R 0x8
GNU_EH_FRAME 0x0000000000002010 0x0000000000402010 0x0000000000402010
0x0000000000000014 0x0000000000000014 R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
Zig program:
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000001f8 0x00000000000001f8 R 0x8
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000968 0x0000000000000968 R 0x1000
LOAD 0x0000000000000968 0x0000000000001968 0x0000000000001968
0x00000000000006d1 0x00000000000006d1 R E 0x1000
LOAD 0x0000000000001040 0x0000000000003040 0x0000000000003040
0x00000000000002a0 0x0000000000000fc0 RW 0x1000
LOAD 0x00000000000012e0 0x00000000000042e0 0x00000000000042e0
0x0000000000000004 0x0000000000000041 RW 0x1000
DYNAMIC 0x0000000000001200 0x0000000000003200 0x0000000000003200
0x00000000000000e0 0x00000000000000e0 RW 0x8
GNU_RELRO 0x0000000000001040 0x0000000000003040 0x0000000000003040
0x00000000000002a0 0x0000000000000fc0 R 0x1
GNU_EH_FRAME 0x0000000000000708 0x0000000000000708 0x0000000000000708
0x0000000000000064 0x0000000000000064 R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000001000000 RW 0x0
Zig's highest offset only goes up to 0x12e0, or 4832 bytes. The C program's highest offset is 0x2010, or 8208 bytes. Those numbers seem proportional to the actual binary sizes.
The reason for the C program's size seems to be that its segment offsets are forced to be aligned, while Zig takes advantage of not needing to do that. So, is there a way to un-align our C program's segment offsets?
I thought maybe it had something to do with Zig using LLVM and GCC not using LLVM, so I tried compiling with Clang, but that only reduced it by 300 bytes, and the program header offsets were still aligned.
We're not out of luck though: we still have the option of writing a linker script. And with that, we should be able to explicitly specify the alignment of the program sections.
ENTRY(_start);

SECTIONS
{
    . = 0x10000 + SIZEOF_HEADERS;

    .text ALIGN(16) (READONLY) :
    {
        *(.text)
        *(.text*)
    }

    .rodata ALIGN(16) (READONLY) :
    {
        *(.rodata)
        *(.rodata*)
    }
}
With this, it is no longer constrained to 0x1000 alignment. And, surprisingly, we beat Zig.
$ gcc -g -Oz -fno-builtin -nostdlib -ffreestanding -Wl,--no-dynamic-linker -no-pie -fno-stack-protector -T linker.ld main.c && strip a.out && ./a.out
Hello, world!
This binary executable is a mere 1384 bytes, more than four times smaller than our 6K Zig binary.
Of course, we also need an appropriate Zig reference. Our original example is no longer a fair comparison, since the C program is not using PIE and just does a single write syscall (which is technically not correct; the correct way would be to repeat the call until all of the data has been written, but we'll rely on the syscall on Linux tending to work the first time anyway).
const std = @import("std");

pub export fn _start() void {
    _ = std.os.linux.write(std.os.linux.STDOUT_FILENO, "Hello, world!\n", 14);
    std.os.linux.exit(0);
}
And the result is...!
Zig still beats us at 1032 bytes. That was honestly quite funny when I saw that.
But, remember, this is only by 352 bytes. That's not a lot of data. Let's directly compare the sections:
- .text Zig is 19 bytes smaller.
- .rodata is the same. (side note: I wonder if you can chop off a byte by getting rid of the null terminator.)
- .eh_frame C is 36 bytes smaller.
- .eh_frame_hdr C is 8 bytes smaller.
- .comment Zig is 8 bytes smaller.
- .shstrtab Zig is 38 bytes smaller.
- And C has additional sections .note.gnu.property and .note.gnu.build-id, respectively taking up 48 bytes and 36 bytes.
At the very least, this accounts for 105 of those bytes. The C program also has more section headers and more program headers, so those probably take up some space as well. We are also forcing alignment to 16, which is not necessary, and some bytes may be lost to that.
ENTRY(_start);

SECTIONS
{
    . = 0x10000 + SIZEOF_HEADERS;

    .text ALIGN(1) (READONLY) :
    {
        *(.text)
        *(.text*)
    }

    .rodata ALIGN(1) (READONLY) :
    {
        *(.rodata)
        *(.rodata*)
    }

    /DISCARD/ :
    {
        *(.note.gnu.property)
        *(.note.gnu.build-id)
    }
}
$ gcc -g -Oz -fno-builtin -nostdlib -ffreestanding -Wl,--no-dynamic-linker -Wl,--build-id=none -no-pie -fno-stack-protector -T linker.ld main.c && strip a.out && ./a.out
Hello, world!
So, removing those two additional sections and changing alignment from 16 to 1, we end up with 960 bytes. We beat Zig! For now. I'm sure if I tried for longer, I could get the Zig program to take up a comparable amount of space, but I think this is enough to show how small C can actually get if you get rid of the bloat.