Monday, June 23, 2008

Chicken and egg problem with Propolice in runtime linker/loader

Some background first: back in 2006, I was frustrated because FreeBSD was lagging somewhat behind other open-source operating systems in terms of integrated security features. One of them is a GCC extension originally named Propolice, also known as SSP (Stack Smashing Protection). As its name suggests, it protects (very efficiently) against stack-based buffer overflows. Propolice was originally developed by Hiroaki Etoh at IBM as an external patch for gcc-2.95.3 and then gcc-3.4.4, and it has been included in mainline GCC since gcc-4.1. The patch to integrate Propolice into FreeBSD has been available on my website for more than two years, but at the time FreeBSD only shipped gcc-3.4.4, and heavily patching contributed software is ruled out by policy, so it couldn't be committed to FreeBSD-6. I missed the FreeBSD-7 window for various reasons, and now I'm working to get it committed to FreeBSD-8 (aka CURRENT).

How does Propolice work? The compiler identifies functions that might be vulnerable (those containing a stack-based buffer) and, in their prologue, pushes a one-word canary between the return address stored on the stack and the local variables. In the function's epilogue, the canary is checked against its original value; if it has changed, a buffer overflow has occurred and the program is aborted. The canary lives in the BSS segment, so it starts out as zero, and it is set to a random value by a function called during program startup (namely, a constructor). Both the canary and the initializer function are provided in FreeBSD's libc.
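
To make the prologue/epilogue dance concrete, here is a toy model in plain C. It is only a sketch, not what GCC actually emits: the struct frame layout, the guard_value variable and the copy_and_check function are all names I made up for illustration, and the "stack frame" is an ordinary struct so that the overflow stays well-defined.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the guard that libc initializes at startup. */
static unsigned long guard_value = 0x5a17c0deUL;

/* Toy model of a protected stack frame: the canary sits between the
 * local buffer and the (not modeled) saved return address. */
struct frame {
    char buf[8];
    unsigned long canary;
};

/* Copies len bytes of input over the frame without checking the size of
 * buf, as a vulnerable function would. Returns 1 if the canary survived,
 * 0 if it was clobbered (the point where the real check would abort). */
int copy_and_check(const char *input, size_t len)
{
    struct frame f;

    f.canary = guard_value;          /* prologue: store the canary */

    if (len > sizeof f)              /* keep the demo inside the struct */
        len = sizeof f;
    memcpy(&f, input, len);          /* spills past buf when len > 8 */

    return f.canary == guard_value;  /* epilogue: compare with the original */
}
```

Calling copy_and_check("abc", 4) leaves the canary alone, while copying 16 bytes of 'A' runs over it, which is exactly the situation where the real epilogue would abort the program.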

When I sent the patch for review back in April, Antoine Brodin noticed that when build world is performed with -fstack-protector-all (which makes GCC protect all functions instead of only those containing a local buffer), it breaks the whole system. There were actually several problems, such as the initializer function being protected itself: during its prologue the canary was equal to zero, but by the time of the epilogue it had been set to a random value, so obviously the saved value didn't match... This problem was resolved quickly. The nasty problem lay in the runtime loader (aka rtld-elf): once it was installed, all programs would fail with SIGSEGV.
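
The self-protected initializer problem fits in a few lines of C. Again this is a hand-written imitation of what the compiler inserts, under my own assumptions: bss_guard and init_guard_protected are hypothetical names, and the random value is passed in as a parameter to keep the sketch deterministic.

```c
/* Hypothetical stand-in for the guard: zero-initialized, like any
 * variable living in the BSS segment. */
static unsigned long bss_guard;

/* Imitates an initializer that is itself protected. Returns 1 if the
 * epilogue check would pass, 0 if it would abort the program. */
int init_guard_protected(unsigned long random_value)
{
    unsigned long saved = bss_guard;  /* prologue: saves the guard, still 0 */

    bss_guard = random_value;         /* body: the actual initialization */

    return saved == bss_guard;        /* epilogue: 0 versus the new value */
}
```

With any non-zero random value the check fails, which is why the initializer must not be instrumented.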

When a dynamically-linked program is run, the kernel always transfers control to rtld behind the scenes, instead of the actual program. The purpose is to do runtime linking of the libraries needed by the program, which includes resolving symbols and performing relocations, before actually transferring control to it. So I recompiled rtld without SSP, but it was still crashing. I narrowed the segfault down to a call to mmap(2), which turned out to be the first call into libc, against which rtld is statically linked. One of the very first things rtld has to do is relocate itself, mainly to be able to access global data, which is addressed through the GOT (Global Offset Table). This was the very problem. Given that all libc functions were protected with Propolice, mmap(2)'s prologue tried to push the canary, which is accessed through the «__stack_chk_guard» global symbol. This means it used a pointer from the GOT, which had not been initialized at this point.
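
Here is a toy model of that ordering problem, written in C. It is a deliberate simplification and every name in it (memory, got_guard, relocate_self, read_guard, the three constants) is an assumption of mine, not rtld code. In particular, what an unrelocated GOT entry really contains depends on the architecture and the relocation type; the sketch simply assumes it holds a link-time offset that is useless until the load base has been added.

```c
#include <stddef.h>

#define ADDRESS_SPACE 64   /* size of the toy address space, in words */
#define LOAD_BASE     32   /* where the toy object happens to be loaded */
#define GUARD_OFFSET   5   /* link-time offset of the guard in the object */

/* Toy memory: only [LOAD_BASE, ADDRESS_SPACE) counts as mapped. */
static unsigned long memory[ADDRESS_SPACE];

/* One-entry toy GOT holding the address of the guard. Before relocation
 * it contains the link-time offset, not a usable run-time address. */
static size_t got_guard = GUARD_OFFSET;

/* What self-relocation does for this entry: add the load base. */
void relocate_self(void)
{
    got_guard += LOAD_BASE;
}

/* Reads the guard through the GOT. Returns 0 on success, or -1 where the
 * real code would dereference an unmapped address and get SIGSEGV. */
int read_guard(unsigned long *out)
{
    if (got_guard < LOAD_BASE || got_guard >= ADDRESS_SPACE)
        return -1;
    *out = memory[got_guard];
    return 0;
}
```

Calling read_guard() before relocate_self() fails, and calling it afterwards succeeds. A protected mmap(2) reached before rtld has finished relocating itself is in the first situation.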

As an additional note (and a reminder for me ;p), it occurred to me that the problem could also arise in the canary initializer, which lives in rtld's .init section. After some thought, I realized that the usual .init and .fini sections are handled by rtld itself, so I think rtld's own are actually never run.

Obviously rtld must be compiled without SSP. As a temporary solution, libc is not allowed to be compiled with -fstack-protector-all. I think a better solution would be to create a librtld containing the symbols required by rtld and compiled without SSP.

Sharp minds have certainly understood that if the original patch worked without -fstack-protector-all, it was just a matter of luck: none of the functions called while relocating rtld's GOT entries happened to be selected by GCC for protection.