Our Dear Leader Sam Hocevar has previously blogged about PIC and inline ASM. Today I came across a sort of extension to this problem.
Consider the following code, which implements a double word compare and swap using the x86 cmpxchg8b instruction (as a bonus, the lock prefix makes it atomic).
```c
#include <stdio.h>

typedef struct double_word_t {
    int a;
    int b;
} double_word;

/* atomically compare old and mem, if they are the same then copy new
   back to mem */
int compare_and_swap(double_word *mem, double_word old, double_word new)
{
    char result;

    __asm__ __volatile__("lock; cmpxchg8b %0; setz %1;"
                         : "=m"(*mem), "=q"(result)
                         : "m"(*mem), "d" (old.b), "a" (old.a),
                           "c" (new.b), "b" (new.a)
                         : "memory");
    return (int)result;
}

int main(void)
{
    double_word w = {.a = 0, .b = 0};
    double_word old = {.a = 17, .b = 42};
    double_word new = {.a = 12, .b = 13};

    /* old != w, therefore nothing happens */
    compare_and_swap(&w, old, new);
    printf("Should fail -> (%d,%d)\n", w.a, w.b);

    /* old == w, therefore w = new */
    old.a = 0;
    old.b = 0;
    compare_and_swap(&w, old, new);
    printf("Should work -> (%d,%d)\n", w.a, w.b);

    return 0;
}
```
This type of CAS can be used to implement lock-free algorithms (I've previously blogged about that sort of thing).
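As a rough illustration of what "lock-free" means here, the usual pattern is a retry loop: read the current value, compute the new one, and only commit it if nobody else changed the word in the meantime. The sketch below is not the code from this post; the `versioned` type and `versioned_add` function are made-up names, and it assumes a compiler that provides GCC's `__sync_bool_compare_and_swap` builtin for 64-bit operands.

```c
#include <stdint.h>

/* Hypothetical versioned cell: a 32-bit value plus a 32-bit version tag
   that is bumped on every update, so a stale read can be detected. */
typedef struct versioned_t {
    uint32_t value;
    uint32_t version;
} versioned;

/* Pack and unpack by hand so the layout does not depend on the struct. */
static uint64_t pack(versioned v)
{
    return ((uint64_t)v.version << 32) | v.value;
}

static versioned unpack(uint64_t w)
{
    versioned v = { (uint32_t)w, (uint32_t)(w >> 32) };
    return v;
}

/* Lock-free add: retry until the CAS succeeds, i.e. until the cell still
   holds exactly the word we based our update on. */
static versioned versioned_add(volatile uint64_t *cell, uint32_t delta)
{
    for (;;) {
        uint64_t seen = *cell;          /* a stale read only costs a retry */
        versioned next = unpack(seen);

        next.value += delta;
        next.version += 1;
        if (__sync_bool_compare_and_swap(cell, seen, pack(next)))
            return next;
    }
}
```

The version tag is what makes the double word useful: two updates that leave the same value behind still produce different 64-bit words, so a competing thread's CAS fails instead of silently succeeding.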
The problem is that cmpxchg8b uses the ebx register; in pseudocode it looks like:

```
if (EDX:EAX == Destination) {
    ZF = 1;
    Destination = ECX:EBX;
} else {
    ZF = 0;
    EDX:EAX = Destination;
}
```
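To make the register usage concrete, the same semantics can be written as a plain C function. This is only a non-atomic model for reasoning about the instruction; the `regs` struct and `model_cmpxchg8b` are hypothetical names invented for this sketch.

```c
#include <stdint.h>

/* Hypothetical register file holding only what cmpxchg8b touches. */
typedef struct regs_t {
    uint32_t eax, ebx, ecx, edx;
    int zf;
} regs;

/* Non-atomic software model of `cmpxchg8b dest`. */
static void model_cmpxchg8b(regs *r, uint64_t *dest)
{
    uint64_t expected = ((uint64_t)r->edx << 32) | r->eax;  /* EDX:EAX */
    uint64_t desired  = ((uint64_t)r->ecx << 32) | r->ebx;  /* ECX:EBX */

    if (*dest == expected) {
        r->zf = 1;
        *dest = desired;
    } else {
        r->zf = 0;
        /* On failure the current memory contents land in EDX:EAX. */
        r->eax = (uint32_t)*dest;
        r->edx = (uint32_t)(*dest >> 32);
    }
}
```

Two things stand out in the model: ebx is a fixed input that the instruction gives you no way to substitute, and edx:eax are overwritten when the comparison fails.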
PIC code reserves ebx for internal use (it holds the GOT pointer), so if you try to compile that with -fPIC you will get an error about not being able to allocate ebx.
A first attempt to create a PIC-friendly version would simply save and restore ebx and not tell gcc anything about it, something like:
```c
__asm__ __volatile__("pushl %%ebx;"   /* save ebx used for PIC GOT ptr */
                     "movl %6,%%ebx;" /* move new.a to %ebx */
                     "lock; cmpxchg8b %0; setz %1;"
                     "pop %%ebx;"     /* restore %ebx */
                     : "=m"(*mem), "=q"(result)
                     : "m"(*mem), "d" (old.b), "a" (old.a),
                       "c" (new.b), "m" (new.a)
                     : "memory");
```
Unfortunately, this isn't a generic solution. It works fine in the PIC case, because gcc will not allocate ebx for anything else. But in the non-PIC case, there is a chance that gcc will use ebx to hold the address of *mem, and the asm overwrites ebx before cmpxchg8b dereferences it. That would probably be a fairly tricky bug to track down!
The solution is to use the #if __PIC__ directive to either tell gcc you're clobbering ebx in the non-PIC case, or just keep two versions around; one that saves and restores ebx for PIC and one that doesn't.
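The two-version approach might look something like the sketch below. It is one possible arrangement rather than a definitive implementation, and it makes a few assumptions of its own: the parameters are renamed `old_val` and `new_val`, the PIC branch parks ebx in edi with xchgl instead of push/pop and pins the pointer to esi (so no operand can end up addressed through ebx or relative to a moved esp), and the result is produced in eax while edx is declared as a scratch output because cmpxchg8b overwrites edx:eax on failure. The non-PIC branch is also used on x86-64, where ebx is not reserved.

```c
typedef struct double_word_t {
    int a;
    int b;
} double_word;

/* Returns 1 if *mem matched old_val and was replaced by new_val, else 0. */
static int compare_and_swap(double_word *mem, double_word old_val,
                            double_word new_val)
{
    int result, scratch;

#if defined(__i386__) && defined(__PIC__)
    /* PIC version: ebx may hold the GOT pointer, so swap it out by hand.
       Every operand is pinned to a register other than ebx. */
    int new_lo = new_val.a;

    __asm__ __volatile__(
        "xchgl %%ebx, %%edi\n\t"        /* ebx = new_val.a, edi = saved ebx */
        "lock; cmpxchg8b (%%esi)\n\t"
        "setz %%al\n\t"
        "movzbl %%al, %%eax\n\t"
        "xchgl %%ebx, %%edi"            /* restore ebx */
        : "=a"(result), "=d"(scratch), "+D"(new_lo)
        : "0"(old_val.a), "1"(old_val.b), "c"(new_val.b), "S"(mem)
        : "memory", "cc");
#elif defined(__i386__) || defined(__x86_64__)
    /* Non-PIC version: ask gcc for ebx directly, so it knows the register
       is in use and will not pick it for anything else. */
    __asm__ __volatile__(
        "lock; cmpxchg8b %2\n\t"
        "setz %%al\n\t"
        "movzbl %%al, %%eax"
        : "=a"(result), "=d"(scratch), "+m"(*mem)
        : "0"(old_val.a), "1"(old_val.b), "c"(new_val.b), "b"(new_val.a)
        : "memory", "cc");
#else
#error "this sketch only covers x86"
#endif
    (void)scratch;  /* edx is clobbered on failure; the value is unused */
    return result;
}
```

Called with the values from the test program above, the first call returns 0 and leaves w at (0,0), and the second returns 1 and leaves w at (12,13).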