On 32-bit x86 (i386) you can read the time-stamp counter with rdtsc by simply doing
unsigned long long result; __asm__ __volatile__("rdtsc" : "=A" (result));
Note, however, that the gcc documentation for the A constraint (written =A above because it is an output operand) carries an important caveat:
Specifies the a or d registers. This is primarily useful for 64-bit integer values (when in 32-bit mode) intended to be returned with the d register holding the most significant bits and the a register holding the least significant bits.
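Thus this is not what you want on amd64 in 64-bit mode. There a 64-bit value fits in a single register, so gcc may satisfy =A with just rax or just rdx, while rdtsc still puts the low 32 bits in eax and the high 32 bits in edx, and you end up with only half of the counter. Follow the example of the kernel code and do the shift by hand: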
#define rdtscll(val) do { \
    unsigned int __a, __d; \
    asm volatile("rdtsc" : "=a" (__a), "=d" (__d)); \
    (val) = ((unsigned long)__a) | (((unsigned long)__d) << 32); \
} while (0)
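The same idea can also be written as an inline function. This is only a minimal sketch: read_tsc is a hypothetical name (not a kernel or library function), and the fixed-width types from <stdint.h> are an assumption made so that the same source gives a full 64-bit result in both 32-bit and 64-bit mode with gcc or clang on x86.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the kernel: read the time-stamp
 * counter on 32-bit or 64-bit x86. rdtsc always returns the low half in
 * eax and the high half in edx, so we ask for both explicitly instead of
 * relying on the "=A" constraint. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));

    /* Widen before shifting so the high half is not lost in 32-bit mode. */
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}
```

Calling it twice and subtracting, for example uint64_t t0 = read_tsc(); followed later by uint64_t dt = read_tsc() - t0;, gives an elapsed tick count. How ticks map to wall-clock time depends on the CPU.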
footnote: why oh why can't everyone use a real 64 bit architecture?