A few people have noticed linux-gate.so.1 listed against their binaries with newer libcs and wondered what it is.
ianw@morrison:~$ ldd /bin/ls
        linux-gate.so.1 =>  (0xffffe000)
        librt.so.1 => /lib/tls/librt.so.1 (0xb7fdb000)
        libacl.so.1 => /lib/libacl.so.1 (0xb7fd5000)
        libc.so.6 => /lib/tls/libc.so.6 (0xb7e9c000)
        libpthread.so.0 => /lib/tls/libpthread.so.0 (0xb7e8a000)
        /lib/ld-linux.so.2 (0xb7feb000)
        libattr.so.1 => /lib/libattr.so.1 (0xb7e86000)
It's actually a shared library that is exported by the kernel to provide a way to make system calls faster. Most architectures have ways of making system calls that are less expensive than taking a full trap; for example, sysenter on x86 (syscall on AMD, I think) and epc on IA64.
To get the gist of how it works, we can first pull it apart. The following program reads the shared object and dumps it to a file on an x86 machine. Note that it's just a kernel page, so you can simply dump getpagesize() bytes should you want to, though you can't call write() on it directly (i.e. you need to memcpy() it and then write the copy). Below I walk the headers to work out the size instead.
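#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <elf.h>
#include <alloca.h>

int main(void)
{
    int i;
    unsigned size = 0;
    char *buf;

    /* address of linux-gate.so.1 as reported by ldd above */
    Elf32_Ehdr *so = (Elf32_Ehdr *)0xffffe000;
    Elf32_Phdr *ph = (Elf32_Phdr *)((char *)so + so->e_phoff);

    /* size = ELF header + program header table + every segment */
    size += so->e_ehsize + (so->e_phentsize * so->e_phnum);
    for (i = 0; i < so->e_phnum; i++) {
        size += ph->p_memsz;
        ph = (Elf32_Phdr *)((char *)ph + so->e_phentsize);
    }

    /* write() can't read straight from the kernel page, so copy it first */
    buf = alloca(size);
    memcpy(buf, so, size);

    int f = open("./kernel-gate.so", O_CREAT|O_WRONLY|O_TRUNC, S_IRWXU);
    if (f < 0) {
        perror("open");
        return 1;
    }

    int w = write(f, buf, size);
    /* errno is only meaningful if write() failed */
    printf("wrote %d (%s)\n", w, w < 0 ? strerror(errno) : "ok");
    close(f);
    return 0;
}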
At this stage you should have a binary you can look at with, say, readelf.
ianw@morrison:~/tmp$ readelf --symbols ./kernel-gate.so

Symbol table '.dynsym' contains 15 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
[--snip--]
    11: ffffe400    20 FUNC    GLOBAL DEFAULT    6 __kernel_vsyscall@@LINUX_2.5
    12: 00000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.5
    13: ffffe440     7 FUNC    GLOBAL DEFAULT    6 __kernel_rt_sigreturn@@LINUX_2.5
    14: ffffe420     8 FUNC    GLOBAL DEFAULT    6 __kernel_sigreturn@@LINUX_2.5
__kernel_vsyscall is the function you call to do the fast syscall magic. But I bet you're wondering just how that gets called?
It's easy if you poke inside the auxiliary vector that the kernel passes to ld.so, the dynamic loader. There are a couple of ways to see it: via an environment flag, by peeking into /proc/self/auxv, or, on PowerPC, as the fourth argument to main().
ianw@morrison:~/tmp$ LD_SHOW_AUXV=1 /bin/true
AT_SYSINFO:      0xffffe400
AT_SYSINFO_EHDR: 0xffffe000
AT_HWCAP:    fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
AT_PAGESZ:       4096
AT_CLKTCK:       100
AT_PHDR:         0x8048034
AT_PHENT:        32
AT_PHNUM:        7
AT_BASE:         0xb7feb000
AT_FLAGS:        0x0
AT_ENTRY:        0x8048960
AT_UID:          1000
AT_EUID:         1000
AT_GID:          1000
AT_EGID:         1000
AT_SECURE:       0
AT_PLATFORM:     i686
Notice how the AT_SYSINFO entry refers to the fast system call function in our kernel shared object (0xffffe400 is the address of __kernel_vsyscall)? Also notice that the AT_SYSINFO_EHDR entry points to the library itself.
If you start to poke through the glibc source code and look at how the sysinfo entry is handled, you can see the dynamic linker will choose to use the library function for system calls if it is available. If the kernel never passes that entry, it can fall back to the old way of doing things.
IA64 works in the same way, although we keep our kernel shared library at 0xa000000000000000. You can see how the shared object is quite an elegant design that allows maximum compatibility across and within architectures, since you have abstracted the calling mechanism away from userspace. A 386 can call the same way as a Pentium IV through the library and the kernel will make sure the appropriate thing is done in __kernel_vsyscall.