A few people have noticed linux-gate.so.1 in their binaries with newer
versions of libc and wondered what it is.
ianw@morrison:~$ ldd /bin/ls
linux-gate.so.1 => (0xffffe000)
librt.so.1 => /lib/tls/librt.so.1 (0xb7fdb000)
libacl.so.1 => /lib/libacl.so.1 (0xb7fd5000)
libc.so.6 => /lib/tls/libc.so.6 (0xb7e9c000)
libpthread.so.0 => /lib/tls/libpthread.so.0 (0xb7e8a000)
/lib/ld-linux.so.2 (0xb7feb000)
libattr.so.1 => /lib/libattr.so.1 (0xb7e86000)
It's actually a shared library that the kernel itself exports to provide
a faster way to make system calls. Most architectures have a mechanism
for making system calls that is less expensive than taking a full trap
through a software interrupt (int $0x80 on x86): for example, sysenter
on x86 (AMD processors also offer syscall) and epc on IA64.
If you want the gist of how it works, we can start by pulling it apart.
The following program dumps the shared object on an x86 machine, where
the kernel maps it at the fixed address 0xffffe000. Since it's just a
single kernel page, you could simply dump getpagesize() bytes; note,
though, that you can't pass the page directly to write() (you need to
memcpy() it into a buffer first). Below I walk the ELF headers to work
out how much to copy.
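#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <elf.h>
#include <alloca.h>

int main(void)
{
	int i;
	unsigned size = 0;
	char *buf;
	/* On 32-bit x86 kernels of this era the gate page sits at a fixed address */
	Elf32_Ehdr *so = (Elf32_Ehdr *)0xffffe000;
	Elf32_Phdr *ph = (Elf32_Phdr *)((char *)so + so->e_phoff);

	/* ELF header + program headers + every segment's memory size
	 * (an overestimate, but it covers everything we need) */
	size += so->e_ehsize + (so->e_phentsize * so->e_phnum);
	for (i = 0; i < so->e_phnum; i++) {
		size += ph->p_memsz;
		ph = (Elf32_Phdr *)((char *)ph + so->e_phentsize);
	}
	/* The object lives in a single page; never read past the end of it */
	if (size > (unsigned)getpagesize())
		size = getpagesize();

	/* write() straight from the kernel page fails, so copy it first */
	buf = alloca(size);
	memcpy(buf, so, size);

	int f = open("./kernel-gate.so", O_CREAT | O_WRONLY | O_TRUNC, S_IRWXU);
	if (f < 0) {
		perror("open");
		return 1;
	}
	int w = write(f, buf, size);
	if (w < 0)
		perror("write");
	else
		printf("wrote %d bytes\n", w);
	close(f);
	return w < 0;
}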
You should now have a file you can examine with a tool such as
readelf.
ianw@morrison:~/tmp$ readelf --symbols ./kernel-gate.so
Symbol table '.dynsym' contains 15 entries:
Num: Value Size Type Bind Vis Ndx Name
[--snip--]
11: ffffe400 20 FUNC GLOBAL DEFAULT 6 __kernel_vsyscall@@LINUX_2.5
12: 00000000 0 OBJECT GLOBAL DEFAULT ABS LINUX_2.5
13: ffffe440 7 FUNC GLOBAL DEFAULT 6 __kernel_rt_sigreturn@@LINUX_2.5
14: ffffe420 8 FUNC GLOBAL DEFAULT 6 __kernel_sigreturn@@LINUX_2.5
__kernel_vsyscall is the function you call to do the fast system call
magic. But I bet you're wondering how programs find out where it is.
It's easy if you poke inside the auxiliary vector that the kernel passes
to ld.so, the dynamic loader. There are a couple of ways to see it: via
an environment variable (LD_SHOW_AUXV), by peeking into /proc/self/auxv,
or, on PowerPC, through the fourth argument passed to main().
ianw@morrison:~/tmp$ LD_SHOW_AUXV=1 /bin/true
AT_SYSINFO: 0xffffe400
AT_SYSINFO_EHDR: 0xffffe000
AT_HWCAP: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
AT_PAGESZ: 4096
AT_CLKTCK: 100
AT_PHDR: 0x8048034
AT_PHENT: 32
AT_PHNUM: 7
AT_BASE: 0xb7feb000
AT_FLAGS: 0x0
AT_ENTRY: 0x8048960
AT_UID: 1000
AT_EUID: 1000
AT_GID: 1000
AT_EGID: 1000
AT_SECURE: 0
AT_PLATFORM: i686
Notice how AT_SYSINFO (0xffffe400) refers to the fast system call
function, __kernel_vsyscall, in our kernel shared object? Also notice
that AT_SYSINFO_EHDR (0xffffe000) points to the library itself, that is,
to its ELF header.
If you poke through the glibc source code and look at how the
AT_SYSINFO entry is handled, you can see that the dynamic linker will
use the library function for system calls when it is available. If the
kernel never passes that entry, it falls back to the old way of doing
things (the int $0x80 software interrupt on x86).
IA64 works in the same way, although there we keep our kernel shared
library at 0xa000000000000000. You can see that the shared object is
quite an elegant design: because the calling mechanism is abstracted
away from userspace, it allows maximum compatibility both across and
within architectures. A 386 makes system calls through the library in
exactly the same way as a Pentium 4, and the kernel makes sure that
__kernel_vsyscall does the appropriate thing for the processor it is
running on.