Debugging __thead variables from coredumps

Fri 01 July 2011

I think the old-fashioned coredump is a little under-appreciated these days. I'm not sure when it changed, but I even had to add myself to /etc/security/limits.conf to raise my ulimit to even create one.

Anyway, debugging __thread variables from coredumps is a bit of a pain. For the uninitiated, the __thread identifier specifies a thread-local variable, i.e. every thread gets its own copy of the variable automatically.

The implementation of this is highly architecture specific. The reason is that TLS entries need to be accessed via a register kept as part of the thread state, and thus every architecture chooses their own register and builds their own ABI. On x86-32, which is very register-limited, you certainly don't want to dedicate a register to a pointer to TLS variables and take it out of operation. Luckily there is the hang-over from the 70's (60's? 50's?) — segmentation. Without going into real detail, segment registers can be used to offset into a region of memory based on a look-up of a region descriptor stored in a table.

Above, you see a simplified example of the %gs register loaded with the index value 2, and thus when you access %gs:20 what you are saying is "find entry 2 in the global descriptor table (GDT), follow it and offset 20 into that region.

The kernel gives each thread its own GDT (i.e. the GDT register is part of the thread-state). Thus __thread variables are stored based on segment offsets and — voila — thread-local storage. Now, there's a few tricks here. For various reasons, a process can not setup entries in the GDT; this is a privileged operation that must be done by the kernel. There is actually a special system call for threads to setup their TLS areas in the GDT — set_thread_area. When a new thread starts, the thread-library and dynamic linker conspire to allocate and load any static TLS data (i.e. if you have a global __thread variable initialised to some value, then every thread must see that value when it starts) and then calls this to make sure the variables are ready to go. After that, the gs register is filled with the index of that GDT entry, and all TLS access goes via it. That, in a nut-shell, is TLS for x86-32.

Now, to the problem. Take, for example, the following short program:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

int __thread foo;

void* thread(void *in) {

        foo = (int)in;

        printf("foo is %d\n", foo);

        while (1) {
                sleep(10);
        }
}

int main(void) {

        pthread_t threads[5];
        int i;

        for (i=0; i<5; i++) {
                pthread_create(&threads[i], NULL,
                               thread, (void*)i);
        }

        sleep(5);
        abort();
}

We start a few threads and then abort to make it dump core. But, if you try and examine foo:

$ gdb ./thread core
GNU gdb (GDB) 7.2-debian
Reading symbols from /home/ianw/tmp/thread/thread...done.
[New Thread 4970]
[New Thread 4975]
[New Thread 4974]
[New Thread 4973]
[New Thread 4972]
[New Thread 4971]

Core was generated by `./thread'.
Program terminated with signal 6, Aborted.
#0  0xffffe424 in __kernel_vsyscall ()
(gdb) print foo
Cannot find thread-local variables on this target

It seems that gdb doesn't know how to find the value of foo because its not a variable in the usual sense ("this target", in this case, means a coredump). It relies on accessing via the gs register, which relies on the current processes' GDT state, which has since been destroyed. If you care to consult the canonical source of TLS info, you can find out exactly why this is so hard to figure out generically. However, with some work, we can start to figure out the value by hand.

A coredump is really a ELF file full of just two things: a bunch of LOAD segments that are just dumps of the process memory regions, and a NOTE section that includes a bunch of notes that the kernel dumps out for us such as the current register state, the process id, the signal that killed us, etc. Here's an example of a core file under readelf

$ readelf --headers ./core
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
...
  OS/ABI:                            UNIX - System V
...
  Type:                              CORE (Core file)
...
There are no sections in this file.

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  NOTE           0x0003d4 0x00000000 0x00000000 0x006b4 0x00000     0
  LOAD           0x001000 0x08048000 0x00000000 0x00000 0x01000 R E 0x1000
  LOAD           0x001000 0x08049000 0x00000000 0x01000 0x01000 RW  0x1000
  LOAD           0x002000 0x0804a000 0x00000000 0x21000 0x21000 RW  0x1000
  LOAD           0x023000 0x46015000 0x00000000 0x00000 0x1b000 R E 0x1000
  LOAD           0x023000 0x46030000 0x00000000 0x01000 0x01000 R   0x1000
  LOAD           0x024000 0x46031000 0x00000000 0x01000 0x01000 RW  0x1000
...

You can examine the notes; with readelf we see:

$ readelf --notes ./core

Notes at offset 0x000003d4 with length 0x000006b4:
  Owner                 Data size      Description
  CORE                 0x00000090      NT_PRSTATUS (prstatus structure)
  CORE                 0x0000007c      NT_PRPSINFO (prpsinfo structure)
  CORE                 0x000000a0      NT_AUXV (auxiliary vector)
  LINUX                0x00000030      Unknown note type: (0x00000200)
  CORE                 0x00000090      NT_PRSTATUS (prstatus structure)
  LINUX                0x00000030      Unknown note type: (0x00000200)
  CORE                 0x00000090      NT_PRSTATUS (prstatus structure)
  LINUX                0x00000030      Unknown note type: (0x00000200)
  CORE                 0x00000090      NT_PRSTATUS (prstatus structure)
  LINUX                0x00000030      Unknown note type: (0x00000200)
  CORE                 0x00000090      NT_PRSTATUS (prstatus structure)
  LINUX                0x00000030      Unknown note type: (0x00000200)
  CORE                 0x00000090      NT_PRSTATUS (prstatus structure)
  LINUX                0x00000030      Unknown note type: (0x00000200)

This shows 5 notes of type NT_PRSTATUS; it should be little surprise that each of these notes describes the process status of one running thread. When gdb pops up [New Thread 4973] that's because it just hit another note that describes the new thread.

To actually make sense of the note, however, we need some other tools that look deeper. The elfutils based tools give us a better description of the various notes in the coredump; more akin to what GDB is interpreting them as. Below I've extracted one thread's info:

$ eu-readelf --notes core
...
  CORE                 144  PRSTATUS
    info.si_signo: 6, info.si_code: 0, info.si_errno: 0, cursig: 6
    sigpend: <>
    sighold: <>
    pid: 4970, ppid: 4960, pgrp: 4970, sid: 4960
    utime: 0.000000, stime: 0.000000, cutime: 0.000000, cstime: 0.000000
    orig_eax: 270, fpvalid: 0
    ebx:           4970  ecx:           4970  edx:              6
    esi:              0  edi:     1176018932  ebp:     0xbfa5b710
    eax:              0  eip:     0xffffe424  eflags:  0x00000206
    esp:     0xbfa5b6f8
    ds: 0x007b  es: 0x007b  fs: 0x0000  gs: 0x0033  cs: 0x0073  ss: 0x007b
  LINUX                 48  386_TLS
    index: 6, base: 0xb6043b70, limit: 0x000fffff, flags: 0x00000051
    index: 7, base: 0x00000000, limit: 0x00000000, flags: 0x00000028
    index: 8, base: 0x00000000, limit: 0x00000000, flags: 0x00000028

What's particularly interesting here is that the note type that was previously unknown (Unknown note type: (0x00000200)) has been resolved for us — it is in fact of type NT_386_TLS and is a dump of the GDT entries for the thread.

So, if we examine how our function is accessing our TLS variable by disassembling the thread function:

(gdb) disassemble thread
Dump of assembler code for function thread:
   0x08048514 <+0>:    push   %ebp
   0x08048515 <+1>:    mov    %esp,%ebp
   0x08048517 <+3>:    sub    $0x18,%esp
   0x0804851a <+6>:    mov    0x8(%ebp),%eax
   0x0804851d <+9>:    mov    %eax,%gs:0xfffffffc
   0x08048523 <+15>:   mov    %gs:0xfffffffc,%edx
   0x0804852a <+22>:   mov    $0x8048670,%eax
   0x0804852f <+27>:   mov    %edx,0x4(%esp)
   0x08048533 <+31>:   mov    %eax,(%esp)
   0x08048536 <+34>:   call   0x80483f4 <printf@plt>
   0x0804853b <+39>:   movl   $0xa,(%esp)
   0x08048542 <+46>:   call   0x8048404 <sleep@plt>
   0x08048547 <+51>:   jmp    0x804853b <thread+39>

Examining the disassembly we can see that foo is accessed via an offset of -4 from %gs (this is OK, as our limit value is maxed out. See the TLS ABI doc for more info). Now, we can examine gs and see which selector it is telling us to use:

(gdb) print $gs >> 3
$31 = 6

Above we shift out the last 3 bits, as these refer to the privilege level (bits 0 and 1) and if this is a GDT or LDT reference (bit 2). Thus, looking at the GDT descriptor for index 6:

LINUX                 48  386_TLS
  index: 6, base: 0xb6043b70, limit: 0x000fffff, flags: 0x00000051

we can finally do some maths from the base-address to figure out the value:

(gdb) print *(int*)(0xb6043b70 - 4)
$34 = 3

Thus we have found our TLS value; in this case for thread 3 the value is indeed 3. I caution this is the simplest possible case; other "models" (see the TLS doc, again) may not be so simple to work out by hand, but this would certainly be how you would start.