Poking around inside the compiler

Shehjar, one of our students, came up with this interesting little snippet

int i = 5;
printf("%d %d %d", i, i--, ++i);

and wondered exactly what it should do. Well, the C standard says that the arguments to a function are all evaluated before the call, but it does not specify the order in which they are evaluated. Worse, ++ and friends don't create sequence points within the code (points where all earlier side effects are known to be complete), so modifying i more than once in the same argument list is a classic example of undefined behaviour (in fact it's even in the GCC manual's list of commonly reported non-bugs).
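For contrast, here is a minimal sketch of how the same three values can be computed with a well-defined order, by giving each side effect its own statement. The helper names ordered_args and print_ordered are made up for this illustration, and left-to-right is just one of the orders a compiler might have picked.

```c
#include <stdio.h>

/* Hypothetical helper: computes the three argument values in a fixed,
   left-to-right order. Each statement ends in a sequence point, so the
   result no longer depends on the compiler. */
void ordered_args(int i, int out[3])
{
    out[0] = i;     /* first argument: current value of i   */
    out[1] = i--;   /* second: old value, then decrement i  */
    out[2] = ++i;   /* third: increment i, then use it      */
}

/* Prints the values for i = 5, which come out as "5 5 5". */
void print_ordered(void)
{
    int args[3];

    ordered_args(5, args);
    printf("%d %d %d\n", args[0], args[1], args[2]);
}
```

Calling print_ordered() from main prints 5 5 5 everywhere, which is exactly what the original one-liner cannot promise.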

Of course, gcc doesn't leave you blind to this. If you turn on -Wall you get warning: operation on 'i' may be undefined. You do compile with -Wall, right?

But this doesn't change the fact that that code prints 5,6,5 on x86 and on Itanium it prints 5,5,5. This suggested some investigation.

Firstly, let's simplify this down to

extern int blah(int, int, int);

void function(void)
{
        int i = 5;
        blah(i, i--, ++i);
}

The problem is, looking at the output of the assembly doesn't tell us that much, because GCC is smart and by the time it has got to assembly, it is simply moving constant values into registers. So we need some way to peek into what GCC thinks it's doing to come up with those values.

Hence the magic -fdump family of flags. Building with these gives us dumps of the stages GCC goes through as it generates its code.

$ gcc -fdump-tree-all -S -O ./test.c

This gives us lots of dumps, numbered in the order gcc produces them. With modern GCCs, a good place to pick things up is the .gimple output file. Gimplification is the process by which GCC lowers its internal tree structure into a simpler form that it can then optimise (called GIMPLE after SIMPLE, apparently).

The .gimple file is pretty-printed from the tree representation back into C-like code. So, on i386 it looks like this:

x86 $ cat test.c.t08.gimple

;; Function function (function)

function ()
{
  int i.0;
  int i;

  i = 5;
  i = i + 1;
  i.0 = i;
  i = i - 1;
  blah (i, i.0, i);
}

And on Itanium it looks like this:

ia64 $ cat test.c.t08.gimple
;; Function function (function)

function ()
{
  int i.0;
  int i;

  i = 5;
  i.0 = i;
  i = i - 1;
  i = i + 1;
  blah (i, i.0, i);
}

At this point, we can clearly see how gcc ends up with the idea that the code translates into 5,6,5 on x86 and 5,5,5 on Itanium. We also know that it is not the fault of optimisation, since what it is about to optimise already matches the output we get. Why exactly it reaches this conclusion on the different architectures escapes me at the moment, but I'm slowly reading through and understanding the GCC code base, hoping to one day find enlightenment :) Certainly passing arguments is very different on x86 compared to IA64 (on the stack rather than via registers), so no doubt some small piece of the backend config tweaks this into happening.
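To check the reading of the two dumps, they can be transcribed back into plain C. This is only a sketch: the struct args type and the function names eval_x86 and eval_ia64 are invented here, and i0 stands in for the temporary that the dump calls i.0.

```c
/* The three values that end up being passed to blah(), in order. */
struct args { int a, b, c; };

/* Statement-for-statement copy of the x86 .gimple dump:
   the ++i is done first, then the i-- through the temporary. */
struct args eval_x86(void)
{
    struct args r;
    int i0;
    int i;

    i = 5;
    i = i + 1;
    i0 = i;
    i = i - 1;
    r.a = i; r.b = i0; r.c = i;   /* blah (i, i.0, i) */
    return r;
}

/* Statement-for-statement copy of the ia64 .gimple dump:
   the i-- is done first, then the ++i. */
struct args eval_ia64(void)
{
    struct args r;
    int i0;
    int i;

    i = 5;
    i0 = i;
    i = i - 1;
    i = i + 1;
    r.a = i; r.b = i0; r.c = i;   /* blah (i, i.0, i) */
    return r;
}
```

eval_x86() yields 5, 6, 5 and eval_ia64() yields 5, 5, 5, matching what the real programs print.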

Another interesting thing to look at is the raw output of the tree gcc is using. You can see that via something like

$ gcc -S -fdump-tree-all-raw  -o test test.c

Which gives you an output like

;; Function function (function)

function ()
@1      function_decl    name: @2       type: @3       srcp: test.c:4
                         extern         body: @4
@2      identifier_node  strg: function lngt: 8
@3      function_type    size: @5       algn: 8        retn: @6
                         prms: @7
@4      bind_expr        type: @6       vars: @8       body: @9
@5      integer_cst      type: @10      low : 8
@6      void_type        name: @11      algn: 8
@7      tree_list        valu: @6
@8      var_decl         name: @12      type: @13      scpe: @1
                         srcp: test.c:6                artificial
                         size: @14      algn: 32       used: 1
@9      statement_list   0   : @15      1   : @16      2   : @17
                         3   : @18      4   : @19
@10     integer_type     name: @20      unql: @21      size: @22
                         algn: 64       prec: 36       unsigned
                         min : @23      max : @24
@11     type_decl        name: @25      type: @6       srcp: <built-in>:0
@12     identifier_node  strg: i.0      lngt: 3
@13     integer_type     name: @26      size: @14      algn: 32
                         prec: 32       min : @27      max : @28
@14     integer_cst      type: @10      low : 32
@15     modify_expr      type: @6       op 0: @29      op 1: @30
@16     modify_expr      type: @13      op 0: @29      op 1: @31
@17     modify_expr      type: @13      op 0: @8       op 1: @29
@18     modify_expr      type: @13      op 0: @29      op 1: @32
@19     call_expr        type: @13      fn  : @33      args: @34
@20     identifier_node  strg: bit_size_type           lngt: 13
@21     integer_type     name: @20      size: @22      algn: 64
                         prec: 36
@22     integer_cst      type: @10      low : 64
@23     integer_cst      type: @10      low : 0
@24     integer_cst      type: @10      low : -1
@25     identifier_node  strg: void     lngt: 4
@26     type_decl        name: @35      type: @13      srcp: <built-in>:0
@27     integer_cst      type: @13      high: -1       low : -2147483648
@28     integer_cst      type: @13      low : 2147483647
@29     var_decl         name: @36      type: @13      scpe: @1
                         srcp: test.c:5                size: @14
                         algn: 32       used: 1
@30     integer_cst      type: @13      low : 5
@31     plus_expr        type: @13      op 0: @29      op 1: @37
@32     minus_expr       type: @13      op 0: @29      op 1: @37
@33     addr_expr        type: @38      op 0: @39
@34     tree_list        valu: @29      chan: @40
@35     identifier_node  strg: int      lngt: 3
@36     identifier_node  strg: i        lngt: 1
@37     integer_cst      type: @13      low : 1
@38     pointer_type     size: @14      algn: 32       ptd : @41
@39     function_decl    name: @42      type: @41      srcp: test.c:1
                         undefined      extern
@40     tree_list        valu: @8       chan: @43
@41     function_type    size: @5       algn: 8        retn: @13
                         prms: @44
@42     identifier_node  strg: blah     lngt: 4
@43     tree_list        valu: @29
@44     tree_list        valu: @13      chan: @45
@45     tree_list        valu: @13      chan: @46
@46     tree_list        valu: @13      chan: @7

You can see that each @X reference points to another node, and together these build up an entire tree structure of nodes which represents the program. You can read more about the internal representation here.
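As a rough illustration of how nodes that refer to each other by number can represent a program, here is a toy node table in C. It is not GCC's real data structure: the struct layout, the eval function and the array indices are all invented for this sketch, and only the node kind names and the statements are borrowed from the dump above.

```c
/* Node kinds, named after the ones that appear in the dump. */
enum kind { INTEGER_CST, VAR_DECL, PLUS_EXPR, MINUS_EXPR, MODIFY_EXPR };

/* A node refers to its operands by index, much as the dump uses @N. */
struct node {
    enum kind kind;
    int op0;     /* index of the first operand, or -1  */
    int op1;     /* index of the second operand, or -1 */
    int value;   /* the constant, or a variable's current value */
};

/* Evaluate node n of table t by following its operand references. */
int eval(struct node *t, int n)
{
    struct node *p = &t[n];

    switch (p->kind) {
    case INTEGER_CST:
    case VAR_DECL:
        return p->value;
    case PLUS_EXPR:
        return eval(t, p->op0) + eval(t, p->op1);
    case MINUS_EXPR:
        return eval(t, p->op0) - eval(t, p->op1);
    case MODIFY_EXPR:
        /* op0 is the variable being assigned, op1 the new value */
        t[p->op0].value = eval(t, p->op1);
        return t[p->op0].value;
    }
    return 0;
}

/* Run the four assignments from the dump and report i and i.0. */
void run_statements(int *i_out, int *i0_out)
{
    struct node t[] = {
        { VAR_DECL,    -1, -1, 0 },   /* 0: i     (compare @29) */
        { VAR_DECL,    -1, -1, 0 },   /* 1: i.0   (compare @8)  */
        { INTEGER_CST, -1, -1, 5 },   /* 2: 5     (compare @30) */
        { INTEGER_CST, -1, -1, 1 },   /* 3: 1     (compare @37) */
        { PLUS_EXPR,    0,  3, 0 },   /* 4: i + 1 (compare @31) */
        { MINUS_EXPR,   0,  3, 0 },   /* 5: i - 1 (compare @32) */
        { MODIFY_EXPR,  0,  2, 0 },   /* 6: i = 5     (compare @15) */
        { MODIFY_EXPR,  0,  4, 0 },   /* 7: i = i + 1 (compare @16) */
        { MODIFY_EXPR,  1,  0, 0 },   /* 8: i.0 = i   (compare @17) */
        { MODIFY_EXPR,  0,  5, 0 },   /* 9: i = i - 1 (compare @18) */
    };
    int s;

    for (s = 6; s <= 9; s++)          /* the statement list, in order */
        eval(t, s);
    *i_out = t[0].value;
    *i0_out = t[1].value;
}
```

Running the four statements leaves i at 5 and i.0 at 6, the same values the x86 .gimple dump passes to blah.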

Take home point? Using these dumps is a good way to start peering into what GCC is actually doing underneath its fairly extensive hood.

How Linux on IA64 boots, roughly

[Figure: IA64 Linux boot flowchart]

I toyed with the idea of vectorising this, but it seemed too hard.

In essence, SAL (System Abstraction Layer) starts the machine with one processor running (the boot processor) and the others asleep.

The boot processor jumps to _start in head.S, which checks a variable task_for_booting_cpu to see if it is the boot processor or not.

After that, it jumps into platform specific code and gets the machine ready to go. Eventually we fall into smp_init() which starts the other processors. For each other processor in the system we call do_boot_cpu which sends an IPI (inter-processor interrupt) wakeup to the other processor.

The other processor wakes up and again jumps into _start. This time, however, when it checks task_for_booting_cpu it finds it set to the idle thread, so it knows it is not the boot processor. It jumps into start_secondary and follows more or less the same path, skipping the platform setup stuff. Eventually it calls smp_callin to flag back to the boot processor that it is alive and sitting in the idle thread.

The boot processor waits a few seconds for each CPU to check in as alive before assuming the worst and moving on. Once all the CPUs are online, the system is pretty much booted.
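The decision made at _start and the check-in loop can be modelled with a few lines of C. This is a toy model of the description above, not kernel code: struct task, choose_path, bring_up and the responds array are all invented for the illustration, and the real wakeup and timeout are replaced by a simple flag per CPU.

```c
#include <stddef.h>

#define MAX_CPUS 64

struct task { int cpu; };   /* stand-in for an idle thread */

enum path { BOOT_PATH, SECONDARY_PATH };

/* The check described above: if task_for_booting_cpu is not set, we
   are the boot processor; otherwise we are a secondary processor. */
enum path choose_path(const struct task *task_for_booting_cpu)
{
    return task_for_booting_cpu == NULL ? BOOT_PATH : SECONDARY_PATH;
}

/* Toy version of the boot processor waking the others. responds[cpu]
   says whether that CPU would check in before the wait runs out.
   Returns how many CPUs end up online, boot processor included. */
int bring_up(int ncpus, const int responds[])
{
    struct task idle[MAX_CPUS];
    int online = 1;   /* the boot processor itself */
    int cpu;

    if (ncpus > MAX_CPUS)
        ncpus = MAX_CPUS;

    for (cpu = 1; cpu < ncpus; cpu++) {
        idle[cpu].cpu = cpu;
        /* The woken CPU sees its idle thread and goes the secondary way. */
        if (responds[cpu] && choose_path(&idle[cpu]) == SECONDARY_PATH)
            online++;
        /* Otherwise: assume the worst and move on to the next CPU. */
    }
    return online;
}
```

With four CPUs of which one never answers, bring_up reports three online and carries on, which is the behaviour described above.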

Import a patch into subversion

I keep expecting there to be an easy way to do this (feel free to comment if there is), but I wanted to replicate the BitKeeper bk import -tpatch behaviour with subversion, where you give it a diff and it is imported into the repository. Importantly, this involves adding and removing files as per the patch.

I thus wrote svnapply.py to do this. It actually turned out to be a little more in depth than I had hoped, as there are a few nuances with different diff outputs.

I even tested it by importing a 33MB kernel diff, comparing an export of the result to a tree patched using the usual patch, and also re-creating the diff and applying it again, once more comparing to the normally patched version. It all came out the same.
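To give a flavour of the add/remove detection, here is a small C sketch that classifies one file in a diff from its two header lines. It is not taken from svnapply.py: the names are invented, and it assumes the flavour of unified diff that writes /dev/null for the missing side, which is exactly the sort of thing that varies between diff outputs.

```c
#include <string.h>

enum action { FILE_MODIFIED, FILE_ADDED, FILE_REMOVED };

/* Classify a file from its "--- old" and "+++ new" header lines.
   Assumption: a missing file is written as /dev/null. */
enum action classify(const char *old_hdr, const char *new_hdr)
{
    static const char no_old[] = "--- /dev/null";
    static const char no_new[] = "+++ /dev/null";

    if (strncmp(old_hdr, no_old, sizeof no_old - 1) == 0)
        return FILE_ADDED;      /* nothing before: needs an svn add */
    if (strncmp(new_hdr, no_new, sizeof no_new - 1) == 0)
        return FILE_REMOVED;    /* nothing after: needs an svn rm   */
    return FILE_MODIFIED;       /* ordinary change, patch in place  */
}
```

A diff made with other tools or options may mark new and deleted files differently (for example with timestamps or empty hunks), so a real importer needs more cases than this.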

Comparing Montecito and Prescott cores

With the release of the Montecito manuals I wondered about the relative sizes of Itanium and Pentium.

Below is a quick hack comparison between a Montecito dual-core processor and a Prescott (early Pentium 4) core (I used this because both were made with a 90nm process). I've very roughly chopped off the L2 cache of the Prescott and tried to superimpose it over the Itanium core. Bearing in mind that the extra yellow bits are L2 cache on the Itanium, and that the two have roughly the same L1 cache (16K L1I on both; 16K L1D on the Itanium 2, 12K L1D on Prescott), the Itanium core logic comes out looking about 10-15% smaller.

[Figure: Montecito and Prescott]

I got the Prescott die sizes from chip-architect.com and the Montecito sizes from the picture in the Wikipedia article. If you want to check my maths, one pixel across is 0.0573mm and one pixel down is 0.575mm.

As you can see, the Itanium has a lot of cache. I will be interested to see how the Montecito stands up against the new Sun and IBM offerings over the next few months.

How to cancel a pending CVS remove

If you accidentally do a local cvs rm on a file and want it back (and you haven't committed, of course), just cvs add it and it will restore the latest version.

The trick is to not copy back the missing file and try to add that, otherwise you'll get an error about how the file should be removed and is still there (or is back again). If you have a later version of the file you are trying to resurrect, move it back after you do the (re)add.

Nibble frequencies in a kernel image

Have you ever wondered what a frequency histogram of nibbles in a dump of an uncompressed Itanium Linux kernel image looks like? Well, wonder no more:

0x0     6175200  37.0%  *******************************************************
0x1     1612596   9.7%  ***************
0x2     1261945   7.6%  ************
0x3     632509    3.8%  ******
0x4     1098759   6.6%  **********
0x5     590878    3.5%  ******
0x6     753510    4.5%  *******
0x7     576100    3.4%  ******
0x8     1194081   7.1%  ***********
0x9     462349    2.8%  *****
0xa     408003    2.4%  ****
0xb     275851    1.7%  ***
0xc     439190    2.6%  ****
0xd     263658    1.6%  ***
0xe     353211    2.1%  ****
0xf     607788    3.6%  ******

I swear I was not procrastinating when I came up with this; I was (am) trying to find a bug where I get a totally crap value in a register, and I wondered if that value was somehow being pulled out of the kernel code.
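For anyone wanting to try this on their own image, counting nibbles takes only a few lines of C. This is a minimal sketch rather than the program that produced the table above; count_nibbles is a made-up name, and reading the file into the buffer and drawing the bars are left out.

```c
#include <stddef.h>

/* Count how often each 4-bit value occurs in buf[0..len-1].
   Every byte contributes two nibbles: its high half and its low half. */
void count_nibbles(const unsigned char *buf, size_t len,
                   unsigned long counts[16])
{
    size_t k;

    for (k = 0; k < 16; k++)
        counts[k] = 0;

    for (k = 0; k < len; k++) {
        counts[buf[k] >> 4]++;     /* high nibble */
        counts[buf[k] & 0x0f]++;   /* low nibble  */
    }
}
```

The percentage for each row is then counts[n] divided by twice the number of bytes.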

Terse guide to getting a cross compiler on Debian

It's not exactly point and click, but it does work, eventually. This talks about ia64; substitute your target architecture.

  • apt-get source binutils.
  • Build with TARGET=ia64 fakeroot debian/rules binary-cross, install.
  • apt-get source gcc-4.0 and run GCC_TARGET=ia64 ./debian/rules control.
  • Now have a look at debian/control and take note of the packages with -ia64-cross. You need to make these with dpkg-cross. Download the packages from the archive and run dpkg-cross -a ia64 -b package.deb. Install. There is also a tool in unstable called apt-cross which makes this painless.
  • Try building gcc with GCC_TARGET=ia64 DEB_CROSS_INDEPENDENT=yes dpkg-buildpackage -rfakeroot. It will tell you if there are any missing dependencies.
  • Build should complete, and you should have ia64-linux-gnu-blah tools.
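A quick sanity check once the tools are installed is to build a trivial file with the new compiler and see which target it thinks it is compiling for. The sketch below relies on __ia64__, which GCC predefines when targeting IA-64; the function names are made up, and the compiler name ia64-linux-gnu-gcc is only a guess based on the tool prefix above.

```c
#include <stdio.h>

/* Reports the target this file was compiled for, using the
   __ia64__ macro that GCC predefines for IA-64 targets. */
const char *target_name(void)
{
#if defined(__ia64__)
    return "ia64";
#else
    return "something other than ia64";
#endif
}

/* Call this from main() to print the result. */
void report_target(void)
{
    printf("compiled for %s\n", target_name());
}
```

The binary from the cross compiler won't run on the build machine, of course, but running file on it, or dumping the predefined macros with -dM -E, should show whether the right target was picked.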

The only problem with this approach is that your packages depend on the gcc-4.0-base package, and if this is updated in the archives you need to rebuild all your cross compilers. Considering this package consists of readme files, this is slightly annoying.

Update: I have written a program get-from-pool.py to help you find the files for the other architecture which you need to build with dpkg-cross.

Update 2: I have a patch that enables you to build packages without depending on the underlying system. See Debian bug #347484.

Update 3: updated for today's environment.