Shehjar, one of our students, came up with this interesting little snippet:
int i = 5; printf("%d %d %d", i, i--, ++i);
and wondered exactly what it should do. Well, the C standard says that a function's arguments are evaluated before the call, but does not specify the order in which they are evaluated. Worse, this expression modifies i twice with no intervening sequence point, so it is a classic example of undefined behaviour (in fact it's even listed in the "Non-bugs" section of the GCC manual). ++ and friends don't actually create sequence points within an expression (points where side effects are known to have completed), so they can play host to all sorts of undefined behaviour like the above.
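If you actually care about the ordering, the fix is to create the sequence points yourself by splitting the expression into separate statements, since each full statement ends with one. A minimal sketch, picking one explicit ordering (a and b are names I've introduced):

#include <stdio.h>

int main(void)
{
        int i = 5;
        int a = i;      /* capture i before any modification */
        int b = i--;    /* sequence point at the end of each statement */
        ++i;
        printf("%d %d %d\n", a, b, i);  /* well-defined: prints 5 5 5 */
        return 0;
}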
Of course, gcc doesn't leave you blind to this. If you turn on -Wall you get warning: operation on 'i' may be undefined. You do compile with -Wall, right?
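For example, with the snippet above saved in a file (the file name and line number here are illustrative, and the exact wording varies between GCC versions):

$ gcc -Wall -c snippet.c
snippet.c: In function 'main':
snippet.c:5: warning: operation on 'i' may be undefined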
But this doesn't change the fact that the code prints 5,6,5 on x86, while on Itanium it prints 5,5,5. This suggested some investigation was in order.
Firstly, let's simplify this down to:
extern int blah(int, int, int);


void function(void) {
        int i = 5;
        blah(i, i--, ++i);
}
The problem is that looking at the assembly output doesn't tell us much, because GCC is smart: by the time it has got to assembly, it is simply moving constant values into registers. So we need some way to peek at what GCC thinks it's doing when it comes up with those values.
Hence the magic -fdump family of flags. Building with these gives us dumps of the stages GCC goes through as it generates its code.
$ gcc -fdump-tree-all -S -O ./test.c
This gives us lots of dumps, numbered in the order gcc works through them. With modern GCCs, a good place to pick things up is the .gimple output file. Gimplification is the process where GCC lowers and simplifies its internal tree representation into a form called GIMPLE (named after SIMPLE, apparently).
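For instance, gimplification breaks compound expressions down into sequences of simple statements, introducing temporaries so that each statement does at most one operation. A rough illustration (t1 is a made-up temporary name, not something GCC emits):

        x = a + b * c;

becomes something like

        t1 = b * c;
        x = a + t1;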
The .gimple file is pretty-printed from the tree representation back into C-like code. So, on i386 it looks like:
x86 $ cat test.c.t08.gimple
;; Function function (function)

function ()
{
  int i.0;
  int i;

  i = 5;
  i = i + 1;
  i.0 = i;
  i = i - 1;
  blah (i, i.0, i);
}
And on Itanium it looks like:
ia64 $ cat test.c.t08.gimple
;; Function function (function)

function ()
{
  int i.0;
  int i;

  i = 5;
  i.0 = i;
  i = i - 1;
  i = i + 1;
  blah (i, i.0, i);
}
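Transliterating the two sequences back into ordinary C (with i0 standing in for the compiler's temporary i.0) makes the two results easy to reproduce:

#include <stdio.h>

int main(void)
{
        int i, i0;

        /* the x86 ordering */
        i = 5;
        i = i + 1;
        i0 = i;
        i = i - 1;
        printf("%d %d %d\n", i, i0, i);   /* prints 5 6 5 */

        /* the ia64 ordering */
        i = 5;
        i0 = i;
        i = i - 1;
        i = i + 1;
        printf("%d %d %d\n", i, i0, i);   /* prints 5 5 5 */

        return 0;
}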
At this point, we can clearly see how gcc ends up deciding that the code translates into 5,6,5 on x86 and 5,5,5 on Itanium. We also know that it is not the fault of optimisation, since what gcc is about to optimise already matches the output we get. Why exactly it reaches different conclusions on the two architectures escapes me at the moment, but I'm slowly reading through and understanding the GCC code base, hoping to one day find enlightenment :) Certainly passing arguments is very different on x86 compared to IA64 (on the stack rather than via registers), so no doubt some small piece of the backend configuration tweaks this into happening.
Another interesting thing to look at is the raw output of the tree gcc is using. You can see that via something like:
$ gcc -S -fdump-tree-all-raw -o test test.c
This gives you output like:
;; Function function (function)

function ()
@1  function_decl    name: @2    type: @3    srcp: test.c:4    extern    body: @4
@2  identifier_node  strg: function    lngt: 8
@3  function_type    size: @5    algn: 8     retn: @6    prms: @7
@4  bind_expr        type: @6    vars: @8    body: @9
@5  integer_cst      type: @10   low : 8
@6  void_type        name: @11   algn: 8
@7  tree_list        valu: @6
@8  var_decl         name: @12   type: @13   scpe: @1    srcp: test.c:6    artificial    size: @14    algn: 32    used: 1
@9  statement_list   0 : @15    1 : @16    2 : @17    3 : @18    4 : @19
@10 integer_type     name: @20   unql: @21   size: @22   algn: 64    prec: 36    unsigned    min : @23    max : @24
@11 type_decl        name: @25   type: @6    srcp: <built-in>:0
@12 identifier_node  strg: i.0   lngt: 3
@13 integer_type     name: @26   size: @14   algn: 32    prec: 32    min : @27    max : @28
@14 integer_cst      type: @10   low : 32
@15 modify_expr      type: @6    op 0: @29   op 1: @30
@16 modify_expr      type: @13   op 0: @29   op 1: @31
@17 modify_expr      type: @13   op 0: @8    op 1: @29
@18 modify_expr      type: @13   op 0: @29   op 1: @32
@19 call_expr        type: @13   fn : @33    args: @34
@20 identifier_node  strg: bit_size_type    lngt: 13
@21 integer_type     name: @20   size: @22   algn: 64    prec: 36
@22 integer_cst      type: @10   low : 64
@23 integer_cst      type: @10   low : 0
@24 integer_cst      type: @10   low : -1
@25 identifier_node  strg: void  lngt: 4
@26 type_decl        name: @35   type: @13   srcp: <built-in>:0
@27 integer_cst      type: @13   high: -1    low : -2147483648
@28 integer_cst      type: @13   low : 2147483647
@29 var_decl         name: @36   type: @13   scpe: @1    srcp: test.c:5    size: @14    algn: 32    used: 1
@30 integer_cst      type: @13   low : 5
@31 plus_expr        type: @13   op 0: @29   op 1: @37
@32 minus_expr       type: @13   op 0: @29   op 1: @37
@33 addr_expr        type: @38   op 0: @39
@34 tree_list        valu: @29   chan: @40
@35 identifier_node  strg: int   lngt: 3
@36 identifier_node  strg: i     lngt: 1
@37 integer_cst      type: @13   low : 1
@38 pointer_type     size: @14   algn: 32    ptd : @41
@39 function_decl    name: @42   type: @41   srcp: test.c:1    undefined    extern
@40 tree_list        valu: @8    chan: @43
@41 function_type    size: @5    algn: 8     retn: @13   prms: @44
@42 identifier_node  strg: blah  lngt: 4
@43 tree_list        valu: @29
@44 tree_list        valu: @13   chan: @45
@45 tree_list        valu: @13   chan: @46
@46 tree_list        valu: @13   chan: @7
You can see how each @X points to other nodes, building up an entire tree structure representing the program. Following the references through: @19 is the call_expr, whose fn @33 is the address of @39, the function_decl named blah, and whose argument list @34 chains through @40 and @43 back to the variables i (@29) and i.0 (@8). You can read more about the internal representation in the GCC internals documentation.
Take-home point? Using these dumps is a good way to start peering into what GCC is actually doing underneath its fairly extensive hood.