Unfortunately this morning I got hit by a bug where an updated library broke an existing program.
The first thing I noticed was that if I rebuilt the program in question
against the new library, everything worked again. This sort of thing
points to a (probably unintentional) ABI change.
The source code was large, so I needed to try and zero in on what was
happening a bit more. I figured if this was an ABI change, it should
show up in the assembly. Thus I created a dump of both binary images
with objdump --disassemble.
I then ran the two dumps through tkdiff to see where I stood. This turned up
about 1500 differences, but on inspection most of them looked like this:
@@ -2138 +2138 @@
-4000000000006dd0: 01 98 81 03 38 24 [MII] addl r51=7264,r1
+4000000000006dd0: 01 98 01 03 28 24 [MII] addl r51=5184,r1
As you may know, on IA64 r1 is defined as the gp, or global pointer,
register. Functions aren't just functions on IA64; each has a function
descriptor which contains both the function's address and a value for
the global pointer. The addl form of the add instruction takes up to a
22-bit immediate, so by adding to the global pointer you can directly
address a 4MB region of memory (2^22 bytes = 4MB). When gcc builds
your program, it sets r1 to point to the .got section of your
binary. Now between the start of the binary and the GOT there is a whole
bunch of stuff, notably unwind info, and any change in its size can shift
these gp-relative offsets. So we can pretty much ignore all of these
differences when looking for the root of our problem.
So a bit more sed and grep gave me a much reduced list of changes, and
one in particular jumped out:
-4000000000051a2c: 04 00 10 90 mov r38=512
+4000000000051a2c: 24 00 08 90 mov r38=258
This is where the very handy addr2line comes into play. Running that
over the binary gives us
ianw@lime:~/tmp/openssh-3.8.1p1/build-deb$ addr2line --exe ./ssh 4000000000051a2c
../../openbsd-compat/bsd-arc4random.c:60
Peeking at that code:
static RC4_KEY rc4;

void arc4random_stir(void)
{
    unsigned char rand_buf[SEED_SIZE];

    memset(&rc4, 0, sizeof(rc4));    /* line 60 */
    if (RAND_bytes(rand_buf, sizeof(rand_buf)) <= 0
    ... blah blah ...
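This looks a lot like sizeof(RC4_KEY) has changed on us. Because sizeof
is evaluated at compile time, each build bakes its own idea of the size
into the code as a constant, which would explain the 512 versus 258 in
the mov above. If our library has a different idea about the size of
things than we do, it's sure to be a recipe for disaster. A little test
program confirms the hypothesis.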
#include <stdio.h>
#include <openssl/rc4.h>

int main(void)
{
    /* size of RC4_KEY according to the installed OpenSSL headers */
    printf("%zu\n", sizeof(RC4_KEY));
    return 0;
}
-- 0.9.7e-3 --
ianw@lime:~/tmp$ ./test
258
-- 0.9.7g-1 --
ianw@lime:~/tmp$ ./test
512
Of course, the "what" is the easy bit. Finding out why the size is
different is left as an exercise, and is a good reason why your projects
should always keep a ChangeLog in excruciating detail.