On Changes

Debian bug 342658 has illustrated some ways to not run an Open Source project.

Firstly, and by far most importantly, keep a detailed change log. An entry like "Support for Blah has been added" is essentially useless, because it doesn't give someone looking in a detailed view of what has happened. The only sane way to do this is to keep a proper ChangeLog following (something like) the GNU ChangeLog format. For an example of how to do this without fail, see the glibc ChangeLog. I can attest that single ChangeLog file has helped chase down more bugs than anything else, including revision histories. Other good examples are gcc, binutils and emacs. If I can't figure out why something changed from that, I can always go back to the person who did the change to find it, or cross reference it with mailing list postings.

Secondly, if you're writing a library then test cases to stop regressions are of fundamental importance. Again glibc gets this right, where fixes to major bugs must be accompanied by a test case. You should be aiming for correctness and portability, and there's just no way to confirm this unless you actually test your library.

Without these two basic facilities, anyone fresh coming into your code is left treading water without anything to hold on to. And since Open Source is all about getting as many eyes as possible on the code, you need to offer people something to grab on to to avoid drowning in your sea of code.

how not to print a newline

fputc((int)"\n",fp);

IEC 60027-2 Units

It has been pointed out to me that in a previous post I incorrectly suggested the Ki/Mi/Gi style of units for describing binary multiples has something to do with SI units. These style of units are officially described in IEC 60027-2: Letter symbols to be used in electrical technology - Part 2: Telecommunications and electronics, thus you might like to refer to them as "IEC Units" or similar.

The NIST reference has this to say:

It is important to recognise that the new prefixes for binary multiples are not part of the International System of Units (SI), the modern metric system. However, for ease of understanding and recall, they were derived from the SI prefixes for positive powers of ten.

It is a considerable faux pas in many circles to get these things mixed up.

Detecting SEGV read/write on IA64

Today someone asked me why the following code works

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/mman.h>
#include <limits.h>
#include <features.h>


void segv_handler(int sig, siginfo_t *sip, void *c)
{
        struct sigcontext *scp = (struct sigcontext *)c;
        unsigned long i = *((unsigned long*)scp->sc_ip);
        printf("%s fault\n", ((i >> 21) & 1) ? "write" : "read");
        exit(1);
}


int main(void)
{
        struct sigaction sa;
        int *x;
        int m;

        sa.sa_handler = SIG_DFL;
        sa.sa_sigaction = segv_handler;
        sigemptyset(&sa.sa_mask);
        sigaddset(&sa.sa_mask,SIGIO);
        sigaddset(&sa.sa_mask,SIGALRM);

        sa.sa_flags = SA_SIGINFO;

        if (sigaction(SIGSEGV,&sa,NULL)) {
                printf(" Error assigning signal!\n");
        }

        x = valloc(1024);

        mprotect(x,1024,PROT_NONE);  //make it none access

        /* read fault */
        m = x[0];
        /* write fault */
        /* x[0] = 1;   */

        return 0;
}

This appears to be an extremley bad way to find out if you have taken a read or write fault with a SEGV on IA64.

It's actually a fluke that this works at all. Its trying to match part of the instruction opcode to see if it is a load or store but gets it completely wrong. Compile it with ICC and it will fail.

On IA64 we are provided with si_isr, the interrupt status register, which can tell us exactly what has happened. Below is the right way to do it.

#define _GNU_SOURCE 1
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/mman.h>
#include <limits.h>
#include <features.h>


void segv_handler(int sig, siginfo_t *sip, void *c)
{
        struct sigcontext *scp = (struct sigcontext *)c;
        unsigned long i = *((unsigned long*)scp->sc_ip);

        int x = (sip->si_isr >> 32) & 1; // bit 32
        int w = (sip->si_isr >> 33) & 1; // bit 33
        int r = (sip->si_isr >> 34) & 1; // bit 34
        printf("%s|%s|%s fault\n",
               r ? "r" : " ",
               w ? "w" : " ",
               x ? "x" : " ");
        exit(1);
}


int main(void)
{
        struct sigaction sa;
        volatile int *x;
        int m = 100;

        sa.sa_handler = SIG_DFL;
        sa.sa_sigaction = segv_handler;
        sigemptyset(&sa.sa_mask);
        sigaddset(&sa.sa_mask,SIGIO);
        sigaddset(&sa.sa_mask,SIGALRM);

        sa.sa_flags = SA_SIGINFO;

        if (sigaction(SIGSEGV,&sa,NULL)) {
                printf(" Error assigning signal!\n");
        }

        x = valloc(1024);

        mprotect(x,1024,PROT_NONE);  //make it none access

        /* read fault */
        m = x[0];
        /* write fault */
        //x[0] = 100;
        return 0;
}

Pthreads mutex subtleties

Is the following code valid?

#include <stdio.h>
#include <signal.h>
#include <pthread.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void sigint_handler(int sig)
{
        pthread_mutex_unlock(&mutex);
        return;
}


int main(void)
{

        signal(SIGINT, sigint_handler);

        pthread_mutex_lock(&mutex);

        printf("About to wait\n");

        pthread_mutex_lock(&mutex);

        printf("Ok, done!\n");
}

It certainly works on Linux, but on at least FreeBSD it will not as deadlock detection will kick in when the thread tries to lock the mutex it already has locked. Using signals in threaded code is probably a bad idea but I believe the deadlock detection is broken in this case.

Of course, the right way to do this is to use a condition variable and use a pthread_cond_wait() call in main() and wake it up from the signal handler. This works fine.

Moral of the story -- check the return value of pthread_mutex_lock() because sometimes it may not be what you expect! It's much easier to debug a error message about pthread_mutex_lock returning some strange EDEADLCK code than to try and find why your multithreaded application seems to be randomly crashing on another operating system.

hnb - bad code

for an example of how not to do things, hnb is pretty good. it's a shame, because it looked like a handy, console based todo list with an xml backing. I don't mean to trash opensource work, but here's some constructive critisim

  • the code follows no known standard coding convention.

    indent(1L)
    

    can help you there

  • whenever you use the magic cast operator in C, be really really sure you know what you are doing.

    Node *pos=(Node *)data;
    ...
    return (int)pos;
    

    The code does this at least a hundred times! Obviously some code that was copied from somewhere but renders the program usless on any 64 bit machine, where sizeof(int) != sizeof(pointer). This also says something about abstraction, since it does it so many times this really hints it should have been broken out.

  • If someone says "XML", think Python, Perl, Ruby ... anything but C