The problem: I have a large mbox file full of messages, and I want to strip the spam out of it with spamassassin. Message ordering is important (i.e. messages must go in and come out the other end in the same order, with identified spam removed).
My solution is to have formail split the mbox and run a small script on each message:
$ formail < in-mbox -s /home/ianw/spam-clean.py spam-free-box
#!/usr/bin/python
import os, sys, popen2

if len(sys.argv) != 2:
    print "usage: spam-clean.py output"
    sys.exit(1)

#read in message from stdin.
message = sys.stdin.read()

sa = popen2.Popen3("/usr/bin/spamc -c")
sa.tochild.write(message)
sa.tochild.close()

if sa.wait() != 0:
    print "discarding spam ..."
    sys.exit(1)
else:
    print "ok ..."

f = open(sys.argv[1], 'a')
f.write(message)
f.close()
It's slow, but a little profiling shows that most of the time is spent asleep (i.e. in the wait() call, waiting on spamc). One neat project would be to daemonise this and take messages in FIFO order, so we could run a few spamassassins at the same time but still maintain message order.
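As a rough idea of how the "several spamassassins at once, same order out" part might look, here is a minimal sketch. It is not a daemon, and it is written for Python 3 with `subprocess` and `concurrent.futures` rather than the `popen2` module used above. The names `spamc_says_spam` and `filter_in_order`, and the default of four workers, are made up for illustration. It relies on `spamc -c` exiting with status 1 for spam, and on `Executor.map` handing results back in input order even when the checks finish out of order.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def spamc_says_spam(message: bytes) -> bool:
    """Return True if `spamc -c` exits with status 1 (its "is spam" code)."""
    result = subprocess.run(
        ["/usr/bin/spamc", "-c"],
        input=message,
        stdout=subprocess.DEVNULL,  # -c prints a score; only the exit code matters here
    )
    return result.returncode == 1


def filter_in_order(messages, is_spam=spamc_says_spam, workers=4):
    """Yield the non-spam messages in their original order.

    The checks run concurrently in a thread pool. The threads spend their
    time blocked waiting on spamc, which is where the profiling said the
    time goes, so several checks can overlap.
    """
    # Everything is submitted up front, so the whole mailbox sits in memory.
    messages = list(messages)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields verdicts in input order, regardless of completion order.
        verdicts = pool.map(is_spam, messages)
        for message, spam in zip(messages, verdicts):
            if not spam:
                yield message
```

Each message is assumed to be the raw bytes of one mail; something like Python's `mailbox.mbox` could supply them, and the surviving messages could be appended to the output mbox as they are yielded. Passing a different `is_spam` function makes it possible to try the ordering logic without a running spamd.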