Python SocketServer class

The SocketServer module is very nifty, but the built-in documentation is a bit obscure. Below is a simple SocketServer application that listens on port 7000. Run it, telnet localhost 7000, and you should be greeted by a message; type HELLO and you should get a message back, and QUIT should end the session.

import SocketServer, select, sys

COMMAND_HELLO = 1
COMMAND_QUIT  = 2

# The SimpleRequestHandler class uses this to parse command lines.
class SimpleCommandProcessor:
    def __init__(self):
        pass

    def process(self, line, request):
        """Process a command"""
        args = line.split(' ')
        command = args[0].lower()
        args = args[1:]

        if command == 'hello':
            request.send('HELLO TO YOU TOO!\r\n')
            return COMMAND_HELLO
        elif command == 'quit':
            request.send('OK, SEE YOU LATER\r\n')
            return COMMAND_QUIT
        else:
            request.send('Unknown command: "%s"\r\n' % command)


# SimpleServer extends the TCPServer, using the threading mix in
# to create a new thread for every request.
class SimpleServer(SocketServer.ThreadingMixIn, SocketServer.TCPServer):

    # This means the main server will not do the equivalent of a
    # pthread_join() on the new threads.  With this set, Ctrl-C will
    # kill the server reliably.
    daemon_threads = True

    # By setting this we allow the server to re-bind to the address by
    # setting SO_REUSEADDR, meaning you don't have to wait for
    # timeouts when you kill the server and the sockets don't get
    # closed down correctly.
    allow_reuse_address = True

    def __init__(self, server_address, RequestHandlerClass, processor, message=''):
        SocketServer.TCPServer.__init__(self, server_address, RequestHandlerClass)
        self.processor = processor
        self.message = message

# The RequestHandler handles an incoming request.  We have extended
# the SimpleServer class to take a 'processor' argument which we can
# access via the passed in server argument, but we could have stuffed
# all the processing in here too.
class SimpleRequestHandler(SocketServer.BaseRequestHandler):

    def __init__(self, request, client_address, server):
        SocketServer.BaseRequestHandler.__init__(self, request, client_address, server)

    def handle(self):
        self.request.send(self.server.message)

        text = ''
        done = False
        while not done:
            # Block until the client socket has something to read.
            ready_to_read, _, _ = select.select([self.request], [], [], None)
            if self.request not in ready_to_read:
                continue

            data = self.request.recv(1024)
            if not data:
                # An empty read means the client closed the connection.
                break
            text += data

            # Process every complete line we have buffered so far.
            while "\n" in text:
                line, text = text.split("\n", 1)
                line = line.rstrip()

                command = self.server.processor.process(line,
                                                        self.request)

                if command == COMMAND_QUIT:
                    done = True
                    break

        self.request.close()

    def finish(self):
       """Nothing"""

def runSimpleServer():
    # Start up a server on localhost, port 7000; each time a new
    # request comes in it will be handled by a SimpleRequestHandler
    # class; we pass in a SimpleCommandProcessor class that will be
    # able to be accessed in request handlers via server.processor;
    # and a hello message.
    server = SimpleServer(('', 7000), SimpleRequestHandler,
                          SimpleCommandProcessor(), 'Welcome to the SimpleServer.\r\n')

    try:
        server.serve_forever()
    except KeyboardInterrupt:
        sys.exit(0)

if __name__ == '__main__':
    runSimpleServer()
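
The listing above is Python 2. For comparison, here is a minimal sketch of the same idea for Python 3, where the module is called socketserver. It is not a drop-in port: the names LineHandler, ThreadedServer and demo are made up for illustration, it uses StreamRequestHandler instead of the select loop, and it binds to an ephemeral loopback port rather than 7000 so that it can talk to itself instead of needing telnet.

```python
import socket
import socketserver
import threading


class LineHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler: greets the client, then answers one line at a time."""

    def handle(self):
        self.wfile.write(b"Welcome to the SimpleServer.\r\n")
        # rfile is a buffered file object, so iterating yields whole lines.
        for raw in self.rfile:
            command = raw.decode("ascii", "replace").strip().lower()
            if command == "hello":
                self.wfile.write(b"HELLO TO YOU TOO!\r\n")
            elif command == "quit":
                self.wfile.write(b"OK, SEE YOU LATER\r\n")
                break
            else:
                name = command.encode("ascii", "replace")
                self.wfile.write(b'Unknown command: "%s"\r\n' % name)


class ThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    daemon_threads = True        # do not wait for handler threads on exit
    allow_reuse_address = True   # set SO_REUSEADDR before binding


def demo():
    """Start the server on an ephemeral port, send HELLO and QUIT, return replies."""
    server = ThreadedServer(("127.0.0.1", 0), LineHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    replies = []
    with socket.create_connection(server.server_address) as sock:
        with sock.makefile("rwb") as stream:
            replies.append(stream.readline())      # the welcome message
            for line in (b"HELLO\r\n", b"QUIT\r\n"):
                stream.write(line)
                stream.flush()
                replies.append(stream.readline())
    server.shutdown()
    server.server_close()
    return replies


if __name__ == "__main__":
    for reply in demo():
        print(reply)
```

Letting the handler iterate over rfile means the buffering and line splitting that handle() does by hand above is left to the standard library.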

Uninitialised Variables and Optimisation

What is wrong with the following code?

#include <stdio.h>

int main(void)
{
        char **blah;
        char *astring = "hello, world\n";

        *blah = astring;

        printf("%s\n", *blah);

        return 0;
}

Fairly obvious when it is laid out like this: *blah = astring should be blah = &astring (this might be less obvious when it is buried deep within several functions :). blah is uninitialised, so you can't dereference it.

Unfortunately, this code will compile with -Wall with no warnings. This is because of a little fact from the gcc documentation:

-Wuninitialized
           Warn if an automatic variable is used without first being
           initialized or if a variable may be clobbered by a "setjmp"
           call.

           These warnings are possible only in optimizing compilation,
           because they require data flow information that is computed
           only when optimizing.  If you don't specify -O, you simply
           won't get these warnings.

So always turn on at least -O to get the full checking gcc can give you, and you'll probably catch things like the above before they even segfault.

Mandrake 10 for Itanium 2

Seeing as I have broken everyone's access to the main server at work by blowing out the IP quota downloading the DVD ISO of the new Mandrake 10 for Itanium 2, I should at least write a few notes.

The installation was on a stock rx2600 with nothing really very interesting. The first problem was that it doesn't seem to boot from the EFI menu. This was easily fixed by slipping into the EFI console and manually booting the included elilo. This leads into the second problem, which is that by default the console goes to the VGA output, which isn't great when you're using the management card (as would be the usual case with an Itanium machine, I'm guessing). This is also easily fixed by appending console=ttyS0 ... but the default elilo boots straight into Linux without giving a chance to specify options, so you have to quickly break into elilo while it's decompressing the kernel.

At this stage, I managed to boot the installation. I just kept pressing enter until it stopped, when I realised that it was launching the X11 installer. This is a bad choice for an Itanium machine ... especially for a distribution aimed at clusters. I was left with no apparent way to get the installer working ... at a guess I decided to add text as a kernel command line parameter, which seems to drop you into the text installer.

You have to press enter each time it loads a module, and it seems to be loading different modules than it says it is.

The text install gives all sorts of whacky characters and highlights don't work correctly. This might be a terminal setting for the management card, but I'm not sure.

Once I figured out how to tell the partitioner to auto-invoke and just partition the disk how it wanted, the next step simply hung, doing nothing. I restarted and tried as best I could to manually partition, and it still hung doing nothing. I tried again telling it to use the existing partitions, and it still hung.

So I currently haven't got any further. I'll update this if I ever do (I'm trying to subscribe to the beta list but it doesn't seem to exist), but so far the Debian installer is looking pretty good.

Unimplemented Address Bits

Although IA64 provides a full 64 bit address space, not all implementations implement all 64 bits of it. This goes for other processors too; on the Alpha the minimum number of implemented bits is 48 (the same rules about sign extending VAs apply to Alpha too). This means that a virtual address may have a number of unimplemented bits in the middle of it; for example (taken from Mosberger, pg. 150)

                        +---> IMPL_VA_MSB
                        |
+------+----------------+---------------------------+
| VRN  |  unimplemented |   implemented              |
+------+----------------+---------------------------+

This leads to a large hole in the middle of the address space. The thing that leads to a hole in the middle is the extra rule that the unimplemented bits must be sign extended from (i.e. made the same as) the most significant bit of the implemented address bits.

Mosberger uses full 64 bit addresses in the example so it is a little hard to conceptualise this gap, but it's easy if you imagine you only have a three bit machine. Bit 1 is like the VRN, Bit 2 is unimplemented, and Bit 3 is your implemented address bits.

The full address space of the three bit machine looks like

Bit 1 Bit 2 Bit 3
1 1 1
1 1 0
1 0 1
1 0 0
0 1 1
0 1 0
0 0 1
0 0 0

Now, do another table with Bit 2 (the "unimplemented bit") sign extended from Bit 3

Bit 1 Bit 2 Bit 3
1 1 1
1 0 0
1 1 1
1 0 0
0 1 1
0 0 0
0 1 1
0 0 0

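Below, the rows marked with * are the ones whose original address was unimplemented (Bit 2 differed from Bit 3), so sign extension has folded them onto an implemented address. You can see that if you just take Bit 1 and Bit 3, you end up with the entire two bit address space (i.e. 00,01,10,11) and a hole in the middle!

Bit 1 Bit 2 Bit 3
1 1 1
1 0 0 *
1 1 1 *
1 0 0
0 1 1
0 0 0 *
0 1 1 *
0 0 0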

You can think of the addresses as aliased, since if you follow the rules you can only get to addresses in your two bit address space. However, you can of course not follow the rules and ask the processor for a dodgy address; at that point it will raise an exception.
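
The three bit machine is small enough to enumerate in a few lines of Python. This is only a sketch of the toy model described here, not of any real IA64 implementation; the function names are made up, and Bit 1 is taken to be the most significant bit, as in the tables.

```python
def sign_extend_bit2(addr):
    """Return the 3 bit address with Bit 2 forced to equal Bit 3."""
    bit1 = (addr >> 2) & 1   # plays the role of the VRN
    bit3 = addr & 1          # the single implemented address bit
    return (bit1 << 2) | (bit3 << 1) | bit3


def is_implemented(addr):
    """An address follows the rules if sign extension leaves it unchanged."""
    return sign_extend_bit2(addr) == addr


implemented = [format(a, "03b") for a in range(8) if is_implemented(a)]
holes = [format(a, "03b") for a in range(8) if not is_implemented(a)]

print(implemented)  # ['000', '011', '100', '111']
print(holes)        # ['001', '010', '101', '110']
```

Half of the three bit address space follows the rules; the other half is what the processor would raise an exception on.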

Half baked Python Mutex

Suppose you find yourself in the following situation: you have to use separate processes, because Python doesn't really support sending signals with multiple threads, but you need some sort of mutual exclusion between these processes. Your additional Iron Python ingredient is that you don't want to depend on any external modules not shipped with Python that implement SysV IPC or shared objects (though this is probably the most correct solution).

If you're a Python programmer and not a C programmer, you may not realise that mmap can map memory anonymously. In this case, anonymously means that the mapping is not backed by a file, but by system memory. This doesn't seem to be mentioned in the help or the Python documentation, but will work with most any reasonable Unix system.
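
To see the anonymous mapping in action on its own, here is a minimal Python 3 sketch (so bytes are read and written as integers rather than via ord and chr). It assumes a Unix system with os.fork, and the function name is made up for illustration.

```python
import mmap
import os


def share_byte_across_fork():
    """Write a byte in a child process and read it back in the parent."""
    # fileno -1 asks for memory that is not backed by any file.
    shared = mmap.mmap(-1, 1, mmap.MAP_SHARED | mmap.MAP_ANONYMOUS)

    pid = os.fork()
    if pid == 0:
        # Child: the mapping is shared, so the parent will see this write.
        shared[0] = 42
        os._exit(0)

    # Parent: wait for the child to finish before reading.
    os.waitpid(pid, 0)
    return shared[0]


if __name__ == "__main__":
    print(share_byte_across_fork())
```

Without MAP_SHARED the child would be writing to its own private copy and the parent would still read 0.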

Here is a Half Baked Mutex based on anonymous shared memory. This version really is half baked, because it uses the "Bakery Algorithm" designed by Lamport (which Peter Chubb taught me about in distributed systems). It differs slightly though: our maximum ticket number must be less than 256 because of interactions between the Python mmap, which treats the mmaped area as a string, and the ASCII character set (via ord and chr). This means heavily contested locks will be a bit "jerky" as the system waits to free up a few slots before continuing. We have a smattering of calls to sleep to attempt to reduce busy waiting. The only other caveat is "hashing" the PID down to a 1024 byte array; a collision would probably be fatal to the algorithm.

It's not perfect, but something like it might get you out of a bind.

import os
import sys
from time import sleep

class HalfBakedMutex:
    def __init__(self):
        import mmap
        #C is an array of people waiting for a ticket
        self.C = mmap.mmap(-1, 1024,
                                  mmap.MAP_SHARED|mmap.MAP_ANONYMOUS)
        #N is list of tickets people are holding
        self.N = mmap.mmap(-1, 1024,
                           mmap.MAP_SHARED|mmap.MAP_ANONYMOUS)

    def take_a_number(self):
        #pick a number to "see the baker"
        i = os.getpid() % 1024

        #find current maximum number
        while True:
            #indicate we are currently getting a number
            self.C[i] = chr(1)

            max = 0
            for j in range(1024):
                if (ord(self.N[j])) > max:
                    max = ord(self.N[j])
            #max + 1 must stay below 256, as chr(256) will fail
            if (max + 1 < 256):
                break
            else:
                self.C[i] = chr(0)
                sleep(0.1)

        #take next maximum
        self.N[i] = chr(max + 1)
        self.C[i] = chr(0)

    def lock(self):

        #first, take a number to see the baker
        self.take_a_number()

        i = os.getpid() % 1024

        for j in range(1024):
            #make sure the process isn't currently getting a ticket
            while ord(self.C[j]) == 1:
                sleep(0.1)
                continue

            # If process j has a ticket, i.e.
            #    N[j] > 0
            # AND either the process has a lower ticket, or the same
            # ticket and a lower PID, i.e.
            #   (N[j],j) < (N[i],i)
            # wait for it to run
            while (ord(self.N[j]) > 0) and (ord(self.N[j]),j) < (ord(self.N[i]),i) :
                sleep(0.1)
                continue

        #if we made it here, it is our turn to run!
        return

    def unlock(self):
        i = os.getpid() % 1024
        self.N[i] = chr(0)


mut = HalfBakedMutex()

os.fork()
os.fork() # 4 processes
os.fork() # 8 processes
os.fork() # 16 processes
os.fork() # 32 processes
os.fork() # 64 processes

while True:
    mut.lock()
    print(" ------ " + str(os.getpid()) + " ------ ")
    mut.unlock()

PowerPC char nuances

On PowerPC, char defaults to unsigned char, unlike most other architectures (3-8 of the ABI).

All EBCDIC machines seem to have char defined as unsigned. I wouldn't know EBCDIC if it hit me in the face, and I doubt many people born in the 80's would either. What seems more likely is that PowerPC chose this in its ABI due to architectural limitations around type promotion of chars to integers. PowerPC doesn't have an instruction to move and sign extend all at the same time like other architectures. For example, the following code

void f(void)
{
        char a = 'a';
        int i;

        i = a;
}

produces the following ASM on 386

f:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movb    $97, -1(%ebp)
        movsbl  -1(%ebp),%eax
        movl    %eax, -8(%ebp)
        leave
        ret

See the movsbl instruction, which in AT&T syntax means "move sign extending the byte to a long". Now watch the same thing on PowerPC with -fsigned-char

f:
        stwu 1,-32(1)
        stw 31,28(1)
        mr 31,1
        li 0,97
        stb 0,8(31)
        lbz 0,8(31)
        extsb 0,0
        stw 0,12(31)
        lwz 11,0(1)
        lwz 31,-4(11)
        mr 1,11
        blr

Note here you have to do a load clearing the top bits (lbz) and then sign extend it in a separate operation (extsb). Of course, if you do that without the -fsigned-char it just loads, without the extra sign extend.
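
If the difference between the two promotions is hard to picture, this little Python sketch mimics what happens to the value. The function names are made up; zero_extend plays the role of a plain lbz, and sign_extend the role of lbz followed by extsb.

```python
def zero_extend(byte):
    """Promote a byte the way an unsigned char is promoted: top bits cleared."""
    return byte & 0xFF


def sign_extend(byte):
    """Promote a byte the way a signed char is promoted: bit 7 is copied upwards."""
    byte &= 0xFF
    return byte - 0x100 if byte & 0x80 else byte


# 'a' is 97 (0x61); bit 7 is clear, so both promotions agree.
print(zero_extend(0x61), sign_extend(0x61))  # 97 97

# 0xE9 has bit 7 set, so the two promotions give different integers.
print(zero_extend(0xE9), sign_extend(0xE9))  # 233 -23
```

Any char value of 0x80 or above is where code that quietly assumes one signedness breaks on the other.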

So, without knowing too much about the history, my guess is that the guys at IBM/Motorola/whoever were thinking EBCDIC when they designed the PowerPC architecture, where type promotion with sign extend was probably going to be a largely superfluous instruction. Then the world of Linux (and consequently an operating system that ran on more than one or two architectures) appeared, so they defined the ABI to have char as unsigned because that's what they had before. Now we are stuck with this little nuance.

Moral of the story: don't assume anything, be that sizeof(long) or the signed-ness of a char.

removing mbox duplicates

Due to a snafu over the weekend my inbox (and every other folder) managed to get about 15 copies of every single message. Luckily it's easy to get rid of duplicates with formail (part of the Debian procmail package).

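#!/bin/sh
# Usage: pass the mbox file to clean up as the first argument.
# formail -D drops any message whose Message-ID is already in idcache.
formail -D 1000000 idcache -s < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
rm -f idcache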
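
If you don't have formail handy, roughly the same thing can be sketched in Python 3 with the standard mailbox module. This is an approximation rather than a reimplementation of formail: the function name is made up, it writes to a second file instead of replacing the original, and it assumes that messages with no Message-ID header should always be kept.

```python
import mailbox


def dedupe_mbox(src_path, dst_path):
    """Copy src_path to dst_path, keeping the first message per Message-ID.

    Returns the number of messages written.
    """
    seen = set()
    kept = 0
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    try:
        for message in src:
            msg_id = message.get("Message-ID")
            if msg_id is not None:
                if msg_id in seen:
                    continue          # duplicate, skip it
                seen.add(msg_id)
            dst.add(message)
            kept += 1
        dst.flush()
    finally:
        src.close()
        dst.close()
    return kept
```

Something like dedupe_mbox("inbox", "inbox.tmp") followed by a rename would then mirror the shell script.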

Opentel ODT4200PVR

I recently purchased the Opentel ODT4200 PVR from oznetics for $460. I bought via ebay and picked up from them and had no problems at all; in fact they were very helpful.

Overall, we are quite happy with the unit. It has a large enough 80GB harddisk, and most importantly two tuners so you can watch one channel whilst recording another. Unlike the more expensive models, you can not record two channels at once. There is also no way to access files (USB or Firewire, etc) though the box comes with instructions on how to upgrade the harddrive should you wish to. The remote is a good size and very functional.

As I said, we are quite happy, but there are a few points of annoyance. I am running the latest 1.27 firmware. Some of the issues, both bugs and wishlist include:

  • The recording interface is quite unintuitive. When you add a "reservation" after selecting the time you automatically start a new reservation. Should you wish to change the file name for the recorded program, you need to go into the list of reservations and select one you added, and then modify the file name.
  • The reservation list only shows the channel and date and time information for your reservations. It would be nice if this showed the filename it is going to record to. It would be nice to have a few more reservation options, like "only do this on weekdays" or "repeat this 13 times".
  • There are many digital channels, most of which you want to skip over for normal use (some, like high def, you want to skip all the time). The Opentel has a nice feature of favourites which allows you to group only the channels you want into one of a few favourites groups. However, on more than one occasion the box has become a bit confused about whether I'm in favourites mode or all channels mode, requiring about five switches between modes and random channel changing, etc to come back.
  • On more than one occasion it has been randomly unable to lock onto channels and simply displays a black screen. The only way to get it to come back is to power cycle the box. This may be a heat issue as we have the box in a cupboard, and has not been a major issue.
  • If a show is recording, you can not modify in any way any of your existing recordings. This means changing their filenames, deleting an existing recording, etc.
  • I'm not sure what the deal is with the 4:3, 16:9 and letterbox formats. There is an "auto" format which seems to just be 16:9 mode. It would be nice if it just used 4:3 for non-widescreen broadcasts and letterbox for widescreen ones.
  • Fast forward using the time slider (rather than just kicking into 2x, 4x or 16x speedup this mode shows a standard time bar that you move a pointer along) occasionally gets confused. When this happens it generally ends up playing the audio for where you stopped but displaying a paused static image from where you left off. What would be really nice is if a tap of the [>|] button skipped 30 seconds, and hold it down to "slide" the time. As it is, you can get the feel of how long to hold down the button to slide past most of the ads after a few shows :)
  • "Time-shift" for pausing live TV is basically useless. Once you have started the time shift, you can't swap channels without stopping the recording and you can't rewind. So you might as well just start a normal recording, which you can watch back whilst it is recording anyway.
  • Some of the error messages are in engrish.
  • It can get very hot, though we do keep it in a cabinet. Turning the box "off" spins down the harddrive but it still remains quite warm to touch.
  • The harddrive appears to be almost constantly churning away. You can't hear it in normal operation with the TV playing, but it does make you wonder what it is doing. I can only assume it is a background defragmenter to try and keep space contiguous.
  • Guide information is only for the current and next show. This is a network thing, not an Opentel thing.
  • Channel 10 has some white dots which are apparently due to overscan from incorrectly (that might be a bit strong, I don't fully understand) broadcast VBI information.

In conclusion, the above bugs are all annoyances, and in most day to day use the unit works fine. This is a great unit if you don't want to hit the $900 mark for the Topfield model or spend more time fiddling with MythTV than watching TV. I wouldn't say we watch that much, but it's nice to be able to record the good stuff and watch it at your leisure and the Opentel works great for us.