Django toolchain on Debian

Although Django is well packaged for Debian, I've recently come to the conclusion that the packages are really not what I want. The problem is that my server runs Debian stable, while my development laptop runs unstable, and Django revisions definitely fall into the "unstable" category. There is really no way to use a system Django 1.1 on one side and a system Django 1.0 on the other.

After a bit of work, I think I've got something together that works, and I post it here in the hope it is useful to someone else. This info has been gleaned from similar references such as this <http://www.danceric.net/2009/03/26/django-virtualenv-and-mod_wsgi/> and this <http://www.saltycrane.com/blog/2009/05/notes-using-pip-and-virtualenv-django/>.

This is aimed at running a server using Debian stable (5.0) for production and an unstable environment for development; you actually need both to get this running. The setup is based on a project called "project" that lives in /var/www.

  1. First step is to install python-virtualenv on both.

  2. Create a virtualenv on both, using the --no-site-packages flag to make it a stand-alone environment. This is like a chroot for Python.

    $ virtualenv --no-site-packages project
    New python executable in project/bin/python
    Installing setuptools............done.
    
  3. The unstable environment has a file you'll need to copy into the stable environment - bin/activate_this.py. The stable version of python-virtualenv isn't recent enough to include this file, but you need it to essentially switch the system python into the chrooted environment. This will come in handy later when setting up the webserver.

  4. There are probably better ways to keep the two environments in sync, but I simply take a manual approach of doing everything twice, once in each. So from now on, do the following in both environments.

  5. Activate the environment

    /var/www$ cd project
    /var/www/project$ . bin/activate
    (project) /var/www/project$
    
  6. Use easy_install to install pip

    (project) /var/www/project$ easy_install pip
    Searching for pip
    Reading http://pypi.python.org/simple/pip/
    Reading http://pip.openplans.org
    Best match: pip 0.4
    Downloading http://pypi.python.org/packages/source/p/pip/pip-0.4.tar.gz#md5=b45714d04f8fd38fe8e3d4c7600b91a2
    Processing pip-0.4.tar.gz
    Running pip-0.4/setup.py -q bdist_egg --dist-dir /tmp/easy_install-Wu9O-U/pip-0.4/egg-dist-tmp-xjSdxq
    warning: no previously-included files matching '*.txt' found under directory 'docs/_build'
    no previously-included directories found matching 'docs/_build/_sources'
    zip_safe flag not set; analyzing archive contents...
    pip: module references __file__
    Adding pip 0.4 to easy-install.pth file
    Installing pip script to /var/www/project/bin
    
    Installed /var/www/project/lib/python2.5/site-packages/pip-0.4-py2.5.egg
    Processing dependencies for pip
    Finished processing dependencies for pip
    
  7. Install setuptools, also using easy_install (for some reason, pip can't install it). There is a trick here: you need to specify at least version 0.6c9, or there will be issues with the SVN version on Debian stable when you try to get Django in the next step.

    (project) /var/www/project$ easy_install setuptools==0.6c9
    Searching for setuptools==0.6c9
    Reading http://pypi.python.org/simple/setuptools/
    Best match: setuptools 0.6c9
    Downloading http://pypi.python.org/packages/2.5/s/setuptools/setuptools-0.6c9-py2.5.egg#md5=fe67c3e5a17b12c0e7c541b7ea43a8e6
    Processing setuptools-0.6c9-py2.5.egg
    Moving setuptools-0.6c9-py2.5.egg to /var/www/project/lib/python2.5/site-packages
    Removing setuptools 0.6c8 from easy-install.pth file
    Adding setuptools 0.6c9 to easy-install.pth file
    Installing easy_install script to /var/www/project/bin
    Installing easy_install-2.5 script to /var/www/project/bin
    
    Installed /var/www/project/lib/python2.5/site-packages/setuptools-0.6c9-py2.5.egg
    Processing dependencies for setuptools==0.6c9
    Finished processing dependencies for setuptools==0.6c9
    
  8. Create a requirements.txt with the path to the Django SVN for pip to install, and then install it.

    (project) /var/www/project$ cat requirements.txt
    -e svn+http://code.djangoproject.com/svn/django/tags/releases/1.0.3/#egg=Django
    (project) /var/www/project$ pip install -r requirements.txt
    Obtaining Django from svn+http://code.djangoproject.com/svn/django/tags/releases/1.0.3/#egg=Django (from -r requirements.txt (line 1))
      Checking out http://code.djangoproject.com/svn/django/tags/releases/1.0.3/ to ./src/django
    
    ... so on ...
    
  9. Almost there! You can keep installing more Python requirements with pip if you need, but we've got enough here to start.

  10. Create a file in /var/www/project called project-python.py. This is the handler module the webserver will use, and it first exec's the virtualenv's activation script so that Django is imported from inside the virtualenv. The file should contain the following:

    activate_this = "/var/www/project/bin/activate_this.py"
    execfile(activate_this, dict(__file__=activate_this))
    
    from django.core.handlers.modpython import handler
    
  11. Now it's time to start the Django project. I like to create a new directory called project, which will be the parent directory kept in the SCM with all the code, media, templates, database (if using SQLite) etc. That way, to keep the two environments up-to-date, I simply svn ci on one side and svn up on the other.

    (project) /var/www/project$ mkdir project
    (project) /var/www/project$ cd project
    (project) /var/www/project/project$ mkdir db django media www
    (project) /var/www/project/project$ cd django/
    (project) /var/www/project/project/django$ django-admin startproject myproject
    
  12. The last step is to wire up Apache to serve it all. The magic is making sure you specify the PythonHandler you made before (project-python) so mod_python runs inside the virtualenv, and include the right paths so it can find the handler and all the required Django settings.

    DocumentRoot /var/www/project
    
    <Location "/">
        SetHandler python-program
        PythonHandler project-python
        PythonPath "['/var/www/project/','/var/www/project/project/django/'] + sys.path"
        SetEnv DJANGO_SETTINGS_MODULE myproject.settings
        PythonDebug On
    </Location>
    
    Alias /media /var/www/project/project/media
    <Location "/media">
        SetHandler none
    </Location>
    <Directory "/var/www/project/project/media">
        AllowOverride none
        Order allow,deny
        Allow from all
        Options FollowSymLinks Indexes
    </Directory>
    

With all this, you should be up and running in a basic but stable environment. It's easy enough to update packages for security fixes, etc via pip after activating your virtualenv.

SIGTTOU and switching to canonical mode

Here's an interesting behaviour that, as far as I can tell, is completely undocumented, slightly confusing but fairly logical. Your program should receive a SIGTTOU when it is running in the background and attempts to output to the terminal -- the idea being that you shouldn't scramble the output by mixing it in while the shell is trying to operate. Here's what the bash manual has to say:

Background processes are those whose process group ID differs from the
terminal's; such processes are immune to keyboard-generated signals.
Only foreground processes are allowed to read from or write to the
terminal.  Background processes which attempt to read from (write to)
the terminal are sent a SIGTTIN (SIGTTOU) signal by the terminal
driver, which, unless caught, suspends the process.

So, consider the following short program, which writes some output and catches any SIGTTOUs, with an optional flag to switch between canonical and non-canonical mode.

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <termios.h>
#include <unistd.h>

/* Report the signal, then re-raise it with the default handler so
 * the shell sees the process stop. */
static void sig_ttou(int signo) {
   (void)signo;
   printf("caught SIGTTOU\n");
   signal(SIGTTOU, SIG_DFL);
   kill(getpid(), SIGTTOU);
}

int main(int argc, char *argv[]) {

   signal(SIGTTOU, sig_ttou);

   /* Any argument switches the terminal into non-canonical mode. */
   if (argc != 1) {
      struct termios tty;

      printf("setting non-canonical mode\n");
      tcgetattr(fileno(stdout), &tty);
      tty.c_lflag &= ~(ICANON);
      tcsetattr(fileno(stdout), TCSANOW, &tty);
   }

   int i = 0;
   while (1) {
      printf("  *** %d ***\n", i++);
      sleep(1);
   }

   return 0;
}
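
The termios half of this program can also be sketched in Python -- a hypothetical analogue of the non-canonical branch above, using a pseudo-terminal pair from the pty module so it runs even without a controlling terminal:

```python
import pty
import termios

# Open a pseudo-terminal pair; the slave end behaves like a real tty.
master_fd, slave_fd = pty.openpty()

# Fetch the current attributes and clear ICANON, as the C code does.
attrs = termios.tcgetattr(slave_fd)
attrs[3] &= ~termios.ICANON        # index 3 is the lflag word
termios.tcsetattr(slave_fd, termios.TCSANOW, attrs)

# Canonical mode is now switched off on this terminal.
print(bool(termios.tcgetattr(slave_fd)[3] & termios.ICANON))
```

The same mode switch is what triggers the SIGTTOU in case 2 below when done from the background.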

This program ends up operating in an interesting manner.

  1. Run in the background, canonical mode : no SIGTTOU and the output gets multiplexed with the shell.

    $ ./sigttou &
      *** 0 ***
    [1] 26171
    $   *** 1 ***
      *** 2 ***
      *** 3 ***
    
  2. Run in the background, non-canonical mode : SIGTTOU delivered

    $ ./sigttou 1 &
    [1] 26494
    ianw@jj:/tmp$ setting non-canonical mode
    caught SIGTTOU
    
    
    [1]+  Stopped                 ./sigttou 1
    
  3. Run in the background, canonical mode, tostop set via stty : SIGTTOU delivered, seemingly after a write proceeds

    $ stty tostop
    $ ./sigttou &
    [2] 26531
    ianw@jj:/tmp$   *** 0 ***
    caught SIGTTOU
    
    
    [2]+  Stopped                 ./sigttou
    

You can see a practical example of this by comparing the difference between cat file & and more file &. The semantics make some sense -- anything switching off canonical mode is likely to really scramble your terminal, so it's good to stop it and let its terminal-handling functions run only in the foreground. I'm not sure why canonical-mode background output mixed in with your prompt is considered useful, but someone, somewhere must have decided it was so.

Update: upon further investigation, it is the switching of terminal modes that invokes the SIGTTOU. To follow the logic through more, see the various users of tty_check_change in the tty driver.

Using frozen chocolate to visualise microwave heat distribution

My attempt at answering that most important of questions : where should one place their plate in the microwave to achieve maximal heating?

Review : The Race for a New Game Machine

I recently finished The Race for a New Game Machine: Creating the Chips Inside the XBox 360 and the Playstation 3 (David Shippy and Mickie Phipps); an interesting insight into the processor development process from some of the lead architects.

The executive summary is : Sony, Toshiba and IBM (STI) decided to get together to create the core of the Playstation 3 — the Cell processor. Sony, with their graphics and gaming experience, would do the Synergistic Processing Elements; extremely fast but limited sub-units specialising in doing 3D graphics and physics work (i.e. great for games). IBM would do a Power based core that handled the general purpose computing requirements.

The twist comes when Microsoft came to IBM looking for the Xbox 360 processor, and someone at IBM mentioned the Power core that was being worked on for the Playstation. Unsurprisingly, the features being built for the Playstation also interested Microsoft, and the next thing you know, IBM was working on the same core for Microsoft and Sony at the same time, without telling either side.

This whole chain of events makes for a very interesting story. The book is written for a general audience, but you'll probably get the most out of it if you already have some knowledge of computer architecture; if you're trying to understand some of the concepts referred to from the two line descriptions you'll get a bit lost (H&P it is not).

The only small criticism is that it sometimes falls into reading a bit like a long LinkedIn recommendation. However, the book is very well paced, and throws in just enough technical tidbits amongst the corporate and personal dramas to make it a very fun read.

One thing that is talked about a bit is the fan-out of four (FO4) metric used in the designers' quest to push the chip as fast as possible (and, as mentioned many times in the book, faster than what Intel could do!). I thought it might be useful to expand on this interesting metric a bit.

FO4

One problem facing chip architects is that, thanks to Moore's Law, it is hard to find a constant to compare design versus implementation. For example, you may design an amazing logic-block to factor large integers into products of prime numbers, but somebody else with better fabrication facilities might be able to essentially brute-force a better solution by producing faster hardware using a much less innovative design.

Some metric is needed that can compare the two designs discounting who has the better fabrication process. This is where the FO4 comes in.

When you change the input to a logic gate, it does not magically flip the output to the correct level instantaneously. There is a latency while everything settles to its correct level — the gate delay. The more gates connected to the output of a gate, the more current is required, which further increases the latency. The FO4 latency is defined as the time required to flip an inverter gate connected (fanned out) to four other inverter gates.

Fan-out of four

Thus you can describe the latency of other logic blocks in multiples of FO4 latencies. As this avoids measuring against wall-time it is an effective description of the relative efficiency of logic designs. For example, you may calculate that your factoriser has a latency of 100 FO4. Just because someone else's 200 FO4 factoriser gets a result a few microseconds faster thanks to their fancy ultra-low-FO4-latency fabrication process, you can still show that your design, at least a priori, is better.

The book refers several times to efforts to reduce the FO4 of the processor as much as possible. The reason this is important in this context is that the maximum latency on the critical path will determine the fastest clock speed you can run the processor at. For reasons explained in the book high clock speed was a primary goal, so every effort had to be made to reduce latencies.

All modern processors operate as a production line, with each stage doing some work and passing it on to the next stage. Clearly the slowest stage determines the maximum speed that the production line can run at (weakest link in the chain and all that). For example, if you clock at 1GHz, each cycle takes 1 nanosecond (1s / 1,000,000,000Hz). If you have an FO4 latency of, say, 10 picoseconds, any given stage can have a latency of no more than 100 FO4, otherwise that stage would not have enough time to settle and actually produce the correct result.
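
The arithmetic in that example is simple enough to sanity-check (a toy calculation; the 10 picosecond FO4 figure is assumed purely for illustration):

```python
clock_hz = 10**9                  # a 1GHz clock
cycle_ps = 10**12 // clock_hz     # one cycle is 1000 picoseconds (1ns)
fo4_ps = 10                       # assume a 10ps FO4 latency

# No pipeline stage can be longer than this many FO4 delays, or it
# won't settle within one clock cycle.
max_fo4_per_stage = cycle_ps // fo4_ps
print(max_fo4_per_stage)          # -> 100
```

Halve the FO4 budget per stage and, all else being equal, you can double the clock -- which is exactly the trade-off discussed next.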

Thus the smaller you can get the FO4 latencies of your various stages, the higher you can safely up the clock speed. One way around long latencies might be to split up your logic into smaller stages, making a much longer pipeline (production line). For example, split your 100 FO4 block into two 50 FO4 stages. You can now clock the processor higher, but this doesn't necessarily mean you'll get actual results out the end of the pipeline any faster (as Intel discovered with the Pentium 4 and its notoriously long pipelines and correspondingly high clock rates).

Of course, this doesn't even begin to describe the issues with superscalar design, instruction level parallelism, cache interaction and the myriad of other things the architects have to consider.

Anyway, after reading this book I guarantee you'll have an interesting new insight the next time you fire-up Guitar Hero.

Dig Jazz Applet, V2

It seems the ABC updated the DIG Jazz now-playing list format, breaking V1. Some quick flash disassembly and a bit of hacking, and order is restored. As a bonus, it now shows the upcoming songs.

DIG Jazz now-playing Gnome applet

Source or Debian package.

Quickly describing hash utilisation

I think the most correct way to describe utilisation of a hash-table is with chi-squared distributions, hypothesis tests, degrees of freedom and a bunch of other things nobody but an actuary remembers. So I was looking for a quick method that was close enough but didn't require digging out a statistics text-book.

I'm sure I've re-invented some well-known measurement, but I'm not sure what it is. The idea is to add up the total steps required to look-up all elements in the hash-table, and compare that to the theoretical ideal of a uniformly balanced hash-table. You can then get a ratio that tells you if you're in the ball-park, or if you should try something else. A diagram should suffice.

Scheme for acquiring a hash-utilisation ratio

This seems to give quite useful results with a bare minimum of effort, and most importantly no tricky floating point math. For example, on the standard Unix words file with a 2048-entry hash-table, the standard DJB hash came out very well (as expected):

Ideal 2408448
Actual 2473833
----
Ratio 0.973569

To contrast, a simple "add each character" type hash:

Ideal 2408448
Actual 6367489
----
Ratio 0.378241

Example code is hash-ratio.py. I expect this measurement is most useful when you have a largely static bunch of data for which you are attempting to choose an appropriate hash-function. I guess if you are really trying to hash more or less random incoming data and hence only have a random sample to work with, you can't avoid doing the "real" statistics.
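
The measurement itself is only a few lines. Here is a sketch of the idea (not the hash-ratio.py linked above; the key list and bucket count are made up for illustration):

```python
def djb_hash(s):
    # Bernstein's hash: h = h * 33 + c, truncated to 32 bits.
    h = 5381
    for ch in s:
        h = (h * 33 + ord(ch)) & 0xffffffff
    return h

def add_hash(s):
    # The naive "add each character" hash from the comparison above.
    return sum(ord(ch) for ch in s)

def utilisation_ratio(keys, nbuckets, hashfn):
    counts = [0] * nbuckets
    for k in keys:
        counts[hashfn(k) % nbuckets] += 1
    # Looking up every element of a chain of length c costs 1+2+...+c steps.
    actual = sum(c * (c + 1) // 2 for c in counts)
    # Ideal: the keys spread as evenly as possible over the buckets.
    q, r = divmod(len(keys), nbuckets)
    ideal = (nbuckets - r) * q * (q + 1) // 2 + r * (q + 1) * (q + 2) // 2
    return float(ideal) / actual

keys = ["key-%d" % i for i in range(10000)]
print(utilisation_ratio(keys, 256, djb_hash))   # DJB spreads the keys well
print(utilisation_ratio(keys, 256, add_hash))   # the additive hash clusters badly
```

A perfectly uniform table scores exactly 1.0, and the further the ratio drops below that, the more lopsided the chains are.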

Relocation truncated to fit - WTF?

If you code for long enough on x86-64, you'll eventually hit an error such as:

(.text+0x3): relocation truncated to fit: R_X86_64_32S against symbol `array' defined in foo section in ./pcrel8.o

Here's a little example that might help you figure out what you've done wrong.

Consider the following code:

$ cat foo.s
.globl foovar
  .section   foo, "aw",@progbits
  .type foovar, @object
  .size foovar, 4
foovar:
   .long 0

.text
.globl _start
 .type _start, @function
_start:
  movq $foovar, %rax

In case it's not clear, that would look something like:

int foovar = 0;

void function(void) {
  int *bar = &foovar;
}

Let's build that code, and see what it looks like

$ gcc -c foo.s

$ objdump --disassemble-all ./foo.o

./foo.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_start>:
   0:        48 c7 c0 00 00 00 00   mov    $0x0,%rax

Disassembly of section foo:

0000000000000000 <foovar>:
   0:        00 00          add    %al,(%rax)
   ...

We can see that the mov instruction has only allocated 4 bytes (00 00 00 00) for the linker to put in the address of foovar. If we check the relocations:

$ readelf --relocs ./foo.o

Relocation section '.rela.text' at offset 0x3a0 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000003  00050000000b R_X86_64_32S      0000000000000000 foovar + 0

The R_X86_64_32S relocation is indeed only a 32-bit relocation. Now we can tickle this error. Consider the following linker script, which puts the foo section about 5 gigabytes away from the code.

$ cat test.lds
SECTIONS
{
 . = 10000;
 .text : { *(.text) }
 . = 5368709120;
 .data : { *(foo) }
}

This now means that we cannot fit the address of foovar inside the space allocated by the relocation. When we try it:

$ ld -Ttest.lds ./foo.o
./foo.o: In function `_start':
(.text+0x3): relocation truncated to fit: R_X86_64_32S against symbol `foovar' defined in foo section in ./foo.o

What this means is that the full 64-bit address of foovar, which now lives somewhere above 5 gigabytes, can't be represented within the 32-bit space allocated for it.
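
It's easy to check the arithmetic: the address the linker script assigns simply doesn't fit in the sign-extended 32-bit field that the relocation provides (the 5368709120 below comes straight from test.lds):

```python
foo_addr = 5368709120            # where test.lds places the foo section (5GB)

# R_X86_64_32S patches a sign-extended 32-bit field, so the value
# must lie within the signed 32-bit range.
int32_min = -(2**31)
int32_max = 2**31 - 1

print(int32_min <= foo_addr <= int32_max)   # -> False: truncated to fit
```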

For code optimisation purposes, the default immediate size to the mov instructions is a 32-bit value. This makes sense because, for the most part, programs can happily live within a 32-bit address space, and people don't do things like keep their data so far away from their code it requires more than a 32-bit address to represent it. Defaulting to using 32-bit immediates therefore cuts the code size considerably, because you don't have to make room for a possible 64-bit immediate for every mov.

So, if you want to really move a full 64-bit immediate into a register, you want the movabs instruction. Try it out with the code above - with movabs you should get a R_X86_64_64 relocation and 64-bits worth of room to patch up the address, too.

If you're seeing this and you're not hand-coding, you probably want to check out the -mcmodel argument to gcc.

YUI ButtonGroup Notes

Some tips and things to check if your YUI ButtonGroup isn't behaving as you wish it would.

  • Double-check your <body> tag has class="yui-skin-sam"

  • Unlike in the documentation example, you can't just put a call to YAHOO.widget.ButtonGroup pointing to your div anywhere in your HTML and expect it to work. You've got to wait for it to be ready with something like:

    <script type="text/javascript">
    YAHOO.util.Event.onContentReady("my_button_div", function() {
      var oButtonGroup = new YAHOO.widget.ButtonGroup("my_button_div");
    });
    </script>
    
  • You can easily get an image in each button. For example, if your button is defined as:

    <span id="my-button-id" class="yui-button yui-radio-button yui-button-checked">
     <span class="first-child">
       <button type="button" hidefocus="true"></button>
     </span>
    </span>
    

    Simply add a CSS class something like:

    .yui-button#my-button-id button { background:url(http://server/image.jpg) 50% 50% no-repeat; }
    

Hopefully, this will save someone else a few hours!