LCA Speakers

Mon 22 January 2007

Just for fun I took a photo of the LCA 2007 speakers whilst they were on stage. It didn't turn out too bad, but if anyone can help add notes around the people I don't know (click on the photo for the Flickr interface) that would be cool!

Image: LCA2007 Speakers (http://farm1.static.flickr.com/117/363809942_fd644fc210.jpg)

Network Manager, PPTP

Fri 12 January 2007

After this post by Stephen Thorne I was inspired to get PPTP tunneling working with network manager so I can easily get online at LCA2007, and you all can't see what I'm surfing. The plan is to tunnel back through my house. Here is a terse overview of what I did:

Install network-manager package.
Remove entries from /etc/network/interfaces. That pretty much gets network manager working.
Restart network manager with sudo /etc/dbus-1/event.d/25NetworkManager restart. Generally do this after you fiddle with anything behind network-manager's back.
Look for network-manager-pptp package, realise it is only on Ubuntu. Rebuild for Debian. Install and restart.
On my home router, forward port 1723 to my server.
Install pptpd package on server.
# modprobe ppp_mppe on server and laptop
# echo "ianw pptpd password *" > /etc/ppp/chap-secrets on server
Modify the IP parameters in /etc/pptpd.conf on server to be sane.
Modify /etc/ppp/pptpd-options on server to send a local DNS server.
Enable IP forwarding in /etc/sysctl.conf on server

Now, back to my laptop, I add a new VPN connection with network-manager. You can put the following in a file and "import" it, and it may work.

[main]
Description=wienand
Connection-Type=pptp
PPTP-Server=my.home.server
Use-Peer-DNS=yes
Encrypt-MPPE=no
Encrypt-MPPE-128=yes
Compress-MPPC=no
Compress-Deflate=no
Compress-BSD=no
PPP-Lock=yes
Auth-Peer=no
Refuse-EAP=no
Refuse-CHAP=no
Refuse-MSCHAP=no
MTU=1416
MRU=1416
LCP-Echo-Failure=10
LCP-Echo-Interval=10
PPP-Custom-Options=
Peer-DNS-Over-Tunnel=yes
X-NM-Routes=
Use-Routes=no

The big trick for me was having Auth-Peer turned off. Restart network-manager after this.
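Since a single wrong key can cost hours of log reading, here is a minimal sketch of a sanity check you could run over the file before importing it. It assumes the file parses as ordinary INI (which the listing above does) and only checks the two keys discussed here; the function name check_vpn_config and the trimmed SAMPLE are made up for illustration and are not part of network-manager.

```python
import configparser

# Trimmed copy of the listing above, used only to exercise the check.
SAMPLE = """\
[main]
Description=wienand
Connection-Type=pptp
PPTP-Server=my.home.server
Encrypt-MPPE-128=yes
Auth-Peer=no
PPP-Custom-Options=
"""


def check_vpn_config(text):
    """Return a list of problems found in an exported VPN config (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    main = parser["main"]

    problems = []
    # configparser lowercases option names, so lookups are case-insensitive.
    if main.get("Connection-Type") != "pptp":
        problems.append("Connection-Type is not pptp")
    # "yes"/"no" are understood by getboolean; treat a missing key as "on".
    if main.getboolean("Auth-Peer", fallback=True):
        problems.append("Auth-Peer should be no")
    return problems


if __name__ == "__main__":
    print(check_vpn_config(SAMPLE) or "looks sane")
```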

Attempt to connect to your new VPN via network-manager. If you are very lucky, it will "just work". If you are unlucky, you will spend the next few hours staring at the syslogs on both server and client, after turning up the level of debugging for both.

My only problem now is how do I get a domain suffix to my laptop so I can easily get to my home network resources, which all live at server.wienand.home, without editing /etc/resolv.conf by hand? Any suggestions welcome!

Inside the Machine - Jon Stokes

Fri 12 January 2007

I've enjoyed Ars Technica for a long time, even from the pre-history before RSS when you had to remember what sites you liked to visit. Having an interest in computer architecture, I thus grabbed (a signed copy of) Jon Stokes' Inside the Machine quickly when it came out.

Having learnt what I know about architecture the "traditional" way (e.g. textbooks and courses) I was interested in the "beginner's guide" approach. The early parts of the book, explaining the basics of microprocessors, rely quite heavily on analogies (the "file clerk", the "document storage room", the "SUV factory" etc). Personally, I'm not sure how much this aids the understanding of the material; for mine, the length of time spent describing the analogies gets in the way of the material. I understand, however, that I am not the target market for the early part of the book. The introduction to instruction encoding with the "DLW" architecture serves as a good illustration; it is the type of stuff I think should be in every introductory CS course. The diagrams throughout are very clear, and it really lives up to its billing as an "illustrated guide".

For mine, what is most impressive are the later chapters, which are an unrivalled review of x86 and PowerPC architecture. They are clearly well researched, and step you through the architecture and its history logically and clearly. The level of detail is perfect, giving you more than enough depth to understand what is happening but not bothering to delve into irrelevant esoteric implementation details which would simply make the book fatter (cf. H&P). If you've studied architecture before you can skip to Chapter 5 and dive into this bit straight away.

If you have more than a passing interest, I still think investing in a copy of Hennessy and Patterson and plowing through the first few chapters (and appendixes) is an unrivalled introduction. But this book is about a third as thick, much easier reading and, more importantly, is the only current compendium on modern (i.e. still in production/development) architectures. I shudder to think how long was spent poring over architecture manuals, whitepapers and old HOTCHIPS papers to distill the useful information it contains. Computer architecture is a fascinating art, and this book may well be the best passport to the otherwise inaccessible city of transistors just below your fingertips.

Stressing the TLB with matrices

Wed 10 January 2007

When playing around with memory management a matrix multiply is often mentioned as a TLB thrasher, and a good way to stress your system. It's interesting to illustrate this process a little.

The first thing to remember is that C99 (6.5.2.1 to be precise) dictates that arrays are always stored in row-major order, as shown below.

So, remembering back to first year algebra, to multiply two matrices we multiply and sum rows and columns, illustrated linearly as the computer sees it below. Black lines are page boundaries, while boxes are matrix elements.

The simplest code to do this is usually along the lines of

int a[LEN][LEN];
int b[LEN][LEN];
int r[LEN][LEN];   /* must start zeroed, since we accumulate into it */

int i, j, k;

for (i = 0; i < LEN; i++)
     for (j = 0; j < LEN; j++)
           for (k = 0; k < LEN; k++)
             r[i][j] += a[i][k] * b[k][j];   /* row i of a times column j of b */

We can see how we have a repeated linear walk over the memory holding matrix b, done for each row of a. If we make b sufficiently large (say, a few hundred pages) we can be sure that by the time we get to the end, we have kicked out the earlier TLB entries. Then we start all over again, and miss in the TLB on the very same pages. Thus a matrix multiply problem comes down to doing repeated page-sized stride walks over a linear region.

Looking at the diagram, it is also clearer what can solve this problem. We can either have more TLB entries, so we don't have to kick out entries which we will use again, or have larger pages -- i.e. fewer black lines. Or a smarter algorithm ...
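To put some rough numbers on this, here is a toy model in Python. It is only a sketch under loud assumptions: it tracks just the b[k][j] accesses of the naive loop, pretends the TLB is a fully associative LRU cache, and uses deliberately tiny "page" sizes so it runs quickly. The function name tlb_misses and all of its parameters are invented for the illustration; no real hardware behaves exactly like this.

```python
from collections import OrderedDict


def tlb_misses(length, page_size, tlb_entries, elem_size=4):
    """Count misses in a toy LRU TLB for the b[k][j] accesses of a naive multiply."""
    tlb = OrderedDict()  # page number -> True, ordered oldest to newest
    misses = 0
    for i in range(length):
        for j in range(length):
            for k in range(length):
                # Row-major layout: b[k][j] lives at offset (k * length + j).
                page = ((k * length + j) * elem_size) // page_size
                if page in tlb:
                    tlb.move_to_end(page)        # refresh LRU position
                else:
                    misses += 1
                    tlb[page] = True
                    if len(tlb) > tlb_entries:
                        tlb.popitem(last=False)  # evict least recently used
    return misses


if __name__ == "__main__":
    # 64x64 ints: one row of b is 256 bytes, so b spans 64 toy pages of 256 bytes.
    print("small TLB   :", tlb_misses(64, page_size=256, tlb_entries=16))
    print("bigger TLB  :", tlb_misses(64, page_size=256, tlb_entries=64))
    print("bigger pages:", tlb_misses(64, page_size=1024, tlb_entries=16))
```

With the small TLB every single access to b misses, because each column walk touches more pages than there are entries. Either enlarging the TLB to cover all of b, or quadrupling the page size so b fits in the existing entries, leaves only the initial cold misses.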

LCD Scroller

Tue 26 December 2006

I received for Christmas a cool little LCD scrolling panel. Of course the first thing I did was sniff the crappy Windows control program to figure out how to program it.

It doesn't have a particular brand name that I can see, but it comes up as a Prolific USB to Serial converter when plugged in. It looks like the photos below.


The protocol to program it, if you could call it that, consists of sending at 9600 baud, N81, a leading byte 0xAA followed by a one byte number between 1 and 5 for the scroll speed, your message of up to 73 characters (more makes it crash) and a trailing byte of 0xCC.
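The framing described above can be sketched in a few lines of Python. This is an illustration only, not the lcd-scroller program: the names build_frame, START_BYTE, END_BYTE and MAX_MESSAGE_LEN are made up, and two things are assumptions on my part rather than facts about the device: that the speed is sent as a raw byte value (1 to 5, not the ASCII digit), and that the message is plain ASCII.

```python
START_BYTE = 0xAA       # leading byte
END_BYTE = 0xCC         # trailing byte
MAX_MESSAGE_LEN = 73    # more than this makes the panel crash


def build_frame(message, speed=3):
    """Build the byte string for one message (hypothetical helper)."""
    if not 1 <= speed <= 5:
        raise ValueError("speed must be between 1 and 5")
    payload = message.encode("ascii")  # assumption: panel expects ASCII
    if len(payload) > MAX_MESSAGE_LEN:
        raise ValueError("message longer than %d characters" % MAX_MESSAGE_LEN)
    # assumption: speed is sent as a raw byte, not as an ASCII digit
    return bytes([START_BYTE, speed]) + payload + bytes([END_BYTE])


if __name__ == "__main__":
    print(build_frame("HELLO LCA", speed=2).hex())
```

Actually sending the frame means opening the USB serial device at 9600 baud, 8 data bits, no parity, 1 stop bit, and writing these bytes to it; that part is left out here because it needs a serial library or port setup outside this sketch.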

I have written an lcd-scroller program to control it. QA seems to be pretty poor on this thing; send too many characters and it locks up or resets, and occasionally it shows PLEASE WRITE AGAIN! for no apparent reason. However, it's still fun to stream an RSS feed of the cricket scores or similar!

Set-bit Population Counts

Fri 15 December 2006

We were visiting our friends at HP yesterday and, whilst discussing some of the Itanium product line plans, the comment was made that certain TLAs have a real sweet spot for Itanium because it can tell them the number of set bits in a word very quickly. I was aware of the popcnt instruction, but wasn't aware people were buying machines to use it.

I instrumented some code to run a test -- one using the GCC built-in (which falls through to the popcnt instruction on Itanium), one using a re-implementation of the generic GCC version, and one described in the excellent "Algorithms for programmers" book by Jörg Arndt (download it now if you don't have it!).

A naive implementation would do it with 64 shifts, which would take forever (for a fairly good explanation of why shifting takes so long, see the patent on optimised bit-shifting). The gcc built-in way is a clever hack:

const int count_table[256] =
{
    0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
    1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
    1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
    2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
    1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
    2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
    2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
    3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8
};

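/* Sum the set-bit counts of each byte of x; assumes a 64-bit unsigned long. */
static inline unsigned long countit_table(unsigned long x) {
  int i, ret = 0;

  for (i = 0; i < 64; i += 8)
    ret += count_table[(x >> i) & 0xff];   /* look up one byte at a time */

  return ret;
}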
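For comparison, the other two approaches can be sketched in Python (chosen for readability, so it says nothing about cycle counts). The first is the naive one-bit-at-a-time loop. The second is the widely known bit-parallel trick of summing adjacent bit fields; I am assuming it is in the same spirit as the version in the book, and it is not necessarily the exact code I timed. Both function names are made up for this sketch and both assume 64-bit words.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # Python ints are unbounded, so clamp to 64 bits


def popcount_shift(x):
    """Naive version: test each of the 64 bit positions in turn."""
    count = 0
    for i in range(64):
        count += (x >> i) & 1
    return count


def popcount_parallel(x):
    """Bit-parallel version: add neighbouring fields of 1, 2, then 4 bits."""
    x &= MASK64
    x = x - ((x >> 1) & 0x5555555555555555)                          # 2-bit sums
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)   # 4-bit sums
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F                          # 8-bit sums
    # Multiplying adds all eight byte sums into the top byte.
    return ((x * 0x0101010101010101) & MASK64) >> 56


if __name__ == "__main__":
    for value in (0, 1, 0xFF, 0xDEADBEEF, MASK64):
        print(hex(value), popcount_shift(value), popcount_parallel(value))
```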

The results I saw, in cycles, were:

Algorithm       2GHz AMD Opteron 270    1.5GHz Itanium2
Built in        30                      6
Table lookup    45                      27
Shifting        60                      48

The AMD GCC built-in seems to create some slightly tighter code than my extracted table lookup version -- it doesn't unroll the loop and I assume the cache does a good job of the prediction. But if your applications rely on finding set bits heavily, then Itanium is the platform for you!

Telstra mobile internet pricing considered harmful

Wed 06 December 2006

I recently saw an ad for the new Nokia NSeries where a smiling guy takes some happy snaps and clicks a few buttons to upload them to Flickr.

I was that guy (well, without the Nokia) until I got my bill for doing it.

The resultant photo is completely crap, but it cost so much I can't bring myself to delete it.

I'm pretty sure the guy in the ad won't be smiling when he gets his bill, especially since the NSeries has got another megapixel over my phone's camera!

Django setup notes

Sat 25 November 2006

This might be helpful if you're new to Django and figuring out how to develop and deploy it effectively.

I tend to keep my projects together in a subversion repository, which means I like everything in one place. I develop locally, and then deployment consists of svn update on the remote server. I generally lay things out like below.

/var/www/project
|-- db
|-- django
|   `-- project
|       |-- __init__.py
|       |-- application
|       |   |-- __init__.py
|       |   |-- models.py
|       |   |-- views.py
|       |-- manage.py
|       |-- settings.py
|       |-- urls.py
|-- media
|   |-- admin
|   |   `-- media -> /usr/share/python-support/python-django/django/contrib/admin/media/
|   |-- images
|   |   `-- something.png
|   |-- blah.css
`-- www
    |-- base.html
    |-- application
        |-- something.html

db stores a SQLite DB -- generally enough for me but of course you can use a "real" database. Because my projects are small, a database backup then consists of a svn ci. You need to make sure the web server has permissions to this. As a hack I link /var/www/project/db/dbfile.db into my local development tree, which generally isn't under /var/www/. This makes it easier to deploy.
django holds generated stuff, views, models, etc. Run django-admin.py in this directory. Apps live underneath it too.
media stores static media. I use the Debian packages, so link into the default store for the admin media files, and set up ADMIN_MEDIA_PREFIX in settings.py to be /media/admin/.
www holds the HTML templates.
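As a rough sketch, the parts of settings.py that tie this layout together might look like the fragment below. Only ADMIN_MEDIA_PREFIX and the paths come from the description above; the other setting names are the ones I would expect from Django releases of this era, so check them against your version. PROJECT_ROOT is a helper variable invented for this example, not a Django setting.

```python
import os

# Hypothetical helper, not a Django setting: the top of the tree shown above.
PROJECT_ROOT = '/var/www/project'

# SQLite database kept under db/ (setting names assumed for this Django era).
DATABASE_ENGINE = 'sqlite3'
DATABASE_NAME = os.path.join(PROJECT_ROOT, 'db', 'dbfile.db')

# Static media served from media/; the admin files are reached via the symlink.
MEDIA_ROOT = os.path.join(PROJECT_ROOT, 'media')
MEDIA_URL = '/media/'
ADMIN_MEDIA_PREFIX = '/media/admin/'

# HTML templates live in www/.
TEMPLATE_DIRS = (
    os.path.join(PROJECT_ROOT, 'www'),
)
```

On a local development tree you would point PROJECT_ROOT (or the individual paths) at your checkout instead.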

I then set up Apache with a snippet like

DocumentRoot /var/www/project

<Location "/">
    SetHandler python-program
    PythonHandler django.core.handlers.modpython
    PythonPath "['/var/www/project/django'] + sys.path"
    SetEnv DJANGO_SETTINGS_MODULE project.settings
    PythonDebug On
</Location>

Alias /media /var/www/project/media
<Location "/media">
    SetHandler none
</Location>
<Directory "/var/www/project/media">
    AllowOverride none
    Order allow,deny
    Allow from all
    Options FollowSymLinks Indexes
</Directory>

This lets Apache serve up the static files, which it obviously does well. The only other trick is getting the Django server to serve up the static stuff when I'm developing with it locally. I add the following to the bottom of urls.py

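import os  # needed for os.environ, if urls.py does not already import it

# Only let Django serve static media when not running under mod_python
if "GATEWAY_INTERFACE" not in os.environ:
    urlpatterns += patterns('',
        (r'^media/(?P<path>.*)$', 'django.views.static.serve',
         {'document_root': '/path/to/local/dev/project/media/'}),
    )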

This makes sure it only serves when it is not running under mod_python.