Investigating Byzantine Failure

I was at the airshow watching a C-17 Globemaster do its thing, and the guy giving the spiel mentioned that it is a completely fly-by-wire system with quadruple redundancy. I thought to myself "that's a lot of redundancy" as it zoomed overhead.

As it happened, we were talking about the Byzantine generals on Thursday, and it finally sunk in.

The Byzantine Generals Problem is a formulation of reaching consensus in a distributed system. The original paper gets pretty hairy, but the results are worth remembering.

The problem states that a number of generals are deciding whether to attack the enemy. Communication between the generals is considered reliable (in the simplest case), but the allegiance of each general is unknown. Loyal generals obviously tell the truth, but a disloyal general can either lie or tell the truth. A majority of the generals must concur; divided, they cannot defeat the enemy.

The implications for a distributed system are fairly clear. Reliable communication might be as simple as TCP/IP, and although computers aren't going to lie, a faulty one may not tell you the right thing. If that is the difference between flying above a mountain or into it, you want to be sure you get it right.

If there are just two of us and one of us might be a traitor, we cannot decide on a course of action: if I say the height is 1000 feet and you say it is 2000 feet, there is no way to decide who is right.

[Figure: Byzantine Generals -- the two three-general scenarios]

From the picture above, we can also see that if there are three of us, with one of us possibly disloyal, we cannot ensure consistency. In the first case, a loyal lieutenant receives an order from the commander but hears a conflicting order relayed by the other lieutenant. He knows one of the other two is disloyal, but he has no way to tell which, and he can't rely on the other lieutenant, who has seen the same conflicting messages and might go either way. So he can't be sure whether the decision will be attack or retreat; the situation is hopeless.

In the other case, the commanding general himself is the traitor and is trying to confuse matters. He knows he is disloyal, so the other two must be loyal; by sending them conflicting orders he simply puts the system into an inconsistent state, where one lieutenant might go one way and the other the opposite. Sounds a bit like something out of an episode of 24!
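The two cases can be made concrete with a tiny sketch (the labels and the tuple layout here are my own illustration, not from the paper). The point is that a loyal lieutenant's view of the world is identical in both scenarios:

```python
# Lieutenant 1's view is the pair:
#   (order received from the commander, order relayed by lieutenant 2)

# Case 1: the commander is the traitor. He sends "attack" to L1 and
# "retreat" to L2; loyal L2 honestly relays what he received.
view_traitorous_commander = ("attack", "retreat")

# Case 2: the commander is loyal and sends "attack" to both
# lieutenants; traitorous L2 lies and relays "retreat".
view_traitorous_lieutenant = ("attack", "retreat")

# The two views are identical, so L1 cannot tell which of the other
# two is the traitor -- and cannot safely pick a course of action.
assert view_traitorous_commander == view_traitorous_lieutenant
```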

The paper shows that you can reduce all problems like this to some simple rules. To tolerate n faulty nodes, you require:

  1. 3n+1 nodes in total
  2. 2n+1 independent communication paths
  3. n+1 rounds of communication
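To sketch why 3n+1 nodes suffice for n=1, here is a toy version of the paper's Oral Messages algorithm OM(1), with a commander and three lieutenants. The function names, and modelling a traitor as always flipping the order, are my own simplifications for illustration:

```python
from collections import Counter

def majority(values):
    """Return the most common value; a tie falls back to 'retreat'."""
    counts = Counter(values)
    top, n = counts.most_common(1)[0]
    if sum(1 for c in counts.values() if c == n) > 1:
        return "retreat"  # tie: use the agreed default order
    return top

def om1(commander_order, traitor):
    """OM(1) with commander (id 0) and lieutenants 1-3; `traitor`
    is the id of the single faulty participant."""
    # Round 1: the commander sends an order to each lieutenant.
    # A traitorous commander sends conflicting orders.
    received = {}
    for lt in (1, 2, 3):
        if traitor == 0:
            received[lt] = "attack" if lt % 2 else "retreat"
        else:
            received[lt] = commander_order
    # Round 2 (n+1 = 2 rounds in total): each lieutenant relays what
    # it heard to the others; a traitorous lieutenant lies.
    decisions = {}
    for lt in (1, 2, 3):
        views = [received[lt]]
        for other in (1, 2, 3):
            if other == lt:
                continue
            relayed = received[other]
            if other == traitor:
                relayed = "attack" if relayed == "retreat" else "retreat"
            views.append(relayed)
        decisions[lt] = majority(views)
    return decisions
```

Running `om1("attack", traitor=3)` leaves the loyal lieutenants 1 and 2 agreeing on "attack" despite the lies, and with `traitor=0` all three lieutenants still settle on the same order even though the commander sent conflicting ones. Drop one lieutenant and the same majority vote degenerates into the hopeless three-general case.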

It turns out this is a real problem in real life and in real aeroplanes; the best paper I found was "The Real Byzantine Generals" by Driscoll, Hall, Paulitsch and Zumsteg, Digital Avionics Systems Conference, Salt Lake City, October 2004 (via IEEE Xplore if you have it). It mentions how the "anthropomorphic" basis of the problem may lead many to think they are immune to it, which is an interesting point.

So although four redundant systems sounds like a lot, it's just barely enough (3n+1 with n=1) if you want the systems to reach any sort of consensus in the presence of a single fault.

on open source bigots

In this post Mike introduces the concept of an "Open Source Bigot" in response to this article. A perhaps more reasoned article appears here. The thread that runs between them is that you may not be able to make a profit by releasing your software as free software. Note I mean free in the FSF sense, and in the sense an open source bigot would mean. Below I refer to non-free open source software; by this I mean any license that doesn't require you to redistribute modified source code.

Something didn't sit right with me when I was reading these articles, and it took me a bus trip home to work out why. In economics you study the "tragedy of the commons". Almost everyone is aware of this theory these days: with a common resource, using a bit more of it costs you nothing individually, but degrades everyone's opportunity to use it. The usual example is the village field, where each herder sees it as advantageous to graze one more head of cattle, even though their combined actions lead to overgrazing.

We learn that to avoid this you need some form of regulation over the resource. You might charge for carbon dioxide output and set up a carbon credits market, or make it illegal to dump into rivers and fine people who do.

But my use of non-free open source software doesn't affect anyone else's ability to use that software! The tragedy of the commons doesn't apply! Indeed, software is unusual in that it costs basically nothing to duplicate and distribute, and it costs no more for a million people to use it than for one. In this case, the source code isn't the river or the village field. But there is something that is like the village field: the ideas the software manifests. The software does something; it may do it poorly, or very well indeed. When it does it poorly, we call it a bug or a crappy API; when it does it well, we call it efficient or a nice API.

If everyone just uses the software, non-free open source is a great boon: it costs nothing to distribute, and the people using it reap benefits they otherwise wouldn't have had. If they contribute back to their community, so much the better.

But when a for-profit company takes non-free open source software and uses it in a commercial product, it might also find and fix bugs or enhance the operation of the software. Leveraging the non-free open source software to do something better for yourself is analogous to leveraging the nice field to graze one more head of cattle. This company now has a competitive advantage. Not only that, but a bug isn't a bug until someone finds it (a little philosophical, but I think you get the point), meaning that once it is found, for as long as it remains unfixed it represents a regression. It's a similar situation with an unclean API; once someone writes a cleaner one, the old one becomes obsolete.

I mentioned that to overcome the commons problem we need some sort of regulation. Herein lie the GPL and the "open source bigots". The GPL requires you to give changes back to the community you got the code from. When you release your code under some sort of BSD license and it's picked up by a for-profit commercial vendor, you've effectively made your own little tragedy of the commons: you're left with a product that is worse thanks to the (newly found, but unreported and unfixed) bugs and unclean APIs (that now have good replacements), and you can't expect to get any changes back.

And why should I expect anyone to give me back changes when it is not in their interest to do so? As Mike says, "Open Source isn't a knife, competition is a knife". Why sharpen a competitor's knife? I know Mike and the Atlassian guys give their changes back; but if their changes are going to make Bugzilla a better competitor to them, what logical reason do they have to do so? Ethics? Ethics is nice but not dependable. The market is supposedly never wrong, and everything about markets says you don't hand your competitive advantages to your competitors, ethics or not. The GPL pushes this from an ethical requirement to something enforceable, which changes the equation.

So you may not be able to make a profit releasing your software as free software, but if you're using non-free open source components in your proprietary, commercial product, I think you are in a precarious situation. Can you say you'll always pay the community back for what it gave you? Can you say the same about everyone else?

It took us a long time to realise that pumping carbon dioxide into the atmosphere was even likely to be doing any harm. Will we some day realise that releasing code with no quid pro quo ended up disadvantaging us more than it helped? Many licenses don't require you to return your changes to the community you took them from, and many developers prefer to distribute code under such licenses. People develop software for many reasons: sometimes to scratch an itch, sometimes for fun, sometimes as a challenge. Who am I to say anyone's choice of license is wrong? Maybe I'm just an open source bigot.