Local qemu/kvm virtual machines, 2018

For work I run a personal and a work VM on my laptop. When I was at VMware I dogfooded internal builds of Workstation which worked well, but was always a challenge to have its additions consistently building against latest kernels. About 5 and half years ago, the only practical alternative option was VirtualBox. IIRC SPICE maybe didn't even exist or was very early, and while VNC is OK to fiddle with something, completely impractical for primary daily use.

VirtualBox is fine, but there is the promised land of all the great features of qemu/kvm and many recent improvements in 3D integration always calling. I'm trying all this on my Fedora 28 host, with a Fedora 28 guest (which has been in-place upgraded since Fedora 19), so everything is pretty recent. Periodically I try this conversion again, but, spoiler alert, have not yet managed to get things quite right.

As I happened to close an IRC window, somehow my client seemed to crash X11. How odd ... so I thought, everything has just disappeared anyway; I might as well try switching again.

Image conversion has become much easier. My primary VM has a number of snapshots, so I used the VirtualBox GUI to clone the VM and followed the prompts to create the clone with squashed snapshots. Then simply convert the VDI to a RAW image with

$ qemu-img convert -p -f vdi -O raw image.vdi image.raw

Note if you forget the progress meter, send the pid a SIGUSR1 to get it to spit out a progress.

virt-manager has come a long way too. Creating a new VM was trivial. I wanted to make sure I was using all the latest SPICE gl etc., stuff. Here I hit some problems with what seemed to be permission denials on drm devices before even getting the machine started. Something suggested using libvirt in session mode, with the qemu:///session URL -- which seemed more like what I want anyway (a VM for only my user). I tried that, put the converted raw image in my home directory and the VM would boot. Yay!

It was a bit much to expect it to work straight away; while GRUB did start, it couldn't find the root disks. In hindsight, you should probably generate a non-host specific initramfs before converting the disk, so that it has a larger selection of drivers to find the boot devices (especially the modern virtio drivers). On Fedora that would be something like

sudo dracut --no-hostonly --regenerate-all -f

As it turned out, I "simply" attached a live-cd and booted into that, then chrooted into my old VM and regenerated the initramfs for the latest kernel manually. After this the system could find the LVM volumes in the image and would boot.

After a fiddly start, I was hopeful. The guest kernel dmesg DRM sections showed everything was looking good for 3D support, along with the glxinfo showing all the virtio-gpu stuff looking correct. However, I could not get what I hoped was trivial automatic window resizing happening no matter what. After a bunch of searching, ensuring my agents were running correctly, etc. it turns out that has to be implemented by the window-manager now, and it is not supported by my preferred XFCE (see https://bugzilla.redhat.com/show_bug.cgi?id=1290586). Note you can do this manually with xrandr --output Virtual-1 --auto to get it to resize, but that's rather annoying.

I thought that it is 2018 and I could live with Gnome, so installed that. Then I tried to ping something, and got another selinux denial (on the host) from qemu-system-x86 creating icmp_socket. I am guessing this has to do with the interaction between libvirt session mode and the usermode networking device (filed https://bugzilla.redhat.com/show_bug.cgi?id=1609142). I figured I'd limp along with ICMP and look into details later...

Finally when I moved the window to my portrait-mode external monitor, the SPICE window expanded but the internal VM resolution would not expand to the full height. It looked like it was taking the height from the portrait-orientation width.

Unfortunately, forced swapping of environments and still having two/three non-trivial bugs to investigate exceeded my practical time to fiddle around with all this. I'll stick with VirtualBox for a little longer; 2020 might be the year!

Thunderbird 54 external editor

For many years I've used Thunderbird with Alexandre Feblot's external editor plugin to allow me to edit mail with emacs. Unfortunately it seems long unmaintained and stopped working on a recent upgrade to Thunderbird 54 when some deprecated interfaces were removed. Brian M. Carlson seemed to have another version which also seemed to fail with latest Thunderbird.

I have used my meagre Mozilla plugin skills to make an update at https://github.com/ianw/extedit/releases. Here you can download an xpi that passes the rigorous test-suite of ... works for me.

Australia, ipv6 and dd-wrt

It seems that other than Internode, no Australian ISP has any details at all about native IPv6 deployment. Locally I am on Optus HFC, which I believe has been sold to the NBN, who I believe have since discovered that it is not quite what they thought it was. i.e. I think they have more problems than rolling out IPv6 and I won't hold my breath.

So the only other option is to use a tunnel of some sort, and it seems there is really only one option with local presence via SixXS. There are other options, notably He.net, but they do not have Australian tunnel-servers. SixXS is the only one I could find with a tunnel in Sydney.

So first sign up for an account there. The process was rather painless and my tunnel was provided quickly.

After getting this, I got dd-wrt configured and working on my Netgear WNDR3700 V4. Here's my terse guide, cobbled together from other bits and pieces I found. I'm presuming you have a recent dd-wrt build that includes the aiccu tool to create the tunnel, and are pretty familiar with logging into it, etc.

Firstly, on dd-wrt make sure you have JFFS2 turned on for somewhere to install scripts. Go Administration, JFFS2 Support, Internal Flash Storage, Enabled.

Next, add the aiccu config file to /jffs/etc/aiccu.conf

# AICCU Configuration

# Login information
username USERNAME
password PASSWORD

# Protocol and server listed on your tunnel
protocol tic
server tic.sixxs.net

# Interface names to use
ipv6_interface sixxs

# The tunnel_id to use
# (only required when there are multiple tunnels in the list)
#tunnel_id <your tunnel id>

# Be verbose?
verbose false

# Daemonize?
daemonize true

# Require TLS?
requiretls true

# Set default route?
defaultroute true

Now you can add a script to bring up the tunnel and interface to /jffs/config/sixxs.ipup (make sure you make it executable) where you replace your tunnel address in the ip commands.

# wait until time is synced
while [ `date +%Y` -eq 1970 ]; do
sleep 5
done

# check if aiccu is already running
if [ -n "`ps|grep etc/aiccu|grep -v grep`" ]; then
aiccu stop
sleep 1
killall aiccu
fi

# start aiccu
sleep 3
aiccu start /jffs/etc/aiccu.conf

sleep 3
ip -6 addr add 2001:....:....:....::/64 dev br0
ip -6 route add 2001:....:....:....::/64 dev br0

sleep 5

#### BEGIN FIREWALL RULES ####
WAN_IF=sixxs
LAN_IF=br0

#flush tables
ip6tables -F

#define policy
ip6tables -P INPUT DROP
ip6tables -P FORWARD DROP
ip6tables -P OUTPUT ACCEPT

# Input to the router
# Allow all loopback traffic
ip6tables -A INPUT -i lo -j ACCEPT

#Allow unrestricted access on internal network
ip6tables -A INPUT -i $LAN_IF -j ACCEPT

#Allow traffic related to outgoing connections
ip6tables -A INPUT -i $WAN_IF -m state --state RELATED,ESTABLISHED -j ACCEPT

# for multicast ping replies from link-local addresses (these don't have an
# associated connection and would otherwise be marked INVALID)
ip6tables -A INPUT -p icmpv6 --icmpv6-type echo-reply -s fe80::/10 -j ACCEPT

# Allow some useful ICMPv6 messages
ip6tables -A INPUT -p icmpv6 --icmpv6-type destination-unreachable -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type packet-too-big -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type time-exceeded -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type parameter-problem -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type echo-request -j ACCEPT
ip6tables -A INPUT -p icmpv6 --icmpv6-type echo-reply -j ACCEPT

# Forwarding through from the internal network
# Allow unrestricted access out from the internal network
ip6tables -A FORWARD -i $LAN_IF -j ACCEPT

# Allow some useful ICMPv6 messages
ip6tables -A FORWARD -p icmpv6 --icmpv6-type destination-unreachable -j ACCEPT
ip6tables -A FORWARD -p icmpv6 --icmpv6-type packet-too-big -j ACCEPT
ip6tables -A FORWARD -p icmpv6 --icmpv6-type time-exceeded -j ACCEPT
ip6tables -A FORWARD -p icmpv6 --icmpv6-type parameter-problem -j ACCEPT
ip6tables -A FORWARD -p icmpv6 --icmpv6-type echo-request -j ACCEPT
ip6tables -A FORWARD -p icmpv6 --icmpv6-type echo-reply -j ACCEPT

#Allow traffic related to outgoing connections
ip6tables -A FORWARD -i $WAN_IF -m state --state RELATED,ESTABLISHED -j ACCEPT

Now you can reboot, or run the script, and it should bring the tunnel up and you should be correclty firewalled such that packets get out, but nobody can get in.

Back to the web-interface, you can now enable IPv6 with Setup, IPV6, Enable. You leave "IPv6 Type" as Native IPv6 from ISP. Then I enabled Radvd and added a custom config in the text-box to get DNS working with google DNS on hosts with:

interface br0
{
AdvSendAdvert on;
prefix 2001:....:....:....::/64
 {
 };
 RDNSS 2001:4860:4860::8888 2001:4860:4860::8844
 {
 };
};

(again, replace the prefix with your own)

That is pretty much it; at this point, you should have an IPv6 network and it's most likely that all your network devices will "just work" with it. I got full scores on the IPv6 test sites on a range of devices.

Unfortunately, even a geographically close tunnel still really kills latency; compare these two traceroutes:

$ mtr -r -c 1 google.com
Start: Fri Jan 15 14:51:18 2016
HOST: jj                          Loss%   Snt   Last   Avg  Best  Wrst StDev
1. |-- 2001:....:....:....::      0.0%     1    1.4   1.4   1.4   1.4   0.0
2. |-- gw-163.syd-01.au.sixxs.ne  0.0%     1   12.0  12.0  12.0  12.0   0.0
3. |-- ausyd01.sixxs.net          0.0%     1   13.5  13.5  13.5  13.5   0.0
4. |-- sixxs.sydn01.occaid.net    0.0%     1   13.7  13.7  13.7  13.7   0.0
5. |-- 15169.syd.equinix.com      0.0%     1   11.5  11.5  11.5  11.5   0.0
6. |-- 2001:4860::1:0:8613        0.0%     1   14.1  14.1  14.1  14.1   0.0
7. |-- 2001:4860::8:0:79a0        0.0%     1  115.1 115.1 115.1 115.1   0.0
8. |-- 2001:4860::8:0:8877        0.0%     1  183.6 183.6 183.6 183.6   0.0
9. |-- 2001:4860::1:0:66d6        0.0%     1  196.6 196.6 196.6 196.6   0.0
10.|-- 2001:4860:0:1::72d         0.0%     1  189.7 189.7 189.7 189.7   0.0
11.|-- kul01s07-in-x09.1e100.net  0.0%     1  194.9 194.9 194.9 194.9   0.0

$ mtr -4 -r -c 1 google.com
Start: Fri Jan 15 14:51:46 2016
HOST: jj                          Loss%   Snt   Last   Avg  Best  Wrst StDev
1.|-- gateway                    0.0%     1    1.3   1.3   1.3   1.3   0.0
2.|-- 10.50.0.1                  0.0%     1   11.0  11.0  11.0  11.0   0.0
3.|-- ???                       100.0     1    0.0   0.0   0.0   0.0   0.0
4.|-- ???                       100.0     1    0.0   0.0   0.0   0.0   0.0
5.|-- ???                       100.0     1    0.0   0.0   0.0   0.0   0.0
6.|-- riv4-ge4-1.gw.optusnet.co  0.0%     1   12.1  12.1  12.1  12.1   0.0
7.|-- 198.142.187.20             0.0%     1   10.4  10.4  10.4  10.4   0.0

When you watch what is actually using ipv6 (the ipvfoo plugin for Chrome is pretty cool, it shows you what requests are going where), it's mostly all just traffic to really big sites (Google/Google Analytics, Facebook, Youtube, etc) who have figured out IPv6.

Since these are exactly the type of places that have made efforts to get caching as close as possible to you (Google's mirror servers are within Optus' network, for example) and so you're really shooting yourself in the foot going around it using an external tunnel. The other thing is that I'm often hitting IPv6 mirrors and downloading larger things for work stuff (distro updates, git clones, image downloads, etc) which is slower and wasting someone else's bandwith for really no benefit.

So while it's pretty cool to have an IPv6 address (and a fun experiment) I think I'm going to turn it off. One positive was that after running with it for about a month, nothing has broken -- which suggests that most consumer level gear in a typical house (phones, laptops, TVs, smart-watches, etc) is either ready or ignores it gracefully. Bring on native IPv6!

On VMware and GPL

I do not believe any of the current reporting around the announced case has accurately described the issue; which I see as a much more subtle question of GPL use across API layers. Of course I don't know what the real issue is, because the case is sealed and I have no inside knowledge. I do have some knowledge of the vmkernel, however, and what I read does not match with what I know.

An overview of ESXi is shown below

overview of vmkernel and vmkapi

There is no question that ESXi uses a lot of Linux kernel code and drivers. The question as I see it is more around the interface. The vmkernel provides a well-described API known as vmkapi. You can write drivers directly to this API; indeed some do. You can download a SDK.

A lot of Linux code has been extracted into vmkLinux; this is a shim between Linux drivers and the vmkapi interface. The intent here is to provide an environment where almost unmodified Linux drivers can interface to the proprietary vmkernel. This means vendors don't have to write two drivers, they can re-use their Linux ones. Of course, large parts of various Linux sub-systems' API are embedded in here. But the intent is that this code is modified to communicate to the vmkernel via the exposed vmkapi layer. It is conceivable that you could write a vmkWindows or vmkOpenBSD and essentially provide a shim-wrapper for drivers from other operating systems too.

vmkLinux and all the drivers are GPL, and released as such. I do not think there could be any argument there. But they interface to vmkapi which, as stated, is an available API but part of the proprietary kernel. So, as I see it, this is a much more subtle question than "did VMware copy-paste a bunch of Linux code into their kernel". It goes to where the GPL crosses API boundaries and what is considered a derived work.

If nothing else, this enforcement increasing clarity around that point would be good for everyone I think.

Netgear CG3100D-2 investigation

The Netgear CG3100D-2 is the default cable-modem you get for Telstra Cable, at least at one time. Having retired it after changing service providers, I wanted to see if it was somewhat able to be re-purposed.

In short it's hackability is low.

First thing was to check out the Netgear Open Source page to see if the source had anything interesting. There is some source, but honestly when you dig into the platform code and see things like kernel/linux/arch/mips/bcm963xx/setup.c:

/***************************************************************************
 * C++ New and delete operator functions
 ***************************************************************************/

/* void *operator new(unsigned int sz) */
void *_Znwj(unsigned int sz)
{
    return( kmalloc(sz, GFP_KERNEL) );
}

/* void *operator new[](unsigned int sz)*/
void *_Znaj(unsigned int sz)
{
return( kmalloc(sz, GFP_KERNEL) );
}
...

there's a bit of a red-flag that this is not the cleanest code in the world (I guess it interfaces with some sort of cross-platform SDK written in some sort of C++).

So next we can open it up, where it turns out there are two separate UARTs as shown in the following image.

UART connections on Netgear CG3100D 2BPAUS

One of these is for the bootloader and eCOS environment, and the other seems to be connected to the Linux side.

A copy of the boot-logs for the bootloader and eCOS and Linux don't show anything particuarly interesting. The Linux boot does identify itself as Linux version 2.6.30-V2.06.05u while the available source lists its version as 2.6.30-1.0.5.83.mp2 so it's questionable if the source matches whatever firmware has made it onto the modem.

We do see that this identifies as a BCM338332 which seems to be one of the many sub-models of the BCM3383 SoC cable-modem solution. There is an OpenWrt wiki page that indicates support is limited.

Both Linux and eCos boot to a login prompt where all the usual default combinations of login/passwords fail. So my next thought was to try and get to the firmware via the bootloader, which has a simple interface

BCM338332 TP0 346890
Reset Switch - Low GPIO-18 50ms
MemSize:            128 M
Chip ID:     BCM3383G-B0

BootLoader Version: 2.4.0alpha14R6T Pre-release Gnu spiboot dual-flash reduced DDR drive linux
Build Date: Mar 24 2012
Build Time: 14:04:50
SPI flash ID 0x012018, size 16MB, block size 64KB, write buffer 256, flags 0x0
Dual flash detected.  Size is 32MB.
parameter offset is 49944

Signature/PID: a0e8


Image 1 Program Header:
   Signature: a0e8
     Control: 0005
   Major Rev: 0003
   Minor Rev: 0000
  Build Time: 2013/4/18 04:01:11 Z
 File Length: 3098751 bytes
Load Address: 80004000
    Filename: CG3100D_2BPAUS_V2.06.02u_130418.bin
         HCS: 1e83
         CRC: b95f4172

Found image 1 at offset 20000

Image 2 Program Header:
   Signature: a0e8
     Control: 0005
   Major Rev: 0003
   Minor Rev: 0000
  Build Time: 2013/10/17 02:33:29 Z
 File Length: 3098198 bytes
Load Address: 80004000
    Filename: CG3100D_2BPAUS_V2.06.05u_131017.bin
         HCS: 2277
         CRC: a6c0fd23

Found image 2 at offset 800000

Image 3 Program Header:
   Signature: a0e8
     Control: 0105
   Major Rev: 0002
   Minor Rev: 0017
  Build Time: 2013/10/17 02:22:30 Z
 File Length: 8277924 bytes
Load Address: 84010000
    Filename: CG3100D_2BPAUS_K2630V2.06.05u_131017.bin
         HCS: 157e
         CRC: 57bb0175

Found image 3 at offset 1000000

Enter '1', '2', or 'p' within 2 seconds or take default...
. .

Board IP Address  [0.0.0.0]:           192.168.2.10
Board IP Mask     [255.255.255.0]:
Board IP Gateway  [0.0.0.0]:
Board MAC Address [00:10:18:ff:ff:ff]:

Internal/External phy? (e/i/a)[a]
Switch detected: 53125
ProbePhy: Found PHY 0, MDIO on MAC 0, data on MAC 0
Using GMAC0, phy 0

Enet link up: 1G full


Main Menu:
==========
  b) Boot from flash
  g) Download and run from RAM
  d) Download and save to flash
  e) Erase flash sector
  m) Set mode
  s) Store bootloader parameters to flash
  i) Re-init ethernet
  p) Print flash partition map
  r) Read memory
  w) Write memory
  j) Jump to arbitrary address
  X) Erase all of flash except the bootloader
  z) Reset

Flash Partition information:

Name           Size           Offset
=====================================
bootloader   0x00010000     0x00000000
image1       0x007d0000     0x00020000
image2       0x007c0000     0x00800000
linux        0x00800000     0x01000000
linuxapps    0x00600000     0x01800000
permnv       0x00010000     0x00010000
dhtml        0x00200000     0x01e00000
dynnv        0x00040000     0x00fc0000
vennv        0x00010000     0x007f0000

The "read memory" seems to give you one byte at a time and I'm not certain it actually works. So I think the next step is solder some leads to dump out the firmware from the flash-chip directly, which is on the underside of the board. At that point, I imagine the passwords would be easily found in the image and you might then be able to leverage this into some sort of further hackability.

If you want a challenge and have a lot of time on your hands, this might be your platform — but practically I think the best place for this is the recycling bin.

Bash arithmetic evaluation and errexit trap

In the "traps for new players" category:

count=0
things="0 1 0 0 1"

for i in $things;
do
   if [ $i == "1" ]; then
       (( count++ ))
   fi
done

echo "Count is ${count}"

Looks fine? I've probably written this many times. There's a small gotcha:

((expression))
The expression is evaluated according to the rules described below under ARITHMETIC EVALUATION. If the value of the expression is non-zero, the return status is 0; otherwise the return status is 1. This is exactly equivalent to let "expression".

When you run this script with -e or enable errexit -- probably because the script has become too big to be reliable without it -- count++ is going to return 0 (post-increment) and per above stop the script. A definite trap to watch out for!

Finding out if you're a Rackspace instance

Different hosting providers do things slightly differently, so it's sometimes handy to be able to figure out where you are. Rackspace is based on Xen and their provided images should include the xenstore-ls command available. xenstore-ls vm-data will give you a handy provider and even region fields to let you know where you are.

function is_rackspace {
  if [ ! -f /usr/bin/xenstore-ls ]; then
      return 1
  fi

  /usr/bin/xenstore-ls vm-data | grep -q "Rackspace"
}

if is_rackspace; then
  echo "I am on Rackspace"
fi

Other reading about how this works:

ip link up versus status UP

ip link show has an up filter which the man page describes as display running interfaces. The output also shows the state, e.g.

$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default
  link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
  link/ether e8:03:9a:b6:46:b3 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DORMANT group default qlen 1000
  link/ether c4:85:08:2a:6e:3a brd ff:ff:ff:ff:ff:ff

It is not described what the difference is between these two.

The state output is derived from IFLA_OPERSTATE as per this little bit in ip/ipaddress.c in iproute2

if (tb[IFLA_OPERSTATE])
  print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));

as per operstates.txt a status of UP means that the interface is up and can be used to send packets.

In contrast, ip link show up runs a different filter

if (filter.up && !(ifi->ifi_flags&IFF_UP))
  return 0;

Thus only interfaces that are in state IFF_UP are shown. This means it is administratively up, but this does not mean it is available and ready to send packets.

This may make a difference if you're trying to use the output of ip link in any sort of automation where you have to probe the available network-interfaces and know where packets are going to get out or not.