Handling hostnames, UDP and IPv6 in Python

So, you have some application where you want the user to specify a remote host/port, and you want to support IPv4 and IPv6.

For literal addresses, things are fairly simple. IPv4 addresses are simple, and RFC2732 has things covered by putting the IPv6 address within square brackets.

It gets more interesting as to what you should do with hostnames. The problem is that getaddrinfo can return you multiple addresses, but without extra disambiguation from the user it is very difficult to know which one to choose. RFC4472 discusses this, but there does not appear to be any good solution.

Possibly you can do something like ping/ping6 and have a separate program name or configuration flag to choose IPv6. This comes at a cost of transparency.

The glibc implementation of getaddrinfo() puts considerable effort into deciding if you have an IPv6 interface up and running before it will return an IPv6 address. It will even recognise link-local addresses and sort addresses more likely to work to the front of the returned list as described here. However, there is still a small possibility that the IPv6 interface doesn't actually work, and so the library will sort the IPv6 address as first in the returned list when maybe it shouldn't be.

If you are using TCP, you can connect to each address in turn to find one that works. With UDP, however, the connect essentially does nothing.

So I believe probably the best way to handle hostnames for UDP connections, at least on Linux/glibc, is to trust getaddrinfo to return the sanest values first, try a connect on the socket anyway just for extra security and then essentially hope it works. Below is some example code to do that (literal address splitter bit stolen from Python's httplib).

import socket

DEFAULT_PORT = 123

host = '[fe80::21c:a0ff:fb27:7196]:567'

# the port will be anything after the last :
p = host.rfind(":")

# ipv6 literals should have a closing brace
b = host.rfind("]")

# if the last : is outside the [addr] part (or if we don't have []'s
if (p > b):
    try:
        port = int(host[p+1:])
    except ValueError:
        print "Non-numeric port"
        raise
    host = host[:p]
else:
    port = DEFAULT_PORT

# now strip off ipv6 []'s if there are any
if host and host[0] == '[' and host[-1] == ']':
    host = host[1:-1]

print "host = <%s>, port = <%d>" % (host, port)

the_socket = None

res = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_DGRAM)

# go through all the returned values, and choose the ipv6 one if
# we see it.
for r in res:
    af,socktype,proto,cname,sa = r

    try:
        the_socket = socket.socket(af, socktype, proto)
        the_socket.connect(sa)
    except socket.error, e:
        # connect failed!  try the next one
        continue

    break

if the_socket == None:
    raise socket.error, "Could not get address!"

# ready to send!
the_socket.send("hi!")