Printing files side-by-side

I'm really not sure if there is an eaiser way to do this, but here is my newly most-used utility. It puts two files beside each other; but as opposed to sdiff/diff -y doesn't analyse it, and as opposed to paste keeps to fixed widths. Here is a screen shot.

$  python ./side-by-side.py --width 40 /tmp/afile.txt /tmp/afile2.txt
this is a file                           |  i have here another file
that has lines of                        |  that also has some text and
text.  to read                           |  some lines.  although it
                                         |  is slightly longer than the other
this is a really really really really re *  file with all these words
                                         |  in
                                         |  it

I'd love to hear from all the Python freaks how you could get the LOC even lower; every time I do something like this I find out about a new, quicker, way to recurse a list :)

#!/usr/bin/python

import sys, os
from optparse import OptionParser

class InFile:
    def __init__(self, filename):
        try:
            self.lines=[]
            self.maxlen = 0
            for l in open(filename).readlines():
                self.lines.append(l.rstrip())
        except IOError, (error, message):
            print "Can't read input %s : %s" % (filename, message)
            sys.exit(1)

        self.nlines = len(self.lines)
        if self.nlines == 0:
            self.lines.append("")
        self.maxlen = max(map(len, self.lines))

    # pad to the max len, with a extra space then the deliminator
    def pad_lines(self, nlines, width=0, nodiv=False, notrunc=False):
        if width == 0:
            width = self.maxlen
        pad = []
        for i in range(0, width):
            pad += " "
        # add on some extra for the divider and spaces
        pad += "   "
        padlen = len(pad)
        for i in range(0, nlines):
            try:
                linelen = len(self.lines[i])
            except IndexError:
                self.lines.append("")
                linelen = 0
            if (linelen > width):
                linelen = width
                if not notrunc:
                    pad[-2] = "*"
            elif nodiv:
                pad[-2] = " "
            else:
                pad[-2] = "|"
            self.lines[i] = self.lines[i][:linelen] +  "".join(pad[linelen - padlen:])

usage= "side-by-side [-w width] file1 file2 ... filen"

parser = OptionParser(usage, version=".1")

parser.add_option("-w", "--width", dest="width", action="store", type="int",
                      help="Set fixed width for each file", default=0)
parser.add_option("--last-div", dest="lastdiv", action="store_true",
                  help="Print divider after last file", default=False)
parser.add_option("--no-div", dest="nodiv", action="store_true",
                  help="Don't print any divider characters", default=False)
parser.add_option("--no-trunc", dest="notrunc", action="store_true",
                  help="Don't show truncation with a '*'", default=False)

(options, args) = parser.parse_args()

flist = []
if (len(args) == 0):
    print usage
    sys.exit(1)
for f in args:
    flist.append(InFile(f))

max_lines = max(map(lambda f: f.nlines, flist))

for i in range(0,len(flist)):
    if (len(flist)-1 == i):
        options.nodiv = not options.lastdiv
    flist[i].pad_lines(max_lines, options.width, options.nodiv, options.notrunc)

for l in range(0, max_lines):
    for f in flist:
        print f.lines[l],
    print

update: Leon suggests

pr -Tm file1 file2

Which is pretty close, but doesn't seem to put any divider between the files. Still might be a handy tool for your toolbox. It seems, from the pr man page, the word for doing this sort of thing is columnate.

update 2: Told you I'd learn new ways to iterate! Stephen Thorne came up with a neat solution. Some extra tricks he used

  • ' ' * n gives you n spaces. Pretty obvious when you think of it!
  • Usage of generators with the yield statement.
  • The itertools package with it's modified zip like izip function.

All very handy tips for your Python toolbox. If you're learning Python I'd reccommend solving this problem as you can really put to use some of the niceties of the language.