Merging directories with the magic of Python.

We finally got the last projects out of that monstrosity ‘Final Cut Server’, but the final one was a nightmare to export, and we weren’t sure which of its files were already in a different version of the project that we had.

We essentially needed to merge two different versions of a project’s directories, making sure not to lose any files or the way they were organized.

Here’s a quick Python script I whipped up to make it quicker.

With the 4000-odd files in the project, it took under a second to run, and it turned out we only had about 20 files which hadn’t already been merged. Much simpler to sort out.

The script took about 10 minutes to write and test. This is why you should learn to program.  Hacking stuff like this up is easy, and saves *so* much time.

(Yes, you probably could do this with a couple of lines of Perl or Bash, but what the heck.)

#!/usr/bin/env python3
"""Report files in one directory tree that are missing from another."""
from os import fsdecode, stat
from os.path import abspath, basename
from subprocess import PIPE, Popen
import sys


def run(*command):
    """Run a command and return everything it wrote to stdout (as bytes)."""
    found = Popen(command, stdout=PIPE)
    return found.communicate()[0]


def files_in(dirname):
    """Return the absolute path of every regular file under dirname."""
    # find -print0 separates paths with NUL, so odd filenames survive intact.
    output = run('find', abspath(dirname), '-type', 'f', '-print0')
    return [fsdecode(x) for x in output.split(b'\0') if x]


if __name__ == '__main__':
    argv = sys.argv
    try:
        sourcedir = files_in(argv[1])
        destdir = files_in(argv[2])
    except IndexError:
        print('Usage:')
        print(argv[0], '<sourcedir> <destdir>')
        print('Where you want to check if files in <sourcedir> are also in <destdir>')
        print('(but perhaps with a different relative path)')
        sys.exit(1)
    print('---------------------------------------------------')
    print('{0} files in {1}'.format(len(sourcedir), abspath(argv[1])))
    print('{0} files in {1}'.format(len(destdir), abspath(argv[2])))
    print('---------------------------------------------------')
    # Index the destination by bare filename. If two destination files share
    # a name, the later one wins.
    destnames = {}
    for destfile in destdir:
        destnames[basename(destfile)] = {'size': stat(destfile).st_size,
                                         'path': destfile}
    for newfile in sourcedir:
        base = basename(newfile)
        if base not in destnames:
            print(newfile, 'is NOT in the new dir')
        else:
            destfile = destnames[base]
            newsize = stat(newfile).st_size
            if newsize != destfile['size']:
                print('{0}({1}) differs from {2}({3})'.format(
                      newfile, newsize,
                      destfile['path'], destfile['size']))
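
One thing to be aware of with the script above: it remembers only one destination file per name, so if the same filename turns up in two destination folders, only one of them gets compared. Below is a rough sketch of a variation that keeps every size seen for each name. It is not the script that was actually run; it swaps the call to find for os.walk, and the function names (regular_files, sizes_by_name, unmerged) are made up for this example.

```python
import os
from collections import defaultdict


def regular_files(root):
    """Yield the path of every regular (non-symlink) file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def sizes_by_name(root):
    """Map each bare filename under root to the set of sizes seen for it."""
    sizes = defaultdict(set)
    for path in regular_files(root):
        sizes[os.path.basename(path)].add(os.path.getsize(path))
    return sizes


def unmerged(sourcedir, destdir):
    """Return source files with no same-name, same-size file in destdir."""
    known = sizes_by_name(destdir)
    missing = []
    for path in regular_files(sourcedir):
        name = os.path.basename(path)
        # .get() avoids adding empty entries to the defaultdict.
        if os.path.getsize(path) not in known.get(name, ()):
            missing.append(path)
    return sorted(missing)
```

Calling unmerged('old_project', 'new_project') (both directory names are placeholders) would return the list of paths still left to sort out by hand. Matching on name and size is the same heuristic as the original script, so two different files that share both would still be treated as merged.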

Python string concatenation speed

I made a quick Python script to convert OpenLP song (lyric) databases into the presentation format used by ProPresenter.

https://github.com/danthedeckie/OpenLP-To-ProPresenter5-Converter

is the link. I put it together in a few hours. It should have been quicker, but I’m still re-acquainting myself with Python.

It is a nice language.

One thing I wondered about today, while cleaning up a bit, was something I remember from doing Python years ago: string concatenation, i.e. joining two strings (texts) together.

"Hello" + "World" -> "HelloWorld"

I remembered something about it being slow, and the recommendation being to use

''.join(("Hello", "World"))

which seems to me one of the most blatantly ugly, obscure gotchas I’ve come across in a long while.

Anyway, I refactored my code into that style. Converting databases and song lyrics and writing XML is pretty much all text formatting and concatenation.

It made no discernible difference. So I went back to the normal, easier-to-read x += y and x = y + z style of code.
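
For anyone who wants to check this on their own machine, here is a minimal sketch that times both styles with the standard timeit module. The word list and the function names (concat_plus, concat_join) are invented for the example and are not taken from the converter. The numbers will depend on the interpreter, its version and how much text is being joined, so treat it as a way to measure, not as a verdict.

```python
import timeit

# Made-up sample data: 2000 short strings to glue together.
WORDS = ["Hello", "World"] * 1000


def concat_plus(words):
    """Build the result with repeated +=."""
    result = ""
    for word in words:
        result += word
    return result


def concat_join(words):
    """Build the result with a single str.join call."""
    return "".join(words)


if __name__ == "__main__":
    for func in (concat_plus, concat_join):
        # Time 1000 calls of each function on the same input.
        seconds = timeit.timeit(lambda: func(WORDS), number=1000)
        print("{0}: {1:.4f}s for 1000 runs".format(func.__name__, seconds))
```

If the two timings come out close for the sizes of text you actually handle, the easier-to-read style costs nothing worth worrying about.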