S3Cmd Modifications - Saturday

Released! I finished up the last of the testing for parallel download and upload handling last night. Most of the time between when I actually got it working and now was spent adding error handling, debugging, and slightly modifying the display. For the impatient, the changes are available at http://github.com/pcorliss/s3cmd-modification

A few things that went better than expected
  • The speed boost on uploads and downloads makes this patch a must-have for anyone doing a lot of transfers to and from S3, especially when lots of small files are involved. See my previous post for an example. I'm really happy it worked out as well as it did, and the code looks pretty clean.
  • I was worried about the learning curve, but I think I picked up Python fairly quickly. There is a wealth of built-in classes that came in very handy, and plenty of documentation out there to get started.
A few things that were more of a challenge
  • Python's whitespace-sensitive syntax: 1 tab != 4 spaces. This came up very early in the process as I was getting familiar with Python. I crave a language like Java where the syntax is very explicit, although the mandatory formatting does make things a lot more readable by default.
  • Still trying to find a decent IDE. Geany is interesting, but I don't know if I'll stick with it.
  • Threading is always a bit of a pain, but trying to shoehorn it into an established code base in a language you're looking at for the first time is a real challenge. The tricky part was handling exceptions within a thread without having to kill the thread manually. Thankfully Python has some handy classes that made things a bit easier on me.
  • The original project had a progress-report feature that used carriage returns to rewrite the display output line. This works great for a sequential download process, but simultaneous downloads output gibberish. I spent hours trying to replicate it with multiple threads and eventually gave up and went with a simple file-started/file-finished output.
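For anyone curious about the pattern, here is a minimal sketch of the approach (not the actual s3cmd code; the `worker`/`run` names and the simulated failure are mine): each worker catches exceptions inside the thread and hands them back over a queue instead of dying, and a shared lock serializes the started/finished lines so they don't interleave.

```python
import queue
import threading

def worker(jobs, results, print_lock):
    """Pull filenames off the job queue until it is empty."""
    while True:
        try:
            name = jobs.get_nowait()
        except queue.Empty:
            return
        with print_lock:                      # serialize output lines
            print("started:  %s" % name)
        try:
            # the real transfer would happen here; simulate one failure
            if name == "bad.txt":
                raise IOError("simulated transfer error")
            results.put((name, None))
        except Exception as exc:              # caught inside the thread...
            results.put((name, exc))          # ...and handed to the caller
        with print_lock:
            print("finished: %s" % name)

def run(files, n_workers=4):
    jobs, results = queue.Queue(), queue.Queue()
    for f in files:
        jobs.put(f)
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(jobs, results, lock))
               for _ in range(min(n_workers, len(files)))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    errors = {}
    while not results.empty():
        name, err = results.get()
        errors[name] = err
    return errors
```

Because the workers never raise out of their run function, the main thread can join them all and then decide what to do with any accumulated errors.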
Features Missed
  • Splitting files automatically turned from seemingly simple to complex once I dove into the existing code. It would be possible, but would require more modification than I was comfortable with.
  • I wasn't able to track down the issues with large-bucket support like I had planned, partly because I no longer have access to S3 buckets with millions of files.
  • The caching implementation I had conceived of seemed a little sloppy, and I ran out of time.
I forked the project and put it up on GitHub if anyone is interested in downloading it and giving it a try. The patch against the current trunk version (r437) is also available there. Using it is as simple as using the original command, except you have the --parallel option, which enables my changes, and the --workers=n option, which specifies how many workers to start when doing a transfer. These configuration options are now part of the .s3cfg file as well.
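So a typical invocation would look something like `s3cmd --parallel --workers=10 put *.log s3://mybucket/logs/`, or the settings can live in `~/.s3cfg`. The key names below are only my guess at the config format, assumed to mirror the command-line flags; check the fork for the real ones:

```
# hypothetical ~/.s3cfg excerpt -- key names assumed, not confirmed
[default]
parallel = True
workers = 10
```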

The original project is open source, as is the patch, so download and use at will!