S3Cmd Modifications - Thursday

Part of the work I've been doing with s3cmd is adding a threading feature. It should speed up transfers to S3 for any data set of more than a few files, since each small upload spends most of its time waiting on the network rather than using the CPU. I got it working on Tuesday and spent part of Wednesday refining and expanding it so that the output looks the way you would expect, and adding some basic error handling. The results of this simple threading have been impressive so far. Any time you can get an 8x speed boost, I'd consider that a win. See below for the breakdown.
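The patch itself isn't shown here, so what follows is only a rough illustration of the idea, not the actual s3cmd change. It is a minimal Python 3 sketch that fans uploads out over a thread pool. `upload_file` is a hypothetical stub standing in for a single S3 PUT (it just sleeps), and `parallel_put` and the default of 8 workers are assumptions chosen for the example. Each file keeps its position in the input list, so the `[n of total]` counter is stable even though completions arrive out of order, much like the `--parallel` log below.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def upload_file(path, dest):
    """Hypothetical stub for one S3 PUT; it only simulates network latency."""
    time.sleep(0.05)
    return 16384  # pretend every file is 16 KB


def parallel_put(paths, dest_prefix, workers=8):
    """Upload `paths` concurrently; return (sent, failed) dicts keyed by path."""
    total = len(paths)
    sent, failed = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to the file's position in the input list so the
        # "[n of total]" counter keeps the original numbering.
        futures = {
            pool.submit(upload_file, path, dest_prefix + path.rsplit("/", 1)[-1]): (n, path)
            for n, path in enumerate(paths, start=1)
        }
        for fut in as_completed(futures):  # yields futures as they finish
            n, path = futures[fut]
            try:
                nbytes = fut.result()
            except Exception as exc:  # one failed upload should not abort the rest
                failed[path] = exc
                print(f"ERROR: '{path}' failed: {exc} [{n} of {total}]")
                continue
            sent[path] = nbytes
            print(f"File '{path}' stored ({nbytes} bytes) [{n} of {total}]")
    return sent, failed


# Demo against the stub: 20 fake files, 8 at a time.
files = [f"rand/{i}.out" for i in range(20)]
start = time.perf_counter()
sent, failed = parallel_put(files, "s3://50proj-test-bucket/rand/")
print(f"{len(sent)} stored, {len(failed)} failed in {time.perf_counter() - start:.2f}s")
```

Because the stub's cost is pure waiting, 8 workers finish the 20 fake uploads in roughly three rounds of sleep instead of twenty, which is the same latency-hiding effect the real benchmark shows.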

A few challenges around error handling have cropped up, so I'll have to spend some extra time making sure errors are handled properly and don't cause the program to hang. Also, Python's whitespace sensitivity and syntax are really starting to irritate me. Perhaps a framework like Django will turn it around for me, but I have to say I'm not sure I'll want to go back once this week is over.
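The post doesn't say which failures cause the hangs, so this is only a guess at one common culprit in a hand-rolled `queue.Queue` worker pool: if a worker raises before calling `task_done()`, `Queue.join()` waits forever. The minimal Python 3 sketch below (hypothetical names `run_workers` and `flaky_upload`, not s3cmd code) records each error, always acknowledges the item in a `finally` block, and shuts workers down with one `None` sentinel per thread.

```python
import queue
import threading


def run_workers(jobs, handler, workers=4):
    """Run handler(job) for every job on worker threads; return (results, errors)."""
    tasks = queue.Queue()
    results, errors = {}, {}
    lock = threading.Lock()  # guards the shared result dictionaries

    def worker():
        while True:
            job = tasks.get()
            try:
                if job is None:  # sentinel: this thread has no more work
                    return
                value = handler(job)
                with lock:
                    results[job] = value
            except Exception as exc:
                # Record the failure instead of letting the thread die silently.
                with lock:
                    errors[job] = exc
            finally:
                # Always acknowledge the item; skipping this when handler() raises
                # is what leaves tasks.join() below waiting forever.
                tasks.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    for job in jobs:
        tasks.put(job)
    for _ in threads:
        tasks.put(None)  # one shutdown sentinel per worker, queued after all jobs
    tasks.join()
    return results, errors


def flaky_upload(name):
    """Hypothetical uploader that fails for one file to exercise the error path."""
    if name.endswith("3.out"):
        raise OSError("simulated network error")
    return 16384


results, errors = run_workers([f"rand/{n}.out" for n in range(10)], flaky_upload)
print(f"{len(results)} stored, {len(errors)} failed: {sorted(errors)}")
# prints: 9 stored, 1 failed: ['rand/3.out']
```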

Generated 1000 16K files of random data.
pcorliss@hawaii:~/projects/s3cmd$ ls rand | wc -l
1001
pcorliss@hawaii:~/projects/s3cmd$ du -s rand
16032    rand
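The post doesn't say how the test files were created. One way to produce a similar set is the minimal Python 3 sketch below; `make_random_files` is a hypothetical helper, and the `N.out` naming simply mirrors the file names in the upload logs.

```python
import os


def make_random_files(directory, count=1000, size=16 * 1024):
    """Write `count` files named 0.out .. <count-1>.out, each `size` random bytes."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for n in range(count):
        path = os.path.join(directory, f"{n}.out")
        with open(path, "wb") as fh:
            fh.write(os.urandom(size))  # fill the file with random bytes
        paths.append(path)
    return paths


if __name__ == "__main__":
    created = make_random_files("rand")
    print(f"wrote {len(created)} files")
```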


s3cmd from the Ubuntu 10.04 repository
pcorliss@hawaii:~/projects/s3cmd$ time s3cmd put rand/* s3://50proj-test-bucket/rand/
rand/0.out -> s3://50proj-test-bucket/rand/0.out  [1 of 1001]
 16384 of 16384   100% in    0s    20.71 kB/s  done
rand/1.out -> s3://50proj-test-bucket/rand/1.out  [2 of 1001]
 16384 of 16384   100% in    0s    47.86 kB/s  done
...
real    6m27.871s
user    0m2.400s
sys    0m0.810s


s3cmd trunk with threading modifications
pcorliss@hawaii:~/projects/s3cmd$ time ./source/s3cmd --parallel put rand/* s3://50proj-test-bucket/rand/
File 'rand/1.out' stored as 's3://50proj-test-bucket/rand/1.out' (16384 bytes in 0.5 seconds, 31.49 kB/s) [2 of 1001]
File 'rand/103.out' stored as 's3://50proj-test-bucket/rand/103.out' (16384 bytes in 0.5 seconds, 29.75 kB/s) [7 of 1001]
File 'rand/104.out' stored as 's3://50proj-test-bucket/rand/104.out' (16384 bytes in 0.5 seconds, 29.58 kB/s) [8 of 1001]
File 'rand/100.out' stored as 's3://50proj-test-bucket/rand/100.out' (16384 bytes in 0.5 seconds, 29.24 kB/s) [4 of 1001]
...
real    0m47.216s
user    0m2.010s
sys    0m0.790s
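The "8x" figure comes straight from the two `real` times above. The small Python 3 sketch below (hypothetical helper `to_seconds`) converts the bash `time` fields to seconds and divides them. Note also that `user` and `sys` CPU time barely change between runs (2.400s/0.810s versus 2.010s/0.790s), which suggests the gain comes from overlapping network waits rather than from doing less work.

```python
import re


def to_seconds(stamp):
    """Convert a bash `time` field such as '6m27.871s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


serial = to_seconds("6m27.871s")    # stock s3cmd, real time
parallel = to_seconds("0m47.216s")  # threaded build with --parallel, real time
speedup = serial / parallel
print(f"{serial:.3f}s / {parallel:.3f}s = {speedup:.1f}x")
# prints: 387.871s / 47.216s = 8.2x
```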