S3Cmd Modifications - Thursday
Part of the work I've been doing with s3cmd is adding a threading feature, which should speed up transfers to S3 for data sets of more than a few files. I got this working on Tuesday and spent part of Wednesday refining it so the output is what you would expect, and adding some basic error handling. The results of some simple threading have been impressive so far: in the test below, uploading about 1000 small files dropped from roughly six and a half minutes to 47 seconds. Any time you can get an 8x speed boost I'd consider that a win. See below for the breakdown.
A few challenges surrounding error handling have cropped up. I'll have to spend some extra time making sure errors are handled properly and don't cause the program to hang. Also, Python's whitespace sensitivity and syntax are really starting to irritate me. Perhaps a framework like Django will turn it around for me, but I have to say I'm not sure I'll want to go back once this week is over.
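To make the idea concrete, here is a minimal sketch of uploading files on a pool of worker threads while catching per-file errors so that one failure can't hang the whole run. It is not the actual s3cmd patch: `upload_all`, `upload_one`, and `max_workers` are hypothetical names, and it uses Python 3's `concurrent.futures` rather than whatever the s3cmd trunk uses internally.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def upload_all(paths, upload_one, max_workers=8):
    """Run upload_one(path) for every path on a pool of worker threads.

    upload_one is a hypothetical stand-in for whatever performs a single
    PUT; it is not s3cmd's real internal API.
    Returns (succeeded, failed), where failed maps path -> exception.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its path so results can be reported per file.
        futures = {pool.submit(upload_one, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
            except Exception as exc:
                # Record the failure and keep going instead of stalling the run.
                failed[path] = exc
            else:
                succeeded.append(path)
    return succeeded, failed
```

Because each worker's exception is collected in the main thread, the caller can print a summary or retry the failed paths after the pool has shut down cleanly.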
Generated about 1000 16 KB files of random data (the directory listing counts 1001).
pcorliss@hawaii:~/projects/s3cmd$ ls rand | wc -l
1001
pcorliss@hawaii:~/projects/s3cmd$ du -s rand
16032 rand
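The commands used to create the files aren't shown here. One way to build a comparable test set is sketched below; the `0.out` through `1000.out` naming is inferred from the upload output, and `make_random_files` is a made-up helper name.

```python
import os


def make_random_files(directory, count=1001, size=16 * 1024):
    """Write `count` files of `size` random bytes, named 0.out, 1.out, ..."""
    os.makedirs(directory, exist_ok=True)
    for i in range(count):
        with open(os.path.join(directory, "%d.out" % i), "wb") as handle:
            # os.urandom returns `size` bytes from the OS random source.
            handle.write(os.urandom(size))
```

Calling `make_random_files("rand")` would write 1001 files of 16384 bytes each, matching the sizes reported in the transfers.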
s3cmd from ubuntu 10.04 repository
pcorliss@hawaii:~/projects/s3cmd$ time s3cmd put rand/* s3://50proj-test-bucket/rand/
rand/0.out -> s3://50proj-test-bucket/rand/0.out [1 of 1001]
16384 of 16384 100% in 0s 20.71 kB/s done
rand/1.out -> s3://50proj-test-bucket/rand/1.out [2 of 1001]
16384 of 16384 100% in 0s 47.86 kB/s done
...
real 6m27.871s
user 0m2.400s
sys 0m0.810s
s3cmd trunk with threading modifications
pcorliss@hawaii:~/projects/s3cmd$ time ./source/s3cmd --parallel put rand/* s3://50proj-test-bucket/rand/
File 'rand/1.out' stored as 's3://50proj-test-bucket/rand/1.out' (16384 bytes in 0.5 seconds, 31.49 kB/s) [2 of 1001]
File 'rand/103.out' stored as 's3://50proj-test-bucket/rand/103.out' (16384 bytes in 0.5 seconds, 29.75 kB/s) [7 of 1001]
File 'rand/104.out' stored as 's3://50proj-test-bucket/rand/104.out' (16384 bytes in 0.5 seconds, 29.58 kB/s) [8 of 1001]
File 'rand/100.out' stored as 's3://50proj-test-bucket/rand/100.out' (16384 bytes in 0.5 seconds, 29.24 kB/s) [4 of 1001]
...
real 0m47.216s
user 0m2.010s
sys 0m0.790s
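For the record, the speedup works out from the two `real` times above as follows. The helper name `to_seconds` is just for illustration.

```python
def to_seconds(real):
    """Convert a `time`-style value such as '6m27.871s' to seconds."""
    minutes, seconds = real.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


serial = to_seconds("6m27.871s")    # stock s3cmd from the Ubuntu repository
parallel = to_seconds("0m47.216s")  # trunk with --parallel
speedup = serial / parallel
print("%.1fx faster" % speedup)     # prints "8.2x faster"
```

The user and sys times barely change between the two runs, which suggests the serial run spends most of its wall-clock time waiting on the network rather than doing work.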