A few challenges surrounding error handling have cropped up. I'll have to spend some extra time making sure errors are handled properly and don't cause the program to hang. Also, Python's whitespace sensitivity and syntax are really starting to irritate me. Perhaps a framework like Django will turn it around for me. But I have to say I'm not sure I'll want to go back once this week is over.
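To make the hanging concern concrete, here is a minimal sketch of one way worker threads can record errors instead of dying silently or blocking the main thread. It is not the code from the s3cmd modifications; the function name `run_parallel` and its parameters are hypothetical, and it is written for Python 3.

```python
import queue
import threading


def run_parallel(tasks, worker_count=4):
    """Run zero-argument callables in worker threads (hypothetical helper).

    Returns (results, errors). A failing task is recorded in `errors`
    rather than killing its worker, so the final join() always returns.
    """
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)  # everything is queued before any worker starts

    results, errors = [], []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do, let the thread exit
            try:
                value = task()
            except Exception as exc:  # record the failure and keep going
                with lock:
                    errors.append(exc)
            else:
                with lock:
                    results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, errors
```

The point of the design is that every worker exits on its own once the queue is empty, whether or not individual tasks failed, so the caller never waits on a thread that will not finish.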
Generated 1000 16K files from random data.
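The command used to generate the files isn't shown here, so the following is only a sketch of one way to build a similar test set. The function name `make_random_files` is hypothetical, and the `N.out` naming is an assumption based on the file names in the upload output.

```python
import os


def make_random_files(directory, count=1000, size=16 * 1024):
    """Write `count` files of `size` random bytes each: 0.out, 1.out, ..."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for index in range(count):
        path = os.path.join(directory, "%d.out" % index)
        with open(path, "wb") as handle:
            handle.write(os.urandom(size))  # random bytes from the OS
        paths.append(path)
    return paths


if __name__ == "__main__":
    make_random_files("rand")
```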
pcorliss@hawaii:~/projects/s3cmd$ ls rand | wc -l
1001
pcorliss@hawaii:~/projects/s3cmd$ du -s rand
16032 rand

s3cmd from Ubuntu 10.04 repository
pcorliss@hawaii:~/projects/s3cmd$ time s3cmd put rand/* s3://50proj-test-bucket/rand/
rand/0.out -> s3://50proj-test-bucket/rand/0.out [1 of 1001]
16384 of 16384 100% in 0s 20.71 kB/s done
rand/1.out -> s3://50proj-test-bucket/rand/1.out [2 of 1001]
16384 of 16384 100% in 0s 47.86 kB/s done
...
real 6m27.871s
user 0m2.400s
sys 0m0.810s

s3cmd trunk with threading modifications
pcorliss@hawaii:~/projects/s3cmd$ time ./source/s3cmd --parallel put rand/* s3://50proj-test-bucket/rand/
File 'rand/1.out' stored as 's3://50proj-test-bucket/rand/1.out' (16384 bytes in 0.5 seconds, 31.49 kB/s) [2 of 1001]
File 'rand/103.out' stored as 's3://50proj-test-bucket/rand/103.out' (16384 bytes in 0.5 seconds, 29.75 kB/s) [7 of 1001]
File 'rand/104.out' stored as 's3://50proj-test-bucket/rand/104.out' (16384 bytes in 0.5 seconds, 29.58 kB/s) [8 of 1001]
File 'rand/100.out' stored as 's3://50proj-test-bucket/rand/100.out' (16384 bytes in 0.5 seconds, 29.24 kB/s) [4 of 1001]
...
real 0m47.216s
user 0m2.010s
sys 0m0.790s
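For a rough sense of the difference, the two `real` times can be turned into seconds and compared. This is just arithmetic on the numbers above; the helper name `parse_real_time` is made up for the example.

```python
import re


def parse_real_time(text):
    """Convert a `time` value such as '6m27.871s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError("unrecognised time format: %r" % text)
    return int(match.group(1)) * 60 + float(match.group(2))


serial = parse_real_time("6m27.871s")    # stock s3cmd
parallel = parse_real_time("0m47.216s")  # trunk with --parallel
speedup = serial / parallel

print("serial:   %.3f s" % serial)
print("parallel: %.3f s" % parallel)
print("speedup:  %.1fx" % speedup)
```

That works out to a bit over 8x on wall-clock time for this particular run, with user and sys time essentially unchanged.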
I've tried the current git of your modifications to s3 and it helped our cause a great deal. Uploading a thousand files is now a matter of seconds/bandwidth. :)
On a plain Debian Lenny install with Python 2.5, it seems I had to patch s3cmd, though. I'm by no means an expert, but a simple:
sed 's@threading.active_count@threading.activeCount@' -i s3cmd
fixed it for me.
Cheers,
Alex
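The reason the patch is needed: `threading.active_count()` is the spelling added in Python 2.6, while Python 2.5 only has `threading.activeCount()`. As an alternative to editing the script with sed, a small compatibility lookup along these lines should work on both. This is only a sketch, and `thread_count` is a hypothetical name, not something that exists in s3cmd.

```python
import threading

# Prefer the newer spelling when it exists and fall back to the old
# camelCase name otherwise. `thread_count` is a hypothetical alias.
thread_count = getattr(threading, "active_count", None) or threading.activeCount

print("live threads: %d" % thread_count())
```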