August 30, 2010

Celebrity Freebie List - Monday

Abrupt change of course, and time for something a little risqué. This week will be focused on a consumer-facing product, specifically the celebrity list, AKA the freebie list, AKA the laminated list. For those not familiar, the list is a short list of celebrities you'd be allowed to sleep with if given the opportunity, originally made popular by an episode of Friends.

I'll be using Java (maybe Python) on Google App Engine for this project. I've never used App Engine prior to this, but it's free, and so far I've managed to avoid any hosting costs on my previous projects. The secondary benefit is that it should run indefinitely without monitoring or maintenance. That seems particularly relevant since, during initial research, I stumbled on three Facebook apps that did something similar to what I'm proposing. All of them threw an error on loading; either they never worked or they have since been abandoned.

Project Summary: Build an application in Java or Python on Google App Engine to allow users to create and save their list of celebrity freebies.

Planned Features:
  • Auto-complete
  • Image search
  • Facebook app setup and integration
  • Post to social networks
Monetization:
  • Google AdSense sign up and setup
If There's Time:
  • Port to the other language (Java or Python) for a speed comparison
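
If I end up on the Python runtime, the heart of the app is pretty small: a datastore model for the list and a handler to read and save it. This is just a rough sketch using the classic webapp and db APIs; names like FreebieList, the /list route and the five-entry cap are placeholders, not a final design.

from google.appengine.api import users
from google.appengine.ext import db, webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class FreebieList(db.Model):
    # One list per user; the celebrity names live in a simple list property.
    owner = db.UserProperty(required=True)
    celebrities = db.StringListProperty()
    updated = db.DateTimeProperty(auto_now=True)

class ListHandler(webapp.RequestHandler):
    def get(self):
        user = users.get_current_user()
        if not user:
            self.redirect(users.create_login_url(self.request.uri))
            return
        lst = FreebieList.all().filter('owner =', user).get()
        names = lst.celebrities if lst else []
        self.response.out.write(', '.join(names))

    def post(self):
        user = users.get_current_user()
        lst = FreebieList.all().filter('owner =', user).get() or FreebieList(owner=user)
        lst.celebrities = self.request.get_all('celebrity')[:5]  # keep the list short
        lst.put()
        self.redirect('/list')

application = webapp.WSGIApplication([('/list', ListHandler)], debug=True)

def main():
    run_wsgi_app(application)

if __name__ == '__main__':
    main()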

August 28, 2010

S3Cmd Modifications - Saturday

Released! I finished up the last of the testing for parallel download and upload handling last night. Most of the time between when I actually got it working and now was spent adding some error handling, debugging, and modifying the display slightly. For the impatient, the changes are available at http://github.com/pcorliss/s3cmd-modification

A few things that went better than expected
  • The speed boost on uploads and downloads makes this patch a must-have for anyone doing a lot of transfers to and from S3, especially when lots of small files are involved. See my previous post for an example. I'm really happy it worked out as well as it did, and the code looks pretty clean.
  • I was worried about the learning curve, but I think I picked Python up fairly quickly. There's a wealth of built-in modules and classes that came in very handy, and plenty of documentation out there to get started quickly.
Challenges
  • Python's whitespace-sensitive syntax. One tab != four spaces, which came up very early in the process as I was getting familiar with Python. I crave a language like Java where the syntax is very explicit, although the mandatory formatting does make things a lot more readable by default.
  • Still trying to find a decent IDE. Geany is interesting but I don't know if I'll stick with it.
  • Threading is always a bit of a pain, but trying to shoehorn it into an established code base in a language you're looking at for the first time is a real challenge. Specifically, handling exceptions within a thread without having to kill it manually. Thankfully Python has some handy classes that made things a bit easier on me (there's a rough sketch of the pattern after this list).
  • The original project had a progress report feature that used carriage returns to rewrite the display output line. This works great for a sequential download process, but with simultaneous downloads it outputs gibberish. I spent hours trying to replicate it with multiple threads and eventually just gave up and went with a simple file-started/file-finished output.
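
The pattern that ended up working for me looks roughly like this. This isn't the actual s3cmd-modification code, just a minimal sketch of the worker/queue idea, with transfer() standing in for the real per-file upload or download call: each worker pulls items off a queue, and any exception is caught and reported back instead of silently killing the thread.

import threading
import Queue  # the module is named 'queue' in Python 3

def _worker(tasks, failures, transfer):
    while True:
        try:
            item = tasks.get_nowait()
        except Queue.Empty:
            return                        # queue drained, let the thread exit
        try:
            transfer(item)                # the real per-file upload/download call
        except Exception as e:
            failures.put((item, e))       # report the failure instead of dying silently

def run_parallel(items, transfer, workers=10):
    """Run transfer(item) for every item across a pool of worker threads."""
    tasks, failures = Queue.Queue(), Queue.Queue()
    for item in items:
        tasks.put(item)
    threads = [threading.Thread(target=_worker, args=(tasks, failures, transfer))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    errors = []
    while not failures.empty():
        errors.append(failures.get_nowait())
    return errors                         # list of (item, exception) pairs
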
Features Missed
  • Splitting files automatically turned from seemingly simple to complex once I dug into the existing code. It would be possible, but would require more modification than I was comfortable with.
  • I wasn't able to track down the issues with large-bucket support like I had planned, partially because I no longer have access to S3 buckets with millions of files.
  • The caching implementation I conceived of seemed a little sloppy, and I ran out of time.
I forked the project and put it up on GitHub if anyone is interested in downloading it and giving it a try. The patch against the current trunk version (r437) is available there as well. Using it is as simple as using the original command, except you now have a --parallel option, which enables my changes, and a --workers=n option, which specifies how many workers to start when doing a transfer. These configuration options are now part of the .s3cfg file as well.
http://github.com/pcorliss/s3cmd-modification

The original project is open source as is the patch. So download and use at will!

August 26, 2010

S3Cmd Modifications - Thursday

Part of the work I've been doing with s3cmd is adding a threading feature. This should speed up data transfers to S3 for data sets of more than a few files. I got it working on Tuesday and spent part of Wednesday refining and expanding it so the output is what you would expect, plus adding some basic error handling. The results of some simple threading have been impressive so far; any time you can get an 8x speed boost, I'd consider that a win. See below for the breakdown.

A few challenges surrounding error handling have cropped up. I'll have to spend some extra time making sure they're handled properly and don't cause the program to hang. Also, Python's whitespace-sensitive syntax is really starting to irritate me. Perhaps a framework like Django will turn it around for me, but I have to say I'm not sure I'll want to come back to it once this week is over.

Generated 1000 16K files from random data.
pcorliss@hawaii:~/projects/s3cmd$ ls rand | wc -l
1001
pcorliss@hawaii:~/projects/s3cmd$ du -s rand
16032    rand
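
For reference, an equivalent set of test files can be generated with a few lines of Python (this is just a sketch for anyone who wants to reproduce the numbers, not the exact commands used):

import os

if not os.path.isdir('rand'):
    os.mkdir('rand')
for i in range(1001):                      # 1001 files, matching the ls output above
    with open('rand/%d.out' % i, 'wb') as f:
        f.write(os.urandom(16 * 1024))     # 16K of random bytes per file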


s3cmd from the Ubuntu 10.04 repository
pcorliss@hawaii:~/projects/s3cmd$ time s3cmd put rand/* s3://50proj-test-bucket/rand/
rand/0.out -> s3://50proj-test-bucket/rand/0.out  [1 of 1001]
 16384 of 16384   100% in    0s    20.71 kB/s  done
rand/1.out -> s3://50proj-test-bucket/rand/1.out  [2 of 1001]
 16384 of 16384   100% in    0s    47.86 kB/s  done
...
real    6m27.871s
user    0m2.400s
sys    0m0.810s


s3cmd trunk with threading modifications
pcorliss@hawaii:~/projects/s3cmd$ time ./source/s3cmd --parallel put rand/* s3://50proj-test-bucket/rand/
File 'rand/1.out' stored as 's3://50proj-test-bucket/rand/1.out' (16384 bytes in 0.5 seconds, 31.49 kB/s) [2 of 1001]
File 'rand/103.out' stored as 's3://50proj-test-bucket/rand/103.out' (16384 bytes in 0.5 seconds, 29.75 kB/s) [7 of 1001]
File 'rand/104.out' stored as 's3://50proj-test-bucket/rand/104.out' (16384 bytes in 0.5 seconds, 29.58 kB/s) [8 of 1001]
File 'rand/100.out' stored as 's3://50proj-test-bucket/rand/100.out' (16384 bytes in 0.5 seconds, 29.24 kB/s) [4 of 1001]
...
real    0m47.216s
user    0m2.010s
sys    0m0.790s

August 24, 2010

S3Cmd Modifications - Tuesday

Yesterday I had a lot of success breaking open the s3cmd structure and adding an additional command-line argument to modify the default root object of a CloudFront distribution. By 4pm I had a patch all bundled up and ready to submit. Unfortunately, the email I received this morning made my heart sink.
Hey Phil, I've already done a patch for default root objects. See
http://sourceforge.net/mailarchive...

Luke
I took a look at the patch, and at least I can be happy that he made the same changes I did, line for line. In the meantime I'll be getting started on parallel downloads and uploads. I've done something similar with Perl in the past, so hopefully this goes smoothly and someone hasn't already submitted a patch to do the same thing.

August 23, 2010

S3Cmd Modifications - Monday

This week I'm going to work on one of the pain points I ran into during the zip code project. Specifically s3cmd lacks a few features that would be nice to have. After completion I'll see if the project owners are interested in integrating my changes back into the original product.

Project Summary: Modify the s3cmd project to support new features and expand on old ones. Contribute modifications back to the community.

Features
  • Cloudfront Default Root Object Support
  • Parallel Uploads and Downloads
  • Better support for very large buckets and files
  • Automatic file splitting and joining of large files (> 2 GB)
  • s3sync local caching

If There's Time
  • Look at the s3fuse project as well as s3cmd and see if that could be worked on.

August 22, 2010

Image Processing - Sunday

Released! Looks like I'm cutting it close again and it's only the second week. Next week I'll have to work on something a little less ambitious (or maybe work a little harder). Since Friday I've added some documentation, put together the Amazon EC2 AMI (with Varnish) and reworked how the service gets started so that it's a little more performant.

A few things that went better than expected.
  • Ruby is a pretty fast language to get off the ground with. That's not to say there weren't challenges but by Wednesday and Thursday I was enjoying working with it. It makes my job a lot easier.
  • Multiple storage targets were implemented, and I feel like this is a huge win for both me and anyone who uses the product. Not having to rely on just Amazon S3 or your own disk, and instead being able to specify both as write targets, is a major feature that sets this project apart from others (there's a rough sketch of the idea just after this list).
  • This was my first open source project of any note, and the first commit to my GitHub account. I'm pretty proud of giving back to the open source community. Hopefully my stuff gets used by a wide variety of people.
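
SinMagick itself is written in Ruby, but the idea behind the redundant write targets is simple enough to sketch in a few lines of Python (the class and method names here are purely illustrative, not SinMagick's actual API): write to every configured backend, and read back from the highest-priority backend that has the file.

class PrioritizedStorage(object):
    """Write to every backend; read from the highest-priority one that has the file."""
    def __init__(self, backends):
        # backends: list of (priority, backend) pairs, e.g. [(1, local_disk), (2, s3)]
        # Each backend just needs put(key, data) and get(key) methods.
        self.backends = sorted(backends)

    def put(self, key, data):
        for _, backend in self.backends:
            backend.put(key, data)        # redundant write to every target

    def get(self, key):
        for _, backend in self.backends:  # fall back through targets in priority order
            data = backend.get(key)
            if data is not None:
                return data
        return None

Configuration then just boils down to a priority-ordered list of backends.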


Challenges
  • Ruby's serving options are a bit limited. I had thought that Sinatra and Mongrel would be able to handle more than one simultaneous request since they were both thread-safe, but I was wrong. I wasted three hours today trying to get it working, only to be told by someone in the #sinatra channel on the freenode IRC server that it wasn't going to happen. Major bummer. I don't understand how the Ruby community tolerates having to duplicate their memory footprint for every simultaneous request they want to serve.
  • I had worked with Ruby before, but never on a project built from scratch. Major hurdles over the first three days included getting an IDE set up (NetBeans, probably not going back) and figuring out basic project structure (what goes in lib or config versus staying in the root?). Hopefully future Ruby projects will have less lost time.

Features Missed
  • A lot of features were dropped for this release. I had a lot of big ideas about what I could do at the beginning but in the end I just ran out of time.
  • Security Audit - Didn't get a chance to do more than cursory input fuzzing. Looks like Sinatra does a good job of making sure folks don't insert bad stuff.
  • Better Error Handling - There are a few areas where error handling isn't done. And a few more where more verbose errors about what went wrong could be presented to the end user.
  • Admin Page - Had to fall back to just using a static YAML file. Additionally, about a third of the configurable options I had been hoping for were cut, so it's not a major loss.
  • Sample Code - Never even got a chance to touch this. Thankfully it's a fairly straightforward web service. Folks should be able to spin it up and get it running relatively easily.

It's open source so you can check it out from the following link.
http://github.com/pcorliss/SinMagick

If you're interested in running this for your own business or personal use, you can purchase the pre-configured AMI. ($50 - Amazon will require you to log in first and confirm the purchase.)

August 20, 2010

Image Processing - Friday

It's been a long five days. Every time I considered writing an update I realized how far behind I was and decided to crank out some code instead. It looks like it paid off too. I have a functioning Sinatra application which I've named SinMagick. I've even uploaded the initial source checkout to github. I'll be following it up with a few bug fixes, documentation and tests. More details to follow on Saturday and Sunday when I wrap up the last few items on the ToDo list below.

Check out the final feature list below, as well as some screenshots of the app in action.

Features
  • File and URL Uploading via standard POST
  • Storage to multiple locations (S3 and local disk)
  • Support for Redundant Storage with configurable priorities
  • Scaling, cropping, rotation, grayscale, format changes
  • Caching of transformed images
  • Image efficiency calculator, which determines whether PNGs should be JPGs instead (see the sketch below)
  • Easily add other transforms
  • Open Source
  • Scalable
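
The efficiency calculator deserves a quick illustration. Again, SinMagick is Ruby, but the idea translates to a few lines of Python with PIL: re-encode the image as a JPEG in memory and see how much smaller it gets. The function name and the 30% threshold below are illustrative, not the values the service actually uses.

import os
from io import BytesIO   # StringIO.StringIO on older Pythons
from PIL import Image

def jpeg_would_be_smaller(path, quality=85, savings_threshold=0.30):
    """Return True if re-encoding the image as a JPEG saves a meaningful fraction of bytes."""
    img = Image.open(path)
    if img.mode in ('P', 'RGBA'):
        img = img.convert('RGB')          # JPEG has no palette or alpha channel
    buf = BytesIO()
    img.save(buf, format='JPEG', quality=quality)
    savings = 1.0 - float(len(buf.getvalue())) / os.path.getsize(path)
    return savings >= savings_threshold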

ToDo
  • Documentation
  • Tests
  • EC2 AMI Setup
  • Load Testing

Screenshots: the file uploader used for diagnostics, the uploaded file, the raw image output, and a transformed image. (Photo obtained from Wikipedia, Creative Commons Attribution-ShareAlike 3.0.)

August 16, 2010

Image Processing - Monday

It's time for a meat and potatoes, backend project. The first project was designed to be small and contained but it still took me until Saturday to finish it. This week I'll be taking on something much more ambitious. Prior to 50projects I worked at a number of companies that handled images in various capacities so the concept isn't entirely new. This service will be similar in a few respects, notably that it will transform images, it will use ImageMagick libraries to do so and it will optionally use Amazon S3 to store images. However this will be an entirely new product using unique ideas conceived of separately from previous work environments. In other words please don't sue me.

Project Summary: A web service to handle image transforms, storage and caching of images for use by web sites.

Features
  • jMagick versus rMagick comparison - which is faster
  • Upload original Images
  • Download transformed Images
  • Admin page to configure parameters
  • Varnish caching for faster serving
  • Source code on GitHub
  • Sample code for integration
Monetization Features
  • 32-bit and 64-bit versions on Amazon's AMI marketplace
If There's Time
  • SQS for large scale distributed processing

August 14, 2010

Intelligent Address Entry - Released - Saturday

Just finished up the last revision to the demo scripts and attached an Amazon Payments link for commercial licenses. Since Thursday I've cleaned up the error handling (thanks JB!), tested it for compatibility with all major browsers past and present, and connected the data backend to Amazon's CloudFront CDN for faster loading. The CDN setup was more experimental than necessary, but the incremental cost should be trivial for now.

Items that I'm glad I touched on, and that will definitely come in handy down the road, included jQuery, Amazon CloudFront and Amazon Simple Pay. I'm certain these will be used multiple times in future projects.

One item turned out to be a challenge I didn't expect: setting Amazon CloudFront's default root object is a pain with the current tools available. I ended up using a development build of a third-party Python library to set the parameter, as Amazon's Management Console doesn't expose that setting.

A couple of features didn't end up getting built. For starters this didn't turn into a JavaScript library like I had intended on Monday. I provided a demo for folks to get started on integrating this into their own products but the demo didn't really include any original code.

Another item that didn't go as planned was global address storage via cookie for users. The more I thought about it the more I realized what a giant privacy mess it could turn into. The work put into it was abandoned shortly thereafter.

The final scripts, data and implementation details are below, as well as a payment link for commercial use. If you have any questions, please direct them to me at pcorliss@50projects.com



http://zip.50projects.com

August 12, 2010

Intelligent Address Entry - Thursday

This portion took a little longer than expected due to some issues with getting the last pieces of furniture for the new apartment delivered and installed. But I'm pretty happy with how it turned out. I used jQuery for the JSONP requests and jQuery UI for the autocomplete feature.

I'll be refining it today and tomorrow with a release target of Friday night.

August 10, 2010

Intelligent Address Entry - Tuesday

Yesterday I found that the zip code database isn't very accessible, and the first hurdle was going to be just obtaining it. After some digging around I found that this data could be extracted from a 10-year-old Census database, from various private companies, from the USPS on a CD, and via a one-shot lookup tool on the USPS website. Not quite sure why they don't just make the entire database available in its raw form. Regardless, I wrote a small script to automate the data collection via the web form and parse the output. It took about 11 hours to extract all zip codes (55,457 zip code and city pairs). I've provided a link to the list below, as well as the quick scripts to extract and parse the data from the USPS website. The extraction script could run quite a bit faster by making the requests in parallel; I just didn't see a reason to hammer the USPS website, and I wasn't in a rush.

Zip Code City pairs as of (2010-08-09)
Simple Shell Script using curl to connect to USPS site
Simple Perl script to do quick regexes on the input

While that job was running I wrote another script to convert the output to JSON files, then followed that by using jQuery to query the JSON files and feed the data into an autocomplete mechanism (there's a rough sketch of the conversion step below). I'll hopefully be finishing that up today, but I haven't touched JavaScript since early 2000 and jQuery is completely foreign to me. It's nice to see that these new APIs are available though; it's going to make the implementation phase smoother.
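
Roughly, the conversion step buckets the zip/city pairs into small JSON files keyed by the first few digits of the zip code, so the browser only ever has to fetch a tiny file per lookup. Here's a sketch of that step in Python; the tab-separated input format and three-digit prefix are assumptions for illustration, not necessarily what the final data uses.

import json
import os
from collections import defaultdict

def build_json_files(pairs_file, out_dir, prefix_len=3):
    """Split zip -> city pairs into small JSON files keyed by zip prefix."""
    buckets = defaultdict(dict)
    with open(pairs_file) as f:
        for line in f:
            zip_code, city = line.rstrip('\n').split('\t', 1)
            buckets[zip_code[:prefix_len]][zip_code] = city
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    for prefix, entries in buckets.items():
        with open(os.path.join(out_dir, prefix + '.json'), 'w') as out:
            json.dump(entries, out)

The front end then only asks for something like 606.json as the user types and filters the handful of matching entries client-side.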

August 09, 2010

Intelligent Address Entry - Monday

Project Summary: This week I'll be creating a JavaScript library to assist web clients with filling in city and state data during address entry via their zip code. Additional features will include global address storage and a few other features. This project should be small and allow me to get up to speed with the short 1-week development and release cycles.

Features
  • Obtain Zip-Code City Data
  • Zip-Code -> City call
  • Address Recall from Cookie
  • Minification
  • Hosting via Amazon CDN

Monetization Features
  • Restrict running host to hostname specified in JS
  • Pay wall and code generator for custom JS
-or-
  • Licensing page for simple $50 commercial license
  • Google Alerts to find the JavaScript in use commercially

Error Handling and Edge Cases
  • Invalid Zip Codes and addresses
  • International
  • APO Boxes and special addresses


August 06, 2010

Project Zero - Friday

Yesterday was spent hefting boxes and constructing bookcases. I'll be glad when we wrap up the unpacking and furniture purchases sometime next week.

Today I've been putting the finishing touches on the blog and associated pages. I've decided to go with something simple for now, mostly because I've realized that I lack an eye for design and layout. Among other things, I'm hopeful 50 projects will force me to get better at skills I've never taken the time to develop. Design, layout and art of any kind have been things I've shied away from for a long time. Hopefully I'll get a chance to exercise some of those long forgotten muscles.

That being said, Times New Roman and Courier are perfectly readable fonts and should be used more often. Then again, I'm a sucker for serifs.

August 04, 2010

Project Zero - Wednesday

Unwilling to leave well enough alone, I ditched the single-disk Ubuntu install and decided to pull out the big guns: a six-disk RAID 10 install. See the benchmarks below. Not bad for six-year-old SATA drives! Unfortunately, the temperatures they're running at (53°C) have me a little worried. I'll likely migrate them to some HDD enclosures I have lying around.

Benchmark screenshots: 6-drive RAID 10 versus a single drive.

August 03, 2010

Project Zero - Tuesday

Looks like the PSU Newegg.com shipped out to me was faulty. It would partially power the machine, but the EATX12V 4-pin port wasn't receiving any power. I plugged in the 240W PSU from my media center machine and managed to get the system to POST late Monday night. I followed that up by dancing a short victory jig, then promptly fell asleep.

This morning I found a hole-in-the-wall PC repair place. Broken computers filled the shelves behind the counter. The owner sold me a dubious looking 400W PSU (likely used) for $35. Thankfully it was sufficient and the machine posts with its own PSU now. I'll need to consider purchasing a higher end PSU.

Ubuntu was a breeze to install to a single disk, and now I'm up and running with dual monitors. I'll be spending most of the afternoon writing copy and getting some blog posts prepped.

August 02, 2010

Project Zero - Monday

Project Summary: This is an initial project to set up the infrastructure necessary for me to be successful and stay focused over the coming year. Not a traditional project so much as a pre-project: hence the name Project Zero.
  • Purchase and construct a desk and chair
  • Purchase and construct a new workstation
  • Setup 50projects.com domain, sub domains, blog and site
  • Design a logo
  • Research business cards and order if prudent
  • Write base copy for 50 projects
  • Spam network with links to the blog to get the word out
Desk and chair constructed yesterday. I managed to find a desk at IKEA that was sturdy and suitably 'nice' and didn't break the bank. The chair is mighty comfy as well. The workstation components came in this afternoon, so I'll be spending this evening working on that.

August 01, 2010

Introduction

Welcome to the 50 projects Blog. For those curious, check out the about page for the full details on what's going on. In short, every week I'll be starting a new project that will focus on something new and will use a wide array of different languages and technologies. I'll be working on an idea from initial research and development all the way to release and marketing. Every Monday morning I'll move on to a new idea.

It's going to be an exciting year. I hope you folks enjoy following my progress, and hopefully you'll get some use out of these projects.