Intelligent Address Entry - Tuesday
Yesterday I found that the zip code database isn't very accessible, and the first hurdle was going to be just obtaining it. After some digging around I found that this data could be extracted from a 10-year-old Census database, from various private companies, from the USPS on a CD, and also via a one-shot lookup tool on the USPS website. I'm not quite sure why they don't just make the entire database available in its raw form. Regardless, I wrote a small script to automate the data collection via the web form and parse the output. It took about 11 hours to extract all zip codes (55457 zip code and city pairs). I've provided a link to the list below, as well as the quick scripts to extract and parse the data from the USPS website. The extraction script could run quite a bit faster by making the requests in parallel. I just didn't see a reason to hammer the USPS website, and I wasn't in a rush.
Zip Code City pairs as of (2010-08-09)
Simple Shell Script using curl to connect to USPS site
Simple Perl script to do quick regexes on the input
While that job was running I wrote another script to convert the output to JSON files. I followed that by using jQuery to query the JSON files and treat the data as the source for an autocomplete mechanism. Hopefully I'll be finishing that up today, but I haven't touched JavaScript since early 2000 and jQuery is completely foreign to me. It's nice to see that these new APIs are available, though. They are going to make the implementation phase smoother.
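For anyone following along, here is a rough sketch of the JSON conversion and the prefix lookup an autocomplete would need. It is not the script I actually ran. It assumes the parsed output is one "ZIP,CITY" pair per line, and the function names (parsePairs, buildIndex, suggest) are made up for illustration. The sample rows at the bottom are only there to show the shape of the data.

```javascript
// Assumption: the parsed USPS output is one "ZIP,CITY" pair per line.
// The real file layout may differ; adjust the split accordingly.
function parsePairs(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.includes(","))
    .map((line) => {
      const comma = line.indexOf(",");
      return { zip: line.slice(0, comma).trim(), city: line.slice(comma + 1).trim() };
    });
}

// Group pairs by zip code so a zip with several city names keeps all of them.
// The result is a plain object, so JSON.stringify(index) gives the JSON file body.
function buildIndex(pairs) {
  const index = {};
  for (const { zip, city } of pairs) {
    if (!index[zip]) index[zip] = [];
    if (!index[zip].includes(city)) index[zip].push(city);
  }
  return index;
}

// Return up to `limit` suggestions whose zip code starts with the typed prefix.
// Keys are sorted as strings on every call, which is simple but not fast;
// a real page would sort once and reuse the result.
function suggest(index, prefix, limit = 10) {
  const out = [];
  for (const zip of Object.keys(index).sort()) {
    if (!zip.startsWith(prefix)) continue;
    for (const city of index[zip]) {
      out.push({ label: `${zip} ${city}`, zip, city });
      if (out.length >= limit) return out;
    }
  }
  return out;
}

// Illustrative sample rows only, not taken from the extracted list.
const sampleIndex = buildIndex(parsePairs("02134,BOSTON\n02134,ALLSTON\n10001,NEW YORK\n"));
console.log(JSON.stringify(sampleIndex));
console.log(suggest(sampleIndex, "100"));
```

The objects returned by suggest carry a label field so they could be handed to an autocomplete widget that expects label/value style items, but how that gets wired into jQuery is left open here.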