November 29, 2010

Those Pesky Body Scanners - Monday

Back from vacation, refreshed and ready to start on another project. This week I'll be working on a site to help travelers understand the security measures in place at airports and what their options are. For example, I flew through Chicago's Midway airport and the Ft. Lauderdale airport on my trip and didn't encounter one of the now infamous backscatter x-ray machines. I did, however, get my hands swabbed, with the swab run through what I think was a gas spectrometer looking for chemical traces of explosives, and I saw a senior citizen getting patted down thoroughly by a TSA agent. This seems like information an informed flier may want access to prior to flying.

Project Summary: Develop a web app to provide information on specific airport security and TSA requirements.

Features:

  • Import a list of all airports worldwide.
  • Track down passenger volume data for sorting purposes.
  • Allow users to provide updates.

November 22, 2010

Vacation Edition - Monday

Part of the reason it's 50 projects instead of 52 projects is that I wanted to set aside 2 weeks for vacation. I'll see if I can post some pictures of the sights down in the Florida Keys.

Features:
  • Suntan
  • Snorkeling
  • Seafood
  • Alliterations

November 20, 2010

Lazy Raid - Week 2 - Saturday

Not the best week for me. I was able to integrate the Jerasure library into the existing LazyRaid code. However, about 1 in 20 recovery attempts consistently fail. I spent Tuesday through Thursday troubleshooting the issue and wasn't able to track down the problem. I finished integrating it, but a redundancy solution that only successfully recovers something 19 times out of 20 isn't much of a redundancy solution. For now, single disk redundancy will have to be okay.

Additionally, packaging the application for Windows has proven to be an issue as well. Tools like ocra and RubyScript2Exe don't want to work without major tweaking, and I just don't have the patience. For now a separate branch will be available with the compiled Windows .so library for the XOR calculation. Compiling it was another amazing hassle. Ruby and Windows just don't seem to mix.

So, in summary, no significant working code was released this week. Plenty of non-working code is available on the jerasure branch on GitHub, though. https://github.com/pcorliss/LazyRaid/tree/jerasure (NOT FUNCTIONAL! Do not use for production data.)

Check the windows branch for the compiled DLL and modified disk handling code. https://github.com/pcorliss/LazyRaid/tree/windows


Tomorrow I'll post the XOR speed comparison between Ruby and the native C library I wrote to handle the XOR calculation.

November 15, 2010

Lazy Raid - Week 2 - Monday

My partner is on vacation this week, so I figured I'd take a little breather as well. Instead of working on a fresh project, I'll spend the week refining this one some more.

I'll be targeting some big-ticket features such as a GUI and double disk failure redundancy, as well as some simpler stuff like unit tests and better error handling.

Features:
  • Double Disk Failure
  • Desktop GUI
  • Cross-Platform Distribution for Mac, Windows and Linux.
  • Unit Tests
  • Robust Error Handling

November 14, 2010

Lazy Raid - Sunday

Sometimes technology is like magic. I wrote the underlying code for this program. I tested it, iterated over it, and added new features as I went. But still, at the end of the day, when I purposefully delete a file and then run the command to recover that file from the parity blocks and other files on different drives, I get positively giddy when the file reappears. It's like magic, except it's a well understood process that's merely being obscured behind a command line interface.

For testing purposes I created 3 drives of 5 GB each, then placed on them some large files totalling about 5.3 GB in size. Then I added them to the LazyRaid configuration and told it to generate parity bits for the drives, and it spit out about 2.7 GB of parity bits spread across all three drives. That's single drive redundancy using roughly half as much space as the data being protected. You can get even bigger space savings when using more drives (ParitySize = FileSize/(NumDisks-1)).
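
As a rough illustration of the arithmetic and the recovery trick described above, here is a minimal Ruby sketch. It is not the LazyRaid code: the helper names `parity_size` and `xor_blocks` are made up for this example, the formula assumes data is spread evenly across the drives, and the blocks are tiny in-memory strings rather than files on separate disks.

```ruby
# Hypothetical helpers for illustration only; not taken from the LazyRaid source.

# Estimated parity size for single drive redundancy:
# ParitySize = FileSize / (NumDisks - 1)
def parity_size(file_size, num_disks)
  raise ArgumentError, "need at least 2 disks" if num_disks < 2
  file_size.to_f / (num_disks - 1)
end

# Byte-wise XOR of any number of equal-length binary strings.
def xor_blocks(*blocks)
  length = blocks.first.bytesize
  unless blocks.all? { |block| block.bytesize == length }
    raise ArgumentError, "blocks must be the same size"
  end
  result = Array.new(length, 0)
  blocks.each do |block|
    block.each_byte.with_index { |byte, i| result[i] ^= byte }
  end
  result.pack("C*") # pack the bytes back into a binary string
end

# 5.3 GB of files over 3 drives needs roughly 2.65 GB of parity.
estimate = parity_size(5.3, 3)

# Two same-sized blocks that live on different drives.
block_a = "hello world!".b
block_b = "LazyRaid 123".b
parity  = xor_blocks(block_a, block_b)

# Pretend the drive holding block_a died: XOR the survivors to rebuild it.
recovered = xor_blocks(parity, block_b)
puts recovered == block_a
```

The recovery works because XOR is its own inverse: XORing the parity with every surviving block cancels them out and leaves only the missing one.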

The code is available on GitHub and will require you to compile a Ruby C extension for your machine.
https://github.com/pcorliss/LazyRaid

Challenges:
  • Ruby is slow - I posted about some challenges mid-week with Ruby's lack of an XOR function for Strings. The code I posted was slow but workable for small datasets. However, when working with files that can be up to 2 GB in size, a slow XOR function just isn't going to cut it. I ended up writing a Ruby C extension to take care of the heavy lifting, since it seems Ruby just wasn't up to the task. I'll post a little more with some speed comparisons next week. Perhaps I can stir up the hornet's nest in the Ruby community to generate some traffic and perhaps a more elegant solution.

Features Missed:
  • Double Disk Failure Redundancy - RAID 6 uses Galois field calculations to do parity calculations in addition to XOR calculations. Unfortunately, I wasn't really up to the task of implementing that this week, considering I was working only on basic functionality and struggling with Ruby speed limitations.
  • FUSE Integration - As the project moved forward, integrating it into the OS seemed less important, and I headed instead towards running it as a command line app.
  • Background Parity Calculations - I just ran out of time on this one. Although adding in some sort of IO monitoring and throttling wouldn't be too difficult, it wasn't as high on the priority list as some of the other items.

November 10, 2010

Lazy Raid - Wednesday

Things are coming along a little slowly but I felt I'd share the following code tidbit so others don't run into the same trouble I did.
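class String
  # Byte-wise XOR of two strings. The shorter string is treated as if it
  # were padded with zero bytes.
  def ^(second)
    s = ""
    s.force_encoding("ASCII-8BIT") # build the result as raw binary data
    # Work on bytes rather than characters so multi-byte strings are safe.
    [self.bytesize, second.bytesize].max.times do |i|
      s << ((self.getbyte(i) || 0) ^ (second.getbyte(i) || 0))
    end
    return s
  end
end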
via http://www.ruby-forum.com/topic/95760

It looks like Ruby doesn't have an XOR function available for Strings. This causes some issues when you're working on computing parity blocks for binary data like I am. Further complicating matters, there are several examples for computing XOR for strings, but most of the posts on the subject are from prior to Ruby 1.9, which introduced new defaults for accessing Strings like they're arrays. Prior to Ruby 1.9, "foo"[0] would return 102, the ASCII value of "f". Now it returns "f", which is great because that's what most people probably expect. But all previous examples that relied on the old behavior don't work.
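
To make the indexing change concrete, here is a small sketch for Ruby 1.9 or later. The variable names are just for illustration; `ord`, `getbyte`, `unpack` and `pack` are the standard String and Array methods.

```ruby
word = "foo"

first_char = word[0]            # a one-character String on 1.9+ (1.8 gave an Integer)
first_code = word[0].ord        # character code of that String
first_byte = word.getbyte(0)    # integer value of the first byte
all_bytes  = word.unpack("C*")  # every byte as an array of integers

# XOR works on the integers, so convert to bytes first and pack the
# result back into a binary string afterwards.
xored = [word.getbyte(0) ^ "bar".getbyte(0)].pack("C")

puts first_char.class
puts first_byte
puts xored.bytes.inspect
```

For binary data, `getbyte` or `unpack("C*")` is probably the safer route, since `ord` works on characters and can return values above 255 for multi-byte text.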

November 08, 2010

Lazy Raid - Monday

About 4 years ago I started collecting a lot of digital media. I'm going to skip over how this content was procured. At the time I was storing it on my desktop computer's hard drive and using XBMC to stream it to a hacked original Xbox. It worked great until I started running out of disk space. So I bought a few more hard drives and everything was fine until those filled up. So I bought some external enclosures and more disks. Eventually those filled up too.

At this point I had an idea to build a client side program that would lazily create parity blocks across a disparate set of drives, to maintain the drives' independence from one another and also create a small safety net in case one of the drives should fail. Very similar systems already exist in RAID 5 sets, UnRAID, and Drobo devices, but all of those rely on either a dedicated device or a dedicated system. This would be an independent client that would run as a daemon on a host system. I spent a couple of weekends working on the project but never finished it. The software was written in Perl and, I'm guessing, was probably quite a mess given my standards today. This week I'll be starting fresh and hopefully finishing it off.

Project Summary: Build a RAID-like system for storing data independent from the host system.

Features:
  • Single disk failure redundancy
  • Double disk failure redundancy
  • Background parity calculations
  • Consistency checks
  • FUSE integration

November 07, 2010

Abuse Server - Sunday

Yesterday I put the finishing touches on AbuseJet and committed them to GitHub. This project went exceptionally well, with only minor bumps and bruises along the way. Features just seemed to fall into place, where previously I was worried this particular project was going to require burning the midnight oil. Instead it was all tied up by yesterday afternoon. I left myself a day to write this blog post and upload a working build.xml.

I'm particularly excited about a clever implementation of AbuseJet that would eliminate captchas except for bots and malicious users. For example, a normal user would never see a captcha on your site while you use AbuseJet. But a malicious user or bot would create a few accounts using the same IP address, and that would automatically trigger conditional captchas for all future data entry or user creations. Further abuse would trigger tarpitting and eventually blocking, quickly rendering that IP useless. Meanwhile, the built-in reporting could be used to run a script to delete the content or optionally alert a human that spam is being created.
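
The escalation idea can be sketched in a few lines of Ruby. This is not AbuseJet's code or configuration: the `AbuseTracker` class, the threshold numbers, and the action names are all assumptions made up for illustration, and a real service would also need to expire counts over a time window.

```ruby
# Hypothetical thresholds: events from one IP before each action kicks in.
THRESHOLDS = [
  [50, :block],
  [20, :tarpit],
  [3,  :captcha]
].freeze

# Illustrative in-memory counter; not part of AbuseJet.
class AbuseTracker
  def initialize(thresholds = THRESHOLDS)
    # Check the highest limit first so the strictest action wins.
    @thresholds = thresholds.sort_by { |limit, _action| -limit }
    @counts = Hash.new(0)
  end

  # Record one event (e.g. an account creation) and return the action to take.
  def record(ip)
    @counts[ip] += 1
    action_for(ip)
  end

  # Look up the current action for an IP without recording anything.
  def action_for(ip)
    count = @counts[ip]
    match = @thresholds.find { |limit, _action| count >= limit }
    match ? match[1] : :allow
  end
end

tracker = AbuseTracker.new
actions = 3.times.map { tracker.record("203.0.113.7") }
puts actions.inspect
puts tracker.action_for("198.51.100.1")
```

With these made-up numbers, an IP sees no captcha for its first two account creations, gets a captcha from the third onward, and is tarpitted and then blocked only if it keeps going.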

AbuseJet
  • Prevent DDoS attacks
  • Tarpit spam bots
  • Conditional captchas
https://github.com/pcorliss/AbuseJet


I also fixed an issue with EchoServer. Seems I had uploaded the .class files instead of the .java files. You folks need to keep me more honest :-)

https://github.com/pcorliss/Echo-Server

November 04, 2010

Abuse Server - Thursday

I'm in the enviable position of having completed the bulk of the work for this project ahead of schedule. The big features listed are running and working correctly. I've even gotten the initial prototype up on GitHub.

https://github.com/pcorliss/AbuseJet

More work still needs to be done: documentation, alerts, reporting, etc.

In the meantime enjoy.

November 01, 2010

Abuse Server - Monday

At a previous place of employment we had massive problems with abuse. Early on it was fairly rare, since our platform wasn't as well known as something like Drupal or WordPress. We started with inconsistent flare-ups of abusive activity. Most of it wasn't even automated, just one or two users solving captchas and creating accounts, then posting links to pharmaceuticals. Often the abusive activity was ignored, since it wasn't visible and we didn't have any automated tools to handle bulk deletion of accounts or content. But after a while it became very visible. We started getting hit by a single IP creating thousands of pieces of content an hour. Cleanup took almost as long as it took to create the content, and in some cases longer. After blocking the abusive IP we started getting hit from multiple IPs, and soon we were getting hit by a botnet. The response was to put captchas on almost all of the content creation tools and put permanent blocks on the abusive IP addresses.

This week's project will center around countering abuse like this. I'll be developing an open source, standalone service that a company can plug into their architecture. I'll post the GitHub link as soon as I've finished the first prototype.

Project Summary: Service which interacts with web services to help protect against abusive traffic.

Features:
  • Simple RESTful Interface
  • Standalone Service
  • User Configurable Thresholds and Throttling
  • Alerts