Ruby String XOR Optimizations

As part of the LazyRaid project I spent a non trivial amount of time trying to develop an algorithm to perform an XOR operation in ruby against two blocks of data read from files. My initial try was a pure ruby implementation and was based off of some forum posts. These worked fine for small strings but when used on blocks of data larger than a few kilobytes it took an unacceptably long time.
#Initial XOR method from forums - http://www.ruby-forum.com/topic/175539
#0.627 Mb/s (All times are based on a single core of my development machine - i5 750 @ 2.67GHz)
def xor1(second)
self.bytes.zip(second.bytes).map { |a,b| (a^b).chr}.join
end
At this point I started investigating writing a C extension to give the process a boost and eventually succeeded after a night of furious hacking. As you can see in the benchmarks below this was about a 2000x speedup and I decided to move on to other features. It wasn't until after the project was finished that I started wondering what sort of speedup I could get using a pure ruby implementation and if it would even be comparable to my naive first stab at a ruby C extension. So I started trying different methods out.

This next method avoids the bytes method and accesses each string char directly then converts them to their ordinal values performs the XOR and pushes it on to the string. More than a 2x speedup from the first implementation.
#1.589 Mb/s
def xor2(second)
s = ""
[self.size,second.size].max.times do |i|
s << ((self[i] || 0).ord ^ (second[i] || 0).ord)
end
return s
end
Further refinement involves using the zip method to combine the strings as byte arrays resulting in some marginal speedups.
#1.881 Mb/s
def xor3(second)
s = ""
#s.force_encoding("ASCII-8BIT")
[self.size,second.size].max.times.zip(self.each_byte.to_a,second.each_byte.to_a) do |i,a,b|
s << (a ^ b).chr
end
return s
end
Enter the pack and unpack methods. Seemingly the quickest way to convert the string into byte arrays.
#2.613 Mb/s
def xor4(second)
s = []
self.unpack('C*').zip(second.unpack('C*')) {|a,b| s.push( (a||0) ^ (b||0) ) }
return s.pack('C*')
end
I attempted the operation with unpack options larger than 1 byte. This works great for large data sets that have sizes evenly divisible by 8 bytes. The 0-7 bytes at the end will be ignored. Some extra code could probably be written to account for this. As the speedup is substantial and this represents the fastest ruby implementation I was able to produce.
#13.369 Mb/s
def xor5(second)
s = []
self.unpack('Q*').zip(second.unpack('Q*')) {|a,b| s.push( (a||0) ^ (b||0) ) }
return s.pack('Q*')
end
And finally a more robust method that properly takes into account all sizes and differing lengths of strings. This implementation is essentially the same as xor4.
#2.605 Mb/s
def xor6(second)
self.length > second.length ? (string1=self;string2=second) : (string2=self;string1=second)
s = []
string1.unpack('C*').zip(string2.unpack('C*')) {|a,b| s.push( (a||0) ^ (b||0) ) }
return s.pack('C*')
end
In the end my fastest ruby implementation was still about 2 orders of magnitude slower than the C solution. I'm sure there's more room for performance improvements and I hope someone out there takes a look at this and finds a more optimal ruby only solution. Until then I'll likely start looking to C more often when I need to do some performance optimization. You can see the code used in this post as well as the C libraries used on github at https://github.com/pcorliss/Ruby-String-XOR-Optimizations

Note: Towards the end of preparing this post I came across the ruby gem fast_xor  which is based on C like my solution. However fast_xor stands out since it runs at 5x the speed my implementation does. It's included in the benchmarks below and you should probably implement it if you're considering doing any non-trivial XOR work in your ruby application. Otherwise xor6 should be suitable for most use cases.
pcorliss@hawaii:~/projects/ruby_benchmarks$ ruby benchmark.rb 
Ensuring parity calculation is correct.
xor1:1d9d0030fe31415e1fa9dd8b7b782315
xor2:1d9d0030fe31415e1fa9dd8b7b782315
xor3:1d9d0030fe31415e1fa9dd8b7b782315
xor4:1d9d0030fe31415e1fa9dd8b7b782315
xor5:1d9d0030fe31415e1fa9dd8b7b782315
xor6:1d9d0030fe31415e1fa9dd8b7b782315
xorc:1d9d0030fe31415e1fa9dd8b7b782315
xor!:1d9d0030fe31415e1fa9dd8b7b782315
Performing Benchmarks
user system total real
xor1 10 times: 79.040000 0.440000 79.480000 ( 79.689313)
xor2 10 times: 31.410000 0.000000 31.410000 ( 31.467753)
xor3 10 times: 26.280000 0.240000 26.520000 ( 26.583489)
xor4 10 times: 18.690000 0.400000 19.090000 ( 19.136069)
xor5 10 times: 3.720000 0.020000 3.740000 ( 3.740085)
xor6 10 times: 18.650000 0.500000 19.150000 ( 19.191539)
xorc 10 times: 0.040000 0.000000 0.040000 ( 0.041738) #My C implementation
xor! 10 times: 0.000000 0.000000 0.000000 ( 0.011803) #fast_xor gem
xorc 1000 times: 4.370000 1.390000 5.760000 ( 5.785931) #My C implementation
xor! 1000 times: 1.160000 0.000000 1.160000 ( 1.170025) #fast_xor gem