Recently about Ruby
Near Infinity recently announced the release of Grant, a Ruby on Rails plugin for securing and auditing access to your Rails model objects, and I'm here to tell you a little bit about it. There are two primary pieces of Grant, model security and model audit. I'll be focusing on model security for this post and will address model audit in a later entry.
Grant's model security is deliberately designed to force the developer to make conscious security decisions about what CRUD operations a user should be allowed to perform on your model objects. It doesn't care how you choose to authenticate and authorize your users to perform a CRUD operation; it only cares that you actually do it.
Rather than specify which operations are restricted, Grant restricts all CRUD operations unless they're explicitly granted to the user. It also restricts adding or removing items from has_many and has_and_belongs_to_many associations. Only allowing operations explicitly granted forces you to make conscious security decisions. While it obviously can't ensure you make the correct decisions, it should help ease the latent fear that you've inadvertently forgotten to secure something.
Enough talk, let me show you an example of how you might use it. To enable model security you simply include the Grant::ModelSecurity module in your model class. In this example you see three grant statements. The first grants find (aka read) permission to everyone. The second grants create, update, and destroy permission when the passed block evaluates to true, which in this case happens when the model is editable by the current user. You can put any code you want in that block as long as it returns a boolean value. Similarly, the third grant statement permits additions and removals from the tags association when its block evaluates to true. A Grant::ModelSecurityError is raised if any grant block evaluates to false or nil.
class EditablePage < ActiveRecord::Base
  include Grant::ModelSecurity

  has_many :tags

  grant(:find) { true }

  grant(:create, :update, :destroy) do |user, model|
    model.editable_by_user? user
  end

  grant(:add => :tags, :remove => :tags) do |user, model, associated_model|
    model.editable_by_user? user
  end

  def editable_by_user? user
    user.administrator?
  end
end
There's a lot more to the grant statement than shown in the above example. For instance, you can have multiple grant statements for the same action. Ultimate permission to perform the action will not be granted unless all grant blocks evaluate to true.
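To make the deny-by-default rule and the "all grant blocks must pass" rule concrete, here is a toy model in plain Ruby. It is not Grant's implementation and does not use the plugin at all; TinyPolicy, permit, and allowed? are hypothetical names invented for this sketch.

```ruby
# Toy illustration only: NOT Grant's code. TinyPolicy, permit, and allowed?
# are hypothetical names made up for this sketch.
class TinyPolicy
  def initialize
    # action => list of condition blocks; an action with no blocks is denied
    @rules = Hash.new { |hash, action| hash[action] = [] }
  end

  # Register one more condition for each listed action.
  def permit(*actions, &block)
    actions.each { |action| @rules[action] << block }
    self
  end

  # Allowed only if at least one rule exists and every rule returns truthy.
  def allowed?(action, user, model)
    rules = @rules.fetch(action, [])
    !rules.empty? && rules.all? { |rule| rule.call(user, model) }
  end
end

User = Struct.new(:name, :admin)
Page = Struct.new(:title, :locked)

policy = TinyPolicy.new
policy.permit(:find) { true }
policy.permit(:update) { |user, _page| user.admin }    # first condition
policy.permit(:update) { |_user, page| !page.locked }  # second condition

admin = User.new('ann', true)
guest = User.new('bob', false)

puts policy.allowed?(:find, guest, Page.new('home', false))
puts policy.allowed?(:update, admin, Page.new('home', true))
puts policy.allowed?(:destroy, admin, Page.new('home', false))
```

In this sketch, :destroy is refused simply because nothing was ever registered for it, and :update on a locked page is refused even for an administrator because the second condition fails.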
As you can see, Grant is pretty simple to use, but it's not going to do the dirty work for you. It's up to you to make the proper security decisions. Grant's just there to make sure you don't forget.
"We're missing some coverage..."
While creating coverage reports for a fairly new JRuby on Rails project, we noticed that our coverage numbers weren't quite right: certain classes were missing from the coverage reports. Rcov doesn't know about classes unless they are required: not a problem for models, but we were missing tests for some controllers and libraries.
Oops.
To properly correct this problem, I wrote the coverage_helper (to live alongside the test_helper). Basically, this causes all of the classes to be required so that rcov knows about them.
test\coverage_helper.rb
require 'test/unit'
require 'test_helper'

class CoverageHelper < Test::Unit::TestCase
  def test_coverage
    ['app', 'lib'].each do |path|
      Dir.glob("#{path}/**/*.rb") do |file|
        require File.expand_path(file.chomp('.rb'))
      end
    end
    assert true
  end
end
Simply include this test in your rcov builds and the problem is solved.
Because of how rcov counts lines and the way Ruby class loading works, you'll never see files with 0% coverage. However, at least you will see all of your classes listed and those that aren't covered will have a low percentage.
Is this really necessary?
First, the File.expand_path makes sure that your files are only required once. I hate random warning messages because constants are initialized twice (among other issues).
Second, no, I didn't need to make this a test, but it just seemed nicer to. I added the assert true simply because I didn't feel right about not asserting something in the test.
Third, as long as you use the Rails generator scripts to create the skeletons for your code, this scenario should never happen (because Rails will create all of the appropriate tests). However, there is a tendency not to use the generators when they don't output what you want, which is what we have discovered (Rails and legacy database schemas aren't a perfect fit). Also, sometimes I just forget to use them.
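On the first point above, here is a small check of the require-once behaviour. It is only a sketch: the throwaway file sample_lib.rb and the method name are made up, and a temporary directory stands in for app and lib.

```ruby
require 'tmpdir'

# Requires the same throwaway file through two differently spelled paths and
# reports whether the expanded paths match and what each `require` returned.
def require_twice_via_expand_path
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, 'sample_lib.rb'), "# empty placeholder library\n")

    plain  = File.expand_path(File.join(dir, 'sample_lib'))
    dotted = File.expand_path(File.join(dir, 'lib', '..', 'sample_lib'))

    [plain == dotted, require(plain), require(dotted)]
  end
end

p require_twice_via_expand_path
```

Both spellings expand to the same absolute path, so the first require loads the file and returns true while the second finds it already loaded and returns false.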
What if you can't run rcov?
One minor glitch of running JRuby on Windows is that File::SEPARATOR is technically incorrect (it's '/' instead of '\'). This usually isn't an issue... except when using rcov. Since rcov executes from the shell, the arguments that take file names and/or directories won't work because the separator points the opposite way from what Windows expects.
The fix is to add a couple of methods to the File class to address the problem.
Windows Separator Fix
class File
  @@is_windows = ENV['OS'] &&
    (not ENV['OS'].downcase.match(/^windows/).nil?)

  def self.fix_name(name)
    @@is_windows ? name.tr(File::SEPARATOR, File::ALT_SEPARATOR) : name
  end

  def self.fixed_join(*files)
    self.fix_name(self.join(files))
  end
end
The reason to do the ENV['OS'] truth check first is that in JRuby on Solaris (where our CI is), that property doesn't exist. We couldn't use the RUBY_PLATFORM variable either, as in JRuby it's always assigned to 'java'.
I should note that I've only used these fixed separator methods when necessary (in my rcov rake task). The 'normal' separator has worked in every other situation I've run into.
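For trying the idea out on any operating system, a standalone variant that does not reopen File might look like the following. PathFix and its method signatures are hypothetical; the separators and the Windows flag are passed in explicitly because File::ALT_SEPARATOR is nil off Windows.

```ruby
# Hypothetical standalone variant of the File.fix_name idea above.
module PathFix
  # True when the OS environment variable starts with "windows"
  # (case-insensitive); false when the variable is missing.
  def self.windows?(env = ENV)
    os = env['OS']
    !os.nil? && !os.downcase.match(/^windows/).nil?
  end

  # Swap forward slashes for backslashes only when asked to.
  def self.fix_name(name, windows = windows?, sep = '/', alt = '\\')
    windows ? name.tr(sep, alt) : name
  end

  def self.fixed_join(*parts, windows: windows?)
    fix_name(File.join(*parts), windows)
  end
end

puts PathFix.fixed_join('coverage', 'unit', 'index.html', windows: true)
puts PathFix.fixed_join('coverage', 'unit', 'index.html', windows: false)
```

Passing the flag in makes it possible to exercise the Windows branch from a Solaris or Linux CI box, which the class-variable version cannot do.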
Renae Bair's post on The Ranting Rubyists hits a lot of nails on the head. I will freely admit to being a developer who is interested in continually learning new technologies - perhaps even at the expense of the ones I currently develop in - and I try to contribute a little back by blogging and speaking at conferences like No Fluff Just Stuff on a semi-frequent basis. But Renae's point is that many people in the development world seem to be all about the New, New Thing and ready to dismiss the old things without a second thought. My feeling is that the old things don't go away; often we just end up piling more things on top. (It's new technologies all the way down.) Sometimes there certainly is wholesale replacement, but from what I've experienced, usually you just mix in the new things and things become that much more heterogeneous.
I think it's fine to continually push "forward" to newer and better technologies that help you do the same thing in half the time, or in half the code, or allow things to execute on twice the processors, or scale twice as much. But at the same time it is simply not cool or very intelligent to dismiss the very tools that get you paid and perhaps got you where you are today. Sometimes the intent is just that: to dismiss the old in favor of the new for the purpose of making money. Sometimes the intent is merely the intellectual curiosity the best developers usually possess, and in fact the best people in any field possess. A few years ago I told a friend "Get Comfortable Being Uncomfortable." What I meant was to learn new things and push yourself to think about doing things better and more efficiently than you currently are doing them. Sometimes this means switching to or advocating a new tool; sometimes it means using your existing tools more effectively. And always it means you can't rest on your laurels and you are always challenging the status quo. Many people don't like this. Well, too bad, because the reality is that things change and Resistance is futile.
My day job is still mainly Java and web applications, though I also have managed to squeeze Ruby, Groovy, and Python in there (and of course realized the power of JavaScript) over time. I speak on mostly Java-related stuff like Hibernate and Spring and Groovy a bit. And currently I'm learning about new things (to me anyway) like functional languages such as Lisp and Clojure and Scala. Not because I think I'm going to rewrite the application I'm currently working on in a different language and/or framework, but because over time I feel learning new and different things makes me a better developer, architect, designer, etc. I know that the Java code I write today, while still crap, is way better than the crap I wrote several years ago, and has been influenced by learning Python and Ruby and Groovy and others. While it is still Java, I don't try to write overly generic, overly engineered things like I used to (well, perhaps not as much as I used to anyway). I just try to get the tasks I need to get done, done. If I need to make something more generic later, I can do it. But in addition to the power of just learning new things, I think the more well-rounded you are the better off you are and the better equipped you are to solve new problems. And maybe you'll find a much better way to solve them because you have a more diverse knowledge "portfolio" at your disposal.
So, getting back to Renae's post, I think it's a great idea to continue learning new things and pushing better ways of doing things, if for no other reason than to ensure your own relevance and marketability as a developer but hopefully because you enjoy it! But while it's OK to voice your opinion and seek new and better things, don't just rip to shreds the things that got you to where you are. In the past I've made comments to people like "Java sucks" and "I'd rather be doing Blub programming" and I've tried to curb that and realize that things change, we know more today than yesterday, and to just "Get Comfortable Being Uncomfortable." You might not always get to program in Blub but that shouldn't stop you from expanding what you know, and by the way the sphere of your knowledge should include more than just technical knowledge and probably should include things like economics, finance, culture, art, literature, sports, etc. Whatever. Just make yourself more well-rounded and you'll be better for it, in all aspects of life.
What's a Bloom filter?
A hash map works by applying a hash function to data and associating the output with the input data. Consider a simple hash map that stores only values, not buckets of values.
Traits of a hash map key set
- After an item has been stored, its existence will be known. It will never go missing. There is no chance of saying an item has not been stored after it has been stored.
- After an item has been stored, it may be identified as a different item. This happens when the hash function generates the same value for distinct data. So there is a risk of falsely confirming item B has been stored after only storing item A.
- The space required to hold these keys will be the product of the key size and key count. While key size is probably fixed, key count is variable and will lead to more space needed as more items are added.
Introducing Bloom filters
A Bloom filter shares traits 1 and 2 of a hash map key set, but not trait 3. The space required by a Bloom filter does not increase as more items are stored. Rather than processing data with one hash function and using the result as a lookup value, a Bloom filter uses multiple hash functions to identify booleans in a fixed size array.
Bloom filter traits
- Same as hash map key set trait 1
- Same as hash map key set trait 2
- Storage space is small, but depends on the desired false positive rate
- Lookup time depends on number of hash functions; can be run in parallel
- Items cannot be removed
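How the storage depends on the false positive rate can be estimated with the usual textbook approximations, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions for n expected items and target rate p. These formulas are not from this post; the sketch below (bloom_parameters is a made-up name) only shows the arithmetic.

```ruby
# Textbook sizing approximations for a Bloom filter (an assumption brought in
# for illustration, not something stated in the post):
#   m = -n * ln(p) / (ln 2)^2   bits in the filter
#   k = (m / n) * ln 2          hash functions
def bloom_parameters(expected_items, false_positive_rate)
  bits = (-expected_items * Math.log(false_positive_rate) / (Math.log(2)**2)).ceil
  hashes = [(bits.to_f / expected_items * Math.log(2)).round, 1].max
  { bits: bits, hashes: hashes }
end

p bloom_parameters(1_000, 0.01)
p bloom_parameters(1_000, 0.001)
```

For 1,000 items at a 1% false positive rate this works out to 9,586 bits (about 1.2 KB) and 7 hash functions; tightening the rate to 0.1% raises both numbers.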
Code
This tacky, easily improvable code was released to the public domain and promises nothing. It assumes require 'digest/md5' has already been run.
0: class MediocreBloomFilter
1:   M = 16 # number of boolean fields in filter
2:   K = 3 # number of hash functions to find boolean fields
3:
4:   attr_accessor :v # the filter, a vector of boolean fields
5:
6:   def initialize()
7:     @v = Array.new(M, 0)
8:   end
9:
10:   def track_item(item)
11:     indicies_for?(item).each do |i|
12:       @v[i] = 1
13:     end
14:   end
15:
16:   def item_tracked?(item)
17:     hits = 0
18:     indicies_for?(item).each do |i|
19:       hits = hits + @v[i]
20:     end
21:     hits == K
22:   end
23:
24:   def indicies_for?(item)
25:     indicies = []
26:     md5 = Digest::MD5.new
27:     hash = ""
28:
29:     md5.update(item.to_s)
30:     hash = md5.hexdigest
31:
32:     K.times do |k|
33:       indicies << HEX.index(hash[k].chr.to_s.upcase)
34:     end
35:
36:     indicies
37:   end
38:
39:   HEX = [ "0", "1", "2", "3", "4", "5", "6", "7",
40:           "8", "9", "A", "B", "C", "D", "E", "F" ]
41: end
- Line 7 sets the filter storage to 16 integers. It could have been two bytes if I wanted to do bitwise operations.
- Line 24 runs the hash functions against the target data. Since this is tacky sample code, only one hash function is executed (line 30), and multiple slices are taken from the value and converted into array indices (at line 33).
- Using integers rather than booleans has a nice side effect. It permits removal of items by using the integers as reference counts. I haven't tested it, but only minor changes should be needed around lines 12 and 19, plus a removal function to decrement the counts.
- item: "abc"
- md5: 900150983cd24fb0d6963f7d28e17f72
- indices: [9, 0, 0]
- filter: [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
- item: "abd"
- md5: 4911e516e5aa21d327512e0c8b197616
- indices: [4, 9, 1]
- filter: [1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
- item: "seth schroeder"
- md5: 2c38f5fd13873c139567551ca1b7496e
- indices: [2, 12, 3]
- filter: [1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
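The reference-counting idea from the notes on the listing could be sketched roughly as follows. This is one possible reading of it, not the author's tested code: CountingBloomFilter, untrack_item, and the renamed indices_for are hypothetical, and the slot selection simply reuses the first three hex digits of the MD5 digest as the listing does.

```ruby
require 'digest/md5'

# Sketch of the reference-counting variant; the class and method names are
# hypothetical and not part of the original listing.
class CountingBloomFilter
  SLOTS  = 16 # counters in the filter
  HASHES = 3  # leading hex digits of the MD5 digest used as slot indices

  attr_reader :counts

  def initialize
    @counts = Array.new(SLOTS, 0)
  end

  def track_item(item)
    indices_for(item).each { |i| @counts[i] += 1 }
  end

  # Decrement the item's slots. Only safe for items that really were tracked:
  # removing a false positive would corrupt the counts of other items.
  def untrack_item(item)
    return false unless item_tracked?(item)
    indices_for(item).each { |i| @counts[i] -= 1 }
    true
  end

  # With counters, "tracked" means every slot is non-zero rather than
  # the slot values summing to the number of hash functions.
  def item_tracked?(item)
    indices_for(item).all? { |i| @counts[i] > 0 }
  end

  def indices_for(item)
    Digest::MD5.hexdigest(item.to_s)[0, HASHES].chars.map { |c| c.to_i(16) }
  end
end

filter = CountingBloomFilter.new
filter.track_item('abc')
filter.track_item('abd')
filter.untrack_item('abc')
puts filter.item_tracked?('abd')
puts filter.item_tracked?('abc')
```

Because each slot counts how many tracked items point at it, removing "abc" leaves the slots that "abd" also uses above zero, so "abd" is still reported as tracked.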
References
- Using Bloom Filters
- Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol
- Burton Bloom, Space/time trade-offs in hash coding with allowable errors, CACM, 13(7):422-426, July 1970.
- Paul E. Black, "Bloom filter", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed., U.S. National Institute of Standards and Technology. 16 May 2008. Available from: http://www.nist.gov/dads/HTML/bloomFilter.html
P.S.: If you've read this far THANK YOU. This entry has also been posted to my personal blog.
Problem 10 of Project Euler is to sum the prime numbers less than two million. Solutions are supposed to take no more than one minute, so a decent prime number detector is mandatory. I'm sure lots of free, high quality solutions are available, but I have this problem with DIY / NIH syndrome. Somehow I found the sieve of Eratosthenes and a quick, tacky swipe at it was good enough.
The process is to make a list of numbers from 2 through some limit. Start with the lowest number in the list, and remove all of its multiples from the list. Next time through, pick the next smallest number that hasn't been crossed off yet and eliminate its multiples. Stop once the current number exceeds the square root of the limit. After that, the remaining numbers are prime:
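def eratosthenes_sieve(ceiling)
  vals = (0..ceiling).to_a
  vals[0..1] = [nil, nil]
  # Walk 2 through floor(sqrt(ceiling)) inclusive. Using `times` here would
  # stop one short and leave perfect squares such as 25 marked as prime.
  2.upto(Math.sqrt(ceiling).floor) do |val|
    next unless vals[val]
    ((val ** 2)..ceiling).step(val) do |mult|
      vals[mult] = nil
    end
  end
  vals
end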
A big array like this is useful for O(1) prime/not-prime checks, but the solution requires the sum of the primes:
def primes_through(ceiling)
  eratosthenes_sieve(ceiling).compact
end

sum = 0
primes_through(ARGV[0].to_i).each { |val| sum = sum + val }
puts sum
I'm sure there's a more Ruby-esque way to write that. I wanted to use Python for xrange goodness, but I'm still not at peace with its quirks. Groovy would have been nice, but the free editor support is still quite young. Ruby fit right in the gap.
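One possibly more Ruby-esque take on the summing step is to fold the list with inject instead of mutating a running total. The sketch below repeats a small sieve so that it runs on its own; sieve_primes is a hypothetical name, and the ceiling of 30 is just a quick example.

```ruby
# Self-contained sketch; sieve_primes is a hypothetical helper name.
def sieve_primes(ceiling)
  return [] if ceiling < 2

  candidates = (0..ceiling).to_a
  candidates[0] = candidates[1] = nil

  # Cross off multiples for every value up to and including floor(sqrt(ceiling)).
  2.upto(Math.sqrt(ceiling).floor) do |val|
    next unless candidates[val]
    (val * val).step(ceiling, val) { |mult| candidates[mult] = nil }
  end

  candidates.compact
end

# inject with an explicit 0 also copes with an empty list of primes.
puts sieve_primes(30).inject(0) { |sum, prime| sum + prime }
```

For a ceiling of 30 this prints 129. On Ruby 2.4 and later, sieve_primes(30).sum does the same job.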

