Recently about Ruby

System Metrics is a new Rails 3 Engine providing a clean web interface to the performance metrics instrumented with ActiveSupport::Notifications. It will collect and display the notifications baked into Rails and any additional custom ones you add using ActiveSupport::Notifications.instrument.

System Metrics is not intended to be a replacement for performance monitoring solutions such as New Relic. However, it is especially handy for quickly identifying performance problems in a development environment. It's also a great alternative for private networks disconnected from the Internet.

You can find more information about System Metrics on the System Metrics site. Please kick the tires and let us know what you think.

System Metrics Detail View

This post is going to demonstrate thrift usage by searching a Lucene index from Ruby.

Thrift In a Nutshell

Essentially, thrift is a serialization and RPC framework that allows you to communicate between programs that are not necessarily written in the same language. Thrift is used by defining data types and services in a .thrift file. You then run the thrift compiler against the .thrift file, which generates the stub code needed for clients and servers. Currently thrift will generate code for C++, C#, Erlang, Haskell, Java, Objective C/Cocoa, OCaml, Perl, PHP, Python, Ruby, and Squeak. For a more detailed description of thrift, along with instructions on how to install it if needed, consult the thrift wiki.

Generating the Lucene Index

Our first step is to generate a small, simple lucene index. To build our index, 50,000 fake person records were downloaded from the Fake Name Generator in a comma delimited file. Each person record will contain a first name, last name, address and email address. Our indexing code will be very simple and will not be using any of lucene's advanced features.
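import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class IndexBuilder {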

    public static void main(String[] args) throws Exception {
        String namesFile = "names.csv";
        Document doc = new Document();
        Field[] fields = new Field[]{new Field("firstName", "", Field.Store.YES, Field.Index.ANALYZED_NO_NORMS),
                new Field("lastName", "", Field.Store.YES, Field.Index.ANALYZED_NO_NORMS),
                new Field("address", "", Field.Store.YES, Field.Index.ANALYZED_NO_NORMS),
                new Field("email", "", Field.Store.YES, Field.Index.ANALYZED_NO_NORMS)};
        addFieldsToDocument(doc, fields);

        BufferedReader reader = new BufferedReader(new FileReader(namesFile));

        IndexWriter indexWriter = new IndexWriter(FSDirectory.open(new File("blog-index")),new IndexWriterConfig(Version.LUCENE_31, new StandardAnalyzer(Version.LUCENE_31)));

        String line;
        while ((line = reader.readLine()) != null) {
            String[] personData = getPersonData(line);
            setFieldData(personData, fields);
            indexWriter.addDocument(doc);
        }
        indexWriter.optimize();
        indexWriter.close();
    }

    private static String[] getPersonData(String line) {
        return line.split(",");
    }

    private static void setFieldData(String[] data, Field[] fields) {
        int index = 0;
        for (Field field : fields) {
            field.setValue(data[index++]);
        }
    }

    private static void addFieldsToDocument(Document doc, Field[] fields) {
        for (Field field : fields) {
            doc.add(field);
        }
    }
}

Creating the .thrift File

The next step will be to define what objects and services we want in our .thrift file, which will be called lucene_search.thrift. The lucene_search.thrift file is intentionally very basic. For more details on the structure of .thrift files, consult the thrift wiki tutorial.
//all generated java code will have the following for package name
namespace java bbejeck.thrift.gen

//this is the person object 
struct Person {
  1: string firstName,
  2: string lastName,
  3: string address,
  4: string email
}

//exception used to send meaningful error messages back to user
exception LuceneSearchException {
  1: string message
}

//service definition used by client and server
service LuceneSearch { 
    list<Person> search(1: string query) throws (1:LuceneSearchException error) 
}
As you can see from the example above, the .thrift file format is completely language agnostic. Next we need to generate our java and ruby code. The following were run from the command line:
  • $ thrift --gen java lucene_search.thrift
  • $ thrift --gen rb lucene_search.thrift
The generated code ends up in two directories named gen-java/ and gen-rb/ respectively. The files generated for java are LuceneSearch.java, LuceneSearchException.java and Person.java. The generated ruby files are lucene_search.rb, lucene_search_types.rb and lucene_search_constants.rb. In our next step, we are going to use generated java code to write our thrift server.

Thrift Server - Java

Thrift generates all the stub code you need for a server to expose your service or program. The only code we will need to write is a class that implements the generated Iface interface (defined in the LuceneSearch class), which contains the search method defined in our .thrift file.
public class LuceneThriftServer {
    private static final int PORT = 9090;
    private static int numberThreads = 5;

    public static void main(String[] args) throws Exception {
        TServerSocket serverSocket = new TServerSocket(PORT, 100000);
        LuceneSearch.Processor searchProcessor = new LuceneSearch.Processor(new SearchHandler(args[0]));
        if (args.length > 1) {
            numberThreads = Integer.parseInt(args[1]);
        }
        TThreadPoolServer.Args serverArgs = new TThreadPoolServer.Args(serverSocket);
        serverArgs.maxWorkerThreads(numberThreads);
        TServer thriftServer = new TThreadPoolServer(serverArgs.processor(searchProcessor).protocolFactory(new TBinaryProtocol.Factory()));
        thriftServer.serve();
    }
}

Iface Implementation

The SearchHandler class actually does the work of searching the lucene index. One tradeoff made here is that any exception while searching is caught and re-thrown as a LuceneSearchException. While it's usually not a great idea to just re-throw an exception, in this case it makes sense to do so. Since the LuceneSearchException is defined in the lucene_search.thrift file, the generated client code will handle that exception. So instead of receiving a generic thrift exception when an error occurs, the client should receive a more meaningful error message.
public class SearchHandler implements LuceneSearch.Iface {
    private IndexSearcher searcher;
    private QueryParser queryParser;
    private static final int MAX_RESULTS = 1000;

    public SearchHandler(String indexPath) {
        try {
            searcher = new IndexSearcher(FSDirectory.open(new File(indexPath)), true);
            queryParser = new QueryParser(Version.LUCENE_31, null, new StandardAnalyzer(Version.LUCENE_31));
            queryParser.setAllowLeadingWildcard(true);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public List<Person> search(String query) throws LuceneSearchException {
        List<Person> results = new ArrayList<Person>();
        try {
            Query q = queryParser.parse(query);
            TopDocs topDocs = searcher.search(q, MAX_RESULTS);
            for (ScoreDoc sd : topDocs.scoreDocs) {
                Document document = searcher.doc(sd.doc);
                results.add(getPersonFromDocument(document));
            }
        } catch (Exception e) {
            throw new LuceneSearchException(e.getMessage());
        }
        return results;
    }

    private Person getPersonFromDocument(Document document) {
        Person p = new Person();
        p.firstName = document.get("firstName");
        p.lastName = document.get("lastName");
        p.address = document.get("address");
        p.email = document.get("email");

        return p;
    }
}
The next step in our process is to write the client.

Thrift Client - Ruby

Writing the thrift ruby client is even easier than writing the server code. If you have not already done so, install the thrift gem by running "gem install thrift" to get the required thrift library code. All the code you need for your client is already generated by thrift. At this point we are only doing what is needed to get the client to communicate with the server.
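require 'thrift'
# generated by the thrift compiler; assumes gen-rb/ is on the load path
require 'lucene_search'

module ThriftConnection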

  class LuceneClient

    def initialize(host='localhost', port=9090)
      socket = Thrift::Socket.new(host, port)
      @transport = Thrift::BufferedTransport.new(socket)
      protocol_factory = ::Thrift::BinaryProtocolFactory.new
      protocol = protocol_factory.get_protocol(@transport)
      @transport.open
      @client = LuceneSearch::Client.new(protocol)
    end

    def search(query)
      @client.search(query)
    end

    def close
      @transport.close
    end

  end
end

Running With Scissors

This section has the odd title "Running With Scissors" because, like actually running with scissors, what we are about to do may not be a great idea. All of the thrift generated code carries a warning at the top: "DO NOT EDIT UNLESS YOU ARE SURE THAT YOU KNOW WHAT YOU ARE DOING". Obviously I don't, but I'm not going to let that stop me at this point (sometimes you just have to see if you can get something to work!). What we've done is implement method_missing in the generated Person class (found in lucene_search_types.rb) so we can specify searches à la ActiveRecord dynamic finders. To accomplish this, we use a regular expression to pull the fields to search out of the method name and use the arguments passed in as the search values. The regular expression here is fairly simple and only aims to handle simple searches.
  # added to translate from symbol to expected search format
  SEARCH_KEYS_MAPPING = {:first_name => 'firstName',
                         :last_name => 'lastName',
                         :email => 'email',
                         :address => 'address'}

  def self.method_missing(method_name, *args)
    query = ""
    # handles find_by_first_name etc
    if method_name.to_s =~ /^find_by_([a-z]+_?[a-z]*)$/
      query = "#{SEARCH_KEYS_MAPPING[$1.to_sym]}:#{args[0]}"
    # handles find_by_first_name_or_last_name, find_by_first_name_and_email
    elsif method_name.to_s =~ /^find_by_([a-z]+_[a-z]+)_([a-z]+)_([a-z]+_?[a-z]*)$/
      query = "#{SEARCH_KEYS_MAPPING[$1.to_sym]}:#{args[0]} #{$2.upcase} #{SEARCH_KEYS_MAPPING[$3.to_sym]}:#{args[1]}"
    else
      raise ArgumentError.new("search method pattern #{method_name} not recognized")
    end

    # open the connection only once the method name has been recognized,
    # and close it even if the search itself raises
    lucene_client = ThriftConnection::LuceneClient.new
    begin
      lucene_client.search(query)
    ensure
      lucene_client.close
    end
  end
As we'll see in the next section, this actually worked, but I still view this more as a useful experiment. First of all this was placed in generated code, so any time you make changes you would have to manually get the method_missing definition back into the Person class. Secondly, Lucene search syntax is really not all that hard to learn.
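To make the translation easier to follow, here is a minimal sketch that pulls just the query-building step out into a standalone module, so it can be tried without a running thrift server. The FinderQuery module and its field_for helper are hypothetical names made up for this illustration; the regular expressions and the key mapping are the ones shown above.

```ruby
# Hypothetical helper: same patterns as the method_missing above, but it only
# builds the Lucene query string and never talks to the thrift server.
module FinderQuery
  SEARCH_KEYS_MAPPING = {
    :first_name => 'firstName',
    :last_name  => 'lastName',
    :email      => 'email',
    :address    => 'address'
  }

  def self.build(method_name, *args)
    case method_name.to_s
    when /^find_by_([a-z]+_?[a-z]*)$/
      # single field, e.g. find_by_first_name
      "#{field_for($1)}:#{args[0]}"
    when /^find_by_([a-z]+_[a-z]+)_([a-z]+)_([a-z]+_?[a-z]*)$/
      # two fields joined by an operator, e.g. find_by_first_name_or_last_name
      "#{field_for($1)}:#{args[0]} #{$2.upcase} #{field_for($3)}:#{args[1]}"
    else
      raise ArgumentError, "search method pattern #{method_name} not recognized"
    end
  end

  # Unlike the original, an unknown field raises instead of producing a
  # query with an empty field name.
  def self.field_for(key)
    SEARCH_KEYS_MAPPING.fetch(key.to_sym) do
      raise ArgumentError, "unknown search field #{key}"
    end
  end
end

puts FinderQuery.build(:find_by_first_name, "Tia")
puts FinderQuery.build(:find_by_first_name_or_last_name, "elizabeth", "krause")
```

Note that the second pattern expects a two-word field on the left-hand side, so a name such as find_by_email_and_first_name is not understood; in this sketch it raises an ArgumentError rather than sending a malformed query.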

Testing

All of what we have done so far would not be worth much if we could not verify our work with some testing. Here is the unit test to verify that we are indeed able to search a Lucene index from Ruby. To get some names to search on I simply ran
head names.csv
and then used some of the information in various combinations to get counts of what searches should return. For example to get an idea of what searching for a first name of Elizabeth or last name of Krause would return I ran
cat names.csv | grep -iE 'elizabeth|krause' | wc -l 
which returned a count of 289. So, first making sure that our thrift server was running in the background, here is the unit test that was run to verify our Ruby client searching against a Lucene index.
class SearchTest < Test::Unit::TestCase

  def setup
    @lucene_client = ThriftConnection::LuceneClient.new
  end


  def teardown
    @lucene_client.close
  end

  def test_search_client_first_name
    persons = @lucene_client.search("firstName:Tia")
    assert_equal(5, persons.length)

    persons.each do |person|
      assert_equal("Tia", person.firstName)
    end
  end

  def test_search_person_class_first_name
    persons = Person.find_by_first_name("Tia")
    assert_equal(5, persons.length)

    persons.each do |person|
      assert_equal("Tia", person.firstName)
    end
  end

  def test_search_client_first_name_email_domain
    persons = @lucene_client.search("+firstName:Elizabeth +email:*pookmail.com")
    assert_equal(59, persons.length)
  end

  def test_search_person_class_first_name_email_domain
    persons = Person.find_by_first_name_and_email("elizabeth", "*pookmail.com")
    assert_equal(59, persons.length)
  end

  def test_search_client_first_name_and_last_name
    persons = @lucene_client.search("+firstName:Elizabeth +lastName:Krause")
    assert_equal(1, persons.length)
    person = persons[0]

    assert_equal("Elizabeth", person.firstName)
    assert_equal("Krause", person.lastName)
  end

  def test_search_person_class_first_name_and_last_name
    persons = Person.find_by_first_name_and_last_name("elizabeth", "krause")
    assert_equal(1, persons.length)
    person = persons[0]

    assert_equal("Elizabeth", person.firstName)
    assert_equal("Krause", person.lastName)
  end

  def test_search_person_class_first_name_or_last_name
    persons = Person.find_by_first_name_or_last_name("elizabeth", "krause")
    assert_equal(289, persons.length)
  end

  def test_invalid_search
    assert_raises ArgumentError do
      Person.find_person_by_name("tia")
    end
  end

end

Conclusion

Thrift is a compelling alternative for RPC or message passing where one might otherwise use REST, Java RMI or middleware (JMS, AMQP). There is a great comparison of how thrift performs against other forms of RPC near the end of this thrift tutorial from OCI. I hope you were able to learn something useful. Thanks for your time.

Resources

Full source for the blog, including the generated code, can be found on github. If you are interested in running the test, you can download lucene-thrift-example.tar.gz, extract the tar file and execute the runSearchTest.sh script. You do not need to have thrift installed to run the test.
  • For more information on thrift the thrift wiki is a great start
  • More information on Lucene can be found here

My project recently started developing a web app using Rails 3 and Extjs, a pure JavaScript frontend, i.e. no ERB, haml, etc. We have several multi-page forms where we use a JavaScript model to hold the form data and upon save serialize the JSON back to the server. If successful, the updated data is rendered as JSON and sent back to the client.

The model has a simple one-to-many relationship (parent/children)

parent = { id: 1, name: 'Jim', children:
[{id: 2, name: 'Larry', parent_id: 1},
{id: 3, name: 'Curly', parent_id: 1},
{id: 4, name: 'Moe', parent_id: 1}] }

class Parent < ActiveRecord::Base
  attr_accessible :name
  has_many :children 
end

class Child < ActiveRecord::Base
  belongs_to :parent
end

My expectation was that when I save the parent data, via update_attributes, both the parent and child data save together in a single transaction.

class ParentsController < ApplicationController
  ...
  def update
    parent = Parent.find(params[:id])
    parent.update_attributes(params)
    ...
  end
end

Being new to developing Rails apps in the "real world", I had not dealt with what I thought to be a common scenario. I found that out of the box the parent data saves, but not the child data. I first explored the book "Agile Web Development with Rails", but didn't find what I needed. Next I checked out the Rails Guides, in particular the section on associations. The first setting I found was the :autosave setting:

If you set the :autosave option to true, Rails will save any loaded members and destroy members that are marked for destruction whenever you save the parent object.

Seemed like a good fit.

has_many :children, :autosave => true

However, the children still did not save when calling update_attributes. (I'm still not sure exactly what this is used for.)

Next stop, Google. Some searching brought me to the ActiveRecord "accepts_nested_attributes_for" method. It turns out this is exactly what I needed. So I added "accepts_nested_attributes_for :children" to Parent

class Parent < ActiveRecord::Base
  attr_accessible :name
  has_many :children
  accepts_nested_attributes_for :children
end

However, just adding "accepts_nested_attributes_for :children" wasn't enough. It turns out that with the accepts_nested_attributes_for method, the Parent class receives a method called "children_attributes=(attributes)". This allows for the parent's children attributes to be updated. This also means I have to update my JSON so that the "children" property is called "children_attributes".

parent = { id: 1, name: 'Jim', children_attributes: 
[{id: 2, name: 'Larry',parent_id:1},
{id: 3, name:'Curly',parent_id:1},
{id:4, name: 'Moe',parent_id:1}] } 

Since I am using "attr_accessible" I also have to include "children_attributes" in the list.

class Parent < ActiveRecord::Base
  attr_accessible :name, :children_attributes
  has_many :children
  accepts_nested_attributes_for :children
end

Now calling "parent.update_attributes" saves both the parent and its children within a single transaction. Rails will handle inserts/updates/deletes appropriately. If the child has an id, the child is updated. If the child is missing an id, a new child is added. If the child has the "_destroy" attribute set to a true value, the child is removed, provided that destruction has been enabled by passing ":allow_destroy => true" to "accepts_nested_attributes_for".
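As a rough illustration of those rules, here is a small plain-Ruby sketch that classifies each child hash the way the paragraph above describes. It is not ActiveRecord's implementation; nested_action, TRUE_FLAGS and the extra child 'Shemp' are made up for the example, and the set of values treated as "true" is an assumption.

```ruby
# Illustration only: this is not ActiveRecord's code, just the rules from the
# paragraph above applied to plain hashes. New children that also carry a
# _destroy flag are not modelled here.
TRUE_FLAGS = [true, 1, "1", "true"] # assumed set of "true" values

def nested_action(child_attributes, allow_destroy = false)
  # no id: treated as a new child
  return :create if child_attributes[:id].nil?

  # existing child: destroyed only when flagged AND destruction is allowed
  marked = TRUE_FLAGS.include?(child_attributes[:_destroy])
  marked && allow_destroy ? :destroy : :update
end

children_attributes = [
  { :id => 2, :name => 'Larry' },
  { :name => 'Shemp' },
  { :id => 4, :name => 'Moe', :_destroy => '1' }
]

children_attributes.each do |attrs|
  puts "#{attrs[:name]}: #{nested_action(attrs, true)}"
end
```

With the second argument left at false, the flagged child falls back to a plain update, which mirrors what happens when ":allow_destroy" is not enabled.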

The last todo was to serialize the JSON back to the client upon success. When trying this:

render :json => parent.to_json(:include => :children)

my JSON included "children", not "children_attributes":

parent = { id: 1, name: 'Jim', children:
[{id: 2, name: 'Larry', parent_id: 1},
{id: 3, name: 'Curly', parent_id: 1},
{id: 4, name: 'Moe', parent_id: 1}] }

For the save to work correctly, the JSON needs "children_attributes" (which is a bit of a disconnect between relationship and nested attribute naming conventions). I was not able to figure out how to get Rails and "to_json" to use "children_attributes". For my solution I did not pass ":include => :children" in my call to "to_json"; instead I overrode "as_json". In my "as_json" I manually add the "children_attributes" to the parent JSON.

  def as_json(options={})
    json = super(options)
    json[:children_attributes]=[]
    
    self.children.each do |child|
      json[:children_attributes] << {:id => child.id, :name => child.name}
    end
    json
  end
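For comparison, the same renaming can be sketched on a plain hash, outside of any model. This is only an illustration of the key swap, not something taken from Rails; with_nested_attribute_keys is a hypothetical helper and the hash is assumed to use string keys.

```ruby
# Plain-Ruby sketch: rename the "children" key produced by serialization to
# the "children_attributes" key that nested attributes expect.
def with_nested_attribute_keys(hash, association)
  renamed = hash.dup
  key = association.to_s
  # move the value under the new key only when the association is present
  renamed["#{key}_attributes"] = renamed.delete(key) if renamed.key?(key)
  renamed
end

parent = { "id" => 1, "name" => "Jim",
           "children" => [{ "id" => 2, "name" => "Larry" }] }

p with_nested_attribute_keys(parent, :children)
```

The original hash is left untouched, so the helper could be applied just before rendering without affecting anything else that reads the serialized data.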

Looking back on getting the save to work, it seems rather simple. However, several things made this a little harder than it should have been. The first was the documentation coverage. Surprisingly, this was not covered in the book "AWDR" or the Rails Guides. It is spelled out quite well in the API docs, but you need to know about the NestedAttributes module. Second, the ":autosave" setting misled me a bit. I will need to go back and do a bit more research on this. Lastly, the naming convention for nested attributes threw me off, as did the fact that I had to "rig" as_json. Perhaps there is a way to handle this by "convention".

For further reading check out Ryan's blog or the Nested Attributes API docs.

Capistrano is great to work with. It's simple, powerful and flexible. For the last couple of weeks I've been building a series of Capistrano tasks that check the status and configuration of our web servers. I'm pretty happy with the results, so I thought I would share the basic structure of what I've come up with. I'm just going to provide a couple of tasks here, along with all of the support methods, to provide a framework, but the idea would be to expand it with whatever commands would be run to verify the health of a server.

There were several requirements that I had to keep in mind when I was designing this:

  1. The cap tasks should use the environment configuration files that we already have.
  2. Because there are a large number of tasks we should be able to run them individually, or with a single command we should be able to run the entire suite.
  3. The cap script should produce well formatted, easy to read output, so that it's clear what's broken and where.


Control Support Methods

While building the validation tasks I found that I was having to do the same basic operations fairly often:

  1. Issue a command on a remote host.
  2. Parse the output of that command.
  3. Issue more commands based on the output.

Or sometimes I just wanted to run a series of commands on one host, verifying that it was correct before moving on to the next. It's possible to make Capistrano work this way, but it's not well documented. By default, if you give Capistrano multiple commands to run on multiple hosts, it will run each command on every host before moving on to the next command. What I really wanted was for Capistrano to run every command on one host before moving on to the next host.

For instance, given that we have HostA, HostB and HostC, and I want to run commands doA, doB and doC, by default Capistrano will run:

    HostA > doA
    HostB > doA
    HostC > doA
    HostA > doB
    HostB > doB
    HostC > doB
    HostA > doC
    HostB > doC
    HostC > doC

But what I needed it to do was:

    HostA > doA
    HostA > doB
    HostA > doC
    HostB > doA
    HostB > doB
    HostB > doC
    HostC > doA
    HostC > doB
    HostC > doC

By using the roles that we already have configured for our deploy tasks with Capistrano we were able to create a few support methods that issue commands in the order that we want.
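The difference between the two orderings comes down to which loop is on the outside. A tiny plain-Ruby sketch (no Capistrano involved, host and command names taken from the example above) reproduces both lists:

```ruby
hosts    = %w[HostA HostB HostC]
commands = %w[doA doB doC]

# Default behaviour described above: each command visits every host in turn.
by_command = commands.flat_map { |cmd| hosts.map { |host| "#{host} > #{cmd}" } }

# Desired behaviour: every command runs on one host before moving on.
by_host = hosts.flat_map { |host| commands.map { |cmd| "#{host} > #{cmd}" } }

puts by_command
puts
puts by_host
```

The support methods that follow get the second ordering by iterating over the hosts themselves and restricting each run call to a single host.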


each_host

The each_host method is used to iterate through our configured hosts. The method prints out the hostname (which is important for the look of the script's output), sets the current host, then yields to the block that was passed into it.

  def each_host
    roles[:web].each do |host|
      print_hostname host
      set(:current_host, host.host)
      yield host
    end
  end


run_serial

The run_serial method runs the given command, but only on the current host, yielding the name of the output stream (:out or :err) and the output of the command. We don't have much use for the ssh channel, so it is not passed on to the block.

  def run_serial(command)
    run command, :hosts => fetch(:current_host) do |chan, stream, data|
      yield stream, data
    end
  end


run_primary

We also have a primary server set in our configuration. We can create a few more support methods that take advantage of the primary host, for issuing commands on one host only.

  def run_primary(command)
    run command, :hosts => fetch(:primary) do |chan, stream, data|
      set_current_host chan
      yield stream, data
    end
  end


currently_primary?

Having a method that tells us if we're currently on the primary host is helpful as well.

  def currently_primary?
    fetch(:current_host) == fetch(:primary).host
  end


set_current_host

The current host can also be set from the ssh channel. This method is called from the run_primary method, but it also needs to be called whenever we use the default run method.

  def set_current_host(channel)
    host = channel.properties[:host]
    print_hostname host
    set(:current_host, host)
  end

Now we can write tasks that run all of their commands on one host before moving on to the next, execute tasks just on the primary host, and print out the current host as we're performing checks, so we're able to keep track of where we are.



Printing Support Methods

Because printing the script output to the command line is an important part of the script, I'm going to go over the printing support next. When running the server checks I wanted the output to look like rspec or cucumber. As each check is performed I wanted to see a colorful pass or fail message, with the values that I was checking against highlighted. Also, once the checks were finished I wanted to see a list of everything that failed, grouped by host.


print_hostname

This is the print_hostname method that was mentioned above. Every time we move to a different host this method prints out the hostname.

  def print_hostname(name)
    puts "    Host: #{grey(name)} >"
  end


step

Also at the beginning of every task I wanted to print the task name and the number of the task.

  def step(name)
    step = fetch(:step)
    puts "\n[#{step}] #{name}"
    set(:step, step+1)
  end


failure

The failure method is called when a check fails. It prints the failure message, and stores it under the current hostname so that it can be printed again after all the checks have been done. If the expected or actual parameters are included it prints those along with the message, otherwise just the message is printed.

  def failure(message, expected=nil, actual=nil)
    host = fetch(:current_host)
    complete = (expected || actual) ?
      "#{message} Expected: #{grey(expected)} Actual: #{grey(actual)}" :
      message
    errors = fetch(:errors)
    unless errors[host]
      errors[host] = []
    end
    errors[host] << message
    log_item(complete, :fail)
  end


pass

The pass method is called when a check passes.

  def pass(message)
    log_item(message, :pass)
  end


log_item

The log_item method is called by the pass and failure methods. The method includes a warn and empty status which can also be used by the tasks when printing their status.

  def log_item(message, flag=nil)
    status = case flag
      when :pass then "      [#{green("PASS")}] "
      when :fail then "      [#{red("FAIL")}] "
      when :warn then "      [#{orange("WARN")}] "
      else "      "
    end
    puts "#{status}#{message}"
  end


colorize

The methods below handle the coloring of the text as it's printed to the command line. This was actually pretty fun figuring out. Every script should have fancy colorized output. (and now every script of mine will, muaha ha ha)

  def red(text)
    colorize(text, 31)
  end

  def orange(text)
    colorize(text, 33)
  end

  def green(text)
    colorize(text, 32)
  end

  def grey(text)
    colorize(text, 37)
  end

  def colorize(text, color_code)
    "\e[#{color_code}m#{text}\e[0m"
  end


Start and Finish Tasks

Now that all that's taken care of we can start writing some actual Capistrano tasks to take advantage of our fine grained control and fancy printing.


These server check tasks all share common startup and finish tasks that need to be run before and after any one of the tasks is run, or when the entire suite is run. These are hooked in using Capistrano's :start and :finish callbacks, but should only be run for the check tasks.
  TASK_LIST = [
    "check:all",
    "check:environment_variables",
    "check:middleware_versions",
    "check:apache_configuration"]

  on :start, 'check:setup', :only => TASK_LIST
  on :finish, 'check:print_errors', :only => TASK_LIST

setup

The setup task handles whatever specific setup needs to be done, but at the very least the step and errors variables need to be defined. They're used by the printing methods. Also, all of the tasks below are inside of the check namespace.

  namespace :check do

    task :setup do
      set :step, 1
      set :errors, {}
    end


print_errors

The print_errors task is run after all the checks have been run to print a convenient list of errors grouped by host. This keeps errors from being lost in the output.

  task :print_errors do
    errors = fetch(:errors)
    if (errors.size() > 0)
      puts "\n ==== #{red 'Errors'} ===="
      errors.each do |host,list|
        print_hostname host
        list.each {|e| puts "      #{e}"}
      end
    end
  end


Validation Tasks

The meat of the script is in the validation tasks. Our current script has fifteen different tasks that do everything from checking environment variables, to verifying directory and file permissions, to checking network and database statuses. If it can be automated, we'll find a way to get it in. Rather than include concrete examples, though, I'm just going to include some skeleton methods to illustrate how the support methods are used together in the Capistrano tasks.


Simple Tasks

This task executes one command on each server and validates the output from a list of expected results. This form is used by our script to check environment variables, commands on the sudo list, and programs in the cron, all of which can be read and verified with one command. I'm just including the validateVariable, and getVariable methods here to show the pass and failure methods.

  VARIABLES = [
    { :key => 'KEY', :value => /Expected Value/ },
    { :key => 'KEY', :value => /Expected Value/ },
    { :key => 'KEY', :value => /Expected Value/ }]

  task :simple do
    step "A Simple Check"
    run "env" do |channel, stream, data|
      set_current_host channel
      VARIABLES.each do |map|
        validateVariable(data, map)
      end
    end
  end

  def validateVariable(data, map)
    value = getVariable(data, map[:key])
    # getVariable has already recorded a failure when the variable is missing
    return if value.nil?
    (value.match map[:value]) ?
      pass("Env #{grey(map[:key])} is set to #{grey(value)}") :
      failure("Env #{grey(map[:key])} is incorrect", map[:value].inspect, value)
  end

  def getVariable(data, key)
    result = data.match(/^#{key}=(.*)/)
    unless result
      failure "Env #{grey(key)} is not set."
      return nil
    end
    result[1]
  end
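To try the matching logic without a server, the same idea can be sketched against a canned string standing in for the output of env. Everything here is illustrative: the variable names and values are made up, and parse_env and check_variable are hypothetical helpers that return a status instead of calling pass and failure.

```ruby
# Sample `env` output; the keys and values are made up for illustration.
ENV_OUTPUT = "RAILS_ENV=production\nJAVA_HOME=/usr/lib/jvm/java-6\n"

EXPECTED = [
  { :key => 'RAILS_ENV',   :value => /\Aproduction\z/ },
  { :key => 'JAVA_HOME',   :value => /java-7/ },
  { :key => 'ORACLE_HOME', :value => /oracle/ }
]

# Turn "KEY=value" lines into a hash.
def parse_env(data)
  data.each_line.each_with_object({}) do |line, vars|
    key, value = line.chomp.split('=', 2)
    vars[key] = value if key && value
  end
end

# Return :pass, :fail or :missing for one expectation.
def check_variable(vars, expectation)
  value = vars[expectation[:key]]
  return :missing if value.nil?
  value =~ expectation[:value] ? :pass : :fail
end

vars = parse_env(ENV_OUTPUT)
EXPECTED.each do |expectation|
  puts "#{expectation[:key]}: #{check_variable(vars, expectation)}"
end
```

Separating the missing case from the mismatch case is what lets the real task report "is not set" and "is incorrect" as two different failures.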


Tasks that run multiple commands on the same host

Most of the tasks in our validation script run multiple commands on the same host, using the each_host and run_serial methods. This allows us to read files and act on the values in those files. It's also used because the script output looks better organized when each host is checked completely before moving on to the next. The task below iterates through a list of standard directories, then a list of configured directories. For each directory a run_serial command is executed to see if the directory exists; once both lists have been gone through, the same is done on the next host.

  task :serial_example do
    step "Run multiple commands on a host"
    each_host do
      DIRECTORIES.each do |dir|
        verify_directory dir
      end

      [:deploy_to, :transfer_path, :cache_path].each do |key|
        dir = fetch(key)
        if (dir)
          verify_directory dir
        else
          failure "No directory configured for #{grey(key)}"
        end
      end

    end
  end

  def verify_directory(path)
    command = "[ -d #{path} ] && echo 'true' || echo 'false'"
    run_serial command do |stream, data|
      if (data.strip == 'true')
        pass "Verified that #{grey(path)} exists."
      else
        failure "Directory at #{grey(path)} does not exist."
      end
    end
  end


Tasks using different server combinations

Here's a command that runs on the primary host, reads a file, then acts across multiple hosts. We do something like this to verify that the Apache configuration is correct. We know the httpd.conf is the same across hosts, but we want to verify that the files and directories that are in the configuration are actually on the host in question.

  task :primary_example do
    step "Verifying Certificates"
    path = fetch(:path_to_httpd_conf)
    certs = {}
    run_primary "grep SSLC[eA] #{path}" do |stream,data|

      if (stream == :err)
        failure "No http configuration file at #{grey(path)}"
        break
      end

      keys = [
        'SSLCertificateFile',
        'SSLCertificateKeyFile',
        'SSLCACertificatePath',
        'SSLCARevocationPath']

      keys.each do |key|
        certs[key] = read_http_conf_value(data,key)
      end

      pass "Read Certificate Paths from HTTP Configuration"
    end

    each_host do
      verify_file certs['SSLCertificateFile']
      verify_file certs['SSLCertificateKeyFile']
      verify_directory certs['SSLCACertificatePath']
      verify_directory certs['SSLCARevocationPath']
    end
  end
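The read_http_conf_value helper is not shown in this post, so here is one hypothetical way it could look. It assumes each directive sits on its own line in the form "Directive value" with an unquoted value; the sample paths are invented for the example.

```ruby
# Hypothetical stand-in for read_http_conf_value, which the post does not show.
# Assumes each directive sits on its own line as "Directive value".
def read_http_conf_value(data, key)
  match = data.match(/^\s*#{Regexp.escape(key)}\s+(\S+)/)
  # nil when the directive does not appear in the grep output
  match && match[1]
end

# Made-up grep output for illustration.
grep_output = <<-CONF
SSLCertificateFile /etc/httpd/certs/server.crt
SSLCertificateKeyFile /etc/httpd/certs/server.key
  SSLCACertificatePath /etc/httpd/certs/ca
CONF

%w[SSLCertificateFile SSLCertificateKeyFile SSLCACertificatePath SSLCARevocationPath].each do |key|
  puts "#{key}: #{read_http_conf_value(grep_output, key).inspect}"
end
```

A directive that is absent comes back as nil, so a real task would probably want to record a failure for it before handing the value to verify_file or verify_directory.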


And finally we need a task that runs the whole suite of validation tasks. The all task does nothing itself, it just runs all of the other tasks.

  desc "Run all of the validation tasks"
  after "check:all",
    "check:environment_variables",
    "check:middleware_versions",
    "check:apache_configuration"

  task :all do
  end

Then, to execute the cap task, you would type:

cap -q production check:all

And watch as all the beautiful passes and fails scroll by, though hopefully more of the former.

You've no doubt heard about JRuby, which lets you run Ruby code on the JVM. This is nice, but wouldn't it be nicer if you could write Java code on a Ruby VM? This would let you take advantage of the power of Ruby 1.9's new YARV (Yet Another Ruby VM) interpreter while letting you write code in a statically-typed language. Without further ado, I'd like to introduce RJava, which does just that!

RJava lets you write code in Java and run it on a Ruby VM! And you still get the full benefit of the Java compiler to ensure your code is 100% correct. Of course with Java you also get checked exceptions and proper interfaces and abstract classes to ensure compliance with your design. You no longer need to worry about whether an object responds to a random message, because the Java compiler will enforce that it does.

You get all this and more but on the power and flexibility of a Ruby VM. And because Java does not support closures, you are ensured that everything is properly designed since you'll be able to define interfaces and then implement anonymous inner classes just like you're used to doing! Even when JDK 8 arrives sometime in the future with lambdas, you can rest assured that they will be statically typed.

As a first example, let's see how you could filter a collection in RJava to find only the even numbers from one to ten. In Ruby you'd probably write something like this:

evens = (1..10).find_all { |n| n % 2 == 0 }

With RJava, you'd write this:

List<Integer> evens = new ArrayList<Integer>();
for (int i = 1; i <= 10; i++) {
  if (i % 2 == 0) {
    evens.add(i);
  }
}

This example shows the benefits of declaring variables with specific types, how you can use interfaces (e.g. List in the example) when declaring variables, and shows how you also get the benefits of Java generics to ensure your collections are always type-safe. Without any doubt you know that "evens" is a List containing Integers and that "i" is an int, so you can sleep soundly knowing your code is correct. You can also see Java's powerful "for" loop at work here, to easily traverse from 1 to 10, inclusive. Finally, you saw how to effectively use Java's braces to organize code to clearly show blocks, and semi-colons ensure you always know where lines terminate.

I've just released RJava on GitHub, so go check it out. Please download RJava today and give it a try and let me know what you think!