Recently by Bill Bejeck

This post is going to demonstrate thrift usage by searching a Lucene index from Ruby.

Thrift In a Nutshell

Essentially thrift is a serialization and RPC framework that allows you to communicate between programs that are not necessarily written in the same language. Thrift is used by defining data types and services in a .thrift file. You then run the .thrift file against the thrift compiler which generates the stub code needed for clients and servers. Currently thrift will generate code for C++, C#, Erlang, Haskell, Java, Objective C/Cocoa, OCaml, Perl, PHP, Python, Ruby, and Squeak. For a more detailed description of thrift along with instructions on how to install thrift if needed, consult the thrift wiki

Generating the Lucene Index

Our first step is to generate a small, simple lucene index. To build our index, 50,000 fake person records were downloaded from the Fake Name Generator in a comma delimited file. Each person record will contain a first name, last name, address and email address. Our indexing code will be very simple and will not be using any of lucene's advanced features.
public class IndexBuilder {

    public static void main(String[] args) throws Exception {
        String namesFile = "names.csv";
        Document doc = new Document();
        Field[] fields = new Field[]{new Field("firstName", "", Field.Store.YES, Field.Index.ANALYZED_NO_NORMS),
                new Field("lastName", "", Field.Store.YES, Field.Index.ANALYZED_NO_NORMS),
                new Field("address", "", Field.Store.YES, Field.Index.ANALYZED_NO_NORMS),
                new Field("email", "", Field.Store.YES, Field.Index.ANALYZED_NO_NORMS)};
        addFieldsToDocument(doc, fields);

        BufferedReader reader = new BufferedReader(new FileReader(namesFile));

        IndexWriter indexWriter = new IndexWriter(FSDirectory.open(new File("blog-index")),new IndexWriterConfig(Version.LUCENE_31, new StandardAnalyzer(Version.LUCENE_31)));

        String line;
        while ((line = reader.readLine()) != null) {
            String[] personData = getPersonData(line);
            setFieldData(personData, fields);
            indexWriter.addDocument(doc);
        }
        indexWriter.optimize();
        indexWriter.close();
    }

    private static String[] getPersonData(String line) {
        return line.split(",");
    }

    private static void setFieldData(String[] data, Field[] fields) {
        int index = 0;
        for (Field field : fields) {
            field.setValue(data[index++]);
        }
    }

    private static void addFieldsToDocument(Document doc, Field[] fields) {
        for (Field field : fields) {
            doc.add(field);
        }
    }
}

Creating the .thrift File

The next step will be to define what objects and services we want in our .thrift file, which will be called lucene_search.thrift. The lucene_search.thrift file is intentionally very basic. For more details on the structure of .thrift files consult the thrift wiki tutorial
//all generated java code will have the following for package name
namespace java bbejeck.thrift.gen

//this is the person object 
struct Person {
  1: string firstName,
  2: string lastName,
  3: string address,
  4: string email
}

//exception used to send meaningful error messages back to user
exception LuceneSearchException {
  1: string message
}

//service definition used by client and server
service LuceneSearch { 
    list<Person> search(1: string query) throws (1:LuceneSearchException error) 
}
As you can see from the example above, the .thrift file format is completely language agnostic. Next we need to generate our java and ruby code. The following were run from the command line:
  • $ thrift --gen java lucene_search.thrift
  • $ thrift --gen rb lucene_search.thrift
The generated code ends up in two directories named gen-java/ and gen-rb/ respectively. The files generated for java are LuceneSearch.java, LuceneSearchException.java and Person.java. The generated ruby files are lucene_search.rb, lucene_search_types.rb and lucene_search_constants.rb. In our next step, we are going to use generated java code to write our thrift server.

Thrift Server - Java

Thrift generates all the stub code you need for a server to expose your service or program. The only code we will need to write is a class that implements the generated Iface interface (defined in the LuceneSearch class), which contains the search method defined in our .thrift file.
public class LuceneThriftServer {
    private static final int PORT = 9090;
    private static int numberThreads = 5;

    public static void main(String[] args) throws Exception {
        TServerSocket serverSocket = new TServerSocket(PORT, 100000);
        LuceneSearch.Processor searchProcessor = new LuceneSearch.Processor(new SearchHandler(args[0]));
        if (args.length > 1) {
            numberThreads = Integer.parseInt(args[1]);
        }
        TThreadPoolServer.Args serverArgs = new TThreadPoolServer.Args(serverSocket);
        serverArgs.maxWorkerThreads(numberThreads);
        TServer thriftServer = new TThreadPoolServer(serverArgs.processor(searchProcessor).protocolFactory(new TBinaryProtocol.Factory()));
        thriftServer.serve();
    }

Iface Implementation

The SearchHandler class actually does the work of searching the lucene index. One tradeoff made here is that any exception while searching is caught and re-thrown as a LuceneSearchException. While it's usually not a great idea to just re-throw an exception, in this case it makes sense to do so. Since the LuceneSearchException is defined in the lucene_search.thrift file, the generated client code will handle that exception. So instead of receiving a generic thrift exception when an error occurs, the client should receive a more meaningful error message.
public class SearchHandler implements LuceneSearch.Iface {
    private IndexSearcher searcher;
    private QueryParser queryParser;
    private static final int MAX_RESULTS = 1000;

    public SearchHandler(String indexPath) {
        try {
            searcher = new IndexSearcher(FSDirectory.open(new File(indexPath)), true);
            queryParser = new QueryParser(Version.LUCENE_31, null, new StandardAnalyzer(Version.LUCENE_31));
            queryParser.setAllowLeadingWildcard(true);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public List<Person> search(String query) throws LuceneSearchException {
        List<Person> results = new ArrayList<Person>();
        try {
            Query q = queryParser.parse(query);
            TopDocs topDocs = searcher.search(q, MAX_RESULTS);
            for (ScoreDoc sd : topDocs.scoreDocs) {
                Document document = searcher.doc(sd.doc);
                results.add(getPersonFromDocument(document));
            }
        } catch (Exception e) {
            throw new LuceneSearchException(e.getMessage());
        }
        return results;
    }

    private Person getPersonFromDocument(Document document) {
        Person p = new Person();
        p.firstName = document.get("firstName");
        p.lastName = document.get("lastName");
        p.address = document.get("address");
        p.email = document.get("email");

        return p;
    }
}
The next step in our process is to write the client.

Thrift Client - Ruby

Writing the thrift ruby client is even easier than writing the server code. If you have not already done so, install the thrift gem by running "gem install thrift" to get the required thrift library code. All the code you need for your client is already generated by thrift. At this point we are only doing what is needed to get the client to communicate with the server.
module ThriftConnection

  class LuceneClient

    def initialize(host='localhost', port=9090)
      socket = Thrift::Socket.new(host, port)
      @transport = Thrift::BufferedTransport.new(socket)
      protocol_factory = ::Thrift::BinaryProtocolFactory.new
      protocol = protocol_factory.get_protocol(@transport)
      @transport.open
      @client = LuceneSearch::Client.new(protocol)
    end

    def search(query)
      @client.search(query)
    end

    def close
      @transport.close
    end

  end
end

Running With Scissors

This section has the odd title "Running With Scissors", because like actual running with scissors, what we are about to do may not be a great idea. In all the thrift generated code there is a warning at the top "DO NOT EDIT UNLESS YOU ARE SURE THAT YOU KNOW WHAT YOU ARE DOING", obviously I don't, but I'm not going to let that stop me at this point (sometimes you just have to see if you can get something to work!) What we've done is implement method_missing in the generated Person class (found in lucene_search_types.rb) so we can specify searches ala ActiveRecord style. What we are going to do to accomplish this is use a regular expression to pull out what fields to search for and use the arguments passed in as the search values. The regular expression here is fairly simple and only aims to handle simple searches.
#added to translate from symbol to expected search format
  SEARCH_KEYS_MAPPING = {:first_name => 'firstName',
                                            :last_name => 'lastName',
                                            :email => 'email',
                                            :address => 'address'}


  def self.method_missing(method_name, *args)
    lucene_client = ThriftConnection::LuceneClient.new
    query = ""
        #handles find_by_first_name etc
    if method_name.to_s =~ /^find_by_([a-z]+_?[a-z]*)$/
      query = "#{SEARCH_KEYS_MAPPING[$1.to_sym]}:#{args[0]}"
        #handles find_by_first_name_or_last_name, find_by_first_name_and_email 
    elsif method_name.to_s =~/^find_by_([a-z]+_[a-z]+)_([a-z]+)_([a-z]+_?[a-z]*)$/
      query ="#{SEARCH_KEYS_MAPPING[$1.to_sym]}:#{args[0]} #{$2.upcase} #{SEARCH_KEYS_MAPPING[$3.to_sym]}:#{args[1]}"
    else
       raise ArgumentError.new("search method pattern #{method_name} not recognized")
    end

    results = lucene_client.search(query)
    lucene_client.close
    results
  end
As we'll see in the next section, this actually worked, but I still view this more as a useful experiment. First of all this was placed in generated code, so any time you make changes you would have to manually get the method_missing definition back into the Person class. Secondly, Lucene search syntax is really not all that hard to learn.

Testing

All of what we have done so far would not be worth much if we could not verify our work with some testing. Here is the unit test to verify that we are indeed able to search a Lucene index from Ruby. To get some names to search on I simply ran
head names.csv
and then used some of the information in various combinations to get counts of what searches should return. For example to get an idea of what searching for a first name of Elizabeth or last name of Krause would return I ran
cat names.csv | grep -iE 'elizabeth|krause' | wc -l 
which returned a count of 289. So, first making sure that our thrift server was running in the background, here is the unit test that was run to verify our Ruby client searching against a Lucene index.
class SearchTest < Test::Unit::TestCase

  def setup
    @lucene_client = ThriftConnection::LuceneClient.new
  end


  def teardown
    @lucene_client.close
  end

  def test_search_client_first_name
    persons = @lucene_client.search("firstName:Tia")
    assert_equal(5, persons.length)

    persons.each do |person|
      assert_equal("Tia", person.firstName)
    end
  end

  def test_search_person_class_first_name
    persons = Person.find_by_first_name("Tia")
    assert_equal(5, persons.length)

    persons.each do |person|
      assert_equal("Tia", person.firstName)
    end
  end

  def test_search_client_first_name_email_domain
    persons = @lucene_client.search("+firstName:Elizabeth +email:*pookmail.com")
    assert_equal(59, persons.length)
  end

  def test_search_person_class_first_name_email_domain
    persons = Person.find_by_first_name_and_email("elizabeth", "*pookmail.com")
    assert_equal(59, persons.length)
  end

  def test_search_client_first_name_and_last_name
    persons = @lucene_client.search("+firstName:Elizabeth +lastName:Krause")
    assert_equal(1, persons.length)
    person = persons[0]

    assert_equal("Elizabeth", person.firstName)
    assert_equal("Krause", person.lastName)
  end

  def test_search_person_class_first_name_and_last_name
    persons = Person.find_by_first_name_and_last_name("elizabeth", "krause")
    assert_equal(1, persons.length)
    person = persons[0]

    assert_equal("Elizabeth", person.firstName)
    assert_equal("Krause", person.lastName)
  end

  def test_search_person_class_first_name_or_last_name
    persons = Person.find_by_first_name_or_last_name("elizabeth", "krause")
    assert_equal(289, persons.length)
  end

  def test_invalid_search
    assert_raises ArgumentError do
      Person.find_person_by_name("tia")
    end
  end

end

Conclusion

Thrift is a compelling alternative for RPC or message passing where one might otherwise be using either REST, Java RMI or middleware (JMS, AMQP). There is a great comparison of how thrift performs against other forms of RPC in this thrift tutorial from OCI found near the end of the article. It is hoped the reader was able to learn something useful. Thanks for your time

Resources

Full source for the blog including the generated code can be found on github. If you are interested in running the test you can download lucene-thrift-example.tar.gz extract the tar file and execute the runSearchTest.sh script. You do not need to have thrift installed to run the test.
  • For more information on thrift the thrift wiki is a great start
  • More information on Lucene can be found here
From time to time I think all developers have done some form of benchmarking. I recently discovered Caliper which is according to the site - "Caliper is Google's open-source framework for writing, running and viewing the results of Java Microbenchmarks". I am aware that micro-benchmarking can be misleading depending on who is writing the tests, but sometimes they are very helpful in getting a feel for how your code is running.

Background

Micro benchmarks are dead simple to do. Get the current time in milliseconds, execute your code, then get the time in milliseconds again and subtract the difference. So why use a tool like Caliper? To me Caliper is great because it has a very familiar JUnit type of structure and feel to it. Instead of trying to describe what Caliper does, it's probably easiest to look at a simple example I put together. I decided to benchmark implementations of Heap Sort,Merge Sort,Quick Sort and the sort method of the java.lang.Arrays class. Not a terribly original idea I admit, but it was quick to do and it makes the point. NOTE: Package statement and imports left out intentionally for brevity
public class SortingBenchmarks extends SimpleBenchmark {
	
	private static final int SIZE = 100000;
	private static final int MAX_VALUE = 80000;
	private int[] values;

	@Override
	protected void setUp() throws Exception {
		values = new int[SIZE];
		Random generator = new Random();
		for (int i = 0; i < values.length; i++) {
			values[i] = generator.nextInt(MAX_VALUE);
		}
	}

	public void timeHeapSort(int reps) {
		for (int i = 0; i < reps; i++) {
			HeapSort.sort(values);
		}
	}

	public void timeMergeSort(int reps) {
		for (int i = 0; i < reps; i++) {
			MergeSort.sort(values);
		}
	}
	
	public void timeQuickSort(int reps) {
		for (int i = 0; i < reps; i++) {
			QuickSort.sort(values);
		}
	}

	public void timeArraysSort(int reps) {
		for (int i = 0; i < reps; i++) {
			Arrays.sort(values);
		}
	}
}

Getting Started

Here are the basic steps to writing benchmarks in Caliper:
  1. Extend the class SimpleBenchmark
  2. Do any test setup/clean up in respective setUp or tearDown methods (Similar to JUnit setUp/tearDown)
  3. Write the methods that will execute the code to benchmark starting with the word "time" (Again similar to JUnit, just "time" instead of "test")
  4. Place the code you want to benchmark inside your timeSomeOperation methods

Getting and Running Caliper

To get started with Caliper
  1. Go here to get a read-only svn link to the source code
  2. Check out the code then cd into <CALIPER_INSTALL_DIR> and run ant (obviously you need ant installed and on your path)
To actually run your benchmarks there is a bash script included in the project that you can use to run Caliper from the command line. I took a little different approach as I wanted to run from inside my Eclipse project.
  1. In <CALIPER_INSTALL_DIR>/build/caliper-0.0/lib/ there are two jar files, allocation.jar and caliper-0.0.jar. These will need to be on the classpath of your project. I chose to create a user library in Eclipse
  2. JUnit will also need to be on the project classpath. Again for me it's referenced in a user library
  3. I created a driver class that simply executes the main method of the com.google.caliper.Runner class. I pass in the name of my benchmark class by setting up a run configuration in Eclipse for my driver class
Here is the code for my driver class:
package bbejeck.caliper;

public class CaliperRunner {

    public static void main(String[] args) {
	com.google.caliper.Runner.main(args[0]);
    }

}
After running the driver this is the output received in the Eclipse console screen:
0% Scenario{vm=java, trial=0, benchmark=HeapSort} 10889428.57 ns; ?=69810.62 ns @ 3 trials
25% Scenario{vm=java, trial=0, benchmark=MergeSort} 9066618.18 ns; ?=44341.70 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=QuickSort} 3312312.93 ns; ?=21028.91 ns @ 3 trials
75% Scenario{vm=java, trial=0, benchmark=ArraysSort} 3104668.79 ns; ?=23965.54 ns @ 3 trials

 benchmark    ms logarithmic runtime
  HeapSort 10.89 ==============================
 MergeSort  9.07 =========================
 QuickSort  3.31 ==
ArraysSort  3.10 =

vm: java
trial:

Conclusion

Caliper is fairly new and is still has some work to be done, but I think that it's a great tool that could be very useful. Another potentially useful feature is that Caliper can automatically publish your benchmarks. Full source code for my examples can be found on github

JEE 6 and Spring MVC

| | Comments (1) | TrackBacks (0)
With the release of JEE 6 and the Servlet 3.0 specification came support for asynchronous servlets. While continuations and Comet are not new, the fact that it is now part of the servlet specification, and could be "baked in" to an application, piqued my curiosity. Although I have not used plain servlets in development for some time, I have been using Spring MVC. So I wanted to see what would happen if I added asynchronous support to a Spring-MVC DispatcherServlet. I created a very simple web application using Spring version 3.0.2 and annotation configuration. I would like to make clear this is purely an experiment and is not being used production.

Environment Used

For this blog I'm using:
  • a T61 Thinkpad running Ubuntu 10.10(64Bit) and 4G ram
  • Java JDK 1.6.0_21
  • Glassfish Version 3. (I tried Tomcat 7 and Jetty 8, but had the best luck at this point with Glassfish)

DispatcherServlet Code

My first step was to extend Spring's DispatcherServlet.
@WebServlet(urlPatterns = {"/async/*"}, asyncSupported = true, name = "async")
public class AsyncDispatcherServlet extends DispatcherServlet {
    private ExecutorService executor;
    private static final int NUM_ASYNC_TASKS = 15;
    private static final long TIME_OUT = 10 * 1000;
    private final Log log = LogFactory.getLog(AsyncDispatcherServlet.class);
@Override
    public void init(ServletConfig config) throws ServletException {
        super.init(config);
        executor = Executors.newFixedThreadPool(NUM_ASYNC_TASKS);
    }

    @Override
    public void destroy() {
        executor.shutdownNow();
        super.destroy();
    }

    @Override
    protected void doDispatch(final HttpServletRequest request, final HttpServletResponse response) throws Exception {
        final AsyncContext ac = request.startAsync(request, response);
        ac.setTimeout(TIME_OUT);
        FutureTask task = new FutureTask(new Runnable() {

            @Override
            public void run() {
                try {
                    log.debug("Dispatching request " + request);
                    AsyncDispatcherServlet.super.doDispatch(request,response );
                    log.debug("doDispatch returned from processing request " + request);
                    ac.complete();
                } catch (Exception ex) {
                    log.error("Error in async request", ex);
                }
            }
        }, null);

        ac.addListener(new AsyncDispatcherServletListener(task));
        executor.execute(task);
    }
The only methods overridden were init, destroy and doDispatch. I won't go into detail on init and destroy, what they do is obvious. All the interesting work is done in doDispatch. The doDispatch method starts the asynchronous request, then wraps the call to super.doDispatch in a runnable and passes that into an executor service. There are a few key points to consider here:
  1. The @WebServlet annotation at the class definition level. This is part of Servlet 3.0 specification, you can now declare servlets, filters etc via annotations, although you can still use web.xml. To enable asynchronous support set the 'asyncSupported' attribute to true.
  2. On line 22 the setting of a timeout for the asyncContext object. In this case the timeout is 10 seconds
  3. On line 38 setting an AsyncContextEventListener.
  4. The application server thread returns almost immediately.

Listener Code

Getting the request to run asynchronously is only half the battle. The other half is setting up hooks to handle different events during the life-cycle of the asynchronous request. The Servlet 3.0 spec added the AsyncListener interface. AsyncListener has 4 methods, onStartAsync, onComplete, onError and onTimeout. For the AsyncDispatcherServlet we have the inner class AsyncDispatcherServletListener that takes a FutureTask object as a constructor argument.
private class AsyncDispatcherServletListener implements AsyncListener {

        private FutureTask futureTask;

        public AsyncDispatcherServletListener(FutureTask futureTask) {
            this.futureTask = futureTask;
        }

        @Override
        public void onTimeout(AsyncEvent event) throws IOException {
            log.warn("Async request did not complete timeout occured");
            handleTimeoutOrError(event, "Request timed out");
        }

        @Override
        public void onComplete(AsyncEvent event) throws IOException {
            log.debug("Completed async request");
        }

        @Override
        public void onError(AsyncEvent event) throws IOException {
            log.error("Error in async request", event.getThrowable());
            handleTimeoutOrError(event, "Error processing " + event.getThrowable().getMessage());
        }

        @Override
        public void onStartAsync(AsyncEvent event) throws IOException {
            log.debug("Async Event started..");
        }

        private void handleTimeoutOrError(AsyncEvent event, String message) {
            PrintWriter writer = null;
            try {
                future.cancel(true);
                HttpServletResponse response = (HttpServletResponse) event.getAsyncContext().getResponse();
                //HttpServletRequest request = (HttpServletRequest) event.getAsyncContext().getRequest();
                //request.getRequestDispatcher("/app/error.htm").forward(request, response);
                writer = response.getWriter();
                writer.print(message);
                writer.flush();
            } catch (IOException ex) {
                log.error(ex);
            } finally {
                event.getAsyncContext().complete();
                if (writer != null) {
                    writer.close();
                }
            }
        }
    }
The onStartAsync and onComplete methods merely log a statement, but certainly could be used to open and close resources respectively. The only methods that do any work are onTimeout and onError, delegating to the handleTimeoutOrError method, passing a message and the AsyncEvent object. In handleTimeoutOrError we will call cancel on the futureTask object, write the message to the response stream, then mark the asyncContext as completed. While we are writing the error directly to the response stream, we could have just as easily forwarded to an error page by using the commented out call to request.getRequestDispatcher().forward (obviously you would eliminate lines 38-40).

Web Application Structure

This web application is very simple and has only two controllers - SimpleViewControler and SearchController.


@Controller
public class SimpleViewController {
    
    @RequestMapping({"/","/index.htm"})
    public String showHome(){
        return "index";
    }

    @RequestMapping({"/error.htm"})
    public String error(){
        return "error";
    }
@Controller
public class SearchController {

    @RequestMapping("/search.htm")
    public String doSearch(@RequestParam(value = "latency", defaultValue = "2000") long latency,
                                        @RequestParam(value = "blowup", defaultValue = "false") boolean blowUp,
                                        Model model) throws Exception {

        String searchResult = getSearchResult(latency, blowUp);

        model.addAttribute("result", searchResult);
        return "searchResult";
    }

    @RequestMapping("/search.ajax")
    public void doSearchAjax(@RequestParam(value = "latency", defaultValue = "2000") long latency,
            @RequestParam(value = "blowup", defaultValue = "false") boolean blowUp,
            HttpServletResponse response) throws Exception {

        String searchResult = getSearchResult(latency, blowUp);
        
        PrintWriter writer = null;
        try {
            writer = response.getWriter();
            writer.print(searchResult);
            writer.flush();
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
    }

    private String getSearchResult(long latency, boolean blowUp) throws Exception {

        if (blowUp) {
            throw new RuntimeException("Bad error happened in controller");
        }

        Thread.sleep(latency);

        StringBuilder builder = new StringBuilder("Some search/whatever results being returned");
        Date now = new Date();
        builder.append(" @").append(now);
        
        return builder.toString();
    }
The latency and blowup parameters in SearchController are used to simulate different response times and errors respectively. Although there is the doSearchAjax method which writes directly to the response stream, in the tests that are run, we will only be using the doSearch method. There are the usual context files, which are very light due to the annotation configuration and a web.xml file (needed for the regular DispatcherServlet).

Testing

Now it's time to see if this experiment works at all. JMeter is a great tool and it is what I used to load test our simple web application. I have set up three tests.
  1. A "control" test - There are two thread-groups consisting of 50 threads each and will ramp up to run all threads in 3 seconds. One thread group will make requests to /app/index.htm and the other thread group will make requests to /app/search.htm. The thread-groups will execute simultaneously and loop 3 times for a total of 300 requests. Each thread-group has a "listener" attached to it to measure throughput, and there is a listener attached to the test to measure overall throughput. This test will give us our baseline. The requests to /app/search.htm will not set any parameters, so each request will have the default value of 2 seconds for latency.
  2. The "asynchronous" test - This test will measure the effect of using asynchronous servlets in the application. Setup is identical to the control test above with one exception - the search requests will go to /async/search.htm and hit the AsynchronousDispatcherServlet.
  3. An error condition test - This test will be structured a little differently. The thread-group for /app/index.htm has a longer ramp up time, but will remain the same otherwise. The thread group for /async/search.htm will add a JMeter option known as a 'RandomController'. There will be 3 possible search requests sent, a valid request, a request with the latency parameter set to 12 seconds causing a timeout and a request with the blowup parameter set to true, so a RuntimeException will be thrown.

Test Results

Control Test
Request Throughput
Overall 296.174/minute
Index 164.489/minute
Search 148.569/minute
Asynchronous Test
RequestThroughput
Overall849.177/minute
Index2,909.796/minute
Search429.84/minute
Error Test
RequestThroughput
Overall115.848/minute
Index2,803.738/minute
Search57.974/minute

As we can see from the test results, sending the search requests through the AsyncronousRequestDispatcher increased application throughput. The request per minute numbers don't mean that much though, given that the web application was so simple and the test was very contrived. What matters more is that the index requests had roughly the same response time and were seemingly unaffected when asynchronous support was used for the search requests

Summary

For me there were two main takeaways from this experiment:

  • Even though asynchronous support seemed help with throughput, it is still using a thread pool which consumes server resources, so it should be only be applied to very select parts of an application.
  • By setting timeouts and getting a chance to handle them gracefully via the event listener, asynchronous support acts a "circuit breaker" of sorts. This could be valuable when your application makes requests to outside resources that may be down or otherwise unresponsive.

Resources

Source for everything is available on github.
To run the JMeter tests

  1. Download JMeter and extract the tar/zip file to some directory
  2. Copy all of the *.jmx files in the jmeter directory from the github site for the code into <JMeter install>/bin. From the bin directory run jmeter or jmeter.bat depending on your platform. Once JMeter is up and running select File and you should see AsyncWebTestControl.jmx, AsyncWebTestErrors.jmx, AsyncWebTest.jmx in the File menu. Just click on one of those to open then Ctrl+r to run a test
  3. Download the war file and deploy to glassfish. I placed the war file in the autodeploy directory in glassfish. On my laptop it's in /usr/local/servers/glassfishv3/glassfish/domains/domain1/autodeploy.

Learning ANTLR part I

| | Comments (2) | TrackBacks (0)
This year one of my goals is to try and become proficient in using ANTLR. I think that learning to translate text or build an external DSL is skill that, although not used everyday, will be very useful to know. For my first attempt I settled on something fairly easy, a SQL like grammar that could be used to search for files and the content within those files. You should also be able to narrow the search results based on when the file was last modified. My goal is to take something like the following:
select * from /logs where file="*.out" and pattern="foobar" and modified < 2 days ago
select * from /logs where file='*.out' and pattern='foobar' and modified between 20 and 30 minutes ago
and translate it to the corresponding find command and pipe the results to xargs and grep:
find /logs -name '*.out' -mtime -2 | xargs grep 'foobar'
find /logs -name '*.out' -mmin +20 -mmin -30 | xargs grep 'foobar'
As an aside, if you are not familiar with xargs, check out this xargs tutorial or the xargs man pages , it's a great utility that executes a command with the output of a previous command.

Disclaimer

Now before the villagers gather up with torches and pitch forks to run me out of town (I'm channeling Young Frankenstein here), I would like to make somewhat of a disclaimer. I am not suggesting a new language or discouraging learning the *nix command line tools. The point here is to learn ANTLR. I found it more interesting to translate something I use everyday on my current project, versus some of the other "Hello World" ANTLR examples I have seen. So other than a using this grammar as a learning exercise, I don't see it as being useful.

Introduction

ANTLR is a deep topic, so obviously one blog post can not go into any great detail. So what follows is not in-depth coverage of ANTLR, but a detailed description of the grammar developed. I will explain each section as well as some of the decisions and trade-offs I made. For my development environment I'm using:
  1. Eclipse 3.5.1
  2. Java 6
  3. The ANTLR IDE plugin for Eclipse. You could also use ANTLRWorks, the gui development environment for ANTLR. ANTLRWorks is an excellent tool, I just felt more comfortable to do this work in Eclipse.
  4. ANTLR version 3.2
  5. Mac OS X 10.6.2.
So with all of that out of the way, let's get started looking at the grammar.

options, @header

grammar FQL;
options {
     language = Java;
}
@header {
     package bbejeck.antlr.fql;
}
Here I am specifying a combined grammar named FQL. (FQL is short for File Query Language and yes, I know the name sucks) In options I'm specifying that I want the generated code to be Java. I could have also specified C,C++ or Python here as well. ANTLR also has support for generating code in Ruby, but with the version I am using (v 3.2) I could not get it to work. I did find ANTLR Ruby. I have not tried it out, but from the documentation it looks promising. The @header option is setting the package for the generated parser code. This is also where I would have specified any needed imports.

@members

The @members section is where you place instance variables and methods that will be placed and used in the generated parser. Most likely the code in the members section will be used in embedded actions in the parser rules.
 @members {
  private StringBuilder findBuilder = new StringBuilder("find ");
  
  private StringBuilder filter = new StringBuilder();
  
  private void addString(String s){
    if(s!=null){
        findBuilder.append(s);
     }
  }
  
  private String buildTimeArg(String s, String snum, String sign){
       StringBuilder timeBuilder = new StringBuilder();
       int num = Integer.parseInt(snum);
       
       if(s.equals("days")){
           return timeBuilder.append(" -mtime ").append(sign).append(num).toString();
       }
       if(s.equals("hours")){
           return timeBuilder.append(" -mmin ").append(sign).append((num*60)).toString();
       }
       
       return timeBuilder.append(" -mmin ").append(sign).append(num).toString();
  }
  
  protected void mismatch(IntStream input, int ttype, BitSet follow) throws RecognitionException{
        throw new MismatchedTokenException(ttype,input);
  }
  
  public Object recoverFromMismatchedSet(IntStream input, RecognitionException e, BitSet follow) throws RecognitionException{
     throw e;
  }
  
}
The two StringBuilders findBuilder and filter will be used by embedded actions to build up our translated query. The reason for two StringBuilders will be explained when we cover the parsing rules. The addString method is to check for optional tokens that could be null. I could have easily checked for null in the embedded code within each rule, but I felt it cluttered the grammar too much. The buildTimeArg method is used as sort of a poor man's symbol table to translate the modified clause to the proper time format for the mmin or mtime arguments. The final two methods override how the generated parser responds to recognition errors (the generated parser extends ANTRL's Parser class which in turn extends the BaseRecognizer class). By default ANTLR will recover from recognition errors and continue on, trying to read more tokens if available. But in this grammar, if there is a recognition error along the way I want to stop processing right there.

@rulecatch

Each parser rule is converted into a method call in the generated parser with a try - catch block surrounding the parsing code. The catch statement here will be embedded in each one of the try-catch blocks in the parser.
@rulecatch{
    catch (RecognitionException e){
            throw e;
      }
}
If you remember from the previous section we want to stop parsing stop when RecognitionExceptions are encountered, so we re-throw the caught exception.

@lexer::header

Here we are specifying the package for the generated lexer.
@lexer::header {
  package bbejeck.antlr.fql;
}
Now let's move on to the parsing rules.

Parsing Rules

evaluate returns [String query]
      :  query';' {$query = builder.toString() + filter.toString() ;}
      ;

query
       :   select_stmt where_stmt
       ;

select_stmt
      :  'select' '*' 'from' directory
      ;
Here evaluate is our top level rule and returns a String, translated and built as the input is parsed. Anything within the curly braces is code that will be embedded in the generated parser. Note how we reference query from the grammar by placing a '$' before the word 'query'. Also note that the string returned is a concatenation from the two StringBuilders we declared in the @members section. The query rule is comprised of a select_stmt followed by a where_stmt. The select_stmt is "select * from" followed by the directory rule.
directory
       : (p='.'{addString($p.text);} | (p='/'?{addString($p.text);}IDENT{addString($IDENT.text);})+ )
       ;
The directory rule accepts either a '.', a relative or an absolute path. If the first expression is not provided there must be at least one path expression denoted by the '+'. The variable 'p' is used to give a handle to the '.' or '/' token so it can be extracted . IDENT is a lexer rule which will be explained a little bit later. All tokens here are passed into the addString method defined in the members section.
where_stmt
       :  ('where'  clause ('and' clause)* ) ?
       ;
clause
       : file_name
       | pattern
       | modified
       ;
The where_stmt rule expects the string 'where' followed by 0 or more clauses. Also the entire where_stmt is optional. Here I chose form over substance. By that I mean the grammar as it stands here will allow multiple clause's that would not make sense, i.e multiple file_name arguments etc. I could have specified an exact order of clauses that would have also effectively set the limit of clauses entered, but I would rather the grammar be flexible and trust that the user knows what they want to do.
  
file_name
       : 'file'  '=' STRING_LITERAL
         {addString(" -name ");addString($STRING_LITERAL.text);}
       ;

pattern
       :   'pattern'  '=' STRING_LITERAL
             { filter.append(" | xargs grep  ").append($STRING_LITERAL.text); }
       ;
The file_name rule sets the -name argument again using the addString method. The lexer rule STRING_LITERAL will accept whatever the user inputs. The pattern rule builds up the grep command. Here we see the use of the second StringBuilder filter that was defined in the @members section. I feel that having a second StringBuilder to capture text for the grep filter is a hack. The issue is that the grep command needs to be last in our translated query, but I really want the where statement to be in any order. So by placing the tokens captured by the pattern rule in a separate StringBuilder I can easily guarantee the grep statement will be last.
modified
       :  modified_less
       |  modified_more
       |  modified_between
       ;
The modified rule has three options. This portion builds the mmin/mtime argument(s) for the find command.
   
modified_less
       :   'modified'  '<'  INTEGER time_span                             
           { addString(buildTimeArg($time_span.text,$INTEGER.text,"-")); }                     
       ; 
  
modified_more                     
       :   'modified'  '>' INTEGER time_span
           { addString(buildTimeArg($time_span.text,$INTEGER.text,"+")); }
       ;

modified_between
       :   'modified' 'between' int1=INTEGER 'and' int2=INTEGER time_span
            { addString(buildTimeArg($time_span.text,$int1.text,"+")); }
            { addString(buildTimeArg($time_span.text,$int2.text,"-")); }
       ;
The grammar allows you to specify searching by the time a file was last modified. Here we use the method buildTimeArg to translate the input to the correct argument for either mmin (minutes modified) or mtime (days modified). Also take note of setting the two variables int1 and int2. Those are used to disambiguate which INTEGER token to use.
time_span
       :   'days'
       |   'minutes'
       |   'hours'
       ;
The time_span rule allows input of days, minutes or hours. The hours argument is converted into minutes by the buildTimeArg method. That's it for the parsing rules, now on to the lexer rules.

Lexer Rules

fragment DIGIT : '0'..'9';
fragment LETTER : 'a'..'z'|'A'..'Z' ;

STRING_LITERAL : '\''.*'\'';
INTEGER : DIGIT+ ;
IDENT : LETTER(LETTER | DIGIT)* ;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+  {$channel=HIDDEN;};
DIGIT and LETTER are not lexer rules, as you can see by the fragment definition. These are used for making the grammar more readable. In the WS definition the {$channel=HIDDEN;} is used to ignore whitespace in the input.

Test Code

I used the following code to test the grammar from the command line:
public class FQLTester {

public static void main(String[] args) throws Exception{
     BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
     String line = null;
     System.out.println("Enter your search:");
     while((line = reader.readLine())!= null){
         if(line.equalsIgnoreCase("quit")){
            System.exit(0);
         }
        CharStream charstream = new ANTLRStringStream(line);
        FQLLexer lexer = new FQLLexer(charstream);

        TokenStream tokenStream = new CommonTokenStream(lexer);
        FQLParser parser = new FQLParser(tokenStream);

        String parsed = null;
        try{
            parsed = parser.evaluate();
            System.out.println("parsed query is ["+parsed+"]");
            Process process = Runtime.getRuntime().exec(new String[]{"sh","-c",parsed});
            InputStream input = process.getInputStream();
            BufferedReader procReader = new BufferedReader(new InputStreamReader(input));
            String searchResults = null;
            while((searchResults=procReader.readLine())!=null){
                  System.out.println(searchResults);
            }
        }catch(Exception e){
               e.printStackTrace();
        }
      System.out.println("Enter your search:");
    }
}
Since this blog is just scratching the surface as far as ANTLR's capabilities are concerned, I plan to be writing more about ANTLR in the near future. Full source code for everything presented is available here. More resources for learning ANTLR are: That's it for now, thanks for your time.