
Learning ANTLR part I

This year one of my goals is to become proficient in using ANTLR. I think that learning to translate text or build an external DSL is a skill that, although not used every day, will be very useful to know. For my first attempt I settled on something fairly easy: a SQL-like grammar that can be used to search for files and the content within those files. You should also be able to narrow the search results based on when a file was last modified. My goal is to take something like the following:
select * from /logs where file='*.out' and pattern='foobar' and modified < 2 days ago
select * from /logs where file='*.out' and pattern='foobar' and modified between 20 and 30 minutes ago
and translate it to the corresponding find command and pipe the results to xargs and grep:
find /logs -name '*.out' -mtime -2 | xargs grep 'foobar'
find /logs -name '*.out' -mmin +20 -mmin -30 | xargs grep 'foobar'
As an aside, if you are not familiar with xargs, check out this xargs tutorial or the xargs man pages. It's a great utility that builds and executes a command, using the output of a previous command as that command's arguments.
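To make the xargs step concrete, here is a small Java sketch of what it does with find's output (an illustration only, not how xargs is implemented): every whitespace-separated word in the previous command's output becomes an extra argument to the command. The class and method names are made up, and real xargs also honors quoting and splits very long input into several invocations, which this sketch skips.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class XargsSketch {

    // Appends each whitespace-separated word of the previous command's output
    // to the base command, the way "xargs grep 'foobar'" turns a list of
    // file names into a single grep invocation.
    static List<String> buildCommand(List<String> baseCommand, String previousOutput) {
        List<String> command = new ArrayList<>(baseCommand);
        for (String word : previousOutput.trim().split("\\s+")) {
            if (!word.isEmpty()) {   // split on an empty string yields one empty word
                command.add(word);
            }
        }
        return command;
    }

    public static void main(String[] args) {
        String findOutput = "/logs/app.out\n/logs/db.out\n";
        List<String> cmd = buildCommand(Arrays.asList("grep", "foobar"), findOutput);
        System.out.println(cmd); // [grep, foobar, /logs/app.out, /logs/db.out]
    }
}
```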

Disclaimer

Now, before the villagers gather up with torches and pitchforks to run me out of town (I'm channeling Young Frankenstein here), I would like to make a disclaimer of sorts. I am not suggesting a new language or discouraging anyone from learning the *nix command line tools. The point here is to learn ANTLR. I found it more interesting to translate something I use every day on my current project than to follow some of the other "Hello World" ANTLR examples I have seen. So, other than using this grammar as a learning exercise, I don't see it as being useful.

Introduction

ANTLR is a deep topic, so obviously one blog post cannot go into great detail. What follows is not in-depth coverage of ANTLR, but a detailed description of the grammar I developed. I will explain each section as well as some of the decisions and trade-offs I made. For my development environment I'm using:
  1. Eclipse 3.5.1
  2. Java 6
  3. The ANTLR IDE plugin for Eclipse. You could also use ANTLRWorks, the GUI development environment for ANTLR. ANTLRWorks is an excellent tool; I just felt more comfortable doing this work in Eclipse.
  4. ANTLR version 3.2
  5. Mac OS X 10.6.2.
So with all of that out of the way, let's get started looking at the grammar.

options, @header

grammar FQL;
options {
     language = Java;
}
@header {
     package bbejeck.antlr.fql;
}
Here I am specifying a combined grammar named FQL (short for File Query Language; yes, I know the name sucks). In options I'm specifying that I want the generated code to be Java. I could also have specified C, C++ or Python here. ANTLR also has support for generating code in Ruby, but with the version I am using (v3.2) I could not get it to work. I did find ANTLR Ruby; I have not tried it out, but from the documentation it looks promising. The @header action sets the package for the generated parser code. This is also where I would have specified any needed imports.

@members

The @members section is where you declare instance variables and methods that will be added to the generated parser. Most likely the code in the @members section will be used by embedded actions in the parser rules.
 @members {
  private StringBuilder findBuilder = new StringBuilder("find ");
  
  private StringBuilder filter = new StringBuilder();
  
  private void addString(String s){
    if(s!=null){
        findBuilder.append(s);
     }
  }
  
  private String buildTimeArg(String s, String snum, String sign){
       StringBuilder timeBuilder = new StringBuilder();
       int num = Integer.parseInt(snum);
       
       if(s.equals("days")){
           return timeBuilder.append(" -mtime ").append(sign).append(num).toString();
       }
       if(s.equals("hours")){
           return timeBuilder.append(" -mmin ").append(sign).append((num*60)).toString();
       }
       
       return timeBuilder.append(" -mmin ").append(sign).append(num).toString();
  }
  
  protected void mismatch(IntStream input, int ttype, BitSet follow) throws RecognitionException{
        throw new MismatchedTokenException(ttype,input);
  }
  
  public Object recoverFromMismatchedSet(IntStream input, RecognitionException e, BitSet follow) throws RecognitionException{
     throw e;
  }
  
}
The two StringBuilders, findBuilder and filter, are used by embedded actions to build up the translated query. The reason for having two StringBuilders will be explained when we cover the parsing rules. The addString method guards against optional tokens that could be null. I could easily have checked for null in the embedded code within each rule, but I felt it cluttered the grammar too much. The buildTimeArg method serves as a sort of poor man's symbol table, translating the modified clause into the proper time format for the -mmin or -mtime arguments. The final two methods override how the generated parser responds to recognition errors (the generated parser extends ANTLR's Parser class, which in turn extends the BaseRecognizer class). By default ANTLR recovers from recognition errors and carries on, trying to read more tokens if available. But in this grammar, if there is a recognition error along the way, I want to stop processing right there.

@rulecatch

Each parser rule is converted into a method in the generated parser, with a try-catch block surrounding the parsing code. The catch clause given here replaces the default catch clause in each of those try-catch blocks.
@rulecatch{
    catch (RecognitionException e){
            throw e;
      }
}
As mentioned in the previous section, we want to stop parsing when a RecognitionException is encountered, so we rethrow the caught exception.
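For readers who haven't looked at generated ANTLR code, the Java sketch below approximates the two shapes a rule method can take. It is a simplified, self-contained imitation rather than real ANTLR output: ParseError is a hypothetical stand-in for RecognitionException, and match is a made-up helper. By default, ANTLR 3's catch clause reports the error and recovers (reportError followed by recover), so parsing continues; with the @rulecatch above, the exception escapes the rule method and the parse stops.

```java
import java.util.ArrayList;
import java.util.List;

public class RuleCatchSketch {

    // Hypothetical stand-in for org.antlr.runtime.RecognitionException,
    // so the sketch compiles without the ANTLR runtime on the classpath.
    static class ParseError extends Exception {
        ParseError(String message) { super(message); }
    }

    // Stand-in for matching a single expected token.
    static String match(String expected, String actual) throws ParseError {
        if (!expected.equals(actual)) {
            throw new ParseError("expected '" + expected + "' but found '" + actual + "'");
        }
        return actual;
    }

    // Roughly the default shape: the error is reported and the rule
    // returns normally, so the caller keeps parsing.
    static String defaultRule(String token, List<String> reported) {
        try {
            return match("select", token);
        } catch (ParseError e) {
            reported.add(e.getMessage());
            return null;
        }
    }

    // The shape once @rulecatch replaces the catch clause: the exception
    // propagates and the whole parse stops at the first error.
    static String rethrowingRule(String token) throws ParseError {
        try {
            return match("select", token);
        } catch (ParseError e) {
            throw e;
        }
    }

    public static void main(String[] args) {
        List<String> reported = new ArrayList<>();
        System.out.println(defaultRule("selct", reported)); // null: error only reported
        System.out.println(reported);
        try {
            rethrowingRule("selct");
        } catch (ParseError e) {
            System.out.println("parse aborted: " + e.getMessage());
        }
    }
}
```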

@lexer::header

Here we are specifying the package for the generated lexer.
@lexer::header {
  package bbejeck.antlr.fql;
}
Now let's move on to the parsing rules.

Parsing Rules

evaluate returns [String query]
      :  query ';' {$query = findBuilder.toString() + filter.toString();}
      ;

query
       :   select_stmt where_stmt
       ;

select_stmt
      :  'select' '*' 'from' directory
      ;
Here evaluate is our top-level rule and returns a String that is translated and built as the input is parsed. Anything within curly braces is code that will be embedded in the generated parser. Note how the action refers to the return value query by placing a '$' before its name. Also note that the string returned is the concatenation of the two StringBuilders we declared in the @members section, and that the input must end with a ';'. The query rule is made up of a select_stmt followed by a where_stmt. The select_stmt is "select * from" followed by the directory rule.
directory
       : (p='.'{addString($p.text);} | (p='/'?{addString($p.text);}IDENT{addString($IDENT.text);})+ )
       ;
The directory rule accepts either a '.' or a relative or absolute path. If the first alternative is not taken, there must be at least one path element, as denoted by the '+'. The label p gives us a handle to the '.' or '/' token so its text can be extracted. IDENT is a lexer rule that will be explained a little later. All tokens here are passed to the addString method defined in the @members section.
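As a rough cross-check of what the directory rule accepts, here is a regular-expression sketch in Java. It works on the raw text with no whitespace allowed, whereas in the real grammar the lexer hides whitespace before the parser sees anything; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class DirectoryRuleSketch {

    // The directory rule accepts a lone '.', or one or more IDENTs each
    // optionally preceded by '/'. Because adjacent IDENTs in raw text just
    // run together, that language can be written without nested quantifiers:
    // an optional leading '/', an IDENT, then zero or more '/' IDENT pairs.
    // IDENT is LETTER (LETTER | DIGIT)*, as in the lexer rules.
    private static final Pattern DIRECTORY =
            Pattern.compile("\\.|/?[A-Za-z][A-Za-z0-9]*(/[A-Za-z][A-Za-z0-9]*)*");

    static boolean isDirectory(String text) {
        return DIRECTORY.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {".", "/logs", "logs/archive", "/var/log", "/", "/logs/", "/app-1.0"};
        for (String s : samples) {
            System.out.println(s + " -> " + isDirectory(s));
        }
    }
}
```

The last three samples are rejected: a bare '/', a trailing '/', and characters such as '-' that IDENT does not allow.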
where_stmt
       :  ('where'  clause ('and' clause)* ) ?
       ;
clause
       : file_name
       | pattern
       | modified
       ;
The where_stmt rule expects the keyword 'where' followed by one or more clauses separated by 'and', and the entire where_stmt is optional. Here I chose form over substance. By that I mean that the grammar as it stands will allow clauses that would not make sense together, e.g., multiple file_name arguments. I could have specified an exact order of clauses, which would also have effectively limited the number of clauses entered, but I would rather keep the grammar flexible and trust that users know what they want to do.
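To show what "one or more clauses, and the whole thing optional" means in practice, here is a hand-written recursive-descent sketch of where_stmt in Java, loosely the kind of code ANTLR generates for such a rule. It is a simplification under stated assumptions: the input is an already-tokenized list of strings, and clause only handles file and pattern (the modified alternatives are left out). All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class WhereClauseSketch {

    private final List<String> tokens;
    private int pos = 0;

    WhereClauseSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    private String next() {
        String token = peek();
        if (token != null) {
            pos++;
        }
        return token;
    }

    // where_stmt : ('where' clause ('and' clause)*)? ;
    // Returns how many clauses were parsed.
    int whereStmt() {
        if (!"where".equals(peek())) {
            return 0;                 // the whole where_stmt is optional
        }
        next();                       // consume 'where'
        clause();                     // at least one clause is required
        int clauses = 1;
        while ("and".equals(peek())) {
            next();                   // consume 'and'
            clause();
            clauses++;
        }
        return clauses;
    }

    // Simplified clause: ('file' | 'pattern') '=' STRING_LITERAL.
    private void clause() {
        String keyword = next();
        if (!"file".equals(keyword) && !"pattern".equals(keyword)) {
            throw new IllegalStateException("expected 'file' or 'pattern' but found " + keyword);
        }
        if (!"=".equals(next())) {
            throw new IllegalStateException("expected '='");
        }
        String literal = next();
        if (literal == null || !literal.startsWith("'")) {
            throw new IllegalStateException("expected a quoted literal");
        }
    }

    public static void main(String[] args) {
        List<String> twoClauses = Arrays.asList(
                "where", "file", "=", "'*.out'", "and", "pattern", "=", "'foobar'");
        System.out.println(new WhereClauseSketch(twoClauses).whereStmt());              // 2
        System.out.println(new WhereClauseSketch(Arrays.<String>asList()).whereStmt()); // 0
        try {
            new WhereClauseSketch(Arrays.asList("where")).whereStmt();
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A bare 'where' is rejected, because once the keyword is present at least one clause must follow.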
  
file_name
       : 'file'  '=' STRING_LITERAL
         {addString(" -name ");addString($STRING_LITERAL.text);}
       ;

pattern
       :   'pattern'  '=' STRING_LITERAL
             { filter.append(" | xargs grep  ").append($STRING_LITERAL.text); }
       ;
The file_name rule sets the -name argument, again using the addString method. The lexer rule STRING_LITERAL accepts whatever the user enters between single quotes, and the quotes are kept as part of the token text. The pattern rule builds up the grep command. Here we see the second StringBuilder, filter, that was defined in the @members section. I feel that having a second StringBuilder to capture text for the grep filter is a hack. The issue is that the grep command needs to be last in the translated query, but I want the clauses of the where statement to be allowed in any order. By placing the tokens captured by the pattern rule in a separate StringBuilder, I can easily guarantee the grep command will be last.
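The two-builder idea can be tried outside ANTLR with a small Java sketch. The method names fileClause, patternClause and timeArg are hypothetical stand-ins for the actions in the file_name, pattern and modified rules; the point is that even when the pattern clause arrives first, the grep filter still ends up last.

```java
public class TwoBuilderSketch {

    // Mirrors the findBuilder/filter split from the @members section:
    // find arguments accumulate in one builder, the grep filter in another,
    // and the filter is only appended once every clause has been seen.
    private final StringBuilder findBuilder = new StringBuilder("find ");
    private final StringBuilder filter = new StringBuilder();

    TwoBuilderSketch(String directory) {
        findBuilder.append(directory);
    }

    void fileClause(String literal) {
        findBuilder.append(" -name ").append(literal);
    }

    void patternClause(String literal) {
        filter.append(" | xargs grep ").append(literal);
    }

    void timeArg(String arg) {
        findBuilder.append(arg);
    }

    String result() {
        return findBuilder.toString() + filter.toString();
    }

    public static void main(String[] args) {
        // Clauses arrive with the pattern first.
        TwoBuilderSketch q = new TwoBuilderSketch("/logs");
        q.patternClause("'foobar'");
        q.fileClause("'*.out'");
        q.timeArg(" -mtime -2");
        System.out.println(q.result());
        // find /logs -name '*.out' -mtime -2 | xargs grep 'foobar'
    }
}
```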
modified
       :  modified_less
       |  modified_more
       |  modified_between
       ;
The modified rule has three alternatives. This portion builds the -mmin/-mtime argument(s) for the find command.
   
modified_less
       :   'modified'  '<'  INTEGER time_span                             
           { addString(buildTimeArg($time_span.text,$INTEGER.text,"-")); }                     
       ; 
  
modified_more                     
       :   'modified'  '>' INTEGER time_span
           { addString(buildTimeArg($time_span.text,$INTEGER.text,"+")); }
       ;

modified_between
       :   'modified' 'between' int1=INTEGER 'and' int2=INTEGER time_span
            { addString(buildTimeArg($time_span.text,$int1.text,"+")); }
            { addString(buildTimeArg($time_span.text,$int2.text,"-")); }
       ;
The grammar lets you search by the time a file was last modified. Here we use the buildTimeArg method to translate the input into the correct argument for either -mmin (minutes since modification) or -mtime (days since modification). Also note the labels int1 and int2 in modified_between: they disambiguate which INTEGER token to use.
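The following Java sketch pulls the time translation out of the grammar so the mapping for the three modified forms is easy to see. buildTimeArg follows the @members version (hours are multiplied by 60 and passed to -mmin), except that it throws on an unknown span instead of falling through to minutes; less, more and between are hypothetical helpers standing in for the actions in modified_less, modified_more and modified_between.

```java
public class TimeArgSketch {

    // Same mapping as buildTimeArg in @members: days -> -mtime,
    // hours -> -mmin with the count multiplied by 60, minutes -> -mmin.
    static String buildTimeArg(String span, String count, String sign) {
        int num = Integer.parseInt(count);
        switch (span) {
            case "days":
                return " -mtime " + sign + num;
            case "hours":
                return " -mmin " + sign + (num * 60);
            case "minutes":
                return " -mmin " + sign + num;
            default:
                throw new IllegalArgumentException("unknown time span: " + span);
        }
    }

    // modified < N span             -> "-N"  (less than N ago)
    static String less(int n, String span) {
        return buildTimeArg(span, String.valueOf(n), "-");
    }

    // modified > N span             -> "+N"  (more than N ago)
    static String more(int n, String span) {
        return buildTimeArg(span, String.valueOf(n), "+");
    }

    // modified between A and B span -> "+A" followed by "-B"
    static String between(int low, int high, String span) {
        return more(low, span) + less(high, span);
    }

    public static void main(String[] args) {
        System.out.println(less(2, "days"));            // " -mtime -2"
        System.out.println(between(20, 30, "minutes")); // " -mmin +20 -mmin -30"
        System.out.println(more(3, "hours"));           // " -mmin +180"
    }
}
```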
time_span
       :   'days'
       |   'minutes'
       |   'hours'
       ;
The time_span rule allows input of days, minutes or hours. The hours argument is converted into minutes by the buildTimeArg method. That's it for the parsing rules, now on to the lexer rules.

Lexer Rules

fragment DIGIT : '0'..'9';
fragment LETTER : 'a'..'z'|'A'..'Z' ;

STRING_LITERAL : '\''.*'\'';
INTEGER : DIGIT+ ;
IDENT : LETTER(LETTER | DIGIT)* ;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+  {$channel=HIDDEN;};
DIGIT and LETTER are fragment rules, as you can see from the fragment keyword: they never produce tokens on their own and can only be referenced from other lexer rules. Here they make the grammar more readable. In the WS rule, {$channel=HIDDEN;} puts whitespace on the hidden channel, so the parser never sees it.
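To see roughly what token stream these lexer rules produce, here is a regex-based Java approximation. It is not ANTLR's lexer: keywords such as 'select' simply come out as IDENT-shaped words here (in ANTLR they get their own token types), and the class name is hypothetical. It does show two behaviors described above: STRING_LITERAL keeps its quotes and stops at the first closing quote, and whitespace never reaches the parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LexerSketch {

    // One named group per lexer rule, tried in order at the current position.
    // The reluctant '.*?' stops at the first closing quote, like the
    // non-greedy '.*' loop in STRING_LITERAL (unlike ANTLR's wildcard,
    // '.' here does not match a newline).
    private static final Pattern TOKEN = Pattern.compile(
            "(?<ws>[ \\t\\n\\r\\f]+)"
            + "|(?<string>'.*?')"
            + "|(?<integer>[0-9]+)"
            + "|(?<ident>[A-Za-z][A-Za-z0-9]*)"
            + "|(?<symbol>[=<>*;./])");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("no token matches at index " + pos);
            }
            if (m.group("ws") == null) {
                tokens.add(m.group());   // whitespace is skipped, the effect of $channel=HIDDEN
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("select * from /logs where file = '*.out' and modified < 2 days;"));
        // [select, *, from, /, logs, where, file, =, '*.out', and, modified, <, 2, days, ;]
    }
}
```

A double-quoted value such as file="x" fails here, just as it does with the STRING_LITERAL rule, which only accepts single quotes.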

Test Code

I used the following code to test the grammar from the command line:
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.TokenStream;

import bbejeck.antlr.fql.FQLLexer;
import bbejeck.antlr.fql.FQLParser;

public class FQLTester {

    public static void main(String[] args) throws Exception {
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
        String line = null;
        System.out.println("Enter your search:");
        while ((line = reader.readLine()) != null) {
            if (line.equalsIgnoreCase("quit")) {
                System.exit(0);
            }
            CharStream charstream = new ANTLRStringStream(line);
            FQLLexer lexer = new FQLLexer(charstream);

            TokenStream tokenStream = new CommonTokenStream(lexer);
            FQLParser parser = new FQLParser(tokenStream);

            String parsed = null;
            try {
                parsed = parser.evaluate();
                System.out.println("parsed query is [" + parsed + "]");
                // Run through sh -c so the pipe to xargs/grep is interpreted by the shell
                Process process = Runtime.getRuntime().exec(new String[]{"sh", "-c", parsed});
                InputStream input = process.getInputStream();
                BufferedReader procReader = new BufferedReader(new InputStreamReader(input));
                String searchResults = null;
                while ((searchResults = procReader.readLine()) != null) {
                    System.out.println(searchResults);
                }
                process.waitFor();
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println("Enter your search:");
        }
    }
}
Since this post only scratches the surface of ANTLR's capabilities, I plan to write more about ANTLR in the near future. Full source code for everything presented is available here. More resources for learning ANTLR are:

That's it for now, thanks for your time.

Thanks to Near Infinity's generous training budget, I had the opportunity last week to spend 3 days with several members of the Object Mentor group. These guys have an enormous amount of experience, especially "Uncle" Bob Martin. So I thought I would share a few of the pearls of wisdom they dropped along the way:

  • "Programming is a social exercise" - I thought this was a really good point. It was mentioned in the context of pair-programming, but I think it has far-reaching implications. Software development is more than just running through a bunch of formulas and crunching out an answer. Interaction within the team and with domain experts is crucial, to not only build the software right, but build the right software.
  • "A refactoring is something that takes a few seconds, or a few minutes at most" - I was really impressed with the importance of automated refactorings in his discussions. I think most of the time that I spend "refactoring" is in small, manual edits, whereas most of the time that he spends is in using automated refactoring to "chunk" his edits. I definitely need to learn these keystrokes and refactorings better. Maybe I should start a "Refactoring Driven Development" movement...
  • "Don't put refactoring on the schedule; do it all the time" - Simple, but effective. My tendency is to want to spend all my time refactoring, but this curbs that, because it forces me to refactor while I'm delivering user stories. 
  • "There are 3 essential design skills: nose, vision, and plan" - A nose for recognizing design smells, a vision for seeing a good design for your codebase, and an ability to come up with a plan to get from point A to point B. 
  • "Testing trumps good design" - This bothered me at first, but I think it's a really good point. The idea here is not that design is unimportant. Rather, if you are forced to choose between a "bad" design that allows better test coverage (e.g., less encapsulation) and a "good" design that is hard to test, choose testability. The reason is that the biggest roadblock to changing your codebase is not bad design, but FEAR of breaking something. If you are confident you will know when you've broken something, then you can retrofit a better design later.
  • "There is self-worth tied up in "finishing" something" - He also drew a distinction between having something working (which some developers will call "finishing" it), and finishing it - making it not only work, but be thoroughly tested, maintainable, etc.
  • Presentation layer - He talked about having the thinnest possible UI layer, which talks to a presentation layer to find out everything about how it should render. Then you test the UI and the business logic completely separately.
  • "You aren't doing agile development unless you are tracking your velocity and remaining story points in terms of passing, automated acceptance tests" - I balked at this at first, feeling like it would be too easy to use as a performance metric to beat the team over the head with. After I thought about it, though, I realized it's really about the definition of done. We are done if all the features we said would be working are working. And how can we know this? Only by testing them. Continuously. Which means it should be automated.
  • Acceptance tests don't need to be end-to-end, and, in fact, shouldn't be. This is another one that made me hesitate. After all, how do you know the feature is really working unless you go all the way from the UI to the database and back? Well, in short, because you're the developer. There's more value in being able to test features quickly and constantly than in being able to truly test them all the way from one end to the other. Mock or stub out whatever you need to make that happen.
  • FitNesse is cool. This is the second time I've played around with that tool, and the second time I've been impressed with its power and simplicity. I definitely need to play around with it some more.

Hello, World!

I'm a new software engineer here at Near Infinity, and very excited to be aboard! Hope to put some real content on this blog soon.

I was considering switching from PC to a Mac, but I have Adobe CS4 Design Premium and didn't want to pay the $1,799 to buy it again. It was surprisingly difficult to find any information about whether or not I could do it, and if so how. Some Adobe employees at a trade show even told me that it couldn't be done.

It turns out, though, that for $6.25 shipping and handling and the promise that you'll stop using and destroy your older version, they'll let you change. They call it Cross-Platform Swap. Here's how you do it:

  1. Go to http://www.adobe.com/go/supportportal. If you have an account, log in. If you don't, you'll need to create one.

  2. Under Get Support, Click on Orders and Returns

  3. Choose the issue type Return / Exchange / Refund and then click the button that says Proceed to Online Form

  4. Fill out the form, typing Cross Platform Swap in the subject line. Submit.
    -- In the notes field, here's what I typed: I am switching to a Mac and would like to do a cross-platform swap. I understand that my existing copy must be destroyed as soon as the new one arrives.

  5. The next day, you'll get an email saying your issue is resolved and providing no useful information. Instructions for what to do next are actually in the ticket itself, which you can get back to on the support portal. Here's what you do:
    -- Call 1-800-833-6687 and follow the prompts to customer service (#2 then #4)
    -- Have your case number and credit card ready
    -- Give them your credit card info and verify all your contact information
And that's it. They charge your credit card for $6.25 shipping and handling. Your old serial number gets invalidated, you destroy the old copy, and they send you the new one.

As a side note, here's a tip for getting customer service from Adobe: Instead of going to the support portal when you have a question, go to the pages on their website where they have prices and information about purchasing products. Almost immediately, you'll get a helpful person IMing you asking if they can help. These IM helpers are nowhere to be found (ironically) in the support section of their site.

Oh, and their customer support phone numbers aren't that easy to find, so here they are:
Adobe Customer Support: 1-800-585-0774 option #4
or
Adobe Customer Support: 1-800-833-6687 option #2 then option #4.
 
Good luck platform-swappers!

Having just completed a week of SharePoint Power User training, one thing is evident: I am far from being a Power User. What they should call the class is SharePoint User Tapas. Trainees get just a little taste of all that SharePoint can do. It's enough to make us say "Ooh, I love this," and "I'm not so crazy about that." I can certainly see the benefits of implementing SharePoint in an organization, and I also understand the dangers if it is not done well. This leads me to one conclusion: don't be afraid to think big, but start small. Choose one business problem to focus on, make sure it's one you think you can get your organization to buy into, and go from there. It is not very likely that SharePoint will be the best solution for every business problem that you have.

Training always leaves you with a head full of ideas for how to implement solutions based on the new knowledge you have gained. It's like meeting a new love interest for the first time: you think of all of the things that "could be," but you have to go on a lot of dates before you can sort out what is reality versus fantasy in terms of your expectations.