Seth Schroeder

20070522 Tuesday May 22, 2007
Learning to code Groovy Java &t

My introduction to Groovy

    Contents:
  1. Intro
  2. Code
  3. Results
  4. Code review
  5. Tools
  6. Conclusion
  7. Side notes

 

Idiomatic Groovy code looks and works like a mash-up of Java and Ruby... JRuby would be an intuitive name. After all, the "J" in the real JRuby has more to do with deployment and execution than look and feel of the code. Which do developers spend more time with?

The proof of the pudding is in the eating -- how Groovy is it? I decided to learn Groovy by writing a minimal RSS reader. That is no big chore in Java, but I was happily surprised by how much my Java and Ruby skills came into play. Please remember that this is my first shot at Groovy:


 

Grouping RSS items by keyword

 1:  def groupByKeyword(groups) {
 3:      def map = [:]
 4:
 5:      try {
 6:          stream = url.openStream();
 7:          def slurper = new XmlSlurper();
 8:          def rss = slurper.parse(stream);
 9:
10:          groups.each {
11:              def group = it.toUpperCase();
12:              map[group] = [];
13:
14:              rss.channel.item.each {
15:                  if (it.title.text().toUpperCase() =~ group) {
16:                      map[group] += it.title;
17:                  }
18:              };
19:          }
20:      } catch (e) {
21:          println e;
22:      } finally {
23:          if (stream)
24:              stream.close();
25:      }
26:
27:      return map;
28:  }

Collecting parameters and printing the results

29:    static void main(args) {
30:
31:        def url = args[0];
32:
33:        def keywords = args[1..<args.length];
34:    
35:        def reader = new RssReader(url);
36:
37:        def groups = reader.groupByKeyword(keywords);
38:
39:        groups.entrySet().each {
40:            println "$it.key"
41:
42:            it.value.each {
43:                println "\t* $it"
44:            }
45:        }
46:    }

 

Results

Input:

http://digg.com/rss/indexprogramming.xml java[s] java[^s] groovy haskell erlang microsoft emacs lisp

Output:

GROOVY
ERLANG
JAVA[^S]
EMACS
LISP
JAVA[S]
    * Top 5 javascript frameworks
    * The Javascript Programming Language
HASKELL
MICROSOFT

Input:

http://digg.com/rss/indexprogramming.xml java[s] java[^s] groovy haskell erlang microsoft emacs lisp

Output:

GROOVY
ERLANG
JAVA[^S]
EMACS
LISP
    * pregexp: Portable Regular Expressions for Scheme and Common Lisp
JAVA[S]
    * JavaScript - the world's most ubiquitous computing runtime... and it's not going away soon.
HASKELL
    *  Haskell's HTTP library: how it sucks, how to fix it
MICROSOFT
    *  Microsoft funds questionable study attacking GPL 3 draft process

 

A mini code review

  1. Line #1 demonstrates several Groovy-isms. The line starts a non-static method definition. The return type and parameter type were omitted.
  2. An empty map is defined at Line #3. I forgot to end the statement with a semicolon, but that is optional in Groovy.
  3. Lines 6..9 are more Java-ish than Groovy. I was inclined to work this way due to immature tool support. The Groovy plugin for Eclipse reported syntax errors when the parens were removed, but the code ran fine. The java-mode in emacs couldn't indent code lacking parens and semicolons.
  4. Ruby coders will recognize line 10. Notice the "it" variable on line 11. Groovy automatically defines "it" as a reference to the item being iterated over.
  5. Groovy packs a lot of code into line 14. The Groovy chunk "rss.channel.item" directly matches the RSS structure of <rss><channel><item>. That shrink-wrapped coupling was minimal and readable. I couldn't find an obvious reason to prefer xpath or other xml navigation methods.
  6. Line 15 would have been merged into line 14, but I ran out of patience getting findAll to work.
  7. Line 16 uses += to add an item to a list (the [] literal on line 12 creates a list, not a Java array). -= works as expected.
  8. The fading C/C++ programmer inside me was THRILLED to use if (reference) ... at line 23.
  9. This cheesy code depended on positional parameters. Given that, it was nice to slice them so easily at line 33. The ..< syntax defines a half-open interval [min, max) where min is included and max is excluded.
  10. Notice how "it" is defined in both the inner and outer loop near line 39. The C/C++ guy inside of me got all worked up about shadowing stack variables... that line of thinking is dead wrong about closures in Groovy.
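
For comparison, the grouping logic and the argument slice can be sketched in plain Java. This is only a rough equivalent, not part of RssReader.groovy: the class and method names are made up for illustration, and the titles arrive as a plain list instead of being read from a feed. It leans on the fact that Groovy's =~ in a boolean context behaves like Matcher.find().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
final class KeywordGrouper {

    // Mirrors lines 10-19 of the Groovy listing: one bucket per upper-cased
    // keyword, filled with every title whose upper-cased text contains a match.
    static Map<String, List<String>> groupByKeyword(List<String> titles, List<String> keywords) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String keyword : keywords) {
            String group = keyword.toUpperCase();
            Pattern pattern = Pattern.compile(group);
            List<String> matches = new ArrayList<>();
            for (String title : titles) {
                // find() looks for a match anywhere in the title, like Groovy's =~
                if (pattern.matcher(title.toUpperCase()).find()) {
                    matches.add(title);
                }
            }
            groups.put(group, matches);
        }
        return groups;
    }

    // Java spelling of args[1..<args.length]: a half-open slice that skips args[0].
    static List<String> keywordsFrom(String[] args) {
        return Arrays.asList(Arrays.copyOfRange(args, 1, args.length));
    }
}
```

One detail this makes visible: a pattern such as java[^s] needs some character after "java", so a title that ends with the word "Java" would not match it.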

 

Tools

Groovy is a very young language; its tools can't be judged fairly against Java's tools. The most significant problem I had was during debugging. Groovy is very involved in dispatching method calls, so a lot of debugging time will be spent stepping over / out of Groovy internals. That didn't work for me; the app seemed to resume processing. Other issues I noticed were lack of auto package import / cleanup, auto method completion, and the mistaken compile errors on missing parens.

The Groovy plugin did make it easy to start a new project. Installation worked great, and the dependencies were automatically added to the project after I created a Groovy class.


 

Conclusion

Pragmatic: The Groovy developers made a lot of welcome improvements to Java's syntax. It does leverage existing developer skills and would integrate tightly with existing Java code. The tools need a lot of features and some fixes.

Personal: I had bipolar Java/Ruby style swings while coding Groovy. It feels like almost an even compromise between the two... almost too flexible for my personality :). I would consider using it in new projects after Groovy 1.1 (annotations + more) has been released and the tools have matured.


 

Side notes:

  • Wow, that's a lot of .class files per lines of Groovy:
    $ wc -l *.groovy
          24 Hola.groovy
          90 RssReader.groovy
         114 total
    $ echo .groovy && ls *.groovy | wc -l; echo .class && ls *.class | wc -l;
    .groovy
           2
    .class
          14
    
  • Google Trends comparison of Groovy and JRuby
  • Groovy was started in early 2004! Version 1.0 was released almost three years later. This article describes the waning and waxing of progress implementing the language.
Posted by sschroed May 22 2007, 03:37:54 PM EDT
20070515 Tuesday May 15, 2007
Quick and dirty SQL histogram

Sometimes you really want a quick & dirty histogram while looking through a database:

  • when you suspect the mean value is misleading
  • when you want to understand how the values are distributed
  • ... and easily switch between different sources of values
  • ... without exporting data & switching applications

Here are the story scores from a recent front page of reddit.com: 175, 456, 140, 191, 230, 186, 134, 215, 171, 83, 102, 171, 182, 322, 193, 310, 338, 345, 174, 134, 92, 109, 241, 256, 132

A basic statistical query returns:

+-----+-----+-----+--------+
| max | min | avg | stddev |
+-----+-----+-----+--------+
| 456 |  83 | 203 |   90   |
+-----+-----+-----+--------+

The standard deviation seems awfully large. Maybe not many of the scores are close to the mean score of 203? What if another query could show the distribution?

+--------+----------+--------+----------+         +--------+----------+--------+----------+
| bucket | contents | _floor | _ceiling |         | bucket | contents | _floor | _ceiling |
+--------+----------+--------+----------+         +--------+----------+--------+----------+
|      1 |        8 |     83 |      157 |         |      1 |        4 |     83 |      119 |
|      2 |       10 |    158 |      232 |         |      2 |        4 |    120 |      156 |
|      3 |        2 |    233 |      307 |         |      3 |        8 |    157 |      193 |
|      4 |        4 |    308 |      382 |         |      4 |        2 |    194 |      230 |
|      5 |        1 |    383 |      457 |         |      5 |        2 |    231 |      267 |
+--------+----------+--------+----------+         |      6 |        0 |    268 |      304 |
                                                  |      7 |        3 |    305 |      341 |
                                                  |      8 |        1 |    342 |      378 |
                                                  |      9 |        0 |    379 |      415 |
                                                  |     10 |        1 |    416 |      452 |
                                                  +--------+----------+--------+----------+

The histogram on the left has fewer, larger buckets. This is a lot more informative than the mean & stddev. The histogram on the right uses more, smaller buckets. Maybe this is too verbose? What if you wanted seven buckets?

        update dhg.bucket_count set num_buckets = 7;
        select * from dhg.results;

        +--------+----------+--------+----------+
        | bucket | contents | _floor | _ceiling |
        +--------+----------+--------+----------+
        |      1 |        7 |     83 |      135 |
        |      2 |        7 |    136 |      188 |
        |      3 |        5 |    189 |      241 |
        |      4 |        1 |    242 |      294 |
        |      5 |        4 |    295 |      347 |
        |      6 |        0 |    348 |      400 |
        |      7 |        1 |    401 |      453 |
        +--------+----------+--------+----------+

Instructions:

  1. Insert numbers to be analyzed:
        INSERT INTO dhg.source SELECT foo FROM bar;
  2. Choose how many buckets in the histogram:
        UPDATE dhg.bucket_count SET num_buckets = 10;
  3. Read the results!
        SELECT * FROM dhg.results_full;

Materials:

All views, tables, and functions will live in a dynamic histogram (dhg) schema. The SQL is pretty minimal yet hopefully reasonably structured and commented. The MySQL flavor is larger due to an implementation of width_bucket.

WARNING: The implementation suffers from a variety of rounding errors and poor error handling. This is a quick and dirty solution for rough estimates only.

NOTE: Using more than 20 buckets will need a small tweak. Grep the SQL for empty_buckets.
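
The bucketing arithmetic can also be sketched outside the database. The Java below is not the dhg SQL. It is a guess at equivalent equal-width logic, under two assumptions: the bucket width is (max - min + 1) / buckets rounded to the nearest integer, and any value past the last ceiling is counted in the last bucket. With those assumptions it reproduces the "contents" column of the 5, 7, and 10 bucket tables above, including the score of 456 landing in a bucket whose ceiling is 452 or 453. The class and method names are invented for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch of equal-width bucketing; not the dhg implementation.
final class HistogramSketch {

    // Story scores quoted earlier in the post.
    static final int[] SCORES = {
        175, 456, 140, 191, 230, 186, 134, 215, 171, 83, 102, 171, 182,
        322, 193, 310, 338, 345, 174, 134, 92, 109, 241, 256, 132
    };

    // Assumed rule: width = round((max - min + 1) / buckets), never below 1.
    static int bucketWidth(int[] values, int numBuckets) {
        int min = Arrays.stream(values).min().getAsInt();
        int max = Arrays.stream(values).max().getAsInt();
        long width = Math.round((max - min + 1) / (double) numBuckets);
        return (int) Math.max(1, width);
    }

    // Counts how many values fall into each bucket (index 0 is bucket 1).
    static int[] bucketCounts(int[] values, int numBuckets) {
        int min = Arrays.stream(values).min().getAsInt();
        int width = bucketWidth(values, numBuckets);
        int[] counts = new int[numBuckets];
        for (int value : values) {
            // Rounding the width can leave the maximum past the last ceiling,
            // so clamp the index into the last bucket.
            int index = Math.min((value - min) / width, numBuckets - 1);
            counts[index]++;
        }
        return counts;
    }
}
```

Calling HistogramSketch.bucketCounts(HistogramSketch.SCORES, 5) yields the counts 8, 10, 2, 4, 1, matching the left-hand table.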

I'd love to hear criticisms, comments, & suggestions!

Posted by sschroed May 15 2007, 02:57:27 PM EDT
20070510 Thursday May 10, 2007
Client vs. server: who does what in an online spreadsheet?

"Data elements" are the focus of Section 5.2.1 of Dr. Fielding's dissertation. I read the section as asking who does how much of the work to render the data. He lists three options: mostly server, mixed client & server, and mostly client:


 Server                                                          Client
|----------------------------------------------------------------------|
      |                    |                                 |
 1. Fixed format       2. encapsulated data & code    3. raw data & metadata
    (jpeg)                 (json, javascript)             (html <img> tag?)


This is an important decision for an online spreadsheet. Should the server send only literal values (option one)? Should the client evaluate functions and resolve references (option two)?

Option 1:


    pro: parsing cell values into a tree and traversing it is tricky. Writing it once in Java seems like a reasonable approach.
    con: client has no idea which cells are related to each other. That would really complicate features like copying and pasting cell references.

Option 2:


    con: reimplement much of cell parsing & traversal in Javascript. Using Rhino to test it outside a browser mitigates this a little.
    pro: client knows which cells are related.

Option 3:


    con: no idea how to apply this approach for a spreadsheet.

I chose option two. Eventually the client will probably need to work directly with cell references. Better to get that work out of the way ahead of time. Not all of the logic needs to be rewritten. Some of it is only needed on one side:

common:


    * build parse tree from cell values
    * iterate over parse tree

server specific:


    * update reference values: A1 => B1, $G3 => $G4
    * collapse the parse tree back into a flat string

client specific:


    * resolve reference values (A1 = 123.45, G3 = A1 = 123.45).
    * evaluate functions (=SUM(G3,A1) => 246.9).
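
The two client-side steps can be illustrated with a small sketch. It is written in Java only to keep the examples on this page in one language; the real client would be Javascript. Everything here is assumed for illustration: the class name, the convention that a cell holding a reference stores it as "=A1", and the cycle guard.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of reference resolution and SUM; not the prototype's code.
final class CellResolver {

    // Raw cell contents keyed by reference, e.g. "A1" -> "123.45", "G3" -> "=A1".
    private final Map<String, String> cells = new HashMap<>();

    void put(String ref, String content) {
        cells.put(ref, content);
    }

    // Follows references until a literal number is found.
    double resolve(String ref) {
        return resolve(ref, 0);
    }

    private double resolve(String ref, int depth) {
        // A chain longer than the number of cells must contain a cycle.
        if (depth > cells.size()) {
            throw new IllegalStateException("circular reference at " + ref);
        }
        String content = cells.get(ref);
        if (content == null) {
            throw new IllegalArgumentException("empty cell " + ref);
        }
        if (content.startsWith("=")) {
            return resolve(content.substring(1), depth + 1);
        }
        return Double.parseDouble(content);
    }

    // Evaluates SUM over a list of references, e.g. sum("G3", "A1").
    double sum(String... refs) {
        double total = 0;
        for (String ref : refs) {
            total += resolve(ref);
        }
        return total;
    }
}
```

With A1 set to "123.45" and G3 set to "=A1", resolve("G3") follows the reference to 123.45 and sum("G3", "A1") adds the two resolved values.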

I didn't expect so many neat challenges. Even the prototype implementation needed parsing, recursion, evaluating simple math expressions, tree traversal, base10 & base26 conversion, and more. But all that is meat for another post.
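
As a taste of the base 26 part, here is a minimal sketch of converting column names to numbers and back, which is what an update like A1 => B1 needs. It assumes upper-case names with A = 1, Z = 26, AA = 27, and it only handles simple references; absolute markers such as the $ in $G3 are left out. The names are invented for illustration and this is not the prototype's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of spreadsheet column arithmetic.
final class ColumnMath {

    // Simple relative references only: letters followed by digits, e.g. "A1".
    private static final Pattern SIMPLE_REF = Pattern.compile("([A-Z]+)(\\d+)");

    // "A" -> 1, "Z" -> 26, "AA" -> 27. There is no zero digit, so this is
    // the bijective flavor of base 26.
    static int toIndex(String column) {
        int index = 0;
        for (char c : column.toCharArray()) {
            index = index * 26 + (c - 'A' + 1);
        }
        return index;
    }

    // Inverse of toIndex for index >= 1.
    static String toName(int index) {
        StringBuilder name = new StringBuilder();
        while (index > 0) {
            int digit = (index - 1) % 26;
            name.insert(0, (char) ('A' + digit));
            index = (index - 1) / 26;
        }
        return name.toString();
    }

    // Moves a reference sideways by delta columns: shiftColumn("A1", 1) gives "B1".
    static String shiftColumn(String ref, int delta) {
        Matcher m = SIMPLE_REF.matcher(ref);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a simple reference: " + ref);
        }
        return toName(toIndex(m.group(1)) + delta) + m.group(2);
    }
}
```

The subtraction of one before each modulo and division is the fiddly part: without it, Z and AA come out wrong because the letters run from 1 to 26 instead of 0 to 25.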

Posted by sschroed May 10 2007, 05:29:59 PM EDT
20070430 Monday April 30, 2007
NFJS in general and JRuby in specific

Several of the NearInfinity developers went to the Reston stop of the No Fluff Just Stuff conference. Some of the sessions were especially interesting. It was thrilling to hear strong, detailed skepticism of SOA from the expert panel. The arms race with .Net to host other programming languages has gone wild!
  • JHaskell.
  • Sleep. Quoted from the site: "Sleep is heavily inspired by Perl with bits of Objective-C thrown in"
  • Dynamic Java.
  • Groovy seems like Java "lite." The groovy compiler reportedly accepts most Java source, but also permits a more flexible yet terse syntax. This gained a lot of traction at the conference. Some people complained that the name sounded too much like "Ruby." Which leads to...
  • JRuby. The overall goal is to mate the slick Ruby syntax with the mature Java VM. I'm very hesitant to seriously consider it for a few reasons:
    1. The interpreter will variously introduce bugs and lack features. This will be an ongoing issue as Ruby is an advanced, relatively young language; 2.0 is under active development.
    2. The Ruby VM will get much better over time. When it does, the argument for using the excellent Java VM will be much weaker.
  • For better or worse, I didn't see a competitor to Fortran.NET
Posted by sschroed Apr 30 2007, 06:43:14 PM EDT
20070425 Wednesday April 25, 2007
REST Network API for an online spreadsheet

Every spreadsheet application needs to support the creating, reading, updating, and deleting of sheets, columns, rows, and cells. The network protocol for an online spreadsheet could easily treat sheets, columns, etc. as resources and offer CRUD operations for manipulating them.

The requests below were hand generated and sent via telnet. The responses were copy and pasted from the console window (w/ a little pretty printing). A real frontend might use this API with XmlHttpRequest.

Each operation is listed with its request and the response it produced.

Create a sheet

    Request:
        POST /fauxcel/finances HTTP/1.1
        Host: 192.168.113.115:8080

    Response:
        HTTP/1.1 200 OK
        Server: Apache-Coyote/1.1
        Content-Length: 0
        Date: Wed, 25 Apr 2007 19:26:39 GMT

Populate a cell

    Request:
        POST /fauxcel/finances/A/1 HTTP/1.1
        Host: 192.168.113.115:8080
        Content-Type: text/text;charset=utf-8
        Content-Length: 34

        {"value":"123.45", "type":"value"}

    Response:
        HTTP/1.1 200 OK
        Server: Apache-Coyote/1.1
        Content-Length: 0
        Date: Wed, 25 Apr 2007 19:36:37 GMT

Insert a column

    Request:
        POST /fauxcel/finances/A HTTP/1.1
        Host: 192.186.113.115:8080

    Response:
        HTTP/1.1 200 OK
        Server: Apache-Coyote/1.1
        Content-Length: 0
        Date: Wed, 25 Apr 2007 19:36:37 GMT

Read a column

    Request:
        GET /fauxcel/finances/B HTTP/1.1
        Host: 192.168.113.115:8080

    Response:
        HTTP/1.1 200 OK
        Server: Apache-Coyote/1.1
        Transfer-Encoding: chunked
        Date: Wed, 25 Apr 2007 19:39:38 GMT

        53
        {"name":"finances","cells": [
           {
               "col":"B",
               "row":"1",
               "type":"value",
               "value":"123.45"
           }
        ]}
        0

Delete a row

    Request:
        DELETE /fauxcel/finances/1 HTTP/1.1
        Host: 192.168.113.115:8080

    Response:
        HTTP/1.1 200 OK
        Server: Apache-Coyote/1.1
        Content-Length: 0
        Date: Wed, 25 Apr 2007 19:41:26 GMT

Note that the back end moved the value of cell A1 into cell B1 when a column was inserted before A. More details on that in a following post!
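
The request paths above follow a simple pattern: /fauxcel/<sheet> names a sheet, one extra segment of letters names a column, one extra segment of digits names a row, and letters followed by digits name a cell. A back end has to make that distinction before it can dispatch a request. The Java below is a hypothetical sketch of that step, inferred only from the example paths; it is not the actual fauxcel code, and the class, enum, and field names are invented.

```java
// Hypothetical sketch: classify a request path as sheet, column, row, or cell.
final class FauxcelPath {

    enum Kind { SHEET, COLUMN, ROW, CELL }

    final Kind kind;
    final String sheet;
    final String column; // null unless kind is COLUMN or CELL
    final String row;    // null unless kind is ROW or CELL

    private FauxcelPath(Kind kind, String sheet, String column, String row) {
        this.kind = kind;
        this.sheet = sheet;
        this.column = column;
        this.row = row;
    }

    static FauxcelPath parse(String path) {
        // "/fauxcel/finances/A/1" splits into ["", "fauxcel", "finances", "A", "1"].
        String[] parts = path.split("/");
        if (parts.length < 3 || parts.length > 5 || !parts[0].isEmpty()) {
            throw new IllegalArgumentException("unrecognized path: " + path);
        }
        String sheet = parts[2];
        if (parts.length == 3) {
            return new FauxcelPath(Kind.SHEET, sheet, null, null);
        }
        boolean lettersFirst = parts[3].matches("[A-Z]+");
        if (parts.length == 5) {
            if (lettersFirst && parts[4].matches("\\d+")) {
                return new FauxcelPath(Kind.CELL, sheet, parts[3], parts[4]);
            }
            throw new IllegalArgumentException("unrecognized cell path: " + path);
        }
        // Exactly one extra segment: letters mean a column, digits mean a row.
        if (lettersFirst) {
            return new FauxcelPath(Kind.COLUMN, sheet, parts[3], null);
        }
        if (parts[3].matches("\\d+")) {
            return new FauxcelPath(Kind.ROW, sheet, null, parts[3]);
        }
        throw new IllegalArgumentException("unrecognized path: " + path);
    }
}
```

Combined with the HTTP method, the parsed kind is enough to pick an operation: POST on a COLUMN path inserts a column, DELETE on a ROW path deletes a row, and so on.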

Posted by sschroed Apr 25 2007, 04:36:00 PM EDT
20070405 Thursday April 05, 2007
REST vs. SOAP vs. POX vs. TBD

I have jumped into the fray of trying to sort out REST from SOAP from POX. Here are my initial opinions: REST : Fielding's introduction [ 1 ] lays out constraints and why they are important. I was hoping the subsequent chapter [ 2 ] focusing on REST and...
Posted by sschroed Apr 05 2007, 12:57:00 PM EDT