Bryan Weber

20080403 Thursday April 03, 2008
Scala Pattern Matching

Scala has pattern matching... so what's the big deal? If you are a Java developer the power of pattern matching will probably be lost on you at first, but after you gain some experience with it a light will go on inside your head. And if you are a functional programmer, you would expect nothing less than excellent pattern matching support from Scala, since it is a new language that is partly functional.

So what is pattern matching? I don't know a formal definition, but it does several things and I will give an example of each. Before we go any further, there are 2 concepts that work well with pattern matching (but are not required): tuples and case classes. Both topics have been explained in numerous other blog posts, so I will only give very short summaries here, but familiarizing yourself with them will further your understanding of the power of pattern matching.

Tuples

A tuple is a fixed size data structure whose elements can each have a different type. Scala has a convenient syntax for tuples that looks like this:
(1, 2.0, "three")
Tuples are typically used instead of Lists or Arrays when the data types of the objects are not all the same.
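
As a small illustration (the object and value names are made up for this sketch), each element of a tuple keeps its own static type and can be read by position with _1, _2 and _3:

```scala
object TupleDemo {
  // A 3-tuple whose elements have three different types.
  val t: (Int, Double, String) = (1, 2.0, "three")

  // Positional accessors are 1-based and statically typed.
  val first: Int = t._1
  val second: Double = t._2
  val third: String = t._3

  def main(args: Array[String]): Unit = {
    println(first)
    println(second)
    println(third)
  }
}
```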

Case Classes

In Scala a case class is a special type of class (with some restrictions) that "exports its constructor parameters". So what does that mean? It means that you can conveniently create and compare instances of the class. Pattern matching will even allow some of the values to be compared and others to be bound to variables in one operation!
case class Foo(firstData:Int, secondData:Int)
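
To make "conveniently create and compare" a bit more concrete, here is a minimal sketch. Foo is redeclared so the snippet stands on its own, and CaseClassDemo is just a name I picked for the example:

```scala
// Redeclared here so the sketch is self-contained.
case class Foo(firstData: Int, secondData: Int)

object CaseClassDemo {
  // No `new` needed: the compiler generates a factory method.
  val a = Foo(3, 5)
  val b = Foo(3, 5)

  // == compares the constructor parameters, not object identity.
  val sameValue: Boolean = a == b
  // eq compares references, so two separate instances differ.
  val sameObject: Boolean = a eq b

  // Constructor parameters are exposed as read-only fields.
  val sum: Int = a.firstData + a.secondData

  def main(args: Array[String]): Unit = {
    println(a) // uses the generated toString
    println(sameValue)
    println(sameObject)
    println(sum)
  }
}
```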

Pattern Matching

First, pattern matching is a glorified switch statement.

// Example 1: matching on a literal value
val x = 1
x match {
  case 1 =>
    println("x is 1")
  case _ =>
    println("x is not 1")
}

// Example 2: matching on a case class (a separate snippet; x is redefined)
val x = Foo(3,5)
x match {
  case Foo(z,5) =>
    println(z)
  case _ =>
    println("we didn't have a match")
}
This says: try to match the value of x, and when you find a case that matches, execute the code block associated with that case. Cases are tried in order from top to bottom. Don't forget that _ in Scala is a wildcard, so as the last case it catches every value that the earlier cases did not match. In the first example, Scala's only advantage over Java is that you can use objects and not just primitives or Enums for comparison. The second example goes a step further: it checks that the second field is 5 and binds z to the first field in the same operation.
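
Here is a small sketch of the "objects, not just primitives" point. A single match can mix literals, strings, case classes and tuples. Point and describe are names invented for this illustration:

```scala
object DescribeDemo {
  // Hypothetical case class used only for this illustration.
  case class Point(x: Int, y: Int)

  // Cases are tried top to bottom; the first one that matches wins.
  def describe(value: Any): String = value match {
    case 1           => "the number one"
    case "hello"     => "a greeting"
    case Point(0, 0) => "the origin"
    case Point(x, 0) => "on the x axis at " + x
    case (a, b)      => "a pair of " + a + " and " + b
    case _           => "something else"
  }
}
```

Note that the order matters: Point(0, 0) has to come before Point(x, 0), otherwise the more general case would swallow it.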

Second, pattern matching is like built in assertions (well, sort of).

x match {
  case 0 =>
    println("x is 0")
  case 1 => 
    println("x is 1")
}
So in this contrived binary checker, we check x for a value that is either 0 or 1. So what would happen if x contained a value that was NOT 0 or 1? A scala.MatchError exception would be thrown! This allows for a clean form of defensive programming in that the developer does not have to handle all of the possible error conditions right here in the code. The developer can code the "success cases" and let exceptions be thrown for the exceptional cases, and those exceptions can be handled by an error handling layer somewhere else in the code.
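
A minimal sketch of that idea, where checkSafely stands in for the "error handling layer" (both method names are mine, not from any library):

```scala
object BinaryChecker {
  // Only the "success cases" are listed; any other value throws scala.MatchError.
  def check(x: Int): String = x match {
    case 0 => "x is 0"
    case 1 => "x is 1"
  }

  // A stand-in for an error handling layer elsewhere in the application.
  def checkSafely(x: Int): String =
    try check(x)
    catch {
      case _: MatchError => "unexpected input: " + x
    }
}
```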

Third, pattern matching is useful for variable assignment

(assigning multiple variables on one line, assigning only certain variables, etc) This is where tuples enter the picture again.
val a = (1,2.0,"three")
val (d,e,f) = a
println(d)
println(e)
println(f)
This will produce the following results:
1
2.0
three
Additionally, you could do:
val a = (1,2.0,"three")
val (b,2.0,"three") = a
println(b)
This will print out 1 as expected. But NOTE that if the values in the second and third positions of the tuple did NOT match, then an exception (scala.MatchError) would have been thrown! So you could NOT do the following, because "3" does not equal "three":
val a = (1,2.0,"three")
val (b,2.0,"3") = a
println(b)


Many people like to think of pattern matching in this way.
  • Do all of the bound variable values match?
  • If yes, then we have a match!
  • If not, what could the code do to make the statement true? (ie assign a value to an unbound variable...)
That's it for my brief intro to pattern matching. Hopefully this helps shed some light if it is a new concept to you.
Posted by bweber Apr 03 2008, 11:06:49 PM EST
20080307 Friday March 07, 2008
Scala and Maven with maven-scala-plugin

Ok, so the documentation for maven-scala-plugin isn't quite perfect. To save you some time, here is a fully functional pom.xml file that works as of March 6, 2008.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

    <modelVersion>4.0.0</modelVersion>
    <groupId>mvn.scala.test</groupId>
    <artifactId>mvn.scala.test</artifactId>
    <name>Maven Scala Plugin Test</name>
    <packaging>jar</packaging>
    <version>0.0.1-SNAPSHOT</version>
    <description>
		Test for Maven Scala Plugin.
    </description>
  
  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
    <repository>
        <id>jline</id>
        <name>JLine Project Repository</name>
        <url>http://jline.sourceforge.net/m2repo</url>
    </repository>
  </repositories>  
  
  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <dependencies>
    <dependency>
	<groupId>org.scala-lang</groupId>
	<artifactId>scala-library</artifactId>
	<version>2.6.1</version>
    </dependency>
    <dependency>
	<groupId>jline</groupId>
	<artifactId>jline</artifactId>
	<version>0.9.94</version>
    </dependency>
  </dependencies>
  
  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>

    <plugins>

      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
		<version>2.4</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
	    <configuration>
	      <mainClass>org.example.HelloWorld</mainClass>
	    </configuration>
      </plugin>
	
    </plugins>

  </build>
 
</project>

Some points of interest

Ok, so the first thing you might notice is that I have two explicit dependencies: jline and Scala itself (scala-library). maven-scala-plugin will compile your code without the Scala dependency, but it cannot run the classes unless the jar is provided at runtime via a dependency. That jline was required I determined by simple trial and error.

The maven-scala-plugin site's documentation is incorrect. The group id is not "scala" as they claim, it is in fact: <groupId>org.scala-tools</groupId>

The sourceDirectory and testSourceDirectory do not have to be specified if you decide to use src/main/scala and src/test/scala (at least according to their documentation) but I have chosen to explicitly state them here anyway.

So, you need to put your scala code in src/main/scala and your tests in src/test/scala. The typical HelloWorld example could be saved to src/main/scala/HelloWorld.scala which looks like:

package org.example {
  object HelloWorld extends Application {
    println("hello")
  }
}
Then from the command line run:
mvn clean compile
mvn scala:run
And voila, you should see "hello" print to the console.
Posted by bweber Mar 07 2008, 12:01:33 AM EST
20080222 Friday February 22, 2008
JVM Concurrency with Scala Actors

Concurrency in Java is a nightmare. I used to think that when I first started using threads. After getting the hang of it a little I thought maybe, just maybe it wasn't so bad. Then I saw Brian Goetz speak at No Fluff and I realized that the state of concurrency in Java is impossible for a mere mortal to comprehend. Whenever a topic becomes too difficult, I have learned to step back and look at the big picture. There must be something that I am fundamentally doing wrong. I think that this is one of those situations, only it took over a decade for most of us to figure out, while a small group of people were shaking their heads the whole time.

That small group of people are the developers who understand functional programming languages. Functional programming languages avoid shared mutable state. This makes concurrency in them a joy when compared to Java and most OOP languages. I recently took some time to start learning Erlang (Pragmatic Studio, very well done Joe and Dave!) and I was very impressed with what I saw. I really loved some of the aspects of Erlang. It is well known for having concurrency baked into its DNA, but I was impressed with several other things (things I had seen before, but that just felt so natural in Erlang): higher order functions, list comprehensions, hot deployment, and most importantly pattern matching. This isn't a blog post about any of those topics however. This is about concurrency, and in the title I mentioned concurrency on the JVM.

One of the things that I did NOT like about Erlang was that the sequential programming features seemed incomplete to me coming from an OOP world. Not to mention commas, semi-colons, periods and nothing... and when to use which! I remember wishing in class that Erlang ran on the JVM so I could use it for all my concurrency needs and call out to Java/JRuby/Groovy/SISC/etc for the meat of my application logic. Using the xmerl library (I know that there is a better one out there, but c'mon no xml library should be this difficult) made me really long for something better. So once again I took a step back and tried to look at the big picture.

Before going to the Erlang Pragmatic Studio I had started to read up on Scala (which I find difficult because I find the documentation to be very sparse and fairly poor) and Scala's actor library in particular. I admit that when I first looked at it I didn't fully get it. Learning Erlang helped cement some concepts that make looking at the Actor library much, much simpler. While I still prefer the Erlang way of receiving messages in particular, I found the Scala Actor library to be decent. It is also worth noting that the Scala actor library works with threads or with event-based processes (actors, not OS processes). To me, the whole reason to use the actor library is to NOT be using threads, so I highly recommend the event based actors.

Actor concepts

In an actor based concurrency model, there are 3 fundamental concurrency primitives. First, there is spawn. Spawn creates a new actor which has a mailbox where it can receive messages. In Erlang spawn spawns functions and in Scala the equivalent is creating a new Actor and starting it. (Think creating a new thread and starting it.) The second primitive is send (!). This is pretty similar in both Erlang and Scala: receiver ! message means send the message "message" to "receiver". Scala actually has some additional methods for sending messages but we won't cover them here. Finally there is receive, where a message is pulled from the mailbox if it matches a pattern and can then be acted upon. Erlang blows away Scala on pattern matching from what I can tell, but I am admittedly not an expert on Scala pattern matching.
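
To make the three primitives concrete without depending on any particular actor library, here is a toy sketch. MiniActor, Request and Stop are names invented for this illustration, and this is NOT how the Scala actor library is implemented. It uses one thread per actor, which is exactly what you would want to avoid in real code, but it shows the shape of spawn, send and receive:

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable.ListBuffer

// Hypothetical message types for this sketch.
case class Request(data: String)
case object Stop

// A toy actor: one mailbox plus one thread that drains it.
// The handler returns false when the actor should stop.
final class MiniActor(handler: Any => Boolean) {
  private val mailbox = new LinkedBlockingQueue[Any]()
  private val worker = new Thread(new Runnable {
    def run(): Unit = {
      var running = true
      while (running) running = handler(mailbox.take()) // "receive"
    }
  })

  def start(): MiniActor = { worker.start(); this } // "spawn"
  def !(msg: Any): Unit = mailbox.put(msg)          // "send"
  def join(): Unit = worker.join()
}

object MiniActorDemo {
  def run(): List[String] = {
    val log = ListBuffer[String]() // written only by the actor thread
    val server = new MiniActor({
      case Request(data) => log += ("got request " + data); true
      case _             => log += "server done"; false
    }).start()

    server ! Request("foo")
    server ! Stop
    server.join() // after join, the actor thread's writes are visible here
    log.toList
  }
}
```

The only way to talk to the actor is through its mailbox, and the handler picks messages apart with pattern matching, which is the part that carries over to the real libraries.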

So some important topics to discuss now: What is an Actor, immutable state and pattern matching, and process linking.

What is an actor?

An actor is like a thread but without shared state. I sometimes refer to it as a programming language process because an actor is a process, but not an operating system process. It is a process that is managed by the language runtime and therefore must be very lightweight. Erlang has spent a lot of time getting processes very lightweight. Erlang doesn't use the term actor, it just has processes, but they are the same thing. Each process or actor has a mailbox where it receives messages from other processes. Messages are the ONLY way that processes can communicate since they have no shared state.

Immutable state and pattern matching

Immutable state is key to functional programming languages as they are intended to have no side effects and immutable state means that no function can change the value of something (a variable) and thus introduce a side effect. Pattern matching in Erlang is brilliant. It matches the left side and the right side of the =. Conceptually there are 3 things that can happen here. If the left is unbound it will be bound with the value from the right side. This looks like variable assignment from OOP. If the left side is bound it must match the value from the right side or an exception will be thrown. This is because variables are immutable and they cannot be "assigned" another value. Finally, if the left side has a partial match it can assign values to unbound variables that match the pattern. A tuple is the simple way of understanding this.

A = "a",
B = "b",
{A,C} = {A,B}
This will match because the left side and right side are tuples of the same size and none of the matching values would change state. Since A equals A we are OK. And since C is unbound it will be assigned the value from B.
A = "a",
B = "b",
{A,B} = {A,C}
This would not match because C is unbound. Likewise, the following would not match because the values of B and C do not match:
A = "a",
B = "b",
C = "c",
{A,C} = {A,B}
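
For comparison, here is a rough Scala analogue of the bound/unbound distinction, as a sketch only. In a Scala pattern a plain lowercase name is always a fresh variable, so to say "this position must equal an existing value" you put the name in backticks. The method name is made up for the example:

```scala
object BoundMatchDemo {
  // `a` in backticks means "must equal the existing value a" (like a bound
  // Erlang variable); the plain name c is a fresh variable that gets bound.
  def secondIfFirstIs(a: String, pair: (String, String)): Option[String] =
    pair match {
      case (`a`, c) => Some(c)
      case _        => None
    }
}
```

Unlike the Erlang = operator, which throws on a failed match, this version returns None when the first element differs.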

Fault tolerance

So if you have heard of Erlang, you have probably heard that it makes concurrency and fault tolerance easy. We've looked at the concurrency primitives, but what about the fault tolerance primitives? They are link, unlink and process_flag. These are conceptually simple. If a process A links to process B then they are linked. If one dies then it will send a signal to all the processes that it is linked to prior to its own death. The linked processes can be system processes depending on whether or not they set their process_flag. If they did then they can trap exit messages from dying processes, if not then this process will die as well. So as you can probably imagine it is easy to create graphs of processes that will die when certain failures occur and that can be restarted or resurrected by some system processes. Scala's actors have these concepts as well. I won't go into any more detail about them here, but if you read the links at the bottom of this blog you can find some additional information in the Scala actor api.

So now that I have given a very long winded ill-explained description of actors, let's look at a real example in Erlang and in Scala:

Erlang

-module (blog).
-compile(export_all).
			
client(Pid) ->
	Pid ! {self(),request,foo},
	receive 
		{Pid,response,Response} ->
			io:format("got response ~p~n",[Response]),
			Pid ! exit
	end,
	io:format("client done~n").
		
server() ->
	receive
		{From,request,Request} ->
			io:format("got request ~p~n", [Request]),
			sleep(1000),
			From ! {self(),response,bar},
			server();
		_ ->
			io:format("server done~n")
	end.
	
sleep(Time) ->
	receive
	after Time -> void
	end.

foo() ->
	Pid = spawn(fun server/0),
	spawn(fun() -> client(Pid) end),
	exit.

Scala

import scala.actors.Actor
import scala.actors.Actor._
import scala.actors.OutputChannel

case class Request(data:Object)
case class Response(data:Object)
case object Exit

class Client(server:Actor) extends Actor {
  def act() {
    server ! new Request("foo")
    react {
      case Response(data) => 
        Console.println("got response " + data)
        server ! Exit
        println("client done")
        exit()
    }
  }
}

class Server() extends Actor with Sleeper {
  def act() {
    loop {
      react {
        case Request(data) =>
          println("Got request " + data)
          sleep(1000,(sender:OutputChannel[Any]) => {
            sender ! new Response("bar")
          },sender)
        case _ =>
          println("server done")
          exit()
      }
    }
  }
}

trait Sleeper {
  def sleep(time:Long,fun:(OutputChannel[Any])=>Unit,sender:OutputChannel[Any]) {
    reactWithin(time) {
      case _ =>
        fun(sender)
    }
  }
}

object Foo extends Application {
  var s = new Server()
  var c = new Client(s)
  s.start
  c.start
  println("exit")
}

So what does it do?

This code spawns two actors, 1 as a client and 1 as a server. The server listens for request messages and responds with response messages after sleeping for 1 second. If a message comes in that is not a request message, then the server exits. The client sends a request message to the server and then listens for the response. Once it receives its response it sends the server a message that is not a request, which acts as a shutdown command.

Not particularly interesting, but it does demonstrate some of the key differences between Erlang's processes and Scala's actors. One of the biggest differences is that react, the event-based receive in Scala, is a method that does not return. This means that any logic must be placed inside the react block, because code after the react will never be executed. (The thread-based receive does return, but it ties up a thread while it waits.) Erlang can pattern match on atoms and tuples, while the Scala code here pattern matches on case classes.

One of Erlang's strong points is distributed code. In Erlang, when a process is spawned, the code that talks to it needs no knowledge of whether that process is running on the same machine or remotely. It simply doesn't matter. Scala has remote actors as its equivalent, but the things I've covered here so far are really for replacing threads within a single JVM. The concepts don't change and maybe that will be the topic of a future blog post.

Resources:

Update

If you did not need the sleep function, you could use this shorthand notation to define the actors. Notice that the server actor is assigned to a variable that the client actor uses.

import scala.actors.Actor
import scala.actors.Actor._

case class Request(data:Any)
case class Response(data:Any)

object Foo extends Application {

	actor {
		server ! new Request("foo")
		react {
		  case Response(data) => 
		    Console.println("got response " + data)
		    server ! "Exit"
		    exit()
		}
	}

	var server = actor {
	    loop {
	      react {
	        case Request(data) => 
	          Console.println("got request " + data)
	          sender ! new Response("bar")
	        case _ =>
	   		  println("exiting server")
	          exit()
	      }
	    }
	}

}

Posted by bweber Feb 22 2008, 11:43:32 PM EST
20080124 Thursday January 24, 2008
Generated code...

So most developers, especially Java developers because of the language's verbose syntax, have played around with generated code at one point or another. For those of you who haven't, it is a very, very, very easy concept to understand.

The most pathetic diagram you will ever see

Inputs -> Generator -> Generated

Some inputs, a template and some values for example, are passed into a generator and it generates some output. This could really be anything; XML, source files, view components, etc. One of the first rules of generators is do not modify the generated content. Why you ask? Playing devil's advocate, I could say that once the material is generated I won't ever need to generate again. Any changes to the material I make will already be present so they will build on top of each other. In some, possibly even many, cases this is very true. However, there are times when you will want to regenerate something or to generate another object that may require many of the same changes that you already made to previously generated objects. All of those changes have to be made to the generated material one at a time because the generator does not support them.

Well, OK, it may be acceptable to modify generated content when you are testing something out because it might be much faster than going through the entire generation process again. But beware, your changes are not repeatable and will be lost as soon as the generator runs again. So the general rule of thumb is, modify the inputs, not the generated material.

Let's introduce one additional component, some sort of runtime environment.

Pathetic diagram + Runtime

Inputs -> Generator -> Generated  <= Runtime


This pathetic chain symbolizes the same thing, but at the end of the process some "runtime" uses the generated objects. So, why would you ever modify the generated material, aside from a quick check that it works in the context of the runtime? Well, it's probably not a good idea. You are much better off modifying the inputs, thus creating a repeatable process.

Now, let's introduce a wrinkle to all this lovely over-simplification. What if the generator or runtime needs to be updated and is not fully backwards compatible? Well, if the runtime is updated then the generated objects have to be updated to work with the new runtime. If you modified the generated objects then you will either have to continue down this path or you will have to modify your inputs to update your generated objects with not only the new changes required by the updated runtime but also the changes you made prior to the upgrade.

If the generator is updated, then we have to change the inputs. Any changes that were made to the generated objects will be lost.

So here is the important question. Why in the world am I harping on all this hypothetical drivel? Well, because if you are using Rails or Grails as your web framework then most likely you are doing exactly what I am suggesting is a bad idea. Readers familiar with g/rails might say that this is not true because of the scaffolding support offered in both frameworks. But this leads to my exact point. Using scaffolding is good, but it only takes you so far. At some point you have to move beyond what you get out of the box. At this point, users are expected to go modify the generated material and hope that they never need to generate again. To me, using generators to get started is a good idea, but once you move away from the scaffolding you have crossed the point of no return.

So what can be done to prevent modifying generated code? Well, just like I suggested in my verbose drivel above, modify the inputs! In grails, for example, this means providing your changes not to the generated content, but in plug-ins that determine how your content is generated. Now, I don't believe that grails plug-ins are robust enough at the moment (I could be wrong about this) to really make this feasible, but I would like to see g/rails adopt this type of pattern. Both frameworks have strong runtime environments and very good generators. But let's change the focus from modifying the generated content to modifying the inputs so we have a repeatable code generation framework.
Posted by bweber Jan 24 2008, 10:39:02 PM EST
20080123 Wednesday January 23, 2008
No! Bad Groovy!

Argh! Bad groovy! Bad groovy!
Groovy was designed to be tightly integrated with Java. JRuby was written to be an implementation of Ruby that runs on the JVM. But groovy has gone one step too far. It has incorporated one of Java's worst features! In Java, primitives do not throw exceptions for operations that are not within the range of the data type. That is to say, if you add 1 to Integer.MAX_VALUE you end up with Integer.MIN_VALUE! This is of course not only confusing but mathematically incorrect. This is the type of thing that should be taken care of at the programming language level in my humble opinion. So let's compare how Groovy and JRuby handle this simple calculation.
Groovy code:
i = Integer.MAX_VALUE
j = i + 1
println j
println j.class

k = 12345678901234567890
l = k + 1
println l
println l.class
Output:
-2147483648 
class java.lang.Integer 
12345678901234567891 
class java.math.BigInteger
JRuby code:
require "java"

i = java.lang.Integer::MAX_VALUE
j = i + 1
puts j
puts j.class

k = 12345678901234567890
l = k + 1
puts l
puts l.class
JRuby output:
2147483648
Fixnum
12345678901234567891
Bignum
I like Groovy a lot because of its tight integration with Java, but this is something that Groovy missed big time. Breaking away from tight Java integration would have been perfectly fine in this case to provide results that any sane person would expect, ie the one that is mathematically correct.
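
For what it's worth, the same wrap-around shows up in any JVM language that maps its integer type straight onto the 32-bit int, Scala included. The sketch below (object and method names are mine) shows the silent wrap, a checked alternative using java.lang.Math.addExact from Java 8 and later, and the arbitrary precision route:

```scala
object OverflowDemo {
  // Int is a 32-bit JVM int, so this silently wraps around.
  val wrapped: Int = Int.MaxValue + 1

  // Math.addExact throws ArithmeticException instead of wrapping.
  def checkedAdd(a: Int, b: Int): Option[Int] =
    try Some(java.lang.Math.addExact(a, b))
    catch { case _: ArithmeticException => None }

  // BigInt is arbitrary precision, so the result is mathematically correct.
  val exact: BigInt = BigInt(Int.MaxValue) + 1
}
```

Neither alternative is free: the checked add costs an exception on overflow and BigInt costs allocation, which is presumably part of why the primitive behavior was kept.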
Posted by bweber Jan 23 2008, 08:39:54 AM EST
20080109 Wednesday January 09, 2008
Creating data in groovy with builders

We often have the need to create data. We use data for integration tests, to populate database tables in production for releases, and for many, many other reasons. This article talks about one way to get data into your database in a form that should allow it to be modified without too much trouble when changes come your way in the not so distant future...

Is this even a good idea?

Something that occurred to me recently is that builders, as implemented in Groovy (or JRuby for that matter), might be a good way to create and manage test data for some Java unit tests. But I wasn't really sure, so I set out to determine if in fact it was a good idea or not. If you aren't familiar with builders, read Builders. For better information, read Manning's Groovy in Action. From now on, I will assume you are already familiar with builders. If you are too lazy to follow the links, builders lend themselves well to displaying tree based data structures because the code is in a form that visually represents the data structure. It does this by using some neat meta-programming tricks and closures. If you aren't familiar with meta-programming or closures you might want to read about them first as well. Wikipedia probably has better information on closures. [Wikipedia Closures]

What about XML?

In the past I have used xml to manage data for unit tests and in theory this means that Java objects or XSLT can be used to modify the data. However, in practice, while either is adequate for modifying test data, neither is particularly simple or elegant. Adding columns, modifying relationships, etc is typically not a trivial task when the data is in xml. Let's be honest (and I can't believe I'm about to put this in writing), for test data, I want to modify it in a Spreadsheet (Excel or Open Office's Spreadsheet or iWork Numbers). Why? Well, because adding and removing columns is simple, macros are supported, copy and paste is simple, etc. So why am I bringing this up? Well, how about because data in a builder (a tree) can be exported to csv with very little code, modified and imported back into the builder with minimal code. So why not store the data in csv format then? I cannot think of any compelling reasons not to as a matter of fact, especially if you were to use the same data for tests in different languages! But for now, I am keeping my data in groovy code, code that I believe is very readable. (To be fair, you could go from xml to csv and back again, but reading and manipulating xml is more difficult than reading the same data in the builder tree. I have included the code below that goes from a builder to csv and back. I challenge you to write code that does the same for xml. It might be possible, but I believe that most, if not all, people would have to write more code to achieve the same result. If you decide to keep your data in csv and not groovy then it might be a perfectly acceptable alternative to use xml, I'm not knocking it, I just think it can be done in a simpler way.)

Trees vs. Graphs

Your first thought might be, well data for unit tests is probably an object graph, not a tree! The data is most likely going to go into a RDBMS after all. While this is true that the data is a graph, it can be represented as a tree(s). A node in the tree can point to another node in the tree, not just one of its parent nodes. Think xml refs if you are having a tough time following the idea. The problem is that unlike xml refs there is no check to verify that the other end of the connection exists. Therefore, I suggest loading the data into a RDBMS and let it do what it is good at, enforcing the relational integrity. For testing, I suggest an in memory database like HSQL. As an aside, dividing the data into logical groups is also a good idea. (This can be done regardless of whether you use a tree or a graph.) It allows us to manage logical data sets independently. For example, I might have users and roles which are kept separate from office locations.

Referential Integrity

Pointers to data in other tree nodes are not referential so data integrity is not ensured. Again, you are probably thinking, it sounds like I am trying to convince you that builders are NOT appropriate for creating and managing test data, but wait... the database can do it and do it well. So why not let the db do that work for you? Since the data itself does not guarantee the referential integrity you do not get static analysis time protection. However, you can create unit tests that insert your data into an in memory data base and this will ensure that your data has its referential integrity. Assuming of course that you run your unit tests and that you have foreign key relationships!

Data Builders and Data Persisters

So, what does this code look like? We'll get to it shortly. But first, let's talk about what the "real" code will do? Well, there are 2 primary things for each set of data.

Data Builders

First, there is the data builder which is our tree of data. This is a groovy file that contains our actual data values. An example:
    def getData() {
        new NodeBuilder().users {
            user(user_id: "123", first_name: "Joe", last_name: "Smith") {
                address(address_type: "home", street: "123 Main St.", city: "Springfield", state: "MA", zip: "12345")
                address(address_type: "work", street: "456 South St.", city: "Boston", state: "MA", zip: "98765")
            }
            user(user_id: "456", first_name: "John", last_name: "Doe")
            user(user_id: "789", first_name: "Jane", last_name: "Doe")
        }
    }

Data Persisters

Then there is our persister, which in our case uses GPath (similar to XPath, but walks Groovy objects) to pull data out of the builder and persist it. In my case that means populating Hibernate objects and then saving the hibernate objects.
    def persist(tree) {
        // tree is the root "users" node; visit each child (one per user)
        tree.children().each { userNode ->
          User user = new User()
          user.userId = userNode.@user_id
          user.firstName = userNode.@first_name
          user.lastName = userNode.@last_name
          // GPath: all nested "address" nodes of this user (possibly none)
          userNode.address.each { address ->
            Address addr = new Address()
            addr.addressType = address.@address_type
            addr.street = address.@street
            addr.city = address.@city
            addr.state = address.@state
            addr.zip = address.@zip
            user.addresses.add(addr)
          }
          userDao.save(user)
        }
    }

We separate these objects so that multiple data sets can be used depending on the environment or test suite. In other words, it is good to keep the data separate from the code that will manipulate the data.

The data

As an example, let's suppose we want to add some users to our database. Let's assume that there are 3 users and the first user has 2 addresses. The tree might look something like this in csv format:

users
,user,user_id,123,first_name,Joe,last_name,Smith
,,address,address_type,home,street,123 Main St.,city,Springfield,state,MA,zip,12345
,,address,address_type,work,street,456 South St.,city,Boston,state,MA,zip,98765
,user,user_id,456,first_name,John,last_name,Doe
,user,user_id,789,first_name,Jane,last_name,Doe
What might this data look like in xml?
  <users>
    <user user_id="123" first_name="Joe" last_name="Smith">
      <address address_type="home" street="123 Main St." city="Springfield" state="MA" zip="12345" />
      <address address_type="work" street="456 South St." city="Boston" state="MA" zip="98765" />
    </user>
    <user user_id="456" first_name="John" last_name="Doe" />
    <user user_id="789" first_name="Jane" last_name="Doe" />
  </users>
And in groovy?
users {
  user(user_id:"123",first_name:"Joe",last_name:"Smith") {
    address(address_type:"home",street:"123 Main St.",city:"Springfield",state:"MA",zip:"12345") 
    address(address_type:"work",street:"456 South St.",city:"Boston",state:"MA",zip:"98765")
  }
  user(user_id:"456",first_name:"John",last_name:"Doe")
  user(user_id:"789",first_name:"Jane",last_name:"Doe")
}

Which do you find to be the most readable? Some will say xml, most will say groovy. But more importantly, the impedance is zero for groovy because it is already groovy code and no transformation is necessary, whereas xml has to be transformed before the code can use it.

Let's suppose that we also have a Role table for security:
roles
,role,role_type,admin,description,System Administrator
,role,role_type,data_entry,description,Data Entry
What if we wanted to have a reference from a user to a role or vice versa? Imagine something like this snippet:
user,user_id,123,first_name,Joe,last_name,Smith
,role,admin
,role,data_entry

Getting back to an earlier point, seeing this should probably scare you a little. The string (think symbol if you like ruby or atom if you like many other programming languages) 'data_entry' is entered in two places so if it is modified in one place our structure breaks down. Well, with foreign key constraints in your db this should be solved for MOST (but not all) cases.
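
For the cases the database does not catch, the data set itself can be checked before it is loaded. The following is only a sketch of that idea in Java, not part of the original code: RoleReferenceCheck, danglingRoles and the misspelled role name are all made up for illustration, and the roles and user references are assumed to have already been read into plain collections.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class RoleReferenceCheck {

  // Returns every role name that some user refers to but the roles data set never defines.
  static Set<String> danglingRoles(Set<String> definedRoles, Map<String, List<String>> rolesByUser) {
    Set<String> missing = new TreeSet<>();
    for (List<String> refs : rolesByUser.values()) {
      for (String ref : refs) {
        if (!definedRoles.contains(ref)) {
          missing.add(ref);
        }
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    Set<String> definedRoles = Set.of("admin", "data_entry");
    // Hypothetical data: the second reference was renamed in one place only.
    Map<String, List<String>> rolesByUser = Map.of("123", List.of("admin", "data_entri"));
    System.out.println("dangling role references: " + danglingRoles(definedRoles, rolesByUser));
  }
}
```

An empty result means every reference in the user data points at a role that exists in the role data.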

Isn't it about time for some code?

Code to convert a Builder to csv:

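    def print_csv(node,indent) {
        // first field is the node name, followed by key,value pairs for each attribute
        def fields = [node.name()]
        node.attributes().each { attribute ->
            fields << attribute.key << attribute.value
        }
        // leading commas encode the depth; join avoids a trailing comma on each line
        def result = "," * indent + fields.join(",") + "\n"

        // recurse into the children, one level deeper
        node.grep() { child ->
            result += print_csv(child,indent+1)
        }
        result
    }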

Code to convert csv to a Builder:

    def static read_csv(str) {
        def lasts = []
        def root
        str.split('\n').each {line ->
           def arr = line.split(',')
           def depth = 0
           // determine depth
           while ( arr[0] == "" ) {
               arr = arr[1..<arr.size()]
               depth++
           }
           // determine parent node
           def parent
           if ( depth == 0 )
             parent = null
           else
             parent = lasts[depth -1]
           // create new node
           Node node = new Node(parent,arr[0])
           // is this our root?
           if ( lasts.isEmpty() ) {
               root = node
           }
           lasts[depth] = node
           // process attributes
           def key = null
           arr[1..<arr.size()].each {
               if ( key == null ) {
                   key = it
               }
               else {
                   node.attributes().put(key,it)
                   key = null
               }
           }
        }
        root
    }    

Conclusion

So is keeping data in this format for unit tests a good idea or not? That's for you to decide!

Footnote

PS After writing this code, I became aware of the ObjectGraphBuilder in Groovy, which allows for relationships between nodes in the tree, but only to parent nodes. Since we are interested in relationships to nodes in other branches or other trees I haven't seen a compelling reason to switch to the other Builder implementation yet.

Posted by bweber Jan 09 2008, 10:24:53 PM EST
20071105 Monday November 05, 2007
RubyConf 2007 Day 3 Highlights of day 3 for me:
  • Adhearsion
  • RSpec
  • Solr
  • Metasploit
Adhearsion

VOIP library built on top of Asterisk (which sounds awful to have to use). Demos were cool. Test coverage was ... low to put it mildly. :)

RSpec

RSpec unit testing is fine, demo of ping pong development was simple, but the merge of another test framework (Fit I think? I can't recall for sure.) into the soon to be released RSpec version was simply awesome. Hats off to the team for getting me interested in testing again for the first time in quite some while. Basically, they've designed a DSL for BDD and a really cool looking UI. I've read a fair bit before on BDD but never really saw what distinguished it from TDD to make me investigate it further. I'm actually looking forward to the new version of RSpec being released now so I can start testing RSpec BDD style. There are 3 ways that the new framework can be used: pure ruby, dsl and I forget... (probably some variation of the first 2). The dsl version basically split the description part into a dsl (in theory to be used by "business users", which would almost never happen in practice) and the implementation part was still in ruby. Ryan D. had a problem with the dsl version because it was too far away from ruby syntax, but no one in the crowd backed him up. In fact, I think it is a positive. I'd like to see JBehave and other BDD frameworks adopt the dsl so that tests could be written for any scenarios regardless of the implementation language!

Solr

So it's been a little while since I last used Lucene, but I really liked this presentation and I thought it was one of the most relevant talks. Solr basically exposes search via http. I have some questions about using Solr for federated searches which I'll be investigating further myself, but I'm sure that a significant percentage of projects have the need for something at least similar to Solr.

Metasploit

OK, I've seen metasploit before, but it's been ported from perl to ruby (the part of it that was in perl anyway)... so now it's assembly, C and ruby. And let's just say that attaching irb to running processes on remote boxes is way too easy and way cool.

Honorable mention

Justin's talk on identity (OpenID and CAS) was very practical and useful and rubigen might come in handy some day, but the presentation had too much video and not enough "let's roll up our sleeves and write some code" examples.
Posted by bweber Nov 05 2007, 10:58:26 PM EST
20071103 Saturday November 03, 2007
RubyConf 2007 Day 2 Highlights of day 2 for me:
  • IronRuby, JRuby and Rubinius
  • Mac OS X Loves Ruby
  • Matz Keynote

MRI (CRI) vs. IronRuby vs. JRuby vs. Rubinius vs. YARV

IronRuby, JRuby and Rubinius presentations opened the day. Nothing too new here. IronRuby is the port of Ruby to .NET. It has the farthest to go. JRuby is pretty far along. Rubinius contends that ports shouldn't be to .NET or the JVM, but to ruby itself, at least for as much of the runtime and kernel (read standard libraries) as possible. IMHO this would make the jobs of the IronRuby and JRuby teams much easier, but unfortunately, Rubinius is not complete so they cannot build on top of Rubinius. YARV is the much anticipated Ruby 1.9 virtual machine that promises significant performance improvements, but it wasn't really discussed in detail at this point.

Mac OS X Loves Ruby

Focused on Leopard. In particular focused on RubyCocoa and DTrace. The RubyCocoa examples were visually cool. One used Ruby to open a GUI and make it read a message out loud. Xcode was used as the ruby editor, which it appears Apple is trying to coerce developers into using. The GUI was built using drag and drop in Xcode and the ruby code was tied to GUI components by clicking and dragging in Xcode. Another example used a script to attach ruby to a running process (TextEdit) and manipulate the process (resize, change window name, change editor text) in real time in irb. This example was particularly interesting and grabbed my attention from a security standpoint. While this stuff was cool, I don't write software only for OS X so I won't be touching any of it. DTrace is for debugging operating system calls by applications and it is included with Ruby on Leopard.

Matz Keynote

Focused mainly on Ruby 1.9 and 2.0 features. 1.9 is a transitional release that breaks compatibility with 1.8.6 syntax. Ruby 1.9 will be released before the holiday season, i.e. end of 2007, but it seems like it might be a historical footnote. In fact, Ruby 1.8.6 will still be the production stable release. Ruby 1.9 will switch from green threads to operating system threads. 1.9 introduces (somewhat controversially) parameters after optional parameters. Matz cleared the air by stating that this was a transitional stage on the way to named parameters which will be added at some point in the future.

Oh, and everyone in attendance is aware that daylight saving time ends tonight.
Posted by bweber Nov 03 2007, 08:43:25 PM EST
20071102 Friday November 02, 2007
RubyConf 2007 Day 1 Day 1 of RubyConf 2007 is complete. Highlights of the day for me were:
  • Jim Weirich's presentation on Advanced Ruby Class Design
  • Nathan Sobo's presentation on Treetop
  • Ryan Davis' presentation on Hurting Code for Fun and Profit

Advanced Ruby Class Design

Well done presentation specifically tailored to developers coming from Java or C#. Included examples from some of the most mature ruby libraries around. Jim wanted to present examples that were fun and interesting and he delivered.

Treetop

I had to choose between this presentation and ropes. I'd still like to learn more about ropes someday, but I'm really glad I went to this presentation. Treetop isn't for everyone, but I love the ideas behind Treetop and I can't wait until it is slightly more mature (and [better] documented). I've already downloaded the gem (and facets as it is required) and played with Treetop a little. My first impression is that it does not work in JRuby 1.0.1 at all which is unfortunate. I'm not sure if this is a JRuby bug, a problem with something in facets or a problem with Treetop itself. UPDATE Recently released JRuby version 1.1b1 fixes the problem so it was a JRuby bug. Hopefully Treetop will be replacing Gold Parser Builder and antlr in my mythical developer's toolbox soon. This project embodies the ruby community for me, it is approaching a very old problem in a very new way (admittedly Nathan isn't the idea originator, but who is doing parsing this way in the Java/.NET world?) and a very ruby way.

Hurting Code for Fun and Profit

A non-technical presentation, but a very entertaining and relevant one nonetheless. I almost passed on it because I thought it was oddly named and the abstract contained words like "ascetic" which I did not know the meaning of. And the reference to "for fun and profit" ("Smashing the stack for fun and profit" being the reference, at least, that's the first reference to "for fun and profit" that I'm aware of) didn't really seem to make sense either. So what exactly is hurting code? Well, it turns out that it is hurting (refactoring/re-writing/modifying) code that you do not like instead of hurting the offending developer(s). Code that "you do not like" is very open to interpretation and Ryan basically defined it as code that doesn't sit well with you (instinctual) as opposed to some concise algorithmic approach. The presentation did mention a couple of tools (all developed by Ryan I believe) that could potentially help detect code that probably should not sit well with you, but they were not really the focus of the presentation.

Looking forward to day 2!
Posted by bweber Nov 02 2007, 08:49:41 PM EST
20071023 Tuesday October 23, 2007
Writing testable Java code with stateless collaborators Writing testable Java code is an often discussed topic that has been covered ad nauseam. So what could another blog post possibly contribute? Well, I've been following a simple pattern recently and it makes for some very simple, clean (imho) and testable code. I never call a method that resides within the calling class. Ok, so maybe never is too strong, but every time you call a method within the same class you violate the principle of isolation, which is key to unit testing. Why does calling a method from within the same class violate the principle of isolation? Because the methods are tightly coupled and it is no longer possible to test the calling method without also testing the called method. So what is the answer? Well, one way is to use collaborators. The objects can be used, stubbed or mocked for unit tests and the calling and called objects can be tested separately.

So what are some arguments against this type of pattern? Well, many people complain about the following points: (1) the code is no longer as readable, (2) the visibility of the code is changed, (3) the potential explosion of classes, and (4) the logic of an "Object" is no longer contained within the class. Let's discuss these points one at a time. Code samples are included below for a side-by-side comparison so read the code and think about it as you go through the four points.

1) The code is no longer as readable. This may be the point with the most merit. Instead of reading the code and simply scrolling up or down to continue reading, the reader must now find another class. Having a method name on the called class that is descriptive often makes the code just as readable, it simply requires the reader to go to another place to read the details of the implementation. IDEs alleviate this by allowing users to jump to other classes/methods.

2) The visibility of the code is changed. Instead of having a private or protected method inside of the class there is now an often times public method on another interface/class. However, the scope of the collaborator can be controlled inside the class so in my opinion the benefit of testability outweighs the changing of scope from method to field.

3) The potential explosion of classes. Methods can still be grouped together on collaborators by related functionality, but having a few extra classes is an acceptable trade-off for more testable code. Classes often tend to be simpler with this methodology. This can become an argument of preference between fewer, more complex classes or more classes that are simpler.

4) The logic of an object is no longer contained within the class. But it can be contained within a small number of classes in the same package, and those classes tend to be simpler and better defined in terms of what they do.

There probably is no right or wrong answer here; to a degree it is a matter of preference. However, I feel pretty strongly that having more testable code without any major drawbacks is worth adopting this pattern. This pattern works particularly well when the called code is a stateless service that can be injected by an IOC container, such as Spring.

Let's look at samples side by side.

# All the logic in one class (NOT easily testable because bar must be tested with foo)

public class Foobar {
  public void foo() {
    for ( int i = 0; i < 10; i++ ) {
      bar();
    }
  }

  public void bar() {
    // do bar
  }
}
# Using collaborator (foo and bar can be tested completely independent of each other)

public class Foo {
  private Bar bar;

  public void setBar(Bar bar) {
    this.bar = bar;
  }

  public void foo() {
    for ( int i = 0; i < 10; i++ ) {
      bar.bar();
    }
  }
}

public class Bar {
  public void bar() {
    // do bar
  }
}
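
To make the isolation claim concrete, here is a minimal sketch of how foo could be checked without running the real bar. Foo and Bar are copied from the sample above; CountingBar, FooIsolationCheck and callsMadeByFoo are names made up for this illustration, and the hand-written subclass simply stands in for whatever stub or mock you would normally use.

```java
class FooIsolationCheck {

  // Collaborator from the post; the real work would go in bar().
  static class Bar {
    public void bar() {
      // do bar
    }
  }

  // Caller from the post; it only reaches the collaborator through the injected field.
  static class Foo {
    private Bar bar;

    public void setBar(Bar bar) {
      this.bar = bar;
    }

    public void foo() {
      for (int i = 0; i < 10; i++) {
        bar.bar();
      }
    }
  }

  // Hypothetical hand-written stub: records calls instead of doing any real work.
  static class CountingBar extends Bar {
    int calls = 0;

    @Override
    public void bar() {
      calls++;
    }
  }

  // Runs foo() against the stub and reports how often the collaborator was called.
  static int callsMadeByFoo() {
    CountingBar stub = new CountingBar();
    Foo foo = new Foo();
    foo.setBar(stub);
    foo.foo();
    return stub.calls;
  }

  public static void main(String[] args) {
    System.out.println("bar() calls made by foo(): " + callsMadeByFoo());
  }
}
```

Nothing in this check depends on what the real bar does, which is the whole point: foo's loop can be verified on its own, and Bar gets its own separate test.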
Posted by bweber Oct 23 2007, 12:03:38 AM EDT
20070730 Monday July 30, 2007
JRuby script in a signed jar OK, let's dive into the world of JRuby a little further, specifically let's touch on the boundary between Java and JRuby again. One excellent way to run JRuby for certain situations is to put the JRuby code (along with some Java code) in a signed, executable jar. A fairly common reason to do this would be the following: you want to run some ruby code (in the JVM) and you want to make sure that no one modifies the contents of the JRuby file (thus you sign the jar). Of course this does not offer you perfect protection, but it adds another layer making it more difficult to subvert your efforts. For the time being, I will assume that you have JDK 1.5 and not JDK 1.6... which means that the scripting framework is not available to you. So, simply jar up the following class along with your ruby files and make the JRuby jar file available on the classpath. You should be sure to sign the jar when you create it and you should add pkg.JRubyRunner as your main class so the jar file is executable. Note that the java launcher ignores -cp when -jar is used, so "java -cp jruby.jar -jar MyJar.jar" will not pick up jruby.jar; either list jruby.jar in the Class-Path entry of your jar's manifest and run "java -jar MyJar.jar", or skip -jar and run "java -cp jruby.jar:MyJar.jar pkg.JRubyRunner" (use ; instead of : on Windows). Either way your jruby code runs AND since the jar file is signed no one will be able to modify your ruby files without invalidating the signature! (Again, with some effort of course a hacker could modify the ruby file if they had access to the jar.)
package pkg;

import org.jruby.Main;

public class JRubyRunner {
  public static void main(String[] args) throws Exception {
    runJRubyScript("ipc.rb", args);
  }

  public static void runJRubyScript(String name, String... args) {
    // modify the args for jruby
    String[] args2 = new String[2 + args.length];
    if (args.length > 0) {
      System.arraycopy(args, 0, args2, 2, args.length);
    }
    args2[0] = "-e";
    args2[1] = "require '" + name + "'";
    // execute the ruby script
    Main.main(args2);
  }
}

Now, for a few subtle points that you may have missed, especially considering the simplicity of the code. For starters, you may ask, why not just pass in the name of the ruby file? Why use the -e option at all? Well, the reason is that the ruby file is located in the jar file so it is not accessible via java.io.File, which JRuby uses to load the file. This makes perfect sense. And since it defeats the purpose to move the ruby file outside of the signed jar, we must find another way to get this to work. You could read in the entire contents of the file from the classpath and pass it into the -e option, but this is ugly and extremely prone to error, so we take advantage of an excellent feature of 'require' in JRuby. Require loads its files from the classpath. This is excellent, because it means that if we require just the file we want to run, it will be loaded from the classpath and subsequent requires will be loaded from the classpath as well.
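
The difference between a file on disk and an entry on the classpath can be seen with plain Java. The sketch below is not JRuby's loading code; it only illustrates the general mechanism, using a throwaway unsigned jar in place of the signed one. JarResourceDemo, its methods and the script name only_in_jar.rb are hypothetical, and it uses newer JDK conveniences (try-with-resources, readAllBytes) than the JDK 1.5 assumed above.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

class JarResourceDemo {

  // Builds a throwaway jar holding a single entry; it stands in for the signed jar.
  static File buildJar(String entryName, String content) {
    try {
      File jar = File.createTempFile("demo", ".jar");
      jar.deleteOnExit();
      try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
        out.putNextEntry(new JarEntry(entryName));
        out.write(content.getBytes(StandardCharsets.UTF_8));
        out.closeEntry();
      }
      return jar;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reads the entry through a class loader that has the jar on its classpath.
  // Returns null when the class loader cannot find the entry.
  static String readFromClasspath(File jar, String entryName) {
    try (URLClassLoader loader = new URLClassLoader(new URL[] {jar.toURI().toURL()}, null);
         InputStream in = loader.getResourceAsStream(entryName)) {
      if (in == null) {
        return null;
      }
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String name = "only_in_jar.rb"; // hypothetical script name
    File jar = buildJar(name, "puts 'hello from the jar'\n");
    // The entry has no path of its own on disk, so java.io.File cannot see it...
    System.out.println("visible as a plain file: " + new File(name).exists());
    // ...but a class loader with the jar on its classpath can still read it.
    System.out.println("visible on the classpath: " + (readFromClasspath(jar, name) != null));
  }
}
```

This is why handing the script name straight to JRuby fails while a classpath-based lookup such as require can still find it.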
Posted by bweber Jul 30 2007, 09:28:15 PM EDT
20070704 Wednesday July 04, 2007
Configuring JRuby in IntelliJ IDEA 6.0.5 So you know Java, you've played around with Ruby and now you are interested in trying out JRuby in your favorite IDE (which naturally is IntelliJ IDEA) or maybe you just want to try out Ruby/JRuby... Unfortunately, JetBrains is still a little behind with their Ruby, and especially JRuby, support for IntelliJ. But have no fear, just follow the steps below and you'll be up and running without too much trouble.

[Legal disclaimer: This was tested on Mac OS X with Java 1.5, JRuby 1.0 and IntelliJ 6.0.5, but should work on any Windows or *nix based system with IntelliJ 6.0.x and the JRuby 0.1 plugin.]

[Legal disclaimer: The screen shots in these instructions are not actually perfect or exactly what you will see at all stages of the process, however they should contain all the information required.]

  1. Install Ruby Plugin for IntelliJ Instructions [Don't look for a JRuby plugin, install the Ruby plugin.]
  2. Download and Install JRuby (unzip to the desired installation directory) Instructions
  3. [UPDATE: I PUT IN A FEATURE REQUEST AND JETBRAINS INCLUDED IT SO NOW THERE IS SUCH A THING AS A JRUBY SDK SO THIS STEP IS NO LONGER REQUIRED! YEAH JETBRAINS!!!!] Create a Ruby link (*nix) or bat file (Windows) [UPDATE: Creating a bat file named ruby.bat does NOT work. IntelliJ looks for ruby.exe which makes this trick more of a pain... On Windows it is probably simpler to create an External Tool for JRuby right now. That way you can right click on a file and execute it and it doesn't require any fancy hacks. Instructions for creating/configuring the "external tool" are below.] that points to the JRuby executable [The IntelliJ plugin requires the executable file be called "ruby" and not "jruby" so we simply trick it by creating a file that points to the real JRuby executable. Hopefully JetBrains will change this for future releases.]
    • *nix soft link: ln -s jruby ruby
  4. Create a project with a Ruby module (note: create a Ruby module NOT a JRuby module as you won't find any such thing as a JRuby module. Nor do you need one for that matter, as the Ruby module will do just fine thank you very much. I suppose that JRuby wouldn't be very good if it couldn't just replace Ruby as that is what it is designed to do after all! Well, sort of anyway.)


  5. Point IntelliJ to your JRuby SDK



So now you can execute Ruby files using JRuby as your runtime in IntelliJ. But what if you want to call out to some Java code? In JRuby you have to manage the classpath if you want to call out to java classes. You can do this by setting the $CLASSPATH environment variable or by using the $CLASSPATH global variable (which is only accessible after you have executed the " require 'java' " statement). Unfortunately, IntelliJ does not set either value when including a Java module. To be fair, the latter case wouldn't really be possible because of the sequencing that is required, but hopefully they do add support so that in the future the $CLASSPATH environment variable is set automatically for you so classpath dependencies can be managed by the IDE just like for other projects/modules.

The EASY CASE (Ruby modules only)

  1. Create java_cp file and enter classpath values [NOTE: Classpath values must end with a trailing slash due to the implementation of the java module in JRuby.]

  2. Include java_cp after java and before including any classes

  3. Run files by right clicking on the file and choosing Run or by selecting the file and hitting the Run shortcut key sequence.

The HARDER CASE (Ruby modules and Java modules in the same project)

  1. Change the ruby project SDK to Java (Otherwise your Ruby module will be fine but your Java modules will not work due to a bug in the Plugin.)

    See, I told you, you get an irrelevant, nasty error message!

  2. Create a JRuby External Tool (Carefully note the parameters and working directory used. Notice that I made the working directory the lib directory where I put my source file(s). This screen shot is inconsistent with the other screen shots taken because it came from another project, so just take my word for it. :))

  3. Run your Ruby files by selecting the file and choosing Tools -> JRuby.

So, there currently is not a perfect solution for running JRuby in IntelliJ IDEA, however these few steps should get you up and running in a minimal amount of time. In the simple case there are only really two steps that are required that wouldn't be present if you were just using the Ruby plugin itself: creating the link to the executable file and managing the classpath manually (in this case via an included file jruby_cp).

Posted by bweber Jul 04 2007, 10:20:46 PM EDT