Concurrency in Java is a nightmare. I used to think that when I first started using threads. After getting the hang of it a little I thought maybe, just maybe it wasn't so bad. Then I saw Brian Goetz speak at No Fluff and I realized that the state of concurrency in Java is impossible for a mere mortal to comprehend. Whenever a topic becomes too difficult, I have learned to step back and look at the big picture. There must be something that I am fundamentally doing wrong. I think that this is one of those situations, only it took over a decade for most of us to figure out, while a small group of people were shaking their heads the whole time.
That small group of people are the developers who understand functional programming languages. Functional programming languages avoid shared mutable state, which makes concurrency in them a joy compared to Java and most OOP languages. I recently took some time to start learning Erlang (Pragmatic Studio, very well done Joe and Dave!) and I was very impressed with what I saw. Erlang is well known for having concurrency baked into its DNA, but I was impressed with several other things, things I had seen before that just felt so natural in Erlang: higher-order functions, list comprehensions, hot deployment, and most importantly pattern matching. This isn't a blog post about any of those topics, however. This is about concurrency and, as the title says, concurrency on the JVM.
One of the things that I did NOT like about Erlang was that the sequential programming features seemed incomplete to me coming from an OOP world. Not to mention commas, semi-colons, periods and nothing... and when to use which! I remember wishing in class that Erlang ran on the JVM so I could use it for all my concurrency needs and call out to Java/JRuby/Groovy/SISC/etc for the meat of my application logic. Using the xmerl library (I know that there is a better one out there, but c'mon no xml library should be this difficult) made me really long for something better. So once again I took a step back and tried to look at the big picture.
Before going to the Erlang Pragmatic Studio I had started to read up on Scala (which I find difficult because the documentation is sparse and fairly poor) and Scala's actor library in particular. I admit that when I first looked at it I didn't fully get it. Learning Erlang helped cement some concepts that make the actor library much, much simpler to understand. While I still prefer the Erlang way of receiving messages in particular, I found the Scala actor library to be decent. It is also worth noting that the Scala actor library comes in two flavors: thread-based actors, where each actor holds on to a thread while it waits, and event-based actors, which behave like lightweight processes (in the actor sense, not OS processes). To me, the whole reason to use the actor library is to NOT be using threads, so I highly recommend the event-based actors.
Actor concepts
In an actor-based concurrency model, there are 3 fundamental concurrency primitives. First, there is spawn. Spawn creates a new actor, which has a mailbox where it can receive messages. In Erlang, spawn spawns functions; in Scala, the equivalent is creating a new Actor and starting it (think of creating a new thread and starting it). The second primitive is send (!). This is pretty similar in both Erlang and Scala: receiver ! message means send the message "message" to "receiver". Scala actually has some additional methods for sending messages but we won't cover them here. Finally, there is receive, where a message is pulled from the mailbox if it matches a pattern and can then be acted upon. Erlang blows away Scala on pattern matching from what I can tell, but I am admittedly not an expert on Scala pattern matching.
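To make the three primitives concrete, here is a minimal sketch of a toy actor built only on the JDK's LinkedBlockingQueue. It is not how Erlang or the Scala actor library work internally; the names ToyActor, Stop, spawn and demo are made up for illustration, and this toy deliberately uses one thread per actor and drops unmatched messages (Erlang would leave them in the mailbox).

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable.ListBuffer

object MailboxSketch {
  case object Stop // hypothetical shutdown message for this sketch

  final class ToyActor(handler: PartialFunction[Any, Unit]) {
    // The mailbox is private: other code can only add to it through `!`.
    private val mailbox = new LinkedBlockingQueue[Any]()

    private val worker = new Thread(new Runnable {
      def run(): Unit = {
        var running = true
        while (running) {
          mailbox.take() match { // receive: block until the next message arrives
            case Stop                            => running = false
            case msg if handler.isDefinedAt(msg) => handler(msg)
            case _                               => () // unmatched: dropped in this toy
          }
        }
      }
    })

    def !(msg: Any): Unit = mailbox.put(msg) // send: enqueue and return immediately
    def start(): Unit = worker.start()
    def join(): Unit = worker.join()
  }

  // spawn: create the actor and start it.
  def spawn(handler: PartialFunction[Any, Unit]): ToyActor = {
    val a = new ToyActor(handler)
    a.start()
    a
  }

  // Sends three messages; only the strings match the handler's pattern.
  def demo(): List[String] = {
    val seen = ListBuffer.empty[String]
    val collector = spawn { case s: String => seen += s; () }
    collector ! "hello"
    collector ! 42
    collector ! "world"
    collector ! Stop
    collector.join() // after join, the worker's writes to `seen` are visible here
    seen.toList
  }
}
```

Calling MailboxSketch.demo() sends two strings and an integer; the handler only sees the strings because the integer matches no pattern.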
So some important topics to discuss now: What is an Actor, immutable state and pattern matching, and process linking.
What is an actor?
An actor is like a thread but without shared state. I sometimes refer to it as a programming-language process because an actor is a process, but not an operating system process. It is a process that is managed by the language runtime and therefore must be very lightweight. Erlang has spent a lot of time making its processes very lightweight. Erlang doesn't use the term actor, it just has processes, but they are the same thing. Each process or actor has a mailbox where it receives messages from other processes. Messages are the ONLY way that processes can communicate since they have no shared state.
Immutable state and pattern matching
Immutable state is key to functional programming languages: they are intended to have no side effects, and immutable state means that no function can change the value of a variable and thus introduce a side effect. Pattern matching in Erlang is brilliant. The = operator matches its left side against its right side. Conceptually there are 3 things that can happen here. If the left side is an unbound variable, it will be bound to the value from the right side. This looks like variable assignment from OOP. If the left side is already bound, it must match the value from the right side or an error (badmatch) will be raised. This is because variables are immutable and cannot be "assigned" another value. Finally, if the left side is a pattern that partially matches, its unbound variables are bound to the corresponding values from the right side. A tuple is the simplest way of understanding this.
A = "a",
B = "b",
{A,C} = {A,B}.
This will match because the left side and right side are tuples of the same size and none of the matching values would change state. Since A equals A we are OK, and since C is unbound it will be bound to the value of B.
A = "a",
B = "b",
{A,B} = {A,C}.
This would not match because C is unbound: the right side of = must be fully bound, so Erlang rejects the code before any matching happens.
Likewise, the following would not match because the values of B and C do not match:
A = "a",
B = "b",
C = "c",
{A,C} = {A,B}.
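For comparison, the same bound and unbound cases can be sketched with Scala's match expression. In Scala, a lowercase name in a pattern always creates a fresh binding, so an already-bound value has to be written in backticks to mean "must equal this". The object and method names below are made up for illustration, and where Erlang would raise badmatch this sketch returns None or false instead.

```scala
object MatchSketch {
  // Erlang: {A,C} = {A,B} with A bound and C unbound.
  // `a` in backticks must equal the existing value; c is a fresh binding.
  def bindSecond(a: String, pair: (String, String)): Option[String] =
    pair match {
      case (`a`, c) => Some(c)
      case _        => None // Erlang would raise badmatch here
    }

  // Erlang: {A,C} = {A,B} with A and C both already bound.
  def bothBound(a: String, c: String, pair: (String, String)): Boolean =
    pair match {
      case (`a`, `c`) => true
      case _          => false
    }
}
```

With a = "a", matching the pair ("a", "b") binds the second element, while bothBound("a", "c", ("a", "b")) fails for the same reason as the last Erlang example: the bound value "c" does not equal "b".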
Fault tolerance
So if you have heard of Erlang, you have probably heard that it makes concurrency and fault tolerance easy. We've looked at the concurrency primitives, but what about the fault tolerance primitives? They are link, unlink and process_flag. These are conceptually simple. If process A links to process B, the link goes both ways. If one of them dies, it sends an exit signal to every process it is linked to as it goes down. What a linked process does with that signal depends on whether it has made itself a system process by calling process_flag(trap_exit, true). If it has, it traps the exit signal and receives it as an ordinary message; if not, it dies as well. So as you can probably imagine, it is easy to create graphs of processes that die together when certain failures occur and that can be restarted or resurrected by some system processes. Scala's actors have these concepts as well. I won't go into any more detail about them here, but if you follow the links at the bottom of this post you can find some additional information in the Scala actor API.
So now that I have given a very long-winded, ill-explained description of actors, let's look at a real example in Erlang and in Scala:
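The way an exit spreads through linked processes can be sketched with a small single-threaded model. This is only a toy to show the propagation rule, not Erlang's or Scala's implementation: Proc, link and kill are hypothetical names, there is no real concurrency, and only abnormal exits are modeled.

```scala
import scala.collection.mutable

object LinkSketch {
  // A toy process: a name, a trap-exit flag, its links, and a mailbox of exit notices.
  final class Proc(val name: String, val trapExit: Boolean = false) {
    val links: mutable.Set[Proc] = mutable.Set.empty
    val mailbox: mutable.ListBuffer[String] = mutable.ListBuffer.empty
    var alive: Boolean = true
  }

  // Links are bidirectional.
  def link(a: Proc, b: Proc): Unit = {
    a.links += b
    b.links += a
  }

  // Abnormal death: signal every linked process and let the exit spread.
  def kill(p: Proc, reason: String): Unit =
    if (p.alive) {
      p.alive = false
      val linked = p.links.toList
      p.links.clear()
      for (other <- linked) {
        other.links -= p
        if (other.trapExit) {
          // Trapped: the signal arrives as an ordinary message.
          other.mailbox += s"EXIT from ${p.name}: $reason"
        } else {
          // Not trapped: the linked process dies as well.
          kill(other, reason)
        }
      }
    }
}
```

Linking a to b and b to a trapping process sup, then killing a, takes b down with it while sup survives and finds an exit notice from b in its mailbox. That notice is what a supervisor would use to decide whether to restart something.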
Erlang
-module(blog).
-compile(export_all).

%% Send one request, wait for the reply, then tell the server to stop.
client(Pid) ->
    Pid ! {self(),request,foo},
    receive
        {Pid,response,Response} ->
            io:format("got response ~p~n",[Response]),
            Pid ! exit
    end,
    io:format("client done~n").

%% Answer requests until any other message arrives.
server() ->
    receive
        {From,request,Request} ->
            io:format("got request ~p~n", [Request]),
            sleep(1000),
            From ! {self(),response,bar},
            server();
        _ ->
            io:format("server done~n")
    end.

%% Block for Time milliseconds without consuming any messages.
sleep(Time) ->
    receive
    after Time -> void
    end.

foo() ->
    Pid = spawn(fun server/0),
    spawn(fun() -> client(Pid) end),
    exit.
Scala
import scala.actors.Actor
import scala.actors.Actor._
import scala.actors.OutputChannel
import scala.actors.TIMEOUT

case class Request(data:Object)
case class Response(data:Object)
case object Exit

class Client(server:Actor) extends Actor {
  def act() {
    server ! new Request("foo")
    react {
      case Response(data) =>
        Console.println("got response " + data)
        server ! Exit
        println("client done")
        exit()
    }
  }
}

class Server() extends Actor with Sleeper {
  def act() {
    loop {
      react {
        case Request(data) =>
          println("Got request " + data)
          sleep(1000,(sender:OutputChannel[Any]) => {
            sender ! new Response("bar")
          },sender)
        case _ =>
          println("server done")
          exit()
      }
    }
  }
}

trait Sleeper {
  // Match only TIMEOUT so messages that arrive during the sleep stay in the mailbox
  def sleep(time:Long,fun:(OutputChannel[Any])=>Unit,sender:OutputChannel[Any]) {
    reactWithin(time) {
      case TIMEOUT =>
        fun(sender)
    }
  }
}

object Foo extends Application {
  val s = new Server()
  val c = new Client(s)
  s.start
  c.start
  println("exit")
}
So what does it do?
This code spawns two actors, one as a client and one as a server. The server listens for request messages and responds with response messages after sleeping for 1 second. If a message comes in that is not a request message, then the server exits. The client sends a request message to the server and then listens for the response. Once it receives its response, it sends the server a message that is not a request, which acts as a shutdown command.
Not particularly interesting, but it does demonstrate some of the key differences between Erlang's processes and Scala's actors. One of the biggest differences is that react, the event-based receive in Scala, is a method that does not return. This means that any logic must be placed inside the react block, because code after it will never be executed. Erlang can pattern match on atoms and tuples, while the Scala example pattern matches on case classes. One of Erlang's strong points is distributed code. In Erlang, the code that talks to a process does not need to know whether that process is running on the same machine or remotely. It simply doesn't matter. Scala has remote actors as its equivalent, but the things I've covered here so far are really for replacing threads within a single JVM. The concepts don't change and maybe that will be the topic of a future blog post.
Resources:
- Scala
- Scala Actors
- Erlang
- Ruby Actors
Update
If you did not need the sleep function, you could use this shorthand notation to define the actors. Notice that the server actor is assigned to a variable that the client actor uses.
import scala.actors.Actor
import scala.actors.Actor._

case class Request(data:Any)
case class Response(data:Any)

object Foo extends Application {
  // Define the server first so the client never refers to it before it is assigned
  val server = actor {
    loop {
      react {
        case Request(data) =>
          Console.println("got request " + data)
          sender ! new Response("bar")
        case _ =>
          println("exiting server")
          exit()
      }
    }
  }
  actor {
    server ! new Request("foo")
    react {
      case Response(data) =>
        Console.println("got response " + data)
        server ! "Exit"
        exit()
    }
  }
}
4 Comments
0 TrackBacks
Listed below are links to blogs that reference this entry: JVM Concurrency with Scala Actors.
Scala does not have the Erlang/OTP library and none of the hot upgrade features. How does its performance compare to Erlang? I couldn't find a comprehensive test that compares the number of clients, latency and throughput of both implementations.
1. google : scala otp
2. performance ( you can google that one too)
Scala is _of course_ not as fast as Erlang, which is related to the very basic structure of the underlying VM (BEAM vs. the JVM).
I think I read some benchmark showing that Scala actors get about 60% of the performance of Erlang.
However most people tend to ignore that fact and still go with scala because they can "plug" Scala MUCH more easily in their infrastructure than Erlang.
Great article, but can you please elaborate further on: "Erlang blows away Scala on pattern matching from what I can tell, ". I haven't found this to be the case at all as Scala can also match on type (via case classes) and can do all the tuple matching that Erlang can do but like yourself I am no expert so I would like to hear your observations.
Just a side note that your updated code can also easily implement the sleep functionality as you had in the first Scala example.
@Buddy, The performance metric really doesn't matter as it's a vertical measurement. The big deal is that with the concurrency measures in place from both these language solutions, you can scale horizontally. On a side note, Scala runs on the JVM which has been heavily optimized for performance but as far as I know doesn't support TCO (but I believe that Scala itself takes care of it)
Thanks again for the article and I am interested in hearing more.
ilan berci
OTP is being replicated:
http://jonasboner.com/2008/12/11/real-world-scala-fault-tolerant-concurrent-asynchronous-components.html
typically jvm is much faster in benchmarks than erlang. scala actors come in 2 flavors and the "react" event driven with ptr passing as opposed to erlang's msg copy are likely faster. but this is all apples and oranges. if you like the jvm features use it. if you like the brevity of erlang's language and ecosystem, use it.