Is F# faster than C#

| | Comments (8) | TrackBacks (0)

Description

I had a bet with a co-worker that C# would outperform F# for a simple counting exercise. This turned out to be quite a surprise for many reasons. In this blog post I will try to explain why.

Problem

Sum the numbers from 1 to n using brute force (can't use the (1 + N)*(N/2) equation). Do this in both C# and F# and compare the speed. Since F# needs to do this recursively I thought this should be a no contest win for C#.

Using F#

We initially wrote the code without using tail recursion which immediately ended with a stack overflow. So we went back and wrote it doing it the functional way, with tail recursion. Here is what we came up with.

module FSharpLib

	let sumNum fromNum toNum =
		let rec sumNumTail fromNum toNum result =
			if fromNum > toNum then 
				result 
			else 
				sumNumTail (fromNum+1L) toNum (result+fromNum)
		sumNumTail fromNum toNum 0L

Using C# - Attempt 1

You'll see later why I say attempt 1. But, this is the simplest C# code I could think of.

public static long CSharpForLoop(long fromNumber, long toNumber) {
	long result = 0;
	for (long i = fromNumber; i <= toNumber; i++) {
		result += i;
	}
	return result;
}

The Results - Attempt 1

This is the results of running the above code using a fromNumber of 1 and a toNumber of 10000000 and running it 10 times alternating between C# and F# to filter out start-up times, etc.

F#0.8606902s11% faster
C#0.9556198s

I ran it like 20 times because I couldn't believe it. I looked at the C# code for any possible optimizations I could add. But, being so simple there wasn't a lot I could do.

Reflector is an amazing tool

I turned to my good friend Reflector to see what was going on. Here is what the F# code looked like when viewing it as C# code.

public static long sumNum(long fromNum, long toNum)
{
    FastFunc<long, FastFunc<long, FastFunc<long, long>>> sumNumTail = new clo@7();
    return FastFunc<long, long>.InvokeFast3<long, long>(sumNumTail, fromNum, toNum, 0L);
}
and clo@7.Invoke looks like
public override long Invoke(long fromNum, long toNum, long result)
{
	while (fromNum <= toNum)
	{
		result += fromNum;
		toNum = toNum;
		fromNum += 1L;
	}
	return result;
}
They used a while loop instead of a for loop. OK, I can do that. But, what happened to tail, I thought that tail is why the CLR is so good at functional language implementations.

C# - Attempt 2

public static long CSharpWhileLoop(long fromNumber, long toNumber) {
	long result = 0;
	while (fromNumber <= toNumber)
	{
		result += fromNumber;
		// I left out the toNum = toNum because it didn't do anything.
		fromNumber += 1L;
	}
	return result;
}
and the results (I've got you now F#)...

F#0.8606902s
C# For Loop0.9556198s
C# While Loop0.9105592s

Oh wait... DAMN! What could F# be doing to make it almost 10% faster than C#.

To the IL code Batman

This is the IL for the F# code.

	L_0000: nop              // while loop
	L_0001: nop 
	L_0002: ldarg.1          // fromNum <= toNum
	L_0003: ldarg.2 
	L_0004: ble.s L_0009     
	L_0006: nop              // return result
	L_0007: ldarg.3 
	L_0008: ret              
	L_0009: nop              // result += fromNum
	L_000a: ldarg.1 
	L_000b: ldc.i8 1
	L_0014: add              
	L_0015: ldarg.2          // fromNum += 1L
	L_0016: ldarg.3 
	L_0017: ldarg.1 
	L_0018: add              
	L_0019: starg.s result   // loop
	L_001b: starg.s toNum
	L_001d: starg.s fromNum
	L_001f: br.s L_0000      
	
This is the IL for the C# while loop.
	L_0000: nop                   // long result = 0L
	L_0001: ldc.i4.0
	L_0002: conv.i8 
	L_0003: stloc.0               
	L_0004: br.s L_0012           // while loop
	L_0006: nop                   // result += fromNumber
	L_0007: ldloc.0 
	L_0008: ldarg.0 
	L_0009: add                   
	L_000a: stloc.0               // fromNumber += 1L
	L_000b: ldarg.0 
	L_000c: ldc.i4.1 
	L_000d: conv.i8 
	L_000e: add                   
	L_000f: starg.s fromNumber    
	L_0011: nop 
	L_0012: ldarg.0               // fromNumber <= toNumber
	L_0013: ldarg.1 
	L_0014: cgt 
	L_0016: ldc.i4.0 
	L_0017: ceq 
	L_0019: stloc.2 
	L_001a: ldloc.2 
	L_001b: brtrue.s L_0006       // loop
	L_001d: ldloc.0 
	L_001e: stloc.1 
	L_001f: br.s L_0021
	L_0021: ldloc.1 
	L_0022: ret 
	

What just happened.

Wow, the C# compiler is quite a bit more verbose. First of all, C# makes the while loop into a do/while loop. Secondly, in the loop condition C# creates a temporary variable to store the comparison and then checks for 'true' ("brtrue.s"). If you look at the F# IL you'll see the use of the "ble.s" instructions (branch if less or equal) instead. Finally we see that C# uses a few "conv.i8" instructions (convert to 8 byte/64-bit int) which I can only guess adds to it's defeat.

Conclusion

So is F# just a more performant language or is C# just compiling to more robust IL code which as F# matures will end up suffering the same fate? I personally think (hope actually) the later is true. I'm not an expert in compilers but I would have to assume the C# compiler team is using every trick in the book to make highly performant IL code without sacrificing robustness and I can only assume that F# will get to a point where it needs to make the same concessions to handle some obscure cases.

8 Comments

Very nice article! I think you are dead on though with your assumption that once F# is fully integrated into VS 2010/.NET 4.0 it might succumb to a very slight performance hit.

From what I understand right now in the CTP its not fully integrated inside the .NET 4.0 CLR.

Paul McLeish said:

By looking at the IL you've presented it looks like you were running both samples in debug mode. Did you try release mode?

I played with the C# code you have above and found the following:

In debug mode, both the for loop and while loop versions perform roughly the same.

In release mode, the while loop performed ~1.4x faster than the debug version, and the for loop performed ~3.6x faster (for loop was ~2.5x faster than while loop).

I have never used F#, so I can't give any results regarding debug/release mode differences.

But you may want to run your tests again to see if there are any differences (Maybe you didn't lose the bet after all... maybe).

chaiguy said:

My thoughts are if both versions are equivalently "correct", then C# is simply at fault for compiling to less optimized code. That is to say that there's no excuse why complexities elsewhere in the language mean that it has to create a poorer compilation as compared to the F# compiler.

Personally I'd rather see the C# compiler become properly optimized in comparison to the F# compiler, not F# become slower. Don't know why you would "hope" for the latter. *rolls eyes*

Interesting post, nonetheless. Looking forward to fiddling with F#.

Ryan Morlok said:

F# is based on OCAML which was itself known for good performance. It allowed programmers to easily implement algorithms that would run at near-native speeds. It was even used with good success in some national programming competitions. The entire F# compiler has, of course, been developed from the ground up to output IL, but the language lends itself well to optimized output code.

How about comparing to a LINQ implementation? Good or bad, I'd be interested in seeing the results :)

"I would have to assume the C# compiler team is using every trick in the book to make highly performant IL"

Are you kidding? The C# compiler does far less stuff, as you just saw right here. The F# compiler determined the code was a loop, and compiled it so. (It didn't need to use the tail prefix 'cause it did even better.)

I'm not saying the C# compiler doesn't optimize, but it tries to keep a direct mapping from source to IL. (I think this is even sort of a goal for the C# compiler).

I think I heard that F# will do other things, like when creating a closure, it could pass captured values in as arguments instead of a closure object+fields.

Second, can you explain why you think the C# IL is more "robust"?

Anyways, it looks as if you compiled with debug (all the nops), so this isn't a very good comparison after all...

Steve Hawley said:

I looked at the output of the same code in managed C++ in release and found two issues: (1) the compiler is making some fairly egregious mistakes in terms of doing redundant loads and store and (2) for some bizarre reason is loading 32 bit constants and converting them to 64 bits.

The JIT, however, does a fairly decent job of turning the IL into x86 and nearly everything is in registers.

I'm impressed with the F# compiler - that's pretty nice IL for the task.

ben said:

This looks like unoptomized code with all those nops..

Leave a comment


Type the characters you see in the picture above.

0 TrackBacks

Listed below are links to blogs that reference this entry: Is F# faster than C#.

TrackBack URL for this entry: http://www.nearinfinity.com/mt/mt-tb.cgi/576