Tuning the IBM JVM for large heaps

Recently I have been rewriting a lot of the Lucene search engine code for a web application that I'm currently supporting.  Our physical environment is a little different from most: we have a single very large computer (a leftover from a previous project) running several virtual machines.  The virtual machine that we currently use for our searching duties is a 64-CPU box with 128G of memory.

Before I go into the tuning saga, I should say that in a previous version of the search engine we ran many more (16), smaller JVMs, all on the same box.  They worked together through a main controller module that coordinated calls to all the searching JVMs.  After the rewrite of the searchers, there was a lot of pressure to get the new code out the door.  The new code partitions the data and searching duties in a very similar way to the previous version; however, I haven't had time to focus on getting all the parts broken up across multiple JVMs.  It's very doable, but due to customer priorities it hasn't been developed.  But I was in luck (kinda): our environment was a single computer, so I thought I would just run one large JVM.  I knew going in that garbage collection (GC) was going to be a problem; I just didn't realize how much.

Currently we are running the 64-bit IBM Java 6 JVM.  The IBM JVM is very different from the Sun JVM, and nowhere more so than in the GC tuning department.  In the Sun JVM you have all kinds of settings to tweak, algorithms to choose from, etc.  The IBM JVM just isn't as advanced: you have 4 different GC policies, plus sizing for various parts of the internals.  And that's it!  Great, it should be simple, right?  Well, sorta.

First try: bring the system online with a 50G (yes, that's gigabytes) heap and all the default settings, run some load tests, and see where we are.

Everything was running great right up to the first full GC, which took 25 seconds.  That doesn't sound that long, but when you are waiting for a computer to return results, it's an eternity.  For those who don't know, when a full GC occurs (some of the newer Sun algorithms are different) it stops the world (STW).  This means the application threads in the JVM are frozen until the GC is complete.  No good.
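One rough way to notice pauses like this from inside the application is to run a tiny watchdog loop that sleeps for a short, fixed interval and checks how much longer the sleep actually took.  The sketch below is only an illustration and not something from our code base: the class name, the 10ms interval and the 100ms reporting threshold are all made up, and a long stall can also come from the OS scheduler or the hypervisor, not only from a STW GC.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not from the original application): estimates
// stop-the-world stalls by measuring how far a short sleep overshoots.
class PauseDetector {
    // How many ms the observed elapsed time exceeded the requested sleep (never negative).
    static long overshootMillis(long startNanos, long endNanos, long sleepMillis) {
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        return Math.max(0, elapsedMillis - sleepMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        final long sleepMillis = 10;           // sampling interval (assumed value)
        final long reportThresholdMillis = 100; // only report stalls at least this long (assumed value)
        // Number of samples to take; defaults to 100 (about one second).
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 100;

        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            Thread.sleep(sleepMillis);
            long stall = overshootMillis(start, System.nanoTime(), sleepMillis);
            if (stall >= reportThresholdMillis) {
                System.out.println("Stall of ~" + stall + " ms (possibly a stop-the-world GC)");
            }
        }
    }
}
```

In a real service this loop would run in a daemon thread for the life of the process; the bounded loop here just keeps the example easy to run.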

So the default policy is the optthruput policy (-Xgcpolicy:optthruput).  According to the IBM documentation, this policy should be used for maximum throughput, at the expense of pauses during GC.  They also mention that it should be used for batch processing, where pauses aren't really a problem.

Next I tried the optavgpause policy (-Xgcpolicy:optavgpause), which is supposed to smooth out the GCs by kicking off the mark phase early.  I'm not going to talk about mark, sweep, and compaction; you can read about them here (http://www.ibm.com/developerworks/ibm/library/i-incrcomp/).  But basically it's supposed to run a concurrent parallel mark phase before the JVM runs out of heap space and performs a full STW GC.  This did help, getting us down into the 10-15 second range on average, but the problem was that the mark phase started too late under heavy load and didn't give us a whole lot of concurrent marking before the STW.  I found this out by adding -verbose:gc, which writes a lot of GC debugging information to the JVM's console output (standard error by default).  You should just run this all the time; it provides a lot of useful information about your application.
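Besides the verbose:gc log, the standard java.lang.management API exposes cumulative GC counts and times from inside the JVM (this is the same kind of data jconsole shows).  The snippet below is a minimal sketch, not what we actually ran: the class name is made up, the collector names it prints differ between JVMs and GC policies, and the reported time is accumulated collection time, which for concurrent policies is not necessarily all pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints per-collector GC totals using the standard MXBeans.
class GcStats {
    // Average time per collection in ms; 0 when nothing has been collected
    // (or when the JVM reports the count as undefined, i.e. -1).
    static double averageMillis(long totalMillis, long count) {
        return count <= 0 ? 0.0 : (double) totalMillis / count;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            long time = gc.getCollectionTime();   // accumulated ms, -1 if undefined
            System.out.printf("%s: %d collections, %d ms total, %.1f ms average%n",
                    gc.getName(), count, time, averageMillis(time, count));
        }
    }
}
```

Because the numbers are cumulative, sampling them before and after a load test and taking the difference gives a quick sanity check against what the verbose:gc log says.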

Next I tried subpool (-Xgcpolicy:subpool).  It worked fine but was plagued by the same problem: slow full GCs.  Subpool is supposed to work better on large SMP machines like ours, but in the end it was more of the same.  Its full GCs were in the 10-12 second range, plus the application seemed to run about 10% slower.

And last I tried gencon (-Xgcpolicy:gencon), the newest of all their policies.  Gencon is supposed to be used on "transactional systems", systems that create a lot of short-lived objects.  Isn't that what most Java applications do?  When I started it up, it seemed to be faster, and our load test confirmed that: 30% more throughput.  But the amazing thing was that the full GCs were fast, really fast, in the 200-400ms range, for a 50G heap!  But wait, it wasn't using most of the heap, and it was GCing all the time.  Back to the verbose:gc log, AHA!  The nursery was too small, about 10% of the heap, and because our application creates almost nothing but short-lived objects, I decided to increase the size of the nursery.  I gave it 50% of the heap, and the GCs became less frequent, while the time to perform each GC was still in the 350ms range.  Awesome!

I finally settled on a 100G heap with 50% of it as nursery, and the full GCs are now in the 400-600ms range.  I can live with that, because it gives us a huge ceiling for load and capacity.
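For reference, here is a small sketch that turns "heap size plus nursery percentage" into a set of IBM JVM options.  Treat it as an assumption-laden illustration: I'm assuming the nursery is sized with -Xmn and the maximum heap with -Xmx, the helper class is invented for this post, and SearchServer is a placeholder main class, not the real application.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds gencon options from a heap size and a nursery fraction.
class GenconFlags {
    static List<String> flags(int heapGigabytes, double nurseryFraction) {
        // Express the nursery in megabytes so fractional gigabytes stay exact enough.
        long nurseryMegabytes = Math.round(heapGigabytes * 1024L * nurseryFraction);
        List<String> flags = new ArrayList<>();
        flags.add("-Xgcpolicy:gencon");            // generational concurrent policy
        flags.add("-Xmx" + heapGigabytes + "g");    // maximum heap size
        flags.add("-Xmn" + nurseryMegabytes + "m"); // nursery size (assumed to be the right knob)
        flags.add("-verbose:gc");                   // keep the GC log on
        return flags;
    }

    public static void main(String[] args) {
        // 100G heap with half of it as nursery; SearchServer is a placeholder class name.
        System.out.println("java " + String.join(" ", flags(100, 0.5)) + " SearchServer");
    }
}
```

For the 100G heap with a 50% nursery this works out to -Xmx100g and -Xmn51200m.  Check the options against the documentation for your JVM version before relying on them.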

So to summarize, if your application needs to run a huge heap and you are using the IBM JVM, I would start with the gencon policy.  It seems to be the most modern of all their policies, and it worked the best for us.

Good luck!

7 Comments

André Galastri said:

Very interesting!
But I became curious about how much memory a full 300-400ms gencon gc is able to reclaim.
Do you have an idea?

Aaron said:

At full load with the 100G heap it was collecting between 45G and 55G of heap space in about 400-600ms. The smaller setting of 50G total with a 50% nursery was yielding 200-400ms collections with 20-25G being reclaimed. BTW, I was watching the heap with jconsole and the verbose:gc JVM setting.

Wow, a 100G heap on a regular JVM? Totally cool! How often do these full GCs occur?

BTW, can you use the IBM JDK in production without paying fees?

Alex said:

The gotcha you don't describe is compaction. No matter your GC algorithm, allocating chunks and GC-ing them will fragment the heap, and a JVM with a big heap can suffer seriously from compaction time. I'd suggest you dive some more into this topic and share your observations.

Peter B. said:

100G heap sounds great!
While researching, I've found this interesting post about tuning applications using different JVMs.

http://bigdatamatters.com/bigdatamatters/2009/08/jvm-performance.html

Paloma said:

Your article is great! I wonder where you find materials for such articles? While writing an article, I always use books and periodical search engine( http://www.pdfqueen.com ), perhaps you know some other good variants? Thank you in advance.

Aaron said:

Most of the content was gathered from IBM's web site, mostly from their JVM tuning PDFs.
