Monday, October 24, 2011

On using "PermGen" as an application-level cache

I was reading an interesting article, 'Assault by GC', by a Stack Exchange guy, and it felt like déjà vu with my past couple of years of development on the JVM. It struck me that we can definitely do better, so here it goes.

Automatic GC is really a great step forward in software development, except when it is not. If you have deployed an application on a JVM with a large heap (4 GB+), you probably know what a long GC pause really feels like [insert familiar knock-knock joke about Java]. Jokes aside, the JVM's GC advancement is unprecedented. The amount of tuning you can do with the different garbage collectors could define a niche profession.

For most applications, where GC latency isn't an issue, the default garbage collector works just fine. For applications that need to scale, GC can (and does) become a bottleneck. If you disagree, try running a JVM under memory pressure and watch the application's response times. The result is usually surprising, because in most data-driven applications the bottleneck is IO or other IO-bound resources (e.g. a database). This situation generally happens when GC is thrashing the process: either there are too many "tenured" objects that don't fit in the allocated heap, or the heap is fragmented and GC wastes a lot of time compacting it. Unlike .NET, Java folks are not very lucky with platform-specific optimizations such as locking pages to prevent swapping, so it is not uncommon for a full GC to cause excessive paging, making GC itself IO bound.
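If you want to see where the time goes, HotSpot can be told to log its pauses. A minimal sketch (these are standard HotSpot flags of this era; myapp.jar is just a placeholder):

    java -Xmx4g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         -XX:+PrintGCApplicationStoppedTime -Xloggc:gc.log -jar myapp.jar

The application-stopped times are the ones to watch; collections of the tenured generation dominate them.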

It turns out that collecting a large "tenured" object space is expensive compared to sweeping short-lived young objects. A large population of "tenured" objects is often a genuine requirement for long-running server processes that rely on large amounts of data, and that requirement shouldn't punish the application with long GC pauses. While not impossible, it is not really practical to make the tenured generation very large, because doing so may adversely affect young-generation collections. JVM GC optimization is a skill not in abundance, but the problem is all too common. So what can we do about it?
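For the record, the knobs I mean are roughly these. A sketch only, not a recommendation (the sizes are made up, myapp.jar is a placeholder):

    # 6 GB heap, 1 GB young generation, promote survivors to tenured after 5 collections
    java -Xmx6g -Xms6g -Xmn1g -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=5 \
         -XX:+UseConcMarkSweepGC -jar myapp.jar

Grow -Xmn and young collections take longer; shrink it and objects get promoted to the tenured generation prematurely, which is exactly the space we are trying to keep small.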

One way to eliminate GC on predictable "tenured" application data is simply not to store it on the JVM heap (i.e. use direct byte buffers, etc.). I've been watching solutions like Terracotta's BigMemory, which uses a similar approach to address GC issues. However, all such solutions seem to mix manual memory management with hacks to circumvent GC, and they end up being half-baked reinventions of the JVM's copy-on-write "permgen".
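For the curious, the direct-buffer trick looks roughly like this. This is a toy sketch, not how BigMemory actually does it; real solutions also have to deal with serialization, free-space management and resizing, which is exactly where the manual-memory-management pain comes back:

    import java.nio.ByteBuffer;
    import java.nio.charset.Charset;

    // Toy off-heap store: values live in a direct buffer outside the GC-managed heap,
    // so the collector only ever sees the small int offsets we hand back to the caller.
    public class OffHeapStore {

        private static final Charset UTF8 = Charset.forName("UTF-8");
        private final ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024 * 1024); // 64 MB off-heap

        // Writes the value off-heap and returns the offset needed to read it back later.
        public int put(String value) {
            byte[] bytes = value.getBytes(UTF8);
            int offset = buffer.position();
            buffer.putInt(bytes.length);
            buffer.put(bytes);
            return offset;
        }

        // Reads a value back; a duplicate buffer is used so reads don't disturb the write position.
        public String get(int offset) {
            ByteBuffer view = buffer.duplicate();
            view.position(offset);
            byte[] bytes = new byte[view.getInt()];
            view.get(bytes);
            return new String(bytes, UTF8);
        }
    }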

Most of the Java developers I know consider "permgen" to be some kind of evil that causes all kinds of problems: Eclipse crashing, crying JSP/[insert other template library] compilers, unpredictable class unloading, and really large interned strings that stick around. "permgen" is going to go away from the HotSpot VM, which is kind of sad, because I think it could be a great way to achieve GC-free heap storage for application-level data (more specifically, cache). This only works if "permgen" is reserved for one specific purpose; and if that purpose is letting the application store its own data, we could have standard, supported, GC-free application data without third-party solutions that achieve the goal poorly. Even better would be java.cache using "permgen" for cache storage.
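To make that concrete, here is the purely hypothetical shape such a facility could take. Nothing like this exists in the JDK, and the names are invented here for illustration only:

    // Hypothetical API, invented for this post: an application-controlled,
    // GC-exempt cache region in the spirit of "permgen".
    public interface PermanentCache<K, V> {

        // Copy the value into GC-exempt storage and index it by key.
        void promote(K key, V value);

        // Look up a previously promoted value, or return null if it was evicted.
        V get(K key);

        // Crucially, the application decides when an entry may go away.
        boolean evict(K key);
    }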

One of the commenters on HN mentioned the Smalltalk VM's way of using permgen (just send a message to an object to move itself to "permgen"). I like this approach because applications can control which objects are long-lived, which is sensible given they have the best knowledge of their own long-lived objects. The closest thing we have in the JVM is String.intern, which unfortunately caches strings forever and is not nearly as useful as having some kind of eviction control.
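To see why String.intern falls short, a small example; the point is not the canonicalization, which works fine, but that nothing ever comes back out:

    public class InternDemo {
        public static void main(String[] args) {
            String a = new String("user:42"); // fresh heap object
            String b = "user:42";             // compile-time constant, already interned

            System.out.println(a == b);          // false: two different objects
            System.out.println(a.intern() == b); // true: intern() returns the canonical copy

            // The catch: there is no way to evict what intern() has cached, so interning
            // lots of distinct strings just grows "permgen" until the VM falls over.
        }
    }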

So, what do you think about this approach?