I've been pondering trivialities again: I notice that most programmers have a strong vested interest in defending static v/s dynamic typing, as if the two were mutually exclusive.
There are those who think static typing is the ultimate engineering solution for building reliable software, and that dynamic typing is some kind of sick joke designed to put their laziness to the test. Then there are those who think types are for primitive minds and that variable declaration is an insulting feature.
At this point in time I can relate to both groups, as I've held strong opinions on this subject. Having an opinion on typing is OK, I think. (Having no opinion on this topic basically means Google didn't work for you and you should head back.)
Most programmers go through what I call a typing evolution cycle. I started with a dynamically typed language (name removed because of a possible copyright violation). It was really great to stuff everything into a variant (actually, I never even bothered to declare variables!) and not worry about compilation or runtime errors; it mostly just ran. Then I was introduced to C (and later C++), and I remember how I hated it, because I was now forced to declare variables and to think upfront about what I wanted the computer to do. It was alien to me that the compiler would point out my mistakes instead of doing my bidding. During that time I was a supporter of dynamic typing, because I was naive.
As time went by I got more used to static typing, helped along by interesting IDE features such as Visual Studio's IntelliSense. Writing correct programs was much easier now, and they always worked (mostly correctly). Then came Java, where everything was an object (except primitives). Large projects and Eclipse JDT made me appreciate the value of static typing. When I started writing tests I didn't have to think about type errors because they were already taken care of, and I could refactor code without nightmares that something might be broken. Scala's type inference and simpler type system boosted my faith in static typing even further. At this point in time I supported static typing, because I was naive.
This was my experience; maybe it is the reverse for others (starting from static typing and going back and forth).
My views on typing have changed over time. Depending on what I'm working on, I don't mind hand-waving the type system (coupled with esoteric tests) for a quick, isolated stab that brings a lot of benefit now, v/s a bloat of API that does the same thing in a few years. At the same time I still want the confidence that my software is immune to my bad keyboard-fu, and the assurance that stupid mistakes will not make it all the way to production. Probably, I'm still naive.
Monday, March 05, 2012
Monday, December 05, 2011
Avoid storing references of java.net.URL
Normally I avoid writing about something so obvious, but since I've been bitten multiple times now, it might help future me.
Never ever store references to java.net.URL in Java collections. The reasoning is pretty simple: the 'equals' and 'hashCode' methods of this class perform an extremely expensive synchronous DNS lookup on every call.
It is not uncommon to see most of your threads' time being spent on monitors:
"pool-2-thread-2" prio=10 tid=0x92061400 nid=0x1744 waiting for monitor entry [0x91fad000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.net.URLStreamHandler.getHostAddress(URLStreamHandler.java:429)
- waiting to lock <0x9731b200> (a sun.net.www.protocol.http.Handler)
at java.net.URLStreamHandler.hashCode(URLStreamHandler.java:354)
at java.net.URL.hashCode(URL.java:875)
- locked <0xaac87290> (a java.net.URL)
at java.util.HashMap.getEntry(HashMap.java:361)
at java.util.HashMap.containsKey(HashMap.java:352)
at java.util.HashSet.contains(HashSet.java:201)
"pool-2-thread-1" prio=10 tid=0x9205e800 nid=0x1743 runnable [0x91ffe000]
java.lang.Thread.State: RUNNABLE
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:866)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1258)
at java.net.InetAddress.getAllByName0(InetAddress.java:1211)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at java.net.InetAddress.getAllByName(InetAddress.java:1063)
at java.net.InetAddress.getByName(InetAddress.java:1013)
at java.net.URLStreamHandler.getHostAddress(URLStreamHandler.java:437)
- locked <0x9731b200> (a sun.net.www.protocol.http.Handler)
at java.net.URLStreamHandler.hashCode(URLStreamHandler.java:354)
at java.net.URL.hashCode(URL.java:875)
- locked <0xaac97228> (a java.net.URL)
at java.util.HashMap.getEntry(HashMap.java:361)
at java.util.HashMap.containsKey(HashMap.java:352)
at java.util.HashSet.contains(HashSet.java:201)
This stack trace only depicts hashCode, but expect similar blocking code for 'equals' too. If you care even a bit about performance, just stay away from this goddamned class.
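If you only need set or map semantics, one workaround is to key collections on java.net.URI instead: its equals/hashCode compare the parsed text form and never touch the network. Convert to URL only at the point of use. A minimal sketch:

```java
import java.net.URI;
import java.net.URL;
import java.util.HashSet;
import java.util.Set;

public class UriKeys {
    public static void main(String[] args) throws Exception {
        // URI.equals/hashCode compare the parsed string form only --
        // no DNS lookup, no blocking on a shared protocol handler.
        Set<URI> seen = new HashSet<>();
        seen.add(URI.create("http://example.com/a"));
        seen.add(URI.create("http://example.com/a")); // duplicate, not re-added
        seen.add(URI.create("http://example.com/b"));

        System.out.println(seen.size()); // 2

        // Convert to URL only when you actually open a connection:
        for (URI uri : seen) {
            URL url = uri.toURL(); // cheap parse; no network involved
        }
    }
}
```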
Thursday, November 24, 2011
Optimizing string memory footprint in Java - Part 1
This is the first in a series of blog posts where I will try to describe my miserable attempts at storing a large number of strings to build an efficient spell-correction and auto-suggest facility in Java. I learned some important lessons doing this, and it would be a waste not to share them. So here it goes..
In order to store a large number of strings efficiently, we first need to understand how the JVM stores string objects in memory. In this post I will try to summarize the memory layout of a String object.
Each object in the JVM has a fixed, unavoidable structural overhead called the object header, which the JVM uses for various tasks such as garbage collection, identification, addressing and others that I don't understand. A 32-bit HotSpot VM uses an 8-byte header per object; a 64-bit HotSpot VM with a heap larger than 32g, as well as IBM's J9 VM, takes 16 bytes per object. As always, arrays are treated differently. An array is an object, so it has a fixed header of its own as described above. However, since the Java spec guarantees an array bounds check on every array operation, the length of each array has to be stored too, which is another 4 bytes, making the effective array header 12 bytes instead of the regular 8 for objects. And finally there is object padding to a multiple of 8 or 16 bytes, depending on the CPU word size and the JVM, to make memory-to-CPU communication efficient.
Assuming a 32-bit JVM, the java.lang.String class has three bookkeeping integer fields called offset, count and hash occupying 12 bytes, plus a 4-byte char array pointer and the 8-byte header, totaling 24 bytes of fixed overhead per string. This is inefficient but slightly clever, because no padding is required for the string object itself, which makes hash a free cache.
Here's the formula to calculate a String's shallow and retained sizes:
- Shallow size = HEADER + (offset + count + hash + char array pointer) = 24 bytes
- Retained size = Shallow size + 12-byte array header + (nchars * 2 + padding)
So, on a 32-bit HotSpot VM, the six-character word "Memory" takes:
24 + 12 + (6 * 2 + 4 padding) = 24 + 12 + 16 = 52 bytes
That is 77% JVM-imposed overhead v/s 23% actual character data, even counting each char as two bytes of UTF-16.
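The arithmetic above can be sketched as a tiny calculator. The constants are the 32-bit, pre-Java-7u6 figures assumed in this post, not values queried from a live VM:

```java
// Tiny calculator for the retained size of a String on a 32-bit HotSpot VM,
// following the formula above. SHALLOW and ARRAY_HEADER are assumptions
// taken from this post, not measured at runtime.
public class StringFootprint {
    static final int SHALLOW = 24;       // header + offset + count + hash + char[] pointer
    static final int ARRAY_HEADER = 12;  // array object header + 4-byte length field

    static long retainedSize(int nchars) {
        long data = 2L * nchars;          // 2 bytes per UTF-16 char
        long padded = (data + 7) / 8 * 8; // pad char data up to an 8-byte multiple
        return SHALLOW + ARRAY_HEADER + padded;
    }

    public static void main(String[] args) {
        System.out.println(retainedSize(6));   // "Memory" -> 52 bytes
        System.out.println(retainedSize(100)); // 236 bytes
    }
}
```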
One way to make string representation efficient is to amortize the structural overhead by storing larger strings. To illustrate, by the formula above a 100-char string is roughly 84% data, a 200-char string roughly 92%, a 500-char string over 96%, and so on. This will be one of the key techniques for reducing strings' memory footprint that I will write about in later blog posts. Following is a rough graph that depicts this theory.
It is easy to see that storing a lot of small strings wastes a huge amount of memory, which is in no small part the reason why Java is considered a memory hog.
That's all for the basics of the overhead involved in storing strings. In the next blog post, I will write about the various implementations I tried, plus a general rant on the memory efficiency of Java Collections.
Monday, October 24, 2011
On using "PermGen" as application level cache
I was reading an interesting article, 'Assault by GC', by the Stack Exchange guys, and it felt like déjà vu with my past couple of years of development on the JVM. It struck me that we can definitely do better, so here it goes..
Automatic GC is really a great step forward in software development, except when it is not. If you have deployed an application on a JVM with a large heap (4g+), you probably know what a long GC pause really feels like [insert familiar knock-knock joke about Java]. Jokes aside, the JVM's GC advancement is unprecedented. The amount of tuning you can do with the different garbage collectors could define a niche profession.
For most applications, where GC latency isn't an issue, the default garbage collector works just fine. For applications which need to scale, GC can (and does) become a 'bottleneck'. If you disagree, try running a JVM under memory pressure and watch app response times. This should be surprising, because in most data-driven applications the bottleneck is usually IO or other IO-bound resources (e.g. a DB). The situation generally arises when GC is completely thrashing the process, either because there are too many "tenured" objects which don't fit in the allocated heap, or because the heap is fragmented and GC wastes a lot of time compacting it. Unlike .NET, Java folks are not very lucky with platform-specific optimizations such as locking pages to prevent swapping, so it is not uncommon for a full GC to cause excessive paging, making GC itself IO-bound.
It turns out that GCing a large "tenured" object space is expensive compared to sweeping short-lived young objects. A large population of "tenured" objects is often a genuine requirement for long-running server processes relying on large amounts of data, and this requirement shouldn't punish the application with long GC pauses. While not impossible, it is not really practical to make the tenured generation very large, because it may adversely affect young-generation collections. JVM GC optimization is a skill not in abundance, but the problem is all too common. So what can we do about it?
One way to eliminate GC on predictable "tenured" application data is simply not to store it on the JVM heap (i.e. use a direct byte buffer, etc.). I've been watching solutions like Terracotta's BigMemory, which uses a similar approach to address GC issues. However, all such solutions seem like a mix of manual memory management and hacks to circumvent GC, and end up being a half-baked reinvention of the JVM's copy-on-write "permgen".
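As a rough illustration of the direct-byte-buffer idea (and emphatically not how BigMemory actually works), a toy off-heap store might look like this; the class and method names are made up:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy append-only off-heap store: the values live in a direct ByteBuffer
// outside the GC-managed heap, so only the small (offset, length) index
// entries are objects the collector has to trace on every cycle.
public class OffHeapStore {
    private final ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MB off-heap
    private final Map<String, int[]> index = new HashMap<>();          // key -> {offset, length}

    public void put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int offset = buf.position();
        buf.put(bytes);
        index.put(key, new int[] { offset, bytes.length });
    }

    public String get(String key) {
        int[] loc = index.get(key);
        if (loc == null) return null;
        byte[] out = new byte[loc[1]];
        ByteBuffer view = buf.duplicate(); // independent position, shared storage
        view.position(loc[0]);
        view.get(out);
        return new String(out, StandardCharsets.UTF_8);
    }
}
```

Of course this is exactly the "manual memory management" trade-off described above: you give up GC pauses on this data, but you also give up object identity and take on serialization costs.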
Most of the Java developers I know consider "permgen" to be some kind of evil which causes all kinds of problems: "Eclipse crashing", a crying JSP/[insert other template library] compiler, unpredictable class unloading, and really large interned strings which stick around. "permgen" is going away from the HotSpot VM, which is kind of sad, because I think it could be a great way to achieve GC-free heap storage for application-level data (more specifically, cache). This is not really possible while "permgen" is reserved for one specific purpose, but if that purpose were widened to let the application store its own data, we could have standard, supported, GC-free application data without the need for third-party solutions which achieve the goal poorly. Even better would be java.cache using "permgen" for cache storage.
One of the commenters on HN talked about the Smalltalk VM's way of using permgen (just send a message to an object to move itself to "permgen"). I like this approach, because applications can control which objects are long-lived, which is sensible since they have the best knowledge of their own long-lived objects. The closest thing we have in the JVM is String.intern, which unfortunately caches strings forever and is not nearly as useful as having some kind of eviction control.
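For what it's worth, an interner whose entries the collector can reclaim once nothing else references them (i.e. with implicit eviction, unlike String.intern) can be sketched with a WeakHashMap; the class name is illustrative:

```java
import java.lang.ref.WeakReference;
import java.util.WeakHashMap;

// Minimal interner with implicit eviction: both the key and the cached
// value are only weakly reachable from the pool, so once no outside code
// holds the string, the entry can be garbage-collected -- unlike
// String.intern, which pins interned strings in permgen forever.
final class EvictableInterner {
    private final WeakHashMap<String, WeakReference<String>> pool = new WeakHashMap<>();

    synchronized String intern(String s) {
        WeakReference<String> ref = pool.get(s);
        String cached = (ref == null) ? null : ref.get();
        if (cached != null) return cached;
        // Store a WeakReference so the value does not strongly reference
        // the key, which would defeat WeakHashMap's eviction.
        pool.put(s, new WeakReference<>(s));
        return s;
    }
}
```

Two calls with equal-but-distinct strings return the same canonical instance, and the entry quietly disappears after the last external reference dies.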
So, what do you think about this approach?