Friday, February 16, 2007

Escape Analysis in Mustang - II

EDIT: As Brian commented, these Escape Analysis optimizations were dropped before the final release and deferred to Java 7. The debug version of the VM can no longer be found at http://download.java.net/download/jdk6/6u1/promoted/b03/binaries/jdk-6u1-ea-bin-b03-windows-i586-debug-19_jan_2007.jar

In my previous post, I mentioned that escape analysis is available in Mustang releases and described the debug flags for it in the HotSpot(tm) VM.

I ran some microbenchmarks yesterday. They are fairly trivial, but good enough to show the performance gains. Here is what I found:
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;

public class TestEA {

    public static class MyPrintStream extends PrintStream {
        public MyPrintStream(OutputStream out) {
            super(out);
        }

        @Override
        public void println(int cnt) { /* deep inline candidate */
        }
    }

    private static final int COUNT = 10000000;

    public static void main(String[] args) throws Exception {
        System.setOut(new MyPrintStream(new ByteArrayOutputStream()));
        for (int i = 0; i <= 10; i++)
            test();
    }

    private static void test() {
        int cnt = 0;
        Object objLock = new Object();
        long start = System.currentTimeMillis();
        for (int i = 0; i < COUNT; i++) {
            synchronized (objLock) {
                cnt++;
            }
            consume(cnt);
        }
        long end = System.currentTimeMillis();
        System.err.println(cnt + ", time=" + (end - start) / 1000.0);
    }

    private static void consume(int cnt) {
        System.out.println(cnt);
    }
}

The results:
VMArgs: -server

10000000, time=0.907
10000000, time=0.938
10000000, time=0.969
10000000, time=0.906
10000000, time=0.906

With
VMArgs: -server -XX:+DoEscapeAnalysis
10000000, time=0.89
10000000, time=0.922
10000000, time=0.016
10000000, time=0.016
10000000, time=0.015
10000000, time=0.016

JVMOut:
28 JavaObject NoEscape [[ 54F]] 28 Allocate
....
40 LocalVar NoEscape [[ 28P]] 40 Proj ....
90 LocalVar NoEscape [[ 28P]] 90 Phi ....
213 JavaObject NoEscape [[]] 213 Allocate ....
225 LocalVar NoEscape [[ 213P]] 225 Proj ....
======== Connection graph for TestEscapeAnalysis::test ....
172 JavaObject NoEscape [[]] 172 Allocate ....
184 LocalVar NoEscape [[ 172P]] 184 Proj ....

I ran the loop in main a few times because the server VM inlines deeply, and it might optimize the execution away entirely given that cnt is not used anywhere significant.

As the output shows, escape analysis successfully eliminated the synchronization on objLock (lock elision). As I posted earlier, synchronization has a significant impact on execution speed, and eliminating this heavy operation improved the speed substantially. Consider the effect of that on a highly concurrent web server (JAWS) handling hundreds of simultaneous requests. Of course, this worked because objLock is allocated locally: the VM identified that it cannot be shared between multiple threads, so it is safe to remove the synchronization overhead.
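To make the escape distinction concrete, here is a minimal sketch of my own (not from the benchmark above): the first method locks an object that never leaves the method, so the VM may elide the lock; the second publishes the object through a static field, so it escapes and the lock must stay.

```java
public class EscapeDemo {
    static Object shared; // publishing to this field makes an object escape

    // objLock never leaves this method: NoEscape, lock elision is legal
    static int noEscape(int n) {
        Object objLock = new Object();
        int cnt = 0;
        for (int i = 0; i < n; i++) {
            synchronized (objLock) {
                cnt++;
            }
        }
        return cnt;
    }

    // objLock is stored in a static field: GlobalEscape; another
    // thread could lock it too, so the VM must keep the synchronization
    static int escapes(int n) {
        Object objLock = new Object();
        shared = objLock;
        int cnt = 0;
        for (int i = 0; i < n; i++) {
            synchronized (objLock) {
                cnt++;
            }
        }
        return cnt;
    }

    public static void main(String[] args) {
        System.out.println(noEscape(1000) + " " + escapes(1000));
    }
}
```

Both methods compute the same result; the difference is only in what the JIT is allowed to prove about the lock object.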

I also tried making objLock a static field of the class and found that escape analysis had no positive impact on execution speed, as can be seen here:

VMArgs: -server -XX:+DoEscapeAnalysis (objLock as static field in above program)
10000000, time=0.922
10000000, time=0.922
10000000, time=0.906
10000000, time=0.891
10000000, time=0.921

As expected: with objLock being a shared field, the VM has no way of proving that no other thread can lock it, so the synchronization cannot be eliminated.
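For completeness, the static-field variant I measured differs only in where objLock lives. A sketch (same loop, I/O redirection omitted for brevity; test() returns the counter here just to keep the example self-checking):

```java
public class TestEAStatic {
    private static final int COUNT = 10000000;
    // objLock is now a shared static field: the VM must assume another
    // thread could synchronize on it, so lock elision is not possible
    private static final Object objLock = new Object();

    static int test() {
        int cnt = 0;
        long start = System.currentTimeMillis();
        for (int i = 0; i < COUNT; i++) {
            synchronized (objLock) {
                cnt++;
            }
        }
        long end = System.currentTimeMillis();
        System.err.println(cnt + ", time=" + (end - start) / 1000.0);
        return cnt;
    }

    public static void main(String[] args) {
        for (int i = 0; i <= 10; i++)
            test();
    }
}
```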

Unfortunately, stack allocation seems to be absent in the current builds of Mustang (or at least the debug output gave no hint of when it was done; the VM output is also hard to decipher, and I found no explanation for it). Here is an excellent presentation on the new optimizations in the HotSpot(tm) VM. And for lock elimination enhancements refer to this.
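If and when stack allocation (or scalar replacement) does land, the textbook candidate looks like the sketch below, which is my own illustration: a short-lived object whose fields the JIT could break into plain locals, skipping the heap allocation entirely.

```java
public class ScalarReplacementDemo {
    // small, short-lived value holder
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // p never escapes sum(): with escape analysis the JIT could replace
    // each allocation with two scalar locals holding x and y
    static long sum(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            total += p.x + p.y;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(1000)); // prints 1000000
    }
}
```

Nothing in the debug output suggested this transformation was actually happening in the builds I tried; the example only shows the shape of code that would benefit.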

So, there you go, one more optimization for the managed runtime. For those still under the illusion that natively compiled, statically optimized programs are the fastest: think again. Your program is static at runtime and cannot reorganize itself, while a managed runtime is metamorphic; it can adapt and substantially optimize itself as it runs. Man, that's what I call programming.
