Tuesday, May 12, 2009

Scala v/s Java arrays

Here's a Java puzzler for the curious (and a good interview question too!). Given a array merge method below, what will be the output of following program?



public class Generics {

static class A {
}

static class B extends A {
}

public static void main(String[] args) {

A[] copy = merge(new B[] { new B() }, new A[] { new A() }, new B[1]);
System.out.println(copy.length != 1);

}

static Z[] merge(Z[] arr1, Z[] arr2, Z[] store) {
List list = new ArrayList();
list.addAll(Arrays.asList(arr1));
list.addAll(Arrays.asList(arr2));
return list.toArray(store);
}

}


If you didn't guess it already, the program above results in a runtime exception (java.lang.ArrayStoreException).


Exception in thread "main" java.lang.ArrayStoreException
at java.lang.System.arraycopy(Native Method)
at java.util.Arrays.copyOf(Unknown Source)
at java.util.ArrayList.toArray(Unknown Source)
at name.nirav.Generics.merge(Generics.java:23)
at name.nirav.Generics.main(Generics.java:16)


I am not a huge fan of generics in Java because we are left with whatever type safety we get from a half-hearted implementation (and I'm not even criticizing). It is too much to expect from a Java compiler to check that the program above has type safety compromised at call site, mostly because that's how arrays in Java are handled by VM. Arrays are special types of mutable objects with components as anonymous members which are accessed with indices. An array itself isn't a type, it assumes whatever type its components are. This is where the problem starts.

With current generics implementation, generic arrays are treated as covariant by default i.e. an array of component type T is also array of component type S where T is a subclass of S. This introduces type issues such as above where syntactically valid programs are victimized, making Java's "statically typed, type safe language" designation an irony. If arrays were regular objects, compiler will report an error in code without type variance information.

Arrays are regular objects in Scala, each array is an instance of Scala.Array class. The code below is equivalent to Java program above with some syntactic differences, unlike Java code the Scala code below is not syntactically valid. Scala arrays are non-variant, and Scala compiler uses what is called "conservative approximation" to ensure type safety at compile time.


object App extends Application{

class A

class B extends A

def merge[T](arr1 : Array[T], arr2: Array[T], store: Array[T]) : Array[T] = {
val list = new ArrayList[T]
list.addAll(Arrays.asList(arr1:_*)) // :_* is for vararg conversion
list.addAll(Arrays.asList(arr2:_*))
list toArray store
}

merge(Array[B](new B), Array[A](new A), new Array[B](1)) //Error, type mismatch

}


The Scala compiler will report an error on "merge" call, complaining about type mismatch.

Not everyone likes to know about such details until it bites back with million dollar bugs. Why are Java arrays co-variant? Who needs more run time checks?

Tuesday, May 05, 2009

Fork/Join Concurrency with Scala Actors

Have you ever wondered why there are no special frameworks to address concurrency in a Java based application? Considering Java's rich (NIH) ecosystem, I do wonder why I have to write same old state management code while introducing even a small amount of concurrency in Java application.

The reason why I think it is almost impossible to consider concurrency as an aspect in arbitrary application is because of JVM's native support for shared memory concurrency. As a result every developer is forced to think in terms of threaded shared state with guarded blocks. If you have read or written non-trivial piece of code using shared memory concurrency primitives (Mutex, Semaphore etc.) you probably know that the resultant code is hard to visualize and test.

I have been reading about Scala's Actor library and its share-nothing message passing abstraction built over existing concurrency model of JVM. While it doesn't try to solve the fundamental problem, it provides an alternative to address concurrency in your application from a different perspective which is testable and easier to understand.

In actor model, an Actor is a forkable task which runs independently, something like a serializable+immutable object with its private data and behavior. Each actor can send and receive (or react to) messages asynchronously, very similar to object oriented programming with objects responding to messages, but in a concurrent way. This abstraction can seamlessly be applied to a given application of divide and conquer nature and can be made concurrent with minimal efforts as compared to adapting to Java's concurrency primitives.

To explain my point further take a look at classically trivial Producer/Consumer example in Java.


public class Consumer extends Thread {
private final Buffer buffer;
public Consumer(Buffer buffer) {
super("Consumer");
this.buffer = buffer;
}
@Override
public void run() {
while (true){
System.out.println(buffer.next());
}
}
}
public class Producer extends Thread {
private final Buffer buffer;
public Producer(Buffer buffer) {
this.buffer = buffer;
}
@Override
public void run() {
Random random = new Random(System.nanoTime());
while (true) {
String num = Integer.toString(random.nextInt());
System.out.println(getName() + "=putting: " + num);
buffer.add(num + ": " + getName());
try {
sleep(400);
} catch (InterruptedException e) {
}
}
}
}
public class Buffer {
private String string;
private boolean ready = false;
public synchronized String next() {
if (ready != true)
try {
wait();
} catch (InterruptedException e) {
}
ready = false;
return string;
}

public synchronized void add(String string) {
while(ready == true)
try {
wait();
} catch (InterruptedException e) {
}
this.string = string;
notifyAll();
ready = true;
}

}
public class Test {
public static void main(String[] args) throws Throwable {
Buffer buffer = new Buffer();
new Consumer(buffer).start();
Producer producer = new Producer(buffer);
producer.start();
producer.join();
}
}


Take a look at Buffer class, we have used some concurrency primitives there since that's the place where state is being manipulated. We didn't declare variable ready as volatile since primitive assignments are guaranteed to be atomic (except long and double), Even a simple problem like this involves fair bit of understanding of the underlying threading model. There's no doubt this complexity will extrapolate in non-trivial applications e.g. multi-phase concurrent incremental compiler, SEDA based server etc.

Now take a look at the equivalent Producer/Consumer example in Scala.

import actors._
import actors.Actor._
import util.Random

case class SimpleMessage(num: Long)

class Producer(c: Consumer) extends Actor{
val random = new Random(System nanoTime)
def act = {
loop{
val num = produce
println("Sending: " + num )
c ! SimpleMessage(num) // asynchronous message passing
}
}
def produce(): Long = {
Thread sleep 400
return random.nextLong
}
}
class Consumer() extends Actor{
def act = {
loop{
receive{ //blocks here
case SimpleMessage(num) => println("Received: " + num);
}
}
}
}
object PCTest {
def main(args : Array[String]) : Unit = {
var c = new Consumer()
var p = new Producer(c)
c.start;p.start
}
}

Even if we don't compare the amount of code, the Scala code above is much more clear in terms of its functionality. In Scala, Actors can be mapped to a single native thread with 'receive' (similar to Thread#wait()) or we can replace 'receive' with 'react' which is event based invocation but doesn't cost a blocked thread. The code within 'react' is executed by any non-blocked thread from a pre-created thread-pool. Just a single change and your application is scalable!

The Java example code above can be equally trivialized with the util.concurrent BlockingQueue, but the important point to take away is, writing shared memory concurrency code is inherently difficult and error-prone. With JDK1.7 we will get similar fork/join abstraction in Java itself (JSR166y), which will add new alternative to how we design and write concurrent applications.

Scala borrowed Actors from Erlang and similar libraries exist for Java as well. If you are curious about interesting details on Actor based OO concurrency implementation in Java, take a look at some of the thoughts Sebastian is sharing with his ConcurrentObjects library.

Tuesday, April 21, 2009

How Scala's pattern matching can replace Visitors

The primary motivation of Visitor design pattern is to separate model traversal from operational logic. A visitable model takes the responsibility of model navigation while the behavior is defined by arbitrary visitors. In this post I will try to explain problems associated with Visitors in general and how Scala's pattern matching feature can eliminate such problems cleanly.

Consider a simplified Insurance Policy model as follows (In Java):



public class PolicyElement {
static class Quote extends PolicyElement {
protected final Risk risk;
public Quote(Risk risk) {
this.risk = risk;
}
public void accept(PolicyVisitor visitor){
visitor.visit(this);
visitor.visit(this.risk);
}
}

static class Risk extends PolicyElement {
protected Coverage coverage;
public Risk(Coverage coverage) {
this.coverage = coverage;
}
public void accept(PolicyVisitor visitor){
visitor.visit(coverage);
}
}

static class Coverage extends PolicyElement {
protected final Premium prem;
public Coverage(Premium prem) {
this.prem = prem;
}
public void accept(PolicyVisitor visitor){
visitor.visit(prem);
}
}

static class Premium extends PolicyElement {
protected final double amt;
public Premium(double amt) {
this.amt = amt;
}
public void accept(PolicyVisitor visitor){
visitor.visit(this);
}
}
}

public interface PolicyVisitor {
public void visit(Quote quote);
public void visit(Risk risk);
public void visit(Coverage cvrg);
public void visit(Premium prem);
}
public class PolicyTest {
static class PremiumCalcVisitor implements PolicyVisitor {
private double totalPremium;

@Override
public void visit(Premium prem) {
totalPremium = getTotalPremium() + prem.amt;
}

@Override
public void visit(Coverage cvrg) {
}

@Override
public void visit(Risk risk) {
}

@Override
public void visit(Quote quote) {
}

public double getTotalPremium() {
return totalPremium;
}
};

public static void main(String[] args) {
Quote quote1 = new Quote(new Risk(new Coverage(new Premium(10))));
Quote quote2 = new Quote(new Risk(new Coverage(new Premium(30))));
PremiumCalcVisitor visitor1 = new PremiumCalcVisitor();
PremiumCalcVisitor visitor2 = new PremiumCalcVisitor();
quote1.accept(visitor1);
quote2.accept(visitor2);
assert visitor1.getTotalPremium() + visitor2.getTotalPremium() == 40;
}
}


(Generally, we introduce one more abstract class to omit empty implementations in Visitors but I have left it for brevity.)

Now, not so apparent problem here is that if the object model changes (which is more frequently the case in real life), we have to add one more method to PolicyVisitor interface, all visitor implementations if change is substantial and have new Policy elements implement visitor methods. This invasive nature of Visitor couples it tightly with the model.

With pattern matching and views in Scala, you can have alternative implementation which is precise as well as non-invasive unlike visitors.

class PolicyElement
case class Quote(risks: Risk) extends PolicyElement
case class Risk(cvrg: Coverage) extends PolicyElement
case class Coverage(limit: Premium) extends PolicyElement
case class Premium(amt: Double) extends PolicyElement
object PremCalcTest {
class PremCalculator(pol: PolicyElement){
def calcPrem : Double = calcPrem(pol)

def calcPrem(policy: PolicyElement): Double = policy match{
case Quote(risk) => calcPrem(risk)
case Risk(coverage) => calcPrem(coverage)
case Coverage(premium)=> calcPrem(premium)
case Premium(amt) => amt
}
}

implicit def calPremV(pol: PolicyElement)= new PremCalculator(pol)

def main(string: Array[String]){
val risk1 = Risk(Coverage(Premium(10)))
val risk2 = Risk(Coverage(Premium(30)))
println(Quote(risk1).calcPrem + Quote(risk2).calcPrem)
}
}

This code requires some explanation. What we have done here is we labeled domain classes with a 'case' keyword in Scala. If you tag a class with 'case' it can be used for pattern matching in a switch-case like structure as done in method 'calcPrem'. You don't need to create members or setter/getters for them, they are created by compiler for you. A case class can be instantiated without 'new' keyword; So Risk(Coverage(Premium(0)) is translated as new Risk(new Coverage(new Premium(0D))) in equivalent Java code.

The code in 'calcPrem' function can be assumed to be something similar to instanceOf checks for each possible case in Java, for example:


if(object instanceOf Premium)
return ((Premium)object).amt;


What we also have done silently is added a method 'calcPrem' to PolicyObject class. This is done through implicitly defined function 'calPremV', this will allow us to call 'calcPrem' method on any PolicyObject without actually modifying the domain model code. This type of lexically scoped class extension is known as a View in Scala and is similar to what is available in Ruby as open classes except without scoping.

In case if the model changes in this case, we just need to modify a single function and we are done. These programming language features of Scala frees us from coupling introduced by inheritance.

So it is easy to see that Scala's language features can be elegant and far more powerful than other languages (specifically Java) without sacrificing compiler checks and type safety.

Thursday, April 16, 2009

Your Language is not SLOW!

Do you really give a thought when you say "Ruby is slow or Python is slow"? Just because Twitter moved their back end messaging to JVM doesn't necessarily make Ruby any slower than it already was. If a language runtime can't handle concurrent load or long running processes it is not a limitation of the language!

This is dejavu for me as Java was considered "a slow language" back in 1.1 days when HotSpot JVM was in its poorer life and ISO/IEC C++ was hot girl in town. Now that you realized your pony implementations can't catch up with real life performance expectation you start blaming languages, are you serious?

Whether it's global interpreter lock or lack of native threads it's the runtime (re-read the runtime) and not the language that is slow. For the love of technology, Stop blaming languages for poor performance of your suboptimal reinventions!

Every language implementor should at least consider the host platform support before coming up with their "Not Invented Here" approach, sadly that's what is happening with most modern languages.