Tuesday, May 12, 2009

Scala v/s Java arrays

Here's a Java puzzler for the curious (and a good interview question too!). Given the array merge method below, what will be the output of the following program?



import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Generics {

    static class A {
    }

    static class B extends A {
    }

    public static void main(String[] args) {
        A[] copy = merge(new B[] { new B() }, new A[] { new A() }, new B[1]);
        System.out.println(copy.length != 1);
    }

    static <Z> Z[] merge(Z[] arr1, Z[] arr2, Z[] store) {
        List<Z> list = new ArrayList<Z>();
        list.addAll(Arrays.asList(arr1));
        list.addAll(Arrays.asList(arr2));
        return list.toArray(store);
    }
}


If you didn't guess it already, the program above results in a runtime exception (java.lang.ArrayStoreException).


Exception in thread "main" java.lang.ArrayStoreException
at java.lang.System.arraycopy(Native Method)
at java.util.Arrays.copyOf(Unknown Source)
at java.util.ArrayList.toArray(Unknown Source)
at name.nirav.Generics.merge(Generics.java:23)
at name.nirav.Generics.main(Generics.java:16)


I am not a huge fan of generics in Java because we are left with whatever type safety a half-hearted implementation gives us (and I'm not even criticizing). It is too much to expect the Java compiler to catch that type safety is compromised at the call site in the program above, mostly because of how the VM handles arrays in Java. Arrays are special mutable objects whose components are anonymous members accessed by index. An array itself isn't a type, it assumes whatever type its components are. This is where the problem starts.

With the current generics implementation, arrays are treated as covariant by default, i.e. an array of component type T is also an array of component type S whenever T is a subclass of S. This introduces type problems such as the one above, where syntactically valid programs are victimized, making Java's "statically typed, type safe language" designation an irony. In the example, Z is inferred as A because covariance lets the B[] arguments pass as A[]; toArray then allocates a fresh B[] (the store's runtime component type, since the list is larger than the store) and fails when copying the A into it. If arrays were regular objects, the compiler would report an error in code that carries no type variance information.
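Here is a minimal sketch of the difference (my addition, reusing the A and B classes from the program above; java.util imports assumed):

A[] as = new B[1];   // allowed: arrays are covariant, a B[] is an A[]
as[0] = new A();     // compiles, but throws ArrayStoreException at runtime
// List<A> la = new ArrayList<B>(); // does not compile: generic types are invariant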

Arrays are regular objects in Scala; each array is an instance of the scala.Array class. The code below is equivalent to the Java program above with some syntactic differences, but unlike the Java code, the Scala code below does not type-check. Scala arrays are nonvariant, and the Scala compiler uses what is called "conservative approximation" to ensure type safety at compile time.


import java.util.{ArrayList, Arrays}

object App extends Application {

  class A

  class B extends A

  def merge[T](arr1: Array[T], arr2: Array[T], store: Array[T]): Array[T] = {
    val list = new ArrayList[T]
    list.addAll(Arrays.asList(arr1: _*)) // :_* is for vararg conversion
    list.addAll(Arrays.asList(arr2: _*))
    list toArray store
  }

  merge(Array[B](new B), Array[A](new A), new Array[B](1)) // Error, type mismatch

}


The Scala compiler will report an error on the "merge" call, complaining about a type mismatch.

Not everyone likes to know about such details until they bite back with million-dollar bugs. Why are Java arrays covariant? Who needs more runtime checks?

10 comments:

Daniel Spiewak said...

Java arrays are covariant because James Gosling (and friends) thought it was a good idea at the time. The problem is that Java 1 *didn't* have generics, so the only way to provide utility methods like System.arraycopy was to make arrays covariant in their value type. It was actually worse back in Java 1: there were *no* runtime checks for array storage problems. Thus, your code would probably have crashed the JVM. ArrayStoreException was added in Java 1.1 IIRC.

As a matter of interest, it is precisely covariant arrays + backward compatibility which makes Java's generics so unbearably horrible. Scala does its best to get around this, but it still has to have some special logic for dealing with arrays and legacy Java APIs which expect covariance.

Hardcoded said...

The problem you're describing has nothing to do with generics, but with the array hierarchy.

The JLS defines that B[] is a subtype of A[]. This causes a lot of confusion, and you're right with your criticism of it.

But this is contrary to generic types, which are invariant by default:

static <T> List<T> merge(List<T> l1, List<T> l2, List<T> store) {
    store.addAll(l1);
    store.addAll(l2);
    return store;
}

If you try to do the same thing here, you get a compile-time error, because a List<B> is not a List<A>:

List<A> copy = merge(new ArrayList<B>(someBs), new ArrayList<A>(someAs), new ArrayList<B>());


Co- and contravariant types are supported by Java generics, and the compiler will ensure type safety in most cases.
That means:
A List<B> is a List<? extends A>, but you can't add any elements to a List<? extends A>, because the exact element type isn't known.
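A minimal sketch of that point (my own example, reusing A and B from the post; java.util imports assumed):

List<B> lb = new ArrayList<B>();
lb.add(new B());
List<? extends A> la = lb;   // allowed: covariant use via the wildcard
A a = la.get(0);             // reading out as A is safe
// la.add(new A());          // compile-time error: the element type is unknown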

Java generics cause many headaches because of the bad integration.
Nonetheless, this is not a generics problem.

thoredge said...

Have to agree with Hardcoded here. This has nothing to do with generics. A simpler example of this problem can be expressed this way:

public class UnGenerics {

    static class A {
    }

    static class B extends A {
    }

    public static void main(String[] args) {
        insert(new A(), new B[1]);
    }

    private static void insert(A a, A[] bs) {
        bs[0] = a;
    }
}

Nirav Thaker said...

Hardcoded, thoredge:

This is indeed a generics issue, IMO.
Raw arrays in Java are covariant by default. Which is fine, but when I generify them I expect them to be nonvariant. Just as List<String> is not List<Object> without wildcards, arrays should also not be covariant unless, of course, I add bounds and wildcards. Why should I use generics if arrays don't get the same treatment as regular objects?

Hardcoded said...
This comment has been removed by the author.
Hardcoded said...

You can't "generify" an array.
If you declare a generic variable or parameter, it's normal that you can assign subclasses to it.

new ArrayList<A>().add(new B());
new ArrayList<A[]>().add(new B[1]);

There's no difference here, no breaking of generics: B[] is a subtype of A[] because B is a subclass of A.

Nirav Thaker said...

I think I wasn't precise enough: I'm not saying that generics over ordinary types are a problem; it is the combination of generics and arrays, as the post describes, that attracts the confusion.

I don't get any benefit out of generifying the array (by that I mean, a method signature with generic array types in it).
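For instance, a minimal sketch (my own, reusing A and B from the post) of a generified array signature that still compiles and still fails at runtime:

static <T> void store(T[] arr, T value) {
    arr[0] = value;
}

// T is inferred as A (array covariance lets the B[] through),
// so this compiles and then throws ArrayStoreException at runtime:
store(new B[1], new A());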

Unknown said...

I agree with previous posters. This has nothing to do with generics, merely a legacy of Java arrays, which have been with us since 1995. (Don't you think it's kind of old news in mid-2009?)

Also, arrays of course have types, distinct from their component types. (You can even see how these types are represented in bytecode; for example, [I is int[], [[D is double[][], etc.)

If this were true: "An array itself isn't a type, it assumes whatever type its components are", then you could do as well:
int[] x = 5; //wouldn't be a type error!

Nirav Thaker said...

If it is not truly a generics issue, can you enlighten me as to why the example code in the post compiles without error and why the following code doesn't?

A[] a1 = new A[1];
B[] b1 = new B[1];
b1 = a1; //Type mismatch

Because that's the only issue I have problem with.

thoredge said...

A[] a1 = new A[1];
B[] b1 = new B[1];
b1 = a1; //Type mismatch

In this example you cannot assign the A[] to a B[] because an A is not a B. The other way around works fine, though.

a1 = b1;

..., but then it blows up when inserting a simple A ...

a1[0] = new A();

You can also force the A[] to B[] cast. This will of course blow up.

b1 = (B[]) a1;

Arrays encounter the same problem as generics when handling input and output. Input must be a subtype of the narrowest type the collection can hold, while output is read at the widest possible type. With arrays, errors here are dealt with by throwing runtime exceptions. With generics you have type erasure, which means you have to deal with it at compile time using <? extends X> and <? super X>.
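For completeness, a minimal sketch (my own, reusing A and B from the post; java.util.List assumed imported) of those two wildcard forms, the usual producer-extends/consumer-super pattern:

// Producer: we only read A's out, so any List<? extends A> will do.
static A first(List<? extends A> producer) {
    return producer.get(0);
}

// Consumer: we only put B's in, so any List<? super B> will do.
static void addB(List<? super B> consumer) {
    consumer.add(new B());
}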