Sunday, March 07, 2010

Eclipse Refactoring for legacy code

It has been quite sometime since I wrote anything on this blog, twitter probably spoiled me. If you are following me on twitter, you probably noticed announcement of  a small eclipse plugin for automated refactoring for Legacy Code. 


Before few months, I wrote an LTK refactoring mainly in Scala (yes, eclipse plugin in Scala language) to forward static method calls in a Java method to an instance method in same class. I am not really an expert at Scala (or functional programming for that matter) so the code is more of javaish Scala but I am improving and thats the best thing about Scala. This is also an attempt to prove how easy it is to write eclipse plugins in Scala and how seamlessly it integrates with Java source; Scala is not only a better language, it is much more suitable to deal with Eclipse APIs (you can define views etc. on extremely verbose interfaces like JDT AST).


Enough about Scala; the real purpose of the refactorings provided in this plugin would be to ease development with Legacy Java code, most of the code generators (think JavaCC) generate ugly Java code which is not only non unit testable but pain to comprehend and is often not usable with concurrent routines. Using this automated refactoring should make such code better. You can read more about the motivation behind the plugin in wiki.


For the performance heads who are worried about method chains introduced by this refactoring are encouraged to run their own "sane" micro-benchmarks to be sure JVM is really in-lining the method calls created by this refactoring.


The plugin is currently in beta and I am open for any new refactoring proposals. If you have any feedback, you are welcome to comment on this blog post or create a bug. The update site is here.

Thursday, August 13, 2009

Real Refactoring

A lot of really good material has been written about refactoring. However, just like a lot of other things in software development, refactoring is also victimized by becoming the buzzword implying 'rewrites' and every other thing that is not refactoring. A lot of people really don't understand what they mean when they refer to 'refactoring', this post aims to provide a systematic explanation of refactoring for such audience.


Refactoring is merely a process of restructuring existing code so that its end result and meaning remains intact. However, this restructuring has a great potential to become far better design than the original one. (It is implied that the refactored design is easier to refactor and hence better) In rest of this post I will try to summarize how to approach and refactor a code in real life application development.

1. Get familiar with code

I am assuming here that we are almost always refactoring code written fully or partially by others, refactoring your own code should be much more simple exercise. Primarily you would be dealing with Legacy code or testable code. If you are lucky enough to get the code which is almost covered with unit tests, your job becomes substantially easy and you can start refactoring very fast without a lot of trouble. However if you are working with existing code which just exists and works somehow, you will have to start slowly. The first step is to understand the incomprehensible. There are several ways you can get familiar with code:
  • Read the code
Reading code is one of the fundamental activity we developers do most of the time, if any employer is in the illusion that she is paying developers just to write code then they are paying 1/3 of the salary. Reading code is an art which develops over the time, reading good code can make you better developer (with a precondition that one possesses the ability to judge good and bad code for the language). The more you read bad code the more you familiarize yourself with the repeating mistakes in the code (duplication, absolute hostility towards testability, extremely fearful defensive checks and other hilarities). The goal is to grasp the mindset of the original developer(s) who wrote it. You will know it immediately how deeply thought out the code is in a few passes. Use any modern IDEs to navigate through files, ignore the comments which don't make sense.
  • Write test
Once you have some understanding of inputs/outputs and/or interactions in code's life cycle you can write some tests just so that you can identify if you made some mistake in next step (Step 2). These tests might not be unit-tests (unless the code itself is unit testable). Writing test is incredibly useful way to get familiar with the code base.

2. Isolate the refactoring hot-spot (or "The Inflection points" as Michael Feathers calls it)

There has to be a really good reason why you are refactoring and it certainly can't be a full 'rewrite'. Irony will kill itself when it finds out that the problems in a bad code manifests itself in popular hot-spots. By isolating these hot-spots you should be able to mock out the uninteresting and untestable (or very slow) dependencies. If you have multiple inflection points; deal with them one by one in isolation. Isolation may become tricky because you may have to introduce some inevitable changes before you actually have any unit tests. For example, you might want to eliminate inherently untestable 'statics', extract methods/interfaces which you can stub with NullObjects/mocks. The tests created in earlier step will be valuable to identify any problems you introduced while isolating the code. The goal here is to not concern yourself with trivialities other than the refactoring hot-spots.

3. Write unit tests

Now that you have code which can be isolated, start writing unit tests. If you are working on Java sources I highly recommend using Mockito to mock the dependencies. Don't waste time on accidental complexities of other mock libraries. A reasonable analogy would be: if Agile software development is a good thing than other mocking tools are worse than Waterfall - Mockito is Agile.

Your primary goal is to cover code under refactoring as much as possible, there are no hard numbers on code/branch coverages (but anything beyond 90% should be good enough :)), use common sense and quality metrics like CRAP. The more unit tests you will write, more you will learn about the code base. There may be totally unused code for anti-YAGNI stronghold, or there might be bugs, Your unit test will reveal these naturally. Depending on the code in question, A good test suit generally reaches the size of the code under test.

4. Refactor

Refactoring is particularly painful when it has to be done in an environment with highly volatile code base and for the parts of code which are critical to overall application functionality. Make small changes; make sure all tests pass.

If you are using Eclipse - turn off auto build and add a ANT or Maven builder so that it runs your tests after each compile - you will be running tests a lot.

Use a real source control system, it might be much harder for you to revert the changes otherwise. You might consider using something like git in between upstream and your local repository if your primary SCM is not good enough. In practice, this becomes a lot more important than what IDE you use, you will need the ability to identify and rollback a commit specifically in code base with many hands. Commit after each change, the smaller the commit the lesser chance of breaking a lot of things together by a long shot.

With each commit going in repository without external complains (like "You broke my build!"), you will gain confidence to make bigger changes. You should continue writing tests while refactoring because if at this point you are making changes without violating functional contracts you are evolving a new design and doing something good. If you have broken something and your justification involves the word 'refactoring', you are doing it wrong! you either don't have enough tests or you failed to understand the code entrails.

So this is refactoring in real life. For the primary audience, please consider following these simple steps before tossing the word 'refactoring' around. If you disagree, then invent the new word and let me know :).

Tuesday, May 12, 2009

Scala v/s Java arrays

Here's a Java puzzler for the curious (and a good interview question too!). Given a array merge method below, what will be the output of following program?



public class Generics {

static class A {
}

static class B extends A {
}

public static void main(String[] args) {

A[] copy = merge(new B[] { new B() }, new A[] { new A() }, new B[1]);
System.out.println(copy.length != 1);

}

static Z[] merge(Z[] arr1, Z[] arr2, Z[] store) {
List list = new ArrayList();
list.addAll(Arrays.asList(arr1));
list.addAll(Arrays.asList(arr2));
return list.toArray(store);
}

}


If you didn't guess it already, the program above results in a runtime exception (java.lang.ArrayStoreException).


Exception in thread "main" java.lang.ArrayStoreException
at java.lang.System.arraycopy(Native Method)
at java.util.Arrays.copyOf(Unknown Source)
at java.util.ArrayList.toArray(Unknown Source)
at name.nirav.Generics.merge(Generics.java:23)
at name.nirav.Generics.main(Generics.java:16)


I am not a huge fan of generics in Java because we are left with whatever type safety we get from a half-hearted implementation (and I'm not even criticizing). It is too much to expect from a Java compiler to check that the program above has type safety compromised at call site, mostly because that's how arrays in Java are handled by VM. Arrays are special types of mutable objects with components as anonymous members which are accessed with indices. An array itself isn't a type, it assumes whatever type its components are. This is where the problem starts.

With current generics implementation, generic arrays are treated as covariant by default i.e. an array of component type T is also array of component type S where T is a subclass of S. This introduces type issues such as above where syntactically valid programs are victimized, making Java's "statically typed, type safe language" designation an irony. If arrays were regular objects, compiler will report an error in code without type variance information.

Arrays are regular objects in Scala, each array is an instance of Scala.Array class. The code below is equivalent to Java program above with some syntactic differences, unlike Java code the Scala code below is not syntactically valid. Scala arrays are non-variant, and Scala compiler uses what is called "conservative approximation" to ensure type safety at compile time.


object App extends Application{

class A

class B extends A

def merge[T](arr1 : Array[T], arr2: Array[T], store: Array[T]) : Array[T] = {
val list = new ArrayList[T]
list.addAll(Arrays.asList(arr1:_*)) // :_* is for vararg conversion
list.addAll(Arrays.asList(arr2:_*))
list toArray store
}

merge(Array[B](new B), Array[A](new A), new Array[B](1)) //Error, type mismatch

}


The Scala compiler will report an error on "merge" call, complaining about type mismatch.

Not everyone likes to know about such details until it bites back with million dollar bugs. Why are Java arrays co-variant? Who needs more run time checks?

Tuesday, May 05, 2009

Fork/Join Concurrency with Scala Actors

Have you ever wondered why there are no special frameworks to address concurrency in a Java based application? Considering Java's rich (NIH) ecosystem, I do wonder why I have to write same old state management code while introducing even a small amount of concurrency in Java application.

The reason why I think it is almost impossible to consider concurrency as an aspect in arbitrary application is because of JVM's native support for shared memory concurrency. As a result every developer is forced to think in terms of threaded shared state with guarded blocks. If you have read or written non-trivial piece of code using shared memory concurrency primitives (Mutex, Semaphore etc.) you probably know that the resultant code is hard to visualize and test.

I have been reading about Scala's Actor library and its share-nothing message passing abstraction built over existing concurrency model of JVM. While it doesn't try to solve the fundamental problem, it provides an alternative to address concurrency in your application from a different perspective which is testable and easier to understand.

In actor model, an Actor is a forkable task which runs independently, something like a serializable+immutable object with its private data and behavior. Each actor can send and receive (or react to) messages asynchronously, very similar to object oriented programming with objects responding to messages, but in a concurrent way. This abstraction can seamlessly be applied to a given application of divide and conquer nature and can be made concurrent with minimal efforts as compared to adapting to Java's concurrency primitives.

To explain my point further take a look at classically trivial Producer/Consumer example in Java.


public class Consumer extends Thread {
private final Buffer buffer;
public Consumer(Buffer buffer) {
super("Consumer");
this.buffer = buffer;
}
@Override
public void run() {
while (true){
System.out.println(buffer.next());
}
}
}
public class Producer extends Thread {
private final Buffer buffer;
public Producer(Buffer buffer) {
this.buffer = buffer;
}
@Override
public void run() {
Random random = new Random(System.nanoTime());
while (true) {
String num = Integer.toString(random.nextInt());
System.out.println(getName() + "=putting: " + num);
buffer.add(num + ": " + getName());
try {
sleep(400);
} catch (InterruptedException e) {
}
}
}
}
public class Buffer {
private String string;
private boolean ready = false;
public synchronized String next() {
if (ready != true)
try {
wait();
} catch (InterruptedException e) {
}
ready = false;
return string;
}

public synchronized void add(String string) {
while(ready == true)
try {
wait();
} catch (InterruptedException e) {
}
this.string = string;
notifyAll();
ready = true;
}

}
public class Test {
public static void main(String[] args) throws Throwable {
Buffer buffer = new Buffer();
new Consumer(buffer).start();
Producer producer = new Producer(buffer);
producer.start();
producer.join();
}
}


Take a look at Buffer class, we have used some concurrency primitives there since that's the place where state is being manipulated. We didn't declare variable ready as volatile since primitive assignments are guaranteed to be atomic (except long and double), Even a simple problem like this involves fair bit of understanding of the underlying threading model. There's no doubt this complexity will extrapolate in non-trivial applications e.g. multi-phase concurrent incremental compiler, SEDA based server etc.

Now take a look at the equivalent Producer/Consumer example in Scala.

import actors._
import actors.Actor._
import util.Random

case class SimpleMessage(num: Long)

class Producer(c: Consumer) extends Actor{
val random = new Random(System nanoTime)
def act = {
loop{
val num = produce
println("Sending: " + num )
c ! SimpleMessage(num) // asynchronous message passing
}
}
def produce(): Long = {
Thread sleep 400
return random.nextLong
}
}
class Consumer() extends Actor{
def act = {
loop{
receive{ //blocks here
case SimpleMessage(num) => println("Received: " + num);
}
}
}
}
object PCTest {
def main(args : Array[String]) : Unit = {
var c = new Consumer()
var p = new Producer(c)
c.start;p.start
}
}

Even if we don't compare the amount of code, the Scala code above is much more clear in terms of its functionality. In Scala, Actors can be mapped to a single native thread with 'receive' (similar to Thread#wait()) or we can replace 'receive' with 'react' which is event based invocation but doesn't cost a blocked thread. The code within 'react' is executed by any non-blocked thread from a pre-created thread-pool. Just a single change and your application is scalable!

The Java example code above can be equally trivialized with the util.concurrent BlockingQueue, but the important point to take away is, writing shared memory concurrency code is inherently difficult and error-prone. With JDK1.7 we will get similar fork/join abstraction in Java itself (JSR166y), which will add new alternative to how we design and write concurrent applications.

Scala borrowed Actors from Erlang and similar libraries exist for Java as well. If you are curious about interesting details on Actor based OO concurrency implementation in Java, take a look at some of the thoughts Sebastian is sharing with his ConcurrentObjects library.