Monday, December 13, 2010

Tackling nulls the functional way

Most programmers have suffered null pointer one way or other - usually a core-dump followed by a segmentation fault on development machine or on a production box with application in smokes. NullPointerException results in a visible embarrassment of not thinking about "that something *could be* null".

Tracking Null Pointer ranges from loading core-dump in gdb and tracing dereferenced pointer to stack traces pointing to exact location in source. However, ease of tracking nulls opens up the doors to ignore them in practice and throwing null-checks just becomes as common as throwing one more div to fix IE's layout problems which is bad.

Problems with Null:
I hate having to ignore nulls as it is not always enough just to add one more null check. The reason why I am writing this blog is because I have several problems with Nulls:
  1. All and Every reference can be a null in languages like Java. This covers everything: method parameters, return values, fields etc. There's no precise way for a programmer to know that some method might return null or accept null parameters. You absolutely have to resort to actual source code or documentation to see if it can possibly return null (and you are going to need good luck with that). All of it adds extra work when you really want to be focusing on fixing the real problem.
  2. The problem with NullPointerException is that they point to causal eventuality and not usually the actual cause. So what you see in stack traces are usually the code paths where damage is not really initiated but done when we are normally interested in case where damage is initiated. 
  3. Null is actually very ambiguous. Is it the uninitialized value or absence of value or is it used to indicate an error? The paradigm of null fits well in database but not in programming model.
  4. Having Nulls in your code has major implications in code quality and complexity. For example, it is not unusual to see code branches with null checks breeding like rabbits when an API "may" return null which in turn results in extremely defensive code. This significantly taxes readability.
  5. Null makes Java's type system dumber when a method is overridden and you want to call it. Writing code like methodDoingStuff((ActualType)null, otherArgs) isn't exactly a pretty sight. This results in subtle errors when arguments are non-generic. 
In many ways Nulls are necessary evil. For those of us who care about readability and safety we can't ignore them yet we shouldn't just let it overtake safety and readability.

I have come to know several techniques to tackle nulls. First, there is Null Object pattern which is not entirely as ridiculous as the name implies but it's not practical in real life software having hundreds of class hierarchies and thousands of classes, and so, I will not talk about it. Then there are languages like Haskell and Scala with library classes that try to treat nulls in, IMO, a better way. Haskell has MayBe and Scala has Options. After using options in Scala for a while in a side project, I found that I was no longer fighting with nulls. I knew exactly when I had to make a decision that a value is really optional and I must do alternate processing.

The central idea behind Haskell's MayBe and Scala's Option is to introduce a definitive agreement on a value's eligibility to be either null or not-null enforced with the help of type system. I will talk about Scala's Option since I have worked with it, but the concept remains same. I will also introduce how to implement and use Options in Java since this is much more of a functional way of thinking about handling nulls and it doesn't (almost) take Scala's neat language features to implement it.

Treating nulls the better way:

Usual course of action when you are not sure what value to return from a method is:
Most of the times it results in the last option because you don't have to fear about breaking everything (well, mostly) and everyone passes the buck like this.

We can do better with Scala's Option classes. We can wrap any reference in to Some or None and handle it with pattern matching or "for comprehension". For example:

Some(x) represents a wrapper with x as actual value; None represents absence of value. Some and None are subclasses of Option. Option has all the interesting methods you can use. When a variable in question is null we can:  fall back to default value, evaluate and return a function's computed value, filter and so on.

Options in Java:
Implementing Options in Java is surprisingly a trivial task. However, it is not as pleasant as Scala's options. Implementation boils down to a wrapper class Option with two children: Some and None. None represents a null but with a type (None[T]) and Some represents non-null type.

To make Option interesting we make Option extend List, so we can iterate on it to mimic poor man's "for comprehension". We will also go as far as tagging both types with Enums so we can do poor man's pattern matching with a switch. You can find example implementation of Options in Java with a test case demonstrating use of Options. Here's the small snippet which covers the essense:

As you can see, Option opens up several doors to fix the null situation. You now have  choice to compute a value, use default value or do arbitrary stuff when you encounter nulls.

How are my null problem solved with Options:
  1. Using Options for optional/null-able references I have at least avoided "all things could be null" problem in my code. When an API is returning a Option, I don't have to wonder if it can return null. Intention is pretty clear.
  2. When I am forced to handle null right at the time of using an API, I have to handle it right there: do alternate processing or use default. No surprises.
  3. Option is a very clear way of saying a variable represents possibly an absent value.
  4. Option doesn't really solve this problem completely. For example, method signatures with wrapper Option type can get really long (e.g. def method1(): Map[String, Option[List[Option[String]]] = {}). However, compared to null checks, I would prefer long method signature any day. Other benefits out-weight this limitation.
  5. Clearly, Option[Integer] always means only Option[Integer] and not Option[Integer], Option[String], Option[Character], Option[Date] and so on. Compiler can infer exact method call from generic types.
As good as the concept behind optional values is, it doesn't and will not always save you from Null. You will still have to deal with existing libraries which return nulls and cause all these problems and more. However, most of the time null is problematic in your own code.

Where to use Options:
Here are the common places where I think using Options makes more sense:
  1. APIs: Make your API as specific and as readable as possible; all optional parameters and return values should be Option.
  2. Use in your domain model: You already have fair understanding on null-able columns, use Option for null-able fields in your table. It is not hard to integrate using Options if you are using an ORM with interceptable DB fetch; you can initialize fields to None if database contains null and so on.

In the interest of keeping this post relevant and on topic, I have completely avoided heavy theoretical baggage (monads et. al.) that's inevitable when theoretical functionalists (functional programmers) talk about Options. I really hope this post generates some interest in this topic. If you disagree or would like to share more on this topic, please leave a comment.

Saturday, May 01, 2010

Future of a Java programmer

As a long time Java only programmer professionally, I have been pondering about how things are changing around me as a Java programmer. Ever since I remember I had no choice but to use subset of C++ dialect (Java lang) with an extremely rich class library and ecosystem (Java platform).

In last few years there has been a drastic shift in number of languages targeting JVM. For example: dynamic (javascript, jruby, jython, groovy), functional + OO (scala) and a lisp dialect (clojure) and so many others. While I am excited about all the options I have today I don't think a single language will dominate on JVM anymore like Java did so far.

In a way this is a good thing, one tool rarely fits all needs (I couldn't curse Java enough for GUI programming). Like C, Java was never designed to be used for developing dynamic web apps, but we still tried and miserably failed with JSP/JSF and plethora of frameworks against PHP/Rails/Python in terms of productivity. One really good thing Java did was to raise a level of abstractions from platform specific details and memory management. These new languages on top of JVM raise the abstraction level even further for its area of strength.

It is not a remote future when we will see concurrent processes being programmed in clojure and presented with jruby/rails with intermediate code written in Java. Each layer of application is going to be implemented in different programming languages while interfaces being transparent for developers working in each layer. This is a big thing, it has never been envisioned before for Java Platform, the lowest coupling we have seen so far is through remoting (web services et. al.) where clients and servers are on different runtimes and languages.

What this means for a Java developer is if you are

  • A web developer: you are going to learn things which are extremely different from struts/jsf/jsps, no more artificial model1/model2 MVCs.
  • A non web-developer : you are going to write code which is far more readable and very specific to your business domain via DSL created in any of the languages mentioned above without worrying about accidental complexity Java and its frameworks imposed on you.

While I can keep classifying developers on Java platform all day long, these two are major ones whose life (and resumes) are going to change soon, they will be expected to know more than one programming languages rather than frameworks now. Contrary to the cool kids on interwebz, I don't think Java the language is going to die anytime soon not because many of the existing libraries are written in it, but because of the number of programmers on earth who know Java, tooling around it and the native JVM support for it. Java is like C in a way, you can do whatever is supported by underlying implementation.

Many of you who are like me are going to see change around them soon, I am thrilled to see how my career is going to transform as polyglot programmer are you?

Sunday, March 07, 2010

Eclipse Refactoring for legacy code

It has been quite sometime since I wrote anything on this blog, twitter probably spoiled me. If you are following me on twitter, you probably noticed announcement of  a small eclipse plugin for automated refactoring for Legacy Code. 

Before few months, I wrote an LTK refactoring mainly in Scala (yes, eclipse plugin in Scala language) to forward static method calls in a Java method to an instance method in same class. I am not really an expert at Scala (or functional programming for that matter) so the code is more of javaish Scala but I am improving and thats the best thing about Scala. This is also an attempt to prove how easy it is to write eclipse plugins in Scala and how seamlessly it integrates with Java source; Scala is not only a better language, it is much more suitable to deal with Eclipse APIs (you can define views etc. on extremely verbose interfaces like JDT AST).

Enough about Scala; the real purpose of the refactorings provided in this plugin would be to ease development with Legacy Java code, most of the code generators (think JavaCC) generate ugly Java code which is not only non unit testable but pain to comprehend and is often not usable with concurrent routines. Using this automated refactoring should make such code better. You can read more about the motivation behind the plugin in wiki.

For the performance heads who are worried about method chains introduced by this refactoring are encouraged to run their own "sane" micro-benchmarks to be sure JVM is really in-lining the method calls created by this refactoring.

The plugin is currently in beta and I am open for any new refactoring proposals. If you have any feedback, you are welcome to comment on this blog post or create a bug. The update site is here.