Nirav's Contemplations: 2009

Thursday, August 13, 2009

Real Refactoring

A lot of really good material has been written about refactoring. However, just like a lot of other things in software development, refactoring is also victimized by becoming the buzzword implying 'rewrites' and every other thing that is not refactoring. A lot of people really don't understand what they mean when they refer to 'refactoring', this post aims to provide a systematic explanation of refactoring for such audience.

Refactoring is merely a process of restructuring existing code so that its end result and meaning remains intact. However, this restructuring has a great potential to become far better design than the original one. (It is implied that the refactored design is easier to refactor and hence better) In rest of this post I will try to summarize how to approach and refactor a code in real life application development.

1. Get familiar with code

I am assuming here that we are almost always refactoring code written fully or partially by others, refactoring your own code should be much more simple exercise. Primarily you would be dealing with Legacy code or testable code. If you are lucky enough to get the code which is almost covered with unit tests, your job becomes substantially easy and you can start refactoring very fast without a lot of trouble. However if you are working with existing code which just exists and works somehow, you will have to start slowly. The first step is to understand the incomprehensible. There are several ways you can get familiar with code:

Read the code

Reading code is one of the fundamental activity we developers do most of the time, if any employer is in the illusion that she is paying developers just to write code then they are paying 1/3 of the salary. Reading code is an art which develops over the time, reading good code can make you better developer (with a precondition that one possesses the ability to judge good and bad code for the language). The more you read bad code the more you familiarize yourself with the repeating mistakes in the code (duplication, absolute hostility towards testability, extremely fearful defensive checks and other hilarities). The goal is to grasp the mindset of the original developer(s) who wrote it. You will know it immediately how deeply thought out the code is in a few passes. Use any modern IDEs to navigate through files, ignore the comments which don't make sense.

Write test

Once you have some understanding of inputs/outputs and/or interactions in code's life cycle you can write some tests just so that you can identify if you made some mistake in next step (Step 2). These tests might not be unit-tests (unless the code itself is unit testable). Writing test is incredibly useful way to get familiar with the code base.

2. Isolate the refactoring hot-spot (or "The Inflection points" as Michael Feathers calls it)

There has to be a really good reason why you are refactoring and it certainly can't be a full 'rewrite'. Irony will kill itself when it finds out that the problems in a bad code manifests itself in popular hot-spots. By isolating these hot-spots you should be able to mock out the uninteresting and untestable (or very slow) dependencies. If you have multiple inflection points; deal with them one by one in isolation. Isolation may become tricky because you may have to introduce some inevitable changes before you actually have any unit tests. For example, you might want to eliminate inherently untestable 'statics', extract methods/interfaces which you can stub with NullObjects/mocks. The tests created in earlier step will be valuable to identify any problems you introduced while isolating the code. The goal here is to not concern yourself with trivialities other than the refactoring hot-spots.

3. Write unit tests

Now that you have code which can be isolated, start writing unit tests. If you are working on Java sources I highly recommend using Mockito to mock the dependencies. Don't waste time on accidental complexities of other mock libraries. A reasonable analogy would be: if Agile software development is a good thing than other mocking tools are worse than Waterfall - Mockito is Agile.

Your primary goal is to cover code under refactoring as much as possible, there are no hard numbers on code/branch coverages (but anything beyond 90% should be good enough :)), use common sense and quality metrics like CRAP. The more unit tests you will write, more you will learn about the code base. There may be totally unused code for anti-YAGNI stronghold, or there might be bugs, Your unit test will reveal these naturally. Depending on the code in question, A good test suit generally reaches the size of the code under test.

4. Refactor

Refactoring is particularly painful when it has to be done in an environment with highly volatile code base and for the parts of code which are critical to overall application functionality. Make small changes; make sure all tests pass.

If you are using Eclipse - turn off auto build and add a ANT or Maven builder so that it runs your tests after each compile - you will be running tests a lot.

Use a real source control system, it might be much harder for you to revert the changes otherwise. You might consider using something like git in between upstream and your local repository if your primary SCM is not good enough. In practice, this becomes a lot more important than what IDE you use, you will need the ability to identify and rollback a commit specifically in code base with many hands. Commit after each change, the smaller the commit the lesser chance of breaking a lot of things together by a long shot.

With each commit going in repository without external complains (like "You broke my build!"), you will gain confidence to make bigger changes. You should continue writing tests while refactoring because if at this point you are making changes without violating functional contracts you are evolving a new design and doing something good. If you have broken something and your justification involves the word 'refactoring', you are doing it wrong! you either don't have enough tests or you failed to understand the code entrails.

So this is refactoring in real life. For the primary audience, please consider following these simple steps before tossing the word 'refactoring' around. If you disagree, then invent the new word and let me know :).

Tuesday, May 12, 2009

Scala v/s Java arrays

Here's a Java puzzler for the curious (and a good interview question too!). Given a array merge method below, what will be the output of following program?



public class Generics {

 static class A {
 }

 static class B extends A {
 }

 public static void main(String[] args) {

  A[] copy = merge(new B[] { new B() }, new A[] { new A() }, new B[1]);
  System.out.println(copy.length != 1);

 }

 static  Z[] merge(Z[] arr1, Z[] arr2, Z[] store) {
  List list = new ArrayList();
  list.addAll(Arrays.asList(arr1));
  list.addAll(Arrays.asList(arr2));
  return list.toArray(store);
 }

}

If you didn't guess it already, the program above results in a runtime exception (java.lang.ArrayStoreException).


Exception in thread "main" java.lang.ArrayStoreException
 at java.lang.System.arraycopy(Native Method)
 at java.util.Arrays.copyOf(Unknown Source)
 at java.util.ArrayList.toArray(Unknown Source)
 at name.nirav.Generics.merge(Generics.java:23)
 at name.nirav.Generics.main(Generics.java:16)

I am not a huge fan of generics in Java because we are left with whatever type safety we get from a half-hearted implementation (and I'm not even criticizing). It is too much to expect from a Java compiler to check that the program above has type safety compromised at call site, mostly because that's how arrays in Java are handled by VM. Arrays are special types of mutable objects with components as anonymous members which are accessed with indices. An array itself isn't a type, it assumes whatever type its components are. This is where the problem starts.

With current generics implementation, generic arrays are treated as covariant by default i.e. an array of component type T is also array of component type S where T is a subclass of S. This introduces type issues such as above where syntactically valid programs are victimized, making Java's "statically typed, type safe language" designation an irony. If arrays were regular objects, compiler will report an error in code without type variance information.

Arrays are regular objects in Scala, each array is an instance of Scala.Array class. The code below is equivalent to Java program above with some syntactic differences, unlike Java code the Scala code below is not syntactically valid. Scala arrays are non-variant, and Scala compiler uses what is called "conservative approximation" to ensure type safety at compile time.


object App extends Application{

 class A

 class B extends A
 
 def merge[T](arr1 : Array[T], arr2: Array[T], store: Array[T]) : Array[T] = {
    val list = new ArrayList[T]
    list.addAll(Arrays.asList(arr1:_*)) // :_* is for vararg conversion
    list.addAll(Arrays.asList(arr2:_*))
    list toArray store
 }

 merge(Array[B](new B), Array[A](new A), new Array[B](1)) //Error, type mismatch 

}

The Scala compiler will report an error on "merge" call, complaining about type mismatch.

Not everyone likes to know about such details until it bites back with million dollar bugs. Why are Java arrays co-variant? Who needs more run time checks?

Tuesday, May 05, 2009

Fork/Join Concurrency with Scala Actors

Have you ever wondered why there are no special frameworks to address concurrency in a Java based application? Considering Java's rich (NIH) ecosystem, I do wonder why I have to write same old state management code while introducing even a small amount of concurrency in Java application.

The reason why I think it is almost impossible to consider concurrency as an aspect in arbitrary application is because of JVM's native support for shared memory concurrency. As a result every developer is forced to think in terms of threaded shared state with guarded blocks. If you have read or written non-trivial piece of code using shared memory concurrency primitives (Mutex, Semaphore etc.) you probably know that the resultant code is hard to visualize and test.

I have been reading about Scala's Actor library and its share-nothing message passing abstraction built over existing concurrency model of JVM. While it doesn't try to solve the fundamental problem, it provides an alternative to address concurrency in your application from a different perspective which is testable and easier to understand.

In actor model, an Actor is a forkable task which runs independently, something like a serializable+immutable object with its private data and behavior. Each actor can send and receive (or react to) messages asynchronously, very similar to object oriented programming with objects responding to messages, but in a concurrent way. This abstraction can seamlessly be applied to a given application of divide and conquer nature and can be made concurrent with minimal efforts as compared to adapting to Java's concurrency primitives.

To explain my point further take a look at classically trivial Producer/Consumer example in Java.


public class Consumer extends Thread {
private final Buffer buffer;
public Consumer(Buffer buffer) {
 super("Consumer");
 this.buffer = buffer;
}
@Override
public void run() {
 while (true){
  System.out.println(buffer.next());
 }
}
}
public class Producer extends Thread {
private final Buffer buffer;
public Producer(Buffer buffer) {
 this.buffer = buffer;
}
@Override
public void run() {
 Random random = new Random(System.nanoTime());
 while (true) {
  String num = Integer.toString(random.nextInt());
  System.out.println(getName() + "=putting: " + num);
  buffer.add(num + ": " + getName());
  try {
   sleep(400);
  } catch (InterruptedException e) {
  }
 }
}
}
public class Buffer {
private String string;
private boolean ready = false;
public synchronized String next() {
 if (ready != true)
  try {
   wait();
  } catch (InterruptedException e) {
  }
 ready = false;
 return string;
}

public synchronized void add(String string) {
 while(ready == true)
  try {
   wait();
  } catch (InterruptedException e) {
  }
 this.string = string;
 notifyAll();
 ready = true;
}

}
public class Test {
public static void main(String[] args) throws Throwable {
 Buffer buffer = new Buffer();
 new Consumer(buffer).start();
 Producer producer = new Producer(buffer);
 producer.start();
 producer.join();
}
}

Take a look at Buffer class, we have used some concurrency primitives there since that's the place where state is being manipulated. We didn't declare variable ready as volatile since primitive assignments are guaranteed to be atomic (except long and double), Even a simple problem like this involves fair bit of understanding of the underlying threading model. There's no doubt this complexity will extrapolate in non-trivial applications e.g. multi-phase concurrent incremental compiler, SEDA based server etc.

Now take a look at the equivalent Producer/Consumer example in Scala.


import actors._
import actors.Actor._
import util.Random

case class SimpleMessage(num: Long)

class Producer(c: Consumer) extends Actor{
val random = new Random(System nanoTime)
def act = {
 loop{
  val num = produce
  println("Sending: " + num )
  c ! SimpleMessage(num)  // asynchronous message passing
 }
}
def produce(): Long = {
 Thread sleep 400
 return random.nextLong
}
}
class Consumer() extends Actor{
def act = {
  loop{
   receive{ //blocks here
     case SimpleMessage(num) => println("Received: " + num);
   }
  }
}
}
object PCTest {
def main(args : Array[String]) : Unit = {
var c = new Consumer()
var p = new Producer(c)
c.start;p.start
}
}

Even if we don't compare the amount of code, the Scala code above is much more clear in terms of its functionality. In Scala, Actors can be mapped to a single native thread with 'receive' (similar to Thread#wait()) or we can replace 'receive' with 'react' which is event based invocation but doesn't cost a blocked thread. The code within 'react' is executed by any non-blocked thread from a pre-created thread-pool. Just a single change and your application is scalable!

The Java example code above can be equally trivialized with the util.concurrent BlockingQueue, but the important point to take away is, writing shared memory concurrency code is inherently difficult and error-prone. With JDK1.7 we will get similar fork/join abstraction in Java itself (JSR166y), which will add new alternative to how we design and write concurrent applications.

Scala borrowed Actors from Erlang and similar libraries exist for Java as well. If you are curious about interesting details on Actor based OO concurrency implementation in Java, take a look at some of the thoughts Sebastian is sharing with his ConcurrentObjects library.

Tuesday, April 21, 2009

How Scala's pattern matching can replace Visitors

The primary motivation of Visitor design pattern is to separate model traversal from operational logic. A visitable model takes the responsibility of model navigation while the behavior is defined by arbitrary visitors. In this post I will try to explain problems associated with Visitors in general and how Scala's pattern matching feature can eliminate such problems cleanly.

Consider a simplified Insurance Policy model as follows (In Java):



public class PolicyElement {
 static class Quote extends PolicyElement {
  protected final Risk risk;
  public Quote(Risk risk) {
   this.risk = risk;
  }
  public void accept(PolicyVisitor visitor){
   visitor.visit(this);
   visitor.visit(this.risk);
  }
 }

 static class Risk extends PolicyElement {
  protected Coverage coverage;
  public Risk(Coverage coverage) {
   this.coverage = coverage;
  }
  public void accept(PolicyVisitor visitor){
   visitor.visit(coverage);
  }
 }

 static class Coverage extends PolicyElement {
  protected final Premium prem;
  public Coverage(Premium prem) {
   this.prem = prem;
  }
  public void accept(PolicyVisitor visitor){
   visitor.visit(prem);
  }
 }

 static class Premium extends PolicyElement {
  protected final double amt;
  public Premium(double amt) {
   this.amt = amt;
  }
  public void accept(PolicyVisitor visitor){
   visitor.visit(this);
  }
 }
}

public interface PolicyVisitor {
 public void visit(Quote quote);
 public void visit(Risk risk);
 public void visit(Coverage cvrg);
 public void visit(Premium prem);
}
public class PolicyTest {
 static class PremiumCalcVisitor implements PolicyVisitor {
  private double totalPremium;

  @Override
  public void visit(Premium prem) {
   totalPremium = getTotalPremium() + prem.amt;
  }

  @Override
  public void visit(Coverage cvrg) {
  }

  @Override
  public void visit(Risk risk) {
  }

  @Override
  public void visit(Quote quote) {
  }

  public double getTotalPremium() {
   return totalPremium;
  }
 };

 public static void main(String[] args) {
  Quote quote1 = new Quote(new Risk(new Coverage(new Premium(10))));
  Quote quote2 = new Quote(new Risk(new Coverage(new Premium(30))));
  PremiumCalcVisitor visitor1 = new PremiumCalcVisitor();
  PremiumCalcVisitor visitor2 = new PremiumCalcVisitor();
  quote1.accept(visitor1);
  quote2.accept(visitor2);
  assert visitor1.getTotalPremium() + visitor2.getTotalPremium() == 40;
 }
}

(Generally, we introduce one more abstract class to omit empty implementations in Visitors but I have left it for brevity.)

Now, not so apparent problem here is that if the object model changes (which is more frequently the case in real life), we have to add one more method to PolicyVisitor interface, all visitor implementations if change is substantial and have new Policy elements implement visitor methods. This invasive nature of Visitor couples it tightly with the model.

With pattern matching and views in Scala, you can have alternative implementation which is precise as well as non-invasive unlike visitors.


class PolicyElement 
case class Quote(risks: Risk) extends PolicyElement
case class Risk(cvrg: Coverage) extends PolicyElement 
case class Coverage(limit: Premium) extends PolicyElement 
case class Premium(amt: Double) extends PolicyElement
object PremCalcTest {
  class PremCalculator(pol: PolicyElement){
 def calcPrem : Double = calcPrem(pol)
 
    def calcPrem(policy: PolicyElement): Double = policy match{
   case Quote(risk)   => calcPrem(risk)
   case Risk(coverage)  => calcPrem(coverage)
   case Coverage(premium)=> calcPrem(premium)
   case Premium(amt)  => amt
 }
  }
 
  implicit def calPremV(pol: PolicyElement)= new PremCalculator(pol)
  
  def main(string: Array[String]){
   val risk1 = Risk(Coverage(Premium(10)))
   val risk2 = Risk(Coverage(Premium(30)))
   println(Quote(risk1).calcPrem + Quote(risk2).calcPrem)
  }
}

This code requires some explanation. What we have done here is we labeled domain classes with a 'case' keyword in Scala. If you tag a class with 'case' it can be used for pattern matching in a switch-case like structure as done in method 'calcPrem'. You don't need to create members or setter/getters for them, they are created by compiler for you. A case class can be instantiated without 'new' keyword; So Risk(Coverage(Premium(0)) is translated as new Risk(new Coverage(new Premium(0D))) in equivalent Java code.

The code in 'calcPrem' function can be assumed to be something similar to instanceOf checks for each possible case in Java, for example:


if(object instanceOf Premium)
return ((Premium)object).amt;

What we also have done silently is added a method 'calcPrem' to PolicyObject class. This is done through implicitly defined function 'calPremV', this will allow us to call 'calcPrem' method on any PolicyObject without actually modifying the domain model code. This type of lexically scoped class extension is known as a View in Scala and is similar to what is available in Ruby as open classes except without scoping.

In case if the model changes in this case, we just need to modify a single function and we are done. These programming language features of Scala frees us from coupling introduced by inheritance.

So it is easy to see that Scala's language features can be elegant and far more powerful than other languages (specifically Java) without sacrificing compiler checks and type safety.

Thursday, April 16, 2009

Your Language is not SLOW!

Do you really give a thought when you say "Ruby is slow or Python is slow"? Just because Twitter moved their back end messaging to JVM doesn't necessarily make Ruby any slower than it already was. If a language runtime can't handle concurrent load or long running processes it is not a limitation of the language!

This is dejavu for me as Java was considered "a slow language" back in 1.1 days when HotSpot JVM was in its poorer life and ISO/IEC C++ was hot girl in town. Now that you realized your pony implementations can't catch up with real life performance expectation you start blaming languages, are you serious?

Whether it's global interpreter lock or lack of native threads it's the runtime (re-read the runtime) and not the language that is slow. For the love of technology, Stop blaming languages for poor performance of your suboptimal reinventions!

Every language implementor should at least consider the host platform support before coming up with their "Not Invented Here" approach, sadly that's what is happening with most modern languages.

Thursday, April 09, 2009

Scala: First impression

If you are curious enough about programming languages then you probably have heard about Scala - the 'statically typed' functional and object oriented language. Scala is the new sun rising to balance 'the burn factor' between functional and object oriented schools of thoughts.

Unlike what this paper suggests[pdf], The reason why I think Scala exists is because functional v/s object oriented groups are moving in opposite directions, which is not only inefficient but they can't leverage achievements of one another. If you read about functional v/s object oriented programming comparison, every argument boils down to productivity, tooling and maintainability of code. While functional languages (Lisp, Haskell, Python etc.) offer excellent productivity compared to OO languages (Java, C++ etc.), Object oriented languages offer excellent tooling and are relatively maintainable. The reason why I think OO languages have been so popular is due to its easier to understand concept which is easy to map in real life, so for most people who have never took computer science course OO is still easier to grasp compared to Functional programming methods like list comprehension, closures or the higher order functions which are rooted from the formal systems of mathematics.

Scala tries to satisfy both of the groups by providing grammar and a type system which seamlessly integrates with mainstream platforms (Java, .NET) and offers powerful functional abstractions only available in dynamic languages. What this means is, Java developers can write their code in same fashion they write in Java using existing libraries and frameworks but with an added advantage of functional programming techniques wherever they feel it might be productive. Functional programming language enthusiast get access to rich class libraries and powerful tooling (eventually).

If you take a look at the Scala language grammar[pdf] you will notice that what you can create with Scala is limited by your creativity. Based on what I have learned so far, I find Scala much more refreshing than Java, Scala feels a lot more like a programming language of the 21st century! Scala compiler itself is pluggable so you can do heck of a stuff you can only dream with javac, ecj. What is missing is tooling, the existing tooling is scrap but that will improve hopefully with an active community.

Bill Venners of Artima has presented Scala wonderfully, take a look at the presentation[requires flash] on 'The feel of Scala'.

Tuesday, March 24, 2009

OPath ported for Java object model

I have been thinking about alternate uses of OPath I created for EVars plug-in. And it occurred to me that I can very well use it for my unit testing (which involves pretty complex insurance object model) and for things like JSPs where I really hate to write ten lines of Java code just to display some value.

So I wrote a port of OPath for Java object model (less than 200 lines of real code). Following example should explain how it can add some value to your development efforts:

Consider an example of code for simple Accessible Swing Table :

If you are writing a UI test to see specific table's column heading, you just write following

Collection<Object> findAll = OPathReflectiveInterpreter.findAll(frame, "//dataModel//@val\\$column.*");
assertEquals("First Name",((Object[])findAll.toArray()[0])[0]);

This is very trivial test, of course. But it is sufficient to express the power of OPath micro-scripting.

If you like to try it out for unit-testing or templating check-out the download here (You will also need opath.jar).

Disclaimer: This is experimental work at best, it might be slow and it might have bugs at this time.

Sunday, March 22, 2009

EVars Update

Based on the awesome feedback, I have been doing some offline development on EVars plug-in lately. I have added some basic documentation on wiki. Installation is now easy, let P2 do the work to install/update the plug-in (it's a struggle to host update-site with google code).

One of the most exciting feature (Which I can't stop talking about) is value based filtering. Now you can have all the powers of predicate expression, how about searching a map like map//key['@user.*']/.. to select entries matching regular expression user.* ? This update also includes support for relational operators so you can search exactly you want like this //value[count > n] .

I have uploaded OPath grammar for the language enthusiasts (left factored, left recursion free). One feedback was 'experiencing some unresponsive behavior for huge graphs' I have addressed that to some extent by integrating progress monitoring and asynchronous interpretation job.

Thanks for the feedback! Also, for those at EclipseCon have fun with your sessions!

Tuesday, March 10, 2009

Improve your debugging speed with EVars Eclipse plugin

Update: Added alternate link to screen-cast. Workaround to start plug-in

I have been debugging a lot lately which included open source libraries, closed source web-service engine and parts of closed source application server (WebLogic). During this painful debug-fest I felt a strong need for several features missing in Eclipse (NetBeans as well, for that matter) . Debugging closed applications is a pain beyond imagination, even with IDE integrated decompilers, it takes incredible amount of patience and time to debug.

To improve productivity (or prevent burnout), I jotted down a plugin to export/import live variables, filter variables view with a xpath like expressions etc. The plugin, called 'evars', features a small expression interpreter (similar to xpath) which can do a wonderful job of filtering variables on current stackframe. It also allows you to export variables to a file and reload them at a later point.

For the first time, I have attempted to create a screencast to explain its usefulness. You may watch it over here [10mb non-streaming, 1275x860], alternate link (open with your browser).

I have also created a beta release to see if it finds any interest which you can download it from here [JAR ~800kb, Jdk6] (Click here for Java 5 version) and drop it in dropins folder [Eclipse 3.4+]

If you find it interesting, please leave feedback here.

Monday, March 09, 2009

Two types of Architects

I am amused by these creatures called 'Architects' in software development. In recent phenomena, They are available under numerous headings such as 'Software architect', 'Solution architect', 'Front end architect', 'Enterprise architect' and so on. To me, in what they really do, there are only two broad types of architects. My brain timed out while guessing fancy names for them so their description is in order with uninteresting machine generated type names, it is up to you to decide what to call them.

The Type1 Architect:
A Type1 architect lives near 'The well of Eternal dreams' in his tranquility. He speaks high of latest technologies, buzzwords from internet meme. He actively attends all expensive technology conferences, management meetings, and is excellent at presenting management with summaries of what he may possibly be capable of.
The ideal Type1 architect rarely talks to developer and/or QA teams, he doesn't really waste his time for trivial details and is religious about development-by-exception (don't stop coding/testing till you hit the roadblock); he would rather express his greatness with another Type1 architect who can really appreciate him. He is a huge fan of his super-mega automated, ultra-modern cloud-based, universally-ultimate end-of-the-world web 9000.0 ready architecture, which he thoughtfully insists in applying wherever it's not needed. His thinking is beyond languages, frameworks, platforms and business needs, no mortal can grasp what he thinks. He usually masters the skill of overkill and is full of without-the-box ideas. I can really go on for next few weeks talking about this type of architects but I need to talk about the other type so I will stop here.

The Type2 Architect:
This type of dude is a very simple earthliving guy, he lives by the belief that success of software development efforts is dependent only on the people who are involved in it rather than tools and technologies. He holds the shared vision of what the end product is likely to be and thrives at driving the team efforts in that direction without managing them. He doesn't fear coding (more like the existing code fears for its life when he starts coding), testing or training whenever he needs to do it. He believes in pleasing management with frequent stable releases than presentation of the obvious. He communicates with everyone, right from tester to business analysts and managers frequently, this guy is often found roaming around discussing with different team members.

So if you are an architect, what type are you?

Friday, January 30, 2009

Formal grammars, parsing etc. for dummies

I thought it may be useful to post summaries of what I have been reading recently. In this post, I will try to explain some basic grammar and parsing stuff in as simple way as I can (omitting all the academic complacency) .

A grammar defines a language with a set of rules of type x -> y, where x is left hand side of the rule and y is right hand side of the rule. x and y can be either expandable or literals depending on grammar type (math zealot? read terminals and non-terminals).

Chomsky hierarchy defines four levels of grammar (although these are not the only grammar types in literature but Chomsky is cool dude in linguistic literature, hence the list):

0. Unrestricted grammar:

Languages based on such grammar can have any rule for derivation (which usually means any text can belong to a possible unrestricted grammar like 'yo dawg', where 'yo dawg -> u'). However, it is possible to define a language with unrestricted grammar which is also Turing complete. A language based on unrestricted grammar is extremely difficult to parse, but not completely impossible. Such language doesn't necessarily make sense, e.g. It is possible to create a math expression such as this 1++1=1--*1=1 in an unrestricted language, as well as an unrestricted English statement such as this: "all your bases are belong to us".

1. Context sensitive grammar:

A language defined with CSG may represent all the natural languages. Such grammar is defined with a set of rules where both sides may have expandable symbols (e.g. "a" B "c" -> "a" "x" "c"). It is very difficult to build a parser which can parse CSG.

2. Context free grammar:

All programming language grammars are context free grammars. It can have only one expandable symbol on left hand side of a grammar rule and is very easy to parse. E.g. X -> "if" (y) z | "if" (y) z "else" (X), where y and z can be represented as other statements of the same grammar. There are several well known parsing techniques to parse CFGs such as:

- LL(1)-LL(k) with a recursive descent parser (top down parsers with look-aheads 1 to k) and,
- Bottom up parsers such as LR(1) and LALR which can parse more CFGs than LL parsers (generally shift reduce parsers).

LL parser starts with Left to right and always expands Left expandable symbol first (think of queue) where as LR starts with Left to right but expands Right symbol first (think of stack).

3. Regular grammar:

A regular expression is a language defined by a regular grammar. It is represented as X -> "t" | "t" Y, where Y represents expandable symbol and always follows literal (if it exists). E.g. ab*, where * may be an expandable symbol defined to mean one or more occurrence of the literal it followed as a grammar rule.

Each grammar type above is a subset of previous grammar type, meaning every regular grammar is context free grammar, every context free grammar is also context sensitive grammar and so on.

Wednesday, January 21, 2009

Mutual recursion fun with Eclipse Debug model

I rarely come across real-world data structures which are inherently mutually recursive (a.k.a. Daisy chain recursion in CS literature). Recently, while writing a plug-in related to eclipse.debug, I came across this interesting data structure.

Source Eclipse.org

Observe the IVariable ←→ IValue relationship (although most relationships in this object-model is mutually recursive), given an implementation of this model (say JDI) how do you clone an object-tree representing variables on a stack-frame?

The answer is simple: assuming implementation classes Variable and Value just write a simple function to clone variables one by one:

public class Variable {
…
private Value value;
…
public static Variable create(IVariable var) {
  Variable v = new Variable() ;
  try {
      v.setName(var.getName());
      v.setType(var.getReferenceTypeName());
      v.setValue(Value.create(var.getValue()));
      return v;
  } catch (DebugException e) {
      ..
  }
}
}

Similarly for Value class:

public class Value {
…
private  Variable[] variables;
…
public static Value create(IValue var) {
    Value v = new Value() ;
    try {
        v.setType(var.getReferenceTypeName());
        v.setValue(...);
        IVariable[] variables = value.getVariables();
        if (variables != null) {
            for (IVariable variable : variables)
                v.add(Variable.create(variable));
        }
        return v;
    } catch (DebugException e) {
        ..
    }
}
}

Statement 'Variable.create(yourJDIVariable);' clones entire JDI object. This is one of those intuitive recursion examples which you die to find out IRL. Interesting, isn't it?

This is how your regular stack-frame looks (Variables view):

And this is how it looks when you represent it using an example debug model :

Nirav's Contemplations