I am supposed to deal with something similar to comparing two gigantic XML documents in wild ways.
I can think of several techniques up front to achieve it, each with its own performance and maintainability trade-offs. As you might know, writing XML parsing code by hand is an activity of ancient times (hey there, are you still writing code for parsing XML?); today we have a plethora of tools to parse, bind and persist XML with very little pain. I came across several XML binding libraries like JAXB 2.0, XMLBeans, JiBX etc. (and given a chance, why not EMF?). JiBX seems interesting, but since I'm not free to use open source at will, I tried JAXB 2.0. The XML schema provided to me was a huge XSD document; the JAXB binding compiler spat out 550 Java classes from it.
A simple, test-driven, recursive, depth-first, reflective (oops, too many adjectives) traversal of the generated object tree was enough to identify the XML delta information. This was the obvious solution and a pretty fast one (fast to develop, that is). The downside is that it requires maintaining 550 generated classes. They can be regenerated and kept in sync with the XJC Ant task, but the memory footprint and object creation time can still be limiting for production code.
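To give an idea of what such a traversal can look like, here is a minimal sketch using only the JDK. It is not the code I actually wrote: `ReflectiveDiff`, `Order` and `Item` are hypothetical names, the two small classes merely stand in for JAXB-generated ones, and the sketch assumes the object graph is a tree (no cycles) and ignores arrays.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for two of the generated binding classes.
final class Item {
    private final String sku;
    private final int qty;
    Item(String sku, int qty) { this.sku = sku; this.qty = qty; }
}

final class Order {
    private final String id;
    private final List<Item> items;
    Order(String id, Item... items) {
        this.id = id;
        this.items = new ArrayList<>(Arrays.asList(items));
    }
}

final class ReflectiveDiff {

    /** Returns one entry per difference found, prefixed with the field path. */
    static List<String> diff(Object left, Object right) {
        List<String> out = new ArrayList<>();
        walk("", left, right, out);
        return out;
    }

    // Depth-first: compare two nodes, then recurse into their fields.
    private static void walk(String path, Object a, Object b, List<String> out) {
        if (a == b) return;
        if (a == null || b == null) {
            out.add(path + ": " + a + " -> " + b);
            return;
        }
        if (a instanceof List && b instanceof List) {
            List<?> la = (List<?>) a;
            List<?> lb = (List<?>) b;
            if (la.size() != lb.size()) {
                out.add(path + ": size " + la.size() + " -> " + lb.size());
            }
            // Compare element by element over the common prefix.
            for (int i = 0; i < Math.min(la.size(), lb.size()); i++) {
                walk(path + "[" + i + "]", la.get(i), lb.get(i), out);
            }
            return;
        }
        if (a.getClass() != b.getClass() || isLeaf(a.getClass())) {
            if (!a.equals(b)) out.add(path + ": " + a + " -> " + b);
            return;
        }
        // Visit declared fields of the class and all its superclasses.
        for (Class<?> c = a.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
                try {
                    f.setAccessible(true);
                    walk(path + "/" + f.getName(), f.get(a), f.get(b), out);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("cannot read " + f, e);
                }
            }
        }
    }

    // Enums and JDK types (String, boxed primitives, ...) are compared with equals().
    private static boolean isLeaf(Class<?> type) {
        String name = type.getName();
        return type.isEnum() || name.startsWith("java.") || name.startsWith("javax.");
    }
}
```

Comparing two `Order` objects that differ only in the quantity of their first item yields a single entry whose path is `/items[0]/qty`. The nice part is that the traversal itself knows nothing about the 550 classes; the cost is that both documents must be fully unmarshalled into objects first.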
The other approach I tried was calculating the XML diffs using plain XML processing. Among others, I found a nice little utility library, XMLUnit, which does almost exactly what I want. XMLUnit is a tool primarily for unit testing XML-intensive applications. It is very small, has a clean API and is well documented (if you want to read it, that is). It has several utility classes that shield you from reading or writing ugly XML processing code, and I used them to get the XML diffs. I still need to poke around the XML with XPath, though, because of the complex requirements.
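For the XPath poking part, something along these lines is possible with the JDK alone. This is only a sketch and deliberately does not use XMLUnit: the class `XPathProbe` and its methods are hypothetical helpers, and the element and attribute names in the usage note are made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathProbe {

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse XML", e);
        }
    }

    /** Returns the trimmed text of every node matched by the expression, in document order. */
    static List<String> select(Document doc, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("cannot evaluate " + expr, e);
        }
    }

    /** True when both documents produce the same values for the expression. */
    static boolean sameAt(String leftXml, String rightXml, String expr) {
        return select(parse(leftXml), expr).equals(select(parse(rightXml), expr));
    }
}
```

With two documents whose second `item` differs in its `qty` attribute, `sameAt(a, b, "/order/item/@qty")` returns false while `sameAt(a, b, "/order/item[1]/@qty")` returns true, which is handy for narrowing down where a reported difference actually matters. Note that this builds a full DOM for each document, so for truly gigantic inputs the memory concern does not go away.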
I would have tried my favourite, XStream, as well, but its FAQ suggests I shouldn't. Anyway, what would be your strategy to deal with something like this?