Tuesday, September 06, 2011

Thoughts on Event sourcing

I read about event sourcing a while back and couldn't stop thinking about it, so it had to explode here as a blog post.

A typical data-driven application involves CRUD operations on domain entities that are important to the business. Such applications typically capture data from users or other systems in a centralized database for reporting or further distribution. The architecture of such applications is generally simple, and there's a vast ecosystem of platforms, frameworks, tools and libraries to support it.

If we represent the traditional design of such an application as a state machine, we can say that the application captures, distributes and reports on the domain in a certain state. Its primary objective is to facilitate data manipulation (CRUD), which changes the state of the domain. Familiar caveats aside, this is a sound design for most applications.

Thinking in terms of state machines, there's an interesting alternative: we can store all the state transitions that led the initial domain model to its current state. This second perspective on application design has many interesting repercussions. In this approach, the current state of the domain is no longer as important as it was in the first approach, because it can be recreated just by replaying all the transitions. This second approach is called "event sourcing", where an event is just a more familiar name for a state transition.
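To make the idea concrete, here is a minimal Python sketch of storing transitions instead of state. It assumes a made-up inventory item as the domain entity; the `State`, `Event`, `apply` and `replay` names and the event kinds are all hypothetical, not taken from any framework. The current state is never stored: it is recomputed by folding every recorded transition over an initial state.

```python
from dataclasses import dataclass, replace
from functools import reduce


@dataclass(frozen=True)
class State:
    # Hypothetical domain entity: an inventory item.
    name: str = ""
    quantity: int = 0


@dataclass(frozen=True)
class Event:
    kind: str      # hypothetical event kinds: "renamed", "added", "removed"
    value: object


def apply(state: State, event: Event) -> State:
    """One state transition: return a new state, never mutate the old one."""
    if event.kind == "renamed":
        return replace(state, name=event.value)
    if event.kind == "added":
        return replace(state, quantity=state.quantity + event.value)
    if event.kind == "removed":
        return replace(state, quantity=state.quantity - event.value)
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events, initial=State()):
    """Recreate the state by applying every transition, in order, to the initial state."""
    return reduce(apply, events, initial)


log = [Event("renamed", "widget"), Event("added", 10), Event("removed", 3)]
print(replay(log))      # State(name='widget', quantity=7)
print(replay(log[:2]))  # State(name='widget', quantity=10)
```

Replaying only a prefix of the log yields the state as it was after that many transitions. Because events are immutable and `apply` is a pure function, the log rather than any stored snapshot is the source of truth.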

Not every application cares about being able to recreate the domain easily; most business applications care only about its current state. However, many applications, especially those with mandated audit trails or domains with significant historical data, can benefit from event sourcing. A common example is a version control system: it captures state transitions (diffs) of domain entities (source files), so you can switch to any state (version) and rebuild it by successively applying state transitions (diffs).

As far as business domains go, insurance seems by far to be a great area of application for event sourcing, given that an audit trail is a legal compliance requirement and insurance domain models tend to be really complex. Think of an insurance policy as a state and all the changes to it (endorsements) as transitions. By just tracking the transitions, one can rebuild a policy to its current state and reason about it for underwriting analysis and audit. Compare this with our initial approach of capturing multiple states (largely identical records in a relational database) with complex logic to diff and compare them. Event sourcing has profound positive implications for both the usability and the testability of the application: it becomes easy to visualize and rebuild the data of interest at any point in time.
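A rough Python sketch of rebuilding a policy "as of" a date by replaying its endorsements might look like the following. The policy fields, the `Endorsement` type and the `policy_as_of` / `changes_between` helpers are hypothetical, and the sketch assumes each endorsement simply overwrites some fields and is ordered by its effective date; real policy systems also have to handle back-dated corrections, which this ignores.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Endorsement:
    effective: date
    changes: dict  # hypothetical: policy field name -> new value


def policy_as_of(base: dict, endorsements: list, as_of: date) -> dict:
    """Rebuild the policy at `as_of` by replaying endorsements effective on or before it."""
    policy = dict(base)  # never mutate the original policy
    for e in sorted(endorsements, key=lambda e: e.effective):
        if e.effective > as_of:
            break
        policy.update(e.changes)
    return policy


def changes_between(endorsements: list, start: date, end: date) -> list:
    """Audit view: the transitions that took effect in (start, end]."""
    return [e for e in endorsements if start < e.effective <= end]


base = {"holder": "A. Smith", "premium": 1000, "vehicles": 1}
endorsements = [
    Endorsement(date(2011, 3, 1), {"vehicles": 2, "premium": 1600}),
    Endorsement(date(2011, 7, 15), {"premium": 1450}),
]
print(policy_as_of(base, endorsements, date(2011, 5, 1)))
# {'holder': 'A. Smith', 'premium': 1600, 'vehicles': 2}
```

No diffing of stored snapshots is needed here: the endorsement list already is the audit trail, and any historical version of the policy is one replay away.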

Another interesting application of event sourcing, I think, is data mining. If data is stored as events, it is fairly easy to sample, plot and build historical and predictive models. My limited experience in data mining has always involved custom (expensive) efforts to store historical data, and such efforts usually require complex development just to extract marginally meaningful information.
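As a small illustration of why an event log lends itself to this, the sketch below turns a list of timestamped events into a monthly time series that could be plotted directly. The event log and its event types are made up for illustration, and `monthly_counts` is a hypothetical helper built only on the standard library's `collections.Counter`.

```python
from collections import Counter
from datetime import date

# Hypothetical event log: (date the event occurred, event type).
events = [
    (date(2011, 1, 4), "endorsement"),
    (date(2011, 1, 20), "claim"),
    (date(2011, 2, 2), "endorsement"),
    (date(2011, 2, 9), "endorsement"),
]


def monthly_counts(events, kind):
    """Bucket events of one type by (year, month), ready to plot as a time series."""
    return Counter((d.year, d.month) for d, k in events if k == kind)


print(sorted(monthly_counts(events, "endorsement").items()))
# [((2011, 1), 1), ((2011, 2), 2)]
```

Because history is already recorded as events, no separate effort is needed to capture it before such an analysis can be run.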

It shouldn't be surprising that event sourcing can have a significant influence on application architecture, which may not be an easy sell, especially in a larger setting. There are many concepts related to event sourcing, specifically CQRS (Command Query Responsibility Segregation), which lead to wild architectures (which I'm not quite fond of yet).

I'm learning that this is not new or revolutionary; it has been done in the past but never caught on, for whatever reason. Nonetheless, I find it interesting. As far as my technical curiosity goes, I'm very much inclined to try it out in a pet project to see how viable these benefits are.