Monday, April 14, 2014

MySQL Insert performance and random primary key values

It's been almost a year since I've been putting up with MySQL and its bizarre performance characteristics and lack of choices on index data structures. If it wasn't for the volume and velocity of data we're storing in it, I would have proposed to ditch it by now.

We use MySQL purely as a stable high performance key-value store so many might not share the frustrations. I've been bitten by poor performance of InnoDB tables with semi-random values in primary key more than once now and hopefully this post will help someone who is in similar situation.

MySQL with InnoDB struggles with random primary key values, probably because it only supports BTree indices on it and page split cost on a BTree can be very IO intensive, I've only experienced this with numerical columns but it may be the true for other data types as well.

The only solution that has worked satisfactorily for me so far is to add an AUTO INCREMENT primary key and index the other column as unique. This is a complete waste of storage space but has offered the best insert performance so far. AUTO INCREMENT with upsert (ON DUPLICATE UPDATE) is probably the only reliable and performant way of using MySQL as super fast ingestion engine.

No comments: