Wednesday, June 24, 2015

So you want to Mongo?

After six solid months of Mongo-ing, I think it's finally time to share some insights on the platform.  I can't remember if I shared my initial reasons for delving into Mongo, but the driving reason for the move was a need to persist objects with a variety of nested properties.  Without exposing company secrets or boring you, it fit neatly into the problem we were trying to solve.  Imagine a standard table row with a varying amount of metadata underneath.  The standard way to do this in the SQL world is to have a secondary table that essentially serves as a key-value store (ironically, how some NoSQL databases store data).  Accessing that metadata then requires a JOIN, and things get complicated when it's time to model your data on the object/software end.

After a lot of discussion amongst the team, we decided that the variable nature of the data (as well as the resulting oddness in data modeling) called for a change.  We had previously modeled our structure in PostgreSQL, and the transition over to Mongo was pretty seamless.  The beauty of using Mongo for our specific platform was how it plugged into Spring.  Spring's Mongo support is absurdly simple, and it allowed us to store our metadata as a plain Map.  Mind you, there are most assuredly ways to do this with SQL databases as well, but Spring Mongo is perfect for this right out of the box.
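
To make the contrast concrete, here is a minimal Python sketch of the same record in both shapes.  The field names (name, color, weight_kg and so on) are made up for illustration and are not our actual schema, and plain dicts stand in for both the SQL rows and the Mongo document.  The point is only that the key-value rows have to be folded back into the parent, while the document already carries its metadata as a nested map.

```python
import json

# Hypothetical relational shape: one parent row plus key-value metadata rows.
item_row = {"id": 1, "name": "widget"}
metadata_rows = [
    {"item_id": 1, "key": "color", "value": "red"},
    {"item_id": 1, "key": "weight_kg", "value": "2.5"},
]


def fold_metadata(row, kv_rows):
    """Rebuild one object from a parent row and its key-value rows."""
    doc = dict(row)
    doc["metadata"] = {
        r["key"]: r["value"] for r in kv_rows if r["item_id"] == row["id"]
    }
    return doc


# Hypothetical document shape: the metadata lives inside the record itself.
item_document = {
    "id": 1,
    "name": "widget",
    "metadata": {"color": "red", "weight_kg": "2.5"},
}

folded = fold_metadata(item_row, metadata_rows)
print(json.dumps(item_document, indent=2))
```

With the document shape there is no folding step at all: what you store is what your object model reads back.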

Anyway, let's get into some of the advantages and pitfalls of using Mongo: reasons you might want to consider it, and reasons you might want to avoid it.

Pro: Your data is best expressed as "documents," rather than "rows"

Mongo makes no bones about being a "document store" rather than a traditional relational database.  What this means is that you store "collections" of JSON documents.  You can query these documents by their field values quite easily, with decent performance.  When you start to spread your data out across several collections, however, you're going to run into a wall.  The concept of a JOIN simply doesn't exist here, which leads us to...

Con: Your data structure changes frequently but your model can't

If you're going to use Mongo, you're going to need to be flexible on your modeling/software side.  Where SQL databases allow you to stitch new tables together with old tables via JOINs, in Mongo this can only be done by running discrete queries and stitching the data together in memory.  What you'll ultimately want to do is change the way your documents look, which often means changing the type of data you store in specific fields -- this can be incredibly problematic for a system that doesn't allow for change at the same cadence as the database.  You should also be prepared to "sunset" some data as your structure changes.
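
Here is a rough sketch of what "stitching in memory" looks like, in Python.  The two lists stand in for the results of two separate queries against two collections; the collection and field names (orders, customers, customer_id) are hypothetical, and no database is involved.

```python
# Hypothetical results of two discrete queries, one per collection.
orders = [
    {"_id": 101, "customer_id": 1, "total": 25.0},
    {"_id": 102, "customer_id": 2, "total": 40.0},
    {"_id": 103, "customer_id": 1, "total": 15.0},
]
customers = [
    {"_id": 1, "name": "Ada"},
    {"_id": 2, "name": "Grace"},
]


def stitch(orders, customers):
    """Attach each order's customer document, like a JOIN done in memory."""
    by_id = {c["_id"]: c for c in customers}  # build the lookup once
    return [
        {**order, "customer": by_id.get(order["customer_id"])}
        for order in orders
    ]


joined = stitch(orders, customers)
for row in joined:
    print(row["_id"], row["customer"]["name"], row["total"])
```

It works, but every "join" like this is code you have to write, test and keep in step with your documents, which is exactly where an inflexible model starts to hurt.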

Pro: You don't like being a DBA

Mongo is incredibly easy to set up and administer.  Even things like sharding and replication are a snap.  Additionally, Mongo's language of record is JavaScript, something that most of us are comfortable with.  There are a few domain-specific idiosyncrasies to learn, but beyond that, your confidence will build quickly.  You can also back your data up with ease.  I say this with a minor caveat: beware the "journal," especially if you're using things like database references.  Journal space can get absolutely enormous, and if your hard drive space is at a premium, this can bite you.

Con: You need to aggregate/summarize data

Don't be too alarmed.  Mongo has all of the summary functions you'd expect.  But prepare to have a bunch of JavaScript objects with nested objects expressing how you want to summarize your data.  If you're used to a nice, compact SQL statement followed by a GROUP BY, you're in for a rude awakening.  Once you start messing around with the aggregate(..) function, you'll gain a degree of familiarity.  This may or may not stop you from holding your nose as you write the code though.  
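
To give a flavor of the difference, here is a sketch in Python.  The pipeline uses Mongo's $match, $group and $sort stages, written as the nested structure a driver would send; the orders collection and its status and amount fields are hypothetical.  Nothing here talks to a database: the summarize function is just a plain-Python stand-in that shows what the pipeline is meant to compute.

```python
from collections import defaultdict

# The SQL version, for comparison:
#   SELECT status, SUM(amount) AS total
#   FROM orders WHERE amount > 0
#   GROUP BY status ORDER BY total DESC;

# The same summary as an aggregation pipeline (nested objects all the way down).
pipeline = [
    {"$match": {"amount": {"$gt": 0}}},
    {"$group": {"_id": "$status", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]


def summarize(docs):
    """Plain-Python stand-in for the pipeline above: filter, group, sum, sort."""
    totals = defaultdict(float)
    for doc in docs:
        if doc["amount"] > 0:
            totals[doc["status"]] += doc["amount"]
    rows = [{"_id": status, "total": total} for status, total in totals.items()]
    return sorted(rows, key=lambda r: r["total"], reverse=True)


sample = [
    {"status": "paid", "amount": 10.0},
    {"status": "paid", "amount": 5.0},
    {"status": "open", "amount": 7.0},
    {"status": "void", "amount": 0.0},
]
summary = summarize(sample)
print(summary)
```

Three lines of SQL turn into three nested stage objects, and that ratio tends to hold as the queries get hairier.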

Pro: You aren't afraid of indexing for performance

When you're starting with Mongo on a small scale, you'll be suitably impressed with its performance.  Once you're dealing with tens or hundreds of millions of documents, you will notice a marked degradation in query performance.  The beauty of Mongo is that you can generally throw an index in and improve your performance to a spellbinding degree.  A protip is to use dex, a tool to automatically determine which fields could be indexed to increase your performance. 
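
If you want an intuition for why an index helps so much, here is a toy illustration in Python.  It is only an analogy: a real Mongo index is a B-tree maintained by the server, not a Python dict, and the customer_id field is made up.  The idea is the same, though: without an index every document gets examined, and with one the lookup goes straight to the matches.

```python
from collections import defaultdict

# 100,000 hypothetical documents spread evenly over 1,000 customers.
docs = [{"_id": i, "customer_id": i % 1000} for i in range(100_000)]


def find_by_scan(docs, customer_id):
    """No index: every single document is examined."""
    return [d for d in docs if d["customer_id"] == customer_id]


def build_index(docs, field):
    """Toy index: map each value of the field to the documents holding it."""
    index = defaultdict(list)
    for d in docs:
        index[d[field]].append(d)
    return index


def find_by_index(index, customer_id):
    """With the index: jump straight to the matching documents."""
    return index.get(customer_id, [])


by_customer = build_index(docs, "customer_id")
scanned = find_by_scan(docs, 42)
indexed = find_by_index(by_customer, 42)
print(len(scanned), len(indexed))
```

Both lookups return the same documents; the difference is how many documents had to be touched to find them, and the index itself costs memory and has to be kept up to date on every write.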

Misc

There are a few tools you can use that will make your life incredibly easy.  I'm going to assume that you, like me, are a sane person and use a Mac for development.

Robomongo - My favorite Mongo querying tool.  The UI isn't much to look at, but it performs beautifully and is very query-friendly.  

MongoHub - Another visual Mongo tool.  It brings nice UX to the table for certain operations but generally performs a bit more sluggishly and crashes in weird situations.

Mongo Prefpane - A nice visual way to control your local MongoDB.  Another awesome aspect of Mongo is that it can be run locally with very little baggage.  You can also import and export data from a production or staging environment with ease.