MongoDB a Nice Fit for Object Oriented Programming
Yesterday we talked about database designs that work well with object oriented development patterns. One of the concerns I strongly emphasized is that object oriented programming and relational databases don't mix as nicely as some of the non-SQL alternatives available today.
David responds today with a nice review of his experience with MongoDB. I felt it was a good follow-up to the original question posed by Michael. We’ll come back later and take another look at a couple of relational database designs that work well with object oriented techniques.
David writes:
I'd just like to throw in that, among the NoSQL databases, MongoDB is one that is not too dissimilar in capabilities to an SQL database, but it really is directly targeted at Michael's needs, in my opinion. (I have nothing to do with MongoDB except having evaluated it through a personal project.)
Data in MongoDB is stored as BSON, a binary version of JSON (though with more types than JavaScript itself offers, adding 32-bit and 64-bit integers alongside doubles, amongst other additions). Because of this, arrays and objects are inherent to the structure of the data you store, and the tree of your object structure can be stored exactly as-is in the database:
{
  _id: ObjectId(23423f8048a8908b…),
  username: "billy",
  passhash: "43987df80808a9084098098e8",
  salt: "43897f087a8d88008e032",
  images: [
    "/what/is/love/baby_dont_hurt_me.png",
    "/what/is/amore/big_pizza_pie.jpg"
  ],
  posts: {
    comments: [ ObjectId(…), ObjectId(…), … ],
    articles: [ ObjectId(…), ObjectId(…) ]
  },
  create_ts: ISODate("2011-04-05T11:45:48Z"),
  active: true
}
Any of these fields can be indexed and queried on, you get the full set of CRUD operations, and loading and unloading data is simple serialization: each object gets an _id attribute holding an ObjectId, and pointers to other objects become ObjectId values in your object, so you can query for and reinstantiate them if needed.
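To make that concrete, here is a minimal CRUD sketch in the 2011-era mongo shell. The "users" and "comments" collection names are just examples borrowed from the document above, not anything MongoDB requires:

// insert a document into a hypothetical "users" collection
db.users.insert({
  username: "billy",
  active: true,
  images: [ "/what/is/love/baby_dont_hurt_me.png" ],
  posts: { comments: [], articles: [] }
});

// read it back by any field, or by the _id MongoDB generated for it
var billy = db.users.findOne({ username: "billy" });

// "pointers" are just ObjectId values: fetch the referenced documents with $in
db.comments.find({ _id: { $in: billy.posts.comments } });

// update and delete round out CRUD
db.users.update({ _id: billy._id }, { $set: { active: false } });
db.users.remove({ _id: billy._id });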
Of course, there's a trade-off here: you don't get the power of JOINs (or rather, the object you store is effectively "pre-joined" the way you specify), so if you ever need to do reporting on this data, you need to use MongoDB's map-reduce functionality, providing the server with JavaScript functions (through a specialized API) to perform the map and reduce steps, which is a very different way of thinking about these things.
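A map-reduce "report" in the shell looks roughly like this. This is only a sketch: the users collection and the idea of counting images per active/inactive user are my own example, not something from David's note:

// map emits one (key, value) pair per document; reduce folds the values per key
var map = function () {
  emit(this.active, this.images.length);
};
var reduce = function (key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) { total += values[i]; }
  return total;
};

// writes the result into an "image_counts" collection you can then query
db.users.mapReduce(map, reduce, { out: "image_counts" });
db.image_counts.find();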
Also, Create and Delete have roughly the same cost as in SQL (Create is slightly cheaper if you have a deeply nested structure, since each nesting layer would be a separate set of writes in SQL), but Update is costly because the entire object must be locked during an update (I believe because of the way the object is translated and stored as BSON: since many of the data types can have arbitrary lengths, the current object must be read by the DB, updated, and re-stored).
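In practice that means even a small, targeted change touches the whole document. A sketch, again using the hypothetical users collection (the new image path is made up):

// $push grows the images array in place, but the server still has to rewrite
// the whole BSON document, and may relocate it if it no longer fits
db.users.update(
  { username: "billy" },
  { $push: { images: "/what/is/love/so_in_love_with_you.png" } }
);

// a flag flip is cheaper, since the document size barely changes
db.users.update({ username: "billy" }, { $set: { active: false } });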
This leads to Reads, which aren't as bad as you might think (I'll explain why), but can't ever be as good as a well-optimized SQL read (whether ORM tools for SQL produce these optimized reads is a whole other matter). Basically, the ObjectId is not a hash value as you might expect, but a 12-byte value composed of four parts: a four-byte Unix timestamp (seconds since 1/1/1970), three bytes from a hash that uniquely identifies the host machine, two bytes for the process id, and a three-byte incrementing counter (producing an upper limit of roughly 16.8 million new objects per second).
So the ObjectId is continually increasing while also being unique across Mongo servers (assuming you have fewer than ~16.8 million MongoDB servers 😉 ), which means B-Trees on ObjectIds make sense (and are what MongoDB uses). Also, because everything is "pre-joined", it takes just one read request to get your entire object back.
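You can see the embedded timestamp for yourself in the shell; getTimestamp() is a standard ObjectId helper, and the comparison is just to illustrate the increasing ordering:

var a = new ObjectId();
var b = new ObjectId();

a.getTimestamp();   // the creation time baked into the id, as an ISODate
a.str < b.str;      // true: ids generated later sort after earlier ones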
MongoDB also allows indexing on any element of any object it stores (these objects are bundled into "collections", and indexes are applied per collection), so most WHERE-style look-ups work. But because objects, even within the same collection, can have wildly different sizes, fragmentation is a much bigger problem for MongoDB than for an SQL server.
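Index creation in the 2011-era shell is ensureIndex(), and it reaches into nested and array fields through dot notation. Collection and field names here match the earlier example:

db.users.ensureIndex({ username: 1 });
db.users.ensureIndex({ "posts.comments": 1 });   // multikey index over the array

// WHERE-style look-ups can then use those indexes
db.users.find({ username: "billy" });
db.users.find({ "posts.comments": someCommentId });   // someCommentId: an ObjectId you already hold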
Also, MongoDB does not support transactions. If you're performing an update to 5 objects and, in the middle of that, a different call into MongoDB changes one of them, it will just go along its merry way. Of course, it can be argued that in MongoDB many of the requirements for a transaction disappear, because data that needs to stay consistent together should be put into the same object, and MongoDB lets you nest an entire tree of objects into one document in a way SQL does not.
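What you do get is atomicity within a single document, so update operators and findAndModify cover many of the places SQL would reach for a transaction. A sketch; the post_count field and the newArticleId variable are hypothetical:

// both changes land together or not at all, because they hit one document
db.users.update(
  { username: "billy" },
  { $inc: { post_count: 1 }, $push: { "posts.articles": newArticleId } }
);

// read-and-update in a single atomic step
db.users.findAndModify({
  query:  { username: "billy", active: true },
  update: { $set: { active: false } }
});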
I certainly don't see MongoDB replacing the general-purpose SQL database, but there are a few good use cases: data with a heavy tree structure and/or updated by a single entity (perhaps read by hundreds), or applications where only ORM tools access the database, since MongoDB is essentially a fusion of the database with the ORM, and those are exactly the use cases ORM tools serve.
Five years from now the kids will have NoSQL out of their system and it will only be used where it's needed. But this current burst of activity will flesh these NoSQL systems out so they are no longer just toys, much as Linux was no longer a toy for servers after its first five years.
Thanks for the detailed report, David. Feel free to send your comments to btaylor@sswug.org.
Cheers,
Ben
Featured Article(s)
Troubleshooting SQL Server 2008 CLR Problems
In this article, Alexander Chigrik explains some problems that you can have when you work with SQL Server 2008 CLR objects. He also tells how to resolve these problems.