Where’s your schema?

Recently I was reading a blog post by Edd Wilder-James in which he reflects on the nature of a schemaless database, especially as it is espoused by MongoDB. His central point is that even if your database, or physical data store, does not enforce a schema, your database is not schemaless. At some layer your application must be aware of the contents of the data in order for a user or system to interact with the data elements. The only question is where the schema is enforced.

If your objects are your schema, and you store your data in a very simple format such as JSON, then altering the objects is straightforward: the deserializer handles any data missing from the JSON string when you deserialize it into the new object structure. For example, suppose you add a favorite pet name property to an existing person object. When you deserialize JSON that was serialized from the previous version of the object, which had no favorite pet property, the resulting instance of the new object will have null or a default value for that property. Because your schema is enforced only in your objects, there is only one place in code to change when you modify your schema.
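To make the favorite pet example concrete, here is a minimal sketch in Python using only the standard library. The `Person` dataclass, the `favorite_pet_name` property, and the `person_from_json` helper are hypothetical names chosen for illustration. Whether a missing property becomes null depends on your serializer; in this sketch the default value declared on the dataclass supplies it.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Person:
    name: str
    # Hypothetical property added in a newer version of the object;
    # documents written before it existed fall back to this default.
    favorite_pet_name: Optional[str] = None


def person_from_json(text: str) -> Person:
    """Deserialize JSON into the current version of Person."""
    data = json.loads(text)
    # Keep only keys the current object defines; any key absent from
    # the JSON is filled from the dataclass default.
    known = {f.name for f in fields(Person)}
    return Person(**{k: v for k, v in data.items() if k in known})


old_json = '{"name": "Ann"}'  # serialized before favorite_pet_name existed
p = person_from_json(old_json)
print(p)  # Person(name='Ann', favorite_pet_name=None)
```

Only the class definition changed; the stored JSON was never touched, which is the "one place in code" the paragraph describes.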

If you use a schema-based storage engine, such as a relational database engine, the data must have a defined structure and conform to it. Now, when you alter properties in your data design, you have to modify both the database and the objects.
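As a contrast with the JSON case, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a relational engine. The table and the `Person` dataclass are hypothetical. It shows the two places that must change: the object gains the property, and the table must be altered before the engine will accept a query for the new column.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str
    # Change 1: the new property is added to the object
    favorite_pet_name: Optional[str] = None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT NOT NULL)")
conn.execute("INSERT INTO person (name) VALUES ('Ann')")

# The engine enforces its own schema: the new column does not exist yet
try:
    conn.execute("SELECT name, favorite_pet_name FROM person")
    column_existed = True
except sqlite3.OperationalError:
    column_existed = False

# Change 2: the table must be altered to match the object
conn.execute("ALTER TABLE person ADD COLUMN favorite_pet_name TEXT")

rows = conn.execute("SELECT name, favorite_pet_name FROM person").fetchall()
people = [Person(*row) for row in rows]
print(column_existed)  # False
print(people)          # [Person(name='Ann', favorite_pet_name=None)]
```

Existing rows receive NULL for the added column, which maps to `None` in the object, so the end result matches the JSON case; the difference is that the schema had to be changed in two places.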

As Edd points out, regardless of the implementation chosen, there is always a schema. As I see it, the difference is where the schema is enforced. With a schema-based storage engine, the schema is enforced both in the database and in the objects that represent its data. With an unstructured data store, the schema is enforced only in the objects.

In my experience, neither option is better or worse when it comes to storage and the stability of your application. Where it does seem to matter is when you start working with multiple records. A single JSON string representing an object, even a complex object containing collections of other objects, is easy to manage on its own. However, as soon as you write a query that joins or filters multiple JSON strings as records, the schemaless model begins to break down. That is probably why many systems use both persistence models: schemaless storage for OLTP, and schema-based storage for querying. It is also why we have tools such as a service bus to keep the two persistence stores synchronized.
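The dual persistence pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production design: a dictionary of JSON strings stands in for the schemaless OLTP store, an in-memory SQLite table stands in for the query store, and a simple list of subscriber callbacks stands in for a real service bus. All names (`save_document`, `project_to_sql`, `find_by_pet_scan`, `find_by_pet_sql`) are hypothetical.

```python
import json
import sqlite3
from typing import Callable, Dict, List

# OLTP side: hypothetical document store, one JSON string per person
documents: Dict[str, str] = {}

# Query side: a relational projection of the same data
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT, favorite_pet_name TEXT)"
)

# Stand-in for a service bus: subscribers notified on every save
subscribers: List[Callable[[str, dict], None]] = []


def save_document(doc_id: str, person: dict) -> None:
    """Write the whole object as JSON, then publish the change."""
    documents[doc_id] = json.dumps(person)
    for handler in subscribers:
        handler(doc_id, person)


def project_to_sql(doc_id: str, person: dict) -> None:
    """Subscriber that keeps the relational copy in sync."""
    conn.execute(
        "INSERT OR REPLACE INTO person (id, name, favorite_pet_name) VALUES (?, ?, ?)",
        (doc_id, person.get("name"), person.get("favorite_pet_name")),
    )


subscribers.append(project_to_sql)


def find_by_pet_scan(pet: str) -> List[str]:
    """Query the JSON store: every document is parsed and checked in code."""
    names = []
    for text in documents.values():
        person = json.loads(text)
        if person.get("favorite_pet_name") == pet:
            names.append(person["name"])
    return sorted(names)


def find_by_pet_sql(pet: str) -> List[str]:
    """Query the relational copy: the engine filters using the known schema."""
    rows = conn.execute(
        "SELECT name FROM person WHERE favorite_pet_name = ? ORDER BY name", (pet,)
    ).fetchall()
    return [name for (name,) in rows]


save_document("1", {"name": "Ann", "favorite_pet_name": "Rex"})
save_document("2", {"name": "Bob"})  # older document without the new property
save_document("3", {"name": "Cy", "favorite_pet_name": "Rex"})

print(find_by_pet_scan("Rex"))  # ['Ann', 'Cy']
print(find_by_pet_sql("Rex"))   # ['Ann', 'Cy']
```

Both queries return the same answer here, but the scan has to deserialize every document and apply the schema in application code, while the relational copy lets the engine filter (and, with an index, avoid a full scan). A real service bus would also add delivery guarantees and retries that this in-process callback list does not attempt to model.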

Are you working with schemaless systems? Perhaps you can share some of your experiences in the comments.

Cheers,

Ben