Many of us database professionals enjoy the capabilities built into SQL data engines so much that we miss the real beauty of SQL. SQL gives us complete independence from how the data is physically stored and manipulated. The underlying storage and data structures that implement the SQL language are, for most of us, uninteresting.
What interests us is that the SQL engine is fast, reliable, consistent, durable, and so on. It provides the necessary locking structures and backup capabilities, so we don’t have to tell our programs how a data file is organized. Programmers work with abstract concepts such as tables and views. We do not focus on the internal storage mechanisms.
As NoSQL continues to evolve, much of the richness of the data abstraction built into the SQL language is sorely missed. Writing map-reduce algorithms from scratch is a fairly painful process. If you have worked with query parallelism in SQL Server, you may find that you have been using map-reduce code already, in the guise of a SQL query. We are also seeing more and more front-end efforts to build automated, map-reduce-like algorithms into a query language. Is it any surprise that some of those languages follow a SQL syntax?
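To make the comparison concrete, here is a minimal sketch of a hand-written map-reduce aggregation next to the one-line SQL query that expresses the same result. The sample rows and the names SALES, map_phase, shuffle_phase, reduce_phase, and total_by_region are hypothetical, chosen only for illustration; this is not how SQL Server or any particular NoSQL engine implements parallelism internally.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample rows: (region, amount)
SALES = [("east", 10), ("west", 5), ("east", 7), ("west", 1), ("north", 4)]


def map_phase(rows):
    """Emit one (key, value) pair per input row."""
    return [(region, amount) for region, amount in rows]


def shuffle_phase(pairs):
    """Group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single total."""
    return {
        key: reduce(lambda a, b: a + b, values)
        for key, values in groups.items()
    }


def total_by_region(rows):
    """Run the three phases in order, as a map-reduce job would."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


# The same result expressed declaratively in SQL:
#   SELECT region, SUM(amount) FROM sales GROUP BY region;
print(total_by_region(SALES))
```

The SQL version says only what result is wanted; how the rows are partitioned, grouped, and summed is left to the engine.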
SQL-like syntax is returning, but not because there was never a real need to move away from traditional SQL syntax or the relational data structure. The primary goal of NoSQL was often to increase scalability while still maintaining performance and reliability on large volumes of data, typically on commodity hardware.
At first, the SQL set-based mindset was thought to be closely linked to the physical, single-data-store paradigm. The ACID properties were seen as too limiting and unable to scale out. Systems were stateless. Traditional data access that assured you had the most accurate data was just not possible. So we threw it all out and started over.
Now we are coming back around to a storage mechanism more like that found on the AS/400 in years gone by, where you had native storage that could be accessed through DB2 or with old-school techniques in RPG. It’s the same data and the same engine behind it all. The difference is in how you need to use the data. The storage engine supports them all.
Today we are moving to NoSQL engines that can be accessed through vector-oriented languages such as R, through SQL interfaces that automate map-reduce, through traditional Hadoop, and more. The engine is not the interesting part, and access to it is not restricted to a single implementation. You use the implementation that best suits your needs.
That being said, it makes sense for today’s data professional to become familiar with the different data engines. You may be supporting them too, in the future.
Cheers,
Ben