MapReduce – A Key to NoSQL Performance
If you have been following the trend of my editorials these last few weeks, it will all start to come together today.
While NoSQL and MapReduce are not necessarily linked, they are often combined to increase performance.
MapReduce is a pattern in which a large data set is split into smaller chunks that are processed in parallel across many processes or machines (map), and the intermediate results are then combined into a final answer (reduce). Google built much of its business around this technique and has openly shared how it is implemented in its environment.
Hadoop is an engine built around the MapReduce pattern that integrates with any number of data persistence stores, including in-memory data stores. It is one of the more popular open source MapReduce engines available today.
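To make the pattern concrete, here is a minimal sketch that uses word counting as the example. The map step turns each chunk into key/value pairs, a grouping step collects the values for each key, and the reduce step combines them. The function names (map_chunk, shuffle, reduce_group, word_count) are made up for illustration, and a thread pool stands in for what would be many machines in a real engine such as Hadoop.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group intermediate values by key so each key can be reduced on its own."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(item):
    """Reduce step: collapse all values for one key into a single total."""
    key, values = item
    return key, sum(values)


def word_count(chunks, workers=4):
    # Threads are an assumption standing in for separate worker machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
        groups = shuffle(mapped)
        return dict(pool.map(reduce_group, groups.items()))


chunks = ["the quick brown fox", "the lazy dog", "The fox jumps"]
counts = word_count(chunks)
```

Because each chunk is mapped independently and each key is reduced independently, both steps can be spread across as many workers as are available.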
For this reason, you can see vendors such as Amazon, EMC, Google, HP, Microsoft, and Oracle putting their emphasis on appliances and/or software environments capable of parallel processing. MapReduce techniques greatly speed up the work of chunking through large volumes of data.
Google is a good example. Crawling the web produces incredible volumes of data to index, and MapReduce works nicely for this kind of requirement.
But what about OLTP situations? There, MapReduce itself may not provide a large benefit. However, one of the key elements of the MapReduce pattern is the sharding of data. The map step groups similar data, breaking the large volume into smaller and smaller shards (subsets of the original data) that may be parceled out in parallel to multiple machines for processing.
This is the key aspect of NoSQL for enhancing high volume transaction performance. Individual transactions are routed (sharded) to the most efficient data store for each transaction. The persistence of transactions is therefore performed in parallel across multiple persistence stores. Additionally, the NoSQL environment often provides the means for quickly locating where the data was placed for future retrieval.
Now, instead of relying on one big, fast SAN to get the performance necessary for high volume traffic, the work is broken up across many machines, which in aggregate are faster than a single large storage device. Just as SAN performance increases with the addition of more spindles (disks), the performance of a NoSQL fabric increases as more workers are added.
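One common way to route transactions is to hash a key and use the result to pick a store. The sketch below assumes that approach; it is not how any particular NoSQL product does it. The ShardedStore class and its methods are hypothetical, and plain dictionaries stand in for separate persistence stores.

```python
import hashlib


class ShardedStore:
    """Hypothetical store that spreads records across several shards by key hash."""

    def __init__(self, shard_count):
        # Each dict stands in for an independent persistence store.
        self.shards = [dict() for _ in range(shard_count)]

    def shard_for(self, key):
        """Return the index of the shard that owns this key."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, record):
        """Route a write to the owning shard."""
        self.shards[self.shard_for(key)][key] = record

    def get(self, key):
        """Recompute the same hash to find the record again without searching."""
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(shard_count=4)
for order_id in range(100):
    store.put(f"order-{order_id}", {"id": order_id})

found = store.get("order-42")
```

The same hash that decides where a record is written also tells a reader where to look, which is one simple way to get the fast retrieval described above. Changing the number of shards in this sketch would move most keys, so real systems typically use something more elaborate.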
The key difference here is that not only is the data distributed, but the work manipulating the data is distributed as well. This is where the key to performance comes from. Work is moved to or near the data rather than moving data to the work.
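A small sketch of moving the work to the data: each shard computes its own summary locally, and only those small summaries travel to the coordinator. The shard contents, the local_summary and cluster_summary functions, and the use of threads in place of separate machines are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each list stands in for the order amounts held on one machine.
shards = [
    [120, 75, 310],
    [42, 18],
    [500, 260, 90, 15],
]


def local_summary(shard):
    """Runs where the data lives and returns only a small (count, total) pair."""
    return len(shard), sum(shard)


def cluster_summary(all_shards):
    """Coordinator: fan the work out to every shard, then merge the small results."""
    with ThreadPoolExecutor(max_workers=len(all_shards)) as pool:
        partials = list(pool.map(local_summary, all_shards))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return count, total


order_count, order_total = cluster_summary(shards)
```

The coordinator never sees the individual orders, only one pair of numbers per shard, so the amount of data crossing the network stays small no matter how large each shard grows.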
As David said yesterday, there are different flavors of NoSQL. Don’t throw away your relational database skills thinking they are now obsolete. Instead, now is the time to expand your understanding of other data persistence options, how they scale, how they perform, and how they may fit into your company’s needs.
Reader Feedback – How I Met SSWUG
Celia Writes:
I also came across SSWUG a few years ago when searching for a solution to an SQL query problem I was having and have received SSWUG updates ever since.
I have of late found that I am reading SSWUG content more and more as I have recently started lecturing and need to find scenarios to provide to my delegates and also need to keep abreast of the latest of what is happening out there in the real world just in case my delegates (and my peers) ask me those questions (:o).
So thank you SSWUG for improving my knowledge base daily, it is greatly appreciated.
Do you have an experience you’d like to share, or other comments regarding NoSQL? Send your comments to btaylor@sswug.org.
Cheers,
Ben