Editorials

Big Data Storage

The final data warehouse design pattern I want to review in this series is Big Data. Unlike the other data warehousing structures we have reviewed so far, big data does not have to conform to a table-like structure with relationships. In fact, it often has jagged arrays containing other arrays in the form of objects.

Instead of having a parent table with a one-to-many relationship with a child table, big data will often have an object with a property containing a collection of properties. The properties contained in the collection may be simple or complex. In fact, sometimes they may not even be the same type.
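To make that concrete, here is a minimal sketch in Python of an object whose property holds a mixed collection. The `customer` record, its field names, and the `describe` helper are all made up for illustration; they are not taken from any particular big data product.

```python
import json

# Hypothetical customer record: the "orders" property holds a collection,
# and the entries in that collection do not all share the same shape.
customer = {
    "name": "Avery",
    "orders": [
        {"id": 1, "total": 25.0},
        {"id": 2, "total": 40.0, "items": [{"sku": "A1", "qty": 2}]},
        "phone order, details pending",
    ],
}


def describe(value):
    """Return a short label for the kind of value found in a collection."""
    if isinstance(value, dict):
        return "object(" + ", ".join(sorted(value)) + ")"
    if isinstance(value, list):
        return "array"
    return type(value).__name__


# Label each entry in the collection by its shape.
order_shapes = [describe(order) for order in customer["orders"]]
print(json.dumps(order_shapes, indent=2))
```

In a relational design the three orders would need to fit one child table. Here each entry simply carries whatever shape it arrived with.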

In traditional data warehousing, the structure defines the contents. In Big Data, the structure is derived from the contents. A traditional warehouse defines the structure up front, and all data must conform to it. Big Data can hold data of any structure, because the structure is worked out from the data itself.
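One way to picture structure being derived from contents is a small routine that reads the records first and reports the structure afterward. This is only a sketch under simple assumptions: the `records` list and the `infer_schema` helper are hypothetical, and real platforms do this with far more care.

```python
from collections import defaultdict

# Hypothetical records with no structure agreed up front.
records = [
    {"id": 1, "name": "sensor-a", "reading": 20.5},
    {"id": 2, "name": "sensor-b", "tags": ["roof", "north"]},
    {"id": "3", "reading": 19},
]


def infer_schema(rows):
    """Map each property name to the sorted type names observed for it."""
    seen = defaultdict(set)
    for row in rows:
        for key, value in row.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}


schema = infer_schema(records)
for prop, types in schema.items():
    print(prop, types)
```

Nothing was rejected for failing to conform. The `id` property turns out to hold both integers and strings, and that fact is discovered rather than enforced.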

Traditional warehousing is fast with limited resources because the structure and contents are known when questions are asked. This makes it a great platform for monitoring or mining known dimensions or attributes. In contrast, programs utilizing big data crawl through the data looking for patterns or properties that may be of interest. The data may be well known, allowing the programs to be more targeted. Or the problems may be less structured, comparing properties that are not obviously linked and looking for behaviors where one property may influence another.

Another difference in big data is that the data is usually highly distributed. This is not just for scalability, but also to distribute the work when querying the database. Instead of having a single data engine pull all of the data for a client program, in big data the program may be sent to each of the data stores, which do the work and each return an independent result to the client. In this case, the data is so large that it is more efficient to pass the program around than it is to move the raw data.
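The idea of sending the program to the data can be sketched on a single machine. In the example below, each list in `partitions` stands in for a separate data store, and a thread pool stands in for the network of nodes. The names `count_cities` and `run_everywhere` are hypothetical and are not the API of any real framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data stores, each holding its own slice of the data.
partitions = [
    [{"city": "Austin"}, {"city": "Boston"}, {"city": "Austin"}],
    [{"city": "Boston"}, {"city": "Denver"}],
    [{"city": "Austin"}],
]


def count_cities(rows):
    """The 'program' sent to a store: runs locally, returns a small summary."""
    return Counter(row["city"] for row in rows)


def run_everywhere(program, stores):
    """Run the program against every store, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(program, stores))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


totals = run_everywhere(count_cities, partitions)
print(dict(totals))
```

Only the small per-store counts travel back to the caller. The raw rows never leave their partition, which is the point of the approach when the data is very large.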

Each of these different structures has a purpose and a place. As a data professional, it is a good goal to become well acquainted with each and to understand the value it brings.

Cheers,

Ben