Editorials

Back in the late ’80s and early ’90s, when data warehousing was a new and growing concept, things were wild and crazy. Even then, using traditional tools such as Cognos, it was important to understand and structure your data well in order to get reliable, performant data mining.

Many data warehousing projects failed because they could not define or acquire the data necessary for mining. Many businesses simply said, “put everything in there, and then we’ll decide what we want.” Usually one of two things happened: 1) they got all of the data in there but never had time to organize it for mining, or 2) the project collapsed on itself because there was too much data available. The industry came up with a term for this: the “data tenement.” The system held a lot of compartmentalized data, but the way it was configured did not serve well as a warehouse.

This was bad because traditional mining tools required some sort of structure in order to find correlations, map trends, and predict probabilities. Even with machine learning, the principle is similar: although machine learning doesn’t require tables or cubes, it still has to be given the data in some consistent, limited form.

What’s really cool is that with Hadoop techniques, you have data prior to schema. The structure is defined as you consume the data, not before you acquire it. Because of this different approach, what was formerly called a data tenement is now considered a gold mine. While it may not be as efficient, much less human intervention is required to find large masses of data or convert them into structured data that many different tools can consume. The results can be analyzed with vector tools such as R, consumed by traditional data warehousing engines, or even fed to a machine learning tool.
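To make the schema-on-read idea concrete, here is a minimal sketch in plain Python rather than any particular Hadoop tool. The file name, field names, and records are purely illustrative: raw, semi-structured events are dumped as JSON lines with no schema defined up front, and a structure is only imposed at the moment the data is read for a specific analysis.

```python
import json

# "Data tenement" stage: dump heterogeneous records as-is, no schema up front.
raw_records = [
    {"ts": "2015-03-01T10:00:00", "user": "alice", "action": "login"},
    {"ts": "2015-03-01T10:05:12", "user": "bob", "action": "purchase", "amount": 42.50},
    {"user": "carol", "action": "purchase", "amount": 19.99},  # no timestamp at all
]

with open("events.jsonl", "w") as f:
    for rec in raw_records:
        f.write(json.dumps(rec) + "\n")

# Schema-on-read stage: which fields matter, their types, and how to handle
# missing values are decided when the data is consumed, not when it is loaded.
def read_purchases(path):
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            if rec.get("action") == "purchase":
                yield {
                    "user": rec.get("user", "unknown"),
                    "amount": float(rec.get("amount", 0.0)),
                }

total = sum(p["amount"] for p in read_purchases("events.jsonl"))
print(f"Total purchase volume: {total:.2f}")
```

A different analysis could project an entirely different schema (say, login counts per user) from the same raw file, which is the point: the raw store stays unstructured, and each consumer shapes it as needed.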

Perhaps now is the time to bring back the data tenement? What do you think? The comments are open for your thoughts.

Cheers,

Ben