Database Appliances
Yesterday, as I started the discussion around appliances, I found it difficult to characterize the concept of an appliance and the different vendors providing these resources. Generally, you could say appliances form the foundation for a private cloud offering while reducing the cost of designing and maintaining the configuration through some form of standardization.

I don’t have a complete list of the players in this market at this time, so I can only provide information on the few of which I am aware. I welcome input from our readers, or from vendors who participate in this market, on other options to consider.

I did receive an email from EMC today pointing out that, with their acquisition of Greenplum, they offer much more than a simple storage device.

Greenplum on EMC hardware is a powerful, scalable, parallel cloud platform for ETL, Storage and Analytics. The combination of EMC storage and processors results in a very flexible and scalable solution, pushing the work close to the data storage. It utilizes a distributed data storage technology along with Hadoop and MapReduce.
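For readers who have not worked with MapReduce, here is a minimal sketch of the idea in Python. It is only an illustration of the pattern, not Greenplum or Hadoop code: the sample log lines and the function names (`map_partition`, `reduce_counts`, `word_counts`) are my own assumptions. Each partition of raw text is processed where it lives, and only the small partial results are merged.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample data: each inner list stands for the raw lines
# held on one storage node.
partitions = [
    ["error disk full", "login ok"],
    ["error timeout", "login ok", "login failed"],
]

def map_partition(lines):
    """Map step: count words in one partition, close to where it is stored."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right

def word_counts(partitions):
    """Run the map step per partition, then fold the partial results together."""
    partials = [map_partition(p) for p in partitions]
    return reduce(reduce_counts, partials, Counter())

if __name__ == "__main__":
    print(word_counts(partitions).most_common(3))
```

In a real cluster the map step would run in parallel on each node. The sequential loop here only keeps the sketch short.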

One thing I find really cool is the ability for EMC to use different forms of data storage, from the fastest SSD technology to slower, disconnected Network Attached Storage (NAS). That wasn’t an error; of course they support SAN. But this product can also take advantage of archival data stored on NAS. How cool is that!

EMC Greenplum is a good example of a Big Data appliance. It handles unstructured data storage and mining, which is very different from traditional BI platforms.

At the other end of the spectrum is the Microsoft Parallel Data Warehouse. In this case, the data storage is built on distributed relational databases. In this environment, data is stored in relational tables or data analysis schemas. Some data is duplicated on multiple instances, allowing local joins. Other data is distributed less widely, and may be sharded across multiple databases.
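To make the replicated-versus-sharded idea concrete, here is a small Python sketch. It is not Parallel Data Warehouse code. The tables, the three-node layout, and the modulo sharding rule are all assumptions made up for illustration. A small lookup table is copied to every node, a large sales table is split across nodes, and each node joins its own rows locally before the subtotals are combined.

```python
from collections import defaultdict

NODE_COUNT = 3  # hypothetical number of database instances

# Small dimension table: duplicated in full on every node.
products = {1: "widget", 2: "gadget", 3: "gizmo"}

# Large fact table rows: (sale_id, product_id, amount).
sales = [
    (101, 1, 20.0),
    (102, 2, 35.0),
    (103, 1, 15.0),
    (104, 3, 50.0),
    (105, 2, 5.0),
]

def shard_for(sale_id, node_count=NODE_COUNT):
    """Pick the node for a fact row (simple modulo rule, an assumption)."""
    return sale_id % node_count

def build_nodes(sales, products, node_count=NODE_COUNT):
    """Give every node a full copy of products and its own slice of sales."""
    nodes = [{"products": dict(products), "sales": []} for _ in range(node_count)]
    for row in sales:
        nodes[shard_for(row[0], node_count)]["sales"].append(row)
    return nodes

def local_join_totals(node):
    """Join this node's sales to its local product copy; no other node is consulted."""
    totals = defaultdict(float)
    for _sale_id, product_id, amount in node["sales"]:
        totals[node["products"][product_id]] += amount
    return dict(totals)

def total_by_product(nodes):
    """Combine the per-node subtotals into one result."""
    combined = defaultdict(float)
    for node in nodes:
        for name, subtotal in local_join_totals(node).items():
            combined[name] += subtotal
    return dict(combined)

if __name__ == "__main__":
    print(total_by_product(build_nodes(sales, products)))
```

The trade-off is storage for speed: the duplicated table costs space on every node, but the join never has to move sales rows between nodes.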

What is the difference between the two platforms? Greenplum is a good fit for mining data that is unknown and unstructured. Data comes from multiple sources and is simply stored.

For the Parallel Data Warehouse, a lot of work goes into understanding, structuring, organizing and cleansing the data before it gets into the warehouse. ETL sanitizes and organizes the data. Most of the BI queries are well known and structured.

Typically, the different platforms work together. The unstructured Big Data storage is used to ask what-if questions of large volumes of data where no expectations or correlations have yet been validated. Results from these queries are often rough guesses of what may be reality.

Structured BI solutions, on the other hand, involve a lot of up-front design for gathering, storing and retrieving data. There are expected outputs for given inputs. This data is used to make definitive decisions based on real-world formulas.

So, let me open up the discussion to you. Are you considering your own private cloud? What kind of analysis do you think you will be doing: Big Data or Structured BI? Are you finding that Cloud services are meeting your business requirements and/or budget? What other platforms/vendors should we be considering when looking at Appliances?

Drop me a note with your input at btaylor@sswug.org.

Reader Comments
Karel writes:
Are you taking a broad definition of “appliance”? I think of Netezza and Vertica as appliances. Is ParAccel an appliance in your mind? Teradata as an appliance?

Maybe some definition and categorization would be helpful.

I think this is an important topic, especially for organizations seeking new life for their BI strategy, if they have one, i.e., understanding the entire value proposition…

Cheers,

Ben
