Editorials

The Snowflake Warehouse Design

As we have been looking at data warehouse structures we reviewed data marts, and star schemas. Today we’re going to enhance the star schema, converting it into a more complicated form known as a snowflake schema.

A snowflake schema begins as a star schema. As dimensions are added to fact tables, a lot of redundant data begins to emerge in the dimension tables. Friday we used an example of a sales star schema consisting of a Sales Fact Table, and store, time, and product dimensions.

In a star schema, if you wanted to have the phone numbers of the store included, you would have to do one of two things; 1) you could add another dimension table for the phone numbers, and associate it to the sales table; or 2) you could add the phone information to the store dimension. Both of these options are acceptable solutions, and don’t compromise the star schema design.

The phone number is not necessarily a significant attribute. Adding all of the phone numbers as a dimension for sales could increase the number of fact records without providing much value. As a result it is not a good candidate for a dimension. If a store has multiple phone number, but each store does not have the same quantity of phone numbers, it become difficult to add them to the store dimension such as fax, switchboard, customer service, etc.

This is where the snowflake schema extends the star schema. Using our example, we could have an additional table for phone numbers, Instead of relating phone number yable to the Sales fact table, we would instead associate the phone number table to the store table. In a star schema this is not allowed. Only fact and dimension tables are allowed. In the snowflake schema, we have the fact and dimension tables, and also include additional descriptive tables to further define the facts.

This additional layer makes the queries more complex, while reducing the amount of redundant data in the design. It is similar to the process of normalization, while still maintaining de-normalized dimension and fact tables.

Tomorrow we’ll compare the use of the three designs we have discussed so far.

Cheers,

Ben