Editorials

Star Schema Warehouse Design

Today we continue the review of different data warehousing structures. We began yesterday with Data Marts. As a quick review, a data mart is simply a table that is self contained. It is rarely joined to any other data. Everything needed to provide results for your query is often contained in a single data mart table. Sometimes there are a few tables such as those found in OLTP, that may be joined to a Data Mart. But often that is not the case.

As your data warehouse evolves, you may move into a star schema warehouse design. A star schema could contain multiple smaller data marts, joined on a centralized table. The center of the star consists of a fact table. The fact table looks like a pin cushion in a database diagram, because there are multiple dimesion tables associated to it.

A good example might be a sales warehouse. It would have a fact table of sales. Each sale would be associated with three dimensions. A Store dimension with all the data concerning a store, a time dimension, with different breakouts for time, and a product dimension with the data regarding products.

Each of these dimension tables are in essence a form of a data mart. They are highly denormalized. Everything you need to know about a store is contained in the store dimension. Everything you need to know about a product is in the product dimension.

Because of this denormalization, the dimensions are often not comprehensive. Consider a product. You could have any number of properties about the product itself. If you add into it the vendor providing the product, it could result in a much bigger table with many more records. Of course you could simply create a new dimension of vendor, and associate that with the sale.

A single data warehouse may have more than one star. Different stars may share the same dimensions. For example, you could have a summarized fact table on orders, aggregating the results by some dimension such as customer, while still maintaining a star on the level of order details. You may choose not to maintain the summary star, because you can generate this kind of data through the data mining queries, and put that information into your cubes.

From a simply theoretical level, you could consider this a progression for data warehousing. A star schema is a relational design based on a set of data marts, indstead of normalized tables as in an OLTP schema.

Tomorrow we’ll take a quick overview of the evolution from a star schema into a snowflake schema.

Cheers,

Ben