SQL Server

Partitioning relational and multidimensional data for OLAP solution – (Part-2)

Partitioning relational and multidimensional data for OLAP solution – (Part-2)

Multidimensional solutions usually hold extremely large amounts of data. Partitioning this data can improve manageability because aged data can be removed more easily. When aged data is removed from the relational data source, the matching partition can be removed, so it is not necessary to reprocess the entire cube. Well-designed partitioning also helps to improve the performance.

In this second part of the two part article series, you will learn about the benefits of partitioning multidimensional data. You will also learn some guidelines to use when you partition this data.

Reasons to partition multidimensional data

The following are the two common reasons to partition relational data:

Manageability

Measure groups can be extremely large and are volatile because of the amount of new information regularly added to them. Many fact tables have a data field that can be used as a natural partitioning key with which to separate the data into time-based partitions. If you keep most recent data in separate partition, only partition needs to be reprocessed when data is added to the fact table. Separate partitions are also beneficial when data is aged out of the multidimensional solution. If a partition holds only aged data, the partition can be deleted, and it is not necessary to reprocess the entire cube to reflect data that was removed from the source database.

Performance

Partitioning your multidimensional data improves performance in several areas. For example, it minimizing cube processing time, because you can apply different processing frequency settings to partitions depending on data volatility. Partitioning also improve query processing time, because of a reduced amount of data in each partition and enabling SQL Server to use parallel processing.

SQL Server Analysis Services Partitioning Architecture

In SQL Server Analysis Services, partitions are implemented at the measure group level. Each measure group can include one or more partitions. When a measure group is created, a default partition is also created for that group. Additional partitions can be added to improve query and processing performance. The following are some of the important features of the SQL Server Analysis Services partitioning architecture:

· Partitioning occurs at the measure group level.

· When a measure group has multiple partitions, parallel processing is used when the measure group is processed.

· Partitions can be processed selectively.

· Data slices are optional, and are not used in the MOLAP storage model. However, data slices must be defined for best performance when you use ROLAP storage model or when proactive caching involves a ROLAP phase.

Guidelines for Partitioning Multidimensional Data

When you intend to partition measure groups in an SQL Server Analysis Services cube, you must design your partitioning solution for optimal performance and manageability. The following are some guidelines you should follow when partitioning measure groups in SQL Server Analysis Services:

Determine the best boundaries for your partitions

You must determine the best/natural boundaries for your partition. To find the best boundaries for your data, you should consider the structure of your relational data, and match your multidimensional structure to this. You should also consider the partition definition because if you define partitions with the query including WHERE clause, the partitions can be maintained more efficiently then if you define a partition by specifying a table or view.

Synchronize your relational and multidimensional partitions

You must design a process to synchronize your relational and multidimensional partitions whenever partitioning changes occur in the relational data source.

Choose best storage model and aggregation level

Determine what storage model and aggregation level best fit the data in each partition. For example, you should use MOLAP with the higher level of aggregations for partition with frequently queried data, or use ROLAP without aggregations for data that is queried infrequently, or use ROLAP or real-time HOLAP for data that must be accessed in real time.