Editorials

Using Staging Tables Effectively in SSIS

SSIS has a wonderful set of tools for moving and transforming data. I suggest the use of staging tables when bringing data into SQL Server, not for the purpose of replacing the capabilities of SSIS, but for the purpose of performance when it comes to merging the results into production tables.

You could create a complete dataset in SSIS by reading data in from an CSV file, importing data through an FTP call, or even reading data from another relational data store. If the next step is simply to merge the new data into an existing relational database, this is where staging tables can provide benefit. Importing the data into staging tables, SQL Server can now take over the process of merging the new data into existing production tables.

The reason this can work better is based on the speed of joins within a single database production and staging tables, compared to that of a heterogeneous process, joining data in SSIS to data in SQL Server. Merging data using SSIS you have to join data from an in memory set of data to that of the database. This join tells you what to do with each record: 1) Ignore Duplicates, 2) Update Existing, 3) Delete Removed, 4) Insert new.

If, instead, you insert all your new data into staging tables, this final merge step can be performed efficiently. Statistics will be available, and you can even add indexes to your stating tables, thus improving this final stage of introducing new data, If you are using your staging data to populate multiple tables, you get additional value and performance.

There are many useful data transformations that can be performed in SSIS much easier than in TSQL. Even still, there is sometimes great value of importing data into staging tables, and then using SSIS to perform a transformation from SQL Server data sets. You may find staging tables have value earlier in the SSIS workflow than simply the final stage where the data is merged with the production sets.

Some key factors I have found for considering staging tables are, 1) Large sets of data, 2) Merging data into exsting data, 3) Memory Constraints on the SSIS server, 4) Multiple targets from the transformed data.


Cheers,

Ben