Editorials

Schema Refactoring

I inteded to publish this last Friday. Sorry for the delay. I wanted to take a high level look at refactoring databases as it applies to Code Smells. Scott Ambler in his book “Refactoring Databases” identifies different categories of database refactoring:

  • Database Schema – Table and View definitions
  • Data Quality – Conforming data to the intended design
  • Referential Integrity – Enforcing data quality through table releationships
  • Architectural – External Program Interaction
  • Procedural – modifications to Procedures, Functions, or custom code tools

Let’s take a look at Schema refactoring. Schema refactoring occurs when you alter the definitions of tables or views within your database. Often this is known as normalization, de-normalization, or extension for new features.

Normalization might occur when you do something like replacing repeating data types in multiple columns of a single table with another table using a one to many relationship. I experienced this kind of refactoring on a database containing multiple fees that could be applied to a transaction. Each new fee was created by adding a new fee column to the fees table. Most times a transaction would have a few fees that applied, and the majority of the fee columns were zero, meaning it didn’t apply.

In this case, it was more flexible to have a transaction table, a fee table, and a many to many relationship join table, Transaction_Fee. Then when a new fee was created it would be added to the fee table, and any transaction applying that fee would have an entry in the Transaction_Fee join table. Fees that didn’t apply were not created.

The normalization of the data made the introduction of new fees easy to accomplish. It didn’t always make presentation easy because users would often want to see all fees with a Zero value if they didn’t apply. This might be the scenario where a materialized view would make sense as we talked about earlier in the week as a technique for enhancing resilience.

In other situations you may choose to generate some data marts or a data warehouse allowing for reporting that performs according to your needs. As we considered Resilience last week, breaking out reports into a separate system has the benefit of allowing the different systems to remail online when the other system fails.

Do you like the art of data modeling? Share your refactoring experiences for data modeling here as part of the conversation, or feel free to drop an email to btaylor@sswug.org.

Cheers,

Ben