Editorials

Schema Options

Schema Options

When it comes to relational databases you have options to consider differing from those of other No-SQL engines. It is important to not discount the value of the relationship in a relational database.

Record State is one of the dimensions where you have many options available. Often a database entity may only have one state at a time. In this case, we generally associate a foreign key value from a state table to the entity in question, with an attribute such as a state id (numeric value such as an Enum) or code (character value with minimal meaning).

Sometimes we may wish to track the history of the state for an entity record. In this case we would more likely have a many to many join table associating the entity record with a state record, and probably a column with the date/time for when the state was established.

In this kind of scenario it becomes a little more complicated to get the current state of an entity because you must first locate the latest record in the many to many join table, and then establish the state from that specific instance.
Often we create views on the many to many table returning the last state of each entity record contained therein. A scalar function is another technique locating the last record and returning the state…but this would have to be called multipoe time for each record if you had a query returning multiple entity records, which may not be efficient.

To simplify the whole thing, many folks will maintain the last state in the entity record itself, and then save the history in the Many to Many table providing good performance for both requirements. You have a state history available should it be needed. You also have a quick way to determine the state of an entity record without a lot of gymnastics by using the value currently contained in the Entity record. The only downside is that you have to update the existing entity record, and add a hsitory record, when the state of the entity changes.

Some situations may have only two states for an entity in a kind of workflow process. The entity starts processing, and then it completes. No other states occur. It simply starts and finishes. In this case you have similar schema options as those for tracking history. You have a few options to accomplish this scenario.

  1. Store only the current state in the entity table. Update the original started state with finished when the process completes
  2. Store two state columns as binary (true/false) in the entity and set them to true at the appropriate time. This is redundant since neither state may be true at the same time, so it isn’t really a good design.
  3. Use a One to Zero or One relationship table to track the finished state

Let me explain the third option since I haven’t introduced this schema design yet. Basically, the entity table record is the existence of the initial state. The record and the state are one and the same. A date/time column in this table also acts as a historical time demarcation when the intial state took place.

A second table is then created for the finished state. It has the same primary key of the entity table, and will have only zero or one record per entity record. The existence of a record in this second table is the record of the finished state of the entity. If you want to know what entities have begun and not yet completed, simply do an outer join of entity to completed entity. When the completed entity is null, the process has not yet completed.

Using a time stamp from both tables you can even determine the interval of time between the two different states.

This third option, in the case of only two states allowed, is actually a powerful option because Inserts are faster than updates, you don’t have null values in your entity table for future attributes, you don’t update the original state, you retain the history of both states, and joins are extremely fast in any relational data engine.

While I wouldn’t use this design in a NoSQL data storage design, it makes a lot of sense in a relational design, and you have a great amount of flexibility for querying and optimal performance.

Share your thoughts on relational schema design below, or drop me an email at btaylor@sswug.org.

Cheers,

Ben