Data Auditing Implementation

There was a lot of valuable feedback on yesterday’s editorial, “Generate Audit Triggers Using SQL Script”, which demonstrated a script that creates audit trigger code. Interestingly, most of the comments were really about the auditing technique rather than the code generation.

So, let me add a few thoughts:

The script example provided is not the most efficient code. It doesn’t need to be efficient, because its only purpose is to produce the trigger code, and it runs once. The trigger code it generates, however, does need to be efficient, because it executes with every insert, update, or delete on a table with a trigger.

So here is the crux of the design: I create a single Audit table that tracks changes to every row and column, individually, for all tables.

CREATE TABLE Audit
(
     AuditId   INT NOT NULL
    ,UserId    INT NOT NULL
    ,TableId   INT NOT NULL
    ,ColumnId  INT NOT NULL
    ,KeyId     INT NOT NULL
    ,StartDate DATETIME NOT NULL
    ,EndDate   DATETIME NOT NULL
    ,Value     VARCHAR(MAX) NULL
)

Each column in a row is evaluated individually to determine whether the value has changed since it was last audited. Because this requires finding the last record in the Audit table, an AuditQueue table is used to gather all table modifications from the triggers. Later, a second process compares the records in AuditQueue with the Audit table and makes the necessary updates and entries, or simply discards the AuditQueue record if the value hasn’t changed. The trigger saves the changes to AuditQueue and exits. Very fast, very lightweight.
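
To make that concrete, here is a minimal sketch of the moving parts. The AuditQueue layout, the Employee table, TableId 1, and ColumnIds 101 (FirstName) and 102 (LastName) are all invented for illustration, and SUSER_ID() stands in for however you resolve the auditing UserId; the triggers generated by the script will look different.

CREATE TABLE AuditQueue
(
     AuditQueueId INT IDENTITY(1,1) NOT NULL
    ,UserId       INT NOT NULL
    ,TableId      INT NOT NULL
    ,ColumnId     INT NOT NULL
    ,KeyId        INT NOT NULL
    ,ModifiedDate DATETIME NOT NULL
    ,Value        VARCHAR(MAX) NULL
)
GO

CREATE TRIGGER trg_Employee_Audit ON Employee
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- One queue row per column per modified row; no comparison work here.
    INSERT INTO AuditQueue (UserId, TableId, ColumnId, KeyId, ModifiedDate, Value)
    SELECT SUSER_ID(), 1, 101, i.EmployeeId, GETDATE(), CONVERT(VARCHAR(MAX), i.FirstName)
    FROM inserted i
    UNION ALL
    SELECT SUSER_ID(), 1, 102, i.EmployeeId, GETDATE(), CONVERT(VARCHAR(MAX), i.LastName)
    FROM inserted i;

    -- A DELETE would be handled the same way from the deleted table.
END
GO

The second process then drains the queue, closing out the current Audit record and writing a new one only when the value actually changed. This simplified sketch assumes AuditId is populated as an IDENTITY, uses ‘9999-12-31’ as the open-ended EndDate sentinel (the original design doesn’t specify one), and assumes at most one queued change per column between runs:

-- Close the current record for any column whose value differs.
UPDATE a
SET a.EndDate = q.ModifiedDate
FROM Audit a
JOIN AuditQueue q
    ON  a.TableId  = q.TableId
    AND a.ColumnId = q.ColumnId
    AND a.KeyId    = q.KeyId
WHERE a.EndDate = '99991231'                    -- the open, "current" record
  AND ISNULL(a.Value, '') <> ISNULL(q.Value, '');

-- Insert a new current record wherever no open record remains.
INSERT INTO Audit (UserId, TableId, ColumnId, KeyId, StartDate, EndDate, Value)
SELECT q.UserId, q.TableId, q.ColumnId, q.KeyId, q.ModifiedDate, '99991231', q.Value
FROM AuditQueue q
WHERE NOT EXISTS
    (SELECT 1 FROM Audit a
     WHERE a.TableId  = q.TableId
       AND a.ColumnId = q.ColumnId
       AND a.KeyId    = q.KeyId
       AND a.EndDate  = '99991231');

-- Everything left in the queue was an unchanged value; discard it.
DELETE FROM AuditQueue;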

Once your complete audit system is in place, your audit table will tell you that, for a given TableId, ColumnId, and KeyId, the value was “some value” from the start date to the end date. For each record in the Audit table, based on TableId, ColumnId, and KeyId, the StartDate and EndDate values do not overlap. You now have a running history of any data value over time. You can ask the question, “What was the value of a user’s first name as of 10/23/2014?”
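
For example, a point-in-time lookup might look like this (the TableId, ColumnId, and KeyId values are placeholders for whatever your metadata mapping assigns):

-- What was FirstName (ColumnId 101) of row KeyId 42 in table 1 on 10/23/2014?
SELECT Value
FROM Audit
WHERE TableId   = 1
  AND ColumnId  = 101
  AND KeyId     = 42
  AND StartDate <= '20141023'
  AND EndDate   >  '20141023';

Because the date ranges never overlap, this returns at most one row.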

The trigger code has been vetted as working and accurate for over 10 years, at least on SQL Server 2000 and later.

One of the comments addressed the load of converting everything to VARCHAR(MAX) as a heavy-handed approach. I agree. If you are sure that the largest value you are going to have is no bigger than 8,000 bytes (an NVARCHAR(4000), for example), probably the most flexible approach would be to use VARBINARY(8000) instead. Then you can literally store almost anything.
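
As a quick sketch of that round trip: the value converts into VARBINARY(8000) without loss, but converting back out requires knowing the column’s original type.

DECLARE @v VARBINARY(8000);

-- An NVARCHAR(4000) value fits in VARBINARY(8000) without truncation.
SET @v = CONVERT(VARBINARY(8000), N'Some audited value');

-- Converting back requires knowing the column's original type.
SELECT CONVERT(NVARCHAR(4000), @v) AS RestoredValue;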

I thought about using SQL_VARIANT instead of converting everything to a string or binary. That may actually be the best solution. Regardless of which datatype you use for storing the individual audit records, the more your table contains columns with large data types, the more efficient this method of logging becomes, simply because once the data is removed from the AuditQueue table, only changes to each individual column are stored. We don’t store a copy of every column when any one column in a row is modified.
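
A minimal sketch of what SQL_VARIANT buys you: the base type travels with the value, so nothing has to be remembered at read time. Note that SQL_VARIANT cannot hold (N)VARCHAR(MAX) or values over roughly 8,000 bytes, so it has the same size ceiling as the VARBINARY(8000) approach.

DECLARE @v SQL_VARIANT;
SET @v = CAST(12345.67 AS DECIMAL(10, 2));

-- The original type is recoverable from the value itself.
SELECT @v AS Value,
       SQL_VARIANT_PROPERTY(@v, 'BaseType')  AS BaseType,
       SQL_VARIANT_PROPERTY(@v, 'Precision') AS ValuePrecision;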

The best part about this structure is that it saves only the column(s) that were modified in a row. For wide tables, this is a nice savings. Moreover, it lets you put together a comprehensive history of a table. You can literally get the data values for any point in time, individually, for every column.

Consider this scenario.

User 1 changes column a yesterday

User 2 changes column b today

If you audit changes only at the row level, it looks like User 2 changed both column a and column b. If you audit individually by column, however, you get the change to column a by User 1 and the change to column b by User 2, because they are separate audit records.
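
With column-level records, a query against the Audit table keeps the two edits distinct (TableId, ColumnId, and KeyId values invented, as before):

-- Who changed which column of row 42, and when?
SELECT UserId, ColumnId, StartDate, EndDate, Value
FROM Audit
WHERE TableId = 1
  AND KeyId   = 42
ORDER BY ColumnId, StartDate;

-- User 1's change to column a and User 2's change to column b come back
-- as separate rows, each with its own UserId and date range.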

Maybe you can help perfect this technique? Perhaps you have a different solution, or have experience with using the SQL_VARIANT data type effectively? Does this warrant compression on any level? Share your thoughts in our comments or by email to btaylor@sswug.org.

Cheers,

Ben