Editorials

Troubleshooting SQL Server – Next Steps

Featured Article(s)
Information Economics for IT Managers
Information is a valuable resource in any organization. However, preparing formal information is not free; it costs money. How much should an organization spend on information? Some type of cost-benefit analysis ought to be undertaken, but that is more easily said than done. The difficulty lies in measuring both the cost of providing the information and its value. Information is conceptual in nature and has hardly any tangible characteristics beyond its symbolic representation. This article attempts to explain some aspects of information economics for IT managers.

Troubleshooting – Next Steps
I’ve received SO many suggestions on this (*Thank You*) that I’m going to write up an article you can hopefully put to use in the future. I’d like to continue gathering troubleshooting ideas, but I also wanted to answer some questions here and provide some more information. Be sure to read Friday’s editorial if you’re interested in how this all started.

Question: Why are you paying attention to the disk queues? It’s an odd thing to watch!
Answer: Yes, it is odd to watch, but it was our indicator: as things started going south, the queue would shoot up, showing that the problem was happening. We only watched it because it was reliably a good indicator that the issue was occurring.
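
For anyone who wants to use a counter as an indicator in the same way, here is a minimal sketch of the idea in Python. It assumes you have already exported disk queue length samples from your monitoring tool into a list; the function name, the sample values, and the threshold rule (five times the median, with a floor of 2.0) are all made up for illustration and are not what we actually ran.

```python
from statistics import median

def find_queue_spikes(samples, factor=5.0, floor=2.0):
    """Return the indexes of samples that sit far above the typical queue length.

    The threshold rule here is an assumption for illustration only:
    a sample counts as a spike if it exceeds both `floor` and
    `factor` times the median of all samples.
    """
    baseline = median(samples)
    threshold = max(baseline * factor, floor)
    return [i for i, value in enumerate(samples) if value > threshold]

# Hypothetical disk queue length readings, one per sampling interval.
samples = [0.4, 0.6, 0.5, 0.7, 12.0, 15.5, 0.5, 0.6]
spikes = find_queue_spikes(samples)
print("Spike at sample indexes:", spikes)
```

The median is used as the baseline so that the spikes themselves do not drag the "normal" level upward the way an average would.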

Question: Is this a replication issue?
Answer: No. We don’t have replication impacting this.

Question: Are there errors in the system or SQL Server logs? (Or, are there disk/RAID system array issues?)
Answer: No. There is no indication of a disk issue (this was mentioned by a LOT of you who wrote in) and there are no errors in the logs. Further, CHECKDB shows no issues, and we don’t get code-based errors.

Question: Are there system jobs running, or other regular, timed processes that are impacting the performance?
Answer: No. We’ve confirmed that the system issues do not correlate with any jobs or other recurring activities on the system.
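
If you want to run the same kind of check on your own system, the comparison can be sketched roughly as below. This is only an illustration in Python: the job names, times, and the two-minute slack window are hypothetical, and it assumes you have already pulled the incident timestamps and job run windows out of your own logs and job history.

```python
from datetime import datetime, timedelta

def incidents_during_jobs(incidents, job_runs, slack=timedelta(minutes=2)):
    """Pair each incident time with any job whose run window covers it.

    `job_runs` is a list of (name, start, end) tuples. `slack` widens each
    window slightly so near-misses are not overlooked (an assumed value).
    """
    hits = []
    for incident in incidents:
        for name, start, end in job_runs:
            if start - slack <= incident <= end + slack:
                hits.append((incident, name))
    return hits

# Hypothetical job history and incident times for a single day.
day = datetime(2024, 1, 15)
job_runs = [
    ("nightly_backup", day.replace(hour=1), day.replace(hour=1, minute=30)),
    ("index_rebuild", day.replace(hour=3), day.replace(hour=3, minute=45)),
]
incidents = [day.replace(hour=10, minute=17), day.replace(hour=14, minute=42)]

overlaps = incidents_during_jobs(incidents, job_runs)
print("Incidents overlapping a job:", overlaps)
```

An empty result over a reasonable number of incidents is what lets you say, as we did, that the problem does not line up with scheduled work.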

Question: Is this a SAN-related issue (hot spot, etc.)?
Answer: Nope, not an issue here.

Question: What does the end-user see? A slow-down? Outright failure? Timeouts? Nothing?
Answer: The user experience when the issue happens is one of very slow or "hung" access. If a web page is being accessed, IE/Firefox show the page as "done" but then nothing paints (even now, this is an odd response, BTW). Timeouts occur frequently, depending on the operation.

Question: Is this a checkpoint issue?
Answer: No. We’ve checked on this by manually forcing checkpoints, testing, etc., and this does not appear to be the issue.

Tomorrow I’ll let you know what we’ve done to correct this (though we’re still testing the solution and its longevity). In the meantime, please send in your thoughts about what you’d poke at next; drop me a note here.

Featured White Paper(s)
Understanding Business Intelligence: ETL Best Practices
This paper is about building powerful data marts that require minimal administration and are simple to change. This may seem … (read more)

An Enterprise Approach to Protecting Critical Data
Applications and databases form the core of an organization’s information technology infrastructure. Without the business proc… (read more)