[.net] How to Troubleshoot Intermittent SQL Timeout Errors

We've been having a few instances per day where we get a slew of SQL Timeout errors from multiple applications (System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.) We have over 100 different applications on our network, both web and desktop apps. Everything from VB6 and Classic ASP to .NET 4. I can find all kinds of data that show the side effects but can't pinpoint what is causing this. Our DBA says nothing is wrong with the SQL server, and IT says there's nothing wrong with the web servers or network, so of course I'm left in the middle trying to troubleshoot this.

I'm really just looking for suggestions on what other troubleshooting I can do to try and track this down.

We're running SQL Server 2008 R2 in a cluster. There's a handful of different servers that connect to it, ranging from Windows server 2003 to 2008 of different varieties.

Here's what I've done so far:

  • Run SQL trace of long running queries and deadlocks. This shows no deadlocks at the times of the problems, and long running queries all coincide with our timeout errors, but look to be a side effect, and not the cause. Queries that are very basic that typically return instantly end up taking 30, 60 or 120 seconds to run at times. This happens for a few minutes then everything picks up and works fine after that.
  • Use performance monitor to track connection pool connections. This sometimes shows some spikes in the number of connections near the times of the timeouts, but still not even halfway to the default 100 connection limit. Again, nothing here that seems to point to a cause.
  • Separate web applications into different App Pools. We tried to narrow down the apps we thought may be the main problem (most chatty, etc) and put them in separate Application Pools but that doesn't seem to affect anything or help us narrow down anything.
  • Monitor disk usage on SQL Server. We've done some monitoring on the SQL server and see no spikes or any signs of problems when these timeouts are occurring.
  • Verified TempDB was not the cause of the problem.

I'll come back and add more if I think of what else we've tried. Please let me know some ideas on what to troubleshoot next.

This question is related to .net sql sql-server sql-server-2008 timeout

The answer is


Run SQL trace of long running queries and deadlocks. This shows no deadlocks at the times of the problems, and long running queries all coincide with our timeout errors, but look to be a side effect, and not the cause. Queries that are very basic that typically return instantly end up taking 30, 60 or 120 seconds to run at times. This happens for a few minutes then everything picks up and works fine after that.

It looks like some queries/transaction lock your database till they are done. You have to find out which queries are blocking and rewrite them/run them at an other time to avoid blocking other processes. At this moment the waiting queries just timeout.

An extra point to dig into is the auto increment size of your transaction log and database. Set them on a fixed size instead of a percentage of the current files. If files are getting taller the time it takes to allocate enough space will eventually longer as your transaction timeout. And your db comes to a halt.


Bit of a long shot, but on a lab a while back, we had a situation where a SQL Server appeared unresponsive, not because we had spiked the CPU or anything we could track within SQL Server, it appeared operational to all tests but connections failed under some load.

The issue turned out to be due to the volume of traffic against the server meant we were triggering the in built windows Syn Attack Flood Protection within Windows. Annoyingly when you hit this, there is no logged message within windows server, or within SQL - you only see the symtpoms which are connections failing to be made - this is because windows slows down on accepting the messages and let's a queue build. From the connection standpoint, the server appears to not respond when it should (it doesn't even acknowledge the message arrived)

http://msdn.microsoft.com/en-us/library/ee377084(v=bts.10).aspx

Scroll down to SynAttackProtect and you will see the default in windows server 2003 sp1 onwards was to enable this feature by default. It is a DDOS protection mechanism in effect, and the lack of logging that it is triggering makes it incredibly difficult to detect when your server does this.

It took 3 days within the MS lab before it was figured out.

You mentioned 100 conenctions, we had an app that constantly connected, ran queries and then disconnected, it did not hold the connections open. This meant that we had multiple threads on each machine connectiong doing this, 10 machines, multiple threads per machine, and it was considered enough different connections consistently being made / dropped to trigger the defense.

Whether you are at that level (since it is not a clearly defined threshold by MS) is hard to say.


We experienced this with SQL Server 2012 / SP3, when running a query via an SqlCommand object from within a C# application. The Command was a simple invocation of a stored procedure having one table parameter; we were passing a list of about 300 integers. The procedure in turn called three user-defined functions and passed the table as a parameter to each of them. The CommandTimeout was set to 90 seconds.

When running precisely the same stored proc with the same argument from within SQL Server Management Studio, the query ran in 15 seconds. But when running it from our application using the above setup, the SqlCommand timed out. The same SqlCommand (with different but comparable data) had been running successfully for weeks, but now it failed with any table argument containing more than 20 or so integers. We did a trace and discovered that when run from the SqlCommand object, the database spent the entire 90 seconds acquiring locks, and would invoke the procedure only at about the moment of the timeout. We changed the CommandTimeout time, and no matter time what we selected the stored proc would be invoked only at the very end of that period. So we surmise that SQL Server was indefinitely acquiring the same locks over and over, and that only the timeout of the Command object caused SQL Server to stop its infinite loop and begin executing the query, by which time it was too late to succeed. A simulation of this same process on a similar server using similar data exhibited no such problem. Our solution was to reboot the entire database server, after which the problem disappeared.

So it appears that there is some problem in SQL Server wherein some resource gets cumulatively consumed and never released. Eventually when connecting via an SqlConnection and running an SqlCommand involving a table parameter, SQL Server goes into an infinite loop acquiring locks. The loop is terminated by the timeout of the SqlCommand object. The solution is to reboot, apparently restoring (temporary?) sanity to SQL Server.


Since I do troubleshooting everyday as a part of my job, here is what I would like to do:

  1. Since it's SQL Server 2008 R2, you can run SQLDiag which comes as a part of the product. You can refer books online for more details. In brief, capture Server Side trace and blocker script.

  2. Once trace is captured, look for "Attention" event. That would be the spid which has received the error. If you filter by SPID, you would see RPC:Completed event before "Attention". Check the time over there. Is that time 30 seconds? If yes, then client waited for 30 second to get response from SQL and got "timed out" [This is client setting as SQL would never stop and connection]

  3. Now, check if the query which was running really should take 30 seconds?

  4. If yes then tune the query or increase the timeout setting from the client.

  5. If no then this query must be waiting for some resources (blocked)

  6. At this point go back to Blocker Script and check the time frame when "Attention" came

Above is assuming that issue is with SQL Server not network related!


My experience with these issues (not on SQL Server though) is that overdone multi-tasking is often the cause of the problem. If there is similar/connected data/tables queried at (almost) the same time by many connections, the DBMS may have trouble keeping all the isolation at check. This is not that much of an issue of disk usage as to making some connections wait for things to be done by other ones. Synchronization is very expensive in terms of CPU usage.

The 100 connections is way too much in my opinion. (In my experience again) even 20 connections asked to be done by one machine may be overly optimistic.


Like the other posters have suggested, it sounds like you have a lock contention issue. We faced a similar issue a few weeks back; however, ours was much more intermittent, and often cleared up before we could get a DBA onto the server to run sp_who2 to trace down the issue.

What we ended up doing was implement an e-mail notification if a lock exceeded a certain threshold. Once we put this in place, we were able to identify the processes that were locking, and change the isolation level to read uncommitted where appropriate to fix the issue.

Here's an article that provides an overview of how to configure this type of notification.

If locking turns out to be the issue, and if you're not already doing so, I would suggest looking into configuring row versioning-based isolation levels.


I suggest you have a deep look at the super cool SQL Server's Dynamic Management Views feature:

Dynamic management views and functions return server state information that can be used to monitor the health of a server instance, diagnose problems, and tune performance.

This article is a good start with DMVs, although it was written for SQL 2005 (DMVs feature first appearance): Troubleshooting Performance Problems in SQL Server 2005, especially the 'blocking' chapters.


The issue is because of a bad query the time to executing query is taking more than 60 seconds or a Lock on the Table

The issue looks like a deadlock is occurring; we have queries which are blocking the queries to complete in time. The default timeout for a query is 60 secs and beyond that we will have the SQLException for timeout.

Please check the SQL Server logs for deadlocks. The other way to solve the issue to to increase the Timeout on the Command Object (Temp Solution).


I had an issue similar to this and found out is was due to a default .Net framework setting

Sqlcommand.Timeout

http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlcommand.commandtimeout(v=VS.100).aspx

The default is 30 seconds as sated in the above url by Microsoft, try setting this to a higher number of seconds or maybe -1 before opening the connection to see if this solves the issue.

It maybe a setting in your web.config or app.config files or on you applicaiton / web server config files.


Sounds like you may already have your answer but in case you need one more place to look you may want to check out the size and activity of your temp DB. We had an issue like this once at a client site where a few times a day their performance would horribly degrade and occasionally timeout. The problem turned out to be a separate application that was thrashing the temp DB so much it was affecting overall server performance.

Good luck with the continued troubleshooting!


Performance problems boil down to CPU, IO, or Lock contention. It sounds like you have ruled out IO. I would guess CPU is not a problem since this is a database, not a number cruncher. So, that leaves lock contention.

If you can execute a sp_who2 while the queries are timing out, you can use the BlkBy column to trace back to the holding the lock that everyone else is waiting on. Since this is only happening a few times a day, you may have trouble catching enough data if you are running this manually, so I suggest you rig up an automated system to dump this output on a regular basis, or maybe to be triggered by the application timeout exceptions. You can also use the Activity Monitor to watch the degradation of query responsiveness in real-time, as suggested by peer.

Once you find the long-running query and the application that executes it, you can immediately resolve the domino of timeouts by reducing the timeout for that single application below all the others (right now, it must be longer). Then, you should inspect the code to determine a better solution. You could reduce the time the lock is held by committing the transaction sooner within a sproc, or reduce the lock required by the reading query with hints such as NOLOCK or UPDLOCK.

Here's some more reading on sp_who2: http://sqlserverplanet.com/dba/using-sp_who2/

And query hints: http://msdn.microsoft.com/en-us/library/ms181714.aspx http://msdn.microsoft.com/en-us/library/ms187373.aspx


I've seen similar problems happen if anti-virus was installed on the SQL server. The AV's auto-update features were clocking the server and not allowing enough CPU for SQL Server.

Also, have you put a small application on the SQL server itself that verifies that connections can be made or runs very basic SQL like "SELECT GETDATE();"? This would eliminate network possibilities.


Are these servers virtualized? On another post I've read about a SQL server running sometimes very slowly because of lack of sufficient memory. This in turn was caused by a so-called memory balloon that the virtualizer used to limit the amount of memory used by that virtual server. It was hard to find because the pressure on physical memory had nothing to do with the SQL server itself.

Another common cause for a temporary performance degradation might be a virus scanner. When a new virus definition is installed, all other processes will suffer and run very slow. Check out any other automatic update process, this might also take a lot of resources quite unexpectedly. Good luck with it!


I am experiencing the same issue.. and I build some logging into several functions that I could identify that were frequently running long. when I say frequently I mean about 2% of the time. So part of the log inserted the start time and the end time of the procedure or query. I then produced a simple report sorting several days of logs by the total execution time descending. here is what I found.

the long running instances always started between HH:00 and HH:02 or HH:30 an HH:32 and none of the short running queries ran between those times. Interesting....

Now It seems that there is actually more order to the chaos that I was experiencing. I was using a recovery target of 0 this implemented "indirect checkpoints" in my databases so that my recovery time could be achieved at nearly 1 minute. causing these checkpoints to be created every 30 mins.

WOW, what a coincidence!

In Microsoft's online documentation about changing the recovery time of a database comes with this little warning...

"An online transactional workload on a database that is configured for indirect checkpoints could experience performance degradation."

Wow go figure...

so I modified my recovery time and bango no more issues.


You are on the right track with your tracing and profiling. what you need to do is look for what the queries that time-out have in common - it is likely they will all be hitting a small subset of tables or indexes. I suspect some application has a long-running update/insert that affects queries on tables that use indexes affected by the updates/inserts.

You have to work backwards a little - given the subset of tables you see timing out, see what indexes are on those tables. Look for other queries that are running at the smae time that touch those tables/indexes. I'm betting you will find a small set of updates/inserts doing this.

Then you have some decisions to make. One option is to changing the locking hints on the queries that are timing out. But that is geenrally bad practice because it will mask the real problem for a while. While you mgiht see the timeouts go away for a while, depending on the hint you choose you might end up with dirty reads and then bogus data coming back from those queries. That might turn out to be worse than the timeouts - hard to say.

Best bet is to figure out which of your applications are submitting the update/inserts you found and dig in to figure out why they take so long.


Examples related to .net

You must add a reference to assembly 'netstandard, Version=2.0.0.0 How to use Bootstrap 4 in ASP.NET Core No authenticationScheme was specified, and there was no DefaultChallengeScheme found with default authentification and custom authorization .net Core 2.0 - Package was restored using .NetFramework 4.6.1 instead of target framework .netCore 2.0. The package may not be fully compatible Update .NET web service to use TLS 1.2 EF Core add-migration Build Failed What is the difference between .NET Core and .NET Standard Class Library project types? Visual Studio 2017 - Could not load file or assembly 'System.Runtime, Version=4.1.0.0' or one of its dependencies Nuget connection attempt failed "Unable to load the service index for source" Token based authentication in Web API without any user interface

Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to sql-server

Passing multiple values for same variable in stored procedure SQL permissions for roles Count the Number of Tables in a SQL Server Database Visual Studio 2017 does not have Business Intelligence Integration Services/Projects ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database How to create temp table using Create statement in SQL Server? SQL Query Where Date = Today Minus 7 Days How do I pass a list as a parameter in a stored procedure? SQL Server date format yyyymmdd

Examples related to sql-server-2008

Violation of PRIMARY KEY constraint. Cannot insert duplicate key in object How to Use Multiple Columns in Partition By And Ensure No Duplicate Row is Returned SQL Server : How to test if a string has only digit characters Conversion of a varchar data type to a datetime data type resulted in an out-of-range value in SQL query Get last 30 day records from today date in SQL Server How to subtract 30 days from the current date using SQL Server Calculate time difference in minutes in SQL Server SQL Connection Error: System.Data.SqlClient.SqlException (0x80131904) SQL Server Service not available in service list after installation of SQL Server Management Studio How to delete large data of table in SQL without log?

Examples related to timeout

Waiting for Target Device to Come Online Spring Boot Java Config Set Session Timeout How to dispatch a Redux action with a timeout? Spring Boot REST API - request timeout? 5.7.57 SMTP - Client was not authenticated to send anonymous mail during MAIL FROM error How to set timeout in Retrofit library? How to set connection timeout with OkHttp How to modify the nodejs request default timeout time? How to handle ETIMEDOUT error? Timeout for python requests.get entire response