Today I am going to share a small tip that needs to be keep in mind while working on troubleshooting performance issues.
If I reveal the answer at the beginning everyone will have an easy opinion hence I will reveal this at the very end.
Let me give the Background of the servers we have 4 servers (2 are in Primary Data center) and the other (2 are in DR) enabled in Always ON with SQL Server 2016 Version.
we have some issues on these as the fail overs were happening on Daily Basis and I found the root cause of the mystery fail overs and blog posted the same
Coming back to issue we have 3 availability groups where 2 of them should run on NODE 1 and the other should run on NODE 2. As a result of these unexpected fail-overs we were getting performance issues as 3 of the AG's are running on same node.
After failing over the instance which should run on NODE 2 we thought the issues were resolved. However after couple of days and that too on Month end there were some reports that needs to run and they started behaving weirdly.
I started my troubleshooting with sp_whoisactive to see what queries are running at the moment and on what are they waiting for . Many of them are in runnable state and could see CPU consumption greater than 80%.
At first I thought it is parameter sniffing issue but it's not and next I checked if there are any outdated stats but that is not the case either. (Below are the links on how to troubleshoot parameter sniffing issues)
Usually I will not consider fragmentation as they will not cause issues of all sudden so I literally ran out of ideas.
In general when we have performance issues we will not check error logs as we will not find much info in them. However in my case it is the game changer.
While checking I observed that there is a change in compatibility level of the problematic database from 110 to 130.
After checking this with out any thought I changed the compatibility level back to 110 and the issue got resolved.
As you know the cardinality estimator has been changed from SQL 2014 hence the execution plans got changed completely. So why there is a change in compatibility level? It has been done to enable query store feature as our instance is running on SQL 2016.
Frankly Speaking I think it is not required to change the compatibility level to enable query store when your database is on 110 compatibility level and your instance running on SQL 2016.
Now I have huge task with me as why the new cardinality estimator is causing problems and what should be done to make that work as expected.
As always performance tuning is one area where any one has space to improve themselves always.