Rahman Agoro, the frontline SQL DBA

Just another WordPress.com weblog

SQL 2014 Replication Error The process could not execute ‘sp_replcmds’ on ‘xxxxxxxxxxxx’. (Source: MSSQL_REPL, Error number: MSSQL_REPL20011)

Posted by rahmanagoro on October 20, 2016


After spending some time away from SQL replication (SQL 2014), I find myself working with this brilliant piece of technology again. In this particular case I had to test, support and troubleshoot replication even without having any knowledge of its setup.

I had this error come up during one of my investigations when I check the log reader agent within SQL server replication monitor.

Error messages:

  • The process could not execute ‘sp_replcmds’ on ‘XXXServerXXX’. (Source: MSSQL_REPL, Error number: MSSQL_REPL20011)
    Get help: http://help/MSSQL_REPL20011
  • Cannot execute as the database principal because the principal “dbo” does not exist, this type of principal cannot be impersonated, or you do not have permission. (Source: MSSQLServer, Error number: 15517)
    Get help: http://help/15517
  • The process could not execute ‘sp_replcmds’ on ‘XXXSERVERXXX’. (Source: MSSQL_REPL, Error number: MSSQL_REPL22037)
    Get help: http://help/MSSQL_REPL22037

 

After doing some investigations, I found that the database in question didn’t have an owner specified.

I changed the owner of the database and hey pronto, the agent started to work again.

 

Use DatabaseName
GO
sp_changedbowner 'sa'

Posted in Management, SQL replication | Leave a Comment »

Database ‘DatabaseName’ is in transition. Try the statement later Msg 952, Level 16, State 1, Line 1

Posted by rahmanagoro on April 4, 2016


I bumped onto this error message when our monitoring tool started to report errors with a SQL agent job. On taking a close look at the error, I found that whenever I tried to connect to the database, I simply got the error:

Msg 952, Level 16, State 1, Line 1
Database 'DatabaseName' is in transition. Try the statement later.

Digging deeper into the issue and also the error information, i made some interesting discoveries. It appears that we have a blocking chain going on within the database, our environment is  such that database owners can choose to detach databases etc.

I can the following code to check for blocking:

select
 l.resource_type,
 l.request_mode,
 l.request_status,
 l.request_session_id,
 r.command,
 r.status,
 r.blocking_session_id,
 r.wait_type,
 r.wait_time,
 r.wait_resource,
 request_sql_text = st.text,
 s.program_name,
 most_recent_sql_text = stc.text
from sys.dm_tran_locks l
left join sys.dm_exec_requests r
on l.request_session_id = r.session_id
left join sys.dm_exec_sessions s
on l.request_session_id = s.session_id
left join sys.dm_exec_connections c
on s.session_id = c.session_id
outer apply sys.dm_exec_sql_text(r.sql_handle) st
outer apply sys.dm_exec_sql_text(c.most_recent_sql_handle) stc
where l.resource_database_id = db_id('DatabaseName')

It turned out that one user wanted to detach the database whilst another user just started to issue a query on the same database, this was the reason for the error and the change of the transition state. One way to avoid such issue before detaching a database is to take the following steps.

  • Personally, I do like using code/T-SQL to run commands rather then the good old GUI, I can see what i am doing and I can also see if my actions are getting stuck, rather than wondering if my Pc is slow etc.
  • Secondly, when detaching a database, if other connections can be rolled back. The code below can be issued.
ALTER DATABASE DatabaseName SET offline with rollback immediate

The issue

I could see the blocking chain, from the SQL query above which checks the locks. Multiple sessions wanted to connect to a database that was about to change its transition to offline and then detached.

What I simply did was kill one of the sessions which was causing the blocking.

 

 

 

Posted in Management, Scripts, SQL 2008, SQL Administration | Leave a Comment »

Cannot drop database XXXX because it is being used for replication.

Posted by rahmanagoro on March 15, 2016


If you have ever come across this error, its because replication has been setup on the database before and it needs to be removed before the database can be dropped.

In my case, I had the error when I tried to run the code below.

drop database TEST_FWXXX

Server: Msg 3724, Level 16, State 3, Line 1
Cannot drop the database XXX because it is being used for replication.

 

Please note that in this case, the publication and subscription had already been dropped. Simple to resolve, just remove the replication property on the DB by running the code below.

exec sp_removedbreplication ‘TEST_FWXXX’

After running this, I was then able to drop the database successfully.

 

Posted in Management, SQL replication | Leave a Comment »

Cannot perform a differential backup for database “XXX”, because a current database backup does not exist. Perform a full database backup by reissuing BACKUP DATABASE, omitting the WITH DIFFERENTIAL option error on Litespeed

Posted by rahmanagoro on October 25, 2013


I recently bumped onto a situation the other day, all of a sudden my differential backup started to fail unexpectedly. This was a SQL 2005 cluster. This sounded quite unusual, and reading the actual error message made me even more curious. We do use SQL litespeed for all SQL instances before SQL 2008 because of its compression feature and the disk space savings that it offers. I started to see errors similar to the below.

Msg 61700, Level 16, State 1, Line 0

VDI open failed due to requested abort.

BACKUP DATABASE is terminating abnormally.

Cannot perform a differential backup for database “XXX”, because a current database backup does not exist. Perform a full database backup by reissuing BACKUP DATABASE, omitting the WITH DIFFERENTIAL option.

 

First of all I thought it was really quite a strange one especially as I knew that the full backup had actually just completed, what’s also ironic is that once you run a full backup and then kick off a differential afterwards it always worked, but whenever we had it scheduled it always failed. Hmmmmn, how weird….

Upon examining the system, it became clear that something was clearly wrong somewhere, I took the unusual step of actually re-installing the SQL litespeed binaries. This didn’t seem to help, and for some reason I decided to query the msdb database to check if any other backups were being done, I checked the SQL agent job list to ensure that there weren’t any other jobs that could be running, I didn’t see any jobs so I then decided to query the msdb database.

backup1

 

What we can see from the above is that some of the backups are being written to the file system, however there are others which are being written to what looks like a tape device. It appeared rather strange, because our backup policy is to write to the file system then the tape infrastructure picks up the files.

I got in touch with the backup administrators to make enquiries as I thought they might have setup a direct tape backup to the database which was then causing the litespeed full backup no to be recognised when the differential gets kicked off.

Then I also noticed the following within the SQL error log file as well.

Message

I/O was resumed on database DBA. No user action is required.

 

Hmmn, this hit me like a tonne of rocks, this message was one that I last saw a few years ago at a different company, in this particular case we had the ability to take SAN snapshots and that uses VSS technology on the database disk volumes. This effectively freezes the IO on the databases, and then takes snapshots of the disk volumes. Armed with this information, I then informed the tape backup administrators to check the tape backup config again to ensure that VSS wasn’t enabled, and waaaala that was the problem. I got the administrator to disable the VSS option and then asked him to backup the drive again, he did and this time I didn’t see any record logged on the msdb database and also any information on the SQL error log file also.

 

I stumbled upon this KB article as well from MS which is related: http://support.microsoft.com/kb/903643

 

 

Posted in Backup, Day to day queries, High Availability | Tagged: , , , , , | Leave a Comment »

SSRS 2008 Reporting Services HTTP 500 Error

Posted by rahmanagoro on January 1, 2013


I recently had a HTTP 500 error whilst trying to open a SSRS page, I was a bit suspicious of this error as it didn’t have any diagnostic information to suggest the root cause of the error on the SSRS page. I then went onto checking the report server database server to see if I could find any information relating to the database server. 

Upon diagnosing the problem, I checked the application logs on the report server and stumbled upon the error information below.

 

</span>

Failed to initialize the AppDomain:ReportManager_MSSQLSERVER_0
Exception: System.Web.HttpException
Message: Could not load file or assembly 'Microsoft.ReportingServices.Diagnostics, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91' or one of its dependencies. There is not enough space on the disk. (Exception from HRESULT: 0x80070070)
StackTrace:    at System.Web.Compilation.BuildManager.ReportTopLevelCompilationException()
   at System.Web.Compilation.BuildManager.EnsureTopLevelFilesCompiled()
   at System.Web.Hosting.HostingEnvironment.Initialize(ApplicationManager appManager, IApplicationHost appHost, IConfigMapPathFactory configMapPathFactory, HostingEnvironmentParameters hostingParameters)
   at System.Web.Hosting.HostingEnvironment.Initialize(ApplicationManager appManager, IApplicationHost appHost, IConfigMapPathFactory configMapPathFactory, HostingEnvironmentParameters hostingParameters)
   at System.Web.Hosting.ApplicationManager.CreateAppDomainWithHostingEnvironment(String appId, IApplicationHost appHost, HostingEnvironmentParameters hostingParameters)
   at System.Web.Hosting.ApplicationManager.CreateAppDomainWithHostingEnvironmentAndReportErrors(String appId, IApplicationHost appHost, HostingEnvironmentParameters hostingParameters)
InnerException: System.Configuration.ConfigurationErrorsException
Message: Could not load file or assembly 'Microsoft.ReportingServices.Diagnostics, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91' or one of its dependencies. There is not enough space on the disk. (Exception from HRESULT: 0x80070070)
StackTrace:    at System.Web.Configuration.CompilationSection.LoadAssemblyHelper(String assemblyName, Boolean starDirective)
   at System.Web.Configuration.CompilationSection.LoadAllAssembliesFromAppDomainBinDirectory()
   at System.Web.Configuration.CompilationSection.LoadAssembly(AssemblyInfo ai)
   at System.Web.Configuration.AssemblyInfo.get_AssemblyInternal()
   at System.Web.Compilation.BuildManager.GetReferencedAssemblies(CompilationSection compConfig)
   at System.Web.Compilation.BuildProvidersCompiler..ctor(VirtualPath configPath, Boolean supportLocalization, String outputAssemblyName)
   at System.Web.Compilation.ApplicationBuildProvider.GetGlobalAsaxBuildResult(Boolean isPrecompiledApp)
   at System.Web.Compilation.BuildManager.CompileGlobalAsax()
   at System.Web.Compilation.BuildManager.EnsureTopLevelFilesCompiled()
InnerException: System.IO.FileLoadException

Message: Could not load file or assembly 'Microsoft.ReportingServices.Diagnostics, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91' or one of its dependencies. There is not enough space on the disk. (Exception from HRESULT: 0x80070070)

StackTrace:    at System.Reflection.Assembly._nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, Assembly locationHint, StackCrawlMark& stackMark, Boolean throwOnFileNotFound, Boolean forIntrospection)
   at System.Reflection.Assembly.InternalLoad(AssemblyName assemblyRef, Evidence assemblySecurity, StackCrawlMark& stackMark, Boolean forIntrospection)
   at System.Reflection.Assembly.InternalLoad(String assemblyString, Evidence assemblySecurity, StackCrawlMark& stackMark, Boolean forIntrospection)
   at System.Reflection.Assembly.Load(String assemblyString)
   at System.Web.Configuration.CompilationSection.LoadAssemblyHelper(String assemblyName, Boolean starDirective)

InnerException: System.IO.FileLoadException

Message: Could not load file or assembly 'Microsoft.ReportingServices.Diagnostics' or one of its dependencies. There is not enough space on the disk. (Exception from HRESULT: 0x80070070)
StackTrace:
  

 

From the error we can see that it’s ran out of space somewhere, and hence why we have the problem.

The corrective action taken was to check the log folder for SSRS as it was one of the main disk space consumers. I can see from the location that there are loads of files generated.

Check the SSRS log file location in this case, C:\Apps\MSSQL\MSRS10.MSSQLSERVER\Reporting Services\LogFiles.

It turns out that in SSRS, Default trace is switched on by default and the tracing level set by default is level 3 which includes the following: Exceptions, restarts, warnings, status messages (default). This explains why the file sizes on this case were quite big in size.


 <system.diagnostics>
        <switches>
            <add name="DefaultTraceSwitch" value="3" />
        </switches>
  </system.diagnostics>
  <RStrace>
        <add name="FileName" value="ReportServerService_" />
        <add name="FileSizeLimitMb" value="32" />
        <add name="KeepFilesForDays" value="14" />
        <add name="Prefix" value="tid, time" />
        <add name="TraceListeners" value="file" />
        <add name="TraceFileMode" value="unique" />
        <add name="Components" value="all" />
  </RStrace>

<span style="font-family: Calibri; color: #000000; font-size: medium;">  

The DefaultTraceSwitch as we can see above is set to 3.

So when you run into this error, it can be caused by anything including lack of disk space as I have found out.

For more information, please see http://msdn.microsoft.com/en-us/library/ms156500.aspx

 

 

Posted in SSRS, Tips | Leave a Comment »

Error: 17053, Severity: 16, State: 1 what does it mean ?

Posted by rahmanagoro on October 25, 2012


E:\DATA\Database.ndf: Operating system error 112(failed to retrieve text for this error. Reason: 15100) encountered.

I have seen this error message on one of our production systems, and wondering what the issue is. This error is being caused by the fact that one is trying to expand the size of the ndf file, and there just isn’t enough disk space on the actual drive or mount point.

To recreate the problem in SQL 2008, create an existing database. And then expand the size of one of the files to exceed the free available capacity on the server.

Posted in Management, SQL 2008, Tips | Leave a Comment »

Statistics again and we see same query with two different plans

Posted by rahmanagoro on October 25, 2012


Following on from my earlier post on the importance of keeping statistics up to date, I have come across another issue recently and this time, I have the luxury of actually showing the query plans too. Please note that I have used SQL Sentry plan explorer which you can download for free from http://www.sqlsentry.com

I had a call from one of my clients today complaining about bad performance on their database. The irony of this particular case is that I can see it myself by actually running the query on the database server and I could see that its taking a lot of time to complete for a fairly trivial query that shouldn’t really take that long to complete. I was lucky in this particular case that I also had a UAT system to run the query on as well, as it was a report which wasn’t really changing data, it made rerunning the report slightly less intrusive.

I proceeded onto UAT, ran the query and he pesto it completed within 9 seconds as opposed to 3 hours which it was taking on production. In my normal fashion, I started taking a look at the query plans and one thing which became obvious is the fact that the query plans are somewhat different on the 2 systems. Although the way the joins in the query have been written isn’t the best I have seen in my career but these things do happen. I looked through one of the tables and used the SQL Sentry query plan too to observe that the estimated rows on the table was somewhat wrong. I can also see that the wait stats on the query was CXPACKET waits, the query uses parallelism but I can’t really say it’s a bad thing.

The section of the plan below shows the table with the right number of row estimates for the statistics.

The query below shows the same query but with the statistics somewhat wrong.

Although looking at the query, the optimizer also recommended that an index be created on the table, I was hesitate to create the index as the query was working fine before, so why would it start to behave badly all of a sudden. After updating the statistics on the table, cleared the procedure cache and reran the query everything was fine once more. This just proves the importance of having up to date statistics and also the fact that when the optimizer recommends that you create indexes, it’s not always the answer to every performance problem.

Posted in Management, SQL 2008, SQL Administration, Tips | Leave a Comment »

Statistics and the effect on query performance

Posted by rahmanagoro on October 25, 2012


I have always known about issues with table statistics on a SQL server database, but actually seeing it cause performance problems is a bit of a different experience. I got called from the support teams saying that a report which they normally run on the database has been running for 3 hours now, and shows no sign of completing.

I then logged onto the system, ran some diagnostic queries to pull up the query plan for the query running and then suddenly, something didn’t quite make sense to me. One of the tables which I’m familiar with seems to be returning an estimated number of rows of one. At this stage, I knew this wasn’t right as I’m quite familiar with the table and I know that it contains millions and millions of rows.

Even looking at a section of the query plan, I just knew it wasn’t right at all.

Straight away all the tables which had an estimate of one, I updated the statistics with full scan, and re-ran the query again, and this time it completed in around 6 minutes. One thing to learn in this post is that when you see estimated number of rows of 1, and you know that the table certainly has more than one row, its usually a pointer to show that the statistics are out of date. After updating the statistics we can see from the screenshot below that the pipes which moves from object to object is a lot more bigger in size, this means that the number of rows being worked on it significantly more.

Conclusion

 

·         Always check statistics on the database.

·         Ensure that auto update statistics is on for the database, unless you choose to manually run this yourself or the database is very large and manual statistics update is essential.

·         Watch out for one row estimates on the query plan especially for large tables when you know the number of rows that ought to be returned is more than one.

·         Update statistics full scan may not be suitable for every environment, normally a sampling rate would also work, but tests will need to be carried out to establish which sample rate is suitable.

 

 

Posted in Database development, SQL 2008, SQL Administration, Tips | Tagged: | Leave a Comment »

Configuring and Installing SSIS on a SQL 2008 R2 failover cluster infrastructure.

Posted by rahmanagoro on October 23, 2011


SSIS isn’t cluster aware out of the box, so there is some bit of config changes to make in order to make it work on a cluster.

C

Select just the Integration service.

On the Client Access point screen, type in the name and also the IP address as well looking at the requirement spreadsheet.

Click on Next

Since the SSIS service has been installed in its own resource group, we now need to set the dependencies.

Set the disk dependency for the SSIS service

Add the disk resource dependency

Requirement: Change SSIS configuration, custom configuration

SSIS Custom Configuration

Navigate to the path: C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn

Create a folder on the root of the SSIS cluster disk called SSIS_Config

Create a folder on the SSIS cluster disk called Packages

Copy the file from the location C:\Program Files\Microsoft SQL Server\100\DTS\Binn\MsDtsSrvr.ini.xml to S:\SSIS_Config\



Right Click the SSIS service and select properties and then go onto Registry replication and add the registry key below.

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\SSIS\ServiceConfigFile\


Click on OK and then Apply.

Take the resource offline and bring it back online for the changes to take effect.



Perform the following steps for SSIS on the second node.

Change the MsDTSSrvr.ini.xml file on the location defined above and change the packages folder to point to S:\Packages

Make the registry entry changes on the second node as well to finish off the configuration.

Test the failover by failing over the SSIS resource group by NodeA to NodeB.

Copy a .dtsx file onto the S:\Packages folder and open up management studio.

In the illustration below, you can see that we have connected to the SSIS instance and we can see the packages.

102311_1839_Configuring28


Posted in Clustering, SQL 2008, Tips, Uncategorized | 5 Comments »

Posted by rahmanagoro on August 12, 2011


I was working with SQL 2008 replication and on checking replication monitor, I noticed that on the distribution agent for one of the databases, it had a retrying command status. I further looked into it to see what the problem was, here I can see that the distribution agent had become stucked whilst trying to send commands to the subscriber. The first thing that came to my mind was, “what is happening with the subscriber ? ”

I ran the following on the subscriber

use master
GO
sp_who2 active

I was able to see that there is a blocking spid at the moment, which is coming from my distributor. So what must have happened here is that, as a snapshot was being delivered to the subscriber, objects such as stored procedures and functions were also being delivered as well, thereby causing a deadlock to occur.

To confirm this, I can the following

from distributor

Read the rest of this entry »

Posted in SQL replication, Tips | Leave a Comment »