Rahman Agoro, the frontline SQL DBA

Just another WordPress.com weblog

Archive for the ‘SQL replication’ Category

SQL 2014 Replication Error The process could not execute ‘sp_replcmds’ on ‘xxxxxxxxxxxx’. (Source: MSSQL_REPL, Error number: MSSQL_REPL20011)

Posted by rahmanagoro on October 20, 2016


After spending some time away from SQL replication (SQL 2014), I find myself working with this brilliant piece of technology again. In this particular case I had to test, support and troubleshoot replication even without having any knowledge of its setup.

I had this error come up during one of my investigations when I check the log reader agent within SQL server replication monitor.

Error messages:

  • The process could not execute ‘sp_replcmds’ on ‘XXXServerXXX’. (Source: MSSQL_REPL, Error number: MSSQL_REPL20011)
    Get help: http://help/MSSQL_REPL20011
  • Cannot execute as the database principal because the principal “dbo” does not exist, this type of principal cannot be impersonated, or you do not have permission. (Source: MSSQLServer, Error number: 15517)
    Get help: http://help/15517
  • The process could not execute ‘sp_replcmds’ on ‘XXXSERVERXXX’. (Source: MSSQL_REPL, Error number: MSSQL_REPL22037)
    Get help: http://help/MSSQL_REPL22037

 

After doing some investigations, I found that the database in question didn’t have an owner specified.

I changed the owner of the database and hey pronto, the agent started to work again.

 

Use DatabaseName
GO
sp_changedbowner 'sa'

Posted in Management, SQL replication | Leave a Comment »

Cannot drop database XXXX because it is being used for replication.

Posted by rahmanagoro on March 15, 2016


If you have ever come across this error, its because replication has been setup on the database before and it needs to be removed before the database can be dropped.

In my case, I had the error when I tried to run the code below.

drop database TEST_FWXXX

Server: Msg 3724, Level 16, State 3, Line 1
Cannot drop the database XXX because it is being used for replication.

 

Please note that in this case, the publication and subscription had already been dropped. Simple to resolve, just remove the replication property on the DB by running the code below.

exec sp_removedbreplication ‘TEST_FWXXX’

After running this, I was then able to drop the database successfully.

 

Posted in Management, SQL replication | Leave a Comment »

Posted by rahmanagoro on August 12, 2011


I was working with SQL 2008 replication and on checking replication monitor, I noticed that on the distribution agent for one of the databases, it had a retrying command status. I further looked into it to see what the problem was, here I can see that the distribution agent had become stucked whilst trying to send commands to the subscriber. The first thing that came to my mind was, “what is happening with the subscriber ? ”

I ran the following on the subscriber

use master
GO
sp_who2 active

I was able to see that there is a blocking spid at the moment, which is coming from my distributor. So what must have happened here is that, as a snapshot was being delivered to the subscriber, objects such as stored procedures and functions were also being delivered as well, thereby causing a deadlock to occur.

To confirm this, I can the following

from distributor

Read the rest of this entry »

Posted in SQL replication, Tips | Leave a Comment »

Drop SQL server replication after a server name change

Posted by rahmanagoro on August 4, 2011


After spending some time brushing up my photography skills, its back to the real world of work, coming in to work I had a request from a developer to setup SQL server replication on a publisher with one subscriber, I have a routine of powershell scripts that I normally run to do this. After completing the steps, I noticed some errors, upon investigating it became apparent that the identity of the server isn’t what it is expected to be. The servername is different for some reason.

After running the select command to view the server name,  I find that its different from the server in which I have logged in from.

I get the error

SELECT @@SERVERNAMEsp_dropserver WRONGSERVER
Msg 20582, Level 16, State 1, Procedure sp_MSrepl_check_server, Line 31
Cannot drop server WRONGSERVER because it is used as a Publisher in replication.

–Trying to drop SQL replication I get the following errors

 

exec master..sp_dropdistributor @no_checks = 1

Msg 208, Level 16, State 1, Procedure sp_MSrepl_dropsubscription, Line 103
Invalid object name 'dbo.syssubscriptions'.
Msg 266, Level 16, State 2, Procedure sp_MSrepl_dropsubscription, Line 103
Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 0, current count = 1.
Msg 266, Level 16, State 2, Procedure sp_dropsubscription, Line 0
Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 0, current count = 1.
Msg 266, Level 16, State 2, Procedure sp_MSpublishdb, Line 0
Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 0, current count = 1.
Msg 266, Level 16, State 2, Procedure sp_replicationdboption, Line 0
Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 0, current count = 1.
Msg 266, Level 16, State 2, Procedure sp_dropdistributor, Line 0
Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 0, current count = 1.

In this case, I can tell that there has been an exception within the execution of the stored procedures. What needs doing is for the replication markers on the databases to be removed so that replication can be dropped cleanly on the database.

Step 1

 

-- Remove the replication marker on the databases

EXEC master..sp_MSforeachdb '
USE [?]
IF ''?'' <> ''master'' AND ''?'' <> ''model'' AND ''?'' <> ''msdb'' AND ''?'' <> ''tempdb''
BEGIN
    SELECT ''?''

 IF OBJECT_ID(''tempdb..#dbproperty'') is not null
 drop table #dbproperty

 create table #dbproperty (OptionName   varchar(20), CurrentSetting varchar(10))

 insert into #dbproperty
 exec sp_dboption ''?'',''published''

 IF EXISTS (Select * from #dbproperty where CurrentSetting = ''ON''  )
 exec sp_removedbreplication @dbname = ''?'', @type = ''tran''

 IF OBJECT_ID(''tempdb..#dbproperty'') is not null
 drop table #dbproperty

END
'

This will now drop SQL replication.


exec master..sp_dropdistributor @no_checks = 1

It should now work.

Change the servername now.

sp_dropserver WRONGSERVER

— Create new server name

sp_addserver ‘RIGHTSERVER’,local

Restart the SQL service on the publisher.

The correct new servername should now be reflected on the SQL instance.

Now proceed to setting up SQL replication.

 

Posted in Scripts, SQL 2008, SQL replication, Tips | 2 Comments »

Monitoring SQL replication using the tracer tokens TSQL dynamically

Posted by rahmanagoro on March 17, 2011


I recently wanted to monitor the latency between publisher/subscriber within a SQL transactional replication setup. This can be done quite easily within the GUI, but I wanted to automate the process and make it run as a frequent check on the database.

I have written a script and stored procedure as to how I managed to achieve this.

 

SET QUOTED_IDENTIFIER ON
GO

/*
Author:  Rahman Agoro
Description This stored procedure will insert tracer tokens and monitor SQL replication
Date  14/02/2011
USE    declare @alarm int exec spCheckReplicationLatency '00:02:00', '%SUBSCRIBER%',@alarm output select @alarm */
CREATE procedure [dbo].[spCheckReplicationLatency] (@DelayLength char(8)= '00:00:00', @Exclusion varchar(100),@Alarm int output )
as
Begin
SET NOCOUNT ON

DECLARE @LoopStart int
DECLARE @LoopEnd int
DECLARE @Database varchar(50)
DECLARE @Publication varchar(100)
DECLARE @SQL NVARCHAR(4000)
DECLARE @ReplicationIssue int
DECLARE @Error int
DECLARE @Seperator char(10)
DECLARE @Message varchar(1000)
DECLARE @operatorEmail varchar(100)

SET @Seperator = char(9)
--Check to see if the distribution database is present.
IF DB_ID('distribution') is null
RAISERROR ('distribution database not present',16,1)
IF OBJECT_ID('tempdb..##replicationpublicationlatency') is not null
drop table ##replicationpublicationlatency

create table ##replicationpublicationlatency
(
 pubID   int identity(1,1),
 databasename varchar(50),
 publication varchar(100)
)

insert into ##replicationpublicationlatency (databasename,publication)
select publisher_db,publication
from distribution..MSpublications

SET @LoopStart = 1
SET @LoopEnd = (SELECT COUNT(*) from ##replicationpublicationlatency)

WHILE (@LoopStart <= @LoopEnd)
BEGIN

 select @Database = databasename, @Publication = publication from ##replicationpublicationlatency
 WHERE pubID = @LoopStart

 SET @SQL = ''
 SET @SQL = @SQL + CHAR(10) + 'USE ' + @Database + '  exec sp_posttracertoken ' + '''' + @Publication + '''' + char(10) + char(13)

 SET @LoopStart = @LoopStart + 1
 exec master..sp_executesql @SQL
 
END
DECLARE @ReturnInfo varchar(255)
IF ISDATE('2000-01-01 ' + @DelayLength + '.000') = 0
    BEGIN
        SELECT @ReturnInfo = 'Invalid time ' + @DelayLength
        + ',hh:mm:ss, submitted.';
        -- This PRINT statement is for testing, not use in production.
        PRINT @ReturnInfo
        GOTO ErrorHandler
    END
    WAITFOR DELAY @DelayLength

IF OBJECT_ID('tempdb..##TokenHistory') IS NOT NULL
DROP TABLE ##TokenHistory

;with CTE_Latest_Tracer_Token_History(publication,publication_id,agent_id,agent_name,publisher_commit,distributor_commit,subscriber_commit,publisher_distributor,distributor_subscriber,Total_latency, RANKING)
as
(
SELECT a.publication,publication_id, agent_id, a.name,
       t.publisher_commit, t.distributor_commit, h.subscriber_commit,datediff(ss,t.publisher_commit,t.distributor_commit )publisher_distributor,
       datediff(ss,t.distributor_commit, h.subscriber_commit)distributor_subscriber,datediff(ss,t.publisher_commit,t.distributor_commit) + datediff(ss,t.distributor_commit, h.subscriber_commit) Total_latency,
       ROW_NUMBER() OVER (PARTITION BY publication_id, agent_id ,a.name ORDER BY t.publisher_commit DESC) AS RANKING

FROM distribution..MStracer_tokens t
JOIN distribution..MStracer_history h
ON t.tracer_id = h.parent_tracer_id
  JOIN distribution..MSdistribution_agents a
on a.id = agent_id
and a.name not like @Exclusion
)
select *
into ##TokenHistory
from CTE_Latest_Tracer_Token_History
where RANKING in (1)
order by publication

SET @ReplicationIssue = (select count(*) from ##TokenHistory where Total_latency is null)
SET @Message = 'Replication Latency Report ' + @@SERVERNAME
 if ISNULL(@ReplicationIssue,0) > 0         
 Begin         
    Set @Alarm = 1        
    print cast(@ReplicationIssue as nvarchar)+N' replication agents has errors'         
 select @operatorEmail = email_address from msdb..sysoperators where name = 'DBA'       
  

     EXEC msdb.dbo.sp_send_dbmail @recipients=@operatorEmail, 
     @subject = @Message, 
     @body = 'Please see attached replication latency report', 
     @body_format = 'HTML', 
     @query ='SELECT agent_name,publisher_distributor,distributor_subscriber,Total_latency FROM ##TokenHistory  ' , 
     @query_attachment_filename = 'Replication_Latency_Report.txt', 
     @attach_query_result_as_file = 'TRUE',   
     @query_result_separator = @Seperator

--WE have an issue if the issue flag  = 1
 Insert into tblReplicationLatencyLog (AgentName,LogDate,Issue)
 SELECT agent_name,GETDATE(),1
 from ##TokenHistory where Total_latency is NULL

 Set @Error = @@Error if @Error <> 0 GoTo ErrorHandler    

 End         
 Else         
 Insert into tblReplicationLatencyLog (AgentName,LogDate,Issue)
 SELECT 'No Issue',GETDATE(),0

    Set @Alarm = 0         
                 

ErrorHandler: 
 
IF  @Error <> 0 
Begin 
 
 Print 'Error Occured !' 
 Return(@Error)     
  
End 
 
END

GO

In this example, I wanted to log cases where there had been replication latency issues. Calling the stored procedure also gives an option to specify a wait for period, this is because when you insert a tracer token, you need to allow for some time for the token to get onto the subscribers and report back to the publisher. This is an example of how the stored procedure is actually used.

 

declare @alarm int exec spCheckReplicationLatency '00:02:00', '%SUBSCRIBER%',@alarm output select @alarm 

The reason why I had the subscriber to exclude was because in my case, there was a particular subscriber across the atlantic which was like a DR database server and according to SLA’s, its allowed a few minutes lag behind, and I didnt want to raise alarms all the time due to this particular server.

Posted in SQL 2008, SQL replication, Tips | Leave a Comment »

Visual studio 2010 team edition database deployment and SQL replication support

Posted by rahmanagoro on November 2, 2010


I have been using Microsoft Visual Studio 2010 within our shop, we use it towards the management and version control of SQL databases, these databases are stored as projects under visual studio. It’s within an agile environment and releases can happen almost every weekend or during the week. We also use SQL transaction replication in order to maintain sets of reporting database servers.

In the last couple of weeks, I tried to do a deployment onto a publisher database and ran into and error with visual studio, it was trying to drop and object on the database whilst the object was being replicated. This obviously throws and exception on SQL

It’s official that Microsoft visual studio does not support SQL replication and the only known workaround to the problem is to change the deployment properties to ignore verification checks. This can either be done from the GUI or altering an XML file, which is what the GUI does.

You can get to the deployment properties by navigating to the following location from visual studio.

Project >> Properties >> Database deployment

Posted in Database development, Day to day queries, Management, Scripts, SQL 2008, SQL replication, Tips | Leave a Comment »

Could not obtain information about Windows NT group/user ‘domain\xxxx’, error code 0x5. sp_dropdistributor

Posted by rahmanagoro on October 11, 2010


I have recently ran into an issue with dropping replication, it complains about the error below.

exec master..sp_dropdistributor @no_checks = 1
Msg 15404, Level 16, State 19, Procedure sp_replcmds, Line 1
Could not obtain information about Windows NT group/user ‘DOMAIN\xxxx.xxxxxx’, error code 0x5.

Exceptions like this are often caused when the login does not have access to a resource, or maybe when one is trying to detach/attach a database in SQL 2008 whilst using different user names. It sounded quite strange that I am having the same issues whilst trying to do something as simple as dropping replication on a test server. I setup a profiler trace for the session that I am using to try and track where the exception is occurring so as to know how to tackle the problem. I realized that the stored procedure was failing when it got the section trying to run the code.

sys.sp_replcmds 0

This stored procedure runs as part of exec master..sp_dropdistributor @no_checks = 1

Upon further investigation, I was able to run the stored procedure on the database which was throwing the exception, the first thing that came to my mind was, “which login owns the database ?”. I then ran the code below to check the owners of each and every database.

select name, SUSER_NAME(sid) as DBOwner from master..sysdatabases (nolock)
where name not like '%Snap'
order by name

Some of the databases came up as “sa” which is my preferred db owner and some came up as null, the ones that came up as null were databases that I had just restored recently using my windows login, once restored the SIDS did not match that which was on the master database in the syslogins table.

In order to fix the problem, I then ran the script below that will update all the database owners on the database to sa.

   EXEC master..sp_MSforeachdb '
USE [?]
IF DB_NAME() not in (''master'',''msdb'',''distribution'',''tempdb'',''model'')
BEGIN
	IF db_name() not like ''%Snap''
		BEGIN
		   SELECT ''?''
			exec sp_changedbowner ''sa''
		END	
END'

Now whilst trying to drop the distributor, it now works without any problem.

exec master..sp_dropdistributor @no_checks = 1

As a precaution, always change the owner of the databases to SA in order to avoid any such problems. Another issue that can occur is if the windows login which owns the database is removed from active directory ? thesame problem can occur.

   EXEC master..sp_MSforeachdb '
USE [?]
IF DB_NAME() not in (''master'',''msdb'',''distribution'',''tempdb'',''model'')
BEGIN
	IF db_name() not like ''%Snap''
		BEGIN
		   SELECT ''?''
			exec sp_changedbowner ''sa''
		END	
END'

Posted in Management, Scripts, SQL replication, Tips | 2 Comments »

Replication snapshot agent error The replication agent has not logged a progress message in 10 minutes. This might indicate an unresponsive agent or high system activity.

Posted by rahmanagoro on September 14, 2010


We recently had a production release where I had to add two new subscribers to the replication set, the idea was that we were moving onto new hardware but we couldn’t delete the two older subscribers, as we had to leave the subscribers running for a few more weeks just in case we had issues with the two new database servers in the new data center. The strategy was that should we encounter issues in the new data center, we would just re-route our applications to connect to the two subscribers in the old data center.

After adding the two new subscriber, meaning we had six subscribers in total, I started seeing errors within the replication snapshot agent. The error was as follows

Error messages:

The replication agent has not logged a progress message in 10 minutes. This might indicate an unresponsive agent or high system activity. Verify that records are being replicated to the destination and that connections to the Subscriber, Publisher, and Distributor are still active.

I noticed that the only article Microsoft had on this was on the link, http://technet.microsoft.com/en-us/library/ms152484.aspx. It was just telling me something that the message already says, the page didn’t detail a proper workaround, one thing with our replication topology is that we use the same server as the OLTP, publisher and distributor and whenever the replication snapshot agents failed, its normally at busy periods of the day, but I couldn’t still nail down the problem, I then decided to add some form of verbose logging on the snapshot agent job to see what the problem was, but by the time I got round to doing this, I was told that we can now remove the two extra subscribers, after removing the two extra subscribers the errors stopped. It was also difficult for me to recreate the problem on the test environment because we didn’t have six subscribers on test environment; we only had a maximum of two and besides this issue only started after adding two extra subscribers.

I also looked at the idea of changing the heartbeat interval, but this didn’t resolve the problem as this merely changed the details of the error message to reflect the new interval period specified.

It’s a weird one, but one which I hope Microsoft will be able to provide a more accurate workaround.

Posted in SQL 2008, SQL replication, Tips, Uncategorized | Leave a Comment »

SQL 2008 replication error ” retrying command or the process is running and waiting for a response from the server “

Posted by rahmanagoro on August 26, 2010


I was working with SQL 2008 replication and on checking replication monitor, I noticed that on the distribution agent for one of the databases, it had a retrying command status. I further looked into it to see what the problem was, here I can see that the distribution agent had become stucked whilst trying to send commands to the subscriber. The first thing that came to my mind was, “what is happening with the subscriber ? ”


I ran the following on the subscriber

use  master
 GO
 sp_who2 active

I was able to see that there is a blocking spid at the moment, which is coming from my distributor. So what must have happened here is that, as a snapshot was being delivered to the subscriber, objects such as stored procedures and functions were also being delivered as well, thereby causing a deadlock to occur.

To confirm this, I ran the code below to check for replication errors.

select * from distribution..MSrepl_errors

I had entries like

1092 2010-08-01 00:35:16.560 0 0 MSSQL_ENG HYT00 Query timeout expired 0x0001262200000210000100000002 1 0

1092 2010-08-01 00:35:16.557 0 1 MSSQL_ENG if @@trancount > 0 rollback tran 0x0001262200000210000100000002 1 0

On the subscriber, I confirmed the locking scenario by looking at the wait type for the replication SPID, and it was a LCK_M_S wait type, which confirmed it was a deadlocking situation.

I also ran a deadlocking detail script as well.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

SELECT
 owt.session_id AS waiting_session_id,
 DB_NAME(tls.resource_database_id) AS database_name,
 owt.wait_duration_ms,
--    owt.waiting_task_address,
 owt.wait_type,
--    tls.resource_associated_entity_id,
--    tls.resource_description AS local_resource_description,
 tls.resource_type,
 (CASE tls.resource_type
 WHEN 'OBJECT' THEN OBJECT_NAME(tls.resource_associated_entity_id, tls.resource_database_id)
 WHEN 'DATABASE' THEN DB_NAME(tls.resource_database_id)
 ELSE (SELECT  OBJECT_NAME(pat.[object_id], tls.resource_database_id)
 FROM sys.partitions pat WHERE pat.hobt_id = tls.resource_associated_entity_id)

 END
 ) AS object_name,
 tls.request_mode,
 tls.request_type,
 tls.request_session_id,
 owt.blocking_session_id,
 owt.resource_description AS blocking_resource_description,
 (SELECT SUBSTRING(est.[text], ers.statement_start_offset/2 + 1,
 (CASE WHEN ers.statement_end_offset = -1
 THEN LEN(CONVERT(nvarchar(max), est.[text])) * 2
 ELSE ers.statement_end_offset
 END
 - ers.statement_start_offset
 ) / 2)
 FROM sys.dm_exec_sql_text(ers.[sql_handle]) AS est) AS waiting_query_text,
 CASE WHEN owt.blocking_session_id > 0
 THEN (
 SELECT
 est.[text] FROM sys.sysprocesses AS sp
 CROSS APPLY sys.dm_exec_sql_text(sp.[sql_handle]) as est
 WHERE sp.spid = owt.blocking_session_id)
 ELSE
 NULL
 END AS blocking_query_text,
 qp.query_plan AS waiting_query_plan
FROM sys.dm_tran_locks AS tls
 INNER JOIN sys.dm_os_waiting_tasks owt ON tls.lock_owner_address = owt.resource_address
 INNER JOIN sys.dm_exec_requests ers ON tls.request_request_id = ers.request_id AND owt.session_id = ers.session_id
 OUTER APPLY sys.dm_exec_query_plan(ers.[plan_handle]) AS qp
GO


I was able to see that one spid was calling the stored procedure:

exec sp_MSacquiresnapshotdeliverysessionlock

and the other spid only had a begin tran statement showing from my dbcc inputbuffer statement.
I killed this SPID which had begin tran as the sql statement which it was running, and the replication commands were delivered and replication was working once more. The problem was fixed.

Posted in SQL 2008, SQL Administration, SQL replication, Uncategorized | 2 Comments »