Rahman Agoro, the frontline SQL DBA

Just another WordPress.com weblog

SQL 2008 replication error ” retrying command or the process is running and waiting for a response from the server “

Posted by rahmanagoro on August 26, 2010


I was working with SQL 2008 replication and on checking replication monitor, I noticed that on the distribution agent for one of the databases, it had a retrying command status. I further looked into it to see what the problem was, here I can see that the distribution agent had become stucked whilst trying to send commands to the subscriber. The first thing that came to my mind was, “what is happening with the subscriber ? ”


I ran the following on the subscriber

use  master
 GO
 sp_who2 active

I was able to see that there is a blocking spid at the moment, which is coming from my distributor. So what must have happened here is that, as a snapshot was being delivered to the subscriber, objects such as stored procedures and functions were also being delivered as well, thereby causing a deadlock to occur.

To confirm this, I ran the code below to check for replication errors.

select * from distribution..MSrepl_errors

I had entries like

1092 2010-08-01 00:35:16.560 0 0 MSSQL_ENG HYT00 Query timeout expired 0x0001262200000210000100000002 1 0

1092 2010-08-01 00:35:16.557 0 1 MSSQL_ENG if @@trancount > 0 rollback tran 0x0001262200000210000100000002 1 0

On the subscriber, I confirmed the locking scenario by looking at the wait type for the replication SPID, and it was a LCK_M_S wait type, which confirmed it was a deadlocking situation.

I also ran a deadlocking detail script as well.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

SELECT
 owt.session_id AS waiting_session_id,
 DB_NAME(tls.resource_database_id) AS database_name,
 owt.wait_duration_ms,
--    owt.waiting_task_address,
 owt.wait_type,
--    tls.resource_associated_entity_id,
--    tls.resource_description AS local_resource_description,
 tls.resource_type,
 (CASE tls.resource_type
 WHEN 'OBJECT' THEN OBJECT_NAME(tls.resource_associated_entity_id, tls.resource_database_id)
 WHEN 'DATABASE' THEN DB_NAME(tls.resource_database_id)
 ELSE (SELECT  OBJECT_NAME(pat.[object_id], tls.resource_database_id)
 FROM sys.partitions pat WHERE pat.hobt_id = tls.resource_associated_entity_id)

 END
 ) AS object_name,
 tls.request_mode,
 tls.request_type,
 tls.request_session_id,
 owt.blocking_session_id,
 owt.resource_description AS blocking_resource_description,
 (SELECT SUBSTRING(est.[text], ers.statement_start_offset/2 + 1,
 (CASE WHEN ers.statement_end_offset = -1
 THEN LEN(CONVERT(nvarchar(max), est.[text])) * 2
 ELSE ers.statement_end_offset
 END
 - ers.statement_start_offset
 ) / 2)
 FROM sys.dm_exec_sql_text(ers.[sql_handle]) AS est) AS waiting_query_text,
 CASE WHEN owt.blocking_session_id > 0
 THEN (
 SELECT
 est.[text] FROM sys.sysprocesses AS sp
 CROSS APPLY sys.dm_exec_sql_text(sp.[sql_handle]) as est
 WHERE sp.spid = owt.blocking_session_id)
 ELSE
 NULL
 END AS blocking_query_text,
 qp.query_plan AS waiting_query_plan
FROM sys.dm_tran_locks AS tls
 INNER JOIN sys.dm_os_waiting_tasks owt ON tls.lock_owner_address = owt.resource_address
 INNER JOIN sys.dm_exec_requests ers ON tls.request_request_id = ers.request_id AND owt.session_id = ers.session_id
 OUTER APPLY sys.dm_exec_query_plan(ers.[plan_handle]) AS qp
GO


I was able to see that one spid was calling the stored procedure:

exec sp_MSacquiresnapshotdeliverysessionlock

and the other spid only had a begin tran statement showing from my dbcc inputbuffer statement.
I killed this SPID which had begin tran as the sql statement which it was running, and the replication commands were delivered and replication was working once more. The problem was fixed.

Advertisements

2 Responses to “SQL 2008 replication error ” retrying command or the process is running and waiting for a response from the server “”

  1. Sandile said

    I still get the same message even after running all the commands to run the subscription again. It started by saying “The initial snapshot for article ‘BANK’ is not yet available” – “Starting agent.” and the log reader agent in the replication monitor still says “the process is running and is waiting for a response from the server”

  2. Tez said

    Outstanding article – well written and easy to understand, resolved my issue.
    Thank you.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: