Posted by rahmanagoro on August 12, 2011
I was working with SQL 2008 replication and on checking replication monitor, I noticed that on the distribution agent for one of the databases, it had a retrying command status. I further looked into it to see what the problem was, here I can see that the distribution agent had become stucked whilst trying to send commands to the subscriber. The first thing that came to my mind was, “what is happening with the subscriber ? ”
I ran the following on the subscriber
I was able to see that there is a blocking spid at the moment, which is coming from my distributor. So what must have happened here is that, as a snapshot was being delivered to the subscriber, objects such as stored procedures and functions were also being delivered as well, thereby causing a deadlock to occur.
To confirm this, I can the following
I had entries like
1092 2010-08-01 00:35:16.560 0 0 MSSQL_ENG HYT00 Query timeout expired 0x0001262200000210000100000002 1 0
1092 2010-08-01 00:35:16.557 0 1 MSSQL_ENG if @@trancount > 0 rollback tran 0x0001262200000210000100000002 1 0
On the subscriber, I confirmed the locking scenario by looking at the wait type for the replication SPID, and it was a LCK_M_S wait type, which confirmed it was a deadlocking situation.
I also ran a deadlocking detail script as well.
.session_id AS waiting_session_id,
— tls.resource_description AS local_resource_description,
(CASE tls.resource_typeWHEN‘OBJECT’THENOBJECT_NAME(tls.resource_associated_entity_id, tls.resource_database_id)
ELSE (SELECTOBJECT_NAME(pat.[object_id], tls.resource_database_id)
FROMsys.partitions pat WHERE pat.hobt_id = tls.resource_associated_entity_id)
.resource_description AS blocking_resource_description,
(SELECTSUBSTRING(est.[text], ers.statement_start_offset/2 + 1,
(CASEWHEN ers.statement_end_offset =–1THENLEN(CONVERT(nvarchar(max), est.[text]))* 2ELSE ers.statement_end_offsetEND
– ers.statement_start_offset)/ 2)
FROMsys.dm_exec_sql_text(ers.[sql_handle])AS est)AS waiting_query_text,
CASEWHEN owt.blocking_session_id > 0 THEN (
.[text] FROMsys.sysprocessesAS spCROSSAPPLYsys.dm_exec_sql_text(sp.[sql_handle])as estWHERE sp.spid = owt.blocking_session_id)
.query_plan AS waiting_query_plan
sys.dm_tran_locksAS tlsINNERJOINsys.dm_os_waiting_tasks owt ON tls.lock_owner_address = owt.resource_addressINNERJOINsys.dm_exec_requests ers ON tls.request_request_id = ers.request_id AND owt.session_id = ers.session_idOUTERAPPLYsys.dm_exec_query_plan(ers.[plan_handle])AS qp
I was able to see that one spid was calling the stored procedure: execsp_MSacquiresnapshotdeliverysessionlock
Running dbcc inputbuffer(spid) on the other was was only showing the statement as begin tran. I killed this session and the replication commands were delivered and replication was working once more.
Then reinitialise the subscription and the problem was fixed.