Rahman Agoro, the frontline SQL DBA

Just another WordPress.com weblog

Posted by rahmanagoro on August 12, 2011


I was working with SQL 2008 replication and on checking replication monitor, I noticed that on the distribution agent for one of the databases, it had a retrying command status. I further looked into it to see what the problem was, here I can see that the distribution agent had become stucked whilst trying to send commands to the subscriber. The first thing that came to my mind was, “what is happening with the subscriber ? ”

I ran the following on the subscriber

use master
GO
sp_who2 active

I was able to see that there is a blocking spid at the moment, which is coming from my distributor. So what must have happened here is that, as a snapshot was being delivered to the subscriber, objects such as stored procedures and functions were also being delivered as well, thereby causing a deadlock to occur.

To confirm this, I can the following

from distributor

I had entries like

1092 2010-08-01 00:35:16.560 0 0 MSSQL_ENG HYT00 Query timeout expired 0x0001262200000210000100000002 1 0
1092 2010-08-01 00:35:16.557 0 1 MSSQL_ENG if @@trancount > 0 rollback tran 0x0001262200000210000100000002 1 0

On the subscriber, I confirmed the locking scenario by looking at the wait type for the replication SPID, and it was a LCK_M_S wait type, which confirmed it was a deadlocking situation.

I also ran a deadlocking detail script as well.

See

— Detailed blocking information with query information

SET

SELECT

TRANSACTIONISOLATIONLEVELREADUNCOMMITTED

owt

.session_id AS waiting_session_id,

DB_NAME(tls.resource_database_id)AS database_name,

owt

.wait_duration_ms,

— owt.waiting_task_address,

owt

.wait_type,

— tls.resource_associated_entity_id,

— tls.resource_description AS local_resource_description,

tls

.resource_type,

(CASE tls.resource_typeWHEN‘OBJECT’THENOBJECT_NAME(tls.resource_associated_entity_id, tls.resource_database_id)

WHEN‘DATABASE’THENDB_NAME(tls.resource_database_id)

ELSE (SELECTOBJECT_NAME(pat.[object_id], tls.resource_database_id)

FROMsys.partitions pat WHERE pat.hobt_id = tls.resource_associated_entity_id)

END

)ASobject_name,

tls

.request_mode,

tls

.request_type,

tls

.request_session_id,

owt

.blocking_session_id,

owt

.resource_description AS blocking_resource_description,

(SELECTSUBSTRING(est.[text], ers.statement_start_offset/2 + 1,

(CASEWHEN ers.statement_end_offset =1THENLEN(CONVERT(nvarchar(max), est.[text]))* 2ELSE ers.statement_end_offsetEND

ers.statement_start_offset)/ 2)

FROMsys.dm_exec_sql_text(ers.[sql_handle])AS est)AS waiting_query_text,

CASEWHEN owt.blocking_session_id > 0 THEN (

SELECT

est

.[text] FROMsys.sysprocessesAS spCROSSAPPLYsys.dm_exec_sql_text(sp.[sql_handle])as estWHERE sp.spid = owt.blocking_session_id)

ELSE

NULL

ENDAS blocking_query_text,

qp

.query_plan AS waiting_query_plan

FROM

sys.dm_tran_locksAS tlsINNERJOINsys.dm_os_waiting_tasks owt ON tls.lock_owner_address = owt.resource_addressINNERJOINsys.dm_exec_requests ers ON tls.request_request_id = ers.request_id AND owt.session_id = ers.session_idOUTERAPPLYsys.dm_exec_query_plan(ers.[plan_handle])AS qp

GO

I was able to see that one spid was calling the stored procedure: execsp_MSacquiresnapshotdeliverysessionlock

Running dbcc inputbuffer(spid) on the other was was only showing the statement as begin tran. I killed this session and the replication commands were delivered and replication was working once more.

Then reinitialise the subscription and the problem was fixed.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: