Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 927236

Summary: Deadlock between PageSubscriptionImpl and PageTransactionInfoImpl
Product: [JBoss] JBoss Enterprise Application Platform 6
Component: HornetQ
Version: 6.1.0
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Miroslav Novak <mnovak>
Assignee: Francisco Borges <francisco.borges>
CC: anmiller, brian.stansberry, csuconic, myarboro
Target Milestone: ER6
Target Release: EAP 6.1.0
Keywords: TestBlocker
Doc Type: Bug Fix
Type: Bug
Attachments:
Description                      Flags
threadump_client.txt             none
threadump_deadlock_backup.txt    none

Description Miroslav Novak 2013-03-25 13:15:40 UTC
Description of problem:
We hit a deadlock during failover of a JMS client (using a TRANSACTED_SESSION) from the live server to the backup. The servers were configured to use a replicated journal.

How reproducible:
1. Start EAP 6.1.0.ER3 (HornetQ 2.3.0.CR1) live/backup pair in dedicated topology with replicated journal
2. Start producer - sends messages to queue
3. Start consumer - receives messages from queue
4. Kill live server (kill -9 ...)
Result:
Sometimes the consumer hung after failover to the backup at (threadump_client.txt):
org.jboss.qa.hornetq.apps.clients.ReceiverTransAck.receiveMessage(ReceiverTransAck.java:287)

This seems to be caused by a deadlock on the backup server (threadump_deadlock_backup.txt):

Java stack information for the threads listed above:
===================================================
"Thread-23 (HornetQ-server-HornetQServerImpl::serverUUID=null-1009921661)":
	at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl.getPageInfo(PageSubscriptionImpl.java:833)
	- waiting to lock <0x00000000cfb71508> (a org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl)
	at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl.processACK(PageSubscriptionImpl.java:900)
	at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl.lateDeliveryRollback(PageSubscriptionImpl.java:618)
	at org.hornetq.core.paging.impl.PageTransactionInfoImpl.rollback(PageTransactionInfoImpl.java:229)
	- locked <0x00000000cfd7f278> (a org.hornetq.core.paging.impl.PageTransactionInfoImpl)
	at org.hornetq.core.paging.impl.PagingStoreImpl$FinishPageMessageOperation.afterRollback(PagingStoreImpl.java:1013)
	at org.hornetq.core.transaction.impl.TransactionImpl.afterRollback(TransactionImpl.java:495)
	- locked <0x00000000cfd8cf00> (a org.hornetq.core.transaction.impl.TransactionImpl)
	at org.hornetq.core.transaction.impl.TransactionImpl.access$400(TransactionImpl.java:38)
	at org.hornetq.core.transaction.impl.TransactionImpl$4.done(TransactionImpl.java:365)
	at org.hornetq.core.persistence.impl.journal.OperationContextImpl$1.run(OperationContextImpl.java:236)
	at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:106)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:679)
"Thread-21 (HornetQ-server-HornetQServerImpl::serverUUID=null-1009921661)":
	at org.hornetq.core.paging.impl.PageTransactionInfoImpl.deliverAfterCommit(PageTransactionInfoImpl.java:249)
	- waiting to lock <0x00000000cfd7f278> (a org.hornetq.core.paging.impl.PageTransactionInfoImpl)
	at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl$CursorIterator.moveNext(PageSubscriptionImpl.java:1346)
	- locked <0x00000000cfb71508> (a org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl)
	at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl$CursorIterator.next(PageSubscriptionImpl.java:1261)
	- locked <0x00000000cfb6d208> (a org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl$CursorIterator)
	at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl$CursorIterator.hasNext(PageSubscriptionImpl.java:1415)
	- locked <0x00000000cfb6d208> (a org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl$CursorIterator)
	at org.hornetq.core.server.impl.QueueImpl.deliver(QueueImpl.java:1862)
	at org.hornetq.core.server.impl.QueueImpl.access$1200(QueueImpl.java:78)
	at org.hornetq.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:2523)
	at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:106)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:679)

Found 1 deadlock.
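
The trace above shows a lock-order inversion: Thread-23 holds the PageTransactionInfoImpl monitor and waits for the PageSubscriptionImpl monitor, while Thread-21 holds the PageSubscriptionImpl monitor and waits for the PageTransactionInfoImpl monitor. The following is a minimal, self-contained sketch of that pattern, not HornetQ code. The class name, the two plain `Object` monitors, and the thread names are hypothetical stand-ins. It uses the standard `ThreadMXBean.findMonitorDeadlockedThreads()` call, which reports the same kind of monitor cycle that the JVM thread dump summarizes as "Found 1 deadlock."

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderInversionDemo {
    // Hypothetical stand-ins for the two monitors seen in the thread dump.
    static final Object subscription = new Object();    // plays PageSubscriptionImpl
    static final Object pageTransaction = new Object(); // plays PageTransactionInfoImpl

    /**
     * Starts two daemon threads that take the monitors in opposite order and
     * returns the number of threads the JVM reports as monitor-deadlocked
     * (0 if none were reported within about five seconds).
     */
    static int provokeAndDetect() throws InterruptedException {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Like Thread-23: transaction monitor first, then subscription monitor.
        Thread rollback = new Thread(() -> {
            synchronized (pageTransaction) {
                awaitOther(bothHoldFirstLock);
                synchronized (subscription) {
                    // never reached once the deadlock forms
                }
            }
        }, "rollback-path");

        // Like Thread-21: subscription monitor first, then transaction monitor.
        Thread deliver = new Thread(() -> {
            synchronized (subscription) {
                awaitOther(bothHoldFirstLock);
                synchronized (pageTransaction) {
                    // never reached once the deadlock forms
                }
            }
        }, "deliver-path");

        // Daemon threads, so the stuck threads do not keep the JVM alive.
        rollback.setDaemon(true);
        deliver.setDaemon(true);
        rollback.start();
        deliver.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = threads.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    // Signals that this thread holds its first monitor, then waits for the other thread.
    static void awaitOther(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + provokeAndDetect());
    }
}
```

The latch only makes the inversion deterministic for the demo. In the real server the same interleaving depends on timing, which is consistent with the hang being reported as happening only sometimes.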

Comment 1 Miroslav Novak 2013-03-25 13:16:45 UTC
Created attachment 716012 [details]
threadump_client.txt

Comment 2 Miroslav Novak 2013-03-25 13:17:10 UTC
Created attachment 716013 [details]
threadump_deadlock_backup.txt

Comment 3 Clebert Suconic 2013-03-25 17:02:21 UTC
I believe this is fixed. We used to hit this in our testsuite as well, but not any longer. The next version should fix it.

Comment 4 Miroslav Novak 2013-04-02 07:33:53 UTC
I'll try with EAP 6.1.0.ER4 (HQ 2.3.0.CR2). Moving to ON_QA.

Comment 6 Miroslav Novak 2013-04-11 10:35:34 UTC
This issue was hit again with EAP 6.1.0.ER4 (HQ 2.3.0.CR2).

The server had to be patched with the fix from bz#948247 - "Backup can't start after fallback - java.lang.IllegalStateException: ServiceBuilder is already installed" because that bug was blocking this testing. Moving to ASSIGNED.

Comment 8 Francisco Borges 2013-04-18 13:03:33 UTC
This commit https://github.com/FranciscoBorges/hornetq/commit/fdd0ee4216e8b6f66a7b887d69f006248f85e51c is supposed to fix this.

Right now this is on an unmerged PR https://github.com/hornetq/hornetq/pull/1006
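
The content of the linked commit is not reproduced here. As a general illustration only, one common remedy for this class of deadlock is to make every code path acquire the two monitors in the same order. The sketch below assumes that approach; the class, the method names, and the plain `Object` monitors are hypothetical and are not taken from HornetQ or from the commit.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class OrderedLockingSketch {
    // Hypothetical stand-ins for the two monitors involved in the deadlock.
    static final Object subscription = new Object();
    static final Object pageTransaction = new Object();
    static final AtomicInteger completed = new AtomicInteger();

    // Delivery path: subscription monitor first, then transaction monitor.
    static void deliverPath() {
        synchronized (subscription) {
            synchronized (pageTransaction) {
                completed.incrementAndGet();
            }
        }
    }

    // Rollback path: same order as the delivery path, so no cycle can form.
    static void rollbackPath() {
        synchronized (subscription) {
            synchronized (pageTransaction) {
                completed.incrementAndGet();
            }
        }
    }

    /** Runs both paths concurrently; returns true if both finished within 10 seconds. */
    static boolean runConcurrently(int iterations) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.execute(() -> {
            for (int i = 0; i < iterations; i++) {
                deliverPath();
            }
        });
        pool.execute(() -> {
            for (int i = 0; i < iterations; i++) {
                rollbackPath();
            }
        });
        pool.shutdown();
        return pool.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("finished without deadlock: " + runConcurrently(10_000));
    }
}
```

Another option in the same spirit is to avoid calling into the other object while holding a monitor at all, which removes the nested acquisition instead of ordering it.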

Comment 9 Paul Gier 2013-04-18 21:03:49 UTC
Changing status to POST because PR has not yet been merged.
Unsetting target milestone, because it hasn't been determined yet which EAP build will have this fix.

Comment 10 Francisco Borges 2013-04-19 07:53:45 UTC
The PR is merged and the commit is in our master branch:

https://github.com/hornetq/hornetq/commit/6873ef0118abb8d79d2b5b4aac7df141e94b6a2b

The PR:
https://github.com/hornetq/hornetq/pull/1006

Comment 11 Miroslav Novak 2013-05-03 11:26:29 UTC
I cannot hit the deadlock anymore. This appears to be fixed. Thanks for the fix, Francisco!

Verified in EAP 6.1.0.ER6