Bug 1292768

Summary: one-off (BZ1290841) - Message Loss or Duplicate in cluster with network failures
Product: [JBoss] JBoss Enterprise Application Platform 6 Reporter: Miroslav Novak <mnovak>
Component: HornetQ Assignee: jboss-set
Status: CLOSED UPSTREAM QA Contact: Miroslav Novak <mnovak>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 6.4.5 CC: clichybi, csuconic, istudens, msvehla, toross
Target Milestone: ---   
Target Release: One-off release   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1295675 Environment:
Last Closed: 2025-02-10 03:48:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1290841, 1295675    
Bug Blocks:    

Description Miroslav Novak 2015-12-18 10:04:07 UTC
Description of problem:
A message is lost in a cluster with network failures. This appears to be the same issue as bz#1111628, but this time a small message was lost.

Version-Release number of selected component (if applicable):
EAP 6.4.5.CP + one-off BZ1290841

Test scenario - "suspend server in remote JCA topology in cluster" [1]
1. Start servers 1 and 3 in a HornetQ cluster (jms cluster) with queues InQueue and OutQueue deployed
2. Start servers 2 and 4 (mdb servers), which have a resource adapter configured to connect to the jms cluster
3. Start sending 10 000 messages to InQueue in the jms cluster and deploy an MDB to each mdb server
   - The MDB consumes messages from InQueue and, for each message, sends a new message to OutQueue. The MDB does a JNDI lookup of OutQueue for each message.
5. While the MDBs are processing messages, suspend the process of server 3 (jms server) for 10 minutes, then resume it
6. Wait until all messages from InQueue are processed
7. Receive messages from OutQueue

The test failed because 1 message sent to InQueue had no corresponding message in OutQueue:
Lost message detected - there are not corresponding messages for: [ID:e1ff20d8-a548-11e5-b1ee-795adfbb9564]
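
For readers reproducing the check, here is a minimal sketch of how lost and duplicated messages could be detected. It assumes the IDs of the messages sent to InQueue were recorded, and that each message received from OutQueue carries the original ID in its inMessageId property (as the trace below suggests). The class and method names are hypothetical and are not taken from the actual test suite.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of the real test suite.
class LostMessageCheck {

    /** IDs sent to InQueue that have no corresponding message in OutQueue. */
    static Set<String> findLost(List<String> sentIds, List<String> receivedInMessageIds) {
        Set<String> lost = new TreeSet<>(sentIds);
        lost.removeAll(receivedInMessageIds);
        return lost;
    }

    /** IDs that occur more than once among the messages received from OutQueue. */
    static Set<String> findDuplicates(List<String> receivedInMessageIds) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (String id : receivedInMessageIds) {
            if (!seen.add(id)) {
                duplicates.add(id); // second or later occurrence of this ID
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Made-up IDs; a real run would use the JMS message IDs.
        List<String> sent = Arrays.asList("ID:a", "ID:b", "ID:c");
        List<String> received = Arrays.asList("ID:a", "ID:c", "ID:c");
        System.out.println("Lost: " + findLost(sent, received));
        System.out.println("Duplicated: " + findDuplicates(received));
    }
}
```

Checking both directions matters here because the failure can show up either as a missing message or as a duplicate.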

I've checked the logs and see that the message was processed by the MDB on server 2 before the suspend and sent to OutQueue on server 3. Server 3 tried to load-balance this message and send it to server 1 at the moment server 3 was suspended. After 10 minutes server 3 was resumed and tried to finish sending this message to server 1.
But server 1 disconnected from server 3 and the message was lost.

Last logs from server 3 are:
node-3-log/server-trace.log:00:45:09,664 TRACE [org.hornetq.core.server] (Thread-12 (HornetQ-server-HornetQServerImpl::serverUUID=c61e4996-a548-11e5-8582-bbe52863ed4c-1455706055)) Clustered bridge  copied message ServerMessage[messageID=80056,userID=19ef808e-a549-11e5-9573-8b2db847ede4,priority=4, bodySize=872,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[inMessageId=ID:e1ff20d8-a548-11e5-b1ee-795adfbb9564,__HQ_CID=f2d6cdc1-a548-11e5-9573-8b2db847ede4,_HQ_ROUTE_TOsf.my-cluster.b53473dc-a548-11e5-ad40-a1735f027ff9=[0000 0000 0000 0022),bytesAsLongs(34],_HQ_DUPL_ID=NULL-value]]@1583730442 as ServerMessage[messageID=80056,userID=19ef808e-a549-11e5-9573-8b2db847ede4,priority=4, bodySize=616,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[inMessageId=ID:e1ff20d8-a548-11e5-b1ee-795adfbb9564,__HQ_CID=f2d6cdc1-a548-11e5-9573-8b2db847ede4,_HQ_ROUTE_TOsf.my-cluster.b53473dc-a548-11e5-ad40-a1735f027ff9=[0000 0000 0000 0022),bytesAsLongs(34],_HQ_DUPL_ID=NULL-value]]@614579463 before delivery
node-3-log/server-trace.log:00:45:09,664 TRACE [org.hornetq.core.server] (Thread-12 (HornetQ-server-HornetQServerImpl::serverUUID=c61e4996-a548-11e5-8582-bbe52863ed4c-1455706055)) going to send message ServerMessage[messageID=80056,userID=19ef808e-a549-11e5-9573-8b2db847ede4,priority=4, bodySize=616,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[inMessageId=ID:e1ff20d8-a548-11e5-b1ee-795adfbb9564,_HQ_BRIDGE_DUP=[C61E 4996 A548 11E5 8582 BBE5 2863 ED4C 0000 0000 0001 38B8),__HQ_CID=f2d6cdc1-a548-11e5-9573-8b2db847ede4,_HQ_ROUTE_TO=[0000 0000 0000 0022),bytesAsLongs(34],_HQ_DUPL_ID=NULL-value]]@614579463
node-3-log/server-trace.log:00:45:09,665 TRACE [org.hornetq.core.client] (Thread-12 (HornetQ-server-HornetQServerImpl::serverUUID=c61e4996-a548-11e5-8582-bbe52863ed4c-1455706055)) Sending packet nonblocking PACKET(SessionSendMessage)[type=71, channelID=10, packetObject=SessionSendMessage,message=ServerMessage[messageID=80056,userID=19ef808e-a549-11e5-9573-8b2db847ede4,priority=4, bodySize=616,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[inMessageId=ID:e1ff20d8-a548-11e5-b1ee-795adfbb9564,_HQ_BRIDGE_DUP=[C61E 4996 A548 11E5 8582 BBE5 2863 ED4C 0000 0000 0001 38B8),__HQ_CID=f2d6cdc1-a548-11e5-9573-8b2db847ede4,_HQ_ROUTE_TO=[0000 0000 0000 0022),bytesAsLongs(34],_HQ_DUPL_ID=NULL-value]]@614579463] on channeID=10
node-3-log/server-trace.log:00:45:10,243 TRACE [org.hornetq.core.server] (Old I/O client worker ([id: 0xd1bea55b, /127.0.0.1:38948 => localhost/127.0.0.1:5445])) ClusterConnectionBridge@36cbf30e [name=sf.my-cluster.b53473dc-a548-11e5-ad40-a1735f027ff9, queue=QueueImpl[name=sf.my-cluster.b53473dc-a548-11e5-ad40-a1735f027ff9, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=c61e4996-a548-11e5-8582-bbe52863ed4c]]@5d535819 targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@36cbf30e [name=sf.my-cluster.b53473dc-a548-11e5-ad40-a1735f027ff9, queue=QueueImpl[name=sf.my-cluster.b53473dc-a548-11e5-ad40-a1735f027ff9, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=c61e4996-a548-11e5-8582-bbe52863ed4c]]@5d535819 targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=localhost], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@2144569865[nodeUUID=c61e4996-a548-11e5-8582-bbe52863ed4c, connector=TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=7445&host=localhost, address=jms, server=HornetQServerImpl::serverUUID=c61e4996-a548-11e5-8582-bbe52863ed4c])) [initialConnectors=[TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=localhost], discoveryGroupConfiguration=null]] Acking Reference[80056]:RELIABLE:ServerMessage[messageID=80056,userID=19ef808e-a549-11e5-9573-8b2db847ede4,priority=4, bodySize=872,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[inMessageId=ID:e1ff20d8-a548-11e5-b1ee-795adfbb9564,__HQ_CID=f2d6cdc1-a548-11e5-9573-8b2db847ede4,_HQ_ROUTE_TOsf.my-cluster.b53473dc-a548-11e5-ad40-a1735f027ff9=[0000 0000 0000 0022),bytesAsLongs(34],_HQ_DUPL_ID=NULL-value]]@1583730442 on queue 
QueueImpl[name=sf.my-cluster.b53473dc-a548-11e5-ad40-a1735f027ff9, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=c61e4996-a548-11e5-8582-bbe52863ed4c]]@5d535819
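
When following a single message through trace logs like the ones above, it can help to pull out the messageID and inMessageId fields and to flag the "Acking Reference" lines. The sketch below is one way to do that, assuming the field formats seen in these excerpts. The class name and the abbreviated sample line are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log helper; the patterns assume the field layout shown in the excerpts above.
class TraceLineFields {
    private static final Pattern MESSAGE_ID = Pattern.compile("messageID=(\\d+)");
    private static final Pattern IN_MESSAGE_ID = Pattern.compile("inMessageId=(ID:[0-9a-f-]+)");

    /** Returns the first capture group of the pattern, or null if the line does not match. */
    private static String firstGroup(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    static String messageId(String line) {
        return firstGroup(MESSAGE_ID, line);
    }

    static String inMessageId(String line) {
        return firstGroup(IN_MESSAGE_ID, line);
    }

    /** True for lines where the bridge acknowledges a message reference. */
    static boolean isAck(String line) {
        return line.contains("Acking Reference[");
    }

    public static void main(String[] args) {
        // Abbreviated version of the last trace line above.
        String line = "Acking Reference[80056]:RELIABLE:ServerMessage[messageID=80056,"
                + "properties=TypedProperties[inMessageId=ID:e1ff20d8-a548-11e5-b1ee-795adfbb9564,"
                + "_HQ_DUPL_ID=NULL-value]]";
        System.out.println(messageId(line) + " " + inMessageId(line) + " ack=" + isAck(line));
    }
}
```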

It appears that server 3 acked this message even though it was not delivered/load-balanced to server 1.
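
To illustrate why the timing of the ack matters, here is a toy model. It is not HornetQ's bridge code, and every name in it is made up. It contrasts a forwarder that removes a message from the source queue right after a non-blocking send with one that removes it only after the target confirms receipt. Whether the real bridge behaves like the first variant is exactly the suspicion raised above, not something this sketch proves.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy model only; all names are hypothetical and unrelated to HornetQ internals.
class BridgeAckModel {

    /** Stand-in for the connection to the target server. */
    static class Link {
        boolean connected = true;
        final List<String> delivered = new ArrayList<>();

        /** Returns true only if the target received the message (its "confirmation"). */
        boolean send(String message) {
            if (!connected) {
                return false;
            }
            delivered.add(message);
            return true;
        }
    }

    /** Acks (removes from the source queue) right after sending, ignoring the outcome. */
    static void forwardAckOnSend(Deque<String> source, Link link) {
        while (!source.isEmpty()) {
            link.send(source.peek()); // result ignored
            source.poll();            // acked even if nothing arrived
        }
    }

    /** Acks only after the target confirmed; otherwise keeps the message for a retry. */
    static void forwardAckOnConfirm(Deque<String> source, Link link) {
        while (!source.isEmpty()) {
            if (!link.send(source.peek())) {
                return; // link is down: leave the message in the source queue
            }
            source.poll();
        }
    }

    public static void main(String[] args) {
        Link link = new Link();
        link.connected = false; // target disconnected, as server 1 did in this report

        Deque<String> early = new ArrayDeque<>(Arrays.asList("m1"));
        forwardAckOnSend(early, link);
        System.out.println("ack on send:    queued=" + early + " delivered=" + link.delivered);

        Deque<String> late = new ArrayDeque<>(Arrays.asList("m1"));
        forwardAckOnConfirm(late, link);
        System.out.println("ack on confirm: queued=" + late + " delivered=" + link.delivered);
    }
}
```

In the second variant the message survives the disconnect and can be sent again once the link is back. A retry after a lost confirmation can produce duplicates, which is why this pattern is normally paired with duplicate detection on the receiving side.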

I'll provide logs, configs and journals from the servers.

Comment 11 Miroslav Novak 2016-01-06 15:29:27 UTC
*** Bug 1296203 has been marked as a duplicate of this bug. ***

Comment 17 JBoss JIRA Server 2016-01-12 13:35:10 UTC
Andy Taylor <ataylor> updated the status of jira JBEAP-2100 to Closed

Comment 18 JBoss JIRA Server 2016-01-12 13:35:13 UTC
Andy Taylor <ataylor> updated the status of jira JBEAP-2100 to Reopened

Comment 19 JBoss JIRA Server 2016-01-12 16:47:44 UTC
Andy Taylor <ataylor> updated the status of jira JBEAP-2100 to Resolved

Comment 20 JBoss JIRA Server 2016-01-19 07:33:41 UTC
Miroslav Novak <mnovak> updated the status of jira JBEAP-2100 to Closed

Comment 21 JBoss JIRA Server 2016-01-19 07:33:51 UTC
Miroslav Novak <mnovak> updated the status of jira JBEAP-2100 to Reopened

Comment 22 JBoss JIRA Server 2016-01-19 07:34:01 UTC
Miroslav Novak <mnovak> updated the status of jira JBEAP-2100 to Resolved

Comment 23 JBoss JIRA Server 2016-01-19 07:34:21 UTC
Miroslav Novak <mnovak> updated the status of jira JBEAP-2100 to Closed

Comment 24 JBoss JIRA Server 2016-01-19 07:34:25 UTC
Miroslav Novak <mnovak> updated the status of jira JBEAP-2100 to Reopened

Comment 25 JBoss JIRA Server 2016-01-19 07:34:34 UTC
Miroslav Novak <mnovak> updated the status of jira JBEAP-2100 to Resolved

Comment 26 JBoss JIRA Server 2016-07-13 16:41:09 UTC
Jiri Pallich <jpallich> updated the status of jira JBEAP-2100 to Closed

Comment 27 Red Hat Bugzilla 2025-02-10 03:48:38 UTC
This product has been discontinued or is no longer tracked in Red Hat Bugzilla.