Bug 1292768 - one-off (BZ1290841) - Message Loss or Duplicate in cluster with network failures
Summary: one-off (BZ1290841) - Message Loss or Duplicate in cluster with network failures
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: HornetQ
Version: 6.4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: One-off release
Assignee: jboss-set
QA Contact: Miroslav Novak
URL:
Whiteboard:
Duplicates: 1296203
Depends On: 1290841 1295675
Blocks:
Reported: 2015-12-18 10:04 UTC by Miroslav Novak
Modified: 2025-02-10 03:48 UTC (History)

Fixed In Version:
Clone Of:
: 1295675 (view as bug list)
Environment:
Last Closed: 2025-02-10 03:48:38 UTC
Type: Bug
Embargoed:




Links
System: Red Hat Issue Tracker
ID: JBEAP-2100
Private: 0
Priority: Blocker
Status: Closed
Summary: Lost large message in cluster with network failures
Last Updated: 2020-03-12 10:28:54 UTC

Description Miroslav Novak 2015-12-18 10:04:07 UTC
Description of problem:
A message is lost in a cluster with network failures. This appears to be the same issue as bz#1111628, but this time a small message was lost.

Version-Release number of selected component (if applicable):
EAP 6.4.5.CP + one-off BZ1290841

Test scenario - "suspend server in remote JCA topology in cluster" [1]
1. Start servers 1 and 3 in a HornetQ cluster (jms cluster) with queues InQueue and OutQueue deployed
2. Start servers 2 and 4 (mdb servers), which have a resource adapter configured to connect to the jms cluster
3. Start sending 10 000 messages to InQueue in the jms cluster and deploy an MDB to each mdb server
   - The MDB consumes messages from InQueue and, for each message, sends a new message to OutQueue. The MDB performs a JNDI lookup of OutQueue for each message.
5. While the MDBs are processing messages, suspend the process of server 3 (jms server) for 10 minutes, then resume it
6. Wait until all messages from InQueue are processed
7. Receive messages from OutQueue

The test failed because 1 message sent to InQueue had no corresponding message in OutQueue:
Lost message detected - there are not corresponding messages for: [ID:e1ff20d8-a548-11e5-b1ee-795adfbb9564]
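
The check behind this failure can be sketched roughly as follows. This is a minimal illustration, not the actual test-suite code: the class and method names (MessageVerifier, findLost, findDuplicates) are hypothetical, and it assumes that each OutQueue message carries the ID of its source InQueue message in the inMessageId property, as the trace logs in this report suggest.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the real test suite. */
final class MessageVerifier {

    /** Returns IDs sent to InQueue that have no matching inMessageId on OutQueue. */
    static Set<String> findLost(List<String> sentIds, List<String> receivedInMessageIds) {
        Set<String> received = new HashSet<>(receivedInMessageIds);
        Set<String> lost = new LinkedHashSet<>(sentIds); // keeps send order
        lost.removeAll(received);
        return lost;
    }

    /** Returns inMessageId values that were received more than once. */
    static Set<String> findDuplicates(List<String> receivedInMessageIds) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String id : receivedInMessageIds) {
            if (!seen.add(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Made-up sample data; only the first ID is taken from this report.
        List<String> sent = List.of("ID:e1ff20d8-a548-11e5-b1ee-795adfbb9564", "ID:example-2");
        List<String> received = List.of("ID:example-2");
        System.out.println("Lost: " + findLost(sent, received));
        System.out.println("Duplicated: " + findDuplicates(received));
    }
}
```

A check of this shape covers both halves of the bug title: an ID returned by findLost is a lost message, and an ID returned by findDuplicates is a duplicated one.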

I checked the logs and see that the message was processed by the MDB on server 2 before the suspend and was sent to OutQueue on server 3. Server 3 tried to load-balance this message and was sending it to server 1 at the moment server 3 was suspended. After 10 minutes server 3 was resumed and tried to finish sending this message to server 1.
But server 1 disconnected from server 3 and the message was lost.

Last logs from server 3 are:
node-3-log/server-trace.log:00:45:09,664 TRACE [org.hornetq.core.server] (Thread-12 (HornetQ-server-HornetQServerImpl::serverUUID=c61e4996-a548-11e5-8582-bbe52863ed4c-1455706055)) Clustered bridge  copied message ServerMessage[messageID=80056,userID=19ef808e-a549-11e5-9573-8b2db847ede4,priority=4, bodySize=872,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[inMessageId=ID:e1ff20d8-a548-11e5-b1ee-795adfbb9564,__HQ_CID=f2d6cdc1-a548-11e5-9573-8b2db847ede4,_HQ_ROUTE_TOsf.my-cluster.b53473dc-a548-11e5-ad40-a1735f027ff9=[0000 0000 0000 0022),bytesAsLongs(34],_HQ_DUPL_ID=NULL-value]]@1583730442 as ServerMessage[messageID=80056,userID=19ef808e-a549-11e5-9573-8b2db847ede4,priority=4, bodySize=616,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[inMessageId=ID:e1ff20d8-a548-11e5-b1ee-795adfbb9564,__HQ_CID=f2d6cdc1-a548-11e5-9573-8b2db847ede4,_HQ_ROUTE_TOsf.my-cluster.b53473dc-a548-11e5-ad40-a1735f027ff9=[0000 0000 0000 0022),bytesAsLongs(34],_HQ_DUPL_ID=NULL-value]]@614579463 before delivery
node-3-log/server-trace.log:00:45:09,664 TRACE [org.hornetq.core.server] (Thread-12 (HornetQ-server-HornetQServerImpl::serverUUID=c61e4996-a548-11e5-8582-bbe52863ed4c-1455706055)) going to send message ServerMessage[messageID=80056,userID=19ef808e-a549-11e5-9573-8b2db847ede4,priority=4, bodySize=616,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[inMessageId=ID:e1ff20d8-a548-11e5-b1ee-795adfbb9564,_HQ_BRIDGE_DUP=[C61E 4996 A548 11E5 8582 BBE5 2863 ED4C 0000 0000 0001 38B8),__HQ_CID=f2d6cdc1-a548-11e5-9573-8b2db847ede4,_HQ_ROUTE_TO=[0000 0000 0000 0022),bytesAsLongs(34],_HQ_DUPL_ID=NULL-value]]@614579463
node-3-log/server-trace.log:00:45:09,665 TRACE [org.hornetq.core.client] (Thread-12 (HornetQ-server-HornetQServerImpl::serverUUID=c61e4996-a548-11e5-8582-bbe52863ed4c-1455706055)) Sending packet nonblocking PACKET(SessionSendMessage)[type=71, channelID=10, packetObject=SessionSendMessage,message=ServerMessage[messageID=80056,userID=19ef808e-a549-11e5-9573-8b2db847ede4,priority=4, bodySize=616,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[inMessageId=ID:e1ff20d8-a548-11e5-b1ee-795adfbb9564,_HQ_BRIDGE_DUP=[C61E 4996 A548 11E5 8582 BBE5 2863 ED4C 0000 0000 0001 38B8),__HQ_CID=f2d6cdc1-a548-11e5-9573-8b2db847ede4,_HQ_ROUTE_TO=[0000 0000 0000 0022),bytesAsLongs(34],_HQ_DUPL_ID=NULL-value]]@614579463] on channeID=10
node-3-log/server-trace.log:00:45:10,243 TRACE [org.hornetq.core.server] (Old I/O client worker ([id: 0xd1bea55b, /127.0.0.1:38948 => localhost/127.0.0.1:5445])) ClusterConnectionBridge@36cbf30e [name=sf.my-cluster.b53473dc-a548-11e5-ad40-a1735f027ff9, queue=QueueImpl[name=sf.my-cluster.b53473dc-a548-11e5-ad40-a1735f027ff9, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=c61e4996-a548-11e5-8582-bbe52863ed4c]]@5d535819 targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@36cbf30e [name=sf.my-cluster.b53473dc-a548-11e5-ad40-a1735f027ff9, queue=QueueImpl[name=sf.my-cluster.b53473dc-a548-11e5-ad40-a1735f027ff9, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=c61e4996-a548-11e5-8582-bbe52863ed4c]]@5d535819 targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=localhost], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@2144569865[nodeUUID=c61e4996-a548-11e5-8582-bbe52863ed4c, connector=TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=7445&host=localhost, address=jms, server=HornetQServerImpl::serverUUID=c61e4996-a548-11e5-8582-bbe52863ed4c])) [initialConnectors=[TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=localhost], discoveryGroupConfiguration=null]] Acking Reference[80056]:RELIABLE:ServerMessage[messageID=80056,userID=19ef808e-a549-11e5-9573-8b2db847ede4,priority=4, bodySize=872,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[inMessageId=ID:e1ff20d8-a548-11e5-b1ee-795adfbb9564,__HQ_CID=f2d6cdc1-a548-11e5-9573-8b2db847ede4,_HQ_ROUTE_TOsf.my-cluster.b53473dc-a548-11e5-ad40-a1735f027ff9=[0000 0000 0000 0022),bytesAsLongs(34],_HQ_DUPL_ID=NULL-value]]@1583730442 on queue 
QueueImpl[name=sf.my-cluster.b53473dc-a548-11e5-ad40-a1735f027ff9, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=c61e4996-a548-11e5-8582-bbe52863ed4c]]@5d535819
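
For anyone repeating this analysis, the sequence above can be pulled out of a large server-trace.log by filtering on the inMessageId property. The following is only a sketch: the class name TraceFilter and the event labels are hypothetical, and the marker strings ("copied message", "going to send message", "Sending packet", "Acking Reference") are simply the phrases visible in the log lines quoted above, not an exhaustive list of HornetQ trace messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log filter for illustration only. */
final class TraceFilter {

    // Matches the property as printed in the TypedProperties dump, e.g. inMessageId=ID:e1ff20d8-...
    private static final Pattern IN_MESSAGE_ID = Pattern.compile("inMessageId=(ID:[0-9a-fA-F-]+)");

    /** Extracts the first inMessageId value found in a trace line, if any. */
    static Optional<String> inMessageId(String line) {
        Matcher m = IN_MESSAGE_ID.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Maps a trace line to a coarse event label based on phrases seen in this report. */
    static String classify(String line) {
        if (line.contains("Acking Reference")) return "ACK";
        if (line.contains("Sending packet")) return "SEND_PACKET";
        if (line.contains("going to send message")) return "BRIDGE_SEND";
        if (line.contains("copied message")) return "BRIDGE_COPY";
        return "OTHER";
    }

    /** Returns the ordered event labels for all lines that mention the given JMS message ID. */
    static List<String> timeline(List<String> lines, String jmsId) {
        List<String> events = new ArrayList<>();
        for (String line : lines) {
            Optional<String> id = inMessageId(line);
            if (id.isPresent() && id.get().equals(jmsId)) {
                events.add(classify(line));
            }
        }
        return events;
    }
}
```

Applied to the four log entries above, a filter like this would show the bridge copy, the bridge send, the non-blocking packet send and finally the ack, with no sign of the message arriving on server 1.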

It appears that server 3 acked this message even though it was never delivered/load-balanced to server 1.
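
To illustrate why such an ack would lose the message, here is a toy model of the suspected behaviour. It is not HornetQ code and makes no claim about how the cluster bridge is actually implemented; the class BridgeModel, the AckPolicy enum and the targetReachable flag are all hypothetical names introduced for this sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of a store-and-forward bridge; purely illustrative. */
final class BridgeModel {

    enum AckPolicy { ON_SEND, ON_CONFIRMATION }

    final Deque<String> sourceQueue = new ArrayDeque<>(); // store-and-forward queue on the sending node
    final List<String> targetQueue = new ArrayList<>();   // destination queue on the receiving node

    /**
     * Forwards the head message. targetReachable=false models the target
     * dropping the connection before the packet arrives.
     */
    void forward(AckPolicy policy, boolean targetReachable) {
        String msg = sourceQueue.peek();
        if (msg == null) {
            return;
        }
        if (policy == AckPolicy.ON_SEND) {
            sourceQueue.poll(); // acked as soon as the packet is written
        }
        if (targetReachable) {
            targetQueue.add(msg);
            if (policy == AckPolicy.ON_CONFIRMATION) {
                sourceQueue.poll(); // acked only once the target has the message
            }
        }
    }

    /** A message is lost if neither node holds it any more. */
    boolean isLost(String msg) {
        return !sourceQueue.contains(msg) && !targetQueue.contains(msg);
    }
}
```

In this model, forwarding with ON_SEND while the target is unreachable leaves the message on neither node, which matches the symptom reported here. With ON_CONFIRMATION the message stays in the source queue and can be retried.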

I'll provide logs, configs and journals from the servers.

Comment 11 Miroslav Novak 2016-01-06 15:29:27 UTC
*** Bug 1296203 has been marked as a duplicate of this bug. ***

Comment 17 JBoss JIRA Server 2016-01-12 13:35:10 UTC
Andy Taylor <ataylor> updated the status of jira JBEAP-2100 to Closed

Comment 18 JBoss JIRA Server 2016-01-12 13:35:13 UTC
Andy Taylor <ataylor> updated the status of jira JBEAP-2100 to Reopened

Comment 19 JBoss JIRA Server 2016-01-12 16:47:44 UTC
Andy Taylor <ataylor> updated the status of jira JBEAP-2100 to Resolved

Comment 20 JBoss JIRA Server 2016-01-19 07:33:41 UTC
Miroslav Novak <mnovak> updated the status of jira JBEAP-2100 to Closed

Comment 21 JBoss JIRA Server 2016-01-19 07:33:51 UTC
Miroslav Novak <mnovak> updated the status of jira JBEAP-2100 to Reopened

Comment 22 JBoss JIRA Server 2016-01-19 07:34:01 UTC
Miroslav Novak <mnovak> updated the status of jira JBEAP-2100 to Resolved

Comment 23 JBoss JIRA Server 2016-01-19 07:34:21 UTC
Miroslav Novak <mnovak> updated the status of jira JBEAP-2100 to Closed

Comment 24 JBoss JIRA Server 2016-01-19 07:34:25 UTC
Miroslav Novak <mnovak> updated the status of jira JBEAP-2100 to Reopened

Comment 25 JBoss JIRA Server 2016-01-19 07:34:34 UTC
Miroslav Novak <mnovak> updated the status of jira JBEAP-2100 to Resolved

Comment 26 JBoss JIRA Server 2016-07-13 16:41:09 UTC
Jiri Pallich <jpallich> updated the status of jira JBEAP-2100 to Closed

Comment 27 Red Hat Bugzilla 2025-02-10 03:48:38 UTC
This product has been discontinued or is no longer tracked in Red Hat Bugzilla.

