Bug 1310537

Summary: Lost large messages if backup is shutdown during synchronization
Product: [JBoss] JBoss Enterprise Application Platform 6 Reporter: Miroslav Novak <mnovak>
Component: HornetQ Assignee: baranowb <bbaranow>
Status: CLOSED WONTFIX QA Contact: Miroslav Novak <mnovak>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.4.6 CC: bbaranow, csuconic, egonzale, msvehla, ppenicka
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-09-19 13:13:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---

Description Miroslav Novak 2016-02-22 07:59:47 UTC
Cloned from https://issues.jboss.org/browse/JBEAP-3419:

Test scenario:
1. Start the live server with a replicated journal and queue testQueue0.
2. Send 500 large messages to testQueue0 on live.
3. Start the backup server and start receiving messages from testQueue0 (session in CLIENT_ACKNOWLEDGE mode).
4. Before backup is announced/synchronized with live, cleanly shut down backup.
5. Wait until the receiver consumes all messages.

Expected result:
Receiver consumed 500 messages. No losses or duplicates.
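
One way to check the "no losses or duplicates" condition is to compare the IDs that were sent with the IDs the receiver consumed. The following is only a minimal sketch: MessageAudit and its methods are hypothetical names, not part of the test suite or of HornetQ, and it assumes the test records message IDs as strings on both sides.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: compares the IDs a test sent with the IDs its receiver consumed. */
final class MessageAudit {
    final Set<String> lost;        // sent but never received
    final Set<String> duplicated;  // received more than once

    private MessageAudit(Set<String> lost, Set<String> duplicated) {
        this.lost = lost;
        this.duplicated = duplicated;
    }

    static MessageAudit compare(List<String> sentIds, List<String> receivedIds) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicated = new TreeSet<>();
        for (String id : receivedIds) {
            if (!seen.add(id)) {
                duplicated.add(id); // add() returned false, so this ID was already seen
            }
        }
        Set<String> lost = new TreeSet<>(sentIds);
        lost.removeAll(seen);       // whatever remains was never delivered
        return new MessageAudit(lost, duplicated);
    }

    /** True only when every sent message was received exactly once. */
    boolean passed() {
        return lost.isEmpty() && duplicated.isEmpty();
    }
}
```

For this scenario the check would be fed the 500 sent IDs and whatever the receiver collected; a non-empty lost set corresponds to the failure described under "Actual result".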

Actual result:
There are lost messages. Client did not receive all messages. Messages are not in the journal of live server after the test.

Tracking the message ID of a lost message shows that the message was sent to the receiver. Because it is a large message, the receiver tries to ack it right away. As the backup is already shut down (step 4) and live cannot sync the message acknowledgement with the backup, live does not respond to the client until the connection with the backup times out. If this cluster connection timeout is longer than the receiver's call-timeout, the receiver gets a JMSException like the following from the consumer.receive() method:

16:26:12,983 Thread-27 ERROR [org.jboss.qa.hornetq.apps.clients.ReceiverClientAck:341] RETRY receive for host: 127.0.0.1, Trying to receive message with count: 57
javax.jms.JMSException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 41
	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:350)
	at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.sendACK(ActiveMQSessionContext.java:421)
	at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.acknowledge(ClientSessionImpl.java:696)
	at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.doAck(ClientConsumerImpl.java:1035)
	at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.acknowledge(ClientConsumerImpl.java:702)
	at org.apache.activemq.artemis.core.client.impl.ClientMessageImpl.acknowledge(ClientMessageImpl.java:96)
	at org.apache.activemq.artemis.core.client.impl.ClientMessageImpl.acknowledge(ClientMessageImpl.java:38)
	at org.apache.activemq.artemis.jms.client.ActiveMQMessageConsumer.getMessage(ActiveMQMessageConsumer.java:212)
	at org.apache.activemq.artemis.jms.client.ActiveMQMessageConsumer.receive(ActiveMQMessageConsumer.java:119)
	at org.jboss.qa.hornetq.apps.clients.ReceiverClientAck.receiveMessage(ReceiverClientAck.java:333)
	at org.jboss.qa.hornetq.apps.clients.ReceiverClientAck.run(ReceiverClientAck.java:169)
Caused by: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 41]
	... 11 more

The problem is that the message was acked on the live server and is thus never redelivered to the consumer.
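
The sequence described above can be illustrated with a toy model. This is a sketch under stated assumptions, not HornetQ or Artemis code: AckTimeoutModel, LiveServer and receiveAll are hypothetical names, time is modelled as plain numbers rather than real waiting, and the model assumes that only the first ack stalls (once the connection to the backup has timed out, live stops waiting for it).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of the reported race; none of these types exist in HornetQ or Artemis. */
final class AckTimeoutModel {

    /** Hypothetical stand-in for the live server. */
    static final class LiveServer {
        private final Deque<Integer> queue = new ArrayDeque<>();
        private final long backupTimeoutMs;
        private boolean waitingForBackup = true; // backup was shut down mid-sync (step 4)

        LiveServer(int messageCount, long backupTimeoutMs) {
            for (int id = 0; id < messageCount; id++) {
                queue.add(id);
            }
            this.backupTimeoutMs = backupTimeoutMs;
        }

        /** Next undelivered message ID, or null when the queue is empty. */
        Integer peek() {
            return queue.peek();
        }

        /**
         * Applies the ack locally, then returns how long the response is delayed.
         * Assumption: only the first ack stalls; afterwards live no longer waits.
         */
        long ack(int id) {
            queue.remove(Integer.valueOf(id));
            if (waitingForBackup) {
                waitingForBackup = false;
                return backupTimeoutMs;
            }
            return 0;
        }
    }

    /** Receiver loop: returns the IDs the client actually got back from receive(). */
    static List<Integer> receiveAll(LiveServer live, long callTimeoutMs) {
        List<Integer> consumed = new ArrayList<>();
        Integer id;
        while ((id = live.peek()) != null) {
            long delayMs = live.ack(id); // large message: acked inside receive()
            if (delayMs > callTimeoutMs) {
                continue; // receive() threw; client retries, but live already removed the message
            }
            consumed.add(id);
        }
        return consumed;
    }

    public static void main(String[] args) {
        // Backup timeout (60 s) longer than call-timeout (30 s): message 0 is lost.
        System.out.println(receiveAll(new LiveServer(5, 60_000), 30_000)); // [1, 2, 3, 4]
        // Backup timeout (10 s) shorter than call-timeout (30 s): nothing is lost.
        System.out.println(receiveAll(new LiveServer(5, 10_000), 30_000)); // [0, 1, 2, 3, 4]
    }
}
```

In the model, the loss occurs only when the backup timeout exceeds the call-timeout, which matches the condition given in the description.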

Comment 2 JBoss JIRA Server 2016-02-24 08:25:42 UTC
Andy Taylor <ataylor> updated the status of jira JBEAP-3419 to Resolved

Comment 3 JBoss JIRA Server 2016-02-26 08:41:18 UTC
Miroslav Novak <mnovak> updated the status of jira JBEAP-3419 to Reopened

Comment 5 JBoss JIRA Server 2016-07-26 07:20:29 UTC
Andy Taylor <ataylor> updated the status of jira JBEAP-3419 to Resolved

Comment 6 Petr Penicka 2016-09-19 13:13:26 UTC
Triage: closing as this one is for Artemis, fixed in 7.0.2, not applicable for 6.4.