Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 901158 (JBPAPP6-1191)

Summary: Messages are hidden after restart when ACKs are done out of order on paging
Product: [JBoss] JBoss Enterprise Application Platform 6 Reporter: Miroslav Novak <mnovak>
Component: HornetQAssignee: Clebert Suconic <clebert.suconic>
Status: CLOSED NEXTRELEASE QA Contact:
Severity: urgent Docs Contact:
Priority: urgent    
Version: 6.0.0CC: clebert.suconic, mnovak
Target Milestone: ---   
Target Release: EAP 6.0.1   
Hardware: Unspecified   
OS: Unspecified   
URL: http://jira.jboss.org/jira/browse/JBPAPP6-1191
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-12-07 11:06:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
reproducer2.zip
none
hq-journal-with-lost-message.log none

Description Miroslav Novak 2012-11-02 14:09:09 UTC
project_key: JBPAPP6

Say you have messages 1, 2, 3, 4 and 5 on Page1.

You acked 2, 3 and 4


Pagin will look for the first acked message to start the flow and will hide message 1.

This was discovered on the following test scenario:

The MDB had random waits and multiple consumers on the queue, as a result we could get on this scenario described before.

Test scenario:

1. Configure two EAP 6 servers - "mdb" and "jms" server. "mdb" server has configured HornetQ resource adapter to connect to remote "jms" server.
2. Deploy "InQueue" and "OutQueue" to jms server
3. Send 500 messages to "InQueue" 
4. Deploy MDB to mdb server. MDB reads messages from InQueue and sends to OutQueue in XA transaction so it:
starts an XA transaction
dequeues a single message from InQueue on jms server
enqueues the same message to OutQueue on jms server
commits the transaction
5. During processing of messages in step 4. kill (kill -9 ...) jms server and restart it.
6. Wait until MDB processes all the messages. 
7. Receive messages from "OutQueue"

Result:
*Sometimes happens that after step 7. there is lost message.* There are warnings on mdb server:
{code}
14:33:14,035 WARN  [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016037: Could not find new XAResource to use for recovering non-serializable XAResource XAResourceRecord < resource:null, txid:< formatId=131077, gtrid_length=30, bqual_length=36, tx_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:680, node_name=20, branch_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:681, subordinatenodename=null, eis_name=unknown eis name >, heuristic: TwoPhaseOutcome.FINISH_OK com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@6fbdd27a >
14:33:14,036 WARN  [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016038: No XAResource to recover < formatId=131077, gtrid_length=30, bqual_length=36, tx_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:680, node_name=20, branch_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:681, subordinatenodename=null, eis_name=unknown eis name >
14:33:14,111 WARN  [org.hornetq.jms.server.recovery.HornetQXAResourceWrapper] (Periodic Recovery) null: javax.transaction.xa.XAException
	at org.hornetq.core.client.impl.ClientSessionImpl.commit(ClientSessionImpl.java:1330) [hornetq-core-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)]
	at org.hornetq.core.client.impl.DelegatingSession.commit(DelegatingSession.java:163) [hornetq-core-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)]
	at org.hornetq.jms.server.recovery.HornetQXAResourceWrapper.commit(HornetQXAResourceWrapper.java:110) [hornetq-jms-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)]
	at com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelCommit(XAResourceRecord.java:451) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2755) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2671) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.arjuna.coordinator.BasicAction.phase2Commit(BasicAction.java:1806) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.arjuna.recovery.RecoverAtomicAction.replayPhase2(RecoverAtomicAction.java:71) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.doRecoverTransaction(AtomicActionRecoveryModule.java:152) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactionsStatus(AtomicActionRecoveryModule.java:251) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass(AtomicActionRecoveryModule.java:109) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:789) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:371) [jbossjts-4.16.6.Final-redhat-1.jar:]

14:33:14,114 WARN  [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016036: commit on < formatId=131077, gtrid_length=30, bqual_length=36, tx_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:670, node_name=20, branch_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:671, subordinatenodename=null, eis_name=unknown eis name > (HornetQXAResourceWrapper [serverLocator=null, csf=null, delegate=null, xaRecoveryConfigs=[XARecoveryConfig [transportConfiguration = [org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=localhost], discoveryConfiguration = null, username=null, password=null]], instance=811298853]) failed with exception $XAException.XAER_NOTA: javax.transaction.xa.XAException
	at org.hornetq.core.client.impl.ClientSessionImpl.commit(ClientSessionImpl.java:1330) [hornetq-core-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)]
	at org.hornetq.core.client.impl.DelegatingSession.commit(DelegatingSession.java:163) [hornetq-core-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)]
	at org.hornetq.jms.server.recovery.HornetQXAResourceWrapper.commit(HornetQXAResourceWrapper.java:110) [hornetq-jms-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)]
	at com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelCommit(XAResourceRecord.java:451) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2755) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2671) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.arjuna.coordinator.BasicAction.phase2Commit(BasicAction.java:1806) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.arjuna.recovery.RecoverAtomicAction.replayPhase2(RecoverAtomicAction.java:71) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.doRecoverTransaction(AtomicActionRecoveryModule.java:152) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactionsStatus(AtomicActionRecoveryModule.java:251) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass(AtomicActionRecoveryModule.java:109) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:789) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:371) [jbossjts-4.16.6.Final-redhat-1.jar:]
{code}

*Lost message is never delivered and seem to be stuck on jms server.* HQ journal shows that there is one message but it's not in any queue - hq-journal-with-lost-message.log

Attached reproducer2.zip:
1. Download and unzip reproducer2.zip. Following scripts run in unzipped directory.
2. Prepare servers - "sh prepare.sh"
3. Start jms server - "sh start-server1.sh localhost"
4. Start producer and send 500 messages to InQueue - "sh start-producer.sh localhost 500"
5. Start mdb server - "sh start-server2.sh <some_other_ip>"
6. When mdb server is processing messages, kill (kill -9 ...) jms server and start again by "sh start-server1.sh <some_other_ip>"
7. Receive messages from OutQueue and check warnings in mdb servers log - "sh start-consumer.sh localhost".

This issue relates to JBPAPP-10305. There is the same reproducer.

Comment 1 Miroslav Novak 2012-11-02 14:11:22 UTC
Attachment: Added: reproducer2.zip
Attachment: Added: hq-journal-with-lost-message.log


Comment 2 Miroslav Novak 2012-11-02 14:11:53 UTC
Link: Added: This issue relates to JBPAPP-10305


Comment 3 Miroslav Novak 2012-11-02 14:12:50 UTC
Sources for clients and mdb are attached in JBPAPP-10305.

Comment 4 Misty Stanley-Jones 2012-11-05 04:22:58 UTC
Release Notes Docs Status: Added: Not Required


Comment 5 Clebert Suconic 2012-11-06 20:32:59 UTC
This is more a duplicate of JBPAPP-10305

Comment 6 tom.jenkinson 2012-11-08 10:15:15 UTC
Miroslav - please can you:
1. Update the description as to how to dequeue the messages
2. Confirm whether there is message loss
I realise you clarified some of this in a comment on JBPAPP-10305 but it is best to keep this particular Jira accurate as a single entity

Comment 7 tom.jenkinson 2012-11-08 10:16:23 UTC
I guess 2 is implied by the title, but it helps when  following your 1,2,3... steps to see this there too you see as then we know when to expect the message lost, I guess it is step 7

Comment 8 tom.jenkinson 2012-11-08 10:55:41 UTC
Message loss implies there is a bug somewhere here, but please do bear in my my comment on JBPAPP-10305 how if "commit" is performed by the RM but the TM does not get chance to remove the log record before the TM crashes then the stack traces you are seeing may happen. That said, I do not think this can happen in this scenario as the TM is never crashed in your scenario. NOTE: There is no way to differentiate between an RM incorrectly not reporting an indoubt XID or having committed pre-crash apart from checking log files or some business logic specific method. Perhaps use a counter in a JMS message property to work out which message is lost to narrow down what was happening at that time.



Comment 9 Miroslav Novak 2012-11-08 11:59:22 UTC
I'm not sure when I get to modification of reproducer to narrow the issue since EAP 5.2.0.ER4 testing has started. I'm very sorry, the one provided here must be sufficient for now.

Comment 10 Miroslav Novak 2012-11-09 15:08:17 UTC
Link: Added: This issue Cloned to JBPAPP-10388


Comment 11 Anne-Louise Tangring 2012-11-13 20:43:50 UTC
Release Notes Docs Status: Removed: Not Required 
Docs QE Status: Removed: NEW 


Comment 12 Miroslav Novak 2012-12-07 10:09:38 UTC
Cannot reproduce this issue in EAP 6.0.1.ER4.2. Thanks, Clebert!

Closing jira.