Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 901158 (JBPAPP6-1191)

Summary:

Messages are hidden after restart when ACKs are done out of order on paging

Product:

[JBoss] JBoss Enterprise Application Platform 6

Reporter:

Miroslav Novak <mnovak>

Component:

HornetQ

Assignee:

Clebert Suconic <clebert.suconic>

Status:

CLOSED NEXTRELEASE

QA Contact:

Severity:

urgent

Docs Contact:

Priority:

urgent

Version:

6.0.0

CC:

clebert.suconic, mnovak

Target Milestone:

---

Target Release:

EAP 6.0.1

Hardware:

Unspecified

OS:

Unspecified

URL:

http://jira.jboss.org/jira/browse/JBPAPP6-1191

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2012-12-07 11:06:23 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
reproducer2.zip	none
hq-journal-with-lost-message.log	none

Description Miroslav Novak 2012-11-02 14:09:09 UTC

project_key: JBPAPP6

Say you have messages 1, 2, 3, 4 and 5 on Page1.

You acked 2, 3 and 4


Pagin will look for the first acked message to start the flow and will hide message 1.

This was discovered on the following test scenario:

The MDB had random waits and multiple consumers on the queue, as a result we could get on this scenario described before.

Test scenario:

1. Configure two EAP 6 servers - "mdb" and "jms" server. "mdb" server has configured HornetQ resource adapter to connect to remote "jms" server.
2. Deploy "InQueue" and "OutQueue" to jms server
3. Send 500 messages to "InQueue" 
4. Deploy MDB to mdb server. MDB reads messages from InQueue and sends to OutQueue in XA transaction so it:
starts an XA transaction
dequeues a single message from InQueue on jms server
enqueues the same message to OutQueue on jms server
commits the transaction
5. During processing of messages in step 4. kill (kill -9 ...) jms server and restart it.
6. Wait until MDB processes all the messages. 
7. Receive messages from "OutQueue"

Result:
*Sometimes happens that after step 7. there is lost message.* There are warnings on mdb server:
{code}
14:33:14,035 WARN  [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016037: Could not find new XAResource to use for recovering non-serializable XAResource XAResourceRecord < resource:null, txid:< formatId=131077, gtrid_length=30, bqual_length=36, tx_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:680, node_name=20, branch_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:681, subordinatenodename=null, eis_name=unknown eis name >, heuristic: TwoPhaseOutcome.FINISH_OK com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@6fbdd27a >
14:33:14,036 WARN  [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016038: No XAResource to recover < formatId=131077, gtrid_length=30, bqual_length=36, tx_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:680, node_name=20, branch_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:681, subordinatenodename=null, eis_name=unknown eis name >
14:33:14,111 WARN  [org.hornetq.jms.server.recovery.HornetQXAResourceWrapper] (Periodic Recovery) null: javax.transaction.xa.XAException
	at org.hornetq.core.client.impl.ClientSessionImpl.commit(ClientSessionImpl.java:1330) [hornetq-core-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)]
	at org.hornetq.core.client.impl.DelegatingSession.commit(DelegatingSession.java:163) [hornetq-core-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)]
	at org.hornetq.jms.server.recovery.HornetQXAResourceWrapper.commit(HornetQXAResourceWrapper.java:110) [hornetq-jms-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)]
	at com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelCommit(XAResourceRecord.java:451) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2755) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2671) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.arjuna.coordinator.BasicAction.phase2Commit(BasicAction.java:1806) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.arjuna.recovery.RecoverAtomicAction.replayPhase2(RecoverAtomicAction.java:71) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.doRecoverTransaction(AtomicActionRecoveryModule.java:152) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactionsStatus(AtomicActionRecoveryModule.java:251) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass(AtomicActionRecoveryModule.java:109) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:789) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:371) [jbossjts-4.16.6.Final-redhat-1.jar:]

14:33:14,114 WARN  [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016036: commit on < formatId=131077, gtrid_length=30, bqual_length=36, tx_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:670, node_name=20, branch_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:671, subordinatenodename=null, eis_name=unknown eis name > (HornetQXAResourceWrapper [serverLocator=null, csf=null, delegate=null, xaRecoveryConfigs=[XARecoveryConfig [transportConfiguration = [org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=localhost], discoveryConfiguration = null, username=null, password=null]], instance=811298853]) failed with exception $XAException.XAER_NOTA: javax.transaction.xa.XAException
	at org.hornetq.core.client.impl.ClientSessionImpl.commit(ClientSessionImpl.java:1330) [hornetq-core-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)]
	at org.hornetq.core.client.impl.DelegatingSession.commit(DelegatingSession.java:163) [hornetq-core-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)]
	at org.hornetq.jms.server.recovery.HornetQXAResourceWrapper.commit(HornetQXAResourceWrapper.java:110) [hornetq-jms-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)]
	at com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelCommit(XAResourceRecord.java:451) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2755) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2671) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.arjuna.coordinator.BasicAction.phase2Commit(BasicAction.java:1806) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.arjuna.recovery.RecoverAtomicAction.replayPhase2(RecoverAtomicAction.java:71) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.doRecoverTransaction(AtomicActionRecoveryModule.java:152) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactionsStatus(AtomicActionRecoveryModule.java:251) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass(AtomicActionRecoveryModule.java:109) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:789) [jbossjts-4.16.6.Final-redhat-1.jar:]
	at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:371) [jbossjts-4.16.6.Final-redhat-1.jar:]
{code}

*Lost message is never delivered and seem to be stuck on jms server.* HQ journal shows that there is one message but it's not in any queue - hq-journal-with-lost-message.log

Attached reproducer2.zip:
1. Download and unzip reproducer2.zip. Following scripts run in unzipped directory.
2. Prepare servers - "sh prepare.sh"
3. Start jms server - "sh start-server1.sh localhost"
4. Start producer and send 500 messages to InQueue - "sh start-producer.sh localhost 500"
5. Start mdb server - "sh start-server2.sh <some_other_ip>"
6. When mdb server is processing messages, kill (kill -9 ...) jms server and start again by "sh start-server1.sh <some_other_ip>"
7. Receive messages from OutQueue and check warnings in mdb servers log - "sh start-consumer.sh localhost".

This issue relates to JBPAPP-10305. There is the same reproducer.

Comment 1 Miroslav Novak 2012-11-02 14:11:22 UTC

Attachment: Added: reproducer2.zip
Attachment: Added: hq-journal-with-lost-message.log

Comment 2 Miroslav Novak 2012-11-02 14:11:53 UTC

Link: Added: This issue relates to JBPAPP-10305

Comment 3 Miroslav Novak 2012-11-02 14:12:50 UTC

Sources for clients and mdb are attached in JBPAPP-10305.

Comment 4 Misty Stanley-Jones 2012-11-05 04:22:58 UTC

Release Notes Docs Status: Added: Not Required

Comment 5 Clebert Suconic 2012-11-06 20:32:59 UTC

This is more a duplicate of JBPAPP-10305

Comment 6 tom.jenkinson 2012-11-08 10:15:15 UTC

Miroslav - please can you:
1. Update the description as to how to dequeue the messages
2. Confirm whether there is message loss
I realise you clarified some of this in a comment on JBPAPP-10305 but it is best to keep this particular Jira accurate as a single entity

Comment 7 tom.jenkinson 2012-11-08 10:16:23 UTC

I guess 2 is implied by the title, but it helps when  following your 1,2,3... steps to see this there too you see as then we know when to expect the message lost, I guess it is step 7

Comment 8 tom.jenkinson 2012-11-08 10:55:41 UTC

Message loss implies there is a bug somewhere here, but please do bear in my my comment on JBPAPP-10305 how if "commit" is performed by the RM but the TM does not get chance to remove the log record before the TM crashes then the stack traces you are seeing may happen. That said, I do not think this can happen in this scenario as the TM is never crashed in your scenario. NOTE: There is no way to differentiate between an RM incorrectly not reporting an indoubt XID or having committed pre-crash apart from checking log files or some business logic specific method. Perhaps use a counter in a JMS message property to work out which message is lost to narrow down what was happening at that time.

Comment 9 Miroslav Novak 2012-11-08 11:59:22 UTC

I'm not sure when I get to modification of reproducer to narrow the issue since EAP 5.2.0.ER4 testing has started. I'm very sorry, the one provided here must be sufficient for now.

Comment 10 Miroslav Novak 2012-11-09 15:08:17 UTC

Link: Added: This issue Cloned to JBPAPP-10388

Comment 11 Anne-Louise Tangring 2012-11-13 20:43:50 UTC

Release Notes Docs Status: Removed: Not Required 
Docs QE Status: Removed: NEW

Comment 12 Miroslav Novak 2012-12-07 10:09:38 UTC

Cannot reproduce this issue in EAP 6.0.1.ER4.2. Thanks, Clebert!

Closing jira.