Bug 901158 (JBPAPP6-1191)
| Summary: | Messages are hidden after restart when ACKs are done out of order on paging | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | [JBoss] JBoss Enterprise Application Platform 6 | Reporter: | Miroslav Novak <mnovak> | ||||||
| Component: | HornetQ | Assignee: | Clebert Suconic <clebert.suconic> | ||||||
| Status: | CLOSED NEXTRELEASE | QA Contact: | |||||||
| Severity: | urgent | Docs Contact: | |||||||
| Priority: | urgent | ||||||||
| Version: | 6.0.0 | CC: | clebert.suconic, mnovak | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | EAP 6.0.1 | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| URL: | http://jira.jboss.org/jira/browse/JBPAPP6-1191 | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2012-12-07 11:06:23 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
Attachment: Added: reproducer2.zip Attachment: Added: hq-journal-with-lost-message.log Link: Added: This issue relates to JBPAPP-10305 Sources for clients and mdb are attached in JBPAPP-10305. Release Notes Docs Status: Added: Not Required This is more a duplicate of JBPAPP-10305 Miroslav - please can you: 1. Update the description as to how to dequeue the messages 2. Confirm whether there is message loss I realise you clarified some of this in a comment on JBPAPP-10305 but it is best to keep this particular Jira accurate as a single entity I guess 2 is implied by the title, but it helps when following your 1,2,3... steps to see this there too you see as then we know when to expect the message lost, I guess it is step 7 Message loss implies there is a bug somewhere here, but please do bear in my my comment on JBPAPP-10305 how if "commit" is performed by the RM but the TM does not get chance to remove the log record before the TM crashes then the stack traces you are seeing may happen. That said, I do not think this can happen in this scenario as the TM is never crashed in your scenario. NOTE: There is no way to differentiate between an RM incorrectly not reporting an indoubt XID or having committed pre-crash apart from checking log files or some business logic specific method. Perhaps use a counter in a JMS message property to work out which message is lost to narrow down what was happening at that time. I'm not sure when I get to modification of reproducer to narrow the issue since EAP 5.2.0.ER4 testing has started. I'm very sorry, the one provided here must be sufficient for now. Link: Added: This issue Cloned to JBPAPP-10388 Release Notes Docs Status: Removed: Not Required Docs QE Status: Removed: NEW Cannot reproduce this issue in EAP 6.0.1.ER4.2. Thanks, Clebert! Closing jira. |
project_key: JBPAPP6 Say you have messages 1, 2, 3, 4 and 5 on Page1. You acked 2, 3 and 4 Pagin will look for the first acked message to start the flow and will hide message 1. This was discovered on the following test scenario: The MDB had random waits and multiple consumers on the queue, as a result we could get on this scenario described before. Test scenario: 1. Configure two EAP 6 servers - "mdb" and "jms" server. "mdb" server has configured HornetQ resource adapter to connect to remote "jms" server. 2. Deploy "InQueue" and "OutQueue" to jms server 3. Send 500 messages to "InQueue" 4. Deploy MDB to mdb server. MDB reads messages from InQueue and sends to OutQueue in XA transaction so it: starts an XA transaction dequeues a single message from InQueue on jms server enqueues the same message to OutQueue on jms server commits the transaction 5. During processing of messages in step 4. kill (kill -9 ...) jms server and restart it. 6. Wait until MDB processes all the messages. 7. Receive messages from "OutQueue" Result: *Sometimes happens that after step 7. there is lost message.* There are warnings on mdb server: {code} 14:33:14,035 WARN [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016037: Could not find new XAResource to use for recovering non-serializable XAResource XAResourceRecord < resource:null, txid:< formatId=131077, gtrid_length=30, bqual_length=36, tx_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:680, node_name=20, branch_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:681, subordinatenodename=null, eis_name=unknown eis name >, heuristic: TwoPhaseOutcome.FINISH_OK com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@6fbdd27a > 14:33:14,036 WARN [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016038: No XAResource to recover < formatId=131077, gtrid_length=30, bqual_length=36, tx_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:680, node_name=20, branch_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:681, subordinatenodename=null, eis_name=unknown eis name > 14:33:14,111 WARN [org.hornetq.jms.server.recovery.HornetQXAResourceWrapper] (Periodic Recovery) null: javax.transaction.xa.XAException at org.hornetq.core.client.impl.ClientSessionImpl.commit(ClientSessionImpl.java:1330) [hornetq-core-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)] at org.hornetq.core.client.impl.DelegatingSession.commit(DelegatingSession.java:163) [hornetq-core-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)] at org.hornetq.jms.server.recovery.HornetQXAResourceWrapper.commit(HornetQXAResourceWrapper.java:110) [hornetq-jms-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)] at com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelCommit(XAResourceRecord.java:451) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2755) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2671) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.arjuna.coordinator.BasicAction.phase2Commit(BasicAction.java:1806) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.arjuna.recovery.RecoverAtomicAction.replayPhase2(RecoverAtomicAction.java:71) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.doRecoverTransaction(AtomicActionRecoveryModule.java:152) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactionsStatus(AtomicActionRecoveryModule.java:251) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass(AtomicActionRecoveryModule.java:109) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:789) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:371) [jbossjts-4.16.6.Final-redhat-1.jar:] 14:33:14,114 WARN [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016036: commit on < formatId=131077, gtrid_length=30, bqual_length=36, tx_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:670, node_name=20, branch_uid=0:ffff0a2203bf:3eed75fd:5093cb0d:671, subordinatenodename=null, eis_name=unknown eis name > (HornetQXAResourceWrapper [serverLocator=null, csf=null, delegate=null, xaRecoveryConfigs=[XARecoveryConfig [transportConfiguration = [org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=localhost], discoveryConfiguration = null, username=null, password=null]], instance=811298853]) failed with exception $XAException.XAER_NOTA: javax.transaction.xa.XAException at org.hornetq.core.client.impl.ClientSessionImpl.commit(ClientSessionImpl.java:1330) [hornetq-core-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)] at org.hornetq.core.client.impl.DelegatingSession.commit(DelegatingSession.java:163) [hornetq-core-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)] at org.hornetq.jms.server.recovery.HornetQXAResourceWrapper.commit(HornetQXAResourceWrapper.java:110) [hornetq-jms-2.2.22.Final-redhat-1.jar:2.2.22.SNAPSHOT (HQ_2_2_22_AS7_Final, 122)] at com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelCommit(XAResourceRecord.java:451) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2755) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2671) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.arjuna.coordinator.BasicAction.phase2Commit(BasicAction.java:1806) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.arjuna.recovery.RecoverAtomicAction.replayPhase2(RecoverAtomicAction.java:71) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.doRecoverTransaction(AtomicActionRecoveryModule.java:152) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.processTransactionsStatus(AtomicActionRecoveryModule.java:251) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.internal.arjuna.recovery.AtomicActionRecoveryModule.periodicWorkSecondPass(AtomicActionRecoveryModule.java:109) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:789) [jbossjts-4.16.6.Final-redhat-1.jar:] at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:371) [jbossjts-4.16.6.Final-redhat-1.jar:] {code} *Lost message is never delivered and seem to be stuck on jms server.* HQ journal shows that there is one message but it's not in any queue - hq-journal-with-lost-message.log Attached reproducer2.zip: 1. Download and unzip reproducer2.zip. Following scripts run in unzipped directory. 2. Prepare servers - "sh prepare.sh" 3. Start jms server - "sh start-server1.sh localhost" 4. Start producer and send 500 messages to InQueue - "sh start-producer.sh localhost 500" 5. Start mdb server - "sh start-server2.sh <some_other_ip>" 6. When mdb server is processing messages, kill (kill -9 ...) jms server and start again by "sh start-server1.sh <some_other_ip>" 7. Receive messages from OutQueue and check warnings in mdb servers log - "sh start-consumer.sh localhost". This issue relates to JBPAPP-10305. There is the same reproducer.