| Summary: | Cancelling undelivered messages during shutdown leads to message loss | | |
|---|---|---|---|
| Product: | [JBoss] JBoss Enterprise SOA Platform 5 | Reporter: | Kevin Conner <kevin.conner> |
| Component: | JBoss Messaging | Assignee: | trev <tkirby> |
| Status: | CLOSED NOTABUG | QA Contact: | |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 5.1.0.ER4 | CC: | jbertram |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| URL: | http://jira.jboss.org/jira/browse/SOA-2593 | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-11-19 09:34:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Kevin Conner
2010-11-18 12:27:08 UTC
This can be reproduced simply by creating a clustered queue over two nodes and delaying the delivery via the message sucker. The second node should contain the only consumer, with all messages being delivered to the first. The following byteman script will delay the delivery, allowing the messages to buffer; during that delay a clean shutdown of the first server will result in the failure ([ServerSessionEndpoint] No DLQ has been specified so the message will be removed).

```
RULE delay delivery
CLASS org.jboss.messaging.core.impl.clusterconnection.MessageSucker
METHOD onMessage
AT CALL send
IF true
DO Thread.sleep(1000)
ENDRULE
```

The bug appears to be in ServerSessionEndpoint.cancelDeliveryInternal:
```java
boolean reachedMaxDeliveryAttempts =
    cancel.isReachedMaxDeliveryAttempts() || cancel.getDeliveryCount() >= rec.maxDeliveryAttempts;

// Observed values at the point of failure:
// cancel.isReachedMaxDeliveryAttempts() == false
// cancel.getDeliveryCount()             == 0
// rec.maxDeliveryAttempts               == -1
```
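With those values the expression evaluates to `false || (0 >= -1)`, i.e. true, so the cancelled delivery is treated as having exhausted its delivery attempts and, with no DLQ configured, the message is dropped. Below is a minimal, self-contained sketch of that evaluation; the variable names mirror the snippet above, and the guarded variant is only an illustration of the kind of check a fix would need, not the actual JBM patch.

```java
public class CancelDeliveryCheck {
    public static void main(String[] args) {
        boolean isReachedMaxDeliveryAttempts = false; // observed value
        int deliveryCount = 0;                        // observed value
        int maxDeliveryAttempts = -1;                 // -1 is commonly used to mean "unlimited"

        // The condition as written in cancelDeliveryInternal:
        boolean reachedMax =
            isReachedMaxDeliveryAttempts || deliveryCount >= maxDeliveryAttempts;
        System.out.println("buggy check:   " + reachedMax); // true, because 0 >= -1

        // Hypothetical guard treating -1 as "no limit" (illustration only):
        boolean guarded =
            isReachedMaxDeliveryAttempts
                || (maxDeliveryAttempts != -1 && deliveryCount >= maxDeliveryAttempts);
        System.out.println("guarded check: " + guarded); // false: the delivery is cancelled, not dropped
    }
}
```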
Link Added: This issue depends on JBPAPP-5429

Thanks Kevin. I'm having trouble reproducing it. Here is what I did:

1. Start a cluster of two nodes, node0 and node1.
2. Send a message to node0, but receive it on node1.
3. In MessageSucker.onMessage(), I put a 20 sec sleep before the send() call.
4. When the message is sucked from node0 to node1, the onMessage() method is called. During the sleep I shut down node0 (control-c).
5. I observe the message is still received by the consumer on node1.

Am I missing some step?

Howard

You need to send multiple messages to the first node so that the deliveries buffer up in the consumer associated with the MessageSucker. It is the buffered messages which are cancelled and then lost. Also, use the above byteman script to stall the delivery to the local queue on the second node.

Ignore the part about byteman; it looks like you have modified the code directly to introduce the delay. The key is sending multiple messages so that they buffer and must be cancelled (see the client sketch at the end of this report).

Thanks Kevin. This time I sent 3 messages, but I still didn't reproduce it. I'm not familiar with byteman but I'll try tomorrow. Just to confirm with you:

1. Messages are sent to the first node and then sucked to the second node (the one the only consumer connects to).
2. The sleep happens at the second node, before the send() call in onMessage().
3. Shut down the second node so the first node will cancel the messages.

These are the steps I did. If the steps are correct, then I guess this issue does not always happen. Thanks

This looks a lot like JBMESSAGING-1774. Can anyone confirm?

Assigning to Trevor, since when this is fixed it will involve a build update of EAP.

Hi Justin, once again you saved me. :) I can pretty much confirm this is JBMESSAGING-1774, as I have looked at the code and maxDeliveryAttempts is nowhere set to -1 in Branch_1_4 (the JBM dev branch). I had been wondering whether I had missed some hidden code. Thanks Justin. Can you deliver a patch to Kevin and let him confirm? :)

Eagle eye Justin.

Howard

Rejecting this for SOA 5.1 as the fix was made elsewhere in the JBM codebase and I didn't pick up on that. The fix was done for JBM 1.4.7 GA, which is the version we currently use.
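For reference, here is a minimal JMS client sketch of the reproduction described above: several messages are sent to the first node so that they buffer in the consumer associated with the MessageSucker while delivery on the second node is stalled. The JNDI names (`ConnectionFactory`, `queue/testDistributedQueue`), the provider URL, and the message count are assumptions for illustration and will differ per installation.

```java
import java.util.Properties;
import javax.jms.*;
import javax.naming.InitialContext;

public class ClusteredQueueRepro {
    public static void main(String[] args) throws Exception {
        Properties env = new Properties();
        env.put("java.naming.factory.initial", "org.jnp.interfaces.NamingContextFactory");
        env.put("java.naming.provider.url", "jnp://node0:1099"); // first node (assumed host/port)

        InitialContext ctx = new InitialContext(env);
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("queue/testDistributedQueue"); // assumed clustered queue name

        Connection connection = cf.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);

            // Send several messages so that deliveries buffer up in the consumer
            // associated with the MessageSucker on the second node.
            for (int i = 0; i < 10; i++) {
                producer.send(session.createTextMessage("message-" + i));
            }
        } finally {
            connection.close();
        }
        // With delivery stalled on the second node (byteman sleep in MessageSucker.onMessage),
        // cleanly shut down the first node while these messages are still buffered.
    }
}
```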