Bug 780090 (SOA-2457) - MessageSucker failures cause the delivery of the failed message to stall
Summary: MessageSucker failures cause the delivery of the failed message to stall
Keywords:
Status: CLOSED NOTABUG
Alias: SOA-2457
Product: JBoss Enterprise SOA Platform 5
Classification: JBoss
Component: JBoss Messaging
Version: 5.0.0 GA
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: One Off Releases
Assignee: Rick Wagner
QA Contact:
URL: http://jira.jboss.org/jira/browse/SOA...
Whiteboard:
Depends On:
Blocks:
Reported: 2010-10-21 15:03 UTC by david.boeren
Modified: 2012-08-30 23:35 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-11-30 20:02:58 UTC
Type: Support Patch


Attachments
helloworld.zip (12.50 KB, application/zip)
2010-10-21 15:04 UTC, david.boeren


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | SOA-2457 | 0 | Major | Closed | MessageSucker failures cause the delivery of the failed message to stall | 2012-09-10 15:49:15 UTC

Description david.boeren 2010-10-21 15:03:58 UTC
Support Case Reference: https://c.na7.visual.force.com/apex/Case_View?id=500A00000044AOYIA2&sfdc.override=1
project_key: SOA

Note that the customer is already using a patch that may involve overlapping class files:
https://jira.jboss.org/browse/JBPAPP-5224

This patch would therefore have to go on top of the existing patch.

---- 

The MessageSucker is responsible for migrating messages between different members of a cluster. It is a consumer on the remote queue, from which it receives messages destined for the queue on the local cluster member.

The onMessage routine, at its most basic, does the following:

- bookkeeping for the incoming message, including expiry 
- acknowledge the incoming message 
- attempt to deliver to the local queue 

When the delivery fails, the result is the *appearance* of lost messages. Messages that are being processed when the failure occurs are not redelivered, although they still exist in the database.
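
The ordering described above (acknowledge first, deliver second) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class, its fields, and the in-memory lists are hypothetical stand-ins and are not the JBoss Messaging MessageSucker code. The simulated local send fails on every fifth call so that the effect of a failure after the acknowledgement is visible.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of an acknowledge-then-deliver consumer; not JBoss Messaging code. */
class AckThenDeliverSketch {
    /** Messages that reached the simulated local queue. */
    final List<String> delivered = new ArrayList<>();
    /** Messages acknowledged on the remote side but never delivered locally. */
    final List<String> stalled = new ArrayList<>();
    private int sendCount = 0;

    /** Simulated local send that throws on every fifth call (assumed failure pattern). */
    private void sendToLocalQueue(String msg) {
        sendCount++;
        if (sendCount % 5 == 0) {
            throw new IllegalStateException("Deliberate exception");
        }
        delivered.add(msg);
    }

    /** Mirrors the three steps: bookkeeping (omitted), acknowledge, then deliver. */
    void onMessage(String msg) {
        // Step 2: the message is treated as acknowledged from this point on,
        // so the remote queue has no reason to redeliver it.
        try {
            // Step 3: attempt local delivery.
            sendToLocalQueue(msg);
        } catch (IllegalStateException e) {
            // Acknowledged but not delivered: the message appears lost,
            // even though a real broker would still hold it in its store.
            stalled.add(msg);
        }
    }
}
```

In this simulation, feeding 300 messages through onMessage leaves 240 in delivered and 60 in stalled. Nothing in the flow ever retries the stalled ones, which matches the symptom reported here.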

The only way I have found to trigger the redelivery of those messages is to redeploy the queue containing the messages and/or restart that app server. Obviously neither approach is acceptable. 

To trigger the error, I created a SOA cluster whose members shared *only* the JMS database and no other. I modified the helloworld quickstart to display a counter of messages consumed, clustered the *esb* queue, and then used Byteman to trigger the faults.

The Byteman rule is as follows; the quickstart will be attached.

RULE throw every fifth send 
INTERFACE ProducerDelegate 
METHOD send 
AT ENTRY 
IF callerEquals("MessageSucker.onMessage", true) && (incrementCounter("throwException") % 5 == 0) 
DO THROW new IllegalStateException("Deliberate exception") 
ENDRULE 

This results in an exception being thrown for every fifth message. Once the delivery has quiesced, examine the JBM_MSG and JBM_MSG_REF tables to see the messages which have not been delivered. 

The cluster members are ports-default and ports-01; the client seeds the gateway by sending 300 messages to the default.

Adding up the counter from each server *plus* the message count from JBM_MSG results in 300 (or multiples thereof for more executions).
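
The reconciliation above can be written as a small check. This is a sketch only: the class and method names are hypothetical, and the counts in the usage note are made-up illustrative values, not measurements from this case.

```java
/** Hypothetical helper for the reconciliation described above. */
class DeliveryReconciliation {
    /**
     * Returns the number of messages unaccounted for:
     * sent minus (consumed on all servers plus rows still held in JBM_MSG).
     * A result of 0 means every message was either consumed or is still stored.
     */
    static long unaccounted(long sent, long[] consumedPerServer, long storedRows) {
        long consumed = 0;
        for (long c : consumedPerServer) {
            consumed += c;
        }
        return sent - (consumed + storedRows);
    }
}
```

For example, with illustrative counters of 130 and 110 on the two servers and 60 rows left in JBM_MSG, unaccounted(300, new long[] {130, 110}, 60) returns 0, meaning no message was truly lost.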

Comment 1 david.boeren 2010-10-21 15:04:15 UTC
Attachment: Added: helloworld.zip


Comment 2 david.boeren 2010-10-21 15:04:50 UTC
Link: Added: This issue is related to JBPAPP-5280


Comment 3 Justin Bertram 2011-01-05 20:39:53 UTC
This was opened in the wrong project.  The real issue should have been opened in the JBMESSAGING project.  See JBMESSAGING-1822, and the backport of this for SOA at SOA-2526.

Comment 4 Kevin Conner 2011-01-06 09:17:53 UTC
This was opened in the correct project as it is logged against the platform.  SOA-2526 duplicates this issue and was where the backport work was handled.

Comment 5 Kevin Conner 2011-01-06 09:18:35 UTC
Link: Added: This issue is duplicated by SOA-2526


Comment 6 Rick Wagner 2011-11-30 20:02:58 UTC
Resolved.

