Description of problem:
Message starvation was observed when a consumer hangs on a two-node HornetQ cluster in domain mode.
Version-Release number of selected component (if applicable):
JBoss-EAP-6.2_CP4 aka JBoss-EAP-6.2.4
HornetQ-2.3.14.Final (220.127.116.11 (Branch 2.3.eap6_2), 123)
Steps to Reproduce:
1. Configure two cluster-aware server instances in domain.xml to form a two-node cluster: NodeA and NodeB
2. Configure a destination, queue/A, on both server instances
3. Start both server instances
4. Create two MDBs, FirstMDB and SecondMDB, each with maxSession set to "1". Introduce latency in FirstMDB by making the consumer thread sleep for 5 seconds
5. Deploy both MDBs on each server instance
6. Stop NodeB
7. Send 250+ messages to queue/A on NodeA
8. Let the MDB on NodeA consume a few messages
9. Start NodeB
Actual results:
The consumer on NodeB did not consume any messages, although consumer-window-size was configured to "0" on both server instances.
The consumer on NodeB is starved of messages.
Expected results:
Both consumers should be able to consume messages.
Additional info:
Configuration files and a test MDB will follow shortly.
I don't think this is a bug. You can configure a redistribution delay and the messages will be redistributed.
I configured the redistribution delay to 1 second during my tests. The issue occurs when the first consumer, FirstMDB, pauses for 5 seconds during message consumption.
Everything works as expected when I remove the 5-second pause in the consumer. I introduced the pause to emulate the problem my customer was seeing.
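The interaction between the redistribution delay and a slow but still attached consumer can be sketched as a simple rule. This is only an illustration, not HornetQ code: it assumes the documented behaviour that redistribution applies to a queue whose last local consumer has closed, that a negative delay disables it, and that a remote node must have a matching consumer. The class, method, and parameter names are hypothetical.

```java
public class RedistributionSketch {

    /**
     * Simplified model (an assumption, not the HornetQ implementation):
     * messages are moved to another node only when the local queue has
     * no consumers, a remote consumer exists, and the configured delay
     * has elapsed since the last local consumer closed.
     */
    public static boolean shouldRedistribute(int localConsumers,
                                             int remoteConsumers,
                                             long redistributionDelayMs,
                                             long msSinceLastLocalConsumerClosed) {
        if (redistributionDelayMs < 0) {
            return false; // a negative delay means redistribution is disabled
        }
        if (localConsumers > 0) {
            return false; // a slow consumer is still a consumer
        }
        if (remoteConsumers == 0) {
            return false; // nowhere to send the messages
        }
        return msSinceLastLocalConsumerClosed >= redistributionDelayMs;
    }

    public static void main(String[] args) {
        // FirstMDB on NodeA is sleeping but still attached; delay is 1 second.
        System.out.println(shouldRedistribute(1, 1, 1000, 0));
        // No consumer left on NodeA for 1.5 seconds; delay is 1 second.
        System.out.println(shouldRedistribute(0, 1, 1000, 1500));
    }
}
```

Under this model a 1-second delay has no effect while FirstMDB remains attached on NodeA, which would be consistent with the behaviour reported here.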
I assigned this case to Howard, since I have discussed this issue in detail with him. :-)
Created attachment 924041
Created attachment 924042
If you don't set consumer-window-size to 0, the test case is invalid for us.
I am updating this bugzilla with a comment to avoid any confusion over its title: the same behaviour can be seen in standalone mode.
I wouldn't fix this issue. You can just set consumerWindowSize=0 for slow consumers, or set slow consumer detection to kill the consumer.
I don't see any issues that could be fixed here.
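The effect of the consumer window on a slow consumer can be illustrated with a toy, single-threaded simulation. It is a sketch under stated assumptions, not HornetQ's delivery logic: the window is counted in messages rather than bytes, delivery is plain round-robin between one slow and one fast consumer, and a window of 0 is modelled as "hand over one message only when the consumer is idle". All names are hypothetical.

```java
public class WindowSketch {

    /**
     * Returns {handled by slow consumer, handled by fast consumer, ticks to drain}.
     * Consumer 0 is the slow one, consumer 1 the fast one.
     */
    public static int[] simulate(int messages, int window, int slowCost, int fastCost) {
        int[] cost = {slowCost, fastCost};
        int[] buffered = new int[2]; // messages held client-side, not yet started
        int[] busyFor = new int[2];  // ticks left on the message being processed
        int[] handled = new int[2];
        int queue = messages;
        int next = 0;                // round-robin pointer
        int tick = 0;

        while (handled[0] + handled[1] < messages) {
            // 1. Delivery: round-robin to any consumer that will accept a message.
            boolean delivered = true;
            while (queue > 0 && delivered) {
                delivered = false;
                for (int i = 0; i < 2; i++) {
                    int c = (next + i) % 2;
                    boolean idle = busyFor[c] == 0 && buffered[c] == 0;
                    boolean accepts = (window == 0) ? idle : buffered[c] < window;
                    if (accepts) {
                        buffered[c]++;
                        queue--;
                        next = (c + 1) % 2;
                        delivered = true;
                        break;
                    }
                }
            }
            // 2. Idle consumers start their next buffered message.
            for (int c = 0; c < 2; c++) {
                if (busyFor[c] == 0 && buffered[c] > 0) {
                    buffered[c]--;
                    busyFor[c] = cost[c];
                }
            }
            // 3. One tick of work; count a message when it completes.
            for (int c = 0; c < 2; c++) {
                if (busyFor[c] > 0 && --busyFor[c] == 0) {
                    handled[c]++;
                }
            }
            tick++;
        }
        return new int[] {handled[0], handled[1], tick};
    }

    public static void main(String[] args) {
        int[] onDemand = simulate(10, 0, 5, 1);   // window 0
        int[] buffering = simulate(10, 10, 5, 1); // large window
        System.out.println("window=0:  slow=" + onDemand[0] + " fast=" + onDemand[1] + " ticks=" + onDemand[2]);
        System.out.println("window=10: slow=" + buffering[0] + " fast=" + buffering[1] + " ticks=" + buffering[2]);
    }
}
```

In this model a large window lets the slow consumer hold messages that the fast consumer could have handled, so the queue takes longer to drain. With a window of 0 the fast consumer picks up most of the work.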
Closing, as this is more of a feature that would need to be implemented in Artemis.