Red Hat Bugzilla – Bug 1278341
[QE] (6.4.z) Messages are not load balanced in HornetQ cluster
Last modified: 2017-01-17 06:44:35 EST
Description of problem:
We found a regression in a customer use case (LODH customer) with the EAP 6.4.5.CP.CR1 release. Messages are not balanced in the HornetQ cluster, and only one server in the cluster contains all messages. The regression is related to the change described in BZ#1222900. That BZ introduces an optimization for message load balancing in a cluster; however, messages are not balanced at all when the customer uses core bridges for transporting messages between two clusters of HornetQ servers.
From the QE point of view this is a regression in a customer use case and should be fixed.
- start 2 EAP 6 servers (servers 1 and 2) with InQueue deployed, in one HornetQ cluster
- start another 2 EAP 6 servers (servers 3 and 4) with OutQueue deployed, in a different HornetQ cluster
- set up 2 HornetQ core bridges, deployed on servers 1 and 2
- the core bridges resend messages from InQueue to OutQueue in the 2nd cluster (1->3, 2->4)
- start a producer which sends messages to InQueue on server 1 and a consumer which reads messages from OutQueue on server 4
- while the bridges are processing messages, cleanly shut down server 3 and restart it after a while; the producer keeps running and stays connected to server 1
- stop the producer and verify that all messages were received by the consumer on server 4
In EAP 6.4.4.CP the consumer kept receiving messages while server 3 was down. With EAP 6.4.5.CP.CR1, once server 3 is shut down, the consumer on server 4 does not get any new messages until server 3 is restarted.
Investigation showed that no messages are load balanced to server 2, so core bridge 2->4 has nothing to send and the consumer starves while server 3 is not available.
Version-Release number of selected component (if applicable):
EAP 6.4.5.CP.CR1 (HornetQ 2.3.25.SP5)
This issue contradicts #bz-1222900
and I don't think it is an issue at all.
If you always want load balancing, set forward-when-no-consumers=true and messages should be load balanced, at least AFAIK.
I agree with Clebert. The semantic change from BZ-1222900 now requires that if you want load-balancing you must set forward-when-no-consumers to true. If forward-when-no-consumers is false then no load-balancing will take place now whether or not there are consumers. It appears that LODH's use-case was relying on the non-intuitive behavior that BZ-1222900 changed so either the configuration for the use-case needs to change or BZ-1222900 needs to be reverted.
To be clear, BZ-1222900 was only applied to the '2.4.x' and 'master' branches at first. However, Tom Ross later back-ported the change to '2.3.x' and '2.3.25.x' which is how it ended up in EAP 6.x.
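The semantics described above can be illustrated with a toy model. This is a minimal sketch, not HornetQ code: the function `route` and its parameters are hypothetical, and it assumes that `forward-when-no-consumers=true` means round-robin across all cluster nodes while `false` (after BZ-1222900, as summarized in this comment) means every message stays on the node that received it.

```python
from itertools import cycle


def route(messages, nodes, local, forward_when_no_consumers):
    """Toy model of where messages sent to `local` end up in the cluster.

    Hypothetical helper for illustration only; it ignores consumers,
    redistribution and bridges, and models just the flag.
    """
    placed = {node: [] for node in nodes}
    round_robin = cycle(nodes)
    for message in messages:
        # true: round-robin over every node; false: keep the message local.
        target = next(round_robin) if forward_when_no_consumers else local
        placed[target].append(message)
    return placed


nodes = ["server1", "server2"]
stuck = route(range(4), nodes, "server1", forward_when_no_consumers=False)
spread = route(range(4), nodes, "server1", forward_when_no_consumers=True)
```

With the flag set to false, `stuck["server2"]` stays empty, which matches the reported symptom that the bridge on server 2 has nothing to forward.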
Setting forward-when-no-consumers=true requires a restart and an outage in production.
Another problem is that setting this to true disables redistribution of messages between servers 3 and 4, which means that messages will be stuck on server 3.
I've checked that the HornetQ core bridge on server 2 does not trigger redistribution of messages from server 1 to server 2. If it did, no configuration changes or restarts would be needed.
Do you think there is a reason why the core bridge does not trigger redistribution?
Redistribution only occurs when there are no local consumers on the queue. Server 1 has a consumer on InQueue (i.e. the bridge) so no messages will ever be redistributed from Server 1 to Server 2 (and vice-versa).
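The rule stated above can be written as a small predicate. This is a sketch of the rule as described in this comment, not of HornetQ's actual implementation; the function name and parameters are hypothetical, and it additionally assumes that a negative `redistribution-delay` disables redistribution and that a remote consumer must exist for messages to move.

```python
def should_redistribute(local_consumers, remote_consumers, redistribution_delay_ms):
    """Return True if messages on a queue would be redistributed to another node.

    Hypothetical model: encodes only the stated rule, not observed behavior.
    """
    if redistribution_delay_ms < 0:
        # Assumption: a negative delay means redistribution is switched off.
        return False
    if local_consumers > 0:
        # A local consumer (for example a core bridge) blocks redistribution.
        return False
    # Nothing to redistribute to unless some other node has a consumer.
    return remote_consumers > 0


# Server 1 in the original scenario: its own bridge counts as a local consumer.
original_scenario = should_redistribute(
    local_consumers=1, remote_consumers=1, redistribution_delay_ms=0
)
```

Under these assumptions `original_scenario` is False: the bridge on server 1 is itself a consumer of InQueue, so nothing moves to server 2.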
Re: Setting forward-when-no-consumers=true requires restart and outage in production.
Won't updating the version of EAP also require a restart and outage in production?
Re: Other problem is that setting this to true disables redistribution of messages between servers 3 and 4. Which means that messages will be stuck on server 3.
According to your original description of the environment both Server 3 and Server 4 have local consumers which means there would never be any redistribution anyway.
Either way you look at it we have a tough choice to make. Either we force existing customers to change their configuration or we revert the change from BZ#1222900 in which case a customer won't have the functionality they are looking for. Personally I think that BZ#1222900 should have been treated more like a feature request than a bug because of the semantic changes.
To be clear, we removed the 'forward-when-no-consumers' configuration element in Artemis because it has confused users for so long. We now have the 'message-load-balancing' configuration element with 3 choices rather than a boolean.
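For readers mapping old configurations to the new element, the sketch below shows one way to express the correspondence. The three value names (OFF, STRICT, ON_DEMAND) and the mapping from the old boolean are taken from my reading of the Artemis documentation, not from this thread, so treat them as assumptions; the Python names are purely illustrative.

```python
from enum import Enum


class MessageLoadBalancing(Enum):
    """Assumed values of the Artemis 'message-load-balancing' element."""

    OFF = "OFF"              # no forwarding to other cluster nodes
    STRICT = "STRICT"        # forward regardless of remote consumers
    ON_DEMAND = "ON_DEMAND"  # forward only to nodes with matching consumers


def from_legacy(forward_when_no_consumers):
    """Map the old boolean to the assumed equivalent new setting."""
    if forward_when_no_consumers:
        return MessageLoadBalancing.STRICT
    return MessageLoadBalancing.ON_DEMAND
```

Note that OFF has no counterpart in the old boolean, which is part of why a three-way setting is less ambiguous.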
> Redistribution only occurs when there are no local consumers on the queue.
> Server 1 has a consumer on InQueue (i.e. the bridge) so no messages will
> ever be redistributed from Server 1 to Server 2 (and vice-versa).
I've verified that the core bridge does not trigger message redistribution by modifying the test scenario in the following way:
- I've undeployed the core bridge from server 1.
- Started servers 1, 2 and 4. (Server 3 is not started for the whole duration of the test.)
- The core bridge is configured only on server 2 and sends messages from InQueue on server 2 to OutQueue on server 4.
- Started a producer which sends messages to InQueue on server 1 and a consumer which receives messages from OutQueue on server 4.
Result: The consumer did not receive any messages from server 4. All messages stayed in InQueue on server 1, which means that the core bridge on server 2 did not trigger message redistribution.
> According to your original description of the environment both Server 3 and
> Server 4 have local consumers which means there would never be any
> redistribution anyway.
In our scenario there is no consumer on server 3. This scenario was based on feedback from the customer.
> Won't updating the version of EAP also require a restart and outage in
> production?
Modifying the configuration is not part of a CP update.
> Either way you look at it we have a tough choice to make. Either we force
> existing customers to change their configuration or we revert the change
> from BZ#1222900 in which case a customer won't have the functionality they
> are looking for. Personally I think that BZ#1222900 should have been treated
> more like a feature request than a bug because of the semantic changes.
Message redistribution should work for core bridges.
Did you adjust the redistribution-delay to be >= 0?
redistribution-delay is set to 0. If a consumer tries to receive messages from InQueue on server 2, messages are redistributed to it from server 1.
Revert this commit as requested on the SP6 tag:
Verified in EAP 6.4.5.CP.CR2.
Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.