Created attachment 1118849: nova-compute log

Description of problem:
On my RHOS8 setup I can hit the problem described in https://bugs.launchpad.net/oslo.messaging/+bug/1519851. Sometimes I believe I see multiple unsuccessful attempts (as many as 11) to reconnect to different AMQP servers. After failover of a controller in an HA setup (3 controllers), it sometimes takes several minutes to get reconnected; see the attached debug log from nova-compute. I tested the patch from the upstream bug and it seemed to speed up reconnection. I suggest backporting the patch, which appears to be simple.

Version-Release number of selected component (if applicable):
python-oslo-messaging-2.5.0-1.el7ost.noarch

How reproducible:
Often

Steps to Reproduce:
1. Restart one of the controllers in an HA OpenStack setup.
2. Look at a service log, for example the nova-compute log, and check for a successful reconnection to a different AMQP server.

Actual results:
Reconnection sometimes takes several minutes, during which OpenStack is not operational.

Expected results:
Reconnection should take significantly less time.

Additional info:
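To put a number on the reconnection delay from step 2, a small script along these lines could scan the log. This is only a sketch under assumptions: the timestamp layout and the two message fragments ("is unreachable" and "Reconnected to AMQP server") are assumed, not taken from the attached log, and the function name is hypothetical, so adjust them to whatever the log actually contains.

```python
import re
from datetime import datetime

# Assumed log layout: each line starts with "YYYY-MM-DD HH:MM:SS.mmm".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")

# Assumed message fragments; change them to match the real log.
FAILURE = "is unreachable"
SUCCESS = "Reconnected to AMQP server"


def reconnect_gaps(lines):
    """Return one (failed_attempts, seconds) tuple per outage.

    An outage starts at the first failure line and ends at the next
    success line; failures are counted in between.
    """
    gaps = []
    first_failure = None
    attempts = 0
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip tracebacks and other continuation lines
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if FAILURE in line:
            if first_failure is None:
                first_failure = stamp
            attempts += 1
        elif SUCCESS in line and first_failure is not None:
            elapsed = (stamp - first_failure).total_seconds()
            gaps.append((attempts, elapsed))
            first_failure = None
            attempts = 0
    return gaps
```

Calling `reconnect_gaps()` with an open file handle for the nova-compute log gives one entry per outage, which makes it easy to compare the number of failed attempts and the delay before and after applying the patch.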
I think it's safe to backport this patch. I've proposed it upstream and I'll do the backport downstream after some feedback is provided on the upstream patch.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-0603.html