There are several ways that oslo.messaging can get hung when trying to reconnect to RabbitMQ. This upstream change, pending Juno backport, addresses this:

https://review.openstack.org/#/c/143805/

It would be good to get this backport into RHOS-6.0.
Please use the RDO patches branch; I've now rebased it on top of stable/juno:

https://github.com/redhat-openstack/oslo.messaging/commits/juno-patches

Does that include all the required patches, or do we need to cherry-pick something not yet backported to upstream stable/juno?
@Alan: good enough for me.
@Alan, it does provide the patch. @hguemar, this now needs to be built. Thank you, both
Verified using the following procedure:

* Start a service that uses oslo.messaging with rabbitmq, e.g. nova-compute
* Stop rabbitmq while running tail -F /var/log/nova/nova-compute.log
* Observe that the nova-compute amqp connection times out and it is trying to reconnect
* Restart rabbitmq
* Observe that the rabbitmq connection has been re-established

Without the fix, a reoccurring message was showing in the rabbitmq log:

=ERROR REPORT==== 18-Aug-2015::16:15:54 ===
closing AMQP connection <0.792.0> (10.35.160.37:42202 -> 10.35.160.37:5672):
{handshake_timeout,handshake}

Version:
2014.2.3
python-oslo-messaging-1.4.1-5.el7ost.noarch
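For anyone repeating this check, the log inspection can be scripted. The following is only a minimal sketch: the function name find_handshake_timeouts is hypothetical, and the regular expression assumes the broker log looks like the error report quoted above (a "closing AMQP connection" line followed by the reason). It is not part of oslo.messaging or RabbitMQ.

```python
import re

# Assumed format, taken from the error report quoted in this bug:
#   closing AMQP connection <0.792.0> (client_ip:port -> broker_ip:port):
CLOSING_RE = re.compile(
    r"closing AMQP connection <[\d.]+> "
    r"\((?P<client>[\d.]+:\d+) -> (?P<broker>[\d.]+:\d+)\):"
)


def find_handshake_timeouts(log_text):
    """Return (client, broker) pairs for connections closed with handshake_timeout."""
    hits = []
    pending = None  # connection seen in the current report, reason not yet seen
    for line in log_text.splitlines():
        # A new "=... REPORT====" header starts a new report; drop stale state.
        if line.startswith("="):
            pending = None
        match = CLOSING_RE.search(line)
        if match:
            pending = (match.group("client"), match.group("broker"))
        # The reason may sit on the same line or on a following line.
        if pending and "handshake_timeout" in line:
            hits.append(pending)
            pending = None
    return hits
```

Feed it the contents of the rabbitmq log (the path depends on the deployment) before and after restarting the broker. With the fix applied, no new entries should show up for the nova-compute host.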
(In reply to ushkalim from comment #19)
> Verified using the following procedure:
>
> * Start a service that uses oslo.messaging with rabbitmq e.g. nova-compute
>
> * Stop rabbitmq while tail -F /var/log/nova/nova-compute.log
>
> * Observe that nova-compute amqp times out and it is trying to reconnect
>
> * Restart rabbitmq
>
> * Observe that rabbitmq connection has re-established
>
> Without the fix a reoccurring message in rabbitmq log was showing:
> =ERROR REPORT==== 18-Aug-2015::16:15:54 ===
> closing AMQP connection <0.792.0> (10.35.160.37:42202 -> 10.35.160.37:5672):
> {handshake_timeout,handshake}
>
> Version:
> 2014.2.3
> python-oslo-messaging-1.4.1-5.el7ost.noarch

Awesome news. I've seen that handshake_timeout before and was wondering what was causing it from the client side. If this makes that go away, I will be extra happy.
*** Bug 1230134 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1659.html