+++ This bug was initially created as a clone of Bug #1060689 +++

The current loop is:

    delay = 1
    while True:
        # Close the session if necessary
        if self.connection.opened():
            try:
                self.connection.close()
            except qpid_exceptions.ConnectionError:
                pass

        broker = self.brokers[attempt % len(self.brokers)]
        attempt += 1

        try:
            self.connection_create(broker)
            self.connection.open()
        except qpid_exceptions.ConnectionError, e:
            msg_dict = dict(e=e, delay=delay)
            msg = _("Unable to connect to AMQP server: %(e)s. "
                    "Sleeping %(delay)s seconds") % msg_dict
            LOG.error(msg)
            time.sleep(delay)
            delay = min(2 * delay, 60)

Because the delay doubles after every failed attempt, this can lead to over 60 seconds of waiting time if the qpid server is not immediately available at startup. 60 seconds is too long for an HA environment, where timers need to be very aggressive to reduce downtime to the very minimum. This is a blocker for HA deployments.
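To make the waiting time concrete, the sleep schedule produced by the loop above can be reproduced in isolation. This is only an illustrative sketch: `backoff_delays` is a hypothetical helper, not part of oslo-rpc, and it models just the `delay = min(2 * delay, 60)` update from the quoted code.

```python
from itertools import islice


def backoff_delays(initial=1, factor=2, cap=60):
    """Yield successive sleep durations of a capped exponential backoff.

    Hypothetical helper mirroring `delay = min(2 * delay, 60)` from the
    quoted reconnect loop; it does not touch qpid at all.
    """
    delay = initial
    while True:
        yield delay
        delay = min(factor * delay, cap)


# Sleep durations for the first eight failed connection attempts.
delays = list(islice(backoff_delays(), 8))

# Total time slept once the sixth failed attempt has finished sleeping.
slept_after_six = sum(delays[:6])

print(delays)           # [1, 2, 4, 8, 16, 32, 60, 60]
print(slept_after_six)  # 63
```

After six consecutive failures the service has already slept 63 seconds in total, and every further attempt adds another 60 seconds.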
Note: this patch, if implemented, won't go upstream, since the oslo-rpc that we use accepts bug fixes only and this patch would be considered too 'featurey'. So we would need to carry our own downstream patch for each service that uses oslo-rpc if we want to see this in the current release.
RHOS 4.0 on RHEL6.5

python-neutron-2013.2.3-4.el6ost.noarch
python-neutronclient-2.3.4-1.el6ost.noarch
openstack-neutron-openvswitch-2013.2.3-4.el6ost.noarch
openstack-neutron-2013.2.3-4.el6ost.noarch

Killed the qpid service and checked the log:

2014-04-22 12:07:21.221 20972 ERROR neutron.openstack.common.rpc.impl_qpid [-] Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 1 seconds
2014-04-22 12:07:22.222 20972 ERROR neutron.openstack.common.rpc.impl_qpid [-] Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 2 seconds
2014-04-22 12:07:22.438 20972 ERROR neutron.openstack.common.rpc.impl_qpid [-] Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 3 seconds
2014-04-22 12:07:22.439 20972 ERROR neutron.openstack.common.rpc.impl_qpid [-] Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 3 seconds
2014-04-22 12:07:24.223 20972 ERROR neutron.openstack.common.rpc.impl_qpid [-] Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 3 seconds
2014-04-22 12:07:25.439 20972 ERROR neutron.openstack.common.rpc.impl_qpid [-] Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 4 seconds
2014-04-22 12:07:25.440 20972 ERROR neutron.openstack.common.rpc.impl_qpid [-] Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 4 seconds
2014-04-22 12:07:27.224 20972 ERROR neutron.openstack.common.rpc.impl_qpid [-] Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 4 seconds
2014-04-22 12:07:29.440 20972 ERROR neutron.openstack.common.rpc.impl_qpid [-] Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 5 seconds
2014-04-22 12:07:29.441 20972 ERROR neutron.openstack.common.rpc.impl_qpid [-] Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 5 seconds
2014-04-22 12:07:31.226 20972 ERROR neutron.openstack.common.rpc.impl_qpid [-] Unable to connect to AMQP server: [Errno 111] ECONNREFUSED. Sleeping 5 seconds
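The sleep values in the log above grow by one second per attempt (1, 2, 3, 4, 5) instead of doubling. For comparison, a linear schedule of that shape could be sketched as follows. The names are hypothetical, and the 5-second cap is an assumption made for illustration only: the log stops at 5 seconds, so it does not show whether or where the delay stops growing.

```python
from itertools import islice


def linear_delays(initial=1, step=1, cap=5):
    """Yield sleep durations that grow by `step` seconds up to `cap`.

    Hypothetical helper; `cap=5` is an assumed value, not one confirmed
    by the bug report.
    """
    delay = initial
    while True:
        yield delay
        delay = min(delay + step, cap)


# Sleep durations for the first seven failed connection attempts.
linear = list(islice(linear_delays(), 7))
print(linear)  # [1, 2, 3, 4, 5, 5, 5]
```

Under this assumed cap, a reconnect attempt would be made at least every 5 seconds once the schedule levels off, compared with every 60 seconds in the original loop.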
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2014-0516.html