The current loop is:

    delay = 1
    while True:
        # Close the session if necessary
        if self.connection.opened():
            try:
                self.connection.close()
            except qpid_exceptions.ConnectionError:
                pass

        broker = self.brokers[attempt % len(self.brokers)]
        attempt += 1

        try:
            self.connection_create(broker)
            self.connection.open()
        except qpid_exceptions.ConnectionError, e:
            msg_dict = dict(e=e, delay=delay)
            msg = _("Unable to connect to AMQP server: %(e)s. "
                    "Sleeping %(delay)s seconds") % msg_dict
            LOG.error(msg)
            time.sleep(delay)
            delay = min(2 * delay, 60)

This can lead to a waiting time of over 60 seconds if the qpid server is not immediately available at startup. 60 seconds is too long for an HA environment, where timers need to be very aggressive to keep downtime to a minimum. This is a blocker for HA deployments.
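
To make the waiting time concrete, the delay schedule of the loop quoted above can be worked out in isolation. This is only an illustrative sketch: `backoff_delays` and its parameters are hypothetical names, not part of the qpid driver, and the function reproduces just the `delay = min(2 * delay, 60)` arithmetic.

```python
def backoff_delays(attempts, start=1, cap=60):
    """Return the sleep durations used by the first `attempts` failed connects.

    Hypothetical helper: mirrors `delay = min(2 * delay, cap)` from the loop,
    with start=1 and cap=60 matching the hard-coded values quoted above.
    """
    delays = []
    delay = start
    for _ in range(attempts):
        delays.append(delay)
        delay = min(2 * delay, cap)
    return delays


delays = backoff_delays(7)
print(delays)           # [1, 2, 4, 8, 16, 32, 60]
print(sum(delays[:6]))  # 63
```

After six consecutive failures the process has already slept for 63 seconds in total, and every further failure adds another 60 seconds.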
This change will require adding a new variable to the qpid driver upstream. This is an issue for 2 reasons:

1. It requires adding a new config param, which is something that upstream stable branches don't allow unless there's a really good reason (critical bug or security bug).
2. The old oslo-rpc implementation is frozen and only accepts patches that actually fix bugs. This change would have to go to oslo.messaging, and RHOS4.0 doesn't support oslo.messaging.

All that being said, the patch is certainly doable and not very invasive.
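
As a rough idea of what such a change could look like, here is a minimal, self-contained sketch of the reconnect loop with the cap made configurable. It is not the actual patch: `connect_with_backoff`, `max_delay`, and the injectable `connect` and `sleep` callables are assumptions made for illustration, and the built-in `ConnectionError` stands in for `qpid_exceptions.ConnectionError`.

```python
import time


def connect_with_backoff(connect, brokers, max_delay=5, sleep=time.sleep):
    """Try brokers round-robin until connect(broker) succeeds.

    `max_delay` is a hypothetical setting replacing the hard-coded 60 seconds;
    it is not an existing qpid driver option.
    """
    attempt = 0
    delay = 1
    while True:
        broker = brokers[attempt % len(brokers)]
        attempt += 1
        try:
            return connect(broker)
        except ConnectionError as exc:
            print("Unable to connect to %s: %s. Sleeping %s seconds"
                  % (broker, exc, delay))
            sleep(delay)
            # Double the delay, but never exceed the configured cap.
            delay = min(2 * delay, max_delay)
```

With `max_delay=5`, the sleeps after successive failures are 1, 2, 4, 5, 5, ... seconds, so a broker that comes back is retried at most 5 seconds later instead of up to 60.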
Included in 2013.2.3 upstream stable/havana release.
Verified using openstack-cinder-2013.2.3-1.el6ost.noarch.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0577.html