Previously, the reconnection delay in the Qpid driver was hard-coded, not configurable, and capped at a high value. As a result, it became a blocker for high availability deployments. Because making the value configurable was not an option for this version, the hard-coded delay cap was lowered to a value better suited to high availability. The delay cap is now 5 seconds.
Description Fabio Massimo Di Nitto
2014-02-03 11:30:15 UTC
The current loop is:
delay = 1
while True:
    # Close the session if necessary
    if self.connection.opened():
        try:
            self.connection.close()
        except qpid_exceptions.ConnectionError:
            pass
    broker = self.brokers[attempt % len(self.brokers)]
    attempt += 1
    try:
        self.connection_create(broker)
        self.connection.open()
    except qpid_exceptions.ConnectionError, e:
        msg_dict = dict(e=e, delay=delay)
        msg = _("Unable to connect to AMQP server: %(e)s. "
                "Sleeping %(delay)s seconds") % msg_dict
        LOG.error(msg)
        time.sleep(delay)
        delay = min(2 * delay, 60)
This can lead to over 60 seconds of waiting time if the qpid server is not immediately available at startup.
60 seconds is too long for an HA environment, where timers need to be very aggressive to reduce downtime to the very minimum.
This is a blocker for HA deployments.
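To make the arithmetic concrete, here is a small sketch that reproduces only the doubling-with-cap rule from the loop above and lists the sleep durations it produces. The helper name `backoff_delays` and its parameters are made up for illustration and are not part of the driver; the sketch uses Python 3 syntax.

```python
def backoff_delays(cap, attempts, initial=1):
    """Return the sleep durations for the first `attempts` failed connects.

    Mirrors the rule in the loop: sleep for `delay`, then set
    delay = min(2 * delay, cap). Hypothetical helper, illustration only.
    """
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(delay)
        delay = min(2 * delay, cap)
    return delays


if __name__ == "__main__":
    # Compare the current cap (60) with a 5 second cap over 7 failed attempts.
    for cap in (60, 5):
        delays = backoff_delays(cap, attempts=7)
        print("cap=%d delays=%s total=%d" % (cap, delays, sum(delays)))
```

With a cap of 60, seven consecutive failures sleep for 1, 2, 4, 8, 16, 32 and 60 seconds, 123 seconds in total. With a cap of 5, the same seven failures sleep for 1, 2, 4, 5, 5, 5 and 5 seconds, 27 seconds in total.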
This change will require adding a new variable to the qpid driver upstream. This is an issue for 2 reasons:
1. It requires adding a new config param, which is something that upstream stable branches don't allow unless there's a really good reason (critical bug or security bug).
2. The old oslo-rpc implementation is frozen and only accepts patches that actually fix bugs. This change would have to go to oslo.messaging, and RHOS4.0 doesn't support oslo.messaging.
All that being said, the patch is certainly doable and not very invasive.
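For illustration only, a minimal sketch of what a parameterised delay cap could look like. This is not the actual patch: the names `reconnect`, `connect`, `max_delay` and `sleep` are hypothetical, the code uses Python 3 syntax, and the built-in `ConnectionError` stands in for `qpid_exceptions.ConnectionError`.

```python
import time


def reconnect(connect, brokers, max_delay=5, sleep=time.sleep):
    """Try brokers round-robin until connect(broker) succeeds.

    `connect` is a hypothetical callable that raises ConnectionError on
    failure. `max_delay` is the (assumed) cap on the backoff delay.
    Returns the broker that accepted the connection.
    """
    attempt = 0
    delay = 1
    while True:
        broker = brokers[attempt % len(brokers)]
        attempt += 1
        try:
            connect(broker)
        except ConnectionError:
            # Back off, doubling the delay up to the cap.
            sleep(delay)
            delay = min(2 * delay, max_delay)
        else:
            return broker
```

Passing `sleep` as an argument is a design choice made for the sketch so that the backoff sequence can be checked without actually waiting.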
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2014-0577.html