Bug 1060689 - cinder qpid reconnection delay must be more accurate
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z4
Target Release: 4.0
Assignee: Flavio Percoco
QA Contact: Dafna Ron
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-02-03 11:30 UTC by Fabio Massimo Di Nitto
Modified: 2016-04-26 13:23 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the Qpid driver's reconnection delay was hard-coded and not configurable, and its cap of 60 seconds was too high for high availability deployments, where it became a blocker. Because making the value configurable was not an option for this version, the hard-coded delay cap has been reduced to 5 seconds, which is more reasonable from the high availability perspective.
Clone Of:
: 1060711 1060772 1083414 1083415
Environment:
Last Closed: 2014-05-29 19:57:33 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHBA-2014:0577
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise Linux OpenStack Platform 4 Bug Fix and Enhancement Advisory
Last Updated: 2014-05-29 23:55:40 UTC

Description Fabio Massimo Di Nitto 2014-02-03 11:30:15 UTC
The current loop is:

        delay = 1
        while True:
            # Close the session if necessary
            if self.connection.opened():
                try:
                    self.connection.close()
                except qpid_exceptions.ConnectionError:
                    pass

            broker = self.brokers[attempt % len(self.brokers)]
            attempt += 1

            try:
                self.connection_create(broker)
                self.connection.open()
            except qpid_exceptions.ConnectionError, e:
                msg_dict = dict(e=e, delay=delay)
                msg = _("Unable to connect to AMQP server: %(e)s. "
                        "Sleeping %(delay)s seconds") % msg_dict
                LOG.error(msg)
                time.sleep(delay)
                delay = min(2 * delay, 60)

That can lead to a wait of over 60 seconds if the qpid server is not immediately available at startup.

60 seconds is too long for an HA environment, where timers need to be very aggressive to reduce downtime to the very minimum.

This is a blocker for HA deployments.
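
To make the timing concrete, here is a minimal sketch that reproduces only the delay arithmetic of the loop quoted above (start at 1 second, double after each failure, clamp to a cap). The helper name `backoff_delays` and its parameters are hypothetical and are not part of the cinder code; the connection handling is left out entirely.

```python
def backoff_delays(attempts, cap, initial=1):
    """Return the sleep durations used for `attempts` consecutive failures.

    Mirrors the arithmetic of the quoted loop: sleep for `delay`,
    then set delay = min(2 * delay, cap).
    """
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(delay)
        delay = min(2 * delay, cap)
    return delays


if __name__ == "__main__":
    # Compare the original cap (60) with a much lower one (5).
    for cap in (60, 5):
        delays = backoff_delays(8, cap)
        print("cap=%d delays=%s total=%d" % (cap, delays, sum(delays)))
```

With a cap of 60, six consecutive failures already add up to 63 seconds of sleeping, and every further failure adds another 60. With a cap of 5, eight consecutive failures add up to 32 seconds.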

Comment 1 Flavio Percoco 2014-02-04 14:13:47 UTC
This change will require adding a new variable to the qpid driver upstream. This is an issue for 2 reasons:

    1. It requires adding a new config param, which is something that upstream stable branches don't allow unless there's a really good reason (critical bug or security bug).

    2. The old oslo-rpc implementation is frozen and only accepts patches that actually fix bugs. This change would have to go to oslo.messaging, and RHOS 4.0 doesn't support oslo.messaging.


All that being said, the patch is certainly doable and not very invasive.
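
As a rough illustration of why such a change is small, the sketch below shows a reconnect loop in which the delay cap is a parameter instead of a literal. It is not the actual patch: `reconnect`, `max_delay`, `connect`, `sleep` and `BrokerUnavailable` are hypothetical names, and the exception is only a stand-in for `qpid_exceptions.ConnectionError` so the example runs without qpid installed.

```python
import itertools
import time


class BrokerUnavailable(Exception):
    """Hypothetical stand-in for qpid_exceptions.ConnectionError."""


def reconnect(brokers, connect, max_delay=5, sleep=time.sleep):
    """Cycle through `brokers` until `connect(broker)` succeeds.

    Returns (broker, seconds_slept). `max_delay` caps the doubling delay;
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    delay = 1
    waited = 0
    for broker in itertools.cycle(brokers):
        try:
            connect(broker)
        except BrokerUnavailable:
            sleep(delay)
            waited += delay
            delay = min(2 * delay, max_delay)
        else:
            return broker, waited


if __name__ == "__main__":
    attempts = []

    def flaky_connect(broker):
        # Fail the first three attempts, then succeed.
        attempts.append(broker)
        if len(attempts) < 4:
            raise BrokerUnavailable(broker)

    print(reconnect(["host1", "host2"], flaky_connect, sleep=lambda s: None))
```

Passing the cap in as an argument keeps the retry logic unchanged whether the value comes from a configuration option or stays a hard-coded constant.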

Comment 9 Alan Pevec 2014-04-03 22:50:12 UTC
Included in 2013.2.3 upstream stable/havana release.

Comment 18 Giulio Fidente 2014-04-17 13:34:56 UTC
Verified using openstack-cinder-2013.2.3-1.el6ost.noarch

Comment 20 errata-xmlrpc 2014-05-29 19:57:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0577.html

