Bug 1060689

Summary:	cinder qpid reconnection delay must be more accurate
Product:	Red Hat OpenStack	Reporter:	Fabio Massimo Di Nitto <fdinitto>
Component:	openstack-cinder	Assignee:	Flavio Percoco <fpercoco>
Status:	CLOSED ERRATA	QA Contact:	Dafna Ron <dron>
Severity:	urgent	Docs Contact:
Priority:	urgent
Version:	4.0	CC:	apevec, dnavale, dron, eharney, fdinitto, fpercoco, gfidente, scohen, srevivo, yeylon
Target Milestone:	z4	Keywords:	Rebase, ZStream
Target Release:	4.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Previously, the Qpid driver's reconnection delay was hard-coded, not configurable, and capped at a high value (60 seconds), which made it a blocker for high availability deployments. Because making the value configurable was not an option for this version, the hard-coded delay was adjusted to suit high availability setups: the delay cap is now reduced to 5 seconds.	Story Points:	---
Clone Of:
Clones:	1060711 1060772 1083414 1083415		Environment:
Last Closed:	2014-05-29 19:57:33 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Fabio Massimo Di Nitto 2014-02-03 11:30:15 UTC

The current loop is:

        delay = 1
        while True:
            # Close the session if necessary
            if self.connection.opened():
                try:
                    self.connection.close()
                except qpid_exceptions.ConnectionError:
                    pass

            broker = self.brokers[attempt % len(self.brokers)]
            attempt += 1

            try:
                self.connection_create(broker)
                self.connection.open()
            except qpid_exceptions.ConnectionError, e:
                msg_dict = dict(e=e, delay=delay)
                msg = _("Unable to connect to AMQP server: %(e)s. "
                        "Sleeping %(delay)s seconds") % msg_dict
                LOG.error(msg)
                time.sleep(delay)
                delay = min(2 * delay, 60)

That can lead to a waiting time of over 60 seconds if the qpid server is not immediately available at startup.

60 seconds is too long for an HA environment, where timers need to be very aggressive to keep downtime to a minimum.

This is a blocker for HA deployments.
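
For illustration only, here is a minimal sketch of the sleep schedule the loop above produces, compared with the 5-second cap mentioned in the Doc Text. The helper names `backoff_delays` and `schedule` are made up for this example and do not exist in cinder; the sketch only reproduces the `delay = min(2 * delay, cap)` arithmetic.

```python
from itertools import accumulate, islice


def backoff_delays(cap, start=1):
    """Yield the sleep before each retry: doubles each time, capped at `cap`."""
    delay = start
    while True:
        yield delay
        delay = min(2 * delay, cap)


def schedule(cap, failures):
    """Return (per-attempt sleeps, cumulative wait) after `failures` failed attempts."""
    delays = list(islice(backoff_delays(cap), failures))
    return delays, list(accumulate(delays))


# Hypothetical scenario: the broker stays unreachable for 8 connection attempts.
delays_60, total_60 = schedule(cap=60, failures=8)
delays_5, total_5 = schedule(cap=5, failures=8)

print(delays_60, total_60[-1])  # [1, 2, 4, 8, 16, 32, 60, 60] 183
print(delays_5, total_5[-1])    # [1, 2, 4, 5, 5, 5, 5, 5] 32
```

With the 60-second cap, a broker that comes back just after the seventh failed attempt is not retried for another full minute; with a 5-second cap the worst-case gap between retries is 5 seconds.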

Comment 1 Flavio Percoco 2014-02-04 14:13:47 UTC

This change will require adding a new variable to the qpid driver upstream. This is an issue for two reasons:

    1. It requires adding a new config param, which is something that upstream stable branches don't allow unless there's a really good reason (critical bug or security bug).

    2. The old oslo-rpc implementation is frozen and only accepts patches that actually fix bugs. This change would have to go to oslo.messaging, and RHOS 4.0 doesn't support oslo.messaging.


All that being said, the patch is certainly doable and not very invasive.
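
As a rough sketch of what a parameterised cap could look like (this is not the actual patch, and not the driver's code), the retry loop can take the cap as an argument. Everything here is hypothetical: `reconnect`, `max_delay`, the injected `connect` and `sleep` callables, and the broker addresses. Python's built-in `ConnectionError` stands in for `qpid_exceptions.ConnectionError`.

```python
import time


def reconnect(connect, brokers, max_delay=5, sleep=time.sleep):
    """Try brokers round-robin until one connects; return (broker, sleeps).

    `connect`, `max_delay` and `sleep` are names invented for this sketch.
    """
    attempt = 0
    delay = 1
    slept = []
    while True:
        broker = brokers[attempt % len(brokers)]
        attempt += 1
        try:
            connect(broker)
            return broker, slept
        except ConnectionError as exc:
            print("Unable to connect to %s: %s. Sleeping %s seconds"
                  % (broker, exc, delay))
            sleep(delay)
            slept.append(delay)
            # Exponential backoff, bounded by the caller-supplied cap.
            delay = min(2 * delay, max_delay)


def make_flaky_connect(failures):
    """Build a fake connect() that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def connect(broker):
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("broker down")

    return connect


# Skip real sleeping so the example runs instantly.
broker, slept = reconnect(make_flaky_connect(5),
                          ["host-a:5672", "host-b:5672"],
                          max_delay=5, sleep=lambda seconds: None)
print(broker, slept)  # host-b:5672 [1, 2, 4, 5, 5]
```

Passing the cap as a plain argument keeps the sketch independent of any config system; wiring it to a real configuration option is the part that the stable-branch policy above makes difficult.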

Comment 9 Alan Pevec 2014-04-03 22:50:12 UTC

Included in 2013.2.3 upstream stable/havana release.

Comment 18 Giulio Fidente 2014-04-17 13:34:56 UTC

Verified using openstack-cinder-2013.2.3-1.el6ost.noarch

Comment 20 errata-xmlrpc 2014-05-29 19:57:33 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0577.html