Bug 1312912

Summary: Option "rabbit_retry_backoff" from component config has no effect when autoretrying connection
Product: Red Hat OpenStack Reporter: Marian Krcmarik <mkrcmari>
Component: python-oslo-messaging Assignee: Flavio Percoco <fpercoco>
Status: CLOSED ERRATA QA Contact: Leonid Natapov <lnatapov>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 8.0 (Liberty) CC: apevec, jschluet, lhh, tbarron, vstinner, yeylon
Target Milestone: ga   
Target Release: 8.0 (Liberty)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: python-oslo-messaging-2.5.0-5.el7ost Doc Type: Bug Fix
Doc Text:
When the RabbitMQ service fails to deliver an AMQP message from one OpenStack service to another, it reconnects and retries delivery. The "rabbit_retry_backoff" option, whose default is 2 seconds, is supposed to control the pace of retries; however, retries were previously done every second irrespective of the configured value of this option. The consequence of this problem was excessive retries, for example, when an endpoint was not available. This problem has now been fixed, and the "rabbit_retry_backoff" option, as explicitly configured or with the default value of two seconds, properly controls message delivery retries.
Story Points: ---
Clone Of:
: 1313522 Environment:
Last Closed: 2016-04-07 21:31:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1313522    

Description Marian Krcmarik 2016-02-29 14:03:24 UTC
Description of problem:
The code in _drivers/impl_rabbit.py is broken: the autoretry reconnection is always triggered every second, regardless of the value of the rabbit_retry_backoff option (default: 2).

Specifically:
            autoretry_method = self.connection.autoretry(
                execute_method, channel=self.channel,
                max_retries=retry,
                errback=on_error,
                interval_start=self.interval_start or 1,
                interval_step=self.interval_stepping,
                on_revive=on_reconnection,
            )
The call does not pass an interval_max=self.interval_max parameter (self.interval_max is hardcoded to 30).
The kombu.connection.autoretry() method calls kombu.connection.ensure(), which sets interval_max to 1 by default, so backing off cannot have any effect: the interval can never grow beyond 1 second.
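
To illustrate why the cap cancels the backoff, here is a minimal sketch of a linearly growing retry interval with an upper bound. It is a simplified model, not kombu's actual implementation: the helper name retry_intervals is hypothetical, and only the parameter names and the interval_max default of 1 are taken from the description above.

```python
from itertools import islice


def retry_intervals(interval_start=1, interval_step=1, interval_max=1):
    """Yield successive sleep intervals for a linearly growing, capped backoff.

    Hypothetical helper: the defaults mirror the interval_max=1 default
    described in this report, but this is not kombu's code.
    """
    interval = interval_start
    while True:
        # The cap wins as soon as the raw interval exceeds interval_max.
        yield min(interval, interval_max)
        interval += interval_step


# Call as in the report: interval_max is omitted, so the cap stays at 1.
without_max = list(islice(retry_intervals(interval_start=1, interval_step=2), 5))

# Same call with the cap passed explicitly (30 is the value hardcoded in the driver).
with_max = list(islice(
    retry_intervals(interval_start=1, interval_step=2, interval_max=30), 5))

print(without_max)  # [1, 1, 1, 1, 1]
print(with_max)     # [1, 3, 5, 7, 9]
```

With the cap left at 1, the step of 2 never becomes visible, which matches the observed reconnection every second.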

It would also be nice if interval_max could be set in the component configs instead of being hardcoded to 30.

Version-Release number of selected component (if applicable):
python-oslo-messaging-2.5.0-4.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Set rabbit_retry_backoff to anything 2+
2. Take down rabbitmq server

Actual results:
Reconnection actually happens every second, instead of the interval being backed off by 2+ seconds on each try, up to 30 seconds.

Expected results:


Additional info:

Comment 2 Victor Stinner 2016-03-17 14:10:11 UTC
python-oslo-messaging-2.5.0-5.el7ost is ready for tests.

Comment 5 errata-xmlrpc 2016-04-07 21:31:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0603.html