Bug 1102910

Summary: TTL never set on messages, causes messages to live forever
Product: Red Hat OpenStack
Reporter: Mark Wagner <mwagner>
Component: openstack-neutron
Assignee: Ihar Hrachyshka <ihrachys>
Status: CLOSED ERRATA
QA Contact: Nir Magnezi <nmagnezi>
Severity: urgent
Priority: high
Version: 4.0
CC: chrisw, ihrachys, kgiusti, lpeer, nyechiel, oblaut, slong, tfreger, yeylon
Target Milestone: z5
Keywords: ZStream
Target Release: 4.0
Hardware: x86_64
OS: Linux
Fixed In Version: openstack-neutron-2013.2.4-4.el6ost
Doc Type: Bug Fix
Doc Text:
In the previous version, a Qpid OpenStack Networking (neutron) client created a new queue instead of reusing the old one. This meant that old Qpid queues could be left abandoned, using precious broker resources, and that messages piled up in the queue were never consumed. With this update, the Qpid queue name is reused on reconnect. This ensures that old Qpid queues are no longer abandoned by OpenStack AMQP clients, and all existing messages are consumed.
Clones: 1147618 (view as bug list)
Last Closed: 2014-10-22 17:23:10 UTC
Type: Bug
Bug Blocks: 1081488, 1147618    

Description Mark Wagner 2014-05-29 19:36:26 UTC
Description of problem:

The core neutron routines never set a timeout, and thus never set a TTL (Time To Live), on any of the Qpid messages. Meanwhile, the waiters use rpc_response_timeout to determine when to give up on a response. As a result, messages that nothing is waiting for anymore can build up in the queues. This causes additional, wasted processing overhead and contributes to a longer backlog.
 
Version-Release number of selected component (if applicable):
openstack-neutron.noarch 2013.2.3-6.el6ost

How reproducible:
Design flaw, so it happens every time.

Steps to Reproduce:
1. Use the product.

Actual results:
Messages never get a TTL, so large backlogs build up when things go awry.

Expected results:
Messages that are no longer being waited for should get deleted.

Additional info:

Simple two-liner in /usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/impl_qpid.py

At the top of topic_send():
        if timeout is None:
            timeout = self.conf.rpc_response_timeout
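
For context, a minimal sketch of how that defaulted timeout would flow into the Qpid message TTL, assuming the surrounding topic_send() looks roughly like the Havana-era oslo-incubator code (qpid_messaging, TopicPublisher and publisher_send() belong to that module and are not shown here):

    def topic_send(self, topic, msg, timeout=None):
        """Send a 'topic' message."""
        # Proposed addition: fall back to the RPC response timeout so
        # every message carries a TTL instead of living forever.
        if timeout is None:
            timeout = self.conf.rpc_response_timeout

        # The timeout (in seconds) is passed through as the Qpid message
        # TTL, letting the broker expire messages nobody will consume.
        qpid_message = qpid_messaging.Message(content=msg, ttl=timeout)
        self.publisher_send(TopicPublisher, topic, qpid_message)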

Comment 1 Mark Wagner 2014-05-29 19:38:44 UTC
This should be a RHEL-OSP Havana-A5 target.

Comment 2 Mark Wagner 2014-05-29 19:46:29 UTC
Also note that this proposed change is not the ideal solution. It does not correctly account for the overall TTL. In fact, it uses two different TTLs: one for the message from the client to the server, and another for the response from the server to the client.

The ideal solution will factor in the entire elapsed time and use a single TTL.

For example, the client sends a request with a TTL of 60 seconds. The server pulls the request off the queue, processes it, and prepares the response. At that point, the server needs to determine how much time is left on the original TTL and use that remaining value when placing the response on the queue, because the client will time out once its original TTL expires.
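
A rough illustration of that idea (remaining_ttl, received_at, original_ttl and reply_publisher are hypothetical names used only for this sketch, not existing neutron/oslo APIs):

    import time

    def remaining_ttl(received_at, original_ttl):
        """Return how much of the client's original TTL is still left."""
        elapsed = time.time() - received_at
        return max(original_ttl - elapsed, 0)

    # Server side, once the response is ready:
    #   received_at  -- timestamp taken when the request was pulled off the queue
    #   original_ttl -- TTL the client attached to the request (e.g. 60 seconds)
    def send_reply(reply_publisher, response, received_at, original_ttl):
        ttl = remaining_ttl(received_at, original_ttl)
        if ttl > 0:
            # Use only the time the client is still willing to wait.
            reply_publisher.send(response, ttl=ttl)
        # else: the client has already timed out; queueing the reply would
        # only leave another unconsumed message on the broker.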

Comment 4 Ihar Hrachyshka 2014-09-26 16:03:13 UTC
I've talked to oslo.messaging cores, they say setting TTL for reply messages is not needed. Instead, we should make sure queues are auto-deleted (bug 1099657) and clients reuse the same queue on reconnect (see launchpad bug in external tracker list).
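
For illustration only, the general shape of that approach, not the actual oslo/neutron patch (ReplyConsumer and the Qpid address string below are assumptions): pick the reply queue name once, declare it auto-delete, and reattach to the same name on reconnect so nothing piles up in an abandoned queue.

    import uuid

    class ReplyConsumer(object):
        """Sketch of a consumer that keeps its queue name across reconnects."""

        def __init__(self, session):
            # The name is chosen once for the lifetime of the consumer,
            # instead of a fresh UUID on every (re)connect.
            self.queue_name = "reply_%s" % uuid.uuid4().hex
            self._declare(session)

        def _declare(self, session):
            # auto-delete tells the broker to drop the queue once the last
            # consumer detaches, so nothing is left abandoned on failover.
            address = ("%s ; {create: always, "
                       "node: {x-declare: {auto-delete: True}}}" % self.queue_name)
            self.receiver = session.receiver(address)

        def reconnect(self, session):
            # Reattach to the SAME queue so messages that piled up while the
            # client was away are still consumed.
            self._declare(session)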

Comment 5 Ihar Hrachyshka 2014-09-29 13:22:56 UTC
Moving to A6 as per Livnat's request. The reasoning is that the bug is internal and is not something any known customer is waiting for.

Comment 7 Nir Magnezi 2014-10-08 08:15:17 UTC
Verified NVR: openstack-neutron-2013.2.4-4.el6ost.noarch

Verified that the fix was incorporated in the package.

Comment 9 errata-xmlrpc 2014-10-22 17:23:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2014-1686.html