Bug 1102910 - TTL never set on messages, causes messages to live forever
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: z5
Target Release: 4.0
Assignee: Ihar Hrachyshka
QA Contact: Nir Magnezi
URL:
Whiteboard:
Depends On:
Blocks: 1081488 1147618
 
Reported: 2014-05-29 19:36 UTC by Mark Wagner
Modified: 2022-07-09 07:05 UTC
9 users

Fixed In Version: openstack-neutron-2013.2.4-4.el6ost
Doc Type: Bug Fix
Doc Text:
In the previous version, a Qpid OpenStack Networking (neutron) client created a new queue instead of reusing the old one. This meant that old Qpid queues could be left abandoned, using precious broker resources, and that messages piled up in the queue were never consumed. With this update, the Qpid queue name is reused on reconnect. This ensures that old Qpid queues are no longer abandoned by OpenStack AMQP clients, and all existing messages are consumed.
Clone Of:
: 1147618 (view as bug list)
Environment:
Last Closed: 2014-10-22 17:23:10 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 105494 0 None None None Never
OpenStack gerrit 124597 0 None None None Never
Red Hat Issue Tracker OSP-16503 0 None None None 2022-07-09 07:05:32 UTC
Red Hat Product Errata RHSA-2014:1686 0 normal SHIPPED_LIVE Moderate: openstack-neutron security and bug fix update 2014-10-22 21:21:18 UTC

Description Mark Wagner 2014-05-29 19:36:26 UTC
Description of problem:

The core neutron routines never set a timeout, and thus never set a TTL (Time To Live), on any of the qpid messages. Meanwhile, the waiters use rpc_response_timeout to determine when to give up on a response. As a result, messages that nothing is waiting for any longer can build up in the queues. This causes additional, wasted processing overhead and contributes to a longer backlog.
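The failure mode can be sketched with a toy broker (a minimal illustration with hypothetical names, not the actual oslo rpc code):

```python
import time
from collections import deque

RPC_RESPONSE_TIMEOUT = 60  # seconds; mirrors neutron's rpc_response_timeout

class ToyBroker:
    """Toy broker: a message with no TTL is never expired."""
    def __init__(self):
        self.queue = deque()

    def publish(self, body, ttl=None):
        # A None TTL means the broker keeps the message forever.
        expires = time.time() + ttl if ttl is not None else None
        self.queue.append((body, expires))

    def purge_expired(self):
        now = time.time()
        self.queue = deque(
            (body, exp) for body, exp in self.queue
            if exp is None or exp > now
        )

broker = ToyBroker()
broker.publish("reply-1")            # no TTL, as neutron sent them
broker.publish("reply-2", ttl=0.01)  # with a TTL, as the fix intends

time.sleep(0.05)
broker.purge_expired()
# The waiter gave up after rpc_response_timeout long ago, yet the
# TTL-less message is still sitting in the queue.
print([body for body, _ in broker.queue])  # ['reply-1']
```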
 
Version-Release number of selected component (if applicable):
openstack-neutron.noarch 2013.2.3-6.el6ost

How reproducible:
Design flaw, so it reproduces every time.

Steps to Reproduce:
1. Use the product.

Actual results:
Messages never get a TTL, so large backlogs build up when things go awry.

Expected results:
Messages that are no longer being waited for should be deleted.

Additional info:

A simple two-liner in /usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/impl_qpid.py:

At the top of topic_send():
        if timeout is None:
            timeout = self.conf.rpc_response_timeout
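In context, the change amounts to defaulting the argument before the message is built. A minimal self-contained sketch (the FakeConf/Publisher names are hypothetical stand-ins, not the real impl_qpid classes):

```python
class FakeConf:
    """Stand-in for neutron's config object (hypothetical)."""
    rpc_response_timeout = 60  # seconds, the oslo rpc default

class Publisher:
    """Minimal stand-in for the impl_qpid topic publisher."""
    def __init__(self, conf):
        self.conf = conf
        self.sent = []

    def topic_send(self, topic, msg, timeout=None):
        # The proposed fix: fall back to rpc_response_timeout so
        # every message carries a TTL instead of living forever.
        if timeout is None:
            timeout = self.conf.rpc_response_timeout
        self.sent.append((topic, msg, timeout))

p = Publisher(FakeConf())
p.topic_send("q-plugin", {"method": "ping"})
print(p.sent[-1][2])  # 60
```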

Comment 1 Mark Wagner 2014-05-29 19:38:44 UTC
This should be a RHEL-OSP Havana A5 target.

Comment 2 Mark Wagner 2014-05-29 19:46:29 UTC
Also note that this proposed change is not the ideal solution. It does not correctly factor in the overall TTL; in fact it uses two different TTLs, one for the message from the client to the server and another for the response from the server to the client.

The ideal solution will factor in the entire elapsed time and use a single TTL.

For example, the client sends a request with a TTL of 60 seconds. The server pulls the request off the queue, processes it, and prepares the response. At that point, the server needs to determine how much time is left on the original TTL and use that value when putting the response into the queue, because the client will time out at its original TTL.
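The deadline arithmetic described above can be sketched as follows (a conceptual illustration; the function name is hypothetical):

```python
import time

def remaining_ttl(deadline):
    """Seconds left until the client's original deadline, floored at zero."""
    return max(0.0, deadline - time.monotonic())

# Client side: the request carries a 60 s TTL and, implicitly,
# an absolute deadline.
client_ttl = 60.0
deadline = time.monotonic() + client_ttl

# Server side: after pulling and processing the request, the reply's
# TTL is whatever time the client still has left, not a fresh 60 s.
time.sleep(0.02)  # simulate request handling
reply_ttl = remaining_ttl(deadline)
assert 0.0 < reply_ttl < client_ttl
```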

Comment 4 Ihar Hrachyshka 2014-09-26 16:03:13 UTC
I've talked to oslo.messaging cores, they say setting TTL for reply messages is not needed. Instead, we should make sure queues are auto-deleted (bug 1099657) and clients reuse the same queue on reconnect (see launchpad bug in external tracker list).
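The queue-reuse idea can be illustrated with a toy broker (hypothetical names; the real fix lives in the oslo qpid driver):

```python
import uuid

class NamedQueueBroker:
    """Toy broker that keys queues by name."""
    def __init__(self):
        self.queues = {}

    def declare(self, name):
        # Declaring an existing name attaches to the old queue and
        # its backlog instead of abandoning it on the broker.
        return self.queues.setdefault(name, [])

class ReplyClient:
    """Client that keeps a single reply-queue name for its lifetime."""
    def __init__(self, broker):
        self.broker = broker
        self.queue_name = "reply_%s" % uuid.uuid4().hex  # chosen once
        self.connect()

    def connect(self):
        self.queue = self.broker.declare(self.queue_name)

broker = NamedQueueBroker()
client = ReplyClient(broker)
broker.queues[client.queue_name].append("stale reply")

client.connect()  # reconnect: same name, so the backlog is drained
print(client.queue)        # ['stale reply']
print(len(broker.queues))  # 1 -- no abandoned duplicate queue
```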

Comment 5 Ihar Hrachyshka 2014-09-29 13:22:56 UTC
Moving to A6 as per Livnat's request. The reasoning is that the bug is internal and is not something any known customer is waiting for.

Comment 7 Nir Magnezi 2014-10-08 08:15:17 UTC
Verified NVR: openstack-neutron-2013.2.4-4.el6ost.noarch

Verified that the fix was incorporated in the package.

Comment 9 errata-xmlrpc 2014-10-22 17:23:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2014-1686.html

