Bug 1050213 - Thread consuming qpid messages can die silently
Summary: Thread consuming qpid messages can die silently
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z4
Target Release: 3.0
Assignee: Xavier Queralt
QA Contact: Gabriel Szasz
URL:
Whiteboard:
Depends On:
Blocks: 1024651 1050214 1050215 1050216 1050217
Reported: 2014-01-08 21:03 UTC by Russell Bryant
Modified: 2019-09-09 16:42 UTC (History)

Fixed In Version: openstack-nova-2013.1.4-4.el6ost
Doc Type: Bug Fix
Doc Text:
Unhandled errors in the Qpid consuming thread could kill it silently and isolate the component from the rest of the system. To fix this, the consuming thread has been made more resilient to errors by ensuring it doesn't die on an unhandled error. Compute now logs the error and retries the consuming thread.
Clone Of:
: 1050214 1050215 1050216 1050217
Environment:
Last Closed: 2014-01-30 20:00:28 UTC
Target Upstream Version:
Embargoed:


Attachments
backport of fix for nova (2.90 KB, patch)
2014-01-08 21:03 UTC, Russell Bryant
reproducing failure mode (3.39 KB, patch)
2014-01-09 15:28 UTC, Russell Bryant


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1189711 0 None None None Never
Red Hat Product Errata RHSA-2014:0112 0 normal SHIPPED_LIVE Moderate: openstack-nova security and bug fix update 2014-01-31 00:58:47 UTC

Description Russell Bryant 2014-01-08 21:03:19 UTC
Created attachment 847344 [details]
backport of fix for nova

The code for receiving and processing qpid messages runs in its own greenthread.  Unfortunately, if one particular code path raises an exception, the greenthread dies silently, without any entry in the log file.

In particular, the code in question is:

https://git.openstack.org/cgit/openstack/nova/tree/nova/openstack/common/rpc/impl_qpid.py?h=stable/grizzly#n468

If self.session.next_receiver() raises an exception other than qpid's Empty or ConnectionError exceptions, the thread dies and no more messages are received.

The fix is to backport the portion of the following change that applies to impl_qpid.  That includes the new decorator in excutils and the change to impl_qpid.py.
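
For illustration only, here is a minimal sketch of the retry-decorator idea. It is not the code from the upstream change. The wrapper logs any uncaught exception and calls the wrapped function again instead of letting the consuming thread exit. The names retry_uncaught, retry_delay, FlakyReceiver and consume are hypothetical; the real decorator in excutils may differ in signature and in how it limits repeated log entries.

```python
import functools
import logging
import time

LOG = logging.getLogger(__name__)


def retry_uncaught(retry_delay=1.0):
    """Hypothetical, simplified retry decorator (not the excutils one).

    Any uncaught Exception is logged and the call is retried after
    retry_delay seconds, so the calling thread never dies from it.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            while True:
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Log with traceback so the failure is visible.
                    LOG.exception("Unexpected exception in %s, retrying",
                                  func.__name__)
                    time.sleep(retry_delay)
        return wrapper
    return decorator


class FlakyReceiver(object):
    """Hypothetical stand-in for a session whose next_receiver() may raise."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def next_receiver(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("unexpected broker error")
        return "message"


@retry_uncaught(retry_delay=0.01)
def consume(session):
    # Without the decorator, the RuntimeError would propagate and end
    # whatever thread is running this function.
    return session.next_receiver()
```

Calling consume(FlakyReceiver(failures=2)) logs two tracebacks and then returns the message from the third attempt. Retrying forever is a deliberate trade-off here: a consumer that keeps logging and retrying is easier to diagnose than one that has silently stopped.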

Comment 1 Russell Bryant 2014-01-08 21:11:55 UTC
The change that needs to be backported is: https://review.openstack.org/#/c/32235/13

I also attached a version of the backport for nova to this bug (1050213).

Comment 3 Russell Bryant 2014-01-09 15:28:04 UTC
Created attachment 847690 [details]
reproducing failure mode

Just so I don't forget it, this is how I tested this bug (attached patch).  I used the attached patch to force an exception to occur in the qpid reply thread roughly 30 seconds after the nova-compute service started.  With the @excutils.forever_retry_uncaught_exceptions decorator applied, the exception is logged and operation continues normally.  Without the decorator, the thread dies silently, and the only entries in the compute log come from nova-compute no longer receiving any responses: you see timeout errors waiting for responses from conductor.
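
The failure mode without the decorator can be sketched in a self-contained way. This is not the attached patch: it uses a standard-library thread as a stand-in for the eventlet greenthread, and FaultyReceiver, consume_forever and run_demo are hypothetical names. Standard-library threads normally print a traceback to stderr when they die, so the sketch suppresses that (threading.excepthook, Python 3.8+) to mimic the silent death described above.

```python
import queue
import threading


class FaultyReceiver(object):
    """Hypothetical receiver that raises an unexpected error on its Nth call."""

    def __init__(self, fail_on_call):
        self.fail_on_call = fail_on_call
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.calls == self.fail_on_call:
            raise RuntimeError("injected failure")
        return "reply-%d" % self.calls


def consume_forever(receiver, replies):
    # No handler for unexpected exceptions: the first one ends the thread.
    while True:
        replies.put(receiver.fetch())


def run_demo():
    old_hook = threading.excepthook
    # Assumption for the demo: swallow the traceback to mimic a silent death.
    threading.excepthook = lambda args: None
    try:
        replies = queue.Queue()
        receiver = FaultyReceiver(fail_on_call=3)
        worker = threading.Thread(target=consume_forever,
                                  args=(receiver, replies), daemon=True)
        worker.start()
        worker.join(timeout=5)  # returns once the thread has died

        received = []
        timed_out = False
        while not timed_out:
            try:
                received.append(replies.get(timeout=0.1))
            except queue.Empty:
                # The only visible symptom: the caller times out waiting.
                timed_out = True
        return worker.is_alive(), received, timed_out
    finally:
        threading.excepthook = old_hook
```

run_demo() returns a dead worker, the two replies delivered before the injected failure, and a timeout on the caller's side, which is the same shape of symptom as the conductor timeouts in the compute log.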

Comment 12 errata-xmlrpc 2014-01-30 20:00:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0112.html

