Bug 1050217

Summary: Thread consuming qpid messages can die silently
Product: Red Hat OpenStack Reporter: Russell Bryant <rbryant>
Component: openstack-ceilometer Assignee: Eoghan Glynn <eglynn>
Status: CLOSED ERRATA QA Contact: Kevin Whitney <kwhitney>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.0 CC: ajeain, jruzicka, ndipanov, pbrady, sclewis, sradvan, srevivo, yeylon
Target Milestone: z4 Keywords: ZStream
Target Release: 3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-ceilometer-2013.1.4-2.el6ost Doc Type: Bug Fix
Doc Text:
Cause: Not all potential errors were explicitly handled in the QPID driver's consuming thread.
Consequence: An unhandled error in the QPID driver's consuming thread could cause the thread to die silently, so that the agent received no further incoming messages.
Fix: The consuming thread has been made more resilient to potential errors.
Result: Errors are now logged, and message consumption is then retried.
Story Points: ---
Clone Of: 1050213 Environment:
Last Closed: 2014-01-30 19:51:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1050213    
Bug Blocks: 1050214, 1050215, 1050216    

Description Russell Bryant 2014-01-08 21:05:59 UTC
+++ This bug was initially created as a clone of Bug #1050213 +++

The code for receiving and processing qpid messages runs in its own greenthread. Unfortunately, there is a code path on which, if an exception is raised, the greenthread dies silently, without any entry in the log file.

In particular, the code in question is:

https://git.openstack.org/cgit/openstack/nova/tree/nova/openstack/common/rpc/impl_qpid.py?h=stable/grizzly#n468

If self.session.next_receiver() raises any exception other than qpid's Empty or ConnectionError, the thread dies and no more messages are received.
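A minimal sketch of this failure mode, for illustration only. It assumes stdlib threading in place of an eventlet greenthread, and Empty, QpidConnectionError, FakeSession and consume_forever are hypothetical stand-ins for the qpid exceptions, the qpid session and the driver's consuming loop; it is not the impl_qpid code.

```python
import threading


class Empty(Exception):
    """Hypothetical stand-in for qpid's Empty exception."""


class QpidConnectionError(Exception):
    """Hypothetical stand-in for qpid's ConnectionError exception."""


class FakeSession:
    """Hypothetical session: the first call times out (Empty), the second
    fails with an error the consuming loop does not expect."""

    def __init__(self):
        self.calls = 0

    def next_receiver(self, timeout=None):
        self.calls += 1
        if self.calls == 1:
            raise Empty()
        raise ValueError("unexpected error from next_receiver()")


def consume_forever(session, received):
    # Only Empty and QpidConnectionError are handled; any other
    # exception escapes the loop and ends the thread.
    while True:
        try:
            receiver = session.next_receiver(timeout=1)
        except Empty:
            continue
        except QpidConnectionError:
            continue  # the real driver would reconnect here
        received.append(receiver)


received = []
session = FakeSession()
consumer = threading.Thread(target=consume_forever, args=(session, received))
consumer.start()
consumer.join(timeout=5)
# The consumer has exited. Python may print a traceback to stderr,
# but nothing reaches the application's log.
print(consumer.is_alive())  # False
```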

The fix is to backport the portion of the upstream change (linked in comment 1) that applies to impl_qpid: the new decorator in excutils and the change to impl_qpid.py.
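A minimal sketch of the kind of decorator described here, assuming the behavior summarized in the Doc Text (log the error, then retry). The name retry_uncaught_exceptions and its delay parameter are hypothetical and do not reproduce the excutils API from the upstream change.

```python
import functools
import logging
import time

LOG = logging.getLogger(__name__)


def retry_uncaught_exceptions(delay=1.0):
    """Hypothetical decorator factory: log any uncaught exception raised by
    the wrapped function, wait `delay` seconds, then call it again."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            while True:
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Record the traceback instead of dying silently.
                    LOG.exception("Unexpected exception in %s; retrying",
                                  func.__name__)
                    time.sleep(delay)
        return wrapper
    return decorator


attempts = []


@retry_uncaught_exceptions(delay=0)
def flaky_consume():
    # Hypothetical consumer step that fails twice before succeeding.
    attempts.append(1)
    if len(attempts) < 3:
        raise ValueError("unexpected error from next_receiver()")
    return "message"


logging.basicConfig(level=logging.INFO)
print(flaky_consume())  # message
print(len(attempts))    # 3
```

The sketch catches Exception rather than BaseException, so KeyboardInterrupt and SystemExit still stop the thread. The delay keeps a persistently failing call from spinning in a tight loop and flooding the log.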

Comment 1 Russell Bryant 2014-01-08 21:13:14 UTC
The change that needs to be backported is: https://review.openstack.org/#/c/32235/13

I have also attached a version of the backport for nova to bug 1050213.

Comment 3 Eoghan Glynn 2014-01-20 14:22:18 UTC
Fix proposed upstream to stable/grizzly:

  https://review.openstack.org/67838

(for completeness only, without the expectation that it will land, since that branch is now restricted to security fixes)

Comment 4 Eoghan Glynn 2014-01-20 14:24:49 UTC
Backport proposed to internal gerrit:

  https://code.engineering.redhat.com/gerrit/#/c/18581/

Comment 7 Kevin Whitney 2014-01-27 17:57:41 UTC
Verified:

1) The retry decorator is in place in:

  /usr/lib/python2.6/site-packages/ceilometer/openstack/common/rpc/impl_qpid.py

2) Package
    Installed Packages
    Name        : python-ceilometer
    Arch        : noarch
    Version     : 2013.1.4
    Release     : 2.el6ost

Comment 9 errata-xmlrpc 2014-01-30 19:51:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0110.html