Bug 1054246

Summary: Thread consuming qpid messages can die silently
Product: Red Hat OpenStack Reporter: Russell Bryant <rbryant>
Component: openstack-cinder Assignee: Eric Harney <eharney>
Status: CLOSED ERRATA QA Contact: Dafna Ron <dron>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 4.0 CC: adarazs, breeler, eharney, mlopes, ndipanov, sclewis, yeylon
Target Milestone: z1 Keywords: TestBlocker, ZStream
Target Release: 4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-cinder-2013.2.1-5.el6ost Doc Type: Bug Fix
Doc Text:
Previously, unhandled errors in the QPID consuming thread could cause the thread to stop functioning and isolate the component from the rest of the system. This has been fixed: the consuming thread is now more resilient and no longer dies on an unhandled error. Instead, it logs the error and resumes consuming.
Story Points: ---
Clone Of: 1050215 Environment:
Last Closed: 2014-01-23 14:22:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1025025, 1050214, 1065428    

Description Russell Bryant 2014-01-16 14:01:31 UTC
+++ This bug was initially created as a clone of Bug #1050215 +++

+++ This bug was initially created as a clone of Bug #1050213 +++

The code for receiving and processing qpid messages runs in its own greenthread. Unfortunately, there is a code path in which an unhandled exception causes the greenthread to die silently, without any entry in the log file.

In particular, the code in question is:

https://git.openstack.org/cgit/openstack/nova/tree/nova/openstack/common/rpc/impl_qpid.py?h=stable/grizzly#n468

If self.session.next_receiver() raises an exception other than qpid's Empty or ConnectionError exceptions, the thread will die and no more messages will be received.
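
To make the failure mode concrete, here is a simplified sketch of the pattern (not the actual impl_qpid.py code; names are illustrative): only qpid's Empty and ConnectionError are caught, so anything else escapes the loop and silently kills the consuming greenthread.

    # Simplified sketch of the vulnerable consuming loop; the real code in
    # impl_qpid.py is more involved (reconnect handling, consumer lookup, etc.).
    from qpid.messaging import exceptions as qpid_exceptions


    def consume_loop(session, lookup_consumer):
        while True:
            try:
                receiver = session.next_receiver()
            except qpid_exceptions.Empty:
                continue  # nothing to fetch yet, poll again
            except qpid_exceptions.ConnectionError:
                continue  # reconnect logic elided in this sketch
            # Anything else raised by next_receiver() is unhandled: the
            # greenthread dies without writing anything to the log.
            lookup_consumer(receiver).consume()

    # The loop runs in its own greenthread, e.g. via
    # eventlet.spawn(consume_loop, session, lookup_consumer), so nothing
    # else in the process notices when it dies.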

The fix is to backport the portion of the following change that applies to impl_qpid.  That includes the new decorator in excutils and the change to impl_qpid.py.
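
The backported change wraps the consuming code with a retry decorator added to excutils (forever_retry_uncaught_exceptions upstream). A minimal sketch of the idea, assuming simplified logging and retry behavior rather than the exact upstream implementation, with a hypothetical run_consumer entry point:

    import logging
    import time

    LOG = logging.getLogger(__name__)


    def forever_retry_uncaught_exceptions(func):
        # Sketch of the excutils decorator: log any unhandled exception and
        # keep retrying instead of letting the greenthread die silently.
        def wrapper(*args, **kwargs):
            while True:
                try:
                    return func(*args, **kwargs)
                except Exception:
                    LOG.exception('Unexpected exception occurred, retrying.')
                    time.sleep(1)  # brief pause before retrying (sketch only)
        return wrapper


    # Hypothetical consuming entry point: with the decorator applied, an
    # unexpected error from next_receiver() is logged and consumption
    # resumes instead of the greenthread dying without a trace.
    @forever_retry_uncaught_exceptions
    def run_consumer(session):
        while True:
            receiver = session.next_receiver()   # may raise unexpectedly
            receiver.fetch()                     # message processing elided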

--- Additional comment from Russell Bryant on 2014-01-08 16:12:34 EST ---

The change that needs to be backported is: https://review.openstack.org/#/c/32235/13



This is also needed for Cinder in RHOS 4.0.

Comment 7 Dafna Ron 2014-01-20 15:15:55 UTC
After IRC discussion, we can verify this based on automation sanity runs for the new OpenStack-4.0 puddle: 2014-01-16.1



*Glance*

- localfs / all passed

http://jenkins.rhev.lab.eng.brq.redhat.com:8080/view/RHOS-Storage-QE/view/Havana/job/rhos-4.0-rhel-6.5-glance-localfs/164/

- s3 / all passed

http://jenkins.rhev.lab.eng.brq.redhat.com:8080/view/RHOS-Storage-QE/view/Havana/job/rhos-4.0-rhel-6.5-glance-s3/178/

- swift / all passed

http://jenkins.rhev.lab.eng.brq.redhat.com:8080/view/RHOS-Storage-QE/view/Havana/job/rhos-4.0-rhel-6.5-glance-swift/188/


*Cinder*

- lvm / all passed

http://jenkins.rhev.lab.eng.brq.redhat.com:8080/view/RHOS-Storage-QE/view/Havana/job/rhos-4.0-rhel-6.5-cinder-lvm/216/

- thinlvm / fails 1 test because of bz #1040609

http://jenkins.rhev.lab.eng.brq.redhat.com:8080/view/RHOS-Storage-QE/view/Havana/job/rhos-4.0-rhel-6.5-cinder-thinlvm/226/

- gluster / all passed

http://jenkins.rhev.lab.eng.brq.redhat.com:8080/view/RHOS-Storage-QE/view/Havana/job/rhos-4.0-rhel-6.5-cinder-gluster/272/

- nfs / all passed (except tests unsupported by the driver)

http://jenkins.rhev.lab.eng.brq.redhat.com:8080/view/RHOS-Storage-QE/view/Havana/job/rhos-4.0-rhel-6.5-cinder-nfs/219/


*Notes*

Cinder with EMC (iSCSI) and Cinder with NetApp (NFS) are unstable, probably because they are temporarily running too far from the actual hardware.

EMC is showing some real problems, though, which we will investigate, opening BZs where needed:

http://jenkins.rhev.lab.eng.brq.redhat.com:8080/view/RHOS-Storage-QE/view/Havana/job/rhos-4.0-rhel-6.5-cinder-emc_iscsi/6

Comment 10 Lon Hohberger 2014-02-04 17:19:42 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2014-0046.html