Bug 1050213
Summary: | Thread consuming qpid messages can die silently
---|---
Product: | Red Hat OpenStack
Reporter: | Russell Bryant <rbryant>
Component: | openstack-nova
Assignee: | Xavier Queralt <xqueralt>
Status: | CLOSED ERRATA
QA Contact: | Gabriel Szasz <gszasz>
Severity: | urgent
Docs Contact: |
Priority: | urgent
Version: | 3.0
CC: | dallan, dmaley, ndipanov, sgordon, slong, xqueralt, yeylon
Target Milestone: | z4
Keywords: | Triaged, ZStream
Target Release: | 3.0
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: |
Fixed In Version: | openstack-nova-2013.1.4-4.el6ost
Doc Type: | Bug Fix
Doc Text: | Unhandled errors in the Qpid consuming thread could kill it silently and isolate the component from the rest of the system. To fix this, the consuming thread has been made more resilient to errors by ensuring it doesn't die on an unhandled error. Compute now logs the error and retries the consuming thread.
Story Points: | ---
Clone Of: |
: | 1050214 1050215 1050216 1050217
Environment: |
Last Closed: | 2014-01-30 20:00:28 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | ---
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Embargoed: |
Bug Depends On: |
Bug Blocks: | 1024651, 1050214, 1050215, 1050216, 1050217
Attachments: |
Description
Russell Bryant
2014-01-08 21:03:19 UTC
The change that needs to be backported is: https://review.openstack.org/#/c/32235/13

I also attached a version of the backport for nova to bug 1050213.

Created attachment 847690
reproducing failure mode
Just so I don't forget it, this is how I tested this bug (attached patch). I used the attached patch to force an exception in the qpid reply thread roughly 30 seconds after the nova-compute service started. With the @excutils.forever_retry_uncaught_exceptions decorator applied, the exception is logged and operation continues normally. Without the decorator, the thread dies silently, and the only entries in the compute log come from nova-compute no longer receiving any responses: you see timeout errors waiting for responses from conductor.
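For anyone reading this later, here is a minimal sketch of the idea behind the decorator. It is not the actual excutils code and not the attached patch: the names `retry_forever`, `make_flaky_consumer`, and `run_in_thread` are made up for illustration, it is written for Python 3, and it uses plain `threading` rather than the greenthreads nova runs on. The point is only that a consuming loop wrapped this way logs the error and tries again, while an unwrapped one ends its thread on the first unhandled exception.

```python
import functools
import logging
import threading
import time

LOG = logging.getLogger(__name__)


def retry_forever(func, retry_delay=1.0):
    """Hypothetical stand-in for a 'retry on uncaught exception' decorator.

    Calls func again whenever it raises, logging the error each time,
    and returns only once func itself returns.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        while True:
            try:
                return func(*args, **kwargs)
            except Exception:
                # Log with traceback so the failure is visible, then retry.
                LOG.exception("Unexpected exception in %s, retrying",
                              func.__name__)
                time.sleep(retry_delay)
    return wrapper


def make_flaky_consumer(failures):
    """Build a fake consumer that raises `failures` times, then succeeds.

    It returns the number of calls made so far, so callers can see how
    many attempts were needed.
    """
    state = {"calls": 0}

    def consume():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RuntimeError("simulated failure in consuming thread")
        return state["calls"]

    return consume


def run_in_thread(target):
    """Run target in a thread and return the list of values it recorded.

    If target raises, the thread ends and the list stays empty.
    """
    results = []
    worker = threading.Thread(target=lambda: results.append(target()),
                              daemon=True)
    worker.start()
    worker.join(timeout=5)
    return results
```

Running `run_in_thread(make_flaky_consumer(2))` leaves the result list empty because the thread ends on the first exception, while `run_in_thread(retry_forever(make_flaky_consumer(2), retry_delay=0))` records a result after the two simulated failures are logged. Unlike the failure in this bug, plain Python threads print a traceback to stderr when they die, so the "silent" part is not reproduced by this sketch.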
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0112.html