+++ This bug was initially created as a clone of Bug #1085006 +++

While working with a partner on some problems in their system, I have observed two instances where the qpid client library gets into a bad state and the qpid connection thread in nova never recovers. An example of the exception is:

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/nova/openstack/common/excutils.py", line 78, in inner_func
    return infunc(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 698, in _consumer_thread
    self.consume()
  File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 689, in consume
    it.next()
  File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 606, in iterconsume
    yield self.ensure(_error_callback, _consume)
  File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 540, in ensure
    return method(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 597, in _consume
    nxt_receiver = self.session.next_receiver(timeout=timeout)
  File "<string>", line 6, in next_receiver
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 665, in next_receiver
    if self._ecwait(lambda: self.incoming, timeout):
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
    result = self._ewait(lambda: self.closed or predicate(), timeout)
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 571, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 214, in _ewait
    self.check_error()
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 207, in check_error
    raise self.error
InternalError: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/qpid/messaging/driver.py", line 667, in write
    self._op_dec.write(*self._seg_dec.read())
  File "/usr/lib/python2.6/site-packages/qpid/framing.py", line 269, in write
    if self.op.headers is None:
AttributeError: 'NoneType' object has no attribute 'headers'

There is some code that automatically detects if the thread dies with an exception. It will sleep for a second and retry. The code will sit in this loop forever, because every time it tries to run again it hits this error immediately. As a result, you see a message like this every minute or so:

2014-04-06 09:03:49.014 125211 ERROR root [-] Unexpected exception occurred 60 time(s)... retrying.

Part of the issue is that I don't think this should ever happen. However, if it does, Nova should be more tolerant and reset the connection instead of being stuck in this error forever.

--- Additional comment from Russell Bryant on 2014-04-09 16:21:12 EDT ---

This patch has been merged into both oslo.messaging and the rpc library in oslo-incubator. In RHOS 4.0, nothing had been converted to oslo.messaging, so this fix needs to be backported to all of the projects that include rpc from oslo-incubator. I will be cloning this bug to all affected projects.
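For readers unfamiliar with the failure mode, the difference between "sleep and retry" and "reset the connection, then retry" can be sketched roughly as below. This is only an illustration and not the merged patch: `FlakyConnection`, `corrupt`, `reconnect`, `consume_without_reset` and `consume_with_reset` are hypothetical names invented for this example, and the real qpid and oslo code paths are considerably more involved.

```python
import time


class FlakyConnection(object):
    """Hypothetical stand-in for a client connection that can get stuck."""

    def __init__(self):
        self.broken = False
        self.reconnects = 0

    def corrupt(self):
        # Simulate the client library entering a bad internal state.
        self.broken = True

    def reconnect(self):
        # Rebuilding the connection discards the bad state.
        self.broken = False
        self.reconnects += 1

    def consume(self):
        if self.broken:
            raise RuntimeError("connection is in a bad state")
        return "message"


def consume_without_reset(conn, max_attempts=5, delay=0.0):
    """Sleep and retry only: a stuck connection fails on every attempt."""
    failures = 0
    for _ in range(max_attempts):
        try:
            return conn.consume(), failures
        except Exception:
            failures += 1
            time.sleep(delay)
    return None, failures


def consume_with_reset(conn, max_attempts=5, delay=0.0):
    """Reset the connection after an unexpected error, then retry."""
    for _ in range(max_attempts):
        try:
            return conn.consume()
        except Exception:
            # Without this reset, every retry would hit the same error.
            conn.reconnect()
            time.sleep(delay)
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

With a connection that has been put into the bad state, the first helper exhausts its attempts without ever succeeding, while the second recovers after a single reconnect. The attempt limit exists only to keep the sketch finite; the loop described in this report retries indefinitely.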
Cloned to 3.0, as Nova uses rpc from oslo-incubator in RHOS 3.0.
In accordance with the Red Hat Enterprise Linux OpenStack Platform Support Policy, the one-year life cycle of Production Support for version 3 will end on July 31, 2014. On August 1, 2014, Red Hat Enterprise Linux OpenStack Platform version 3 will enter an inactive state and will no longer receive updated packages, including Critical-impact security patches or urgent-priority bug fixes. In addition, technical support through Red Hat's Global Support Services will no longer be provided after this date.

We encourage customers to plan their migration from Red Hat Enterprise Linux OpenStack Platform 3.0 to a supported version of Red Hat Enterprise Linux OpenStack Platform. To upgrade to Red Hat Enterprise Linux OpenStack Platform version 4, see Chapter "Upgrading" in the Release Notes document linked to in the References section.

Full details of the Red Hat Enterprise Linux OpenStack Platform Life Cycle can be found at https://access.redhat.com/support/policy/updates/openstack/platform/

https://rhn.redhat.com/errata/RHSA-2014-0995.html