+++ This bug was initially created as a clone of Bug #1085006 +++

While working with a partner on some problems in their system, I have observed two instances where the qpid client library gets into a bad state and the qpid connection thread in nova never recovers. An example of the exception is:

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/nova/openstack/common/excutils.py", line 78, in inner_func
    return infunc(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 698, in _consumer_thread
    self.consume()
  File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 689, in consume
    it.next()
  File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 606, in iterconsume
    yield self.ensure(_error_callback, _consume)
  File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 540, in ensure
    return method(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 597, in _consume
    nxt_receiver = self.session.next_receiver(timeout=timeout)
  File "<string>", line 6, in next_receiver
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 665, in next_receiver
    if self._ecwait(lambda: self.incoming, timeout):
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
    result = self._ewait(lambda: self.closed or predicate(), timeout)
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 571, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 214, in _ewait
    self.check_error()
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 207, in check_error
    raise self.error
InternalError: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/qpid/messaging/driver.py", line 667, in write
    self._op_dec.write(*self._seg_dec.read())
  File "/usr/lib/python2.6/site-packages/qpid/framing.py", line 269, in write
    if self.op.headers is None:
AttributeError: 'NoneType' object has no attribute 'headers'

There is some code that automatically detects if the thread dies with an exception. It will sleep for a second and retry. The code will sit in this loop forever: every time it tries to run again, it hits this error immediately. As a result, you see a message like this every minute or so:

2014-04-06 09:03:49.014 125211 ERROR root [-] Unexpected exception occurred 60 time(s)... retrying.

Part of the issue is that I don't think this should ever happen. However, if it does, Nova should be more tolerant and reset the connection instead of being stuck in this error forever.

--- Additional comment from Russell Bryant on 2014-04-09 16:21:12 EDT ---

This patch has been merged into both oslo.messaging and the rpc library in oslo-incubator. In RHOS 4.0, nothing had been converted to oslo.messaging, so this fix needs to be backported to all of the projects that include rpc from oslo-incubator. I will be cloning this bug to all affected projects.
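
For anyone reading this later, the behavior requested in the original report (reset the connection rather than retry on the same broken one) can be illustrated with a small sketch. This is not the merged patch and does not use the qpid or oslo APIs; FakeConnection, BrokenConnectionError, flaky_factory and consume_with_reset are all hypothetical names made up for the illustration, and the only assumption is that a freshly created connection does not inherit the bad state.

```python
import time


class BrokenConnectionError(Exception):
    """Hypothetical stand-in for an unrecoverable client-library error."""


class FakeConnection(object):
    """Hypothetical connection that may be stuck in a bad state."""

    def __init__(self, broken):
        self.broken = broken

    def next_message(self):
        # A broken connection fails on every call, no matter how often
        # the caller retries on the same object.
        if self.broken:
            raise BrokenConnectionError("connection is in a bad state")
        return "message"


def flaky_factory():
    """Return a factory whose first connection is broken, later ones fine."""
    created = []

    def make():
        conn = FakeConnection(broken=not created)
        created.append(conn)
        return conn

    return make, created


def consume_with_reset(make_connection, max_attempts=5, delay=0.0):
    """Consume one message, replacing the connection after each failure."""
    connection = make_connection()
    failures = 0
    for _ in range(max_attempts):
        try:
            return connection.next_message(), failures
        except BrokenConnectionError:
            failures += 1
            time.sleep(delay)
            # Retrying on the same object would fail forever, so build
            # a new connection before the next attempt.
            connection = make_connection()
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

Calling consume_with_reset(make) with the factory above recovers after a single failure, whereas a loop that kept calling next_message() on the first connection would log the same error indefinitely, which is the symptom reported here.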
The fix is present. Tested on 2013.2.4-1.el6ost.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2014-1687.html