Description of problem:

We hit a problem in oslab which manifests as hosts not getting any network. Eventually I traced it down to neutron-server not responding to RPC calls; this causes a cascading failure in all of the compute nodes' dhcp-agents, openvswitch-agents, etc.

Going right back to the root cause, neutron-server first logged:

---
2014-04-09 19:02:10.901 7583 WARNING neutron.scheduler.dhcp_agent_scheduler [-] No more DHCP agents
2014-04-09 19:02:10.903 7583 WARNING neutron.db.agentschedulers_db [-] Fail scheduling network {'status': u'ACTIVE', 'subnets': [u'fc0eea49-5f08-4029-b806-ba1b61f1e7ba'], 'name': u'network1-net', 'provider:physical_network': None, 'admin_state_up': True, 'tenant_id': u'210a99a1e68f43218f4cab705c908d45', 'provider:network_type': u'gre', 'router:external': False, 'shared': False, 'id': u'1c8f57e6-fa66-4bba-90e3-1a735cab79e2', 'provider:segmentation_id': 46L}
2014-04-09 19:02:10.908 7583 WARNING neutron.api.rpc.agentnotifiers.dhcp_rpc_agent_api [-] Only 0 of 1 DHCP agents associated with network '1c8f57e6-fa66-4bba-90e3-1a735cab79e2' are marked as active, so notifications may be sent to inactive agents.
2014-04-09 19:02:15.165 7583 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 192.168.0.2
2014-04-09 19:02:19.356 7583 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 192.168.0.2
2014-04-09 19:02:21.220 7583 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 192.168.0.2
2014-04-09 19:02:25.363 7583 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 192.168.0.2
2014-04-09 19:02:26.684 7583 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 192.168.0.2
2014-04-09 19:02:28.296 7583 WARNING neutron.api.rpc.agentnotifiers.dhcp_rpc_agent_api [-] Only 0 of 1 DHCP agents associated with network '1c8f57e6-fa66-4bba-90e3-1a735cab79e2' are marked as active, so notifications may be sent to inactive agents.
2014-04-09 19:02:31.062 7583 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 192.168.0.2
2014-04-09 19:02:32.163 7583 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 192.168.0.2
2014-04-09 19:02:33.223 7583 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 192.168.0.2
2014-04-09 19:02:39.974 7583 ERROR root [-] Unexpected exception occurred 61 time(s)... retrying.
2014-04-09 19:02:39.974 7583 TRACE root Traceback (most recent call last):
2014-04-09 19:02:39.974 7583 TRACE root   File "/usr/lib/python2.6/site-packages/neutron/openstack/common/excutils.py", line 62, in inner_func
2014-04-09 19:02:39.974 7583 TRACE root     return infunc(*args, **kwargs)
2014-04-09 19:02:39.974 7583 TRACE root   File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/impl_qpid.py", line 686, in _consumer_thread
2014-04-09 19:02:39.974 7583 TRACE root     self.consume()
2014-04-09 19:02:39.974 7583 TRACE root   File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/impl_qpid.py", line 677, in consume
2014-04-09 19:02:39.974 7583 TRACE root     it.next()
2014-04-09 19:02:39.974 7583 TRACE root   File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/impl_qpid.py", line 594, in iterconsume
2014-04-09 19:02:39.974 7583 TRACE root     yield self.ensure(_error_callback, _consume)
2014-04-09 19:02:39.974 7583 TRACE root   File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/impl_qpid.py", line 528, in ensure
2014-04-09 19:02:39.974 7583 TRACE root     return method(*args, **kwargs)
2014-04-09 19:02:39.974 7583 TRACE root   File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/impl_qpid.py", line 585, in _consume
2014-04-09 19:02:39.974 7583 TRACE root     nxt_receiver = self.session.next_receiver(timeout=timeout)
2014-04-09 19:02:39.974 7583 TRACE root   File "<string>", line 6, in next_receiver
2014-04-09 19:02:39.974 7583 TRACE root   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 660, in next_receiver
2014-04-09 19:02:39.974 7583 TRACE root     if self._ecwait(lambda: self.incoming, timeout):
2014-04-09 19:02:39.974 7583 TRACE root   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
2014-04-09 19:02:39.974 7583 TRACE root     result = self._ewait(lambda: self.closed or predicate(), timeout)
2014-04-09 19:02:39.974 7583 TRACE root   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 566, in _ewait
2014-04-09 19:02:39.974 7583 TRACE root     result = self.connection._ewait(lambda: self.error or predicate(), timeout)
2014-04-09 19:02:39.974 7583 TRACE root   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 209, in _ewait
2014-04-09 19:02:39.974 7583 TRACE root     self.check_error()
2014-04-09 19:02:39.974 7583 TRACE root   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 202, in check_error
2014-04-09 19:02:39.974 7583 TRACE root     raise self.error
2014-04-09 19:02:39.974 7583 TRACE root InternalError: Traceback (most recent call last):
2014-04-09 19:02:39.974 7583 TRACE root   File "/usr/lib/python2.6/site-packages/qpid/messaging/driver.py", line 651, in write
2014-04-09 19:02:39.974 7583 TRACE root     self._op_dec.write(*self._seg_dec.read())
2014-04-09 19:02:39.974 7583 TRACE root   File "/usr/lib/python2.6/site-packages/qpid/framing.py", line 269, in write
2014-04-09 19:02:39.974 7583 TRACE root     if self.op.headers is None:
2014-04-09 19:02:39.974 7583 TRACE root AttributeError: 'NoneType' object has no attribute 'headers'
2014-04-09 19:02:39.974 7583 TRACE root
---

It kept logging similar output until it was killed by an admin.
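For reference, the "Unexpected exception occurred 61 time(s)... retrying." line comes from the wrapper at the top of the traceback (excutils.py, inner_func): it catches anything the qpid consumer thread raises and simply calls it again, so once the underlying qpid session is wedged the thread spins forever instead of dying or forcing a reconnect. A minimal sketch of that pattern (simplified, not a verbatim copy of the oslo module; the real code rate-limits the log output rather than sleeping):

import logging
import time

LOG = logging.getLogger(__name__)

def forever_retry_uncaught_exceptions(infunc):
    """Retry infunc forever, logging repeated identical failures."""
    def inner_func(*args, **kwargs):
        last_exc_message = None
        exc_count = 0
        while True:
            try:
                return infunc(*args, **kwargs)
            except Exception as exc:
                # Count consecutive occurrences of the same error so the
                # log reads "occurred N time(s)" instead of N stack dumps.
                if str(exc) == last_exc_message:
                    exc_count += 1
                else:
                    exc_count = 1
                    last_exc_message = str(exc)
                LOG.exception('Unexpected exception occurred %d time(s)... '
                              'retrying.', exc_count)
                time.sleep(1)  # simplified; oslo retries without sleeping
    return inner_func

Because the InternalError raised out of qpid's framing decoder recurs on every call, the wrapper retries indefinitely and the "61 time(s)" counter just keeps climbing.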
From this point on it appears to have stopped processing RPC calls, with predictably bad consequences:

---
[root@host02 log]# qpid-stat -q -S msg
Queues
  queue        dur  autoDel  excl  msg    msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  ============================================================================================
  q-plugin                         59.6k  2.38m  2.32m   43.6m  1.73g    1.69g     0     2
  q-l3-plugin                      65     12.8k  12.7k   35.0k  7.05m    7.02m     0     2
---

This has been noticed upstream [1], which describes the problem; as mentioned there, a review is out [2].

[1] https://bugs.launchpad.net/oslo/+bug/1303890
[2] https://review.openstack.org/85750
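Note the cons column in that output: both q-plugin and q-l3-plugin show zero consumers while messages pile up, which is exactly what you would expect with neutron-server's consumer threads stuck in the retry loop above. As a quick health check, something like the following hypothetical helper (assuming the qpid-tools column layout shown above; not part of any shipped tooling) can flag queues in that state:

import subprocess

def stuck_queues():
    """Return queue names that have a message backlog but no consumers."""
    proc = subprocess.Popen(['qpid-stat', '-q'], stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    stuck = []
    for line in out.splitlines():
        fields = line.split()
        # Data rows have at least 9 columns even when the dur/autoDel/excl
        # flags are blank; this also skips the title and separator rows.
        if len(fields) < 9 or fields[0] == 'queue':
            continue
        # Count from the right so blank flag columns don't shift anything:
        # ... msg msgIn msgOut bytes bytesIn bytesOut cons bind
        msg, cons = fields[-8], fields[-2]
        if msg != '0' and cons == '0':
            stuck.append(fields[0])
    return stuck

if __name__ == '__main__':
    for name in stuck_queues():
        print('%s has a backlog and no consumers' % name)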
*** Bug 1085997 has been marked as a duplicate of this bug. ***
Verified NVR: openstack-neutron-2013.2.3-7.el6ost

Verified with neutron sanity tests.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2014-0516.html