Description of problem:
The sequence scale up compute-1, reboot, scale down compute-0, reboot again leaves the nova-compute service unable to start after the second reboot.

Version-Release number of selected component (if applicable):

How reproducible:
Rerun the jenkins job.

Steps to Reproduce:
1. scale up compute-1
2. reboot
3. scale down compute-0
4. reboot

Actual results:
nova-compute fails to start.

Expected results:
nova-compute should start normally.

Additional info:
From nova-compute.log on compute-1:

2019-03-16 01:56:31.878 1 ERROR oslo.messaging._drivers.impl_rabbit [req-996c9093-ddda-481e-a65c-5fe10e446cbc - - - - -] [c15e8746-ce5f-4644-a1fd-1b32bf9fabd1] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: timed out. Trying again in 1 seconds. Client port: None: timeout: timed out
2019-03-16 01:56:32.892 1 INFO oslo.messaging._drivers.impl_rabbit [req-996c9093-ddda-481e-a65c-5fe10e446cbc - - - - -] [c15e8746-ce5f-4644-a1fd-1b32bf9fabd1] Reconnected to AMQP server on controller-0.internalapi.localdomain:5672 via [amqp] client with port 36268.
2019-03-16 01:58:32.894 1 ERROR oslo.messaging._drivers.impl_rabbit [req-996c9093-ddda-481e-a65c-5fe10e446cbc - - - - -] [c15e8746-ce5f-4644-a1fd-1b32bf9fabd1] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: timed out. Trying again in 1 seconds. Client port: None: timeout: timed out
2019-03-16 01:58:33.911 1 INFO oslo.messaging._drivers.impl_rabbit [req-996c9093-ddda-481e-a65c-5fe10e446cbc - - - - -] [c15e8746-ce5f-4644-a1fd-1b32bf9fabd1] Reconnected to AMQP server on controller-0.internalapi.localdomain:5672 via [amqp] client with port 36278.
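To help tell plain connectivity problems apart from a broker that is up but unhealthy, here is a minimal probe sketch that could be run from compute-1 during the failure window. The host and port are taken from the nova-compute.log lines above; the script name and the 5 second timeout are assumptions, and it only checks that a TCP connection to the AMQP port can be opened, nothing about rabbitmq itself.

# reachability_probe.py - hypothetical helper, not part of the deployment;
# only verifies that a TCP connection to the AMQP endpoint can be opened.
import socket

HOST = "controller-0.internalapi.localdomain"  # from the nova-compute.log excerpt above
PORT = 5672
TIMEOUT = 5.0                                  # assumed value

try:
    sock = socket.create_connection((HOST, PORT), timeout=TIMEOUT)
except (socket.timeout, socket.error) as exc:
    print("%s:%d unreachable: %s" % (HOST, PORT, exc))
else:
    print("%s:%d reachable" % (HOST, PORT))
    sock.close()

If this also times out while nova-compute is logging "AMQP server ... is unreachable", the problem is below oslo.messaging (network or the rabbitmq node itself); if it connects, the failure is more likely in the broker/oslo.messaging state after the restart.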
On the controller-0 rabbitmq log we see that, after the controllers and then compute-1 were rebooted, rabbitmq was restarted again at 01:46 (probably by pacemaker?), and later on it logged some errors:

=ERROR REPORT==== 16-Mar-2019::01:56:30 ===
Discarding message {'$gen_cast',{deliver,{delivery,false,true,<29642.28738.0>,{basic_message,{resource,<<"/">>,exchange,<<"q-server-resource-versions_fanout">>},[<<>>],{content,60,{'P_basic',<<"application/json">>,<<"utf-8">>,[],2,0,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},<<248,0,16,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110,5,117,116,102,45,56,0,0,0,0,2,0>>,rabbit_framing_amqp_0_9_1,[<<"{\"oslo.message\": \"{\\"_context_domain\\": null, \\"_context_request_id\\": \\"req-491a737c-388e-43bc-b548-0f5d5d965db3\\", \\"_context_global_request_id\\": null, \\"_context_auth_token\\": null, \\"_context_resource_uuid\\": null, \\"_context_tenant_name\\": null, \\"_context_user\\": null, \\"_context_user_id\\": null, \\"_context_show_deleted\\": false, \\"_context_is_admin\\": true, \\"version\\": \\"1.0\\", \\"_context_project_domain\\": null, \\"_context_timestamp\\": \\"2019-03-16 01:39:49.059025\\", \\"method\\": \\"report_agent_resource_versions\\", \\"_context_project\\": null, \\"_context_roles\\": [], \\"args\\": {\\"version_map\\": {\\"Subnet\\": \\"1.0\\", \\"Network\\": \\"1.0\\", \\"SubPort\\": \\"1.0\\", \\"SecurityGroup\\": \\"1.0\\", \\"SecurityGroupRule\\": \\"1.0\\", \\"Trunk\\": \\"1.1\\", \\"QosPolicy\\": \\"1.7\\", \\"Port\\": \\"1.1\\", \\"Log\\": \\"1.0\\"}, \\"agent_type\\": \\"Open vSwitch agent\\", \\"agent_host\\": \\"controller-1.localdomain\\"}, \\"_unique_id\\": \\"7233c509dd1a4d6cb8ab79583a060843\\", \\"_context_tenant_id\\": null, \\"_context_is_admin_project\\": true, \\"_context_project_name\\": null, \\"_context_user_identity\\": \\"- - - - -\\", \\"_context_tenant\\": null, \\"_context_project_id\\": null, \\"_context_read_only\\": false, \\"_context_user_domain\\": null, \\"_context_user_name\\": null}\", \"oslo.version\": \"2.0\"}">>]},<<124,165,169,40,69,139,221,178,23,12,115,15,163,108,192,137>>,true},1,flow},false}} from <0.1081.0> to <0.1836.0> in an old incarnation (1) of this node (2)

=WARNING REPORT==== 16-Mar-2019::01:56:31 ===
closing AMQP connection <0.7675.0> (172.17.1.23:36206 -> 172.17.1.16:5672 - nova-compute:1:c15e8746-ce5f-4644-a1fd-1b32bf9fabd1, vhost: '/', user: 'guest'):
client unexpectedly closed TCP connection
...
From openvswitch-agent.log on compute-1:

2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] Failed reporting state!: MessagingTimeout: Timed out waiting for a reply to message ID bc0ede5cb23944b79a75b6d09c090759
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 319, in _report_state
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     True)
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 93, in report_state
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     return method(context, 'report_state', **kwargs)
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 174, in call
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     retry=self.retry)
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 131, in _send
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     timeout=timeout, retry=retry)
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in send
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     retry=retry)
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 548, in _send
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     result = self._waiter.wait(msg_id, timeout)
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 440, in wait
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     message = self.waiters.get(msg_id, timeout=timeout)
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 328, in get
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     'to message ID %s' % msg_id)
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent MessagingTimeout: Timed out waiting for a reply to message ID bc0ede5cb23944b79a75b6d09c090759
2019-03-16 01:56:35.797 7530 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
*** This bug has been marked as a duplicate of bug 1592528 ***