Created attachment 985144
computer1

Description of problem:
I used rhel_osp_installer to deploy a high-availability OpenStack environment. When a controller fails over, the nova-compute service becomes abnormal. I ran the 'pcs cluster standby' command at 2015-01-28 22:54 to trigger a failover. After the controller failover, the nova-compute service cannot connect to the controller. The 'nova-manage service list' command shows the following status:

[root@mac04f93882f3ea ~]# nova-manage service list
Binary           Host                         Zone      Status   State Updated_At
nova-cert        mac04f93882f3a2.example.com  internal  enabled  :-)   2015-01-28 15:50:35
nova-consoleauth mac04f93882f3a2.example.com  internal  enabled  :-)   2015-01-28 15:50:41
nova-scheduler   mac04f93882f3a2.example.com  internal  enabled  :-)   2015-01-28 15:50:40
nova-conductor   mac04f93882f3a2.example.com  internal  enabled  :-)   2015-01-28 15:50:32
nova-cert        macac4e914657d8.example.com  internal  enabled  XXX   2015-01-28 15:47:29
nova-consoleauth macac4e914657d8.example.com  internal  enabled  :-)   2015-01-28 15:50:40
nova-cert        mac04f93882f3ea.example.com  internal  enabled  :-)   2015-01-28 15:50:37
nova-consoleauth mac04f93882f3ea.example.com  internal  enabled  :-)   2015-01-28 15:50:35
nova-scheduler   macac4e914657d8.example.com  internal  enabled  XXX   2015-01-28 15:47:26
nova-conductor   macac4e914657d8.example.com  internal  enabled  XXX   2015-01-28 15:47:30
nova-scheduler   mac04f93882f3ea.example.com  internal  enabled  :-)   2015-01-28 15:50:39
nova-conductor   mac04f93882f3ea.example.com  internal  enabled  :-)   2015-01-28 15:50:31
nova-compute     mac04f93882f3f2.example.com  nova      enabled  XXX   2015-01-28 15:46:27
nova-compute     mac04f93882f3ca.example.com  nova      enabled  XXX   2015-01-28 15:46:29

This is the nova-compute log:
2015-01-28 22:54:25.939 12696 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'conductor': Socket closed
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 622, in ensure
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit return method(*args, **kwargs)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 718, in _publish
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit publisher = cls(self.conf, self.channel, topic, **kwargs)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 379, in __init__
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit **options)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 326, in __init__
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit self.reconnect(channel)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 334, in reconnect
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit routing_key=self.routing_key)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 85, in __init__
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit self.revive(self._channel)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 218, in revive
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit self.declare()
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 105, in declare
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit self.exchange.declare()
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 166, in declare
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit nowait=nowait, passive=passive,
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 620, in exchange_declare
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit (40, 11), # Channel.exchange_declare_ok
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 67, in wait
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit self.channel_id, allowed_methods)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 240, in _wait_method
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit self.method_reader.read_method()
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit raise m
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit IOError: Socket closed
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit
2015-01-28 22:54:25.942 12696 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on 192.168.88.184:5672
2015-01-28 22:54:25.942 12696 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2015-01-28 22:54:26.956 12696 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 192.168.88.184:5672
2015-01-28 22:54:52.327 12696 AUDIT nova.compute.resource_tracker [-] Auditing locally available compute resources
2015-01-28 22:55:26.972 12696 ERROR nova.servicegroup.drivers.db [-] model server went away
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db Traceback (most recent call last):
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py", line 95, in _report_state
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db service.service_ref, state_catalog)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/site-packages/nova/conductor/api.py", line 218, in service_update
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db return self._manager.service_update(context, service, values)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/site-packages/nova/conductor/rpcapi.py", line 330, in service_update
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db service=service_p, values=values)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 150, in call
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db wait_for_reply=True, timeout=timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db timeout=timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 412, in send
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db return self._send(target, ctxt, message, wait_for_reply, timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 403, in _send
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db result = self._waiter.wait(msg_id, timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 267, in wait
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db reply, ending = self._poll_connection(msg_id, timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 217, in _poll_connection
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db % msg_id)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db MessagingTimeout: Timed out waiting for a reply to message ID c00916ecaa8b4926a8abecb4267ddf56
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db
2015-01-28 22:55:26.973 12696 WARNING nova.openstack.common.loopingcall [-] task run outlasted interval by 51.035014 sec
2015-01-28 22:56:27.008 12696 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager.update_available_resource: Timed out waiting for a reply to message ID d928c1473ad24846a44f2651d5caa96d
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/openstack/common/periodic_task.py", line 182, in run_periodic_tasks
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task task(self, context)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5527, in update_available_resource
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task rt.update_available_resource(context)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/openstack/common/lockutils.py", line 249, in inner
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task return f(*args, **kwargs)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 315, in update_available_resource
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task context, self.host, self.nodename)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/objects/base.py", line 110, in wrapper
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task args, kwargs)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/nova/conductor/rpcapi.py", line 425, in object_class_action
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task objver=objver, args=args, kwargs=kwargs)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 150, in call
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task wait_for_reply=True, timeout=timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task timeout=timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 412, in send
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task return self._send(target, ctxt, message, wait_for_reply, timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 403, in _send
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task result = self._waiter.wait(msg_id, timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 267, in wait
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task reply, ending = self._poll_connection(msg_id, timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 217, in _poll_connection
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task % msg_id)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task MessagingTimeout: Timed out waiting for a reply to message ID d928c1473ad24846a44f2651d5caa96d
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task
2015-01-28 22:56:27.009 12696 WARNING nova.openstack.common.loopingcall [-] task run outlasted interval by 50.03548 sec
2015-01-28 22:56:27.012 12696 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'conductor': Socket closed
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 622, in ensure
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit return method(*args, **kwargs)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 718, in _publish
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit publisher = cls(self.conf, self.channel, topic, **kwargs)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 379, in __init__
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit **options)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 326, in __init__
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit self.reconnect(channel)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 334, in reconnect
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit routing_key=self.routing_key)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 85, in __init__
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit self.revive(self._channel)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 218, in revive
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit self.declare()
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 105, in declare
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit self.exchange.declare()
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 166, in declare
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit nowait=nowait, passive=passive,
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 620, in exchange_declare
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit (40, 11), # Channel.exchange_declare_ok
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 67, in wait
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit self.channel_id, allowed_methods)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 240, in _wait_method
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit self.method_reader.read_method()
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit raise m
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit IOError: Socket closed

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Deploy a high-availability OpenStack environment with rhel_osp_installer.
2. On each controller node, run 'pcs cluster standby' to trigger a failover.
3. Run 'nova-manage service list' to display the service status.

Actual results:
The nova-compute services are abnormal: 'nova-manage service list' reports them as down (State XXX).

Expected results:
The nova-compute services are normal after the failover.

Additional info:
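The XXX states in the nova-manage output can be reproduced from the Updated_At column alone. The following Python sketch applies a simplified version of the liveness rule used by nova's database servicegroup driver, assuming the default service_down_time of 60 seconds and assuming the listing was taken just after the newest heartbeat shown (2015-01-28 15:50:41). It is an illustration, not nova's code. Under those assumptions both nova-compute heartbeats are more than four minutes old, which fits the log: the heartbeat in _report_state goes to the conductor over RPC and fails with MessagingTimeout.

```python
from datetime import datetime, timedelta

# Assumption: nova's default service_down_time of 60 seconds.
SERVICE_DOWN_TIME = timedelta(seconds=60)


def parse(ts):
    """Parse an Updated_At value as printed by 'nova-manage service list'."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def is_up(updated_at, now, down_time=SERVICE_DOWN_TIME):
    """Simplified liveness rule: up if the last heartbeat is at most down_time old."""
    return abs(now - updated_at) <= down_time


# A subset of the heartbeats from the nova-manage output in the description.
heartbeats = [
    ("nova-conductor", "mac04f93882f3ea.example.com", "2015-01-28 15:50:31"),
    ("nova-conductor", "macac4e914657d8.example.com", "2015-01-28 15:47:30"),
    ("nova-compute", "mac04f93882f3f2.example.com", "2015-01-28 15:46:27"),
    ("nova-compute", "mac04f93882f3ca.example.com", "2015-01-28 15:46:29"),
]

# Assumption: the listing ran just after the newest heartbeat in the output.
now = parse("2015-01-28 15:50:41")

if __name__ == "__main__":
    for binary, host, ts in heartbeats:
        state = ":-)" if is_up(parse(ts), now) else "XXX"
        age = int((now - parse(ts)).total_seconds())
        print(f"{binary:<16} {host:<29} {state} (heartbeat {age} s old)")
```

Running it prints :-) for the nova-conductor on mac04f93882f3ea and XXX for the other three services, matching the State column.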
Created attachment 985146
computer2
Created attachment 985147
controller1
Created attachment 985155
controller2
Created attachment 985156
controller3
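The attached computer and controller logs are long. As a triage aid, a hypothetical helper such as the following reduces a nova-compute log to its non-TRACE records (the failover timeline) and collects the IDs of RPC calls that ended in MessagingTimeout. It assumes one record per line in the "date time pid LEVEL logger [-] message" layout seen in this report; the sample lines are copied from the log in the description.

```python
import re

# Assumption: one record per line, in the layout seen in this report:
# "YYYY-MM-DD HH:MM:SS.mmm PID LEVEL logger [-] message".
# TRACE continuation lines have no "[-]" marker and a TRACE level, so they are skipped.
RECORD = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (?P<pid>\d+) "
    r"(?P<level>ERROR|WARNING|INFO|AUDIT) (?P<logger>\S+) \[-\] (?P<msg>.*)$"
)
TIMEOUT = re.compile(r"Timed out waiting for a reply to message ID ([0-9a-f]+)")


def timeline(lines):
    """Return (timestamp, level, logger, message) tuples for non-TRACE records."""
    events = []
    for line in lines:
        m = RECORD.match(line.rstrip("\n"))
        if m:
            events.append((m.group("ts"), m.group("level"),
                           m.group("logger"), m.group("msg")))
    return events


def timed_out_ids(lines):
    """Return IDs of RPC calls that timed out, in first-seen order, without repeats."""
    seen = []
    for line in lines:
        for m in TIMEOUT.finditer(line):
            if m.group(1) not in seen:
                seen.append(m.group(1))
    return seen


sample = [
    "2015-01-28 22:54:25.939 12696 ERROR oslo.messaging._drivers.impl_rabbit [-] "
    "Failed to publish message to topic 'conductor': Socket closed",
    "2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit IOError: Socket closed",
    "2015-01-28 22:54:26.956 12696 INFO oslo.messaging._drivers.impl_rabbit [-] "
    "Connected to AMQP server on 192.168.88.184:5672",
    "2015-01-28 22:55:26.972 12696 ERROR nova.servicegroup.drivers.db [-] model server went away",
    "2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db MessagingTimeout: "
    "Timed out waiting for a reply to message ID c00916ecaa8b4926a8abecb4267ddf56",
    "2015-01-28 22:56:27.008 12696 ERROR nova.openstack.common.periodic_task [-] "
    "Error during ComputeManager.update_available_resource: "
    "Timed out waiting for a reply to message ID d928c1473ad24846a44f2651d5caa96d",
]

if __name__ == "__main__":
    for ts, level, logger, msg in timeline(sample):
        print(ts, level, logger, msg)
    print(timed_out_ids(sample))
```

On the sample it keeps four records (ERROR, INFO, ERROR, ERROR) and both timed-out message IDs. To use it on an attachment, pass the lines of the attached nova-compute log, for example from open(path).readlines(), where path is a placeholder for the extracted file.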