Bug 1186749 - after the controller failover, the nova computer service cannot connect to controller
Keywords:
Status: CLOSED DUPLICATE of bug 1215924
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 5.0 (RHEL 7)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 6.0 (Juno)
Assignee: Eoghan Glynn
QA Contact: nlevinki
URL:
Whiteboard:
Depends On: 1175685
Blocks:
Reported: 2015-01-28 13:26 UTC by lidong chen
Modified: 2019-09-09 16:00 UTC (History)
CC List: 14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-05-04 18:28:33 UTC
Target Upstream Version:
Embargoed:


Attachments
computer1 (480.00 KB, application/x-tar), 2015-01-28 13:26 UTC, lidong chen
computer2 (1.09 MB, application/x-tar), 2015-01-28 13:28 UTC, lidong chen
controller1 (1.39 MB, application/x-7z-compressed), 2015-01-28 13:31 UTC, lidong chen
controller2 (2.46 MB, application/x-7z-compressed), 2015-01-28 13:33 UTC, lidong chen
controller3 (3.22 MB, application/x-7z-compressed), 2015-01-28 13:36 UTC, lidong chen

Description lidong chen 2015-01-28 13:26:45 UTC
Created attachment 985144 [details]
computer1

Description of problem:
I used rhel_osp_installer to deploy a high-availability OpenStack environment.
After a controller failover, the nova-compute service is in an abnormal state.

I ran 'pcs cluster standby' at 2015-01-28 22:54 to trigger the failover.

After the failover, the nova-compute service cannot connect to the controller.

I ran 'nova-manage service list' to display the service status:

[root@mac04f93882f3ea ~]# nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-cert        mac04f93882f3a2.example.com          internal         enabled    :-)   2015-01-28 15:50:35
nova-consoleauth mac04f93882f3a2.example.com          internal         enabled    :-)   2015-01-28 15:50:41
nova-scheduler   mac04f93882f3a2.example.com          internal         enabled    :-)   2015-01-28 15:50:40
nova-conductor   mac04f93882f3a2.example.com          internal         enabled    :-)   2015-01-28 15:50:32
nova-cert        macac4e914657d8.example.com          internal         enabled    XXX   2015-01-28 15:47:29
nova-consoleauth macac4e914657d8.example.com          internal         enabled    :-)   2015-01-28 15:50:40
nova-cert        mac04f93882f3ea.example.com          internal         enabled    :-)   2015-01-28 15:50:37
nova-consoleauth mac04f93882f3ea.example.com          internal         enabled    :-)   2015-01-28 15:50:35
nova-scheduler   macac4e914657d8.example.com          internal         enabled    XXX   2015-01-28 15:47:26
nova-conductor   macac4e914657d8.example.com          internal         enabled    XXX   2015-01-28 15:47:30
nova-scheduler   mac04f93882f3ea.example.com          internal         enabled    :-)   2015-01-28 15:50:39
nova-conductor   mac04f93882f3ea.example.com          internal         enabled    :-)   2015-01-28 15:50:31
nova-compute     mac04f93882f3f2.example.com          nova             enabled    XXX   2015-01-28 15:46:27
nova-compute     mac04f93882f3ca.example.com          nova             enabled    XXX   2015-01-28 15:46:29
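
For triage, the listing above can be filtered mechanically. The following is a minimal sketch, not part of the original report; it assumes the seven-column whitespace-separated layout shown above and treats a State of XXX as "not reporting". The function names parse_service_list and down_services, and the SAMPLE excerpt, are illustrative only.

```python
def parse_service_list(text):
    """Parse the tabular output of 'nova-manage service list' into dicts.

    Assumption: every data row has exactly seven whitespace-separated
    fields (Updated_At is split into a date and a time).
    """
    services = []
    for line in text.splitlines():
        fields = line.split()
        # The header row has six fields, so it is skipped by the length check;
        # the explicit "Binary" test guards against layout variations.
        if len(fields) != 7 or fields[0] == "Binary":
            continue
        binary, host, zone, status, state, date, time = fields
        services.append({
            "binary": binary,
            "host": host,
            "zone": zone,
            "status": status,
            "state": state,
            "updated_at": date + " " + time,
        })
    return services


def down_services(services):
    """Return (binary, host) pairs whose State column is 'XXX'."""
    return [(s["binary"], s["host"]) for s in services if s["state"] == "XXX"]


# Excerpt of the listing from this report, used as sample input.
SAMPLE = """\
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   mac04f93882f3a2.example.com          internal         enabled    :-)   2015-01-28 15:50:32
nova-compute     mac04f93882f3f2.example.com          nova             enabled    XXX   2015-01-28 15:46:27
nova-compute     mac04f93882f3ca.example.com          nova             enabled    XXX   2015-01-28 15:46:29
"""

if __name__ == "__main__":
    for binary, host in down_services(parse_service_list(SAMPLE)):
        print(binary, host)
```

Run against the full listing, this would also pick up the nova-cert, nova-scheduler and nova-conductor rows for macac4e914657d8.example.com, which are marked XXX as well.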

This is the nova-compute log:
2015-01-28 22:54:25.939 12696 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'conductor': Socket closed
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 622, in ensure
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     return method(*args, **kwargs)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 718, in _publish
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     publisher = cls(self.conf, self.channel, topic, **kwargs)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 379, in __init__
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     **options)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 326, in __init__
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.reconnect(channel)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 334, in reconnect
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     routing_key=self.routing_key)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 85, in __init__
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.revive(self._channel)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 218, in revive
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.declare()
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 105, in declare
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.exchange.declare()
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 166, in declare
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     nowait=nowait, passive=passive,
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 620, in exchange_declare
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     (40, 11),  # Channel.exchange_declare_ok
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 67, in wait
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.channel_id, allowed_methods)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 240, in _wait_method
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.method_reader.read_method()
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     raise m
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit IOError: Socket closed
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit 
2015-01-28 22:54:25.942 12696 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on 192.168.88.184:5672
2015-01-28 22:54:25.942 12696 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2015-01-28 22:54:26.956 12696 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 192.168.88.184:5672
2015-01-28 22:54:52.327 12696 AUDIT nova.compute.resource_tracker [-] Auditing locally available compute resources
2015-01-28 22:55:26.972 12696 ERROR nova.servicegroup.drivers.db [-] model server went away
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db Traceback (most recent call last):
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py", line 95, in _report_state
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     service.service_ref, state_catalog)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/nova/conductor/api.py", line 218, in service_update
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     return self._manager.service_update(context, service, values)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/nova/conductor/rpcapi.py", line 330, in service_update
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     service=service_p, values=values)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 150, in call
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     wait_for_reply=True, timeout=timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     timeout=timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 412, in send
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     return self._send(target, ctxt, message, wait_for_reply, timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 403, in _send
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     result = self._waiter.wait(msg_id, timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 267, in wait
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     reply, ending = self._poll_connection(msg_id, timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 217, in _poll_connection
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     % msg_id)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db MessagingTimeout: Timed out waiting for a reply to message ID c00916ecaa8b4926a8abecb4267ddf56
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db 
2015-01-28 22:55:26.973 12696 WARNING nova.openstack.common.loopingcall [-] task run outlasted interval by 51.035014 sec
2015-01-28 22:56:27.008 12696 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager.update_available_resource: Timed out waiting for a reply to message ID d928c1473ad24846a44f2651d5caa96d
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/openstack/common/periodic_task.py", line 182, in run_periodic_tasks
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     task(self, context)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5527, in update_available_resource
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     rt.update_available_resource(context)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/openstack/common/lockutils.py", line 249, in inner
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     return f(*args, **kwargs)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 315, in update_available_resource
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     context, self.host, self.nodename)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/objects/base.py", line 110, in wrapper
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     args, kwargs)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/conductor/rpcapi.py", line 425, in object_class_action
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     objver=objver, args=args, kwargs=kwargs)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 150, in call
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     wait_for_reply=True, timeout=timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     timeout=timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 412, in send
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     return self._send(target, ctxt, message, wait_for_reply, timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 403, in _send
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     result = self._waiter.wait(msg_id, timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 267, in wait
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     reply, ending = self._poll_connection(msg_id, timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 217, in _poll_connection
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     % msg_id)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task MessagingTimeout: Timed out waiting for a reply to message ID d928c1473ad24846a44f2651d5caa96d
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task 
2015-01-28 22:56:27.009 12696 WARNING nova.openstack.common.loopingcall [-] task run outlasted interval by 50.03548 sec
2015-01-28 22:56:27.012 12696 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'conductor': Socket closed
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 622, in ensure
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     return method(*args, **kwargs)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 718, in _publish
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     publisher = cls(self.conf, self.channel, topic, **kwargs)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 379, in __init__
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     **options)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 326, in __init__
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.reconnect(channel)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 334, in reconnect
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     routing_key=self.routing_key)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 85, in __init__
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.revive(self._channel)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 218, in revive
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.declare()
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 105, in declare
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.exchange.declare()
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 166, in declare
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     nowait=nowait, passive=passive,
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 620, in exchange_declare
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     (40, 11),  # Channel.exchange_declare_ok
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 67, in wait
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.channel_id, allowed_methods)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 240, in _wait_method
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.method_reader.read_method()
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     raise m
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit IOError: Socket closed
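
To get an overview of a log like the one above without reading every traceback, the ERROR lines can be counted per logger. This is a rough sketch, not part of the original report; it assumes the "timestamp pid LEVEL logger message" line layout seen in this log, and the names LOG_LINE, summarize_errors and SAMPLE_LOG are illustrative.

```python
import re
from collections import Counter

# Assumed line layout: "<date> <time.millis> <pid> <LEVEL> <logger> <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) ?(?P<msg>.*)$"
)


def summarize_errors(lines):
    """Count ERROR lines per logger and record when each logger first failed."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = LOG_LINE.match(line)
        # TRACE, INFO, WARNING and unparseable lines are ignored.
        if not match or match.group("level") != "ERROR":
            continue
        logger = match.group("logger")
        counts[logger] += 1
        first_seen.setdefault(logger, match.group("ts"))
    return counts, first_seen


# A few lines copied from the log in this report, used as sample input.
SAMPLE_LOG = """\
2015-01-28 22:54:25.939 12696 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'conductor': Socket closed
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit IOError: Socket closed
2015-01-28 22:54:26.956 12696 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 192.168.88.184:5672
2015-01-28 22:55:26.972 12696 ERROR nova.servicegroup.drivers.db [-] model server went away
2015-01-28 22:56:27.012 12696 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'conductor': Socket closed
"""

if __name__ == "__main__":
    counts, first_seen = summarize_errors(SAMPLE_LOG.splitlines())
    for logger, count in counts.most_common():
        print(logger, count, "first at", first_seen[logger])
```

In practice the input would be the lines of the nova-compute log file from the attached archives rather than the inline sample.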

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Use rhel_osp_installer to deploy a high-availability OpenStack environment.
2. Run 'pcs cluster standby' on each controller node to trigger a failover.
3. Run 'nova-manage service list' to display the service status.

Actual results:
The nova-compute services are reported as down (state XXX) and cannot connect to the controller.

Expected results:
The nova-compute services keep working normally after the failover.

Additional info:

Comment 1 lidong chen 2015-01-28 13:28:15 UTC
Created attachment 985146 [details]
computer2

Comment 2 lidong chen 2015-01-28 13:31:45 UTC
Created attachment 985147 [details]
controller1

Comment 4 lidong chen 2015-01-28 13:33:11 UTC
Created attachment 985155 [details]
controller2

Comment 5 lidong chen 2015-01-28 13:36:12 UTC
Created attachment 985156 [details]
controller3

