Bug 1186749

Summary: after the controller failover, the nova computer service cannot connect to controller
Product: Red Hat OpenStack
Reporter: lidong chen <lidchen>
Component: openstack-nova
Assignee: Eoghan Glynn <eglynn>
Status: CLOSED DUPLICATE
QA Contact: nlevinki <nlevinki>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 5.0 (RHEL 7)
CC: berrange, ccliu, dasmith, dmaley, eglynn, jeckersb, kchamart, ndipanov, pbrady, sbauza, sferdjao, sgordon, vromanso, yeylon
Target Milestone: ---
Keywords: ZStream
Target Release: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-05-04 18:28:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1175685
Bug Blocks:
Attachments:
Description  Flags
computer1    none
computer2    none
controller1  none
controller2  none
controller3  none

Description lidong chen 2015-01-28 13:26:45 UTC
Created attachment 985144 [details]
computer1

Description of problem:
I used rhel_osp_installer to deploy a high-availability OpenStack environment.
When a controller fails over, the nova-compute service becomes abnormal.

I ran the 'pcs cluster standby' command at 2015-01-28 22:54 to trigger the failover.

After the controller failover, the nova-compute service cannot connect to the controller.

I used the 'nova-manage service list' command to display the status.

[root@mac04f93882f3ea ~]# nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-cert        mac04f93882f3a2.example.com          internal         enabled    :-)   2015-01-28 15:50:35
nova-consoleauth mac04f93882f3a2.example.com          internal         enabled    :-)   2015-01-28 15:50:41
nova-scheduler   mac04f93882f3a2.example.com          internal         enabled    :-)   2015-01-28 15:50:40
nova-conductor   mac04f93882f3a2.example.com          internal         enabled    :-)   2015-01-28 15:50:32
nova-cert        macac4e914657d8.example.com          internal         enabled    XXX   2015-01-28 15:47:29
nova-consoleauth macac4e914657d8.example.com          internal         enabled    :-)   2015-01-28 15:50:40
nova-cert        mac04f93882f3ea.example.com          internal         enabled    :-)   2015-01-28 15:50:37
nova-consoleauth mac04f93882f3ea.example.com          internal         enabled    :-)   2015-01-28 15:50:35
nova-scheduler   macac4e914657d8.example.com          internal         enabled    XXX   2015-01-28 15:47:26
nova-conductor   macac4e914657d8.example.com          internal         enabled    XXX   2015-01-28 15:47:30
nova-scheduler   mac04f93882f3ea.example.com          internal         enabled    :-)   2015-01-28 15:50:39
nova-conductor   mac04f93882f3ea.example.com          internal         enabled    :-)   2015-01-28 15:50:31
nova-compute     mac04f93882f3f2.example.com          nova             enabled    XXX   2015-01-28 15:46:27
nova-compute     mac04f93882f3ca.example.com          nova             enabled    XXX   2015-01-28 15:46:29
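
The services marked XXX in the listing above can be picked out with a short script. This is only a sketch: the function name find_down_services is hypothetical, and it assumes the column layout shown in this report (Binary, Host, Zone, Status, State, then a date and a time), which may differ in other releases.

```python
def find_down_services(output):
    """Return (binary, host) pairs whose State column is 'XXX'."""
    down = []
    for line in output.splitlines():
        fields = line.split()
        # Assumed layout: data rows split into 7 whitespace-separated
        # fields (Updated_At contributes a date and a time). The header
        # row has only 6 fields and is skipped by the length check.
        if len(fields) == 7 and fields[4] == "XXX":
            down.append((fields[0], fields[1]))
    return down


# Sample rows copied from the listing in this report.
sample = """\
Binary           Host                                 Zone             Status     State Updated_At
nova-conductor   mac04f93882f3ea.example.com          internal         enabled    :-)   2015-01-28 15:50:31
nova-compute     mac04f93882f3f2.example.com          nova             enabled    XXX   2015-01-28 15:46:27
nova-compute     mac04f93882f3ca.example.com          nova             enabled    XXX   2015-01-28 15:46:29
"""

for binary, host in find_down_services(sample):
    print(binary, host)
```

In practice the text passed to the function would be the captured output of 'nova-manage service list' on a controller.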

This is the nova-compute log:
2015-01-28 22:54:25.939 12696 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'conductor': Socket closed
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 622, in ensure
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     return method(*args, **kwargs)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 718, in _publish
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     publisher = cls(self.conf, self.channel, topic, **kwargs)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 379, in __init__
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     **options)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 326, in __init__
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.reconnect(channel)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 334, in reconnect
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     routing_key=self.routing_key)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 85, in __init__
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.revive(self._channel)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 218, in revive
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.declare()
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 105, in declare
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.exchange.declare()
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 166, in declare
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     nowait=nowait, passive=passive,
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 620, in exchange_declare
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     (40, 11),  # Channel.exchange_declare_ok
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 67, in wait
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.channel_id, allowed_methods)
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 240, in _wait_method
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.method_reader.read_method()
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit     raise m
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit IOError: Socket closed
2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit 
2015-01-28 22:54:25.942 12696 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on 192.168.88.184:5672
2015-01-28 22:54:25.942 12696 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2015-01-28 22:54:26.956 12696 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 192.168.88.184:5672
2015-01-28 22:54:52.327 12696 AUDIT nova.compute.resource_tracker [-] Auditing locally available compute resources
2015-01-28 22:55:26.972 12696 ERROR nova.servicegroup.drivers.db [-] model server went away
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db Traceback (most recent call last):
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py", line 95, in _report_state
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     service.service_ref, state_catalog)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/nova/conductor/api.py", line 218, in service_update
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     return self._manager.service_update(context, service, values)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/nova/conductor/rpcapi.py", line 330, in service_update
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     service=service_p, values=values)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 150, in call
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     wait_for_reply=True, timeout=timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     timeout=timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 412, in send
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     return self._send(target, ctxt, message, wait_for_reply, timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 403, in _send
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     result = self._waiter.wait(msg_id, timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 267, in wait
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     reply, ending = self._poll_connection(msg_id, timeout)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 217, in _poll_connection
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db     % msg_id)
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db MessagingTimeout: Timed out waiting for a reply to message ID c00916ecaa8b4926a8abecb4267ddf56
2015-01-28 22:55:26.972 12696 TRACE nova.servicegroup.drivers.db 
2015-01-28 22:55:26.973 12696 WARNING nova.openstack.common.loopingcall [-] task run outlasted interval by 51.035014 sec
2015-01-28 22:56:27.008 12696 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager.update_available_resource: Timed out waiting for a reply to message ID d928c1473ad24846a44f2651d5caa96d
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/openstack/common/periodic_task.py", line 182, in run_periodic_tasks
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     task(self, context)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5527, in update_available_resource
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     rt.update_available_resource(context)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/openstack/common/lockutils.py", line 249, in inner
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     return f(*args, **kwargs)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 315, in update_available_resource
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     context, self.host, self.nodename)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/objects/base.py", line 110, in wrapper
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     args, kwargs)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/nova/conductor/rpcapi.py", line 425, in object_class_action
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     objver=objver, args=args, kwargs=kwargs)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 150, in call
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     wait_for_reply=True, timeout=timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     timeout=timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 412, in send
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     return self._send(target, ctxt, message, wait_for_reply, timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 403, in _send
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     result = self._waiter.wait(msg_id, timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 267, in wait
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     reply, ending = self._poll_connection(msg_id, timeout)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 217, in _poll_connection
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task     % msg_id)
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task MessagingTimeout: Timed out waiting for a reply to message ID d928c1473ad24846a44f2651d5caa96d
2015-01-28 22:56:27.008 12696 TRACE nova.openstack.common.periodic_task 
2015-01-28 22:56:27.009 12696 WARNING nova.openstack.common.loopingcall [-] task run outlasted interval by 50.03548 sec
2015-01-28 22:56:27.012 12696 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'conductor': Socket closed
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 622, in ensure
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     return method(*args, **kwargs)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 718, in _publish
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     publisher = cls(self.conf, self.channel, topic, **kwargs)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 379, in __init__
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     **options)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 326, in __init__
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.reconnect(channel)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 334, in reconnect
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     routing_key=self.routing_key)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 85, in __init__
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.revive(self._channel)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 218, in revive
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.declare()
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 105, in declare
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.exchange.declare()
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 166, in declare
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     nowait=nowait, passive=passive,
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 620, in exchange_declare
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     (40, 11),  # Channel.exchange_declare_ok
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 67, in wait
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.channel_id, allowed_methods)
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 240, in _wait_method
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     self.method_reader.read_method()
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit     raise m
2015-01-28 22:56:27.012 12696 TRACE oslo.messaging._drivers.impl_rabbit IOError: Socket closed
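
To condense a log like the one above to its ERROR entries, a small filter can help. This is a sketch only: the names LOG_LINE and error_summary are hypothetical, and the pattern assumes the line layout seen in this report (timestamp, PID, level, module, a bracketed context such as [-], then the message). TRACE lines carry no bracketed context and are skipped.

```python
import re

# Assumed layout, taken from the lines in this report:
#   <date> <time.ms> <pid> <LEVEL> <module> [<context>] <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<module>\S+) "
    r"\[[^\]]*\] (?P<msg>.*)$"
)


def error_summary(lines):
    """Return (timestamp, module, message) for every ERROR line."""
    summary = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("level") == "ERROR":
            summary.append(
                (match.group("ts"), match.group("module"), match.group("msg"))
            )
    return summary


# Sample lines copied from the log in this report.
sample_log = [
    "2015-01-28 22:54:25.939 12696 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'conductor': Socket closed",
    "2015-01-28 22:54:25.939 12696 TRACE oslo.messaging._drivers.impl_rabbit IOError: Socket closed",
    "2015-01-28 22:54:26.956 12696 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 192.168.88.184:5672",
    "2015-01-28 22:55:26.972 12696 ERROR nova.servicegroup.drivers.db [-] model server went away",
]

for ts, module, msg in error_summary(sample_log):
    print(ts, module, msg)
```

Applied to the full log, this would list the repeated "Socket closed" publish failures alongside the messaging timeouts, which makes the sequence after the failover easier to follow.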

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Use rhel_osp_installer to deploy a high-availability OpenStack environment.
2. Run the 'pcs cluster standby' command on each controller node to trigger failover.
3. Run the 'nova-manage service list' command to display the status.

Actual results:
The nova-compute service is abnormal (state XXX in the 'nova-manage service list' output).

Expected results:
The nova-compute service is normal.

Additional info:

Comment 1 lidong chen 2015-01-28 13:28:15 UTC
Created attachment 985146 [details]
computer2

Comment 2 lidong chen 2015-01-28 13:31:45 UTC
Created attachment 985147 [details]
controller1

Comment 4 lidong chen 2015-01-28 13:33:11 UTC
Created attachment 985155 [details]
controller2

Comment 5 lidong chen 2015-01-28 13:36:12 UTC
Created attachment 985156 [details]
controller3