Bug 1273680 - HA overcloud with network isolation deployment fails
Status: CLOSED CURRENTRELEASE
Product: RDO
Classification: Community
Component: rdo-manager
Version: Liberty
Hardware: x86_64 Linux
Priority: high Severity: high
Target Milestone: ---
Target Release: Liberty
Assigned To: John Trowbridge
QA Contact: Shai Revivo
Reported: 2015-10-20 21:35 EDT by Alexander Chuzhoy
Modified: 2017-06-18 02:24 EDT
Doc Type: Bug Fix
Last Closed: 2017-06-18 02:24:14 EDT
Type: Bug


Attachments
neutron server log (14.89 KB, text/plain)
2015-10-21 05:49 EDT, Marius Cornea
Neutron configuration files (29.78 KB, application/x-gzip)
2015-10-21 06:36 EDT, Marius Cornea
Neutron log files (312.52 KB, application/x-gzip)
2015-10-21 06:37 EDT, Marius Cornea
sosreport (17.64 MB, application/x-xz)
2015-10-21 06:50 EDT, Marius Cornea

Description Alexander Chuzhoy 2015-10-20 21:35:16 EDT
Deployment fails: "Resource CREATE failed: resources.ControllerOvercloudServicesDeployment_Step5: resources.ControllerNodesPostDeployment.Error: resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6"



[stack@undercloud ~]$ heat resource-list -n 5 overcloud|grep -v COMPLE
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+---------------------------------------------------------------------------------------------------------------+
| resource_name                               | physical_resource_id                          | resource_type                                     | resource_status | updated_time        | stack_name                                                                                                    |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+---------------------------------------------------------------------------------------------------------------+
| ControllerNodesPostDeployment               | abb7659b-008f-4877-940f-d63fb3f1f6f9          | OS::TripleO::ControllerPostDeployment             | CREATE_FAILED   | 2015-10-09T20:44:51 | overcloud                                                                                                     |
| ControllerOvercloudServicesDeployment_Step5 | 5e94bbae-da4f-4748-b7da-378cb248b13e          | OS::Heat::StructuredDeployments                   | CREATE_FAILED   | 2015-10-09T20:57:41 | overcloud-ControllerNodesPostDeployment-77ujcqvdrmro                                                          |
| 0                                           | 35fc43c3-2faf-47dc-b53a-0e18de104a0b          | OS::Heat::StructuredDeployment                    | CREATE_FAILED   | 2015-10-09T21:09:38 | overcloud-ControllerNodesPostDeployment-77ujcqvdrmro-ControllerOvercloudServicesDeployment_Step5-5mjvfzaqqcez |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+---------------------------------------------------------------------------------------------------------------+
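To narrow a listing like the one above to the failing resources programmatically, here is a minimal Python sketch. The function name `failed_resources` is hypothetical, and the parser assumes only the pipe-delimited table layout shown above; it is not part of the heat client.

```python
def failed_resources(table_text):
    """Return (resource_name, resource_status, stack_name) for every
    row whose status does not end in _COMPLETE."""
    rows = []
    header = None
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first pipe-delimited line holds the column names
            continue
        row = dict(zip(header, cells))
        if not row.get("resource_status", "").endswith("_COMPLETE"):
            rows.append(
                (row["resource_name"], row["resource_status"], row["stack_name"])
            )
    return rows
```

Fed the full, unfiltered output of `heat resource-list -n 5 overcloud`, this should give the same rows as the grep above, as tuples that are easier to post-process.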




Checking the /var/log/messages on one of the controllers:


Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent [req-23b41339-0b69-441f-bc4a-c088161cf920 - - - - -] Failed reporting state!
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 571, in _report_state
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent     self.state_rpc.report_state(ctx, self.agent_state, self.use_call)
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 86, in report_state
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent     return method(context, 'report_state', **kwargs)
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 158, in call
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent     retry=self.retry)
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in _send
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent     timeout=timeout, retry=retry)
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 431, in send
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent     retry=retry)
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 420, in _send
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent     result = self._waiter.wait(msg_id, timeout)
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 318, in wait
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent     message = self.waiters.get(msg_id, timeout=timeout)
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 223, in get
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent     'to message ID %s' % msg_id)
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent MessagingTimeout: Timed out waiting for a reply to message ID d88ede86381042b3b742cdf9eb03d9b0
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.839 21103 ERROR neutron.agent.dhcp.agent
Oct 21 01:28:32 localhost neutron-dhcp-agent: 2015-10-21 01:28:32.843 21103 WARNING oslo.service.loopingcall [req-23b41339-0b69-441f-bc4a-c088161cf920 - - - - -] Function 'neutron.agent.dhcp.agent.DhcpAgentWithStateReport._report_state' run outlasted interval by 30.01 sec
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] Failed reporting state!
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 330, in _report_state
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     self.use_call)
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 86, in report_state
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     return method(context, 'report_state', **kwargs)
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 158, in call
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     retry=self.retry)
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in _send
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     timeout=timeout, retry=retry)
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 431, in send
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     retry=retry)
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 420, in _send
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     result = self._waiter.wait(msg_id, timeout)
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 318, in wait
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     message = self.waiters.get(msg_id, timeout=timeout)
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 223, in get
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     'to message ID %s' % msg_id)
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent MessagingTimeout: Timed out waiting for a reply to message ID 9fa9942aebb0487c9f79012ead8e147e
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.133 21134 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
Oct 21 01:28:33 localhost neutron-openvswitch-agent: 2015-10-21 01:28:33.135 21134 WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent.OVSNeutronAgent._report_state' run outlasted interval by 30.01 sec
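Both tracebacks above end in the same MessagingTimeout from oslo_messaging. A minimal sketch for pulling the agent name and message ID out of syslog lines of this shape follows; the regular expression and the function name `messaging_timeouts` are assumptions derived only from the lines quoted above.

```python
import re

# Matches "<Mon> <day> <time> <host> <agent>: ... MessagingTimeout: ... message ID <hex>"
TIMEOUT_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+(?P<agent>[\w.-]+):.*"
    r"MessagingTimeout: Timed out waiting for a reply to message ID "
    r"(?P<msg_id>[0-9a-f]+)"
)


def messaging_timeouts(lines):
    """Return a list of (agent, message_id) for each MessagingTimeout line."""
    found = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            found.append((match.group("agent"), match.group("msg_id")))
    return found
```

Timeouts reported by several agents at nearly the same moment, as here, point at something shared (neutron-server or the AMQP broker) rather than at one agent.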




Steps to reproduce:
Attempt to deploy HA overcloud with network isolation.



Result:
Stack failed with status: Resource CREATE failed: resources.ControllerOvercloudServicesDeployment_Step5: resources.ControllerNodesPostDeployment.Error: resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
Heat Stack create failed.


Expected result:
Successful deployment.
Comment 1 Ihar Hrachyshka 2015-10-21 05:09:56 EDT
It could be, e.g., neutron-server failing to reply, the AMQP broker failing to start, or anything in between. Without proper logs, it's hard to say. /var/log/messages snippets are NOT enough to make a clear call.
Comment 2 Marius Cornea 2015-10-21 05:49 EDT
Created attachment 1085060
neutron server log

Attaching the neutron server.log from the first controller in Sasha's environment.

Neutron related packages:
python-neutron-lbaas-7.0.0-1.el7.noarch
openstack-neutron-openvswitch-7.0.0-2.el7.noarch
python-neutron-7.0.0-2.el7.noarch
openstack-neutron-7.0.0-2.el7.noarch
python-neutronclient-3.1.0-1.el7.noarch
openstack-neutron-common-7.0.0-2.el7.noarch
openstack-neutron-lbaas-7.0.0-1.el7.noarch
openstack-neutron-ml2-7.0.0-2.el7.noarch
openstack-neutron-metering-agent-7.0.0-2.el7.noarch
Comment 3 Ihar Hrachyshka 2015-10-21 06:28:26 EDT
OK, the logs suggest a configuration issue on the server. Please show what core_plugin is set to.

Also, please attach all logs, not a single snippet. I don't see debug logs that could show which configuration the server sees.

Overall, let's try to avoid the churn and provide logs and configs in advance without waiting for devs to request them.
Comment 4 Alan Pevec 2015-10-21 06:32:56 EDT
Was the Cisco networking plugin installed?
The last time we saw "ValueError: Empty module name" while neutron was loading plugins was when the Cisco plugin from Kilo was installed (it has a duplicate ml2 entry point that confuses stevedore).
The Cisco plugin is NOT Liberty compatible yet and must not be installed.
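One way to look for the kind of duplicate entry point described above is to group registered entry points by name and report any name that appears more than once. The sketch below is an illustration only: `duplicate_entry_points` is a hypothetical helper, and the group name in the usage comment is an assumption, not something confirmed in this bug.

```python
from collections import defaultdict


def duplicate_entry_points(entries):
    """Group (name, target) pairs by name and return only the names
    that are registered more than once."""
    by_name = defaultdict(list)
    for name, target in entries:
        by_name[name].append(target)
    return {name: targets for name, targets in by_name.items() if len(targets) > 1}


# Possible usage on Python 3.10+ (the group name is an assumption):
#   from importlib.metadata import entry_points
#   eps = entry_points(group="neutron.core_plugins")
#   print(duplicate_entry_points((ep.name, ep.value) for ep in eps))
```

An empty result means no name is registered twice in the group that was inspected; it does not rule out other plugin-loading problems.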
Comment 5 Marius Cornea 2015-10-21 06:36 EDT
Created attachment 1085078
Neutron configuration files

Attaching the neutron configuration files in /etc/neutron.
Comment 6 Alan Pevec 2015-10-21 06:36:51 EDT
What Ihar said, plus rpm -qa or even a full sosreport (which might be too big for a BZ attachment, so upload it somewhere, e.g. a personal fedorapeople.org page).
Comment 7 Marius Cornea 2015-10-21 06:37 EDT
Created attachment 1085079
Neutron log files

Attaching the Neutron log files in /var/log/neutron
Comment 8 Marius Cornea 2015-10-21 06:38:30 EDT
[root@overcloud-controller-0 ~]# rpm -qa | grep cisco
python-networking-cisco-2015.1.0-1.el7.noarch
fence-agents-cisco-ucs-4.0.11-13.el7_1.2.x86_64
fence-agents-cisco-mds-4.0.11-13.el7_1.2.x86_64
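The listing above shows python-networking-cisco at version 2015.1.0, while the neutron packages in comment 2 are at 7.0.0. As a rough heuristic only, assuming that a date-style 2015.1.x version marks a Kilo-era package, a sketch like the following could flag such packages in `rpm -qa` output. The function name `kilo_versioned` is hypothetical.

```python
def kilo_versioned(packages, kilo_prefix="2015.1"):
    """Return names of packages whose version looks Kilo-style (2015.1.x).

    Each item is expected in name-version-release.arch form, as printed
    by `rpm -qa`. The version prefix is a heuristic, not an rpm feature.
    """
    flagged = []
    for nvra in packages:
        parts = nvra.rsplit("-", 2)  # name may itself contain hyphens
        if len(parts) != 3:
            continue  # not in the expected form; skip rather than guess
        name, version, _release_arch = parts
        if version == kilo_prefix or version.startswith(kilo_prefix + "."):
            flagged.append(name)
    return flagged
```

The fence-agents packages in the same listing are not flagged, since their versions (4.0.11) do not follow the date-style scheme.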
Comment 9 Ihar Hrachyshka 2015-10-21 06:38:45 EDT
OK, I see cisco config files in the config tarball. I believe that's the issue Alan mentioned.
Comment 10 Marius Cornea 2015-10-21 06:50 EDT
Created attachment 1085084
sosreport

Attaching the sosreport.
Comment 11 Alan Pevec 2015-10-21 07:13:11 EDT
The overcloud controller has the kilo repo enabled, which should not be the case!
I'm also not sure where centos-cloud-rdo is coming from; that repo is not defined in either the Cloud SIG or the rdo-release RPM:

base/7/x86_64      CentOS-7 - Base
centos-cloud-rdo   CentOS Cloud RDO
extras/7/x86_64    CentOS-7 - Extras
openstack-kilo     OpenStack Kilo Repository
updates/7/x86_64   CentOS-7 - Updates

Please rebuild images using only steps from https://etherpad.openstack.org/p/RDO-Manager_liberty:
yum install -y  http://rdoproject.org/repos/openstack-liberty/rdo-release-liberty.rpm 
export NODE_DIST=centos7
export RDO_RELEASE='liberty'
openstack overcloud image build --all
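A quick sanity check for the repo mismatch described in this comment could look like the sketch below. It assumes that release repos are named `openstack-<release>`, as `openstack-kilo` is in the list above; the function name `unexpected_release_repos` is hypothetical.

```python
def unexpected_release_repos(repo_ids, release="liberty"):
    """Return openstack-* repo ids that do not match the target release.

    Repo ids may carry a /<releasever>/<arch> suffix (e.g. base/7/x86_64),
    which is dropped before comparison.
    """
    expected = "openstack-" + release
    flagged = []
    for repo_id in repo_ids:
        base_id = repo_id.split("/")[0]
        if base_id.startswith("openstack-") and base_id != expected:
            flagged.append(base_id)
    return flagged
```

This only catches release repos that follow the naming assumption; a repo of unknown origin such as centos-cloud-rdo still has to be reviewed by hand.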
Comment 12 John Trowbridge 2015-10-21 08:32:34 EDT
This is at least related to, and I would say probably a duplicate of, https://bugzilla.redhat.com/show_bug.cgi?id=1271200

My last comment from the above BZ for completeness:

This is fixed in the liberty-testing repo
http://buildlogs.centos.org/centos/7/cloud/x86_64/openstack-liberty/

with python-tripleoclient-0.0.11-3

However, I am going to leave it assigned until the fix is also in delorean.

To use the release repo when building images, you now need to:
`export RDO_RELEASE=<release>`

so for liberty:
`export RDO_RELEASE='liberty'`

By default it will use kilo.
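Because an unset RDO_RELEASE silently falls back to kilo, a wrapper around the image build could check the variable first. The sketch below only models the behaviour described in this comment; both function names are hypothetical, and it does not reproduce python-tripleoclient's actual logic.

```python
import os


def effective_rdo_release(environ=None, default="kilo"):
    """Return the release the image build would use, per the comment above:
    the value of RDO_RELEASE if set, otherwise the kilo default."""
    if environ is None:
        environ = os.environ
    return environ.get("RDO_RELEASE", default)


def check_release(expected, environ=None):
    """Raise if the effective release differs from the expected one."""
    actual = effective_rdo_release(environ)
    if actual != expected:
        raise RuntimeError(
            "RDO_RELEASE resolves to %r, expected %r" % (actual, expected)
        )
    return actual
```

Calling `check_release("liberty")` before `openstack overcloud image build --all` would stop a build that is about to pull from the wrong release.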
Comment 14 Christopher Brown 2017-06-17 13:21:53 EDT
This was fixed a while back so can be closed.
