Bug 1283623 - Overcloud deployment fails - openvswitch agent is not running and nova instances end up in error state
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 8.0 (Liberty)
Hardware: Unspecified OS: Unspecified
Priority: high Severity: high
Target Milestone: beta
Target Release: 8.0 (Liberty)
Assigned To: Ihar Hrachyshka
QA Contact: Eran Kuris
Docs Contact:
Depends On: 1269610
Blocks:
Reported: 2015-11-19 07:46 EST by Marius Cornea
Modified: 2016-04-26 21:50 EDT

See Also:
Fixed In Version: openstack-neutron-7.0.0-5.el7ost
Doc Type: Bug Fix
Doc Text:
Prior to this update, a change to the Open vSwitch agent introduced a bug in how the agent handles the segmentation ID value for flat networking during agent startup. Consequently, the agent failed to restart when serving a flat network. With this update, the agent code was fixed to handle segmentation properly for flat networking. As a result, the agent is successfully restarted when serving a flat network.
Story Points: ---
Clone Of: 1269610
Environment:
Last Closed: 2016-04-07 17:12:44 EDT
Type: Bug

External Trackers
Tracker    ID       Priority  Status  Summary  Last Updated
Launchpad  1494281  None      None    None     Never

Description Marius Cornea 2015-11-19 07:46:51 EST
+++ This bug was initially created as a clone of Bug #1269610 +++

Description of problem:
The overcloud deployment fails and the nova instances end up in ERROR state. The neutron-server log shows 'Failed to bind port' messages.

Version-Release number of selected component (if applicable):
instack-0.0.8-dev4.el7.centos.noarch
instack-undercloud-2.1.3-dev222.el7.centos.noarch

How reproducible:
100%

Steps to Reproduce:
1.  openstack overcloud deploy --templates

Actual results:
[stack@instack ~]$ openstack overcloud deploy --templates
Deploying templates in the directory /usr/share/openstack-tripleo-heat-templates
Stack failed with status: Resource CREATE failed: resources.Controller: ResourceInError: resources[0].resources.Controller: Went to status ERROR due to "Message: Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance c63bb1e9-ae43-48e2-8824-b9a8f1ccfc5e. Last exception: [u'Traceback (most recent call last): \n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1, Code: 500"


Expected results:
Stack is created successfully. 

Additional info:
[stack@instack ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+----------+
| ID                                   | Name                    | Status | Task State | Power State | Networks |
+--------------------------------------+-------------------------+--------+------------+-------------+----------+
| e5c061ce-ad2e-4b62-8f13-8180776b5727 | overcloud-controller-0  | ERROR  | -          | NOSTATE     |          |
| b50ad55c-55ce-4916-888f-ed5d601f91d1 | overcloud-novacompute-0 | ERROR  | -          | NOSTATE     |          |
+--------------------------------------+-------------------------+--------+------------+-------------+----------+

The neutron logs show 'Failed to bind port' messages, which point to neutron-openvswitch-agent not running. The openvswitch agent log shows:
ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-ab5b4e8e-bf5c-444b-a74c-338fd54e3363 - - - - -] invalid literal for int() with base 10: 'None' Agent terminated!
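
The message in that log line is the text Python's int() produces when it is handed the string 'None'. A minimal sketch of how such a value can arise, assuming the segmentation ID of a flat network is None and is converted to text somewhere before being parsed back (the variable names are illustrative and not taken from the neutron source):

```python
# Assumption: a flat network carries no segmentation ID, so the value is None.
segmentation_id = None

# Assumption: the value is stored as text at some point, which turns
# None into the four-character string 'None'.
stored = str(segmentation_id)

try:
    int(stored)
except ValueError as exc:
    # The exception text matches the message in the agent log.
    message = str(exc)
    print(message)
```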

Attaching openvswitch agent log.

--- Additional comment from Marius Cornea on 2015-10-07 13:49:37 EDT ---

Following the docs at http://docs.openstack.org/developer/tripleo-docs/installation/installing.html

Installed repos:
http://paste.openstack.org/show/475641/

[stack@instack ~]$ rpm -qa | grep neutron
python-neutron-8.0.0-dev362.el7.centos.noarch
python-neutronclient-3.1.1-dev7.el7.centos.noarch
openstack-neutron-openvswitch-8.0.0-dev362.el7.centos.noarch
openstack-neutron-8.0.0-dev362.el7.centos.noarch
openstack-neutron-ml2-8.0.0-dev362.el7.centos.noarch
openstack-neutron-common-8.0.0-dev362.el7.centos.noarch

--- Additional comment from John Trowbridge on 2015-10-08 09:47:49 EDT ---

I see one issue here, but I am not sure if it is the root cause.

Those neutron packages are Mitaka. This is a problem with using upstream docs for RDO. We need to fork the upstream docs so that we can point to liberty trunk repos.

Could you see if you can reproduce this issue using liberty repos?

http://trunk.rdoproject.org/centos7-liberty/
instead of
http://trunk.rdoproject.org/centos7/

--- Additional comment from Graeme Gillies on 2015-10-12 22:22:30 EDT ---

I can reproduce this problem in Liberty with the following packages:

python-neutron-7.0.0.0-rc2.dev21.el7.centos.noarch
python-neutronclient-3.1.1-dev1.el7.centos.noarch
openstack-neutron-ml2-7.0.0.0-rc2.dev21.el7.centos.noarch
openstack-neutron-7.0.0.0-rc2.dev21.el7.centos.noarch
openstack-neutron-common-7.0.0.0-rc2.dev21.el7.centos.noarch
openstack-neutron-openvswitch-7.0.0.0-rc2.dev21.el7.centos.noarch

Others are hitting this in RDO Liberty as well

https://bugs.launchpad.net/neutron/+bug/1494281

--- Additional comment from Marius Cornea on 2015-10-16 08:20:52 EDT ---

I've just hit this again after a couple of overcloud redeployments. The openvswitch agent eventually fails with "invalid literal for int() with base 10: 'None' Agent terminated!" and no further instances can be booted.

openstack-neutron-ml2-7.0.0.0-rc2.dev21.el7.centos.noarch
openstack-neutron-openvswitch-7.0.0.0-rc2.dev21.el7.centos.noarch
openstack-neutron-common-7.0.0.0-rc2.dev21.el7.centos.noarch
openstack-neutron-7.0.0.0-rc2.dev21.el7.centos.noarch
python-neutron-7.0.0.0-rc2.dev21.el7.centos.noarch
python-neutronclient-3.1.1-dev1.el7.centos.noarch

--- Additional comment from Marius Cornea on 2015-10-16 09:37:17 EDT ---

To reproduce this issue you can restart the openvswitch-agent right after the undercloud installation and the error should show up:

systemctl restart neutron-openvswitch-agent.service

--- Additional comment from Ihar Hrachyshka on 2015-10-19 08:18:02 EDT ---

The segmentation ID is provided by neutron-server, which should allocate it for a port and push it to the agent. I suspect a configuration issue. Please attach config and log files for the OVS agent and neutron-server.

--- Additional comment from Marius Cornea on 2015-10-19 14:23 EDT ---

Attached. Thanks.

--- Additional comment from Ihar Hrachyshka on 2015-10-20 08:00:17 EDT ---

I don't see all config files that are read by neutron-server, specifically, plugin.ini (which is probably ml2_conf.ini).

I believe it's the issue known to upstream: https://bugs.launchpad.net/neutron/+bug/1494281

I see that your neutron-server uses flat networking (as per logs). This is probably what is broken in Liberty.

--- Additional comment from Marius Cornea on 2015-10-20 08:09 EDT ---

I attached the ml2_conf.ini. By default there is a single flat network that gets created. Note that the configuration is done by the installer so if any config changes are required we'd probably want to track those in a separate BZ.

--- Additional comment from Ihar Hrachyshka on 2015-11-05 07:55:27 EST ---

The fix was merged upstream into the Liberty branch. We expect a new release of Neutron this week; once it's there, we'll rebase the package.
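
As a rough illustration of the kind of guard such a fix needs (this is not the actual upstream patch), the sketch below treats a missing segmentation ID, including the literal string 'None', as absent instead of passing it to int(). The helper name parse_segmentation_id is hypothetical.

```python
def parse_segmentation_id(raw):
    """Return the segmentation ID as an int, or None when there is none.

    Hypothetical helper for illustration only; not part of neutron.
    """
    # Flat networks have no segmentation ID, so accept None, an empty
    # string, or the text 'None' without attempting a conversion.
    if raw is None or raw in ("", "None"):
        return None
    return int(raw)


vlan_id = parse_segmentation_id("100")   # VLAN-style value converts normally
flat_id = parse_segmentation_id("None")  # flat network: no ID, no exception
```

With a guard of this shape, a restart that reads back a flat network's missing ID no longer raises ValueError.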
Comment 2 Marius Cornea 2015-11-19 07:49:03 EST
I hit this on the undercloud after restarting neutron-openvswitch-agent. 

Version:
openstack-neutron-common-7.0.0-4.el7ost.noarch
openstack-neutron-openvswitch-7.0.0-4.el7ost.noarch
openstack-neutron-ml2-7.0.0-4.el7ost.noarch
openstack-neutron-7.0.0-4.el7ost.noarch
python-neutron-7.0.0-4.el7ost.noarch
python-neutronclient-3.1.0-1.el7ost.noarch
Comment 5 Eran Kuris 2015-12-09 06:51:49 EST
Verified - fixed in:
[stack@instack ~]$ rpm -qa | grep neutron
openstack-neutron-7.0.0-5.el7ost.noarch
openstack-neutron-ml2-7.0.0-5.el7ost.noarch
openstack-neutron-openvswitch-7.0.0-5.el7ost.noarch
openstack-neutron-common-7.0.0-5.el7ost.noarch
python-neutron-7.0.0-5.el7ost.noarch
python-neutronclient-3.1.0-1.el7ost.noarch
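
To check other hosts against the verified build, a naive sketch like the following compares package strings to 7.0.0-5. The helper names and the regular expression are assumptions for illustration. It only handles the .el7ost.noarch form shown in this report, does not implement full RPM version comparison, and is only meaningful for the openstack-neutron* and python-neutron packages, not python-neutronclient, which is versioned separately.

```python
import re

# Naive pattern for name-version-release strings as printed by `rpm -qa` above.
NVR = re.compile(
    r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)-(?P<release>\d+)\.el7ost\.noarch$"
)

# Version and release taken from the "Fixed In Version" field of this bug.
FIXED = ((7, 0, 0), 5)


def version_release(package):
    """Split a package string into ((major, minor, ...), release)."""
    match = NVR.match(package)
    if match is None:
        raise ValueError("unrecognised package string: %s" % package)
    version = tuple(int(part) for part in match.group("version").split("."))
    return version, int(match.group("release"))


def has_fix(package):
    """True if the package is at or above the fixed version and release."""
    return version_release(package) >= FIXED


for pkg in ("openstack-neutron-openvswitch-7.0.0-4.el7ost.noarch",
            "openstack-neutron-openvswitch-7.0.0-5.el7ost.noarch"):
    print(pkg, has_fix(pkg))
```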
Comment 6 errata-xmlrpc 2016-04-07 17:12:44 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0603.html
