Red Hat Bugzilla – Bug 1480897
Overcloud deployment fails when using ovs 2.8.90
Last modified: 2017-10-11 14:27:25 EDT
Description of problem:
Overcloud deployment fails with "no valid hosts" when using ovs 2.8.90 instead of ovs 2.7.0-8
Version-Release number of selected component (if applicable): RHOSP 12
How reproducible: 100%
Steps to Reproduce:
1. Install undercloud with ovs 2.8.90
2. Deploy overcloud with ovs 2.8.90
Actual results:
2017-08-12 15:34:25Z [overcloud.Controller.2.Controller]: CREATE_FAILED ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2017-08-12 15:34:25Z [overcloud.Controller.2.Controller]: DELETE_IN_PROGRESS state changed
2017-08-12 15:34:26Z [overcloud.Controller.2.Controller]: DELETE_COMPLETE state changed
2017-08-12 15:34:27Z [overcloud.Controller.0.Controller]: CREATE_IN_PROGRESS state changed
2017-08-12 15:34:28Z [overcloud.Controller.0.Controller]: CREATE_FAILED ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
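For anyone triaging a similar deployment log, here is a minimal sketch that pulls the failed resources out of stack event lines shaped like the ones above. The regular expression and the function name failed_events are my own assumptions based only on the lines pasted in this report; they are not part of any Heat or TripleO tooling.

```python
import re

# Assumed line shape, taken from the events pasted in this report:
# 2017-08-12 15:34:25Z [overcloud.Controller.2.Controller]: CREATE_FAILED <reason>
EVENT_RE = re.compile(
    r"^(?P<time>\S+ \S+) \[(?P<resource>[^\]]+)\]: "
    r"(?P<status>[A-Z_]+)\s*(?P<reason>.*)$"
)


def failed_events(lines):
    """Return (time, resource, reason) for every event whose status ends in _FAILED."""
    failures = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match and match.group("status").endswith("_FAILED"):
            failures.append(
                (match.group("time"), match.group("resource"), match.group("reason"))
            )
    return failures
```

Feeding it the saved deployment output line by line gives a short list of failed resources and their reasons, which makes it easier to see that every failure here carries the same "No valid host was found" message.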
Expected results: Overcloud deployed successfully
A couple of critical-looking issues in /var/log/messages on the undercloud:
Aug 12 10:06:18 undercloud-0 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=5 --id=@manager -- create Manager "target=\"ptcp:6640:127.0.0.1\"" -- add Open_vSwitch . manager_options @manager
Aug 12 10:06:18 undercloud-0 ovs-vsctl: ovs|00002|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (Permission denied)
Aug 12 11:36:22 undercloud-0 registry: 192.168.24.1 - - [12/Aug/2017:11:36:22 -0400] "OPTIONS / HTTP/1.0" 200 0 "" ""
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: Traceback (most recent call last):
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: File "/usr/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 115, in wait
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: listener.cb(fileno)
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: File "/usr/lib/python2.7/site-packages/eventlet/green/select.py", line 57, in on_write
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: current.switch(([], [original], []))
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: result = function(*args, **kwargs)
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 65, in _launch
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: raise e
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: Exception: Could not retrieve schema from tcp:127.0.0.1:6640: Connection refused
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: Removing descriptor: 9
Since there is a permission error when running the command that sets ovsdb-server to listen on ptcp:6640, neutron won't be able to communicate with OVS. Things to check: whether there are selinux denials when running that command on the undercloud, in which case either new selinux rules are required for the new OVS or the openstack-selinux package isn't installed. I can't tell from the log snippet, but it could also be that rootwrap wasn't configured properly and sudo wasn't used to run the create/add manager commands.
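A quick way to confirm whether the manager target actually came up is to try a TCP connection to it. The following is only a sketch: the host and port come from the log above (127.0.0.1:6640), while the function name manager_port_open and the timeout value are assumptions for illustration.

```python
import socket


def manager_port_open(host="127.0.0.1", port=6640, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening on the target.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns False on the undercloud, ovsdb-server is not listening on the manager port, which would be consistent with the "Connection refused" raised by neutron-openvswitch-agent.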
openstack-selinux is installed but probably doesn't include updated policy for the new ovs.
The failure posted here occurred after setting selinux to permissive.
With enforcing, it fails much earlier with: "unable to start openvswitch..."
It looks like we are now running openvswitch services under their own users instead of as root, which explains all of the dac_override selinux issues: /var/run/openvswitch/conf.db etc. are now owned by 'openvswitch' instead of root. Locally, when using ovs 2.8 and selinux=permissive, I am able to run sudo ovs-vsctl set-manager ptcp:6640, which makes me wonder whether sudo is actually being called (I'm not sure why it wouldn't be).
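To verify the ownership change on a given host, something like the sketch below could be used. The helper name owner_name is hypothetical; the paths mentioned afterwards are the ones quoted in this report.

```python
import os
import pwd


def owner_name(path):
    """Return the user name owning path, or the numeric uid if it has no passwd entry."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # The uid is not present in the local user database.
        return str(uid)
```

Calling owner_name("/var/run/openvswitch/db.sock") and owner_name("/var/run/openvswitch/conf.db") should show whether those files belong to 'openvswitch' or to root on the affected undercloud.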
arie, according to aconole, you could try commenting out:
in /etc/sysconfig/openvswitch to work around the issue for now as well.
Do we expect to hit the user ID issue once the OVS team provides a "proper" OVS 2.8 build? If so, what would a solution look like? Would it come from the OVS or OSP (Neutron/TripleO) side?
amuller: aconole said he has a patch in testing for the selinux issues. We don't know if there are additional issues. It seems like there may be, if things fail in permissive mode. aconole said it looked like ovs-vsctl was run as root, so maybe there is a chance that somewhere in the deployment process selinux was re-enabled? Otherwise I'm not sure, unless ovs-vsctl itself was specifically dropping permissions (and the only place I see that is in the servers).
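One way to test the "selinux was re-enabled" theory is to record the mode at several points during the deployment. This sketch reads the selinuxfs enforce file, which holds 1 for enforcing and 0 for permissive; the function names are assumptions, and treating a missing file as "disabled" is a simplification (it could also mean selinuxfs is not mounted).

```python
def mode_from_value(value):
    """Map the content of the selinuxfs 'enforce' file to a mode name."""
    return "enforcing" if value.strip() == "1" else "permissive"


def selinux_mode(enforce_path="/sys/fs/selinux/enforce"):
    """Return 'enforcing', 'permissive', or 'disabled' for the running system."""
    try:
        with open(enforce_path) as handle:
            return mode_from_value(handle.read())
    except OSError:
        # Assumption: no readable enforce file means SELinux is not active.
        return "disabled"
```

Logging selinux_mode() before and after each deployment step would show where, if anywhere, the mode flips back from permissive to enforcing.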
bz 1482682 describes the selinux issue and links to an upstream patch that should fix it (https://patchwork.ozlabs.org/patch/802232/)
arie, did you try removing
from /etc/sysconfig/openvswitch to temporarily move past the selinux issue? Also, what setting are you using to disable selinux? I might be able to help find out where things are getting changed back as well.
Note: OVS 2.8.90 targeted for RHOSP 13. OSP 12 is running with ovs 2.7.2
Looks like this was fixed in openvswitch by providing its own custom policy.