Bug 1480897 - Overcloud deployment fails when using ovs 2.8.90
Status: POST
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 12.0 (Pike)
Hardware: Unspecified OS: Unspecified
Priority: high Severity: high
Assigned To: Open vSwitch development team
QA Contact: Ofer Blaut
Keywords: Triaged
Depends On: 1482682
Reported: 2017-08-12 13:56 EDT by Arie Bregman
Modified: 2017-10-11 14:27 EDT

Type: Bug
Description Arie Bregman 2017-08-12 13:56:07 EDT
Description of problem:

Overcloud deployment fails with "no valid hosts" when using ovs 2.8.90 instead of ovs 2.7.0-8

Version-Release number of selected component (if applicable): RHOSP 12


How reproducible: 100%


Steps to Reproduce:
1. Install undercloud with ovs 2.8.90
2. Deploy overcloud with ovs 2.8.90

Actual results: 

2017-08-12 15:34:25Z [overcloud.Controller.2.Controller]: CREATE_FAILED  ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2017-08-12 15:34:25Z [overcloud.Controller.2.Controller]: DELETE_IN_PROGRESS  state changed
2017-08-12 15:34:26Z [overcloud.Controller.2.Controller]: DELETE_COMPLETE  state changed
2017-08-12 15:34:27Z [overcloud.Controller.0.Controller]: CREATE_IN_PROGRESS  state changed
2017-08-12 15:34:28Z [overcloud.Controller.0.Controller]: CREATE_FAILED  ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"


Expected results: Overcloud deployed successfully
Comment 2 Brent Eagles 2017-08-14 08:27:24 EDT
A couple of critical looking issues in /var/log/messages on the undercloud:

Aug 12 10:06:18 undercloud-0 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=5 --id=@manager -- create Manager "target=\"ptcp:6640:127.0.0.1\"" -- add Open_vSwitch . manager_options @manager
Aug 12 10:06:18 undercloud-0 ovs-vsctl: ovs|00002|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (Permission denied)

Aug 12 11:36:22 undercloud-0 registry: 192.168.24.1 - - [12/Aug/2017:11:36:22 -0400] "OPTIONS / HTTP/1.0" 200 0 "" ""
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: Traceback (most recent call last):
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: File "/usr/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 115, in wait
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: listener.cb(fileno)
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: File "/usr/lib/python2.7/site-packages/eventlet/green/select.py", line 57, in on_write
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: current.switch(([], [original], []))
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: result = function(*args, **kwargs)
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 65, in _launch
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: raise e
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: Exception: Could not retrieve schema from tcp:127.0.0.1:6640: Connection refused
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: Removing descriptor: 9
Comment 3 Terry Wilson 2017-08-14 09:01:33 EDT
Since there is a permission error running the command that sets ovsdb-server to listen on ptcp:6640, neutron won't be able to communicate with OVS. One thing to check is whether there are selinux failures when running that command on the undercloud; if so, either new selinux rules are required for the new OVS, or the openstack-selinux package isn't installed. I can't tell from the log snippet, but it could also be that rootwrap wasn't configured properly and sudo wasn't used to run the create/add manager commands.
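
To narrow down which OVS commands are failing before looking at selinux, the ERR entries can be pulled out of /var/log/messages. This is only a rough sketch: the regular expression is inferred from the two ovs-vsctl log lines quoted in comment 2, and the function name is made up for illustration.

```python
import re

# Assumed OVS log layout, inferred from the excerpt in comment 2:
#   ovs|<sequence>|<module>|<level>|<message>
OVS_LOG = re.compile(r"ovs\|(\d+)\|([^|]+)\|([A-Z]+)\|(.*)")


def find_ovs_errors(lines):
    """Return (module, message) for every OVS log entry at level ERR."""
    errors = []
    for line in lines:
        match = OVS_LOG.search(line)
        if match and match.group(3) == "ERR":
            errors.append((match.group(2), match.group(4)))
    return errors


# The second sample line is copied from comment 2; the first is shortened.
sample = [
    "Aug 12 10:06:18 undercloud-0 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=5",
    "Aug 12 10:06:18 undercloud-0 ovs-vsctl: ovs|00002|db_ctl_base|ERR|"
    "unix:/var/run/openvswitch/db.sock: database connection failed (Permission denied)",
]

for module, message in find_ovs_errors(sample):
    print(module, message)
```

Any entry ending in "(Permission denied)" points at the socket permission problem described above rather than at neutron itself.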
Comment 4 Arie Bregman 2017-08-14 09:07:45 EDT
openstack-selinux is installed but probably doesn't include an updated policy for the new ovs.

The failure posted here is after setting selinux to permissive.
With enforcing, it fails much earlier with: "unable to start openvswitch..."

Any suggestions?
Comment 6 Terry Wilson 2017-08-14 10:45:13 EDT
It looks like we are now running openvswitch services under their own users instead of as root, so that explains all of the dac_override selinux issues: /var/run/openvswitch/conf.db etc. are now owned by 'openvswitch' instead of root. Locally, when using ovs 2.8 and selinux=permissive, I am able to run sudo ovs-vsctl set-manager ptcp:6640, which makes me wonder whether sudo is actually being called (I'm not sure why it wouldn't be).
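
A quick way to confirm the ownership change on a given host is to print the owner of the OVS files. A minimal sketch, assuming the two paths mentioned in this bug (they may live elsewhere on other builds); the helper names are hypothetical.

```python
import os
import pwd


def owner_name(path):
    """Return the user name owning path, or the numeric uid if it has no passwd entry."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def report_owners(paths):
    """Map each existing path to its owner; paths that do not exist are skipped."""
    return {path: owner_name(path) for path in paths if os.path.exists(path)}


# Paths taken from the log excerpt and the comment above; adjust as needed.
print(report_owners([
    "/var/run/openvswitch/db.sock",
    "/var/run/openvswitch/conf.db",
]))
```

If the files show up as owned by 'openvswitch', a client that is not running as root or as that user would be expected to hit the "Permission denied" error on db.sock.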
Comment 7 Terry Wilson 2017-08-14 11:05:20 EDT
arie, according to aconole, you could try commenting out:

OVS_USER_ID="openvswitch:hugetlbfs"

in /etc/sysconfig/openvswitch to work around the issue for now as well.
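
For anyone scripting this workaround across several nodes, the edit could look roughly like the following. It is a sketch only: the function name is hypothetical, the OPTIONS line in the sample is illustrative, and the code works on the file contents as a string so nothing is written to /etc/sysconfig/openvswitch here.

```python
def comment_out_setting(text, key="OVS_USER_ID"):
    """Return text with every active `key=...` line prefixed by '#'."""
    result = []
    for line in text.splitlines(keepends=True):
        # Lines that are already commented start with '#' and are left alone.
        if line.lstrip().startswith(key + "="):
            line = "#" + line
        result.append(line)
    return "".join(result)


# Illustrative file contents; only the OVS_USER_ID line comes from this bug.
sample = 'OPTIONS=""\nOVS_USER_ID="openvswitch:hugetlbfs"\n'
print(comment_out_setting(sample), end="")
```

The openvswitch services would presumably need a restart afterwards for the change to take effect.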
Comment 8 Assaf Muller 2017-08-14 12:09:42 EDT
Do we expect to hit the user ID issue once the OVS team provides a "proper" OVS 2.8 build? If so, what would a solution look like? Would it come from the OVS or OSP (Neutron/TripleO) side?
Comment 9 Terry Wilson 2017-08-14 12:23:56 EDT
amuller: aconole said he has a patch in testing for the selinux issues. We don't know if there are additional issues. It seems like there may be, if the failure happens with things in permissive mode. aconole said it looked like ovs-vsctl was run as root, so maybe there is a chance that somewhere in the deployment process selinux was re-enabled? Otherwise I'm not sure, unless ovs-vsctl itself was specifically dropping permissions (and the only place I see that is in the servers).
Comment 13 Terry Wilson 2017-08-17 18:37:57 EDT
bz 1482682 describes the selinux issue, and links to a patch upstream that should fix this (https://patchwork.ozlabs.org/patch/802232/).
Comment 14 Terry Wilson 2017-08-25 11:47:57 EDT
arie, did you try removing

OVS_USER_ID="openvswitch:openvswitch"

from /etc/sysconfig/openvswitch to temporarily move past the selinux issue? Also, what setting are you using to disable selinux? I might be able to help find out where things are getting changed back as well.
Comment 17 Arie Bregman 2017-09-06 04:48:19 EDT
Note: OVS 2.8.90 is targeted for RHOSP 13. OSP 12 is running with OVS 2.7.2.
Comment 19 Lon Hohberger 2017-10-11 14:26:12 EDT
Looks like this was fixed in openvswitch by providing its own custom policy.
