Bug 1480897
Summary: | Overcloud deployment fails when using ovs 2.9.90 | | |
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Arie Bregman <abregman> |
Component: | openvswitch | Assignee: | Aaron Conole <aconole> |
Status: | CLOSED NOTABUG | QA Contact: | Ofer Blaut <oblaut> |
Severity: | high | Docs Contact: | |
Priority: | high | | |
Version: | 12.0 (Pike) | CC: | abregman, amuller, apevec, atragler, beagles, chrisw, danken, fleitner, mgrepl, nyechiel, rhallise, rhos-maint, srevivo, supadhya, twilson |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | 13.0 (Queens) | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2018-04-10 12:36:33 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 1482682 | | |
Bug Blocks: | | | |
Description
Arie Bregman
2017-08-12 17:56:07 UTC
A couple of critical looking issues in /var/log/messages on the undercloud:

Aug 12 10:06:18 undercloud-0 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=5 --id=@manager -- create Manager "target=\"ptcp:6640:127.0.0.1\"" -- add Open_vSwitch . manager_options @manager
Aug 12 10:06:18 undercloud-0 ovs-vsctl: ovs|00002|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (Permission denied)
Aug 12 11:36:22 undercloud-0 registry: 192.168.24.1 - - [12/Aug/2017:11:36:22 -0400] "OPTIONS / HTTP/1.0" 200 0 "" ""
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: Traceback (most recent call last):
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: File "/usr/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 115, in wait
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: listener.cb(fileno)
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: File "/usr/lib/python2.7/site-packages/eventlet/green/select.py", line 57, in on_write
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: current.switch(([], [original], []))
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: result = function(*args, **kwargs)
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 65, in _launch
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: raise e
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: Exception: Could not retrieve schema from tcp:127.0.0.1:6640: Connection refused
Aug 12 11:36:22 undercloud-0 neutron-openvswitch-agent: Removing descriptor: 9

Since there is a permission error running the command that sets ovsdb-server to listen on ptcp:6640, neutron won't be able to communicate with OVS.
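
When scanning a long /var/log/messages for the same symptom, it can help to pull out only the OVS records logged at ERR level. The following is a minimal sketch, assuming the `ovs|<sequence>|<module>|<level>|<message>` layout seen in the two ovs-vsctl lines above; `parse_ovs_record` and `find_errors` are hypothetical helper names, not part of any OVS or Neutron tooling.

```python
import re

# Layout assumed from the log excerpt in this report:
#   ovs|<sequence>|<module>|<level>|<message>
OVS_RECORD = re.compile(
    r"ovs\|(?P<seq>\d+)\|(?P<module>[^|]+)\|(?P<level>[A-Z]+)\|(?P<message>.*)"
)


def parse_ovs_record(line):
    """Return the OVS record fields as a dict, or None if the line has none."""
    match = OVS_RECORD.search(line)
    return match.groupdict() if match else None


def find_errors(lines):
    """Return only the OVS records whose level is ERR (hypothetical helper)."""
    records = (parse_ovs_record(line) for line in lines)
    return [r for r in records if r is not None and r["level"] == "ERR"]


# Two lines copied from the report; the second one is the permission failure.
sample = [
    r'Aug 12 10:06:18 undercloud-0 ovs-vsctl: ovs|00001|vsctl|INFO|Called as '
    r'/bin/ovs-vsctl --timeout=5 --id=@manager -- create Manager '
    r'"target=\"ptcp:6640:127.0.0.1\"" -- add Open_vSwitch . manager_options @manager',
    r'Aug 12 10:06:18 undercloud-0 ovs-vsctl: ovs|00002|db_ctl_base|ERR|'
    r'unix:/var/run/openvswitch/db.sock: database connection failed (Permission denied)',
]

errors = find_errors(sample)
for record in errors:
    print(record["module"], record["message"])
```

On a real system the list of lines would come from reading the log file instead of the inline `sample`.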
Things to check would be whether there are selinux failures when running that command on the undercloud, in which case it could be that new selinux rules are required for the new OVS, or that the openstack-selinux package isn't installed. I can't tell from the log snippet, but it could be that rootwrap wasn't configured properly and sudo wasn't used to run the create/add manager commands.

openstack-selinux is installed but probably doesn't include updated policy for the new ovs. The failure posted here is after setting selinux to permissive. With enforcing it fails much earlier with: "unable to start openvswitch..." Any suggestions?

It looks like we are now running openvswitch services under their own users instead of as root, so that explains all of the dac_override selinux issues--/var/run/openvswitch/conf.db etc. are now owned by 'openvswitch' instead of root. Locally, when using ovs 2.8 and selinux=permissive, I am able to run sudo ovs-vsctl set-manager ptcp:6640, which makes me wonder whether sudo is actually being called (I'm not sure why it wouldn't be).

Arie, according to aconole, you could try commenting out OVS_USER_ID="openvswitch:hugetlbfs" in /etc/sysconfig/openvswitch to work around the issue for now as well.

Do we expect to hit the user ID issue once the OVS team provides a "proper" OVS 2.8 build? If so, what would a solution look like? Would it come from the OVS or OSP (Neutron/TripleO) side?

amuller: aconole said he has a patch in testing for the selinux issues. We don't know if there are additional issues. It seems like there may be, if this still fails with things in permissive mode.

aconole said it looked like ovs-vsctl was run as root, so maybe there is a chance that somewhere in the deployment process selinux was re-enabled? Otherwise I'm not sure, unless ovs-vsctl itself was specifically dropping permissions (and the only place I see that is in the servers).
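
The workaround of commenting out OVS_USER_ID can be sketched as a small text transformation. This is only an illustration under stated assumptions: `comment_out_setting` is a hypothetical helper, the `OPTIONS=""` line in the sample is made-up surrounding content, and applying the result to the real /etc/sysconfig/openvswitch would need root and presumably a restart of the openvswitch service.

```python
def comment_out_setting(config_text, key="OVS_USER_ID"):
    """Return config_text with every active `key=...` line commented out.

    Lines that are already commented (start with '#') are left untouched.
    """
    result = []
    for line in config_text.splitlines(keepends=True):
        if line.lstrip().startswith(key + "="):
            result.append("#" + line)
        else:
            result.append(line)
    return "".join(result)


# Illustrative file content; only the OVS_USER_ID line comes from this report.
before = 'OPTIONS=""\nOVS_USER_ID="openvswitch:hugetlbfs"\n'
after = comment_out_setting(before)
print(after)
```

Working on the text rather than editing the file in place keeps the change easy to review before it is written back.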
bz 1482682 describes the selinux issue, and links to a patch upstream that should fix this (https://patchwork.ozlabs.org/patch/802232/).

Arie, did you try removing OVS_USER_ID="openvswitch:openvswitch" from /etc/sysconfig/openvswitch to temporarily move past the selinux issue? Also, what setting are you using to disable selinux? I might be able to help find out where things are getting changed back as well.

Note: OVS 2.8.90 targeted for RHOSP 13. OSP 12 is running with ovs 2.7.2.

Looks like this was fixed in openvswitch by providing its own custom policy.