Bug 1420636
Summary: | The node service can't be started after upgrading openvswitch to v2.6 | ||||||
---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Anping Li <anli> | ||||
Component: | Cluster Version Operator | Assignee: | Giuseppe Scrivano <gscrivan> | ||||
Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 3.5.0 | CC: | anli, aos-bugs, bbennett, bmeng, dcbw, gscrivan, jokerman, mmccomas, sdodson, xtian | ||||
Target Milestone: | --- | Keywords: | Reopened | ||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | openshift-ansible-3.5.41-1 | Doc Type: | No Doc Update | ||||
Doc Text: | Fixed a bug in the upgrade of openvswitch to v2.6 | Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2017-04-12 19:01:17 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: | | ||||||
Comment 1
Scott Dodson
2017-02-09 14:47:35 UTC
Dan: Can you answer this please?

Dan, BTW, OVS is upgraded and restarted prior to restarting docker and then the node during 3.4 -> 3.5 upgrades, in case that's related.

@scott, Dan, any update on this bug?

Dan, any information to add on this one?

I am still not able to reproduce this issue. After the upgrade finishes, I have:

# rpm -qa openvswitch
openvswitch-2.6.1-3.git20161206.el7fdb.x86_64
# oc get nodes
NAME          STATUS    AGE
rhel7server   Ready     35m

Could you please share with me the inventory file you are using?

I'll try with the quick reproducer. I've also tried downgrading openvswitch to openvswitch-2.5.0-14.git20160727.el7fdb.x86_64, and I am still able to see the issue here:

# rpm -qa openvswitch
openvswitch-2.5.0-14.git20160727.el7fdb.x86_64
# systemctl restart openvswitch
# systemctl restart atomic-openshift-node
# systemctl status atomic-openshift-node
● atomic-openshift-node.service - Atomic OpenShift Node
   Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d
           └─openshift-sdn-ovs.conf
   Active: active (running) since Sat 2017-02-18 08:46:00 EST; 7s ago

According to today's testing, the node service is always down once I restart openvswitch. I have to fix it with 'ovs-vsctl del-br br0'. I am using openvswitch-2.6.1-3.git20161206.el7fdb.x86_64, upgraded from OCP 3.4.

I got a message similar to the one in https://bugzilla.redhat.com/show_bug.cgi?id=1405479:

type=AVC msg=audit(1487845707.287:37548): avc: denied { getattr } for pid=37868 comm="ovs-ctl" path="/usr/bin/hostname" dev="dm-0" ino=74876 scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:object_r:hostname_exec_t:s0 tclass=file

The workaround works:

# systemctl stop openvswitch
# killall ovs-vswitchd
# semanage permissive -a openvswitch_t
# systemctl restart openvswitch
# systemctl restart atomic-openshift-node

Thanks.
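For reference, the workaround above can be wrapped in a small script. This is a sketch and not something posted in the bug. The 'run' helper, the function names and the DRY_RUN switch are hypothetical. The 'ausearch -m AVC -c ovs-ctl' check is an added assumption, included only to spot the same ovs-ctl denial. 'semanage permissive -a openvswitch_t' makes only the openvswitch_t domain permissive. 'semanage permissive -d openvswitch_t' removes that exception once a fixed selinux-policy is installed.

```shell
#!/bin/sh
# Sketch of the temporary SELinux workaround discussed in this bug.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them (as root).
# run(), the function names and DRY_RUN are illustrative, not part of any tool.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Assumed diagnostic: list AVC denials raised by the ovs-ctl command.
show_ovs_denials() {
    run ausearch -m AVC -c ovs-ctl
}

# Make only openvswitch_t permissive, then restart OVS and the node.
apply_ovs_permissive_workaround() {
    run systemctl stop openvswitch
    run killall ovs-vswitchd          # in case the daemon is still running after the stop
    run semanage permissive -a openvswitch_t
    run systemctl restart openvswitch
    run systemctl restart atomic-openshift-node
}

# Remove the permissive exception again once a fixed policy is installed.
revert_ovs_permissive_workaround() {
    run semanage permissive -d openvswitch_t
}

show_ovs_denials
apply_ovs_permissive_workaround
```

Running it without DRY_RUN=0 only prints the plan. This makes it easy to review the commands before applying them on a node.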
As you confirmed the same issue I was seeing, I am going to close this bug as a duplicate of 1405479.

*** This bug has been marked as a duplicate of bug 1405479 ***

*** Bug 1426139 has been marked as a duplicate of this bug. ***

They cloned the SELinux bug for 7.3.z. Can you guys test the version from https://bugzilla.redhat.com/show_bug.cgi?id=1430751 to verify the fix?

Does it work if you use the same workaround we used before?

# systemctl stop openvswitch
# killall ovs-vswitchd
# semanage permissive -a openvswitch_t
# systemctl restart openvswitch
# systemctl restart atomic-openshift-node

Yes, it works if I use the workaround.

So it is still an SELinux issue. Could you please file a new bug or reopen https://bugzilla.redhat.com/show_bug.cgi?id=1405479 with more information?

Created attachment 1264910
Atomic-openshift-node journal logs
1. Install OCP 3.4 with OVS 2.4.
2. Upgrade to OCP 3.5 with OVS 2.4.
3. Upgrade OVS 2.4 to 2.6 and selinux-policy to selinux-policy-3.13.1-102.el7_3.16.noarch.
4. systemctl restart openvswitch; systemctl restart atomic-openshift-node
5. Collect the journal logs (attached here):
   journalctl -u atomic-openshift-node
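Steps 3 to 5 of the reproducer can be scripted roughly as follows. This is a sketch that assumes a node already on OCP 3.5 with OVS 2.4 and repositories that offer OVS 2.6. It does not pin the exact selinux-policy version from step 3. The 'run' helper, the function name and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch of reproduction steps 3-5 (node already on OCP 3.5 with OVS 2.4).
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them (as root).
# run(), reproduce_ovs_upgrade() and DRY_RUN are illustrative names only.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce_ovs_upgrade() {
    # Step 3: pull in OVS 2.6 and the updated policy (pin versions as needed).
    run yum -y upgrade openvswitch selinux-policy
    # Step 4: restart OVS first, then the node, as in the report.
    run systemctl restart openvswitch
    run systemctl restart atomic-openshift-node
    # Step 5: show the node state, then dump its journal. No 'set -e' is used,
    # so the journal is still collected when the node failed to start.
    run systemctl is-active atomic-openshift-node
    run journalctl -u atomic-openshift-node --no-pager
}

reproduce_ovs_upgrade
```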
I proposed a patch that also restarts ovs-vswitchd and ovsdb-server as part of the upgrade: https://github.com/openshift/openshift-ansible/pull/3718

I don't think that PR works. I tested this today, and this order of operations seems to solve the problem:

1) systemctl stop atomic-openshift-node
2) systemctl stop openvswitch
3) yum upgrade openvswitch
4) systemctl start openvswitch
5) systemctl start atomic-openshift-node

I'll try to get a PR up with this tonight after some more testing.

https://github.com/openshift/openshift-ansible/pull/3748 merged into release-1.5 and seems to work for me.

The fix works. The upgrade succeeds with atomic-openshift-utils-3.5.41-1.git.0.e33897c.el7.noarch.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0903
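The stop/upgrade/start order that resolved this bug can be sketched as a shell function. This is an illustration only. The merged fix in openshift-ansible PR 3748 is an Ansible change and is not reproduced here. The 'run' helper, the function name, the DRY_RUN switch and the added '-y' flag for non-interactive yum are assumptions.

```shell
#!/bin/sh
# Sketch of the upgrade order from this bug: stop the node and OVS, upgrade OVS,
# then start OVS before the node. Not the actual openshift-ansible implementation.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them (as root).

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_openvswitch() {
    # Chained with && so a failed step (e.g. yum) stops the sequence
    # instead of starting the node against a half-upgraded OVS.
    run systemctl stop atomic-openshift-node &&
    run systemctl stop openvswitch &&
    run yum -y upgrade openvswitch &&
    run systemctl start openvswitch &&
    run systemctl start atomic-openshift-node
}

upgrade_openvswitch
```

The node is stopped before OVS and started after it. This way the node never runs against an OVS that is down or mid-upgrade, which is the point of the reordering described above.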