Bug 1420636 - The node service can't be started after upgrade openvswitch to v2.6
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Upgrade
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: Giuseppe Scrivano
QA Contact: Anping Li
Keywords: Reopened
Duplicates: 1426139
Reported: 2017-02-09 01:52 EST by Anping Li
Modified: 2017-07-24 10 EDT
CC List: 10 users

See Also:
Fixed In Version: openshift-ansible-3.5.41-1
Doc Type: No Doc Update
Doc Text:
Fixed a bug in the upgrade of openvswitch to v2.6.
Last Closed: 2017-04-12 15:01:17 EDT
Type: Bug

Attachments
Atomic-openshift-node journal logs (598.04 KB, application/x-gzip)
2017-03-21 03:05 EDT, Anping Li


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2017:0903
Priority: normal
Status: SHIPPED_LIVE
Summary: OpenShift Container Platform atomic-openshift-utils bug fix and enhancement
Last Updated: 2017-04-12 18:45:42 EDT

Comment 1 Scott Dodson 2017-02-09 09:47:35 EST
Ben or other networking folks,

Anything specific about 3.4 to 3.5 upgrades that require lbr0 tweaking?
Comment 2 Ben Bennett 2017-02-09 10:07:00 EST
Dan: Can you answer this please?
Comment 3 Scott Dodson 2017-02-09 21:53:32 EST
Dan, BTW, during 3.4 -> 3.5 upgrades OVS is upgraded and restarted before docker and then the node are restarted, in case that's related.
Comment 4 Anping Li 2017-02-13 21:38:47 EST
@scott, Dan, any update on this bug?
Comment 8 Steve Milner 2017-02-16 14:40:50 EST
Dan, any information to add on this one?
Comment 9 Giuseppe Scrivano 2017-02-18 07:51:17 EST
I am still not able to reproduce this issue. After the upgrade finishes, I have:

# rpm -qa openvswitch
openvswitch-2.6.1-3.git20161206.el7fdb.x86_64

# oc get nodes
NAME          STATUS    AGE
rhel7server   Ready     35m


Could you please share with me the inventory file you are using?

I'll try with the quick reproducer.
Comment 10 Giuseppe Scrivano 2017-02-19 07:46:17 EST
I've also tried downgrading openvswitch to openvswitch-2.5.0-14.git20160727.el7fdb.x86_64 and I am still able to see the issue here:

# rpm -qa openvswitch
openvswitch-2.5.0-14.git20160727.el7fdb.x86_64
# systemctl restart openvswitch
# systemctl restart atomic-openshift-node
# systemctl status atomic-openshift-node
● atomic-openshift-node.service - Atomic OpenShift Node
   Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d
           └─openshift-sdn-ovs.conf
   Active: active (running) since Sat 2017-02-18 08:46:00 EST; 7s ago
Comment 14 Anping Li 2017-02-22 06:44:28 EST
According to today's testing results, the node service always goes down once I restart openvswitch. I have to fix it with 'ovs-vsctl del-br br0'. I am using openvswitch-2.6.1-3.git20161206.el7fdb.x86_64, which was upgraded from OCP 3.4.
Comment 17 Anping Li 2017-02-23 05:32:38 EST
I got a message similar to the one in https://bugzilla.redhat.com/show_bug.cgi?id=1405479

:type=AVC msg=audit(1487845707.287:37548): avc:  denied  { getattr } for  pid=37868 comm="ovs-ctl" path="/usr/bin/hostname" dev="dm-0" ino=74876 scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:object_r:hostname_exec_t:s0 tclass=file
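
For readers triaging a denial like the one above, here is a minimal sketch that pulls the source and target SELinux types out of an AVC record. The helper names `avc_field` and `context_type` are hypothetical, made up for this illustration; the record is the one quoted in this comment without its leading colon.

```shell
#!/usr/bin/env bash
# Sketch only: extract key=value fields from a single AVC audit record.
# avc_field and context_type are hypothetical helpers, not part of any audit tool.
set -euo pipefail

# Print the value of one key=value field (for example scontext) from an AVC line.
avc_field() {
    local key="$1" line="$2"
    printf '%s\n' "$line" | sed -n "s/.*[[:space:]]${key}=\([^[:space:]]*\).*/\1/p"
}

# An SELinux context is user:role:type:level, so the type is the third field.
context_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

line='type=AVC msg=audit(1487845707.287:37548): avc:  denied  { getattr } for  pid=37868 comm="ovs-ctl" path="/usr/bin/hostname" dev="dm-0" ino=74876 scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:object_r:hostname_exec_t:s0 tclass=file'

src="$(context_type "$(avc_field scontext "$line")")"
tgt="$(context_type "$(avc_field tcontext "$line")")"
echo "denied: source=${src} target=${tgt} class=$(avc_field tclass "$line")"
```

The source type reported here, openvswitch_t, is the domain that the workaround in this thread switches to permissive mode.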


The workaround works.
# systemctl stop openvswitch
# killall ovs-vswitchd
# semanage permissive -a openvswitch_t
# systemctl restart openvswitch
# systemctl restart atomic-openshift-node
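
The workaround steps above can be sketched as a small script. This is an illustration under assumptions, not a tested procedure: `DRY_RUN`, `run`, `apply_workaround` and `revert_permissive` are hypothetical names, and the script only prints the commands unless `DRY_RUN=0` is set. The revert step uses `semanage permissive -d`, which removes a domain from the permissive list; it is not mentioned in this thread and is included on the assumption that the permissive setting should be removed once a fixed selinux-policy is installed.

```shell
#!/usr/bin/env bash
# Sketch of the SELinux workaround from this comment, with a dry-run default.
# DRY_RUN and run() are hypothetical helpers for this illustration.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them as root

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

apply_workaround() {
    run systemctl stop openvswitch
    # killall exits non-zero when no process matches; do not abort on that.
    run killall ovs-vswitchd || true
    run semanage permissive -a openvswitch_t
    run systemctl restart openvswitch
    run systemctl restart atomic-openshift-node
}

# Assumption: undo the permissive domain after the policy fix is in place.
revert_permissive() {
    run semanage permissive -d openvswitch_t
}

apply_workaround
```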
Comment 18 Giuseppe Scrivano 2017-02-23 05:35:53 EST
Thanks.  As you confirmed the same issue I was seeing, I am going to close this bug as a duplicate of 1405479

*** This bug has been marked as a duplicate of bug 1405479 ***
Comment 19 Anping Li 2017-02-23 06:00:15 EST
*** Bug 1426139 has been marked as a duplicate of this bug. ***
Comment 20 Scott Dodson 2017-03-14 11:51:31 EDT
They cloned the selinux bug for 7.3.z, can you guys test the version from https://bugzilla.redhat.com/show_bug.cgi?id=1430751 to verify the fix?
Comment 26 Giuseppe Scrivano 2017-03-17 10:21:04 EDT
Does it work if you use the same workaround we used before?

# systemctl stop openvswitch
# killall ovs-vswitchd
# semanage permissive -a openvswitch_t
# systemctl restart openvswitch
# systemctl restart atomic-openshift-node
Comment 27 Anping Li 2017-03-19 21:09:51 EDT
Yes, it works if I use the workaround.
Comment 28 Giuseppe Scrivano 2017-03-20 07:21:41 EDT
So it is still a SELinux issue. Could you please file a new bug or reopen https://bugzilla.redhat.com/show_bug.cgi?id=1405479, adding more information?
Comment 31 Anping Li 2017-03-21 03:05 EDT
Created attachment 1264910 [details]
Atomic-openshift-node journal logs

1. Install OCP 3.4 with OVS 2.4.
2. Upgrade to OCP 3.5 with OVS 2.4.
3. Upgrade OVS 2.4 to 2.6 and selinux-policy to selinux-policy-3.13.1-102.el7_3.16.noarch.
4. systemctl restart openvswitch; systemctl restart atomic-openshift-node
5. Collect the journal logs attached here:
journalctl -u atomic-openshift-node
Comment 33 Giuseppe Scrivano 2017-03-21 07:30:34 EDT
I proposed a patch that also restarts ovs-vswitchd and ovsdb-server as part of the upgrade:

https://github.com/openshift/openshift-ansible/pull/3718
Comment 34 Scott Dodson 2017-03-22 16:28:55 EDT
I don't think that PR works. I tested this today and this order of operations seems to solve the problem.

1) systemctl stop atomic-openshift-node
2) systemctl stop openvswitch
3) yum upgrade openvswitch
4) systemctl start openvswitch
5) systemctl start atomic-openshift-node

I'll try to get a PR up with this tonight after some more testing.
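
The order of operations above can be sketched as a shell script. This is only an illustration of the sequence in this comment, not the change that went into openshift-ansible; `DRY_RUN`, `run` and `upgrade_ovs` are hypothetical names, and `-y` is added to `yum` on the assumption that the upgrade runs non-interactively. By default the script prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch of the stop / upgrade / start order described in this comment.
# DRY_RUN and run() are hypothetical helpers, not part of openshift-ansible.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them as root

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_ovs() {
    run systemctl stop atomic-openshift-node    # stop the node first
    run systemctl stop openvswitch              # then stop OVS
    run yum -y upgrade openvswitch              # upgrade while both are stopped
    run systemctl start openvswitch             # bring OVS back first
    run systemctl start atomic-openshift-node   # start the node last
}

upgrade_ovs
```

The point of the ordering is that the node service is never running while openvswitch is being stopped, upgraded, or restarted underneath it.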
Comment 35 Scott Dodson 2017-03-22 22:39:14 EDT
https://github.com/openshift/openshift-ansible/pull/3748 was merged into release-1.5 and seems to work for me.
Comment 37 Anping Li 2017-03-23 01:59:16 EDT
The fix works. The upgrade succeeds with atomic-openshift-utils-3.5.41-1.git.0.e33897c.el7.noarch.
Comment 39 errata-xmlrpc 2017-04-12 15:01:17 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0903
