Bug 1420636 - The node service can't be started after upgrading openvswitch to v2.6
Summary: The node service can't be started after upgrading openvswitch to v2.6
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Giuseppe Scrivano
QA Contact: Anping Li
URL:
Whiteboard:
Duplicates: 1426139
Depends On:
Blocks:
Reported: 2017-02-09 06:52 UTC by Anping Li
Modified: 2017-07-24 14:11 UTC
CC List: 10 users

Fixed In Version: openshift-ansible-3.5.41-1
Doc Type: No Doc Update
Doc Text:
Fixed a bug in the upgrade of openvswitch to v2.6.
Clone Of:
Environment:
Last Closed: 2017-04-12 19:01:17 UTC
Target Upstream Version:


Attachments
Atomic-openshift-node journal logs (598.04 KB, application/x-gzip)
2017-03-21 07:05 UTC, Anping Li


Links
System: Red Hat Product Errata
ID: RHBA-2017:0903
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: OpenShift Container Platform atomic-openshift-utils bug fix and enhancement
Last Updated: 2017-04-12 22:45:42 UTC

Comment 1 Scott Dodson 2017-02-09 14:47:35 UTC
Ben or other networking folks,

Is there anything specific about 3.4 to 3.5 upgrades that requires lbr0 tweaking?

Comment 2 Ben Bennett 2017-02-09 15:07:00 UTC
Dan: Can you answer this please?

Comment 3 Scott Dodson 2017-02-10 02:53:32 UTC
Dan, BTW, during 3.4 -> 3.5 upgrades OVS is upgraded and restarted before docker and then the node are restarted, in case that's related.

Comment 4 Anping Li 2017-02-14 02:38:47 UTC
@Scott, Dan: any update on this bug?

Comment 8 Steve Milner 2017-02-16 19:40:50 UTC
Dan, any information to add on this one?

Comment 9 Giuseppe Scrivano 2017-02-18 12:51:17 UTC
I am still not able to reproduce this issue. After the upgrade finishes, I have:

# rpm -qa openvswitch
openvswitch-2.6.1-3.git20161206.el7fdb.x86_64

# oc get nodes
NAME          STATUS    AGE
rhel7server   Ready     35m


Could you please share with me the inventory file you are using?

I'll try with the quick reproducer.

Comment 10 Giuseppe Scrivano 2017-02-19 12:46:17 UTC
I also tried downgrading openvswitch to openvswitch-2.5.0-14.git20160727.el7fdb.x86_64, and I still cannot reproduce the issue here:

# rpm -qa openvswitch
openvswitch-2.5.0-14.git20160727.el7fdb.x86_64
# systemctl restart openvswitch
# systemctl restart atomic-openshift-node
# systemctl status atomic-openshift-node
● atomic-openshift-node.service - Atomic OpenShift Node
   Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d
           └─openshift-sdn-ovs.conf
   Active: active (running) since Sat 2017-02-18 08:46:00 EST; 7s ago

Comment 14 Anping Li 2017-02-22 11:44:28 UTC
According to today's testing, the node service always goes down once I restart openvswitch. I have to fix it with 'ovs-vsctl del-br br0'. I am using openvswitch-2.6.1-3.git20161206.el7fdb.x86_64, which was upgraded from OCP 3.4.

Comment 17 Anping Li 2017-02-23 10:32:38 UTC
I got a message similar to the one in https://bugzilla.redhat.com/show_bug.cgi?id=1405479

:type=AVC msg=audit(1487845707.287:37548): avc:  denied  { getattr } for  pid=37868 comm="ovs-ctl" path="/usr/bin/hostname" dev="dm-0" ino=74876 scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:object_r:hostname_exec_t:s0 tclass=file
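Read field by field, the denial says that the `ovs-ctl` script, running in the `openvswitch_t` domain, was refused `getattr` on `/usr/bin/hostname` (type `hostname_exec_t`). A minimal sketch for condensing such records, assuming the `key=value` audit layout shown above (for example as printed by `ausearch -m AVC`); `avc_summary` is a hypothetical helper, not part of any package:

```shell
#!/usr/bin/env bash
# Hypothetical helper: condense SELinux AVC denial records read on stdin into
# one line each: "<comm> <permissions> <source_type> -> <target_type>:<tclass>".
avc_summary() {
  awk '
    /avc: +denied/ {
      perm = ""; comm = ""; stype = ""; ttype = ""; tclass = ""
      # The denied permissions sit between "{" and "}".
      if (match($0, /[{][^}]*[}]/)) {
        perm = substr($0, RSTART + 1, RLENGTH - 2)
        sub(/^ +/, "", perm)
        sub(/ +$/, "", perm)
      }
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^comm=/) { comm = substr($i, 6); gsub(/"/, "", comm) }
        # Contexts look like user:role:type:level; field 3 is the type.
        if ($i ~ /^scontext=/) { split(substr($i, 10), s, ":"); stype = s[3] }
        if ($i ~ /^tcontext=/) { split(substr($i, 10), t, ":"); ttype = t[3] }
        if ($i ~ /^tclass=/) { tclass = substr($i, 8) }
      }
      printf "%s %s %s -> %s:%s\n", comm, perm, stype, ttype, tclass
    }'
}
```

For the record above, `avc_summary` prints `ovs-ctl getattr openvswitch_t -> hostname_exec_t:file`; on a live node it could be fed with `ausearch -m AVC -ts recent | avc_summary` (as root). Lines that are not AVC denials are ignored.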


The workaround works.
# systemctl stop openvswitch
# killall ovs-vswitchd
# semanage permissive -a openvswitch_t
# systemctl restart openvswitch
# systemctl restart atomic-openshift-node
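`semanage permissive -a openvswitch_t` stops SELinux from enforcing the whole `openvswitch_t` domain (denials are still logged but no longer blocked), so it is worth undoing once a fixed selinux-policy package is installed. A hedged sketch that wraps the workaround together with its revert (`semanage permissive -d`); the function names and the `DRY_RUN` switch are hypothetical, and by default the script only prints the commands:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the SELinux workaround above, plus its revert.
# Commands are only printed unless DRY_RUN=0 is set (then run them as root).

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

apply_ovs_selinux_workaround() {
  run systemctl stop openvswitch || return
  # Kill any ovs-vswitchd still running; killall fails if none is left, which is fine.
  run killall ovs-vswitchd || true
  # Mark openvswitch_t permissive: its denials are logged but no longer enforced.
  run semanage permissive -a openvswitch_t || return
  run systemctl restart openvswitch || return
  run systemctl restart atomic-openshift-node
}

# Re-enable enforcement for openvswitch_t once a fixed selinux-policy is installed.
revert_ovs_selinux_workaround() {
  run semanage permissive -d openvswitch_t
}
```

`semanage permissive -l` lists the domains currently marked permissive, which is a quick way to check whether a node still carries the workaround.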

Comment 18 Giuseppe Scrivano 2017-02-23 10:35:53 UTC
Thanks.  As you confirmed the same issue I was seeing, I am going to close this bug as a duplicate of 1405479

*** This bug has been marked as a duplicate of bug 1405479 ***

Comment 19 Anping Li 2017-02-23 11:00:15 UTC
*** Bug 1426139 has been marked as a duplicate of this bug. ***

Comment 20 Scott Dodson 2017-03-14 15:51:31 UTC
They cloned the SELinux bug for 7.3.z. Can you guys test the version from https://bugzilla.redhat.com/show_bug.cgi?id=1430751 to verify the fix?

Comment 26 Giuseppe Scrivano 2017-03-17 14:21:04 UTC
Does it work if you use the same workaround we used before?

# systemctl stop openvswitch
# killall ovs-vswitchd
# semanage permissive -a openvswitch_t
# systemctl restart openvswitch
# systemctl restart atomic-openshift-node

Comment 27 Anping Li 2017-03-20 01:09:51 UTC
Yes, it works if I use the workaround.

Comment 28 Giuseppe Scrivano 2017-03-20 11:21:41 UTC
So it is still an SELinux issue. Could you please file a new bug or reopen https://bugzilla.redhat.com/show_bug.cgi?id=1405479 with more information?

Comment 31 Anping Li 2017-03-21 07:05:30 UTC
Created attachment 1264910
Atomic-openshift-node journal logs

1. Install OCP 3.4 with OVS 2.4.
2. Upgrade to OCP 3.5, still with OVS 2.4.
3. Upgrade OVS from 2.4 to 2.6 and selinux-policy to selinux-policy-3.13.1-102.el7_3.16.noarch.
4. systemctl restart openvswitch; systemctl restart atomic-openshift-node
5. Collect the journal logs (attached here) with:
journalctl -u atomic-openshift-node
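Because step 3 is where the node breaks, it can help to check which openvswitch version a node has before restarting it. A minimal sketch, assuming an RPM-based host and GNU `sort -V` for version ordering; `version_lt` and `needs_ovs_upgrade` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting a node that still runs a pre-2.6 openvswitch.

# version_lt A B: succeeds when version A sorts strictly before version B.
# Relies on GNU `sort -V` (coreutils) for version-aware ordering.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# needs_ovs_upgrade [TARGET]: succeeds when the installed openvswitch RPM is
# older than TARGET (default 2.6); returns 2 if the package is not installed.
needs_ovs_upgrade() {
  local installed
  installed="$(rpm -q --qf '%{VERSION}\n' openvswitch)" || return 2
  version_lt "$installed" "${1:-2.6}"
}
```

For example, `needs_ovs_upgrade && echo "openvswitch is older than 2.6"` prints the message on a node still at 2.4 or 2.5.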

Comment 33 Giuseppe Scrivano 2017-03-21 11:30:34 UTC
I proposed a patch that also restarts ovs-vswitchd and ovsdb-server as part of the upgrade:

https://github.com/openshift/openshift-ansible/pull/3718

Comment 34 Scott Dodson 2017-03-22 20:28:55 UTC
I don't think that PR works. I tested this today and this order of operations seems to solve the problem.

1) systemctl stop atomic-openshift-node
2) systemctl stop openvswitch
3) yum upgrade openvswitch
4) systemctl start openvswitch
5) systemctl start atomic-openshift-node
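The order above can be scripted per node. This sketches only that sequence, not the openshift-ansible change itself; the `run` wrapper and the `DRY_RUN` switch are hypothetical additions, and by default the commands are only printed:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the node-side openvswitch upgrade order described above.
# Commands are only printed unless DRY_RUN=0 is set (then run them as root).

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Each step returns on failure, so the node is never started against an
# openvswitch that failed to upgrade or to come back up.
upgrade_ovs_on_node() {
  run systemctl stop atomic-openshift-node || return   # 1) stop the node first
  run systemctl stop openvswitch || return             # 2) then stop OVS
  run yum -y upgrade openvswitch || return             # 3) upgrade while both are down
  run systemctl start openvswitch || return            # 4) bring OVS back
  run systemctl start atomic-openshift-node            # 5) finally restart the node
}
```

Running `DRY_RUN=0 upgrade_ovs_on_node` as root, one node at a time, applies the sequence. The design choice is simply to stop the node service before openvswitch and start it only after the upgraded openvswitch is running.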

I'll try to get a PR up with this tonight after some more testing.

Comment 35 Scott Dodson 2017-03-23 02:39:14 UTC
https://github.com/openshift/openshift-ansible/pull/3748 merged into release-1.5 and seems to work for me

Comment 37 Anping Li 2017-03-23 05:59:16 UTC
The fix works. The upgrade succeeds with atomic-openshift-utils-3.5.41-1.git.0.e33897c.el7.noarch.

Comment 39 errata-xmlrpc 2017-04-12 19:01:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0903

