Bug 1420636

Summary: The node service can't be started after upgrading openvswitch to v2.6
Product: OpenShift Container Platform
Reporter: Anping Li <anli>
Component: Cluster Version Operator
Assignee: Giuseppe Scrivano <gscrivan>
Status: CLOSED ERRATA
QA Contact: Anping Li <anli>
Severity: high
Docs Contact:
Priority: high
Version: 3.5.0
CC: anli, aos-bugs, bbennett, bmeng, dcbw, gscrivan, jokerman, mmccomas, sdodson, xtian
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openshift-ansible-3.5.41-1
Doc Type: No Doc Update
Doc Text: Fixed a bug in the upgrade of openvswitch to v2.6.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-04-12 19:01:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Attachments:
Atomic-openshift-node journal logs (flags: none)

Comment 1 Scott Dodson 2017-02-09 14:47:35 UTC
Ben or other networking folks,

Anything specific about 3.4 to 3.5 upgrades that requires lbr0 tweaking?

Comment 2 Ben Bennett 2017-02-09 15:07:00 UTC
Dan: Can you answer this please?

Comment 3 Scott Dodson 2017-02-10 02:53:32 UTC
Dan, BTW, during 3.4 -> 3.5 upgrades OVS is upgraded and restarted prior to restarting docker and then the node, in case that's related.

Comment 4 Anping Li 2017-02-14 02:38:47 UTC
@scott, Dan, any update on this bug?

Comment 8 Steve Milner 2017-02-16 19:40:50 UTC
Dan, any information to add on this one?

Comment 9 Giuseppe Scrivano 2017-02-18 12:51:17 UTC
I am still not able to reproduce this issue; after the upgrade finishes, I have:

# rpm -qa openvswitch
openvswitch-2.6.1-3.git20161206.el7fdb.x86_64

# oc get nodes
NAME          STATUS    AGE
rhel7server   Ready     35m


Could you please share with me the inventory file you are using?

I'll try with the quick reproducer.

Comment 10 Giuseppe Scrivano 2017-02-19 12:46:17 UTC
I've also tried downgrading openvswitch to openvswitch-2.5.0-14.git20160727.el7fdb.x86_64, and I am still not able to see the issue here:

# rpm -qa openvswitch
openvswitch-2.5.0-14.git20160727.el7fdb.x86_64
# systemctl restart openvswitch
# systemctl restart atomic-openshift-node
# systemctl status atomic-openshift-node
● atomic-openshift-node.service - Atomic OpenShift Node
   Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d
           └─openshift-sdn-ovs.conf
   Active: active (running) since Sat 2017-02-18 08:46:00 EST; 7s ago

Comment 14 Anping Li 2017-02-22 11:44:28 UTC
According to today's testing results, the node service always goes down once I restart openvswitch. I have to fix it with 'ovs-vsctl del-br br0'. I am using openvswitch-2.6.1-3.git20161206.el7fdb.x86_64, which was upgraded from OCP 3.4.
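A minimal sketch of that manual recovery, assuming the node service recreates br0 when it restarts (which openshift-sdn appears to do on node start):

# ovs-vsctl del-br br0
# systemctl restart atomic-openshift-node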

Comment 17 Anping Li 2017-02-23 10:32:38 UTC
I got a similar message to the one in https://bugzilla.redhat.com/show_bug.cgi?id=1405479

type=AVC msg=audit(1487845707.287:37548): avc:  denied  { getattr } for  pid=37868 comm="ovs-ctl" path="/usr/bin/hostname" dev="dm-0" ino=74876 scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:object_r:hostname_exec_t:s0 tclass=file


The following workaround works:
# systemctl stop openvswitch
# killall ovs-vswitchd
# semanage permissive -a openvswitch_t
# systemctl restart openvswitch
# systemctl restart atomic-openshift-node
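Note that 'semanage permissive -a openvswitch_t' disables enforcement for the whole openvswitch_t domain. Once a fixed policy is available it can be reverted, and a more targeted alternative, sketched here on the assumption that auditd has logged the denial above, is to generate a local policy module instead (openvswitch_local is just an illustrative module name):

# semanage permissive -d openvswitch_t
# ausearch -m avc -c ovs-ctl | audit2allow -M openvswitch_local
# semodule -i openvswitch_local.pp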

Comment 18 Giuseppe Scrivano 2017-02-23 10:35:53 UTC
Thanks. Since you have confirmed the same issue I was seeing, I am going to close this bug as a duplicate of bug 1405479.

*** This bug has been marked as a duplicate of bug 1405479 ***

Comment 19 Anping Li 2017-02-23 11:00:15 UTC
*** Bug 1426139 has been marked as a duplicate of this bug. ***

Comment 20 Scott Dodson 2017-03-14 15:51:31 UTC
They cloned the SELinux bug for 7.3.z; can you test the version from https://bugzilla.redhat.com/show_bug.cgi?id=1430751 to verify the fix?
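One way to verify, sketched under the assumption that the fixed selinux-policy build from that bug is installed: confirm the policy version, make sure openvswitch_t is not still listed as permissive from the earlier workaround, and retry the restarts:

# rpm -q selinux-policy
# semanage permissive -l
# systemctl restart openvswitch
# systemctl restart atomic-openshift-node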

Comment 26 Giuseppe Scrivano 2017-03-17 14:21:04 UTC
Does it work if you use the same workaround we used before?

# systemctl stop openvswitch
# killall ovs-vswitchd
# semanage permissive -a openvswitch_t
# systemctl restart openvswitch
# systemctl restart atomic-openshift-node

Comment 27 Anping Li 2017-03-20 01:09:51 UTC
Yes, it works if I use the workaround.

Comment 28 Giuseppe Scrivano 2017-03-20 11:21:41 UTC
So it is still an SELinux issue. Could you please file a new bug, or reopen https://bugzilla.redhat.com/show_bug.cgi?id=1405479 with more information?

Comment 31 Anping Li 2017-03-21 07:05:30 UTC
Created attachment 1264910 [details]
Atomic-openshift-node journal logs

1. Install OCP 3.4 with OVS 2.4.
2. Upgrade to OCP 3.5, keeping OVS 2.4.
3. Upgrade OVS 2.4 to 2.6 and selinux-policy to selinux-policy-3.13.1-102.el7_3.16.noarch.
4. systemctl restart openvswitch; systemctl restart atomic-openshift-node
5. The journal logs (journalctl -u atomic-openshift-node) are attached here; see also the check sketched below.
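Since the failures in comment 17 surfaced as AVC denials, a quick way to check for them while reproducing, assuming auditd is running, is:

# ausearch -m avc -ts recent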

Comment 33 Giuseppe Scrivano 2017-03-21 11:30:34 UTC
I proposed a patch to restart ovs-vswitchd and ovsdb-server as well, as part of the upgrade:

https://github.com/openshift/openshift-ansible/pull/3718
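For reference, the equivalent manual steps would look roughly like this, assuming the split ovsdb-server and ovs-vswitchd systemd units shipped by the openvswitch 2.6 packaging (the PR itself drives this through openshift-ansible):

# systemctl restart ovsdb-server ovs-vswitchd
# systemctl restart atomic-openshift-node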

Comment 34 Scott Dodson 2017-03-22 20:28:55 UTC
I don't think that PR works. I tested this today and this order of operations seems to solve the problem.

1) systemctl stop atomic-openshift-node
2) systemctl stop openvswitch
3) yum upgrade openvswitch
4) systemctl start openvswitch
5) systemctl start atomic-openshift-node

I'll try to get a PR up with this tonight after some more testing.
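A minimal sketch of that ordering as a shell script, assuming a yum-based host (the actual fix lands in openshift-ansible rather than a standalone script):

#!/bin/bash
set -e  # stop at the first failed step

systemctl stop atomic-openshift-node
systemctl stop openvswitch
yum -y upgrade openvswitch
systemctl start openvswitch
systemctl start atomic-openshift-node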

Comment 35 Scott Dodson 2017-03-23 02:39:14 UTC
https://github.com/openshift/openshift-ansible/pull/3748 merged into release-1.5 and seems to work for me

Comment 37 Anping Li 2017-03-23 05:59:16 UTC
The fix works. The upgrade succeeds with atomic-openshift-utils-3.5.41-1.git.0.e33897c.el7.noarch.

Comment 39 errata-xmlrpc 2017-04-12 19:01:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0903