Bug 1887607 - [4.6] [upgrade] ovs pod crashes on RHEL worker when upgrading from 4.5 to 4.6
Summary: [4.6] [upgrade] ovs pod crashes on RHEL worker when upgrading from 4.5 to 4.6
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.6.z
Assignee: Tim Rozet
QA Contact: zhaozhanqi
URL:
Whiteboard:
Duplicates: 1890652
Depends On: 1887040
Blocks:
Reported: 2020-10-12 23:00 UTC by Feng Pan
Modified: 2020-11-09 15:51 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1887040
Environment:
Last Closed: 2020-11-09 15:50:58 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 2164 | 0 | None | closed | Bug 1887607: OVS config: check if OVS is installed | 2021-01-18 12:47:27 UTC
Red Hat Product Errata | RHBA-2020:4339 | 0 | None | None | None | 2020-11-09 15:51:22 UTC

Description Feng Pan 2020-10-12 23:00:50 UTC
+++ This bug was initially created as a clone of Bug #1887040 +++

Description of problem:
Upgrade from 4.5 to 4.6 on a cluster with RHEL workers and the SDN network plugin.

The ovs pod crashed with the following output:

oc logs ovs-r4sd8 -n openshift-sdn
openvswitch is running in systemd
id: openvswitch: no such user


Version-Release number of selected component (if applicable):

4.5.0-0.nightly-2020-10-08-190330  --> 4.6.0-rc.2

How reproducible:
Always

Steps to Reproduce:
1. Upgrade a 4.5 cluster that has RHEL workers and uses the SDN plugin to 4.6.

Actual results:
The ovs pod on the RHEL worker crashed with these logs:
oc logs ovs-r4sd8 -n openshift-sdn
openvswitch is running in systemd
id: openvswitch: no such user


This blocked the upgrade process.



Expected results:


Additional info:

--- Additional comment from zhaozhanqi on 2020-10-10 10:22:24 UTC ---

This issue happens because the RHEL worker has not yet been upgraded to 4.6, so the openvswitch2.13 package is not installed.
When I hit the ovs pod crash, I upgraded the RHEL worker to 4.6. After the RHEL worker upgrade finished, the rest of the cluster's workers continued upgrading, and the cluster upgrade eventually completed successfully.

Is there a way to avoid the ovs pod crash before the RHEL worker is upgraded? If not, we should at least tell customers about this situation: an ovs pod crash on a RHEL worker during the upgrade to 4.6 is expected, and upgrading the RHEL worker resolves it.

--- Additional comment from zhaozhanqi on 2020-10-12 01:49:44 UTC ---

From the documentation: https://docs.openshift.com/container-platform/4.5/updating/updating-cluster-rhel-compute.html#rhel-compute-updating_updating-cluster-rhel-compute

 >> After you update your cluster, you must update the Red Hat Enterprise Linux (RHEL) compute machines in your cluster

That is, the RHEL workers are updated only after the cluster itself is updated. If so, this is an issue.

--- Additional comment from Tim Rozet on 2020-10-12 14:35:51 UTC ---

It looks like openvswitch is upgraded by a playbook after the cluster upgrade. This is the order of operations for UPI installs, so I'm moving this to the installer team.

--- Additional comment from Scott Dodson on 2020-10-12 17:15:50 UTC ---

We're going to have to make sure that the OVS pods in 4.6 maintain compatibility until OVS can be installed on the RHEL workers as part of the RHEL worker upgrade playbooks. I assume this works on RHCOS because OVS was already installed in RHCOS 4.5, whereas it was not installed on RHEL 7 workers.

--- Additional comment from Tim Rozet on 2020-10-12 22:45:56 UTC ---

Zhanqi, can you please provide the systemd journal from one of your nodes, or give us access to a setup? If openvswitch wasn't installed, I don't see how ovs-configuration.service would have run and written /var/run/ovs-config-executed, which we use to determine whether OVS is running in systemd.
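The flag-file check described in this comment, combined with the fix named in machine-config-operator pull 2164 ("OVS config: check if OVS is installed"), can be modeled roughly as below. This is a minimal Python sketch, not the actual ovs-configuration script or ovs pod entrypoint: the function names, the "container" mode label, the `installed` parameter, and the user-plus-binary heuristic in `ovs_installed` are hypothetical. Only the flag file name `ovs-config-executed`, the `openvswitch` user, and the "is OVS running in systemd" decision come from this report.

```python
import os
import pwd
import shutil
import tempfile


def ovs_installed(user: str = "openvswitch", binary: str = "ovs-vswitchd") -> bool:
    """Hypothetical check: treat OVS as installed only when its service user
    exists and the daemon binary is on PATH."""
    try:
        pwd.getpwnam(user)  # the same kind of lookup that fails as "no such user"
    except KeyError:
        return False
    return shutil.which(binary) is not None


def ovs_configuration(flag_path: str, installed: bool) -> bool:
    """Node-side sketch: write the flag file only when OVS is installed, so a
    RHEL worker without the package never claims OVS runs in systemd.
    Returns True if the flag file was written."""
    if not installed:
        return False
    with open(flag_path, "w"):
        pass  # an empty file is enough; only its existence is checked
    return True


def ovs_pod_mode(flag_path: str) -> str:
    """Pod-side sketch: defer to host (systemd) OVS if the flag file exists,
    otherwise run OVS in the container (hypothetical label)."""
    return "systemd" if os.path.exists(flag_path) else "container"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        flag = os.path.join(d, "ovs-config-executed")
        # RHEL 7 worker before its playbook upgrade: no openvswitch package.
        ovs_configuration(flag, installed=False)
        print(ovs_pod_mode(flag))  # container
        # Node where OVS is installed (for example, RHCOS).
        ovs_configuration(flag, installed=True)
        print(ovs_pod_mode(flag))  # systemd
```

In this model, the crash in the report corresponds to the flag file being written on a node where `ovs_installed()` would return False; gating the write on the installed check keeps the pod from taking the systemd path on such a node.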

--- Additional comment from Feng Pan on 2020-10-12 22:55:58 UTC ---

Moving this to 4.7 with a 4.6.z backport, as this does not actually affect overall upgrade success.

Comment 1 Sunil Choudhary 2020-10-23 14:08:50 UTC
*** Bug 1890652 has been marked as a duplicate of this bug. ***

Comment 4 zhaozhanqi 2020-10-30 06:28:53 UTC
Verified this bug on 4.6.0-0.nightly-2020-10-27-154553

Comment 6 errata-xmlrpc 2020-11-09 15:50:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.3 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4339

