Bug 1988440 - Network operator changes ovnkube-config too early causing ovnkube-master pods to crashloop during cluster upgrade
Summary: Network operator changes ovnkube-config too early causing ovnkube-master pods to crashloop during cluster upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Christoph Stäbler
QA Contact: Mehul Modi
URL:
Whiteboard:
Depends On:
Blocks: 2027983
 
Reported: 2021-07-30 14:42 UTC by Neil Girard
Modified: 2024-12-20 20:35 UTC (History)
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: ovnkube-node and ovnkube-master pods fail to start when the config file contains an unknown field or section.
Consequence: This can lead to failures during ovn-kubernetes updates if a new config field or section was introduced. Consider the following scenario: 1. the ConfigMap is updated; 2. the ovnkube-node rollout starts; 3. an ovnkube-master pod needs to be (re)started for some reason (eviction from a node or something else); 4. the newly started ovnkube-master pod is not aware of the new config structure (it is still on the old version), fails to parse the config, and crashloops. This can result in a stuck rollout.
Fix: Make ovn-kube resilient to unknown fields in config files; it now logs a warning instead of exiting when such a field is found.
Result: ovn-kube updates do not fail if the config file contains an unknown field or section.
Clone Of:
: 2027983 (view as bug list)
Environment:
Last Closed: 2022-03-12 04:36:27 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift ovn-kubernetes pull 812 0 None open [DownstreamMerge] Fix previous downstream merge 2021-11-09 16:26:14 UTC
Github openshift ovn-kubernetes pull 834 0 None open [DownstreamMerge] Revert revert 2021-11-22 08:57:47 UTC
Github ovn-org ovn-kubernetes pull 2579 0 None Merged Make config parsing more resilient for unknown config fields 2021-11-08 09:28:26 UTC
Red Hat Knowledge Base (Solution) 6227951 0 None None None 2021-07-30 14:43:54 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-12 04:37:05 UTC

Description Neil Girard 2021-07-30 14:42:10 UTC
Description of problem:

It seems that during the upgrade of the network operator, the operator changes the ovnkube-config ConfigMap too early, causing the ovnkube-master pods to crashloop because the older pods do not accept the variable "host-network-namespace".

During the upgrade, from what I can tell, the operator changes the ConfigMap before it starts updating the daemonsets.  The first daemonset it updates is the one for the ovnkube-node pods.  This leaves a large window of time before the operator updates the daemonset for the ovnkube-master pods, which would accept the new variable.
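
For illustration, the ovnkube.conf rendered from the ovnkube-config ConfigMap is INI-style, and the 4.7 payload adds a new key under the [kubernetes] section that the 4.6 ovnkube binary does not recognize. The field name comes from the parse error further below; the surrounding key and the values here are assumed for illustration, not copied from the must-gather:

# illustrative excerpt of /run/ovnkube-config/ovnkube.conf
[kubernetes]
apiserver=https://api-int.cluster.example.com:6443
host-network-namespace=openshift-host-network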

$ omg get clusterversion
NAME     VERSION  AVAILABLE  PROGRESSING  SINCE  STATUS
version           True       True         5m12s  Unable to apply 4.7.19: the update could not be applied

$ omg get co | grep -v "True       False        False"
NAME                                      VERSION  AVAILABLE  PROGRESSING  DEGRADED  SINCE
authentication                            4.7.19   True       True         True      -3s
ingress                                   4.7.19   True       False        True      15m
kube-apiserver                            4.7.19   True       True         True      2h52m
machine-config                            4.6.26   False      False        True      2h49m
network                                   4.6.26   True       True         True      3h2m
openshift-apiserver                       4.7.19   True       False        True      -1s

$ omg get pods -o wide
NAME                  READY  STATUS   RESTARTS  AGE    IP           NODE
ovnkube-master-9cwb2  4/6    Running  40        8d     10.230.0.9   ip-10-230-0-9.cluster.example.com  <--- Seems to be restarting and not completely ready
ovnkube-master-j9t5d  4/6    Running  41        8d     10.230.0.7   ip-10-230-0-7.cluster.example.com  <--- Seems to be restarting and not completely ready
ovnkube-master-z7nq8  5/6    Running  41        69d    10.230.0.11  ip-10-230-0-11.cluster.example.com  <--- Seems to be restarting and not completely ready
ovnkube-node-2f99b    3/3    Running  0         3h12m  10.230.0.7   ip-10-230-0-7.cluster.example.com
ovnkube-node-77smm    3/3    Running  0         14d    10.230.0.5   ip-10-230-0-5.cluster.example.com
ovnkube-node-bbflh    3/3    Running  0         3h6m   10.230.0.10  ip-10-230-0-10.cluster.example.com
ovnkube-node-bkpzh    3/3    Running  2         14d    10.230.0.6   ip-10-230-0-6.cluster.example.com
ovnkube-node-hggzp    3/3    Running  0         3h9m   10.230.0.4   ip-10-230-0-4.cluster.example.com
ovnkube-node-hh5w6    3/3    Running  0         69d    10.230.0.2   ip-10-230-0-2.cluster.example.com
ovnkube-node-j92sf    3/3    Running  1         69d    10.230.0.11  ip-10-230-0-11.cluster.example.com
ovnkube-node-l5j26    3/3    Running  0         69d    10.230.0.1   ip-10-230-0-1.cluster.example.com
ovnkube-node-mzqsd    2/3    Running  24        3h11m  10.230.0.3   ip-10-230-0-3.cluster.example.com  <--- Also not having a good time
ovnkube-node-qmlgn    2/3    Running  24        3h4m   10.230.0.9   ip-10-230-0-9.cluster.example.com  <--- Also not having a good time
ovnkube-node-w6cxv    3/3    Running  0         3h7m   10.230.0.8   ip-10-230-0-8.cluster.example.com

$ omg logs ovnkube-master-z7nq8 -c ovnkube-master -p
2021-07-29T17:15:04.743180681Z + [[ -f /env/_master ]]
2021-07-29T17:15:04.743180681Z + gateway_mode_flags=
2021-07-29T17:15:04.743248215Z + grep -q OVNKubernetes /etc/systemd/system/ovs-configuration.service
2021-07-29T17:15:04.744238205Z + '[' -f /host/var/run/ovs-config-executed ']'
2021-07-29T17:15:04.744268618Z + gateway_mode_flags='--gateway-mode local --gateway-interface br-ex'
2021-07-29T17:15:04.744595205Z ++ date '+%m%d %H:%M:%S.%N'
2021-07-29T17:15:04.746512592Z + echo 'I0729 17:15:04.746071717 - ovnkube-master - start nbctl daemon for caching'
2021-07-29T17:15:04.746523207Z I0729 17:15:04.746071717 - ovnkube-master - start nbctl daemon for caching
2021-07-29T17:15:04.746866455Z ++ ovn-nbctl --pidfile=/var/run/ovn/ovn-nbctl.pid --detach -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt --db ssl:10.230.0.11:9641,ssl:10.230.0.7:9641,ssl:10.230.0.9:9641
2021-07-29T17:15:04.778434481Z + export OVN_NB_DAEMON=/var/run/ovn/ovn-nbctl.13.ctl
2021-07-29T17:15:04.778434481Z + OVN_NB_DAEMON=/var/run/ovn/ovn-nbctl.13.ctl
2021-07-29T17:15:04.778460700Z + ln -sf /var/run/ovn/ovn-nbctl.13.ctl /var/run/ovn/
2021-07-29T17:15:04.779890164Z ln: '/var/run/ovn/ovn-nbctl.13.ctl' and '/var/run/ovn/ovn-nbctl.13.ctl' are the same file
2021-07-29T17:15:04.780133557Z + true
2021-07-29T17:15:04.780459651Z ++ date '+%m%d %H:%M:%S.%N'
2021-07-29T17:15:04.782171747Z + echo 'I0729 17:15:04.781766566 - ovnkube-master - start ovnkube --init-master ip-10-230-0-11.cluster.example.com'
2021-07-29T17:15:04.782181628Z I0729 17:15:04.781766566 - ovnkube-master - start ovnkube --init-master ip-10-230-0-11.cluster.example.com
2021-07-29T17:15:04.782292929Z + exec /usr/bin/ovnkube --init-master ip-10-230-0-11.cluster.example.com --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4 --metrics-bind-address 127.0.0.1:29102 --gateway-mode local --gateway-interface br-ex --sb-address ssl:10.230.0.11:9642,ssl:10.230.0.7:9642,ssl:10.230.0.9:9642 --sb-client-privkey /ovn-cert/tls.key --sb-client-cert /ovn-cert/tls.crt --sb-client-cacert /ovn-ca/ca-bundle.crt --sb-cert-common-name ovn --nb-address ssl:10.230.0.11:9641,ssl:10.230.0.7:9641,ssl:10.230.0.9:9641 --nb-client-privkey /ovn-cert/tls.key --nb-client-cert /ovn-cert/tls.crt --nb-client-cacert /ovn-ca/ca-bundle.crt --nbctl-daemon-mode --nb-cert-common-name ovn --enable-multicast
2021-07-29T17:15:04.791275721Z F0729 17:15:04.791226       1 ovnkube.go:130] failed to parse config file /run/ovnkube-config/ovnkube.conf: warning:
2021-07-29T17:15:04.791275721Z can't store data at section "kubernetes", variable "host-network-namespace"

I am not sure what caused the ovnkube-master pods to pick up the new ConfigMap. I think a MachineConfig change occurred during the upgrade and the masters were restarted. Once we worked around this failure and all the Pending pods were scheduled, we noticed the nodes continued to reboot to apply updates.

I have documented our workaround to get the upgrade to progress. I believe we can prevent this issue by changing the ConfigMap only at the same time as we change the daemonset for the ovnkube-master pods.

Version-Release number of selected component (if applicable):

4.6

How reproducible:

Only reproduces in customer environment

Steps to Reproduce:
N/A

Actual results:

network operator fails to upgrade

Expected results:

network operator upgrades successfully

Additional info:

The first must-gather on the attached case has the logs and shows the state of the cluster.

Comment 5 Surya Seetharaman 2021-10-21 08:59:08 UTC
(In reply to Aniket Bhat from comment #1)
> @ngirard great analysis on the bug. I think we haven't seen this
> in our upgrade jobs, but I do understand the problem. If the master pods
> don't get restarted between the time the CNO updates the config map and when
> the ovnkube-master daemonset rolls out with the new image, we should be
> covered.
> 
> I will try to figure out updating the config map closer to the daemonset
> roll out of the masters to narrow down the window of failure.

Yeah, actually we can't move the ConfigMap update closer to the master rollouts.
The ovnkube-node pods also use the same ConfigMap, so if the nodes roll out before
CNO applies the updated ConfigMap (and since CNO reconciliation is level-triggered,
controlling that ordering is hard to achieve), then once CNO does apply the ConfigMap
the node pods will restart a second time, which we don't want.

Comment 6 Surya Seetharaman 2021-10-21 09:09:40 UTC
(In reply to cstabler from comment #4)

> What do you think about making ovnkube-(node&master) more resilient against
> unknown fields in the configmap?

So if this ConfigMap cannot be manipulated by users, i.e. it is not user-facing (which I think it is not, since CNO would reconcile any manual changes), we should be good. The same applies to the upstream scenario. If we don't expose these knobs to users and it is an internal thing, I don't mind silencing/ignoring unknown fields. But I just want to call out that it is a bad API experience (again, since it is not user-facing, we can get away with this).

I just don't want folks to supply values and be surprised that the changes take no effect. Apart from the gateway-mode-config override that we allow users to set, which is not passed directly to ovn-k but is only parsed into the exec commands, I don't think we allow changing the ConfigMap values. So we should be good from the OCP perspective to make this change; let's make sure upstream is fine with it as well.
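
For reference, a minimal sketch of the "warn on unknown fields instead of exiting" approach, assuming gopkg.in/gcfg.v1 (the parser whose "can't store data at section ..., variable ..." warning appears in the log in the description). This is illustrative only, not the actual ovn-kubernetes patch:

// Illustrative sketch: parse the config but downgrade unknown-field
// warnings to log messages instead of exiting.
package main

import (
	"log"

	gcfg "gopkg.in/gcfg.v1"
)

// Only the fields this (older) binary knows about.
type config struct {
	Kubernetes struct {
		APIServer string `gcfg:"apiserver"`
	}
}

func main() {
	var cfg config
	err := gcfg.ReadFileInto(&cfg, "/run/ovnkube-config/ovnkube.conf")
	if err != nil {
		// FatalOnly strips "can't store data at section ..., variable ..."
		// warnings for fields this binary does not recognize and keeps only
		// genuinely fatal parse errors.
		if fatal := gcfg.FatalOnly(err); fatal != nil {
			log.Fatalf("failed to parse config file: %v", fatal)
		}
		log.Printf("warning: ignoring unknown fields in config file: %v", err)
	}
	log.Printf("parsed config: %+v", cfg)
}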

Comment 17 errata-xmlrpc 2022-03-12 04:36:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

