Bug 1988440
| Field | Value |
|---|---|
| Summary | Network operator changes ovnkube-config too early, causing ovnkube-master pods to crashloop during cluster upgrade |
| Product | OpenShift Container Platform |
| Component | Networking |
| Networking sub component | ovn-kubernetes |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | medium |
| Version | 4.6 |
| Target Milestone | --- |
| Target Release | 4.10.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Neil Girard <ngirard> |
| Assignee | Christoph Stäbler <cstabler> |
| QA Contact | Mehul Modi <memodi> |
| CC | anbhat, astoycos, bpickard, chdeshpa, memodi, surya, zzhao |
| Doc Type | Bug Fix |
Doc Text:

Cause: ovnkube-node and ovnkube-master pods fail to start when the config file contains an unknown field or section.

Consequence: This can cause failures during ovn-kubernetes updates when a new config field or section is introduced. Consider the following scenario:

1. The ConfigMap is updated.
2. The ovnkube-node rollout starts.
3. An ovnkube-master pod needs to be (re-)started for some reason (for example, through eviction from a node).
4. The newly started ovnkube-master pod, still running the old version, is not aware of the new config structure and fails to parse the config, resulting in a crashloop of the newly started ovnkube-master. This can leave the rollout stuck.

Fix: Make ovn-kubernetes resilient to unknown fields in config files: log a warning instead of exiting when such a field is found.

Result: ovn-kubernetes updates no longer fail if the config file contains an unknown field or section.
| Field | Value |
|---|---|
| Story Points | --- |
| : | 2027983 (view as bug list) |
| Last Closed | 2022-03-12 04:36:27 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| Category | --- |
| oVirt Team | --- |
| Cloudforms Team | --- |
| Bug Blocks | 2027983 |
Description

Neil Girard, 2021-07-30 14:42:10 UTC
(In reply to Aniket Bhat from comment #1)

> @ngirard great analysis on the bug. I think we haven't seen this in our upgrade jobs, but I do understand the problem. If the master pods don't get restarted between the time the CNO updates the config map and when the ovnkube-master daemonset rolls out with the new image, we should be covered.
>
> I will try to figure out updating the config map closer to the daemonset roll out of the masters to narrow down the window of failure.

Yeah, actually we can't move the ConfigMap update closer to the master rollouts. The ovnkube-node pods also use the same ConfigMap, so if the nodes roll out before CNO picks up the ConfigMap (and since CNO is level-triggered for reconciliation, that ordering is hard to achieve), then once CNO applies the ConfigMap the nodes will restart a second time, which we don't want.

(In reply to cstabler from comment #4)

> What do you think about making ovnkube-(node & master) more resilient against unknown fields in the configmap?

If this ConfigMap cannot be manipulated by users, i.e. it is not user-facing (which I think it isn't, since CNO would reconcile any manual changes), we should be good. The same goes for the upstream scenario. If we don't expose the knobs to users and it's an internal detail, I don't mind silencing/ignoring unknown fields. But I just want to call out that it's a bad API experience in general (again, since it's not user-facing we can get away with it here); I don't want folks to supply values and be surprised that their changes take no effect. Apart from the gateway-mode-config overrides that we allow users to make, which are not passed directly to ovn-kubernetes but are parsed into the exec commands, I don't think we allow changing the ConfigMap values. So we should be good from the OCP perspective to make this change; let's make sure upstream is fine with it as well.
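The resilience approach discussed above can be sketched as follows. This is a minimal, hypothetical illustration, not the actual ovn-kubernetes config parser: `knownFields` and `parseConfig` are invented names, and the real config is richer than flat `key=value` lines. The point it shows is the fix's behavior: an unknown field produces a warning and is skipped, instead of making the process exit, so an older ovnkube-master restarted against a newer ConfigMap keeps running.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// knownFields lists the config keys this (hypothetical) binary version
// understands. A newer ConfigMap may contain keys an older binary has
// never heard of; those must not be fatal.
var knownFields = map[string]bool{
	"mtu":          true,
	"gateway-mode": true,
}

// parseConfig reads simple "key=value" lines. Unknown keys are collected
// as warnings instead of aborting the process.
func parseConfig(raw string) (map[string]string, []string) {
	cfg := map[string]string{}
	var warnings []string
	sc := bufio.NewScanner(strings.NewReader(raw))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip blanks, comments, and section headers in this sketch.
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, "[") {
			continue
		}
		k, v, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		k, v = strings.TrimSpace(k), strings.TrimSpace(v)
		if !knownFields[k] {
			// The fix: warn and continue rather than exit.
			warnings = append(warnings, fmt.Sprintf("ignoring unknown config field %q", k))
			continue
		}
		cfg[k] = v
	}
	return cfg, warnings
}

func main() {
	// "shiny-new-field" stands in for a field added by a newer release.
	raw := "[default]\nmtu=1400\nshiny-new-field=true\n"
	cfg, warnings := parseConfig(raw)
	fmt.Println(cfg["mtu"]) // prints "1400"
	for _, w := range warnings {
		fmt.Println("warning:", w)
	}
}
```

With this behavior, step 4 of the scenario in the Doc Text no longer crashloops: the old master merely logs a warning for the field it doesn't recognize.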
Added QE test coverage: https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-46654

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056