Bug 2038481 - kube-controller-manager-guard and openshift-kube-scheduler-guard pods being deleted and restarted on a cordoned node when drained
Summary: kube-controller-manager-guard and openshift-kube-scheduler-guard pods being deleted and restarted on a cordoned node when drained
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Jan Chaloupka
QA Contact: RamaKasturi
URL:
Whiteboard:
Duplicates: 1961772 2038386 2040263 2042956
Depends On:
Blocks:
 
Reported: 2022-01-08 00:40 UTC by jamo luhrsen
Modified: 2022-03-31 22:44 UTC
7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:37:59 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-kube-apiserver-operator pull 1295 0 None Merged bug 2005901: Sync the library-go 2022-01-28 11:56:52 UTC
Github openshift cluster-kube-controller-manager-operator pull 591 0 None Merged bug 2005901: Sync library go 2022-01-28 11:56:53 UTC
Github openshift cluster-kube-scheduler-operator pull 397 0 None Merged bug 2005901: Sync the library-go 2022-01-28 11:56:55 UTC
Github openshift library-go pull 1287 0 None Merged bug 2005901: guard controller: create the pdb if it does not exist 2022-01-28 11:56:56 UTC
Github openshift origin pull 26776 0 None Merged Bug 2038481: Flake failed sandboxes from bug in new guard pods 2022-01-28 11:56:58 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:38:20 UTC

Description jamo luhrsen 2022-01-08 00:40:13 UTC
Recently (~Jan 2nd) our aws-ovn-upgrade job started failing almost every time on the
upgrade test "[sig-network] pods should successfully create sandboxes by other";
example here [0].

After the upgrade is complete, there are some failures along the same
lines as this:

  ns/openshift-kube-controller-manager pod/kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal node/ip-10-0-149-222.us-west-2.compute.internal - 376.91 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal_openshift-kube-controller-manager_3a2d0426-d4df-4faa-9b7f-47acb5466fda_0(5e0f557c05f0dcb455b548b1213756c0300b814e9961c692b7e9211fc323e4ee): error adding pod openshift-kube-controller-manager_kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [openshift-kube-controller-manager/kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal/3a2d0426-d4df-4faa-9b7f-47acb5466fda]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition

I did not see this happening on the GCP version of this job, which may give some clue.

The test code [1] looks at all the events and parses every "Failed to create pod sandbox" message.
It does allow these events to happen up to 5 seconds after the pod was deleted. As you can see in the test log
message above, this one came 376 seconds after (a rough sketch of that kind of check is below the links).



[0] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade/1479445783269871616
[1] https://github.com/openshift/origin/blob/0e6a62416ffcc8d2189a0243977006c5b2f9fa2c/pkg/synthetictests/networking.go
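
For illustration only, here is a minimal sketch (in Go, not the actual origin test code) of the kind of check described above: collect the sandbox-creation failure events, look up when the matching pod was last deleted, and only flag the events that arrive outside a short tolerance window. The types, names, and the exact tolerance here are assumptions made up for this sketch.

package synthetictests

import "time"

// sandboxEvent and podDeletion are simplified stand-ins (hypothetical types)
// for the data the real test extracts from the gathered cluster events.
type sandboxEvent struct {
	PodKey string    // namespace/name of the pod the sandbox failure refers to
	At     time.Time // when the FailedCreatePodSandBox event fired
}

type podDeletion struct {
	PodKey string
	At     time.Time
}

// sandboxFailuresOutsideTolerance returns the sandbox-creation failures that
// happened more than `tolerance` after the matching pod deletion (or with no
// recorded deletion at all). Failures inside the window are treated as the
// expected churn of a pod being deleted and recreated.
func sandboxFailuresOutsideTolerance(events []sandboxEvent, deletions []podDeletion, tolerance time.Duration) []sandboxEvent {
	lastDeletion := map[string]time.Time{}
	for _, d := range deletions {
		if t, ok := lastDeletion[d.PodKey]; !ok || d.At.After(t) {
			lastDeletion[d.PodKey] = d.At
		}
	}

	var failures []sandboxEvent
	for _, e := range events {
		deletedAt, ok := lastDeletion[e.PodKey]
		if !ok || e.At.Sub(deletedAt) > tolerance {
			failures = append(failures, e) // e.g. the ~376s case in the log above
		}
	}
	return failures
}

In the failure above, the event arrived roughly 376 seconds after the deletion, far past the 5-second tolerance, which is why the test flagged it.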

Comment 1 jamo luhrsen 2022-01-08 00:59:33 UTC
@dosmith, I spent some time trying to dig into what is going on here, but I didn't know what to
look for. Please let me know if there is something I can try to gather to help figure this one out. The
job(s) have a lot of artifacts to dig through.

I also tried looking for any changes that could have happened recently in our test code, multus, and ovn,
and nothing seemed relevant to match up with when this went south (Jan 2nd-ish).

Comment 2 Douglas Smith 2022-01-10 18:13:52 UTC
This error is indicative of OVN-K failing to produce a configuration file, and Multus has given the unfortunate news. Assigning to OVN-K component for triage.

Comment 3 jamo luhrsen 2022-01-10 19:08:58 UTC
*** Bug 2038386 has been marked as a duplicate of this bug. ***

Comment 4 jamo luhrsen 2022-01-11 18:55:29 UTC
*** Bug 1961772 has been marked as a duplicate of this bug. ***

Comment 5 jamo luhrsen 2022-01-13 06:06:43 UTC
Update:

What's happening is that two new pods were introduced in 4.10 (kube-controller-manager-guard and
openshift-kube-scheduler-guard) by these two PRs [0][1], which were built on top of this library-go
PR [2]. They only run on master nodes and are evicted when the nodes are about to be
rebooted. The eviction deletes these -guard pods, but they are actually restarted within ~5s
and come back fully before the node actually reboots.

When the node comes back up after the reboot, these pods initially time out waiting for
the ovnk config file and the sandbox error is reported. The openshift-test code keeps
a history of the original pod deletion that happened with the eviction, so to the test this
sandbox error arrives much later (~5m) and it gets reported as a failure [3].

The new guard pod PRs indicate that a poddisruptionbudget should be configured
along with them, but I don't see one in a running 4.10 cluster. I don't know if that
matters, or whether I'm just missing it somewhere (a rough sketch of what that could
look like follows the links below).

I also don't know yet how to explain the guard pods being deleted by the eviction but then
getting started right back up again even though the node is cordoned.

If this is proper behavior then I can look into how to fix the test code to account for
it and not report a failure. If it is not proper behavior, I'm wondering whether the pod
should stay evicted and only be started again after the node is rebooted and finally
uncordoned; the ovnk config file should be in place by then and no sandbox error would
be reported.


[0] https://github.com/openshift/cluster-kube-scheduler-operator/pull/373
[1] https://github.com/openshift/cluster-kube-controller-manager-operator/pull/568
[2] https://github.com/openshift/library-go/pull/1238
[3] https://github.com/openshift/origin/blob/b58b70a0d0084e3be2b2faf7c030d06c6df3f569/pkg/synthetictests/networking.go#L79-L81
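
For illustration, a minimal sketch of what a "create the PDB if it does not exist" step in a guard controller could look like, using client-go and a policy/v1 PodDisruptionBudget. The PDB name, label selector, and minAvailable value here are assumptions for the sketch, not the operator's actual values.

package guard

import (
	"context"

	policyv1 "k8s.io/api/policy/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
)

// ensureGuardPDB creates a PodDisruptionBudget for the guard pods if one does
// not already exist. Name and selector are illustrative only.
func ensureGuardPDB(ctx context.Context, client kubernetes.Interface, namespace string) error {
	name := "guard-pdb" // hypothetical name for illustration

	_, err := client.PolicyV1().PodDisruptionBudgets(namespace).Get(ctx, name, metav1.GetOptions{})
	if err == nil {
		return nil // already present, nothing to do
	}
	if !apierrors.IsNotFound(err) {
		return err
	}

	minAvailable := intstr.FromInt(1)
	pdb := &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "guard"}, // illustrative selector
			},
		},
	}
	_, err = client.PolicyV1().PodDisruptionBudgets(namespace).Create(ctx, pdb, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		return nil // lost a race with another creator; that's fine
	}
	return err
}

With a budget like this in place, an API-initiated eviction is refused whenever it would drop availability below the budget, which is the kind of protection a PDB is meant to give these pods.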

Comment 6 jamo luhrsen 2022-01-13 18:32:00 UTC
Slack thread with some confirmation that this belongs with kube-scheduler:
  https://coreos.slack.com/archives/CKJR6200N/p1642096272047700

Comment 7 jamo luhrsen 2022-01-20 17:32:37 UTC
*** Bug 2042956 has been marked as a duplicate of this bug. ***

Comment 9 jamo luhrsen 2022-01-24 17:44:53 UTC
The real fixes for this are still a work in progress, but in the meantime we have merged this
commit [0], which turns this very specific case of a guard pod hitting this sandbox failure after
a reboot into a flake. The job is no longer perma-failing [1].

[0] https://github.com/openshift/origin/commit/333d91371a1835f499073208dbb712467921aea5
[1] https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade
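
For illustration, a rough sketch (not the actual origin commit) of the shape of such an exception: match the guard pods named in this bug and downgrade their sandbox-creation failures to flakes instead of hard failures. The pattern and function names are made up for the sketch.

package synthetictests

import (
	"regexp"
	"strings"
)

// guardPodNamePattern matches the guard pods named in this bug.
var guardPodNamePattern = regexp.MustCompile(`(kube-controller-manager-guard|openshift-kube-scheduler-guard)-`)

// isGuardPodSandboxFlake reports whether a sandbox-creation failure should be
// treated as a flake rather than a test failure, because it matches the known
// guard-pod-after-reboot case tracked in this bug.
func isGuardPodSandboxFlake(podName, eventMessage string) bool {
	return guardPodNamePattern.MatchString(podName) &&
		strings.Contains(eventMessage, "Failed to create pod sandbox")
}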

Comment 10 jamo luhrsen 2022-01-24 17:46:04 UTC
(In reply to jamo luhrsen from comment #9)
> The real fixes for this are still a work in progress, but in the meantime we
> have merged this commit [0], which turns this very specific case of a guard
> pod hitting this sandbox failure after a reboot into a flake. The job is no
> longer perma-failing [1].
>
> [0] https://github.com/openshift/origin/commit/333d91371a1835f499073208dbb712467921aea5
> [1] https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade

Also, I have a Jira item for myself to revert the test workaround once all the
real fixes make it in. No hurry, but I didn't want to lose sight of it and forget to
revert:

https://issues.redhat.com/browse/SDN-2636

Comment 11 Jan Chaloupka 2022-01-24 18:09:11 UTC
Moving back to POST as there's still one PR left for merging in the KCM component.

Comment 14 RamaKasturi 2022-01-31 16:15:30 UTC
Verified the bug with the nightly build below and I see that the guard pods for KS/KCM/KAS are not being deleted and restarted. However, I do see an issue where one of the installer pods for kube-apiserver is in an error state; a describe on that pod shows the output below, and I will raise a separate bug for it.

Status:               Failed
IP:                   10.128.0.65
IPs:
  IP:  10.128.0.65
Containers:
  installer:
    Container ID:  cri-o://ca4f50f6f6776cbca88811b00f9666c6f14c176b22e3b19292afaae3dcc8f11d
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2f3e877035668bafd5d8e87fd106dcb010973638b7993411eb5df074c5cffe3c
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2f3e877035668bafd5d8e87fd106dcb010973638b7993411eb5df074c5cffe3c
    Port:          <none>
    Host Port:     <none>
    Command:
      cluster-kube-apiserver-operator
      installer
    Args:
      -v=2
      --revision=6
      --namespace=openshift-kube-apiserver
      --pod=kube-apiserver-pod
      --resource-dir=/etc/kubernetes/static-pod-resources
      --pod-manifest-dir=/etc/kubernetes/manifests
      --configmaps=kube-apiserver-pod
      --configmaps=config
      --configmaps=kube-apiserver-cert-syncer-kubeconfig
      --optional-configmaps=oauth-metadata
      --optional-configmaps=cloud-config
      --configmaps=bound-sa-token-signing-certs
      --configmaps=etcd-serving-ca
      --optional-configmaps=kube-apiserver-server-ca
      --configmaps=kubelet-serving-ca
      --configmaps=sa-token-signing-certs
      --configmaps=kube-apiserver-audit-policies
      --secrets=etcd-client
      --optional-secrets=encryption-config
      --secrets=localhost-recovery-serving-certkey
      --secrets=localhost-recovery-client-token
      --optional-secrets=webhook-authenticator
      --cert-dir=/etc/kubernetes/static-pod-resources/kube-apiserver-certs
      --cert-configmaps=aggregator-client-ca
      --cert-configmaps=client-ca
      --optional-cert-configmaps=trusted-ca-bundle
      --cert-configmaps=control-plane-node-kubeconfig
      --cert-configmaps=check-endpoints-kubeconfig
      --cert-secrets=aggregator-client
      --cert-secrets=localhost-serving-cert-certkey
      --cert-secrets=service-network-serving-certkey
      --cert-secrets=external-loadbalancer-serving-certkey
      --cert-secrets=internal-loadbalancer-serving-certkey
      --cert-secrets=bound-service-account-signing-key
      --cert-secrets=control-plane-node-admin-client-cert-key
      --cert-secrets=check-endpoints-client-cert-key
      --cert-secrets=kubelet-client
      --cert-secrets=node-kubeconfigs
      --optional-cert-secrets=user-serving-cert
      --optional-cert-secrets=user-serving-cert-000
      --optional-cert-secrets=user-serving-cert-001
      --optional-cert-secrets=user-serving-cert-002
      --optional-cert-secrets=user-serving-cert-003
      --optional-cert-secrets=user-serving-cert-004
      --optional-cert-secrets=user-serving-cert-005
      --optional-cert-secrets=user-serving-cert-006
      --optional-cert-secrets=user-serving-cert-007
      --optional-cert-secrets=user-serving-cert-008
      --optional-cert-secrets=user-serving-cert-009
    State:      Terminated
      Reason:   Error
      Message:  01",
  (string) (len=21) "user-serving-cert-002",
  (string) (len=21) "user-serving-cert-003",
  (string) (len=21) "user-serving-cert-004",
  (string) (len=21) "user-serving-cert-005",
  (string) (len=21) "user-serving-cert-006",
  (string) (len=21) "user-serving-cert-007",
  (string) (len=21) "user-serving-cert-008",
  (string) (len=21) "user-serving-cert-009"
 },
 CertConfigMapNamePrefixes: ([]string) (len=4 cap=4) {
  (string) (len=20) "aggregator-client-ca",
  (string) (len=9) "client-ca",
  (string) (len=29) "control-plane-node-kubeconfig",
  (string) (len=26) "check-endpoints-kubeconfig"
 },
 OptionalCertConfigMapNamePrefixes: ([]string) (len=1 cap=1) {
  (string) (len=17) "trusted-ca-bundle"
 },
 CertDir: (string) (len=57) "/etc/kubernetes/static-pod-resources/kube-apiserver-certs",
 ResourceDir: (string) (len=36) "/etc/kubernetes/static-pod-resources",
 PodManifestDir: (string) (len=25) "/etc/kubernetes/manifests",
 Timeout: (time.Duration) 2m0s,
 StaticPodManifestsLockFile: (string) "",
 PodMutationFns: ([]installerpod.PodMutationFunc) <nil>,
 KubeletVersion: (string) ""
})
W0131 07:21:52.106709       1 cmd.go:413] unable to get owner reference (falling back to namespace): Get "https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/pods/installer-6-ip-10-0-178-236.us-east-2.compute.internal": dial tcp 172.30.0.1:443: i/o timeout
W0131 07:22:11.786877       1 cmd.go:426] unable to get kubelet version for node "ip-10-0-178-236.us-east-2.compute.internal": Get "https://172.30.0.1:443/api/v1/nodes/ip-10-0-178-236.us-east-2.compute.internal": context deadline exceeded
I0131 07:22:11.786930       1 cmd.go:284] Creating target resource directory "/etc/kubernetes/static-pod-resources/kube-apiserver-pod-6" ...
I0131 07:22:11.787032       1 cmd.go:212] Creating target resource directory "/etc/kubernetes/static-pod-resources/kube-apiserver-pod-6" ...
I0131 07:22:11.787042       1 cmd.go:220] Getting secrets ...
F0131 07:22:11.796879       1 cmd.go:101] failed to copy: timed out waiting for the condition


Below are the steps I followed to verify the bug:
=====================================================
1) Install the latest 4.10 cluster
2) Drain a master node using the command `oc adm drain ip-10-0-140-122.us-east-2.compute.internal --force --ignore-daemonsets --delete-emptydir-data`
3) Verify that no guard pods for kube-scheduler, KCM, or KAS are present on the drained node.

[knarra@knarra openshift-client-linux-4.9.0-0.nightly-2022-01-28-192738]$ oc get pods -n openshift-kube-scheduler
NAME                                                                        READY   STATUS      RESTARTS   AGE
installer-2-ip-10-0-178-236.us-east-2.compute.internal                      0/1     Completed   0          8h
installer-3-ip-10-0-178-236.us-east-2.compute.internal                      0/1     Completed   0          8h
installer-4-ip-10-0-178-236.us-east-2.compute.internal                      0/1     Completed   0          8h
installer-5-ip-10-0-178-236.us-east-2.compute.internal                      0/1     Completed   0          8h
installer-5-ip-10-0-192-9.us-east-2.compute.internal                        0/1     Completed   0          8h
installer-6-ip-10-0-178-236.us-east-2.compute.internal                      0/1     Completed   0          8h
installer-6-ip-10-0-192-9.us-east-2.compute.internal                        0/1     Completed   0          8h
openshift-kube-scheduler-guard-ip-10-0-178-236.us-east-2.compute.internal   1/1     Running     0          8h
openshift-kube-scheduler-guard-ip-10-0-192-9.us-east-2.compute.internal     1/1     Running     0          6h12m
openshift-kube-scheduler-ip-10-0-140-122.us-east-2.compute.internal         3/3     Running     0          8h
openshift-kube-scheduler-ip-10-0-178-236.us-east-2.compute.internal         3/3     Running     0          8h
openshift-kube-scheduler-ip-10-0-192-9.us-east-2.compute.internal           3/3     Running     0          8h

[knarra@knarra openshift-client-linux-4.9.0-0.nightly-2022-01-28-192738]$ oc get pods -n openshift-kube-controller-manager
NAME                                                                       READY   STATUS      RESTARTS     AGE
installer-4-ip-10-0-178-236.us-east-2.compute.internal                     0/1     Completed   0            8h
installer-5-ip-10-0-178-236.us-east-2.compute.internal                     0/1     Completed   0            8h
installer-5-ip-10-0-192-9.us-east-2.compute.internal                       0/1     Completed   0            8h
installer-6-ip-10-0-178-236.us-east-2.compute.internal                     0/1     Completed   0            8h
installer-6-ip-10-0-192-9.us-east-2.compute.internal                       0/1     Completed   0            8h
installer-7-ip-10-0-178-236.us-east-2.compute.internal                     0/1     Completed   0            8h
installer-7-ip-10-0-192-9.us-east-2.compute.internal                       0/1     Completed   0            8h
kube-controller-manager-guard-ip-10-0-178-236.us-east-2.compute.internal   1/1     Running     0            8h
kube-controller-manager-guard-ip-10-0-192-9.us-east-2.compute.internal     1/1     Running     0            8h
kube-controller-manager-ip-10-0-140-122.us-east-2.compute.internal         4/4     Running     0            8h
kube-controller-manager-ip-10-0-178-236.us-east-2.compute.internal         4/4     Running     0            8h
kube-controller-manager-ip-10-0-192-9.us-east-2.compute.internal           4/4     Running     1 (8h ago)   8h

[knarra@knarra openshift-client-linux-4.9.0-0.nightly-2022-01-28-192738]$ oc get pods -n openshift-kube-apiserver
NAME                                                              READY   STATUS      RESTARTS     AGE
installer-2-ip-10-0-178-236.us-east-2.compute.internal            0/1     Completed   0            8h
installer-3-ip-10-0-178-236.us-east-2.compute.internal            0/1     Completed   0            8h
installer-4-ip-10-0-178-236.us-east-2.compute.internal            0/1     Completed   0            8h
installer-5-ip-10-0-178-236.us-east-2.compute.internal            0/1     Completed   0            8h
installer-6-ip-10-0-178-236.us-east-2.compute.internal            0/1     Error       0            8h
installer-7-ip-10-0-178-236.us-east-2.compute.internal            0/1     Completed   0            8h
installer-8-ip-10-0-178-236.us-east-2.compute.internal            0/1     Completed   0            8h
installer-8-ip-10-0-192-9.us-east-2.compute.internal              0/1     Completed   0            8h
kube-apiserver-guard-ip-10-0-178-236.us-east-2.compute.internal   1/1     Running     0            8h
kube-apiserver-guard-ip-10-0-192-9.us-east-2.compute.internal     1/1     Running     0            8h
kube-apiserver-ip-10-0-140-122.us-east-2.compute.internal         5/5     Running     0            8h
kube-apiserver-ip-10-0-178-236.us-east-2.compute.internal         5/5     Running     1 (8h ago)   8h
kube-apiserver-ip-10-0-192-9.us-east-2.compute.internal           5/5     Running     0            8h


Based on the above, moving the bug to the verified state.

Comment 17 errata-xmlrpc 2022-03-10 16:37:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

Comment 18 jamo luhrsen 2022-03-31 22:44:42 UTC
*** Bug 2040263 has been marked as a duplicate of this bug. ***

