Description of problem:

A cluster has upgraded to v4.7.17. The control plane has upgraded successfully. We have observed two consecutive occasions of worker nodes not draining successfully, degrading the machine-config operator. Upon examination of the cluster, we found that the worker node being upgraded was not cordoned at all. There is no mention of any cordon attempts in the machine-config-operator log. The impacted worker node remains in this state and blocks the overall progress of the upgrade for the remaining workers in the machine config pool.

Version-Release number of selected component (if applicable):

v4.7.17

How reproducible:

Has happened on two worker nodes in a row during this current upgrade.
Bug 1968019 was fixed for master/4.8 via mco@a29604e [1], and that all seems fine. Bug 1968759 was the 4.7 backport via mco@8bbec98 [2], picked up by 4.7.17 [3], and this bug seems to have snuck into the manual rebase. The 4.7 PR dropped some cordon code:

$ git diff 8bbec98^..8bbec98 | grep -i cordon
- err := drain.RunCordonOrUncordon(dn.drainer, dn.node, true)
- glog.Infof("Cordon failed with: %v, retrying", err)

The 4.8 PR did not:

$ git diff a29604e^..a29604e | grep -i cordon
@@ -45,34 +45,39 @@ func (dn *Daemon) cordonOrUncordonNode(desired bool) error {
	// We are here, that means we need to cordon and drain node

Looking at CI, where we have a tightly controlled, consistent process and good asset gathering, I was hoping we'd have a good signal for "node got cordoned". Master contains "Node has been successfully cordoned" logging [4], but it's from some single-node work that landed in master for 4.8 and was not backported [5], so we never had it in 4.7. But we can look at audit logs. Here's 4.6.13 -> 4.6.16 (so before the regression) [6]:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp/1403169385064960000/artifacts/launch/audit-logs.tar | tar -xz --strip-components 1
$ zgrep -h '"resource":"nodes"' audit_logs/*/*.gz 2>/dev/null | jq -r 'select(.user.username == "system:serviceaccount:openshift-machine-config-operator:machine-config-daemon" and .verb == "patch" and .responseStatus.code == 200) | .stageTimestamp + " " + .objectRef.name' | grep master-0 | sort
2021-06-11T03:22:28.716863Z ci-ln-hrijm42-f76d1-rrqw5-master-0
2021-06-11T03:22:28.816802Z ci-ln-hrijm42-f76d1-rrqw5-master-0
2021-06-11T03:27:12.663633Z ci-ln-hrijm42-f76d1-rrqw5-master-0
2021-06-11T03:27:12.702060Z ci-ln-hrijm42-f76d1-rrqw5-master-0

So two patches pretty close together at 3:22:28. One of those was probably a cordon. And another pair at 3:27:12, one of which was probably an uncordon. Moving to 4.6.13 -> 4.6.17 (so including the regression) [7]:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-azure/1405073849015341056/artifacts/launch/audit-logs.tar | tar -xz --strip-components 1
$ zgrep -h '"resource":"nodes"' audit_logs/*/*.gz 2>/dev/null | jq -r 'select(.user.username == "system:serviceaccount:openshift-machine-config-operator:machine-config-daemon" and .verb == "patch" and .responseStatus.code == 200) | .stageTimestamp + " " + .objectRef.name' | grep master-0 | sort
2021-06-16T09:40:01.786752Z ci-ln-vgg2rck-002ac-t945d-master-0
2021-06-16T09:51:16.450430Z ci-ln-vgg2rck-002ac-t945d-master-0

I'm not sure why we lost two; maybe because we don't have to uncordon if we weren't cordoned already?
Anyhow, looking at the MCD logs for that node:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-azure/1405073849015341056/artifacts/launch/pods/openshift-machine-config-operator_machine-config-daemon-fxjsm_machine-config-daemon.log | grep 'Starting to manage node'
I0616 09:30:46.631349   87085 update.go:1906] Starting to manage node: ci-ln-vgg2rck-002ac-t945d-master-0
I0616 09:51:10.749972    3101 update.go:1906] Starting to manage node: ci-ln-vgg2rck-002ac-t945d-master-0

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-azure/1405073849015341056/artifacts/launch/pods/openshift-machine-config-operator_machine-config-daemon-fxjsm_machine-config-daemon.log | grep -B2 -A1 'Starting update from'
I0616 09:30:47.744667   87085 daemon.go:1167] In desired config rendered-master-e2befb43e96b67ba330d84c2a0a7a603
I0616 09:40:01.787993   87085 update.go:593] Checking Reconcilable for config rendered-master-e2befb43e96b67ba330d84c2a0a7a603 to rendered-master-9ee8c4da7dbeec4f51b0a58c88e55aea
I0616 09:40:01.825242   87085 update.go:1906] Starting update from rendered-master-e2befb43e96b67ba330d84c2a0a7a603 to rendered-master-9ee8c4da7dbeec4f51b0a58c88e55aea: &{osUpdate:true kargs:false fips:false passwd:false files:true units:true kernelType:false extensions:false}
I0616 09:40:01.841381   87085 update.go:1906] Update prepared; beginning drain

So drain starts at 9:40:01, around the first patch, and ends at 9:51 when we come back post-reboot. Look for work scheduled during that window, when we should be cordoned:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-azure/1405073849015341056/artifacts/launch/events.json | jq -r '.items[] | select(.reason == "Scheduled" and (.message | contains("master-0"))) | (.firstTimestamp // .metadata.creationTimestamp) + " " + .message' | grep 'T09:4[2-9]:' | sort
2021-06-16T09:46:33Z Successfully assigned openshift-cluster-storage-operator/csi-snapshot-controller-operator-6965fb996b-z2tzq to ci-ln-vgg2rck-002ac-t945d-master-0
2021-06-16T09:46:33Z Successfully assigned openshift-cluster-version/cluster-version-operator-677d8ffdd4-dnbtt to ci-ln-vgg2rck-002ac-t945d-master-0
2021-06-16T09:46:33Z Successfully assigned openshift-console-operator/console-operator-7b5b9c47d6-66ft7 to ci-ln-vgg2rck-002ac-t945d-master-0
2021-06-16T09:46:33Z Successfully assigned openshift-console/downloads-7c44f96fbf-xqrtn to ci-ln-vgg2rck-002ac-t945d-master-0
2021-06-16T09:46:34Z Successfully assigned openshift-operator-lifecycle-manager/packageserver-574c5ddd56-hftmm to ci-ln-vgg2rck-002ac-t945d-master-0
2021-06-16T09:46:36Z Successfully assigned openshift-etcd/etcd-quorum-guard-7c7768f8bb-r464j to ci-ln-vgg2rck-002ac-t945d-master-0
2021-06-16T09:46:57Z Successfully assigned openshift-oauth-apiserver/apiserver-5d6f768bf5-m85kd to ci-ln-vgg2rck-002ac-t945d-master-0
2021-06-16T09:47:53Z Successfully assigned openshift-apiserver/apiserver-84b454bf4-xxxzx to ci-ln-vgg2rck-002ac-t945d-master-0

Doesn't seem like we're cordoned.
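As an aside for anyone checking a live cluster rather than CI artifacts: a cordoned node carries spec.unschedulable: true (and shows SchedulingDisabled in `oc get nodes`), so a quick sanity check for whether the node under update is actually cordoned is:

$ oc get nodes -o custom-columns=NAME:.metadata.name,UNSCHEDULABLE:.spec.unschedulable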
Picking on downloads:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-azure/1405073849015341056/artifacts/launch/pods/openshift-machine-config-operator_machine-config-daemon-fxjsm_machine-config-daemon.log | grep downloads-
I0616 09:40:02.703533   87085 daemon.go:328] evicting pod openshift-console/downloads-7c44f96fbf-kkjb9
I0616 09:40:21.826808   87085 daemon.go:314] Evicted pod openshift-console/downloads-7c44f96fbf-kkjb9
I0616 09:46:33.649381   87085 daemon.go:328] evicting pod openshift-console/downloads-7c44f96fbf-cqmvx
I0616 09:46:46.083065   87085 daemon.go:314] Evicted pod openshift-console/downloads-7c44f96fbf-cqmvx

So the MCD had to kick it off twice. And checking the tolerations:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-azure/1405073849015341056/artifacts/launch/pods.json | jq -c '.items[] | select(.metadata.name | startswith("downloads-")).spec.tolerations[]' | sort | uniq
{"effect":"NoExecute","key":"node.kubernetes.io/not-ready","operator":"Exists","tolerationSeconds":120}
{"effect":"NoExecute","key":"node.kubernetes.io/unreachable","operator":"Exists","tolerationSeconds":120}
{"effect":"NoSchedule","key":"node-role.kubernetes.io/master","operator":"Exists"}
{"effect":"NoSchedule","key":"node.kubernetes.io/memory-pressure","operator":"Exists"}

I dunno why they want it to ignore memory-pressure on schedule, but there's no node.kubernetes.io/unschedulable toleration that would allow it to schedule on a node that was actually cordoned. So... it would be nice if 4.7 had a logged line around successful cordon like master does, but CI at least looks pretty suspicious for missing cordons in 4.7.17, and that meshes with my reading of the code changes.

Suggested mitigation is to look for nodes where the machineconfiguration.openshift.io/currentConfig annotation doesn't match the machineconfiguration.openshift.io/desiredConfig annotation, and cordon those nodes by hand (a rough sketch follows the reference list below). Although I haven't tested that out; you might also need to manually uncordon once they reboot.

[1]: https://github.com/openshift/machine-config-operator/pull/2597
[2]: https://github.com/openshift/machine-config-operator/pull/2605
[3]: https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.7.17
[4]: https://github.com/openshift/machine-config-operator/blame/a29604e/pkg/daemon/drain.go#L92
[5]: https://github.com/openshift/machine-config-operator/pull/2457/commits/1261271ddbc0e3ab28bfc34be80f379db813528a#diff-58db213cf6f[…]f66f0b1e78a946R87
[6]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp/1403169385064960000
[7]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-azure/1405073849015341056
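A rough, untested sketch of that mitigation (it assumes cluster-admin access, and the node names come from the jq output):

# Find nodes the MCD considers mid-update (currentConfig != desiredConfig):
$ oc get nodes -o json | jq -r '.items[] | select(.metadata.annotations["machineconfiguration.openshift.io/currentConfig"] != .metadata.annotations["machineconfiguration.openshift.io/desiredConfig"]) | .metadata.name'

# Cordon each of those by hand so the drain behaves as if the MCD had cordoned it:
$ oc adm cordon <node-name-from-above>

# And, if needed once the node reboots into the new config, manually uncordon it:
$ oc adm uncordon <node-name-from-above>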
Trevor: Fantastic debugging!
Indeed, great debugging Trevor. From the log provided, my analysis is as follows:

1. The upgrade is stuck because one of the nodes took a long time to drain. From pod machine-config-daemon-7bpbt/machine-config-daemon/machine-config-daemon/logs/current.log, the MCD kept trying to drain the rhsm-subscriptions-cron-hourly-tally-1622822400-n8wwm pod, as its eviction kept failing across multiple attempts (a quick way to spot workloads likely to block a drain like this is sketched after this list):

...
2021-06-16T16:18:57.511490296Z I0616 16:18:57.511440  741378 update.go:1906] Starting update from rendered-worker-60d93a47abd796e50f45bb4a5d10c1a0 to rendered-worker-06834005d5f3137c8a8671dd784f0833: &{osUpdate:true kargs:false fips:false passwd:false files:true units:true kernelType:false extensions:false}
2021-06-16T16:18:57.526287340Z I0616 16:18:57.526243  741378 update.go:1906] Update prepared; beginning drain
...
2021-06-17T00:19:07.869189167Z E0617 00:19:07.869132  741378 writer.go:135] Marking Degraded due to: failed to drain node : ip-10-110-163-219.ec2.internal after 1 hour
...
2021-06-17T00:45:19.678032427Z I0617 00:45:19.678017  741378 update.go:1906] drain complete
2021-06-17T00:45:19.680626718Z I0617 00:45:19.680593  741378 update.go:231] Successful drain took 1570.45894659 seconds
...

The actual OS update started at 2021-06-16T16:18:57 but was stuck at drain until 2021-06-17T00:45:19.

2. In the past we didn't have a log line for successful cordoning of the node. This was added in 4.8: https://github.com/openshift/machine-config-operator/pull/2457/files#diff-58db213cf6f6259a621f8d3421b5e8af55ed71c9f52c15b255f66f0b1e78a946R87.

3. As Trevor figured out, we need to bring back the code lost during the backport in cordonOrUncordonNode(), and we can add the "successfully cordoned" log message in 4.7 as well.
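As a generic triage note (not taken from the logs above): one common reason evictions keep failing until the MCD's 1-hour drain timeout marks the pool degraded is a workload whose PodDisruptionBudget currently allows zero disruptions. A minimal way to list such PDBs on a live cluster:

$ oc get pdb -A -o json | jq -r '.items[] | select(.status.disruptionsAllowed == 0) | .metadata.namespace + "/" + .metadata.name'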
created PR - https://github.com/openshift/machine-config-operator/pull/2623
Credit to Matt for pointing out that OSD's managed-upgrade operator wasn't stomping the slow-to-drain workload because the draining node wasn't cordoned :). All I did was try to follow that lead up in the code and CI.
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-06-20-093308   True        False         30m     Cluster version is 4.7.0-0.nightly-2021-06-20-093308

$ cat pdb.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: dontevict
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: dontevict

$ oc create -f pdb.yaml
poddisruptionbudget.policy/dontevict created

$ oc get pdb
NAME        MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
dontevict   1               N/A               0                     4s

$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-143-129.us-west-2.compute.internal   Ready    master   52m   v1.20.0+87cc9a4
ip-10-0-145-198.us-west-2.compute.internal   Ready    worker   47m   v1.20.0+87cc9a4
ip-10-0-162-222.us-west-2.compute.internal   Ready    worker   47m   v1.20.0+87cc9a4
ip-10-0-173-218.us-west-2.compute.internal   Ready    master   53m   v1.20.0+87cc9a4
ip-10-0-220-204.us-west-2.compute.internal   Ready    master   53m   v1.20.0+87cc9a4
ip-10-0-222-240.us-west-2.compute.internal   Ready    worker   42m   v1.20.0+87cc9a4

$ oc get node/ip-10-0-222-240.us-west-2.compute.internal -o yaml | grep hostname
    kubernetes.io/hostname: ip-10-0-222-240
          f:kubernetes.io/hostname: {}

$ oc run --restart=Never --labels app=dontevict --overrides='{ "spec": { "nodeSelector": { "kubernetes.io/hostname": "ip-10-0-222-240"} } }' --image=quay.io/prometheus/busybox dont-evict-this-pod -- sleep 4h
pod/dont-evict-this-pod created

$ oc get pod
NAME                  READY   STATUS              RESTARTS   AGE
dont-evict-this-pod   0/1     ContainerCreating   0          4s

$ cat << EOF > file-ig3.yaml
> apiVersion: machineconfiguration.openshift.io/v1
> kind: MachineConfig
> metadata:
>   labels:
>     machineconfiguration.openshift.io/role: worker
>   name: test-file
> spec:
>   config:
>     ignition:
>       version: 3.1.0
>     storage:
>       files:
>       - contents:
>           source: data:text/plain;charset=utf;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK
>         filesystem: root
>         mode: 0644
>         path: /etc/test
> EOF

$ oc create file -f file-ig3.yaml
error: Unexpected args: [file]
See 'oc create -h' for help and examples.
$ oc create -f file-ig3.yaml
machineconfig.machineconfiguration.openshift.io/test-file created

$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          8530c27d3d9b6155923d348058bc025a6a98ec3c   3.2.0             53m
00-worker                                          8530c27d3d9b6155923d348058bc025a6a98ec3c   3.2.0             53m
01-master-container-runtime                        8530c27d3d9b6155923d348058bc025a6a98ec3c   3.2.0             53m
01-master-kubelet                                  8530c27d3d9b6155923d348058bc025a6a98ec3c   3.2.0             53m
01-worker-container-runtime                        8530c27d3d9b6155923d348058bc025a6a98ec3c   3.2.0             53m
01-worker-kubelet                                  8530c27d3d9b6155923d348058bc025a6a98ec3c   3.2.0             53m
99-master-generated-registries                     8530c27d3d9b6155923d348058bc025a6a98ec3c   3.2.0             53m
99-master-ssh                                                                                 3.2.0             62m
99-worker-generated-registries                     8530c27d3d9b6155923d348058bc025a6a98ec3c   3.2.0             53m
99-worker-ssh                                                                                 3.2.0             62m
rendered-master-13a8e238f99c2aae13c24eac159a2db2   8530c27d3d9b6155923d348058bc025a6a98ec3c   3.2.0             53m
rendered-worker-96cc244920a4ac522616e39024e1d35d   8530c27d3d9b6155923d348058bc025a6a98ec3c   3.2.0             53m
test-file                                                                                     3.1.0             3s

$ oc get pods -A --field-selector spec.nodeName=ip-10-0-222-240.us-west-2.compute.internal | grep machine-config-daemon
openshift-machine-config-operator   machine-config-daemon-6gkv4   2/2   Running   0   45m

$ oc -n openshift-machine-config-operator logs -f machine-config-daemon-6gkv4 -c machine-config-daemon | grep -5 'Node has been successfully'
I0621 13:25:27.562892    1747 update.go:1943] completed update for config rendered-worker-96cc244920a4ac522616e39024e1d35d
I0621 13:25:27.566468    1747 daemon.go:1167] In desired config rendered-worker-96cc244920a4ac522616e39024e1d35d
I0621 14:13:52.569029    1747 update.go:630] Checking Reconcilable for config rendered-worker-96cc244920a4ac522616e39024e1d35d to rendered-worker-67458a674a2808ed356c90b65125bc46
I0621 14:13:52.616935    1747 update.go:1943] Starting update from rendered-worker-96cc244920a4ac522616e39024e1d35d to rendered-worker-67458a674a2808ed356c90b65125bc46: &{osUpdate:false kargs:false fips:false passwd:false files:true units:false kernelType:false extensions:false}
I0621 14:13:52.664672    1747 update.go:542] File diff: /etc/test was added
I0621 14:13:52.676667    1747 update.go:1943] Node has been successfully cordoned
I0621 14:13:52.680568    1747 update.go:1943] Update prepared; beginning drain
E0621 14:13:52.945865    1747 daemon.go:328] WARNING: deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: default/dont-evict-this-pod, openshift-marketplace/certified-operators-jxlc7, openshift-marketplace/community-operators-jt2cn, openshift-marketplace/redhat-marketplace-5wf9z, openshift-marketplace/redhat-operators-t5jwl; ignoring DaemonSet-managed Pods: openshift-cluster-csi-drivers/aws-ebs-csi-driver-node-668pp, openshift-cluster-node-tuning-operator/tuned-c2m87, openshift-dns/dns-default-2q4wc, openshift-image-registry/node-ca-jzgwx, openshift-ingress-canary/ingress-canary-hsqns, openshift-machine-config-operator/machine-config-daemon-6gkv4, openshift-monitoring/node-exporter-qxnl4, openshift-multus/multus-8g2g2, openshift-multus/network-metrics-daemon-9rshb, openshift-network-diagnostics/network-check-target-mp7cg, openshift-sdn/sdn-mqw9g
I0621 14:13:52.951733    1747 daemon.go:328] evicting pod openshift-image-registry/image-registry-6c75544c-6sqw4
I0621 14:13:52.951744    1747 daemon.go:328] evicting pod openshift-ingress/router-default-6fd598466c-qb8jr
I0621 14:13:52.951733    1747 daemon.go:328] evicting pod openshift-network-diagnostics/network-check-source-575c4794b8-7b4tm
--
I0621 15:13:28.922201    1747 update.go:241] Draining failed with: error when evicting pods/"dont-evict-this-pod" -n "default": global timeout reached: 1m30s, retrying
E0621 15:13:52.683567    1747 writer.go:135] Marking Degraded due to: failed to drain node : ip-10-0-222-240.us-west-2.compute.internal after 1 hour
I0621 15:13:52.711672    1747 update.go:630] Checking Reconcilable for config rendered-worker-96cc244920a4ac522616e39024e1d35d to rendered-worker-67458a674a2808ed356c90b65125bc46
I0621 15:13:52.741981    1747 update.go:1943] Starting update from rendered-worker-96cc244920a4ac522616e39024e1d35d to rendered-worker-67458a674a2808ed356c90b65125bc46: &{osUpdate:false kargs:false fips:false passwd:false files:true units:false kernelType:false extensions:false}
I0621 15:13:52.770285    1747 update.go:542] File diff: /etc/test was added
I0621 15:13:52.770394    1747 update.go:1943] Node has been successfully cordoned
I0621 15:13:52.773180    1747 update.go:1943] Update prepared; beginning drain
E0621 15:13:53.006501    1747 daemon.go:328] WARNING: deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: default/dont-evict-this-pod; ignoring DaemonSet-managed Pods: openshift-cluster-csi-drivers/aws-ebs-csi-driver-node-668pp, openshift-cluster-node-tuning-operator/tuned-c2m87, openshift-dns/dns-default-2q4wc, openshift-image-registry/node-ca-jzgwx, openshift-ingress-canary/ingress-canary-hsqns, openshift-machine-config-operator/machine-config-daemon-6gkv4, openshift-monitoring/node-exporter-qxnl4, openshift-multus/multus-8g2g2, openshift-multus/network-metrics-daemon-9rshb, openshift-network-diagnostics/network-check-target-mp7cg, openshift-sdn/sdn-mqw9g
I0621 15:13:53.009706    1747 daemon.go:328] evicting pod default/dont-evict-this-pod
E0621 15:13:53.017078    1747 daemon.go:328] error when evicting pods/"dont-evict-this-pod" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0621 15:13:58.017197    1747 daemon.go:328] evicting pod default/dont-evict-this-pod
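Not shown in the transcript above, but once the "Node has been successfully cordoned" line has been confirmed, removing the test PDB and pod created earlier should let the blocked drain finish so the node can complete the update:

$ oc delete pdb dontevict
$ oc delete pod dont-evict-this-pod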
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.7.18 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2502