Bug 1906298 - MCDDrainError "Drain failed on , updates may be blocked" missing rendered node name
Summary: MCDDrainError "Drain failed on , updates may be blocked" missing rendered node name
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.6.z
Assignee: Kirsten Garrison
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 1866873
Blocks:
 
Reported: 2020-12-10 07:52 UTC by OpenShift BugZilla Robot
Modified: 2021-03-09 20:16 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-09 20:16:08 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 2419 0 None open [release-4.6] Bug 1906298: update MCDDrainErr to reduce cardinality & fix nodename 2021-02-18 18:46:46 UTC
Red Hat Product Errata RHBA-2021:0674 0 None None None 2021-03-09 20:16:26 UTC

Comment 3 Michael Nguyen 2021-03-02 21:17:29 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-03-01-051748   True        False         23m     Cluster version is 4.6.0-0.nightly-2021-03-01-051748

$ cat << EOF > pdb.yaml 
> apiVersion: policy/v1beta1
> kind: PodDisruptionBudget
> metadata:
>   name: dontevict
> spec:
>   minAvailable: 1
>   selector:
>     matchLabels:
>       app: dontevict
> EOF
$ oc create -f pdb.yaml 
poddisruptionbudget.policy/dontevict created

$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-131-245.us-west-2.compute.internal   Ready    worker   37m   v1.19.0+8d12420
ip-10-0-134-204.us-west-2.compute.internal   Ready    master   47m   v1.19.0+8d12420
ip-10-0-160-219.us-west-2.compute.internal   Ready    master   47m   v1.19.0+8d12420
ip-10-0-171-80.us-west-2.compute.internal    Ready    worker   41m   v1.19.0+8d12420
ip-10-0-197-138.us-west-2.compute.internal   Ready    worker   37m   v1.19.0+8d12420
ip-10-0-212-244.us-west-2.compute.internal   Ready    master   47m   v1.19.0+8d12420


$ oc run --restart=Never --labels app=dontevict --overrides='{ "spec": { "nodeSelector": { "kubernetes.io/hostname": "ip-10-0-197-138"} } }' --image=docker.io/busybox dont-evict-this-pod -- sleep 1h
pod/dont-evict-this-pod created

$ oc get pods
NAME                  READY   STATUS    RESTARTS   AGE
dont-evict-this-pod   1/1     Running   0          78s
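
With dont-evict-this-pod running and minAvailable: 1, the PodDisruptionBudget has zero allowed disruptions, which is what will block the eviction during the drain below. A quick sanity check (commands only, using the dontevict PDB and pod created above; output omitted):

$ oc get pdb dontevict
$ oc get pod dont-evict-this-pod -o wide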
$ cat << EOF > file.yaml 
>   apiVersion: machineconfiguration.openshift.io/v1
>   kind: MachineConfig
>   metadata:
>     labels:
>       machineconfiguration.openshift.io/role: worker
>     name: test-file
>   spec:
>     config:
>       ignition:
>         version: 3.1.0
>       storage:
>         files:
>         - contents:
>             source: data:text/plain;charset=utf;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK
>           filesystem: root
>           mode: 0644
>           path: /etc/test
> EOF

$ oc create -f file.yaml
machineconfig.machineconfiguration.openshift.io/test-file created
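
The file contents in test-file are just a base64-encoded data: URL; decoding it shows what the MachineConfig writes to /etc/test once a worker is updated. A minimal sketch reusing file.yaml from above (the grep/cut pipeline is only one way to pull the payload out):

$ grep -o 'base64,.*' file.yaml | cut -d, -f2 | base64 -d
server foo.example.net maxdelay 0.4 offline
server bar.example.net maxdelay 0.4 offline
server baz.example.net maxdelay 0.4 offline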

$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          6c8d2ae995cfbbf3ce3d8fa3b95e39d29d40a449   3.1.0             47m
00-worker                                          6c8d2ae995cfbbf3ce3d8fa3b95e39d29d40a449   3.1.0             47m
01-master-container-runtime                        6c8d2ae995cfbbf3ce3d8fa3b95e39d29d40a449   3.1.0             47m
01-master-kubelet                                  6c8d2ae995cfbbf3ce3d8fa3b95e39d29d40a449   3.1.0             47m
01-worker-container-runtime                        6c8d2ae995cfbbf3ce3d8fa3b95e39d29d40a449   3.1.0             47m
01-worker-kubelet                                  6c8d2ae995cfbbf3ce3d8fa3b95e39d29d40a449   3.1.0             47m
99-master-generated-registries                     6c8d2ae995cfbbf3ce3d8fa3b95e39d29d40a449   3.1.0             47m
99-master-ssh                                                                                 3.1.0             58m
99-worker-generated-registries                     6c8d2ae995cfbbf3ce3d8fa3b95e39d29d40a449   3.1.0             47m
99-worker-ssh                                                                                 3.1.0             58m
rendered-master-729c64858414894b4b4b9d165bfc271d   6c8d2ae995cfbbf3ce3d8fa3b95e39d29d40a449   3.1.0             46m
rendered-worker-9e360b6a01562e88abdf3033091db14c   6c8d2ae995cfbbf3ce3d8fa3b95e39d29d40a449   3.1.0             46m
test-file                                                                                     3.1.0             4s
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-9e360b6a01562e88abdf3033091db14c   False     True       False      3              0                   0                     0                      48m
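
The worker pool rolls nodes one at a time (maxUnavailable on a MachineConfigPool defaults to 1), so the cordon works through the other workers before reaching the node pinned by the PDB; the two node listings below were taken roughly ten minutes apart. To follow the rollout live, a watch works as well, for example:

$ oc get mcp worker -w
$ oc get nodes -w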

$ oc get nodes
NAME                                         STATUS                     ROLES    AGE   VERSION
ip-10-0-131-245.us-west-2.compute.internal   Ready                      worker   39m   v1.19.0+8d12420
ip-10-0-134-204.us-west-2.compute.internal   Ready                      master   49m   v1.19.0+8d12420
ip-10-0-160-219.us-west-2.compute.internal   Ready                      master   49m   v1.19.0+8d12420
ip-10-0-171-80.us-west-2.compute.internal    Ready,SchedulingDisabled   worker   43m   v1.19.0+8d12420
ip-10-0-197-138.us-west-2.compute.internal   Ready                      worker   39m   v1.19.0+8d12420
ip-10-0-212-244.us-west-2.compute.internal   Ready                      master   49m   v1.19.0+8d12420
$ oc get nodes
NAME                                         STATUS                     ROLES    AGE   VERSION
ip-10-0-131-245.us-west-2.compute.internal   Ready                      worker   50m   v1.19.0+8d12420
ip-10-0-134-204.us-west-2.compute.internal   Ready                      master   60m   v1.19.0+8d12420
ip-10-0-160-219.us-west-2.compute.internal   Ready                      master   60m   v1.19.0+8d12420
ip-10-0-171-80.us-west-2.compute.internal    Ready                      worker   54m   v1.19.0+8d12420
ip-10-0-197-138.us-west-2.compute.internal   Ready,SchedulingDisabled   worker   50m   v1.19.0+8d12420
ip-10-0-212-244.us-west-2.compute.internal   Ready                      master   60m   v1.19.0+8d12420

$ oc  -n openshift-machine-config-operator get pods --field-selector spec.nodeName=ip-10-0-197-138.us-west-2.compute.internal
NAME                          READY   STATUS    RESTARTS   AGE
machine-config-daemon-rnzd6   2/2     Running   0          51m


$ oc  -n openshift-machine-config-operator logs machine-config-daemon-rnzd6 -c machine-config-daemon
..SNIP..
I0302 21:11:44.505083    1899 daemon.go:344] evicting pod default/dont-evict-this-pod
E0302 21:11:44.512766    1899 daemon.go:344] error when evicting pod "dont-evict-this-pod" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0302 21:11:49.512890    1899 daemon.go:344] evicting pod default/dont-evict-this-pod
E0302 21:11:49.520548    1899 daemon.go:344] error when evicting pod "dont-evict-this-pod" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0302 21:11:54.520679    1899 daemon.go:344] evicting pod default/dont-evict-this-pod
E0302 21:11:54.529015    1899 daemon.go:344] error when evicting pod "dont-evict-this-pod" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0302 21:11:59.529143    1899 daemon.go:344] evicting pod default/dont-evict-this-pod
E0302 21:11:59.536653    1899 daemon.go:344] error when evicting pod "dont-evict-this-pod" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

$ oc -n openshift-monitoring get routes
NAME                HOST/PORT                                                                           PATH   SERVICES            PORT    TERMINATION          WILDCARD
alertmanager-main   alertmanager-main-openshift-monitoring.apps.mnguyen46abz.devcluster.openshift.com          alertmanager-main   web     reencrypt/Redirect   None
grafana             grafana-openshift-monitoring.apps.mnguyen46abz.devcluster.openshift.com                    grafana             https   reencrypt/Redirect   None
prometheus-k8s      prometheus-k8s-openshift-monitoring.apps.mnguyen46abz.devcluster.openshift.com             prometheus-k8s      web     reencrypt/Redirect   None
thanos-querier      thanos-querier-openshift-monitoring.apps.mnguyen46abz.devcluster.openshift.com             thanos-querier      web     reencrypt/Redirect   None

Prometheus shows
mcd_drain_err{container="oauth-proxy",endpoint="metrics",instance="10.0.197.138:9001",job="machine-config-daemon",namespace="openshift-machine-config-operator",node="ip-10-0-197-138.us-west-2.compute.internal",pod="machine-config-daemon-rnzd6",service="machine-config-daemon"}
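
The node label now carries the draining node's name, which is exactly what was missing from the pre-fix alert text ("Drain failed on , updates may be blocked"); the repeated eviction failures in the MCD log above are what drive this metric. The same series can also be pulled through the thanos-querier route instead of the console UI; a rough sketch, assuming the prometheus-k8s service account token is accepted for queries and that oc sa get-token is available in this oc version:

$ TOKEN=$(oc -n openshift-monitoring sa get-token prometheus-k8s)
$ HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')
$ curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query?query=mcd_drain_err"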

Comment 5 errata-xmlrpc 2021-03-09 20:16:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.20 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0674

