Bug 1866873
Summary: | MCDDrainError "Drain failed on , updates may be blocked" missing rendered node name | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | W. Trevor King <wking>
Component: | Machine Config Operator | Assignee: | Kirsten Garrison <kgarriso>
Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen>
Severity: | low | Docs Contact: |
Priority: | unspecified | |
Version: | 4.5 | CC: | jerzhang, jnaess, kgarriso, mkrejci
Target Milestone: | --- | |
Target Release: | 4.7.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | No Doc Update
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-02-24 15:15:21 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1906298 | |
Description
W. Trevor King
2020-08-06 16:52:15 UTC
Can confirm this is happening on 4.4.14 as well. No label called node=<nodename> or similar is present on the metric in Prometheus.

Moving to 4.7, since this is not a blocking issue for 4.6.

Verified on 4.7.0-0.nightly-2020-11-10-093436:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-10-093436   True        False         4h38m   Cluster version is 4.7.0-0.nightly-2020-11-10-093436
$ cat << EOF > pdb.yaml
> apiVersion: policy/v1beta1
> kind: PodDisruptionBudget
> metadata:
>   name: dontevict
> spec:
>   minAvailable: 1
>   selector:
>     matchLabels:
>       app: dontevict
> EOF
$ oc create -f pdb.yaml
poddisruptionbudget.policy/dontevict created
$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-143-20.us-west-2.compute.internal    Ready    master   4h44m   v1.19.2+9c2f84c
ip-10-0-154-71.us-west-2.compute.internal    Ready    worker   4h31m   v1.19.2+9c2f84c
ip-10-0-171-153.us-west-2.compute.internal   Ready    master   4h40m   v1.19.2+9c2f84c
ip-10-0-189-196.us-west-2.compute.internal   Ready    worker   4h31m   v1.19.2+9c2f84c
ip-10-0-194-240.us-west-2.compute.internal   Ready    worker   4h31m   v1.19.2+9c2f84c
ip-10-0-209-84.us-west-2.compute.internal    Ready    master   4h40m   v1.19.2+9c2f84c
$ oc run --restart=Never --labels app=dontevict --overrides='{ "spec": { "nodeSelector": { "kubernetes.io/hostname": "ip-10-0-154-71"} } }' --image=docker.io/busybox dont-evict-this-pod -- sleep 1h
pod/dont-evict-this-pod created
$ oc get pods
NAME                  READY   STATUS              RESTARTS   AGE
dont-evict-this-pod   0/1     ContainerCreating   0          5s
$ cat << EOF > file.yaml
> apiVersion: machineconfiguration.openshift.io/v1
> kind: MachineConfig
> metadata:
>   labels:
>     machineconfiguration.openshift.io/role: worker
>   name: test-file
> spec:
>   config:
>     ignition:
>       version: 3.1.0
>     storage:
>       files:
>       - contents:
>           source: data:text/plain;charset=utf;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK
>         filesystem: root
>         mode: 0644
>         path: /etc/test
> EOF
$ oc create -f file.yaml
machineconfig.machineconfiguration.openshift.io/test-file created
$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          da75bdfb74bbb30568b58b1526ba369b6441d281   3.1.0             4h43m
00-worker                                          da75bdfb74bbb30568b58b1526ba369b6441d281   3.1.0             4h43m
01-master-container-runtime                        da75bdfb74bbb30568b58b1526ba369b6441d281   3.1.0             4h43m
01-master-kubelet                                  da75bdfb74bbb30568b58b1526ba369b6441d281   3.1.0             4h43m
01-worker-container-runtime                        da75bdfb74bbb30568b58b1526ba369b6441d281   3.1.0             4h43m
01-worker-kubelet                                  da75bdfb74bbb30568b58b1526ba369b6441d281   3.1.0             4h43m
03-worker-extensions                                                                          3.1.0             3h21m
99-master-generated-registries                     da75bdfb74bbb30568b58b1526ba369b6441d281   3.1.0             4h43m
99-master-ssh                                                                                 3.1.0             4h49m
99-worker-generated-registries                     da75bdfb74bbb30568b58b1526ba369b6441d281   3.1.0             4h43m
99-worker-ssh                                                                                 3.1.0             4h49m
rendered-master-8d25b9ae487bc5e7ffb021bd93bfff7d   da75bdfb74bbb30568b58b1526ba369b6441d281   3.1.0             4h43m
rendered-worker-69dac79db33505219af92d594dbbc383   da75bdfb74bbb30568b58b1526ba369b6441d281   3.1.0             4h43m
rendered-worker-e6858708d022f5e2ad4b50ef033be75a   da75bdfb74bbb30568b58b1526ba369b6441d281   3.1.0             3h21m
test-file                                                                                     3.1.0             3s
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-e6858708d022f5e2ad4b50ef033be75a   False     True       False      3              0                   0                     0                      4h45m
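(For reference, the base64 payload in the test-file MachineConfig is just filler; any file content would do, since the point is only to force a new rendered worker config and trigger a drain. Decoding it locally shows three throwaway server lines:)

$ echo c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK | base64 -d
server foo.example.net maxdelay 0.4 offline
server bar.example.net maxdelay 0.4 offline
server baz.example.net maxdelay 0.4 offline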
$ oc get nodes
NAME                                         STATUS                     ROLES    AGE     VERSION
ip-10-0-143-20.us-west-2.compute.internal    Ready                      master   4h46m   v1.19.2+9c2f84c
ip-10-0-154-71.us-west-2.compute.internal    Ready                      worker   4h32m   v1.19.2+9c2f84c
ip-10-0-171-153.us-west-2.compute.internal   Ready                      master   4h41m   v1.19.2+9c2f84c
ip-10-0-189-196.us-west-2.compute.internal   Ready                      worker   4h32m   v1.19.2+9c2f84c
ip-10-0-194-240.us-west-2.compute.internal   Ready,SchedulingDisabled   worker   4h33m   v1.19.2+9c2f84c
ip-10-0-209-84.us-west-2.compute.internal    Ready                      master   4h41m   v1.19.2+9c2f84c
$ oc get nodes
NAME                                         STATUS                     ROLES    AGE     VERSION
ip-10-0-143-20.us-west-2.compute.internal    Ready                      master   4h51m   v1.19.2+9c2f84c
ip-10-0-154-71.us-west-2.compute.internal    Ready,SchedulingDisabled   worker   4h38m   v1.19.2+9c2f84c
ip-10-0-171-153.us-west-2.compute.internal   Ready                      master   4h47m   v1.19.2+9c2f84c
ip-10-0-189-196.us-west-2.compute.internal   Ready                      worker   4h38m   v1.19.2+9c2f84c
ip-10-0-194-240.us-west-2.compute.internal   Ready                      worker   4h38m   v1.19.2+9c2f84c
ip-10-0-209-84.us-west-2.compute.internal    Ready                      master   4h47m   v1.19.2+9c2f84c
$ oc -n openshift-machine-config-operator get pods --field-selector spec.nodeName=ip-10-0-154-71.us-west-2.compute.internal
NAME                          READY   STATUS    RESTARTS   AGE
machine-config-daemon-7n6bf   2/2     Running   0          4h38m
$ oc -n openshift-machine-config-operator logs machine-config-daemon-7n6bf -c machine-config-daemon
...
I1110 21:47:52.933055    2072 daemon.go:344] evicting pod default/dont-evict-this-pod
E1110 21:47:52.962506    2072 daemon.go:344] error when evicting pod "dont-evict-this-pod" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I1110 21:47:57.962645    2072 daemon.go:344] evicting pod default/dont-evict-this-pod
E1110 21:47:57.970946    2072 daemon.go:344] error when evicting pod "dont-evict-this-pod" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I1110 21:48:02.971070    2072 daemon.go:344] evicting pod default/dont-evict-this-pod
E1110 21:48:03.013410    2072 daemon.go:344] error when evicting pod "dont-evict-this-pod" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I1110 21:48:08.013504    2072 daemon.go:344] evicting pod default/dont-evict-this-pod
E1110 21:48:08.021002    2072 daemon.go:344] error when evicting pod "dont-evict-this-pod" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I1110 21:48:13.021128    2072 daemon.go:344] evicting pod default/dont-evict-this-pod
E1110 21:48:13.030356    2072 daemon.go:344] error when evicting pod "dont-evict-this-pod" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

$ oc -n openshift-monitoring get routes
NAME                HOST/PORT                                                                         PATH   SERVICES            PORT    TERMINATION          WILDCARD
alertmanager-main   alertmanager-main-openshift-monitoring.apps.mnguyen47.devcluster.openshift.com           alertmanager-main   web     reencrypt/Redirect   None
grafana             grafana-openshift-monitoring.apps.mnguyen47.devcluster.openshift.com                     grafana             https   reencrypt/Redirect   None
prometheus-k8s      prometheus-k8s-openshift-monitoring.apps.mnguyen47.devcluster.openshift.com              prometheus-k8s      web     reencrypt/Redirect   None
thanos-querier      thanos-querier-openshift-monitoring.apps.mnguyen47.devcluster.openshift.com              thanos-querier      web     reencrypt/Redirect   None

Prometheus shows:

mcd_drain_err{container="oauth-proxy",endpoint="metrics",err="WaitTimeout",instance="10.0.154.71:9001",job="machine-config-daemon",namespace="openshift-machine-config-operator",node="ip-10-0-154-71.us-west-2.compute.internal",pod="machine-config-daemon-7n6bf",service="machine-config-daemon"}
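The mcd_drain_err metric now carries a node="..." label identifying the drained node, which is exactly what the MCDDrainError alert message was missing. As a rough sketch only (this is not the exact rule shipped by the machine-config-operator; the real expression, duration, and severity may differ), a rule that interpolates the node name from this metric could look like:

- alert: MCDDrainError
  # hypothetical expression and duration, for illustration only
  expr: mcd_drain_err > 0
  for: 1h
  labels:
    severity: warning
  annotations:
    # renders e.g. "Drain failed on ip-10-0-154-71.us-west-2.compute.internal, updates may be blocked"
    message: "Drain failed on {{ $labels.node }}, updates may be blocked"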
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633